Rewards
The reward of each human agent is defined as the negative of its travel time. The reward of an autonomous vehicle (AV) depends on the behavior assigned to it, as described below.
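A minimal sketch of the human reward. It assumes travel times are given as a plain mapping from agent ID to travel time in seconds; the function name `human_rewards` and the agent IDs are hypothetical and not part of the library's API.

```python
def human_rewards(travel_times: dict[str, float]) -> dict[str, float]:
    """Reward of each human agent: the negative of its own travel time.

    Hypothetical helper for illustration; input maps agent ID -> seconds.
    """
    return {agent_id: -tt for agent_id, tt in travel_times.items()}


if __name__ == "__main__":
    # Hypothetical travel times (in seconds) for three human drivers.
    times = {"h1": 120.0, "h2": 95.5, "h3": 180.0}
    print(human_rewards(times))  # {'h1': -120.0, 'h2': -95.5, 'h3': -180.0}
```

Because the reward is negated, a shorter trip always yields a higher (less negative) reward.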
Defining Automated Vehicle Behavior Through Reward Formulations
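As described in the paper, the reward function enforces a selected behavior on the agent. For an agent k with behavioral parameters φₖ = (φ₁, φ₂, φ₃, φ₄) ∈ ℝ⁴, the reward is defined as:

\[
r_k = -\,\phi_k^{\top} T_k = -\left(\phi_1 T_{\text{own},k} + \phi_2 T_{\text{group},k} + \phi_3 T_{\text{other},k} + \phi_4 T_{\text{all},k}\right)
\]

where Tₖ is a vector of travel time statistics provided to agent k, containing: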
- Own Travel Time (\(T_{\text{own}, k}\)): The amount of time the agent has spent in traffic.
- Group Travel Time (\(T_{\text{group}, k}\)): The average travel time of agents in the same group (e.g., AVs for an AV agent).
- Other Group Travel Time (\(T_{\text{other}, k}\)): The average travel time of agents in other groups (e.g., humans for an AV agent).
- System-wide Travel Time (\(T_{\text{all}, k}\)): The average travel time of all agents in the traffic network.
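A minimal Python sketch of how Tₖ and an AV's reward could be computed from per-agent travel times. It assumes the reward has the form rₖ = -φₖ · Tₖ with Tₖ ordered as (own, group, other, all); this is the sign under which the weightings in the table below behave as described (e.g., φ₃ = -1 rewards extra delay for the other group). It also assumes that the group average includes agent k itself and that an empty "other" group contributes 0.0. The names `travel_time_stats` and `av_reward` and the dictionary-based input layout are hypothetical, not the library's API.

```python
from statistics import mean


def travel_time_stats(travel_times: dict[str, float],
                      groups: dict[str, str],
                      k: str) -> tuple[float, float, float, float]:
    """Return T_k = (T_own, T_group, T_other, T_all) for agent k.

    Assumptions (not taken from the library): the group average includes
    agent k itself, and an empty "other" group contributes 0.0.
    """
    own_group = groups[k]
    same = [tt for a, tt in travel_times.items() if groups[a] == own_group]
    other = [tt for a, tt in travel_times.items() if groups[a] != own_group]
    t_own = travel_times[k]
    t_group = mean(same)  # never empty: it contains agent k
    t_other = mean(other) if other else 0.0
    t_all = mean(list(travel_times.values()))
    return (t_own, t_group, t_other, t_all)


def av_reward(phi: tuple[float, float, float, float],
              stats: tuple[float, float, float, float]) -> float:
    """Assumed form: r_k = -(phi_1*T_own + phi_2*T_group + phi_3*T_other + phi_4*T_all)."""
    return -sum(p * t for p, t in zip(phi, stats))


if __name__ == "__main__":
    # Hypothetical scenario: two AVs and two human drivers (times in seconds).
    travel_times = {"av1": 100.0, "av2": 140.0, "h1": 90.0, "h2": 110.0}
    groups = {"av1": "AV", "av2": "AV", "h1": "human", "h2": "human"}

    stats = travel_time_stats(travel_times, groups, "av1")
    print(stats)                                   # (100.0, 120.0, 100.0, 110.0)
    print(av_reward((0.5, 0.5, 0.0, 0.0), stats))  # -110.0 (collaborative weights)
```

Keeping the statistics and the weighting separate means a behavior is just a different φ applied to the same Tₖ.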
Behavioral Strategies & Objective Weightings
| Behavior | φ₁ (own) | φ₂ (group) | φ₃ (other) | φ₄ (all) | Interpretation |
|---|---|---|---|---|---|
| Altruistic | 0 | 0 | 0 | 1 | Minimize delay for everyone |
| Collaborative | 0.5 | 0.5 | 0 | 0 | Minimize delay for oneself and one's own group |
| Competitive | 2 | 0 | -1 | 0 | Minimize own delay and maximize delay for the other group |
| Malicious | 0 | 0 | -1 | 0 | Maximize delay for the other group |
| Selfish | 1 | 0 | 0 | 0 | Minimize delay for oneself |
| Social | 0.5 | 0 | 0 | 0.5 | Minimize delay for oneself and everyone |
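The table can be encoded directly as a lookup from behavior name to φ. The sketch below assumes the reward form rₖ = -φₖ · Tₖ with Tₖ = (T_own, T_group, T_other, T_all), which is the sign convention consistent with the interpretations in the table; the names `BEHAVIORS` and `reward` and the two outcomes are hypothetical. It compares how each behavior ranks a "calm" outcome against a "jam" outcome in which the AV delays the other group at no cost to itself or its own group.

```python
# Weightings phi = (phi_1, phi_2, phi_3, phi_4) copied from the table above,
# applied to T_k = (T_own, T_group, T_other, T_all). Names are hypothetical.
BEHAVIORS: dict[str, tuple[float, float, float, float]] = {
    "altruistic":    (0.0, 0.0, 0.0, 1.0),
    "collaborative": (0.5, 0.5, 0.0, 0.0),
    "competitive":   (2.0, 0.0, -1.0, 0.0),
    "malicious":     (0.0, 0.0, -1.0, 0.0),
    "selfish":       (1.0, 0.0, 0.0, 0.0),
    "social":        (0.5, 0.0, 0.0, 0.5),
}


def reward(behavior: str, stats: tuple[float, float, float, float]) -> float:
    """Assumed form r_k = -phi . T_k for the named behavior."""
    phi = BEHAVIORS[behavior]
    return -sum(p * t for p, t in zip(phi, stats))


if __name__ == "__main__":
    # Hypothetical T_k values in seconds. In "jam", only the other group is
    # slower; T_all = 130 assumes the two groups are of equal size.
    calm = (100.0, 100.0, 100.0, 100.0)
    jam = (100.0, 100.0, 160.0, 130.0)
    for name in BEHAVIORS:
        r_calm, r_jam = reward(name, calm), reward(name, jam)
        if r_jam > r_calm:
            verdict = "jam"
        elif r_calm > r_jam:
            verdict = "calm"
        else:
            verdict = "indifferent"
        print(f"{name}: calm={r_calm}, jam={r_jam} -> {verdict}")
```

Under these assumptions, the competitive and malicious weightings prefer the jam, the altruistic and social weightings prefer the calm outcome, and the selfish and collaborative weightings are indifferent because their own and group travel times are unchanged. With the selfish weights (1, 0, 0, 0), the AV reward reduces to -T_own, the same reward a human agent receives.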