Rewards

The reward of each human agent is defined as the negative of its own travel time. The reward of each autonomous vehicle (AV) depends on the behavior assigned to it, as defined below.

Defining Automated Vehicle Behavior Through Reward Formulations

As described in the paper, the reward function encodes the behavior selected for each agent. For an agent k with behavioral parameters φₖ ∈ ℝ⁴, the reward is defined as:

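\[ r_k = -\left( \varphi_{k1} \cdot T_{\text{own}, k} + \varphi_{k2} \cdot T_{\text{group}, k} + \varphi_{k3} \cdot T_{\text{other}, k} + \varphi_{k4} \cdot T_{\text{all}, k} \right) \]

The leading minus sign means that, as for the human agents, longer travel times lower the reward; this is the sign convention under which the weightings below produce the listed behaviors. Here Tₖ = (T_own,k, T_group,k, T_other,k, T_all,k) is the vector of travel time statistics provided to agent k, containing: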

  • Own Travel Time (\(T_{\text{own}, k}\)): The amount of time the agent has spent in traffic.

  • Group Travel Time (\(T_{\text{group}, k}\)): The average travel time of agents in the same group (e.g., AVs for an AV agent).

  • Other Group Travel Time (\(T_{\text{other}, k}\)): The average travel time of agents in other groups (e.g., humans for an AV agent).

  • System-wide Travel Time (\(T_{\text{all}, k}\)): The average travel time of all agents in the traffic network.
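The four statistics and the weighted reward can be sketched in Python as follows. This is a minimal illustration, not the library's implementation: the `Trip` record, the helper names `travel_time_stats` and `reward`, the choice to count the agent itself in its group average, and the fallback of 0.0 when no other group exists are all assumptions. The sketch also assumes travel times enter the reward with a negative sign, so that longer delays lower the reward, as for the human agents.

```python
from dataclasses import dataclass
from statistics import mean

Stats = tuple[float, float, float, float]  # (T_own, T_group, T_other, T_all)


@dataclass
class Trip:
    """Hypothetical record of one completed trip (field names are illustrative)."""
    agent_id: str
    group: str          # e.g. "human" or "av"
    travel_time: float  # time spent in traffic


def travel_time_stats(trips: list[Trip], agent_id: str) -> Stats:
    """Compute (T_own, T_group, T_other, T_all) for one agent."""
    own = next(t for t in trips if t.agent_id == agent_id)
    # Assumption: the agent's own trip counts toward its group average.
    group_times = [t.travel_time for t in trips if t.group == own.group]
    other_times = [t.travel_time for t in trips if t.group != own.group]
    all_times = [t.travel_time for t in trips]
    # Assumption: T_other defaults to 0.0 when no other group is present.
    t_other = mean(other_times) if other_times else 0.0
    return own.travel_time, mean(group_times), t_other, mean(all_times)


def reward(phi: tuple[float, ...], stats: Stats) -> float:
    """r_k = -(phi_1*T_own + phi_2*T_group + phi_3*T_other + phi_4*T_all)."""
    return -sum(p * t for p, t in zip(phi, stats))


trips = [
    Trip("h1", "human", 300.0),
    Trip("h2", "human", 500.0),
    Trip("av1", "av", 400.0),
    Trip("av2", "av", 200.0),
]
stats = travel_time_stats(trips, "av1")
print(stats)                                 # (400.0, 300.0, 400.0, 350.0)
print(reward((1.0, 0.0, 0.0, 0.0), stats))   # -400.0
```

With the weights (1, 0, 0, 0), the AV's reward reduces to -T_own, which is exactly the human agents' reward.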


Behavioral Strategies & Objective Weightings

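| Behavior      | φ₁ (own) | φ₂ (group) | φ₃ (other) | φ₄ (all) | Interpretation                                  |
|---------------|----------|------------|------------|----------|-------------------------------------------------|
| Altruistic    | 0        | 0          | 0          | 1        | Minimize delay for everyone                     |
| Collaborative | 0.5      | 0.5        | 0          | 0        | Minimize delay for oneself and one’s own group  |
| Competitive   | 2        | 0          | -1         | 0        | Minimize self-delay & maximize delay for others |
| Malicious     | 0        | 0          | -1         | 0        | Maximize delay for the other group              |
| Selfish       | 1        | 0          | 0          | 0        | Minimize delay for oneself                      |
| Social        | 0.5      | 0          | 0          | 0.5      | Minimize delay for oneself & everyone           |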
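To see how these weightings steer an agent, the following Python sketch scores two hypothetical outcomes for a single AV under each behavior. The weight vectors are taken from the table; the outcome values and the names `BEHAVIORS`, `FAST_FOR_ME` and `EVEN_DELAY` are illustrative assumptions, and the reward again assumes that travel times enter with a negative sign.

```python
# Weight vectors (phi_1, phi_2, phi_3, phi_4) taken from the table above.
BEHAVIORS = {
    "altruistic": (0.0, 0.0, 0.0, 1.0),
    "collaborative": (0.5, 0.5, 0.0, 0.0),
    "competitive": (2.0, 0.0, -1.0, 0.0),
    "malicious": (0.0, 0.0, -1.0, 0.0),
    "selfish": (1.0, 0.0, 0.0, 0.0),
    "social": (0.5, 0.0, 0.0, 0.5),
}


def reward(phi, stats):
    """r_k = -(phi . T_k): longer travel times lower the reward (assumed convention)."""
    return -sum(p * t for p, t in zip(phi, stats))


# Two hypothetical outcomes for one AV, as (T_own, T_group, T_other, T_all).
FAST_FOR_ME = (300.0, 300.0, 500.0, 400.0)  # AVs fast, humans slow
EVEN_DELAY = (330.0, 330.0, 330.0, 330.0)   # everyone equally delayed

# For each behavior, record which outcome earns the higher reward.
preferences = {
    name: "fast_for_me" if reward(phi, FAST_FOR_ME) > reward(phi, EVEN_DELAY) else "even_delay"
    for name, phi in BEHAVIORS.items()
}
for name, choice in preferences.items():
    print(f"{name:<13} prefers {choice}")
```

Only the altruistic and social weightings favor the even outcome, because its system-wide average (330) is lower than 400. The competitive and malicious weightings, in contrast, are rewarded for the humans' longer delay of 500.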