RouteRL Quickstart

We simulate a simple network topology where humans and later AVs make routing decisions to maximize their rewards (i.e., minimize travel times) over a sequence of days.

  • For the first 100 days, we model a human-driven system, where drivers update their routing policies using behavioral models to optimize rewards.

  • Each day, we simulate the impact of joint actions using the SUMO traffic simulator, which returns the reward for each agent.

  • After 100 days, we introduce 10 Autonomous Vehicles (AVs) as PettingZoo agents, allowing them to use any MARL algorithm to maximize rewards. In this tutorial, we use a trained policy from the Independent Deep Q-Learning (IDQN) algorithm.

  • Finally, we analyze basic results from the simulation (see the sketch after this list).
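
For orientation, here is a minimal sketch of that loop using the RouteRL calls demonstrated step by step below. It is a sketch, not runnable as-is: env and qnet_explore are constructed later in the tutorial, and the TorchRL wrapping applied before the rollouts is elided here.

# Sketch of the full experiment loop; each step is expanded in the sections below.
env.start()                                   # open the connection with SUMO

for _ in range(100):                          # days 1-100: humans learn
    env.step()

env.mutation()                                # 10 human agents become AVs
                                              # (env is then wrapped for TorchRL)

for _ in range(100):                          # AVs follow the pre-trained IDQN policy
    env.rollout(len(env.machine_agents), policy=qnet_explore)

env.plot_results()                            # analyze the outcomes
env.stop_simulation()                         # close the SUMO connection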


  • Establishing the Connection with SUMO

  • Initializing the Traffic Environment

    • Define the TrafficEnvironment, which initializes human agents and generates the routes agents will travel within the network.

  • Training Human Agents

    • Train human-driven vehicles to navigate the environment efficiently using human behavioral models from transportation research.

  • Introducing Autonomous Vehicles (AVs)

    • Transform a subset of human agents into AVs.

    • AVs select their routes using a pre-trained policy based on the IDQN algorithm.

  • Analyzing the Impact of AVs

    • Evaluate the effects of AV introduction on human travel time, congestion, and CO₂ emissions.

    • Demonstrate how AV deployment can potentially increase travel delays and environmental impact.

In this tutorial we do not recommend changing the number of agents, because the provided policy was trained for exactly 10 AV agents.

Two-route network

Import libraries

import sys
import os
import pandas as pd
import torch
from torchrl.envs.libs.pettingzoo import PettingZooWrapper


# Add the repository root (two levels up) to the import path so routerl can be imported.
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../../')))

from routerl import TrafficEnvironment

Define hyperparameters

Users can customize the TrafficEnvironment class by consulting the routerl/environment/params.json file: create a dictionary with the preferred settings and pass it as an argument to TrafficEnvironment.

human_learning_episodes = 100


env_params = {
    "agent_parameters" : {
        "num_agents" : 100,
        "new_machines_after_mutation": 10, # the number of human agents that will mutate to AVs
        "human_parameters" : {
            "model" : "gawron"
        },
        "machine_parameters" :
        {
            "behavior" : "selfish",
        }
    },
    "simulator_parameters" : {
        "network_name" : "two_route_yield"
    },  
    "plotter_parameters" : {
        "phases" : [0, human_learning_episodes], # the number of episodes human learning will take
    },
}
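
To discover every option that can be overridden this way, one approach is to inspect the defaults file directly. The sketch below assumes params.json is plain JSON at the path given above.

import json

# Hedged sketch: print the default parameter tree to see what can be overridden.
with open("routerl/environment/params.json") as f:
    defaults = json.load(f)
print(json.dumps(defaults, indent=2))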

Environment initialization

In our setup, road networks initially consist of human agents, with AVs introduced later.

  • First, the TrafficEnvironment is initialized.

  • The traffic network is instantiated and the paths between designated origin and destination points are determined.

  • The drivers/agents objects are created.

env = TrafficEnvironment(seed=42, **env_params)
[CONFIRMED] Environment variable exists: SUMO_HOME
[SUCCESS] Added module directory: C:\Program Files (x86)\Eclipse\Sumo\tools

Available paths are created using the Janux framework.

print("Number of total agents is: ", len(env.all_agents), "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")
Number of total agents is:  100 

Number of human agents is:  100 

Number of machine agents (autonomous vehicles) is:  0 

Start the environment and establish the connection with SUMO

env.start()

Human learning

for episode in range(human_learning_episodes):
    env.step() # all the human agents execute an action in the environment

Average travel time of human agents during their training process.

Show the initial .csv file saved, which contains the information about the agents available in the system.

df = pd.read_csv("training_records/episodes/ep1.csv")
df
travel_time id kind action origin destination start_time reward reward_right cost_table
0 3.483333 0 Human 1 0 0 99 -3.483333 -3.483333 0.2217422606191504,-1.514644828413727
1 1.200000 1 Human 1 0 0 58 -1.200000 -1.200000 0.2217422606191504,-0.3729781617470602
2 4.550000 2 Human 1 0 0 112 -4.550000 -4.550000 0.2217422606191504,-2.0479781617470603
3 4.916667 3 Human 1 0 0 118 -4.916667 -4.916667 0.2217422606191504,-2.231311495080394
4 0.933333 4 Human 1 0 0 31 -0.933333 -0.933333 0.2217422606191504,-0.23964482841372692
... ... ... ... ... ... ... ... ... ... ...
95 1.066667 95 Human 1 0 0 46 -1.066667 -1.066667 0.2217422606191504,-0.30631149508039357
96 1.083333 96 Human 1 0 0 50 -1.083333 -1.083333 0.2217422606191504,-0.3146448284137269
97 1.216667 97 Human 1 0 0 60 -1.216667 -1.216667 0.2217422606191504,-0.3813114950803935
98 3.566667 98 Human 1 0 0 101 -3.566667 -3.566667 0.2217422606191504,-1.5563114950803936
99 1.366667 99 Human 1 0 0 62 -1.366667 -1.366667 0.2217422606191504,-0.4563114950803936

100 rows × 10 columns
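
The per-episode CSVs make it easy to reconstruct the learning curve mentioned above. The sketch below assumes every episode file follows the ep<N>.csv naming pattern seen here, with one file per human learning episode.

# Hedged sketch: recompute mean travel time per episode from the saved CSVs,
# assuming files ep1.csv ... ep100.csv exist under training_records/episodes.
means = []
for ep in range(1, human_learning_episodes + 1):
    ep_df = pd.read_csv(f"training_records/episodes/ep{ep}.csv")
    means.append(ep_df["travel_time"].mean())
print(f"Mean travel time: {means[0]:.2f} (ep1) -> {means[-1]:.2f} (ep{human_learning_episodes})")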

Mutation

Mutation: a portion of the human agents is converted into machine agents (autonomous vehicles).

env.mutation()
print("Number of total agents is: ", len(env.all_agents), "\n")
print("Number of human agents is: ", len(env.human_agents), "\n")
print("Number of machine agents (autonomous vehicles) is: ", len(env.machine_agents), "\n")
Number of total agents is:  100 

Number of human agents is:  90 

Number of machine agents (autonomous vehicles) is:  10 
env.machine_agents
[Machine 1,
 Machine 15,
 Machine 10,
 Machine 91,
 Machine 22,
 Machine 73,
 Machine 5,
 Machine 52,
 Machine 81,
 Machine 77]

To employ the TorchRL library in our environment, we wrap it with TorchRL's PettingZooWrapper class.

group = {'agents': [str(machine.id) for machine in env.machine_agents]}

env = PettingZooWrapper(
    env=env,
    use_mask=True,
    categorical_actions=True,
    done_on_any=False,
    group_map=group,
)
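
As a quick sanity check (standard TorchRL API, not RouteRL-specific), resetting the wrapped environment returns a TensorDict in which the 10 AVs appear under the "agents" group defined above.

# Hedged sketch: inspect the grouped observation structure after wrapping.
td = env.reset()
print(td)  # expect an "agents" entry batched over the 10 machine agents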

Load a policy already trained with the Independent Deep Q-Learning (IDQN) algorithm.

qnet_explore = torch.load("trained_policy.pt")
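
If the policy was trained on a GPU but is being loaded on a CPU-only machine, torch.load's standard map_location argument avoids device errors (a general PyTorch note, not specific to RouteRL):

# Alternative load for CPU-only machines.
qnet_explore = torch.load("trained_policy.pt", map_location=torch.device("cpu"))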

Human and AV agents interact with the environment over multiple episodes, with AVs following a trained policy.

num_test_episodes = 100

for episode in range(num_test_episodes): # run rollouts in the environment using the already trained policy
    env.rollout(len(env.machine_agents), policy=qnet_explore)
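
Each rollout call returns a TensorDict of collected transitions. As a hedged sketch (the nested key layout follows TorchRL's usual grouping conventions, so verify it against your installed version), the per-step AV rewards of one extra rollout can be inspected like this:

td = env.rollout(len(env.machine_agents), policy=qnet_explore)
# With the "agents" group defined above, rewards are typically stored
# under the nested key ("next", "agents", "reward").
print(td.get(("next", "agents", "reward")).mean().item())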

Show the first .csv file saved after the mutation, which contains the information about the agents now available in the system.

df = pd.read_csv("training_records/episodes/ep101.csv")
df
travel_time id kind action origin destination start_time reward reward_right cost_table
0 0.700000 1 AV 0 0 0 58 -0.700000 NaN 0,0
1 1.100000 15 AV 0 0 0 64 -1.100000 NaN 0,0
2 3.783333 10 AV 0 0 0 116 -3.783333 NaN 0,0
3 2.383333 91 AV 1 0 0 87 -2.383333 NaN 0,0
4 3.900000 22 AV 0 0 0 126 -3.900000 NaN 0,0
... ... ... ... ... ... ... ... ... ... ...
95 0.533333 95 Human 0 0 0 46 -0.533333 -0.583333 -0.5833333333333333,-0.6864890808735301
96 0.533333 96 Human 0 0 0 50 -0.533333 -0.583333 -0.5833333333333333,-0.6989890808735301
97 0.883333 97 Human 0 0 0 60 -0.883333 -0.683333 -0.6833333333333333,-0.79898908087353
98 2.983333 98 Human 0 0 0 101 -2.983333 -2.916667 -2.916666666666667,-3.036205603551716
99 0.966667 99 Human 0 0 0 62 -0.966667 -0.816667 -0.8166666666666667,-0.9114890808735301

100 rows × 10 columns
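
Since the episode files share one schema, human travel times before and after the mutation can be compared directly. The sketch below assumes ep100.csv is the last human-only episode, per the naming pattern above.

# Hedged sketch: compare mean human travel time before vs. after the mutation.
before = pd.read_csv("training_records/episodes/ep100.csv")  # assumed last human-only episode
after = pd.read_csv("training_records/episodes/ep101.csv")
human_before = before.loc[before["kind"] == "Human", "travel_time"].mean()
human_after = after.loc[after["kind"] == "Human", "travel_time"].mean()
print(f"Human mean travel time: {human_before:.2f} -> {human_after:.2f}")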

Plot results

The resulting plots are saved in the \plots folder.

env.plot_results()

The results highlight a critical challenge in AV deployment: rather than improving traffic flow, AVs may increase travel time for human drivers. This suggests potential inefficiencies in mixed traffic conditions due to differences in driving behavior. Understanding these effects is essential for designing better reinforcement learning strategies, informing policymakers, and optimizing AV integration to prevent increased congestion and CO₂ emissions.

Action shifts of human and AV agents

Action shifts of all vehicles in the network

Interrupt the connection with SUMO.

env.stop_simulation()