PettingZoo environment

class routerl.environment.TrafficEnvironment(seed: int = 23423, create_agents: bool = True, create_paths: bool = True, **kwargs)[source]

A PettingZoo AECEnv interface for optimal route choice using the SUMO simulator. This environment is designed for training human agents (rational decision-makers) and machine agents (reinforcement learning agents).

See SUMO for details on the traffic simulator.

See PettingZoo for details on the multi-agent environment API.

Note

Users can configure the experiment with keyword arguments; see the structure below. Users can also provide custom demand data in training_records/agents.csv; you can refer to the structure of such a file here.

Parameters:
  • seed (int, optional) – Random seed for reproducibility. Defaults to 23423.

  • create_agents (bool, optional) – Whether to create agent data. Defaults to True.

  • create_paths (bool, optional) – Whether to generate paths. Defaults to True.

  • **kwargs (dict, optional) – User-defined overrides for the default values in params.json, used to configure the experiment.

Keyword arguments (see the usage below):

  • agent_parameters (dict, optional):

    Agent settings.

    • num_agents (int, default=100):

      Total number of agents.

    • new_machines_after_mutation (int, default=25):

      Number of humans converted to machines.

    • machine_parameters (dict):

      Machine agent settings.

      • behavior (str, default="selfish"):

        Route choice behavior. Options: selfish, social, altruistic, malicious, competitive, collaborative.

      • observed_span (int, default=300):

        Time window considered for observations.

      • observation_type (str, default="previous_agents_plus_start_time"):

        Type of observation. Options: previous_agents, previous_agents_plus_start_time.

    • human_parameters (dict):

      Human agent settings.

      • model (str, default="culo"):

        Decision-making model (options: gawron, culo, w_avg).

      • alpha_j (float, default=0.5):

        Cost expectation coefficient (0-1 range).

      • alpha_zero (float, default=0.5):

        Sensitivity to new experiences.

      • beta (float, default=-1.5):

        Decision randomness parameter.

      • beta_randomness (float, default=0.1):

        Variability in beta among the human population.

      • remember (int, default=3):

        Number of past experiences retained.

  • environment_parameters (dict, optional):

    Environment settings.

    • number_of_days (int, default=1):

      Number of days in the scenario.

  • simulator_parameters (dict, optional):

    SUMO simulator settings.

    • network_name (str, default="csomor"):

      Network name (e.g., arterial, cologne, grid).

    • simulation_timesteps (int, default=180):

      Total simulation time in seconds.

    • sumo_type (str, default="sumo"):

      SUMO execution mode (sumo or sumo-gui).

  • path_generation_parameters (dict, optional):

    Path generation settings.

    • number_of_paths (int, default=3):

      Number of routes per OD.

    • beta (float, default=-3.0):

      Sensitivity to travel time in path generation.

    • weight (str, default="time"):

      Optimization criterion.

    • num_samples (int, default=100):

      Number of samples for path generation.

    • origins (str | list[str], default="default"):

      Origin points from the network (e.g., ["-25166682#0", "-4936412"]).

    • destinations (str | list[str], default="default"):

      Destination points from the network (e.g., ["-115604057#1", "-279952229#4"]).

  • plotter_parameters (dict, optional):

    Plotting & logging settings.

    • records_folder (str, default="training_records"):

      Directory for training records.

    • plots_folder (str, default="plots"):

      Directory for plots.

    • smooth_by (int, default=50):

      Smoothing parameter for plots.

    • phases (list[int], default=[0, 100]):

      X-axis positions for phase markers.

    • phase_names (list[str], default=["Human learning", "Mutation - Machine learning"]):

      Phase names for labeling phase markers.

Usage:

Case 1

% Your file structure in the beginning
project_directory/
|-- your_script.py
>>> from routerl.environment import TrafficEnvironment
>>> # Environment initialization
>>> env = TrafficEnvironment(
...     seed=42,
...     agent_parameters={
...         "num_agents": 5,
...         "new_machines_after_mutation": 1,
...         "machine_parameters": {
...             "behavior": "selfish"
...         }},
...     simulator_parameters={"sumo_type": "sumo-gui"},
...     path_generation_parameters={"number_of_paths": 2}
... )
% File structure after the initialization:
project_directory/
|-- your_script.py
|-- training_records/
|   |-- agents.csv
|   |-- paths.csv
|   |-- detector/
|   |   |--             % to be populated during simulation
|   |-- episodes/
|   |   |--             % to be populated during simulation
|-- plots/
|   |-- 0_0.png
|   |-- ...             % visuals of generated paths for each OD
|   |-- ...             % to be populated after the experiment

Case 2

% Your file structure in the beginning
project_directory/
|-- your_script.py
|-- training_records/
|   |-- agents.csv      % your custom demand, conforming to the structure

Warning

Demand data in agents.csv should be aligned with the specified experiment settings (e.g., number of agents, number of origins and destinations, etc.).

>>> env = TrafficEnvironment(
...     create_agents=False,  # Environment will use your agent data
...     agent_parameters={
...         "new_machines_after_mutation": 10,
...         "machine_parameters": {
...             "behavior": "selfish"
...         }},
...     simulator_parameters={"network_name": "arterial"},
...     path_generation_parameters={"number_of_paths": 3}
... )
% File structure after the initialization:
project_directory/
|-- your_script.py
|-- training_records/
|   |-- agents.csv      % stays the same, used for agent generation
|   |-- paths.csv
|   |-- detector/
|   |   |--             % to be populated during simulation
|   |-- episodes/
|   |   |--             % to be populated during simulation
|-- plots/
|   |-- 0_0.png
|   |-- ...             % visuals of generated paths for each OD
|   |-- ...             % to be populated after the experiment

Warning

The same approach does not apply to path generation.

paths.csv is used mainly for visualization. For SUMO to operate correctly, a route.rou.xml file must be generated inside the routerl/networks/<net_name>/ folder.

It is advised to generate paths in each experiment by providing a random seed, and to set create_paths=False only when the above criterion is met.
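
A minimal end-to-end sketch of a typical experiment is shown below. The episode counts and the use of env.step() without an argument to advance a human-learning day are illustrative assumptions based on the method descriptions on this page, not a verbatim recipe.

>>> env.start()             # open the connection with SUMO
>>> for day in range(100):  # phase 1: human learning (count is illustrative)
...     env.step()          # no machine action yet; humans choose routes
>>> env.mutation()          # phase 2: convert selected humans into machines
>>> # ... train machine agents with a PettingZoo-compatible RL loop (see step()) ...
>>> env.plot_results()      # write plots under the plots/ folder
>>> env.stop_simulation()   # close the SUMO connection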

Variables:
  • day (int) – Current day index in the simulation.

  • human_learning (bool) – Whether human agents are learning.

  • number_of_days (int) – Number of days to simulate.

  • action_space_size (int) – Size of the action space.

  • recorder (Recorder) – Object for recording simulation data.

  • simulator (SumoSimulator) – SUMO simulator instance.

  • all_agents (list) – List of all agent objects.

  • machine_agents (list) – List of all machine agent objects.

  • human_agents (list) – List of all human agent objects.

action_space(agent: str)[source]

Method that returns the action space of the agent.

Parameters:

agent (str) – The agent name.

Returns:

self._action_spaces[agent] (Any) – The action space of the agent.
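
A small sketch of inspecting the spaces, assuming the standard PettingZoo possible_agents attribute inherited from AECEnv:

>>> for agent in env.possible_agents:
...     print(agent, env.action_space(agent))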

close() None[source]

Not implemented.

Returns:

None

get_free_flow_times() dict[source]

Retrieve free flow times for all origin-destination pairs from the simulator paths data.

Returns:

ff_dict (dict) – A dictionary where keys are tuples of origin and destination, and values are lists of free flow times.
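
For instance, to list the free-flow time of each generated path per OD pair (variable names are illustrative):

>>> ff_dict = env.get_free_flow_times()
>>> for (origin, destination), times in ff_dict.items():
...     print(origin, destination, times)  # one free-flow time per generated path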

get_observation() tuple[source]

Retrieve the current observation from the simulator.

This method returns the current timestep of the simulation and the values of the episode actions.

Returns:

tuple – A tuple containing the current timestep and the episode actions.
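
Since the return value is a plain tuple, it can be unpacked directly (names are illustrative):

>>> timestep, episode_actions = env.get_observation()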

get_observation_function() Observations[source]

Returns an observation object based on the provided parameters.

Returns:

Observations – An observation object.

Raises:

ValueError – If the specified model is unknown.

mutation(disable_human_learning: bool = True) None[source]

Perform mutation by converting selected human agents into machine agents.

This method identifies the human agents whose start times fall after the 25th percentile of all vehicle start times, removes the specified number of these agents, and replaces them with machine agents.

Parameters:

disable_human_learning (bool, optional) – Whether to disable human learning after the mutation. Defaults to True.

Returns:

None

Raises:

ValueError – If there are insufficient human agents available for mutation.
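
A brief sketch of a typical call, placed after the human-learning phase; the expected counts are assumptions derived from the agent_parameters defaults above:

>>> env.mutation()                  # also disables human learning by default
>>> print(len(env.machine_agents))  # expected: new_machines_after_mutation
>>> print(len(env.human_agents))    # expected: num_agents - new_machines_after_mutation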

observation_space(agent: str)[source]

Method that returns the observation space of the agent.

Parameters:

agent (str) – The agent name.

Returns:

self._observation_spaces[agent] (Any) – The observation space of the agent.

observe(agent: str) ndarray[source]

Retrieve the observations for a specific agent.

Parameters:

agent (str) – The identifier for the agent whose observations are to be retrieved.

Returns:

self.observation_obj.agent_observations(agent) (np.ndarray) – The observations for the specified agent.

plot_results() None[source]

Method that plots the results of the simulation.

Returns:

None

reset(seed: int = None, options: dict = None) tuple[source]

Resets the environment.

Parameters:
  • seed (int, optional) – Seed for random number generation. Defaults to None.

  • options (dict, optional) – Additional options for resetting the environment. Defaults to None.

Returns:
  • observations (dict) – Initial observations for the agents.

  • infos (dict) – dictionary of information for the agents.
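
For example, matching the signature above:

>>> observations, infos = env.reset(seed=42)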

simulation_loop(machine_action: int, machine_id: int) None[source]

This method integrates agent actions into the SUMO simulation.

It iterates through the timesteps of the simulation. At each timestep, zero, one, or several agents (humans or machines) may start their trips. If more than one machine agent shares the same start time, the method returns early, because the next machine's action must be obtained from the step() method.

Parameters:
  • machine_action (int) – The action to be performed by the machine agent.

  • machine_id (int) – The id of the machine agent whose action is to be performed.

Returns:

None

start() None[source]

Start the connection with SUMO.

Returns:

None

step(machine_action: int = None) None[source]

Step method.

Takes an action for the current agent (specified by agent_selection) and updates various parameters including rewards, cumulative rewards, terminations, truncations, infos, and agent_selection. Also updates any internal state used by observe().

Parameters:

machine_action (int, optional) – The action to be taken by the machine agent. Defaults to None.

Returns:

None
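
Since TrafficEnvironment is an AECEnv, step() is normally driven by the standard PettingZoo interaction loop. The sketch below assumes the agent_iter() and last() helpers inherited from AECEnv and uses a random policy as a placeholder:

>>> observations, infos = env.reset()
>>> for agent in env.agent_iter():
...     observation, reward, termination, truncation, info = env.last()
...     if termination or truncation:
...         action = None
...     else:
...         action = env.action_space(agent).sample()  # replace with a trained policy
...     env.step(action)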

stop_simulation() None[source]

End the simulation.

Returns:

None