Human Learning and Decision-Making Models¶
RouteRL provides a catalog of human learning and decision-making models, including three state-of-the-art discrete choice models. These models, popular within the transportation community, emulate human agents as utility maximizers whose individual utilities are shaped by individual characteristics, in contrast to reinforcement learning algorithms, which primarily focus on cost minimization.
Note
Users can create their own human models by inheriting from BaseLearningModel.
Gawron Model¶
- class routerl.human_learning.Gawron(params, initial_knowledge)[source]
The Gawron learning model. This model is based on: Gawron (1998)
In summary, it iteratively shifts the cost expectations towards the received reward.
For decision-making, calculates action utilities based on the beta parameter and cost expectations, and selects the action with the lowest utility.
- Parameters:
params (dict) – A dictionary containing model parameters.
initial_knowledge (list or array) – Initial knowledge of cost expectations.
- Variables:
beta (float) – A parameter representing deviations in individual decision-making.
alpha_zero (float) – Agent’s adaptation to new experiences.
alpha_j (float) – Weight for previous cost expectation (1 - ALPHA_ZERO).
cost (np.ndarray) – Agent’s cost expectations for each option.
- act(state) int [source]
Selects an action based on the cost expectations.
- Parameters:
state (Any) – The current state of the environment (not used).
- Returns:
action (int) – The index of the selected action.
- learn(state, action, reward) None [source]
Updates the cost associated with the taken action based on the received reward.
- Parameters:
state (Any) – The current state of the environment (not used).
action (int) – The action that was taken.
reward (float) – The reward received after taking the action.
- Returns:
None
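The learning rule and the choice rule above can be pictured with a short numerical sketch. This is not the library implementation; it only assumes the convex-combination update implied by alpha_zero and alpha_j, a reward on the same scale as the cost expectations, and a random perturbation of the beta-scaled costs standing in for individual deviations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

alpha_zero = 0.2                      # weight given to the newly received reward (assumed value)
alpha_j = 1 - alpha_zero              # weight kept on the previous cost expectation
beta = 1.0                            # scales cost expectations into (dis)utilities (assumed)
cost = np.array([12.0, 15.0, 14.0])   # initial cost expectations for three routes

def act(cost):
    # Perturb the beta-scaled expectations and pick the lowest (dis)utility,
    # as described in the docstring above.
    utilities = beta * cost + rng.gumbel(size=cost.size)
    return int(np.argmin(utilities))

def learn(cost, action, reward):
    # Shift the chosen route's cost expectation towards the observed reward.
    cost[action] = alpha_j * cost[action] + alpha_zero * reward

action = act(cost)
learn(cost, action, reward=13.5)
print(action, cost)
```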
Cumulative Logit Model¶
- class routerl.human_learning.Culo(params, initial_knowledge)[source]
The CUmulative LOgit learning model. This model is based on: Li et al. (2024).
In summary, it updates its cost expectations by iteratively accumulating perceived rewards.
For decision-making, calculates action utilities based on the beta parameter and cost expectations, and selects the action with the lowest utility.
- Parameters:
params (dict) – A dictionary containing model parameters.
initial_knowledge (list or array) – Initial knowledge of cost expectations.
- Variables:
beta (float) – A parameter representing deviations in individual decision-making.
alpha_zero (float) – Agent’s adaptation to new experiences.
alpha_j (float) – Weight for previous cost expectation (constant = 1).
cost (np.ndarray) – Agent’s cost expectations for each option.
- act(state) int [source]
Selects an action based on the cost expectations.
- Parameters:
state (Any) – The current state of the environment (not used).
- Returns:
action (int) – The index of the selected action.
- learn(state, action, reward) None [source]
Updates the cost associated with the taken action based on the received reward.
- Parameters:
state (Any) – The current state of the environment (not used).
action (int) – The action that was taken.
reward (float) – The reward received after taking the action.
- Returns:
None
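A comparable sketch of the cumulative logit update, again only an assumption drawn from the variable descriptions above: the previous expectation keeps a constant weight of 1, so perceived rewards accumulate rather than being blended in.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

alpha_zero = 0.2                      # weight on each newly perceived reward (assumed value)
beta = 1.0                            # scales cost expectations into (dis)utilities (assumed)
cost = np.array([12.0, 15.0, 14.0])   # initial cost expectations for three routes

def act(cost):
    # Same choice rule as in the Gawron sketch: perturb the beta-scaled
    # expectations and pick the lowest.
    utilities = beta * cost + rng.gumbel(size=cost.size)
    return int(np.argmin(utilities))

def learn(cost, action, reward):
    # alpha_j is a constant 1 here, so perceived rewards accumulate
    # over repeated choices of the same route.
    cost[action] = 1.0 * cost[action] + alpha_zero * reward
```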
Weighted Average Model¶
- class routerl.human_learning.WeightedAverage(params, initial_knowledge)[source]
Weighted Average learning model. Theory based on: Cascetta (2009).
In summary, the model uses the reward and a weighted average of the past cost expectations to update the current cost expectation.
For decision-making, calculates action utilities based on the beta parameter and cost expectations, and selects the action with the lowest utility.
- Parameters:
params (dict) – A dictionary containing model parameters.
initial_knowledge (list or array) – Initial knowledge of cost expectations.
- Variables:
beta (float) – A parameter representing deviations in individual decision-making.
alpha_zero (float) – Agent’s adaptation to new experiences.
alpha_j (float) – Weight for previous cost expectation (1 - ALPHA_ZERO).
remember (string) – Memory size.
cost (np.ndarray) – Agent’s cost expectations for each option.
memory (list(list)) – A list of lists containing the memory of each state.
- act(state) int [source]
Selects an action based on the cost expectations.
- Parameters:
state (Any) – The current state of the environment (not used).
- Returns:
action (int) – The index of the selected action.
- create_memory() None [source]
Creates a memory of previous cost expectations.
- Returns:
None
- learn(state, action, reward) None [source]
Updates the cost associated with the taken action based on the received reward.
- Parameters:
state (Any) – The current state of the environment (not used).
action (int) – The action that was taken.
reward (float) – The reward received after taking the action.
- Returns:
None
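A rough sketch of the weighted-average idea, assuming the memory keeps the last `remember` experiences per route and the expectation is refreshed as their plain average; the actual weighting and memory layout in RouteRL may differ.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(seed=0)

beta = 1.0                              # scales cost expectations into (dis)utilities (assumed)
remember = 3                            # memory size: past experiences kept per route (assumed int)
cost = np.array([12.0, 15.0, 14.0])     # initial cost expectations for three routes
# One bounded memory per route, seeded with the initial expectation (assumed structure).
memory = [deque([c], maxlen=remember) for c in cost]

def act(cost):
    utilities = beta * cost + rng.gumbel(size=cost.size)
    return int(np.argmin(utilities))

def learn(cost, action, reward):
    # Store the new experience, then refresh the expectation from the remembered ones.
    memory[action].append(reward)
    cost[action] = float(np.mean(memory[action]))
```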
Base Learning Model¶
- class routerl.human_learning.learning_model.BaseLearningModel[source]
This is an abstract base class for the learning models used to model human learning and decision-making.
Users can create their own learning models by inheriting from this class.
- abstractmethod act(state) None [source]
Method to select an action based on the current state and cost.
- Returns:
None
- abstractmethod learn(state, action, reward) None [source]
Method to update the model based on the current state, the taken action, and the received reward.
- Parameters:
state (Any) – The current state of the environment.
action (Any) – The action to take.
reward (Any) – The reward received from the environment.
- Returns:
None
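A custom model therefore only needs to implement act and learn. Below is a minimal, hypothetical subclass; the (params, initial_knowledge) constructor mirrors the built-in models, and the assumption that the base __init__ takes no arguments is not confirmed by this page.

```python
import numpy as np
from routerl.human_learning.learning_model import BaseLearningModel

class GreedyAverage(BaseLearningModel):
    """Hypothetical user model: running-average cost expectations, greedy choice."""

    def __init__(self, params, initial_knowledge):
        super().__init__()  # assumes the base constructor takes no arguments
        self.cost = np.array(initial_knowledge, dtype=float)
        self.counts = np.zeros_like(self.cost)

    def act(self, state) -> int:
        # Pick the route with the lowest current cost expectation; state is unused,
        # mirroring the built-in models documented above.
        return int(np.argmin(self.cost))

    def learn(self, state, action, reward) -> None:
        # Incremental running average of the rewards observed for the taken action.
        self.counts[action] += 1
        self.cost[action] += (reward - self.cost[action]) / self.counts[action]
```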