5. Modelling language (basics)¶
5.1. YAML Syntax in a nutshell¶
EMULSION models must respect the YAML format, which is based on lists and key-value mappings. Data structures are delimited by 2-space indentation.
- Comments
Whatever is put after
#
is not interpreted.- Values
Numbers (
3
,3.14
), strings ('some text'
), booleans (yes
/no
), lists or key-value mappings.- Lists
A succession of values, e.g.
[value1, value2, value3]
which is equivalent to:
- value1 - value2 - value3
- Key-value mappings
An (unordered) set of associations between unique indentifiers (keys) and any value, e.g.
{key1: value1, key2: value2, key3: value3}
which is equivalent to:
key1: value1 key2: value2 key3: value3
All elements above can be combined and nested to build complex structures, for instance:
# Here a key mapped to a list
key1: [v1, v2, v3]
# Here a key mapped to another mapping
key2:
subkey1:
# the value associated with subkey1 is a list
- item1
- item2
subkey2: 'an important message'
subkey3:
# and each element of the list below is a mapping
- another: value1
withother: value2
- another: value3
withother: value4
5.2. Model structure¶
An EMULSION model is divided in several “sections”, corresponding to the main components of a model. Each section corresponds to a first-level key (i.e. put directly at the beginning of a line without any indentation).
Below is a short description of their nature. This is just an overview of what can be found in a typical EMULSION model. To go further, dive into the next chapter!
model_name
¶
The name of the model. Used to name figures and diagrams.
Example
model_name: compart_SIR
model_info
¶
Several optional information on the model, such as an abstract to describe the model principles and purpose, the authors, references, a license if any, etc.
Example
model_info:
abstract: 'A very long description of the model'
authors:
- 'First Author'
- 'Another Colleague'
DOI: 'my_doi/10.10.10.'
This part is only intended to provide information to the reader. All subsections can be freely defined according to the modeller’s needs.
time_info
¶
This section defines the time unit used in the whole model for parameter values (e.g. hours, days, weeks) and the duration of one time step in the simulation. Optionally, it can specify:
the date where the simulation starts (
origin_date
)the total duration of the simulation (
total_duration
)a condition to interrupt the simulation before the total duration (
stop_condition
)calendars with events (see Regulate time)
Example
time_info:
# all durations (resp. rates) parameter values are expressed in days (resp. per day)
time_unit: 'days'
# the simulation step is 1 day
delta_t: 1
# simulations start on 01/01 (default: current year)
origin: 'January 1'
# simulations run for 100 days
total_duration: '100'
# each run stops before 100 days if the infection is gone
stop_condition: 'infection_terminated'
state_machines
¶
State machines are the main way to define processes involved in an EMULSION model. A state machine is defined by a list of states and a list of transitions between the states. It can also define a list of productions links between states, to specify which states can produce new individuals.
An EMULSION model can contain several state machines, the only constraint being that all state names must be different.
Example of a typical state machine
state_machines:
health_state: # the name of the state machine
desc: 'The state machine which defines the evolution of health states'
# Below, the list of states with their attributes.
states:
- S:
name: 'Susceptible'
desc: 'suceptible of becoming infected'
fillcolor: 'deepskyblue'
- I:
name: 'Infectious'
desc: 'infected and able to transmit the disease'
fillcolor: 'red'
- R:
name: 'Resistant'
desc: 'healthy again and resistant to infection'
fillcolor: 'limegreen'
# Below, a list of transitions between states
transitions:
- {from: S, to: I, rate: 'force_of_infection'}
- {from: I, to: R, rate: 'recovery'}
# Below, a list of production links: all states produce S individuals
productions:
- {from: S, to: S, rate: 'birth_rate', prototype: 'newborn'}
- {from: I, to: S, rate: 'birth_rate', prototype: 'newborn'}
- {from: R, to: S, rate: 'birth_rate', prototype: 'newborn'}
State machine diagrams can be produced automatically by EMULSION with command emulsion diagrams <model.yaml>
.
levels
¶
In EMULSION, a level is a name associated with an entity of a
given scale. At least two levels are expected in a model
(e.g. individuals and the population). A level can contain other
sub-levels, base on a specific aggregation type (compartment
,
IBM
, hybrid
or metapopulation
). A level is essentially a
concept, hence not necessarily simulated explicitly by EMULSION: for
instance, the notion of individuals exist in compartmental models,
though calculations only involve populations. A level can also define
aggregated variables calculated from the values of another variable at
the sublevel.
Level names can be chosen arbitrarily, to identify entities with the most relevant terms.
Example
A typical level specification at the population scale (here in a compartment-based model):
levels:
population: # arbitrary name
desc: 'level of the population'
aggregation_type: 'compartment'
contains:
- individuals # the sublevel
individuals: # arbitrary name
desc: 'level of the individuals'
grouping
¶
This section is mandatory in hybrid models, to describe explicitly how entities from a sublevel are grouped for optimizing the calculations.
In compartmental models and IBM, a grouping section can be
introduced to provide automatic variables of the form
e.g. total_X_Y_Z
where X, Y, and Z can be any states of three
different state machines. The grouping
section specifies upon
which state machines a population is partitioned.
For instance, in a hybrid model, the operation in the health_state
state machine are likely to depend only on the actual health_state
value in each individual. Thus, individuals must be grouped by health
state (all S together, all I together, etc.). Assuming for instance
that we also have a state machine for defining several species and one
for specifying age groups, grouping individuals by age_group
and
species
automatically provides variables of the form
total_Juvenile_Vector
or total_Adult_Host
. Grouping names can
be chosen arbitrarily.
Example
grouping:
population: # a level name with sublevel
infection: [health_state]
pop_structure: [age_group, species]
processes
¶
This section specifies the list of major processes that take place at each level during the simulation. A process name can be either:
The name of a state machine: then, the corresponding level (even virtual individuals as in the compartment aggregation type) is endowed with a variable with the same name as the state machine (e.g.
health_state
), which contains the current state (e.g.S
). The state machines applies on the specified level, making the state evolve over time.The name of a method in a class implemented in a Python add-on. In that case, the corresponding code is executed on the corresponding level.
In hybrid models, the name of the process is expected to be associated with the name of the grouping, used by the state machine to compute the flows on transitions.
Examples
A typical processes specification in a compartment-based/IBM model:
processes:
individuals:
- health_state
A typical processes specification in a hybrid model:
processes:
individuals:
- health_state: infection
# assuming a grouping named "infection":
grouping:
population:
infection: [health_state]
parameters
¶
This section is intended to define:
model parameters (stricto sensu), i.e. numerical values coming from experts, data or assumptions and driving the dynamics of the model
configuration parameters, i.e. numerical values used in initial conditions or scenario definition
distributions expressed by functions and returning a new sample each time they are “used” in a computation
expressions which can combine other parameters or variables
Each entry must be endowed with a full description of its role
(desc:
) and can also provide information on where the value (or
expression) comes from (source:
).
Example
parameters:
# a model parameter
transmission_I:
desc: 'transmission rate from infectious individuals (/day)'
value: 0.5
# an expression (of another parameter and variables)
force_of_infection:
desc: 'infection function'
value: 'transmission_I * total_I / total_population'
source: 'classical function assuming frequency dependence'
# a distribution
initial_age:
desc: 'distribution of ages when initializing individuals'
value: 'random_integers(0, 20)'
prototypes
¶
Prototypes are intended to specify typical individuals or populations which are characterized by specific values of their variables (in compartmental models, such variables can only be state machines).
Prototypes are used mainly in:
initial conditions (see below), to specify how many individuals of each kind must be created
in production links of state machines, to indicate the nature of individuals produced
in the built-in action
become
, to make a state machine induce changes in other variables, including another state machine
Example
prototypes:
# here the level for which the prototypes are defined
individuals:
- healthy: # the name of the prototype
desc: 'healthy individuals'
health_state: S
# variable age_group is one of the existing states
age_group: random
- infected:
desc: 'infected individuals'
health_state: I
# here we intend to start with infected juveniles
age_group: J
See also
To go further: Design prototypes for typical individuals or populations
initial_conditions
¶
Initial conditions specify how to initialize each level. They rely on prototypes, considered as the description of typical sub-levels (e.g. typical populations in metapopulations, typical individuals in IBM/hybrid models, typical sub-populations in compartment-based models).
Examples
A typical specification of initial conditions in an IBM/hybrid/compartment-based model:
initial_conditions:
population:
# a list of prototypes with the number of individuals
# to create with each prototype
- prototype: infected
amount: 'initial_infected'
- prototype: healthy
amount: 'initial_population_size - initial_infected'
outputs
¶
When running an EMULSION model, the amounts of individuals in each
state for all state machines are computed automatically at each time
step. The outputs section specifies how the output data are stored
(CSV file, database…) and at which period (in time steps).
Additional variables can also be logged (extra_vars
).
Using command-line option --plot
, one plot per state machine is
automatically produced, as well as one plot for all extra variables.
Example
outputs:
type: csv # produces counts.csv in output directory
population: # outputs for level population
period: 1 # at each time step
extra_vars:
# add an expression (from 'parameters')
- 'percentage_prevalence'
# add a population variable
- total_population
statevars
¶
This section appears when a level requires variables which are not
defined automatically by their state machines, nor defined as
expressions in the parameters section, nor computed by aggregating
variables from a sublevel. In that case, the section contains a simple
description of the meaning of the variable, which has to be handled
either in prototypes and built-in actions (e.g. set_var
), or in a
Python code add-on.
Example
statevars:
nb_mothers_of_infected_calf:
desc: 'identify and counts cows which gave birth to an infected calf'
input_data
¶
Warning
This feature is experimental and likely to evolve in further versions.
This section is used to connect the model description to data (and possibly to Python code add-ons). Two main subsections can be specified: preprocessors and data-based parameters.
Preprocessors are intended to define treatments which have to be
done before all stochastic repetitions within the same emulsion
run
command.
For instance, to load a data file on trade movements between the populations of a metapopulation: this has to be done once before all repetitions start, then it has to be stored in memory and made available for each repetition. Each preprocessor is associated with a specific Python class.
The preprocessing:
subsection is composed of a list of
preprocessor specifications, with the name of the Python file where
the preprocessor class is defined, the corresponding class name, a
description of the treatment, and possibly input files, output files
and a dependance to data-based parameters. The Python class must
implement two methods: init_preprocessor
and run_preprocessor
.
Preprocessors are executed in sequence, before handling data-based parameters if they do not depend on them.
Example
The specification of a preprocessor in the YAML model file:
preprocessing:
- file: 'hybrid_SIR_JA_metapop_data.py'
class_name: TradeMovementsReader
desc: 'A preprocessor class for reading the CSV that describes the trade movements and restructuring it as a dictionary, stored in shared information in the simulation.'
input_files:
trade_file: 'moves.csv'
The definition of the preprocessor in the Python file:
class TradeMovementsReader(EmulsionPreprocessor):
"""A preprocessor class for reading the CSV... """
# The method which checks the validity of the preprocessor definition
def init_preprocessor(self):
if self.input_files is None or 'trade_file' not in self.input_files:
raise SemanticException("A valid 'trafe_file' must be specified in the input_files section for pre-processing class {}".format(self.__class__.__name__))
# The method which performs the actual treatment
def run_preprocessor(self):
# Note that data shared between runs can be stored in a shared_data
# dictionary in the simulation
if 'moves' not in self.simulation.shared_data:
debuginfo('Reading {}'.format(self.input_files.trade_file))
self.simulation.shared_data['moves'] = self.restructure_moves()
else:
debuginfo('Trade movements already loaded in simulation')
# The treatment itself
def restructure_moves(self):
"""Restructure the CSV file as nested dictionaries"""
...
Data-based parameters allow to force parameter values at a given
level through a data file. For instance, environmental parameters such
as the temperature or a carring capacity can be entirely
data-driven. The principle then is to used a CSV file where some
columns are used as index (key_variables
) to identify which
entities have to update, and other columns are the values of the
parameters. In practice, those “parameter” values are stored into
variables defined at the corresponding level.
The data_based_parameters
subsection must specify the level
concerned by the data-driven parameters, the CSV file to use, the list
of variables used as index, and the dictionary of parameters to change
based on data with their description.
Example
data_based_parameters:
- level: herd
file: 'herd_params.csv'
key_variables: [population_id, step]
parameters:
carrying_capacity:
desc: 'carrying capacity of the population environment at current time step'
precipitation:
desc: 'amount of water available in the population environment at current time step'
See also
TODO
actions
¶
This section appears when a level requires actions which are not provided in EMULSION generic engine. In that case, the section contains a simple description of the meaning of the action, which has to be defined in a separate Python file.
Example
actions:
read_mailbox:
desc: 'Check if messages were received from other individuals and modify contact network accordingly.'