5. Modelling language (basics)

5.1. YAML Syntax in a nutshell

EMULSION models must respect the YAML format, which is based on lists and key-value mappings. Data structures are delimited by 2-space indentation.

Comments

Whatever is put after # is not interpreted.

Values

Numbers (3, 3.14), strings ('some text'), booleans (yes/no), lists or key-value mappings.

Lists

A succession of values, e.g.

[value1, value2, value3]

which is equivalent to:

- value1
- value2
- value3

Key-value mappings

An (unordered) set of associations between unique identifiers (keys) and any value, e.g.

{key1: value1, key2: value2, key3: value3}

which is equivalent to:

key1: value1
key2: value2
key3: value3

All elements above can be combined and nested to build complex structures, for instance:

# Here a key mapped to a list
key1: [v1, v2, v3]

# Here a key mapped to another mapping
key2:
  subkey1:
    # the value associated with subkey1 is a list
    - item1
    - item2
  subkey2: 'an important message'
  subkey3:
    # and each element of the list below is a mapping
    - another: value1
      withother: value2
    - another: value3
      withother: value4
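
Once parsed, such a document is nothing more than nested lists and mappings. As an illustration only, the structure above corresponds to the following Python literal, assuming a parser that maps YAML mappings to dict and YAML lists to list. The variable names are chosen for this example.

```python
# Python equivalent of the nested YAML structure above (illustrative only).
nested = {
    "key1": ["v1", "v2", "v3"],
    "key2": {
        "subkey1": ["item1", "item2"],
        "subkey2": "an important message",
        "subkey3": [
            {"another": "value1", "withother": "value2"},
            {"another": "value3", "withother": "value4"},
        ],
    },
}

# Navigating the structure: key -> subkey -> list index (-> key)
second_item = nested["key2"]["subkey1"][1]
last_mapping_value = nested["key2"]["subkey3"][1]["withother"]
```

Each level of indentation in the YAML text corresponds to one more level of nesting in the resulting structure.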

5.2. Model structure

An EMULSION model is divided into several “sections”, corresponding to the main components of a model. Each section corresponds to a first-level key (i.e. placed directly at the beginning of a line without any indentation).

Below is a short description of their nature. This is just an overview of what can be found in a typical EMULSION model. To go further, dive into the next chapter!

model_name

The name of the model. Used to name figures and diagrams.

Example

model_name: compart_SIR

model_info

Optional information about the model, such as an abstract describing the model principles and purpose, the authors, references, a license if any, etc.

Example

model_info:
  abstract: 'A very long description of the model'
  authors:
    - 'First Author'
    - 'Another Colleague'
  DOI: 'my_doi/10.10.10.'

This part is only intended to provide information to the reader. All subsections can be freely defined according to the modeller’s needs.

time_info

This section defines the time unit used in the whole model for parameter values (e.g. hours, days, weeks) and the duration of one time step in the simulation. Optionally, it can specify:

  • the date on which the simulation starts (origin)

  • the total duration of the simulation (total_duration)

  • a condition to interrupt the simulation before the total duration (stop_condition)

  • calendars with events (see Regulate time)

Example

time_info:
  # all duration (resp. rate) parameter values are expressed in days (resp. per day)
  time_unit: 'days'
  # the simulation step is 1 day
  delta_t: 1
  # simulations start on 01/01 (default: current year)
  origin: 'January 1'
  # simulations run for 100 days
  total_duration: '100'
  # each run stops before 100 days if the infection is gone
  stop_condition: 'infection_terminated'
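
How the time step, the total duration and the stop condition interact can be sketched in a few lines of Python. This is not EMULSION's scheduler, only a minimal loop under the assumption that the stop condition is checked before each step. The names run_simulation, one_day and state are hypothetical.

```python
def run_simulation(total_duration, delta_t, step, stop_condition):
    """Advance time by delta_t until the duration is reached or the
    stop condition becomes true; return the elapsed simulated time."""
    elapsed = 0
    while elapsed < total_duration and not stop_condition():
        step()
        elapsed += delta_t
    return elapsed


# Toy dynamics (assumption): one infected individual recovers each day.
state = {"infected": 5}


def one_day():
    state["infected"] -= 1


def infection_terminated():
    return state["infected"] == 0


elapsed = run_simulation(total_duration=100, delta_t=1,
                         step=one_day, stop_condition=infection_terminated)
```

In this toy run the loop ends well before the total duration because the stop condition becomes true first.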

state_machines

State machines are the main way to define processes involved in an EMULSION model. A state machine is defined by a list of states and a list of transitions between the states. It can also define a list of production links between states, to specify which states can produce new individuals.

An EMULSION model can contain several state machines, the only constraint being that all state names must be different.

Example of a typical state machine

state_machines:
  health_state:  # the name of the state machine
    desc: 'The state machine which defines the evolution of health states'
    # Below, the list of states with their attributes.
    states:
      - S:
          name: 'Susceptible'
          desc: 'susceptible to becoming infected'
          fillcolor: 'deepskyblue'
      - I:
          name: 'Infectious'
          desc: 'infected and able to transmit the disease'
          fillcolor: 'red'
      - R:
          name: 'Resistant'
          desc: 'healthy again and resistant to infection'
          fillcolor: 'limegreen'
    # Below, a list of transitions between states
    transitions:
      - {from: S, to: I, rate: 'force_of_infection'}
      - {from: I, to: R, rate: 'recovery'}
    # Below, a list of production links: all states produce S individuals
    productions:
      - {from: S, to: S, rate: 'birth_rate', prototype: 'newborn'}
      - {from: I, to: S, rate: 'birth_rate', prototype: 'newborn'}
      - {from: R, to: S, rate: 'birth_rate', prototype: 'newborn'}

State machine diagrams can be produced automatically by EMULSION with the command emulsion diagrams <model.yaml>.

levels

In EMULSION, a level is a name associated with an entity of a given scale. At least two levels are expected in a model (e.g. individuals and the population). A level can contain other sub-levels, based on a specific aggregation type (compartment, IBM, hybrid or metapopulation). A level is essentially a concept, hence not necessarily simulated explicitly by EMULSION: for instance, the notion of individuals exists in compartmental models, though calculations only involve populations. A level can also define aggregated variables calculated from the values of another variable at the sublevel.

Level names can be chosen arbitrarily, to identify entities with the most relevant terms.

Example

A typical level specification at the population scale (here in a compartment-based model):

levels:
  population:  # arbitrary name
    desc: 'level of the population'
    aggregation_type: 'compartment'
    contains:
      - individuals  # the sublevel
  individuals:  # arbitrary name
    desc: 'level of the individuals'

grouping

This section is mandatory in hybrid models, to describe explicitly how entities from a sublevel are grouped to optimize the calculations.

In compartmental models and IBM, a grouping section can be introduced to provide automatic variables of the form e.g. total_X_Y_Z where X, Y, and Z can be any states of three different state machines. The grouping section specifies upon which state machines a population is partitioned.

For instance, in a hybrid model, the operations in the health_state state machine are likely to depend only on the actual health_state value in each individual. Thus, individuals must be grouped by health state (all S together, all I together, etc.). Assuming for instance that we also have a state machine for defining several species and one for specifying age groups, grouping individuals by age_group and species automatically provides variables of the form total_Juvenile_Vector or total_Adult_Host. Grouping names can be chosen arbitrarily.

Example

grouping:
  population:  # a level name with sublevel
    infection: [health_state]
    pop_structure: [age_group, species]
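
The partition described by a grouping, and the total_… variables derived from it, can be pictured with the following sketch. It only mimics the naming scheme described above on a hand-made list of individuals. The function group_totals and the sample data are hypothetical, not part of EMULSION.

```python
from collections import Counter

# Hand-made individuals (assumption): one dict of state values per individual.
individuals = [
    {"health_state": "S", "age_group": "Juvenile", "species": "Vector"},
    {"health_state": "I", "age_group": "Juvenile", "species": "Vector"},
    {"health_state": "S", "age_group": "Adult", "species": "Host"},
    {"health_state": "S", "age_group": "Adult", "species": "Host"},
    {"health_state": "R", "age_group": "Adult", "species": "Host"},
]


def group_totals(individuals, state_machines):
    """Count individuals for each combination of states of the given
    state machines, and name the counts total_<state>_<state>..."""
    counts = Counter(
        tuple(individual[machine] for machine in state_machines)
        for individual in individuals
    )
    return {"total_" + "_".join(states): n for states, n in counts.items()}


infection = group_totals(individuals, ["health_state"])
pop_structure = group_totals(individuals, ["age_group", "species"])
```

Here the infection grouping yields totals per health state, while the pop_structure grouping yields totals such as total_Juvenile_Vector and total_Adult_Host.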

processes

This section specifies the list of major processes that take place at each level during the simulation. A process name can be either:

  • The name of a state machine: then, the corresponding level (even virtual individuals as in the compartment aggregation type) is endowed with a variable with the same name as the state machine (e.g. health_state), which contains the current state (e.g. S). The state machine applies to the specified level, making the state evolve over time.

  • The name of a method in a class implemented in a Python add-on. In that case, the corresponding code is executed on the corresponding level.

In hybrid models, the name of the process is expected to be associated with the name of the grouping, used by the state machine to compute the flows on transitions.

Examples

A typical processes specification in a compartment-based/IBM model:

processes:
  individuals:
    - health_state

A typical processes specification in a hybrid model:

processes:
  individuals:
    - health_state: infection
# assuming a grouping named "infection":
grouping:
  population:
    infection: [health_state]

parameters

This section is intended to define:

  • model parameters (stricto sensu), i.e. numerical values coming from experts, data or assumptions and driving the dynamics of the model

  • configuration parameters, i.e. numerical values used in initial conditions or scenario definition

  • distributions expressed by functions and returning a new sample each time they are “used” in a computation

  • expressions which can combine other parameters or variables

Each entry must be endowed with a full description of its role (desc:) and can also provide information on where the value (or expression) comes from (source:).

Example

parameters:
  # a model parameter
  transmission_I:
    desc: 'transmission rate from infectious individuals (/day)'
    value: 0.5
  # an expression (of another parameter and variables)
  force_of_infection:
    desc: 'infection function'
    value: 'transmission_I * total_I / total_population'
    source: 'classical function assuming frequency dependence'
  # a distribution
  initial_age:
    desc: 'distribution of ages when initializing individuals'
    value: 'random_integers(0, 20)'
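
The difference between a constant parameter and an expression can be illustrated in plain Python: the expression is re-evaluated with the current values of the variables it refers to. This is only a sketch of the idea, not EMULSION's expression engine. The function below is a hypothetical stand-in for the force_of_infection entry, and the variable values are invented.

```python
# Constant model parameter, as in the example above.
parameters = {"transmission_I": 0.5}


def force_of_infection(variables):
    """Frequency-dependent force of infection:
    transmission_I * total_I / total_population."""
    return (parameters["transmission_I"]
            * variables["total_I"] / variables["total_population"])


# The same expression gives different values as the population changes.
early = force_of_infection({"total_I": 10, "total_population": 100})
later = force_of_infection({"total_I": 40, "total_population": 100})
```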

prototypes

Prototypes are intended to specify typical individuals or populations which are characterized by specific values of their variables (in compartmental models, such variables can only be state machines).

Prototypes are used mainly in:

  • initial conditions (see below), to specify how many individuals of each kind must be created

  • production links of state machines, to indicate the nature of individuals produced

  • the built-in action become, to make a state machine induce changes in other variables, including another state machine

Example

prototypes:
  # here the level for which the prototypes are defined
  individuals:
    - healthy:  # the name of the prototype
        desc: 'healthy individuals'
        health_state: S
        # variable age_group is one of the existing states
        age_group: random
    - infected:
        desc: 'infected individuals'
        health_state: I
        # here we intend to start with infected juveniles
        age_group: J

initial_conditions

Initial conditions specify how to initialize each level. They rely on prototypes, considered as the description of typical sub-levels (e.g. typical populations in metapopulations, typical individuals in IBM/hybrid models, typical sub-populations in compartment-based models).

Examples

A typical specification of initial conditions in an IBM/hybrid/compartment-based model:

initial_conditions:
  population:
    # a list of prototypes with the number of individuals
    # to create with each prototype
    - prototype: infected
      amount: 'initial_infected'
    - prototype: healthy
      amount: 'initial_population_size - initial_infected'
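
The way initial conditions combine prototypes and amounts can be sketched as follows. This is an illustration under assumptions, not EMULSION code: the age group states J and A, the parameter values and the function instantiate are hypothetical, and the keyword random is interpreted here as a uniform choice among the existing states.

```python
import random

AGE_GROUPS = ["J", "A"]  # hypothetical states of the age_group state machine

prototypes = {
    "healthy": {"health_state": "S", "age_group": "random"},
    "infected": {"health_state": "I", "age_group": "J"},
}

# Hypothetical parameter values used to evaluate the amounts.
initial_population_size = 10
initial_infected = 2

initial_conditions = [
    {"prototype": "infected", "amount": initial_infected},
    {"prototype": "healthy",
     "amount": initial_population_size - initial_infected},
]


def instantiate(prototype_name, rng):
    """Create one individual from a prototype, resolving 'random' values."""
    individual = dict(prototypes[prototype_name])
    if individual["age_group"] == "random":
        individual["age_group"] = rng.choice(AGE_GROUPS)
    return individual


rng = random.Random(42)  # seeded for reproducibility
population = [
    instantiate(entry["prototype"], rng)
    for entry in initial_conditions
    for _ in range(entry["amount"])
]
```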

outputs

When running an EMULSION model, the amounts of individuals in each state for all state machines are computed automatically at each time step. The outputs section specifies how the output data are stored (CSV file, database…) and at which period (in time steps). Additional variables can also be logged (extra_vars).

Using the command-line option --plot, one plot per state machine is automatically produced, as well as one plot for all extra variables.

Example

outputs:
  type: csv  # produces counts.csv in output directory
  population:      # outputs for level population
    period: 1  # at each time step
    extra_vars:
      # add an expression (from 'parameters')
      - 'percentage_prevalence'
      # add a population variable
      - total_population
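
The effect of the period setting can be pictured with the sketch below, which writes one CSV row every period time steps. The column layout and the function write_counts are assumptions made for this example. They do not describe the actual content of the file produced by EMULSION.

```python
import csv
import io


def write_counts(history, period, stream):
    """Write one CSV row every `period` time steps."""
    fieldnames = ["step"] + list(history[0])
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for step, values in enumerate(history):
        if step % period == 0:
            writer.writerow({"step": step, **values})


# Invented counts per time step, with one extra variable.
history = [
    {"S": 90, "I": 10, "R": 0, "total_population": 100},
    {"S": 85, "I": 14, "R": 1, "total_population": 100},
    {"S": 79, "I": 18, "R": 3, "total_population": 100},
]

buffer = io.StringIO()  # stands in for a file in the output directory
write_counts(history, period=2, stream=buffer)
lines = buffer.getvalue().splitlines()
```

With a period of 2, only steps 0 and 2 are logged. A period of 1 would log every time step.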

statevars

This section appears when a level requires variables which are not defined automatically by their state machines, nor defined as expressions in the parameters section, nor computed by aggregating variables from a sublevel. In that case, the section contains a simple description of the meaning of the variable, which has to be handled either in prototypes and built-in actions (e.g. set_var), or in a Python code add-on.

Example

statevars:
  nb_mothers_of_infected_calf:
    desc: 'identifies and counts cows which gave birth to an infected calf'

input_data

Warning

This feature is experimental and likely to evolve in further versions.

This section is used to connect the model description to data (and possibly to Python code add-ons). Two main subsections can be specified: preprocessors and data-based parameters.

Preprocessors are intended to define treatments which have to be done before all stochastic repetitions within the same emulsion run command.

For instance, a data file on trade movements between the populations of a metapopulation has to be loaded once before all repetitions start, then stored in memory and made available for each repetition. Each preprocessor is associated with a specific Python class.

The preprocessing: subsection is composed of a list of preprocessor specifications, with the name of the Python file where the preprocessor class is defined, the corresponding class name, a description of the treatment, and possibly input files, output files and a dependency on data-based parameters. The Python class must implement two methods: init_preprocessor and run_preprocessor.

Preprocessors are executed in sequence; those that do not depend on data-based parameters run before these parameters are handled.

Example

The specification of a preprocessor in the YAML model file:

preprocessing:
  - file: 'hybrid_SIR_JA_metapop_data.py'
    class_name: TradeMovementsReader
    desc: 'A preprocessor class for reading the CSV that describes the trade movements and restructuring it as a dictionary, stored in shared information in the simulation.'
    input_files:
      trade_file: 'moves.csv'

The definition of the preprocessor in the Python file:

class TradeMovementsReader(EmulsionPreprocessor):
    """A preprocessor class for reading the CSV... """

    # The method which checks the validity of the preprocessor definition
    def init_preprocessor(self):
        if self.input_files is None or 'trade_file' not in self.input_files:
            raise SemanticException(
                "A valid 'trade_file' must be specified in the input_files "
                "section for pre-processing class {}".format(
                    self.__class__.__name__))

    # The method which performs the actual treatment
    def run_preprocessor(self):
        # Note that data shared between runs can be stored in a shared_data
        # dictionary in the simulation
        if 'moves' not in self.simulation.shared_data:
            debuginfo('Reading {}'.format(self.input_files.trade_file))
            self.simulation.shared_data['moves'] = self.restructure_moves()
        else:
            debuginfo('Trade movements already loaded in simulation')

    # The treatment itself
    def restructure_moves(self):
        """Restructure the CSV file as nested dictionaries"""
        ...

Data-based parameters make it possible to force parameter values at a given level through a data file. For instance, environmental parameters such as the temperature or a carrying capacity can be entirely data-driven. The principle is to use a CSV file where some columns are used as an index (key_variables) to identify which entities have to be updated, and other columns are the values of the parameters. In practice, those “parameter” values are stored into variables defined at the corresponding level.

The data_based_parameters subsection must specify the level concerned by the data-driven parameters, the CSV file to use, the list of variables used as index, and the dictionary of parameters to change based on data with their description.

Example

data_based_parameters:
  - level: herd
    file: 'herd_params.csv'
    key_variables: [population_id, step]
    parameters:
      carrying_capacity:
        desc: 'carrying capacity of the population environment at current time step'
      precipitation:
        desc: 'amount of water available in the population environment at current time step'
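
The indexing principle can be sketched with the standard csv module. The snippet below is only an illustration of how key_variables could select the row used to update an entity. The CSV content, the function load_data_based_parameters and the herd dictionary are invented for this example and do not reflect EMULSION internals.

```python
import csv
import io

# Invented content standing in for 'herd_params.csv'.
CSV_TEXT = """population_id,step,carrying_capacity,precipitation
0,0,100,1.5
0,1,110,0.0
1,0,80,2.0
"""


def load_data_based_parameters(stream, key_variables):
    """Index each CSV row by the values of its key variables."""
    table = {}
    for row in csv.DictReader(stream):
        key = tuple(int(row[name]) for name in key_variables)
        table[key] = {
            name: float(value)
            for name, value in row.items()
            if name not in key_variables
        }
    return table


key_variables = ["population_id", "step"]
table = load_data_based_parameters(io.StringIO(CSV_TEXT), key_variables)

# An entity at the 'herd' level, identified by its key variables.
herd = {"population_id": 0, "step": 1}
herd.update(table[tuple(herd[name] for name in key_variables)])
```

After the update, the herd carries the carrying_capacity and precipitation values found in the row matching its population_id and current step.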

actions

This section appears when a level requires actions which are not provided by the EMULSION generic engine. In that case, the section contains a simple description of the meaning of the action, which has to be defined in a separate Python file.

Example

actions:
  read_mailbox:
    desc: 'Check if messages were received from other individuals and modify contact network accordingly.'