PEtab 2.0 tutorial

Overview

In the following, we demonstrate how to set up a parameter estimation problem in PEtab based on a realistic application example. To this end, we consider the model and experimental data by Boehm et al. (2014). The model describes the dynamics of phosphorylation and dimerization of the transcription factors STAT5A and STAT5B. A visualization and the corresponding reactions of the model are provided below, although the details of the model are not relevant for the purpose of this tutorial. For more details, we refer to the original publication.

We will start with the model, and then proceed to link the model to experimental data by defining experimental conditions, observation functions, and measurements. After this, we will define the parameters to be estimated, and finally group all files in a YAML file to define the PEtab problem.

1. The model

PEtab assumes that an SBML file of the model exists. Here, we use the SBML model provided in the original publication, which is also available on Biomodels (https://www.ebi.ac.uk/biomodels/BIOMD0000000591). For illustration purposes we slightly modified the SBML model and shortened some parts of the PEtab files. The full PEtab problem introduced in this tutorial is available online.

../../_images/tutorial_model1.png

Visualization of the model used as example in this tutorial. The model describes the dynamics of phosphorylation and dimerization of the transcription factors STAT5A and STAT5B.

Reactions included in the example model.

ID

Reaction

Rate law

R1

2 STAT5A → pApA

cyt * BaF3_Epo * STAT5A^2 * k_phos

R2

STAT5A + STAT5B → pApB

cyt * BaF3_Epo * STAT5A * STAT5B * k_phos

R3

2 STAT5B → pBpB

cyt * BaF3_Epo * STAT5B^2 * k_phos

R4

pApA → nucpApA

cyt * k_imp_homo * pApA

R5

pApB → nucpApB

cyt * k_imp_hetero * pApB

R6

pBpB → nucpBpB

cyt * k_imp_homo * pBpB

R7

nucpApA → 2 STAT5A

nuc * k_exp_homo * nucpApA

R8

nucpApB → STAT5A + STAT5B

nuc * k_exp_hetero * nucpApB

R9

nucpBpB → 2 STAT5B

nuc * k_exp_homo * nucpBpB

2. Linking model and measurements

The model by Boehm et al. (2014) was calibrated on measurements on phosphorylation levels of STAT5A and STAT5B as well as relative STAT5A abundance for different timepoints between 0 - 240 minutes after stimulation with erythropoietin (Epo):

../../_images/tutorial_data1.png

Measurements considered for model calibration in our example.

To define a parameter estimation problem in PEtab, we need to map measurements to the model state. To this end, we need to 1) specify the experimental conditions the measurements were generated from, 2) specify observation functions and error models, and 3) specify the measurements themselves. For this, we need to define observation functions as well as experimental conditions under which a measurement was performed.

2.1 Specifying experimental conditions

All measurements were collected under the same experimental condition, which is a stimulation with Epo. In PEtab, we can define experiments, which are characterized by specific conditions (here: discrete changes) that are applied to the model at certain time points.

In the problem considered here, the relevant the model parameter is Epo_concentration, the initial concentration of Epo, which we want to set to a value of 1.25E-7. Since in this example we include data from only one single experiment, it would not be necessary to specify the condition parameter here, but instead the value could have been also set in the model or in the parameter table. However, the benefit of specifying this change as an experiment is that it allows us to easily add measurements from other experiments performed with different Epo concentrations later on.

We define a single experiment in the PEtab experiment table, a tab-separated values (TSV) file[1]:

Experiment table experiments.tsv.

experimentId

time

conditionId

epo_stimulation

0.0

epo_bolus

This means that in the experiment we call epo_stimulation, at time point 0.0, the condition epo_bolus is applied to the model. The condition itself is defined in the condition table, another TSV file, below.

The condition table specifies the discrete changes to model parameters or model state that are applied when the respective condition is activated. In our example, we only have one condition with a single change that sets the Epo concentration to 1.25E-7:

Conditions table experimental_conditions.tsv.

conditionId

targetId

targetValue

epo_bolus

Epo_concentration

1.25E-7

In more complex scenarios, multiple conditions could be defined here, and targetValue could contain more complex expressions.

2.2 Specifying the observation model

To link the model state to the measurements shown above, we specify observation functions. Additionally, a noise model is be introduced to account for the measurement errors. In PEtab, this is encoded in the observable table:

Observable table observables.tsv.

observableId

observableName

pSTAT5A_rel

Rel. STAT5A phosphorylation [%]

pSTAT5B_rel

Rel. STAT5B phosphorylation [%]

rSTAT5A_rel

Rel. STAT5A abundance [%]

Observable table observables.tsv (continued).

observableFormula

100*(2*pApA + pApB) / (2*pApA + pApB + STAT5A)

100*(2*pBpB + pApB) / (2*pBpB + pApB + STAT5B)

100*(STAT5A + pApB + 2*pApA) / (2 * pApB + 2* pApA + STAT5A + STAT5B + 2*pBpB)

Observable table observables.tsv (continued).

noiseFormula

noisePlaceholders

noiseDistribution

pSTAT5A_rel_sigma

pSTAT5A_rel_sigma

normal

pSTAT5B_rel_sigma

pSTAT5B_rel_sigma

normal

rSTAT5A_rel_sigma

rSTAT5A_rel_sigma

normal

  • observableId specifies a unique identifier to the observables that can be used to link them to the measurements (see below).

  • observableName can be used as a human readable description of the observable.

  • observableFormula is a mathematical expression defining how the model output is calculated. The formula can consist of species and parameters defined in the SBML file. In our example, we measure e.g. the relative phosphorylation level of STAT5A (pSTAT5A_rel), which is the sum of all species containing phosphorylated STAT5A over the sum of all species containing any form of STAT5A.

  • noiseFormula is used to describe the formula for the measurement noise. Together with noiseDistribution, it defines the noise model. In this example, we assume additive, normally distributed measurement noise. In this scenario, {observableId}_sigma is the standard deviation of the measurement noise. Because we want to estimate the standard deviation from the data, we parameterize it here. Furthermore, we flag these parameters as placeholders in the noisePlaceholders column, which allows us to substitute them with specific values for each measurement in the measurement table (see below).

2.3 Specifying measurements

The experimental data is linked to the experiments via the experimentId and to the observables via the observableId. This is defined in the PEtab measurement file:

Measurement table measurement_data.tsv.

observableId

experimentId

measurement

time

noiseParameters

pSTAT5A_rel

epo_stimulation

7.9

0

sd_pSTAT5A_rel

pSTAT5A_rel

epo_stimulation

15.4

240

sd_pSTAT5A_rel

pSTAT5B_rel

epo_stimulation

4.6

0

sd_pSTAT5B_rel

pSTAT5B_rel

epo_stimulation

10.96

240

sd_pSTAT5B_rel

rSTAT5A_rel

epo_stimulation

14.7

0

sd_rSTAT5A_rel

rSTAT5A_rel

epo_stimulation

32.2

240

sd_rSTAT5A_rel

  • observableId references the observableId from the observable file.

  • experimentId references the experimentId from the experiment file.

  • measurement defines the values that are measured for the respective observable and experiment.

  • time is the time point at which the measurement was performed. For brevity, only the first and last time point of the example are shown here (the omitted measurements are indicated by “…” in the example).

  • noiseParameters relates to the noiseParameters in the observable table. In our example, the measurement noise is unknown. Therefore we specify parameters here which have to be estimated (see parameters sheet below). If the noise is known, e.g. from multiple replicates, numeric values can be specified in this column.

3. Defining parameters

The model by Boehm et al. (2014) contains nine unknown parameters that need to be estimated from the experimental data. Additionally, it has one known parameter that is fixed to a literature value.

The parameter table for this is given by:

Parameter table parameters.tsv.

parameterId

lowerBound

upperBound

nominalValue

estimate

Epo_degradation_BaF3

1E-5

1E+5

true

k_exp_hetero

1E-5

1E+5

true

k_exp_homo

1E-5

1E+5

true

k_imp_hetero

1E-5

1E+5

true

k_imp_homo

1E-5

1E+5

true

k_phos

1E-5

1E+5

true

ratio

0.693

false

sd_pSTAT5A_rel

1E-5

1E+5

true

sd_pSTAT5B_rel

1E-5

1E+5

true

sd_rSTAT5A_rel

1E-5

1E+5

true

  • parameterId references parameters defined in the SBML file or introduced in the condition table or the measurement table. In this example, the first seven parameters are specified in the model, and the last three parameters are the standard deviations for the different observables (sd_{observableId}) that we introduced in the measurement table.

  • lowerBound and upperBound define the bounds for the parameters used during estimation. These are usually biologically plausible ranges.

  • estimate defines whether the parameter will be estimated (true) or be fixed (false) to the value in the nominalValue column.

  • nominalValue are known values used for simulation. The entry can be left empty, if a value is unknown and requires estimation.

5. YAML file

The parameter estimation problem is fully defined by the files created above. However, to facilitate importing the problem into tools supporting PEtab, a YAML file is used to group the files together. This file has the following format (Boehm_JProteomeRes2014.yaml):

format_version: 2.0.0
model_files:
  model:
    location: model_Boehm_JProteomeRes2014.xml
    language: sbml
parameter_files:
  - parameters.tsv
experiment_files:
  - experiments.tsv
condition_files:
  - experimental_conditions.tsv
observable_files:
  - observables.tsv
measurement_files:
  - measurement_data.tsv

The first line specifies the PEtab version this file and the files referenced adhere to. The next block specifies the model file, in this case an SBML file. This is followed by lists of the different PEtab files created above: parameter, experiment, condition, observable, and measurement files. Here, each list contains only one file, but multiple files can be referenced if needed.

7. Further information

This tutorial only demonstrates a subset of PEtab functionality. For full reference, consult the PEtab specification. After finishing the implementation of the PEtab problem, its correctness can be verified using the petablint tool provided by the PEtab Python library (usage). The PEtab problem can then be used as input to the supporting toolboxes to estimate the unknown parameters or calculate parameter uncertainties.

References

  • Martin E. Boehm, Lorenz Adlung, Marcel Schilling, Susanne Roth, Ursula Klingmüller, and Wolf D. Lehmann. Journal of Proteome Research 2014 13 (12), 5685-5694. DOI: 10.1021/pr5006923.