Parameter inference for SDE in epidemiological models

My overall goal is to do inference for a SIR-model that is represented by a multidimensional SDE.
My idea was to interprete the Euler-scheme as an autoregressive process with mean and standard deviation non-linear dependent on the last state of the process.
The Euler-scheme for a SDE takes the form

x_{t+1}=x_t+\mu(x_t)*dt+\sigma(x_t)*dB_t

Since dB_t is a vector of independent normally distributed random variables it should be multivariate normal distributed. But trying to implement that I always get a RuntimeError: Initialization failed.error appearing.
A minimal example I created is using the following two-dimensional SDE for an SIS-model:

\begin{cases} dS(t) = [\beta-\alpha S(t)I(t)+\gamma I(t)-\beta S(t)]dt-\sigma S(t)I(t)dB_1(t)\\ dI(t) = [\alpha S(t)I(t)-(\beta+\gamma)I(t)]dt+\sigma S(t)I(t)dB_2(t) \end{cases}

I encoded the model as follows:

functions {

    vector mu(vector y, real alpha, real beta, real gamma) {

        vector[2] m;

        m[1] = (beta-alpha*y[1]*y[2]-gamma*y[2]-beta*y[1]);

        m[2] = (alpha*y[1]*y[2]-(beta+gamma)*y[2]);

        return m;

    }

    vector Sigma(vector y, real sigma) {

        vector[2] sig;

        sig[1] = sigma*y[1]*y[2]/10;

        sig[2] = sigma*y[1]*y[2]/10;

        return sig;

    }

}

data{

    int<lower=0> N;

    real<lower=0> dt;

    array[N] vector[2] y;

}

parameters{

    real<lower=0, upper=1> alpha;

    real<lower=0, upper=1> beta;

    real<lower=0, upper=1> gamma;

    real<lower=0, upper=1> sigma;

}

model{

for (n in 2:N) {

        y[n] ~ normal(y[n-1]+mu(y[n-1], alpha, beta, gamma)*dt, sqrt(dt)*Sigma(y[n-1], sigma));

    }

}

And then used same synthetic generated data to build and fit the model using PyStanwith

post_sec = stan.build(second_model, data=sec_input_data )
sec_fit = post_sec.sample(num_chains=4, num_samples=1000)

This worked fine, at least no errors occured.
But this was just an easy model, which I could write as a vector of independent normal distributed random variables. The model which I normally want to fit is much more complicated and especially involves correlation of the noise of the different compartments, so I would need to sample a vector from a multivariate normal with given mean and covariance matrix.
So the next step was to take the simple model from above and write the noise function as a diagonal matrix (no correlation in the simple model) and then draw y from a multivariate normal:

functions {
    vector mu(vector y, real alpha, real beta, real gamma) {
        vector[2] m;
        m = y*(beta-alpha-gamma-beta);
        return m;
    }
    matrix Sigma(vector y, real sigma) {
        matrix[2, 2] sig;
        sig[1, 1] = - sqrt(sigma*y[1]*y[2]/100);
        sig[2, 2] = sqrt(sigma*y[1]*y[2]/100);
        sig[2,1] = 0
        sig[1,2] = 0
        return sig;
    }
}
data{
    int<lower=0> N;
    real<lower=0> dt;
    array[N] vector[2] y;
}
parameters{
    real<lower=0, upper=1> alpha;
    real<lower=0, upper=1> beta;
    real<lower=0, upper=1> gamma;
    real<lower=0, upper=1> sigma;
}
model{
for (n in 2:N) {
        y[n] ~ multi_normal(y[n-1]+mu(y[n-1], alpha, beta, gamma)*dt, sqrt(dt)*Sigma(y[n-1], sigma));
    }
}

The model still compiles and I try to sample from it using the same synthetic data as before

post_third = stan.build(third_model, data=third_input_data)
third_fit = post_third.sample(num_chains=4, num_samples=1000)

but this gives the following error

Sampling:   0%Sampling: Initialization failed.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/vinc777/masterthesis/two_variant_model/PyStan/try.ipynb Cell 19' in <module>
----> 1 third_fit = post_third.sample(num_chains=4, num_samples=1000)

File ~/.local/lib/python3.8/site-packages/stan/model.py:89, in Model.sample(self, num_chains, **kwargs)
     61 def sample(self, *, num_chains=4, **kwargs) -> stan.fit.Fit:
     62     """Draw samples from the model.
     63 
     64     Parameters in ``kwargs`` will be passed to the default sample function.
   (...)
     87 
     88     """
---> 89     return self.hmc_nuts_diag_e_adapt(num_chains=num_chains, **kwargs)

File ~/.local/lib/python3.8/site-packages/stan/model.py:108, in Model.hmc_nuts_diag_e_adapt(self, num_chains, **kwargs)
     92 """Draw samples from the model using ``stan::services::sample::hmc_nuts_diag_e_adapt``.
     93 
     94 Parameters in ``kwargs`` will be passed to the (Python wrapper of)
   (...)
    105 
    106 """
    107 function = "stan::services::sample::hmc_nuts_diag_e_adapt"
--> 108 return self._create_fit(function=function, num_chains=num_chains, **kwargs)

File ~/.local/lib/python3.8/site-packages/stan/model.py:311, in Model._create_fit(self, function, num_chains, **kwargs)
    308     return fit
    310 try:
--> 311     return asyncio.run(go())
    312 except KeyboardInterrupt:
    313     return

File ~/.local/lib/python3.8/site-packages/nest_asyncio.py:38, in _patch_asyncio.<locals>.run(main, debug)
     36 task = asyncio.ensure_future(main)
     37 try:
---> 38     return loop.run_until_complete(task)
     39 finally:
     40     if not task.done():

File ~/.local/lib/python3.8/site-packages/nest_asyncio.py:81, in _patch_loop.<locals>.run_until_complete(self, future)
     78 if not f.done():
     79     raise RuntimeError(
     80         'Event loop stopped before Future completed.')
---> 81 return f.result()

File /usr/lib/python3.8/asyncio/futures.py:178, in Future.result(self)
    176 self.__log_traceback = False
    177 if self._exception is not None:
--> 178     raise self._exception
    179 return self._result

File /usr/lib/python3.8/asyncio/tasks.py:280, in Task.__step(***failed resolving arguments***)
    276 try:
    277     if exc is None:
    278         # We use the `send` method directly, because coroutines
    279         # don't have `__iter__` and `__next__` methods.
--> 280         result = coro.send(None)
    281     else:
    282         result = coro.throw(exc)

File ~/.local/lib/python3.8/site-packages/stan/model.py:235, in Model._create_fit.<locals>.go()
    233         sampling_output.clear()
    234         sampling_output.write_line("<info>Sampling:</info> <error>Initialization failed.</error>")
--> 235         raise RuntimeError("Initialization failed.")
    236     raise RuntimeError(message)
    238 resp = await client.get(f"/{fit_name}")

RuntimeError: Initialization failed.

I already tried to give more constraints on the parameters to avoid some log-probability going to infinity, but it did not work.

I am wondering whether I am using the multivariate normal in a wrong way, or it is not supported.

Or is there any better way to deal with inference for SDEs in STAN?
Or would you even say STAN is not yet the right package for that and have some other recommendations.
I am glad for every idea or hint.

For the initial minimal example there seems to be some discrepancies between the SDE model you have written and the (time-discretised) Stan implementation.

Specifically you state the SDE system is

\begin{cases} dS(t) = [\beta-\alpha S(t)I(t)+\gamma I(t)-\beta S(t)]dt-\sigma S(t)I(t)dB(t)\\ dI(t) = [\alpha S(t)I(t)-(\beta+\gamma)I(t)]dt+\sigma S(t)I(t)dB(t) \end{cases}

but your definition of the mu and Sigma functions seems to correspond to a system

\begin{cases} dS(t) = [\beta-\alpha S(t)I(t)-\gamma I(t)-\beta S(t)]dt+\sigma (1 - S(t)I(t)) / 10dB_1(t)\\ dI(t) = [\alpha S(t)I(t)-(\beta+\gamma)I(t)]dt+\sigma (1 - S(t)I(t)) / 10dB_2(t) \end{cases}

As well as the changes in the signs of some terms and differing definition of the diffusion coefficient, your original definition seemed to imply a single scalar-valued Wiener noise process B was driving both the S and I processes - was this intentional? If so then using a Euler-Maruyama discretisation with no data augmentation (inclusion of the state at additional unobserved intermediate time steps as additional latent variables to infer) and observations of both S and I will not be possible (the Euler-Maruyama discretisation introduces one scalar latent variable - the normal variate used to simulate the Wiener increment - per timestep in this case, compared to the two observed values constraining the system per time step - this will mean there are more observational constraints than latent values in general).


EDIT: from a bit of searching it looks like you are using something similar to the model described in this paper (non-paywalled PDF), which does use a single Brownian noise process to drive both S and I but also notes that S and I are deterministically related in this case and so a reduced SDE in just one variable can be considered.


Further there does not seem to be any scaling of the variance of the normal distribution used to represent the Euler-Maruyama time discretisation by the timestep dt - for a time-homogeneous SDE system with drift function \mu and diffusion coefficient function \sigma the Euler-Maruyama discretisation corresponds to

X(t+\delta t) \approx X(t) +\mu(X(t)) \delta t + \sigma(X(t)) \sqrt{\delta t} \,V

where V is a vector of standard normal variates or equivalently

X(t+\delta t) \,|\, X(t) = x \sim \mathsf{normal}\big(x + \delta t \mu(x), \sqrt{\delta t} \sigma(x)\big)

With regards to the second implementation you posted using multi_normal, I think part of the issue here is that while normal(m, s) corresponds to a normal distribution with mean m and standard deviation (square root of variance) s, multi_normal(m, C) corresponds to a multivariate normal distribution with mean vector m and covariance matrix C. To get something more analogous to normal in terms of being parameterised in terms of a ‘square-root’ of the covariance matrix you could instead use multi_normal_cholesky(m, L) where L is the lower-triangular Cholesky factor for the covariance matrix. In your current case as the covariance is diagonal using a vectorised call to normal is a more performant option though it seems your eventual plan is to have correlation between the noise processes driving the S and I processes - if so if the (matrix valued) diffusion coefficient \sigma is not always lower-triangular valued you will either need to perform a change of variables using Itî’s lemma to transform to a system in which the diffusion coefficient is lower-triangular, or explicitly form the product \sigma \sigma^T corresponding to the (scaled) covariance of the Euler-Maruyma discretisation

X(t+\delta t) \,|\, X(t) = x \sim \mathsf{multi\_normal}\big(x + \delta t \mu(x), \delta t \sigma(x)\sigma(x)^T\big)

Currently as y[1] and y[2] are not guaranteed to be positive your implementation may be failing on initialisation due to numerical issues when trying to factorise non-positive definite matrices being passed to multi_normal.

With regards to your general questions

Or is there any better way to deal with inference for SDEs in STAN?
Or would you even say STAN is not yet the right package for that and have some other recommendations.

Using an Euler-Maruyama method to time discretise the SDE and performing inference on the discretised process is a reasonable approach, though assuming the interobservation interval is sufficiently small to warrant using a single Euler-Maruyama step between each pair of observed times will in general be problematic. A more typical approach is to split each interobservation interval in to small timesteps and jointly infer the time-discretised latent paths and parameters which allows the discretisation time step to be controlled independently of the interobservation interval, at the cost of a much larger latent space to perform inference in. In this more general setting there is also the issue that as the discretisation time step tends to zero the correlation between the states at successive time steps tends to one, leading to a challenging posterior geometry for the Markov chains to explore. This can be alleviated by using a non-centred parameterisation in terms of the standard normal variates used to simulate the Wiener increments at the intermediate time steps, though the observed time step will still need to use a centred parameterisation.

In terms of alternatives, the paper Manifold MCMC methods for Bayesian inference in diffusion models (disclosure: I am one of the authors) proposes a specific constrained Hamiltonian Monte Carlo based approach for performing inference efficiently in SDE models, with associated Python code in this GitHub repository which builds on-top of the Python package Mici (again for full disclosure, I am the author of Mici). We show in the paper that the proposed approach can give significant performance improvements over applying a standard Hamiltonian Monte Carlo algorithm to a non-centred parameterisation of the latent space, including in an SIR model with time-varying contact rate (code here). If you are interested in trying out the approach on your problem I would be happy to help with the implementation.

Another alternative would be the guided proposal / continuous-discrete smoothing approach described in this paper by Marcin Mider, Moritz Schauer and Frank van der Meulen. An associated Julia package implementing their approach is available here.

1 Like

For a somewhat brief introducing to stochastic differential equations in Stan see An Infinitesimal Introduction to Stochastic Differential Equation Modeling. The examples here are all one-dimensional but it sounds like that may be sufficient for this model, and if not the key is working out a multivariate Euler-Maruyama discretization but the implementation will be very similar using multivariate normal density functions instead of the one-dimensional normal density functions demonstrated in the case study.

1 Like

Thank you for your reply.
I edited the post to correct for the difference between the stated SDE and the implementation and added the scaling for the noise in the implementation.
I also think that I know where the error comes from, but don’t know why.
In the new implementation of the last model I pass the square-root of the (scaled) covariance matrix to the multinormal distribution, which has the form

\left( \begin{array}{cc} -\sqrt{\sigma S(t)I(t)/10}0 & 0\\ 0 & \sqrt{\sigma S(t)I(t)/100}\\ \end{array} \right)

If I would remove the - in the first entry, the error would disappear. But I don’t know why this is a problem and how to get over it since this matrix is positive-definite it should be a valid square-root of the covariance matrix. I know that in this case removing the minus will not make a difference but I later I want to have negative correlation.
I mean it still does not match the stated system since there the noise enters with a negative sign in the first equation and trying to implement that also always gives the initialization error.
Maybe you have an idea for that.

Also thanks for the alternatives. I will have a look on your approach and see whether it is feasible for my problem.

I am already trying the Julia package parallel to my implementation to compare results and decide which one to use when increasing my model.

The matrix

\left( \begin{array}{cc} -\sqrt{\sigma S(t)I(t)/10}0 & 0\\ 0 & \sqrt{\sigma S(t)I(t)/100}\\ \end{array} \right)

is not positive-definite - it is equivalent to

\sqrt{\sigma S(t)I(t)/100}\left( \begin{array}{cc} -1 & 0\\ 0 & 1\\ \end{array} \right)

which irrespective of the value of S(t) and I(t) will be non-definite as it will have both positive and negative eigenvalues.

You are currently passing this as the covariance parameter to multi_normal in the second part of the code in your updated post. For a \mathbb{R}^N-valued diffusion process X described by

\mathrm{d}X(t) = \mu(X(t)) \mathrm{d}t + \sigma(X(t)) \mathrm{d}B(t)

where \mu : \mathbb{R}^N \to \mathbb{R}^N is the vector-valued drift function, \sigma : \mathbb{R}^N \to \mathbb{R}^{N\times N} is the matrix valued diffusion coefficient and B is N-dimensional vector of independent Wiener noise processes, then Euler-Maruyama discretisation corresponds to

X(t+\delta t) \,|\, X(t) = x \sim \mathsf{multi\_normal}\big(x + \delta t \mu(x), \delta t \sigma(x)\sigma(x)^T\big)

that is the covariance parameter scales with the time step (rather than its square root) and it is the symmetric product of the diffusion coefficient \sigma(x)\sigma(x)^T that determines the covariance. As in your case the diffusion coefficient matrix is diagonal you could instead use

X(t+\delta t) \,|\, X(t) = x \sim \mathsf{multi\_normal\_cholesky}\big(x + \delta t \mu(x), \sqrt{\delta t} \sigma(x)\big)

though for the diagonal case it makes more sense to use a vectorised call to normal.

The system the updated code in the first part of your original post is simulating now appears to be

\begin{cases} dS(t) = [\beta-\alpha S(t)I(t)-\gamma I(t)-\beta S(t)]dt+(\sigma (S(t)I(t)) / 10)dB_1(t)\\ dI(t) = [\alpha S(t)I(t)-(\beta+\gamma)I(t)]dt+(\sigma (S(t)I(t)) / 10)dB_2(t) \end{cases}

As the distribution on the Brownian noise process is symmetric if we define \bar{B}_1 = -B_1 then we see that an equivalent system is

\begin{cases} dS(t) = [\beta-\alpha S(t)I(t)-\gamma I(t)-\beta S(t)]dt - (\sigma (S(t)I(t)) / 10)d\bar{B}_1(t)\\ dI(t) = [\alpha S(t)I(t)-(\beta+\gamma)I(t)]dt+(\sigma (S(t)I(t)) / 10)dB_2(t) \end{cases}

so from a distribution perspective it doesn’t matter if the signs are switched in the diffusion coefficient. Note however this is still different from the system you originally posted in that there are two independent driving noise processes B_1 and B_2, while your original system has a single noise process B shared between both S and I. As I said above as S and I are deterministically related in this case the posterior distribution on observing both directly when using a Euler-Maruyama discretisation as you have is not well-defined.

2 Likes

I want the two independent noise processes B_1 and B_2, so edited that again.
Okay so far I understand that we need to give the covariance matrix as an argument to the multi_normal, which is the symmetric product of the diffusion coefficient and this should then be a positive-definite matrix. (I agree that my example was not well chosen for that.)
Is that correct?

Yes - the symmetric product \sigma \sigma^T will always be positive semi-definite and will be strictly positive-definite providing the diffusion coefficient \sigma is full-rank.

For a N \times N diffusion coefficient, forming the symmetric product and computing a matrix factorization when computing the multivariate normal log density using multi_normal will both require \mathcal{O}(N^3) operations per observed time in each evaluation of the posterior log density (and its gradient). Using the diffusion coefficient directly in multi_normal_cholesky (this requires the diffusion coefficient to be lower-triangular, however, if not already we can often analytically solve for a lower-triangular factor of \sigma \sigma^T for example using a symbolic algebra package and use this in place of the original diffusion coefficient) will only require \mathcal{O}(N^2) operations per observed time in each evaluation of the posterior log density. For small N such as the current N = 2 this won’t make a significant difference obviously but this might be worth bearing in mind if the intention is to extend to a higher dimensional system.

Okay, some problems again on the same model.
I just wanted to implement the model with measurements y in the data block and x being the latent state of the process as a parameter. But somehow I end up with the same initialization error again. My guess is that it is because the covariance matrix depends on the last state x[n-1] of the process and can be not positive definite if that state has negative values or zero for the infected people compartment.
So a way around would be to sample from a truncated multivariate normal I guess, but truncation is not supported in stan for multivariate distributions and I could not create a workaround yet.
Maybe you know whether I am correct where the error comes from and have an idea how to resolve it.

The model code is

toy_model2 = """
functions{
    vector mu(vector x, real alpha, real gamma) {
        vector[2] m;
        m[1] = -alpha*x[1]*x[2];
        m[2] = alpha*x[1]*x[2]-gamma*x[2];
        return m;
    }

    matrix Sigma(vector x, real alpha, real gamma) {
        matrix[2, 2] sig;
        matrix[2, 2] L;
        sig[1, 1] = alpha*x[1]*x[2];
        sig[1, 2] = -alpha*x[1]*x[2];
        sig[2, 1] = -alpha*x[1]*x[2];
        sig[2 ,2] = alpha*x[1]*x[2]+gamma*x[2];
        L = cholesky_decompose(sig);
        return L;
    }
}
data{
    int<lower=0> N; // number of timesteps
    real<lower=0> dt; // stepsize
    array[N] vector[2] y; // vector of observations
}
parameters {
    real<lower=0, upper=1> alpha;
    real<lower=0, upper=1> gamma;
    real<lower=0, upper=1> sigma_measurement;
    array[N] vector[2] x;
}
model {
for (n in 2:N) {
        x[n] ~ multi_normal_cholesky(x[n-1]+dt*mu(y[n-1], alpha, gamma), sqrt(dt/100)*Sigma(x[n-1], alpha, gamma));
        y[n][1] ~ normal(x[n][1], sigma_measurement)T[0,];
        y[n][2] ~ normal(x[n][2], sigma_measurement)T[0,];
    }
}
"""

I used truncated normals for the observations, because I did as well in create the measurements with it gets as an input to have only non-negative values. But anyhow this should not be the reason for the error.

The rest of the code with the error message is:

measurements2 = pd.read_csv("/home/vinc777/masterthesis/two_variant_model/own_simulation/SIR_simple.csv")
del measurements2['Unnamed: 0']
del measurements2['R']

# get some artifical data
input_data2 = {"N": 100,
              "dt": 0.1,
              "y": np.array(measurements2),
              }

post2 = stan.build(toy_model2, data=input_data2)

fit2 = post2.sample(num_chains=4, num_samples=1000)
Sampling:   0%Sampling: Initialization failed.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/vinc777/masterthesis/two_variant_model/PyStan/try.ipynb Cell 13' in <module>
----> 1 fit2 = post2.sample(num_chains=4, num_samples=1000)

File ~/.local/lib/python3.8/site-packages/stan/model.py:89, in Model.sample(self, num_chains, **kwargs)
     61 def sample(self, *, num_chains=4, **kwargs) -> stan.fit.Fit:
     62     """Draw samples from the model.
     63 
     64     Parameters in ``kwargs`` will be passed to the default sample function.
   (...)
     87 
     88     """
---> 89     return self.hmc_nuts_diag_e_adapt(num_chains=num_chains, **kwargs)

File ~/.local/lib/python3.8/site-packages/stan/model.py:108, in Model.hmc_nuts_diag_e_adapt(self, num_chains, **kwargs)
     92 """Draw samples from the model using ``stan::services::sample::hmc_nuts_diag_e_adapt``.
     93 
     94 Parameters in ``kwargs`` will be passed to the (Python wrapper of)
   (...)
    105 
    106 """
    107 function = "stan::services::sample::hmc_nuts_diag_e_adapt"
--> 108 return self._create_fit(function=function, num_chains=num_chains, **kwargs)

File ~/.local/lib/python3.8/site-packages/stan/model.py:311, in Model._create_fit(self, function, num_chains, **kwargs)
    308     return fit
    310 try:
--> 311     return asyncio.run(go())
    312 except KeyboardInterrupt:
    313     return

File ~/.local/lib/python3.8/site-packages/nest_asyncio.py:38, in _patch_asyncio.<locals>.run(main, debug)
     36 task = asyncio.ensure_future(main)
     37 try:
---> 38     return loop.run_until_complete(task)
     39 finally:
     40     if not task.done():

File ~/.local/lib/python3.8/site-packages/nest_asyncio.py:81, in _patch_loop.<locals>.run_until_complete(self, future)
     78 if not f.done():
     79     raise RuntimeError(
     80         'Event loop stopped before Future completed.')
---> 81 return f.result()

File /usr/lib/python3.8/asyncio/futures.py:178, in Future.result(self)
    176 self.__log_traceback = False
    177 if self._exception is not None:
--> 178     raise self._exception
    179 return self._result

File /usr/lib/python3.8/asyncio/tasks.py:280, in Task.__step(***failed resolving arguments***)
    276 try:
    277     if exc is None:
    278         # We use the `send` method directly, because coroutines
    279         # don't have `__iter__` and `__next__` methods.
--> 280         result = coro.send(None)
    281     else:
    282         result = coro.throw(exc)

File ~/.local/lib/python3.8/site-packages/stan/model.py:235, in Model._create_fit.<locals>.go()
    233         sampling_output.clear()
    234         sampling_output.write_line("<info>Sampling:</info> <error>Initialization failed.</error>")
--> 235         raise RuntimeError("Initialization failed.")
    236     raise RuntimeError(message)
    238 resp = await client.get(f"/{fit_name}")

RuntimeError: Initialization failed.

You are correct that the error here may be due to the state x potentially becoming negative in the time discretised system, meaning that the diffusion covariance term

\begin{bmatrix}\alpha x_{1} x_{2} & - \alpha x_{1} x_{2} \\-\alpha x_{1} x_{2} & \alpha x_{1} x_{2} + \gamma x_{2} \end{bmatrix}

is not guaranteed to be positive definite. One solution, assuming that in the original SDE system the components of the process x should only take positive values, is to use Itî’s lemma to find an equivalent SDE system in a transformed state u = \log x, with u then being unbounded and x= \exp(u) guaranteed to remain positive.

You are also currently computing the Cholesky decomposition on each evaluation of Sigma - it would be better to analytically solve for the Cholesky (which is feasible for small dimensions) and encode this directly. This can be done using SymPy for example -

import sympy
x = sympy.symbols('x_1 x_2', positive=True)
α, γ = sympy.symbols('α γ', positive=True)
sympy.Matrix(
    [
        [+α * x[0] * x[1], -α * x[0] * x[1]], 
        [-α * x[0] * x[1], +α * x[0] * x[1] + γ * x[1]]
    ]
).cholesky()

gives the Cholesky factor corresponding to the diffusion coefficient as

\displaystyle \left[\begin{matrix}\sqrt{\alpha} \sqrt{x_{1}} \sqrt{x_{2}} & 0\\- \sqrt{\alpha} \sqrt{x_{1}} \sqrt{x_{2}} & \sqrt{\gamma} \sqrt{x_{2}}\end{matrix}\right]

The sde Python package I wrote has some helper functions in sde.transforms for automating change of variables for SDE systems using another package SymNum. Applying a log-transformation to your system with drift term

\displaystyle \left[\begin{matrix}- x_{1} x_{2} α \\ x_{1} x_{2} α - x_{2} γ\end{matrix}\right]

and diffusion coefficient term

\displaystyle \left[\begin{matrix}\sqrt{x_{1}} \sqrt{x_{2}} \sqrt{α} & 0\\- \sqrt{x_{1}} \sqrt{x_{2}} \sqrt{α} & \sqrt{x_{2}} \sqrt{γ}\end{matrix}\right]

can be done with the following code (requires installing sde package plus dependencies)

import sympy
import sde
import symnum.numpy as snp
import symnum 

def drift_func(x, z):
    α, γ = z
    return snp.array(
        [-α * x[0] * x[1], α * x[0] * x[1] - γ * x[1]]
    )

def diff_coeff(x, z):
    α, γ = z
    return snp.array(
        [
            [snp.sqrt(α * x[0] * x[1]), 0],
            [-snp.sqrt(α * x[0] * x[1]), snp.sqrt(γ * x[1])],
        ]
    )

transformed_drift_func, transformed_diff_coeff = sde.transforms.transform_sde(
    snp.log,
    snp.exp,
)(drift_func, diff_coeff)

u = snp.array(sympy.symbols('u_1 u_2'))
z = snp.array(sympy.symbols('α γ', positive=True))

print(transformed_drift_func(u, z))
print(transformed_diff_coeff(u, z))

with the drift term for the transformed state u = \log x being

\displaystyle \left[\begin{matrix}- α e^{u_{2}} - \frac{α e^{- u_{1} + u_{2}}}{2} \\ α e^{u_{1}} - \frac{α e^{u_{1} - u_{2}}}{2} - γ - \frac{γ e^{- u_{2}}}{2}\end{matrix}\right]

and the transformed diffusion coefficient term

\displaystyle \left[\begin{matrix}\sqrt{α} e^{- \frac{u_{1}}{2} + \frac{u_{2}}{2}} & 0\\- \sqrt{α} e^{\frac{u_{1}}{2} - \frac{u_{2}}{2}} & \sqrt{γ} e^{- \frac{u_{2}}{2}}\end{matrix}\right]

If you apply an Euler-Maruyama discretisation to the SDE in u you can then compute the equivalent x = \exp(u) values, while avoiding issues with x becoming negative in the discretised system.

1 Like

First thanks for the hint with the functions in your sde package to transform SDEs.
The idea is good but then when I implement it as an Euler-scheme for u in PyStan that would mean that in the model block I need to sample u using multi_normal_cholesky with the new \mu and \sigma, right?

Something like the following should work (not tested and could be made more efficient by storing intermediate results for reuse)

functions {
    vector mu(vector u, real alpha, real gamma) {
        vector[2] m;
        m[1] = -alpha * (exp(u[2]) - exp(u[2] - u[1]) / 2); 
        m[2] = alpha * (exp(u[1]) - exp(u[1] - u[2]) / 2) - gamma * (1 + exp(-u[2]) / 2);
        return m;
    }
    matrix Sigma(vector u, real alpha, real gamma) {
        matrix[2, 2] L;
        L[1, 1] = sqrt(alpha) * exp((u[2] - u[1]) / 2);
        L[1, 2] = 0;
        L[2, 1] = -sqrt(alpha) * exp((u[1] - u[2]) / 2);
        L[2 ,2] = sqrt(gamma) * exp(-u[2] / 2);
        return L;
    }
}

data {
    int<lower=0> N; // number of timesteps
    real<lower=0> dt; // stepsize
    array[N] vector[2] y; // vector of observations
}

parameters {
    real<lower=0, upper=1> alpha;
    real<lower=0, upper=1> gamma;
    real<lower=0, upper=1> sigma_measurement;
    array[N] vector[2] u;
}

model {
    u[1] ~ normal(0, 1);
    for (n in 2:N) {
        u[n] ~ multi_normal_cholesky(u[n-1] + dt * mu(u[n-1], alpha, gamma), sqrt(dt)*Sigma(u[n-1], alpha, gamma));
        y[n][1] ~ normal(exp(u[n][1]), sigma_measurement)T[0,];
        y[n][2] ~ normal(exp(u[n][2]), sigma_measurement)T[0,];
    }
}

Previously you weren’t assigning a prior to the initial state x[1] which may have been part of the problem as this will have defaulted to an improper prior I believe. Here I have assigned a normal(0, 1) prior to u[1] equivalent to a lognormal(0, 1) prior on x[1]. You were also using y[n-1] instead of x[n-1] in the call to mu and a (1 / 100) scaling factor on the covariance which I wasn’t clear if was intentional or not.

2 Likes

I already tried the model in stan that you proposed and it did not work, thats why I asked my last question. (I did not have a prior on u[1] but adding did not resolve the initialization failed error as well).
The scaling by 1/100 is intentional since I asume to have a total population size of 100 but want to simulate the scaled process, X/100\in [0,1]. I corrected the diffusion coefficient for that as well, but should not be the source of the error anyway.

Edit: The above model code does work now but needs a long time and did not have a look yet whether it gives useful results.

Still I am wondering whether there is a “better” way or any way to use truncated multivariate distributions. Because if my model grows to more compartments than rewriting the process and its drift and diffusion using Ito’s Lemma might lead to quite ugly calculations and complicated drift and diffusion coefficients then.

The stochastic SIR model you are using here is a diffusion based approximation to a Markov jump process. The problem of components of the process becoming negative in discretisations of the process is a well-known problem and there have been various approaches suggested as remedies see for example the articles

These all also involve some level of additional implementation effort though.

Truncating the conditional distributions on the states given the previous states is problematic both as Stan doesn’t have any in-built support for truncated multivariate distributions, but also because doing this truncation in a way which doesn’t introduce boundaries / non-smoothness in to the posterior distribution (which would have implications for how well gradient-based MCMC methods such as those used in Stan will work) I think is non-trivial.

The above model code does work now but needs a long time and did not have a look yet whether it gives useful results.

With regards to the poor performance of the current approach, there are few possibilities here.

The current centred parameterization you are using in terms of the state process may be leading to a challenging posterior geometry for the chains to navigate. Switching to a non-centred parametrization in terms of the Wiener process noise increments may help with this.

The upper-bound constraint you put on the sigma_measurement is probably not necessary (generally we would only require a standard-deviation parameter to be lower-bounded by 0) and may be causing issues if the data supports larger values of the sigma_measurement as the posterior may end up artificially concentrated at the upper boundary. This may also apply to alpha and gamma as I believe these are typically only constrained to be non-negative.

The log-transform based approach I suggested may also be causing numerical issues when one of the transformed state components tends to negative infinity (corresponding to zero in the original state space). When using this approach for a similar model (SIR model with contact rate parameter also modelled as a diffusion process), we artificially clipped any updates to the transformed states below at -500 (corresponding to the a lower bound on the original states of \exp(-500)) to remedy this.

2 Likes