# Looking for help of state space model and question about ode_rk45

Hi everyone,

I am trying to build a state space model with some explained variables. The basic idea of this project is understanding a vegetation population trend over the years. The structure I used to create a simulation data is:

True population size[t] = True population size[t-1] * (growth rate - death rate * disturbance event[t-1])

Measured population size[t] = lognormal(log(True population size[t], measurement error)

I learned two methods of SSM from Chapter 16 in Richard McElreath’s book and shihrks’ git repository. I tried to modified both of there stan script and came up with the following question:

1. What is the advantage of using ode_rk45 function? Can I add a vector of explained variable in the ode_rk45 function?
functions { // The function pass use in ode_rk45
vector dpop_dt( real t,                 // time
// initial state {y[1]:lynx, y[2]hares}
vector y,
// parameters
real bh, // birth rate of hare
real mh, // mortality rate of hare
real ml,  // mortality rate of lynx
real bl // birth rate of lynx
) {
// differential equations. The famous Lotka-volterra equation
vector[2] dP_dt;

dP_dt[1] = (bl * y[2] - ml) * y[1]; // lynx growth depend on hare population
dP_dt[2] = (bh - mh * y[1]) * y[2];  // hare death depend on lynx population
return  dP_dt ;
}
}
data {
int<lower=0> N;              // number of measurement times
real<lower=0> pelts[N,2];    // measured populations
array[N] real time;
}
parameters {
real<lower=0> bh;
real<lower=0> mh;
real<lower=0> ml;
real<lower=0> bl;
vector[2] y0;
real<lower=0> sigma[2];      // measurement errors
real<lower=0,upper=1> p[2];  // trap rate
}
transformed parameters {
array[N] vector[2]  pop = ode_rk45(
dpop_dt, y0, 0, time, bh, mh, ml, bl);
}


1. Using transformed parameter and a loop in the model section. I am able to add the explained variables but the model came out with many fitting problem. What might be the possible solution?
I always low E-BFMI (less than 0.2) and super low effective sample size + High Rhat for gp and dp.
data {
int<lower=0> N;      // number of measurement times
vector[N] p;  // measured populations
vector[N] c;  // randomly produce cyclone event
}
parameters {
real<lower=0> gp;      //  growth rate of population
real<lower=0> dp;      // death rate of population
real<lower=0> sigma;   // measurement errors
real<lower=0> sigma_na;// natrual variation
row_vector<lower=0>[N] mu;  // Population size of each time point
}
transformed parameters {
matrix<lower=0>[N, N] y;
y = (gp - dp * c) * mu;  // Determine the link of population size and
vector<lower=0>[N] yhat; // True population size (state)
yhat = diagonal(y);
// Because I am not familiar how  can I store a product of a vector*row_vector.
// I use this really stupid method to grab true population.
// This create a lot of unused parameter (y) in the model
}
model {
// priors
gp ~ normal( 1 , 0.5 );    //  growth rate of population
dp ~ normal(1, 0.5);         // death rate of population
sigma ~ exponential( 1 );    // measurment error
sigma_na ~ exponential(1);   // natural variation

// observation model
// connect latent population state to observed population size
for ( t in 2:N )
mu[t] ~ lognormal(log(mu[t-1]), sigma_na); // Use a for loop to assign the

p ~ lognormal(log(yhat), sigma);
}


1. As you might have seen above, this is a super naïve stan code. I am happy to get any suggestion. Especially if I could get rid of the diagonal(y).

I am really new to stan so please let me know if I should elaborate anything more. I am looking forward to any kind of suggestion. Thank you in advance.

Here is the R code I used to create simulated data and the full stan code I use for the analysis.

sim_popc <- function( n_steps , init , theta , dt= 0.01 ) {
Pop <- rep(NA,n_steps)
C <- rep(NA, n_steps) # Cyclone number
Pop[1] <- init[1] # Initial population size
C[1] <- init[2]
for ( i in 2:n_steps ) {
Pop[i] <- Pop[i-1] + dt*Pop[i-1]*(theta[1] - theta[2] * C[i-1]) # Population change is positively affect by i-1 population size and negatively affect by i-1 number of Cyclone
C[i] <- sample(c(1:10), 1) # random draw cyclone
}
return(cbind(Pop, C))
}

#' Create the simulated population and cyclone
simPopc <- sim_popc(100, init = c(10, 4), theta = c(1.1, 0.2))

data {
int<lower=0> N;      // number of measurement times
vector[N] p;  // measured populations
vector[N] c;
}

parameters {
real<lower=0> gp;      //  gp
real<lower=0> dp;      //  dp
real<lower=0> sigma;   // measurement errors
real<lower=0> sigma_na;// natrual variation
row_vector<lower=0>[N] mu;  // Population size of each time point
}
transformed parameters {
matrix<lower=0>[N, N] y;
y = (gp - dp * c) * mu;
vector<lower=0>[N] yhat;
yhat = diagonal(y);
}
model {
// priors
gp ~ normal( 1 , 0.5 );    // gp
dp ~ normal(1, 0.5);         // dp
sigma ~ exponential( 1 );    // measurment error
sigma_na ~ exponential(1);   // natural variation

// observation model
// connect latent population state to observed population size
for ( t in 2:N )
mu[t] ~ lognormal(log(mu[t-1]), sigma_na);

p ~ lognormal(log(yhat), sigma);
}
generated quantities {
// Directly produce posterior prediction in stan
array[N] real p_pred;
for(t in 1:N)
p_pred[t] = lognormal_rng(log(yhat[t]), sigma);
}


If I understand correctly this first example is from a book or repo. I’m not sure if the question is about the ode_rk45 method as opposed to other ODE solvers — in which case the answer can be long and model-dependent, but boils down to something like: “some solvers are generally better than others, but each may do better with a certain kind of (harder/easier) problems” — or if the question is about using ODE solvers at all instead of discrete-time solvers computed in a simple loop (as your second question suggests). If it’s the latter, the general (short) answer is ODE solvers have internal error control and step size optimization that simple discrete-time calculations don’t, so they are more reliable in an inference context where we need to ensure that a valid solution is generated for arbitrary parameters (although the advantage of ODE over DT solvers is not absolute). To answer the second part of this question, you can pass whatever parameters you’d like to the ODE solver to be used in the differential equations.

The way this is structured you seem to be simply using a parameter mu and fitting its values to the data, multiplying it by the growth/death rates (times c) is not generating the effect you are probably expecting, it’s just scaling mu, but their values are probably meaningless.

This formulation is not equivalent to that of question 1. To set up a dynamic state-space model you need to compute the difference (or differential, like dP_dt in the Lotka-Volterra example) and compute the trajectory from an initial value (fixed, or as a parameter). As mentioned above, this is best using an ODE solver, although you may want to set up a discrete time solver to understand the kind of calculation is doing, and then use the solver functions for the actual inferences.

I’d suggest spending more time understanding the Lotka-Volterra formulation so you can implement a version with your own model, then go from there into any further issues.

Hi Caesoma,

Thank you so much for your reply. They are all super helpful. I kept reading some related posts and papers these day. I understood more about the application and mechanism of the ode functions. Just would like to clarify my understanding. Please correct me if I am wrong.

The ode function take the targeted value of a time point t(i) and calculate the difference to the next one t(i+1). And the ode can estimate the parameter supplied in the function to correctly predict/estimate the targeted value at t(n) based on t(0), steps (n), and parameter. The parameters or interventions should also be a continuous process.

Blockquote
The way this is structured you seem to be simply using a parameter mu and fitting its values to the data, multiplying it by the growth/death rates (times c ) is not generating the effect you are probably expecting, it’s just scaling mu , but their values are probably meaningless.

Now I understand what you meant about the scaling mu with c. Thank you very much for pointed it out.

I also found your post about a SSM with a discrete output. I will try to understand that one as well.

Here is the updated model that separated the continuous process (handle by ode) and discrete disturbance event. This is just another try to make it closer to the cause of population fluctuation.

functions {
vector dpop_dt(real t,               // time
vector y, // popoulation size used in diffrential equation
int N,
// parameters
real cp    // change rate = grwoth - death
) {
// Declare differential equation type
vector[1] dp_dt;

// differential equations
dp_dt[1] = cp * y[1];

// Function product
return dp_dt ;
}
}
data {
int<lower=0> N;      // number of measurement times
array[N] real<lower=0> p;  // measured populations
vector[N] c;
array[N] real time;  // time steps
}
transformed data{
real t0 = 0;
array[N-1] real t;
t = time[2:N];
}

parameters {
real<lower=0> cp;      //  cp
real<lower=0> sigma;      // measurement errors
vector[1] y0;             // Initial population size
real<lower=0> effectc;    //effect of cyclone
}
transformed parameters {
// declare the estimated population size: pop
// use ode_rk45 to estimate population size based on differential equation
array[N] vector[1] pop;
pop[1, 1] = y0[1];
pop[2:N]= ode_rk45(dpop_dt, y0, t0, t, N, cp);
}
model {
// priors
cp ~ normal( 1 , 0.5 );      // gp
sigma ~ exponential( 1 );    // measurment error
y0 ~ lognormal( log(10) , 1 );    // initial population size
effectc ~ lognormal(log(1), 0.5);  // effect of cyclone

// observation model
// connect latent population state to observed population size
p[1] ~ lognormal(log(pop[1, 1]), sigma);
for(i in 2:N){
p[i] ~ lognormal(log(pop[i, 1] - effectc * c[i - 1] * pop[i, 1]) , sigma); // At the end of each time point, the effect of disturbance is introduced and substract a certain proportion of the population size
}
}
generated quantities {
// Directly produce posterior prediction in stan
array[N] real p_pred;
p_pred[1] = lognormal_rng(log(pop[1, 1]), sigma);
for(i in 2:N){
p_pred[i] = lognormal_rng(log(pop[i, 1] - effectc * c[i] *pop[i, 1),sigma);
}
}



That’s right, that’s the general idea, since we don’t have the explicit function y=f(t|\theta) but \frac{dy}{dt} = g(t,y|\theta) (or a version of it, like \frac{dy}{dt} = g(t|\theta)), and (normally) cannot solve for y analytically the ODE solver will provide a numerical approximation to the solution.

The solver in and of itself has nothing to do with inference, it’s like computing the output of any model, but it is necessary if you want to obtain estimates (the solution is needed to evaluate the likelihood of any set of parameters).

The parameters don’t need to be continuous necessarily, but for HMC to work properly they kind of do. I’m not sure what you are calling interventions, but if it’s for instance a vector with number of deaths dealt by random a cyclone event (if I got it right) they could in principle just be added to the solution of the ODE, and wouldn’t need to be continuous.

That was me, and it’s not really a standard kind of model, so I suggest not starting with that, because basically there’s no good solution to that, at least for now.

That seems to have the right ingredients discussed above, but I’m sorry that I don’t have the time to check all the details.

Hi Caesoma,

Thank you again for your time and patient. I really appreciated your help. I will follow your suggestion and keep working on it. Thank you vey much.

Cheers,

Chieh

1 Like