Hi everyone,
I am trying to build a state space model with some explained variables. The basic idea of this project is understanding a vegetation population trend over the years. The structure I used to create a simulation data is:
True population size[t] = True population size[t-1] * (growth rate - death rate * disturbance event[t-1])
Measured population size[t] = lognormal(log(True population size[t], measurement error)
I learned two methods of SSM from Chapter 16 in Richard McElreath’s book and shihrks’ git repository. I tried to modified both of there stan script and came up with the following question:
- What is the advantage of using ode_rk45 function? Can I add a vector of explained variable in the ode_rk45 function?
functions { // The function pass use in ode_rk45
vector dpop_dt( real t, // time
// initial state {y[1]:lynx, y[2]hares}
vector y,
// parameters
real bh, // birth rate of hare
real mh, // mortality rate of hare
real ml, // mortality rate of lynx
real bl // birth rate of lynx
) {
// differential equations. The famous Lotka-volterra equation
vector[2] dP_dt;
dP_dt[1] = (bl * y[2] - ml) * y[1]; // lynx growth depend on hare population
dP_dt[2] = (bh - mh * y[1]) * y[2]; // hare death depend on lynx population
return dP_dt ;
}
}
data {
int<lower=0> N; // number of measurement times
real<lower=0> pelts[N,2]; // measured populations
array[N] real time;
}
parameters {
real<lower=0> bh;
real<lower=0> mh;
real<lower=0> ml;
real<lower=0> bl;
vector[2] y0;
real<lower=0> sigma[2]; // measurement errors
real<lower=0,upper=1> p[2]; // trap rate
}
transformed parameters {
array[N] vector[2] pop = ode_rk45(
dpop_dt, y0, 0, time, bh, mh, ml, bl);
}
- Using transformed parameter and a loop in the model section. I am able to add the explained variables but the model came out with many fitting problem. What might be the possible solution?
I always low E-BFMI (less than 0.2) and super low effective sample size + High Rhat for gp and dp.
data {
int<lower=0> N; // number of measurement times
vector[N] p; // measured populations
vector[N] c; // randomly produce cyclone event
}
parameters {
real<lower=0> gp; // growth rate of population
real<lower=0> dp; // death rate of population
real<lower=0> sigma; // measurement errors
real<lower=0> sigma_na;// natrual variation
row_vector<lower=0>[N] mu; // Population size of each time point
}
transformed parameters {
matrix<lower=0>[N, N] y;
y = (gp - dp * c) * mu; // Determine the link of population size and
vector<lower=0>[N] yhat; // True population size (state)
yhat = diagonal(y);
// Because I am not familiar how can I store a product of a vector*row_vector.
// I use this really stupid method to grab true population.
// This create a lot of unused parameter (y) in the model
}
model {
// priors
gp ~ normal( 1 , 0.5 ); // growth rate of population
dp ~ normal(1, 0.5); // death rate of population
sigma ~ exponential( 1 ); // measurment error
sigma_na ~ exponential(1); // natural variation
// observation model
// connect latent population state to observed population size
for ( t in 2:N )
mu[t] ~ lognormal(log(mu[t-1]), sigma_na); // Use a for loop to assign the
p ~ lognormal(log(yhat), sigma);
}
- As you might have seen above, this is a super naïve stan code. I am happy to get any suggestion. Especially if I could get rid of the diagonal(y).
I am really new to stan so please let me know if I should elaborate anything more. I am looking forward to any kind of suggestion. Thank you in advance.
Here is the R code I used to create simulated data and the full stan code I use for the analysis.
sim_popc <- function( n_steps , init , theta , dt= 0.01 ) {
Pop <- rep(NA,n_steps)
C <- rep(NA, n_steps) # Cyclone number
Pop[1] <- init[1] # Initial population size
C[1] <- init[2]
for ( i in 2:n_steps ) {
Pop[i] <- Pop[i-1] + dt*Pop[i-1]*(theta[1] - theta[2] * C[i-1]) # Population change is positively affect by i-1 population size and negatively affect by i-1 number of Cyclone
C[i] <- sample(c(1:10), 1) # random draw cyclone
}
return(cbind(Pop, C))
}
#' Create the simulated population and cyclone
simPopc <- sim_popc(100, init = c(10, 4), theta = c(1.1, 0.2))
data {
int<lower=0> N; // number of measurement times
vector[N] p; // measured populations
vector[N] c;
}
parameters {
real<lower=0> gp; // gp
real<lower=0> dp; // dp
real<lower=0> sigma; // measurement errors
real<lower=0> sigma_na;// natrual variation
row_vector<lower=0>[N] mu; // Population size of each time point
}
transformed parameters {
matrix<lower=0>[N, N] y;
y = (gp - dp * c) * mu;
vector<lower=0>[N] yhat;
yhat = diagonal(y);
}
model {
// priors
gp ~ normal( 1 , 0.5 ); // gp
dp ~ normal(1, 0.5); // dp
sigma ~ exponential( 1 ); // measurment error
sigma_na ~ exponential(1); // natural variation
// observation model
// connect latent population state to observed population size
for ( t in 2:N )
mu[t] ~ lognormal(log(mu[t-1]), sigma_na);
p ~ lognormal(log(yhat), sigma);
}
generated quantities {
// Directly produce posterior prediction in stan
array[N] real p_pred;
for(t in 1:N)
p_pred[t] = lognormal_rng(log(yhat[t]), sigma);
}