# Help with unusual state space model

Hello, I am new to Stan and state space models, and have a conceptual and Stan modelling question.

Model Setting

I am trying to measure the decline of immature shrimp individuals in growth tanks, so there is no reproduction, only survival. On days 3 and 5 of each week I pull all live shrimp with a net and record their total mass. I then calculate the number (N) of shrimp as:

N_t=\frac{(Shrimp ~total~Biomass)_t}{(mean~weight~shrimp)_t}

where the mean weight of a shrimp is the average weight of a sample of n shrimp, so the number of shrimp becomes a continuous variable. The process part of the model would be:

N_{t+\Delta}=N_{t}\theta_{t}^{\Delta}

where \theta_{t} is the daily survival probability, bounded to [0,1]:

\theta_{t} \sim Beta(\alpha,\beta)

and the measurement error:

Y_{t+\Delta} \sim N(N_{t+\Delta},\sigma^2)

Generated R Data

 set.seed(1000)
 days = 90
 phi = 0.98; lambda = 10
 alpha = lambda * phi
 beta = lambda * (1 - phi)
 theta = rbeta(days - 1, alpha, beta)
 N = rep(NA, days)
 N[1] = 1000                        # initial number of shrimp (assign first element, not the whole vector)
 for (i in 1:(days - 1)) {
   N[i + 1] = N[i] * theta[i]
 }
 y = N + rnorm(days, 0, 30)         # add measurement error (one draw per day, not rnorm(N, ...))
 day_sampled = c(1, sort(c(seq(3, 84, 7), seq(7, 84, 7))))
 dt = diff(day_sampled)
 y = y[day_sampled]
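The simulation uses the mean–concentration parameterization alpha = lambda*phi, beta = lambda*(1-phi). A quick check (plain Python mirroring the R values) that this Beta has mean phi and variance phi*(1-phi)/(lambda+1), i.e. phi is the mean daily survival and lambda acts as a concentration parameter:

```python
# Verify the Beta mean/variance under alpha = lambda*phi, beta = lambda*(1-phi)
phi, lam = 0.98, 10.0
alpha, beta = lam * phi, lam * (1 - phi)
mean = alpha / (alpha + beta)                                    # analytic Beta mean
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))  # analytic Beta variance
print(round(mean, 6))  # 0.98
assert abs(mean - phi) < 1e-12
assert abs(var - phi * (1 - phi) / (lam + 1)) < 1e-12
```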


Stan Model

I have not thought about the priors yet, but the model for one tank (to be extended to several) would be:

data {
  int<lower=0> N;          // number of observations
  real y[N];               // observed shrimp counts
  int NO;                  // number of intervals, NO = N - 1
  real dt[N-1];            // days elapsed between observations
}
parameters {
  real<lower=0,upper=1> phi;
  real<lower=0,upper=100> lambda;   // maybe model lambda as a function of time elapsed
  real<lower=0,upper=200> sigma_obs;
  vector<lower=0,upper=1>[NO] theta;
  real<lower=0,upper=4000> N_est1;  // initial N
}
transformed parameters {
  vector[N] N_est;
  N_est[1] = N_est1;                // assign the first element only, not the whole vector
  for (k in 1:NO) {
    N_est[k+1] = N_est[k] * theta[k]^dt[k];
  }
}
model {
  theta ~ beta(lambda*phi, lambda*(1-phi));
  y ~ normal(N_est, sigma_obs);
}


 stsp_beta <- rstan::stan(
   model_code = model_beta,
   data = list(N = 25, dt = dt, y = y, NO = 24),
   chains = 4, cores = 4, iter = 30000, verbose = TRUE)


Is this model sensible from a conceptual point of view, and is it correctly specified in Stan (without looking at the priors)?

It looks like it can estimate phi and sigma_obs; however, lambda is overestimated, and I am not sure whether this is an identifiability problem or a syntax issue.

In addition, I think the fact that there are different time intervals should affect the process error; I am not sure whether this could be handled by modelling the scale parameter as a function of the time elapsed.

Thank you for any help.


In the parameters block, I would not put upper limits on parameters that are not naturally bounded by the parameter space. For instance, I would change:

real<lower=0, upper=100> lambda;


to

real<lower=0> lambda;


Removing the upper bounds likely means you need to place a prior on those parameters. There’s some discussion here: Stan pedantic mode « Statistical Modeling, Causal Inference, and Social Science

Are the values of y discrete or continuous? I’ve tried to fit continuous models to somewhat discrete data (recorded at 0.5 intervals) and have had loads of issues fitting the models. Just a heads up.

There’s an overview of state-space models with Stan here: 9.1 Overview | Applied Time Series Analysis for Fisheries and Environmental Sciences. I agree with the process error part, just not exactly sure what to suggest at the moment; I will try to come back once I’ve looked through a few things. I am also not exactly sure why lambda might be overestimated. Perhaps a prior distribution that limits the amount of probability mass at, say, lambda > 50 could help?


Thank you very much for the help vianeylb. The y’s are continuous; the problem is that the measurement error sometimes causes N_{t+1} > N_t, and hence theta > 1. The time intervals, on the other hand, are discrete, more or less an approximation. I’ve tried a gamma(1, 0.1) prior for lambda to put most of the mass to the left of 50, and I used simulations at dt = 1, as I was concerned that not addressing the lambda parameter for longer dt could affect parameter recovery. Under these conditions I get back a posterior mean lambda of 18.24 vs. the true 10. I look forward to any other suggestions you may have.
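For reference, in Stan’s shape/rate parameterization gamma(1, 0.1) is just an Exponential with rate 0.1, so the prior mass below 50 has a closed form; a quick check:

```python
import math

# Gamma(shape = 1, rate = 0.1) -- Stan's gamma(1, 0.1) -- is Exponential(0.1),
# so P(lambda < x) = 1 - exp(-0.1 * x).
rate = 0.1
p_below_50 = 1 - math.exp(-rate * 50)
print(round(p_below_50, 4))  # 0.9933 -- almost all prior mass is below 50
```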


P.S. Actually, given dt = 1, it seems possible to recover lambda around 10; the 18.24 above seems to be just chance in the simulated values. However, for larger dt’s the retrieved lambda comes out larger.
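One possible reading of why larger dt inflates lambda (a sketch, not the Stan model, assuming dt = 3 and the simulation’s phi and lambda): with dt > 1 the data only see the compounded interval survival, so the model’s per-interval theta is effectively the geometric mean of dt daily draws. Averaging reduces variability, so a Beta fitted to these values implies a larger concentration than the true daily lambda:

```python
import random

random.seed(1)
phi, lam, dt = 0.98, 10.0, 3            # simulation values; dt = 3 is illustrative
a, b = lam * phi, lam * (1 - phi)

def interval_theta():
    # compound dt independent daily survivals, then take the implied
    # "daily" survival over the interval (the geometric mean)
    p = 1.0
    for _ in range(dt):
        p *= random.betavariate(a, b)
    return p ** (1 / dt)

g = [interval_theta() for _ in range(100_000)]
m = sum(g) / len(g)
v = sum((x - m) ** 2 for x in g) / len(g)
lam_eff = m * (1 - m) / v - 1           # method-of-moments Beta concentration
print(round(m, 3), round(lam_eff, 1))   # lam_eff comes out well above lam = 10
```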
