Hi, I am struggling with model selection for state-space models coded in rstan.
In my state-space models, the states (longitudinal positions of 10 fish individuals in a river) are assumed to follow a Cauchy random walk. I use the inverse function method to express the Cauchy random walk, specifying the state using the following formula:
state[i] ~ Cauchy(state[i-1], s_w)
I want to examine how the shape parameter s_w varies according to survey month and individual fish. To do this, I implement the following regression equation in the model block:
s_w[m, k] ~ lognormal(a[k] + b_TL*TL[m], sigma)
where m is the survey month, and k is the individual.
a is the individual intercept and b_TL is the coefficient expressing the effect of TL.
a is specified by the following formula:
a[m] = a0 + a_sig * a_raw
a_raw ~ normal(0, 1)
(non-centered reparameterization)
Generally, leave-one-out cross-validation (LOOCV) can be used for model comparison, and this can be done using loo::loo()
. However, in my case, the data come from different individuals and different survey months, and I am wondering whether I can still compare models using the same method.
Or should I instead use leave-one-individual-out cross-validation in my case?
Additionally, is leave-one-individual-out cross-validation still valid when the number of survey months differs between individuals (e.g., individual A has data from 2 survey months, while individual B has data from 5 months)?
My model is something like follows:
data {
int<lower=0> N; // Number of observations per individual per survey month
int<lower=0> K; // Number of individuals
int<lower=0> M; // Number of months
array[N, K, M] real Y; // Observed values
matrix[K, M] zero_Dist; // Initial position for each individual on the first day of each survey month
vector[M] TL; // Total length (TL) of fish per survey month
}
parameters {
real<lower=0> s_v; // Observation error standard deviation
matrix[K, M]<lower=0> s_w; // State error standard deviation
matrix[K, M] state_zero_raw; // Standardized state deviation (unconstrained)
array[N, K, M] real state_raw; // Used for inverse function method
// Hierarchical model parameters
real a0; // Intercept
real<lower=0> a_sig;
vector[K] a_raw;
real b_TL;
real<lower=0> sigma;
}
transformed parameters {
array[N, K, M] real state; // Longitudinal position of fish
for (m in 1:M) {
for (k in 1:K) {
// Initial value for the first day of each month
state[1, k, m] = zero_Dist[k, m] + 3.04 * state_zero_raw[k, m];
// For subsequent days, update state using state error
for (n in 2:N) {
state[n, k, m] = state[n - 1, k, m] + s_w[k, m] * tan(state_raw[n, k, m]); // Inverse function method
}
}
}
// 個体効果のreparametarization
vector[K] a;
a = a0 + a_sig * a_raw;
}
model {
// Priors
s_v ~ normal(3, 1);
state_zero_raw ~ normal(0, 1);
a_raw ~ normal(0, 1);
sigma ~ normal(0, 10);
// Observation model
for (m in 1:M) {
for (k in 1:K) {
for (n in 1:N) {
Y[n, k, m] ~ normal(state[n, k, m], s_v);
}
}
}
// Regression model
for (m in 1:M) {
for (k in 1:K) {
s_w[k, m] ~ lognormal(a[k] + b_TL * TL[m], sigma);
}
}
}
Thank you!