LOO model selection for a state-space model

I am trying to figure out how to code up the log_lik variable in the “generated quantities” block for use with the loo package, and therefore to choose between different model structures. The model that I have is a state-space model, where the states are represented by a random walk, with a Gaussian observation term on top of it.

Now, I would like to use the PSIS-LOO approach to choose between different versions of this model (specifically, how the observation process is modelled). But I’m very unsure what the correct way to write the generated quantities and the log_lik term is. Should it include just the observation component of the likelihood, or the random walk component as well (I’m not sure how that would be coded up, though)?

More specifically, there are two elements to the model. Firstly, the underlying “state” of the system (S) is represented by a first-order random walk

S_{i+1} = S_{i} + \epsilon

\epsilon \sim N(0, \sigma_{RW})

Secondly, there is the observation layer, where the observations Y are linearly related to the state S via a Gaussian observation model:

\hat{Y}_i = \alpha S_i

Y_i \sim N(\hat{Y}_i, \sigma_{ObsErr})

The Stan code I am using is:

model {
  //Observation component of likelihood
  target += normal_lpdf(Y | Y_hat , sigmaObsErr);
  
  //Random walk component
  for(i in 2:n_years){
    target += normal_lpdf(S[i] | S[i-1] ,sigmaRW);
  }
}
 

I’m guessing that the log_lik that LOO needs should only contain the observation term, so that the generated quantities block then becomes:

generated quantities {
  vector[n_obs] log_lik;
  for(j in 1:n_obs) {
    log_lik[j] = normal_lpdf(Y[j] | Y_hat[j] , sigmaObsErr);
  }
}

But it’s only a guess!

Correct.
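
For reference, a minimal sketch of how that log_lik might then be used with the loo package in R for the model comparison (fit1 and fit2 are hypothetical stanfit objects for two versions of the observation model):

library(loo)

# Extract the log_lik draws produced by the generated quantities block above
log_lik1 <- extract_log_lik(fit1, parameter_name = "log_lik", merge_chains = FALSE)
loo1 <- loo(log_lik1, r_eff = relative_eff(exp(log_lik1)))

log_lik2 <- extract_log_lik(fit2, parameter_name = "log_lik", merge_chains = FALSE)
loo2 <- loo(log_lik2, r_eff = relative_eff(exp(log_lik2)))

# Compare expected log pointwise predictive densities of the two models
loo_compare(loo1, loo2)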

You might be interested also in checking the papers and the code in

Great! Thanks for the clarification.

Mark

While we’re on the topic, I wonder if @avehtari has any thoughts on loo with a state-space model with multiple observations per time point, and where the log-likelihood function filters the observations such that the log-likelihood is 0 for a time point with no observations. Just dropping the time point should be fine for the case where all obs are missing, but I couldn’t see an obvious solution when only some obs are missing.

Are you considering leave-one-obs-out or leave-one-time-point-out?

leave one time point out

And how do you compute the likelihood now?

Filter the observation vector, prediction vector, and covariance matrix for missingness, then do multivariate-normal-type stuff.

Are you modeling the missing observations (i.e., sampling them)?

Can you post the relevant code lines for the likelihood?

No, missing observations are being filtered out. Code is complex, but the basic idea is that only the relevant indices from the predicted mean and covariance matrix are used, and this may vary from time point to time point. If it doesn’t make sense I’ll post some simplified code another day.

Code could be useful as I can’t figure out from this how many dimensions in the covariance matrix and how many observations, and is the multivariate just for one time point or all etc.

Let’s assume the prediction vector (ypred) and covariance matrix (covm) for each time point are some function of the parameters and time. The rowwise log-likelihood looks something like this, where whichobs[t] is an array of integers specifying the non-missing observations at time t, and y[t] holds the observations at time t:

for (t in 1:maxT) {
  // log-likelihood of the observed (non-missing) values at time t
  llrow[t] = multi_normal_lpdf(y[t, whichobs[t]] | ypred[t, whichobs[t]], covm[t, whichobs[t], whichobs[t]]);
}

If whichobs[t] in most cases specifies 10 non-missing variables, but occasionally there are only one or two non-missing, then I expect leave-one-time-point-out will break for the cases with one or two. Not sure if there’s any straightforward fix; I sure can’t think of one.

You can use llrow with the loo package and it’s then leave-one-time-point-out. The loo package has diagnostics to tell whether this works or not, and soon it will have an iterative moment-matching approach which makes it work in more cases.
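
For reference, a minimal sketch of checking those diagnostics in R, assuming llrow is saved in generated quantities and extracted into a hypothetical draws-by-time-points matrix ll_time (the chain layout below is also assumed):

library(loo)

# ll_time: hypothetical (posterior draws) x (time points) matrix built from llrow
chain_ids <- rep(1:4, each = 1000)  # assumed: 4 chains, 1000 post-warmup draws each
r_eff <- relative_eff(exp(ll_time), chain_id = chain_ids)

loo_t <- loo(ll_time, r_eff = r_eff)
print(loo_t)           # elpd estimate for leave-one-time-point-out
pareto_k_table(loo_t)  # flags time points where the PSIS approximation is unreliable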