Model selection for a nonlinear, flexible hierarchical model with loo

Dear Stan community,

I am quite a new Stan user and would really appreciate your input. I am fitting a multi-species functional response using Holling's disk equation, where we are interested in fitting the following equation: c_i / sum_j(c_j) = a_i * N_i^m / sum_j(a_j * N_j^m).

Parameters:
c_i = consumption of species i, a_i = attack rate of species i, N_i = density of species i, m = shape parameter (shared or species-specific, depending on the model), with the sums running over all prey species j.
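
In equation form, the diet proportion of species i is modelled as

$$
\frac{c_i}{\sum_j c_j} = \frac{a_i \, N_i^{m}}{\sum_j a_j \, N_j^{m}}.
$$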

I would like to compare a few models.

  1. setting the shape parameter m to 1, which gives a hyperbolic shape,
  2. setting m to 1.5, which gives a slightly sigmoidal shape,
  3. estimating a single overall m shared by all species,
  4. estimating a separate m for each species.

The data are 20 different sampling locations in space and time, for which we have diet estimates (with uncertainty) as the proportion that each of the 10 prey species contributes to the diet. These proportions thus sum to 1. For now, I have coded this up as coming from a normal distribution; however, a Dirichlet might be more appropriate? Prey availability is also estimated with uncertainty for each species and scaled to a maximum value of 100. For one of the prey species, let's call it species x, we are missing some estimates, and at those sites we give availability a N(50, 20) prior so the model can estimate it. Additionally, we have a species category "others", which comprises the sum of several other species that individually do not contribute considerably to the diet. For those species we assume a constant availability. The parameters of interest are the attack rate a and the shape parameter m, while still being able to recapture the diet c.
I have used the loo package to do model comparison, but I get some high Pareto k values for all the models. My understanding so far is that if k goes above 0.7, the estimate is unstable and we cannot trust the loo estimate for that model, as it is likely to be too optimistic. If I use simulated data with known parameters, however, loo does select the correct model. This is also confirmed by comparing the observed diet to the diet estimates from the model.
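
For context, this is roughly the loo workflow I am using on the R side (a minimal sketch; the object name fit is just a placeholder for the RStan fit of one of the models, and log_lik is the generated quantity declared in the model further down):

# Minimal sketch of the loo workflow (placeholder object fit;
# log_lik[nData, nSpecies] gives one column per observation after extraction)
library(rstan)
library(loo)

ll    <- extract_log_lik(fit, parameter_name = "log_lik", merge_chains = FALSE)
r_eff <- relative_eff(exp(ll))   # relative effective sample sizes for PSIS
loo1  <- loo(ll, r_eff = r_eff)

print(loo1)   # reports the number of Pareto k values above 0.7
plot(loo1)    # Pareto k per observation, to spot influential points

# and the comparison between two fitted models would then be
# loo_compare(loo1, loo2)
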
There are several reasons why these high Pareto k values might be occurring with these models. The most likely, however, is that our model is very flexible: we have many parameters and "few" observations per species. Some points might be influential (when a prey species is generally not eaten but a few times it is), and our priors for a and m are not very informative.
One alternative is to use k-fold cross-validation; however, since the proportions differ between sites and the availability of species x is sometimes unknown, I am not sure this will be very beneficial.
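
For concreteness, this is roughly the k-fold bookkeeping I have in mind (a sketch only, with hypothetical names: stan_code_holdout would be a variant of the model below that takes an extra holdout indicator matrix, drops those cells from the likelihood, and still computes log_lik for them; stan_data is the data list):

# Rough sketch of k-fold CV by site (hypothetical stan_code_holdout: the model
# below plus a data matrix holdout[nData, nSpecies] whose cells are skipped in
# the likelihood but kept in log_lik)
library(rstan)
library(loo)

K     <- 5
folds <- kfold_split_random(K = K, N = stan_data$nData)   # whole sites per fold

elpd_cells <- matrix(NA_real_, stan_data$nData, stan_data$nSpecies)
for (k in 1:K) {
  d <- stan_data
  d$holdout <- matrix(0, d$nData, d$nSpecies)
  d$holdout[folds == k, ] <- 1                    # leave out these sites
  fit_k <- stan(model_code = stan_code_holdout, data = d)
  ll_k  <- extract_log_lik(fit_k)                 # draws x (nData * nSpecies)
  # pointwise elpd = log of the posterior mean likelihood of each cell
  # (rstan flattens log_lik in column-major order, matching matrix())
  elpd_k <- matrix(matrixStats::colLogSumExps(ll_k) - log(nrow(ll_k)),
                   d$nData, d$nSpecies)
  elpd_cells[folds == k, ] <- elpd_k[folds == k, ]
}
sum(elpd_cells)   # k-fold elpd estimate for this model
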
I would really appreciate your input: even though I get high Pareto k values, can I still use loo to do model comparison between these different models? Or are there other good alternatives, especially some sort of penalization for model complexity, or should we even go as far back as comparing observed vs. predicted?
Thank you!

Below is the code for the fixed-m model (m = 1).

// The input data
data {
  int<lower=1> nData;    // number of observations (site/time combinations)
  int<lower=1> nSpecies; // number of species
  // prey availability mean (nSpecies-1 columns; "others" is handled separately)
  matrix[nData, nSpecies-1] N_avail;
  // prey availability sd
  matrix[nData, nSpecies-1] sdfish;
  // prey consumption biomass mean
  matrix[nData, nSpecies] N_diet;
  // prey consumption biomass sd
  matrix[nData, nSpecies] sigma_diet;
}

// The parameters accepted by the model. 
parameters {
  real<lower=0, upper=10> aa[nSpecies-1];  // attack rates (one species fixed to 1 below)
  matrix<lower=0>[nData, nSpecies-1] NN;   // prey availability
}

transformed parameters {
  matrix[nData, nSpecies] q;
  matrix<lower=0>[nData, nSpecies] mu;
  vector[nData] total;
  vector[nSpecies] a;
  matrix[nData, nSpecies] N;

  // availability: estimated for the first nSpecies-1 species, "others" fixed at 100
  for (i in 1:nData) {
    for (j in 1:(nSpecies-1)) {
      N[i, j] = NN[i, j];
    }
    N[i, nSpecies] = 100;
  }

  // set the attack rate of one reference species to 1, so the others are relative to it
  for (z in 1:(nSpecies-2)) {
    a[z] = aa[z];
  }
  a[nSpecies-1] = 1;
  a[nSpecies] = aa[nSpecies-1];

  // expected diet (scaled to 100); the exponent is the shape parameter m,
  // fixed to 1 in this version of the model
  for (i in 1:nData) {
    for (j in 1:nSpecies) {
      q[i, j] = pow(a[j] * N[i, j], 1);
    }
    total[i] = sum(q[i, 1:nSpecies]);
    for (j in 1:nSpecies) {
      mu[i, j] = 100 * q[i, j] / total[i];
    }
  }
}


// The model to be estimated
model {
  aa ~ gamma(1, 1); // attack rate

  // prey availability
  for (i in 1:nData) {
    for (j in 1:(nSpecies-1)) {
      NN[i, j] ~ normal(N_avail[i, j], sdfish[i, j]) T[0,];
    }
  }

  // diet
  for (i in 1:nData) {
    for (j in 1:nSpecies) {
      N_diet[i, j] ~ normal(mu[i, j], sigma_diet[i, j]);
    }
  }
}

// pointwise log-likelihood for loo
generated quantities {
  real log_lik[nData, nSpecies];
  for (i in 1:nData) {
    for (j in 1:nSpecies) {
      log_lik[i, j] = normal_lpdf(N_diet[i, j] | mu[i, j], sigma_diet[i, j]);
    }
  }
}

  • If you can use posterior predictive checking to show that some (or all) of the models are not sensible, you don't need to care about the loo comparison.
  • If one of the models is the encompassing model that includes the others as special cases, and the parameters are not correlated a posteriori, then you can look directly at the posterior instead of predictive performance (see, e.g., the Beta blocker cross-validation demo and the counter-example in the Bayesian version of Does model averaging make sense?).
  • If you're using RStan, you could use moment matching for efficient approximate leave-one-out cross-validation (loo_moment_match in the loo package), which can sometimes get rid of high k values; see the sketch after this list.
  • If not using RStan, and k is just above 0.7, you may try running MCMC for more iterations to get a more accurate Pareto k estimate.
  • If the two above don't help, and only some Pareto k values are high, consider running MCMC just for those leave-one-out folds (treat the left-out observations as missing data to keep the model structure; this is discussed in Section 3.1 of Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models).
  • If none of the above helps, then the issue is that you don't know how big the error is, but if the difference between models is very big, there is only a small probability that the order of the models would change with more accurate computation. With k larger than 0.7, it's difficult to give generic advice on what a very big difference means.
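
For the moment matching point, a minimal sketch (assuming the model was fit with RStan into an object called fit, and recent versions of rstan and loo that support moment_match = TRUE):

# Sketch of PSIS-LOO with moment matching (assumes an rstan fit object fit
# that contains the log_lik generated quantity, and loo >= 2.4 with a recent rstan)
library(rstan)
library(loo)

loo_psis <- loo(fit, pars = "log_lik")   # plain PSIS-LOO
print(loo_psis)                          # how many Pareto k values exceed 0.7?

# re-run with moment matching applied to the problematic observations
loo_mm <- loo(fit, pars = "log_lik", moment_match = TRUE)
print(loo_mm)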