Problem running loo_subsample(), wheras loo() works

This question may be similar to this topic:

From my model fit I extract the draws needed for the log likelihood computations as follows:

draws = fit$draws( variables=c( "mu", "sigma_e" ), format="draws_matrix" )

Note that I also tried the following two approaches, which are inspired by the topic mentioned above:

draws = as_draws_matrix( fit$draws( variables=c( "mu", "sigma_e" ) ) )
draws = subset_draws( as_draws_matrix( fit ), variable=c( "mu", "sigma_e" ) )

With this line I convert the data to a dataframe:

myDat = data.frame( x=seq( 1, length( stanDat$y ) ), y=stanDat$y )

The function for computing the log likelihood is as follows:

llfun = function( data_i, draws, log=TRUE ) { mu = draws[, c( 1:( dim( draws )[2] )-1 )][, data_i$x] sigma_e = draws[, dim( draws )[2]] dlnorm( x=data_i$y, mean=mu, sd=sigma_e, log=log ) }

Computing the ELPD LOO works for either the whole dataset with:

elpdloo = loo( llfun, draws=draws, data=myDat, r_eff=NA, cores=1) #note that I omit r_eff for simplification here

or:

elpdloo = loo_subsample( llfun, draws=draws, data=myDat, r_eff=NULL, cores=1, observations=2000 ) # There are 2000 observations

It works for one observation with:

elpdloo1 = loo_i( i=1, llfun, r_eff=NA, data=myDat, draws=draws )

But loo_subsample() fails when using a subset of all observations:

elpdloo = loo_subsample( llfun, draws=draws, data=myDat, r_eff=NULL, cores=1, observations=200 )

Yielding the following error:

Error in draws[, c(1:(dim(draws)[2]) - 1)][, data_i$x]: incorrect number of dimensions

I guess that the error is due to how I subset the draws in llfun() and how loo_subsample subsets the observations. I looked into the source code, but couldn’t figure out what I would need to change. Also, this topic seems to be relevant.

In the examples of loo_subsample() predictors are provided as matrix, but I wouldn’t know how to use that format with my data, because I have an if else argument in my model:

transformed parameters {
  array[N] real mu;
  array[N] real mu_route1;
  array[N] real mu_route2;
  
  for (i in 1:N) {
    if (number[i] == -1) {                       // plural
      mu_route1[i] = BfreqF*freqF[i];
      mu_route2[i] = BfreqL*freqL[i] + penalty;
      mu[i] = intercept + Btrial*trial[i] + Bdura*dura[i] + fmin(mu_route1[i], mu_route2[i]) + u[subj[i]];
      
    } else if (number[i] == 1) {                 // singular
      mu_route1[i] = 99;
      mu_route2[i] = 99;
      mu[i] = intercept + Btrial*trial[i] + Bdura*dura[i] + BfreqL*freqL[i] + u[subj[i]];
    }
  }
}

model {
  // priors
  penalty ~ normal(penaltyReal, abs(penaltyReal)/4 + 4/(abs(penaltyReal)+4));
  sigma_e ~ exponential(0.1);
  sigma_u ~ exponential(0.1);
  u ~ normal(0, sigma_u);
  intercept ~ normal(interceptReal, abs(interceptReal)/4);
  Btrial ~ normal(BtrialReal, abs(BtrialReal)/4);
  Bdura ~ normal(BduraReal, abs(BduraReal)/4);
  BfreqL ~ normal(BfreqLReal, abs(BfreqLReal)/4);
  BfreqF ~ normal(BfreqFReal, abs(BfreqFReal)/4);
  y ~ lognormal(mu, sigma_e);
}

Can you explain what you are trying to do here? And could you use subset_draws() here?

Yes, of course.

draws[, c( 1:( dim( draws )[2] )-1 )] corresponds to the posterior draws of mu, that is, mean of the normal distribution from which samples are taken. ( draws[, c( 1:( dim( draws )[2] corresponds to the variance of the normal distribution.)

With [, data_i$x] I use a subset corresponding to the draws for the first, second, …, nth observation (because data_i is a subset of the whole dataset corresponding to the ith observation.

Basically, I borrowed this approach from this topic.

In the meantime, I solved my problem by computing the loglik in the generated quantities block and I defined:

draws = as_draws_matrix( fit$draws( variables=c( "loglik" ) ) )
llfun = function( data_i, draws, log=TRUE) {loglik[, data_i$x]}

However, I will check whether using subset_draws() also solves the problem later when I have time.

Thanks for the reply!

1 Like