Log-likelihood (log_lik) for new group data in a hierarchical multinomial model

This feels like an easy problem, but I am unsure how to code the solution. I have a simple hierarchical multinomial model, and I would like to calculate log_lik for data from new groups that were not used to fit the model. I have no trouble obtaining log_lik for the data used in fitting (X in the code below), but the new group data (X_new) is another story: the new groups have no group-level parameters (no columns of z, and hence no v or alpha), and I do not know how to handle that.

I spent some time reading about leave-one-group-out cross-validation (LOGO-CV), which I assume requires Stan code that deals with this, but I was still unable to figure out the syntax.

Thanks!

data{
  int<lower=3> K;
  real<lower=0> prior_MuSigma;
  real<lower=0> prior_SigmaSigma;
  real<lower=0> priorEta;
  
  // Modeled data
  int<lower=1> N;
  int<lower=0> X[K, N];
  
  // New data
  int<lower=1> N_new;
  int<lower=0> X_new[K, N_new];
}
parameters{
  vector[K - 1] Mu;
  matrix[K - 1, N] z;
  cholesky_factor_corr[K - 1] L_Rho;
  vector<lower=0>[K - 1] sigma;
}
transformed parameters{
  
  matrix[K - 1, N] v;
  matrix[K, N] alpha;
  
  v = diag_pre_multiply(sigma, L_Rho)*z;
  for(n in 1:N){
    alpha[, n] = softmax(append_row(0, Mu + v[, n]));
  }
}
model{
  // priors
  L_Rho ~ lkj_corr_cholesky(priorEta);
  Mu ~ normal(0, prior_MuSigma);
  sigma ~ normal(0, prior_SigmaSigma);
  to_vector(z) ~ normal(0, 1);

  // likelihood
  for(n in 1:N){
    X[, n] ~ multinomial(alpha[, n]);
  }
}
generated quantities{
  vector[N] log_lik_X; //log lik for modeled groups
  vector[N_new] log_lik_X_new; // log lik for new groups (never assigned below; this is the open question)
  
  for(n in 1:N){
    log_lik_X[n] = multinomial_lpmf(X[, n] | alpha[, n]);
  }
}

Does this thread help? It feels like a similar problem where the solution is to use _rng() functions for the unobserved groups.

Thanks, I did come across that thread and tried the _rng() approach for handling effects for unobserved groups. I also implemented the solution suggested in this reply, which seems to conform more closely to what I am interested in, despite a far higher computational cost.
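
For anyone landing here later, below is a minimal sketch of one way to fill in log_lik_X_new using _rng() draws. It is not necessarily the code from the linked reply. The idea is to approximate, for each posterior draw, the likelihood of a new group's counts averaged over group effects drawn from the population distribution implied by Mu, sigma and L_Rho. The names S, L_Sigma, lp and eta_new are my own, and S = 200 is an arbitrary choice. The fragment is meant to go at the end of the generated quantities block, after the loop over the modeled groups.

```stan
  // Sketch only: Monte Carlo integration over the unobserved group effects.
  // S (hypothetical) is the number of draws per posterior iteration.
  {
    int S = 200;
    // Cholesky factor of the group-level covariance, as used to build v above
    matrix[K - 1, K - 1] L_Sigma = diag_pre_multiply(sigma, L_Rho);
    vector[S] lp;

    for (n in 1:N_new) {
      for (s in 1:S) {
        // Draw Mu + v_new for a previously unseen group
        vector[K - 1] eta_new = multi_normal_cholesky_rng(Mu, L_Sigma);
        lp[s] = multinomial_lpmf(X_new[, n] | softmax(append_row(0, eta_new)));
      }
      // Log of the average likelihood across the S draws
      log_lik_X_new[n] = log_sum_exp(lp) - log(S);
    }
  }
```

The cost grows with N_new * S per iteration, which matches the higher computational cost mentioned above. With a small S the estimate can be noisy, so it is probably worth checking that the results are stable when S is increased.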
