Resolving difficult geometry in logistic normal multinomial model

Hello all, I have been benchmarking a logistic normal multinomial model with an arbitrary correlation matrix as a parameter. I simulated some data following the logistic normal multinomial model and I fitted the simulated data but got frequent warnings about divergent transition/maximum tree depth reached, even for just 3 taxa in simulated data.


The model code looks like:
(The weird dimensionality of parameters comes from a previous linear design. I simplified it to intercept only for benchmarking purposes.)

data{
  int<lower=1> N; // Number of samples
  int<lower=1> M; // Number of taxa
  array[N,M] int<lower=0> y; // count data

  // Prior info
  array[2] real prior_sd_intercept;
  array[2] real prior_mean_intercept;
  real<lower=0> prior_corr_eta; 
}

transformed data{
  matrix[M,M-1] Helmert = canonical_Helmert(M);
}

parameters{
  // ILR means, not constrained
  matrix[M-1,1] beta_raw;

  // ILR log-standard deviations, not constrained
  matrix[1, M-1] alpha_raw;
  
  // ILR Cholesky factors for correlation matrices
  array[1] cholesky_factor_corr[M-1] L;
  
  // raw ILR latent residuals
  matrix[N, M-1] intermediate_u_raw;
  
}

transformed parameters{
  // Non-centered parameterisation for ILR means
  matrix[M-1,1] beta = beta_raw*prior_mean_intercept[2] + prior_mean_intercept[1];
  
  // Non-centered parameterisation for ILR log-standard deviations
  matrix[1, M-1] alpha = alpha_raw*prior_sd_intercept[2] + prior_sd_intercept[1];
  
  // Non-centered parameterisation for ILR intermediate u
  array[1] cholesky_factor_cov[M-1] Lv;
  Lv[1] = diag_pre_multiply(exp(to_vector(alpha[1])),L[1]);
  matrix[N , M-1] intermediate_u;
  for(n in 1:N){
     intermediate_u[n] = to_row_vector(Lv[1]*to_vector(intermediate_u_raw[n]));
  }
  
}

model{
  // Fit main distribution
  for(n in 1:N){
     target += multinomial_logit_lupmf(
        to_array_1d(y[n,]) |
        Helmert*(beta[,1]+intermediate_u[n]') // latent composition in CLR
     );
  }

  // Priors for standard deviations
  alpha_raw[1]  ~ std_normal();
 
  // Priors for means
  beta_raw[,1] ~ std_normal();

  // (Hyper-)priors for the correlation matrices
  L[1] ~ lkj_corr_cholesky(prior_corr_eta);
  
  // Priors for intermediate_u
  to_vector(intermediate_u_raw) ~ std_normal();
  
}

The Stan model is run via cmdstanr using the following specifications:

  this_fit = LNM_isometric_single_model$sample(
    data = this_data,
    output_dir = "cmdstanr_output/",
    chains = 2,parallel_chains = 2,
    init = 0,
    iter_warmup = 1000,iter_sampling = 1000
  )

I use standard normal priors for means and log-sd of the logistic normal (i.e., prior_mean_intercept and prior_sd_intercept are both c(0,1)), and LKJ prior with eta=2 for the correlation matrix.

As far as I am aware of, I addressed the compositional constraints on parameters by transformation to ILR space, used non-centered parameterisation for all parameters, and applied the model on a dataset with very few taxa. So there would be little room for further model optimisation in terms of coding/parameterisation.


There were divergent transitions and/or tree-depth limit reached in almost every run except when the sample size is 1000, and there was a decreasing trend of E-BFMI as sample size increased.

Rhat of latent residuals showed a rather strange pattern for large sample sizes:

In addition, rhat of parameters of interest (alpha, beta and L) inflated as sample size increased:

I believe it’s the latent residuals causing the issue as I fitted the logistic normal model (without the multinomial part) without encountering the problems above.

I did a quick search online and with LLM, and it appears that LNM with full latent residuals has a complex geometry near certain region, leading to the warnings. It’s said that Laplace approximation or variational analysis can help with the issue, but I am not sure how to implement them in Stan/cmdstanr. An arbitrary correlation matrix is preferred so I don’t want to switch to Dirichlet multinomial.

Any ideas/tips on how to efficiently implement the LNM model or its approximation to address the geometry problem?

I’m not at all an expert in this, but we just added a generic nested Laplace approximation in the latest version of Stan and @avehtari, @charlesm93, and @stevebronder are the ones to ask about this.

Whether the centered or non-centered parameterization is more efficient depends on how much data you have and how informative the prior is. The more data and the tighter the prior, the more likely it is that centered parameterizations will be better.

You can just use

simplex[K] theta;

and our most recent implementation uses the isometric log ratio (ILR) transform under the hood with an appropriate Jacobain correction.

You can drop this statement as it’s uniform and hence a no-op given your declaration of L:

L[1] ~ lkj_corr_cholesky(prior_corr_eta);

Why are you making this an array of length 1? It’d be easier to follow without the extra indirection.

You an use offset/multiplier to code non-centered parameterizations, e.g.,

matrix<offset=prior_mean_intercept[1], multiplier=prior_mean_intercept[2]>[M - 1, 1] beta_raw;

But this should just be declared as a Vector[M - 1] rather than as an (M-1) x 1 matrix. Same for alpha_raw, which can be coded as type RowVector[M - 1]. And I wouldn’t call the scale and location "prior_mean_intercept`, even though the location acts like an intercept (the scale doesn’t).

For efficiency, you want to choose natural types and get rid of all the to_vector code here. When you see to_vector nested inside of to_row_vector, that’s going to be very inefficient. intermediate_u_raw[n] is already a row vector, so you can just use Lv[1] * intermediate_u_raw[n]. But what you really want to try to reduce all of these to matrix operations rather than loops—it’s way more efficient.