Hello all, I have been benchmarking a logistic normal multinomial model with an arbitrary correlation matrix as a parameter. I simulated some data following the logistic normal multinomial model and I fitted the simulated data but got frequent warnings about divergent transition/maximum tree depth reached, even for just 3 taxa in simulated data.
The model code looks like:
(The weird dimensionality of parameters comes from a previous linear design. I simplified it to intercept only for benchmarking purposes.)
data{
int<lower=1> N; // Number of samples
int<lower=1> M; // Number of taxa
array[N,M] int<lower=0> y; // count data
// Prior info
array[2] real prior_sd_intercept;
array[2] real prior_mean_intercept;
real<lower=0> prior_corr_eta;
}
transformed data{
matrix[M,M-1] Helmert = canonical_Helmert(M);
}
parameters{
// ILR means, not constrained
matrix[M-1,1] beta_raw;
// ILR log-standard deviations, not constrained
matrix[1, M-1] alpha_raw;
// ILR Cholesky factors for correlation matrices
array[1] cholesky_factor_corr[M-1] L;
// raw ILR latent residuals
matrix[N, M-1] intermediate_u_raw;
}
transformed parameters{
// Non-centered parameterisation for ILR means
matrix[M-1,1] beta = beta_raw*prior_mean_intercept[2] + prior_mean_intercept[1];
// Non-centered parameterisation for ILR log-standard deviations
matrix[1, M-1] alpha = alpha_raw*prior_sd_intercept[2] + prior_sd_intercept[1];
// Non-centered parameterisation for ILR intermediate u
array[1] cholesky_factor_cov[M-1] Lv;
Lv[1] = diag_pre_multiply(exp(to_vector(alpha[1])),L[1]);
matrix[N , M-1] intermediate_u;
for(n in 1:N){
intermediate_u[n] = to_row_vector(Lv[1]*to_vector(intermediate_u_raw[n]));
}
}
model{
// Fit main distribution
for(n in 1:N){
target += multinomial_logit_lupmf(
to_array_1d(y[n,]) |
Helmert*(beta[,1]+intermediate_u[n]') // latent composition in CLR
);
}
// Priors for standard deviations
alpha_raw[1] ~ std_normal();
// Priors for means
beta_raw[,1] ~ std_normal();
// (Hyper-)priors for the correlation matrices
L[1] ~ lkj_corr_cholesky(prior_corr_eta);
// Priors for intermediate_u
to_vector(intermediate_u_raw) ~ std_normal();
}
The Stan model is run via cmdstanr using the following specifications:
this_fit = LNM_isometric_single_model$sample(
data = this_data,
output_dir = "cmdstanr_output/",
chains = 2,parallel_chains = 2,
init = 0,
iter_warmup = 1000,iter_sampling = 1000
)
I use standard normal priors for means and log-sd of the logistic normal (i.e., prior_mean_intercept and prior_sd_intercept are both c(0,1)), and LKJ prior with eta=2 for the correlation matrix.
As far as I am aware of, I addressed the compositional constraints on parameters by transformation to ILR space, used non-centered parameterisation for all parameters, and applied the model on a dataset with very few taxa. So there would be little room for further model optimisation in terms of coding/parameterisation.
There were divergent transitions and/or tree-depth limit reached in almost every run except when the sample size is 1000, and there was a decreasing trend of E-BFMI as sample size increased.
Rhat of latent residuals showed a rather strange pattern for large sample sizes:
In addition, rhat of parameters of interest (alpha, beta and L) inflated as sample size increased:
I believe it’s the latent residuals causing the issue as I fitted the logistic normal model (without the multinomial part) without encountering the problems above.
I did a quick search online and with LLM, and it appears that LNM with full latent residuals has a complex geometry near certain region, leading to the warnings. It’s said that Laplace approximation or variational analysis can help with the issue, but I am not sure how to implement them in Stan/cmdstanr. An arbitrary correlation matrix is preferred so I don’t want to switch to Dirichlet multinomial.
Any ideas/tips on how to efficiently implement the LNM model or its approximation to address the geometry problem?


