Log probability evaluates to log(0) for Noncentered Regression

Hi all,

I’m having some trouble with this hierarchical non-centered linear model I’m attempting to run. Whenever I begin sampling, I get the error: Log probability evaluates to log(0), i.e. negative infinity.
Stan can’t start sampling from this initial value.

I’ve run this model on simulated data with the same result, and a friend of mine attempting to use the same model is also receiving this error. This leads us to believe that there’s potentially some sort of misspecification in the model, but we’re not sure where or what the issue would be. Any pointer in the right direction would be super helpful.

In terms of data, both my friend and I have outcome variables that are constrained to be positive: mine is an index between 0 and 100 (though I’ve tried transforming the data on an offset log scale before running the model, with the same result), and my friend is analyzing sales data with the outcome between $0 and $2 million.


// Index values, observations, and covariates.
data {
  int<lower = 1> N;                     // Number of observations.
  int<lower = 1> K;                     // Number of groups/clusters.
  int<lower = 1> I;                     // Number of observation-level covariates.
  vector[N] y;                          // Vector of observations.
  array[N] int<lower = 1, upper = K> g; // Vector of group assignments.
  matrix[N, I] X;                       // Matrix of observation-level covariates.
}

// Parameters.
parameters {
  vector[I] gamma;                      // Vector of population-level parameters.
  cholesky_factor_corr[I] L_Omega;      // Population model correlation matrix Cholesky factor.
  vector[I] tau;                        // Population model scale parameters.
  matrix[I, K] Delta;                   // Matrix of non-centered observation-level parameters.
  real<lower = 0> sigma;                // Scale of the observation model.
}

// Transformed parameters.
transformed parameters {
  matrix[K, I] V;                       // Matrix of rescaled non-centered observation-level parameters.
  matrix[K, I] Beta;                    // Matrix of observation-level parameters.

  // Compute the non-centered parameterization.
  V = (diag_pre_multiply(tau, L_Omega) * Delta)';
  for (i in 1:I) {
    Beta[,i] = gamma[i] + V[,i];
  }
}

// Regression model.
model {
  // Hyperpriors.
  gamma ~ normal(0, 5);
  L_Omega ~ lkj_corr_cholesky(4);
  tau ~ normal(0, 5);

  // Prior.
  sigma ~ normal(0, 5);

  // Population model.
  to_vector(Delta) ~ normal(0, 1);

  // Observation model.
  for (n in 1:N) {
    y[n] ~ lognormal(X[n,] * Beta[g[n],]', sigma);
  }
}

// Generate quantities at each draw.
generated quantities {
  // Compute the log-likelihood.
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = lognormal_lpdf(y[n] | X[n,] * Beta[g[n],]', sigma);
  }

  // Compute the correlation matrix Omega.
  corr_matrix[I] Omega;
  Omega = multiply_lower_tri_self_transpose(L_Omega);
}

Thanks again for any help you can provide! We’re pretty stumped on this one.

Hey Ethan,

A few things jump out, without running the model myself or knowing exactly what data you put in:

  1. tau is a scale parameter, so it needs a <lower = 0> constraint in its declaration; right now it can be initialized to negative values.
  2. Your priors on gamma, tau, and sigma are quite wide with normal(0, 5), but that’s something you have to evaluate yourself. For such cases you can also check out the Student-t distribution.

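For point 1, the fix is just the constraint on the declaration in your parameters block:

```stan
parameters {
  // ... other declarations unchanged ...
  vector<lower = 0>[I] tau;             // Population model scale parameters.
}
```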
You can check whether that solves the main issue for now.
You can also use std_normal() for Delta, which is a bit more efficient, and you might want to check out the docs on how to avoid the loop in the likelihood.
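Sketching both of those suggestions against your model block (this assumes Beta stays K x I and g indexes its rows, as in your code; rows_dot_product computes the N-vector of linear predictors in one call):

```stan
model {
  // Population model, using the dedicated standard-normal density.
  to_vector(Delta) ~ std_normal();

  // Vectorized observation model: Beta[g] stacks row g[n] for each
  // observation into an N x I matrix, so no explicit loop is needed.
  y ~ lognormal(rows_dot_product(X, Beta[g]), sigma);
}
```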

Once it starts running, you probably want to wrap the likelihood in a small conditional clause (if (prior_only == 0)) so you can do prior predictive checks (running the model without the data). This will help you get a better feeling for how sensible your choice of priors really is.
Good luck!