Something went wrong after call_sampler

I’m just getting started with modeling in Stan, and I’m trying to fit a simple hierarchical linear model. I’m generating data according to the following process:

For each g of G groups, a slope \beta_{g} and an intercept \alpha_{g} are drawn i.i.d. from normals. That is, \beta_{g} ~ N(\beta_\mu, \beta_\sigma) and \alpha_{g} ~ N(\alpha_\mu, \alpha_\sigma). Within each group, y ~ N(X \beta_{g} + \alpha, \epsilon). I want to infer MAP point estimates from my fake dataset for:

  • \beta_{g}
  • \alpha_{g}
  • \beta_\mu
  • \beta_\sigma
  • \alpha_\mu
  • \alpha_\sigma
  • \epsilon
    During data generation, I’m setting
  • \beta_\mu = 0
  • \beta_\sigma = 1
  • \alpha_\mu = 0
  • \alpha_\sigma = 1
  • \epsilon = 0.5
    , with 1000 groups and 1000 datapoints per group. During fitting, I’m placing N(0, 5) hyperpriors on location parameters and LogNormal(0, 5) hyperpriors on scale parameters.

With a centered parameterization, I get very tight, accurate estimates for the parameters \beta_{g}, \alpha_{g}, \epsilon but the the hyperparameter inferences are way off (\mu, \sigma \approx 100). I found this especially odd since if I plotted the point estimates for \alpha, \beta, the N(0, 1) structure was very obvious. So I tried using a non-centered parameterization according to Matt’s trick and now I’m getting

Something went wrong after call_sampler as a python error at runtime. Checking the actual CLI, I see
Line search failed to achieve a sufficient decrease, no more progress can be made. If I try taking HMC samples from the posterior instead of finding a the mode, I get a lot of messages of the form
Exception: lognormal_lpdf: Random variable is -4.97792e-08, but must be >= 0!

Here’s my code for the non centered version


data {
  int<lower=0> N;
  int<lower=1> G;
  int<lower=1,upper=G> data_group[N];
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta_mu;
  real beta_sd;
  real alpha_mu;
  real alpha_sd;
  real epsilon;
  vector[G] beta_raw;
  vector[G] alpha_raw;
}
transformed parameters {
  vector[G] beta = beta_raw * beta_sd + beta_mu;
  vector[G] alpha = alpha_raw * alpha_sd + alpha_mu;
}
model {
  //increment lprob for hyperpriors
  beta_mu ~ normal(0, 5);
  beta_sd ~ lognormal(0, 5);
  alpha_mu ~ normal(0, 5);
  alpha_sd ~ lognormal(0, 5);
  epsilon ~ lognormal(0, 5);
  
  // increment lprob for untransformed parameters
  for (g in 1:G) {
    beta_raw[g] ~ std_normal();
    alpha_raw[g] ~ std_normal();
  }
  
  // increment lprob for conditional log-likelihood of the data
  for (n in 1:N) {
    y[n] ~ normal(beta[data_group[n]] * x[n] + alpha[data_group[n]], epsilon);
  }
}

and for the centered one

data {
  int<lower=0> N;
  int<lower=1> G;
  int<lower=1,upper=G> data_group[N];
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta_mu;
  real beta_sd;
  real alpha_mu;
  real alpha_sd;
  real epsilon;
  vector[G] beta;
  vector[G] alpha;
}
model {
  // specify hyperpriors
  beta_mu ~ normal(0, 5);
  beta_sd ~ lognormal(0, 5);
  alpha_mu ~ normal(0, 5);
  alpha_sd ~ lognormal(0, 5);
  epsilon ~ lognormal(0, 5);
  
  // increment for untransformed parameters
  for (g in 1:G) {
    beta[g] ~ normal(beta_mu, beta_sd);
    alpha[g] ~ normal(alpha_mu, alpha_sd);
  }
  
  // increment for conditional log-likelihood of the data
  for (n in 1:N) {
    y[n] ~ normal(beta[data_group[n]] * x[n] + alpha[data_group[n]], epsilon);
  }
}

I’m using the pystan interface to stan.

You need to declare your always-positive parameters with with a constraint

parameters {
  real beta_mu;
  real<lower=0> beta_sd;
  real alpha_mu;
  real<lower=0> alpha_sd;
  real<lower=0> epsilon;
  vector[G] beta_raw;
  vector[G] alpha_raw;
}

“Something went wrong after call_sampler” means that the C++ function
stan::services::sample::hmc_nuts_dense_e returned a non-zero value
(but did not throw an exception).

I can’t recall immediately how best to debug this. It’s an atypical
error. Perhaps someone can spot the issue in the model code?

@nhuurre
Thanks, that got rid of those error messages. I’m still getting issues.

Here’s a more complete call to optimization

Initial log joint probability = -8.38742e+08
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
       1  -8.38742e+08             0   1.71066e+09       0.001       0.001       12
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

Optimization terminated with error:
  Line search failed to achieve a sufficient decrease, no more progress can be made

Update: I thought maybe singularities near zero for the scale parameters might be the problem, so I used boundary avoiding gamma(2, 1) parameters as recommended in the user guide. Didn’t help. Trying to sample with the default 4 chains with 2000 iterations gets stuck at iteration 200/2000 (warmup).

Have you tried this with a subset of the data? A hierarchical model with 1000 groups and 1000 observations per group is going to take a while with HMC.

@stijn

Yeah, scaling down the size of the dataset helps enormously. Full bayes actually works and MAP doesn’t error if I keep the data set to about 10K points. I’m guessing the log probability gradient was blowing up with the full dataset?

Not sure what a correct way of handling larger datasets is, but in any case this topic can be closed I guess. Thanks for the help

It’s possible that the MAP problem and the Full Bayes problem are unrelated. Maximum likelihoods can blow up with corner solutions (\sigma = 0). Full Bayes could be a computational problem (too much information can cause strong correlations between parameters) or a memory problem (too many parameters x iterations to keep track of).