Divergences when adding prior sampling

I am working my way through Bayesian Cognitive Modeling, recoding the models in Stan + cmdstanr and adding a bunch of quality checks (e.g. prior-posterior updates, predictive checks, etc.).

I am working through the “Seven Scientists” model (7 measurements with a common mean, but different sd’s, see code below), and kept getting divergences. So I went back to an old very JAGS-y implementation by Martin Smira (https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/ParameterEstimation/Gaussian/SevenScientists_Stan.R, and below).

I was puzzled by the extremely uncertain priors chosen by the authors (mean is a normal(0,sqrt(30)), sd is 1/sqrt(gamma(0.001, 0.001)). I ran the model and everything is fine. I then added some code to sample and output the priors, before getting to work at reducing the uncertainty. However, when I add the code to sample the prior, I suddenly start getting a very slow model and divergences, which I cannot understand.

I marked the code below for the additions that induce divergences

Model to be saved a 8_SevenScientist.stan (below the r code to make it run):

model <- "
// The Seven Scientists
data { 
  int<lower=1> n;
  vector[n] x;
}
parameters {
  real mu;
  real muprior; \\ prior checks addition
  vector<lower=0>[n] lambda;
  real lambdaprior; \\ prior checks addition
} 
transformed parameters {
  vector[n] sigma;
  
  for (i in 1:n)
    sigma[i] = inv_sqrt(lambda[i]);
}
model {
  // Priors
  mu ~ normal(0, sqrt(1000));
  muprior ~ normal(0, sqrt(1000)); \\ prior checks addition
  lambda ~ gamma(.001, .001);
  lambdaprior ~ gamma(.001, .001); \\ prior checks addition
  
  // Data Come From Gaussians With Common Mean But Different Precisions
  x ~ normal(mu, sigma);
}

\\ prior checks addition
generated quantities {
  real sigmaprior;
  sigmaprior = inv_sqrt(lambdaprior);
}"

R code

# clears workspace: 
rm(list=ls()) 

library(cmdstanr)

# Define data
x <- c(-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056)
n <- length(x)

data <- list(x = x, n = n)

file <- file.path( "8_SevenScientists_b.stan")
mod <- cmdstan_model(file, cpp_options = list(stan_threads = TRUE))

samples <- mod$sample(
  data = data,
  seed = 123,
  chains = 2,
  parallel_chains = 2,
  threads_per_chain = 2,
  iter_warmup = 1000,
  iter_sampling = 2000,
  refresh = 500,
  max_treedepth = 20,
  adapt_delta = 0.99,
)

as a sanity check I also added a prior sampling that mirrors the posteriors (7 different sigmas). Same overwhelming n of divergences.

// The Seven Scientists
data { 
  int<lower=1> n;
  vector[n] x;
}
parameters {
  real mu;
  real muprior;
  vector<lower=0>[n] lambda;
  vector<lower=0>[n] lambdaprior;
} 
transformed parameters {
  vector<lower=0>[n] sigma;
  vector<lower=0>[n] sigmaprior;
  
  for (i in 1:n)
    sigma[i] = inv_sqrt(lambda[i]);
    
  for (j in 1:n)
     sigmaprior[j] = inv_sqrt(lambdaprior[j]);
}
model {
  // Priors
  mu ~ normal(0, sqrt(1000));
  muprior ~ normal(0, sqrt(1000));
  lambda ~ gamma(.001, .001);
  lambdaprior ~ gamma(.001, .001);
  
  // Data Come From Gaussians With Common Mean But Different Precisions
  x ~ normal(mu, sigma);
}

Given that there are n observations and the model is structured for inference on n variances, it’s fundamentally unidentified to a degree that I wouldn’t think that any amount of no prior certainty is going to help. I’m frankly surprised that you are getting some configurations that achieve zero divergences.

Just found this

Relevant is that (1) it’s possible that the original example had multiple observations per scientist, yielding identifiability, and (2) the priors combined with the precision-parameterization might be leading to numeric issues.

thanks a lot. The insufficient data matches my intuition, and the link is a nice explainer.

I am still baffled by the stan program by Smira running without any divergences, just commenting out the lines with “// prior checks addition”.