Divergences when adding prior sampling

I am working my way through Bayesian Cognitive Modeling, recoding the models in Stan + cmdstanr and adding a bunch of quality checks (e.g. prior-posterior updates, predictive checks, etc.).

I am working through the “Seven Scientists” model (7 measurements with a common mean, but different sd’s, see code below), and kept getting divergences. So I went back to an old very JAGS-y implementation by Martin Smira (https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/ParameterEstimation/Gaussian/SevenScientists_Stan.R, and below).

I was puzzled by the extremely uncertain priors chosen by the authors (mean is a normal(0,sqrt(30)), sd is 1/sqrt(gamma(0.001, 0.001)). I ran the model and everything is fine. I then added some code to sample and output the priors, before getting to work at reducing the uncertainty. However, when I add the code to sample the prior, I suddenly start getting a very slow model and divergences, which I cannot understand.

I marked the code below for the additions that induce divergences

Model to be saved a 8_SevenScientist.stan (below the r code to make it run):

model <- "
// The Seven Scientists
data { 
  int<lower=1> n;
  vector[n] x;
}
parameters {
  real mu;
  real muprior; \\ prior checks addition
  vector<lower=0>[n] lambda;
  real lambdaprior; \\ prior checks addition
} 
transformed parameters {
  vector[n] sigma;
  
  for (i in 1:n)
    sigma[i] = inv_sqrt(lambda[i]);
}
model {
  // Priors
  mu ~ normal(0, sqrt(1000));
  muprior ~ normal(0, sqrt(1000)); \\ prior checks addition
  lambda ~ gamma(.001, .001);
  lambdaprior ~ gamma(.001, .001); \\ prior checks addition
  
  // Data Come From Gaussians With Common Mean But Different Precisions
  x ~ normal(mu, sigma);
}

\\ prior checks addition
generated quantities {
  real sigmaprior;
  sigmaprior = inv_sqrt(lambdaprior);
}"

R code

# clears workspace: 
rm(list=ls()) 

library(cmdstanr)

# Define data
x <- c(-27.020, 3.570, 8.191, 9.898, 9.603, 9.945, 10.056)
n <- length(x)

data <- list(x = x, n = n)

file <- file.path( "8_SevenScientists_b.stan")
mod <- cmdstan_model(file, cpp_options = list(stan_threads = TRUE))

samples <- mod$sample(
  data = data,
  seed = 123,
  chains = 2,
  parallel_chains = 2,
  threads_per_chain = 2,
  iter_warmup = 1000,
  iter_sampling = 2000,
  refresh = 500,
  max_treedepth = 20,
  adapt_delta = 0.99,
)
1 Like

as a sanity check I also added a prior sampling that mirrors the posteriors (7 different sigmas). Same overwhelming n of divergences.

// The Seven Scientists
data { 
  int<lower=1> n;
  vector[n] x;
}
parameters {
  real mu;
  real muprior;
  vector<lower=0>[n] lambda;
  vector<lower=0>[n] lambdaprior;
} 
transformed parameters {
  vector<lower=0>[n] sigma;
  vector<lower=0>[n] sigmaprior;
  
  for (i in 1:n)
    sigma[i] = inv_sqrt(lambda[i]);
    
  for (j in 1:n)
     sigmaprior[j] = inv_sqrt(lambdaprior[j]);
}
model {
  // Priors
  mu ~ normal(0, sqrt(1000));
  muprior ~ normal(0, sqrt(1000));
  lambda ~ gamma(.001, .001);
  lambdaprior ~ gamma(.001, .001);
  
  // Data Come From Gaussians With Common Mean But Different Precisions
  x ~ normal(mu, sigma);
}
1 Like

Given that there are n observations and the model is structured for inference on n variances, it’s fundamentally unidentified to a degree that I wouldn’t think that any amount of no prior certainty is going to help. I’m frankly surprised that you are getting some configurations that achieve zero divergences.

1 Like

Just found this

Relevant is that (1) it’s possible that the original example had multiple observations per scientist, yielding identifiability, and (2) the priors combined with the precision-parameterization might be leading to numeric issues.

4 Likes

thanks a lot. The insufficient data matches my intuition, and the link is a nice explainer.

I am still baffled by the stan program by Smira running without any divergences, just commenting out the lines with “// prior checks addition”.