Unexplained change in convergence on hierarchical model

agathevanlam · October 11, 2022, 2:21pm

Hello fellow STAN-ers,

We’ve been working on a hierarchical model (menstrual cycle lengths woman-wise and the woman’s characteristics population-wise) for more than six months. We built the model incrementally and tested a lot of different datasets with different characteristics (to learn what the model could fit best and how). This summer, we were happy with our model’s convergence: the Rhat and ESS looked good and the resulting interpretable parameters made sense.

Recently, I re-ran RStan only to find the same MC did not converge anymore. Since then, I’ve gone back to playing with the model, implementing it in cmdStanR, changing the sampler parameters, etc., without any success. It fails to converge at all (the parameters have Rhat between 1.2 and 4) and seems to be hitting the maximum tree depth in most iterations (75%).
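For readers unfamiliar with the diagnostic, here is a minimal base-R sketch of what an Rhat well above 1 means. It uses the classic between/within-chain formula, not the rank-normalized split-Rhat that current Stan interfaces report, and the function name rhat_basic and the simulated draws are illustrative assumptions, not part of the original scripts.

```r
# Simplified (non-split) Rhat for one parameter.
# draws: matrix with one row per iteration and one column per chain.
rhat_basic <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)            # between-chain variance
  W <- mean(apply(draws, 2, var))      # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# Four chains exploring the same distribution: Rhat should be close to 1.
mixed <- matrix(rnorm(4000), ncol = 4)
# Four chains stuck around different values: Rhat should be well above 1.
stuck <- sweep(mixed, 2, c(0, 1, 2, 3), "+")

rhat_mixed <- rhat_basic(mixed)
rhat_stuck <- rhat_basic(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

Values in the range reported here (1.2 to 4) are what one would expect when the chains disagree about where the posterior mass is, as in the second simulated case.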

Has this “unexplainable” switch of convergence ever been seen before? The dataset is exactly the same, the code versioning (git) shows no differences in the scripts, and there have been no automatic or manual system updates. The seed hasn’t been changed either.
The two R wrappers, cmdStanR and Rstan give slightly different results (on smaller, easier, converging datasets with which I’m testing my implementation). I imagine this is to be expected?
Do you have any idea of what I could further try? As I’m trying to replicate an existing model (described in a published article), I cannot re-parametrize it further unless there is no other option at all.

Thanks so much for any help, all input welcome!

data {
 int < lower = 1 > N; // Number of cycles
 int < lower = 1 > W; // Number of women
 int < lower = 1 > iwoman[N]; // Table: women to whom the cycle belongs
 vector[N] lnC; // Cycle lengths
}
parameters {
 real < lower = 0 > half_nu; // Degrees of freedom
 vector < lower = 0 >[N] phi; // Precision

 real mub;     // Population mean of women means
 real < lower = 0 > psi;   // Population variance of women means
 real < lower = 0 > a;   // Population parameters for women variances
 real < lower = 0 > b;

 vector[W] mu; // Women means
 vector < lower = 0 >[W] sigmasq; // Women variances
}
transformed parameters{
 real invpsi;
 invpsi = 1/psi;
}
model {
  half_nu ~ gamma(1.0e-3, 1.0e-3);
  phi ~ gamma(half_nu, half_nu);

  mub ~ normal(0,sqrt(1000));
  psi ~ inv_gamma(1.0e-3, 1.0e-3);
  a ~ gamma(1.0e-3, 1.0e-3);
  b ~ gamma(1.0e-3, 1.0e-3);

  mu ~ normal(mub, sqrt(psi));
  sigmasq ~ gamma(a, b);

  lnC ~ normal( mu[iwoman], sqrt(sigmasq[iwoman] ./ phi) );
}


fit_CL <- CL_fullhierarchy$sample(data = CL_train_data,
                                  iter_warmup = 3000, iter_sampling = 3000, thin = 10,
                                  chains = 4, threads_per_chain = 2, seed = 803214053,
                                  adapt_delta = 0.99, step_size = 0.01,
                                  refresh = 300, output_dir = paste0(diagnosticspath, "pi_cycle1/"))

js592 · October 11, 2022, 3:18pm

Without a deep dive into your code, it is entirely (unfortunately) possible that a model that fits fine on a downsampled dataset will fail when applied to the larger version. This could be because there is some structure in the whole data which is eliminated in the smaller trial fits that causes prior/data or likelihood/data conflict and tough posterior geometry. Similarly, if you ran the model on previous datasets and recently added more data the new data could be causing similar problems. This could indicate some distribution shift in the sampled population or perhaps some issues with the most recent round of data collection.

yizhang · October 11, 2022, 5:37pm

The previous working fit stores the Stan version and the wrapper version. Are those the same as in the failed runs?

On the two wrappers giving slightly different results: yes, that is expected. They are likely using different versions of Stan.

It’s unlikely that reproducibility itself is broken (see the “Reproducibility” chapter of the Stan Reference Manual), so I’d go through the list there to see if anything changed.
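One way to compare the software side of two machines or two points in time is to record the relevant versions next to each fit. The sketch below is only an assumption about how that could be done: the helper name stan_versions is hypothetical, packages that are not installed are reported as NA, and it does not cover the C++ compiler or operating system.

```r
# Collect the versions that are most likely to affect reproducibility.
stan_versions <- function() {
  pkg_version <- function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }
  out <- list(
    r             = R.version.string,
    rstan         = pkg_version("rstan"),
    cmdstanr      = pkg_version("cmdstanr"),
    stan_in_rstan = NA_character_,  # Stan version bundled with rstan
    cmdstan       = NA_character_   # CmdStan version used by cmdstanr
  )
  if (!is.na(out$rstan)) {
    out$stan_in_rstan <- rstan::stan_version()
  }
  if (!is.na(out$cmdstanr)) {
    # cmdstan_version() errors if no CmdStan installation is configured.
    out$cmdstan <- tryCatch(cmdstanr::cmdstan_version(),
                            error = function(e) NA_character_)
  }
  out
}

versions <- stan_versions()
str(versions)
```

Saving this list alongside each fit (for example with saveRDS) makes it possible to check later whether a working and a failing run really used the same stack.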

I only took a glance at the model, but gamma(1.e-3, 1.e-3) seems unusual. It implies a large curvature in the likelihood function, which may stress the sampler.
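To get a feel for why this prior is unusual, the base-R sketch below looks at where a gamma(1e-3, 1e-3) distribution (shape and rate parameterization, as in Stan) puts its mass. In the model above this prior is used for half_nu, a and b. The cut-offs 1e-10 and 1000 are arbitrary illustrative choices.

```r
shape <- 1e-3
rate  <- 1e-3

# Prior probability of a value below 1e-10 (essentially zero).
p_tiny <- pgamma(1e-10, shape = shape, rate = rate)
# Prior probability of a value above 1000.
p_huge <- pgamma(1e3, shape = shape, rate = rate, lower.tail = FALSE)

# Mean and standard deviation of a gamma distribution.
prior_mean <- shape / rate
prior_sd   <- sqrt(shape) / rate

print(c(p_tiny = p_tiny, p_huge = p_huge,
        mean = prior_mean, sd = prior_sd))
```

The large majority of the prior mass sits extremely close to zero, while the mean of 1 is driven by a very long right tail. A density that spikes at the boundary like this is difficult for the sampler to adapt to.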

agathevanlam · October 12, 2022, 7:57am

Thank you for your reply! However, the model stopped fitting exactly the same dataset, which is puzzling.

js592 · October 12, 2022, 4:32pm

Ah, my bad – didn’t notice that in the earlier post. Does the problem still occur if someone else in your group tries to run the model on their machine?

WardBrian · October 12, 2022, 8:21pm

This is something of a long shot, but the default chain ID did change from 0 to 1 during Stan 2.28. This means the same pRNG seed would have different results before and after that version if you did not manually specify a chain ID. If your model is extremely sensitive to that kind of thing and if your version went from <2.28 to >2.28, it could be it?

andrjohns · October 13, 2022, 6:49am

To diagnose whether it’s a version issue, you could use cmdstanr to install an older version and compare the results:

install_cmdstan(version = "2.27.0")
set_cmdstan_path("/path/to/cmdstan-2.27.0")

agathevanlam · October 13, 2022, 9:58am

Thanks for the replies! A little additional info:

I realised my initial question was confusing, because I quoted the cmdStanR command CL_fullhierarchy$sample(…). But the initial script that managed to converge all summer was in RStan, so the actual command was:
fit_CL <- stan(file = CL_fullhierarchy, data = CL_train_data, warmup = 3000, iter = 6000, chains = 4, thin = 1, seed = 803214053, control = list(adapt_delta = 0.99))
This doesn’t run anymore, which is why I tried out cmdStanR two weeks ago (I thought another wrapper might give more insight). RStan and cmdStanR give compatible results on much smaller development data where they both converge. I hope I’m explaining it clearly.
Checking in zsh, no R packages have changed since I set things up in January, except cmdStanR. So RStan (rstan_2.28.2.9000) did not change version.
Someone else tried running it on their machine, with no success there either.
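As a side note when comparing the two calls quoted in this thread: besides the wrapper, they differ in thinning (thin = 1 in RStan, thin = 10 in cmdStanR) and the cmdStanR call also sets step_size = 0.01. In RStan, iter includes warmup, whereas iter_sampling in cmdStanR does not. The sketch below only works out the approximate number of saved post-warmup draws under those settings; the helper kept_draws is a hypothetical name introduced for illustration.

```r
# Approximate number of post-warmup draws saved across all chains.
kept_draws <- function(chains, sampling_iter, thin) {
  chains * ceiling(sampling_iter / thin)
}

# RStan call: iter = 6000 includes warmup = 3000, thin = 1.
rstan_draws <- kept_draws(chains = 4, sampling_iter = 6000 - 3000, thin = 1)
# cmdStanR call: iter_sampling = 3000, thin = 10.
cmdstanr_draws <- kept_draws(chains = 4, sampling_iter = 3000, thin = 10)

print(c(rstan = rstan_draws, cmdstanr = cmdstanr_draws))
```

Thinning does not explain a lack of convergence, but it does mean that Rhat and ESS from the two runs are computed on very different numbers of draws.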
