Largest Rhat is NA when one parameter is fixed to a constant

In my Stan model, I fixed one parameter to zero to satisfy some identifiability constraints.

When I sample, I obtain the draws I expect, but I also get warning messages that the largest Rhat is NA because of the fixed parameter.

Warning messages:
1: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 
     mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat valid Q5 Q50 Q95 MCSE_Q2.5 MCSE_Q25 MCSE_Q50 MCSE_Q75 MCSE_Q97.5 MCSE_SD Bulk_ESS Tail_ESS
k[1]    0       0  0    0   0   0   0     0    NA    1     1  0   0   0        NA       NA       NA       NA         NA      NA     4000     4000

In the forum, I saw some “tricks” to avoid this message, for example:

  • Wrap the rstan sampling call in suppressWarnings(). This does silence the message, but it might also hide other, genuinely important warnings.
  • Include the parameter constraint in the model block instead of the transformed parameters block. That would work, but I would like to keep the constrained parameter in my set of parameters.

My general question is then: is it possible to fix one parameter in the parameter space without having this “false” warning?

Thank you for your help!


Try this: in the generated quantities section, add a minuscule random jitter.

Thanks for your reply.

Could you give me a hint on how to do this? And how would this solve the problem with the fixed parameter?

Thanks.

I’m still a beginner when it comes to Stan, but are you sure that this is really a “false warning”? The way I understand it, whatever is declared in the parameters block is meant to be estimated: it’s an unknown dimension of the posterior that the sampler needs to explore. If you fix a parameter to a number, it is no longer an unknown quantity to be estimated but a known constant, so personally I wouldn’t put it in the parameters block. I’d either declare it in data, if I were going to use different values at some point, or in the model block.
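For example, here is a minimal sketch of what I mean (made-up names, assuming the first component is fixed to zero). The fixed element exists only as a local variable in the model block, so it never appears in the output and can’t trigger the warning:

data {
  int<lower=1> K;
}
parameters {
  vector[K - 1] k_free;   // only the genuinely unknown components
}
model {
  // the fixed first element is a local variable, not a saved parameter
  vector[K] k = append_row(0, k_free);
  // ... priors and likelihood written in terms of k ...
}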

No, because the diagnostic code doesn’t know whether you fixed the parameter or whether the sampler is stuck. A longer explanation, and a comment on improving the warning, below.

The problem with parameters fixed to a constant is that, from a finite number of posterior draws that all have the same value, we can’t distinguish between a parameter that really is a constant and a parameter that is not logically constant but whose chain was stuck on one value. The stored posterior draws carry no information about whether a parameter is logically constant or whether it could have taken different values in a much longer chain.

Probably the most common example is when an indicator function is used in generated quantities to count how many times some logical condition holds. For example, if we want to estimate the probability that theta > 0, we can have in generated quantities

int theta_gt_zero = theta>0;

If p(theta > 0) is very close to 0 or 1 but not exactly 0 or 1, we could still see only 0’s or only 1’s. Based on those constant-looking values alone, we can’t know how efficient MCMC is for estimating that probability (without additional information).

Another example would be if alpha is continuous and on the logit (log-odds) scale. It is then possible that the draws of alpha all have different values, but due to limited floating point accuracy the draws of inv_logit(alpha) are all constant.
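To make both cases concrete, here is a small toy model (the numbers are purely illustrative) where the draws of theta_gt_zero and p typically look constant even though nothing is wrong with the sampler:

parameters {
  real theta;
  real alpha;
}
model {
  theta ~ normal(4, 1);    // Pr(theta > 0) is about 0.99997
  alpha ~ normal(45, 1);   // alpha lives far out on the logit scale
}
generated quantities {
  // with Pr(theta > 0) this close to 1, a few thousand draws are often all 1's,
  // so the indicator looks constant even though it is not logically constant
  int theta_gt_zero = theta > 0;
  // the draws of alpha all differ, but in double precision inv_logit(alpha)
  // rounds to exactly 1 once alpha exceeds roughly 37, so p looks constant
  real p = inv_logit(alpha);
}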

Because of this ambiguity we decided to report NA for Rhat, ESS and MCSE if all posterior draws for a parameter are equal. In case of NA, the user then needs to use additional information to check whether the parameter is logically constant, and if it is not, to check convergence and mixing some other way (e.g. in the indicator example above, check the convergence and efficiency for theta itself).

The warning you see is based on the new code, but it seems the summary you print for k[1] is based on the old code (it helps if you tell which function you call, from which package, and the package version). I recommend using the posterior package (GitHub - stan-dev/posterior: The posterior R package), which has the latest code.

Two things that could be improved:

  • The warning message should be modified to give a more informative message when there are NAs.
  • The documentation for the rhat* and ess* functions should also be edited to explain why the output can contain NA.

I made issues for these: Document edge case behavior for summary functions (e.g. rhat and ess) · Issue #70 · stan-dev/posterior · GitHub and Improve convergence diagnostic message for NA · Issue #769 · stan-dev/rstan · GitHub.

EDIT: added example where draws are apparently constant due to the limited floating point accuracy.
EDIT2: added the issue links


Thank you very much @avehtari for your detailed answer and explanation.

For the moment, I will put the constrained parameter in the model block and declare as parameters only those that truly need to be estimated. That way the warning message should not appear anymore.

For info: R version 3.6.2, rstan 2.19.3, and the functions I called were sampling and monitor from the rstan package.

I also fully agree that it would be great if the warning message said something like

1: Some parameters have constant draws which can be due to parameters being constant, limited numerical accuracy or mixing problems. R-hat, Bulk-ESS and Tail-ESS for these parameters will be reported as NA.
See http://mc-stan.org/misc/warnings.html#r-hat 

If you had a parameter mu with multiple elements and the first one constrained to a constant, you would do:

generated quantities {
    // add a tiny random jitter to the fixed element (lower bound goes first)
    mu[1] = uniform_rng(1e-16, 1e-15);
}

Since this gets run after a sample has been accepted, it doesn’t affect the sampling whatsoever but should eliminate the warning. You just need to remember when looking at the posterior that the posterior for mu[1] is meaningless.


Thanks for your suggestion @mike-lawrence, but it does not work. If the constraint mu[1] = 0 is set in the transformed parameters block, I cannot assign another value to mu[1] in the generated quantities block.

Ah, shoot. I was worried about that possibly being the case.

Obviously the true solution to this problem is for the automatic warnings to detect such constant parameters and give the user the option to disable warnings about them (defaulting to yes). That, or add something to the language to explicitly label constant parameters as such and then skip them in the Rhat calculations. But an interim solution might be to revisit the rationale for not permitting modification of parameters in the generated quantities block; tagging in some language design folks to weigh in on this: @Bob_Carpenter @jonah @bgoodri


If it’s limited exclusively to generated quantities, then it’s technically not going to break anything, because it only modifies the model output as it is being written. It still feels very wrong.


Just weighing in here to add that I encounter this issue all the time when I have parameters declared as cholesky_factor_corr and then transformed to the corresponding corr_matrix in generated quantities. The ESS and Rhat diagnostics will flag the diagonal elements of the correlation matrix. (Weirdly, they often flag some but not all diagonal elements.) It’s not a huge deal, but it requires me to patiently explain to users of my package why they should ignore warnings about those parameters, but not others. I suppose I could add a minuscule jitter to the diagonal elements, but that would defeat the purpose of making the correlation matrices usable as such in R.
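For concreteness, a stripped-down 2x2 sketch of the pattern I mean (names are just placeholders):

parameters {
  cholesky_factor_corr[2] L_u;
}
model {
  L_u ~ lkj_corr_cholesky(2);
}
generated quantities {
  // every diagonal element of a correlation matrix is 1 in every draw,
  // so the diagnostics see constant draws for Omega[1,1] and Omega[2,2]
  corr_matrix[2] Omega = multiply_lower_tri_self_transpose(L_u);
}

My guess at the “some but not all” part: the first diagonal element is exactly 1 by construction, while the others are computed as sums of squares that can differ from 1 by rounding error across draws, so only the exactly-constant elements end up with NA.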

Here’s an idea: save the jitter used in yet another generated quantity, then provide a helper function in R to automatically subtract it back out. (I know, such a hack.)

So does that mean we shouldn’t worry about the warning and can assume the other estimates are fine?

No, you should not assume the other estimates are fine.

Makes sense. Currently, I’m estimating a hierarchical model. For the random effects, I used the Cholesky factorization, and some of the factor estimates (i.e. the ones that are constant across draws) trigger a convergence warning. So I guess I could use the proposed solution for my problem. Here’s the title of my thread in case you want more information: Convergence Error: Multilevel Model.

I have the same problem. So if you’re telling users to ignore the message, does that mean the warning doesn’t affect the other estimates? What do you mean by adding a jitter? Could you elaborate a little bit? Thanks!

Could you post some example code, please? That’d be great! Thanks!

I recommend using the posterior package (https://github.com/jgabry/posterior), which has the latest code. It has fixed code for handling variables with only one observed value. The code can’t know whether only one value is observed because the variable is constant or because the sampler is stuck, so you need to act based on what you know about that variable.

So you recommend that package for adding a “jitter” to the constant values of the correlation matrix?

Could we do that with matrix values?
Here’s the matrix: cholesky_factor_corr[2] L_u;
We can’t do something like L_u[1,1] = uniform_rng(1e-16, 1e-15).
At least I tried, but Stan won’t let me do it! :)