Parameter sign switching after warmup

Hi Stan users and experts,

I am new to Stan and have been trying to implement a choice model using PyStan. The sampling seems to be working fine, except that the signs of some of the parameters are switched. For example, a parameter whose true value is 0.5 is estimated to be -0.5 by the sampler.

On further investigation, looking at the trace plots, I found that the parameter signs seem to switch almost instantly once warm-up finishes. As far as I know this shouldn't happen, so I wonder what the issue is. Thanks a lot for reading; have a good day/evening!

Can you share your Stan code, or better yet your code + data for a full reproducible example?

Definitely need the code + data to be sure, but it sounds like what might be happening is that some parameters are switching together. For example, if $y \propto \beta_1 \beta_2$, then flipping the signs of both $\beta_1$ and $\beta_2$ gives the same value for $y$. In practice, this can lead to multiple posterior modes that fit the data equally well, so you'd need additional constraints or stronger priors to avoid it.
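
To make the sign-flip idea concrete, here is a minimal NumPy sketch using a made-up toy model, y ~ Normal(beta1 * beta2 * x, 1), with simulated data (this is not OP's choice model, and log_lik is a hypothetical helper). Flipping both signs leaves the likelihood unchanged, while flipping only one makes the fit much worse.

```python
import numpy as np

def log_lik(beta1, beta2, x, y, sigma=1.0):
    """Gaussian log-likelihood of a toy model y ~ Normal(beta1 * beta2 * x, sigma)."""
    mu = beta1 * beta2 * x
    return float(np.sum(-0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * 2.0 * x + rng.normal(size=200)   # simulated with beta1 = 0.5, beta2 = 2.0

ll_true = log_lik(0.5, 2.0, x, y)
ll_both_flipped = log_lik(-0.5, -2.0, x, y)   # same product beta1 * beta2, same fit
ll_one_flipped = log_lik(-0.5, 2.0, x, y)     # product changes sign, much worse fit

print(np.isclose(ll_true, ll_both_flipped))   # True
print(ll_one_flipped < ll_true)               # True
```

With priors that are symmetric around zero, the posterior then has two mirror-image modes; constraining one of the two parameters to be positive, or using more informative priors, is one way to break that symmetry.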

Yes, this is the probable cause.

Can you (OP) check ArviZ's plot_pair for multiple modes?
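
In a pair plot, mirror-image modes show up as two separate clouds in opposite sign quadrants. As a rough numeric stand-in for eyeballing the plot, the sketch below counts the share of joint draws in each quadrant. The helper quadrant_shares is hypothetical (not an ArviZ function), and the draws are simulated to mimic two chains that each settled into a different mode.

```python
import numpy as np

def quadrant_shares(b1, b2):
    """Fraction of joint draws of (b1, b2) falling in each sign quadrant."""
    return {
        "(+,+)": float(np.mean((b1 > 0) & (b2 > 0))),
        "(-,-)": float(np.mean((b1 < 0) & (b2 < 0))),
        "(+,-)": float(np.mean((b1 > 0) & (b2 < 0))),
        "(-,+)": float(np.mean((b1 < 0) & (b2 > 0))),
    }

# Hypothetical pooled draws: half sit near (0.5, 2.0), half near the mirror mode (-0.5, -2.0).
rng = np.random.default_rng(2)
mode_a = rng.normal([0.5, 2.0], 0.1, size=(1000, 2))
mode_b = rng.normal([-0.5, -2.0], 0.1, size=(1000, 2))
draws = np.vstack([mode_a, mode_b])

shares = quadrant_shares(draws[:, 0], draws[:, 1])
print(shares)  # roughly half the mass in (+,+) and half in (-,-)
```

For the visual version, something like az.plot_pair(idata, var_names=["beta1", "beta2"]) should work once the fit is converted to an ArviZ InferenceData object; idata, beta1 and beta2 are placeholder names here.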

Hi @jsocolar, thanks for your reply. I would've liked to share the code and data here, but I'm not sure I can do that without first asking permission from someone else. I will ask.

Thanks for your response, @amas0. I've been working on this model for quite a while, and with the way it was structured earlier, I had this same issue when estimating it with a standard Gibbs sampling routine. I tweaked the model until the problem went away.

But now that I am trying to estimate the model in Stan, the issue has popped up again. The reason you mention is plausible, but what looks strange is that the parameters seem to switch signs right after warm-up. Do you have any idea why this might be happening? I am not very knowledgeable about HMC; could this be caused by the way HMC works?

It’s hard to say. Typically, you’d expect the MCMC procedure to not get out of a local mode once it settles in during warmup. If it is a problem with posterior multimodality then you’d expect to see different chains getting stuck in different modes and not mixing as a result.
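
Since each chain tends to settle into one mode, comparing chains is what exposes the problem. Below is a minimal sketch of the classic Gelman-Rubin R-hat applied to simulated chains. Stan and ArviZ's az.rhat report a more robust rank-normalized split R-hat, so treat this simplified version as illustrative only; values well above 1 mean the chains disagree.

```python
import numpy as np

def rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for one parameter.

    chains: array of shape (n_chains, n_draws), post-warm-up draws only.
    """
    _, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n            # pooled estimate of posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(4)
n_draws = 1000
# Hypothetical draws: two chains stuck near +0.5 and two near -0.5 (mirror-image modes).
stuck = np.vstack([rng.normal(s * 0.5, 0.1, n_draws) for s in (1, 1, -1, -1)])
# Hypothetical draws: all four chains exploring the same mode.
mixed = np.vstack([rng.normal(0.5, 0.1, n_draws) for _ in range(4)])

print(f"R-hat, chains in different modes: {rhat(stuck):.2f}")
print(f"R-hat, chains in the same mode:   {rhat(mixed):.2f}")
```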

Providing traceplots or other diagnostics could help with diagnosis. Even a restricted version of the code with just the parts that have the problem parameters could be enough to figure out the issue.

Hi @amas0, that was a super quick response. Thanks!

Upon seeing your comment that the chain shouldn't usually get out of a mode once it settles in during warm-up, I investigated my code further. It turns out I was misinterpreting the output of fit.extract(). I thought it returned the warm-up draws by default, but it doesn't. The sudden sign switching I saw on the trace plots was simply because I was plotting the two chains back to back (which I had taken to be warm-up followed by chain 1).
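
The sketch below reproduces this gotcha with simulated draws. It assumes a post-warm-up array of shape (iterations, chains) for one parameter, which is the layout PyStan 2's fit.extract(permuted=False) should give per parameter (warm-up is only included with inc_warmup=True); check this against your PyStan version. Laying the chains end to end produces an apparent "switch" exactly at the chain boundary, whereas summarizing each chain separately shows two stable chains in different modes. The helper per_chain_summary is hypothetical.

```python
import numpy as np

def per_chain_summary(draws):
    """draws: array of shape (n_iter, n_chains) for one parameter, post-warm-up only."""
    return [(c, float(draws[:, c].mean()), float(draws[:, c].std(ddof=1)))
            for c in range(draws.shape[1])]

rng = np.random.default_rng(3)
n_iter = 1000
# Hypothetical draws: chain 0 sits near +0.5, chain 1 near -0.5.
draws = np.column_stack([rng.normal(0.5, 0.05, n_iter), rng.normal(-0.5, 0.05, n_iter)])

# Chain 0's draws followed by chain 1's, as in a single concatenated trace line.
concatenated = draws.T.reshape(-1)
# Position of the biggest step in that line: the apparent "sign switch".
jump = int(np.argmax(np.abs(np.diff(concatenated)))) + 1
print(f"apparent switch at draw {jump} (chain boundary is at {n_iter})")

for chain, mean, sd in per_chain_summary(draws):
    print(f"chain {chain}: mean={mean:.2f}, sd={sd:.2f}")

# With matplotlib, plt.plot(draws) draws one line per column, i.e. one trace per chain.
```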

Now I am quite certain that multiple posterior modes are the cause of the issue. I will work on making the model more identifiable. Thanks very much for your responses and for bearing with my confusion.

Glad to hear it! No worries on the confusion, lots of “gotchas” in the space that are hard to realize unless you know what you’re looking for.

Follow up with any other issues or just to let us know you figured it out.

Hamiltonian Monte Carlo explores really well, and in many problems that means it encounters pathological behavior that other samplers had simply ignored. This leads to the “it worked fine before but is broken in Stan” complaints; the problem is that it was never working before, and the previous samplers just weren't able to identify or otherwise diagnose the problem.

In theory, Hamiltonian Monte Carlo is better not only at exploring modes but also at jumping between modes that aren't too far apart, although if the modes are sufficiently far apart it will still take too long to be practical. One benefit of Stan's default configuration is that it runs multiple Markov chains, which makes multiple modes much easier to diagnose than running just one chain would; that seems to have been the case here.