# Soft constraint with min/max - does this break continuous differentiability?

Hi all -

I am experimenting with a new way of identifying time-varying latent parameters by constraining the min or max of the parameter vector \theta_t, as in the following:

```stan
min(theta) ~ normal(1, 0.01);
```

The code works fine: it acts like a soft positivity constraint, not a fully hard one, and seems to sample well when the goal is identification in a model where \theta_t depends on \theta_{t-1} and so on. It appears to work somewhat better than Stan's built-in hard constraints, which don't always work well for identification.

The point in posting this, though, is to ask whether the use of min() or max() on a parameter vector in the model block will break continuous differentiability of the HMC proposal and lead to a minor but nonetheless real bias. My intuition is that it won't, because I am assigning a tight normal prior, but then again min() and max() each select a single element of the vector, so they are only piecewise differentiable.
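To make the worry concrete, here is a small numerical sketch (in Python rather than Stan, with function names of my own) of what the `min(theta) ~ normal(1, 0.01)` statement does to the target: the log density itself is continuous in theta, but its gradient jumps from one component to another whenever the argmin switches.

```python
import numpy as np

def logpdf_min(theta, mu=1.0, sigma=0.01):
    """Normal log density (up to a constant) evaluated at min(theta)."""
    return -0.5 * ((np.min(theta) - mu) / sigma) ** 2

def grad_logpdf_min(theta, mu=1.0, sigma=0.01):
    """Analytic gradient: only the argmin component gets a nonzero derivative."""
    g = np.zeros_like(theta)
    i = np.argmin(theta)
    g[i] = -(theta[i] - mu) / sigma ** 2
    return g

# Two points on either side of a "switch": the two smallest entries swap roles.
a = np.array([0.999, 1.001])
b = np.array([1.001, 0.999])

print(logpdf_min(a), logpdf_min(b))            # identical: the density is continuous
print(grad_logpdf_min(a), grad_logpdf_min(b))  # the nonzero entry jumps between components
```

So the target stays continuous (no jump in the log density for the leapfrog integrator to see), but there is a kink: the gradient is discontinuous exactly where the two smallest parameters cross.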

Any thoughts from HMC gurus? I could run a simulation to try to figure it out, but it would be fairly complicated as the bias is likely to be hard to detect.

I'm not sure when you get HMC guru status, but the key to thinking about this is whether the posterior becomes discontinuous under the transformation. The log density itself stays continuous, since min() is a continuous function of theta; what jumps is the gradient. When the two smallest parameters switch, the normal log density is suddenly being calculated on a different parameter, so you'd think it's a problem. OTOH, at the time of the switch the two are very close in value, something near the integrator stepsize multiplied by the weights (inverse weights?). So as long as the resulting error in the leapfrog integrator is within its numerical tolerances, you won't have a problem. For reference, lots of Stan functions have errors around 1e-8; 1e-6 is less common. If you use this trick with a bivariate normal and look at the acceptance probability around switch points, you might see whether it dips there.


Thanks for the idea, I will give that a go. For the record, doing this doesn't cause divergent transitions, at least in the models I've run.

I suppose a continuous analogue would be to fit some kind of distribution over \theta_t that constrains the min or max, but I'm not sure what such a distribution would be.
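For what it's worth, a standard smooth stand-in for the hard min (my suggestion, not something proposed in the thread) is the soft-min built from log-sum-exp, which is differentiable everywhere and converges to the exact min as the temperature shrinks. Sketched in Python below; the same expression could be written in Stan using its `log_sum_exp` function as `-tau * log_sum_exp(-theta / tau)`.

```python
import numpy as np

def softmin(theta, tau=0.01):
    """Smooth approximation to min(theta): -tau * log(sum(exp(-theta / tau))).
    Differentiable everywhere; approaches the exact min as tau -> 0."""
    # shift by the true min for numerical stability before exponentiating
    m = np.min(theta)
    return m - tau * np.log(np.sum(np.exp(-(theta - m) / tau)))

theta = np.array([0.98, 1.03, 1.10])
print(np.min(theta))          # 0.98
print(softmin(theta, 0.01))   # slightly below 0.98; tighter as tau shrinks
```

Placing the tight normal prior on `softmin(theta)` instead of `min(theta)` would remove the gradient kink at switch points, at the cost of a tuning parameter tau.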

What you've defined is continuous (though not continuously differentiable, I think); it might just not play well with the integrator.

I keep running into this too. In my current project we just constrain the differences to decay to zero at the boundaries, which gets you part of the way there. It quickly turns into SDEs that are tractable in Stan, just not very fast.