# Soft constraint with min/max - does this break continuous differentiability?

Hi all -

I am experimenting with a new way of identifying time-varying latent parameters by constraining the min or max of the parameter vector \theta_t, as in the following:

```stan
min(theta) ~ normal(1, 0.01);
```

The code works fine: it acts like a soft positivity constraint, not a fully hard one, and seems to sample well when the goal is identification in a model where \theta_t depends on \theta_{t-1} and so on. It appears to work somewhat better than Stan's built-in hard constraints, which don't always work well for identification.

The point in posting this, though, is to ask whether the use of min() or max() on a parameter vector in the model block will break continuous differentiability of the HMC proposal and lead to a minor but nonetheless real bias. My intuition is that it won't, because I am assigning a tight normal prior, but then again min() and max() each select a single element of the vector, so they are only piecewise differentiable.
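To make the worry concrete, here is a small numerical sketch (in Python rather than Stan, with function names of my own) of what the `min(theta) ~ normal(1, 0.01)` statement does to the target: the log density itself is continuous in theta, but its gradient jumps from one component to another whenever the argmin switches.

```python
import numpy as np

def logpdf_min(theta, mu=1.0, sigma=0.01):
    """Normal log density (up to a constant) evaluated at min(theta)."""
    return -0.5 * ((np.min(theta) - mu) / sigma) ** 2

def grad_logpdf_min(theta, mu=1.0, sigma=0.01):
    """Analytic gradient: only the argmin component gets a nonzero derivative."""
    g = np.zeros_like(theta)
    i = np.argmin(theta)
    g[i] = -(theta[i] - mu) / sigma ** 2
    return g

# Two points on either side of a "switch": the two smallest entries swap roles.
a = np.array([0.999, 1.001])
b = np.array([1.001, 0.999])

print(logpdf_min(a), logpdf_min(b))            # identical: the density is continuous
print(grad_logpdf_min(a), grad_logpdf_min(b))  # the nonzero entry jumps between components
```

So the target stays continuous (no jump in the log density for the leapfrog integrator to see), but there is a kink: the gradient is discontinuous exactly where the two smallest parameters cross.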

Any thoughts from HMC gurus? I could run a simulation to try to figure it out, but it would be fairly complicated as the bias is likely to be hard to detect.

I'm not sure when you get HMC guru status, but the key to thinking about this is whether the posterior becomes discontinuous under the transformation. The log density itself stays continuous, since min() is a continuous function of theta; what jumps is the gradient. When the two smallest parameters switch, the normal log density is suddenly being calculated on a different parameter, so you'd think it's a problem. OTOH, at the time of the switch the two are very close in value, something near the integrator stepsize multiplied by the weights (inverse weights?). So as long as the resulting error in the leapfrog integrator is within its numerical tolerances, you won't have a problem. For reference, lots of Stan functions have errors around 1e-8; 1e-6 is less common. If you use this trick with a bivariate normal and look at the acceptance probability around switch points, you might see whether it dips there.


Thanks for the idea, I will give that a go. For the record, doing this doesn't cause divergent transitions, at least in the models I've run.

I suppose a continuous analogue would be to fit some kind of distribution over \theta_t that constrains the min or max, but I'm not sure what such a distribution would be.
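For what it's worth, a standard smooth stand-in for the hard min (my suggestion, not something proposed in the thread) is the soft-min built from log-sum-exp, which is differentiable everywhere and converges to the exact min as the temperature shrinks. Sketched in Python below; the same expression could be written in Stan using its `log_sum_exp` function as `-tau * log_sum_exp(-theta / tau)`.

```python
import numpy as np

def softmin(theta, tau=0.01):
    """Smooth approximation to min(theta): -tau * log(sum(exp(-theta / tau))).
    Differentiable everywhere; approaches the exact min as tau -> 0."""
    # shift by the true min for numerical stability before exponentiating
    m = np.min(theta)
    return m - tau * np.log(np.sum(np.exp(-(theta - m) / tau)))

theta = np.array([0.98, 1.03, 1.10])
print(np.min(theta))          # 0.98
print(softmin(theta, 0.01))   # slightly below 0.98; tighter as tau shrinks
```

Placing the tight normal prior on `softmin(theta)` instead of `min(theta)` would remove the gradient kink at switch points, at the cost of a tuning parameter tau.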

What you've defined is continuous (though not continuously differentiable, I think); it might just not play well with the integrator.

I keep running into this too. In my current project we just constrain the differences to decay to zero at the boundaries, which gets you part of the way there. It quickly turns into SDEs that are tractable in Stan, just not very fast.