I’m wondering if there is a way (even a backend hack) to force Stan to update a subset of parameters using random-walk Metropolis-Hastings and the rest via HMC?
The reason is that I’m interested in changepoint models with the changepoint as a parameter, and I can’t marginalize out the segment likelihood. The likelihood is not continuous with respect to the changepoint, but it is continuous in the other parameters.
In theory, you can write your own code to do the MH step, and then call Stan for one iteration treating the changepoint as data. (Note: I am assuming that this procedure can be expected, subject to the usual caveats about convergence, to yield the correct posterior. I’m not the sort of person who would know for sure off the top of my head whether this is true.) In practice, the question is how to adapt the HMC parameters (step size, mass matrix) to a configuration that works well across the entire range of changepoints that are consistent with the data. Maybe by taking a plausible changepoint, adapting to it, and then substantially reducing the adapted step size? Or by fixing the mass matrix and doing a backend hack to adapt the step size yourself via your own implementation of dual averaging?
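To make the alternating scheme concrete, here is a minimal pure-Python sketch of it on a toy problem. Everything in it is an assumption for illustration: the two-segment Normal model, the priors, the function names, and above all the hand-written `hmc_step_mu`, which only stands in for "call Stan for one iteration with the changepoint as data" and does not use any Stan API. The step size and number of leapfrog steps are fixed by hand, with no adaptation.

```python
import math
import random

def log_post(tau, mu, x, y, lo, hi):
    """Unnormalized log posterior for a toy two-segment model (all assumptions):
    y_i ~ Normal(mu[0], 1) if x_i < tau else Normal(mu[1], 1),
    mu[k] ~ Normal(0, 10), tau ~ Uniform(lo, hi)."""
    if not lo < tau < hi:
        return -math.inf
    lp = -0.5 * (mu[0] ** 2 + mu[1] ** 2) / 100.0
    for xi, yi in zip(x, y):
        lp -= 0.5 * (yi - (mu[0] if xi < tau else mu[1])) ** 2
    return lp

def grad_mu(tau, mu, x, y):
    """Gradient of log_post with respect to mu, holding tau fixed."""
    g = [-mu[0] / 100.0, -mu[1] / 100.0]
    for xi, yi in zip(x, y):
        k = 0 if xi < tau else 1
        g[k] += yi - mu[k]
    return g

def mh_step_tau(tau, mu, x, y, lo, hi, scale, rng):
    """Random-walk Metropolis update of the changepoint only."""
    prop = tau + rng.gauss(0.0, scale)
    log_ratio = log_post(prop, mu, x, y, lo, hi) - log_post(tau, mu, x, y, lo, hi)
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return prop if math.log(u) < log_ratio else tau

def hmc_step_mu(tau, mu, x, y, lo, hi, eps, n_leapfrog, rng):
    """One HMC update of mu with tau treated as data (stand-in for a Stan iteration)."""
    q = list(mu)
    p = [rng.gauss(0.0, 1.0) for _ in q]
    h0 = -log_post(tau, q, x, y, lo, hi) + 0.5 * sum(pi * pi for pi in p)
    g = grad_mu(tau, q, x, y)
    for _ in range(n_leapfrog):
        # Leapfrog: half step in momentum, full step in position, half step in momentum.
        p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]
        q = [qi + eps * pi for qi, pi in zip(q, p)]
        g = grad_mu(tau, q, x, y)
        p = [pi + 0.5 * eps * gi for pi, gi in zip(p, g)]
    h1 = -log_post(tau, q, x, y, lo, hi) + 0.5 * sum(pi * pi for pi in p)
    u = 1.0 - rng.random()
    return q if math.log(u) < h0 - h1 else list(mu)

def run_sampler(x, y, n_iter, seed=0, scale=1.0, eps=0.1, n_leapfrog=8):
    """Alternate the MH update for tau with the HMC update for mu."""
    rng = random.Random(seed)
    lo, hi = min(x), max(x)
    tau, mu = 0.5 * (lo + hi), [0.0, 0.0]
    draws = []
    for _ in range(n_iter):
        tau = mh_step_tau(tau, mu, x, y, lo, hi, scale, rng)
        mu = hmc_step_mu(tau, mu, x, y, lo, hi, eps, n_leapfrog, rng)
        draws.append((tau, mu[0], mu[1]))
    return draws
```

Because `scale`, `eps`, and `n_leapfrog` are held fixed, each update leaves the joint posterior invariant, which is the usual Metropolis-within-Gibbs argument. The tuning worry is visible even here: the curvature in `mu` depends on how many points fall in each segment, so a fixed `eps` has to be small enough to stay stable for every changepoint the chain might visit.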
One potential alternative is to imagine that the covariate with the changepoint is measured with some error (e.g. Gaussian) whose standard deviation you define a priori. Now the likelihood is no longer discontinuous, because as the changepoint “slides past” a datapoint the likelihood receives continuously reweighted contributions from the possibility that the datapoint is above the changepoint and the possibility that the datapoint is below the changepoint. In the limit of choosing a very small standard deviation, you recover the original changepoint model, which might be desirable if you want to treat this whole thing as a hack to get a differentiable likelihood rather than as a principled model of measurement noise. However, in this limit the derivatives get increasingly unwieldy and the likelihood contorts into something that’s impossible to sample. As you pick larger and larger standard deviations, sampling should get progressively more tractable, but you move further away from the original model that you described above.
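A small sketch of the reweighting, again in plain Python rather than Stan. It assumes unit-style Normal segments with known means, and it assumes that the probability a datapoint's true covariate lies below the changepoint is `Phi((tau - x_i) / sigma_x)`; the names `hard_loglik`, `smoothed_loglik`, and `sigma_x` are made up for this illustration.

```python
import math

def normal_logpdf(y, mu, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mu) / sd) ** 2

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_mix(w, lp_a, lp_b):
    """log(w * exp(lp_a) + (1 - w) * exp(lp_b)), skipping zero-weight terms."""
    terms = []
    if w > 0.0:
        terms.append(math.log(w) + lp_a)
    if w < 1.0:
        terms.append(math.log1p(-w) + lp_b)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def hard_loglik(tau, x, y, mu_lo, mu_hi, sd):
    """Original model: points with x < tau use mu_lo, the rest use mu_hi."""
    return sum(normal_logpdf(yi, mu_lo if xi < tau else mu_hi, sd)
               for xi, yi in zip(x, y))

def smoothed_loglik(tau, x, y, mu_lo, mu_hi, sd, sigma_x):
    """Each point is a two-component mixture, weighted by the probability
    that its true covariate lies below tau."""
    total = 0.0
    for xi, yi in zip(x, y):
        w = normal_cdf((tau - xi) / sigma_x)
        total += log_mix(w, normal_logpdf(yi, mu_lo, sd), normal_logpdf(yi, mu_hi, sd))
    return total
```

Evaluating both functions on either side of a datapoint shows the difference: `hard_loglik` jumps as `tau` crosses the point, while `smoothed_loglik` changes gradually, and shrinking `sigma_x` toward zero makes the smoothed version approach the hard one.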
Edit: another option is to discretize the variable with the changepoint, and then marginalize as in the “Change point models” section (7.2) of the Stan User’s Guide.
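For reference, the marginalization amounts to a log-sum-exp over the candidate changepoints. The sketch below is plain Python, not the Stan program from the guide; it assumes Poisson counts with an early and a late rate and a uniform prior over the discrete changepoint `s`, and the function names are invented for this example.

```python
import math

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def changepoint_lp(counts, early_rate, late_rate):
    """lp[s - 1] = log p(s) + log p(counts | s, rates) for s = 1..T,
    with a uniform prior on s and the early rate applying when t < s."""
    T = len(counts)
    lp = []
    for s in range(1, T + 1):
        ll = sum(poisson_logpmf(d, early_rate if t < s else late_rate)
                 for t, d in enumerate(counts, start=1))
        lp.append(-math.log(T) + ll)
    return lp

def marginal_loglik(counts, early_rate, late_rate):
    """Log likelihood of the rates with the changepoint summed out."""
    return log_sum_exp(changepoint_lp(counts, early_rate, late_rate))

def changepoint_posterior(counts, early_rate, late_rate):
    """Posterior over s given the rates, recovered from the same terms."""
    lp = changepoint_lp(counts, early_rate, late_rate)
    z = log_sum_exp(lp)
    return [math.exp(v - z) for v in lp]
```

`marginal_loglik` is smooth in the rates, which is what makes it usable with HMC, and `changepoint_posterior` shows that the changepoint can still be recovered afterwards from the per-`s` terms. As written the loop is quadratic in the number of observations.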
Many thanks for your prompt and detailed response. There is a lot for me to think about, but this definitely helps.