From your description it sounds like you might be talking about the *tempering during a trajectory* approach described by Radford Neal in Section 5.7 of his *MCMC using Hamiltonian Dynamics* chapter in the *Handbook of Markov chain Monte Carlo*? If so, I think you are correct that it would be problematic to use this scheme within a no-U-turn sampler (NUTS), or more specifically the dynamic multinomial HMC scheme currently used in Stan (which includes several improvements on the original NUTS algorithm), both because the number of integrator steps in each trajectory is not fixed or known at the start of the trajectory, and because the trajectory is integrated both forwards and backwards in time.

There are a variety of other related ‘tempering’ schemes that can be used with a dynamic HMC algorithm, however. Simulated tempering augments the original state the target distribution is defined on with an auxiliary (inverse) ‘temperature’ variable. A Markov chain is then simulated by alternating transitions which update the original state for a fixed value of the inverse temperature with transitions which update the inverse temperature for a fixed value of the original state. A dynamic HMC algorithm can be used to implement the former transitions (of the original state given the inverse temperature). This could potentially be done with Stan by declaring the inverse temperature variable in the data block and using a loop in the language you are using to interface with Stan to alternate between sampling a single (non-adaptive) chain transition with Stan, with the current value of the inverse temperature passed in as data, and transitions updating the inverse temperature variable (which in the original simulated tempering algorithm is discrete).
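To make the alternating scheme concrete, here is a minimal self-contained sketch in Python/NumPy on a toy one-dimensional bimodal target. A random-walk Metropolis update stands in for the Stan-implemented HMC transition of the state given the inverse temperature, and the names (`update_state`, `update_beta`, the temperature ladder) are purely illustrative; the per-temperature weights that a full simulated tempering implementation would adapt to flatten the temperature marginal are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_base(x):
    # base distribution at beta = 0: standard normal
    return -0.5 * x**2

def log_target(x):
    # bimodal target at beta = 1: mixture of N(3, 1) and N(-3, 1)
    return np.logaddexp(-0.5 * (x - 3.0)**2, -0.5 * (x + 3.0)**2)

def log_joint(x, beta):
    # geometric bridge between base and target
    return (1.0 - beta) * log_base(x) + beta * log_target(x)

betas = np.linspace(0.0, 1.0, 11)  # discrete inverse-temperature ladder

def update_state(x, beta, n_steps=10, scale=1.0):
    # stand-in for the Stan/HMC transition at fixed inverse temperature
    for _ in range(n_steps):
        prop = x + scale * rng.normal()
        if np.log(rng.uniform()) < log_joint(prop, beta) - log_joint(x, beta):
            x = prop
    return x

def update_beta(x, i):
    # Metropolis move on the ladder index, proposing a neighbour
    j = int(min(max(i + rng.choice([-1, 1]), 0), len(betas) - 1))
    if np.log(rng.uniform()) < log_joint(x, betas[j]) - log_joint(x, betas[i]):
        i = j
    return i

x, i = 0.0, 0
samples = []
for _ in range(2000):
    x = update_state(x, betas[i])
    i = update_beta(x, i)
    if i == len(betas) - 1:  # retain only samples at beta = 1
        samples.append(x)
```

In a real Stan-based implementation `update_state` would instead call the interface to run a single non-adaptive transition with `betas[i]` passed in as data.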

A potentially simpler alternative is the continuously tempered Hamiltonian Monte Carlo algorithm (disclosure: I am an author on the paper so my viewpoint might be a bit biased here!). Rather than augmenting the original state with a discrete inverse temperature variable, the key idea here is to instead use a continuous inverse temperature variable. We can then jointly update both the original state and the inverse temperature using a dynamic HMC algorithm targeting the augmented joint distribution. This means the sampling can be performed entirely within Stan, without the need to manually deal with separately updating the inverse temperature variable.
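As a simplified illustration of the augmented joint distribution idea (not the exact parameterisation used in the paper), here is a sketch in Python/NumPy: an unconstrained variable is mapped to an inverse temperature in (0, 1) via a logistic transform, with the log-Jacobian included so the joint density is well defined. In Stan the unconstrained variable would be declared as a parameter and beta computed as a transformed parameter, with the whole expression going into the model block:

```python
import numpy as np

def log_base(x):
    # base distribution at beta = 0: standard normal
    return -0.5 * np.sum(x**2)

def log_target(x):
    # example target at beta = 1: a shifted, narrower Gaussian
    return -0.5 * np.sum(((x - 2.0) / 0.5)**2)

def log_joint(x, u):
    # map unconstrained u to beta in (0, 1) with a logistic transform;
    # the log-Jacobian term accounts for the change of variables, so a
    # dynamic HMC sampler can target (x, u) jointly
    beta = 1.0 / (1.0 + np.exp(-u))
    log_jacobian = np.log(beta) + np.log1p(-beta)
    return (1.0 - beta) * log_base(x) + beta * log_target(x) + log_jacobian
```

Samples of the original state drawn at inverse temperatures close to one can then be retained (or importance-reweighted) to estimate expectations under the target.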

A key issue in both the original simulated tempering algorithm and the continuous tempering variant is the need to define a suitable ‘base’ distribution that the targeted distribution reduces to when the inverse temperature variable is equal to zero (with the original target distribution corresponding to an inverse temperature of one). A simple approach is to use the prior distribution as this base distribution; however, in the common situation that the posterior differs significantly from the prior, this can lead to poor mixing of the chains along the auxiliary inverse temperature axis. An approach we suggest in the continuously tempered HMC paper is to instead use a simple approximation to the posterior, for example a Gaussian approximation fitted using variational inference, as the base distribution, which often significantly improves performance.
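One crude way to see why the choice of base matters is to compare the variance of the log importance weights between base and target under samples from each candidate base: a large variance indicates poor overlap, which translates into poor mixing along the inverse temperature axis. The toy model and numbers below are invented for illustration, and a hand moment-matched Gaussian stands in for an actual variational fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy model: broad prior N(0, 10^2); the likelihood concentrates the
# (unnormalised) posterior near 5 with standard deviation about 0.5
def log_prior(x):
    return -0.5 * (x / 10.0)**2

def log_post(x):
    return log_prior(x) - 0.5 * ((x - 5.0) / 0.5)**2

def log_weight_var(log_p0, sample_p0, n=5000):
    # variance of log importance weights from base p0 to the posterior;
    # normalising constants only shift the weights so don't affect it
    x = sample_p0(n)
    return np.var(log_post(x) - log_p0(x))

# base 1: the prior itself
v_prior = log_weight_var(log_prior, lambda n: 10.0 * rng.normal(size=n))

# base 2: a Gaussian approximation to the posterior (moment-matched by
# hand here, as a stand-in for a variational inference fit)
mu, sd = 5.0, 0.6
v_approx = log_weight_var(
    lambda x: -0.5 * ((x - mu) / sd)**2,
    lambda n: mu + sd * rng.normal(size=n),
)
```

Running this, `v_approx` comes out orders of magnitude smaller than `v_prior`, reflecting the much better overlap of the approximate-posterior base with the target.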

Another related approach that I am less familiar with is *Adaptive Path Sampling and Continuous Tempering* - this shares some similarities with the continuously tempered HMC approach, but has a clever adaptive strategy for tuning the marginal distribution on the inverse temperature variable that (I think) helps address the issue I allude to above. @yuling may be able to give more details! Example code using Stan within this framework is available at GitHub - yao-yl/path-tempering: continuous tempering by path sampling.

An alternative class of Monte Carlo methods which are often well suited to working with multimodal distributions are sequential Monte Carlo samplers (SMCS) - the recent paper *An invitation to sequential Monte Carlo samplers* by Dai et al. is a very nice and comprehensive introduction. SMCS methods involve sequentially updating an ensemble of particles, with each iteration generating an approximation to one distribution in a sequence of target distributions. Each iteration requires updating each particle by applying a Markov transition leaving the current target distribution in the sequence invariant, computing weights for the particles based on the ratio of their densities under the current and previous target distributions, and then resampling the ensemble based on these weights. The sequence of target distributions can be specified to correspond to a ‘tempering’-like scheme which geometrically bridges from a base distribution to a target posterior distribution of interest, but other ‘paths’ of distributions can also be used - for example by sequentially introducing likelihood terms corresponding to individual observations.
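The reweight-resample-move loop described above can be sketched as follows in Python/NumPy, here with a fixed geometric tempering schedule on a toy bimodal target, multinomial resampling for simplicity, and a few random-walk Metropolis steps standing in for the HMC transitions that a real implementation might use:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_base(x):
    # broad Gaussian base distribution covering both modes
    return -0.5 * (x / 3.0)**2

def log_target(x):
    # bimodal target: mixture of N(3, 1) and N(-3, 1)
    return np.logaddexp(-0.5 * (x - 3.0)**2, -0.5 * (x + 3.0)**2)

def log_bridge(x, beta):
    # geometric bridge between base and target
    return (1.0 - beta) * log_base(x) + beta * log_target(x)

n = 1000
x = 3.0 * rng.normal(size=n)        # initial particles from the base
betas = np.linspace(0.0, 1.0, 21)   # fixed tempering schedule

for b_prev, b in zip(betas[:-1], betas[1:]):
    # reweight: ratio of bridge densities at new and previous temperature
    lw = log_bridge(x, b) - log_bridge(x, b_prev)
    w = np.exp(lw - np.max(lw))
    w /= w.sum()
    # resample (multinomial for simplicity; lower-variance schemes exist)
    x = x[rng.choice(n, size=n, p=w)]
    # move: random-walk Metropolis steps invariant for the bridge at b
    for _ in range(5):
        prop = x + rng.normal(size=n)
        accept = np.log(rng.uniform(size=n)) < log_bridge(prop, b) - log_bridge(x, b)
        x = np.where(accept, prop, x)
```

After the final iteration the particles approximate the target, with both well-separated modes represented because the split between modes is established at the near-flat early temperatures.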

The Markov transitions leaving each target distribution in the path invariant can use dynamic HMC algorithms such as that implemented in Stan, and in fact there is an example in the *invitation to SMCS* paper of using Stan within an SMCS to target the posterior distribution of a susceptible-infected-recovered ordinary differential equation model (code available at GitHub - pierrejacob/smcsamplers). Because they require simulating an ensemble of particles, SMCS methods can be somewhat computationally expensive compared to running a few chains with a standard MCMC method. Compared to the simulated / continuous tempering approaches described above, however, they are often much more robust to mismatch between the base and target distributions, particularly if an adaptive scheme is used to choose the next distribution on the path so as to maintain a lower bound on the estimated effective sample size.
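That adaptive choice of the next inverse temperature can be sketched as a bisection: under a geometric bridge the incremental log-weight from `b_prev` to a candidate `b` is `(b - b_prev) * (log_target(x) - log_base(x))`, so we can search for the largest `b` keeping the estimated effective sample size above a chosen fraction of the ensemble size. The function and parameter names (`next_beta`, `target_frac`) are illustrative, not from any particular package:

```python
import numpy as np

def ess(lw):
    # effective sample size of the normalised importance weights
    w = np.exp(lw - np.max(lw))
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def next_beta(x, log_base, log_target, b_prev, target_frac=0.5):
    # largest beta in (b_prev, 1] keeping ESS >= target_frac * n,
    # found by bisection; ESS is monotone decreasing in beta here
    n = len(x)
    delta = log_target(x) - log_base(x)

    def ess_at(b):
        return ess((b - b_prev) * delta)

    if ess_at(1.0) >= target_frac * n:
        return 1.0  # can jump straight to the target
    lo, hi = b_prev, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if ess_at(mid) >= target_frac * n:
            lo = mid
        else:
            hi = mid
    return lo
```

Each SMCS iteration would call this to pick the next distribution on the path, reweight and resample at the returned `b`, and repeat until `b` reaches one.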