Poor mixing with inverse transform

I’m playing with transformations, currently with a toy model that mirrors what I aim to later do in a larger model. Here is my simple toy model:
x_{i,j} \sim \mathrm{Normal}(m_{i}, \sigma)
m_{i} = 1/\hat{m}_{i}
\hat{m}_{i} \sim \mathrm{Normal}(0, \tau)
\sigma \sim \text{Half-Cauchy}(2.5)
\tau \sim \text{Half-Cauchy}(2.5)

The code below includes the log Jacobian determinant of the inverse transformation m_{x,i}=1/m_{y,i}, i.e. -2 log(m_{x,i}), and as far as I can tell this part is OK. Yet I get quite poor mixing, especially for \tau, as shown in the figure below. I’ve tried the reparameterization that usually works for hierarchical models, here m_hat[i] = m_hat_raw[i] * tau with m_hat_raw[i] ~ normal(0,1), but it doesn’t improve things much, so there is presumably something more I should adjust. I’ve used simulated data and, to give Stan a best-case scenario, initialized the chains at the values used for the simulation. Any suggestions for how to improve mixing in a model like this?

Edit: the figure is from an example with:

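data {
  int N;
  int n;
  real x[N,n];
}

parameters {
  real m_hat_raw[N];
  real<lower=0> sigma;
  real<lower=0> tau;
}

transformed parameters {
  real m[N];
  real m_hat[N];
  for (i in 1:N) {
    m_hat[i] = m_hat_raw[i] * tau;  // non-centered: implies m_hat[i] ~ normal(0, tau)
    m[i] = 1 / m_hat[i];            // inverse transform from the model above
  }
}

model {
  for (i in 1:N) {
    for (j in 1:n) {
      x[i,j] ~ normal(m[i], sigma);  // likelihood from the model above
    }
    // m_hat[i] ~ normal(0, tau);
    m_hat_raw[i] ~ normal(0, 1);
    target += log(m[i]^-2);          // the Jacobian term described in the text
  }
  tau ~ cauchy(0, 2.5);
  sigma ~ cauchy(0, 2.5);
}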

Hi, sorry for taking a bit too long to respond. Your question is relevant and well written.

I see two problems:

The (likely) bigger one is that the density is discontinuous at \hat{m}_i = 0. I suspect this is not a desirable feature; maybe you wanted to constrain the sign of \hat{m}_i? (That should make the model better behaved.) Alternatively, note that the implied distribution of m_i is likely similar to a Cauchy: if X, Y \sim N(0,1) are independent, then \frac{X}{Y} \sim \mathrm{Cauchy}(0,1). So maybe you want to use a Cauchy directly?
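To make the sign-constraint suggestion concrete, here is a minimal sketch, assuming you are happy to restrict \hat{m}_i to positive values. The data are declared as a matrix rather than the 2-D array from the post, and the comments reflect my reading of the toy model, so treat this as one possible variant and not a drop-in fix. All priors sit on declared parameters.

```stan
data {
  int<lower=1> N;                  // number of groups
  int<lower=1> n;                  // observations per group
  matrix[N, n] x;                  // assumption: same data as in the post, stored as a matrix
}
parameters {
  vector<lower=0>[N] m_hat_raw;    // lower bound keeps m_hat on one side of zero
  real<lower=0> sigma;
  real<lower=0> tau;
}
transformed parameters {
  vector[N] m;
  for (i in 1:N) {
    m[i] = 1 / (tau * m_hat_raw[i]);   // m_hat[i] = tau * m_hat_raw[i] > 0
  }
}
model {
  m_hat_raw ~ normal(0, 1);        // acts as a half-normal because of the lower bound
  tau ~ cauchy(0, 2.5);
  sigma ~ cauchy(0, 2.5);
  for (i in 1:N) {
    x[i] ~ normal(m[i], sigma);    // row i of x shares the mean m[i]
  }
}
```

With the lower bound, m_i can no longer jump between large positive and large negative values as \hat{m}_i crosses zero. The Cauchy alternative would instead declare m as a parameter and put a Cauchy prior on it directly, with a scale you would have to choose.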

The second is the Jacobian adjustment: it is needed if and only if you put transformed parameters “on the left of sampling statements”, i.e. if you wrote m = 1/mhat; m ~ normal(0,1), you would need it. If the only quantities “on the left of sampling statements” are non-transformed parameters or data, you don’t need it. So your model has an extra term that distorts it.
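For contrast, here is a minimal sketch of the case where the adjustment is needed, i.e. where the prior is stated on the transformed quantity m = 1/mhat. It is a standalone illustration and not your model: I constrain m_hat to be positive only so that the log is well defined without an absolute value.

```stan
parameters {
  real<lower=0> m_hat;             // assumption: positive, to keep the example simple
}
transformed parameters {
  real m = 1 / m_hat;              // nonlinear transform of a parameter
}
model {
  m ~ normal(0, 1);                // prior stated on the transformed quantity
  // Change of variables: |dm/dm_hat| = m_hat^-2, so add its log to the target.
  target += -2 * log(m_hat);
}
```

If the sampling statement were on m_hat (or m_hat_raw) instead, the `target +=` line would have to go, which is the situation in your code.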

Best of luck with modelling!