Chains don't mix where I think they should

I simplified my model down to a piece that is giving me grief on simulated data. In the simplified model, I want to predict y according to the likelihood:

y_i\sim\mathcal{N}\left( \frac{g_i}{r_i}x_i , \sigma\right)

where y_i and x_i are observed in the data, and g_i and r_i are parameters. Since this is a hierarchical model, I wish to marginalize out g_i and r_i (with r_i > 0) according to their priors:

g_i \sim\mathcal{N}(\bar{g},\sigma_g) \\ r_i \sim \text{Exponential}(\bar{r})

where \bar{g}, \sigma_g, and \bar{r} are global hyper-parameters (I’ll ignore the hyper-priors for now to keep things simple). It’s important elsewhere in my model to keep g_i and r_i separate, which is why I don’t combine them into a single coefficient parameter – say, \beta_i = g_i/r_i – and call it done.

The issue is that, while the coefficient on x_i – the ratio g_i/r_i – can be identified, the individual parameters cannot. At least, not without more data.
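
To make the non-identifiability explicit: for any c > 0, rescaling both parameters leaves the likelihood unchanged,

\frac{c\,g_i}{c\,r_i}x_i = \frac{g_i}{r_i}x_i

so observations on (x_i, y_i) alone pin down only the ratio, and the posterior has a ridge along this scaling direction.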

To identify the individual parameters, I introduce an observation on r_i with error.

t_i\sim\mathcal{N}(r_i,\sigma_r)

where t_i is also observed in the data.

So I thought I was good: since the ratio g_i/r_i could be identified from the (x_i, y_i) pairs, and r_i could be identified from t_i, I should be able to recover g_i as their product. So I thought…

I simulate fake data in the transformed parameters block.

transformed data {
  int N = 1000;
  
  // Seed hyper-parameter values
  real g_bar_   = 0;
  real g_sigma_ = 1;
  real r_bar_   = 6;  // exponential rate, so E[r_] = 1/6
  
  // Draw values of g and r
  vector[N] g_;
  vector[N] r_;
  for (n in 1:N) g_[n] = normal_rng(g_bar_, g_sigma_);
  for (n in 1:N) r_[n] = exponential_rng(r_bar_);
  
  // Simulate observations on x, y, and t
  real sigma_   = 1;
  real sigmar_  = 1;
  vector[N] x;
  vector[N] y;
  vector[N] t;
  for (n in 1:N) x[n] = normal_rng(0, 10);
  for (n in 1:N) y[n] = normal_rng(x[n] * g_[n] / r_[n], sigma_);
  for (n in 1:N) t[n] = normal_rng(r_[n], sigmar_);
}

parameters {
  
  // Parameters
  vector[N] g;
  vector<lower=0>[N] r;
  real<lower=0> sigmar;
  real<lower=0> sigma;
  
  // Hyperparameters
  real g_bar; 
  real<lower=0> g_sigma;
  real<lower=0> r_bar; 

}

model {
  
  // Hyperpriors
  g_bar ~ normal(0,1);
  g_sigma ~ exponential(1);
  r_bar ~ exponential(1);
  
  // Priors
  g ~ normal(g_bar,g_sigma);
  r ~ exponential(r_bar);
  sigmar ~ exponential(1);
  sigma ~ exponential(1);
  
  // Likelihood
  t ~ normal(r, sigmar); 
  y ~ normal(g .* x ./ r, sigma);
}

What stumps me is that most post-warmup transitions exceed the maximum treedepth (there are often very few or no divergent transitions). Beyond that, the results are what you'd expect when chains don't converge (i.e. low ESS, high Rhat, wild traceplots). (Switching to a non-centered parameterization, along the lines sketched below, does not appear to resolve the issue either.)
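
For concreteness, this is roughly what I mean by non-centering g (a sketch showing only the g-related pieces; everything else stays as above):

parameters {
  vector[N] g_raw;  // standard-normal innovations for g
  // ... remaining parameters as above ...
}

transformed parameters {
  // Shift and scale, so that g ~ normal(g_bar, g_sigma) implicitly
  vector[N] g = g_bar + g_sigma * g_raw;
}

model {
  g_raw ~ std_normal();  // replaces g ~ normal(g_bar, g_sigma)
  // ... remaining priors and likelihood as above ...
}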

The pairs plot suggests some multi-modality, but I can't tell whether this would resolve with more efficient sampling. (Pairs plot omitted.)

I’m familiar with the posted advice: "We do not generally recommend increasing max treedepth. In practice, the max treedepth limit being reached can be a sign of model misspecification, and to increase max treedepth can then result in just taking longer to fit a model that you don’t want to be fitting."

I’m stumped on how the model could be misspecified or where I can go from here. Any insight would really help.

You can’t simulate “data” in the transformed parameters block because it will change every iteration. Instead, define simulated data in the transformed data block. If you’re testing against the simulated data, this is likely your first problem.

However the data’s simulated, that model is going to be hard to identify. You have two parameters (g_i, r_i) that vary with only two observations (y_i, t_i) per unit. The prior might save you, but it’s going to have to do a lot of heavy lifting.
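
The scales in the posted simulation make this concrete: exponential_rng is rate-parameterized, so r_bar_ = 6 implies

E[r_i] = 1/\bar{r} = 1/6 \approx 0.17

while sigmar_ = 1. At that scale, t_i \sim \mathcal{N}(r_i, 1) is nearly pure noise relative to r_i, so t_i barely constrains r_i, and any estimate of g_i built from the ratio times r_i inherits that uncertainty.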

Thanks, @Bob_Carpenter !

That was indeed a typo. I think the Stan code itself is correct: the data is simulated in the transformed data block, not the transformed parameters block as I wrote. Wish I’d caught that sooner.