Hi Anton.

Happy to help as much as I can! I've tried a couple of things and haven't had much luck with the non-centered parameterization in rstan; the naive strategy that I put forth in my first reply removes all dependence of u on the "zeta" regression, which is no good.

I spent a little time trying to optimize the centered parameterization and had one realization: the model as you currently have it specified has an identifiability issue. Stan is not able to uniquely estimate sigma_u (the scale of the unobserved u's about the "zeta" regression) and sigma (the scale of the observed data about the estimated multi-regression). Because the u's are unobserved and inform the expected value of each data point y[i], the model has no way to separate variability in the y[i]'s due to uncertainty in u (i.e., sigma_u) from variability about the multi-regression trend (sigma). Therefore, you might want to have the model estimate only a single scale parameter, sigma, for the multi-regression, and set the scale parameter for the u regression to a constant. Whether that constant is accurate is largely irrelevant: the model's estimate of sigma will adjust if the constant under- or over-estimates the variability in u.

To show what I mean, I stripped away a lot of the model complexity and focused on just the truncated normal parameter u and its use in a regression model. Here is the R code I used for a simplified simulation of just this part of the generative model:

Simulation.R (1020 Bytes)
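For readers without the attachment, the simulation is along these lines (a sketch, not the exact attached script; N, Q, and the parameter values here are illustrative, and I use the truncnorm package for the truncated normal draws):

```
# Sketch of the simplified generative model (illustrative values,
# not necessarily those in the attached Simulation.R)
library(truncnorm)
set.seed(1)
N <- 200; Q <- 2
Z <- matrix(rnorm(N * Q), N, Q)      # regression covariates
alpha <- 0; zeta <- c(1, 1)          # u-regression intercept and slopes
sigma_u <- 0.25; sigma <- 1          # the two scale parameters
mu_u <- as.vector(alpha + Z %*% zeta)
u <- rtruncnorm(N, a = 0, mean = mu_u, sd = sigma_u)  # truncated at zero
Y <- rnorm(N, mean = -u, sd = sigma)                  # observed data
```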

And here is the stripped down Stan model I used:

```
data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> Q; // number of explanatory variables
  matrix[N,Q] Z;  // matrix for regression
}
parameters {
  vector[Q] zeta;
  real alpha;
  real<lower=0> sigma_u;
  vector<lower=0>[N] u;
  real<lower=0> sigma;
}
model {
  // u regression intercept and sd
  alpha ~ normal(0, 0.5);
  sigma_u ~ normal(0.25, 1);
  // data regression sd
  sigma ~ gamma(1, 1);
  // Z regression slope
  zeta ~ normal(1, 0.25);
  u ~ normal(alpha + Z * zeta, sigma_u);
  Y ~ normal(-u, sigma);
}
```

And here's a pairs plot from one run of the simulation + model (there were lots of warnings about Rhats and low Bayesian fraction of missing information estimates):

Notice the striking correlation between sigma and sigma_u, which is evidence of the identifiability issue I mentioned.
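You can see why with a quick back-of-the-envelope check: ignoring the truncation for a moment, Y = -u + noise, so Var(Y | Z) = sigma_u^2 + sigma^2, and only the sum of the two variances is constrained by the data. This little R snippet (illustrative, not part of the attached scripts) shows two different (sigma_u, sigma) pairs producing the same marginal variance:

```
# Two different (sigma_u, sigma) pairs, same marginal variance of Y
set.seed(2)
n <- 1e5
sim_var <- function(sigma_u, sigma) {
  u <- rnorm(n, 0, sigma_u)
  var(rnorm(n, -u, sigma))
}
sim_var(0.3, 0.4)  # both are ~0.25 = 0.3^2 + 0.4^2
sim_var(0.4, 0.3)
```

The truncation at zero changes the exact numbers but not the basic point: the likelihood only pins down the combined scale.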

Using this model instead, in which the only change is replacing sigma_u with a constant, significantly improved the model fit and sampling performance:

```
data {
  int<lower=0> N;
  vector[N] Y;
  int<lower=0> Q; // number of explanatory variables
  matrix[N,Q] Z;  // matrix for regression
}
parameters {
  vector[Q] zeta;
  real alpha;
  vector<lower=0>[N] u;
  real<lower=0> sigma;
}
model {
  // u regression intercept
  alpha ~ normal(0, 0.5);
  // data regression sd
  sigma ~ gamma(1, 1);
  // Z regression slope
  zeta ~ normal(1, 0.25);
  u ~ normal(alpha + Z * zeta, 1);  // sigma_u fixed to a constant
  Y ~ normal(-u, sigma);
}
```

And the model's estimates of the u's agree well with the simulated values, as shown in this plot (red line is y = x):
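That comparison plot can be made with something like the following (a sketch; `fit` here is a hypothetical stanfit object from sampling the model above, and `u` is the vector of simulated values):

```
# Compare posterior means of u with the simulated values
# ('fit' and 'u' are hypothetical names from the simulation above)
u_hat <- colMeans(rstan::extract(fit, pars = "u")$u)
plot(u, u_hat, xlab = "simulated u", ylab = "posterior mean of u")
abline(0, 1, col = "red")  # the y = x line
```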

You might also notice that I changed the priors for several of the parameters as well. That is because I did some prior predictive simulations to make sure that there wasn't significant prior density below zero for u's location, though these changes did not seem to significantly alter model performance. The script to reproduce the prior predictive simulations is here:

Priors.R (1.3 KB)
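The basic idea of that check looks something like this (a sketch only, not the attached Priors.R; Q and the covariates are made up for illustration):

```
# Rough prior predictive check: draw alpha and zeta from the priors
# and look at the implied location of u (illustrative Q and Z)
set.seed(3)
Q <- 2
Z <- matrix(rnorm(100 * Q), 100, Q)
mu_draws <- replicate(1000, {
  alpha <- rnorm(1, 0, 0.5)
  zeta <- rnorm(Q, 1, 0.25)
  as.vector(alpha + Z %*% zeta)
})
mean(mu_draws < 0)  # fraction of prior mass putting u's location below zero
```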

I fit the over-simplified model to your data (leaving out all of the other covariate data), and the model fit really well, though there were still some tail effective sample size warnings. You could try some other tricks like standardizing your regression covariates (that has helped some of my complicated linear models). I haven't tried fitting a full model with all of the regression parameters to your data, but I hope this helps with the truncated normal parameter that seemed to be causing many of the model convergence issues.
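For the standardization trick, something like this works in R (a sketch; `Z` and `Y` are the data objects from above, and remember the zeta estimates are then on the standardized scale):

```
# Standardize each covariate column before passing to Stan
Z_std <- scale(Z)  # subtracts column means, divides by column sds
stan_data <- list(N = nrow(Z_std), Q = ncol(Z_std), Y = Y, Z = Z_std)
# To recover slopes per original unit afterwards:
# zeta_orig = zeta_std / attr(Z_std, "scaled:scale")
```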