Looking for any references or recommendations on how I might optimize Stan for modeling covariates in a linear model as a function of some latent variable. For example, let’s say the covariate data I have is distance, and the latent variable is the “true distance” at which some main effect is hypothesized to impact the outcome.
e.g.
data {
  int N; // sample size
  vector[N] y;
  vector[N] x;
  vector[N] dist;
}
parameters {
  real alpha;
  real beta_one;
  real beta_two;
  real<lower=0> theta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] X_dist;
  for (index in 1:N) {
    if (dist[index] <= theta)
      X_dist[index] = 1;
    else
      X_dist[index] = 0;
  }
}
model {
  // insert reasonable priors here
  y ~ normal(alpha + x * beta_one + X_dist * beta_two, sigma);
}
In attempting to fit models of this family, I’m running into both divergences and transitions that reach the maximum tree depth (though the fits still produce accurate estimates), and both issues are sensitive to the scale of the latently impacted parameters/covariates. In general, the model fits are not particularly consistent.
Any guidance on a better parameterization or method of fitting a model of this kind is welcome.
This is simulated data. Trying to follow the recommendations and build up slowly. I’m trying to implement a modeling scheme that aggregates covariates under a latent variable in the method I’ve shown above.
Regarding how many divergences - it varies depending on the scale of the main effect, e.g. beta_two in the example above. For large beta_two I get no divergences; for small beta_two I get quite a few. Still, those are not as plentiful as the max tree depth transition warnings: nearly all of my samples (say 98–100%) will have exceeded max_treedepth.
I’m guessing the posterior geometry specific to this data structure is complex - obviously it’s not exactly smooth.
I was looking at the change point model earlier today - I’ll take another look at that and see if I can find a similarity that will solve my problem.
In the meantime - I have some working demos here that should illustrate exactly what I’m going for - unfortunately, they’re no longer producing the same results on the upgraded version of Stan I’m now using.
This is just a suggestion. You are building a discontinuity into the transformed parameters, which HMC does not like. If that is not a hard, physical condition, you can make it smoother by making your X_dist a probability. Something like
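A sketch of what that smoothed indicator could look like (the exact form was not shown in the original reply; this is one plausible reading, with `lambda` a new steepness constant or parameter you would have to declare):

```stan
transformed parameters {
  vector[N] Xp;
  for (index in 1:N)
    // smooth sigmoid approximation to the step at theta;
    // approaches the 0/1 indicator as lambda grows large
    Xp[index] = inv_logit(lambda * (theta - dist[index]));
}
```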
The lambda controls the steepness of the invlogit transform. With lambda → infinity, you get your original model back. You can just plug Xp in for X_dist in the model, or you could use a mixture model with mixing probability Xp and two components (which I think is closer in spirit to your original model).
normal_lpdf(y[n] | alpha + x[n] * beta_one + beta_two, sigma);
normal_lpdf(y[n] | alpha + x[n] * beta_one, sigma);
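Putting those two components together with the smoothed mixing probability, a minimal sketch of the mixture likelihood (assuming a per-observation probability `Xp[n]` as described above) would be:

```stan
model {
  // insert reasonable priors here
  for (n in 1:N)
    // mix the "effect present" and "effect absent" components,
    // weighted by the smoothed probability Xp[n]
    target += log_mix(Xp[n],
                      normal_lpdf(y[n] | alpha + x[n] * beta_one + beta_two, sigma),
                      normal_lpdf(y[n] | alpha + x[n] * beta_one, sigma));
}
```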
To simulate data, you need to simulate from the priors. You didn’t include any priors, so it’s unclear how you simulated data. The model and simulation should match, and then you are guaranteed to get appropriate posterior coverage (i.e., calibrated posteriors) for data simulated from the priors. You might want to check out the paper by Cook, Gelman, and Rubin on sampling software validation.
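A minimal prior-predictive sketch of that simulation step as a standalone Stan program (the normal and exponential priors here are illustrative placeholders, not choices from the original post):

```stan
data {
  int N;
  vector[N] x;
  vector[N] dist;
}
generated quantities {
  // draw parameters from (placeholder) priors
  real alpha = normal_rng(0, 1);
  real beta_one = normal_rng(0, 1);
  real beta_two = normal_rng(0, 1);
  real theta = exponential_rng(1);
  real sigma = exponential_rng(1);
  // simulate outcomes conditional on those draws
  vector[N] y_sim;
  for (n in 1:N)
    y_sim[n] = normal_rng(alpha + x[n] * beta_one
                          + (dist[n] <= theta ? beta_two : 0), sigma);
}
```

Fitting the original model to y_sim drawn this way (with matching priors) is the setup under which the calibration guarantee holds.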