Modeling covariates that are functions of latent parameters

Looking for any references or recommendations on how I might optimize Stan for modeling covariates in a linear model as a function of some latent variable. for example, let’s say the covariate data I have is distance, and the latent variable is the “true distance” at which some main effect is hypothesized to be impact the outcome

e.g.

data{int N; // sample size
     vector[N] y; 
     vector[N] x; 
     vector[N] dist;}
parameters{ 
     real alpha;
     real beta_one
     real beta_two;
     real<lower=0> theta;
     real<lower=0> sigma;
}
transformed parameters{
 vector[N] X_dist;
 for(index in 1:N){
      if(dist[index]<=theta)
           X_dist[index] = 1;
      else
           X_dist[index] = 0;
  }
}
model{
 // insert reasonable priors here
 y ~ normal(alpha + x * beta_one + X_dist*beta_two, sigma);
 }

In attempting to fit models of this family, I’m finding issues with both divergence, reaching maximum tree depth (though it still results in accurate estimates), both of which are sensitive to the scale of the latently impacted parameters/covariates. In general, the model fits are not particularly consistent.

Any guidance on a better parameterization or method of fitting a model of this kind are welcome.

Assuming everything else is good, there’s probably some priors you could throw on this that would help it a lot (I see the comments but the priors themselves are missing?). Check here: https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

Maybe that’d help a little? How many divergences we talking?

Also, (the follow up question), how do you know the estimates are accurate if there are divergences :D? Is this simulated data?

Is this like a change point model? Do you have a plot of y vs x?

Questions questions!

This is simulated data. Trying to follow the recommendations and build up slowly. I’m trying to implement a modeling scheme that aggregates covariates under a latent variable in the method I’ve shown above.

Regarding how many divergences - it varies dependent upon the scale of the main effect, e.g. beta_two in the example above. For large beta_two I get no divergences, small I get quite a few. Still those are not as plentiful as the “max_tree_depth” transition warnings. Nearly all of my samples (say 98-100)% will “have exceeded max tree_depth”.

I’m guessing the posterior geometry specific to this data structure is complex - obviously it’s not exactly smooth.

I was looking at the change point model earlier today - I’ll take another look at that and see if I can find a similarity that will solve my problem.

In the meantime - I have some working demos here that should illustrate exactly what I’m going for - unfortunately, they’re no longer producing the same results on the upgraded version of Stan I’m now using.

1 Like

This is just a suggestion. You are building in a discontinuity in the transformed parameters which HMC does not like. If that is not a hard, physical condition, you can make it smoother by making your X_dist a probability. Something like

parameters{
  ...
  real<lower=0> lambda;
}
transformed parameters{
vector<lower=0, upper=1>[N] Xp;
Xp = invlogit(lambda(theta - dist));
}

The lambda controls the steepness of the invlogit transform. With lambda --> infinity, you get your model. You can just plug-in Xp for X_dist in the model or you could use a mixture model with probability Xp and two components (which I think is closer in spirit to your original model).

normal_lpdf(X | alpha + x * beta_one + beta_two, sigma);
normal_lpdf(X | alpha + x * beta_one, sigma);
1 Like

Hey @stijn, just checking to make sure I understand - is there a typo in your last two lines i.e.
it should be:

normal_lpdf(y |alpha + x* beta_one + beta_two, sigma);
normal_lpdf(y| alpha + x* beta_one, sigma);

not

normal_lpdf(X |alpha + x* beta_one + beta_two, sigma);
normal_lpdf(X| alpha + x* beta_one, sigma);

I like both ideas and I’ll see if I can develop them further.

Yes! Sorry about that.

To simulate data, you need to simulate from the priors. You didn’t include any priors, so it’s unclear how you simulated data. The model and simulation should match, and then you are guaranteed to get appropriate posterior coverage (i.e., calibrated posteriors) for data simulated from the priors. You might want to check out the paper by Cook, Gelman, and Rubin on sampling software validation.