Multiple 'target +=' statements in Stan

Hi,

I am wondering whether we can use multiple ‘target +=’ statements in Stan, as in the following simple normal model:

data {
  int<lower=0> N1;
  int<lower=0> N2;
  vector[N1] y1;
  vector[N1] x1_original;
  vector[N2] y2;
  vector[N2] x2_original;
}
parameters {
  real<lower=0> theta;
  real<lower=0> phi;
  real<lower=0> sigma;
  real<lower=0, upper=2> alpha;
  vector<lower=0>[3] beta;
}
transformed parameters {
  vector<lower=0>[N1] x1_transformed;
  vector<lower=0>[N2] x2_transformed;
  for (i in 1:N1) {
    x1_transformed[i] = theta * x1_original[i];
  }
  for (i in 1:N2) {
    x2_transformed[i] = phi * x2_original[i];
  }
}
model {
  sigma ~ uniform(0, 5);
  alpha ~ uniform(0, 2);
  for (i in 1:3) {
    beta[i] ~ lognormal(0, 10);
  }
  for (i in 1:N1) {
    target += normal_lpdf(y1[i] | beta[1] + beta[2] * x1_transformed[i],
                          sigma * (beta[1] + beta[2] * x1_transformed[i])^alpha);
  }
  for (n in 1:N2) {
    target += normal_lpdf(y2[n] | beta[1] + beta[2] * x2_transformed[n],
                          sigma * (beta[1] + beta[2] * x2_transformed[n])^alpha);
  }
}

In my understanding the model is correct, and I can estimate the parameters theta and beta well (‘well’ meaning the posterior means are close to the true values used to generate the simulated dataset) if I pass in only the y1 and x1 data. However, if I pass in all of y1, y2, x1, and x2, the estimates for beta, theta, and phi become very strange (far from the true values, and Rhat is much greater than 1 even if I increase the number of iterations). So I am wondering whether Stan allows multiple ‘target +=’ statements, or where my program might have gone wrong.
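For reference, here is a minimal sketch (in Python, with made-up "true" parameter values — purely an illustration, not the actual simulation code) of how a dataset matching this model's generative process could be produced:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" values, just for a simulation check.
theta, phi, sigma, alpha = 2.0, 0.5, 0.3, 1.0
beta = np.array([1.0, 0.8, 0.0])  # Stan's beta[1], beta[2], beta[3]

N1, N2 = 50, 50
x1_original = rng.uniform(0.1, 5.0, N1)
x2_original = rng.uniform(0.1, 5.0, N2)

# Mean and scale follow the model: mu = beta[1] + beta[2] * theta * x,
# sd = sigma * mu^alpha  (note Python's beta[0] is Stan's beta[1], etc.)
mu1 = beta[0] + beta[1] * theta * x1_original
mu2 = beta[0] + beta[1] * phi * x2_original
y1 = rng.normal(mu1, sigma * mu1**alpha)
y2 = rng.normal(mu2, sigma * mu2**alpha)
```

Fitting the Stan model to data simulated this way and checking whether the posterior recovers the chosen values is how the parameter recovery described above can be tested.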

Thx!

Yes, this is legal.

That being said, I have a few suggestions:

First, you can vectorize your normal_lpdf statements to avoid the for loops:


transformed parameters {
  // I'm defining these to include the betas, since that's how they're used
  vector<lower=0>[N1] x1_transformed = beta[1] + beta[2] * theta * x1_original;
  vector<lower=0>[N2] x2_transformed = beta[1] + beta[2] * phi * x2_original;
}
model {
  // Priors for sigma and alpha go here
  beta ~ lognormal(0, 10); // this is auto-vectorized
  // pow is vectorized; '^' does not apply elementwise to vectors
  target += normal_lpdf(y1 | x1_transformed, sigma * pow(x1_transformed, alpha));
  target += normal_lpdf(y2 | x2_transformed, sigma * pow(x2_transformed, alpha));
}

Secondly, your prior for sigma - uniform(0, 5) - does not match the support you’ve given it when it was declared (0, infinity). If you really believe it can only take values in that range, then you should declare it with an upper limit of 5. That said, hard constraints like this are almost never recommended; they can cause sampling problems. I’d recommend looking at the Prior Choice Recommendations for a better option. Also, you essentially never need to explicitly include a uniform prior; any parameter that doesn’t have a prior explicitly assigned gets an implicit uniform prior over its support.

Third, you have an unbounded, uniform prior on theta and phi. I’m guessing this is likely the source of many of your problems; distinguishing between the effects of beta[2] and theta/phi will be difficult in this model without well-considered priors.
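To see why that's hard, note that within the y1 likelihood alone only the product beta[2] * theta enters the mean, so rescaling one and inversely rescaling the other leaves the likelihood unchanged. A quick numerical sketch (hypothetical values, just to illustrate the scaling ridge):

```python
import numpy as np

# Hypothetical covariate and parameter values, purely for illustration.
x1 = np.array([0.5, 1.0, 2.0, 3.0])
beta1, beta2, theta = 1.0, 2.0, 3.0

# Mean of y1 in the model: beta[1] + beta[2] * theta * x1_original
mu_a = beta1 + beta2 * theta * x1

# Rescale beta[2] up and theta down by the same factor c.
c = 10.0
mu_b = beta1 + (beta2 * c) * (theta / c) * x1

# The two parameter settings produce identical means (and hence
# identical likelihood contributions for y1):
print(np.allclose(mu_a, mu_b))
```

Only the priors (and, jointly, the y2 likelihood through beta[2] * phi) can break this ridge, which is why flat priors on theta and phi make sampling so difficult here.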


Hi,

Thanks so much for your advice!

For your second point, do you mean that if I just want a weakly informative prior (e.g., sigma ~ uniform(0, 5)), then I should use ‘real<lower = 0, upper = 5> sigma’? (The link you provided states that a flat uniform prior over the support might not be a good choice.) Also, may I ask why hard constraints like this are almost never recommended and can cause sampling problems? (To me such a prior is weakly informative (level 3 in the link provided) and should be a recommended choice.)

For your third point, yes, thanks for pointing that out; I will add more constrained log-normal priors on them. I am also curious about the identifiability problem: how can I judge whether distinguishing between the effects of beta[2] and theta/phi will be a problem if I don’t have much information and can only use weakly informative priors?

A quick follow-up question: does the grammar of the Stan code seem to be correct?

Thanks so much for your help!

For the first question, I’ll quote the language from the compiler’s pedantic mode:

Hard constraints are not recommended, for two reasons: (a) Except when there are logical or physical constraints, it is very unusual for you to be sure that a parameter will fall inside a specified range, and (b) The infinite gradient induced by a hard constraint can cause difficulties for Stan’s sampling algorithm. As a consequence, we recommend soft constraints rather than hard constraints; for example, instead of constraining an elasticity parameter to fall between 0, and 1, leave it unconstrained and give it a normal(0.5,0.5) prior distribution.

Something like a (folded) Student-t prior with 7-14 degrees of freedom and a scale of 4-5 would probably work better for sigma in your case. In general, it’s a good idea to visualize the prior and determine the probability that it falls below/within a certain range before using it.
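For example, such a check can be sketched in Python with scipy (the 7 degrees of freedom and scale of 5 are just one candidate from the range above):

```python
from scipy import stats

# Candidate prior for sigma: a folded (half) Student-t with 7 df, scale 5.
df, scale = 7, 5
t = stats.t(df=df, loc=0, scale=scale)

# For a half-t folded at zero, P(sigma < c) = 2 * (F(c) - 0.5),
# where F is the CDF of the unfolded t distribution.
p_below_5 = 2 * (t.cdf(5.0) - 0.5)
p_below_20 = 2 * (t.cdf(20.0) - 0.5)
print(f"P(sigma < 5)  = {p_below_5:.3f}")
print(f"P(sigma < 20) = {p_below_20:.3f}")
```

If most of the prior mass sits in a range you consider plausible for sigma, the prior is doing its job without imposing a hard boundary.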

how can I judge whether distinguishing between the effects of beta[2] and theta/phi will be a problem if I don’t have much information and can only use weakly informative priors?

That could be a problem. The total information Stan has to work with is a combination of your data and your priors. If you have limited data and weak priors, you’re going to have trouble getting much useful inference even on well-identified models. I think this model would be rather difficult to fit without decent priors on beta[2], theta, and phi.


Thanks so much for your kind reply! If my Stan code is correct, then I can go on to improve the priors.

Thx!
