Constraints on parameters/data

Hi,

Recently I have been looking at the Stan User’s Guide, and I am thinking about the following two situations:

Scenario 1: Suppose theta is a parameter in my model and I know theta should be constrained to be positive. If I would like to put a diffuse normal prior on it, should I write it as follows:

parameters{
real <lower = 0> theta;
}
model{
log(theta) ~ N(0, 100^2);
}

Or would it be better to directly define the parameter as log_theta and write the following, then manually transform log_theta back to theta in generated quantities:

parameters{
real log_theta;
}
model{
log_theta ~ N(0, 100^2);
}
generated quantities {
real <lower = 0> theta = exp(log_theta);
}

And in either of these two ways, do I need to consider the Jacobian for the parameter transformation?

Scenario 2: Suppose Y is the observed data in my model and I know Y should be constrained to be positive (all the observed Y are positive). In the model I know Y is normal:

data{
int N;
real <lower = 0> Y[N];
}
parameters{
real mu;
real <lower = 0> sigma;
}
model{
Y ~ N(mu, sigma^2);
}

But I am curious: even if I force mu to be always positive, a normal distribution still has some probability of generating negative Y, which contradicts my restriction that Y should be positive. In this scenario, should I constrain it as in Scenario 1? (I should not use a log-transformation here since Y itself is normal, but maybe a truncated normal would work?)

In summary, I am wondering what the best solution is under Scenario 1 and Scenario 2 respectively; for Scenario 2 in particular, I think I am confused about the difference between Bayesian linear regression and Bayesian truncated/censored linear regression.

Thx!

Scenario 1:
In your code, you are putting a normal prior on the logarithm of theta. There are several equivalent ways to do this, including

parameters{
real <lower = 0> theta;
}
model{
theta ~ lognormal(0, 100);
}

or

parameters{
real log_theta;
}
model{
log_theta ~ normal(0, 100);
}
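
(A quick note on why these are equivalent: the log-normal density is the normal density on the log scale multiplied by the Jacobian factor 1/theta, so

lognormal_lpdf(theta | 0, 100) = normal_lpdf(log(theta) | 0, 100) - log(theta)

and sampling log_theta ~ normal(0, 100) and setting theta = exp(log_theta) therefore gives theta the same lognormal(0, 100) distribution.)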

The second method will require a Jacobian adjustment if and only if theta = exp(log_theta) appears on the left-hand side of an additional sampling statement in the model block.

But your text description of scenario 1 suggests that you want a truncated normal (not lognormal) prior on theta. This can be achieved simply with:

parameters{
real <lower = 0> theta;
}
model{
theta ~ normal(0, 100);
}

Alternatively, you could declare log_theta on the unconstrained scale, transform it, and sample it:

parameters{
real log_theta;
}
model{
real theta = exp(log_theta);
theta ~ normal(0, 100);
target += log_theta; //this is the Jacobian adjustment
}

Scenario 2:
There are many normal distributions (e.g. normal(100, .1)) that will, with overwhelming probability, yield datasets that are entirely positive. If you think that your data arise from such a normal distribution, then you don’t have to do anything special at all with constraints. On the other hand, if mean(Y) is close to zero and var(Y) is large (and Y contains more than a small handful of elements), then the observation that Y is always positive strongly suggests that Y does not arise from a normal distribution, and you’ll want to write down some model other than Y ~ normal(mu, sigma) (note the syntax here: Stan parameterizes the normal by its standard deviation, so it’s normal(mu, sigma), not N(mu, sigma^2)).
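
If you do want to keep a normal-like likelihood while respecting the positivity of Y, the truncated normal mentioned in the question is one such alternative. A minimal sketch, using the same data layout as the question (Stan’s truncation syntax is not vectorized, hence the loop; priors on mu and sigma are omitted):

data{
int<lower = 1> N;
real <lower = 0> Y[N];
}
parameters{
real mu;
real <lower = 0> sigma;
}
model{
for (n in 1:N)
Y[n] ~ normal(mu, sigma) T[0, ]; // normal likelihood truncated to positive values
}

Whether a truncated normal is really a sensible generative story for your Y is a separate question from the mechanics of writing it down.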



Thanks so much for your reply!

According to your answers, I have three questions:

  1. In Scenario 1, the reason I use a log-normal is that the log-normal has positive support, and the domain of log(theta) is the whole real line, so I can put a diffuse prior on it and let the model learn from the data. I am curious why you think that in this scenario I need a truncated normal prior instead of a log-normal prior?

  2. For the following model, I am wondering why it is a truncated normal, because to me it looks like just a normal distribution centered around 0.

parameters{
real <lower = 0> theta;
}
model{
theta ~ normal(0, 100);
}
  3. For the Jacobian adjustment, I have seen expressions like target += -log(fabs(y)); may I ask whether this is equivalent to the target += log_theta that you wrote in your code?

By the way, I think the main difference between Scenario 1 and Scenario 2 is that in Scenario 1 you can easily reparameterize the parameters in your model, whereas in Scenario 2 you need to think of a good enough underlying data-generating process that best matches your data. Am I correct?

Thanks!

  1. Only because you began your original post by saying that you want to “put a diffuse normal prior on [theta]”. If you instead want a diffuse lognormal prior on theta, that’s fine, but it’s a different prior. If you want a diffuse truncated normal prior on theta that’s fine too. Half-normal priors provide some shrinkage towards zero (but for your extremely diffuse priors this shrinkage will be extremely weak), which can be desirable in some scenarios. Log-normal priors rule out values in the immediate (positive) neighborhood of zero. So the truncated-normal and lognormal priors say REALLY different things about your prior beliefs about certain values for theta.
  2. Putting a normal prior on the variable but declaring the variable with a bound (in this case a lower bound) is exactly equivalent to putting a truncated normal prior on the variable. Re-normalization is not necessary in Stan because Stan doesn’t care about constants in the log-probability function.
  3. If fabs(y) is meant to be the absolute value of the derivative of the transform, then your expression is nearly correct, except that it contains a sign error: the adjustment is plus, not minus, the log of the absolute derivative. The logarithm of the absolute derivative of exp(log_theta) is log_theta.
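
To spell out the algebra (a quick sketch, using the same notation as the code above): with theta = exp(log_theta),

| d theta / d log_theta | = exp(log_theta)
log | d theta / d log_theta | = log(exp(log_theta)) = log_theta

so the adjustment when theta = exp(log_theta) appears on the left-hand side of a sampling statement is target += log_theta. The target += -log(fabs(y)) you quote is the adjustment for the transform in the other direction, i.e. when y itself is the declared parameter and a distribution is placed on log(y), since |d log(y) / d y| = 1 / |y|.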

The difference between scenario 1 and scenario 2 is that in scenario 1 you are making a choice that ought to reflect your prior beliefs about theta, whereas in scenario 2 you are making a choice that ought to reflect your sense of a reasonable generative model for Y.
