With x1 = [0, 1, 2, 3, 4, 5], y1 = [2, 3, 6, 11, 18, 27] and s = 1.0, the posterior means of c0 and c1 are close to 2 and 1.
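For reference, the constrained version of the model is essentially this sketch (data block omitted):
parameters {
real<lower=0> c0; // built-in lower bounds
real<lower=0> c1;
}
model {
y1 ~ normal(c0 + c1 * square(x1), s);
}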
As far as I understand, the lower bounds are implemented as a parameter transformation, so I tried to perform the transformation manually:
parameters {
real xc0;
real xc1;
}
transformed parameters {
real c0 = exp(xc0);
real c1 = exp(xc1);
}
For smaller values of s the estimates are very similar, but for s = 1.0 the posterior means are clearly wrong (c0 = 0.19846174, c1 = 1.100796) and I get some exceptions:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Location parameter[1] is inf, but must be finite!
So what’s wrong here and what is the difference between using a constrained parameter and a parameter that is transformed manually? Thank you!
I’m not positive, but I think the problem in your case has to do with Stan’s default priors. Technically either way of defining a constrained parameter should work, but they are different under the hood (see the manual for more details). If you set a constraint in the parameters block, the sampler always makes sure that the constraint is enforced. When you try to constrain a parameter yourself in the transformed parameters block, though, you can run into cases where you are exponentiating an extremely large or small number, which can lead to overflow problems for Stan.
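As a toy illustration of the overflow (not part of your model): once the unconstrained value gets large enough, exp() returns infinity, which is exactly what triggers the “Location parameter is inf” exception above.
transformed data {
// double-precision exp() overflows to infinity at roughly exp(710)
real big = exp(800);
print("exp(800) = ", big); // prints inf
}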
The reason I bring up Stan’s default priors is that they are extremely wide: by default Stan uses improper uniform(-inf, inf) priors on unconstrained parameters. My hypothesis is therefore that in the second version of your model, the lack of any constraint these flat priors place on xc0 and xc1 causes problems when calculating c0 and c1. As mild evidence of this, I used this slightly altered model with explicit regularizing priors:
data {
int<lower=1> N1;
vector[N1] x1;
vector[N1] y1;
real<lower=0> s;
}
parameters {
real xc0;
real xc1;
}
transformed parameters {
real<lower=0> c0 = exp(xc0);
real<lower=0> c1 = exp(xc1);
vector[N1] mu = c0 + c1*square(x1);
}
model {
// More informative priors
// These keep xc0 and xc1 away from regions in which exp(xc0) or exp(xc1) explode
xc0 ~ normal(0,1);
xc1 ~ normal(0,1);
y1 ~ normal(mu, s);
}
and the model no longer had any convergence issues or trouble estimating c0 or c1. I’m a bit surprised by how drastic the difference in performance is between the two models when using the default priors (when playing around with a more general simulation, the first model you posted typically converges well with only the occasional divergent transition, while the second model almost always fails catastrophically), so there might be something else going on that I’m ignorant of. The R code I used to simulate more datasets was:
library(tidyverse)
library(rstan)
library(bayesplot)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# Parameters
x1 = -5:5
s = 1
c0 = 0.75
c1 = 1.2
# Simulate data
y1 <- rnorm(length(x1), mean = c0 + c1*(x1^2), sd = s)
# Data for Stan
data_list <- list(
N1 = length(x1),
x1 = x1,
y1 = y1,
s = 1
)
# Fit model
fit <- stan(file = "path/to/stan/model.stan", data = data_list)
Declaring the parameter with a built-in lower bound (real<lower=0> alpha;) and the following implement the same density for alpha:
parameters {
real alpha_unc;
}
transformed parameters {
real<lower=0> alpha = exp(alpha_unc);
}
model {
target += alpha_unc; // log Jacobian adjustment: log|d/d(alpha_unc) exp(alpha_unc)| = alpha_unc
alpha ~ ...
}
I’d also recommend providing a prior for alpha. If you would rather put the prior on the unconstrained parameter, you can do that too; in that case you don’t need the log Jacobian adjustment.
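For example, something along these lines (a minimal sketch with an arbitrary standard normal prior) puts the prior on the unconstrained scale, so no Jacobian term is added:
parameters {
real alpha_unc;
}
transformed parameters {
real<lower=0> alpha = exp(alpha_unc);
}
model {
alpha_unc ~ normal(0, 1); // prior on the unconstrained scale; no Jacobian adjustment needed
...
}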
Unfortunately, this overflow issue can also be a problem for Stan’s built-in transforms. We’re a little more clever about overflow and underflow in the lower/upper bound case than if you just scale and translate an inverse logit.
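For concreteness, the naive “scale and translate an inverse logit” version would be something like this sketch (a and b are illustrative data values; the built-in real<lower=a, upper=b> uses the same logit-based transform but handles the extreme tails more carefully):
data {
real a;
real<lower=a> b;
}
parameters {
real y_unc;
}
transformed parameters {
// manual version of real<lower=a, upper=b>
real y = a + (b - a) * inv_logit(y_unc);
}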
That’s true for unconstrained parameters. The more general statement is that Stan uses a (perhaps improper) uniform prior on the constrained parameters. For example, a real<lower=0, upper=1> variable gets a uniform(0, 1) prior, which is proper. These aren’t supposed to be default priors, per se. The idea is that because they’re uniform, you can add your own prior and it behaves like that prior: you multiply by the uniform, and the uniform contributes only a constant.
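For example (a minimal sketch):
parameters {
real<lower=0, upper=1> theta; // implicit uniform(0, 1) prior
}
model {
theta ~ beta(2, 2); // your own prior; the implicit uniform only contributes a constant
}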
I think what made me believe that my code above should be equivalent was the note on Jacobian adjustments in the reparameterization chapter, which states that no adjustments are needed if the transformed parameters are not given a distribution.
In my example, however, I didn’t realize that the prior distribution on c0 and c1 was hidden, and that changing the parameters silently replaced the implicit uniform prior on c0 and c1 with a uniform prior on xc0 and xc1.
I’m not sure what you mean by that. You put a prior on xc0 and xc1:
parameters {
real xc0;
}
transformed parameters {
real<lower=0> c0 = exp(xc0);
}
model {
xc0 ~ normal(0,1);
...
}
The standard normal prior on xc0 induces a standard lognormal prior on c0. It’s not hidden in the sense that you explicitly define the transform in the transformed parameters block. The code above will give you the same results as this:
parameters {
real<lower=0> c0;
}
transformed parameters {
real xc0 = log(c0);
}
model {
c0 ~ lognormal(0, 1);
...
}