I’m just getting started with modeling in Stan, and I’m trying to fit a simple hierarchical linear model. I’m generating data according to the following process:
For each g of G groups, a slope \beta_{g} and an intercept \alpha_{g} are drawn i.i.d. from normals. That is, \beta_{g} ~ N(\beta_\mu, \beta_\sigma) and \alpha_{g} ~ N(\alpha_\mu, \alpha_\sigma). Within each group, y ~ N(x \beta_{g} + \alpha_{g}, \epsilon), where the noise scale \epsilon is shared across groups. I want to infer MAP point estimates from my fake dataset for:
- \beta_{g}
- \alpha_{g}
- \beta_\mu
- \beta_\sigma
- \alpha_\mu
- \alpha_\sigma
- \epsilon
During data generation, I’m setting:
- \beta_\mu = 0
- \beta_\sigma = 1
- \alpha_\mu = 0
- \alpha_\sigma = 1
- \epsilon = 0.5

with 1000 groups and 1000 datapoints per group.
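For concreteness, here’s a minimal numpy sketch of the generative process (not my exact script, and the variable names are just placeholders, but it’s equivalent):

import numpy as np

G = 1000            # number of groups
N_PER_GROUP = 1000  # datapoints per group

rng = np.random.default_rng(0)
beta = rng.normal(0.0, 1.0, size=G)   # beta_g  ~ N(beta_mu = 0,  beta_sigma = 1)
alpha = rng.normal(0.0, 1.0, size=G)  # alpha_g ~ N(alpha_mu = 0, alpha_sigma = 1)

group = np.repeat(np.arange(G), N_PER_GROUP)  # 0-based group index per datapoint
x = rng.normal(size=G * N_PER_GROUP)
y = rng.normal(beta[group] * x + alpha[group], 0.5)  # epsilon = 0.5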
During fitting, I’m placing N(0, 5) hyperpriors on the location parameters and LogNormal(0, 5) hyperpriors on the scale parameters.
With the centered parameterization, I get very tight, accurate estimates for the parameters \beta_{g}, \alpha_{g}, and \epsilon, but the hyperparameter inferences are way off (the estimated hyper-\mu’s and hyper-\sigma’s come out around 100). I found this especially odd since when I plot the point estimates for \alpha and \beta, the N(0, 1) structure is very obvious. So I tried switching to a non-centered parameterization according to Matt’s trick.
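By Matt’s trick I mean the usual reparameterization (this is just the transformed parameters block below, written out in math):

\beta_{raw,g} ~ N(0, 1),    \beta_{g} = \beta_\mu + \beta_\sigma \beta_{raw,g}

and likewise for \alpha_{g}. With this version, I now get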
Something went wrong after call_sampler
as a Python error at runtime. Looking at the console output underneath, I see
Line search failed to achieve a sufficient decrease, no more progress can be made
If I try drawing HMC samples from the posterior instead of finding the mode, I get a lot of messages of the form
Exception: lognormal_lpdf: Random variable is -4.97792e-08, but must be >= 0!
Here’s my code for the non-centered version:
data {
  int<lower=0> N;                      // total number of datapoints
  int<lower=1> G;                      // number of groups
  int<lower=1,upper=G> data_group[N];  // group index for each datapoint
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta_mu;
  real beta_sd;
  real alpha_mu;
  real alpha_sd;
  real epsilon;
  vector[G] beta_raw;
  vector[G] alpha_raw;
}
transformed parameters {
  vector[G] beta = beta_raw * beta_sd + beta_mu;
  vector[G] alpha = alpha_raw * alpha_sd + alpha_mu;
}
model {
  // increment lprob for hyperpriors
  beta_mu ~ normal(0, 5);
  beta_sd ~ lognormal(0, 5);
  alpha_mu ~ normal(0, 5);
  alpha_sd ~ lognormal(0, 5);
  epsilon ~ lognormal(0, 5);
  // increment lprob for the untransformed (raw) parameters
  for (g in 1:G) {
    beta_raw[g] ~ std_normal();
    alpha_raw[g] ~ std_normal();
  }
  // increment lprob for the conditional log-likelihood of the data
  for (n in 1:N) {
    y[n] ~ normal(beta[data_group[n]] * x[n] + alpha[data_group[n]], epsilon);
  }
}
and for the centered one:
data {
  int<lower=0> N;
  int<lower=1> G;
  int<lower=1,upper=G> data_group[N];
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta_mu;
  real beta_sd;
  real alpha_mu;
  real alpha_sd;
  real epsilon;
  vector[G] beta;
  vector[G] alpha;
}
model {
  // specify hyperpriors
  beta_mu ~ normal(0, 5);
  beta_sd ~ lognormal(0, 5);
  alpha_mu ~ normal(0, 5);
  alpha_sd ~ lognormal(0, 5);
  epsilon ~ lognormal(0, 5);
  // increment for the group-level parameters
  for (g in 1:G) {
    beta[g] ~ normal(beta_mu, beta_sd);
    alpha[g] ~ normal(alpha_mu, alpha_sd);
  }
  // increment for the conditional log-likelihood of the data
  for (n in 1:N) {
    y[n] ~ normal(beta[data_group[n]] * x[n] + alpha[data_group[n]], epsilon);
  }
}
I’m using the pystan interface to Stan.
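In case the interface matters, this is roughly how I’m compiling and fitting (pystan 2 style; model_code and stan_data are placeholders, with x, y, group, and G as in the simulation sketch above):

import pystan

# Stan indexes groups from 1, so shift the 0-based numpy group ids
stan_data = dict(N=len(y), G=G, data_group=group + 1, x=x, y=y)

sm = pystan.StanModel(model_code=model_code)  # one of the two programs above
map_est = sm.optimizing(data=stan_data)       # MAP via L-BFGS; this call dies with the line-search error
# fit = sm.sampling(data=stan_data, chains=4) # NUTS; this call emits the lognormal_lpdf messages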