I have a linear hierarchical random intercept model with a two level hierarchy.
I initially set the priors for the random intercepts as normally distributed and want to run a version now where the parameters
are t-distributed and where the parameter v of the t-distribution can be learned.
In “parameters” I set the range for v as:
‘’’
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution
‘’’
In “transformed parameters”
I changed the following distributions from normal do student_t
‘’’
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
‘’’
The full code of the model is provided below.
I get the following error in PYSTAN and am not sure where the problem comes from:
Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Changing the initialization does not help:
‘’’
Define an initialization function
def init_func():
return {‘v’: np.random.uniform(3, 10)}
Fitting the model with custom initialization
fit = model.sampling(data=stan_data, iter=4000, chains=5, warmup=2000, seed=12, init=init_func)
‘’’
I would highly appreciate any suggestions:
Here is the full model:
stan_code = “”"
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
int<lower=1> J_2[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
vector[N_2] z_2[M_2]; // standardized group-level effects
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_2_1 = (sd_2[1] * (z_2[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
vector[N] y_pred;
vector[N] log_lik;
for (n in 1:N) {
y_pred[n] = normal_rng(Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
log_lik[n] = normal_lpdf(Y[n] | Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
}
}
“”"