T-distributed priors with learned v parameter for hierarchical random intercept model - error when fitting

James_Wittington · October 14, 2023, 8:22am

I have a linear hierarchical random intercept model with a two level hierarchy.
I initially set the priors for the random intercepts as normally distributed and want to run a version now where the parameters
are t-distributed and where the parameter v of the t-distribution can be learned.

In “parameters” I set the range for v as:

‘’’
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution
‘’’

In “transformed parameters”

I changed the following distributions from normal do student_t

‘’’
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
‘’’

The full code of the model is provided below.

I get the following error in PYSTAN and am not sure where the problem comes from:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

Changing the initialization does not help:

‘’’

Define an initialization function

def init_func():
return {‘v’: np.random.uniform(3, 10)}

Fitting the model with custom initialization

fit = model.sampling(data=stan_data, iter=4000, chains=5, warmup=2000, seed=12, init=init_func)
‘’’

I would highly appreciate any suggestions:

Here is the full model:

stan_code = “”"
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable

// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation

// group-level predictor values
vector[N] Z_1_1;

// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
int<lower=1> J_2[N]; // grouping indicator per observation

// group-level predictor values
vector[N] Z_2_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
vector[N_2] z_2[M_2]; // standardized group-level effects
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution

}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_2_1 = (sd_2[1] * (z_2[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t

}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
}

generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;

vector[N] y_pred;
vector[N] log_lik;

for (n in 1:N) {
    y_pred[n] = normal_rng(Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
    log_lik[n] = normal_lpdf(Y[n] | Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
}

}

“”"

desislava · October 14, 2023, 4:18pm

Does the proposed prior make sense? If I read the code correctly, there is one each of sd_1 and sd_2. (That would explain statements such as lprior += student_t_lpdf(sd_1[1] | v, 0, 1) etc.) So you propose to estimate the degrees of freedom v of a student-t distribution given one draw from that distribution.

Another question is whether the Jacobian adjustment – student_t_lccdf(0 | v, 0, 1) – can depend on an unknown parameter.

James_Wittington · October 18, 2023, 8:48am

Thank you very much for you comment. The proposed prior did not make sense. It was for the standard deviations and should have been for the coefficients r_1_1 and r_2_1. I removed the priors and instead put the following in the model section: target += student_t_lpdf(r_1_1 | v, 0, sd_1[1]);
target += student_t_lpdf(r_2_1 | v, 0, sd_2[1]);
Now it seems to work.

Topic		Replies	Views
Undesirable(?) behavior in initial values of hierarchical model General	9	940	November 13, 2020
Help with specification of hierarchical priors? Modeling	9	1303	December 4, 2018
Initialization between (-2, 2) failed after 100 attempts Modeling fitting-issues	3	603	January 30, 2021
Issue with model: Gradient evaluated at the initial value is not finite Modeling rstan , specification , multinomial-response , hierarchical-model , dirichlet-multinomial	9	485	September 14, 2023
Random intercept and slope model with constraints Modeling hierarchical-model	3	70	March 12, 2025

T-distributed priors with learned v parameter for hierarchical random intercept model - error when fitting

Define an initialization function

Fitting the model with custom initialization

Related topics