T-distributed priors with learned v parameter for hierarchical random intercept model - error when fitting

I have a linear hierarchical random intercept model with a two level hierarchy.
I initially set the priors for the random intercepts as normally distributed and want to run a version now where the parameters
are t-distributed and where the parameter v of the t-distribution can be learned.

In “parameters” I set the range for v as:

‘’’
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution
‘’’

In “transformed parameters”

I changed the following distributions from normal do student_t

‘’’
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
‘’’

The full code of the model is provided below.

I get the following error in PYSTAN and am not sure where the problem comes from:

Rejecting initial value:
Gradient evaluated at the initial value is not finite.
Stan can’t start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

Changing the initialization does not help:

‘’’

Define an initialization function

def init_func():
return {‘v’: np.random.uniform(3, 10)}

Fitting the model with custom initialization

fit = model.sampling(data=stan_data, iter=4000, chains=5, warmup=2000, seed=12, init=init_func)
‘’’

I would highly appreciate any suggestions:

Here is the full model:

stan_code = “”"
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable

// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation

// group-level predictor values
vector[N] Z_1_1;

// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
int<lower=1> J_2[N]; // grouping indicator per observation

// group-level predictor values
vector[N] Z_2_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
vector[N_2] z_2[M_2]; // standardized group-level effects
real<lower=2.1, upper=100000> v; // degrees of freedom for t-distribution

}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
r_2_1 = (sd_2[1] * (z_2[1]));
lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t
lprior += student_t_lpdf(sd_2[1] | v, 0, 1) // #CHANGED: from normal to student_t
- 1 * student_t_lccdf(0 | v, 0, 1); // #CHANGED: from normal to student_t

}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
}

generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;

vector[N] y_pred;
vector[N] log_lik;

for (n in 1:N) {
    y_pred[n] = normal_rng(Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
    log_lik[n] = normal_lpdf(Y[n] | Intercept + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n], sigma);
}

}

“”"

Does the proposed prior make sense? If I read the code correctly, there is one each of sd_1 and sd_2. (That would explain statements such as lprior += student_t_lpdf(sd_1[1] | v, 0, 1) etc.) So you propose to estimate the degrees of freedom v of a student-t distribution given one draw from that distribution.

Another question is whether the Jacobian adjustment – student_t_lccdf(0 | v, 0, 1) – can depend on an unknown parameter.

Thank you very much for you comment. The proposed prior did not make sense. It was for the standard deviations and should have been for the coefficients r_1_1 and r_2_1. I removed the priors and instead put the following in the model section: target += student_t_lpdf(r_1_1 | v, 0, sd_1[1]);
target += student_t_lpdf(r_2_1 | v, 0, sd_2[1]);
Now it seems to work.

1 Like