Hi all. I am trying to estimate a series of ordinal probit models of increasing complexity; the script for the most complex one is attached further down. The model is based on Kruschke and Liddell's 2018 paper, where the first and last thresholds are fixed to identify the model while the intercept and the standard deviation are estimated. My modeling questions:
- Does it make sense to put the Dirichlet prior on the raw thresholds (tau_prop in the code), or is it recommended to put a normal prior on the transformed thresholds (tau)? (I sketch the second option just below this list.) Extra: is there some useful interpretation of the tau_prop parameter, or do we just describe it as a device for fixing the lower and upper thresholds?
- The model without eta does fine (no divergences; trace plots and other diagnostics look satisfactory). When I include eta, I get a few divergences with adapt_delta = 0.95. Would it be more appropriate to use a non-centered parametrisation for eta here? If so, do we follow the same procedure as for a normal non-centered parametrisation (see the second sketch below)? I am using the t distribution to prevent too much shrinkage, given that there are only six agencies.
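On the first question, this is a minimal sketch of the alternative I have in mind (hyperparameters are illustrative only): keep tau_prop as the sampled parameter but state the prior directly on the free interior thresholds instead of the Dirichlet. If I understand the transform correctly, tau[2:(N_k - 2)] is an affine function of tau_prop, so the Jacobian is constant and no explicit adjustment should be needed (happy to be corrected on this).

```stan
model {
  // Sketch only: tau[1] and tau[N_k - 1] are fixed by construction, so only the
  // interior thresholds get a prior. Location is the scale mid-point; the scale
  // of 2.0 is purely illustrative.
  tau[2:(N_k - 2)] ~ normal((N_k + 1.0) / 2.0, 2.0);
}
```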
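On the second question, this is the kind of non-centered eta I have in mind (a sketch only; eta_std is a new name, nu_eta and sigma_eta are reused from the model below). Since the Student-t is a location-scale family, the scale can be moved outside the density exactly as in the normal case.

```stan
parameters {
  vector[N_a] eta_std;         // unit-scale Student-t deviates
  real<lower = 0.0> sigma_eta; // standard deviation of the eta distribution, as before
}
transformed parameters {
  vector[N_a] eta_raw = sigma_eta * eta_std; // scale applied outside the density
  vector[N_a] eta = eta_raw - mean(eta_raw); // same sum-to-zero centering as before
}
model {
  eta_std ~ student_t(nu_eta, 0.0, 1.0); // df from data, unit scale
  sigma_eta ~ normal(0.0, 2.0);          // half-normal, unchanged
}
```

(An alternative would be the scale-mixture-of-normals representation of the t with a per-agency chi-square mixing variable, but the version above is the direct analogue of the normal non-centering.)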
Some background: the data are original survey data at the individual level, with the unit of analysis being the individual-agency pair. We asked our respondents to report their general confidence in six Swedish public agencies, and also asked for their evaluation of each agency in terms of its competence, motivation, and opportunity to perform its task. All items have seven response categories.
The theoretical quantity I am interested in is the gamma part: basically, predicting general confidence from our theoretical components of trust. I am including a model at the individual level to strengthen exchangeability among the observations; covariates include some demographic and political variables. Eta is motivated theoretically to adjust for differences in scale usage between the agencies (inspired by Jackman's chapter on ordinal models), but I realise that these constants will soak up any agency-unique variance as well.
The way I have parametrised the thresholds is in line with the solution given by @nhuurre in this post.
I would be extremely thankful for any comments, whether on the questions above or other reflections. All the best!
Oskar
data {
int<lower = 1> N_n; // number of rows
int<lower = 1, upper = 7> N_k; // number of categories on all items, 7
int<lower = 3> N_d; // number of theoretical trust dimensions, 3
int<lower = 1> N_r; // number of unique respondents, 1810
int<lower = 1, upper = 6> N_a; // number of agencies asked for, 6
int<lower = 1, upper = N_r> resp[N_n]; // respondent at row n
int<lower = 1, upper = N_a> agency[N_n]; // agency at row n
real<lower = 0.0, upper = 1.5> tau_lower; // == 1 + 0.5 = 1.5
real<lower = 0.0, upper = 6.5> tau_upper; // == N_k - 0.5 = 6.5
int<lower = 2> nu_eta; // degrees of freedom for eta, 4
matrix[N_n, N_d] l1_xmat; // Individual-agency level predictor matrix, N_n by N_d
int<lower = 1> R; // number of predictors in l2_xmat
matrix[N_r, R] l2_xmat; // Individual-level predictor matrix, N_r by R
int<lower = 1, upper = 7> general[N_n]; // outcome vector with general trust statements
}
parameters {
// Thresholds
simplex[N_k-2] tau_prop; // Define tau_prop, used to transform into thresholds
// Individual-level section
vector[N_r] raw_alpha; // raw_alpha for non-centered parametrization of varying intercepts
real<lower = 0> sigma_alpha; // standard deviation for distribution of varying intercepts
real mu_alpha; // grand mean for varying intercept distribution
real<lower = 0.0> sigma_mu_alpha; // standard deviation for grand mean for varying intercept distribution
vector[R] beta; // vector of regression coefficients for individual level model (level 2)
vector<lower = 0.0>[R] sigma_beta; // vector of standard deviations for beta
// Agency-individual section
vector<lower = 0.0>[N_d] sigma_gamma_fixed; // vector of standard deviations for population level regression parameters
vector[N_d] gamma_fixed; // length 3 vector of population-level regression parameters
matrix[N_d, N_a] z_gamma; // matrix of 3 by 6, unit-normal
cholesky_factor_corr[N_d] L_dim; // Cholesky factor for group-level effects
vector<lower = 0.0>[N_d] sigma_dim; // standard deviation for each dimension of trust
// Eta = agency-specific constants
vector[N_a] eta_raw; // vector of agency-specific constants
real<lower = 0.0> sigma_eta; // standard deviation of eta distribution
// Outcome section
real<lower = 0.0> sigma_y; // standard deviation of outcomes
}
transformed parameters {
ordered[N_k-1] tau; // transformed ordered thresholds
vector[N_r] alpha; // transformed varying intercepts (1 | resp)
vector[N_r] alpha_hat; // predictor for alpha = mu_alpha + model
simplex[N_k] theta[N_n]; // category-response probabilities, N_k repeated N_n times
matrix[N_d, N_a] gamma; // 3 by 6 matrix for random effects ( {competence, motivation, opportunity} | agency)
vector[N_a] eta; // transformed agency-specific constants
// Thresholds
tau[1] = tau_lower; // first value is set to fixed lower from data, 1.5
for(c in 2:(N_k-1)) tau[c] = tau[c-1] + (tau_upper - tau_lower) * tau_prop[c-1]; // accumulate the simplex proportions of the (tau_upper - tau_lower) range; tau[N_k-1] lands exactly at tau_upper
// Model for level 2 (vectorised over respondents)
alpha_hat = mu_alpha + l2_xmat * beta; // model for alpha_hat; works for any number of columns R in l2_xmat
// Varying intercepts: centering alpha
alpha = alpha_hat + raw_alpha * sigma_alpha;
// Transform random effects
gamma = diag_pre_multiply(sigma_dim, L_dim) * z_gamma; // multiply diag(sigma_dim) by the Cholesky factor, then by z_gamma, to get the correlated random effects
// Eta
eta = eta_raw - mean(eta_raw); // center eta so the agency constants sum to zero
// Model for level 1
for(n in 1:N_n) { // N_n loop starts here...
real mu; // local scalar: conditional mean of the latent response for row n
mu = alpha[resp[n]] + eta[agency[n]] +
(gamma_fixed[1] + gamma[1, agency[n]]) * l1_xmat[n,1] +
(gamma_fixed[2] + gamma[2, agency[n]]) * l1_xmat[n,2] +
(gamma_fixed[3] + gamma[3, agency[n]]) * l1_xmat[n,3];
// Calculate response probabilities
theta[n,1] = Phi_approx( ( tau[1] - mu ) / sigma_y );
theta[n,N_k] = 1 - Phi_approx( ( tau[N_k - 1] - mu ) / sigma_y );
for (k in 2:(N_k-1)) {theta[n,k] = Phi_approx( ( tau[k] - mu ) / sigma_y ) - Phi_approx( ( tau[k-1] - mu ) / sigma_y );}
} // ... N_n loop ends here
}
model {
// Level 2 parameters
sigma_mu_alpha ~ exponential(1.0); // exponential for sigma_mu_alpha
sigma_alpha ~ normal(0.0, 5.0); // half-normal for standard deviation
raw_alpha ~ normal(0, 1); // standard normal for non-centered
mu_alpha ~ normal( (N_k+1.0)/2.0 , sigma_mu_alpha); // location of mean alpha, prior location is mid-point of the response scale
sigma_beta ~ exponential(1.0); // prior for sigma_beta scale
beta ~ normal(0, sigma_beta); // prior for beta, individual level coefficients
// Level 1 parameters
sigma_gamma_fixed ~ exponential(1.0);
sigma_y ~ exponential(1.0);
gamma_fixed ~ normal(0.0, sigma_gamma_fixed); // for each gamma, location is 0 with adaptive prior for sigmas
L_dim ~ lkj_corr_cholesky(1.0); // LKJ(1) on the Cholesky factor to reflect uniform expectations on the correlation matrix
sigma_dim ~ normal(0.0, 5.0); // Half-normal for the standard deviations for each dimension
to_vector(z_gamma) ~ normal(0.0, 1.0); // Standard-normal prior on the random deviations
sigma_eta ~ normal(0.0, 2.0); // half-normal for standard deviation
eta_raw ~ student_t(nu_eta, 0.0, sigma_eta); // student t distribution for the agency specific constants, df from data, located at 0.0
// thresholds
tau_prop ~ dirichlet( rep_vector(1.0, (N_k-2))); // Uniform Dirichlet prior on the threshold proportions
// Increment the log probability of general as categorical
for(n in 1:N_n) general[n] ~ categorical( theta[n, 1:N_k] ); // General is the outcome vector
}
generated quantities {
vector[N_n] log_lik;
matrix[N_d, N_d] Omega;
int<lower = 1, upper = N_k> y_rep[N_n]; // posterior predictive category draws
// Get log likelihood of data point n
for(n in 1:N_n) log_lik[n] = categorical_lpmf( general[n] | theta[n, 1:N_k] );
// y_rep, given parameters and observed data
for(n in 1:N_n) y_rep[n] = categorical_rng(theta[n, 1:N_k]);
// Calculate correlation matrix of ( {comp, moti, oppo} | agency )
Omega = multiply_lower_tri_self_transpose(L_dim); // retrieve the correlation matrix
}