I’m in the process of trying to figure out a more efficient way to fit a four level logistic regression model for repeated cross-sectional survey data in which the i^{th} respondent is nested within birth cohort l, survey-year t, and country j. The Stan code for the baseline model with only the group level intercepts is as follows:

```
// Hierarchical Logistic Regression Model of Support for Democracy
// Model 1: Null Model with Varying Intercepts country, survey-year, and cohort
// Data Dimensions: N = 208,107; L = 2460; T = 169; J = 58
data {
int <lower = 0> N; // Total Number of Observations
int <lower = 0, upper = 1> Y[N]; // The Response Vector
int <lower = 1> K; // Number of Fixed Effects in the Model
matrix[N, K] X; // Design Matrix for the Fixed Effects
// Data for Country-Level Effects (Level 4)
int <lower = 1> J; // Total Number of Countries J
int<lower=1> M_J; // Number of Country Level Coefficients
int <lower = 1> jj[N]; // Country-Observation Indicator
vector[N] Z_J; // Country-Level Predictors
// Data for Country-Year Effects (Level 3)
int<lower=1> T; // Total Number of Country-Years T
int<lower=1> M_T; // Number of Country-Year Level Coefficients
int<lower=1> tt[N]; // Grouping Indicator Per Observation
vector[N] Z_T; // Time Varying Country-Level Predictors
// Data for Country-Birth Cohort Effects (Level 2)
int<lower=1> L; // Total Number Cohorts L
int<lower=1> M_L; // Number of Country-Birth Cohort Level Coefficients
int<lower=1> ll[N]; // Country-Birth Cohort Indicator
vector[N] Z_L; // Country-Birth Cohort Intercepts
}
parameters {
vector[K] beta; // Fixed Effects Parameters
vector <lower = 0>[M_J] sigma_j; // Country-Level Standard Deviations
vector[J] alpha_j[M_J]; // Standardized Country-Level Intercepts
vector <lower = 0>[M_T] sigma_t; // Country-Year Standard Deviations
vector[T] alpha_t[M_T]; // Standardized Country-Level Intercepts
vector <lower = 0>[M_L] sigma_l; // Country-Year Standard Deviations
vector[L] alpha_l[M_L]; // Standardized Birth Cohort-Level Intercepts
}
transformed parameters {
vector[J] country; // Actual Country-Level Effects
country = (sigma_j[1] * (alpha_j[1]));
vector[T] survey_year; // Actual Country-Year Effects
survey_year = (sigma_t[1] * (alpha_t[1]));
vector[L] cohort; // Actual Birth Cohort Effects
cohort = (sigma_l[1] * (alpha_l[1]));
}
model {
vector[N] mu = X * beta;
// Priors for the Model Parameters
profile("Priors") {
target += normal_lpdf(beta | 0, 1);
target += student_t_lpdf(sigma_j | 3, 0, 2) - 1 * student_t_lccdf(0 | 3, 0, 2);
target += std_normal_lpdf(alpha_j[1]);
target += student_t_lpdf(sigma_t | 3, 0, 2) - 1 * student_t_lccdf(0 | 3, 0, 2);
target += std_normal_lpdf(alpha_t[1]);
target += student_t_lpdf(sigma_l | 3, 0, 2) - 1 * student_t_lccdf(0 | 3, 0, 2);
target += std_normal_lpdf(alpha_l[1]);
}
// Linear Predictor
profile("Linear Predictor") {
for (n in 1:N) {
mu[n] += country[jj[n]] * Z_J[n] + survey_year[tt[n]] * Z_T[n] + cohort[ll[n]] * Z_L[n];
}
}
// Likelihood
profile("Likelihood") {
target += bernoulli_logit_lpmf(Y | mu);
}
}
```

However, I have run into what based on the profiling output below seems to be a severe bottle-neck in the linear predictor and likelihood evaluation sections.

```
name thread_id total_time forward_time reverse_time chain_stack no_chain_stack autodiff_calls no_autodiff_calls
1 Likelihood 1 4465 2649 1815.6 691190 0 345595 1
2 Linear Predictor 1 7286 4830 2456.2 175600275450 0 345595 1
3 Priors 1 1220 1006 213.8 4492735 0 345595 1
```

I’m hoping there’s some way to implement the model more efficiently but I haven’t had much luck so far and as things stand now, if the null specification is this slow I imagine adding predictors will just make things worse since the full specification is a within-between random effects model with around 20 predictors and a cross-level interaction.

I’m using cmdstanr with cmdstan 2.27.0 on a Windows 10 desktop with a Ryzen 9 5900X, 64GB of Memory, and an Nvidia RTX 3090 and I’m hoping I can take advantage of the expanded OpenCL support that was added in the latest version of cmdstan since the 3090 should be capable of delivering some pretty substantial computational power (it requires less than half the wall time the CPU does for the OpenCL cmdstanr vignette model).

Any help anyone can provide with this would be very much appreciated.

Model File: Churchill_HLogit_Null.stan (2.9 KB)

Null Model Data: Data for the Stan Model