Hi all!
I am sure many of you know useful tricks to make hierarchical models fit faster. I have been trying different things for a while, and I would like to share a specification that is very fast and that, as far as I can tell, has not been discussed on the Stan forums.
Because my background is in psychometrics and experimental psychology, I will use a simple example inspired by common experimental tasks such as Stroop, flanker, or Simon tasks. Let Y_{ik\ell} be the observed response for subject i on trial \ell in condition k, where k indexes two conditions, for example congruent and incongruent trials.[1] The trial-level model is

Y_{ik\ell} = \alpha_i + \beta_i X_k + \epsilon_{ik\ell}, \qquad \epsilon_{ik\ell} \sim \mathcal{N}(0, \sigma^2),

where X_k is coded as -\frac{1}{2} and +\frac{1}{2}. Thus, \alpha_i is subject i's average response, and \beta_i is the subject-specific experimental effect. At the population level,

\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \mathbf{\Sigma} \right), \qquad \mathbf{\Sigma} = \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix} \mathbf{R} \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix},

where \mathbf{R} is the 2 \times 2 correlation matrix of the random effects.
The R code below generates the simulated data used as a running example in this post. Specifically, I simulate a balanced design with 150 subjects and 150 trials per subject and condition. The next sections show how to fit the same model to these data using three different specifications: one using brms, and two more efficient versions programmed directly in Stan. In all cases, I use the same prior distributions.
Click here to see the R code used to simulate the data
library(MASS)
# Basic Gaussian hierarchical model with random intercepts and slopes
# Population parameters from here: https://osf.io/preprints/psyarxiv/gv6k7_v2
set.seed(2026)
mu_alpha <- .65
mu_beta <- .06
sigma_alpha <- .12
sigma_beta <- .04
rho_ab <- matrix(c(1,.5,.5,1), ncol=2)
sigma_trial <- .18
Sigma_REs <- diag(c(sigma_alpha, sigma_beta)) %*% rho_ab %*%
diag(c(sigma_alpha, sigma_beta))
# Subjects and trials per subject and condition (2 conditions, e.g., congruent/incongruent)
I <- 150
L <- 150
# Simulate random effects
REs <- MASS::mvrnorm(
n = I,
mu = c(mu_alpha, mu_beta),
Sigma = Sigma_REs
)
# Simulate observed data
dat <- matrix(NA, ncol = 3, nrow = I * L * 2)
cnt <- 1
for(i in 1:I) {
for(k in 1:2) {
# Alpha, beta, and condition
alpha_i <- REs[i,1]
beta_i <- REs[i,2]
X_k <- (k - 1.5)
mu_i <- alpha_i + beta_i * X_k
for(l in 1:L) {
dat[cnt, 1] <- i
dat[cnt, 2] <- k
# Trial-level data
dat[cnt, 3] <- rnorm(1, mu_i, sigma_trial)
cnt <- cnt + 1
}
}
}
# Store as data frame
dat <- as.data.frame(dat)
colnames(dat) <- c("subj", "cond", "rt")
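# Optional sanity check (my addition, not part of the original workflow):
# with X_k = -0.5 / +0.5, the empirical condition means should be close to
# mu_alpha -/+ mu_beta / 2, i.e., roughly 0.62 and 0.68
tapply(dat$rt, dat$cond, mean)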
1. Fitting the model with brms
As a reference, I first fit the model using brms. This is the most convenient specification; on my computer, it takes about 16 minutes to fit.
Click here to see the `brms` code
# Code condition as a sum contrast (-0.5 / +0.5)
dat$cond <- dat$cond - 1.5
# Set prior distributions
priors_brms <- c(
prior(normal(0, 1), class = "Intercept"),
prior(normal(0, 1), class = "b", coef = "cond"),
prior(student_t(4, 0, 0.5), class = "sd", group = "subj", coef = "Intercept"),
prior(student_t(4, 0, 0.5), class = "sd", group = "subj", coef = "cond"),
prior(student_t(4, 0, 0.5), class = "sigma"),
prior(lkj(1), class = "cor", group = "subj")
)
# Fit brms model
brms_fit <- brm(
formula = rt ~ 1 + cond + (1 + cond | subj),
data = dat,
prior = priors_brms,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026)
# Estimation time
et_brms <- mean(rowSums(get_elapsed_time(brms_fit$fit)))
Before moving to the hand-written Stan versions, it is useful to look at what brms does in the model block:
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
In this Gaussian model, brms builds a temporary linear predictor of length N, where N = I \times K \times L = 45,000, and then evaluates the likelihood in one vectorized call. The two Stan versions below avoid building this full N-length vector.
2. A hand-written Stan model using a long-format matrix
The first hand-written Stan version still uses the long-format data. The main change is that I do not build a full N-length linear predictor. Instead, I evaluate the likelihood separately for each subject, using only that subject's slice of the long data. This requires only two indexing objects: the number of rows per subject and the row where each subject starts. If the data are sorted by subject, both can be obtained with two lines of R code:
# Number of observations for each subject
trialsum <- as.integer(rle(dat$subj)$lengths)
# First row of each subject in the long data frame
idx_start <- as.integer(cumsum(c(1L, head(trialsum, -1L))))
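For the balanced design simulated above, these objects are easy to sanity-check (a quick illustration I am adding here; the `stopifnot()` call is not needed for fitting):

# Every subject contributes 2 * L = 300 consecutive rows, so the
# subject blocks start at rows 1, 301, 601, ...
stopifnot(all(trialsum == 2 * L))
head(idx_start)  # 1 301 601 901 1201 1501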
The Stan model below fits the same hierarchical model as the brms specification. The key part is the loop over subjects in the model block: for each subject, I find the relevant rows and apply the normal model to that slice of the data.
data {
int<lower=1> N; // Number of observations
int<lower=1> I; // Number of subjects
matrix[N, 3] Y; // Columns: (1 = subj, 2 = cond, 3 = rt)
array[I] int<lower=1> trialsum; // Number of observed trials per subject
array[I] int<lower=1> idx_start; // First row of each subject in Y
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
// Index of the last observed row for subject i
int idx_end = idx_start[i] + trialsum[i] - 1;
// Vectorized likelihood for subject i
Y[idx_start[i]:idx_end, 3] ~ normal(
REs[i, 1] + REs[i, 2] * Y[idx_start[i]:idx_end, 2],
sigma
);
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
Click here to see the R code for the long-format Stan model
# The efficient hierarchical model via blocking
HM_stancode_block <- "
data {
int<lower=1> N; // Number of observations
int<lower=1> I; // Number of subjects
matrix[N, 3] Y; // Columns: (1 = subj, 2 = cond, 3 = rt)
array[I] int<lower=1> trialsum; // Number of observed trials per subject
array[I] int<lower=1> idx_start; // First row of each subject in Y
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
// Index of the last observed row for subject i
int idx_end = idx_start[i] + trialsum[i] - 1;
// Vectorized likelihood for subject i
Y[idx_start[i]:idx_end, 3] ~ normal(
REs[i, 1] + REs[i, 2] * Y[idx_start[i]:idx_end, 2],
sigma
);
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
"
# Store trialsum
trialsum <- as.integer(rle(dat$subj)$lengths)
# Prepare Stan data
sdata_block <- list(
N = nrow(dat),
I = length(unique(dat$subj)),
Y = as.matrix(dat),
trialsum = trialsum,
idx_start = as.integer(cumsum(c(1L, head(trialsum, -1L))))
)
# Fit stan model
fit_stan_block <- stan(
model_code = HM_stancode_block,
data = sdata_block,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026)
# Estimation time
et_stan_block <- mean(rowSums(get_elapsed_time(fit_stan_block)))
On my computer, this model takes a little under three minutes to fit, which makes it about six times faster than the brms specification.
3. A hand-written Stan model using a balanced array
The second hand-written Stan version uses an array representation of the data. Because the simulated design is balanced, the response variable can be stored as a three-dimensional array

Y[i, k, \ell], \qquad i = 1, \dots, I, \quad k = 1, 2, \quad \ell = 1, \dots, L,

where i indexes subjects, k indexes conditions, and \ell indexes trials. This makes the likelihood very compact: for each subject and condition, all trials share the same linear predictor, so the likelihood can be vectorized over trials:
data {
int<lower=1> I; // Number of subjects
int<lower=1> L; // Number of trials per subj and cond
array[I, 2, L] real Y; // Array with observed responses
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
for(k in 1:2) {
// Linear prediction
real mu_ik = REs[i, 1] + REs[i, 2] * (k - 1.5);
// Vectorized log-likelihood
Y[i, k, ] ~ normal(mu_ik, sigma);
}
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
Click here to see the R code for the array-based Stan model
# The efficient hierarchical model
HM_stancode_array <- "
data {
int<lower=1> I; // Number of subjects
int<lower=1> L; // Number of trials per subj and cond
array[I, 2, L] real Y; // Array with observed responses
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
for(k in 1:2) {
// Linear prediction
real mu_ik = REs[i, 1] + REs[i, 2] * (k - 1.5);
// Vectorized log-likelihood
Y[i, k, ] ~ normal(mu_ik, sigma);
}
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
"
# Store data in a three-dimensional array
Y_array <- aperm(array(dat$rt, dim = c(L, 2, I)), c(3, 2, 1))
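# Hedged check (my addition, not in the original post): confirm the reshaping.
# With dat sorted by subj and then cond, Y_array[1, 1, ] should hold
# subject 1's responses in the first condition
stopifnot(all(Y_array[1, 1, ] == dat$rt[dat$subj == 1 & dat$cond == min(dat$cond)]))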
# Prepare Stan data
sdata_array <- list(
I = I,
L = L,
Y = Y_array
)
# Fit the model
fit_stan_array <- stan(
model_code = HM_stancode_array,
data = sdata_array,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026
)
# Estimation time
et_stan_array <- mean(rowSums(get_elapsed_time(fit_stan_array)))
Here I use the simplest version, because the simulated design is balanced. The same idea also works with unbalanced data, as long as we can keep track of which observations belong to each subject. In fact, I used this strategy with unbalanced data in my own work, based on Haines et al. (2025), which is where I first saw this way of specifying hierarchical models.
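To make the unbalanced case a bit more concrete, here is a minimal sketch of the padding idea for the array version. The object names (n_ik, L_max, Y_pad) are mine and purely illustrative; the Stan model would then receive n_ik as data and slice Y[i, k, 1:n_ik[i, k]] instead of the full third dimension:

# Observed trials per subject-by-condition cell (may differ across cells)
n_ik  <- table(dat$subj, dat$cond)
L_max <- max(n_ik)
# Pad unused entries with zeros; they are never evaluated as long as the
# likelihood only touches the first n_ik[i, k] elements of each slice
Y_pad <- array(0, dim = c(I, 2, L_max))
conds <- sort(unique(dat$cond))
for (i in 1:I) {
  for (k in 1:2) {
    rts <- dat$rt[dat$subj == i & dat$cond == conds[k]]
    Y_pad[i, k, seq_along(rts)] <- rts
  }
}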
On my computer, the array-based model takes about 27 seconds to fit. This makes it around 6 times faster than the previous Stan version, and around 35 times faster than the brms specification.
4. Model comparison
The table below shows the elapsed time for the three approaches on my computer:
| Model | Seconds | Minutes | Relative to array |
|---|---|---|---|
| brms | 962.6 | 16.04 | 34.9 |
| Stan, long-data | 171.3 | 2.86 | 6.2 |
| Stan, array | 27.6 | 0.46 | 1.0 |
Below, I also report the estimates of the group-level parameters obtained with the three models.
# ------------------- #
# Results: brms #
# ------------------- #
Family: gaussian
Links: mu = identity
Formula: rt ~ 1 + cond + (1 + cond | subj)
Data: dat (Number of observations: 45000)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~subj (Number of levels: 150)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.12 0.01 0.11 0.14 1.06 138 311
sd(cond) 0.04 0.00 0.03 0.04 1.01 1039 2195
cor(Intercept,cond) 0.64 0.06 0.51 0.75 1.01 942 2183
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.66 0.01 0.64 0.67 1.03 90 182
cond 0.05 0.00 0.05 0.06 1.01 278 667
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.18 0.00 0.18 0.18 1.00 6770 2631
# ----------------------------------- #
# Results: Long format approach #
# ----------------------------------- #
parameter variable mean sd median q2.5 q97.5 rhat ess_bulk ess_tail
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu_alpha mu_REs[1] 0.654 0.00957 0.654 0.636 0.674 1.01 41.7 93.5
2 mu_beta mu_REs[2] 0.0524 0.00342 0.0524 0.0457 0.0590 1.01 154. 717.
3 sigma_alpha sd_REs[1] 0.123 0.00738 0.123 0.110 0.139 1.02 129. 196.
4 sigma_beta sd_REs[2] 0.0377 0.00282 0.0376 0.0324 0.0436 1.00 851. 1826.
5 sigma sigma 0.181 0.000586 0.181 0.180 0.182 1.00 7633. 2823.
6 rho_alpha_beta Rho_REs[1,2] 0.636 0.0619 0.639 0.503 0.748 1.00 755. 1665.
# ----------------------------- #
# Results: Array approach #
# ----------------------------- #
parameter variable mean sd median q2.5 q97.5 rhat ess_bulk ess_tail
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu_alpha mu_REs[1] 0.655 0.0100 0.655 0.635 0.675 1.01 85.0 164.
2 mu_beta mu_REs[2] 0.0527 0.00348 0.0526 0.0460 0.0594 1.00 277. 891.
3 sigma_alpha sd_REs[1] 0.124 0.00713 0.123 0.111 0.138 1.00 240. 530.
4 sigma_beta sd_REs[2] 0.0379 0.00291 0.0378 0.0326 0.0441 1.00 1164. 1814.
5 sigma sigma 0.181 0.000589 0.181 0.180 0.182 1.000 6423. 2537.
6 rho_alpha_beta Rho_REs[1,2] 0.634 0.0641 0.638 0.494 0.748 1.00 845. 1920.
To conclude
Overall, I think there may be room for improving the way some common hierarchical models are implemented in experimental psychology, including cognitive-process models such as drift diffusion models, which can be especially slow. I am not sure how far this idea generalizes to other hierarchical settings, such as longitudinal models with autoregressive structure.
In my case, the array-based approach was useful because, in my main work, I extend the previous model to J experimental tasks. The population-level model then becomes something like

\begin{pmatrix} \boldsymbol{\alpha}_i \\ \boldsymbol{\beta}_i \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \boldsymbol{\mu}_\alpha \\ \boldsymbol{\mu}_\beta \end{pmatrix}, \begin{pmatrix} \mathbf{\Sigma}_\alpha & \mathbf{\Sigma}_{\alpha\beta} \\ \mathbf{\Sigma}_{\alpha\beta}^{\top} & \mathbf{\Sigma}_\beta \end{pmatrix} \right),

where \boldsymbol{\alpha}_i and \boldsymbol{\beta}_i contain subject i's intercepts and experimental effects across J tasks. As a fun fact, in my own version I also parameterize \mathbf{\Sigma}_{\beta} as \mathbf{\Lambda}\mathbf{\Phi}\mathbf{\Lambda}^{\top} + \mathbf{\Psi}, as in a psychometric factor model.
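For intuition, this is what that parameterization looks like numerically. The numbers below are toy values I picked for illustration, not estimates from any fitted model:

# Factor-model parameterization of the covariance of the J experimental
# effects: Sigma_beta = Lambda %*% Phi %*% t(Lambda) + Psi
J      <- 6
Lambda <- matrix(0.03, nrow = J, ncol = 1)  # loadings on one latent factor
Phi    <- diag(1)                           # latent factor variance
Psi    <- diag(0.01^2, J)                   # task-specific residual variances
Sigma_beta <- Lambda %*% Phi %*% t(Lambda) + Psi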
This model specification, with or without the latent factor parameterization, becomes computationally expensive in brms. For example, I tried a version with 200 subjects, 6 tasks, and 100 trials per task. The array-based Stan version took about 5 minutes to fit, whereas the same model in brms took around 40 hours.
Of course, this is much less flexible than brms for specifying arbitrary models. Still, I wonder whether this kind of structured likelihood could be useful in other settings too. I would be very curious to hear what others think.
The full R code that I used for this illustration is below.
Click here to see the full R code used in this illustration
# ==============================================================================
# SCRIPT OVERVIEW
# ------------------------------------------------------------------------------
# Script Name : stan_eff_hiermods.R
# Author      : Ricardo Rey-Sáez
# Role        : PhD Candidate in Psychology
# Institution : Autonomous University of Madrid, Spain
# Email       : ricardoreysaez95@gmail.es
# Date        : 09-05-2026
# ==============================================================================
# ------------------------------------------------------------------------------
# SECTION 1: Load Packages
# ------------------------------------------------------------------------------
# Libraries necessary for the script to function
library(MASS)
library(dplyr)
library(brms)
library(rstan)
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# SECTION 2: Simulate Data
# ------------------------------------------------------------------------------
# Basic Gaussian hierarchical model with random intercepts (alpha) and slopes (beta)
# Population parameters from here: https://osf.io/preprints/psyarxiv/gv6k7_v2
set.seed(2026)
mu_alpha <- .65
mu_beta <- .06
sigma_alpha <- .12
sigma_beta <- .04
rho_ab <- matrix(c(1,.5,.5,1), ncol=2)
sigma_trial <- .18
Sigma_REs <- diag(c(sigma_alpha, sigma_beta)) %*% rho_ab %*%
diag(c(sigma_alpha, sigma_beta))
# Subjects and trials per subject and condition (2 conditions, e.g., congruent/incongruent)
I <- 150
L <- 150
# Simulate random effects
REs <- MASS::mvrnorm(
n = I,
mu = c(mu_alpha, mu_beta),
Sigma = Sigma_REs
)
# Simulate observed data
dat <- matrix(NA, ncol = 3, nrow = I * L * 2)
cnt <- 1
for(i in 1:I) {
for(k in 1:2) {
# Alpha, beta, and condition
alpha_i <- REs[i,1]
beta_i <- REs[i,2]
X_k <- (k - 1.5)
mu_i <- alpha_i + beta_i * X_k
for(l in 1:L) {
dat[cnt, 1] <- i
dat[cnt, 2] <- k
# Trial-level data
dat[cnt, 3] <- rnorm(1, mu_i, sigma_trial)
cnt <- cnt + 1
}
}
}
# Store as data frame
dat <- as.data.frame(dat)
colnames(dat) <- c("subj", "cond", "rt")
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# SECTION 3: Fit brms model
# ------------------------------------------------------------------------------
# Code condition as a sum contrast (-0.5 / +0.5)
dat$cond <- dat$cond - 1.5
# Set prior distributions
priors_brms <- c(
prior(normal(0, 1), class = "Intercept"),
prior(normal(0, 1), class = "b", coef = "cond"),
prior(student_t(4, 0, 0.5), class = "sd", group = "subj", coef = "Intercept"),
prior(student_t(4, 0, 0.5), class = "sd", group = "subj", coef = "cond"),
prior(student_t(4, 0, 0.5), class = "sigma"),
prior(lkj(1), class = "cor", group = "subj")
)
# Fit brms model
brms_fit <- brm(
formula = rt ~ 1 + cond + (1 + cond | subj),
data = dat,
prior = priors_brms,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026)
# Estimation time
et_brms <- mean(rowSums(get_elapsed_time(brms_fit$fit)))
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# SECTION 4: Fit efficient hierarchical model --- block
# ------------------------------------------------------------------------------
# The efficient hierarchical model via blocking
HM_stancode_block <- "
data {
int<lower=1> N; // Number of observations
int<lower=1> I; // Number of subjects
matrix[N, 3] Y; // Columns: (1 = subj, 2 = cond, 3 = rt)
array[I] int<lower=1> trialsum; // Number of observed trials per subject
array[I] int<lower=1> idx_start; // First row of each subject in Y
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
// Index of the last observed row for subject i
int idx_end = idx_start[i] + trialsum[i] - 1;
// Vectorized likelihood for subject i
Y[idx_start[i]:idx_end, 3] ~ normal(
REs[i, 1] + REs[i, 2] * Y[idx_start[i]:idx_end, 2],
sigma
);
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
"
# Store trialsum
trialsum <- as.integer(rle(dat$subj)$lengths)
# Prepare Stan data
sdata_block <- list(
N = nrow(dat),
I = length(unique(dat$subj)),
Y = as.matrix(dat),
trialsum = trialsum,
idx_start = as.integer(cumsum(c(1L, head(trialsum, -1L))))
)
# Fit stan model
fit_stan_block <- stan(
model_code = HM_stancode_block,
data = sdata_block,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026)
# Estimation time
et_stan_block <- mean(rowSums(get_elapsed_time(fit_stan_block)))
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# SECTION 5: Fit efficient hierarchical model --- array
# ------------------------------------------------------------------------------
# The efficient hierarchical model
HM_stancode_array <- "
data {
int<lower=1> I; // Number of subjects
int<lower=1> L; // Number of trials per subj and cond
array[I, 2, L] real Y; // Array with observed responses
}
parameters {
vector[2] mu_REs; // Population means
vector<lower=0>[2] sd_REs; // Population std. devs
real<lower=0> sigma; // Trial-level std. dev
cholesky_factor_corr[2] L_R; // Population REs correlation
matrix[2, I] REs_std; // Standardized REs
}
transformed parameters{
// Scaled random effects
matrix[I, 2] REs = rep_matrix(mu_REs', I) +
(diag_pre_multiply(sd_REs, L_R) * REs_std)';
}
model {
// Model priors
mu_REs ~ std_normal();
sd_REs ~ student_t(4, 0, 0.5);
sigma ~ student_t(4, 0, 0.5);
L_R ~ lkj_corr_cholesky(1);
to_vector(REs_std) ~ std_normal();
// Model Log-Likelihood
for (i in 1:I) {
for(k in 1:2) {
// Linear prediction
real mu_ik = REs[i, 1] + REs[i, 2] * (k - 1.5);
// Vectorized log-likelihood
Y[i, k, ] ~ normal(mu_ik, sigma);
}
}
}
generated quantities {
// Model-implied correlation matrix
corr_matrix[2] Rho_REs = multiply_lower_tri_self_transpose(L_R);
}
"
# Store data in a three-dimensional array
Y_array <- aperm(array(dat$rt, dim = c(L, 2, I)), c(3, 2, 1))
# Prepare Stan data
sdata_array <- list(
I = I,
L = L,
Y = Y_array
)
# Fit stan model
fit_stan_array <- stan(
model_code = HM_stancode_array,
data = sdata_array,
iter = 2000,
warmup = 1000,
chains = 4,
cores = 4,
seed = 2026)
# Estimation time
et_stan_array <- mean(rowSums(get_elapsed_time(fit_stan_array)))
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# SECTION 6: Model comparison
# ------------------------------------------------------------------------------
# brms model is ~6 times slower than the Stan block model
round(et_brms / et_stan_block)
# brms model is ~35 times slower than the Stan array model (!!)
round(et_brms / et_stan_array)
# Stan block model is ~6 times slower than the Stan array model
round(et_stan_block / et_stan_array)
# Compare population means across models
cbind(true = c(mu_alpha, mu_beta),
brms = fixef(brms_fit)[1:2,"Estimate"],
stan_block = summary(fit_stan_block, "mu_REs")$summary[,"mean"],
stan_array = summary(fit_stan_array, "mu_REs")$summary[,"mean"])
# Compare population std. devs across models
cbind(true = c(sigma_alpha, sigma_beta),
brms = VarCorr(brms_fit)$subj$sd[,"Estimate"],
stan_block = summary(fit_stan_block, "sd_REs")$summary[,"mean"],
stan_array = summary(fit_stan_array, "sd_REs")$summary[,"mean"])
# Compare trial-level residual standard deviations across models
cbind(true = sigma_trial,
brms = VarCorr(brms_fit)$residual__$sd[,"Estimate"],
stan_block = summary(fit_stan_block, "sigma")$summary[,"mean"],
stan_array = summary(fit_stan_array, "sigma")$summary[,"mean"])
# Compare the intercept-effect correlation across models (true value = .5)
cbind(true = .5,
brms = VarCorr(brms_fit)$subj$cor["Intercept" ,"Estimate", "cond"],
stan_block = summary(fit_stan_block, "Rho_REs[1,2]")$summary[,"mean"],
stan_array = summary(fit_stan_array, "Rho_REs[1,2]")$summary[,"mean"])
# ------------------------------------------------------------------------------
[1] EDIT: I will use the Gaussian likelihood for the sake of clarity, but keep in mind that the same idea can be used with any other likelihood, especially those better suited for response times. ↩︎