# Efficient sampling SRM with large number of correlated random effects

Hello,

I am working on a model for social-network (round-robin) data with a large number of correlated random effects. If anyone is familiar with the Social Relations Model, this is an extension that allows each component to vary across 3 external observers who each rate the dyadic behaviors within the network. My goal is simply to estimate the (co)variance components, and use them to calculate some intraclass correlation coefficients.

My program works (slowly), but one of my parameters (\sigma_p) mixes poorly (R-hat > 1.1 & trace plots show that chains do not mix, although their random walks all seem to stay in the same parameter space; see attached traceplot). My intuition is that the problem is due to estimating a very small (nearly zero) variance component (\sigma_p = 0.054). To be clear, I actually estimate standard deviations and correlations, then assemble covariance matrices in transformed parameters block, then take the cholesky transformation to use for sampling in the model block (syntax attached and pasted below).

I tried specifying a more informative half-normal prior for \sigma_p (only weakly informative priors for the rest) and increased the number of iterations to 4000 (N chains = 3), which takes 3 CPU hours, but it still won’t stabilize. I intend to fit this model to simulated data in a Monte Carlo study, so I would like to know how I can make my Stan syntax more efficient. For example, I read in another thread something about sampling on the unit scale and scaling by the estimated standard deviation. Would that help with sampling my random effects with the near-zero variance component? Would that mean drawing random effects using the (cholesky of the) correlation matrix, then multiplying the random effects by their SDs when I create yHat in the transformed parameters block?

Any other ideas would be helpful, thank you! My stan syntax and R syntax look as follows:


data {
// sample sizes at each level
int NPersons;            // number of unique participants
int NRaters;             // number of unique raters

// ID variables
int IDr[NDyads*NRaters];          // ID for rater

// observed data
matrix[NDyads*NRaters,2] Y;      // observed bivariate outcome (DVs for each dyad, per rater)
// use range(Y) for priors
real rangeY;
}

parameters {
real<lower=1,upper=6> M;    // marginal mean(Y) >> M
vector[NRaters] m;              // Random intercept(of)Y >> m
matrix[NPersons, 2] AP;         // correlated actor/partner effects >> A and P
matrix[NPersons, 2] ap[NRaters]; // correlated rater deviations from A and P >> a and p
// Hyperparameters of random effects
real<lower=0,upper=rangeY*2> sigmam;         // sigmam:   random intercept for raters

vector<lower=0,upper=rangeY*2>[2] sigmaAP;      // sigmaAP:       SDs of actor/partner effects
real<lower=-1,upper=1> rhoAP;                   // rhoAP:         generalized reciprocity

real<lower=0,upper=rangeY*2> sigmaE;            // sigmaE:        relationship SD
real<lower=-1,upper=1> rhoE;                    // rhoE:          dyadic reciprocity

vector<lower=0,upper=rangeY*2>[2] sigmaap; // sigmaap:  SD Rater deviations from A and P
real<lower=-1,upper=1> rhoap;              // rhoap:    correlation rater ks rating of A and P

real<lower=0,upper=rangeY*2> sigmae;      // sigmae:  SD error /rater deviation from relship effect
real<lower=-1,upper=1> rhoe;              // rhoe:    correlation rater ks deviation from relationship effects Eij and Eji
}

transformed parameters {
matrix[NDyads*NRaters,2] yHat;  // expected value of Y, given person dyad effects and rater deviations
matrix[2,2] covPerson;       // person level covariance matrix
matrix[2,2] cholPerson;      // person level Cholesky-covariances
vector[2] MPerson;           // person-level means (actor/partner effects)
vector[2] SDPerson;          // SDs of actor/partner effects

matrix[2,2] covDyad;        // person level covariance matrix
matrix[2,2] cholDyad;       // person level Cholesky-covariances
vector[2] MDyad;            // person-level means (actor/partner effects)
vector[2] SDDyad;           // SDs of actor/partner effects

matrix[2,2] covRaterPL;       // Rater-Person level covariance matrix
matrix[2,2] cholRaterPL;      // Rater-Person level Cholesky-covariances
vector[2] MRaterPL;           // Rater-Person level means (actor/partner effects)
vector[2] SDRaterPL;          // SDs of Rater-Person level

matrix[2,2] covRaterDL;       // Rater-Dyad level covariance matrix
matrix[2,2] cholRaterDL;      // Rater-Dyad level Cholesky-covariances
vector[2] MRaterDL;           // Rater-Dyad level means (actor/partner effects)
vector[2] SDRaterDL;          // SDs of Rater-Dyad level

yHat[i, 1] = M + m[IDr[i]] + AP[IDi[i], 1] + ap[IDr[i], IDi[i], 1] + AP[IDj[i], 2] + ap[IDr[i], IDj[i], 2] + E[IDd[i], 1];
yHat[i, 2] = M + m[IDr[i]] + AP[IDj[i], 1] + ap[IDr[i], IDj[i], 1] + AP[IDi[i], 2] + ap[IDr[i], IDi[i], 2] + E[IDd[i], 2];
}

// put hyperparameters into  matrices:
// Person level matrices
covPerson[1,1] = sigmaAP[1] * sigmaAP[1];
covPerson[2,2] = sigmaAP[2] * sigmaAP[2];
covPerson[1,2] = rhoAP * sigmaAP[1] * sigmaAP[2];
covPerson[2,1] = covPerson[1,2];
cholPerson = cholesky_decompose(covPerson);

covDyad[1,2] = rhoE * sigmaE * sigmaE;

// Rater*Person level matrices
covRaterPL[1,1] = sigmaap[1] * sigmaap[1];
covRaterPL[2,2] = sigmaap[2] * sigmaap[2];
covRaterPL[1,2] = rhoap * sigmaap[1] * sigmaap[2];
covRaterPL[2,1] = covRaterPL[1,2];
cholRaterPL = cholesky_decompose(covRaterPL);

covRaterDL[1,1] = sigmae * sigmae;
covRaterDL[2,2] = covRaterDL[1,1];
covRaterDL[1,2] = rhoe * sigmae * sigmae;
covRaterDL[2,1] = covRaterDL[1,2];
cholRaterDL = cholesky_decompose(covRaterDL);
}

model {
// priors
sigmaap[2] ~ normal(0, 0.05);
m ~ normal(0, sigmam);
for (i in 1:NPersons) AP[i] ~ multi_normal_cholesky(rep_vector(0, 2), cholPerson);
for (r in 1:NRaters) {
for (p in 1:NPersons) {
ap[r, p] ~ multi_normal_cholesky(rep_vector(0, 2), cholRaterPL);
}
}
for (i in 1:NRaters*NDyads) Y[i] ~ multi_normal_cholesky(yHat[i], cholRaterDL);
}



Running the model in rstan with the following code:

head(dyadData)

## load the package to fit models to data
library(rstan)
library(Rcpp)
## load the package to calculate model-fit information criteria
library(loo)

## Set standard rstan options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores() - 1)

## The following syntax assumes the Stan syntax below ("RESRM_mod_simple.stan) is already stored in a
## separate file.

## There is a dyad-level data set in the R object "dyadData", which
## has 1 row per dyad and columns for:
##   - ID1 and ID2: the person-level IDs of the first and second person in the dyad
##   - Rater: Indicating the rater rating the dyadic behaviour
##   - AB: the outcome of actor ID1 mimicking partner ID2
##   - BA: the outcome of actor ID2 mimicking partner ID1
## Rows do not have missing values

## list of knowns to pass as data
IDlevelsPersons <- unique(sort(c(dyadData$ID1, dyadData$ID2)))
IDlevelsDyads <- unique(sort(dyadData$Dyad)) IDlevelsRaters <- unique(sort(dyadData$Rater))

knowns <- list(NDyads = length(unique(dyadData$Dyad)), NPersons = length(unique(c(dyadData$ID1, dyadData$ID2))), NRaters = length(unique(c(dyadData$Rater))),

IDi = as.integer(factor(dyadData$ID1, levels = IDlevelsPersons)), IDj = as.integer(factor(dyadData$ID2, levels = IDlevelsPersons)),
IDd = as.integer(factor(dyadData$Dyad, levels = IDlevelsDyads)), IDr = as.integer(factor(dyadData$Rater, levels = IDlevelsRaters)),

rangeY = diff(range(c(dyadData$mimicryAB, dyadData$mimicryBA), na.rm = TRUE)))

## list of unknowns to keep track of
fixedFX <- c("M")
ranScales <- c("sigmam", "sigmaAP", "rhoAP", "sigmaE", "rhoE", "sigmaap", "rhoap", "sigmae", "rhoe")
unknowns <- c(fixedFX, ranScales)

## compile model
SRM.IRR <- stan_model("RESRM_mod_simple.stan", model_name = "Rater extended SRM")

## fit model to data
out.IRR <- sampling(SRM.IRR, data = knowns, pars = unknowns, seed = 3141593,
chains = 3, iter = 4000)

## fixed effects, SDs and correlation of random effects
print(out.IRR, pars = c(fixedFX, ranScales), probs = c(0.025, 0.975),
digits_summary = 3)


Resulting in the following output:

> round(ranEfTables\$summary, 3)
mean se_mean    sd   2.5% 97.5%    n_eff  Rhat
M           2.996   0.015 0.306  2.590 3.687  405.615 1.005
sigmam      0.284   0.026 0.592  0.014 1.862  512.429 1.005
sigmaAP[1]  0.622   0.001 0.056  0.519 0.740 3042.239 1.001
sigmaAP[2]  0.303   0.002 0.054  0.195 0.407  615.776 1.005
rhoAP       0.717   0.003 0.101  0.496 0.891  934.604 1.004
sigmaE      0.682   0.001 0.038  0.612 0.758 1814.002 1.000
rhoE        0.745   0.001 0.050  0.639 0.835 1845.725 1.000
sigmaap[1]  0.340   0.001 0.025  0.292 0.391 1567.415 1.001
sigmaap[2]  0.054   0.010 0.026  0.016 0.109    6.643 1.346
rhoap      -0.257   0.027 0.361 -0.864 0.550  177.619 1.022
sigmae      0.576   0.000 0.013  0.551 0.603 2835.810 1.004
rhoe        0.196   0.001 0.048  0.098 0.286 1120.716 1.001


Original data stems from the following Open Science Framework account: https://osf.io/b4nvf/

Thanks for thinking along!

DebbyRESRM_mod_simple.stan (5.1 KB)RESRM_rStan_Analysis_simple.R (2.7 KB)Pi_TracePlots.pdf (79.4 KB)

Maybe try a non-centered parameterization for the ap parameters?

That’s just a gut reaction hearing that the standard deviation was getting really small. Are there divergences?

If the answer is yes, there’s a section in the manual (2.17) titled “Non-Centered Parameterization” that explains what a non-centered parameterization is. There’s a Betancourt case study that explains the situation in more detail: http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

Code-wise, what that means is instead of doing:

parameters {
matrix[NPersons, 2] ap[NRaters];
...
}

model {
for (r in 1:NRaters) {
for (p in 1:NPersons) {
ap[r, p] ~ multi_normal_cholesky(rep_vector(0, 2), cholRaterPL);
}
}
...
}


Do

parameters {
matrix[NPersons, 2] ap_z[NRaters];
...
}

transformed parameters {
matrix[NPersons, 2] ap[NRaters];

...

for (r in 1:NRaters) {
for (p in 1:NPersons) {
ap[r, p] = cholRaterPL * ap_z[r, p]; // The prior distribution on ap is multi_normal_cholesky(rep_vector(0, 2), cholRaterPL);
}
}
}

model {
for (r in 1:NRaters) {
for (p in 1:NPersons) {
ap_z[r, p] ~ normal(0, 1);
}
}
...
}


That said, non-centering might help the timestep adaptation work better (and hence mix better) even if this isn’t a problem with divergences.

You should be able to get some performance mileage out of vectorizing the multi_normal_cholesky calls. Instead of:

parameters {
matrix[NPersons, 2] AP;
...
}

model {
for (i in 1:NPersons) AP[i] ~ multi_normal_cholesky(rep_vector(0, 2), cholPerson);
...
}


Do:

parameters {
vector[2] AP[NPersons];
...
}

model {
AP ~ multi_normal_cholesky(rep_vector(0, 2), cholPerson);
...
}