Estimates from hierarchical longitudinal analysis in stan much less certain than estimates from same model in brms

Apologies for the long data strings and code. I would just like people to be able to reproduce my analysis.

I am having trouble getting what I consider to be good estimates of the parameters in a mixed-effects model run in stan.

df <- data.frame(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 43, 43, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46),
                 day = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 6, 7, 1, 2, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7),
                 score = c(5, 4.05, 2.22, 1.63, 2.53, 0.63, 1, 4.84, 0.79, 1.58, 2.16, 3.68, 1.34, 3.33, 0.6, 0.79, 0.37, 0.58, 0.37, 0.53, 0.26, 0.16, 0.58, 0.63, 1.63, 1, 0.58, 0.16, 3.32, 2.26, 2.11, 2.47, 1.28, 2.78, 2.7, 3.26, 0.68, 0.84, 1.95, 0.26, 0.79, 1.89, 1.74, 2, 2.58, 2.16, 2.74, 2.63, 2.42, 4.84, 6.26, 6.26, 5.58, 6.37, 1.26, 1, 2.58, 0.95, 0.37, 0.89, 1.11, 0.89, 0.74, 1.06, 4.05, 5, 7.56, 7.11, 6.26, 4.32, 3.11, 2.32, 2.63, 3, 2.53, 2.74, 1.37, 0.47, 1.68, 1.42, 0.42, 0.42, 0.37, 0.47, 2.63, 2.53, 2.21, 1.74, 1.05, 0.89, 0.74, 3.53, 3.84, 2.47, 0.89, 0.63, 0.68, 0.63, 2.89, 2.05, 1.47, 1.32, 1.47, 2.11, 0.95, 3.42, 1.95, 2.21, 1.74, 2.5, 2, 0.53, 2.53, 3.16, 2.16, 1.16, 2.32, 3.11, 2.42, 0.58, 1.47, 1.42, 0.89, 0.37, 1, 0.32, 1.11, 0.79, 1.05, 1.42, 0.74, 0.63, 1.95, 3.47, 0.95, 3.47, 2.21, 2.42, 2.16, 2.05, 4.21, 2.58, 0.79, 0.63, 0.89, 0.95, 1.11, 0, 0.53, 1.05, 1.05, 1.05, 0.84, 1.16, 4.39, 2.58, 1.63, 2.63, 4.11, 3.74, 2.67, 4.32, 2.79, 5.11, 3.42, 2.63, 2.53, 2.26, 5.47, 4.74, 4.63, 4.84, 4.16, 3.32, 4.05, 2.11, 1.74, 1, 1.16, 0.68, 0.84, 0.89, 0.37, 1.05, 1.89, 1.58, 2.26, 2.11, 3.21, 1.16, 3.84, 3.22, 2.33, 2.16, 3.05, 3.84, 2.47, 3.79, 4.68, 1.74, 1.58, 2.21, 1.58, 1.53, 2.84, 1.26, 0.95, 2, 1.42, 2.32, 0.11, 0, 0.58, 0, 0, 0, 0.47, 6.05, 4.53, 2.79, 4, 1.84, 2.68, 0.37, 1.68, 1.63, 1.11, 1.42, 2.21, 2, 1.56, 1.21, 4.37, 4.63, 4.61, 2.79, 1.89, 0.94, 3.95, 4.47, 3, 4, 4.89, 5.37, 5.84, 0.26, 1.11, 0.26, 0.16, 0.11, 0.21, 0.05, 5.24, 5.63, 6.74, 5.74, 5.47, 2.68, 1.67, 0.58, 0.11, 0.84, 0.21, 0.5, 0.58, 0.58, 1, 2.32, 2.42, 4.89, 1.37, 1.95, 1.21, 0.95, 4.37, 6.21, 6.74, 6.26, 4.95, 4.79, 3.05, 2.63, 3.79, 1.68, 1.79, 3.37, 2.58, 1.95, 3.27, 5.11, 1.21, 3.68, 2.84, 0.89, 0.94, 0.11, 0.11, 2.68, 1.26, 1.37, 0.84, 0.63, 3.37, 2.11, 2.16, 2.68, 3.37, 3.89, 3.42, 2, 1.63))

In this analysis the goal is to model the estimated starting score for 46 participants and their rate of change in score per day of a 7-day trial.

Now the approach I took I adapted from techniques explained in a brilliant series of lectures on stan for Psychologists by Mike Lawrence, specifically this one (https://www.youtube.com/watch?v=woho42axipg&list=PLu77iLvsj_GNmWqDdX-26kZ56dCZqkzBO&index=21). In these models you use the design matrix to model lower-level, within-subjects coefficients and higher-order group coefficients, like so

# matrix of within-subject predictors
W <- model.matrix(score ~ day, df)

Then you specify a simple random-slopes mixed-effects for repeated measures regression, with day of the trial as the sole predictor (range 1-7). We also model the correlation between intercept and rate of change coefficients. Here is the code for the model itself and to run the model in rstan.

# model string
ms_within <- "
  data {
    int nP;           // number of participants
    int nY;           // the number of observations
    int nW;           // the number of within-subject predictors
    vector[nY] y;     // vector of scores
    matrix[nY, nW] W; // matrix of within-subjects predictors
    int S[nY];        // integer value of subject associated with each y outcome  
  }
  
  transformed data {
    vector[nY] y_scaled;               
    y_scaled = (y - mean(y)) / sd(y);  // standardising via z-transform
  }
  
  parameters {
    vector[nW] scaled_coef_means;         //scaled coefficients for group means
    vector<lower=0>[nW] scaled_coef_sds;
    corr_matrix[nW] corr_mat;             // correlation among coefficients
    matrix[nP,nW] scaled_subj_coefs;      // scaled subject coefficients
    real<lower=0> noise;             // noise parameter 
  }
  
  model {
  
  	// weakly informed priors
  	scaled_coef_means ~ normal(0,1) ;   // flat gaussian prior on mean of subj coefs
  	scaled_coef_sds ~ weibull(2,1) ;    // flat prior on sd of subj coefs 
  	corr_mat ~ lkj_corr(1) ;            // flat prior on correlations
  	noise ~ weibull(2,1) ;              //prior on measuement noise
  
  	// likelihood for lower-tier in the hierarchy. estimates a matrix of coefficients with as many rows as participants and with as many within-subject coefficients as there are subjects in the model (intercept and effect) for each
  	for(p in 1:nP){
  		scaled_subj_coefs[p,] ~ multi_normal(
  			  scaled_coef_means
  			, quad_form_diag(corr_mat, scaled_coef_sds)  
  		) ;
  	}
  	
  	// now for the higher tier in the hierarchy
  	for (i in 1:nP) {
        y_scaled[i] ~ normal(
          sum(scaled_subj_coefs[S[i]].*W[i,]), 
          noise
        );
  	}
  }
  
  generated quantities{
  
    //declare variables
    vector[nW] coef_means;   // transform mean coefficients back to original y scale
    vector[nW] coef_sds;     // transform sds back to original y scale
    

    // rescale everything
    coef_means = scaled_coef_means*sd(y);        // for mean coefs multiply by sd
    coef_means[1] = coef_means[1] + mean(y);     // for intercept also add mean
    coef_sds = scaled_coef_sds*sd(y);            // for group sds multiply by sd
  }
"

Now we run the analyses and calculate the lower and upper 95% HPDI and median estimate for the two parameters we are interested in: overall intercept and rate of change in score per day.

post <- stan(model_code = ms_within,
                          data = list(nP = max(df$id),
                                      nY = nrow(df),
                                      nW = ncol(W),
                                      y = df$score,
                                      W = W,
                                      S = df$id),
                          seed = 1,
                          cores = 3,
                          chains = 1,
                          warmup = 1e3,
                          iter = 2e3)

# extract the estimates of the higher-level effects, intercept and effect
pX <- as.data.frame(as.array(post)[,,grep("^coef_means", dimnames(post)$parameters, perl=T)])

# give the effects sensible names
names(pX) <- c("intercept", "rateOfChange") 

# hpdi function
HPDIFunct <- function (vector, HPDI = 0.95) { 
  sortVec <- sort(vector)
  ninetyFiveVec <- ceiling(HPDI*length(sortVec))
  fiveVec <- length(sortVec) - length(ninetyFiveVec)
  diffVec <- sapply(1:fiveVec, function (i) sortVec[i + ninetyFiveVec] - sortVec[i])
  minVal <- sortVec[which.min(diffVec)]
  maxVal <- sortVec[which.min(diffVec) + ninetyFiveVec]
  median <- median(sortVec)
  return(list(est = round(c(minVal, median, maxVal),2)))
}

# lower 95% HPDIx, median, and upper 95% HPDI
summaryDF <- as.data.frame(t(unlist(HPDIFunct(pX$intercept))))
summaryDF[2,] <- t(unlist(HPDIFunct(pX$rateOfChange)))
row.names(summaryDF) <- c("Intercept", "day")
names(summaryDF) <- c("lowerStan", "medianStan", "upperStan")

Now we run an alternative bayesian version of the same model in brms and an nhst version of the same model in the R-package nlme

#  brms model
library(brms)
brmsMod <- brm(formula = score ~ day + (day|id),
                                  data = df,
                                  family = gaussian)

# bind brms estimates to the stan estimates
summaryDF <- cbind(summaryDF, 
                   round(summary(brmsMod)$fixed[,c("l-95% CI", "Estimate", "u-95% CI")],2))

#  nhst model
library(nlme)
nhstMod <- lme(fixed = score ~ day,
               random = ~day|id,
               data = df)

cbind(summaryDF, round(as.data.frame(intervals(nhstMod)$fixed),2))

Now here is the output form those three versions of the same model side by side

#           lowerStan medianStan upperStan l-95% CI Estimate u-95% CI lower  est. upper
# Intercept      1.11       2.29      3.43     2.23     2.78     3.32  2.26  2.79  3.31
# day           -0.39      -0.16      0.13    -0.22    -0.14    -0.05 -0.22 -0.14 -0.06

The left three columns are from the stan model, the middle three from the brms model, the right three from the nhst model.

So my question is: why are the parameter estimates so much less certain in my stan model than the other two?

The stan code used by brms is much more complex than mine, but I had hoped that I would get close. The day coefficient in particular is much broader and less certain in the stan model than the other two.

What am I doing wrong?

A couple of things to try:
Instead of 95% I’d use 50% is more computational stable and makes for a better comparison.

Are the priors the same in the stan and brms model. In brms you can get_prior and set_prior.

I think nmle essentially uses uniform priors?

Thank you Ara Winter. If it is the priors making all the difference then why are the brms and nlme models so similar? brms does not use uniform priors as default.

I chose 95% because I wanted to emulate the intervals returned by brms and nlme.

Not sure yet but can you make the brms priors match the stan priors and then we can go from there. I’ll try and get the code copied over into R on my side tomorrow for a run through.

Thank you @Ara_Winter I really appreciate the help. I actually found a statement in the ‘brms_overview’ vignette that said “The default prior is an improper flat prior over the reals” but when I run stancode(brmsMod) this seems not to be the case (though I must confess the stancode is well beyond my understanding).

I don’t have a lot of experience with setting priors in brms but I had a go at trying to emulate the priors in the stan model in brms as you suggested with the following brms model syntax

brmsMod2 <-  brm(formula = score ~ day + (day|id),
                 data = df,
                 family = gaussian,
                 prior = c(set_prior("normal(0,1)", class = "b"),
                           set_prior("weibull(2,1)", class = "sd"),
                           set_prior("lkj(1)", class = "cor")))

Which yielded the following output (using summary(brmsMod2)) for the population-level effects

            l-95% CI   Estimate    u-95% CI
Intercept  2.2802316  2.7927099  3.30701714
day       -0.2242224 -0.1402993 -0.05853283

Which are very similar lower and upper 95% HPDI and median point estimates to the original brms model. Once again both intercept and day coefficients are much more certain than my stan model, the point estimate for intercept is higher, and the 95% HPDI for the day effect does not include 0.

I strongly suspect something malfunctioning in my code somewhere but can’t find where it might be.

Anyway I really appreciate you helping out.

Let me see what I can do. I’ll get this up and running this morning. Hopefully others folks will pile in on the code end of thing.

A couple of things looking at your model calls. I’d recommend 4 chains, 2000 iter, and 500 for the warmup to start.

When I run your Stan model I end up with a number of divergent transitions.

@Ara_Winter I tried that but the width of the 95% HPDIs do not change

Well the divergent transitions tells me there is something up with the model or data. I am not sure what yet. Sorry haven’t had much time to dive into the details.