Trend analysis: Full stan model does not match resutls from same model in brms

I apologise for the repetition and my obssessiveness but I did not get a satisfactory answer to my question on my last post https://discourse.mc-stan.org/t/estimates-from-hierarchical-longitudinal-analysis-in-stan-much-less-certain-than-estimates-from-same-model-in-brms/18108.

I am modelling change in a simple gaussian outcome variable seven days. I adapted techniques laid out in a series of lectures by Mike Lawrence (https://www.youtube.com/watch?v=woho42axipg&list=PLu77iLvsj_GNmWqDdX-26kZ56dCZqkzBO&index=21), notably creating a model-matrix of within-subjects predictors (in this case an intercept and a continuous time variable) and modelling correlation between coefficients.

Here is the data. There are only three variables: participant id, day each observation was recorded (continuous variable from 1 to 7) and score on outcome variable. I want to estimate the initial value of the outcome and the per-day rate of change in the outcome across the 7-day trial. This model should be simple but I cannot get results that match the brms model.

Note: Code is in R, apologies to non-R users.

df <- data.frame(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 43, 43, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 46),
                 day = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 6, 7, 1, 2, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7),
                 score = c(5, 4.05, 2.22, 1.63, 2.53, 0.63, 1, 4.84, 0.79, 1.58, 2.16, 3.68, 1.34, 3.33, 0.6, 0.79, 0.37, 0.58, 0.37, 0.53, 0.26, 0.16, 0.58, 0.63, 1.63, 1, 0.58, 0.16, 3.32, 2.26, 2.11, 2.47, 1.28, 2.78, 2.7, 3.26, 0.68, 0.84, 1.95, 0.26, 0.79, 1.89, 1.74, 2, 2.58, 2.16, 2.74, 2.63, 2.42, 4.84, 6.26, 6.26, 5.58, 6.37, 1.26, 1, 2.58, 0.95, 0.37, 0.89, 1.11, 0.89, 0.74, 1.06, 4.05, 5, 7.56, 7.11, 6.26, 4.32, 3.11, 2.32, 2.63, 3, 2.53, 2.74, 1.37, 0.47, 1.68, 1.42, 0.42, 0.42, 0.37, 0.47, 2.63, 2.53, 2.21, 1.74, 1.05, 0.89, 0.74, 3.53, 3.84, 2.47, 0.89, 0.63, 0.68, 0.63, 2.89, 2.05, 1.47, 1.32, 1.47, 2.11, 0.95, 3.42, 1.95, 2.21, 1.74, 2.5, 2, 0.53, 2.53, 3.16, 2.16, 1.16, 2.32, 3.11, 2.42, 0.58, 1.47, 1.42, 0.89, 0.37, 1, 0.32, 1.11, 0.79, 1.05, 1.42, 0.74, 0.63, 1.95, 3.47, 0.95, 3.47, 2.21, 2.42, 2.16, 2.05, 4.21, 2.58, 0.79, 0.63, 0.89, 0.95, 1.11, 0, 0.53, 1.05, 1.05, 1.05, 0.84, 1.16, 4.39, 2.58, 1.63, 2.63, 4.11, 3.74, 2.67, 4.32, 2.79, 5.11, 3.42, 2.63, 2.53, 2.26, 5.47, 4.74, 4.63, 4.84, 4.16, 3.32, 4.05, 2.11, 1.74, 1, 1.16, 0.68, 0.84, 0.89, 0.37, 1.05, 1.89, 1.58, 2.26, 2.11, 3.21, 1.16, 3.84, 3.22, 2.33, 2.16, 3.05, 3.84, 2.47, 3.79, 4.68, 1.74, 1.58, 2.21, 1.58, 1.53, 2.84, 1.26, 0.95, 2, 1.42, 2.32, 0.11, 0, 0.58, 0, 0, 0, 0.47, 6.05, 4.53, 2.79, 4, 1.84, 2.68, 0.37, 1.68, 1.63, 1.11, 1.42, 2.21, 2, 1.56, 1.21, 4.37, 4.63, 4.61, 2.79, 1.89, 0.94, 3.95, 4.47, 3, 4, 4.89, 5.37, 5.84, 0.26, 1.11, 0.26, 0.16, 0.11, 0.21, 0.05, 5.24, 5.63, 6.74, 5.74, 5.47, 2.68, 1.67, 0.58, 0.11, 0.84, 0.21, 0.5, 0.58, 0.58, 1, 2.32, 2.42, 4.89, 1.37, 1.95, 1.21, 0.95, 4.37, 6.21, 6.74, 6.26, 4.95, 4.79, 3.05, 2.63, 3.79, 1.68, 1.79, 3.37, 2.58, 1.95, 3.27, 5.11, 1.21, 3.68, 2.84, 0.89, 0.94, 0.11, 0.11, 2.68, 1.26, 1.37, 0.84, 0.63, 3.37, 2.11, 2.16, 2.68, 3.37, 3.89, 3.42, 2, 1.63))

And here is the model

# matrix of within-subject predictors
W <- model.matrix(score ~ day, df)


# model string
ms_within <- "
  data {
    int nP;           // number of participants
    int nY;           // the number of observations
    int nW;           // the number of within-subject predictors
    vector[nY] y;     // vector of scores
    matrix[nY, nW] W; // matrix of within-subjects predictors
    int S[nY];        // integer value of subject associated with each y outcome  
  }
  
  transformed data {
    vector[nY] y_scaled;               
    y_scaled = (y - mean(y)) / sd(y);  // standardising via z-transform
  }
  
  parameters {
    vector[nW] scaled_coef_means;         //scaled coefficients for group means
    vector<lower=0>[nW] scaled_coef_sds;
    corr_matrix[nW] corr_mat;             // correlation among coefficients
    matrix[nP,nW] scaled_subj_coefs;      // scaled subject coefficients
    real<lower=0> noise;                  // noise parameter 
  }
  
  model {
  
  	// weakly informed priors
  	scaled_coef_means ~ normal(0,1) ;   // flat gaussian prior on mean of subj coefs
  	scaled_coef_sds ~ weibull(2,1) ;    // flat prior on sd of subj coefs 
  	corr_mat ~ lkj_corr(1) ;            // flat prior on correlations
  	noise ~ weibull(2,1) ;              // prior on measuement noise
  
  	// likelihood for lower-tier in the hierarchy. estimates a matrix of coefficients with as many rows as participants and with as many within-subject coefficients as there are subjects in the model (intercept and effect)
  	for(p in 1:nP){
  		scaled_subj_coefs[p,] ~ multi_normal(
  			  scaled_coef_means
  			, quad_form_diag(corr_mat, scaled_coef_sds)  
  		) ;
  	}
  	
  	// now for the higher tier in the hierarchy
  	for (i in 1:nP) {
        y_scaled[i] ~ normal(
          sum(scaled_subj_coefs[S[i]].*W[i,]), 
          noise
        );
  	}
  }
  
  generated quantities{
  
    //declare variables
    vector[nW] coef_means;   // transform mean coefficients back to original y scale
    vector[nW] coef_sds;     // transform sds back to original y scale
    

    // rescale everything
    coef_means = scaled_coef_means*sd(y);        // for mean coefs multiply by sd
    coef_means[1] = coef_means[1] + mean(y);     // for intercept also add mean
    coef_sds = scaled_coef_sds*sd(y);            // for group sds multiply by sd
  }
"

Now we run the analyses and calculate the lower and upper 95% HPDI and median estimate for the two parameters we are interested in: overall intercept and rate of change in score per day.

post <- stan(model_code = ms_within,
             data = list(nP = max(df$id),
                         nY = nrow(df),
                         nW = ncol(W),
                         y = df$score,
                         W = W,
                         S = df$id),
             seed = 1,
             cores = 3,
             chains = 1,
             warmup = 1e3,
             iter = 2e3)

And when we call the stan model these are the results

print(post, pars = "coef_means", probs = c(0.025, 0.975))

The output from this is

               mean se_mean   sd  2.5% 97.5% n_eff Rhat
coef_means[1]  2.26    0.04 0.58  0.99  3.37   209    1
coef_means[2] -0.15    0.01 0.13 -0.41  0.11   200    1

Where coef_means[1] is the population-level score at baseline and coef_means[2] is the per-day population-level linear rate of change in the outcome variable.

When we run the same model in brms

brmsMod <- brm(formula = score ~ day + (day|id),
               data = df,
               family = gaussian)

And summarise the results

round(summary(brmsMod)$fixed,2)

We get

          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.77      0.27     2.23     3.28    1     1497     2222
day          -0.14      0.04    -0.22    -0.05    1     2255     2669

Can anyone tell me why the result are so different? The intercept in the stan model is much lower and though the point estimate of the day effect is similar, the 95% HPDI excludes 0 in the brms model but not my stan model. I tried setting priors in the brms model that matched the stan model and generating longer chains and larger run-in periods but the results stay much the same. I feel like I must be doing something fundamentally wrong.

Any help much appreciated.

Hi Llew,

The simplest way to debug this is to extract the Stan model that brms is using and look at how it differs from your own. You can do this via:

brms_stanmod = stancode(brmsMod)

Thank you for responding @andrjohns. Yes I have looked at the stancode from the brms model and honestly I find a fair bit of it inscrutable. The automated code is clearly based on knowledge of stan techniques that are well above my ability level. I guess I just need some tips on where I might have gone wrong in my own model. It’s a simple-ish model so I am just confused why my results differ so markedly.

When I run your Stan model I get the following warnings:

4: There were 27 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
5: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
6: Examine the pairs() plot to diagnose sampling problems
 
7: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
8: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

These indicate that the results are highly unreliable and that there are serious problems with the model as specified. This is likely why the brms and Stan results are different.

From a quick look, I would guess that the issue is coming from the hieararchical part of the model (the random effects) - as the parameterisation you’re using (known as the ‘centered’ parameterisation) can sample really poorly. See this section of the manual for more information: https://mc-stan.org/docs/2_24/stan-users-guide/reparameterization-section.html#hierarchical-models-and-the-non-centered-parameterization

1 Like

Thank you @andrjohns , you gave me what I needed which was a place to start. Looks like I have some reading to do.

Non-centered parameterisation has come up a few times in my studies but I was too busy being confused by one thing to add another to the list. Out of interest:

(1) Is non-centered ALWAYS as good or better than centered?
(2) under what conditions is it better? The manual passage you lined to seemed to suggest it’s better when the sample size is very small, which is certainly the case with my data.

Hi

  1. No. It depends on a number of things, i.e., the model, priors, and data.
  2. It depends on #1 :)

However, when I run your model in brms (latest) with cmdstanr/cmdstan it samples beautifully. But I do believe that brms uses NCP as default.