Failure to recover simulated group means in cross-classified LMM with monotonic predictor

Hi everyone,

I have a question regarding a model that gives me inconsistent results I do not fully understand. In short, I am simulating cross-classified data in which each observation is determined by a population-level intercept and slope plus random adjustments of intercept and slope per participant and stimulus. I can recover the simulation parameters with a model assuming a linear relationship, but I fail to recover some of them when using a monotonic predictor instead. I will explain below.

Data Simulation

This is the R script (6.4 KB) I used to produce the results in this question.

I also include the important code chunks here so you do not necessarily need to download the file.

The simulation function below is adapted from Schad, Betancourt, & Vasishth (2019):

library(MASS)  # provides mvrnorm() used below

sim_dv <- function(dat=data,
                   n_participant=100, # number of participants
                   n_stimulus=20,     # number of stimuli
                   beta0=165.07,   # intercept
                   beta1=-19.07,   # effect-size
                   sigma_u0=46.51, # SD of participant-intercept (random participant intercept)
                   sigma_u1=18.94, # SD of effect per participant (random slope for participant)
                   sigma_w0=5.19, # SD for stimulus-intercept (random stimulus intercept)
                   sigma_w1=3.57, # SD of effect per stimulus (random slope per stimulus) 
                   rho_u=0.16,   # random correlation between participant-int and participant-slope
                   rho_w=-0.37,    # random correlation between stimulus-int and stimulus-slope
                   sigma=37.04,     # residual standard deviation
                   N=dim(dat)[1]) {  # number of observations to be sampled
  Sigma_u <- matrix(c(sigma_u0^2, sigma_u0*sigma_u1*rho_u,
                      sigma_u0*sigma_u1*rho_u,sigma_u1^2),
                    ncol=2)  # var-covar matrix for participant
  ## generate by subject intercepts and slopes:
  u <- mvrnorm(n_participant,c(0,0),Sigma_u)
  
  Sigma_w <- matrix(c(sigma_w0^2, sigma_w0*sigma_w1*rho_w,
                      sigma_w0*sigma_w1*rho_w,sigma_w1^2),
                    ncol=2) # var-covar matrix for stimulus
  ## generate by stimulus intercepts and slopes:
  w <- mvrnorm(n_stimulus,c(0,0),Sigma_w)
  ## generate normally distributed data:
  y <- numeric(N)  # preallocate the response vector
  for (j in seq(1, N)){ # j <- 1
    y[j] <- rnorm(1, beta0 +                   # common / fixed intercept
                    u[dat$participant[j],1] +         # + random intercept for participant
                    w[dat$stimulus[j],1] +     # + random intercept for stimulus
                    (beta1 +                   # + fixed effect
                       u[dat$participant[j],2] +      # + random slope participant
                       w[dat$stimulus[j],2]    # + random slope stimulus
                    ) * dat$x[j],       # + multiplicator for effect
                  sigma) # residual standard deviation
  }
  return(y)
}

where dat is generated using the following code:

gen_design <- function(n_participant = 100, 
                       n_stimulus = 20,
                       n_levels = 4){
  participants <- rep(1:n_participant, each = n_stimulus)
  # randomly permute the stimuli within each participant
  stims <- as.vector(sapply(seq(1, n_participant),
                            function(i) sample(seq(1, n_stimulus), size = n_stimulus)))
  # cycle the levels 0..(n_levels - 1) over all rows, so each participant
  # sees n_stimulus / n_levels stimuli per level
  x <- rep(seq(0, n_levels - 1), length.out = n_participant * n_stimulus)
  expdesign <- data.frame(participant = participants, stimulus = stims, x = x)
  return(expdesign)
}
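
Putting the two pieces together, the dataset can then be generated roughly like this (a minimal sketch; the seed is an assumption — see the attached script for the exact call):

set.seed(123)  # assumed seed, the attached script may differ
exp1   <- gen_design(n_participant = 100, n_stimulus = 20, n_levels = 4)
exp1$y <- sim_dv(dat = exp1)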

Thus, I simulate data representing a linear effect of the ordinal predictor x ∈ {0, 1, 2, 3} (the idea is that x represents a continuous construct of which only a subset of 4 possible values has been observed/manipulated) across 100 participants U, each of whom sees a fixed set of 20 different stimuli W. Across participants, the stimuli are randomly assigned to one of the 4 ordinal predictor levels, while within participants, a specific stimulus is only associated with one specific level of x. Moreover, I simulate random intercept and slope adjustments across both participants and stimuli.

In other words, the data are simulated as:

y_{uw} = \beta_0 + U_{0u} + W_{0w} + (\beta_1 + U_{1u} + W_{1w}) \cdot X_{uw} + \epsilon_{uw}

where

  • \beta_0 = 165.07 is the population-level intercept
  • \sigma_{u0} = 46.51 is the standard deviation of the participant intercept adjustments U_{0u}
  • \sigma_{w0} = 5.19 is the standard deviation of the stimulus intercept adjustments W_{0w}
  • \beta_1 = -19.07 is the population-level slope
  • \sigma_{u1} = 18.94 is the standard deviation of the participant slope adjustments U_{1u}
  • \sigma_{w1} = 3.57 is the standard deviation of the stimulus slope adjustments W_{1w}
  • \sigma = 37.04 is the residual standard deviation of \epsilon_{uw}

Moreover, U_{0u} and U_{1u} are correlated with \rho_u = 0.16, and W_{0w} and W_{1w} are correlated with \rho_w = -0.37; both pairs are sampled from a multivariate normal:
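
\begin{pmatrix} U_{0u} \\ U_{1u} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u0}^2 & \rho_u \sigma_{u0} \sigma_{u1} \\ \rho_u \sigma_{u0} \sigma_{u1} & \sigma_{u1}^2 \end{pmatrix} \right)

and analogously for (W_{0w}, W_{1w}) with \sigma_{w0}, \sigma_{w1}, and \rho_w; this is the Sigma_u / Sigma_w construction in the simulation code above.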

Note that these parameter values are based on estimates from another model, but I tried different parameter values as well and the problem remains the same.

With this, I end up with the following dataset exp1:

str(exp1)

'data.frame':	2000 obs. of  4 variables:
 $ participant: int  1 1 1 1 1 1 1 1 1 1 ...
 $ stimulus   : int  4 7 1 2 13 19 11 17 14 3 ...
 $ x          : int  0 1 2 3 0 1 2 3 0 1 ...
 $ y          : num  120 102 165 130 153 ...

head(exp1)

  participant stimulus x        y
1           1        4 0 120.1112
2           1        7 1 102.2981
3           1        1 2 164.8664
4           1        2 3 129.6759
5           1       13 0 152.6609
6           1       19 1 192.1953

with(exp1, table(x, participant))[, 1:10]

   participant
x   1 2 3 4 5 6 7 8 9 10
  0 5 5 5 5 5 5 5 5 5  5
  1 5 5 5 5 5 5 5 5 5  5
  2 5 5 5 5 5 5 5 5 5  5
  3 5 5 5 5 5 5 5 5 5  5

with(exp1, table(x, stimulus))

   stimulus
x    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
  0 21 23 25 25 27 25 20 21 28 23 26 27 27 34 26 24 21 34 21 22
  1 25 30 26 21 20 22 32 31 26 29 23 24 30 20 27 31 23 19 22 19
  2 30 30 21 26 26 31 27 23 20 21 22 21 23 26 28 18 21 27 33 26
  3 24 17 28 28 27 22 21 25 26 27 29 28 20 20 19 27 35 20 24 33

Problem

linear predictor recovers parameters well

If I now try to recover the simulated parameters, it works quite well with a Gaussian model estimating a linear effect of the predictor x:

m_linear <- brm(y ~ x + (1 + x | participant) + (1 + x | stimulus), data = exp1, cores = 4)
summary(m_linear)

summary of linear predictor model
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x + (1 + x | participant) + (1 + x | stimulus) 
   Data: exp1 (Number of observations: 2000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~participant (Number of levels: 100) 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)       46.95      3.72    40.19    54.81 1.00     1511     2310
sd(x)               19.25      1.59    16.35    22.57 1.00     2643     2892
cor(Intercept,x)     0.12      0.11    -0.10     0.34 1.00     1586     2068

~stimulus (Number of levels: 20) 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)        4.88      2.40     0.48     9.90 1.00     1410     1329
sd(x)                5.25      1.35     2.99     8.29 1.00     1056     1637
cor(Intercept,x)    -0.61      0.33    -0.97     0.33 1.00      572      627

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   171.35      5.00   161.26   181.02 1.00     1284     1800
x           -18.70      2.38   -23.34   -13.88 1.00     2199     2653

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    36.30      0.62    35.11    37.54 1.00     4246     2967

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
[pairs plot of linear predictor model]

This results in the following plot, showing the posterior parameters against the simulated parameters (left) and the marginal effects against the observed (i.e., simulated) means per level of the predictor x (right).

However, when I fit the same data with a monotonic effect of x, the model fails to recover the beta1 parameter, and the posterior means differ substantially from the observed means.

monotonic predictor does not recover beta1 accurately

m_monotonic <- brm(y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus), data = exp1, cores = 4)
summary(m_monotonic)

summary of monotonic model
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus) 
   Data: exp1 (Number of observations: 2000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~participant (Number of levels: 100) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         46.93      3.67    40.09    54.55 1.00     1689     2267
sd(mox)               19.36      1.61    16.41    22.72 1.00     2419     2741
cor(Intercept,mox)     0.09      0.11    -0.12     0.31 1.00     1412     2323

~stimulus (Number of levels: 20) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          5.07      2.37     0.60     9.93 1.00     1442     1359
sd(mox)                5.23      1.31     3.02     8.28 1.00     1556     2121
cor(Intercept,mox)    -0.67      0.29    -0.97     0.16 1.00     1045     1173

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   172.47      5.14   162.55   182.83 1.00     1162     1879
mox          -9.35      1.18   -11.68    -7.06 1.00     2424     3106

Simplex Parameters: 
        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1]     0.38      0.03     0.33     0.43 1.00     5673     3516
mox1[2]     0.32      0.03     0.26     0.38 1.00     8091     2899
mox1[3]     0.30      0.03     0.25     0.35 1.00     5867     2918

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    36.25      0.60    35.12    37.42 1.00     5904     3497

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

resulting in the following parameter recovery and marginal effect vs. observed plot:

It appears that in this case the size of the effect is substantially underestimated. Interestingly, all other parameters still seem to be estimated just fine…
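
If I understand the brms parameterization of monotonic effects correctly, the effect of mo(x) at category i equals mox * D * sum(simplex[1:i]), where D = 3 is the number of simplex entries here (this is also what the mo() function in the generated Stan code further below computes). Plugging in the posterior means from the summary gives a rough back-of-the-envelope check:

mox     <- -9.35                 # posterior mean of the mo(x) coefficient
simplex <- c(0.38, 0.32, 0.30)   # posterior means of the simplex entries
D <- length(simplex)             # 3 steps between the 4 levels of x
mox * D * cumsum(c(0, simplex))
# approx.  0.0 -10.7 -19.6 -28.0, while the simulation implies 0 -19.1 -38.1 -57.2

so the implied effect is roughly half of the simulated one at every level.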

My first hunch was that there might be something weird going on with parameters being highly correlated, but pairs(m_monotonic) does not look too suspicious to me:

[pairs plot of monotonic predictor model]

removing either random slope from the monotonic model resolves this

Interestingly, when removing either random slope from the monotonic predictor model, the means are again recovered accurately.

m_monotonic_RSppid <- brm(y ~ mo(x) + (1 + mo(x) | participant) + (1 | stimulus), data = exp1, cores = 4)
summary(m_monotonic_RSppid)

summary of monotonic model with no random slope across stimuli
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ mo(x) + (1 + mo(x) | participant) + (1 | stimulus) 
   Data: exp1 (Number of observations: 2000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~participant (Number of levels: 100) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         46.71      3.72    40.05    54.69 1.00     1511     2039
sd(mox)               19.24      1.57    16.37    22.56 1.00     1859     2462
cor(Intercept,mox)     0.10      0.11    -0.12     0.32 1.00     1218     1575

~stimulus (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     4.89      1.44     2.38     8.10 1.00     1476     1843

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   172.39      5.13   162.33   182.54 1.00      973     1236
mox         -18.60      2.04   -22.70   -14.52 1.00     1483     1799

Simplex Parameters: 
        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1]     0.38      0.03     0.33     0.43 1.00     4785     3253
mox1[2]     0.32      0.03     0.26     0.39 1.00     5944     3135
mox1[3]     0.29      0.03     0.24     0.35 1.00     6376     3117

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    36.72      0.61    35.55    37.98 1.00     4836     2919

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

m_monotonic_RSstim <- brm(y ~ mo(x) + (1 | participant) + (1 + mo(x) | stimulus), data = exp1, cores = 4, iter = 4000)
summary(m_monotonic_RSstim)

Note that I needed to increase the number of iterations here because I got a ‘low Bulk ESS’ warning; with iter = 4000 it sampled fine.

summary of monotonic model with no random slope across participants
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ mo(x) + (1 | participant) + (1 + mo(x) | stimulus) 
   Data: exp1 (Number of observations: 2000) 
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup samples = 8000

Group-Level Effects: 
~participant (Number of levels: 100) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)    57.62      4.11    49.95    66.28 1.00      809     1362

~stimulus (Number of levels: 20) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)          3.82      2.43     0.16     9.10 1.00     2183     2849
sd(mox)                4.53      1.28     2.38     7.42 1.00     1708     3194
cor(Intercept,mox)    -0.44      0.44    -0.96     0.71 1.00      695      979

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   171.60      6.26   159.16   183.84 1.02      352      731
mox         -18.49      1.38   -21.18   -15.78 1.00     4003     4559

Simplex Parameters: 
        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1]     0.34      0.04     0.25     0.42 1.00    10116     5838
mox1[2]     0.36      0.05     0.26     0.46 1.00     8792     5469
mox1[3]     0.31      0.04     0.22     0.39 1.00     9746     5231

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma    42.29      0.67    41.00    43.64 1.00     9156     5860

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

resulting in the following plots, respectively.

Thus, in general the monotonic model seems to work fine, but there seems to be a problem when both random slope adjustments are added to the model.

Am I missing something?

This result surprised me quite a bit, as the monotonic models with only one random slope adjustment (for either participants or stimuli) seem to work fine. However, I assumed that there should be no problem with adding both random slope adjustments for a monotonic predictor; does it not work the same way as adding random slope adjustments for a linear predictor in this case? Does anyone have an idea why there is a difference? Any help or comments are highly appreciated, and I am happy to provide more information if needed.

  • Operating System: Windows 10 x64 (build 16299)
  • brms Version: 2.10.0
Complete Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gridExtra_2.3 plyr_1.8.4    brms_2.10.0   Rcpp_1.0.3    MASS_7.3-51.4 ggplot2_3.2.1 sjPlot_2.7.2 

loaded via a namespace (and not attached):
  [1] TH.data_1.0-10       minqa_1.2.4          colorspace_1.4-1     ggridges_0.5.1       rsconnect_0.8.15     sjlabelled_1.1.1     estimability_1.3    
  [8] markdown_1.1         parameters_0.2.0     base64enc_0.1-3      rstudioapi_0.10      rstan_2.19.2         ggrepel_0.8.1        DT_0.9              
 [15] mvtnorm_1.0-11       bridgesampling_0.7-2 codetools_0.2-16     splines_3.6.1        mnormt_1.5-5         knitr_1.25           shinythemes_1.1.2   
 [22] sjmisc_2.8.2         zeallot_0.1.0        bayesplot_1.7.0      nloptr_1.2.1         ggeffects_0.12.0     broom_0.5.2          shiny_1.4.0         
 [29] compiler_3.6.1       sjstats_0.17.7       emmeans_1.4.2        backports_1.1.5      assertthat_0.2.1     Matrix_1.2-17        fastmap_1.0.1       
 [36] lazyeval_0.2.2       cli_1.1.0            later_1.0.0          htmltools_0.4.0      prettyunits_1.0.2    tools_3.6.1          igraph_1.2.4.1      
 [43] coda_0.19-3          gtable_0.3.0         glue_1.3.1           reshape2_1.4.3       dplyr_0.8.3          vctrs_0.2.0          nlme_3.1-141        
 [50] crosstalk_1.0.0      psych_1.8.12         insight_0.7.0        xfun_0.10            stringr_1.4.0        ps_1.3.0             lme4_1.1-21         
 [57] mime_0.7             miniUI_0.1.1.1       lifecycle_0.1.0      gtools_3.8.1         zoo_1.8-6            scales_1.0.0         colourpicker_1.0    
 [64] hms_0.5.2            promises_1.1.0       Brobdingnag_1.2-6    parallel_3.6.1       sandwich_2.5-1       inline_0.3.15        shinystan_2.5.0     
 [71] loo_2.1.0            StanHeaders_2.19.0   stringi_1.4.3        bayestestR_0.4.0     dygraphs_1.1.1.6     boot_1.3-23          pkgbuild_1.0.6      
 [78] rlang_0.4.1          pkgconfig_2.0.3      matrixStats_0.55.0   lattice_0.20-38      purrr_0.3.3          labeling_0.3         rstantools_2.0.0    
 [85] htmlwidgets_1.5.1    processx_3.4.1       tidyselect_0.2.5     magrittr_1.5         R6_2.4.1             generics_0.0.2       multcomp_1.4-10     
 [92] withr_2.1.2          pillar_1.4.2         haven_2.2.0          foreign_0.8-72       xts_0.11-2           survival_2.44-1.1    abind_1.4-5         
 [99] tibble_2.1.3         performance_0.4.0    modelr_0.1.5         crayon_1.3.4         grid_3.6.1           callr_3.3.2          forcats_0.4.0       
[106] threejs_0.3.1        digest_0.6.22        xtable_1.8-4         tidyr_1.0.0          httpuv_1.5.2         stats4_3.6.1         munsell_0.5.0   

Hi, I am sorry that neither I nor anyone else has responded to this post yet. Unfortunately, I currently don’t have the time to investigate such a long post in detail and try to detect what the cause could be. But could you run some model comparisons with loo to see which of the models fits better? Perhaps the issue is just with the conditioning on the random effects being zero in the plot.

Have you tried to use make_stancode and compare how the random effects for

  • the linear random effects model (two random effects)
  • the monotonic models with one random effect
  • the monotonic model with two random effects

are calculated?
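
For example (a minimal sketch reusing the model formulas from above; make_stancode takes the same formula and data arguments as brm):

make_stancode(y ~ x + (1 + x | participant) + (1 + x | stimulus), data = exp1)
make_stancode(y ~ mo(x) + (1 + mo(x) | participant) + (1 | stimulus), data = exp1)
make_stancode(y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus), data = exp1)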

Thank you very much for the reply @paul.buerkner, and no reason to apologize; I really appreciate how much time and effort you are willing to invest in helping people out on this forum. I was already thinking that I might have made this question too long, but I wanted to provide as much detail as I could.
The general question might be framed much more briefly: is there any reason to expect that the random-effects structure of a cross-classified model with a monotonic predictor would differ from one with a linear predictor, in a way that explains why the linear predictor recovers the means while the monotonic predictor does not?

As for your suggestion

The random effects are actually not zero; the y-axis scale is just somewhat unfortunate because the intercept is quite high, and the smallest random-effect SD is still around 5. I compared the models with loo, and it turns out that the full monotonic model, as well as the one with a random slope for participants, has quite a few problematic observations (> 20 observations with Pareto k > 0.7), which I also do not understand.
I also added a linear predictor model with a random slope for participants only (m_linear_RSppid) and compared all five models:

                       elpd_diff se_diff  elpd_loo p_loo    looic   
loo_m_linear                0.0       0.0 -10130.0    197.5  20260.0
loo_m_linear_RSppid      -259.5      23.5 -10389.6    118.3  20779.1
loo_m_monotonic_RSstim   -298.8      24.4 -10428.8    123.4  20857.6
loo_m_monotonic_RSppid  -1236.3      73.4 -11366.3    586.3  22732.6
loo_m_monotonic         -1596.9      91.1 -11726.9    672.6  23453.8
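
For reference, a sketch of how this comparison can be run (model object names inferred from the loo labels above):

loo_m_linear           <- loo(m_linear)
loo_m_linear_RSppid    <- loo(m_linear_RSppid)
loo_m_monotonic        <- loo(m_monotonic)
loo_m_monotonic_RSppid <- loo(m_monotonic_RSppid)
loo_m_monotonic_RSstim <- loo(m_monotonic_RSstim)
loo_compare(loo_m_linear, loo_m_linear_RSppid, loo_m_monotonic_RSstim,
            loo_m_monotonic_RSppid, loo_m_monotonic)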

As I would expect, the full linear model fits best, as it contains all the terms that went into the simulation. However, it still seems odd to me that the full monotonic model performs so badly. Any thoughts?


@Guido_Biele, thanks for your suggestion! I already had a quick look at the Stan code back when I wrote this question up and could not identify anything strange, but I have to admit that I do not feel entirely proficient in reading (brms-generated) Stan code, so maybe I missed an important difference that would explain it. Here is the stancode:

linear model stancode
// generated with brms 2.10.0
functions {
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  // temporary intercept for centered predictors
  real Intercept;
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  // actual group-level effects
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
  // actual group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + Xc * b;
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_2_2[J_2[n]] * Z_2_2[n];
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 148, 70);
  target += student_t_lpdf(sigma | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += student_t_lpdf(sd_1 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
  target += student_t_lpdf(sd_2 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}
full monotonic model stancode
// generated with brms 2.10.0
functions {

  /* compute monotonic effects
   * Args:
   *   scale: a simplex parameter
   *   i: index to sum over the simplex
   * Returns:
   *   a scalar between 0 and 1
   */
  real mo(vector scale, int i) {
    if (i == 0) {
      return 0;
    } else {
      return rows(scale) * sum(scale[1:i]);
    }
  }
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> Ksp;  // number of special effects terms
  int<lower=1> Imo;  // number of monotonic variables
  int<lower=2> Jmo[Imo];  // length of simplexes
  // monotonic variables
  int Xmo_1[N];
  // prior concentration of monotonic simplexes
  vector[Jmo[1]] con_simo_1;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  // temporary intercept for centered predictors
  real Intercept;
  // special effects coefficients
  vector[Ksp] bsp;
  // simplexes of monotonic effects
  simplex[Jmo[1]] simo_1;
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  // actual group-level effects
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
  // actual group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += (bsp[1] + r_1_2[J_1[n]]) * mo(simo_1, Xmo_1[n]) + (bsp[1] + r_2_2[J_2[n]]) * mo(simo_1, Xmo_1[n]) + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 148, 70);
  target += dirichlet_lpdf(simo_1 | con_simo_1);
  target += student_t_lpdf(sigma | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += student_t_lpdf(sd_1 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
  target += student_t_lpdf(sd_2 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  // group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}
participant-only monotonic stancode
// generated with brms 2.10.0
functions {

  /* compute monotonic effects
   * Args:
   *   scale: a simplex parameter
   *   i: index to sum over the simplex
   * Returns:
   *   a scalar between 0 and 1
   */
  real mo(vector scale, int i) {
    if (i == 0) {
      return 0;
    } else {
      return rows(scale) * sum(scale[1:i]);
    }
  }
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> Ksp;  // number of special effects terms
  int<lower=1> Imo;  // number of monotonic variables
  int<lower=2> Jmo[Imo];  // length of simplexes
  // monotonic variables
  int Xmo_1[N];
  // prior concentration of monotonic simplexes
  vector[Jmo[1]] con_simo_1;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  // temporary intercept for centered predictors
  real Intercept;
  // special effects coefficients
  vector[Ksp] bsp;
  // simplexes of monotonic effects
  simplex[Jmo[1]] simo_1;
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  // standardized group-level effects
  vector[N_2] z_2[M_2];
}
transformed parameters {
  // actual group-level effects
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
  // actual group-level effects
  vector[N_2] r_2_1 = (sd_2[1] * (z_2[1]));
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += (bsp[1] + r_1_2[J_1[n]]) * mo(simo_1, Xmo_1[n]) + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 148, 70);
  target += dirichlet_lpdf(simo_1 | con_simo_1);
  target += student_t_lpdf(sigma | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += student_t_lpdf(sd_1 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
  target += student_t_lpdf(sd_2 | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(z_2[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  // group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}
stimulus-only monotonic stancode
// generated with brms 2.10.0
functions {

  /* compute monotonic effects
   * Args:
   *   scale: a simplex parameter
   *   i: index to sum over the simplex
   * Returns:
   *   a scalar between 0 and 1
   */
  real mo(vector scale, int i) {
    if (i == 0) {
      return 0;
    } else {
      return rows(scale) * sum(scale[1:i]);
    }
  }
}
data {
  int<lower=1> N;  // number of observations
  vector[N] Y;  // response variable
  int<lower=1> Ksp;  // number of special effects terms
  int<lower=1> Imo;  // number of monotonic variables
  int<lower=2> Jmo[Imo];  // length of simplexes
  // monotonic variables
  int Xmo_1[N];
  // prior concentration of monotonic simplexes
  vector[Jmo[1]] con_simo_1;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  // temporary intercept for centered predictors
  real Intercept;
  // special effects coefficients
  vector[Ksp] bsp;
  // simplexes of monotonic effects
  simplex[Jmo[1]] simo_1;
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  // standardized group-level effects
  vector[N_1] z_1[M_1];
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_2] L_2;
}
transformed parameters {
  // actual group-level effects
  vector[N_1] r_1_1 = (sd_1[1] * (z_1[1]));
  // actual group-level effects
  matrix[N_2, M_2] r_2 = (diag_pre_multiply(sd_2, L_2) * z_2)';
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1 = r_2[, 1];
  vector[N_2] r_2_2 = r_2[, 2];
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += (bsp[1] + r_2_2[J_2[n]]) * mo(simo_1, Xmo_1[n]) + r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 148, 70);
  target += dirichlet_lpdf(simo_1 | con_simo_1);
  target += student_t_lpdf(sigma | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += student_t_lpdf(sd_1 | 3, 0, 70)
    - 1 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 70)
    - 2 * student_t_lccdf(0 | 3, 0, 70);
  target += normal_lpdf(to_vector(z_2) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  // group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}

This could indicate a bug somewhere. I will take a closer look soon.

It turned out to be a bug indeed. Accidentally, the overall effect of x was added twice in the monotonic model due to a typo in the Stan code generating function: in the full monotonic model above, bsp[1] appears in both (bsp[1] + r_1_2[J_1[n]]) and (bsp[1] + r_2_2[J_2[n]]) on the line that builds mu[n]. This should be fixed in the brms GitHub version now.
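
For anyone who wants the fix right away: it is in the development version, which can be installed from GitHub, e.g. (assuming the remotes package is available):

remotes::install_github("paul-buerkner/brms")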


Thanks @paul.buerkner! I just gave it a try, and it is indeed fixed for the simulation above: the means are recovered correctly, and the model comparison also makes sense now.

                   elpd_diff se_diff
m_monotonic           0.0       0.0 
m_linear             -1.3       2.2 
m_monotonic_RSppid  -20.1       6.2 
m_linear_RSppid    -259.2      23.5 
m_monotonic_RSstim -261.1      23.5 

And now that I know what to look for, I can also see the difference in the stancode, so thanks to @Guido_Biele for the suggestion as well.