Hi everyone,
I have a question about a model fit that gives me inconsistent results I do not fully understand. In short, I am simulating cross-classified data in which each observation is determined by a population-level intercept and slope plus random adjustments of intercept and slope per participant and per stimulus. I can recover the simulation parameters with a model assuming a linear relationship, but I fail to recover some of them when using a monotonic predictor instead. I will explain below.
Data Simulation
This is the R-Script (6.4 KB) I used to produce the results in this question.
I also include the important code chunks here so you do not necessarily need to download the file.
The simulation function I used is adapted from Schad, Betancourt & Vasishth (2019):
library(MASS)  # for mvrnorm()

sim_dv <- function(dat = data,
                   n_participant = 100, # number of participants
                   n_stimulus = 20,     # number of stimuli
                   beta0 = 165.07,      # intercept
                   beta1 = -19.07,      # effect size (slope)
                   sigma_u0 = 46.51,    # SD of participant intercepts (random participant intercept)
                   sigma_u1 = 18.94,    # SD of effect per participant (random slope for participant)
                   sigma_w0 = 5.19,     # SD of stimulus intercepts (random stimulus intercept)
                   sigma_w1 = 3.57,     # SD of effect per stimulus (random slope for stimulus)
                   rho_u = 0.16,        # correlation between participant intercept and slope
                   rho_w = -0.37,       # correlation between stimulus intercept and slope
                   sigma = 37.04,       # residual SD (unexplained variance)
                   N = dim(dat)[1]) {   # number of observations to be sampled
  ## variance-covariance matrix for participants
  Sigma_u <- matrix(c(sigma_u0^2,                  sigma_u0 * sigma_u1 * rho_u,
                      sigma_u0 * sigma_u1 * rho_u, sigma_u1^2),
                    ncol = 2)
  ## generate by-participant intercepts and slopes
  u <- mvrnorm(n_participant, c(0, 0), Sigma_u)
  ## variance-covariance matrix for stimuli
  Sigma_w <- matrix(c(sigma_w0^2,                  sigma_w0 * sigma_w1 * rho_w,
                      sigma_w0 * sigma_w1 * rho_w, sigma_w1^2),
                    ncol = 2)
  ## generate by-stimulus intercepts and slopes
  w <- mvrnorm(n_stimulus, c(0, 0), Sigma_w)
  ## generate normally distributed observations
  y <- NA
  for (j in seq(1, N)) {
    y[j] <- rnorm(1,
                  beta0 +                         # fixed intercept
                    u[dat$participant[j], 1] +    # + random intercept for participant
                    w[dat$stimulus[j], 1] +       # + random intercept for stimulus
                    (beta1 +                      # + fixed effect
                       u[dat$participant[j], 2] + # + random slope for participant
                       w[dat$stimulus[j], 2]      # + random slope for stimulus
                    ) * dat$x[j],                 # multiplied by the predictor value
                  sigma)                          # residual SD
  }
  return(y)
}
where dat is generated using the following code:
gen_design <- function(n_participant = 100,
                       n_stimulus = 20,
                       n_levels = 4) {
  participants <- rep(1:n_participant, each = n_stimulus)
  ## shuffle the stimulus order separately for each participant
  ## (a fixed order would be: stims <- rep(1:n_stimulus, times = n_participant))
  stims <- as.vector(sapply(seq(1, n_participant),
                            function(x) sample(seq(1, n_stimulus), size = n_stimulus)))
  ## predictor levels 0:(n_levels - 1); the shorter vector is recycled by data.frame()
  x <- c()
  for (i in 1:n_participant) {
    x <- append(x, seq(0, (n_levels - 1)))
  }
  expdesign <- data.frame(participant = participants, stimulus = stims, x = x)
  return(expdesign)
}
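Putting the two functions together, the dataset is generated roughly as follows (a minimal sketch with an arbitrary seed; the exact code I ran is in the attached script):

set.seed(1)                  # arbitrary seed, for illustration only
exp1   <- gen_design()       # 100 participants x 20 stimuli, x in {0, 1, 2, 3}
exp1$y <- sim_dv(dat = exp1) # simulate the outcome with the default parameters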
Thus I simulate data that represent a linear effect of the ordinal predictor x = {0,1,2,3} (the idea is that x represents a continuous construct of which only a subset of 4 possible values has been observed/manipulated) across 100 participants U who see a fixed set of 20 different stimuli W each. Across participants, the stimuli are randomly assigned to one of the 4 ordinal predictor levels, while within participants, a specific stimulus is only associated with one specific level of x. Moreover, I simulate random intercept and slope adjustments across both participants and stimuli.
Thus, the data is simulated as:
y_{uw} = \beta_0 + U_{0u} + W_{0w} + (\beta_1 + U_{1u} + W_{1w}) \cdot X_{uw} + \epsilon_{uw}
where
- \beta_0 = 165.07 is the population-level intercept
- U_{0u} is the by-participant intercept adjustment, with SD \sigma_{u0} = 46.51
- W_{0w} is the by-stimulus intercept adjustment, with SD \sigma_{w0} = 5.19
- \beta_1 = -19.07 is the population-level slope
- U_{1u} is the by-participant slope adjustment, with SD \sigma_{u1} = 18.94
- W_{1w} is the by-stimulus slope adjustment, with SD \sigma_{w1} = 3.57
- \epsilon_{uw} is the residual, with SD \sigma = 37.04
Moreover, U_{0u} and U_{1u} are correlated with \rho_u = 0.16, and W_{0w} and W_{1w} are correlated with \rho_w = -0.37, which is implemented by sampling each pair from a multivariate normal distribution.
Note that these parameter values are based on estimates from another model, but I tried different parameter values as well and the problem remains the same.
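For later reference: ignoring the group-level adjustments, these parameter values imply the following population-level means per level of x:

165.07 + (-19.07) * 0:3
# 165.07 146.00 126.93 107.86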
I end up with the following dataset exp1:
str(exp1)
'data.frame': 2000 obs. of 4 variables:
$ participant: int 1 1 1 1 1 1 1 1 1 1 ...
$ stimulus : int 4 7 1 2 13 19 11 17 14 3 ...
$ x : int 0 1 2 3 0 1 2 3 0 1 ...
$ y : num 120 102 165 130 153 ...
head(exp1)
participant stimulus x y
1 1 4 0 120.1112
2 1 7 1 102.2981
3 1 1 2 164.8664
4 1 2 3 129.6759
5 1 13 0 152.6609
6 1 19 1 192.1953
with(exp1, table(x, participant))[, 1:10]
participant
x 1 2 3 4 5 6 7 8 9 10
0 5 5 5 5 5 5 5 5 5 5
1 5 5 5 5 5 5 5 5 5 5
2 5 5 5 5 5 5 5 5 5 5
3 5 5 5 5 5 5 5 5 5 5
with(exp1, table(x, stimulus))
stimulus
x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 21 23 25 25 27 25 20 21 28 23 26 27 27 34 26 24 21 34 21 22
1 25 30 26 21 20 22 32 31 26 29 23 24 30 20 27 31 23 19 22 19
2 30 30 21 26 26 31 27 23 20 21 22 21 23 26 28 18 21 27 33 26
3 24 17 28 28 27 22 21 25 26 27 29 28 20 20 19 27 35 20 24 33
Problem
linear predictor recovers parameters well
If I now try to recover the simulated parameters, this works quite well with a Gaussian model estimating a linear effect of the predictor x:
m_linear <- brm(y ~ x + (1 + x | participant) + (1 + x | stimulus), data = exp1, cores = 4)
summary(m_linear)
summary of linear predictor model
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x + (1 + x | participant) + (1 + x | stimulus)
Data: exp1 (Number of observations: 2000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 46.95 3.72 40.19 54.81 1.00 1511 2310
sd(x) 19.25 1.59 16.35 22.57 1.00 2643 2892
cor(Intercept,x) 0.12 0.11 -0.10 0.34 1.00 1586 2068
~stimulus (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 4.88 2.40 0.48 9.90 1.00 1410 1329
sd(x) 5.25 1.35 2.99 8.29 1.00 1056 1637
cor(Intercept,x) -0.61 0.33 -0.97 0.33 1.00 572 627
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 171.35 5.00 161.26 181.02 1.00 1284 1800
x -18.70 2.38 -23.34 -13.88 1.00 2199 2653
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.30 0.62 35.11 37.54 1.00 4246 2967
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
This results in the following plot, showing the posterior estimates against the simulated parameters (left) and the marginal effects against the observed (i.e. simulated) means per level of the predictor x (right):
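The right-hand comparison essentially boils down to the following (a sketch, not my exact plotting code):

# observed (i.e. simulated) mean of y per level of x
aggregate(y ~ x, data = exp1, FUN = mean)
# population-level expected values per level of x, excluding group-level terms
fitted(m_linear, newdata = data.frame(x = 0:3), re_formula = NA)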
However, when I fit the same data with a monotonic effect of x, the model fails to recover the beta1 parameter, and the posterior means differ substantially from the observed means.
monotonic predictor does not recover beta1 accurately
m_monotonic <- brm(y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus), data = exp1, cores = 4)
summary(m_monotonic)
summary of monotonic model
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus)
Data: exp1 (Number of observations: 2000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 46.93 3.67 40.09 54.55 1.00 1689 2267
sd(mox) 19.36 1.61 16.41 22.72 1.00 2419 2741
cor(Intercept,mox) 0.09 0.11 -0.12 0.31 1.00 1412 2323
~stimulus (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 5.07 2.37 0.60 9.93 1.00 1442 1359
sd(mox) 5.23 1.31 3.02 8.28 1.00 1556 2121
cor(Intercept,mox) -0.67 0.29 -0.97 0.16 1.00 1045 1173
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 172.47 5.14 162.55 182.83 1.00 1162 1879
mox -9.35 1.18 -11.68 -7.06 1.00 2424 3106
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1] 0.38 0.03 0.33 0.43 1.00 5673 3516
mox1[2] 0.32 0.03 0.26 0.38 1.00 8091 2899
mox1[3] 0.30 0.03 0.25 0.35 1.00 5867 2918
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.25 0.60 35.12 37.42 1.00 5904 3497
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
This results in the following parameter-recovery and marginal-effects vs. observed-means plot:
It appears that in this case the size of the effect is substantially underestimated. Interestingly, all other parameters still seem to be estimated just fine…
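As a back-of-the-envelope check (assuming I read brms' parameterization of mo() correctly, i.e. the coefficient is the average difference between adjacent categories and the simplex gives the normalized distances between them), the posterior means above imply:

b    <- -9.35                # posterior mean of mox from the summary above
zeta <- c(0.38, 0.32, 0.30)  # posterior means of mox1[1:3]
D    <- length(zeta)         # number of steps between the 4 levels
mo   <- function(x) if (x == 0) 0 else D * sum(zeta[1:x])
b * sapply(0:3, mo)          # implied effect:  approx. 0 -10.7 -19.6 -28.1
-19.07 * 0:3                 # simulated effect:        0 -19.1 -38.1 -57.2

So the full monotonic model implies a total change of only about -28 from x = 0 to x = 3, instead of the simulated -57, whereas the reduced models below (mox ≈ -18.5) get it about right.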
My first hunch was that something weird might be going on with highly correlated parameters, but pairs(m_monotonic) does not look too suspicious to me:
removing either random slope from monotonic model resolves this
Interestingly, when removing either random slope from the monotonic predictor model, the means are again recovered accurately.
m_monotonic_RSppid <- brm(y ~ mo(x) + (1 + mo(x) | participant) + (1 | stimulus), data = exp1, cores = 4)
summary(m_monotonic_RSppid)
summary of monotonic model with no random slope across stimuli
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ mo(x) + (1 + mo(x) | participant) + (1 | stimulus)
Data: exp1 (Number of observations: 2000)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~participant (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 46.71 3.72 40.05 54.69 1.00 1511 2039
sd(mox) 19.24 1.57 16.37 22.56 1.00 1859 2462
cor(Intercept,mox) 0.10 0.11 -0.12 0.32 1.00 1218 1575
~stimulus (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 4.89 1.44 2.38 8.10 1.00 1476 1843
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 172.39 5.13 162.33 182.54 1.00 973 1236
mox -18.60 2.04 -22.70 -14.52 1.00 1483 1799
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1] 0.38 0.03 0.33 0.43 1.00 4785 3253
mox1[2] 0.32 0.03 0.26 0.39 1.00 5944 3135
mox1[3] 0.29 0.03 0.24 0.35 1.00 6376 3117
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 36.72 0.61 35.55 37.98 1.00 4836 2919
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
m_monotonic_RSstim <- brm(y ~ mo(x) + (1 | participant) + (1 + mo(x) | stimulus), data = exp1, cores = 4, iter = 4000)
summary(m_monotonic_RSstim)
Note that I got a ‘low Bulk ESS’ warning for this model, but after increasing iter to 4000 it sampled fine.
summary of monotonic model with no random slope across participants
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ mo(x) + (1 | participant) + (1 + mo(x) | stimulus)
Data: exp1 (Number of observations: 2000)
Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup samples = 8000
Group-Level Effects:
~participant (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 57.62 4.11 49.95 66.28 1.00 809 1362
~stimulus (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 3.82 2.43 0.16 9.10 1.00 2183 2849
sd(mox) 4.53 1.28 2.38 7.42 1.00 1708 3194
cor(Intercept,mox) -0.44 0.44 -0.96 0.71 1.00 695 979
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 171.60 6.26 159.16 183.84 1.02 352 731
mox -18.49 1.38 -21.18 -15.78 1.00 4003 4559
Simplex Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mox1[1] 0.34 0.04 0.25 0.42 1.00 10116 5838
mox1[2] 0.36 0.05 0.26 0.46 1.00 8792 5469
mox1[3] 0.31 0.04 0.22 0.39 1.00 9746 5231
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 42.29 0.67 41.00 43.64 1.00 9156 5860
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
This results in the following plots, respectively:
Thus, in general the monotonic model seems to work fine, but there appears to be a problem when both random slope adjustments are added to the model.
Am I missing something?
This result surprised me quite a bit, as the monotonic models with only one random slope adjustment (for either participants or stimuli) seem to work fine. However, I assumed there should be no problem with adding both random slope adjustments for the monotonic predictor. Does it not work the same way as adding random slope adjustments for the linear predictor in this case? Does anyone have an idea why there is a difference? Any help or comments are highly appreciated, and I am happy to provide more information if needed.
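In case it helps, the generated Stan code should show how the group-level mo() terms are actually parameterized; I can post the output of something like:

stancode(m_monotonic)   # Stan code of the fitted model
# or, without fitting the model first:
make_stancode(y ~ mo(x) + (1 + mo(x) | participant) + (1 + mo(x) | stimulus),
              data = exp1)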
- Operating System: Windows 10 x64 (build 16299)
- brms Version: 2.10.0
Complete Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Matrix products: default
locale:
[1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Netherlands.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.3 plyr_1.8.4 brms_2.10.0 Rcpp_1.0.3 MASS_7.3-51.4 ggplot2_3.2.1 sjPlot_2.7.2
loaded via a namespace (and not attached):
[1] TH.data_1.0-10 minqa_1.2.4 colorspace_1.4-1 ggridges_0.5.1 rsconnect_0.8.15 sjlabelled_1.1.1 estimability_1.3
[8] markdown_1.1 parameters_0.2.0 base64enc_0.1-3 rstudioapi_0.10 rstan_2.19.2 ggrepel_0.8.1 DT_0.9
[15] mvtnorm_1.0-11 bridgesampling_0.7-2 codetools_0.2-16 splines_3.6.1 mnormt_1.5-5 knitr_1.25 shinythemes_1.1.2
[22] sjmisc_2.8.2 zeallot_0.1.0 bayesplot_1.7.0 nloptr_1.2.1 ggeffects_0.12.0 broom_0.5.2 shiny_1.4.0
[29] compiler_3.6.1 sjstats_0.17.7 emmeans_1.4.2 backports_1.1.5 assertthat_0.2.1 Matrix_1.2-17 fastmap_1.0.1
[36] lazyeval_0.2.2 cli_1.1.0 later_1.0.0 htmltools_0.4.0 prettyunits_1.0.2 tools_3.6.1 igraph_1.2.4.1
[43] coda_0.19-3 gtable_0.3.0 glue_1.3.1 reshape2_1.4.3 dplyr_0.8.3 vctrs_0.2.0 nlme_3.1-141
[50] crosstalk_1.0.0 psych_1.8.12 insight_0.7.0 xfun_0.10 stringr_1.4.0 ps_1.3.0 lme4_1.1-21
[57] mime_0.7 miniUI_0.1.1.1 lifecycle_0.1.0 gtools_3.8.1 zoo_1.8-6 scales_1.0.0 colourpicker_1.0
[64] hms_0.5.2 promises_1.1.0 Brobdingnag_1.2-6 parallel_3.6.1 sandwich_2.5-1 inline_0.3.15 shinystan_2.5.0
[71] loo_2.1.0 StanHeaders_2.19.0 stringi_1.4.3 bayestestR_0.4.0 dygraphs_1.1.1.6 boot_1.3-23 pkgbuild_1.0.6
[78] rlang_0.4.1 pkgconfig_2.0.3 matrixStats_0.55.0 lattice_0.20-38 purrr_0.3.3 labeling_0.3 rstantools_2.0.0
[85] htmlwidgets_1.5.1 processx_3.4.1 tidyselect_0.2.5 magrittr_1.5 R6_2.4.1 generics_0.0.2 multcomp_1.4-10
[92] withr_2.1.2 pillar_1.4.2 haven_2.2.0 foreign_0.8-72 xts_0.11-2 survival_2.44-1.1 abind_1.4-5
[99] tibble_2.1.3 performance_0.4.0 modelr_0.1.5 crayon_1.3.4 grid_3.6.1 callr_3.3.2 forcats_0.4.0
[106] threejs_0.3.1 digest_0.6.22 xtable_1.8-4 tidyr_1.0.0 httpuv_1.5.2 stats4_3.6.1 munsell_0.5.0