Results from monotonic effect versus dummy variables in brms

Hi,
I tried to model the same question with two different models, for an ordinal outcome and an ordinal predictor:

fit1: created 3 dummy variables for a 4-level ordinal variable, so each level was compared to the baseline level (coded 0).

fit2: used the ordinal variable directly in the model via mo().

Below are the results for fit1 and fit2, the loo_compare output, and the marginal effects graphs for the mo() model.
If I read the comparison correctly, fit2 is better, but how come the estimates are so different between fit1 and fit2?
The fit1 estimates are much closer to the frequentist ordinal regression model!!

fit1: 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]      0.43      0.36    -0.27     1.14 1.00      914      843
Intercept[2]      1.59      0.35     0.93     2.34 1.00      882      812
Intercept[3]      2.63      0.35     2.00     3.33 1.00      850      811
Intercept[4]      3.46      0.35     2.80     4.20 1.00      813      867
Intercept[5]      4.23      0.36     3.54     4.95 1.00      802      811
Intercept[6]      5.01      0.36     4.33     5.75 1.00      804      872
Intercept[7]      5.69      0.37     5.00     6.43 1.00      809      802
Intercept[8]      6.42      0.38     5.73     7.20 1.00      781      842
Intercept[9]      7.24      0.39     6.49     8.01 1.00      851      807
Intercept[10]     8.03      0.40     7.28     8.86 1.00      831      727
Intercept[11]     9.18      0.44     8.36    10.07 1.00      783      892
hearing2cat1      1.35      0.14     1.07     1.61 1.00      904      743
hearing3cat1      1.28      0.24     0.81     1.72 1.00     1272      849
hearing4cat1      1.33      0.30     0.75     1.91 1.00     1113      704
age10int1         0.66      0.05     0.57     0.75 1.00      785      763
sex1              0.30      0.10     0.10     0.48 1.00     1366      912
yrs2              0.04      0.04    -0.03     0.11 1.00     1044      906

fit2:
Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]         -0.04      0.42    -0.84     0.81 1.00     3046     3209
Intercept[2]          1.17      0.42     0.38     2.03 1.00     3051     2904
Intercept[3]          2.27      0.43     1.48     3.15 1.00     3160     2665
Intercept[4]          3.13      0.43     2.33     4.00 1.00     3096     2837
Intercept[5]          3.91      0.43     3.09     4.78 1.00     3075     2849
Intercept[6]          4.68      0.44     3.85     5.56 1.00     3095     2681
Intercept[7]          5.36      0.44     4.52     6.25 1.00     3110     2781
Intercept[8]          6.08      0.45     5.23     6.99 1.00     3132     2639
Intercept[9]          6.88      0.45     6.03     7.78 1.00     3131     2701
Intercept[10]         7.66      0.46     6.81     8.60 1.00     3167     2741
Intercept[11]         8.79      0.49     7.84     9.80 1.00     3408     2846
sex1                  0.31      0.10     0.11     0.51 1.00     7074     3825
yrs2                  0.04      0.04    -0.03     0.12 1.00     4747     3722
morec_sevhearing1     0.56      0.07     0.45     0.70 1.00     4011     3279
moage10int1           0.63      0.07     0.51     0.77 1.00     2465     2756

Simplex Parameters: 
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
morec_sevhearing11[1]     0.73      0.08     0.56     0.89 1.00     3585
morec_sevhearing11[2]     0.12      0.06     0.02     0.26 1.00     6196
morec_sevhearing11[3]     0.15      0.08     0.03     0.33 1.00     3997
moage10int11[1]           0.13      0.06     0.02     0.26 1.00     3198
moage10int11[2]           0.10      0.05     0.03     0.20 1.00     5083
moage10int11[3]           0.32      0.05     0.22     0.43 1.00     3751
moage10int11[4]           0.16      0.04     0.09     0.24 1.00     4982
moage10int11[5]           0.11      0.03     0.05     0.18 1.00     5265
moage10int11[6]           0.06      0.03     0.01     0.12 1.00     6426
moage10int11[7]           0.11      0.05     0.02     0.22 1.00     2997

LOO fit1
Computed from 1000 by 1320 log-likelihood matrix

         Estimate   SE
elpd_loo  -2832.5 22.8
p_loo        16.4  0.4
looic      5665.0 45.7
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.


LOO fit2
         Estimate   SE
elpd_loo  -2819.9 23.0
p_loo        19.3  0.5
looic      5639.7 46.1
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
 
loo_compare(fit1, fit2)
     elpd_diff se_diff
fit2   0.0       0.0  
fit1 -12.6       5.6  

The marginal effects graph for fit2 is attached!
Another question is about yrs (figure also attached), which has a wide variation! Do I need to transform this variable?
Thanks for your input, it will help me move forward!

[Figure hearing1: marginal monotonic effect from fit2]
[Figure time2_bodn2: the variable yrs]

Hi,

fit2 is in a way 'better' than fit1:

> -12.6 + c(-1,1)*1.96*5.6
[1] -23.576  -1.624

Have you standardized Years, i.e., scale(yrs2)?

The uncertainty is there either way, and I don't understand why this is a problem. Without seeing your models and the data, it's hard for me to say why there's such a large difference in the estimates.
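In case it is useful, a minimal sketch of the standardization I mean, assuming your data frame is called dat:

dat$yrs2_std <- as.numeric(scale(dat$yrs2))  # center to mean 0 and scale to sd 1
# then use yrs2_std in place of yrs2 in the model formula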


Hello,

model for fit1:

if (!file.exists("hearingcat1_2.RData")) {
  bodi_w2 <- read.csv("disease_w_tofu2.csv", check.names = TRUE)

  pmulti_w <- get_prior(rec_n_r_bodn2 ~ hearing2cat1 + hearing3cat1 + hearing4cat1 +
                          age10int1 + sex1 + yrs2,
                        family = cumulative(), data = bodi_w2)

  pmulti_w$prior[1]  <- "normal(0,1)"  # sets a N(0,1) prior on the betas
  pmulti_w$prior[14] <- "normal(0,10)" # for each cutpoint

  fit_hearingcat1_2 <- brm(rec_n_r_bodn2 ~ hearing2cat1 + hearing3cat1 + hearing4cat1 +
                             age10int1 + sex1 + yrs2,
                           family = cumulative(), data = bodi_w2, prior = pmulti_w,
                           chains = 5, control = list(adapt_delta = 0.99, max_treedepth = 15),
                           cores = future::availableCores())

  save(pmulti_w, fit_hearingcat1_2, file = "hearingcat1_2.RData")
} else {
  load("hearingcat1_2.RData")
}

Model for fit2:

phear <- get_prior(rec_n_r_bodn2 ~ mo(rec_sevhearing1) + sex1 + mo(age10int1) + yrs2,
                   family = cumulative(), data = bodi_w2)

phear$prior[1]  <- "normal(0,1)"      # sets a N(0,1) prior on the betas
phear$prior[6]  <- "normal(0,10)"     # for each cutpoint
phear$prior[18] <- "dirichlet(2,2,2)" # weakly regularizing prior on the simplex of mo(rec_sevhearing1)

fit_hearing2 <- brm(
  formula = rec_n_r_bodn2 ~ mo(rec_sevhearing1) + sex1 + mo(age10int1) + yrs2,
  family = cumulative(), data = bodi_w2, prior = phear, chains = 5,
  control = list(adapt_delta = 0.99, max_treedepth = 15),
  cores = future::availableCores())
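As an aside, indexing rows of the get_prior() data frame by position, as above, is fragile because the row order can change across brms versions. A sketch of the same priors set by class instead; the simplex coefficient name is taken from the summary output and will differ in other models:

phear2 <- c(
  set_prior("normal(0, 1)", class = "b"),          # all population-level slopes
  set_prior("normal(0, 10)", class = "Intercept"), # the cutpoints
  set_prior("dirichlet(2, 2, 2)", class = "simo",  # simplex of mo(rec_sevhearing1)
            coef = "morec_sevhearing11"))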

yrs2 is the time interval from the first measurement, and it varies across persons.
rec_n_r_bodn2 is the response at yrs2; the predictor is measured at yrs1, which is 1 for every person.
age10int1 codes age in 10-year categories.
test.csv (35.3 KB)

Hi,

maybe someone else can provide more input, but I see you use adapt_delta and max_treedepth when sampling, and to me that indicates you have a misspecified model. Something makes the sampler struggle. Check pairs plots.

Well! I should have taken them out. First I ran this model on repeated data in long format, with a varying intercept per person, (1 | id), for 10 visits, and I got the warning to increase adapt_delta and max_treedepth; I assume it was because not every participant had an equal number of visits.
But with these models I did not need them, as Rhat = 1. My confusion here is whether to use a monotonic effect or dummy variables. I plan to use the estimates as a score!
Thanks,

Hi Shabnam,

You have two models and one is apparently doing a better job in out-of-sample prediction. Do posterior predictive checks, i.e., pp_check(), to check that the model behaves. You can also simulate data and check whether the model recovers all parameters.
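For the ordinal outcome here, a bar-plot check is usually the most informative. A minimal sketch using the fit_hearing2 object from above (in older brms versions the argument is nsamples rather than ndraws):

pp_check(fit_hearing2, type = "bars", ndraws = 200)  # observed vs. predicted category counts
pp_check(fit_hearing2, type = "bars_grouped", group = "sex1", ndraws = 200)  # the same, split by a predictor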

I used pp_check and both models behaved well; I also checked the traceplots, and they are good as well.
I need to practice simulation. Thanks for your advice.
On a different note, can I get individual-based posterior estimates? I mean, a posterior estimate for each person?
Thanks again.

If I understand the model above correctly, then no. If you want to model each individual, I believe you need to use a varying intercept, i.e., (1 | individual). Perhaps @martinmodrak can provide some insights.

What I understood about a varying intercept is that it provides an overall "sd" for the model, not one for each individual.
In lme4 it is possible to perform post hoc analyses and store an individual prediction value or estimate. Not sure if this is possible with brm or other Bayesian models.
Thanks,

Well, if you have a varying intercept then you will be able to look at ranef(Model) for each individual.
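A minimal sketch of how that could look here; the refit with a per-person intercept is hypothetical, and id is assumed to be the person identifier in the data:

fit_id <- brm(rec_n_r_bodn2 ~ mo(rec_sevhearing1) + sex1 + mo(age10int1) + yrs2 + (1 | id),
              family = cumulative(), data = bodi_w2)
ranef(fit_id)$id  # posterior mean, est. error, and 95% interval of each person's intercept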

Great, thanks. I will run the model, see how it goes, and will update here.
Thanks again for all your help.

Yes (if I understand the question correctly), you can use posterior_predict or posterior_linpred (the former includes the observation noise, the latter only the model uncertainty). You can also predict for new data. One method I've found useful in some cases is to make predictions for each person while varying only the factor of interest - see PPC for ordered_logistic? - #2 by martinmodrak for a bit more detail and ideas by others.

In Bayesian models it is usually beneficial to work with all the posterior draws instead of just the mean estimate, although this can make the post-processing a bit more involved. The predict and fitted methods of brmsfit also offer setting summary = TRUE to get a single estimate if that's what you prefer.
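For example, a minimal sketch of per-person summaries from the posterior, using the fit_hearing2 object from above:

linpred <- posterior_linpred(fit_hearing2)  # matrix: posterior draws x persons
person_est <- colMeans(linpred)             # one point estimate per person
person_ci <- t(apply(linpred, 2, quantile, probs = c(0.025, 0.975)))  # 95% interval per person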

I believe @torkar might have interpreted your query as how to model individual-level variation, instead of how to obtain individual-level predictions, which might have led to some confusion (unless I am the one interpreting the conversation wrong :-) )

The difference in the model coefficients themselves is slightly surprising, but hard to interpret on its own as the intercepts have also changed noticeably. As you have correctly noted, using prediction is usually better for comparing differences.
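As a rough illustration of why the coefficients need not match: if I recall the brms parameterization of mo() correctly, the effect of level k is b * D * (cumulative sum of the simplex up to k), with D = number of levels - 1, so the posterior means from your output imply:

b_mo <- 0.56                 # posterior mean of morec_sevhearing1
simo <- c(0.73, 0.12, 0.15)  # posterior means of the simplex parameters
D <- length(simo)            # 4 levels - 1
b_mo * D * cumsum(simo)      # ~ 1.23 1.43 1.68, vs. 1.35 1.28 1.33 from the dummies in fit1

The dummy estimates are essentially flat across levels, while mo() enforces a monotonic pattern, so some disagreement between the two parameterizations is expected.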

Best of luck with your model!


You got it right. I need individual estimates. The reason for that is that I add up the estimates of various predictors to create a total score for each person. Thanks for the hints; I need to learn these for the first time. Thank you both for your time and help.

Still, @torkar's suggestion may work if I consider each individual as a level! It just needs more chains, as ids are not repeated!!

Do you have only one measurement per person? If that's the case, I believe it would not work.

Yes, in the models above it's one measurement per person. When we get estimates, it's the mean estimate; I need to get a personalized estimate. I will try predict.

While I don't think you should include a varying intercept per person unless you have good reason to (e.g. from PP checks or theory), I think there is a small confusion about how that would work.

In my experience, non-repeating ids are (usually) not a problem for Bayesian models, and I have fitted individual-level models successfully in brms on default settings (I don't think there is any reason to run more chains), although all of them were with different families than cumulative.

The theoretical reason is that having a varying intercept for each observation is actually the same as using a more "overdispersed" distribution for the observation model. AFAIK the only commonly used distribution where such an "overdispersed" alternative does not exist, and the model ceases to be identifiable, is the Bernoulli (logistic regression). Still, it is possible that for your particular dataset and model a varying intercept for each observation will introduce computational issues with the cumulative family, and it will likely take a much longer time to fit. I, however, don't see an a priori reason why it should not be possible. It is also not clear whether it would actually be of any use, in particular whether it would be better than changing the link function to, say, cauchit.

But you have motivated me to do a little experiment of my own using an individual-level intercept with a cumulative response, and it works more often than not, recovering the parameters from simulated data, although it needs a slightly increased adapt_delta and a not-too-wide prior to avoid computational issues (the effective sample size per iteration is on the low end of acceptable).

Here's the code:

library(tidyverse)
library(brms)
options(mc.cores = parallel::detectCores())

set.seed(8964654)
N <- 100
N_levels <- 6
intercepts <- sort(rt(N_levels - 1, 3))
varying_sd <- abs(rnorm(1, 0, 1))
test_data <- data.frame(id = 1:N, varying_effect = rnorm(N, 0, varying_sd)) %>%
  mutate(response_raw = rlogis(N) + varying_effect,
         # The response could probably be simulated in a more clever way
         response_ordinal = case_when(response_raw < intercepts[1] ~ 1,
                                      response_raw < intercepts[2] ~ 2,
                                      response_raw < intercepts[3] ~ 3,
                                      response_raw < intercepts[4] ~ 4,
                                      response_raw < intercepts[5] ~ 5,
                                      TRUE ~ 6))


# Priors matching how I simulated the data
priors <- c(set_prior("normal(0, 1)", class = "sd"), set_prior("student_t(3, 0, 1)", class = "Intercept"))
fit <- brm(response_ordinal ~ 1 + (1 | id), data = test_data, family = cumulative(),
           prior = priors, control = list(adapt_delta = 0.95))

summary(fit)
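And a quick sketch of checking recovery against the simulated truth (all names as defined above):

summary(fit)$random$id  # posterior summary of the varying-intercept SD
varying_sd              # true value used in the simulation
fixef(fit)              # estimated cutpoints
intercepts              # true cutpoints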

Thanks for all the explanations and the experiment. I will play with my data accordingly and will update here if anything interesting comes up.
Thanks again @martinmodrak and @torkar