Confusing result of brms predict() and emmeans() for the ordinal continuation model

petrovska · January 29, 2021, 12:50pm

Dear all,

Could someone advise if the ordinal continuation ratio model accounts for the frequency of response when estimating its conditional probability?

The count (response) and its frequency are low on all plots for some species (explanatory variable), nevertheless the model assigns almost 100% probability for Acer in response 1. At the same time, it assesses lower probability for Fagus that is frequent and abundant. Moreover, the number of species in 20 cm dbh class is low and thus, the line should decline but it is going up.

In contrast, Fagus has a higher probability for the response 2 and 3, which I assume is the correct interpretation of the observed data.

.

Besides, I am puzzled with marginal mean in emmeans. The difference in response is huge. I cannot figure out what threshold was used in this case. I would appreciate any hints or advice.

inits <- list(`Intercept[1]` = 0.19,
              `Intercept[2]` = 1.33,
              `Intercept[3]` = 2.15,
              `Intercept[4]` = 2.92,
              `Intercept[5]` = 3.69,
              `Intercept[6]` = 4.34,
              `Intercept[7]`= 5.26,
              `Intercept[8]`=6.7)

inits_list <- list(inits, inits)
p= c(prior_string("student_t(5,0,0.5)", class="Intercept"),
     prior(normal(0,0.5), class=b),
     prior(student_t(5,0.1,0.5), class=sd))

f=brmsformula(count1 ~ ba_sqrt + dbh_log + species.p + species.m + 
position.m + status.p + y + offset(log(count_dead)) + 
(y|subplot_id), center=TRUE) + lf(disc ~ 0 + status.p+y+species.p, cmc = FALSE)

fit4=brm(f, data=d1, family = cratio(link="logit"), save_pars=save_pars(all = TRUE), 
cores  = 2, iter = 800 + 800, warmup = 800, chains = 2, seed=123,
inits = inits_list, sample_prior="yes", silent=TRUE, 
open_progress=FALSE, control = list(adapt_delta = 0.85, max_treedepth = 10))

nd <-data.frame(species.p=c("Acer platanoides", "Fagus sylvatica"), 
        species.m=d1$species.m, position.m=d1$position.m, 
        ba_sqrt=seq(2, 4, by=0.2), dbh_log = seq(2.3, 3, by=0.07), 
        count_dead=1, y="2015", status.p="alive", subplot_id=d1$subplot_id)
View(nd)
max_iter <- 100
p=as.data.frame(predict(fit4,  newdata = nd, subset = 1:max_iter, summary = T, 
        re_formula = NULL,  scale = "response")) 
together=cbind(nd$species.p, nd$dbh_log, p)
together$gr=rep(1:368, times=1, each=11)
prep=melt(together, id=c("nd$species.p", "nd$dbh_log", "gr"))
prep$dbh_exp=exp(nd$dbh_log)
names(prep)[names(prep) == "nd$species.p"] <- "species.p"

prep %>% group_by(species.p, variable, dbh_exp, gr) %>% distinct(value) 
ggplot(subset(prep, variable %in% c("P(Y = 1)", "P(Y = 2)", "P(Y = 3)")), 
       aes(dbh_exp, value, group=gr, color=species.p), alpha=0.2)+
  stat_summary(fun = mean, geom = "line", aes(group=species.p))+
  facet_wrap(.~variable, scale="free")+theme(legend.position = "top")

Operating System: Windows 10
brms Version: 2.14.6

martinmodrak · February 5, 2021, 4:49pm

Sorry for not getting to your question earlier. It is relevant and on-topic.

That sounds correct to me - if the counts are low, then the model should assign high probability to Y = 1, i.e. that the count is the lowest possible. If the counts are high, than assigning low probability of Y = 1 sounds correct.

That - once again - looks correct to me: the probabilities for individual count response cannot be interpreted very directly - increase in probability that Y = 1 combined with decrease in probability that Y = 2 is likely and indication of decrease in average count. You might want to use posterior_epred to get you predictions of the posterior mean directly.

Additionally, the model is quite complex and I honestly don’t fully understand what is going on. I think there are two simple explanations that also cannot be ruled out immediately:

The model is a poor fit to the data. This could be easily investigated with some posterior predictive checks. I would guess that things like pp_check(fit4, type = "bars_grouped", group = "species.p") pp_check(fit4, type = "stat_grouped", stat = "sd", group = "species.p") could show some potential problems.
There is a bug in how you handle the predictions. I am for example somewhat suspicious of the use of distinct n the plot preparation, but I don’t really understand what is going on there, so I am likely to be wrong.

Unfortunately emmeans isn’t part of my standard toolkit, so can’t really help there. Is this comparable to results with conditional_effects from brms (which I have more experience with and could thus help)?

Best of luck with your model!

Topic		Replies	Views
Thresholds for ordinal brms model in emmeans brms emmeans	0	612	January 26, 2021
Conditional_effects() plots from brms model has a response category with consistent 0 probability (flat plot) General rstan , multinomial-response , conditional-effects , brms	1	137	July 23, 2024
Post-hoc test for multinomial logistic regression brm model (categorical responce) brms r , multinomial-response , brms , emmeans	4	425	July 7, 2024
Estimating contrasts (code peer-review) brms specification , brms , emmeans	3	819	May 20, 2021
Marginal_effects for multivariate models with mixed continuous/ordinal responses brms	4	584	May 15, 2019

Confusing result of brms predict() and emmeans() for the ordinal continuation model

Related topics