Confusing result of brms predict() and emmeans() for the ordinal continuation model

Dear all,

Could someone advise if the ordinal continuation ratio model accounts for the frequency of response when estimating its conditional probability?

The count (response) and its frequency are low on all plots for some species (explanatory variable), nevertheless the model assigns almost 100% probability for Acer in response 1. At the same time, it assesses lower probability for Fagus that is frequent and abundant. Moreover, the number of species in 20 cm dbh class is low and thus, the line should decline but it is going up.

In contrast, Fagus has a higher probability for the response 2 and 3, which I assume is the correct interpretation of the observed data.


Besides, I am puzzled with marginal mean in emmeans. The difference in response is huge. I cannot figure out what threshold was used in this case. I would appreciate any hints or advice.

inits <- list(`Intercept[1]` = 0.19,
              `Intercept[2]` = 1.33,
              `Intercept[3]` = 2.15,
              `Intercept[4]` = 2.92,
              `Intercept[5]` = 3.69,
              `Intercept[6]` = 4.34,
              `Intercept[7]`= 5.26,

inits_list <- list(inits, inits)
p= c(prior_string("student_t(5,0,0.5)", class="Intercept"),
     prior(normal(0,0.5), class=b),
     prior(student_t(5,0.1,0.5), class=sd))

f=brmsformula(count1 ~ ba_sqrt + dbh_log + species.p + species.m + 
position.m + status.p + y + offset(log(count_dead)) + 
(y|subplot_id), center=TRUE) + lf(disc ~ 0 + status.p+y+species.p, cmc = FALSE)

fit4=brm(f, data=d1, family = cratio(link="logit"), save_pars=save_pars(all = TRUE), 
cores  = 2, iter = 800 + 800, warmup = 800, chains = 2, seed=123,
inits = inits_list, sample_prior="yes", silent=TRUE, 
open_progress=FALSE, control = list(adapt_delta = 0.85, max_treedepth = 10))

nd <-data.frame(species.p=c("Acer platanoides", "Fagus sylvatica"), 
        species.m=d1$species.m, position.m=d1$position.m, 
        ba_sqrt=seq(2, 4, by=0.2), dbh_log = seq(2.3, 3, by=0.07), 
        count_dead=1, y="2015", status.p="alive", subplot_id=d1$subplot_id)
max_iter <- 100,  newdata = nd, subset = 1:max_iter, summary = T, 
        re_formula = NULL,  scale = "response")) 
together=cbind(nd$species.p, nd$dbh_log, p)
together$gr=rep(1:368, times=1, each=11)
prep=melt(together, id=c("nd$species.p", "nd$dbh_log", "gr"))
names(prep)[names(prep) == "nd$species.p"] <- "species.p"

prep %>% group_by(species.p, variable, dbh_exp, gr) %>% distinct(value) 
ggplot(subset(prep, variable %in% c("P(Y = 1)", "P(Y = 2)", "P(Y = 3)")), 
       aes(dbh_exp, value, group=gr, color=species.p), alpha=0.2)+
  stat_summary(fun = mean, geom = "line", aes(group=species.p))+
  facet_wrap(.~variable, scale="free")+theme(legend.position = "top")
  • Operating System: Windows 10
  • brms Version: 2.14.6

Sorry for not getting to your question earlier. It is relevant and on-topic.

That sounds correct to me - if the counts are low, then the model should assign high probability to Y = 1, i.e. that the count is the lowest possible. If the counts are high, than assigning low probability of Y = 1 sounds correct.

That - once again - looks correct to me: the probabilities for individual count response cannot be interpreted very directly - increase in probability that Y = 1 combined with decrease in probability that Y = 2 is likely and indication of decrease in average count. You might want to use posterior_epred to get you predictions of the posterior mean directly.

Additionally, the model is quite complex and I honestly don’t fully understand what is going on. I think there are two simple explanations that also cannot be ruled out immediately:

  1. The model is a poor fit to the data. This could be easily investigated with some posterior predictive checks. I would guess that things like pp_check(fit4, type = "bars_grouped", group = "species.p") pp_check(fit4, type = "stat_grouped", stat = "sd", group = "species.p") could show some potential problems.

  2. There is a bug in how you handle the predictions. I am for example somewhat suspicious of the use of distinct n the plot preparation, but I don’t really understand what is going on there, so I am likely to be wrong.

Unfortunately emmeans isn’t part of my standard toolkit, so can’t really help there. Is this comparable to results with conditional_effects from brms (which I have more experience with and could thus help)?

Best of luck with your model!