Model specification and approach: dist model/mixture models don't recover the structure of the data accurately

Hi,

I am working with a dataset where I have measured traits on multiple plant species and populations. For most of the traits I measured, I was able to fit appropriate models and posterior predictive checks corroborated this - chains mixed well and converged (there were still a few divergent transition warnings, but I am not too worried about those).

However, there is one trait where I have a problem figuring out the right approach. It is a continuous positive number, technically bounded (as it is time until midpoint of growth). Unlike other traits I measure however, this turned out to be multimodal. This is all fine and good, as there are biological explanations for this.

My first instinct was to try fitting a distributional model, as upon splitting the data into species, the density plots looked unimodal enough.

prior_norm = c(
  prior(normal(0, 1), "b"),
  prior(student_t(3, 3.4, 2.5), "Intercept"),
  prior(student_t(3, 0, 2.5), "sd"),
  prior(student_t(3, 0, 2.5), class="Intercept", dpar="sigma"),
  prior(student_t(3, 0, 2.5), class="sd", dpar="sigma"),
  prior(normal(0, 1), class="b", dpar="sigma")
)
f1 <- brmsformula(xmid ~ elev_Org_m * treatment * delta_elev + 
                                        (1 | gr(phylo, cov=A)) + (1 | species:region),
                             sigma ~ elev_Org_m * treatment * delta_elev + 
                                         (1 | gr(phylo, cov=A)) + (1 | species:region), 
                             family = lognormal)  

#elev_Org_m and delta_elev describe the elevational origin of the species and plant 

m <- brm(
  f1,
  data = d,
  data2=list(A=A), 
  prior = prior_norm,
  chains = 4, cores = 4,
  iter = 4000, warmup = 1000
)

However, this is what the pp_check() looks like:


While the posterior does display bimodality, it doesn’t recover the actual structure of the data all that well. I have tried making sigma simpler (just the species level random effects) but there isn’t much of a difference.

So after some searching I found that mixture models could be an alternative approach, which could also make sense as populations can have slow and fast growers. As I cannot explicitly model this in a distreg framework, I thought of letting mixture models do the hard work for me

f2 <- brmsformula(xmid ~ elev_Org_m * treatment * delta_elev + 
                    (1 | gr(phylo, cov=A)) + (1 | species:region), family = mixture(gaussian, gaussian)) 
#you will notice i use gaussian here: only did it as a starting point as maybe it makes things easier
#in general i use lognormal dist. to model my traits as all are restricted to be positive

prior_mix = c(
  prior(normal(0, 5), "b", dpar="mu1"),
  prior(normal(0, 5), "b", dpar="mu2"),
  prior(normal(18, .05), "Intercept", dpar="mu1", lb = (5), ub = (70)),
  prior(normal(40, .05), "Intercept", dpar="mu2", lb = (5), ub = (70)),
  prior(student_t(3, 0, 14), "sd", dpar = "mu1"),
  prior(student_t(3, 0, 14), "sd", dpar = "mu2"),
  prior(student_t(3, 0, 14), class="sigma1"),
  prior(student_t(3, 0, 14), class="sigma2"), 
  prior(dirichlet(10), class="theta")
)

m <- brm(
  f2,
  data = d,
  data2=list(A=A), 
  prior = prior_mix,
  chains = 4, cores = 4,
  iter = 4000, warmup = 1000
)

I read some of the other posts on convergence issues on mixture models, and that’s why my priors for the intercept have such low standard deviations. At least at this stage, parts of the model have good convergence, but other parts don’t. I have very few divergent transitions, high Rhat, and low ESS values.


Looks like I’m a bit closer here. I still think lognormal is the appropriate distribution, but since the log scale is smaller, further reducing SDs for the intercept priors make estimation unnecessarily hard…

I guess essentially this question boils down to mixture models and appropriate model (and prior specification) for them, so maybe I’ve provided too much needless info here (sorry!). So my final question is where to go from here and how to specify the mixture model a bit better and get my models to work?

Thank you in advance, I appreciate any help.

  • Operating System: MacOS (Apple Silicon)
  • brms Version: 2.20.4

@scholz any idea? sorry to tag you here but i’m still lost. thanks in advance!

Hey, no worries re. the ping. Happy to help if I can.
Just for starters, could you give me a little more info on your data? How many observations, and what is the process you are trying to model?

I think a mixture model is a good approach if you can’t fit the bimodality with some kind of multilevel structure. Is there anything in your data (you mention that there is a biological explanation for this) that you could use to model the bimodality? Maybe using a nonlinear model based on that theoretical background?

You also mention that you have a double-bounded outcome if my understanding is correct. In that case you could try a beta likelihood after transforming the outcome to (0,1), however, only if your upper bound is constant accross cases. Won’t solve the bimodality problem, but in my experience, the beta generally samples better than the lognormal, and you would properly account for both boundaries.

What parts of the model are struggling to converge? Depending on how many groups and cases per group you have, it might just be hard to estimate some of the group parameters.

And finally, the priors for the b’s look rather wide. Have you checked prior-predictive checks? (run the model with sample_prior = "only" and run a pp_check on it. It might just be that you need stronger regularization for some model parts to tie it down. Similarly, the intercepts could be too restrictive and not allow the model to explore the parameter space sufficiently.

Hope this helps a little. Let me know how it goes :)

3 Likes

Thanks for the response!

Some more info on the data: I have about 1220 observations in total, which are divided among species and further divided among populations within species. The smallest “group” has on average 18 observations. The trait I estimated is essentially the time individuals take to reach the “midpoint” of their growth.

model the bimodality

Unfortunately I did try to use species as an explanatory variable for a distreg model, as the distribution of the trait within each species looked “unimodal enough”. However, on careful probing some of the distributions are very wide, showing that within a species there’s a wide range of phenotypes. Funnily enough, fitdistrplus::descdist() on my response says it is “uniform”. I even used all my predictors of the trait as predictors for sigma (shown in the original post) - but it doesn’t give me a good fit. I think all the variables I could put into the model which I as the experimenter controlled, went into it already.

nonlinear

While there could be a nonlinear process going on, I have elevation values in km and their interactions will be way too small, making convergence impossible.

beta likelihood

I actually did try the Beta method already! The upper bound for all groups should be the same, as it cannot be longer than the duration of the experiment (they all reach an asymptote in growth). I took the inverse and scaled it to put it in (0,1) and here’s what I got:

f3 <- brmsformula(scaled.inv.xmid ~ elev_Org_m * treatment * delta_elev + 
                    (1 | gr(phylo, cov=A)) + (1 | species:region),
                  phi ~ (1 | gr(phylo, cov=A)) + (1 | species:region),family = brms::Beta()) 
m <- brm(
  f3,
  data = d,
  data2=list(A=A), 
  prior = prior_norm,
  chains = 4, cores = 4,
  iter = 4000, warmup = 1000
)


I was initially quite happy with this - chains converged well, high ESS and Rhat ~ 1. However, loo(m) tells me I have 3 problematic pareto-k values.

Computed from 12000 by 1224 log-likelihood matrix

         Estimate   SE
elpd_loo   1375.8 42.8
p_loo        38.4  6.8
looic     -2751.7 85.6
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     1221  99.8%   1117      
 (0.5, 0.7]   (ok)          0   0.0%   <NA>      
   (0.7, 1]   (bad)         2   0.2%   92        
   (1, Inf)   (very bad)    1   0.1%   26        

What parts of the model are struggling to converge? Depending on how many groups and cases per group you have, it might just be hard to estimate some of the group parameters.

Quite a lot of parts actually… only mu1_Intercept has an Rhat of 1, mu2_Intercept with 1.01, rest are all greater. As for the latter, I estimated many traits with similar amounts of groups and cases/group, didn’t encounter any problems there, so I hope this isn’t a problem.

priors for the b’s

I did try prior predictive checks, and now based on your comments after reducing b’s and making intercepts less restrictive here is what I get:

prior_mix = c(
  prior(normal(0.5, 0.5), "b", dpar="mu1"),
  prior(normal(0.5, 0.5), "b", dpar="mu2"),
  prior(normal(18, 2.5), "Intercept", dpar="mu1", lb = (5), ub = (70)),
  prior(normal(40, 4.0), "Intercept", dpar="mu2", lb = (5), ub = (70)),
  prior(student_t(3, 0, 15), "sd", dpar = "mu1"),
  prior(student_t(3, 0, 10), "sd", dpar = "mu2"),
  prior(student_t(3, 0, 14), class="sigma1"),
  prior(dirichlet(5), class="theta")
)


Honestly, when I get these sort of pp_check plots with priors, I am kinda lost. The fact that most of the yreps are flat make me think that my priors aren’t informative enough (is my interpretation correct?). It even looks the same with more restrictive Intercept prios. In general though, these stump me because I don’t know what part of the prior is making them behave this way.

Thanks again :)