Issues with bimodality in parameter posterior distributions of non-linear time series

Hi all,

I will preamble by saying thank for reading this, and that any commenter should feel free to criticise any aspect of this modelling specification or workflow you deem naive or uninformed.

I am simulating two groups from this double sigmoid curve process:

\begin{equation*} \begin{aligned} \operatorname{y} &= \frac{y_{mid}}{1 + e^{(-a \cdot time+b)}} \ + \frac{y_{max}}{1 + e^{(-c\cdot time+d)}} , \quad 0 \leq y \leq 100 \end{aligned} \end{equation*}

Using the following code:

sw_dbl_log <- function(a, b, c, d, time){
  ((50)/(1+exp(-a*Time + b)))+((50)/(1+exp(-c*Time + d)))

nn = 300
Raw<- tibble(n = 1:(nn*2),
              Group = rep(c("Control", "Experimental"), each = nn),
              a = c(rnorm(nn, 0.1, 0.01),
                    rnorm(nn, 0.1,  0.01)),
              b = c(rnorm(nn, 5,  0.01),
                    rnorm(nn, 5,  0.01)),
              c = c(rnorm(nn, 0.1, 0.01),
                    rnorm(nn, 0.1, 0.002)),
              d = c(rnorm(nn, 5, 0.01),
                    rnorm(nn, 25, 0.01))
) %>% 
  tidyr::expand(nesting(n,Group, a, b, c, d),
                Time = seq(0, 300, by = 10)) %>% 
  mutate(y = sw_dbl_log(a, b, c, d, Time))

As expected, this produces gaussian distributions for each Group’s parameters (a:d), with the Control group’s time series resembling a single sigmoid with one growth phase. The Experimental group’s time series has two phases of growth, as intended. I am leveraging this specification because each parameter gives a clearly interpretable output, that is, a and c represent growth rates and b represents the time point of inflexion for each of the growth phases.

My fitting procedure is as follows :

formula <- bf(y ~ (50/(1+exp(-a*Time + b))) + (50/(1+exp(-c*Time + d))), 
              a + b + c + d ~ 0 + Group,
              nl = TRUE)

prior1 <- c(
  prior(normal(0.1, 0.1), nlpar = "a", lb = 0),
  prior(normal(5, 2), nlpar = "b", lb = 0, ub = 10),
  prior(normal(0.1, 0.1), nlpar = "c",lb = 0),
  prior(normal(25, 5), nlpar = "d", coef = "GroupExperimental"),
  prior(normal(5, 2), nlpar = "d", coef = "GroupControl")

fit <- brm(formula, 
           data = Raw, 
           prior = prior1,
           control = list(adapt_delta = 0.95, max_treedepth = 15),
           iter = 10000,
           warmup = 5000,
           cores = 3, 
           chains = 3)

I’ve omitted prior fitting and predictive checks from the code above, but visual inspection looks good.

This issue I am currently having is that the model fit is poor, namely, the Rhat and ESS numbers are too low, and it’s principally due to bimodality in the posterior distribution of the parameters for either one of the groups, sometimes Control and sometimes Experimental, that is causing the chains not to converge.

I cannot seem to define why the model is fitting poorly and causing this bimodality. My first guess is that the priors are causing issues, but, as per my naïve understanding, the priors are specified pretty tightly around the original simulation distributions so I cannot see why they might be causing the problem. I must admit that this conclusion might expose my ignorance on the topic which is fueling my inability to fix the fit issues.

Should I change the priors, the model, stipulate init values or simply run for more iterations? Any help would be greatly appreciated.

As an update to this, here are the posteriors and trace plots for the posteriors by Group.

One clue to the issue might be that if I made the Control group’s simulation values for b and d different, and thus make the time series represent a double sigmoid instead of a single sigmoid, their posteriors return as expected. For example, if

a_{C} = 0.1, c_{C} = 0.1, \\ b_{C} = 5, d_{C} = 20

those values match the mean of the posteriors and the chains converge as expected for that group, but do not for the E group.

Hello @Matt_Zelko,

This looks like it might be a label-switching problem to me. The parameters ‘b_b_GroupE’ and ‘b_d_GroupE’ represent where the inflection points are, right? That these posteriors are bimodal, with about the same shape, suggests that the modes represent the two parameters being estimated, and there is not enough constraint in the structure of your model to ensure which ‘real’ parameter corresponds to which ‘named’ parameter.

In my experience it is effective to build such constraints in the definition of the parameters. For example, if d<b, then instead of estimating d as a primary parameter, estimate d-b and constrain it to be positive.

1 Like

Hey Andrew,

Thanks for the productive feedback here! It definitely does look like label switching, since the posteriors appear to take on parts of each others distributions. However, the solution probably hits at the heart of the model, that is, sometimes d = b, although I think specifying d = b + \delta might work.



I just thought of the directional constraints because they were suitable for me for this sort of issue in pharmacometric models (but in that case such constraints are easy to justify). Maybe there are other constraints that might be suitable for this system. If not, I suppose it is a conceptual problem to have a model that is flexible enough for your purpose, but not so flexible that it is susceptible to this sort of issue.

There might be some relevant ideas around this from the mixture modelling space, if there’s no way to respecify the model (but I don’t have much practical experience with that).

Thanks for the tip about the mixture modelling approach, I will look at this more today.

On a side note, reparameterising d as b - \delta did manage to make the model converge once, but subsequent fits again suffered from label switching, this time in the \delta parameter.

Another piece of advice I was given here was to model them separately, one using a single sigmoid and the other using a double sigmoid, then compare them using their posterior samples. This approach assumes that error variance is not biasing the estimates enough to realistically impact inferential decisions. What do you make of this suggestion?

As an update, since I require a flexible solution for model these time series, I’ve modelled the time series as a longitudinal gaussian process using the lgpr package:

Thanks again Andrew, really appreciate your efforts here!