Mixture model for color histograms

Hello,
I am trying to make a longitudinal model of color histograms, and I am running into a couple of problems that I hope someone can provide expertise on.

I have image data from plants where color is a phenotype of interest. The color values I am most interested in are Hue values (HSV colorspace), so they are bounded 0:360 on a circle where 0 and 360 are extremely similar reds. Luckily, these are not Poinsettias, so I am not too worried about hitting the red boundary at 0 or 360. When the color data is returned from image analysis, it is a histogram bounded 1:180, with each bin representing 2 degrees of hue and each histogram value being the number of pixels with hues in that bin.

Sometimes the histogram follows a standard distribution for the entire experiment; Gaussian or lognormal are common. When that happens I feel pretty good about modeling the circular mean over time (see the "Only unimodal gaussian histograms" reprex below).
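
For reference, this is roughly how I compute the circular mean of a single histogram (a minimal sketch; counts is a length-180 vector of pixel counts, and I take each 2-degree bin's center, 2*i - 1 degrees, as its hue):

# circular mean of a hue histogram; bin i covers hues (2i-2, 2i] degrees
circular_mean <- function(counts, bins = seq_along(counts)) {
  theta <- (2 * bins - 1) * pi / 180  # bin centers in radians
  s <- sum(counts * sin(theta))
  c <- sum(counts * cos(theta))
  deg <- atan2(s, c) * 180 / pi       # mean angle in (-180, 180]
  deg %% 360                          # wrap to [0, 360) degrees of hue
}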

Sometimes the histogram does not follow a standard distribution and changes a lot between treatment groups. When this comes up I have generally used pairwise comparisons at some set time, or pairwise distances between the histograms, but that isn't very satisfying, and I'd like to be able to model the data across time. I have been trying to make a mixture model to capture this, but even what I hoped would be a simplified version has been difficult: the model takes about 2 hours to run 600 iterations and did not converge (see the "Mixture histograms" reprex below).
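
(By "pairwise distances" I mean something like a 1-D earth mover's distance between normalized histograms; a minimal sketch that treats hue as linear, ignoring the circular wrap:)

# Wasserstein-1 / earth mover's distance between two histograms on the same bins
emd1d <- function(h1, h2, binwidth = 2) {     # binwidth in degrees of hue
  p <- h1 / sum(h1)
  q <- h2 / sum(h2)
  binwidth * sum(abs(cumsum(p) - cumsum(q)))  # distance in degrees
}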

My specific questions and examples are below.

1: Are there parts of my mixture model below that seem like they would slow it down or cause problems?
2: Is there a better way to account for the circular shape of Hue than truncating the response term? (A sketch of what I mean is below this list.)
3: Is there a way to have the model exclude a mixture component altogether if the histogram is unimodal at that time? I was planning to use a spline to model theta over time by group so that the proportion could go towards 0, but it seems like that will slow the model down significantly more.
4: Did I completely miss some style of modeling that makes more sense here?
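
(For question 2, a minimal sketch of the kind of truncation-free alternative I have been wondering about, assuming brms's von_mises family, which expects a response in (-pi, pi], and assuming weights work the same way there; the flat time * group formula is just a placeholder:)

# map bin centers (2*bin - 1 degrees of hue) to radians in (-pi, pi],
# putting the cut at the 0/360 red boundary that my data stays away from
df2l$bin_rad <- (2 * df2l$bin - 1 - 180) * pi / 180

fit_vm <- brm(bin_rad | weights(weight) ~ time * group,
              family = von_mises(), data = df2l,
              chains = 2, cores = 2, backend = "cmdstanr")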

Thank you for your time!

Mixture histograms

Apologies for the long reprex; I'm not sure what the most painless way to simulate the data is.

library(brms)
library(ggplot2)
library(extraDistr) # assuming this is the source of rmixnorm() used below

# list names are the RNG functions to call; duplicate names are fine
# because the lists are indexed by position below
dists_a <- list(rlnorm = list(meanlog = log(80), sdlog = 0.3),
              rlnorm = list(meanlog = log(100), sdlog = 0.25),
              rnorm = list(mean = 120, sd = 10),
              rnorm = list(mean = 150, sd = 15),
              rmixnorm = list(mean=c(150, 110), sd = c(10, 20), alpha = c(0.7, 0.3)),
              rmixnorm = list(mean=c(150, 100), sd = c(10, 5), alpha = c(0.6, 0.4))
              )

dists_b <- list(rlnorm = list(meanlog = log(80), sdlog = 0.3),
              rlnorm = list(meanlog = log(90), sdlog = 0.25),
              rnorm = list(mean = 100, sd = 10),
              rnorm = list(mean = 125, sd = 15),
              rnorm = list(mean = 150, sd = 10),
              rnorm = list(mean = 155, sd = 7)
              )
dists <- list("a" = dists_a, "b" = dists_b)
time <- 24
nreps <- 5
min_bin <- 1
max_bin <- 180
set.seed(123)

# showing ggplot of distributions

ggplot(data=data.frame(x=1:180), aes(x=x))+
  facet_grid(dist~group)+
  lapply(c("a", "b"), function(grp){
    lapply(1:6, function(di){
      rfun <- names(dists[[grp]])[di]
      rargs <- append(dists[[grp]][[di]], 1000) # add n = 1000 draws
      names(rargs)[length(rargs)] <- "n"
      v1 <- do.call(rfun, rargs)
      v1[v1 > max_bin] <- max_bin # clamp draws to the histogram bounds
      v1[v1 < min_bin] <- min_bin
      d <- data.frame(group = grp, x = v1, dist = di)
      geom_histogram(data=d, aes(x=x, fill=di), alpha=1, binwidth=1, show.legend = FALSE)
    })
  })+
  theme_light()
# Making data
df2 <- do.call(rbind, lapply(c("a", "b"), function(grp){
  do.call(rbind, lapply(1:nreps, function(rep){
    do.call(rbind, lapply(1:time, function(ti){
      whichDist <- ceiling(ti/4) # switch distributions every 4 time points
      rfun <- names(dists[[grp]])[whichDist]
      rargs <- append(dists[[grp]][[whichDist]], 1000)
      names(rargs)[length(rargs)]<-"n"
      v1 <- do.call(rfun, rargs)
      v1[v1>max_bin]<-max_bin
      v1[v1<min_bin]<-min_bin
      s1<-hist(v1, breaks=seq(min_bin, (max_bin+1), 1), plot=FALSE)$counts
      s1d<-as.data.frame(cbind(data.frame(grp, rep, ti), matrix(s1,nrow=1)))
      colnames(s1d)<-c("group","rep", "time", paste0("sim_",min_bin:max_bin))
      s1d
    }))
  }))
})) # data generally comes in wide format

library(data.table)
dt2 <- as.data.table(df2)
df2l <- as.data.frame(melt(dt2, id.vars = c("group", "rep", "time"), # lengthen the data
                           measure.vars = patterns("sim_*"), value.name = "weight"))
df2l$bin <- as.numeric(gsub("sim_", "", df2l$variable))
df2l<-df2l[df2l$weight>0,] # remove 0 weighted bins

# mixture model

mix <- mixture(gaussian, gaussian)
prior2 <- prior(student_t(3,0,5), nlpar = "A") +
  prior(student_t(3,0,5), nlpar = "B")

fit2 <- brm(bf(bin | weights(weight) + trunc(lb = 0, ub = 180) ~ A + B * time,
               # note: as written, this single formula is shared by mu1 and mu2
               #sigma1 ~ 0+time*group, # hoping to model these once the mu1&2 model works
               #sigma2 ~ 0+time*group,
               #theta1 ~ 0+group:time,
               # autocor = ~arma(~time|rep:group,1,1),
               A + B ~ 0 + group,
               nl = TRUE),
            family = mix,
            prior = prior2,
            data = df2l,
            iter = 600, cores = 2, chains = 2, backend = "cmdstanr", silent=0, init=0)

That model takes ~2 hours to run and looks pretty terrible.

> fit2
 Family: mixture(gaussian, gaussian) 
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; theta1 = identity; theta2 = identity 
Formula: bin | weights(weight) + trunc(lb = 0, ub = 180) ~ A + B * time 
         A ~ 0 + group
         B ~ 0 + group
   Data: df2l (Number of observations: 22383) 
  Draws: 2 chains, each with iter = 600; warmup = 300; thin = 1;
         total post-warmup draws = 600

Population-Level Effects: 
         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
A_groupa    87.86      1.36    87.53    88.53 1.40        4       68
A_groupb    87.85      1.13    87.45    88.51 1.43        4       84
B_groupa     2.80      0.12     2.70     2.92 1.52        4       47
B_groupb     2.80      0.11     2.71     2.93 1.56        4       41

Family Specific Parameters: 
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1    28.49     21.36     2.24    76.24 1.79        4       26
sigma2    29.45      0.83    28.77    30.78 1.59        4       20
theta1     0.02      0.03     0.00     0.07 2.18        3       12
theta2     0.98      0.03     0.93     1.00 2.23        3       11

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors. 
2: There were 106 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Only unimodal gaussian histograms

This one works okay, but I am still curious whether anyone knows of a better way to model this data.

# simulate some data
set.seed(123)
time <- 25
means <- cbind(seq(80, 130, length.out = time), seq(65, 150, length.out = time))
sds <- cbind(seq(15, 8, length.out = time), seq(20, 5, length.out = time))
min_bin <- 1
max_bin <- 180
df1 <- do.call(rbind, lapply(1:time, function(ti){
  do.call(rbind, lapply(1:2, function(i){
    grp <- letters[i]
    do.call(rbind, lapply(1:5, function(rep){
      mu <- rnorm(1, means[ti, i], means[ti, i]/10)
      sd <- rnorm(1, sds[ti, i], sds[ti, i]/10)
      v1 <- rnorm(1000, mu, sd)
      v1[v1>max_bin]<-max_bin
      v1[v1<min_bin]<-min_bin
      s1<-hist(v1, breaks=seq(min_bin, (max_bin+1), 1), plot=FALSE)$counts
      s1d<-as.data.frame(cbind(data.frame(grp, rep, ti), matrix(s1,nrow=1)))
      colnames(s1d)<-c("group","rep", "time", paste0("sim_",min_bin:max_bin))
      s1d
    }))
  }))
}))
# aggregate it to Mu/Sd
ag1 <- do.call(rbind, lapply(1:nrow(df1), function(i){
  sub <- df1[i, -c(1:3)]
  vals <- rep(1:180, times = as.numeric(sub)) # expand counts back to hue values
  data.frame(group = df1[i, "group"], rep = df1[i, "rep"], time = df1[i, "time"],
             mu = mean(vals), sd = sd(vals))
}))
# basic model of it
prior1 <- prior(student_t(3, 0, 5), nlpar = "A") +
  prior(student_t(3, 0, 5), nlpar = "B") + # brms requires a prior on every nlpar
  prior(student_t(3, 0, 5), dpar = "sigma") +
  prior(gamma(2, 0.1), class = "nu")

fit1 <- brm(bf(mu ~ A + B*time, 
               sigma~ s(time, by=group),
               A + B ~ 0+group, 
               autocor = ~arma(~time|rep:group,1,1),nl = TRUE),
            family = student, prior = prior1, data = ag1, iter = 600, 
            cores = 2, chains = 2, backend = "cmdstanr", silent=0 )
fit1 # should work fine

# alternatively using weights instead of the mean
library(data.table)
dt <- as.data.table(df1)
df1l <- as.data.frame(melt(dt, id.vars = c("group", "rep", "time"),
            measure.vars = patterns("sim_*"), value.name = "weight"))
df1l$bin <- as.numeric(gsub("sim_", "", df1l$variable))

prior1b <- prior(student_t(3, 0, 5), nlpar = "A") +
  prior(student_t(3, 0, 5), nlpar = "B") + # B needs a prior here as well
  prior(gamma(2, 0.1), class = "nu")

fit1b <- brm(bf(bin|weights(weight) ~ A + B*time,
               A + B ~ 0+group, 
               autocor = ~arma(~time|rep:group,1,1),nl = TRUE),
            family = student, prior = prior1b, data = df1l, iter = 600, 
            cores = 2, chains = 2, backend = "cmdstanr", silent=0 )
fit1b # does not love to include a sigma term, but this version works

This is only a partial answer to some of your questions. I don’t actually know brms.

2: In Stan itself you can use a unit vector type for circular statistics.

3: Not that I know of. What will happen is that you'll get two overlapping components with a lot of variation per run. You can try this with a mixture of two identical normals. But unimodality doesn't mean it's not a mixture. For example, normal(0, 1) and normal(0, 10) can be evenly mixed and the result is unimodal at 0, but it is not normal. You can try to put something like a Dirichlet(0.1) or Dirichlet(0.0001) prior on the mixture probabilities, which will encourage near-zero weights in these cases, but it's not guaranteed structurally.
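
(To make the unimodality point concrete, and to sketch the sparse Dirichlet idea; the prior line assumes brms exposes the mixing proportions as class "theta", which you should double check since I don't know brms:)

# even mixture of normal(0, 1) and normal(0, 10): unimodal at 0, but not normal
set.seed(1)
x <- c(rnorm(5000, 0, 1), rnorm(5000, 0, 10))
plot(density(x))      # one peak at 0
qqnorm(x); qqline(x)  # heavy tails give it away as non-normal

# sparse Dirichlet prior that pushes the mixture weights toward 0 and 1
prior_theta <- brms::prior(dirichlet(0.1, 0.1), class = "theta")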

Thank you, I am much less familiar with using Stan directly and I did not know about that option! I’ll check out the unit vector for circular statistics.

That makes sense about a unimodal non-normal mixture of normals. I was worried initially about the sampler having trouble when the means are the same or close to the same, but I'll have to play with it some more to get a better feel for how that performs.
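
(E.g., something like this quick experiment with your normal(0, 1) / normal(0, 10) example, just to watch how the components and theta behave; a sketch:)

library(brms)
set.seed(2)
x <- c(rnorm(500, 0, 1), rnorm(500, 0, 10)) # unimodal but non-normal draws
fit_test <- brm(x ~ 1, family = mixture(gaussian, gaussian),
                data = data.frame(x = x), chains = 2, cores = 2)
fit_test # expecting overlapping components and an unstable theta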