Help obtaining the probability each data point arose under each mixture component

Hi,

I’ve been trying to understand whether a finite mixture model will help my project, so I began with Paul Bürkner’s helpful but brief vignette on mixture modelling in brms (Finite Mixture Families in brms — mixture • brms).

However my project is a clustering problem, so I’m particularly interested in obtaining the probability each data point arose under the different mixture components.

In this particular case, I’m modelling changes in people’s test scores, where scores are collected repeatedly over time, and I’m interested in identifying the people whose trajectories improve.

A reproducible example is shown below:

library(tidyverse)
library(brms)

# create a data set from two different linear eqns (clusters):  
# y ~ -1 + 2*x
# y ~ 1 - 2*x
# 
# Persons are identified by w, and each cluster is a set of unique persons 

set.seed(1) # for replicability
x = rnorm(100)
data.frame(
  y = rnorm(100, -1 + 2*x), # mu_I = -1, mu_x = +2
  x = x,
  w = sample(LETTERS[1:10], 100, replace = TRUE) # persons A to J
) -> d1

x = rnorm(100)
data.frame(
  y = rnorm(100, 1 - 2*x), # mu_I = +1, mu_x = -2
  x = x,
  w = sample(LETTERS[11:20], 100, replace = TRUE) # persons K to T
) -> d2

dat <- bind_rows(d1, d2) 

# Set up the model
f <- bf(y ~ 1 + x + (1|w))
mix <- mixture(gaussian, gaussian)

# Informed priors to help sampling in this example:
prior <- c(
  prior(normal(-1, 3), Intercept, dpar = mu1),
  prior(normal(1, 3), Intercept, dpar = mu2),
  prior(normal(2, 3), b, dpar = mu1),
  prior(normal(-2, 3), b, dpar = mu2)
)


fit <- brm(formula = f, 
           data = dat,
           family = mix,
           prior = prior,
           chains = 2,
           seed = 1)

summary(fit)

#  Family: mixture(gaussian, gaussian) 
#   Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; 
# theta1 = identity; theta2 = identity 
# Formula: y ~ 1 + x + (1 | w) 
#    Data: dat (Number of observations: 200) 
#   Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#          total post-warmup draws = 2000
# 
# Group-Level Effects: 
#   ~w (Number of levels: 20) 
#                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sd(mu1_Intercept)     0.22      0.18     0.01     0.66 1.00      638      861
# sd(mu2_Intercept)     0.61      0.35     0.06     1.44 1.02      189      363
# 
# Population-Level Effects: 
#               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# mu1_Intercept    -1.17      0.15    -1.46    -0.87 1.00      763      980
# mu2_Intercept     0.92      0.24     0.36     1.35 1.00      454      208
# mu1_x             1.94      0.12     1.69     2.18 1.00     1443      758
# mu2_x            -1.94      0.14    -2.19    -1.64 1.00      995      766
# 
# Family Specific Parameters: 
#        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma1     0.89      0.09     0.71     1.08 1.00      903     1108
# sigma2     1.06      0.10     0.88     1.27 1.00     1060     1184
# theta1     0.47      0.05     0.37     0.57 1.00      839      646
# theta2     0.53      0.05     0.43     0.63 1.00      839      646
# 
# Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
# and Tail_ESS are effective sample size measures, and Rhat is the potential
# scale reduction factor on split chains (at convergence, Rhat = 1).

Is there a way to identify which mixture component each person (w) came from? I hope this question makes sense. I’ve read comments in the Stan User’s Guide that make me think it is possible (5.5 Inferences supported by mixtures | Stan User’s Guide), and I’m hoping something similar can be done as a post-estimation step with brms…

Thanks,

Rich


Sorry about the delay in getting to your questions. I saw somewhere a nice bit of code for doing just this but I totally did not bookmark the site. I’ll keep digging around. Hopefully someone else here might have a faster answer.
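While I keep looking, one thing that may be worth trying: if I remember right, brms ships a post-processing function `pp_mixture()` that returns the posterior probability that each observation was generated by each mixture component. A sketch of how I’d use it on your fitted model (untested against your exact fit, and the per-person aggregation at the end is my own rough heuristic, not anything built into brms):

```r
library(brms)
library(dplyr)

# pp_mixture() returns (with the default summary = TRUE) an
# N x E x K array: N observations, E summary statistics
# (Estimate, Est.Error, credible-interval bounds), K components.
pp <- pp_mixture(fit)

# Posterior mean P(component k | y_i) for each observation i:
head(pp[, "Estimate", ])

# A rough-and-ready per-person summary: average the observation-level
# probabilities for component 1 within each person w.
dat %>%
  mutate(p1 = pp[, "Estimate", 1]) %>%
  group_by(w) %>%
  summarise(p_component1 = mean(p1)) %>%
  arrange(desc(p_component1))
```

One caveat on the last step: averaging per-observation probabilities is only a heuristic. A more principled per-person membership probability would combine the component likelihoods across all of a person’s observations within each posterior draw (along the lines of the Stan User’s Guide section you linked), but the average is often a reasonable first look.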
