I am working with data where around 30% of observations are measured incorrectly and are 2x larger than they should have been recorded. I want to use a mixture model to identify these observations to recover their actual value.
# simulate data
# (MacOS 11.1 and brms 2.14.4)
library(tidyverse)
set.seed(1234)
dat <- tibble(x = 1 + rbinom(300, 1, 0.69)) %>%
mutate(y1 = x*rnorm(300, 2.5),
y2 = x*rnorm(300, 5.5)
)
As x
is a hidden variable, what I have so far is:
# fit and check brms model
library(brms)
options(mc.cores = parallel::detectCores())
prior <- c(
prior(normal(2, 1), class = Intercept, dpar = mu1, resp = y1),
prior(normal(4, 2), class = Intercept, dpar = mu2, resp = y1),
prior(normal(5, 1), class = Intercept, dpar = mu1, resp = y2),
prior(normal(10, 2), class = Intercept, dpar = mu2, resp = y2)
)
fit1 <- brm(mvbind(y1, y2) ~ 1,
dat, family = mixture(gaussian, nmix = 2),
prior = prior,
chains = 8)
summary(fit1)
pp_check(fit1, resp = "y1")
pp_check(fit1, resp = "y2")
From this model, I can get an estimate of each observation being in the incorrectly coded group with pp_mixture
on both y1
and y2
.
Currently the model creates two independent parameters for the mixture weights theta1
and theta2
. However, whilst I don’t know the exact mixture weight, I know that the value of theta1
and theta2
are the same. I would like to either equate the two (theta1 = theta2
) or model only one theta
parameter. How can I set this in the model?