Please also provide the following information in addition to your question:
- Operating System: OSX 10.14.5
- brms Version: 2.8.0
Hi,
I have data on the measurements of diameter (here in mm) of a skin test that look this this:
library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> Loading 'brms' package (version 2.8.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
set.seed(1234)
test <- data.frame(result = c(rpois(500, 0.25), rpois(150, 10)))
test %>%
ggplot() +
geom_bar(aes(x=result))
Created on 2019-07-15 by the reprex package (v0.3.0)
There are two underlying processes to be estimated:
- Between 0 and around 4, most people probably don’t have the disease, although some will
- At 5 or above, most people probably do have the disease, although some people with a smaller result may in fact have disease
I have been reading around mixture models as implemented in brms
, but have been struggling to specify correctly for my data. I have tried the following, with a message about not mixing families:
library(brms)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> Loading 'brms' package (version 2.8.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
mix <- mixture(poisson, gaussian)
#> Error: Cannot mix families with real and integer support.
prior <- c(
prior(poisson(0.25), Intercept, dpar = mu1),
prior(gaussian(10, 4), Intercept, dpar = mu2)
)
fit <- brm(bf(result ~ 1),
data = test,
family = mix,
prior = prior,
chains = 1)
#> Error in validate_formula.brmsformula(formula, data = data, family = family, : object 'mix' not found
pp_check(fit)
#> Error in pp_check(fit): object 'fit' not found
Created on 2019-07-15 by the reprex package (v0.3.0)
Thanks.
1 Like
I’m not an expert and fairly new to using library(brms)
myself. Have you tried making a mixture family that’s both Poisson’s? Your example data was generated using 2 Poissons, so since (in this example) you are privy to the underlying process it would make sense to do that.
Thanks so much - yes, I did try that too, but without success:
library(brms)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> Loading 'brms' package (version 2.8.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
mix <- mixture(poisson, poisson)
#> Setting order = 'mu' for mixtures of the same family.
prior <- c(
prior(poisson(0.25), Intercept, dpar = mu1),
prior(poisson(10), Intercept, dpar = mu2)
)
fit <- brm(bf(result ~ 1),
data = test,
family = mix,
prior = prior,
chains = 1)
#> Error in isTRUE(attr(data, "brmsframe")): object 'test' not found
pp_check(fit)
#> Error in pp_check(fit): object 'fit' not found
Created on 2019-07-15 by the reprex package (v0.3.0)
Gives the error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
poisson_lpdf(real, real)
Function poisson_lpdf not found.
error in 'modelb44f45f46c52_fileb44f6d3f6de' at line 29, column 51
-------------------------------------------------
27: vector[N] mu2 = temp_mu2_Intercept + rep_vector(0, N);
28: // priors including all constants
29: target += poisson_lpdf(temp_mu1_Intercept | 0.25);
^
30: target += poisson_lpdf(temp_mu2_Intercept | 10);
-------------------------------------------------
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'fileb44f6d3f6de' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
poisson_lpdf(real, real)
Function poisson_lpdf not found.
error in 'modelb44f71642f97_fileb44f6d3f6de' at line 29, column 51
-------------------------------------------------
27: vector[N] mu2 = temp_mu2_Intercept + rep_vector(0, N);
28: // priors including all constants
29: target += poisson_lpdf(temp_mu1_Intercept | 0.25);
^
30: target += poisson_lpdf(temp_mu2_Intercept | 10);
-------------------------------------------------
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'fileb44f6d3f6de' due to the above error.
This looks strange. Do you have a full reproducible example for me?
Not OP, but I was testing his code and have a reprex demonstrating his exact problem:
library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.9.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
set.seed(1234)
test <- data.frame(result = c(rpois(500, 0.25), rpois(150, 10)))
mix <- mixture(poisson, poisson)
#> Setting order = 'mu' for mixtures of the same family.
prior <- c(
prior(poisson(0.25), class = "Intercept", dpar = "mu1"),
prior(poisson(10), class = "Intercept", dpar = "mu2")
)
fit.brm <- brm(bf(result ~ 1),
data = test,
family = mix,
prior = prior,
chains = 1)
#> SYNTAX ERROR, MESSAGE(S) FROM PARSER:
#>
#> No matches for:
#>
#> poisson_lpdf(real, real)
#>
#> Function poisson_lpdf not found.
#> error in 'model744555c541_file74455ad5a546' at line 29, column 52
#> -------------------------------------------------
#> 27: vector[N] mu2 = temp_mu2_Intercept + rep_vector(0, N);
#> 28: // priors including all constants
#> 29: target += poisson_lpdf(temp_mu1_Intercept | 0.25);
#> ^
#> 30: target += poisson_lpdf(temp_mu2_Intercept | 10);
#> -------------------------------------------------
#>
#> Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
#> failed to parse Stan model 'file74455ad5a546' due to the above error.
#> SYNTAX ERROR, MESSAGE(S) FROM PARSER:
#> No matches for:
#>
#> poisson_lpdf(real, real)
#>
#> Function poisson_lpdf not found.
#> error in 'model744533d0b466_file74455ad5a546' at line 29, column 52
#> -------------------------------------------------
#> 27: vector[N] mu2 = temp_mu2_Intercept + rep_vector(0, N);
#> 28: // priors including all constants
#> 29: target += poisson_lpdf(temp_mu1_Intercept | 0.25);
#> ^
#> 30: target += poisson_lpdf(temp_mu2_Intercept | 10);
#> -------------------------------------------------
#>
#> Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, : failed to parse Stan model 'file74455ad5a546' due to the above error.
Created on 2019-07-15 by the reprex package (v0.3.0)
At the risk of stating the obvious, the Stan code should probably be poisson_lpmf
instead of _lpdf
. Hope this helps!
Ah, now that I look at this code, indeed, why did you specify a poison prior for the intercepts? This can’t possibly work as parameters in stan are continuous while the poisson is a discrete distribution.
1 Like
Okay - that makes sense. I guess I am still struggling to conceptualise how to specify priors to reflect my understanding of the data here. I think my difficultly is in ensuring that the first distribution doesn’t fall below zero, which wouldn’t be possible in the data:
library(brms)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> Loading 'brms' package (version 2.8.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
test <- data.frame(result = c(rpois(500, 0.25), rpois(150, 10)))
mix <- mixture(gaussian, gaussian)
#> Setting order = 'mu' for mixtures of the same family.
prior <- c(
prior(normal(0, 2), Intercept, dpar = mu1),
prior(normal(0, 2), Intercept, dpar = mu2)
)
fit <- brm(bf(result ~ 1),
data = test,
family = mix,
prior = prior,
chains = 1)
#> Compiling the C++ model
#> Start sampling
#>
#> SAMPLING FOR MODEL '7b517954ea2a96396e16f5dec78c77c2' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.000286 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.86 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 1.04822 seconds (Warm-up)
#> Chain 1: 1.2112 seconds (Sampling)
#> Chain 1: 2.25942 seconds (Total)
#> Chain 1:
summary(fit)
#> Family: mixture(gaussian, gaussian)
#> Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; theta1 = identity; theta2 = identity
#> Formula: result ~ 1
#> Data: test (Number of observations: 650)
#> Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup samples = 1000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> mu1_Intercept 0.20 0.02 0.17 0.24 1479 1.00
#> mu2_Intercept 9.04 0.30 8.43 9.62 1197 1.00
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> sigma1 0.40 0.01 0.38 0.43 1241 1.00
#> sigma2 3.56 0.22 3.17 4.02 958 1.00
#> theta1 0.74 0.02 0.71 0.78 1201 1.00
#> theta2 0.26 0.02 0.22 0.29 1201 1.00
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
#> is a crude measure of effective sample size, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(fit)
#> Using 10 posterior samples for ppc type 'dens_overlay' by default.
Created on 2019-07-16 by the reprex package (v0.3.0)
I am struggling to work out how to specify this prior, and grateful for any suggestions.
Thanks
What you are looking for is not a prior but a likelihood (specified via the family
argument). In this case, a mixture of two poisson distributions. Use mixture(poisson, poisson)
instead of mixture(gaussian, gaussian)
.
Thanks - I have tried that, but get an error message:
Error: Family 'mixture(poisson, poisson)' requires response greater than or equal to 0.
But didn’t you create data according to a poisson distribution or is your actual data structured differently?
You are correct. In trying to get working, I had changed my test data to see if that works. With this specified as:
library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> Loading 'brms' package (version 2.8.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
set.seed(1234)
test <- data.frame(result = c(rpois(500, 0.25), rpois(150, 10)))
test %>%
ggplot() +
geom_bar(aes(x=result))
mix <- mixture(poisson, poisson)
#> Setting order = 'mu' for mixtures of the same family.
prior <- c(
prior(cauchy(0,5), Intercept, dpar = mu1),
prior(normal(0,2), Intercept, dpar = mu2)
)
fit <- brm(bf(result ~ 1),
data = test,
family = mix,
prior = prior,
chains = 1)
#> Compiling the C++ model
#> Start sampling
#>
#> SAMPLING FOR MODEL '46b36408f139ed6444538aceb75a3d47' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.000296 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.96 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
#> Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
#> Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
#> Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
#> Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
#> Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
#> Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
#> Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
#> Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
#> Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
#> Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
#> Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 4.23044 seconds (Warm-up)
#> Chain 1: 3.74466 seconds (Sampling)
#> Chain 1: 7.9751 seconds (Total)
#> Chain 1:
summary(fit)
#> Family: mixture(poisson, poisson)
#> Links: mu1 = log; mu2 = log; theta1 = identity; theta2 = identity
#> Formula: result ~ 1
#> Data: test (Number of observations: 650)
#> Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup samples = 1000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> mu1_Intercept -1.47 0.10 -1.66 -1.29 286 1.00
#> mu2_Intercept 2.26 0.03 2.21 2.31 1060 1.00
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> theta1 0.77 0.02 0.73 0.80 441 1.00
#> theta2 0.23 0.02 0.20 0.27 441 1.00
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
#> is a crude measure of effective sample size, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
pp_check(fit)
#> Using 10 posterior samples for ppc type 'dens_overlay' by default.
Created on 2019-07-16 by the reprex package (v0.3.0)
This now works. Thanks so much for your help - has really helped my understanding here. Much appreciated.