Challenge with Formula Syntax

I am attempting to apply a Bayesian mixed model to estimate the impact of temperature on a home’s energy consumption.

library(brms)

priors <- c(set_prior("lognormal(0.01494647, 0.00937701)", class = "b", coef = "Temp"))

# Random intercept and random Temp slope for each month
ins_month <- brm(use ~ Temp + (1 + Temp | Month.f),
                 data = dpa.ins.home, family = gaussian(link = "log"),
                 warmup = 100, iter = 200, chains = 2, inits = "random",
                 control = list(adapt_delta = .95, max_treedepth = 12),
                 prior = priors, cores = 2, sample_prior = TRUE)

While I am comfortable with formulating mixed-effects regressions, I am struggling with the details of the brms syntax, specifically for priors and families. Given the information below, am I using priors and families correctly?

energy_use has a beta distribution

> descdist(ACH_test$use)
summary statistics
min:  0.1473333   max:  10.4126 
median:  0.6733333 
mean:  1.349991 
estimated sd:  1.475063 
estimated skewness:  1.857041 
estimated kurtosis:  6.252937 

The prior for Temp is closest to a lognormal distribution

> descdist(temp$T.Impact)
summary statistics
min:  0   max:  0.1714414 
median:  0.01265037 
mean:  0.01494647 
estimated sd:  0.009377701 
estimated skewness:  2.071458 
estimated kurtosis:  12.71367 
  • Operating System: Windows 10
  • brms Version:

The formula syntax isn’t needed for priors or families. How you defined the prior there looks correct. You can check what the model used by using prior_summary() on the model.
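One detail worth checking alongside prior_summary(): Stan's lognormal(mu, sigma) takes the mean and standard deviation of the log of the variable, not of the variable itself. If the two numbers in the prior are the raw mean and sd of T.Impact (an assumption based on the descdist output in the question), a moment-matching conversion along these lines may be closer to what was intended. The helper name lognormal_params is made up for illustration.

```r
# Hypothetical helper (not part of brms): convert a mean and sd on the
# natural scale into the meanlog/sdlog parameters that lognormal() expects.
lognormal_params <- function(m, s) {
  sdlog <- sqrt(log(1 + (s / m)^2))
  meanlog <- log(m) - sdlog^2 / 2
  c(meanlog = meanlog, sdlog = sdlog)
}

# Mean and sd of temp$T.Impact as reported by descdist() in the question.
p <- lognormal_params(0.01494647, 0.009377701)

# Build the prior string that would be passed to set_prior().
prior_string <- sprintf("lognormal(%.4f, %.4f)", p[["meanlog"]], p[["sdlog"]])
print(prior_string)

# Only runs if brms is installed; no model is compiled here.
if (requireNamespace("brms", quietly = TRUE)) {
  priors <- brms::set_prior(prior_string, class = "b", coef = "Temp")
  print(priors)
  # After fitting, brms::prior_summary(ins_month) lists the priors actually used.
}
```

Whether a strictly positive prior on the Temp coefficient is sensible is a separate modelling decision; the sketch only addresses the parameterization.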

You say that use has a beta distribution, but the model code family = gaussian(link="log") uses a gaussian family with a log link. To use the beta family, you should use Beta(). See ?brms::brmsfamily for a description of the supported families.

When using family = Beta() I get an error:
Error: Family ‘beta’ requires response smaller than 1.
Is there a way to mitigate this? I guess I could make the use MWh instead of kWh but then wouldn’t I have issues with the coefficients being really small?
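On the unit question, a small sketch may help. With a log link (or a model for log(use)), dividing the response by a constant only shifts the intercept by the log of that constant, so the Temp coefficient should not shrink. The data below are simulated stand-ins, not the home's readings, and ordinary lm() on log(use) is used only to keep the example quick. Rescaling also does not by itself settle whether a Beta family is appropriate.

```r
set.seed(42)
n <- 200

# Simulated stand-in data: positive, right-skewed use that rises with Temp.
Temp <- runif(n, 0, 30)
use_kwh <- exp(-0.5 + 0.015 * Temp + rnorm(n, sd = 0.3))
use_mwh <- use_kwh / 1000  # same quantity expressed in a larger unit

# Fit the same log-scale model to both versions of the response.
fit_kwh <- lm(log(use_kwh) ~ Temp)
fit_mwh <- lm(log(use_mwh) ~ Temp)

print(coef(fit_kwh))
print(coef(fit_mwh))

# The slopes agree; only the intercept moves, by log(1000).
intercept_shift <- coef(fit_kwh)[["(Intercept)"]] - coef(fit_mwh)[["(Intercept)"]]
print(c(shift = intercept_shift, log_1000 = log(1000)))
```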

There’s a zero_one_inflated_beta family:

This is still for values between 0 and 1. The zero_one_inflated_beta family is just for when there are also zeros and ones in the data. I have values from .001 to 9.6 to consider.

Then you don’t have a Beta distribution. Is it possible that you meant Gamma?
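As a rough illustration of that suggestion, the sketch below fits a Gamma model with a log link to simulated positive, right-skewed data. The data, the coefficient values, and the use of glm() as a quick non-Bayesian stand-in are all assumptions for illustration; in brms the analogous change would be family = Gamma(link = "log") in the brm() call.

```r
set.seed(7)
n <- 300

# Simulated stand-in for the home's data (column names follow the question).
Month.f <- factor(sample(month.abb, n, replace = TRUE), levels = month.abb)
Temp <- runif(n, 0, 30)
mu <- exp(-0.6 + 0.02 * Temp)  # mean on the natural scale, via a log link
shape <- 2
use <- rgamma(n, shape = shape, rate = shape / mu)  # Gamma draws with mean mu
sim <- data.frame(use, Temp, Month.f)

# Quick frequentist check that a Gamma family with a log link fits this shape.
fit_glm <- glm(use ~ Temp, data = sim, family = Gamma(link = "log"))
print(coef(fit_glm))

# Only runs if brms is installed; get_prior() lists the parameters that can
# take priors under this family without compiling a model.
if (requireNamespace("brms", quietly = TRUE)) {
  print(brms::get_prior(use ~ Temp + (1 + Temp | Month.f),
                        data = sim, family = Gamma(link = "log")))
}
```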


I meant Beta because of the following plot. Am I missing something in this result?

This plot only suggests that the skewness and kurtosis of your data are compatible with a Beta distribution. But a Beta distribution is clearly not appropriate for your data, as the support of the Beta distribution is (0,1). Instead of (or in addition to) using such heuristic checks, you may also try to think about the generative process that you believe gave rise to the observed data. As a first step, it is always helpful to look at the histogram of your responses.
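A minimal sketch of that first step, using simulated values as a stand-in for the energy-use column (the real data are not shown in the thread): check the range against the (0, 1) support of the Beta distribution, then look at the histogram. The optional comparison of Gamma and lognormal fits assumes the fitdistrplus package, which also provides descdist(), is installed.

```r
set.seed(1)

# Simulated stand-in for the energy-use column: positive and right-skewed.
use <- rlnorm(500, meanlog = -0.2, sdlog = 0.9)

# A Beta response must lie strictly inside (0, 1).
print(range(use))
beta_support_ok <- all(use > 0 & use < 1)
print(beta_support_ok)

# Look at the shape of the data directly.
hist(use, breaks = 40, main = "Simulated energy use", xlab = "use")

# Optional: compare two candidates for positive, skewed data by AIC.
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  fit_gamma <- fitdistrplus::fitdist(use, "gamma")
  fit_lnorm <- fitdistrplus::fitdist(use, "lnorm")
  print(c(gamma = fit_gamma$aic, lnorm = fit_lnorm$aic))
}
```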

Oh wow, thank you for that clarification regarding the plot! The observed data is the energy use of a home. When you say the generative process, do you mean what creates the energy use of a home? If so, how does that translate into understanding the distribution of the data?

For example, you may think that energy use depends on many factors with unknown mean and variance, and that all factors together multiplicatively produce the outcome. In such a case, you should perhaps be inclined towards choosing a lognormal distribution.
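To make that intuition concrete, here is a small simulation sketch. It assumes, purely for illustration, 30 independent positive factors that each scale the outcome by a random amount between 0.8 and 1.25; the numbers are invented. The product is right-skewed, while its logarithm is a sum of many small terms and so looks roughly normal, which is the defining property of a lognormal variable.

```r
set.seed(123)
n_obs <- 5000
n_factors <- 30

# Each row holds the multiplicative factors behind one observation.
factors <- matrix(runif(n_obs * n_factors, 0.8, 1.25), nrow = n_obs)
outcome <- apply(factors, 1, prod)

# Simple moment-based sample skewness.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Raw outcome: clearly right-skewed. Log outcome: close to symmetric.
print(skewness(outcome))
print(skewness(log(outcome)))

# In brms, a response like this could be modelled with family = lognormal().
```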
