Error with custom family

I want to fit a model using a custom beta family function. The function called bbeta = beta(a,b)*m, where beta(a,b) is the beta distribution with parameters an and b, and m is a scaling parameter. Here is the custom_family() code:

Define the custom beta family function

bbeta ← custom_family(
“bbeta”,
dpars = c(“a”, “phi”, “mu”), # Changed “m” to “mu”
links = c(“logit”, “logit”, “identity”),
lb = c(0, 0, 0),
ub = c(Inf, Inf, Inf),
)

stan_density_vec ← "
real bbeta_lpmf(real a, real phi, real mu) {
return dbeta(a, phi) * mu;
}
"

I’m training the model on a dataset d with outcome variable PU and independent variable time. PU has an upper bound that varies exponentially with time, and that is a relationship that the model needs to estimate (through parameter m). Thus, m~intercept +exp^(time*c)

Here is the code to fit the model:

Define the model formula

model_formula ← PU ~ time

Define the model

model ← brm(
data = d,
family = bbeta,
formula = model_formula,
cores = 4,
chains = 4
)

When I run it, I get the following error message:
Compiling Stan program…
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Semantic error in ‘string’, line 37, column 16 to column 51:

A returning function was expected but an undeclared identifier ‘bbeta_lpdf’ was supplied.

I’m running R 4.3.2 on MacOs Ventura 13.6.1. Stan_2.32.3 and latest brms package.

Any help would be appreciated.

Thank you,
Charles

Hi, welcome to the Stan discourse! Take a look at the brms custom family vignette here, with particular attention to the parts dealing with constructing and passing the stanvars argument. Does that clear things up?

https://cran.r-project.org/web/packages/brms/vignettes/brms_customfamilies.html

A bit, thanks. I’m not entirely sure what this does but I’m working on it! I guess I’m not sure where my formulas should be and how to link the variables and parameters of my model to the parameters in these functions. Here’s a summary. I’m regressing variables PU and time:

PU_t~bbeta(a,b,mu_t))

where bbeta is a custom scaled version of the Beta distribution:

bbeta~Beta(a,b)*mu_t

mu is dependent on a variable in my dataset, “time”:
mu_t~intercept + exp(c*time)

Thus, the parameters that need to be estimated by the model are a, b (i.e. beta’s parameters), c (a parameter of the exponential function), and the intercept of the mu formula.

Where in the code would the mu equation go? Inside the stanvars argument?

In addition, I’m unclear how when I define the priors for all these parameters I can index them so that brm and Rstan know which prior is for which variable. Is it using the name of the parameters, or order in which they are inputed?

Define the custom beta family function

bbeta ← custom_family(
“bbeta”,
dpars = c(“a”, “b”, “mu”),
links = c(“logit”, “logit”, “identity”),
lb = c(0, 0, 0),
ub = c(Inf, Inf, Inf),
)

stan_funs ← "
real bbeta_lpdf(real a,real b, real mu) {
return beta_lpdf(y|a, b) * mu;
}
"
stanvars ← stanvar(scode = stan_funs, block = “functions”)

Define the model formula

model_formula ← PU ~ exp(1+time)

Define the model

model ← brm(
data = d,
family = bbeta,
formula = model_formula,
cores = 4,
chains = 4,
stanvars = stanvars
)

Hear my code so far. I want mu to be the dependent variable in my model. When I run this, I get the following error:

Compiling Stan program…
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
0

Semantic error in ‘string’, line 5, column 19 to column 20:

Identifier ‘y’ not in scope.

It looks like you didn’t call y in your custom function, bbeta_lpdf(real a,real b, real mu), but tried to return a value, beta_lpdf(y|a, b) * mu, that requires it for its calculations. You need your custom function to both call in the data, y, and the parameters, a, b, and mu. Something like: bbeta_lpdf(vector y, real a, real b, real mu) depending on what container y has (vector was chosen for demonstration purposes only)

1 Like

Thank you Corey, very helpful. I’m slowing making progress in understanding what I’m doing here. Maybe starting with a custom family is not the easiest way to learn BRMS :)

Am I reading these functions correctly?

real bbeta_lpdf(real y, real a,real b, real mu) {
This defines a new _lpdf function with parameters y, a, b and mu

return beta_lpdf(y|a, b) * mu;
What the function does. In this case, take beta_lpdf at value y, conditioned on parameters a and b, and this multiply by mu. However, I now realize that this line is wrong or at least incomplete, as it no longer integrates to 1.

I tried the following line instead:
return beta_lpdf( (y/mu) | a, beta);

but I’m getting Exception: Exception: beta_lpdf: Random variable is nan, but must be in the interval [0, 1] when the model runs.

As a note, my scientific interest lies in the variable mu, a scaling factor the beta distribution that allows observations to be distributed with a fixed upper bound but also be greater than 1. In particular, I’m interested in how mu varies with my predictor variable time. Is adding this scaling parameter to the beta distribution that I’m trying to achieve here.

That is saying that y/mu has no numerical value. Are you sure the y being supplied to the function is not already nan?

1 Like

That was the problem. Some of my y had values too high and when exponentiated returned Inf values.

I fixed the lower bound and upper bound in the custom function part.

However, now I’m getting messages such as

Exception: beta_lpdf: Random variable is -4.35264, but must be in the interval [0, 1] (in 'anon_model

It makes me think that I may not understand the syntax of the _lpdf functions, and that the error may come from the y/mu part. My understanding is that y is the value of the outcome the model is trying to predict (PU in my case). And mu is a parameter that needs to be estimated.

Could the problem be in that the first term of beta_lpdf includes a parameter?

Beta distributions have support (a domain) over (0,1), so a beta pdf (and, by extension, cdf) can only accept values from 0 to 1 as the random variable. This indicates that for at least some of your data y/mu ∉ (0,1). Also, when you supply y/mu as the input to the pdf, you are actually saying that the beta distribution (when the correct shape parameters are sampled) should look like the distribution of y/mu. That implies that y/mu must be values from 0 to 1.

1 Like

Yes. I tried to scale y to y/mu prior to feeding it to the beta_lpdf function, along with
stan_funs ← "
real bbeta_lpdf(real y, real a,real beta, real mu) {
real y_scaled = y / mu;
if (y_scaled <= 0 || y_scaled >= 1) {
return(0.0);
}
return beta_lpdf(y_scaled | a, beta);
}
"

The model runs. But now I’m running into issues with the post-processing.

expose_functions(model, vectorize = TRUE)

log_lik_bbeta ← function(i, prep) {
mu ← brms::get_dpar(prep, “mu”, i = i)
beta() ← brms::get_dpar(prep, “beta”, i = i)
a ← brms::get_dpar(prep, “a”, i = i)
y ← prep$data$Y[i]
bbeta_lpdf(y, a, beta, mu)
}

When I test this using loo(model), Error in beta() ← brms::get_dpar(prep, “beta”, i = i) :
invalid (NULL) left side of assignment

Thank you for your help. With your comments I’m getting a grasp of how brms works for custom functions.

From a modeling standpoint, I’m not sure it make sense to select a beta distribution for your y/mu if any of the values are not supported. It will let the program run, but it will create computational problems and, more importantly, it means that the set of y/mu does not actually have the form of a beta distribution. Could you talk a bit more about what you are trying to model?

Also, I’m pretty sure it will result in competition between mu and the shape parameters such that the posterior is very degenerate and will never converge. Before doing leave one out, did you look at whether or not your parameters are actually converging, did you look at the bivariate marginal densities between the parameter samples, and did you use similated data to see if you are actually recovering the parameters from the generative model?

1 Like

I can confirm that the model doesn’t run well and chains aren’t converging.

What I want to model is a distribution that has an upper limit, with the upper limit increasing over time. These are linear measurements of phenotypes that are great than 1. So my idea was to model this as a scaled beta distribution beta(a,b)*m, where m is a scaling parameter >1. The reasoning being that this is a distribution with an upper limit. It wasn’t clear to me how modeling this using some truncated distribution would work since I would want the point of truncation as a parameter in the model. Any pointers of ideas on modeling this is welcome, I’m not wedded to this approach.

An alternative approach I explored was to write a custom version for bbeta=beta(a,b)m function that returns the pdf of the function.

fbbeta_lpdf ← function(x, mu, phi, m) {

Define unscaled PDF function

unscaled_pdf ← function(x) {
x^(phi * mu - 1) * (1 - x)^((1 - mu) * phi - 1) / beta(phi * mu, (1 - mu) * phi)
}

Calculate integral of unscaled PDF

integral_unscaled_pdf ← integrate(unscaled_pdf, lower = 0, upper = 1)$value

Calculate normalization factor

normalization_factor ← 1 / (m * integral_unscaled_pdf)

Apply scaling and normalization

scaled_pdf ← m * normalization_factor * unscaled_pdf(x)

Return the scaled PDF

return(scaled_pdf)
}

and then

stan_funs ← "
real bbeta_lpdf(real y, real a,real beta, real mu) {
return fbbeta_lpdf(y| a, beta, mu);
}
"
stanvars ← stanvar(scode = stan_funs, block = “functions”)

But I get this error: A returning function was expected but an undeclared identifier ‘fbbeta_lpdf’ was supplied.

It seems that there may be a way to declare functions in Stan that I’m not capturing here.