I am having trouble with the custom family for beta binomial

I wanted to try a beta-binomial model to see if it helps with the high dispersion in my data. Until now I had just been using the beta distribution in brms. I set it up just like the vignette and the model was able to run:

beta_binomial2 <- custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "int", vars = "vint1[n]"
)


stan_funs <- "
  real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
    return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
  }
  int beta_binomial2_rng(real mu, real phi, int T) {
    return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
  }
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

brmbetabin = brm(
  incurrent | vint(consumed) ~ Region + food + genus +
    Region:food + Region:genus + genus:food + (1 | sample),
  family = beta_binomial2, data = REdata, stanvars = stanvars
)

However, I was getting errors:

SAMPLING FOR MODEL '56e4d552c3161029f952436f8350b609' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.

I would like to just use the percentage data, so I removed vint1 from the code. I changed it a couple of times and kept getting different errors depending on how it was written:

beta_binomial2 <- custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "real"
)


stan_funs <- "
  real beta_binomial2_lpmf(int y, real mu, real phi, real T) {
    return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
  }
  int beta_binomial2_rng(real mu, real phi, int T) {
    return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
  }
"

stanvars <- stanvar(scode = stan_funs, block = "functions")

brmbetabin = brm(
  Redecimal ~ Region + food + genus +
    Region:food + Region:genus + genus:food + (1 | sample),
  family = beta_binomial2, data = REdata, stanvars = stanvars
)

I get different errors for the stan_funs code, about either the lpmf or the rng:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  beta_binomial_lpmf(int, real, real, real)

Available argument signatures for beta_binomial_lpmf:

  beta_binomial_lpmf(int, int, real, real)
  beta_binomial_lpmf(int, int, real, real[ ])
  beta_binomial_lpmf(int, int, real, vector)
  beta_binomial_lpmf(int, int, real, row_vector)
  beta_binomial_lpmf(int, int, real[ ], real)
  beta_binomial_lpmf(int, int, real[ ], real[ ])
  beta_binomial_lpmf(int, int, real[ ], vector)

or

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 

  beta_binomial2_lpmf(int, real, real)

Available argument signatures for beta_binomial2_lpmf:

  beta_binomial2_lpmf(int, real, real, int)

 error in 'model59b436405ba8_file59b4ba57408' at line 65, column 54
  -------------------------------------------------
    63:   if (!prior_only) {
    64:     for (n in 1:N) {
    65:       target += beta_binomial2_lpmf(Y[n] | mu[n], phi);
                                                             ^
    66:     }
  -------------------------------------------------

I tried switching between continuous and discrete data in different ways but have been unable to get it to run. I am new to coding and running models, so I would appreciate any help with the rejected initial values error or with how to modify the custom family properly to get the model to run. I am treating my data as continuous because they come from large count values. I have counts of consumed food particles out of total incurrent food particles, and I converted them into a percentage of retention efficiency.

  • Operating System: Windows 10
  • brms Version: 2.12.0

thanks

Hi, sorry for letting your question sit for that long.

My first guess for the rejected initial value errors is that incurrent contains some negative values or values larger than consumed. In that case the log probability would be correctly calculated as log(0). Could you check this?
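
In case it helps, here is a minimal sketch of how that check could look in base R. The data frame `REdata_toy` and its values are made up for illustration (the real `REdata` is not shown here); only the column names `incurrent` and `consumed` come from the model above.

```r
# Hypothetical stand-in for REdata; the values are invented for illustration.
REdata_toy <- data.frame(
  incurrent = c(120L, 300L, 85L, 410L),
  consumed  = c( 90L, 310L, 40L, 200L)
)

# In `incurrent | vint(consumed)`, incurrent is the response (successes)
# and consumed is passed as the number of trials.
y      <- REdata_toy$incurrent
trials <- REdata_toy$consumed

n_negative <- sum(y < 0)                   # responses below zero
n_too_big  <- sum(y > trials)              # responses larger than the trials
bad_rows   <- which(y < 0 | y > trials)    # row indices that violate either rule

print(n_negative)
print(n_too_big)
print(REdata_toy[bad_rows, ])
```

If `bad_rows` is non-empty for the real data, those rows alone are enough to make the log probability evaluate to log(0) regardless of the initial values.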

Best of luck with your model!

There are no negative values for incurrent, but yes, consumed has values lower than incurrent. The retention efficiency percentage I am looking at comes from (consumed/incurrent). Should I switch the values in the model so that it starts with
brm(consumed | vint(incurrent) ~…

Thanks

I am not sure I follow you completely. Just to be on the same page: in the binomial and beta-binomial models you specify a known number of trials (coin flips, …) and the random variable is the number of successes. So if you have N trials, you can’t, by definition, have more than N successes. What would be the “trials” and “successes” in the case of your dataset?
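
To illustrate the point about trials and successes, here is a rough base R sketch of the beta-binomial log-pmf under the same mean/precision parameterisation as `beta_binomial2` (alpha = mu * phi, beta = (1 - mu) * phi). The function name `beta_binomial2_logpmf` and the numbers are just for illustration; this is not the Stan implementation, only the same density written by hand, with the out-of-support case handled the way I understand Stan to handle it.

```r
# Hand-written beta-binomial log-pmf (illustrative, not Stan's own code).
beta_binomial2_logpmf <- function(y, mu, phi, trials) {
  # Outside the support 0..trials the probability is zero, so log(0) = -Inf.
  if (y < 0 || y > trials) return(-Inf)
  a <- mu * phi
  b <- (1 - mu) * phi
  lchoose(trials, y) + lbeta(y + a, trials - y + b) - lbeta(a, b)
}

# 40 successes out of 85 trials: a finite log probability.
lp_ok <- beta_binomial2_logpmf(y = 40, mu = 0.5, phi = 10, trials = 85)

# Response and trials swapped (85 "successes" out of 40 trials): -Inf.
lp_bad <- beta_binomial2_logpmf(y = 85, mu = 0.5, phi = 10, trials = 40)

# Sanity check: the pmf sums to one over the support 0..85.
total <- sum(exp(sapply(0:85, beta_binomial2_logpmf,
                        mu = 0.5, phi = 10, trials = 85)))

print(lp_ok)
print(lp_bad)
print(total)
```

A single observation with more successes than trials gives -Inf no matter what mu and phi are, which is why the sampler cannot find a valid starting point.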

I guess the number of food particles in the incurrent water is the number of trials, and the successes would be the total number of consumed food particles.

OK, then brm(consumed | vint(incurrent) ~ ... would probably make sense. Best of luck!

Alright, thank you