Regression (beta) from posteriors rather than single points - with Stan or rstanarm/brms

I would like to perform a beta regression, but the only inputs I have are posterior estimates (instead of single points).

I know that ideally you would fit a full Bayesian model, but I would like to understand the best practices when this is the only possible starting point. For example, instead of having a single number (e.g., 0.022), I have access to the uncertainty around it.



Therefore, the first question is: is it possible to use rstanarm (or brms) beta regression with input densities instead of input points?


The second question is about building a custom model (sorry for the parameter names, too many “betas” around…).

An approach could be to build the model that:

  1. parametrizes all those input densities

data{
  vector<lower=0, upper=1>[N] beta[S];
  ...
}
model{
  for(s in 1:S)
    beta[s] ~ beta(beta_b_inv[s] * beta_a[s], beta_b_inv[s] * (1 - beta_a[s]));  // mean-precision parametrization
  ...
}
  2. draws a new set of parameters back from those inferred distributions
...
for(s in 1:S) beta_hat[s] ~ beta(beta_b_inv[s] * beta_a[s], beta_b_inv[s] * (1 - beta_a[s]));
...
  3. sets the likelihood on those new parameters
transformed parameters{
 vector[S] beta_mu = inv_logit(X * alpha);

...
}
model{
...
// Likelihood
for(s in 1:S) beta_hat[s] ~ beta(beta_mu[s] * phi, (1 - beta_mu[s]) * phi);
...
}

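Putting the three steps together, the full model would look something like this (a sketch; the parameter declarations, priors, and data sizes are my assumptions, since they were elided above):

```stan
data {
  int<lower=1> N;                              // posterior draws per input density
  int<lower=1> S;                              // number of observations
  int<lower=1> K;                              // number of predictors
  matrix[S, K] X;                              // design matrix
  vector<lower=0, upper=1>[N] beta[S];         // input posterior draws
}
parameters {
  vector<lower=0, upper=1>[S] beta_a;          // means of the input densities
  vector<lower=0>[S] beta_b_inv;               // precisions of the input densities
  vector<lower=0, upper=1>[S] beta_hat;        // redrawn parameters
  vector[K] alpha;                             // regression coefficients
  real<lower=0> phi;                           // regression precision
}
transformed parameters {
  vector[S] beta_mu = inv_logit(X * alpha);
}
model {
  for (s in 1:S) {
    // 1. parametrize each input density (mean-precision parametrization)
    beta[s] ~ beta(beta_b_inv[s] * beta_a[s], beta_b_inv[s] * (1 - beta_a[s]));
    // 2. redraw one value per input density
    beta_hat[s] ~ beta(beta_b_inv[s] * beta_a[s], beta_b_inv[s] * (1 - beta_a[s]));
    // 3. beta regression likelihood on the redrawn values
    beta_hat[s] ~ beta(beta_mu[s] * phi, (1 - beta_mu[s]) * phi);
  }
  // priors (assumed; not in the original post)
  alpha ~ normal(0, 1);
  phi ~ exponential(1);
  beta_b_inv ~ exponential(0.01);
}
```

Note that each `beta_hat[s]` appears on the left-hand side of two sampling statements, so its implied (unnormalized) density is the product of the two beta densities.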
However, this leads to no mixing.


Another possible approach is to run the regression directly on the (inferred) means beta_a of the input densities, instead of on the whole densities via beta_hat, i.e.:


model{
...
// Likelihood
for(s in 1:S) beta_a[s] ~ beta(beta_mu[s] * phi, (1 - beta_mu[s]) * phi);
}

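For completeness, the model block of this second approach combines step 1 with a regression on the inferred means (a sketch; the declarations are the same as in the first approach, minus `beta_hat`):

```stan
model {
  for (s in 1:S) {
    // step 1: learn the mean beta_a[s] and precision beta_b_inv[s]
    // of each input density from its posterior draws
    beta[s] ~ beta(beta_b_inv[s] * beta_a[s], beta_b_inv[s] * (1 - beta_a[s]));
    // regression likelihood placed directly on the inferred means
    beta_a[s] ~ beta(beta_mu[s] * phi, (1 - beta_mu[s]) * phi);
  }
}
```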
In principle, I would think the first approach is the more correct one because it takes the whole uncertainty into account, while the second is based only on the uncertainty around the means of the input densities.

This produces stable estimates, but I am still wondering why the first strategy does not work, and whether the second reduces the amount of uncertainty considered.

Thanks

Pretty much everyone who’s reported here on beta regression has found it difficult to stabilize. The difference between your first and second approaches, as far as I can tell, is that there’s a separate component for variance in the first. This is problematic, as the fit gets better as it heads off to infinity. So the only thing that will control the individual variance terms is a strong hierarchical prior.
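A strong hierarchical prior on the precision terms might look like this (a sketch with assumed hyperprior families, added purely by way of illustration):

```stan
parameters {
  vector<lower=0>[S] beta_b_inv;   // per-observation precisions
  real mu_log_prec;                // shared location on the log scale
  real<lower=0> sigma_log_prec;    // shared scale
}
model {
  // partial pooling: individual precisions shrink toward a common value,
  // which keeps any single one from drifting off to infinity
  beta_b_inv ~ lognormal(mu_log_prec, sigma_log_prec);
  mu_log_prec ~ normal(0, 2);
  sigma_log_prec ~ exponential(1);
}
```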

I’m not sure what the usual approach is in beta regression or if it’s built into something like brms or rstanarm.


Thanks @Bob_Carpenter.

Help me understand one simple principle.

I understand that. However, if I base each variance parameter on a posterior (that I plug in as data) defined by 2000 draws, shouldn’t that variance be well enough defined? As the fit takes those variances off to infinity, it contradicts the input distributions (2000 draws each) more and more, and in my mind this should be enough to counteract the effect.

I can see this clearly being true if I directly input just the two parameters of each beta posterior.

What am I missing here?

Each of those scale/variance terms only gets information from the expressions in which it appears.