Fit a beta distribution that captures correlation in the fitted parameters?

I’m trying to figure out how to fit a beta distribution to data, and to capture the parameters’ correlation.

(My ultimate goal is successive updating of the posteriors with new data. And at each update I want to be able to specify a correlated prior based on the correlation of the previous iteration’s posterior.)

A) What I have been able to do (with two models):

  1. Generate fake data from a beta distribution
  2. Fit a model to that data
  3. Extract (a, b) samples from the fitted model
  4. Fit a second model to the (a, b) samples that captures the samples’ correlation

B) What I’d like to be able to do (with a single model):

  1. Generate fake data from a beta distribution
  2. Fit a (single) model to that data that captures the parameters’ correlation and that allows me to specify a prior on that correlation.

In particular, my two-model (A) code (indirectly) captures the parameters’ strong positive correlation of 0.83 (in the second model within (A)):

              mean          sd
aMu      4.3672460 0.014254404
bMu      2.2089633 0.006574057
Rho[1,2] 0.8313109 0.015791045

… whereas my single-model (B) code doesn’t:

               mean        sd
a         4.3207410 0.2802322
b         2.1848314 0.1363592
Rho[1,2] -0.1489613 0.5907234

Presumably (?) this is because in (B) I’m fitting a single (a, b) pair to all the data (albeit with uncertainty about a and b), for which parameter correlation doesn’t even make sense.

I’m not sure if my single-model “I’d like to” goal (B) makes sense, so I’m hoping for some feedback/guidance.

Here’s my (A) code:

library(rstan)
options(mc.cores = parallel::detectCores())

## A.1) Generate fake data from a beta distribution 

N = 400
beta_a = 4
beta_b = 2
betaSamps = rbeta( n=N, shape1=beta_a, shape2=beta_b )
dataListA1 = list(samp=betaSamps, N=N)

mA1 = "
data{ 
    int N; 
    vector[N] samp;
} 
parameters{ 
    real<lower = 0> a; 
    real<lower = 0> b; 
} 
model{ 
    samp ~ beta(a, b); 
    a ~ normal(2, 2); 
    b ~ normal(2, 2); 
} 
"

## A.2) Fit a model to that data

mA1Fit = stan( model_code=mA1, data=dataListA1 )

## A.3) Extract (a, b) samples from the fitted model

modelSamps = extract(mA1Fit)

## A.4) Fit a second model to the (a, b) samples that captures the samples' 
#       correlation

# No need for the full 4000 samples, so use just N
dataListA2 = list( aSamp=modelSamps$a[1:N], bSamp=modelSamps$b[1:N], 
                   N=N )

mA2 = "
data{
    int N;
    vector[N] bSamp;
    vector[N] aSamp;
}
parameters{
    real aMu;
    real bMu;
    real<lower=0> bSigma;
    real<lower=0> aSigma;
    corr_matrix[2] Rho;
}
model{
    bMu ~ normal( 3 , 2 );
    aMu ~ normal( 4 , 2 );
    bSigma ~ normal( 1, 1);
    aSigma ~ normal( 1, 1);
    Rho ~ lkj_corr( 1 );
    {
    vector[2] AB[N];
    vector[2] MU;
    MU = [ aMu , bMu ]';
    vector[2] Sigma;
    Sigma = [ aSigma , bSigma ]';
    for ( j in 1:N ) AB[j] = [ aSamp[j] , bSamp[j] ]';
    AB ~ multi_normal( MU , quad_form(Rho , diag_matrix(Sigma) ) );
    }
}
"

fitMA2 = stan( model_code=mA2, data=dataListA2 )
fitMA2Summary = summary( fitMA2, pars = c("aMu", "bMu", "Rho[1,2]") )$summary
fitMA2Summary = fitMA2Summary[, c("mean", "sd")]
print(fitMA2Summary)

And here’s my unsuccessful single-model (B) code:

## B.1) Generate fake data from a beta distribution
#       (Use data from step A1)

## B.2) Fit a model to that data that captures the parameters’ correlation and 
#       that allows me to specify a prior on that correlation. 

mB = "
data{
  int N;
  vector[N] samp;
}
parameters{
  real<lower = 0> a;
  real<lower = 0> b;
  corr_matrix[2] Rho;
}
model{
  samp ~ beta(a, b);
  Rho ~ lkj_corr( 1 );
  {
    vector[2] AB;
    vector[2] MU;
    MU = [ 4 , 3 ]';
    vector[2] Sigma;
    Sigma = [ 1 , 2 ]';
    AB = [ a , b ]';
    AB ~ multi_normal( MU , quad_form(Rho , diag_matrix(Sigma) ) );
  }
}
"

fitMB = stan( model_code=mB, data=dataListA1 )
fitMBSummary = summary( fitMB, pars=c("a", "b", "Rho[1,2]") )$summary
fitMBSummary = fitMBSummary[,c("mean", "sd")]
print(fitMBSummary)

You can avoid the need for capturing the correlation by reparameterising the beta distribution and instead fitting the sum a+b and the mean a/(a+b). This orthogonolizes the parameters and removes correlation. Within the model, you can easily derive a and b from the sum and mean in the transformed parametera block, or as local parameters in the model block. This approach should also be more efficient for Stan to sample the parameters.

2 Likes

Yay! Many, many thanks, @LucC! Very much appreciated. I’m back on my path now. With your advice I found this page on reparameterizations in the User’s Guide, and wrote the following code:

library(rstan)
options(mc.cores = parallel::detectCores())

## Generate fake data from a beta distribution 

set.seed(100)

N = 400
beta_a = 4
beta_b = 2
betaSamps = rbeta( n=N, shape1=beta_a, shape2=beta_b )
dataListA1 = list(samp=betaSamps, N=N)

## Fit a model to the data

m1 = "
data { 
    int N; 
    vector[N] samp;
} 
parameters {
    real<lower=0> phi;
    real<lower=0> lambda;
}
transformed parameters { 
    real<lower=0> alpha = lambda * phi; 
    real<lower=0> beta = lambda * (1 - phi); 
} 
model {
    phi ~ beta( 1 , 1 ); // mean
    lambda ~ pareto( 0.1, 1.5 ); // concentration
    for (n in 1:N)
      samp[n] ~ beta( alpha , beta );
}
"

fitM1 = stan( model_code=m1, data=dataListA1 )
fitM1Summary = summary( fitM1, pars=c("phi", "lambda", "alpha", "beta"))$summary
fitM1Summary = fitM1Summary[, c("mean", "sd")]
print(fitM1Summary) 

Which yields the two equivalent fitted parameterizations:

            mean          sd
phi    0.6692769 0.008622793
lambda 6.4071006 0.424559329
alpha  4.2888425 0.300086972
beta   2.1182580 0.140278902

My ultimate goal is Bayesian updating, so I used the same reparameterization trick to express phi’s beta distribution in terms of its mean and concentration. The code below gets the same result as the code above, but will allow me to use a previous iteration’s fitted phi mean (e.g., 0.669) as the next iteration’s prior on phi’s mean, via the settable (fixed) parameter phiP. And with an “appropriate” lambdaP, which I need to experiment with. (Phew; that’s complicated! I think I’ve got it right…)

set.seed(100)

## Specify a prior on phi, the beta distribution's mean (in terms
#  of phi's mean and concentration). Note: phiP = 0.5 and lambdaP = 2 
#  is equivalent to alphaP = 1 and betaP = 1, a flat prior.
dataListA1$phiP = .5
dataListA1$lambdaP = 2

m2 = "
data {
    int N;
    vector[N] samp;
    real<lower=0> phiP; // mean for phi
    real<lower=0> lambdaP; // concentration for phi
}
parameters {
    real<lower=0> phi;
    real<lower=0> lambda;
}
transformed parameters {
    // Reparameterize the beta distribution on the data
    real<lower=0> alpha = lambda * phi;
    real<lower=0> beta = lambda * (1 - phi);
    // Reparameterize the beta distribution on phi, the *mean*
    // of the data's beta distribution.
    real<lower=0> alphaP = lambdaP * phiP;
    real<lower=0> betaP = lambdaP * (1 - phiP);
}
model {
    phi ~ beta( alphaP , betaP ); // mean
    lambda ~ pareto( 0.1, 1.5 ); // concentration
    for (n in 1:N)
      samp[n] ~ beta( alpha , beta );
}
"

fitM2 = stan( model_code=m2, data=dataListA1 )
fitM2Summary = summary( fitM2, pars=c("phi", "lambda", "alpha", "beta"))$summary
fitM2Summary = fitM2Summary[, c("mean", "sd")]
print(fitM2Summary) 
2 Likes

For your Bayesian updating, I don’t think your model specification is correct. I would expect that you would want to specify four additional inputs in the data block: two parameters for each of the prior distributions of phi and lambda, each derived from the marginal posterior distributions of your previous calibration.

As phi is between zero and one, a beta prior would probably be appropriate. However, the two shape parameters of that distribution should bebbased on two additional input items in the data block. You would have to derive these inputs from the posterior marginal distribution of phi from your previous model calibration. Your code is now just plugging in point estimates of the posterior for a and b of the previous calibration. I’m contrast, you need to use a representation of the uncertainty of the posterior of phi from you previous calibration. Fit a beta distribution to the posterior samples of the first calibration and you its estimated parameters as prior parameters in your Bayesian update.

For updating lambda , I’m not sure what an appropriate choice of prior distribution might be. If your marginal posterior for lambda from a previous calibration can be well approximated by a pareto distribution (though I doubt that), use that. The parameters of that prior distribution would have to be informed by another two additional input elements in the data block, in a similar way as I described for phi above.

1 Like

Many thanks, again, @LucC! Your comments have been super-helpful to me in sorting through all this. I was aware that I was indeed supplying point estimates from the previous posterior, which of course neglects that posterior’s uncertainty. But I hadn’t begun at all to think through how I might start accounting for it. So your comments are huge help!

As an aside, part of my meta-goal is not “straight” Bayesian updating, but rather to have some “forgetting” of prior information via artificial floors on the posterior parameters’ sd’s, as they become the new priors. That is, I’m not necessarily intending for my successive priors to fully capture the previous posteriors’ precise distributions (which will tend to become tighter as more and more data is incorporated via updates). That said, I very much want to understand how any potential adjustments in the priors impact the successive posteriors.

Anyway, if I understand your suggestion, you’re proposing that I move phiP and lambdaP into my model and make them both functions of two new fixed parameters each (e.g., a mean and a shape for phiP and a mean and a shape for lambdaP), which I’d learn from the previous posterior. And which I might learn by fitting a distribution to samples from the posterior, vs. necessarily (somehow) relying more directly on, e.g. these values:

            mean          sd
phi    0.6692769 0.008622793
lambda 6.4071006 0.424559329

Do I have that right? Or even in the ballpark? Again, many thanks! (Lots to think about!)

Yeah I think you got the gist of it. I’m not sure what you mean with moving phiP and lambdaP into the model, so just for clarity:

Where you first model (the very first analysis) has the following lines:

model {
    ...
    phi ~ beta( 1 , 1 ); // mean
    lambda ~ pareto( 0.1, 1.5 ); // concentration
    ...
}    

For a subsequent Bayesian updating these would have to be:

model {
    ...
    phi ~ beta( phi_a, phi_b ); // mean
    lambda ~ some_appropriate_distribution( lambda_location, lambda_scale ); // concentration
    ...
}    

and the data block would have to include these four new hyperparameters phi_a, phi_b, and the two hyperparameters for the prior for lambda. As you say, the values of these four parameters would have to be derived from the posterior samples of phi and lambda of the previous analysis, with or without “forgetting” some of the information. For the beta prior for phi, “forgetting” information can be achieved by reducing the sum phi_a + phi_b (i.e., reducing the two parameters by equal proportions). For lambda, the approach would depend on the choice of prior distribution.

Edit: the choice of how you derive the values of the four hyperparameters from the previous posterior is entirely up to you. You could fit a distribution to the samples of phi and lambda each. Or you could calculate the parameter values based on the mean and sd of the previous posterior (probably quicker but also cruder). E.g., for a beta distribution, if you know the mean and variance (or sd) you can easily derive the two shape parameters by juggling around the equations for mean and variance.

1 Like

@LucC: got it! I am now so far ahead of where I was before your first reply; thank you!