Trying to use partial pooling in a Beta model

I am building a non-linear model with the beta_proportion distribution and am having some difficulty specifying the partial-pooling/hierarchical structure.

The unpooled model fits okay, but it lacks some flexibility and fails to capture some of the observations.

I have specified the unpooled model as:

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // response variable
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  real<lower=0> a; 
  real<lower=0> kappa;  // precision 
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = (sum(row(X1, n) .* row(X2, n)) * a) / X3[n];
  }
  // priors
  a ~ normal(5, 3);
  if (run_estimation == 1) { // evaluate likelihood
    prop ~ beta_proportion(mu, kappa);
  }
}
generated quantities {
  vector[N] prop_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = (sum(row(X1, n) .* row(X2, n)) * a) / X3[n];
  }
  for (n in 1:N) {
    // clamp out-of-range means so prior predictive draws don't error
    if (mu_pred[n] > 1) prop_pred[n] = 1;
    else if (mu_pred[n] < 0) prop_pred[n] = 0;
    else prop_pred[n] = beta_proportion_rng(mu_pred[n], kappa);
  }
}

A prior predictive check on this model looks reasonable, with most of the mass below 0.10.

I tried to create a partial-pooling model as follows:

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // Observed outcome
  int<lower=1> N_Grp;  // Number of groups
  int<lower=1, upper=N_Grp> Grp[N]; // group index
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  vector<lower=0, upper=1>[N_Grp] alpha; // vector of group intercepts
  real<lower=1> a;
  real<lower=0> kappa_Grp;  // precision of the group-intercept distribution
  real<lower=0> kappa;  // population-level precision
  real<lower=0, upper=1> mu_Grp; // group-level location (must lie in (0, 1) for beta_proportion)
}
model {
  // conditional location parameter
  vector[N] mu;
  // linear combination
  for (n in 1:N) {
    mu[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  // priors
  kappa_Grp ~ gamma(5, 5);
  kappa ~ gamma(5, 5);
  a ~ normal(5, 3);
  // hyper-priors
  mu_Grp ~ normal(0, 1);
  // level-2 likelihood
  alpha ~ beta_proportion(mu_Grp, kappa_Grp);
  if (run_estimation == 1) { // evaluate likelihood
    // level-1 likelihood
    prop ~ beta_proportion(mu, kappa);
  }
}
generated quantities {
  vector[N] prop_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  for (n in 1:N) {
    // clamp out-of-range means so prior predictive draws don't error
    if (mu_pred[n] > 1) prop_pred[n] = 1;
    else if (mu_pred[n] < 0) prop_pred[n] = 0;
    else prop_pred[n] = beta_proportion_rng(mu_pred[n], kappa);
  }
}

But a prior predictive check on this model produces an undesired U shape, with most of the mass on values near zero and values near one.

Have I specified the structure incorrectly? Or is there something I can do to regain the desired expectation of the predictions?
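
(For reference, if I understand the parameterization correctly,

beta_proportion(y | mu, kappa) = beta(y | mu * kappa, (1 - mu) * kappa)

so whenever both shape parameters fall below 1, which happens easily with kappa ~ gamma(5, 5), the density is U-shaped; the clamping of out-of-range mu_pred to 0 and 1 would also pile mass at the endpoints.)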


Another attempt that maybe improves things a little, although it runs much slower.

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // Observed outcome
  int<lower=1> N_Grp;  // Number of groups
  int<lower=1, upper=N_Grp> Grp[N]; // Group index
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  real alpha;  // population-level intercept
  real<lower=1> a; 
  real<lower=0> kappa;  // Population-level precision 
  real<lower=0> tau_Grp;  // sd of the group effects
  vector[N_Grp] mu_Grp; // group-level offsets
}
model {
  // priors
  alpha ~ normal(0, 0.1);
  a ~ normal(5, 3);
  kappa ~ gamma(5, 5);
  tau_Grp ~ normal(0, 0.1);
  mu_Grp ~ normal(0, tau_Grp);
  if (run_estimation == 1) { // evaluate likelihood
    for (n in 1:N) {
      prop[n] ~ beta_proportion(alpha + mu_Grp[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a, kappa);
    }
  }
}
generated quantities {
  real prop_pred[N];
  for (n in 1:N) {
    prop_pred[n] = beta_proportion_rng(alpha + mu_Grp[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a, kappa);
  }
}

Does this seem like a more reasonable approach to varying intercepts?


Hi,
the standard (which does not necessarily mean better) way to code this model would be to keep all your predictors, including the varying intercepts, unconstrained (i.e., without bounds), and only after you add up all those unconstrained terms, transform the sum via inv_logit into a value that lies between 0 and 1 and can be used as the mean of the beta_proportion distribution. This obviously changes the interpretation of the model coefficients. Would such a transformation make sense for your use case?
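
A minimal sketch of what I mean, keeping your data block as-is and reusing your names (an illustration, not tested):

parameters {
  vector[N_Grp] alpha;  // unconstrained varying intercepts
  real a;               // unconstrained slope
  real<lower=0> kappa;  // precision
}
model {
  vector[N] mu;
  for (n in 1:N) {
    // build the linear predictor on the unconstrained scale ...
    mu[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  // ... then map it into (0, 1) with inv_logit
  prop ~ beta_proportion(inv_logit(mu), kappa);
}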

Best of luck with your model!

Thank you, @martinmodrak!

I made some changes and moved to a non-centered parameterization, which helped a lot.
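
Schematically, the non-centered trick (as I understand it) is to sample standardized group effects and scale them by the group standard deviation, rather than sampling the group intercepts directly:

parameters {
  real<lower=0> sd_grp;  // group standard deviation
  vector[N_Grp] mu_grp;  // standardized group effects
}
transformed parameters {
  // implies grp_int ~ normal(0, sd_grp), but is easier for the sampler
  vector[N_Grp] grp_int = sd_grp * mu_grp;
}
model {
  mu_grp ~ std_normal();
}

The full model: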

data {
  int<lower=1> N;  // total number of observations
  vector[N] y;  // response variable
  int<lower=1> N_Grp;  // number of groups
  int<lower=1> Grp[N];  // group indicator
  int<lower=1> N_Times; // number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
}
parameters {
  real beta;  // population-level effect
  real<lower=0> kappa;  // precision parameter
  real<lower=0> sd_grp;  // group standard deviation
  vector[N_Grp] mu_grp;  // standardized group effects
}
transformed parameters {
  vector[N_Grp] grp_int;  // group intercepts (non-centered)
  grp_int = sd_grp * mu_grp;
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = grp_int[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * beta;
  }
  // likelihood
  y ~ beta(inv_logit(mu) * kappa, (1 - inv_logit(mu)) * kappa);
  // priors
  beta ~ normal(10, 10);
  kappa ~ gamma(0.01, 0.01);
  sd_grp ~ student_t(3, 0, 2.5);
  mu_grp ~ std_normal();
}
generated quantities {
  vector[N] log_lik;
  vector[N] y_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = grp_int[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * beta;
  }
  for (n in 1:N) {
    y_pred[n] = beta_rng(inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);
    // log-likelihood of the observed data (not the predictions), for LOO/WAIC
    log_lik[n] = beta_lpdf(y[n] | inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);
  }
}
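
One more note: if I'm reading the parameterization right, the hand-written beta likelihood above is the same thing as

y ~ beta_proportion(inv_logit(mu), kappa);

which may be a little easier to read.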
