Trying to use partial pooling in a Beta model

I am building a non-linear model with the beta_proportion distribution and am having some difficulty specifying the partial-pooling/hierarchical structure.

The unpooled model fits okay, but it lacks some flexibility and fails to capture some of the observations.

I have specified the unpooled model as:

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // response variable
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  real<lower=0> a; 
  real<lower=0> kappa;  // precision 
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = (sum(row(X1, n) .* row(X2, n)) * a) / X3[n];
  }
  // priors
  a ~ normal(5, 3);
  if (run_estimation == 1) { // evaluate likelihood
    prop ~ beta_proportion(mu, kappa);
  }
}
generated quantities {
  vector[N] prop_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = (sum(row(X1, n) .* row(X2, n)) * a) / X3[n];
  }
  for (n in 1:N) {
    // clamp out-of-range means so prior predictive draws don't error
    if (mu_pred[n] > 1) prop_pred[n] = 1;
    else if (mu_pred[n] < 0) prop_pred[n] = 0;
    else prop_pred[n] = beta_proportion_rng(mu_pred[n], kappa);
  }
}

A prior predictive check on this model looks reasonable, with most of the mass below 0.10.

I tried to create a partial-pooling model as follows:

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // Observed outcome
  int<lower=1> N_Grp;  // Number of groups
  int<lower=1, upper=N_Grp> Grp[N]; // group index
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  vector<lower=0, upper=1>[N_Grp] alpha; // vector of group intercepts
  real<lower=1> a;
  real<lower=0> kappa_Grp;  // precision of the group-intercept distribution
  real<lower=0> kappa;  // population-level precision
  real<lower=0, upper=1> mu_Grp; // group-level location (must lie in (0, 1) for beta_proportion)
}
model {
  // conditional location parameter
  vector[N] mu;
  // linear combination
  for (n in 1:N) {
    mu[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  // priors
  kappa_Grp ~ gamma(5, 5);
  kappa ~ gamma(5, 5);
  a ~ normal(5, 3);
  // hyper-priors
  mu_Grp ~ normal(0, 1);
  // level-2 likelihood
  alpha ~ beta_proportion(mu_Grp, kappa_Grp);
  if (run_estimation == 1) { // evaluate likelihood
    // level-1 likelihood
    prop ~ beta_proportion(mu, kappa);
  }
}
generated quantities {
  vector[N] prop_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  for (n in 1:N) {
    // clamp out-of-range means so prior predictive draws don't error
    if (mu_pred[n] > 1) prop_pred[n] = 1;
    else if (mu_pred[n] < 0) prop_pred[n] = 0;
    else prop_pred[n] = beta_proportion_rng(mu_pred[n], kappa);
  }
}

But a prior predictive check on this model produces an undesired U shape, with most of the mass on values near zero and values near one.

Have I specified the structure incorrectly? Or is there something I can do to regain the desired expectation of the predictions?
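
(For reference, if I understand the parameterization correctly,

beta_proportion(y | mu, kappa) = beta(y | mu * kappa, (1 - mu) * kappa)

so whenever both shape parameters fall below 1, which happens easily with kappa ~ gamma(5, 5), the density is U-shaped; the clamping of out-of-range mu_pred to 0 and 1 would also pile mass at the endpoints.)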


Another attempt that maybe improves things a little, although it runs much slower.

data {
  int<lower=1> N;  // total number of observations
  vector[N] prop;  // Observed outcome
  int<lower=1> N_Grp;  // Number of groups
  int<lower=1, upper=N_Grp> Grp[N]; // Group index
  int<lower=1> N_Times; // Number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
  int<lower = 0, upper = 1> run_estimation; // evaluate the likelihood?
  }
parameters {
  real alpha;  // population-level intercept
  real<lower=1> a; 
  real<lower=0> kappa;  // Population-level precision 
  real<lower=0> tau_Grp;  // sd of the group effects
  vector[N_Grp] mu_Grp; // group-level offsets
}
model {
  // priors
  alpha ~ normal(0, 0.1);
  a ~ normal(5, 3);
  kappa ~ gamma(5, 5);
  tau_Grp ~ normal(0, 0.1);
  mu_Grp ~ normal(0, tau_Grp);
  if (run_estimation == 1) { // evaluate likelihood
    for (n in 1:N) {
      prop[n] ~ beta_proportion(alpha + mu_Grp[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a, kappa);
    }
  }
}
generated quantities {
  real prop_pred[N];
  for (n in 1:N) {
    prop_pred[n] = beta_proportion_rng(alpha + mu_Grp[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a, kappa);
  }
}

Does this seem like a more reasonable approach to varying intercepts?


Hi,
the standard (which does not necessarily mean better) way to code this model would be to keep all your predictors, including the varying intercepts, unconstrained (i.e., without bounds), and only after you add up all those unconstrained terms, transform the sum via inv_logit into a value that lies between 0 and 1 and can be used as the mean of the beta_proportion distribution. This obviously changes the interpretation of the model coefficients. Would such a transformation make sense for your use case?
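
A minimal sketch of what I mean, keeping your data block as-is and reusing your names (an illustration, not tested):

parameters {
  vector[N_Grp] alpha;  // unconstrained varying intercepts
  real a;               // unconstrained slope
  real<lower=0> kappa;  // precision
}
model {
  vector[N] mu;
  for (n in 1:N) {
    // build the linear predictor on the unconstrained scale ...
    mu[n] = alpha[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * a;
  }
  // ... then map it into (0, 1) with inv_logit
  prop ~ beta_proportion(inv_logit(mu), kappa);
}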

Best of luck with your model!

Thank you, @martinmodrak!

I made some changes and moved to a non-centered parameterization, which helped a lot.
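
Schematically, the non-centered trick (as I understand it) is to sample standardized group effects and scale them by the group standard deviation, rather than sampling the group intercepts directly:

parameters {
  real<lower=0> sd_grp;  // group standard deviation
  vector[N_Grp] mu_grp;  // standardized group effects
}
transformed parameters {
  // implies grp_int ~ normal(0, sd_grp), but is easier for the sampler
  vector[N_Grp] grp_int = sd_grp * mu_grp;
}
model {
  mu_grp ~ std_normal();
}

The full model: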

data {
  int<lower=1> N;  // total number of observations
  vector[N] y;  // response variable
  int<lower=1> N_Grp;  // number of groups
  int<lower=1> Grp[N];  // group indicator
  int<lower=1> N_Times; // number of times in time series
  matrix[N, N_Times] X1;
  matrix[N, N_Times] X2;
  vector[N] X3;
}
parameters {
  real beta;  // population-level effect
  real<lower=0> kappa;  // precision parameter
  real<lower=0> sd_grp;  // group standard deviation
  vector[N_Grp] mu_grp;  // standardized group effects
}
transformed parameters {
  vector[N_Grp] grp_int;  // group intercepts (non-centered)
  grp_int = sd_grp * mu_grp;
}
model {
  vector[N] mu;
  for (n in 1:N) {
    mu[n] = grp_int[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * beta;
  }
  // likelihood
  y ~ beta(inv_logit(mu) * kappa, (1 - inv_logit(mu)) * kappa);
  // priors
  beta ~ normal(10, 10);
  kappa ~ gamma(0.01, 0.01);
  sd_grp ~ student_t(3, 0, 2.5);
  mu_grp ~ std_normal();
}
generated quantities {
  vector[N] log_lik;
  vector[N] y_pred;
  vector[N] mu_pred;
  for (n in 1:N) {
    mu_pred[n] = grp_int[Grp[n]] + (sum(row(X1, n) .* row(X2, n)) / X3[n]) * beta;
  }
  for (n in 1:N) {
    y_pred[n] = beta_rng(inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);
    // log-likelihood of the observed data (not the predictions), for LOO/WAIC
    log_lik[n] = beta_lpdf(y[n] | inv_logit(mu_pred[n]) * kappa, (1 - inv_logit(mu_pred[n])) * kappa);
  }
}
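
One more note: if I'm reading the parameterization right, the hand-written beta likelihood above is the same thing as

y ~ beta_proportion(inv_logit(mu), kappa);

which may be a little easier to read.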
