Beta regression - Complimentary responses


Two responses (percentages) A and B should make a sum of 100%. But they don’t as there is always a missing part M, like M = 1-Y1-Y2. Although Y1 & Y2 are measured M is not so I’m thinking that I should treat it as parameter. Both Y1 & Y2 have the same predictors and I would like to get the predictors effect on Y1, Y2 and M.

Beta regression seems to be appropriate and I have the below code.
Essentially is the code found here just replicated for two responses Y1 and Y2, and the parameter M.

I have M in the transformed parameters block where I say that M = 1-Y1-Y2

The model run quickly and smoothly but the M_res have an n_eff = 2 and all the parameters I have declared have the same estimates e.g. beta_prior[1] = Y1_beta[1] = Y2_beta[1] = M_beta[1]

data {
  int<lower=1> N;                      // sample size
  int<lower=1> K;                      // K predictors
  vector<lower=0,upper=1>[N] Y1_res;    // responses
  vector<lower=0,upper=1>[N] Y2_res;
  matrix[N,K] X;                       // predictor matrix

parameters {
  vector[K] beta_prior;                     // reg coefficients
  vector<lower=0>[K] phi_prior;             // dispersion parameter

transformed parameters{
  vector[K] Y1_beta = beta_prior * 1;                    
  vector[K] Y1_betY1_phi= phi_prior * 1;  

  vector[K] Y2_beta = beta_prior * 1;                    
  vector[K] Y2_betY1_phi= phi_prior * 1;  

  vector[K] M_beta = beta_prior * 1;                    
  vector[K] M_betY1_phi = phi_prior * 1;  

  // Leftover amount is a parameter C = 1 - A - B   
  vector<lower=0,upper=1>[N] M_res = 1 - Y1_res - Y2_res;  

model {
  vector[N] Y1_LP;                        // linear predictor
  vector[N] Y1_LP_phi;
  vector[N] Y1_mu;                        // transformed linear predictor
  vector[N] Y1_phi;   
  vector[N] Y1_A;                         // parameter for beta distn
  vector[N] Y1_B;                         // parameter for beta distn

  vector[N] Y2_A;                        
  vector[N] Y2_B;      
  vector[N] Y2_LP;                        
  vector[N] Y2_LP_phi;
  vector[N] Y2_mu;                        
  vector[N] Y2_phi; 
  vector[N] M_A;                         
  vector[N] M_B;      
  vector[N] M_LP;                        
  vector[N] M_LP_phi;
  vector[N] M_mu;                        
  vector[N] M_phi; 
  // A part
  Y1_LP = X * Y1_beta;
  Y1_LP_phi = X * Y1_betY1_phi;
  for (i in 1:N) { 
    Y1_mu[i] = inY1_logit(Y1_LP[i]);   
  for (i in 1:N) { 
    Y1_phi[i] = exp(Y1_LP_phi[i]);   
  Y1_A = Y1_mu .* Y1_phi;
  Y1_B = (1.0 - Y1_mu) .* Y1_phi;

  // B part
  Y2_LP = X * Y2_beta;
  Y2_LP_phi = X * Y2_betY1_phi;
  for (i in 1:N) { 
    Y2_mu[i] = inY1_logit(Y2_LP[i]);   
  for (i in 1:N) { 
    Y2_phi[i] = exp(Y2_LP_phi[i]);   
  Y2_A = Y2_mu .* Y2_phi;
  Y2_B = (1.0 - Y2_mu) .* Y2_phi;

  // Missing C
  M_LP = X * M_beta;
  M_LP_phi = X * M_betY1_phi;
  for (i in 1:N) { 
    M_mu[i] = inY1_logit(M_LP[i]);   
  for (i in 1:N) { 
    M_phi[i] = exp(M_LP_phi[i]);   
  M_A = M_mu .* M_phi;
  M_B = (1.0 - M_mu) .* M_phi;

  // priors
  beta_prior ~ normal(0, 1);   
  phi_prior ~ normal(0, 1);
  // likelihood
  Y1_res ~ beta(Y1_A, Y1_B);
  Y2_res ~ beta(Y2_A, Y2_B);
  M_res ~ beta(M_A, M_B);

I think a causal diagram should be like the following (X are predictors in randomized experiments)
X → Y1 → M
X → Y2 → M
So, M seems to be a collider but I hope that since I treat it as parameter I can get away of this (or might not).
Maybe a better DAG is one with the measurement error like that , and in this case it should be
M_real = 1 - Y1_real - Y2_real

Working with normal distribution seems straightforward but with Beta I have no clue how this could work out.

Any ideas of how to treat the response M as parameter and estimate the predictors effect on Y1, Y2 and M it would be great (or any other feedback).


I have not spent much time on your Stan model, but if Y1 is the complement of Y2 (save for the missing part), it seems like you would want one beta distribution to handle both Y1 and Y2, vs a separate beta for Y1 and for Y2.

I also wonder whether you should move to the Dirichlet distribution. The Dirichlet would allow you to jointly model Y1, Y2, and M while also handling the restriction that they sum to 1.


Note that if y_1 and y_2 are measured, and m = 1 - y_1 - y_2, then m is also measured, and it takes a fixed value. Treat m as transformed data, and not a transformed parameter. As @edm says, you definitely should consider a modeling framework that directly reflects the constraint that y_1 + y_2 + m = 1

1 Like

I haven’t thought of Dirichlet but indeed I should look at it.
About having the same beta model for both Y1 & Y2, do you have on your mind and example you can share?

You are right m is measured but in an indirect way. The error of m is artificial because (I guess) is the sum of the error from two different processes. From the process measuring the Y1 and another measuring Y2.
On my head goes liked that. If there is one extreme Y1 value then the m would be extreme too and I would like the Y2 to inform the model about it (from the path Y1 = 1 - m - Y2)

I don’t have a specific example. My thought was that, if M = M1 + M2, and

true Y1 = observed Y1 + M1
true Y2 = observed Y2 + M2
true Y2 = 1 - true Y1

Then you have some model that splits M into M1 and M2, and put a beta distribution on true Y1. A beta distribution on true Y2 would then be redundant.

1 Like

Does M represent some distinct third quantity (e.g. preference for a third candidate) or is it the combination of the mis-measurements of the first two quantities?

If it is a distinct third quantity, then @edm’s suggestion to use a Dirichlet distribution (where M is completely determined and pre-calculated, as pointed out by @jsocolar) seems like the best way forward.

If M is the combination of errors, then I could imagine allocating some proportion of M to each of the two observed quantities and using a Beta distribution to predict one of them (edit: as @edm just suggested as well). This is certainly possible in theory, though I’m never seen this exact idea implemented. It might also require a Jacobian adjustment. But conceptually, something like the bare-bones, no-prior model below gets at this idea.

    int<lower = 0> N;
    int<lower = 0> P;
    vector<lower = 0, upper = 1>[N] Y1;
    vector<lower = 0, upper = 1>[N] Y2;
    matrix[N,P] X;
transformed data{
    vector[N] M = 1.0 - (Y1 + Y2);
    real alpha_y1;
    vector[P] beta_y1;
    // Using one scale parameter for the beta distribution
    real<lower = 0> sigma_y1;

    // Proportion of error assigned to
    // first measurement
    vector<lower = 0, upper = 1>[N] rho;
transformed parameters{
    // Assign some proportion of error
    // to first measurement
    vector[N] Y1_c = Y1 + (M .* rho);
    vector[N] mu_y1 = inv_logit(alpha_y1 + X * beta_y1);
    // Equivalent to Y1_c ~ beta(mu_y1 * sigma_y1, (1 - mu_y1) * sigma_y1);
    Y1_c ~ beta_proportion(mu_y1, sigma_y1);
1 Like

Indeed, it’s type of error.
Seems good idea to have a parameter for the proportion of M, I’ll try that for sure!
The beta_proportion seems to be a simpler way to build the model compared to beta but I have to understand their differences. Also, I have no clue about Jacobian adjustment :\ so I hope is not the case :D