Regression of proportions

Hello Community

I have a two-layer hierarchical regression model (x * beta). With the second layer I am trying to put a prior on the proportions (beta in [0,1]) estimated for many “replicates”:

for(s in 1:S) beta[s] ~ dirichlet( to_vector( beta_hat[s] ) );

Now, since I have a covariate (X) for each replicate, I am trying to regress the predicted probabilities:

beta_hat = exp(X * alpha);
for(s in 1:S) beta[s] ~ dirichlet( to_vector(beta_hat[s]) );

My questions are:

  • is this a sensible thing to do?
  • should I use a log-link regression, since the hyperparameters of a Dirichlet must be > 0?

Thanks a lot

Below is the full model.

data{
	int G;                   // Number of marker genes
	int P;                   // Number of cell types
	int S;                   // Number of mix samples
	int<lower=1> R;          // Number of covariates (e.g., treatments)
	vector<lower=0>[G] y[S]; // Mix samples matrix
	matrix<lower=0>[G,P] x;  // Signature matrix (marker genes x cell types)
	matrix[S,R] X;           // Covariates for the hierarchical regression
}
transformed data{
	vector[G] y_log[S];  // Log-transformed mix samples

	for(s in 1:S) y_log[s] = log(y[s] + 1);

}
parameters {
	simplex[P] beta[S];  // Cell type proportions for each sample
	real<lower=0> sigma; // Error scale
	matrix[R,P] alpha;   // Regression coefficients for the proportions

}
transformed parameters{
	vector[G] y_hat[S];
	for(s in 1:S) y_hat[s] = x * beta[s];
}
model {

	matrix[S,P] beta_hat;

	// Regression
	sigma ~ normal(0, 0.01);

	// Dirichlet prior on proportions
	alpha[1] ~ normal(0, 0.2);
	if(R > 1) to_vector(alpha[2:R]) ~ normal(0, 0.2);

	// Regression
	for(s in 1:S) y_log[s] ~ student_t(8, log(y_hat[s]), sigma);

	// Hypothesis testing
	beta_hat = exp(X * alpha);
	
  for(s in 1:S) beta[s] ~ dirichlet( to_vector(beta_hat[s]) );

}

P.S. I had to simplify the model for posting; I hope I have not left any inconsistencies.

Thanks

I am realizing that the Dirichlet might not be the right choice, as the hyperparameters do not translate into an absolute distribution.

for example

Dirichlet(2,3,4,5)

could have a similar shape to

Dirichlet(1,2,3,4)

Any feedback on whether there is a way to fix this relativity, or on an alternative hyperprior, would be welcome. For example: beta regression, or a GLM with a sigmoid link function.
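To make the relativity concrete: for theta ~ Dirichlet(alpha), the mean is E[theta_i] = alpha_i / sum(alpha), so the mean depends only on the relative sizes of the alphas, while sum(alpha) sets the concentration. For example, Dirichlet(2,4,6,8) has exactly the same mean as Dirichlet(1,2,3,4) but is more concentrated around it.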

[EDIT]

I found this old conversation from Bob useful:

https://groups.google.com/forum/#!topic/stan-users/Naz034bBg8Q

I’d just use:

transformed data {
  vector<lower=0>[K] alpha_phi;
}
parameters {
  simplex[K] theta;      // the proportions being modeled
  simplex[K] phi;        // prior mean
  real<lower=0> kappa;   // prior "count" (concentration)
}
model {
  theta ~ dirichlet(kappa * phi);
  kappa ~ ... // something like pareto or exponential ...
  phi ~ dirichlet(alpha_phi);
}

You might want to look up “beta regression”; it looks like that’s what you’re trying to do. If I recall from the RStanArm discussions, it’s challenging to fit.

Thanks,

I am now using beta regression, inspired by your message:

data {
	int P;                           // Number of proportions (cell types)
	int<lower=1> S;                  // Number of samples
	int R;                           // Number of covariates
	matrix<lower=0,upper=1>[P,S] y;  // Observed proportions
	matrix[S,R] x;                   // Covariates
}
parameters {
	real<lower=0> phi;               // Precision of the beta distribution
	matrix[P,R] alpha;               // Regression coefficients
}
model {
	matrix[P,S] mu;
	mu = inv_logit(alpha * x');      // Mean proportions via a logit link
	to_vector(y) ~ beta(to_vector(mu) * phi, (1 - to_vector(mu)) * phi);
}
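For reference, in this mean/precision parameterization E[y] = mu and Var[y] = mu * (1 - mu) / (phi + 1), so phi acts as a precision: the larger phi, the tighter the distribution around mu.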

Indeed, it fails to converge for components of the multiple regression with little slope.

And it does converge if I force the phi (error) parameter above a threshold, but then all the proportion predictions are kind of squeezed around the regression function.

How many samples do you have (S)?

This GitHub code helped me out when I was trying to get my first beta regression to converge.

Hello,

in my case I have 80 samples discretized into three covariate points:

[figure: observed proportions at the three covariate points]

Now, the posterior predictions of the proportions from the beta regression are quite different:

[figure: posterior predictions of the proportions]

This happens when the proportions of the group predicted by this simplex are actually small, because the simplex is multiplied by the proportion of the whole group. Therefore it can happen that

[ (proportions in simplex PARAMETER) * counts_DATA ] * Proportion_group_DATA
  + [ (proportions in background simplex DATA) * counts_background_DATA ] * (1 - Proportion_group_DATA)

When Proportion_group < 0.01, the information the data provide for the regression decreases (I assume) and the chains stop converging, UNLESS I fix the phi parameter, to 3 for example.

[EDIT]

Yes, now I get it: of course the means (dots in the plot) get squished around the regression line, but they have huge variance around that mean, which we don’t observe in the plot.

The mystery for me is why, for some groups of cell types (the one above is one example), the chains converge, while for others they do not, even with all priors in place.

data{
	int G;                   // Number of marker genes
	int P;                   // Number of cell types
	int S;                   // Number of mix samples
	int<lower=1> R;          // Number of covariates (e.g., treatments)
	matrix[S,R] X;           // Covariates for the hierarchical regression
	matrix<lower=0>[S,G] y;  // Mix samples matrix
	matrix<lower=0>[G,P] x;  // Signature matrix (marker genes x cell types)

	// Background cell types
	vector<lower=0, upper=1>[S] p_ancestor;
	matrix[S,G] y_bg_hat;
}
transformed data{
	matrix<lower=0>[S,G] y_log;  // Log-transformed mix samples
	real<lower=0> phi_adj;       // Offset that keeps the beta-regression precision (phi + phi_adj) from dropping too low and mis-converging

	y_log = log(y+1);
	phi_adj = round(-log10(mean(p_ancestor)));
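	// e.g., mean(p_ancestor) = 0.01 gives phi_adj = 2, flooring the precision (phi + phi_adj) used below at 2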

print(mean(p_ancestor));
print(phi_adj);
}
parameters {
	simplex[P] beta[S];  // Cell type proportions for each sample
	real<lower=0> sigma; // Error scale
	matrix[R,P] alpha;   // Regression coefficients for the proportions
	real<lower=0> phi;
}
transformed parameters{
	matrix[S,P] beta_adj;
	matrix[S,G] y_hat; 

	
	for(s in 1:S) beta_adj[s] = to_row_vector(beta[s]) * p_ancestor[s];
	y_hat = beta_adj * x';
	y_hat = y_hat + y_bg_hat;
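	// Note: y_bg_hat (data) presumably already carries the (1 - p_ancestor) weighting of the background mixture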
	
}
model {

	matrix[S,P] beta_hat;

	// Regression
	sigma ~ normal(0, 0.1);
	
	// Priors for the regression on proportions
	to_vector(alpha) ~ normal(0, 0.1);
	phi ~ normal(0, 1);
	
	// Regression
  to_vector(y_log) ~ student_t(8, log(to_vector(y_hat)), sigma);   
  
  // Hypothesis testing
	beta_hat = X * alpha;
	beta_hat = inv_logit(beta_hat);
	for(s in 1:S) beta[s] ~ beta(beta_hat[s] * (phi + phi_adj), (1 - beta_hat[s]) * (phi + phi_adj));  // alternative: normal(beta_hat[s], phi)

}
generated quantities{
	vector[P] generated_beta[S];
	matrix[S,P] beta_hat;

  // Hypothesis testing
	beta_hat = X * alpha;
	beta_hat = inv_logit(beta_hat);
	for(s in 1:S) for(p in 1:P) generated_beta[s,p] = beta_rng(beta_hat[s,p] *  (phi + phi_adj), (1 - beta_hat[s,p]) *  (phi + phi_adj));
}

As I was thinking, the uncertainties around those mean estimates (relative proportions, i.e., the simplex multiplied by the group proportion, which may be only 1% for example) are huge; that’s why, lacking information, the means collapse around the prior.

The only peculiar thing I see is

beta = theta * 10;               // "matt trick" if desired

which seems a bit strange, because it also multiplies the intercept by 10.

If x ~ normal(0, 1) then 10 * x ~ normal(0, 10). It’s just a way of scaling so that adaptation doesn’t have to work so hard. But you should only do that if you really think x is going to be unit normal, not just to add a broad prior.
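In Stan, that scaling looks something like the following (a minimal sketch with a hypothetical scalar parameter x_raw, not part of the model above):

parameters {
  real x_raw;            // unit-scale parameter; easy for adaptation
}
transformed parameters {
  real x = 10 * x_raw;   // implies x ~ normal(0, 10)
}
model {
  x_raw ~ normal(0, 1);
}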

Regarding convergence, make sure you’re indexing your parameters correctly. As I understand your code, each element of the alpha matrix is a parameter estimate of the relationship between X and the mean of some proportion of interest beta_hat.

However, you declare only a single variance parameter phi. Unless you have some reason to constrain the variance of your proportion estimates across the 15 groups you plot to be the same, I’d expect the number of phis to be the same as the number of alphas, which should be the same as the number of proportions you’re modeling. Again, it seems like you’re modeling 15 proportions in the top two plots (but in the last figure it seems like only 5).

One question I have about this regression model is that the parameter phi seems to be the inverse of the variance of the distribution. Usually we put a prior on the variance to “push” it close to 0; in this case

phi ~ cauchy(0,5);

has the effect of pushing the variance to be high instead of low.

Would it not be better to do the following?

beta ~ beta(beta_hat / phi, (1 - beta_hat) / phi); 
phi ~ cauchy(0,5);

Thanks

Depends on what you want the prior to be. A half-Cauchy isn’t going to push anything to be high, because the slope of the log density is the other way; it will be consistent with large values, though. We have a Pareto(1, 1.5) in the manual, following the fifth chapter of BDA (it takes p(phi) proportional to 1 / phi^2.5).
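For reference, that prior would be written in Stan as something like this (a sketch; note the Pareto support requires phi >= 1, so the declaration needs a matching lower bound):

parameters {
  real<lower=1> phi;     // Pareto(1, 1.5) has support phi >= 1
}
model {
  phi ~ pareto(1, 1.5);  // p(phi) proportional to 1 / phi^2.5
}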

I would not mind exploring a regression using the Dirichlet, which has fewer degrees of freedom and may be more stable for little data and/or faster (from 80 seconds without the beta regression I go to 600 seconds).

However, I don’t understand how this relates to my model. Could you please be a bit more explicit about where the regression is, whether I could use an inv_logit link function, and about the parameters in your draft model?
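Something like the following is what I would imagine (a sketch, assuming the regression enters through the Dirichlet mean via a softmax link, with a hypothetical concentration parameter kappa declared as real<lower=0>):

for(s in 1:S)
  beta[s] ~ dirichlet(kappa * softmax(to_vector(X[s] * alpha)));
kappa ~ exponential(1);  // or a Pareto, as suggested above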

Thanks

Yes, I am now using phi of dimension P. Well spotted; now I have better inference, in terms of the uncertainty around the predicted proportions

(The generated quantities)

I have to say that I reach convergence only if I use a normal prior on phi. Cauchy or other long-tailed priors lead to phi spiking to ~1e-10 and not converging.

data{
	int G;                   // Number of marker genes
	int P;                   // Number of cell types
	int S;                   // Number of mix samples
	int<lower=1> R;          // Number of covariates (e.g., treatments)
	matrix[S,R] X;           // Covariates for the hierarchical regression
	matrix<lower=0>[S,G] y;  // Mix samples matrix
	matrix<lower=0>[G,P] x;  // Signature matrix (marker genes x cell types)

	// Background cell types
	vector<lower=0, upper=1>[S] p_ancestor;
	matrix[S,G] y_bg_hat;
}
transformed data{
	matrix<lower=0>[S,G] y_log;  // Log-transformed mix samples
	real<lower=0> phi_adj;       // Offset for the precision prior (computed but unused in this version)

	y_log = log(y+1);
	phi_adj = round(-log10(mean(p_ancestor)));

}
parameters {
	simplex[P] beta[S];  // Cell type proportions for each sample
	real<lower=0> sigma; // Error scale
	matrix[R,P] alpha;   // Regression coefficients for the proportions
	vector<lower=0.1>[P] phi;
}
transformed parameters{
	matrix[S,P] beta_adj;
	matrix[S,G] y_hat; 
	matrix[R,P] alpha_10;  // alpha scaled by 10 (the scaling discussed above)

	
	for(s in 1:S) beta_adj[s] = to_row_vector(beta[s]) * p_ancestor[s];
	y_hat = beta_adj * x';
	y_hat = y_hat + y_bg_hat;
	alpha_10 = alpha * 10;   
}
model {

	matrix[S,P] beta_hat;

	// Regression
	sigma ~ normal(0, 0.1);
		
	// Priors for the regression on proportions
	to_vector(alpha[1]) ~ normal(0, 2);              // intercepts
	if(R > 1) to_vector(alpha[2:R]) ~ normal(0, 1);  // slopes (rows 2:R of alpha)
	phi ~ normal(0.1, 5);
	
	// Regression
  to_vector(y_log) ~ student_t(8, log(to_vector(y_hat)), sigma);   
  
  // Hypothesis testing
	beta_hat = X * alpha_10;
	for(s in 1:S) for(p in 1:P) beta_hat[s,p] = inv_logit(beta_hat[s,p]);
	
	for(s in 1:S) beta[s] ~ beta(beta_hat[s] .* to_row_vector(phi), (1 - beta_hat[s]) .* to_row_vector(phi));  // alternative: normal(beta_hat[s], phi)

}

But even with the normal prior on phi, convergence partially fails:

Warning messages:
1: There were 160 divergent transitions after warmup. Increasing adapt_delta above 0.98 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: There were 1040 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
3: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low

Hello,

An update about the model.

I get some divergent transitions if a proportion is generally low (i.e., a signature is not present at all, so the regression has no information to rely on); see alpha[1,1] = intercept, alpha[2,1] = slope, phi[1].

Could you guys give me some advice regarding my model?

data{
	int G;                   // Number of marker genes
	int P;                   // Number of cell types
	int S;                   // Number of mix samples
	int<lower=1> R;          // Number of covariates (e.g., treatments)
	matrix[S,R] X;           // Covariates for the hierarchical regression
	matrix<lower=0>[S,G] y;  // Mix samples matrix
	matrix<lower=0>[G,P] x;  // Signature matrix (marker genes x cell types)
	
}
transformed data{
	matrix<lower=0>[S,G] y_log;  // Log-transformed mix samples

	y_log = log(y+1);

}
parameters {
	simplex[P] beta[S];  // Cell type proportions for each sample
	real<lower=0> sigma; // Error scale
	matrix[R,P] alpha;   // Regression coefficients for the proportions
	vector<lower=0.1>[P] phi;
}
transformed parameters{
	matrix[S,P] beta_adj;  // beta as a matrix of row vectors
	matrix[S,G] y_hat;

	for(s in 1:S) beta_adj[s] = to_row_vector(beta[s]);
	y_hat = beta_adj * x';
}
model {

	matrix[S,P] beta_hat;

	// Regression
	sigma ~ normal(0, 0.1);
		
	// Priors for the regression on proportions
	to_vector(alpha) ~ normal(0, 1);
	phi ~ normal(0.1, 5);
	
	// Regression
  to_vector(y_log) ~ student_t(8, log(to_vector(y_hat)), sigma);   
  
  // Hypothesis testing
	beta_hat = X * alpha;
	for(s in 1:S) for(p in 1:P) beta_hat[s,p] = inv_logit(beta_hat[s,p]);
	
	for(s in 1:S) beta[s] ~ beta(beta_hat[s] .* to_row_vector(phi), (1 - beta_hat[s]) .* to_row_vector(phi));  

}

Thanks