Many regressions: unique (stable) function for exponential and s-shape behaviour

@avehtari I agree, a hierarchical prior is indeed what I use (except that the horseshoe becomes less effective, and I could not re-tune its parameters to increase the effect; see below)

I think the case here is: slope == 0, plateau informed, baseline/intercept not informed, and (in theory) informed → 0 with Andrew's suggestion of putting intercept and slope under the same hierarchical prior (I failed to do this for the horseshoe prior, since it has so many parameters).

Let me present the full model with results so you can get an idea.

Simulated data: 70 genes with 0 change, 30 genes with positive change.

With the horseshoe prior and a simple linear predictor exp(beta * X), I am able to normalise the data as expected and get a good FP/FN rate. This is great.
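Spelled out (assuming the first column of X is a column of ones, so x_t below is the single covariate of tube t), the likelihood for tube t is

y[t] ~ multinomial( softmax(eta[t]) ),   with eta[t, g] = beta0[g] + beta1[g] * x_t

so the exp in exp(beta * X) is absorbed into the softmax normalisation. The hyperparameters I pass as data are: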

data_sigma$nu_local = 1
data_sigma$nu_global = 45
data_sigma$par_ratio = 0.05

data_sigma$slab_df = 4
data_sigma$slab_scale = 5

functions {
  // Regularised horseshoe (Piironen & Vehtari parameterisation):
  // returns the shrunken coefficients.
  vector reg_horseshoe(vector zb,
                       real aux1_global,
                       real aux2_global,
                       vector aux1_local,
                       vector aux2_local,
                       real caux,
                       real scale_global,
                       real slab_scale) {
    int K = rows(zb);

    // Horseshoe variables
    real tau;               // global shrinkage parameter
    vector[K] lambda;       // local shrinkage parameters
    vector[K] lambda_tilde; // 'truncated' local shrinkage parameters
    real c;                 // slab scale

    // Horseshoe calculation
    lambda = aux1_local .* sqrt(aux2_local);
    tau = aux1_global * sqrt(aux2_global) * scale_global;
    c = slab_scale * sqrt(caux);
    lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
    return zb .* lambda_tilde * tau;
  }
}
data {
  int<lower=0> G;       // all genes
  int<lower=0> T;       // tubes
  int<lower=0> R;       // covariates (here: intercept + slope)
  int<lower=0> y[T, G]; // RNA-seq counts
  matrix[T, R] X;       // design matrix

  // Horseshoe
  real<lower=0> par_ratio;  // expected ratio of non-zero to zero coefficients
  real<lower=1> nu_global;  // degrees of freedom for the half-t prior on tau
  real<lower=1> nu_local;   // degrees of freedom for the half-t priors on the lambdas
  real<lower=0> slab_scale; // slab scale for the regularised horseshoe
  real<lower=0> slab_df;    // slab degrees of freedom for the regularised horseshoe
}
transformed data {
  real<lower=0> scale_global = par_ratio / sqrt(1.0 * T); // scale for the half-t prior on tau
}
parameters {
  // Linear model
  row_vector[G] beta0;

  // Horseshoe
  vector[G] beta1_z;
  real<lower=0> aux1_global;
  real<lower=0> aux2_global;
  vector<lower=0>[G] aux1_local;
  vector<lower=0>[G] aux2_local;
  real<lower=0> caux;
}
transformed parameters {
  row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, aux1_global, aux2_global,
                                                    aux1_local, aux2_local, caux,
                                                    scale_global, slab_scale));
  matrix[R, G] beta = append_row(beta0, beta1);
}
model {
  // Horseshoe
  beta1_z ~ normal(0, 1);
  aux1_local ~ normal(0, 1);
  aux2_local ~ inv_gamma(0.5 * nu_local, 0.5 * nu_local);
  aux1_global ~ normal(0, 1);
  aux2_global ~ inv_gamma(0.5 * nu_global, 0.5 * nu_global);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);

  // Linear system
  beta0 ~ normal(0, 5);
  sum(beta0) ~ normal(0, 0.01 * G); // soft sum-to-zero constraint (softmax is invariant to a common shift)

  for (t in 1:T)
    y[t] ~ multinomial(softmax(to_vector(X[t] * beta)));
}


Now I would hope that by changing the link function to k * inv_logit(beta * X) I would obtain similar results, with the sparsity and normalisation behaving well.
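For reference (nothing beyond algebra here), the log of that link with K = exp(k) is exactly what the log_gen_inv_logit function below computes:

log( K * inv_logit(x) ) = k - log(1 + exp(-x)) = k - log1p_exp(-x),   with k = log K

so k is the log of the upper plateau and softmax can be applied directly to the output.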


functions {
  // Log of the generalised inverse logit: k - log1p_exp(-y) = log(exp(k) * inv_logit(y)),
  // so k is the log of the upper plateau.
  vector log_gen_inv_logit(row_vector y, row_vector k) {
    return to_vector(k - log1p_exp(-y));
  }

  // Regularised horseshoe (Piironen & Vehtari parameterisation):
  // returns the shrunken coefficients.
  vector reg_horseshoe(vector zb,
                       real aux1_global,
                       real aux2_global,
                       vector aux1_local,
                       vector aux2_local,
                       real caux,
                       real scale_global,
                       real slab_scale) {
    int K = rows(zb);

    // Horseshoe variables
    real tau;               // global shrinkage parameter
    vector[K] lambda;       // local shrinkage parameters
    vector[K] lambda_tilde; // 'truncated' local shrinkage parameters
    real c;                 // slab scale

    // Horseshoe calculation
    lambda = aux1_local .* sqrt(aux2_local);
    tau = aux1_global * sqrt(aux2_global) * scale_global;
    c = slab_scale * sqrt(caux);
    lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
    return zb .* lambda_tilde * tau;
  }
}
data {
  int<lower=0> G;       // all genes
  int<lower=0> T;       // tubes
  int<lower=0> R;       // covariates (here: intercept + slope)
  int<lower=0> y[T, G]; // RNA-seq counts
  matrix[T, R] X;       // design matrix

  // Horseshoe
  real<lower=0> par_ratio;  // expected ratio of non-zero to zero coefficients
  real<lower=1> nu_global;  // degrees of freedom for the half-t prior on tau
  real<lower=1> nu_local;   // degrees of freedom for the half-t priors on the lambdas
  real<lower=0> slab_scale; // slab scale for the regularised horseshoe
  real<lower=0> slab_df;    // slab degrees of freedom for the regularised horseshoe
}
transformed data {
  real<lower=0> scale_global = par_ratio / sqrt(1.0 * T); // scale for the half-t prior on tau
}
parameters {
  // Linear model
  row_vector[G] beta0;
  row_vector[G] k; // log of the upper plateau, one per gene

  // Horseshoe
  vector[G] beta1_z;
  real<lower=0> aux1_global;
  real<lower=0> aux2_global;
  vector<lower=0>[G] aux1_local;
  vector<lower=0>[G] aux2_local;
  real<lower=0> caux;
}
transformed parameters {
  row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, aux1_global, aux2_global,
                                                    aux1_local, aux2_local, caux,
                                                    scale_global, slab_scale));
  matrix[R, G] beta = append_row(beta0, beta1);
}
model {
  // Horseshoe
  beta1_z ~ normal(0, 1);
  aux1_local ~ normal(0, 1);
  aux2_local ~ inv_gamma(0.5 * nu_local, 0.5 * nu_local);
  aux1_global ~ normal(0, 1);
  aux2_global ~ inv_gamma(0.5 * nu_global, 0.5 * nu_global);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);

  // Linear system
  beta0 ~ normal(0, 1);

  k ~ normal(0, 2);
  sum(k) ~ normal(0, 0.01 * G); // soft sum-to-zero constraint on the plateaus

  for (t in 1:T)
    y[t] ~ multinomial(softmax(log_gen_inv_logit(X[t] * beta, k)));
}



However, I lose the normalisation property of the horseshoe.

What is the issue in your opinion?

Is this due to the correlation of intercept and upper bound when slope == 0?
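To make the suspected confounding explicit: when beta1[g] == 0 the linear predictor of gene g is constant across tubes, so k[g] and beta0[g] enter the likelihood only through the single number

eta[t, g] = k[g] - log1p_exp(-beta0[g])   for every t

and any (k[g], beta0[g]) pair with the same value of that combination fits equally well; only the prior separates them, which would produce a ridge of high posterior correlation.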

Then I tried to apply Aki's suggestion.

But whatever scale I choose for beta[1] (the intercept), keeping all the other horseshoe parameters in common with the slope (for determinability of the intercept), I get worse results. What I mean is roughly the sketch below.
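(A minimal sketch of the shared-prior idea, not the exact code I ran; beta0_z and scale_global0 are extra names for the intercept's raw scores and its global scale, everything else is shared with the slope:)

transformed parameters {
  // Intercept and slope share the horseshoe's local and global auxiliaries;
  // only the raw z-scores (and the global scale constant) differ.
  row_vector[G] beta0 = to_row_vector(reg_horseshoe(beta0_z, aux1_global, aux2_global,
                                                    aux1_local, aux2_local, caux,
                                                    scale_global0, slab_scale));
  row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, aux1_global, aux2_global,
                                                    aux1_local, aux2_local, caux,
                                                    scale_global, slab_scale));
}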

Here is pretty much the whole story.


Just to give you a laugh: out of desperation I started to experiment with the trick of a common SD for determinability, applied in an obviously wrong way.


// trick
for (g in 1:G) beta[:, g] ~ normal(0, sigma_trick[g]); // beta[2] is already transformed by the horseshoe!!
sigma_trick ~ normal(0, 0.2);

And it is the only way I could achieve decent results.


functions {
  // Log of the generalised inverse logit: k - log1p_exp(-y) = log(exp(k) * inv_logit(y)),
  // so k is the log of the upper plateau.
  vector log_gen_inv_logit(row_vector y, row_vector k) {
    return to_vector(k - log1p_exp(-y));
  }

  // Regularised horseshoe (Piironen & Vehtari parameterisation):
  // returns the shrunken coefficients.
  vector reg_horseshoe(vector zb,
                       real aux1_global,
                       real aux2_global,
                       vector aux1_local,
                       vector aux2_local,
                       real caux,
                       real scale_global,
                       real slab_scale) {
    int K = rows(zb);

    // Horseshoe variables
    real tau;               // global shrinkage parameter
    vector[K] lambda;       // local shrinkage parameters
    vector[K] lambda_tilde; // 'truncated' local shrinkage parameters
    real c;                 // slab scale

    // Horseshoe calculation
    lambda = aux1_local .* sqrt(aux2_local);
    tau = aux1_global * sqrt(aux2_global) * scale_global;
    c = slab_scale * sqrt(caux);
    lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
    return zb .* lambda_tilde * tau;
  }
}
data {
  int<lower=0> G;       // all genes
  int<lower=0> T;       // tubes
  int<lower=0> R;       // covariates (here: intercept + slope)
  int<lower=0> y[T, G]; // RNA-seq counts
  matrix[T, R] X;       // design matrix

  // Horseshoe
  real<lower=0> par_ratio;  // expected ratio of non-zero to zero coefficients
  real<lower=1> nu_global;  // degrees of freedom for the half-t prior on tau
  real<lower=1> nu_local;   // degrees of freedom for the half-t priors on the lambdas
  real<lower=0> slab_scale; // slab scale for the regularised horseshoe
  real<lower=0> slab_df;    // slab degrees of freedom for the regularised horseshoe
}
transformed data {
  real<lower=0> scale_global = par_ratio / sqrt(1.0 * T); // scale for the half-t prior on tau
}
parameters {
  // Linear model
  row_vector[G] beta0;
  row_vector[G] k;                // log of the upper plateau, one per gene
  vector<lower=0>[G] sigma_trick; // per-gene common SD for the "trick" below

  // Horseshoe
  vector[G] beta1_z;
  real<lower=0> aux1_global;
  real<lower=0> aux2_global;
  vector<lower=0>[G] aux1_local;
  vector<lower=0>[G] aux2_local;
  real<lower=0> caux;
}
transformed parameters {
  row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, aux1_global, aux2_global,
                                                    aux1_local, aux2_local, caux,
                                                    scale_global, slab_scale));
  matrix[R, G] beta = append_row(beta0, beta1);
}
model {
  // Horseshoe
  beta1_z ~ normal(0, 1);
  aux1_local ~ normal(0, 1);
  aux2_local ~ inv_gamma(0.5 * nu_local, 0.5 * nu_local);
  aux1_global ~ normal(0, 1);
  aux2_global ~ inv_gamma(0.5 * nu_global, 0.5 * nu_global);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);

  // Linear system
  beta0 ~ normal(0, 1);

  // trick
  for (g in 1:G) beta[:, g] ~ normal(0, sigma_trick[g]);
  sigma_trick ~ normal(0, 0.2);

  k ~ normal(0, 2);
  sum(k) ~ normal(0, 0.01 * G); // soft sum-to-zero constraint on the plateaus

  for (t in 1:T)
    y[t] ~ multinomial(softmax(log_gen_inv_logit(X[t] * beta, k)));
}


Looking good, but using nonsensical priors on the transformed parameters.
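(To spell out why the priors are nonsensical: beta1 is a deterministic transform of beta1_z and the auxiliaries, which already carry the horseshoe prior, so the extra sampling statement multiplies a second density onto the same quantity. The effective prior is roughly

p(beta1[g]) ∝ horseshoe(beta1[g]) * normal(beta1[g] | 0, sigma_trick[g])

and similarly beta0 gets both its normal(0, 1) prior and the trick's normal; neither marginal prior is what either statement alone says.)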


A possibility is simply that I implemented your model wrong, since I didn't understand it fully. I observed that, even though the plateau and intercept get confounded in my model, giving the plateau a fixed prior rather than a hierarchical one (some prior is needed because I wrap the regression in a softmax) eliminated divergences even with high parameter correlation. Of course the intercept becomes less well defined, but when the slope == 0 I can almost ignore it in further analyses, while the plateau won't be analysed further anyway.

The majority of the experiments are far from the level of detail of kinetics. They are mainly related to disease, and the gene interactions are likely to be much bigger in magnitude than the chemical dynamics of transcription itself. My attempt is simply not to lose genes that reach a plateau just because they can't be explained by a linear function.

Well, the intercept indicates whether the change is likely to happen early or late in cancer development. This is another nice thing about using a sigmoid-like function. I want to believe that a well regularised and penalised regression more complex than linear is possible; after all, it is just one more degree of freedom.

It is RNA-seq, which is why I can use multinomial-based regression.