GLM with growth function - problem of indetermiinability when slope = 0

Hello,

I am implementing a GLM with a monotonic link growth function, because for my data a generalised sigmoid function, includes all the possible dynamics interesting to me.

( @Bob_Carpenter this is a dummy example that isolates the issue, on real data I am trying applying this link function in the multinomial framework )

( I realise I should use beta prior on data, however I believe that for this dummy example does not make the difference )

data {
      int<lower = 0> G;                   // all genes
      int<lower = 0> T;                   // tube
      int<lower=0> R;                     //covariate
      vector<lower = 0>[G] y[T];    // RNA-seq counts
      matrix[T,R] X;                        // design matrix
}
parameters {
	matrix[R, G] beta;          //design matrix 
	vector<lower=1>[G] k;   // amplitude
	real<lower=0> sigma;    //standard deviation
	real<lower=0> k_mu;     // prior of amplitude
	real<lower=0> k_sigma;  // prior amplitude
}
model {
  for (t in 1:T) 
        y[t] ~ normal( inv_logit( X[t] * beta ) .* to_row_vector(k), sigma );

	for(r in 1:R) beta[r] ~ normal(0, 5);

	k ~ normal(k_mu,k_sigma);
	sigma ~ cauchy(0,2.5);

	k_mu ~ normal(0,2);
	k_sigma ~ normal(0,2);
}

In the case I have a slope != 0

I have good inference

In the case I have a slope == 0 the formula

 inv_logit( X[t] * beta ) .* to_row_vector(k),

has multiplicative indeterminability between beta[1] (intercept) and k. Because for a flat line the intercept is confounded with the amplitude k

Is there a good way to avoid this? For xample

  • constrain parameters to avoid banana shaped posterior
  • use another growth function more stable for Bayesian inference.

Thanks a lot.

Martin discusses this problem in his blog post on identifiability and divergences.

In the same thread, I suggest one (unprincipled and untested) way to constrain the beta parameter.

Thanks a lot,

I will try.

(@martinmodrak any news about your experience ? )

With this method I managed to resolve the banana shape, however I still have linear correlation, and it seems to result in divergent transitions. Anything more I could look at?

data {
      int<lower = 0> G;                   // all genes
      int<lower = 0> T;                   // tube
	int<lower=0> R;
      vector<lower = 0>[G] y[T];             // RNA-seq counts
      matrix[T,R] X;
}
parameters {
	matrix[R, G] beta;
	vector<lower=1>[G] k;
	real<lower=0> sigma;
	vector<lower=0>[G] prior_sd;
	real<lower=0> k_mu;
	real<lower=0> k_sigma;
}
model {
  for (t in 1:T) 
		y[t] ~ normal( inv_logit( X[t] * beta ) .* to_row_vector(k), sigma );

	beta[1] ~ normal(0, prior_sd);
	for(r in 2:R) beta[r] ~ normal(0, prior_sd);
	k ~ normal(k_mu,k_sigma);
	sigma ~ cauchy(0,2.5);

	k_mu ~ normal(0,2);
	k_sigma ~ normal(0,2);
	prior_sd ~ cauchy(0,0.1);
}

I don’t really have anything to add beyond what is in the StanCon submission. The problem with slope = 0 is already partilly mitigated with reparametrizing via mean and sd of the input to the sigmoid (X[t] * beta in your case) - see the submission for the math. However, there might be situations where there is an OK fit with both slope = 0 + fine tuning the other parameters and abs(slope) > 0 + a wider range of other parameters, resulting in ugly multimodality which I was not able to get rid off - and this is where the simplified model (in your case that would be just a linear model) comes in handy. If the sigmoid model does not improve over linear model, it should be preferred and you get rid of the multimodality.

I have to admit that this reparametrization is not clear to me. Could you make an example for my model, or expand on this?

In my case I have 20000 genes, for which many are well explained with sigmoid-like function, and the genes with slope==0 are well explained i guess with linear model. In this case I have to choose one model for all genes.

P.S. At the moment I am trying a 2 step solution, where

  1. I detect non-identifiable genes (i.e., slope ~ 0), and
  2. I fix the intercept to 0 for those genes, and rerun the whole model with more meaningful “tricks free” hierarchical priors

That’s then basically identical to my process (although I process each gene separately).

The idea is that instead of
matrix[R, G] beta;

you will have (not tested, just a sketch):

vector[G] mean_regulatory_input;
vector<lower=0>[G] sd_regulatory_input;
simplex[R] relative_weights[G];

and then

transformed parameters {
   vector[R] observed_mean_input;
   for(r in 1:R) {
        real observed_sd_input = sd(X[,r]);
        observed_mean_input[r] = mean(X[,r]);
        beta[,r] = regulation_signs[r,] .* (sd_regulatory_input ./ observed_sd_input) .* to_vector(relative_weights[,r]);
   }
   intercept = observed_mean_input - beta * observed_mean_input;
}

Note that intercept ceases to be a part of the design matrix and is treated explicitly. The math can probably be adjusted to have intercept part of the design matrix, but that is something you will have to do yourself :-). Also there are in my experience non-identifiabilities with different signs of beta (sometimes both beta[r,g] > 0 and beta[r,g] < 0 is plausible but with very different k and intercept), so the regulation signs are fixed here as data (the regulation_signs data variable).

You then have to proceed to reparametrize k with mean_observed_value where k = mean_observed_value / mean(inv_logit(beta * X)).

The whole point is that the case slope = 0 becomes sd_regulatory_input = 0 which leaves mean_observed_value more-or-less identifiable and mean_regulatory_input dominated by the prior. Also data-independent priors similar to mean_regulatory_input ~ normal(0,5) and sd_regulatory_input ~ normal(0,5); become defensible, as we know that outside [-10,10] the sigmoid provides no useful information.

1 Like

I realized I was probably a bit wrong: the non-identifiability with different signs of beta probably does not apply to your case and arises only because I handle time series and solve the differential equation over time, linear models should be identifiable in this respect.

Thanks for this.

This issue discouraged me, in favour of my “two step process”.

I will attempt your trick

I don’t see this parameter anywhere

Is this supposed to be like this?

Sorry, there is a mistake, it should be:

intercept = mean_regulatory_input - beta * observed_mean_input;

which answers both suggestions. Really just try to follow the math for the reparametrization in the blog post, it is relatively simple and I am not sure I can explain it better.

I tried to convert your model to my case (simplified).

I assume from your documentation and discourse reply:

  • your targets == my genes
  • your regulators == my covariates (I just have one in this example)
  • if you have 1 regulator/covariate: relative_weights == [1, 1, …, 1], so we can ignore
  • if the sign of the slope is not supposed to not cause non-identifiability issues, regulation_signs is omitted and sd_regulatory_input must exist in the domain -(Inf, Inf)

My basic link function with equation is

inv_logit( intercept + X * beta ) * k

Given all this my model is

data {
	      int<lower = 0> T;                   // tube colleciton of genes for a patient
	      vector<lower = 0>[T] y;          // RNA-seq counts
				vector[T] X;                      // design matrix
	}
	parameters {
	real mean_regulatory_input;
	real sd_regulatory_input;
	real<lower=0> sigma;
	real mean_observed_value;
	}
	transformed parameters {
	  real observed_sd_input = sd(X);
	  real observed_mean_input = mean(X);
		real slope = sd_regulatory_input / observed_sd_input;
		real intercept = mean_regulatory_input - slope * observed_mean_input;
		real k = mean_observed_value / mean(inv_logit(intercept + X * slope));
	}
	
	model {
	  for (t in 1:T) y[t] ~ normal( inv_logit( intercept + X[t] * slope ) * k, sigma );
	
	  mean_regulatory_input ~ normal(0,5);
		sd_regulatory_input ~ normal(0,5);
		sigma ~ cauchy(0,2.5);
		mean_observed_value ~ normal(0,5);
	}

it does not converge so well although tends to predict reasonable values for parameters.

My questions are: it is correct in your opinion? Am I missing something on the prior usage?

Slope == 0

Slope == 5

Yes, that seems basically the same as what I would do. If it still diverges, I don’t have any better ideas so far.