I am implementing a GLM whose link is a monotonic growth function, because for my data a generalised sigmoid includes all the possible dynamics interesting to me.
( @Bob_Carpenter this is a dummy example that isolates the issue; on real data I am applying this link function in a multinomial framework )
( I realise I should use a beta prior on the data; however, I believe that for this dummy example it does not make a difference )
With this method I managed to resolve the banana shape; however, I still have linear correlation in the posterior, and it seems to result in divergent transitions. Is there anything more I could look at?
I don’t really have anything to add beyond what is in the StanCon submission. The problem with slope = 0 is already partially mitigated by reparametrizing via the mean and sd of the input to the sigmoid (X[t] * beta in your case) - see the submission for the math. However, there might be situations where there is an OK fit both with slope = 0 plus fine-tuning of the other parameters and with abs(slope) > 0 plus a wider range of other parameters, resulting in ugly multimodality which I was not able to get rid of - and this is where the simplified model (in your case that would be just a linear model) comes in handy. If the sigmoid model does not improve over the linear model, the linear model should be preferred, and you get rid of the multimodality.
I have to admit that this reparametrization is not clear to me. Could you make an example for my model, or expand on this?
In my case I have 20000 genes, many of which are well explained by a sigmoid-like function, while the genes with slope == 0 are, I guess, well explained by a linear model. In this case I have to choose one model for all genes.
P.S. At the moment I am trying a 2-step solution, where
1. I detect non-identifiable genes (i.e., slope ~ 0), and
2. I fix the intercept to 0 for those genes, and rerun the whole model with more meaningful “tricks-free” hierarchical priors.
Note that the intercept ceases to be a part of the design matrix and is treated explicitly. The math can probably be adjusted to keep the intercept as part of the design matrix, but that is something you will have to do yourself :-). Also, in my experience there are non-identifiabilities with different signs of beta (sometimes both beta[r,g] > 0 and beta[r,g] < 0 are plausible, but with very different k and intercept), so the regulation signs are fixed here as data (the regulation_signs data variable).
You then have to reparametrize k with mean_observed_value, where k = mean_observed_value / mean(inv_logit(beta * X)).
The whole point is that the case slope = 0 becomes sd_regulatory_input = 0, which leaves mean_observed_value more-or-less identifiable and mean_regulatory_input dominated by the prior. Also, data-independent priors such as mean_regulatory_input ~ normal(0,5) and sd_regulatory_input ~ normal(0,5) become defensible, as we know that outside [-10,10] the sigmoid provides no useful information.
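To make the back-transformation concrete, here is a minimal numeric sketch in Python (values are made up, and `inv_logit` is the standard logistic) showing that the centred parameters map back to slope, intercept and k exactly as described:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical covariate and centred ("regulatory input") parameters
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
mean_regulatory_input = 1.0   # desired mean of intercept + X * slope
sd_regulatory_input = 2.0     # desired sd of intercept + X * slope
mean_observed_value = 5.0     # desired mean of k * inv_logit(...)

# Back-transform to the natural parameters
# (ddof=1 matches Stan's sd(), which uses the n-1 denominator)
slope = sd_regulatory_input / X.std(ddof=1)
intercept = mean_regulatory_input - slope * X.mean()
k = mean_observed_value / inv_logit(intercept + X * slope).mean()

# Check: the regulatory input has the requested mean and sd,
# and the mean prediction equals mean_observed_value
reg_input = intercept + X * slope
mu = inv_logit(reg_input) * k
print(reg_input.mean(), reg_input.std(ddof=1), mu.mean())
```

Note how slope = 0 corresponds exactly to sd_regulatory_input = 0, while mean_observed_value still pins down the overall level of the predictions.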
I realized I was probably a bit wrong: the non-identifiability with different signs of beta probably does not apply to your case and arises only because I handle time series and solve the differential equation over time; linear models should be identifiable in this respect.
That answers both suggestions. Really, just try to follow the math for the reparametrization in the blog post; it is relatively simple and I am not sure I can explain it better.
I tried to convert your model to my case (simplified).
I assume from your documentation and discourse reply:
- your targets == my genes
- your regulators == my covariates (I just have one in this example)
- if you have 1 regulator/covariate: relative_weights == [1, 1, …, 1], so we can ignore it
- if the sign of the slope is not supposed to cause non-identifiability issues, regulation_signs is omitted and sd_regulatory_input must live in the domain (-Inf, Inf)
My basic link function is
inv_logit( intercept + X * beta ) * k
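To illustrate (a Python sketch with made-up parameter values): k is the upper asymptote of this link, and the midpoint sits at x = -intercept / beta, where the curve takes the value k / 2:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def link(x, intercept, beta, k):
    # generalised sigmoid: saturates at k for large x, at 0 for small x
    return inv_logit(intercept + x * beta) * k

intercept, beta, k = -1.0, 2.0, 10.0
midpoint = -intercept / beta

print(link(midpoint, intercept, beta, k))  # half-saturation: k / 2
print(link(100.0, intercept, beta, k))     # far right tail: close to k
```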
Given all this my model is
data {
  int<lower = 0> T;        // tube collection of genes for a patient
  vector<lower = 0>[T] y;  // RNA-seq counts
  vector[T] X;             // design matrix (single covariate)
}
parameters {
  real mean_regulatory_input;
  real sd_regulatory_input;
  real<lower = 0> sigma;
  real mean_observed_value;
}
transformed parameters {
  real observed_sd_input = sd(X);
  real observed_mean_input = mean(X);
  real slope = sd_regulatory_input / observed_sd_input;
  real intercept = mean_regulatory_input - slope * observed_mean_input;
  real k = mean_observed_value / mean(inv_logit(intercept + X * slope));
}
model {
  mean_regulatory_input ~ normal(0, 5);
  sd_regulatory_input ~ normal(0, 5);
  sigma ~ cauchy(0, 2.5);
  mean_observed_value ~ normal(0, 5);
  y ~ normal(inv_logit(intercept + X * slope) * k, sigma);  // vectorised likelihood
}
It does not converge so well, although it tends to predict reasonable values for the parameters.
My questions are: is it correct in your opinion? Am I missing something in the prior usage?
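For reference, here is how I generate fake data to test the model (a Python sketch; the "true" parameter values are made up). All means stay strictly between 0 and k by construction, which is the behaviour I want the link to capture:

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical "true" parameters on the natural scale
T = 200
X = rng.normal(0.0, 1.0, size=T)
intercept, slope, k, sigma = -0.5, 1.5, 8.0, 0.3

mu = inv_logit(intercept + X * slope) * k  # mean curve, bounded in (0, k)
y = rng.normal(mu, sigma)                  # simulated RNA-seq-like outcome

# The centred quantities the reparametrized model should recover
reg_input = intercept + X * slope
print(reg_input.mean(), reg_input.std(ddof=1), mu.mean())
```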