I am trying to fit a model to predict gene expression in experiment B based on gene expression in experiment A. The initial model is listed below:
In this model, y_ij is the expression in experiment B for gene i in sample j. x_ij is the same thing in experiment A. The global intercept (alpha) and gene-specific random intercept (alpha_i) represents the variation between the two experiments.
In reality, there are situations that one gene is highly expressed in experiment A (large value in x_i) but its expression is completely lost in experiment B (0 in y_i). This happens at a reasonably high probability. Therefore, the current model cannot capture this effect. In the current model, the only chance for this to happen is that the random intercept of a gene is very very low (close to -Inf). However, since I assume the random intercept to follow a normal distribution, this is not very likely to happen. I need a model to allow this to happen more frequently.
Therefore, I am thinking of modeling this effect in two different ways:
-
Instead of assuming the gene-specific random intercept (alpha_i) to follow a normal distribution, can I assume it to follow a bimodal distribution (a mixture of current normal distribution and also another distribution with some density around a very low value)? If yes, how to do it in brms? I have seen examples of using mixed distribution in brms but they are for the response variable, not prior.
-
Another way I can think of is to modified the current model to the model below:
Basically, I want to add an indicator function through a Bernoulli distribution. For genes that behave normally (beta_i = 1), the model will collapse to the initial model. For genes that the expression is completely lost (beta_i = 0), its random intercept will turn into a very low value. The delta in the Bernoulli distribution basically captures this gene silencing event. However, I donât know how to specify this model using brms. I tried the code below but it doesnât work.
m <- brm(
bf(response ~ 0 + Intercept + X + gene.silence*(1 | gene) + (1 - gene.silence)*very.low.value),
data = data2fit,
family = zero_inflated_negbinomial(),
iter = 2000, chains = 4, cores = 4
)
response is the y here. X here is x. gene.silence is a vector with 0 and 1. 1 means the gene behaves normally. When gene.silence=0, we will assign a very.low.value (here I use -10^6) to it. Here is the error message I got when running this code:
Normally âInterceptâ should be an internal variable that doesnât need to be in the data. I am not sure what goes wrong here.
Essentially the question boils down to how to model a random intercept that behaves differently under two different conditions. Any body have any idea?
- Operating System: Linux
- brms Version: brms_2.14.4