Hi, below is a simple hierarchical Bayesian model that I want to fit. Here, Y_ij is the expression of gene i in cell j measured by technology 1, and x_ij is the expression measured by technology 2. Y_ij is count data, so I model it with a zero-inflated negative binomial (ZINB) distribution. x_ij is a relative proportion, so it lies between 0 and 1. Cell size is used as an offset.
The key part of the model is alpha_i, which represents the distortion between the two technologies. It is simply a group-level effect: for a given gene, all cells are distorted in the same way, while different genes are distorted in different ways.
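For concreteness, a simplified brms version of what I am fitting looks roughly like this (a minimal sketch with priors omitted; the variable names y, x, cell_size, and gene are placeholders, and I am writing alpha_i as a gene-specific slope on x):

```r
library(brms)

# Sketch of the model: ZINB counts, cell size as an offset, and the
# distortion alpha_i as a gene-level slope on x via (0 + x | gene).
fit <- brm(
  y ~ x + offset(log(cell_size)) + (0 + x | gene),
  data   = dat,
  family = zero_inflated_negbinomial()
)
```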
Eventually, what I want is this: for a new gene (say we measured its expression in 10 cells), predict its expression in technology 1 from its expression in technology 2. This requires properly predicting/sampling the distortion alpha_i for the new gene. In my case, I want the sampling to work this way:
(1) When predicting this gene for all 10 cells, I want every cell to have the same level of distortion, i.e., one shared alpha_i for these 10 cells, rather than a separate alpha_i sampled for each cell.
(2) If this gene is in the training data, it already has an estimated alpha_i. My second constraint is that when predicting for a gene seen during training, I want to reuse that alpha_i estimated during training instead of randomly sampling a new one.
(3) In reality, the distortion of each gene may not be random. For example, longer genes may have higher alpha_i than shorter genes. The correlation between gene length and distortion may not be strong, so I don't plan to include it in the model. But when making predictions, I want to assign a higher alpha_i to a longer gene instead of sampling alpha_i completely at random from its posterior distribution (one possible way is sketched right after this list).
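One idea I have for (3), assuming the slope form of the model above (train_gene_lengths and new_gene_length are hypothetical objects I would build myself):

```r
# Hypothetical sketch: pick alpha_i deterministically from gene length by
# matching the gene's length quantile to the same quantile of the estimated
# gene-level distribution (brms names the group SD of the x slope sd_gene__x).
draws   <- as_draws_df(fit)                           # posterior draws
len_q   <- ecdf(train_gene_lengths)(new_gene_length)  # length quantile in [0, 1]
alpha_i <- qnorm(len_q, mean = 0, sd = mean(draws$sd_gene__x))
```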
My first question is: can posterior_predict achieve these three constraints when making predictions for new data? In general, can I control how the group-level effect is sampled when predicting for new data?
My understanding is that the default posterior_predict can already handle (1) and (2). But after reading the description of prepare_predictions, I suspect that the default posterior_predict samples a separate alpha_i for each cell, even for cells of the same gene, and that for a gene seen in the training data it samples a new alpha_i instead of using the one estimated during training. Is this true? In general, how does posterior_predict sample new group-level effects?
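For reference, this is the call I am referring to (newdat is a hypothetical 10-row data frame with columns x, cell_size, and gene):

```r
# Predict the new gene's counts; allow_new_levels and sample_new_levels are
# the prepare_predictions arguments that control how new alpha_i are drawn.
pp <- posterior_predict(
  fit,
  newdata           = newdat,
  allow_new_levels  = TRUE,          # the new gene is not a training level
  sample_new_levels = "uncertainty"  # default; "gaussian" and "old_levels" also exist
)
```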
I tried to look into the source code of posterior_predict. I found posterior_predict.R on GitHub, but it doesn't contain the details I need, so I am trying to write my own code. Here is the procedure (a code sketch follows the list):
Suppose we want to predict a new gene measured in 10 cells:
(1) The inputs are x_ij and the cell size of each cell j.
(2) Sample alpha_i from its posterior distribution. Since this is an independent step, I can control the sampling; for example, I can reuse the alpha_i from training if this gene is in the training set.
(3) Randomly sample the other parameters (e.g., the population-level alpha, the dispersion theta, and the zero-inflation probability pi) from their posterior distribution.
(4) Plug all these values into the model and draw counts from the resulting ZINB distribution.
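Putting the steps together, here is a sketch of what I mean (assuming the slope form of the model above; geneA, x_new, and cell_size are placeholders, and I keep all parameters from the same posterior draw together):

```r
draws   <- as_draws_df(fit)          # one row per posterior draw
n_draws <- nrow(draws)
n_cells <- 10

# Step (2): fix alpha_i once for the gene. For a training gene, reuse its
# per-draw estimates; for a new gene, substitute my own values here.
alpha_i <- draws$b_x + draws[["r_gene[geneA,x]"]]

y_pred <- matrix(NA_integer_, n_draws, n_cells)
for (j in seq_len(n_cells)) {
  # Step (4): linear predictor with the cell-size offset, then a ZINB draw
  mu   <- exp(log(cell_size[j]) + draws$b_Intercept + alpha_i * x_new[j])
  # Step (3): zero-inflation (zi) and dispersion (shape) from the same draws
  zero <- rbinom(n_draws, 1, draws$zi)
  y_pred[, j] <- ifelse(zero == 1L, 0L,
                        rnbinom(n_draws, size = draws$shape, mu = mu))
}
```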
I don't know much about MCMC sampling, so I am doing the prediction directly from the model. Can anyone tell me if this is correct?
Thank you!