Estimating fitted values from hierarchical Gaussian process model

jr-leary7 · January 9, 2025, 9:18pm

Hi all,

I’m currently working with some spatial data (spatial transcriptomics to be exact), which is a rather new area for me. My basic idea is to use a Gaussian process (GP) to estimate the mean and variance of gene expression in order to identify genes that vary significantly across Euclidean space. The data take the following form (after transforming the counts matrix to a long data.frame):

spot	gene	x	y	count
Spot 1	Gene 1	1.3	0.7	10
Spot 2	Gene 2	2.4	-1.2	0
Spot 3	Gene 3	-3.1	4.3	2
…	…	…	…	…
Spot n_s	Gene n_g	0.5	1.9	15

The model I’m using for the mean \mu and overdispersion (shape) \theta is as follows, where i = 1, \dots, n_s is the index of spots (locations), g = 1, \dots, n_g is the index of genes, the subscript 0 indicates a global intercept, \gamma is a random intercept, and \ell is the length-scale of the GP:

\begin{aligned} y_{ig} &\sim \text{NB}(\mu_{ig}, \theta) \\ \log\left(\mathbb{E}\left[\mu_{ig}\right]\right) &= \beta_0 + f(x_i, y_i) + \gamma_g \\ f(x_i, y_i) &\sim \mathcal{GP} \left(0, \sigma^2_f\exp\left(-\frac{(x_i - x_j)^2 - (y_i - y_j)^2}{2\ell^2}\right)\right) \\ \log\left(\mathbb{E}\left[\theta\right]\right) &= \alpha_0 \\ \end{aligned}

I’d like to use the formula for the Negative-binomial variance (shown below) to obtain per-gene estimates of expression variance across space given estimates of \mu_g and \theta from the model.

\text{Var}(y) = \mu \left(1 + \frac{\mu}{\theta} \right)

I posted once a while ago about how to set up the GP model - which I believe I’ve now done successfully - but I’m still unsure of how to summarize the draws from the approximate posterior in a way that will provide me with \hat{\mu}_g, since I don’t know what the estimated parameters of the GP mean.

The code I’m using to fit the model is as shown below, assuming we have a data.frame of the form shown in the table above called expr_df. I’m using brms (the most recent development version from GitHub) with the cmdstanr (also the most recent version) backend to fit the variational inference model:

model_formula <- bf(count ~ 1 + gp(x, y, scale = TRUE, k = 20, cov = "exp_quad") +  (1 | gene), 
                    shape ~ 1)
brms_fit <- brm(model_formula, 
                data = expr_df,
                family = negbinomial(link = "log", link_shape = "log"),
                chains = 1L,
                iter = 1000L,
                warmup = 250L,
                thin = 5L,
                cores = 1L,
                threads = 4L,
                normalize = FALSE, 
                silent = 2,
                backend = "cmdstanr",
                algorithm = "meanfield",
                stan_model_args = list(stanc_options = list("O1")), 
                seed = 312)

I then generate draws from the approximate posterior like so:

posterior_samples <- as.data.frame(posterior::as_draws_df(brms_fit))

I know how to extract and summarize the random intercepts for each gene along with the global intercept, but what’s confusing me is the parameters for the GP. They’re named as zgp_gpxy[m] for m = 1, \dots, 400. I assume this is because I set the number of basis functions approximating the GP as k = 20 and there are two spatial dimensions, ergo 20^2 = 400. How should I include these parameters in the computation of \mu_{g_j} = \exp(\eta_{g_j}) for each posterior draw j = 1, \dots, 1000 ?

Any help is greatly appreciated !

Thanks,
Jack

Topic		Replies	Views
Spatial hierachical negative-binomial model formulation Modeling cmdstanr , brms	6	140	December 23, 2024
Formulating a hierarchical multivariate Gaussian process using the Hilbert space approximation Modeling cmdstanr , gaussian-process	8	230	February 7, 2025
Wildly inflated mean values in negative-binomial hierarchical model Modeling fitting-issues , brms	2	91	December 17, 2024
Hierarchical spatial GP with uneven sampling coordinates Modeling rstan , techniques , gaussian-process	2	1309	February 8, 2022
Auto-grouping option in brms' gp() brms gaussian-process	5	1104	September 29, 2019

Estimating fitted values from hierarchical Gaussian process model

Related topics