Hi all,
I’ll start with a brief biological motivation. I’m working with spatial transcriptomics data, which is composed of “spots”, each with an x and y coordinate location and a measured, nonnegative value for gene expression. Gene expression is known to follow a negative-binomial distribution (see the supplemental material of this paper for a discussion of such), and i’m interested in modeling the gene-specific mean and variance of expression using brms
(or just Stan if utilizing brms
ends up being impossible).
The issue is, i’m unsure of how to account for the spatial component in my model formulation, as I’m not very experienced in spatial analysis. The data are in long format with five columns: one each for gene name (denoted gene
), spot ID (spot
), x location (x
), y location (y
), and integer expression (count
). The response variable of interest is gene expression, and I’d like to fit a distributional regression model that models the parameters of the negative-binomial distribution \mu and \theta as functions of spatial location with a random effect for each gene. The goal is thus for each gene to sample from the posterior distributions of (and then summarize) the mean \left(\mu_g\right) and variance \left( \mu_g + \frac{\mu_g^2}{\theta_g}\right) of expression while pooling information across genes and accounting for spatial relationships between spots.
I read this bit of brms
documentation relating to the spatial conditional autoregressive (CAR) structure, but it appears that the dimension of the adjacency matrix must match that of the data to be modeled, which is not the case with my data as there are many observations (spots) for each gene. Next, I tried using a Gaussian process on x and y via the following formula (ignoring modeling \theta for simplicity), but the model failed to converge and any attempts to tinker with it resulted in errors:
model_formula <- brms::bf(count ~ 1 + g(x, y) + (1 | gene))
brms_fit <- brms::brm(model_formula,
data = count_data, # assume count_data contains the relevant dataset
family = brms::negbinomial(link = "log", link_shape = "log"),
chains = 3L,
iter = 1000L,
warmup = 250L,
thin = 5L,
cores = 3L,
threads = 2L,
normalize = FALSE,
silent = 2,
backend = "cmdstanr",
algorithm = "meanfield",
stan_model_args = list(stanc_options = list("O1")),
seed = 312)
Lastly, I considered using the following formula (once again ignoring \theta for simplicity), but I’m unsure if using random slopes for the location coordinates accurately accounts for the spatial relationships between spots:
model_formula <- brms::bf(count ~ 1 + x + y + (1 + x + y | gene))
brms_fit <- brms::brm(model_formula,
data = count_data, # assume count_data contains the relevant dataset
family = brms::negbinomial(link = "log", link_shape = "log"),
chains = 3L,
iter = 1000L,
warmup = 250L,
thin = 5L,
cores = 3L,
threads = 2L,
normalize = FALSE,
silent = 2,
backend = "cmdstanr",
algorithm = "meanfield",
stan_model_args = list(stanc_options = list("O1")),
seed = 312)
I imagine this type of model might be too complex to fit using just brms
, but I’m very open to using plain Stan as well. In addition, if anything I’ve posited is incorrect please correct me ! I’m also happy to provide R code to load and pre-process the data if it’s helpful, but I figured I’d omit that for now to keep things simpler as this is already a rather complex question.
Thanks much,
jack