I am trying to construct a linear model with a poisson measurement error for both predictors and outcomes. The model currently looks like this:
X ~ Poisson(X_true)
Y_mean = X_true * beta
Y ~ Poisson(Y_mean)
beta ~ Normal(0, sigma)
X_true ~ Lognormal(a,b)
X_true, beta >= 0
where X and Y are both matrices of measured count data (single cell RNA-seq), X_true and Y_mean are parameters, a and b are guessed separately from data and sigma is a user-specified constant, currently set to 0.05 (I expect most coefficients to be close to 0).
I have two qeustions about this model:
-
Is the prior for X_true necessary? The actual distribution of X is quite complex and hard to capture in a reasonable prior - my guess is that bad prior may hinder convergence. I also noticed than in some examples on the forums as well as in McElreath’s book (Chapter 14.1.), no prior is given for parameters corresponding to measurement error on predictors.
-
The number of parameters grows quickly - it is dominated by the size of X_true - and it takes prohibitively long to fit real datasets (the smallest I am really interested in has X of dimension 20286 * 124, currently at 5% after 20hours of computation). Initializing X_true with X greatly sped up convergence for small simulated datasets, but seems to have made little difference for the real data. Since I am not really interested in the latent X_true, I tried to integrate it out, but failed. My calculus is however rusty - are there any examples of models where it is possible to integrate out non-normal measurement error on predictors? Or is it known to be impossible to integrate out the measurement error in some classes of models?
Thanks for any hints.
Below is the Stan model for completeness:
data {
int num_cells;
int num_regulators;
int num_targets;
int<lower=0> regulator_expression[num_regulators,num_cells];
int<lower=0> target_expression[num_targets,num_cells];
real<lower=0> w_prior_sigma;
real<lower=0> regulator_prior_sigma;
real<lower=0> regulator_prior_mean;
}
parameters {
matrix<lower=0>[num_regulators, num_targets] w;
matrix<lower=0>[num_cells, num_regulators] regulator_expression_mean;
}
model {
matrix[num_cells, num_targets] target_expression_mean = regulator_expression_mean * w;
for(reg in 1:num_regulators) {
regulator_expression[reg,] ~ poisson(regulator_expression_mean[,reg]);
}
for(target in 1:num_targets) {
target_expression[target,] ~ poisson(target_expression_mean[,target]);
}
for(reg in 1:num_regulators) {
regulator_expression_mean[,reg] ~ lognormal(regulator_prior_mean, regulator_prior_sigma);
}
for(target in 1:num_targets){
w[,target] ~ normal(0, w_prior_sigma);
}
}