Hello!
I am fitting a negative binomial model with the following code:
data {
  int<lower=1> n_obs;
  int<lower=1> n_transcripts;
  array[n_obs] int<lower=1> transcript_id;
  array[n_obs] int<lower=0> count;
  array[n_obs] int<lower=1> freq;
}
parameters {
  array[n_transcripts] real<lower=0> mu;
  array[n_transcripts] real<lower=0> odisp;
  real<lower=0> odisp_mu;
  real<lower=0> odisp_sigma;
}
model {
  int t;
  // regularization of overdispersion factors
  odisp ~ lognormal(odisp_mu, odisp_sigma);
  for (i in 1:n_obs) {
    t = transcript_id[i];
    // negative binomial sampling process
    target += freq[i] * neg_binomial_2_lpmf(count[i] | mu[t], odisp[t]);
  }
}
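In case it helps to see what the likelihood term is doing, here is a minimal pure-Python sketch of the frequency-weighted negative binomial log-likelihood. It assumes the mean/dispersion parameterization (mean mu, variance mu + mu^2 / odisp); the helper `weighted_log_lik` and the toy data are hypothetical and only for illustration. It checks that multiplying the log-density by freq[i] gives the same total as repeating the observation freq[i] times.

```python
import math


def neg_binomial_2_lpmf(y, mu, phi):
    """Log PMF of a negative binomial with mean mu and variance mu + mu^2 / phi."""
    return (
        math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
        + y * math.log(mu / (mu + phi))
        + phi * math.log(phi / (mu + phi))
    )


def weighted_log_lik(counts, freqs, transcript_ids, mu, odisp):
    """Sum freq[i] * lpmf(count[i]); transcript ids are 1-based, as in the Stan data."""
    total = 0.0
    for y, f, t in zip(counts, freqs, transcript_ids):
        total += f * neg_binomial_2_lpmf(y, mu[t - 1], odisp[t - 1])
    return total


# Hypothetical toy data: for transcript 1, count 3 was seen twice and count 0 once.
compressed = weighted_log_lik([3, 0], [2, 1], [1, 1], mu=[2.0], odisp=[1.5])
expanded = weighted_log_lik([3, 3, 0], [1, 1, 1], [1, 1, 1], mu=[2.0], odisp=[1.5])
print(abs(compressed - expanded) < 1e-12)
```

So the freq weighting is just a compressed way of writing repeated observations; it should not change the posterior relative to the expanded data.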
Unfortunately, this is a much simpler model than the one I would eventually like to fit (I am adding complexity gradually), and even now, as the number of unique transcripts n_transcripts increases, MCMC takes longer and longer. So MCMC does not currently seem like a realistic estimation method for me.
So I turned to Pathfinder, but as a sanity check I wanted to make sure its posteriors looked similar enough to the MCMC ones. I started with these arguments:
## Fitting model with Pathfinder
fit_path = model.pathfinder(model_data,
                            num_paths=1,
                            num_single_draws=10_000)
I calculated the difference in posterior variances between the two methods and then picked out the 6 transcripts with the largest (in magnitude) differences:
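Roughly, the comparison can be sketched like this in plain Python. This is a simplified illustration rather than my exact code: `largest_variance_gaps` and the toy draws are hypothetical, and in practice the per-transcript draws would come from the two fit objects (for example via `stan_variable("mu")` in CmdStanPy).

```python
from statistics import variance


def largest_variance_gaps(draws_mcmc, draws_vi, k=6):
    """Return the k (index, gap) pairs with the largest |variance difference|.

    Both arguments are lists holding one list of posterior draws per transcript.
    The gap is variance(VI draws) - variance(MCMC draws), using sample variances.
    """
    gaps = []
    for idx, (a, b) in enumerate(zip(draws_mcmc, draws_vi)):
        gaps.append((idx, variance(b) - variance(a)))
    # Rank by magnitude so both over- and under-dispersed approximations show up.
    gaps.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return gaps[:k]


# Hypothetical draws for three transcripts.
mcmc = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.1], [5.0, 5.5, 6.0]]
vi = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [5.0, 5.4, 6.1]]
top = largest_variance_gaps(mcmc, vi, k=2)
print(top)
```

Keeping the sign of each gap shows whether Pathfinder is too wide or too narrow for a given transcript.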
This was too much of a difference for me to be comfortable, so I tried changing Pathfinder’s arguments, such as tol_grad, num_elbo_draws, num_paths, etc.:
## Fitting model with Pathfinder
fit_path = model.pathfinder(model_data,
                            draws=10_000,
                            num_paths=10,
                            psis_resample=True,
                            tol_grad=1e-12,
                            tol_obj=1e-12,
                            tol_param=1e-12)
With those arguments I get plots like this:
I’ve repeated this many times, and every run has some transcripts with this behaviour (parameters tightly centred on the wrong value, far too uncertain, etc.). It’s enough of a problem that I would like to know whether there is anything I can do to prevent it before I make my model more complex.
- How should I change Pathfinder’s hyperparameters to make it more accurate?
- My understanding was that Pathfinder is a better alternative to other variational inference (VI) methods. Is that not necessarily true?
- The posterior will always be approximately Gaussian in my situation. Does this make other VI methods more accurate than Pathfinder?
Thank you!