I transformed the calculation into a linear one using a “pre-calculated” vector (the function has been tested and produces the right results compared to the original implementation).
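For reference, this is the recursion the function below implements, as I read the code (μ is the mean on the natural scale, i.e. exp(log_mu), and τ the dispersion; not stated explicitly in the original post):

P(0) = exp( (1/τ) · (1 − sqrt(1 + 2τμ)) )
P(1) = μ / sqrt(1 + 2τμ) · P(0)
P(y) = [2τμ / (1 + 2τμ)] · (1 − 3/(2y)) · P(y−1) + [μ² / (1 + 2τμ)] · P(y−2) / (y·(y−1)),   for y ≥ 2

with everything carried on the log scale via log_sum_exp.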
functions{
  // Log-PMF of the Poisson-inverse-Gaussian distribution, computed by filling
  // the whole vector of log-probabilities 0..y with a two-term recursion.
  real poisson_inverse_gaussian_lpmf(int y, real log_mu, real tau){
    real tau_mu_2_log = log(tau) + log_mu + 0.6931472;  // log(2 * tau * mu); 0.6931472 = log(2)
    real tau_mu_2_1_log = log(exp(tau_mu_2_log) + 1);   // log(2 * tau * mu + 1)
    real tau_mu_2_1_sqrt_log = 1.0/2 * tau_mu_2_1_log;  // log(sqrt(2 * tau * mu + 1))
    vector[y + 1] p_arr_log;

    // Start to create the array: log P(0) and log P(1)
    p_arr_log[1] = 1.0/tau * (1 - exp(tau_mu_2_1_sqrt_log));
    if(y > 0) p_arr_log[2] = log_mu - tau_mu_2_1_sqrt_log + p_arr_log[1];

    if(y > 1) {
      // Constant coefficients of the recursion, precomputed outside the loop
      real tau_mu_2_log_tau_mu_2_1_log = tau_mu_2_log - tau_mu_2_1_log;
      real two_log_mu_tau_mu_2_1_log = 2 * log_mu - tau_mu_2_1_log;

      for(y_dot in 2:y) {
        p_arr_log[y_dot + 1] =
          log_sum_exp(
            tau_mu_2_log_tau_mu_2_1_log + log(1 - 3.0/(2 * y_dot)) + p_arr_log[y_dot],
            two_log_mu_tau_mu_2_1_log - log(y_dot) - log(y_dot - 1) + p_arr_log[y_dot - 1]
          );
      }
    }

    return p_arr_log[y + 1];
  }
}
data {
  int<lower=0> N;
  int<lower=0> G;
  int<lower=0> counts[N, G];
  real my_prior[2];
  int<lower=0, upper=1> omit_data;

  // Alternative models
  int<lower=0, upper=1> is_prior_asymetric;
}
parameters {
  // Overall properties of the data
  real<lower=0> lambda_mu; // So is compatible with logGamma prior
  real<lower=0> lambda_sigma_raw;
  //real<lower=0> sigma;
  real<lower=0> tau;
  //real exposure_rate[N];

  // Gene-wise properties of the data
  vector[G] lambda_raw;
}
transformed parameters {
  real<lower=0> lambda_sigma = lambda_sigma_raw;
  vector[G] lambda = lambda_mu + lambda_raw * lambda_sigma;
}
model {
  // Overall properties of the data
  lambda_mu ~ gamma(3, 2);
  lambda_sigma_raw ~ gamma(6, 5);
  //sigma ~ normal(0,2);
  tau ~ normal(0, 2);
  //exposure_rate ~ normal(0,1);
  //sum(exposure_rate) ~ normal(0, 0.001 * N);

  // Gene-wise properties of the data
  lambda_raw ~ normal(0, 1);

  // Sample from data
  if(omit_data == 0)
    for(n in 1:N)
      for(g in 1:G)
        counts[n, g] ~ poisson_inverse_gaussian(lambda[g], tau);
}
Although it is as optimised as I could make it, I still get huge execution times.
DIAGNOSTIC(S) FROM PARSER:
Info: left-hand side variable (name=p_arr_log) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
(negative binomial takes virtually 0 seconds to execute)
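As a side note on that diagnostic, one way to avoid the aliasing the parser complains about (my own assumption about where the deep copy comes from, and not necessarily a large speedup) is to carry the previous two log-probabilities in scalar temporaries, so that p_arr_log is never read and written in the same statement. A sketch of the inner loop inside the if(y > 1) block:

  // Hypothetical variant of the recursion loop: the running terms are kept in
  // scalars, so p_arr_log is only written, never read, inside the assignment.
  real lp_prev2 = p_arr_log[1]; // log P(y_dot - 2)
  real lp_prev1 = p_arr_log[2]; // log P(y_dot - 1)
  for(y_dot in 2:y) {
    real lp_new =
      log_sum_exp(
        tau_mu_2_log_tau_mu_2_1_log + log(1 - 3.0/(2 * y_dot)) + lp_prev1,
        two_log_mu_tau_mu_2_1_log - log(y_dot) - log(y_dot - 1) + lp_prev2
      );
    p_arr_log[y_dot + 1] = lp_new;
    lp_prev2 = lp_prev1;
    lp_prev1 = lp_new;
  }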
Should I abandon this probability distribution as too hard to compute, or can you see an easy way to optimise it?
Well, if your log-likelihood is just counts[n,g] ~ poisson_inverse_gaussian(lambda[g], tau); then you only need to evaluate it at the unique values of the counts and multiply the log-likelihood contributions by the number of times they occur for each lambda[g].
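A minimal sketch of what that could look like, assuming the data and model blocks posted above (the names n_unique, unique_counts and freq are my own, and the tabulation is a naive O(N²) pass per gene, written only for illustration):

transformed data {
  // For each gene, tabulate the distinct count values and their multiplicities
  int n_unique[G];
  int unique_counts[N, G]; // distinct values, padded up to N per gene
  int freq[N, G];          // multiplicity of each distinct value
  for(g in 1:G) {
    n_unique[g] = 0;
    for(n in 1:N) {
      int found = 0;
      for(u in 1:n_unique[g]) {
        if(unique_counts[u, g] == counts[n, g]) {
          freq[u, g] = freq[u, g] + 1;
          found = 1;
        }
      }
      if(found == 0) {
        n_unique[g] = n_unique[g] + 1;
        unique_counts[n_unique[g], g] = counts[n, g];
        freq[n_unique[g], g] = 1;
      }
    }
  }
}

and the likelihood loop would then weight each distinct value by its frequency:

  if(omit_data == 0)
    for(g in 1:G)
      for(u in 1:n_unique[g])
        target += freq[u, g] * poisson_inverse_gaussian_lpmf(unique_counts[u, g] | lambda[g], tau);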
I will definitely try to implement it. Before that, let me ask about a possible alternative.
Considering that a summation is once again introduced, and v can be pretty big (≈100 to 100,000+), the calculation would become pretty expensive.
I assume it is extremely impractical (and maybe stupid), but would it be possible to precompute the functions for all realistic values of v < 1,000,000, e.g. (from Wolfram Alpha: BesselK[100-0.5, z])?
The array of formulas (or a switch-like function) would be stored somewhere to avoid the summation loop and speed up the calculation to be comparable with neg_binomial_2.
Maybe a more reasonable alternative is to precompute just this part
The Matérn covariances use the Bessel function. Although having all the options would be nice, in practice it's good enough to test 1/2, 3/2, 5/2 and the squared exponential. I'm pretty sure that if you make it fully flexible you'll run into speed problems. That's why I decided at that time not to use Stan's Bessel function.
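For context (my addition, not part of the original reply): the half-integer orders 1/2, 3/2 and 5/2 have closed forms that avoid the Bessel function entirely, which is why they are the ones usually tested. For example, a Matérn 3/2 covariance can be written directly as:

functions {
  // Hypothetical Matern 3/2 covariance between two scalar inputs:
  // half-integer smoothness, so no Bessel function call is needed.
  real matern32_cov(real x1, real x2, real alpha, real rho) {
    real s = sqrt(3.0) * fabs(x1 - x2) / rho; // scaled distance
    return square(alpha) * (1 + s) * exp(-s);
  }
}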
Hello @bgoodri, I am not sure whether this is a feasible route, but I tried to approximate the summation part of that function by starting the summation from the ‘s’-th term (e.g., s = 960:1000 instead of 0:1000).
In Stan you can always take the Bayesian approach: sample from an inverse Gaussian and then use this as the input rate for the Poisson distribution. Then test it on your data and, if appropriate, optimize; otherwise move on. Just my 2 cents.
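A minimal sketch of that latent-variable route, assuming the (mean, shape) parameterisation of the inverse Gaussian and illustrated for a single vector of counts rather than the full N-by-G matrix; the names y, theta, mu, lam and the gamma priors are placeholders of my own, and the lpdf is written out by hand since Stan has no built-in inverse Gaussian:

functions {
  // Log-density of the inverse Gaussian with mean mu and shape lam
  real inverse_gaussian_lpdf(real x, real mu, real lam) {
    return 0.5 * (log(lam) - log(2 * pi()) - 3 * log(x))
           - lam * square(x - mu) / (2 * square(mu) * x);
  }
}
data {
  int<lower=0> N;
  int<lower=0> y[N];
}
parameters {
  real<lower=0> mu;         // mean of the mixing distribution
  real<lower=0> lam;        // shape of the mixing distribution
  vector<lower=0>[N] theta; // latent Poisson rates, one per observation
}
model {
  // Placeholder priors, for illustration only
  mu ~ gamma(3, 2);
  lam ~ gamma(3, 2);

  // Latent inverse-Gaussian rates, then a plain Poisson likelihood
  for(n in 1:N)
    theta[n] ~ inverse_gaussian(mu, lam);
  y ~ poisson(theta);
}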