My stan code is pasted below, but I am not sure if the vectorizations are correct. Could you kindly give me some suggestions on how to improve the vectorization? Thanks a lot!
data {
int<lower=1> I; // number of points in each group
int<lower=1> J; // number of groups
int<lower=1> K; // number of coefficients
int<lower=1> N;
int<lower=1> Y[N]; // count outcome
real<lower=1> X[K,N]; // attribute matrix
int<lower=1> group_id[N];
}
parameters {
real coeff[K, J];
real mu[K];
real<lower=0> sigma2[K];
}
transformed parameters {
real log_lambda[N];
for (i in 1:N) {
log_lambda[i] = dot_product(coeff[, group_id[i]], X[, i]);
}
}
model {
target += normal_lpdf(mu |0, 2);
target += inv_gamma_lpdf(sigma2 |1, 1);
for (k in 1:K) {
target += normal_lpdf(coeff[k, ] | mu[k], sqrt(sigma2[k]));
}
target += poisson_log_lpmf(Y|log_lambda);
}
Thanks. It works, but the chains of posterior samples of the hyper parameters are different from those from the un-vectorized version (The posterior of coefficients are the same, though). For example:
for (k in 1:K) {
target += normal_lpdf(coeff[k, ] | mu[k], sqrt(sigma2[k]));
}
VS
for (k in 1:K) {
for ( j in 1:J) {
target += normal_lpdf(coeff[k, j] | mu[k], sqrt(sigma2[k]));
}
}
I assume there should be not any different except for the sampling time, right?
I recommend negative binomial rather than Poisson. Poisson has that fixed variance which can make it hard to fit data and can lead to overfitting and slow mixing.
Hm, that does sound like somethingβs wrong. Sorry for my premature green-light; I tend to get confused when it comes to indexing and multi-dimensional arrays. Maybe @martinmodrak could take a look?
Hi, I found that the difference in posteior samples might not because of the vectorization. Even if I tried to sample with the same stan code (the vectorized version) multiple times, the return trace plots for the hyper paprameters were different!
Some variation is to be expected but were they notably different? Also I wouldnβt fix too much on the traceplots, but instead on the posterior intervals and diagnostics (Rhat, n_eff)β¦
Also if you just want to use this model or similar (and your goal is not learning Stan or moving from this model to some complex beast) and are working with R, you should be able to use the brms package which will do a lot of the hard work for youβ¦
Thanks! I will check out the package. The posteriors for the hyper parameters are considerably different and the mean values as well. However, the posteriors for the coefficients remain the same.
Sorry for the delay. So the trace plots look like the region with large sigma is problematic to explore. This looks like it might be a case of Nealβs funnel and needs non-centered parametrization (search the manual and/or forums) for more details. Also generally we recommend putting half-normal prior on the sigma parameter in hierarchical models, the inverse-gamma priors on variance are known to have some undesirable properties (can put a lot of prior mass on very large variance - if I remeber correctly).