Hi PyStan Experts,
I’m starting to explore PyStan for Bayesian regression modelling (with help from the book Bayesian Models for Astrophysical Data). In short, I’m assuming that the variables X1 and X2 are related to y, where y follows a negative binomial distribution. My problem is that I still haven’t figured out how to use the model I have already fitted to predict y for a new dataset in which only X1 and X2 are available. Do you have any suggestions on how to do this?
"""
data{
  int<lower=0> N;
  int<lower=1> K;
  matrix[N,K] X;
  int y[N];
}
parameters{
  vector[K] beta;
  real<lower=0> theta;
}
model{
  vector[N] mu;
  mu = exp(X*beta);
  theta ~ gamma(0.001, 0.001);
  for (i in 1:N)
    y[i] ~ neg_binomial_2(mu[i], theta);
}
generated quantities{
  real dispersion;
  vector[N] expY;
  vector[N] varY;
  vector[N] PRes;
  vector[N] mu2;
  vector[N] y_predict;
  mu2 = exp(X * beta);
  expY = mu2;
  for (j in 1:N){
    y_predict[j] = neg_binomial_2_rng(mu2[j], theta);
    varY[j] = mu2[j] + pow(mu2[j], 2) / theta;
    PRes[j] = pow((y[j] - expY[j]) / sqrt(varY[j]), 2);
  }
  dispersion = sum(PRes) / (N - (K + 1));
}
"""
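One common way to get predictions for rows without a known y is to pass the new predictor matrix as extra data and simulate the new counts in generated quantities, so y is only required for the training rows. The sketch below assumes PyStan 2.x and keeps the same likelihood and the same older array syntax (`int y[N]`) as the model above; the names `N_new`, `X_new`, `y_new`, `predictive_model_code` and the helper `make_stan_data` are hypothetical. X_new must have the same columns, in the same order, as X (including an intercept column if X has one). The dispersion diagnostics from the original generated quantities block can be kept alongside `y_new`; they are left out here for brevity.

```python
import numpy as np

# Hypothetical extended program: same likelihood as the model above, plus
# N_new/X_new in the data block so y is only needed for the training rows.
predictive_model_code = """
data {
  int<lower=0> N;            // training rows
  int<lower=1> K;            // number of predictors (columns of X)
  matrix[N, K] X;            // training design matrix
  int<lower=0> y[N];         // observed counts (training rows only)
  int<lower=0> N_new;        // rows to predict
  matrix[N_new, K] X_new;    // new design matrix, same columns as X
}
parameters {
  vector[K] beta;
  real<lower=0> theta;
}
model {
  theta ~ gamma(0.001, 0.001);
  y ~ neg_binomial_2(exp(X * beta), theta);
}
generated quantities {
  int y_new[N_new];          // posterior predictive counts for X_new
  {
    vector[N_new] mu_new = exp(X_new * beta);
    for (n in 1:N_new)
      y_new[n] = neg_binomial_2_rng(mu_new[n], theta);
  }
}
"""


def make_stan_data(X, y, X_new):
    """Build the data dict for predictive_model_code (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    y = np.asarray(y, dtype=float)
    if X_new.shape[1] != X.shape[1]:
        raise ValueError("X_new must have the same columns as X")
    # Stan refuses non-integer (or NaN) values for an int variable, which is
    # what the "int variable contained non-int values" error reports.
    if not np.array_equal(y, np.round(y)):
        raise ValueError("y must contain whole numbers only")
    return {"N": X.shape[0], "K": X.shape[1], "X": X, "y": y.astype(int),
            "N_new": X_new.shape[0], "X_new": X_new}

# With PyStan 2 (not run here):
# sm = pystan.StanModel(model_code=predictive_model_code)
# fit = sm.sampling(data=make_stan_data(X, y, X_new))
# y_new = fit.extract()["y_new"]   # shape (draws, N_new)
```

The trade-off of this design is that the model has to be refitted whenever a new X_new arrives, but the predictions then come straight out of the sampler with full parameter uncertainty.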
PS. I have tried to use the generated quantities block, but it obviously won’t work if the new dataset only contains X1 and X2 (since I don’t yet know the value of y at the prediction stage). Indeed, when I pass only X1 and X2, this error message appears:
‘RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int (in ‘unknown file name’ at line 6)’
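Alternatively, if refitting with the new rows is not wanted, the prediction can be done in Python from the posterior draws of the model that has already been fitted. The sketch below assumes PyStan 2.x, where `fit.extract()` returns a dict whose `beta` entry has shape (draws, K) and whose `theta` entry has shape (draws,); `predict_counts` is a hypothetical helper name. NumPy has no `neg_binomial_2`, so the sketch maps Stan's mean/dispersion parameterization onto NumPy's `negative_binomial(n, p)` with n = theta and p = theta / (theta + mu), which gives mean mu and variance mu + mu^2/theta, the same expression the model uses for `varY`.

```python
import numpy as np


def predict_counts(beta_draws, theta_draws, X_new, seed=None):
    """Posterior predictive draws of y for new rows X_new.

    beta_draws:  array (S, K), one row of coefficients per posterior draw
    theta_draws: array (S,), dispersion parameter per posterior draw
    X_new:       array (M, K), same columns (and order) as the training X
    Returns an int array of shape (S, M): one simulated y per draw and row.
    """
    rng = np.random.default_rng(seed)
    beta_draws = np.asarray(beta_draws, dtype=float)
    theta_draws = np.asarray(theta_draws, dtype=float)
    X_new = np.asarray(X_new, dtype=float)

    # Mean per draw and row, as in mu = exp(X * beta) in the Stan model
    mu = np.exp(beta_draws @ X_new.T)     # shape (S, M)
    theta = theta_draws[:, None]          # shape (S, 1), broadcasts over rows
    # Stan's neg_binomial_2(mu, theta) equals NumPy's
    # negative_binomial(n=theta, p=theta / (theta + mu))
    p = theta / (theta + mu)
    return rng.negative_binomial(theta, p)

# With an existing PyStan 2 fit (not run here):
# draws = fit.extract()
# y_pred = predict_counts(draws["beta"], draws["theta"], X_new)
# y_pred.mean(axis=0)   # posterior predictive mean for each new row
```

Because every posterior draw of beta and theta produces its own simulated y, the spread of `y_pred` along the first axis reflects both parameter uncertainty and the negative binomial noise.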
Thanks in advance.
Novia