Hello everyone,
I'm having some trouble understanding how to parametrize models, or at least I think that's the case. I'm working through some examples to get better, starting with examples from BDA3. I began with the kidney cancer example from section 2.7, where the idea is to estimate the cancer death rate in each individual county with an informative prior.
It is a Poisson model of the form
y_i\sim\mathrm{Poisson}(n_i\theta_i)
\theta_i\sim\mathrm{Gamma}(20,430000)
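Since the Gamma prior is conjugate to the Poisson likelihood, each county on its own has a closed-form posterior,
\theta_i \mid y_i \sim \mathrm{Gamma}(20 + y_i,\; 430000 + n_i)
so the individual problems are easy.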
I wanted to write a single model that handles all the counties simultaneously, my reasoning being that the components do not interact with each other (the model is not hierarchical; all parameters share the same prior with known hyperparameters), so a joint fit should behave like the individual fits.
I’ve created the following model:
data {
  int N;           // number of observations
  int y[N];        // observed cases at each observation
  vector[N] pop;   // population at each observation
}
parameters {
  vector[N] theta; // estimated death rate parameters
}
transformed parameters {
  vector[N] lambda = pop .* theta;
}
model {
  theta ~ gamma(20, 430000);
  y ~ poisson(lambda);
}
However, what surprised me is that as the number of components (i.e. N) grows, the problem becomes worse and worse conditioned: divergences start at N = 3, and at N = 4 there are a lot of them and the traceplots look weird. The data is not the problem, as individual models for every county fit without any issues.
I’ve shared my model, the relevant notebook, and the data on GitHub (in PyStan): https://github.com/jerzybaranowski/stan_public/tree/master/Cancer%20example
I’d be grateful if someone could explain why this goes wrong; I assume I’m not getting something about how such a model should be specified.
Well, apparently it is me being stupid. I had not constrained the parameters properly, so they wandered all over the place: without a <lower=0> bound the sampler can propose negative values of theta, where neither the Gamma prior nor the Poisson rate is defined.
The following model works without an issue:
data {
  int N;           // number of observations
  int y[N];        // observed cases at each observation
  vector[N] pop;   // population at each observation
}
parameters {
  vector<lower=0, upper=1>[N] theta; // estimated death rate parameters
}
transformed parameters {
  vector[N] lambda = pop .* theta;
}
model {
  theta ~ gamma(20, 430000);
  y ~ poisson(lambda);
}
EDIT: Probably no one will care, but I found one more thing and I am adding it for completeness. If this new model is run on the entire dataset, it also runs into problems. They come from the varying scale of the pop variable, which ranges from 50 to 8 million. If the whole range is covered in one fit there are divergences; splitting the dataset into a portion with pop up to 200,000 and a portion above that solves the problem.
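For completeness, one way to deal with that scale (just a sketch of the reparametrization idea, with illustrative names, not what my notebook does) is to model the rate per 100,000 people instead of per person. The prior and likelihood stay exactly the same, but theta ends up on roughly unit scale:

data {
  int N;
  int y[N];
  vector[N] pop;
}
transformed data {
  vector[N] pop_100k = pop / 100000;   // exposure in units of 100,000 people
}
parameters {
  vector<lower=0>[N] theta_100k;       // death rate per 100,000 people
}
model {
  theta_100k ~ gamma(20, 4.3);         // Gamma(20, 430000) on theta implies Gamma(20, 4.3) on 100000 * theta
  y ~ poisson(pop_100k .* theta_100k); // same likelihood: pop .* theta == pop_100k .* theta_100k
}

With parameters of order one instead of 1e-5, the default step-size adaptation should have an easier time.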
EDIT2: I’ve updated the Jupyter notebook. What is very surprising is that the fit is very sensitive to how the sample is split. Depending on where the population range is split, it can saturate the maximum tree depth or not, and the difference can be a single sample. The behavior also differs depending on whether the samples are sorted or not.
What if you were to log10 the pop variable before passing it to the model? Presumably, you’d have to also transform that informative prior somehow.
I eventually did use the log transform for the rate, and then almost all the ‘interesting’ things disappeared. I’ve finally decided to drop the example from my course materials completely because of that. Let me know if you are interested and I’ll make the relevant notebook public.
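In case it is useful to anyone, a log-scale version might look roughly like this (a sketch only, with illustrative names, not the exact model from the notebook; the explicit Jacobian term is needed because the Gamma prior is placed on exp(log_theta)):

data {
  int N;
  int y[N];
  vector[N] pop;
}
transformed data {
  vector[N] log_pop = log(pop);          // move the exposure to the log scale
}
parameters {
  vector[N] log_theta;                   // log of the per-person death rate
}
model {
  exp(log_theta) ~ gamma(20, 430000);    // same informative prior, on the original scale of theta
  target += sum(log_theta);              // Jacobian adjustment for the exp() change of variables
  y ~ poisson_log(log_theta + log_pop);  // log-link Poisson: rate = pop .* exp(log_theta)
}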