I am still slowly moving along learning PyStan and shifting our models over. I am trying to understand how the generated quantities work. I have a simple linear model of ocean fish catch plotted against ocean net primary production (NPP). Previous work shows that this can be modeled roughly with a simple linear model. I also want to see how well the model does. I added a generated quantities section to the model.
I log-transformed the data, which, now that I think about it, might be my issue with the model.
When I run fit.plot(['y_pred']) the values look weird to me. They range from 4 to 6, while the original data goes from -8 to 2.
@HeltonMaciel Thanks! I think I have it working. If I understand correctly, I was only generating one y for a given x. With the for loop I get all the y's for a given x.
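For anyone following along, here is a minimal pure-Python sketch of what a loop in generated quantities does: for every posterior draw it produces one predicted y per observed x. The draws, the x values, and the name `posterior_predictive` are invented for illustration; this only mimics the idea and is not the Stan model from this thread.

```python
import random

def posterior_predictive(draws, x, seed=0):
    """Return one row of predictions per posterior draw, one column per x.

    draws: list of (alpha, beta, sigma) tuples (hypothetical posterior draws).
    """
    rng = random.Random(seed)
    y_pred = []
    for alpha, beta, sigma in draws:
        # Same spirit as a Stan loop that fills y_pred[n] with
        # normal_rng(alpha + beta * x[n], sigma) for each n.
        row = [rng.gauss(alpha + beta * x_n, sigma) for x_n in x]
        y_pred.append(row)
    return y_pred

# Hypothetical values, not taken from the thread's data.
draws = [(0.5, 1.0, 0.3), (0.6, 0.9, 0.25), (0.4, 1.1, 0.35)]
x = [0.0, 0.5, 1.0, 1.5]
y_pred = posterior_predictive(draws, x)
print(len(y_pred), len(y_pred[0]))  # number of draws, number of observations
```

Without the loop over x, each draw would yield a single predicted value instead of a full vector, which seems to match what I was seeing before.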
I also went back to just a simple linear model and then built up the model to a generated quantity model.
In the raw data, y goes from 0.000305 to 9.3 (mean ten-year fish catch, tons per km2).
The mean of y_pred only covers part of this range: 0.39 to 2.63. The max of y_pred covers the upper range, but the min of y_pred goes negative. I found that weird, since you can't have a negative tonnage.
This leaves me wondering whether I set up the model or priors incorrectly, or whether the original data is just not that great.
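One possible reason for the negative values: a normal predictive distribution on the raw scale puts mass on the whole real line, so draws around a small mean can fall below zero. The sketch below compares that with exponentiating draws made on the log scale. The mean 0.39 is the low end of the y_pred means quoted above; sigma = 0.5 is an assumed value chosen only for illustration, not something fitted to the data.

```python
import math
import random

rng = random.Random(1)
mu, sigma = 0.39, 0.5  # sigma is an assumption, not a fitted value
n = 10_000

# Raw-scale normal draws: nothing stops these from being negative.
raw = [rng.gauss(mu, sigma) for _ in range(n)]

# Log-scale draws, back-transformed: exp() is always positive.
back = [math.exp(rng.gauss(math.log(mu), sigma)) for _ in range(n)]

print("share of negative raw-scale draws:", sum(v < 0 for v in raw) / n)
print("min of back-transformed draws:", min(back))
```

If that is what is happening, the negative draws would be a property of the likelihood on the raw scale rather than a coding error.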
That’s why I was suggesting simulating data from the model, then fitting it. Then the y_pred should look right. It’s the only way to separate model misspecification from implementation issues.
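A rough sketch of that workflow in plain Python: pick "true" parameter values, simulate y from the linear model, fit, and check that the estimates land near the truth. Ordinary least squares is used here only as a stand-in for the Stan fit, and all parameter values and function names are hypothetical.

```python
import random

def simulate(alpha, beta, sigma, x, rng):
    """Draw y from the linear model y = alpha + beta * x + normal noise."""
    return [rng.gauss(alpha + beta * x_n, sigma) for x_n in x]

def ols_fit(x, y):
    """Least-squares intercept and slope (a stand-in for the Stan fit)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat

rng = random.Random(42)
true_alpha, true_beta, true_sigma = -1.0, 0.8, 0.5  # hypothetical "true" values
x = [rng.uniform(0.0, 5.0) for _ in range(500)]
y = simulate(true_alpha, true_beta, true_sigma, x, rng)

alpha_hat, beta_hat = ols_fit(x, y)
print(alpha_hat, beta_hat)  # should be close to true_alpha and true_beta
```

If the fit recovers the known values on simulated data but the predictions still look off on the real data, the problem is more likely the model's assumptions than the code.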
Ok, cool. I think I understand all this. With the model above, the y_tilde mean ranges from ~0.4 to ~2.8. There are still negative numbers in the y_tilde traceplot.