Hi Professor. Thank you for your response.
In my data, the response variable is the number of cases of three different conditions (normal weight, overweight, and obesity) in 470 geographical units (gus). For each of these units I have eight covariates, such as % unemployment, average years of schooling, etc. Our interest is only to estimate the effect of each of these covariates on the conditions. As you said, in some gus we have 0 cases of obesity, for example, and the total number of cases (the sum of cases across the three conditions) varies from 4 to 6102.
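For reference, this is roughly how I am setting up the model (just a stripped-down sketch, assuming brms with a multinomial likelihood and a CAR term; the covariate and column names are placeholders, W stands for the gu adjacency matrix, and I am not sure the car() term is even accepted with this family):

```r
library(brms)

# counts per gu for the three conditions, bound into a matrix response
dat$y     <- with(dat, cbind(normal, overweight, obese))
dat$total <- rowSums(dat$y)

fit <- brm(
  y | trials(total) ~ unemployment + schooling +   # ...plus the other covariates
    car(W, gr = gu),                               # spatial CAR term over the gus
  data   = dat,
  data2  = list(W = W),                            # gu adjacency matrix
  family = multinomial()
)
```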
We think that variable selection must be done, so that we use and explain only the variables that are really needed. I have already read some of your articles and, apparently, using CV or information criteria is not a good choice for this purpose, but I'm not clear about the alternatives. If I stick to the path recommended by the Stan warnings, I get stuck at K-fold CV, because CAR models are not supported (the model cannot handle new locations), and mi(), the term that handles missing data, is only available for continuous response variables. An interesting alternative that I found is this one:
to achieve something like this:
assigning 0s and 1s at random to the weights variable instead of making the folds. However, I'm still working on it.
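Concretely, I mean something along these lines, continuing from the sketch above (untested, and assuming brms accepts weights() together with trials() for this family):

```r
set.seed(1)
# give ~10% of the gus weight 0: they drop out of the likelihood but keep
# their CAR effects and posterior predictions, so no genuinely new
# locations are ever needed
dat$w <- rbinom(nrow(dat), size = 1, prob = 0.9)

fit_w <- brm(
  y | trials(total) + weights(w) ~ unemployment + schooling + car(W, gr = gu),
  data   = dat,
  data2  = list(W = W),
  family = multinomial()
)

# posterior predictions for the weight-0 gus, to compare with their observed counts
yrep <- posterior_predict(fit_w)   # draws x gus x categories
held <- which(dat$w == 0)
yrep_held <- yrep[, held, ]
```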
About the posterior predictive checks: if I use the regular command for a continuous response, they look "too good to be true" nice, which I think supports your overfitting statement.
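For completeness, this is the kind of check I mean, again continuing from the sketch above (and assuming posterior_predict() returns a draws x gus x categories array named after the count columns):

```r
# the regular check that looks "too good to be true"
pp_check(fit, ndraws = 100)

# a more targeted check: proportion of gus with zero obesity cases,
# replicated vs. observed
yrep <- posterior_predict(fit)                 # draws x gus x categories
prop_zero_rep <- apply(yrep[, , "obese"], 1, function(x) mean(x == 0))
hist(prop_zero_rep, main = "P(zero obesity cases) in replicated data")
abline(v = mean(dat$obese == 0), col = "red")  # observed proportion
```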
Any advice? Besides the model selection issue, do all of these problems mean that the model is bad or invalid?
Thank you again for your time.