Hi all,
I’m struggling with a model that uses one-dimensional random walks to produce time series, one for each region in the world. These time series are then used to predict/model public opinion measures from those regions in different years.
My problem is with the variability of the time series, i.e. the sizes of the “steps” in the random walk. Currently, I’m using a non-centered parameterization to govern this. Here are some snippets:
model{
region_effect_raw[,1] ~ normal(0, region_sigma);
for(t in 2:TT){
region_effect_raw[,t] ~ normal(0, 1);
}
}
transformed parameters {
for(r in 1:R){
region_effect[r,] = cumulative_sum(region_effect_raw[r,]) * region_step;
}
}
So first I extract the individual “steps” (called region_effect_raw) in the random walk for each region from a standard normal, and then those are transformed into one random walk per region by (1) taking their cumulative sum, and (2) multiplying that by the “region_step” hyperparameter.
As you can see, the “region_step” parameter is there to set a prior on the sizes of these steps, and as far as I can tell this should be equivalent to:
model{
region_effect_raw[,1] ~ normal(0, region_sigma);
for(t in 2:TT){
region_effect_raw[,t] ~ normal(0, region_step);
}
}
transformed parameters {
for(r in 1:R){
region_effect[r,] = cumulative_sum(region_effect_raw[r,]);
}
}
Now, on to the problem. I have a very tight prior on “region_step”, because I don’t think regions will vary all that much over time (I’m more interested in extracting stable differences between regions). But no matter how tight I make this prior, it seems to have no influence on the posterior. Instead, the model picks a value that way out on the right tail of the prior, and thereby insists on allowing the regions to take huge steps from year to year. For example, if I set the following prior, which reflects my actual prior beliefs nicely:
model{
region_step ~ gamma(1, 16); //...sd of region random walks
//note! second parameter is rate, not scale, so the larger this parameter, the smaller the mean and variance
}
after sampling with god convergence I get a posterior mean of 0.244, which is wildly unlikely according to the prior. At the same time, the “region_effect_raw” parameters themselves seem to partly compensate for this by having posterior means that have a standard deviation of only 0.53, instead of an sd of 1 (my expectation).
Still, it’s very hard to imagine there would be enough year-to-year variability in the public opinion data to justify as much change over time as the model is now allowing. As a result, the time series get tossed around much more than I would like. An annoying side effect is also that the posteriors on “region_effect” become enormously broad in years where there is no data, because the model is like “omg, anything could have been happening in those years, since regions can deviate hugely from their value just one period prior!”.
I have tried tightening and tightening the “region_step” prior, but no matter what prior I set, the posterior is pretty much the same. I also tried versions of the model where “region_step” takes on a fixed value instead of being a hyperparameter. The problem there is that convergence depends totally on the value that I pick, with only one value I’ve tried leading to convergence and others to total non-convergence. It also feels a bit wrong to put a hard value on this parameter; the truth is that I of course don’t have that strong of a prior belief about it.
So, my questions are:
- do you see any problems with my logic/code above?
- any clue why the mode would overdo it with the size of “region_step” but then compensate with the sizes of “region_effect_raw”?
- is there another way out than fixing the value of “region_step”, and going with whatever value leads to convergence?