Hyperparameter: prior gets ignored, but setting a hard value leads to non-convergence

Hi all,

I’m struggling with a model that uses one-dimensional random walks to produce time series, one for each region in the world. These time series are then used to predict/model public opinion measures from those regions in different years.

My problem is with the variability of the time series, i.e. the sizes of the “steps” in the random walk. Currently, I’m using a non-centered parameterization to govern this. Here are some snippets:

model {
  region_effect_raw[, 1] ~ normal(0, region_sigma); // initial values
  for (t in 2:TT) {
    region_effect_raw[, t] ~ normal(0, 1); // standardized "steps"
  }
}

transformed parameters {
  for (r in 1:R) {
    region_effect[r, ] = cumulative_sum(region_effect_raw[r, ]) * region_step; // rescale to the actual step size
  }
}

So first I draw the individual “steps” (called region_effect_raw) in the random walk for each region from a standard normal, and then those are transformed into one random walk per region by (1) taking their cumulative sum, and (2) multiplying that by the “region_step” hyperparameter.
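For completeness, the declarations these snippets presuppose are roughly the following (a simplified sketch; R is the number of regions, TT the number of time points, and region_effect is declared as a matrix[R, TT] in transformed parameters):

parameters {
  matrix[R, TT] region_effect_raw; // standardized random-walk steps, one row per region
  real<lower=0> region_step;       // scale of the steps (the hyperparameter in question)
  real<lower=0> region_sigma;      // scale of the initial values
}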

As you can see, the “region_step” parameter is there so that I can put a prior on the sizes of these steps, and as far as I can tell this should be equivalent to:

model {
  region_effect_raw[, 1] ~ normal(0, region_sigma);
  for (t in 2:TT) {
    region_effect_raw[, t] ~ normal(0, region_step);
  }
}

transformed parameters {
  for (r in 1:R) {
    region_effect[r, ] = cumulative_sum(region_effect_raw[r, ]);
  }
}
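To spell out why I think the two versions are (essentially) equivalent: in the first version, for t >= 2,

  region_effect[r, t] - region_effect[r, t - 1]
    = region_step * region_effect_raw[r, t]   // taking differences of the cumulative sum leaves only the t-th raw step
    ~ normal(0, region_step)                  // since region_effect_raw[r, t] ~ normal(0, 1)

which is exactly the distribution the second version puts on the steps directly. The one difference is the initial value, which in the first version also gets scaled by region_step.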

Now, on to the problem. I have a very tight prior on “region_step”, because I don’t think regions will vary all that much over time (I’m more interested in extracting stable differences between regions). But no matter how tight I make this prior, it seems to have no influence on the posterior. Instead, the model picks a value that is way out on the right tail of the prior, and thereby insists on allowing the regions to take huge steps from year to year. For example, if I set the following prior, which reflects my actual prior beliefs nicely:

model {
  region_step ~ gamma(1, 16); // sd of the region random-walk steps
  // note! the second parameter is the rate, not the scale, so the larger it is,
  // the smaller the prior mean and variance
}
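For reference, gamma(1, 16) is just an exponential distribution with rate 16, so under this prior

  E[region_step] = sd(region_step) = 1/16 ≈ 0.06,   P(region_step > x) = exp(-16 x)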

after sampling with good convergence I get a posterior mean of 0.244, which is wildly unlikely according to the prior. At the same time, the “region_effect_raw” parameters themselves seem to partly compensate for this: their posterior means have a standard deviation of only 0.53, instead of the sd of 1 I would expect.

Still, it’s very hard to imagine there would be enough year-to-year variability in the public opinion data to justify as much change over time as the model is now allowing. As a result, the time series get tossed around much more than I would like. An annoying side effect is that the posteriors on “region_effect” become enormously broad in years where there is no data, because the model is like “omg, anything could have been happening in those years, since regions can deviate hugely from their value just one period prior!”.


I have tried tightening and tightening the “region_step” prior, but no matter what prior I set, the posterior is pretty much the same. I also tried versions of the model where “region_step” takes on a fixed value instead of being a hyperparameter. The problem there is that convergence depends totally on the value that I pick, with only one value I’ve tried leading to convergence and others to total non-convergence. It also feels a bit wrong to put a hard value on this parameter; the truth is that I of course don’t have that strong of a prior belief about it.

So, my questions are:

  • do you see any problems with my logic/code above?
  • any clue why the model would overdo it with the size of “region_step” but then compensate with the sizes of “region_effect_raw”?
  • is there another way out than fixing the value of “region_step”, and going with whatever value leads to convergence?

Hi,
it is always possible that there is some bug somewhere in your model. The snippets you shared appear OK to me. If the problem is not a bug, then it is possible that you are seeing a prior-data conflict. If you have a lot of data, and the data are consistent only with large differences between successive timepoints (and thus a large region_step), it will overwhelm even a very narrow prior. The prior you shared is also not that narrow: a priori, there is ~2% probability for region_step > 0.244, so a modest amount of data could plausibly steer the model into that region.

So I would suggest inspecting your data to see how much year-to-year variability there actually is. Also note that if your data don’t really let you discern the random walk variability from the residuals, you could have a highly correlated posterior between the residual and region_step which would then also manifest as a wide posterior for region_step (and possibly overwhelm your prior).

Alternatively, you could see if posterior predictive checks from the fit with large region_step indicate problems.
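As a very rough sketch of what I mean by a posterior predictive check (this assumes, purely for illustration, a Gaussian observation model and hypothetical names y, region_idx, year_idx, sigma_y and N; substitute whatever your actual likelihood and data structures are):

generated quantities {
  vector[N] y_rep; // replicated data for posterior predictive checks
  for (n in 1:N) {
    // hypothetical Gaussian observation model; replace with your actual likelihood
    y_rep[n] = normal_rng(region_effect[region_idx[n], year_idx[n]], sigma_y);
  }
}

Comparing how much y_rep moves from year to year with how much the observed y moves should tell you whether the large region_step is producing implausible amounts of change.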

If you confirm that you are seeing a prior-data conflict, then it might be an indication that the model needs to be modified in some way.

Does that make sense?

Best of luck with your model!


Thanks so much! Your answer makes sense to me, except this part:

Also note that if your data don’t really let you discern the random walk variability from the residuals, you could have a highly correlated posterior between the residual and region_step which would then also manifest as a wide posterior for region_step (and possibly overwhelm your prior).

Are you saying that region_effect_raw and region_step will have highly correlated posteriors if region_step is a quantity that can’t really be estimated well from the data? So instead of being tethered to the data, region_step will have a broad posterior which in turn will make the posterior of region_effect broad as well?

Otherwise, you are right that it is possible the data really are overwhelming the prior by this much. It’s tricky to verify because the quantity that’s taking this random walk is a latent variable (it’s the ability parameter in an IRT model), so it’s hard to compare levels of variability between it and the data.

We are indeed using posterior predictive checks to check the reasonableness of the models.

In the meantime, I also experimented with tweaking other parts of the model, because I know non-convergence can of course be contagious across different parameters. I was able to improve convergence under other fixed values of region_step by quite a bit. It would be a little out of scope here to explain how, since it involves other parameters that I haven’t talked about, but if anyone is interested I can give it a go.