Homogeneity of variance, also called homoscedasticity, is a major modelling assumption. When using maximum likelihood it is typically checked with residual plots, i.e. it is assessed from the data. If the data are heteroscedastic, the variance is typically modelled with a function of its own, in addition to the function modelling the mean.
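To make sure we are talking about the same thing, by modelling the variance with a function I mean something like a location-scale model; in rough notation for a single predictor (my own sketch, not from any particular source):

$$
y_i \sim \mathcal{N}(\mu_i, \sigma_i), \qquad \mu_i = \alpha + \beta x_i, \qquad \log \sigma_i = \gamma + \delta x_i .
$$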
Since transitioning to Bayesian statistics a few years ago I have not thought about homoscedasticity, until now. The reason is probably that, like many people, I learned using Statistical Rethinking, and as far as I recall homoscedasticity is assumed in all of its examples.
Intuitively I'd assume the choice to assume homoscedasticity or model heteroscedasticity is made a priori in Bayesian statistics. Is this assumption correct? Why have I not come across many models that model sigma alongside mu, when heteroscedasticity is so common?
I don't know what field you work in, but my guess is that this is highly dependent on the field. In my field (linguistics), most people don't receive enough up-to-date methods training to know that location-scale models are even an option (regardless of whether you're running Bayesian or MLE models).
That said, I'm starting to see them more and more. I used them in my dissertation (published 2024), and there are other papers I've seen, mostly from the last few years, that are interested in modelling sigma as well as mu.
These same residual plots could be used as a retrodictive (aka posterior predictive) check. That is exactly the idea of retrodictive checking: compare summaries of posterior model predictions to the same summaries of the observations. In the Bayesian modeling framework, these plots (possibly) inform substantive changes to your model. These changes aren't based on curve fitting - i.e. tuning your model for the simple goal of better fitting the observed data - but rather on "going back to the drawing board" and thinking about how the generative process might be different from what you originally thought. If the checks reveal heteroscedasticity, then that would be something to consider in an updated model.
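As a toy sketch of what that can look like in practice (placeholder names and priors, not from any particular case study): fit the homoscedastic regression but simulate replicated data and residuals in generated quantities, then look at them across posterior draws.

```stan
// Toy homoscedastic regression with replicated data for
// retrodictive (posterior predictive) checks.
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;                 // single scale: homogeneous variance
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 2);
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N] resid = y - (alpha + beta * x);   // residuals, one posterior draw at a time
  vector[N] y_rep;                            // replicated data
  for (n in 1:N)
    y_rep[n] = normal_rng(alpha + beta * x[n], sigma);
}
```

Plotting resid (or comparing y to y_rep) against x or against the fitted values over many posterior draws is essentially the Bayesian analogue of a residual plot; a fan shape would suggest adding a model for sigma.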
Not in the framework that I use. As I wrote above, a modeling workflow should involve model critique. A good description of such a workflow is Towards A Principled Bayesian Workflow.
I'm not sure why you haven't come across this, but even Bayesian regression tools like brms make this easy; see the introduction and first example in the vignette Estimating Distributional Models with brms.
My advice would be to build the model that you think best captures the data generating process, starting simple and building up as needed. There is a nice analogy to storytelling here: (What's the Probabilistic Story) Modeling Glory?
You'll also find heterogeneity of variance in change point models and hidden Markov models (also in the User's Guide). In general, state space models used as filters (e.g., Kalman filters) will model the variance dynamically.
One way to fudge it in a time series is to use a heavy-tailed innovation distribution like a Student-t. That allows sudden jumps, which can handle periods of increased volatility better than thin-tailed models.
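A minimal sketch of that idea (just a toy, not from the User's Guide) is a random walk with Student-t innovations:

```stan
// Toy random walk with heavy-tailed (Student-t) innovations;
// small nu permits occasional large jumps that absorb bursts of volatility.
data {
  int<lower=2> T;
  vector[T] y;
}
parameters {
  real<lower=0> sigma;   // innovation scale
  real<lower=1> nu;      // degrees of freedom; smaller nu = heavier tails
}
model {
  sigma ~ normal(0, 1);
  nu ~ gamma(2, 0.1);
  y[2:T] ~ student_t(nu, y[1:(T - 1)], sigma);
}
```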
You also commonly see this in Poisson regressions: the Poisson by itself has variance equal to its mean, so any regression involving a Poisson process will also have varying variance.
Another way that heteroscedasticity gets added is with random effects. For example, in a Poisson model there might be a gamma-distributed varying effect on the rate. That essentially allows the variance to vary, because the variance is equal to the mean in a Poisson (with just a gamma-distributed varying effect you get a negative binomial, but the point is that you usually have other effects and the extra negative binomial dispersion is there just to deal with varying variance).
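A minimal sketch of that construction (placeholder names, just a toy):

```stan
// Toy Poisson regression with a gamma-distributed multiplicative effect
// on the rate; marginalizing out u gives a negative binomial,
// i.e. y ~ neg_binomial_2_log(alpha + beta * x, phi).
data {
  int<lower=1> N;
  vector[N] x;
  array[N] int<lower=0> y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> phi;       // smaller phi = more overdispersion
  vector<lower=0>[N] u;    // gamma-distributed varying effect
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  phi ~ exponential(1);
  u ~ gamma(phi, phi);     // mean 1, variance 1 / phi
  y ~ poisson_log(alpha + beta * x + log(u));
}
```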
You'll also see it show up in Gaussian processes. They have covariance that gets fit to the data, which controls how much things can vary around the function being fit.
One obstacle to fitting more complex models is that you need more data to identify them. That's usually because there are now two explanations for a large value: either the trend is going up or the volatility is increasing. Whenever that happens, the model can struggle to fit without stronger priors or lots of cross-cutting effects that pin down the means and let the variances vary.
Thanks for your replies. I really appreciate the insight.
@jd_c I agree that Bayesian workflow is iterative in practice, but the decision to model variance is also a pretty strong type of prior, hence my hesitancy.
Could you please share an example of how you use residual plots in the Bayesian framework? I have never come across this.
For me it often seems reasonable to assume heteroscedasticity when I think of the data generation process, but it is usually much more "expensive". So this pretty fundamental decision on whether or not to model heteroscedasticity already involves a giant leap from simple to complex.
Are you saying to only model heteroscedasticity when I can "afford" to do so given my data and priors, and otherwise stick with assuming homoscedasticity?
@Bob_Carpenter I had a look at the linked sections of the Stan User's Guide, but I am talking about much simpler models here, so I am unsure how applicable these examples are. I may be misreading the models since they are a bit complex for me, but it seems that sigma is usually declared as real<lower=0>, which in my mind equates to assuming homogeneity of variance. sigma would have to be a vector that is either stratified by some categorical variable or modelled as a function of some continuous variable to fulfil my expectation of modelling heteroscedasticity.
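To make concrete what I mean, here is a toy sketch of my own (not taken from the User's Guide), where sigma is a vector indexed by a categorical variable rather than a single real<lower=0>:

```stan
// Toy model with a separate residual scale per group
// instead of a single shared sigma.
data {
  int<lower=1> N;
  int<lower=1> J;                      // number of groups
  array[N] int<lower=1, upper=J> g;    // group membership
  vector[N] y;
}
parameters {
  vector[J] mu;
  vector<lower=0>[J] sigma;            // one sigma per group
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);                // half-normal via the lower bound
  y ~ normal(mu[g], sigma[g]);
}
```

The continuous version would replace sigma[g] with something like exp(gamma + delta * x).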
This makes sense to me because the Poisson only has one likelihood parameter. But what about other likelihoods commonly used in generalised linear models, such as gamma or log-normal? Here the mean and variance are not independent. Is it overkill to model mean and variance separately with such likelihoods?
Are you saying that in a simple hierarchical model, where different amounts of uncertainty flow into different groups, there is no need to stratify sigma by group, because the heteroscedasticity between groups is already accounted for by the hyperparameter uncertainty?
This links to my final question to @jd_c above. And you're right to point out that this type of non-identifiability in the likelihood is not just an issue in the case of gamma or log-normal likelihoods, where mean and variance are directly dependent. If data are sparse and priors are wide, it could also be an issue with a normal likelihood.
I guess I am looking for a rule of thumb when it comes to deciding whether or not to model heteroscedasticity. Given that it is almost always the most generative/realistic way to model, at least in the examples I work on, does it come down to whether or not I can "afford" it given my data and priors? Is avoiding this complexity and the likely convergence issues why the default assumption I have seen in Bayesian models is homogeneity of variance?
If heteroscedasticity is part of the known process that generates the data, then ideally you should model it. If this is difficult/expensive or you lack data, then you would start with a simpler homogeneous-variance model and use retrodictive checking to see if there is any retrodictive tension between the posterior predictions and the observed data that would warrant including heterogeneous variance. If the checks do indicate this, then you could subsequently include it in the updated model and re-check. In my experience, too little data to model a known behavior often just means that there is not enough resolution to distinguish the behavior in a simplified model. In that case, the retrodictive checks for the homogeneous model would look fine, even though the known process was heterogeneous. Ultimately one can never reach the true generative process with finite knowledge and data, so it's only about modeling at the resolution (knowledge and data) that you have. If you have good prior information on the heterogeneous behavior, then you can include that even if your retrodictive checks of the homogeneous model don't warrant it; you just won't learn much about it from the data.
Note that there is not the retrodictive check but only retrodictive checks. These checks can and should be as bespoke as the model is. They could include residuals if you wanted. While handy software packages provide a suite of them, that is by no means the only kind. Imagine that you're a carpenter and want to check a board for straightness - you would need to turn it and view it along different planes in all three dimensions. The same is true of model checking. You can't check everything about a more complex model with a single visual, so you might come up with many different kinds of summaries to check different aspects. In my opinion, Betancourt provides the best examples of this in his various case studies, e.g. Bay(es) Window, Die Another Bayes, or Randomized Time Trial. I'm sure others do the same, but his are the best examples of bespoke checks that I have seen and remember off the top of my head.
From my understanding, the idea of retrodictive checks (aka posterior predictive checks) came from residual analysis. I could be wrong about the origins, but it is pretty easy to see how related they are. I don't have time to work up an example, but if you look at section 1.4.3 of the case study that I linked above, Towards A Principled Bayesian Workflow, he describes the relationship between the two. As mentioned above, you could easily create a retrodictive check of residuals if you wanted. I think Betancourt uses one in the Randomized Time Trial study above, in models 3 and 4, for some particular aspects of the model that he is checking. His own visualization software suite has options for residuals in some of the functions he uses for retrodictive checks.
I bet others could chime in and reference other people who have done the same, or maybe provide older references on the origins, but these are the ones I remember off the top of my head that provide demonstrations and use residuals.