Homogeneity of variance, also called homoscedasticity, is a major modelling assumption. When using maximum likelihood it is typically checked with residual plots, i.e. it is assessed from the data. If the data are heteroscedastic, the variance is typically modelled with a function of its own, in addition to the function modelling the mean.
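To make sure we are talking about the same thing, by modelling the variance with a function I mean something like a location-scale model; in rough notation for a single predictor (my own sketch, not from any particular source):

$$
y_i \sim \mathcal{N}(\mu_i, \sigma_i), \qquad \mu_i = \alpha + \beta x_i, \qquad \log \sigma_i = \gamma + \delta x_i .
$$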
Since transitioning to Bayesian statistics a few years ago I have not thought about homoscedasticity, until now. The reason is probably that, like many people, I learned using Statistical Rethinking, and as far as I recall homoscedasticity is assumed in all of its examples.
Intuitively I'd assume the choice to assume homoscedasticity or model heteroscedasticity is made a priori in Bayesian statistics. Is this assumption correct? Why have I not come across many models that model sigma alongside mu, when heteroscedasticity is so common?
I don't know what field you work in, but my guess is that this is highly dependent on the field. In my field (linguistics), most people don't receive enough up-to-date methods training to know that location-scale models are even an option (regardless of whether you're running Bayesian or MLE models).
That said, I'm starting to see them more and more. I used them in my dissertation (published 2024), and there are other papers I've seen, mostly from the last few years, that are interested in modelling sigma as well as mu.
These same residual plots could be used as a retrodictive (aka posterior predictive) check. That is exactly the idea of retrodictive checking: compare summaries of posterior model predictions to the same summaries of the observations. In the Bayesian modeling framework, these plots (possibly) inform substantive changes to your model. These changes aren't based on curve fitting - i.e. tuning your model for the simple goal of better fitting the observed data - but rather on "going back to the drawing board" and thinking about how the generative process might be different from what you originally thought. If the checks reveal heteroscedasticity, then that would be something to consider in an updated model.
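As a toy sketch of what that can look like in practice (placeholder names and priors, not from any particular case study): fit the homoscedastic regression but simulate replicated data and residuals in generated quantities, then look at them across posterior draws.

```stan
// Toy homoscedastic regression with replicated data for
// retrodictive (posterior predictive) checks.
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;                 // single scale: homogeneous variance
}
model {
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ normal(0, 2);
  y ~ normal(alpha + beta * x, sigma);
}
generated quantities {
  vector[N] resid = y - (alpha + beta * x);   // residuals, one posterior draw at a time
  vector[N] y_rep;                            // replicated data
  for (n in 1:N)
    y_rep[n] = normal_rng(alpha + beta * x[n], sigma);
}
```

Plotting resid (or comparing y to y_rep) against x or against the fitted values over many posterior draws is essentially the Bayesian analogue of a residual plot; a fan shape would suggest adding a model for sigma.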
Not in the framework that I use. As I wrote above, a modeling workflow should involve model critique. A good description of such a workflow is Towards A Principled Bayesian Workflow.
I'm not sure why you haven't come across this, but even Bayesian regression tools like brms make this easy; see the introduction and first example in the vignette Estimating Distributional Models with brms.
My advice would be to build the model that you think best captures the data generating process, starting simple and building up as needed. There is a nice analogy to storytelling here: (What's the Probabilistic Story) Modeling Glory?
You'll also find heterogeneity of variance in change point models and hidden Markov models (also in the User's Guide). In general, state space models used as filters (e.g., Kalman filters) will model the variance dynamically.
One way to fudge it in a time series is to use a heavy-tailed innovation distribution like a Student-t. That allows sudden jumps, which can handle periods of increased volatility better than thin-tailed models.
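A minimal sketch of that idea (just a toy, not from the User's Guide) is a random walk with Student-t innovations:

```stan
// Toy random walk with heavy-tailed (Student-t) innovations;
// small nu permits occasional large jumps that absorb bursts of volatility.
data {
  int<lower=2> T;
  vector[T] y;
}
parameters {
  real<lower=0> sigma;   // innovation scale
  real<lower=1> nu;      // degrees of freedom; smaller nu = heavier tails
}
model {
  sigma ~ normal(0, 1);
  nu ~ gamma(2, 0.1);
  y[2:T] ~ student_t(nu, y[1:(T - 1)], sigma);
}
```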
You also commonly see this in Poisson regressions: the Poisson by itself has variance equal to its mean, so any regression involving a Poisson process will also have varying variance.
Another way that heteroscedasticity gets added is with random effects. For example, in a Poisson model there might be a gamma-distributed varying effect on the rate. That essentially allows the variance to vary, because the variance is equal to the mean in a Poisson (with just a gamma-distributed varying effect you get a negative binomial, but the point is that you usually have other effects and the extra negative binomial dispersion is there just to deal with varying variance).
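A minimal sketch of that construction (placeholder names, just a toy):

```stan
// Toy Poisson regression with a gamma-distributed multiplicative effect
// on the rate; marginalizing out u gives a negative binomial,
// i.e. y ~ neg_binomial_2_log(alpha + beta * x, phi).
data {
  int<lower=1> N;
  vector[N] x;
  array[N] int<lower=0> y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> phi;       // smaller phi = more overdispersion
  vector<lower=0>[N] u;    // gamma-distributed varying effect
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  phi ~ exponential(1);
  u ~ gamma(phi, phi);     // mean 1, variance 1 / phi
  y ~ poisson_log(alpha + beta * x + log(u));
}
```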
You'll also see it show up in Gaussian processes. They have covariance that gets fit to the data, which controls how much things can vary around the function being fit.
One obstacle to fitting more complex models is that you need more data to identify them. That's usually because there are now two explanations for a large value: either the trend is going up or the volatility is increasing. Whenever that happens, the model can struggle to fit without stronger priors or lots of cross-cutting effects that pin down the means and let the variances vary.
Thanks for your replies. I really appreciate the insight.
@jd_c I agree that Bayesian workflow is iterative in practice, but the decision to model variance is also a pretty strong type of prior, hence my hesitancy.
Could you please share an example of how you use residual plots in the Bayesian framework? I have never come across this.
For me it often seems reasonable to assume heteroscedasticity when I think of the data generation process, but it is usually much more "expensive". So this pretty fundamental decision on whether or not to model heteroscedasticity already involves a giant leap from simple to complex.
Are you saying to only model heteroscedasticity when I can "afford" to do so given my data and priors, and otherwise stick with assuming homoscedasticity?
@Bob_Carpenter I had a look at the linked sections of the Stan User's Guide, but I am talking about much simpler models here, so I am unsure how applicable these examples are. I may be misreading the models since they are a bit complex for me, but it seems that sigma is usually declared as real<lower=0>, which in my mind equates to assuming homogeneity of variance. sigma would have to be a vector that is either stratified by some categorical variable or modelled as a function of some continuous variable to fulfil my expectation of modelling heteroscedasticity.
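To make concrete what I mean, here is a toy sketch of my own (not taken from the User's Guide), where sigma is a vector indexed by a categorical variable rather than a single real<lower=0>:

```stan
// Toy model with a separate residual scale per group
// instead of a single shared sigma.
data {
  int<lower=1> N;
  int<lower=1> J;                      // number of groups
  array[N] int<lower=1, upper=J> g;    // group membership
  vector[N] y;
}
parameters {
  vector[J] mu;
  vector<lower=0>[J] sigma;            // one sigma per group
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);                // half-normal via the lower bound
  y ~ normal(mu[g], sigma[g]);
}
```

The continuous version would replace sigma[g] with something like exp(gamma + delta * x).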
This makes sense to me because the Poisson only has one likelihood parameter. But what about other likelihoods commonly used in generalised linear models, such as gamma or log-normal? Here the mean and variance are not independent. Is it overkill to model mean and variance separately with such likelihoods?
Are you saying that in a simple hierarchical model, where different amounts of uncertainty flow into different groups, there is no need to stratify sigma by group, because the heteroscedasticity between groups is already accounted for by the hyperparameter uncertainty?
This links to my final question to @jd_c above. And you're right to point out that this type of non-identifiability in the likelihood is not just an issue in the case of gamma or log-normal likelihoods, where mean and variance are directly dependent. If data are sparse and priors are wide, it could also be an issue with a normal likelihood.
I guess I am looking for a rule of thumb when it comes to deciding whether or not to model heteroscedasticity. Given that it is almost always the most generative/realistic way to model, at least in the examples I work on, does it come down to whether or not I can "afford" it given my data and priors? Is avoiding this complexity and the likely convergence issues why the default assumption I have seen in Bayesian models is homogeneity of variance?
If heteroscedasticity is part of the known process that generates the data, then ideally you should model it. If this is difficult/expensive or you lack data, then you would start with a simpler homogeneous-variance model and use retrodictive checking to see if there is any retrodictive tension between the posterior predictions and the observed data that would warrant including heterogeneous variance. If the checks do indicate this, then you could subsequently include it in the updated model and re-check. In my experience, too little data to model a known behavior often just means that there is not enough resolution to distinguish the behavior in a simplified model. In that case, the retrodictive checks for the homogeneous model would look fine, even though the known process was heterogeneous. Ultimately one can never reach the true generative process with finite knowledge and data, so it's only about modeling at the resolution (knowledge and data) that you have. If you have good prior information on the heterogeneous behavior, then you can include that even if your retrodictive checks of the homogeneous model don't warrant it; you just won't learn much about it from the data.
Note that there is not the retrodictive check but only retrodictive checks. These checks can and should be as bespoke as the model is. They could include residuals if you wanted. While handy software packages provide a suite of them, that is by no means the only kind. Imagine that you're a carpenter and want to check a board for straightness - you would need to turn it and view it along different planes in all three dimensions. The same is true of model checking. You can't check everything about a more complex model with a single visual, so you might come up with many different kinds of summaries to check different aspects. In my opinion, Betancourt provides the best examples of this in his various case studies, e.g. Bay(es) Window, Die Another Bayes, or Randomized Time Trial. I'm sure others do the same, but his are the best examples of bespoke checks that I have seen and remember off the top of my head.
From my understanding, the idea of retrodictive checks (aka posterior predictive checks) came from residual analysis. I could be wrong about the origins, but it is pretty easy to see how related they are. I don't have time to work up an example, but if you look at section 1.4.3 of the case study that I linked above, Towards A Principled Bayesian Workflow, he describes the relationship between the two. As mentioned above, you could easily create a retrodictive check of residuals if you wanted. I think Betancourt uses one in the Randomized Time Trial study above, in models 3 and 4, for some particular aspects of the model that he is checking. His own visualization software suite has options for residuals in some of the functions he uses for retrodictive checks.
I bet others could chime in and reference other people who have done the same, or maybe provide older references on the origins, but these are the ones I remember off the top of my head that provide demonstrations and use residuals.