Differences between posterior_smooths vs posterior_predict vs posterior_epred

I’ve previously read on this forum about the differences between predict and epred, especially with this guide (A guide to correctly calculating posterior predictions and average marginal effects with multilievel Bayesian models | Andrew Heiss and Confusion on difference between posterior_epred() and posterior_predict() in a mixed effects modelling context). In addition to these posterior predictions, there also appears to be a posterior_smooths function to predict based on s(x) terms. In short, I would like to better understand how these differ.

For more context, I have a non-linear hierarchical model with scaled/centered Shannon’s diversity values, using a gaussian family.
brm(Diversity ~ s(Year) + (1| Site)
Following which I did the above mentioned predict functions to calculate Diversity at each Year within the dataset using new data with year values for each year within the dataset. Year is a continuous variable spanning 35 years with some missing data, Diversity is a scaled and centered variable, therefore with mean 0, with values ranging from -4.23 to 1.25. Models have been validated and checked.

Values from the prediction were very different between the posterior_smooths vs the other two. I understand the differences between posterior_predict and posterior_epred resulting in similar Estimate but different Est.Error but I’m not sure why posterior_smooths caused such a large difference in both of these terms. posterior_predict values also seem to be more consistent with the graphs from conditional_effects plots. If I’m only interested in the estimated posterior medians and SDs from the smoother variable, in this case Diversity at each Year, which would be most appropriate?

Edit: This is more of a theoretical question behind the functions, but happy to provide sample data if that helps with discerning the functions. Thanks!

Can you clarify what distribution family you are using for your model? Is it gaussian()? How many Years you have in your data? What is the possible range of value for your Diversity variable? This should give us more context.

I’m using a gaussian distribution with 35 years. Diversity values have been scaled and centered so mean is 0, but ranges from -4.23 to 1.26. There’s a few other different diversity metrics used with individual models for them, but all of variables have been scaled.

I guess my question is a bit more theoretical in trying to understand what the different predictions are doing that causes the differences in posterior prediction values between posterior_smooths and posterior_predict/posterior_epred. Hope this clarifies the question a bit more! I’ve also edited the original post to update the points above.

@samuelcyk presuming that this is not a hierarchical GAM, which would make things more complicated, then posterior_epred() or fitted() returns the expected value of the response variable considering all the population effects, including the smooth term, and excluding the error term. The posterior_smooths() should return the marginal effect of the smoothed variable(s) alone.

In your case as a univariable model, setting aside the (1|Site), I expect that the result of posterior_epred() will be equivalent to posterior_smooths() plus the intercept. Visually the form, obtained from conditional_effects() or conditional_smooths() should indicate that. You’d also need to consider the response scale if you’re using a link function.

2 Likes

Thank you so much to Cameron Patrick for reading over my answer below and making several important corrections. I will edit my answer to highlight some of those corrections - @samuelcyk please read this answer once more to make sure you don’t miss the corrections.

Ok, so your model looks at trends in Diversity over time at each of a number of sites. Not sure how many sites you have but ideally at least 5, as per a commonly used rule of thumb. You would need these 5 sites to be “broadly similar” to each other and representative of a larger number of sites about which you wish to make inferences.

The model as you have it assumes the site-specific trends have identical shapes over time for each of your observed sites (as well as for each of the unobserved sites). The only thing that changes from one site to another is the “height” of the trend curves.

It doesn’t really matter how you define your Year variable unless you want to explicitly force the smooth of Year to be 0 in the first year of study. If Year = 0 in the first year of study, Year = 1 in the second year, etc., then it’s easier to interpret the global intercept in the “classical” sense provided you force the smooth of Year to be 0 when Year = 0. The global intercept would then represent the expected response value in the first year of study for a “typical” site (i.e., a site with a random intercept of 0). How do you force the smooth of Year to be 0 when Year = 0? You can use the option pc = 0 inside the smooth of Year: s(Year, pc = 0). But this model parametrization comes with some disadvantages too.

The default model parametrization in GAM or GAMM models assumes the smooth of Year, s(Year), is subject to a zero-sum constraint. This means that the average of the smooth across all study years will be equal to 0. The interpretation of the global intercept will be quite different now: it represents the average response value (i.e., Diversity) over the years of study for a “typical” site (i.e., a site with a random intercept equal to 0). So assume from now on we are dealing with the default model parametrization.

For that same “typical” site, the smooth effect of year would represent the shape of the time trend but “stripped of its height”. That is because the smooth is “centered about 0” to make it identifiable. (For a gaussian family, brms identifies this intercept as a population-level intercept.)

So to get the “complete” time trend for the “typical” site, you need to get its proper height (via the global intercept) PLUS its shape (via the smooth of Year). Of course, if all you cared about was the shape of the trend, the smooth alone will suffice.

It is my understanding that you will get the “complete” time trend for the “typical” site using posterior_epred() with re.form set to NA (as this will set the random effect of Site to 0). Additionally, in the newdata argument, you will need to feed a grid of values for Year. You can also set Site to NA in the newdata. (Ultimately, newdata needs to be a dataframe.)

Sites that are included in your model (i.e., they are present in the data) will still get the exact shape of time trend in your model as that for the “typical” site but the “height” of that trend will be made up of two components: 1. the global intercept; 2. a Site-specific deviation from the global intercept (aka the random effect of Site).

Once the model is fitted, these two components will be estimated and added up together to give the overall height of the time trend for a specific site included in the data.

That overall height will represent the expected response value at that site in the first year of study (i.e., when Year = 0).

It is my understanding that you will get the “complete” time trend for any site present in the data by using posterior_epred() with re.form set to NULL. Additionally, in the newdata argument, you will need to feed the specific Site you have in mind, as well as a grid of values for Year. (Again, newdata needs to be a dataframe.)

I know @AWoodward referred to the smooth in a model without random effects of Site as a “marginal effect”, but I think all we can say is that it is a “partial effect” (i.e., a “piece” we are adding to the intercept to get the expected response value)? I am hoping he can clarify the meaning of that terminology? If we take the difference between two consecutive years, Year and Year + 1, the difference in expected response values at the “typical” site would be s(Year + 1) - s(Year). So it seems to me that a “marginal” effect of Year would actually be the first derivative of the smooth of Year, not the actual smooth of Year? The actual smooth of Year is just a “partial” effect. This hurts my brain!

It is also my understanding that if you wanted to predict Diversity for the year following the last year included in your study (i.e., forecast Diversity one year ahead), then you would use posterior_predict().

Even with posterior_predict(), you have (at least) two situations: a) forecast Diversity one-year ahead at a “typical” site; b) forecast Diversity one-year ahead at one of the sites included in the data. (Of course, there is also the possibility of forecasting at a new site from your target universe of sites, that was not included in the data.)

For a) and b), I think you can specify re.form and newdata exactly as suggested above but be sure to now use posterior_predict() instead of posterior_epred().

It seems to me that posterior_smooths() would only give you the shape of the time trend at the sites you specify in its formula, but not the height. Because your model assumes the same shape at all sites (which is a very strong assumption, hopefully supported by your data), you should get the same shape whether you are looking at the “typical” site or a specific site from your data. While I haven’t tried it myself, I would imagine pisterior_smooths() should allow you to specify newdata and re.form the same way I suggested above.

I hope someone smarter than me can confirm that everything I said here makes sense. 🤭

2 Likes

You’re right that I’ve used the term ‘marginal effect’ imprecisely and probably your term ‘partial effect’ is more reasonable. The description of the model that you’ve made matches my understanding. These concepts extend to hierarchical smooths too.

1 Like

Thanks both! I had 20 sites and multiple transects within each sites as replicates as well, but I wanted to just showcase this question with a simpler example! I also wasn’t trying to predict the year+1 diversity, but more interpolating for years where the specific diversity value was not measured.

I guess what I was really after was what you said here

Therefore, I created newdata by seq each year within the dataset and then looked at posterior_epred with re.form set to NA and this gave me the values that I wanted.

The term marginal effect is indeed confusing, that was also why I think the function name was changed from marginal_effect to conditional_effect, but some of these terms have different meanings in different use cases.

Thank you both for the clarifications again!