How bad is this pp_check? Should I alter the distribution?

janderson · March 17, 2025, 8:26pm

Hi! I am trying to estimate the effect of treatment on my response variable. I am using a Gaussian GLMM in brms in R. However, when I check my model with pp_check, my posterior predictions seem to underestimate the peakiness of my observed data.

Is this pp_check okay, or should I alter my model somehow to better fit the data? I was thinking that maybe I need to transform my normal distribution to better reflect the data’s peakiness?

model <- bf(response ~ treatment * distance + (1 | individual / trial), family = gaussian())

sjp · March 17, 2025, 9:51pm

Yeah, that doesn’t look great. It appears that your model’s assumptions probably don’t align with what you know about your response variable. It looks like there are no negative values in your response, but the Gaussian distribution doesn’t know that and thinks there should be a bunch. So, some questions:

What is your response variable? Can it only be a positive number? Can it only be a non-negative integer (i.e. whole numbers including zero)?

Also, what do your other variables mean here? If you explain the structure and nature of your data better, we’ll be able to help you sort out your modelling troubles in a much more principled way.

janderson · March 17, 2025, 10:02pm

Hi sjp!

My response variable is a difference between two angles, so response = angle1 - angle2. The response does have some negative observations, although far fewer than positive ones, and the data are right-skewed. I think I may have landed on a better fit with the Student t distribution, as it does a better (although not perfect) job of simulating the “peakiness” of my data.

avehtari · March 18, 2025, 7:58am

You could try family = skew_normal() (see the list of all available families on the brms documentation page “Special Family Functions for brms Models”, i.e. the brmsfamily help page).

amynang · March 18, 2025, 3:36pm

Would it make sense for your research question to work with the log ratio of the two angles instead of their difference? Skew normal is worth a try but my impression is it can handle only moderate skewness.

janderson · March 19, 2025, 9:50pm

I’m definitely intrigued by this idea!

I don’t fully understand the upside of using the log ratio of the angles instead of their difference. Is it to account for the skew?

I am looking at acoustic bat echolocation data to determine which of two very closely spaced objects, object1 or object2, a bat directs the center of its echolocation beam closer to. So my response is the difference between angle1 (the angle between the echolocation call’s center and object1) and angle2 (the angle between the echolocation call’s center and object2).

janderson · March 19, 2025, 10:52pm

I tried this and the model fits so much better using a log ratio response and the student t distribution! Thank you for your insight!!

amynang · March 20, 2025, 7:07am

With raw differences you have both positive and negative values, which prevents a log-transformation (or the use of the lognormal distribution) to handle the skewness. With the angle ratio you straddle 1 rather than zero, and with a log ratio you are looking at a multiplicative difference (for base 2, log2(a1/a2) = 1 would indicate that a1 is twice as large as a2, so the call center is two thirds of the way toward object 2). Now that I wrote that, I am thinking that, if the call center is always between the two objects, you could also work with a1/(a1+a2) and a Beta distribution.
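To make the arithmetic above concrete, here is a small base-R sketch. The angle values are made up for illustration (they are not the data from this thread), and the proportion in the last step only makes sense under the assumption that the call center lies between the two objects.

```r
# Hypothetical angles in degrees (illustrative values only, not the thread's data)
angle1 <- c(2, 1, 4, 0.5)   # angle between call center and object 1
angle2 <- c(1, 1, 1, 2)     # angle between call center and object 2

# The raw difference straddles zero, so it cannot be log-transformed directly
diff_resp <- angle1 - angle2

# The ratio straddles 1, so its base-2 log straddles 0:
# +1 means angle1 is twice angle2, -1 means angle1 is half angle2
log2_ratio <- log2(angle1 / angle2)

# Back-calculate the ratio from the base-2 log ratio
ratio_back <- 2^log2_ratio

# Proportion form; only meaningful if the call center is between the objects
prop <- angle1 / (angle1 + angle2)

print(data.frame(angle1, angle2, diff_resp, log2_ratio, ratio_back, prop))
```

In the first row, angle1 is twice angle2, so the base-2 log ratio is 1 and the proportion is 2/3, matching the "two thirds" reading above.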

janderson · March 20, 2025, 6:47pm

Oh fun! Thank you so much!

The angular differences are very small, I was playing with the natural log initially, but the base 2 transformation is more intuitive for back calculating the ratio! I’m including the histogram of the log ratio base 2 transformation as well as the beta, just for fun!

janderson · March 20, 2025, 7:40pm

I tried a beta regression model with the ratio angle1 / (angle1 + angle2), and the pp_check shows the model isn’t fitting very well, again due to the “peakiness” of my data. What methods are possible for adjusting the beta regression to better fit my data?

mathDR · March 20, 2025, 8:12pm

If the distribution is the difference between two angles, wouldn’t a von Mises distribution be what you want? This has support in [0, 2pi]

janderson · March 20, 2025, 8:37pm

My understanding was that the von Mises distribution helps account for circularity, i.e. 1° being closer to 359° than to 90°. Because I am comparing differences between two very similar angles, my data don’t occupy the entire circular range: the difference between the two angles ranges from -1° to 8°, so if I converted it for a von Mises model it would run from 359° to 8°. I thought the von Mises doesn’t perform well on very limited ranges with high concentrations?
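A quick base-R sketch of the wrapping described above, using made-up values inside the stated range of -1° to 8°. As far as I know, the von_mises family in brms expects the response in radians between -pi and pi, so treat the radian conversion as an assumption about how one would prepare the data rather than something done in this thread.

```r
# Signed differences in degrees within the range described above (hypothetical values)
diff_deg <- c(-1, 0, 2.5, 8)

# Wrapping onto [0, 360): the -1 degree value jumps to 359, the rest are unchanged
wrapped_deg <- diff_deg %% 360

# Signed radians keep the values contiguous around zero instead
diff_rad <- diff_deg * pi / 180

print(wrapped_deg)
print(round(diff_rad, 4))
```

On the signed scale the data occupy only a sliver of the circle around zero, which is why circularity plays almost no role here.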

amynang · March 20, 2025, 9:46pm

What is the treatment and the distance?

janderson · March 20, 2025, 9:57pm

treatment1 is an object with greater spatial complexity, e.g. two textured balls that are closely spaced, while treatment0 is a simple object, e.g. one smooth ball.

Distance is how far the bat was from the object at each echolocation call. So a large distance is when the bat is far from the object, and distance gets smaller as it approaches.

amynang · March 20, 2025, 10:04pm

But doesn’t each trial involve two objects (also two distances)?

janderson · March 20, 2025, 10:14pm

Yup, the objects are super closely spaced, centimeters apart, and I measure the distance between the bat’s location and the object. They are so closely spaced that we don’t know whether the bat can tell, using echolocation, that there are two objects, so you can think of it as an “object complex” rather than two objects.

For the object complex, I look at the distance between the bat and the mean of the two objects (so the center of the object complex). For the simple object, I look at the distance between the bat and the center of the simple object. I look at echolocation behavior starting from when the bat is 5 m away to when it approaches.

amynang · March 20, 2025, 10:25pm

If a trial in treatment1 involves two closely spaced objects and a1, a2 are the angles between the call’s center and each object, what are the angles in treatment0?

janderson · March 20, 2025, 10:38pm

The edge of the ball that is present in both treatments. We want to know whether the bat is consistently shifting its call towards the center of the object complex or shifting its attention between the edges of the object complex, whereas for the simple object we expect the bat’s call to be consistently directed at the center.

amynang · March 21, 2025, 8:25am

OK that really clears things up :)
It’s also a good demonstration for why the Beta would not make sense here. See how in treatment 1 the call center is to the right of the right point you calculate an angle for? The log-ratio would still make sense as it would register that as a call that is biased towards X2. But with calls whose center is outside the X1-X2 range, your two angles are not proportions of a full X1-bat-X2 angle.

Let’s break this down. Assuming individuals are exposed to both treatments and given that distance changes during a trial, the index-type specification of the model would be:

response ~ 0 + treatment + treatment:distance + 
          (0 + treatment + treatment:distance | individual) + 
          (0 + treatment:distance | individual:trial)

You are estimating an intercept for each treatment and a slope for distance for each treatment, assuming the effect of distance is linear. You let both vary by individual and you let the slope vary by trial. I think the corresponding contrast-type specification should be:

response ~ 1 + treatment + distance + treatment:distance + 
          (1 + treatment + distance + treatment:distance | individual) + 
          (0 + distance | individual:trial)

Depending on how many calls you have per trial you might also relax the linearity assumption.
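To see how the two population-level specifications above relate, here is a base-R sketch that builds their design matrices on a toy data frame. The data values are hypothetical, and only the fixed-effect part of each formula is used, because model.matrix() does not understand the (… | group) terms.

```r
# Toy design (hypothetical values): two treatments crossed with four distances
d <- expand.grid(treatment = factor(c(0, 1)), distance = c(5, 3, 1, 0.5))

# Index-type: one intercept and one distance slope per treatment
X_index <- model.matrix(~ 0 + treatment + treatment:distance, data = d)

# Contrast-type: reference intercept and slope, plus treatment differences
X_contrast <- model.matrix(~ 1 + treatment + distance + treatment:distance, data = d)

print(colnames(X_index))
print(colnames(X_contrast))

# If both matrices span the same column space, stacking them side by side
# does not increase the rank beyond that of either one alone
rank_joint <- qr(cbind(X_index, X_contrast))$rank
print(rank_joint)
```

Both matrices have the same number of columns and the same column space, so (priors aside) they describe the same population-level model; only the interpretation of the coefficients changes.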

janderson · March 21, 2025, 7:07pm

I really appreciate this thoughtful discussion!! Here is another fun caveat: what if the angular difference DOES have a ~somewhat~ nonlinear relationship with distance? Because we are looking at angles, they tend to increase exponentially as the bat approaches the object. So if the angular differences are larger, these measurements increase exponentially as distance decreases, but if the call center is closer to equidistant between the two points, the difference stays close to zero.
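The pattern described above can be illustrated with a deliberately simplified geometry, which is an assumption and not the thread's actual measurement setup: if the call axis misses an object by a fixed lateral offset, the angle to that object is atan(offset / distance). The offsets below are hypothetical.

```r
# Simplified geometry (an assumption): a call axis that misses an object by a
# fixed lateral offset makes an angle of atan(offset / distance) with it
angle_deg <- function(offset_m, distance_m) atan(offset_m / distance_m) * 180 / pi

distance_m <- c(5, 4, 3, 2, 1, 0.5)            # bat-to-object distance in metres
big_offset <- angle_deg(0.05, distance_m)      # hypothetical 5 cm miss
small_offset <- angle_deg(0.005, distance_m)   # hypothetical 5 mm miss

print(round(data.frame(distance_m, big_offset, small_offset), 3))
```

Under this sketch the angle grows steeply as distance shrinks (roughly like 1/distance for small angles) when the offset is large, and stays near zero when the offset is small, which is the kind of distance-by-offset nonlinearity a linear slope would miss.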
