Great. So here’s an example - let’s say we measure three song lengths for each of three mice and we have the model
Resp ~ (1 || ID). This model has three “main” parameters:
Intercept (which we will not care about here), the “global” or “residual” variability
sigma and the “between-mice” variability
sd_Intercept_ID. Let’s say our data looks like this:
MouseA: 13 12 14
MouseB: 25 22 21
MouseC: 31 28 30
Without even running the model we can say that between-mice variability
sd_Intercept_ID is much larger than the residual variability sigma: the mice differ from each other far more than any single mouse differs from itself.
But the data could have looked differently:
MouseA: 13 28 21
MouseB: 25 12 30
MouseC: 31 22 14
Here, the mice actually don’t differ much from each other, so
sd_Intercept_ID will be smaller than
sigma, which has to be large to accommodate measurements that change notably even within the same mouse. Makes sense so far?
The point is that in both examples, the first observation for each mouse is exactly the same, so if we observed just the first column, we would be unable to distinguish between "large
sigma" and "small
sigma" (and everything in between).
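To make the comparison concrete, here is a rough sketch in plain Python (not brms) that computes crude stand-ins for the two parameters from both toy datasets. The helper names `between_sd`, `within_sd` and `first_column` are made up for this illustration, and the SD of the per-mouse means is only an approximation of what the model would report as sd_Intercept_ID.

```python
from math import sqrt
from statistics import mean, variance

first = {"MouseA": [13, 12, 14], "MouseB": [25, 22, 21], "MouseC": [31, 28, 30]}
second = {"MouseA": [13, 28, 21], "MouseB": [25, 12, 30], "MouseC": [31, 22, 14]}


def between_sd(data):
    # SD of the per-mouse means: a crude stand-in for sd_Intercept_ID
    return sqrt(variance([mean(values) for values in data.values()]))


def within_sd(data):
    # Pooled within-mouse SD: a crude stand-in for sigma
    return sqrt(mean([variance(values) for values in data.values()]))


def first_column(data):
    # The first observation of every mouse
    return [values[0] for values in data.values()]


for name, data in [("first", first), ("second", second)]:
    print(name, "between:", round(between_sd(data), 2), "within:", round(within_sd(data), 2))

# Both datasets share the same first column, so one observation per mouse
# cannot tell the two scenarios apart.
print(first_column(first) == first_column(second))
```

In the first dataset the between-mice figure dominates, in the second the within-mouse figure does, yet the first columns are identical.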
Technical aside for completeness: if you observe only one value per mouse, the model only informs the total sd which is
sqrt(sd_Intercept_ID^2 + sigma^2), and that’s why the
pairs plot for the two looks roughly like a circle arc. Also, for some non-gaussian families you could - at least in principle - identify
sd_Intercept_ID even when you have only one observation per individual, because the gaussian varying intercept can create a somewhat different variability pattern than the variability introduced by the family. For the gaussian family (and a few others), however, the case is hopeless even in theory, because (with a bit of sloppy notation)
normal(normal(mu, sd1), sd2) is exactly the same as
normal(mu, sqrt(sd1 ^ 2 + sd2 ^ 2)).
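A quick way to convince yourself of this is a small simulation. The sketch below is plain Python, not brms; the function name `simulate_total_sd`, the mean of 20 and the particular sd pairs are arbitrary choices for illustration. Every pair satisfies sd1^2 + sd2^2 = 25, so all of them should produce roughly the same spread when there is one observation per mouse.

```python
import random
from math import sqrt

random.seed(1)  # arbitrary seed, only for reproducibility


def simulate_total_sd(sd_between, sd_within, n=50_000, mu=20.0):
    # One observation per simulated mouse: first draw the mouse's own mean,
    # then a single measurement around that mean.
    draws = []
    for _ in range(n):
        mouse_mean = random.gauss(mu, sd_between)
        draws.append(random.gauss(mouse_mean, sd_within))
    center = sum(draws) / n
    return sqrt(sum((x - center) ** 2 for x in draws) / n)


# Very different splits between the two sources of variability,
# all lying on the same circle of radius 5.
pairs = [(4.8, 1.4), (4.0, 3.0), (3.0, 4.0), (1.4, 4.8)]
for sd_between, sd_within in pairs:
    print(sd_between, sd_within, round(simulate_total_sd(sd_between, sd_within), 2))
```

All four settings give a total sd close to 5, which is why the data alone can only pin down the radius of the arc and not the position along it.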
Now for your data, it might actually make sense to take a different approach for each of the responses. The song length is not fixed per individual: there is (I assume) substantial within-individual variability, and you have multiple measurements of the song length, so you can identify both the “within-individual” and “between-individual” variability. For body size, I would expect much smaller within-individual variability (although there probably is some due to measurement imprecision, and I’ve heard people’s height changes slightly over the course of the day, so mice’s probably does as well). While you can’t directly quantify the within-individual variability as you have only a single measurement, I guess you can easily put some quite strict bounds on it using your knowledge of the domain. In a single-response model this could be achieved by putting a narrow prior on
There would be some technical challenges for putting both in a single
brms model. A slightly sub-optimal but probably the easiest option would be to use the
se() addition term. You would have
Resp1 | se(error) ~ ... This effectively fixes
sigma at a specific value separately for each row in the dataset, so it is no longer estimated. The
error represents a column in the data containing the standard error of the mean (
sd(x) / sqrt(n)), so for song length, you would put average song length as the response and the observed standard error of the mean as
error. For body size you would put the single measurement as the response and put the theoretically derived measurement error as
error. (This paragraph is pure speculation on my side; I’ve never built such a model, but I think it should work.)
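As a sketch of the data preparation this would need for song length, the snippet below collapses the first toy dataset to one row per mouse, with the mean as the response and sd(x) / sqrt(n) as the error column. It is plain Python rather than R, and the column names `ID`, `Resp1` and `error` simply mirror the formula above; nothing here has been run through brms.

```python
from math import sqrt
from statistics import mean, stdev

songs = {"MouseA": [13, 12, 14], "MouseB": [25, 22, 21], "MouseC": [31, 28, 30]}

rows = []
for mouse, lengths in songs.items():
    rows.append({
        "ID": mouse,
        # Average song length becomes the response
        "Resp1": mean(lengths),
        # Standard error of the mean: sd(x) / sqrt(n)
        "error": stdev(lengths) / sqrt(len(lengths)),
    })

for row in rows:
    print(row)
```

For body size there is nothing to compute from the data: the error column would hold whatever measurement error you derive from domain knowledge.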
If your responses are all positive, then the natural transformation would IMHO be taking the log (and potentially scaling afterwards, but that might not be necessary). The log is also likely to reduce the skew, so you might be able to get away with
the gaussian family and use
rescor (which I now believe to be quite beneficial).
As I said, this will change the interpretation of the coefficients, but I think that this is actually more natural - say you get an estimate of
sd_Intercept_ID of roughly 5 for a model on the original scale: this means that between-mice variability is something like +/- 2*5. If the mean population song length is, say, 30, this is unproblematic, but if the mean song length is 8, this would imply that some mice have an average song length of -2 …
If you work on the log scale and you get an estimate of
0.55 ~= log(3)/2, this means that the between-mice variability is roughly between “the song is shorter by a factor of 3” and “the song is thrice as long”, which makes sense regardless of the population mean…
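The arithmetic behind the two interpretations can be sketched as follows. This is plain Python; the function names are invented for the illustration, and using log(mean) as the centre on the log scale is a simplification, since the mean on the log scale is not exactly the log of the mean.

```python
from math import exp, log


def original_scale_range(mean_length, sd):
    # Roughly +/- 2 sd around the population mean, on the original scale
    return mean_length - 2 * sd, mean_length + 2 * sd


def log_scale_range(mean_length, sd_log):
    # Roughly +/- 2 sd around log(mean), transformed back to the original scale
    center = log(mean_length)
    return exp(center - 2 * sd_log), exp(center + 2 * sd_log)


for mean_length in (30, 8):
    print("original scale:", mean_length, original_scale_range(mean_length, 5))
    print("log scale:     ", mean_length, log_scale_range(mean_length, 0.55))
```

On the original scale the lower bound goes negative once the mean drops to 8, while on the log scale the bounds are always the mean divided and multiplied by about 3, so they stay positive.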
Does that make sense?