MCMCglmm is custom-made for these types of models, so for one thing it has efficient implementations for several routines that are necessary. So speed is one thing, although it might just be that for this example both MCMCglmm and brms run in similar time.

That’s basically what I was interested in. But still kind of hard to guess the cause. Just to fiddle around a bit more - what if you use both responses, but play with the correlation term (I would try at least `(1 || ID)`

, `(1 |p| ID)`

and `(1 | gr(ID, cov = A))`

) which of those give you divergences?

I am really working from my intuitions and not any rigorous math/simulations, so it is quite likely, I am wrong. I’ll still try to explain my intuitions as best as I can. The problem IMHO is (as I *think* @diogro alluded to) that you need `(1 | p | ID)`

term to catch phenotypic correlations (assuming those are random among the population, not correlated with genes). If you only include `(1 | p | gr(ID, cov = A))`

, the correlation coefficient might “soak up” some of the phenotypic correlation, as there is no other part of the model where it could be reflected. If you have a lot of individuals and the correlations in `A`

are generally strong, then the `(1 | p | gr(ID, cov = A))`

IMHO will be less likely to “soak up” phenotypic correlation, as that would violate the correlation structure within each response (i.e. if the correlation is non-genetic and there is phenotypic variation in `Resp1`

, you would have mice that are genetically closely related, phenotypically also dissimilar in `Resp2`

- this goes against the correlations in `A`

and this variance might get attributed to overall `sigma`

instead). If on the other hand you have few individuals and the correlations are weak, there is little penalty for phenotypic correlations contaminating the genetic ones.

A hopefully related example is that if you examined only monozygotic and heterozygotic twins, the correlations both within heterozygotic and homozygotic twins separately would both be a mix of phenotypic and genetic correlations. But if you can assume the phenotypic correlations are the same in both cases, you can interpret the difference between the two correlations as related to the genetic component. `(1 | p | ID)`

could represent correlation that is “same” regardless of the genotype and so `(1 | p | gr(ID, cov = A))`

should capture the “rest” of the correlation. (once again, I hope I am not misunderstanding something fundamental here).

The problem is that having both `(1 | p | gr(ID, cov = A))`

(genetic correlation) and `(1 | p | ID)`

(phenotypic correlation) in the same model sounds like a recipe for fitting issues unless you have tons of data (and maybe even if you have tons of data). Moreover, you already have fitting issues when using just one of those terms and adding more terms rarely makes a model better behaved…

**EDIT:** When using two correlation terms it would be necessary to let them differ, so the hypothetical full formula would be `Sex + Site + (1 | p1 | gr(ID, cov = A)) + (1 | p2 | ID)`

.

I’m afraid I’m currently awash with deadlined COVID stuff, so I still can’t commit to digging into this too deeply. But on the genetic/non-genetic correlations, this it worth looking at:

https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.1365-2656.2009.01639.x

Tutorial 2 has a bivariate animal model example, and I agree with everyone else that modelling the covariance/correlations in the residuals is important.

brms definitely takes a lot longer than MCMCglmm to run, for my models at least!

Hi Tracy,

100 observations is very few for quantitative genetic analyses. As a very vague rule of thumb, in pig populations, we used to reckon about 1000 for heritability estimation (ie univariate) and closer to 2000 for correlations. And in those commercial populations for most traits the same animals would be recorded for at least a large block of them.

Are the same mice recorded forr more than one trait? All traits? Some subset? That would influence the existence of covariances at the residual level.

Single observations per animal for each trait, or repeated?

And doing frequentist (ASREML, DFREML) analyses we would also build up any analyses - use univariate results as starting values for bivariates, bivariates as starting values for multivariate (just so you’re starting near to where you want to end up). I’m not sure how to do that in a Bayesian analysis framework.

I suspect you will struggle just because of data quantity. Sorry.

Regards,

Ron.

…They all give me divergences haha.

Ok–I think I understand what you and @diogro are talking about now with the phenotypic and genetic correlation terms, thanks for clarifying! However, it looks like I can’t model a repeated group-level effect?

`brms version 2.13.0 `

`Error: Duplicated group-level effects are not allowed. Occured for effect 'Intercept' of group 'ID'.`

Looks similar to a previously posed (unsolved) question here.

As the general consensus seems to be, the data seem too few to be able to fit these models really well. An alternative I will consider if I cannot build an appropriate bivariate model would be to report how large a sample I would need for this analysis to be possible, and I’ll use advice from this post as a starting point to understanding how to do this in a Bayesian framework.

Hi Ron, thanks for the input. Yes, ours is an extremely small dataset compared to human genetics and laboratory studies. We are interested in adapting quantitative genetics methods to study wild populations. Because of the logistics of studying natural populations, researchers interested in ecologically meaningful traits and behaviors will generally have scarce data. There are beginning to be quant. genetics studies in small and wild populations, including heritability studies by Charmantier, Perrier, de Villemereuil and Morrissey.

As for the phenotypes, the mice included in the models are only ones who have both traits 1 and 2, and these are single observations (or collapsed into single observations).

It’s not clear to me why you are using the random effect term (1|p|gr(ID, cov=A)) instead of the simpler (1|gr(ID, cov=A)). You have only one random effect, you don’t need a specific id for them.

Can you please post the code you used for the MCMCglmm model?

I think those reveal the source of the divergences - for many families, individual level random effects can easily be non-identified when you have only one observation. For th e case of `(1 || ID)`

you have gaussian likelihood and add i.i.d. gaussian to each row separately. But since gaussian + gaussian is another gaussian, there is no way to distinguish between the variance of the varying intercept and the variance of the likelihood, only their sum (and hence the sum of squares of the standard deviations) is identified. The additional covariance structure is then probably insufficient to constrain the posterior to avoid this degeneracy.

If I am right, you would expect the posterior to show a strong negative correlation between the fitted `sd`

of the varying intercept and the `sigma`

of the likelihood (as can been seen e.g. via `pairs(fit, pars = ... )`

).

And if this is indeed the case, I fear the problem is insurmountable unless you would have multiple measurements for at least some of the individuals to let you distinguish between the measurement variability and between-mice variability. `MCMCglmm`

has to encounter exactly the same problem, but (I’ve heard that) Gibbs samplers simply lack a good way to diagnose the problem, while with NUTS you get divergences that signal the issue.

Does that make sense?

Since this is a multivariate formula, it will make the effect correlated between the two responses (because `mvbind(A,B) ~ something`

is basically a shorthand for `bf(A ~ something) + bf(B ~ something)`

so the `p`

is actually mentioned in both the formulae) - does that make sense?

Since this is a multivariate formula, it will make the effect correlated between the two responses (because

`mvbind(A,B) ~ something`

is basically a shorthand for`bf(A ~ something) + bf(B ~ something)`

so the`p`

is actually mentioned in both the formulae) - does that make sense?

Yes, thank you. Stupid me that I forgot to consider the translation into sum of two `bf`

.

Hi,

Nice to see a fellow mammalogists on here! *Scotinomys* are super cool.

I’m nowhere near as knowledgeable as most of the people posting on this thread, and I don’t know much about the quantitative genetics side of this research, but I have fit a lot of phylogenetic multilevel models in `brms`

, some with smaller sample sizes that you have (that’s often the best we can do in non-model systems). Looking at you priors and results I can pick up one one pattern I’ve regularly seen in my own models.

First, these phylo or animal models are often really hard to fit, mostly because of an identifiability issue between the `sigma`

of a variable and the `sd(Intercept)`

from the relationship correlation matrix. The `brms`

phylogenetics vignette uses a nice example where this is not an issue. I have yet to find an empirical dataset that performs without any issues like this tarsus/back example.

I can see that the sigma value on your Resp2 and the `sd(Resp2_Intercept)`

are not jiving. I’m guessing what is happening is that they are finding nice optima, but occasionally the chains are jumping up (or down) and crossing to the optima of the other variable. You should be able to clearly see this in your chain plots using `plot(model)`

. So your `sd(Intercept)Resp2`

posterior may have a nice peak around 0.4, and your `sigma_Resp2`

a peak around 0.9, but occasionally sigma plummets towards 0.4 and simultaneously sd jumps up towards 0.9, then everything goes back to “normal.” I bet this is the source of your divergent transitions.

A similar problem may happen if you fit this model using frequentist tools, except you may never know about it! One school of thought that I have picked up on in this forum and also reading (and re-reading) McElreath’s Statistical Rethinking is that regularizing priors are a good way to test for and deal with these issues. My way to diagnose and sometimes deal with this particular low ESS/Divergent transition issue would be try setting the sigma Resp2 prior to something like `normal(1, 0.2)`

and the sd Intercept Resp2 prior to `normal(0, 0.2)`

or otherwise nudge one of them to be lower and one of them to be higher. This should discourage this identifiability issue you are having.

In my experience, using highly regularizing priors on the phylo/animal group-level parameter can help boost ESS and eliminate divergent transitions, but typically population-level effects and rescor values don’t really change much following these adjustments. So if you are getting confusing correlation outputs, this may or may not help those. Seeing `lkj(20)`

definitely caught my attention though! Maybe you can play around with correlation priors once you work on that `sigma`

/`sd`

issue. Also, though some may disagree, I wouldn’t be shy to fit tighter priors throughout all your `brms`

models. One big reason I switched to `brms`

from `MCMCglmm`

is the transparency of the priors (for a non-statistician like me) and the ease of adjusting them. If all variables are mean centered and scaled to unity, I typically start with a `normal(0,1)`

on all my population-level effects. Good luck!

Hi Martin,

Thanks again for your thorough response! Your answers have been extremely helpful, and I greatly appreciate your time. I have a couple quick follow up questions, if you have time to answer them…Modeling is not my strength, so I apologize if I’m being a little slow to understand.

I could “un-collapse” my observations for repeated measures of song (*i.e.* each song trait is a composite of multiple songs per individual). (It isn’t clear to me exactly what this would do–I didn’t quite understand your explanation–but I’ll be rereading your response and doing some supplemental reading to better understand.)

I would like to clarify my understanding about modeling the phenotypic/non-genetic correlations–that is what the `rescor`

term in gaussian models is doing, correct? But because skew normal does not allow for this, we must explicitly add it to our models with the `(1|p|ID)`

term (which doesn’t seem to work due to the ‘duplicated terms’ error I received).

I am actually beginning to fit a few bivariate models that *are* running well (or seeming to run well based just on the divergent transitions being 0 and looking at the diagnostic plots; haven’t looked more closely yet. What I changed are as follows:

- The two traits I’m attempting to model (I realized the model I presented to you was among those with the most severely constrained sample size, though of the highest biological interest). Using other traits that were not constrained in this way already brought transitions down by quite a bit.
- Tighter priors: I switched from
`Intercept ~ Normal(0,10)`

to`Intercept ~ Normal (0,1)`

(which actually @jonnations just independently suggested as I am writing this response!) - Switching to gaussian family so I can model rescor. While for many of the song traits a skew_normal distribution was a better fit to the data for univariate models, as I mentioned in earlier posts in this thread, the additional term
`(1|p|ID)`

was not permitted (and I haven’t yet found a workaround).

Does this all seem reasonable?

Thank you for the suggestions @jonnations! I will also reread the relevant sections of McElreath’s book. Is the lkj term that I specified weird? (I was playing with increasing the value and seeing what was happening with my original model.) I don’t really know what is “reasonable”, but admittedly I only just started the lit search/discourse search for more info on this topic. Suggestions would be very much appreciated, however.

You are very welcome :-)

No worries, this stuff is hard. Also a) I can set boundaries if questions require too much time from me b) I am actually part-time employed by Stan to care for the forums, including answering questions (way more fun than my actual research as I get to discuss the interesting parts but don’t really have to do the annoying implementation/preprocessin/cleaning work ;-).

Un collapsing should help (collapsing is rarely beneficial and often detrimental for inference, although it can help speed). But you would still expect problems in the response you can’t uncollapse :-(

I’d be happy to explain more, but I’d like to confirm I am not wrong before that - does the `pairs`

plot show the pattern I described? (if you have issues getting the `pairs`

plot to work, do ask!)

`rescor`

does much less than `(1 | p | ID)`

- the former only says that the residuals are correlated but the variance of the individual responses stays the same. The latter also adds varying intercept per individual, i.e. an additional source of variance of the response. I can see how using `rescor`

might be preferable and resolve your issues - because we actually want just the correlation and don’t really need the additional variance.

To avoid the duplicated effect you can add a column `ID2 = ID`

and use `(1|p|ID2)`

(yeah it looks like cheating, but there’s nothing wrong with it - brms AFAIK forbids double use only because it breaks its parameter naming, not because it is somehow fundamentally bad :-)).

Good to hear! `normal(0,1)`

priors are usually reasonable *if the scale of the response is roughly a few units*. So check the units - if you have something like mouse body size in milimeters, this would be indefensibly tight (but for size in cms could be OK) BTW since your traits are strictly positive, it might also make sense to consider lognormal or gamma families (as those enforce positive outcome - and are skewed - by definition ) - note that those *change* the interpretation of your coefficents as now the coefficients are on the log scale (and hence much tighter priors are usually warranted).

Totes agree! :-)

This seems (to me, without domain knowledge) a bit dangerous - the non-identifiability you are dealing with is that the *data* do not contain enough information to tell which of the variabilities (measurement or the varying intercept) are larger. Unless you have theory to let you say a priori which is larger, such a strong prior is IMHO indefensible and may lead to unjustified inferences from your data.

:) Here’s the requested clarifications:

Yes–I see a strong negative correlation between sd_intercepts and sigma for both Resp1 and Resp2. The following plots are for my current model described in my last post (but only one chain).

This is such a good suggestion! I am implementing a model right now (takes a while to run).

Yes–all response variables are scaled such that SD = 1, mean = 0! Glad to hear your opinion on this.

Great. So here’s an example - let’s say we measure threee song lengths for three mice and we have the model `Resp ~ (1 || ID)`

. This model has three “main” parameters: `Intercept`

(which we will not care about here), the “global” or “residual” variability `sigma`

and the “between-mice” variability `sd_Intercept_ID`

. Let’s say our data looks like this:

```
Example 1:
MouseA: 13 12 14
MouseB: 25 22 21
MouseC: 31 28 30
```

Without even running the model we can say that between-mice variability `sd_Intercept_ID`

is much larger than the residual variability `sigma`

.

But the data could have looked differently:

```
Example 2:
MouseA: 13 28 21
MouseB: 25 12 30
MouseC: 31 22 14
```

Here, the mice actually don’t differ much from each other, so `sd_Intercept_ID`

will be smaller than `sigma`

which becomes large to encompass the measurements change notably even within the same mouse. Makes sense so far?

The point is that in both examples, the first observations are exactly the same, so if we observed just the first column, we are unable to distinguish between "large `sd_Intercept_ID`

, small `sigma`

" and "small `sd_Intercept_SD`

, large `sigma`

" (and everything in between).

Technical aside for completeness: if you observe only one value per mouse, the model only informs the total sd which is `sqrt(sd_Intercept_SD^2 + sigma^2)`

, and that’s why the `pairs`

plot for the two looks roughly like a circle arc. Also, for some non-gaussian families you could - at least in principle - identify `sd_Intercept_ID`

even when you have only one observation per individual, because the gaussian varying intercept can create a somewhat different variability pattern than the variability introduced by the family, but for gaussian family (and few others) the case is hopeless even in theory, because (with a bit of sloppy notation) `normal(normal(mu, sd1), sd2)`

is *exactly* the same as `normal(mu, sqrt(sd1 ^ 2 + sd2 ^ 2))`

.

Now for your data, it might actually make sense to take a different approach for each of the responses: The song length is not fixed per individual, there is (I assume) substantial within-individual variability and you have multiple measurements of the song length so you can identify both the “within-individual” and “between-individual” variability. For body size, I would expect much smaller within-individual variability (although there probably is some due to measurement imprecisions, and I’ve heard people’s height changes slightly over the course of day so mice’s probably do so as well). While you can’t directly quantify the within-individual variability as you have only a single measurement, I guess you can easily put some quite strict bounds on it using your knowledge of the domain. In a single-response model this could be achived by putting a narrow prior on `sigma`

.

There would be some technical challenges for putting both in a single `brms`

model. A slightly sub-optimal but probably easiest would be to use the `se()`

addition term. You would have `Resp1 | se(error) ~ ...`

This effectively fixes the `sigma`

at a specific value separately for each row in the dataset which will no longer be estimated. The `error`

represents a column in the data containing the standard error of the mean (`sd(x) / sqrt(n)`

), so for song length, you would put average song length as the response and the observed standard error of the mean as `error`

. For body size you would put the single measurement as the response and put the theoretically derived measurement error as `error`

. (this paragraph is pure speculation on my side, I’ve never built such a model, but I *think* it *should* work).

If your responses are all positive, than the natural trasformation would IMHO be be taking the log (and potentially scaling then, but that might not be necessary). The log is also likely to reduce the skew, so you might be able to get away with `gaussian`

family and use `rescor`

(which I now believe to be quite beneficial).

As I said, this will change the interpretation of the coefficients but I think that this is actually more natural - say you get estimate of `sd_intercept_ID`

roughly 5 for a model on the original scale: this means that between-mice variability is something like +/- 2*5. If the mean population song length is say 30, this is unproblematic but what if the mean song length is 8, this would imply that some mice have average song length of -2 …

If you work on the log scale and you get estimate of `sd_intercept_ID`

roughly `0.55 ~= log(3)/2`

, this means that the between-mice variability is roughly between “the song is shorter by a factor of 3” and “the song is thirce as long” which makes sense regardless of the population mean…

Does that make sense?

Hi Martin–thanks for your reply! And sorry for the delay in my response, was at a conference this week… and also had a little reading to do before I could figure out how to implement some of the things you mentioned.

Yes–I’m with you thus far!

To check that I understand you correctly, you are recommending the `se`

and `recor`

terms together in addition to the `(1|p|ID)`

term to model individual variability, the correlated residuals, and varying intercepts, respectively?

Here is what I am fitting:

```
mice$Song <- log(mice[,trait1])
mice$Song <- scale(mice$Song)
mice$Condition <- scale(mice[,trait2]) # this trait has neg and pos values, so log not appropriate
bf_song <- bf(Song | se(error, sigma=TRUE) ~ Sex + Site + (1 | p| gr(ID, cov = A) ) + (1 | q | ID2) )
bf_cond <- bf(Condition ~ Sex + Site + (1 | p| gr(ID, cov = A) ) + (1 | q | ID2) )
fit <- brm(bf_song + bf_cond,
data=mice, data2=list(A=covmat),
family=gaussian(),
control = list(adapt_delta=0.9,stepsize=0.001, max_treedepth=20),
warmup = 200, iter = 1000,
chains = 1, cores = 1,thin=1,prior=priors)
```

And here are the results:

```
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: Song | se(error, sigma = TRUE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) + (1 | q | ID2)
Condition ~ Sex + Site + (1 | p | gr(ID, cov = A)) + (1 | q | ID2)
Data: mice (Number of observations: 81)
Samples: 1 chains, each with iter = 1000; warmup = 200; thin = 1;
total post-warmup samples = 800
Group-Level Effects:
~ID (Number of levels: 81)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept) 0.35 0.21 0.01 0.74 1.02 69 104
sd(Condition_Intercept) 0.44 0.27 0.02 0.99 1.02 32 112
cor(Song_Intercept,Condition_Intercept) -0.06 0.21 -0.46 0.38 1.00 585 592
~ID2 (Number of levels: 81)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept) 0.32 0.21 0.02 0.71 1.01 46 179
sd(Condition_Intercept) 0.53 0.28 0.04 1.01 1.03 38 110
cor(Song_Intercept,Condition_Intercept) -0.03 0.20 -0.42 0.38 1.00 456 553
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Song_Intercept -0.19 0.25 -0.66 0.30 1.00 887 659
Condition_Intercept -0.16 0.29 -0.71 0.41 1.00 1134 670
Song_SexM -0.01 0.27 -0.55 0.51 1.00 1042 717
Song_SiteMH -0.25 0.29 -0.84 0.33 1.00 902 635
Song_SitePC -1.05 0.73 -2.47 0.40 1.00 1445 800
Song_SitePH 0.12 0.41 -0.72 0.90 1.00 736 564
Condition_SexM 0.28 0.31 -0.34 0.87 1.00 1038 669
Condition_SiteMH -0.32 0.35 -1.00 0.39 1.01 990 562
Condition_SitePC -1.23 1.11 -3.50 0.95 1.00 941 523
Condition_SitePH 0.66 0.54 -0.48 1.68 1.00 1034 694
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Song 0.38 0.20 0.05 0.73 1.05 35 147
sigma_Condition 0.64 0.26 0.19 1.07 1.02 23 54
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(Song,Condition) -0.04 0.17 -0.34 0.35 1.00 618 664
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

no divergent transitions…(but with more iterations/chains, perhaps?)

If this is correct, I think I need a little help still in how to interpret this. But first things first–have I interpreted the information you provided me above into an appropriate model?

Finally:

Mental math isn’t always my favorite–but yes, I understand your explanation!!

Thanks again!

So did I manage to get across why - with single observation per mouse - you are unable to estimate both `sigma`

and `sd_Intercept_ID`

? (honestly want to know if this example is good pedagogy and whether I can reuse it elsewhere).

Yes.

Some of this looks suspicious to me (but remember, you know your data better than I do, and also I didn’t even try to run the model, so don’t get fooled by the apparent certainty of my writing).

Just to be clear - we are no longer looking at body weight and song length, right? (otherwise I can’t see how you would get negative values).

I would keep `sigma=FALSE`

- as I tried to show before, estimating `sigma`

AND `1 | p | ID`

at the same time is likely problematic. Further I think we can somewhat safely avoid estimating `sigma`

, so great!

I *think* (and here, please really do check me) that while using `rescor=TRUE`

you can avoid the ` (1 | q | ID2)`

term, because the `rescor`

should model the non-genetic correlation and `se`

should model the within-mouse variability.

So `Condition`

is a trait where you only have one measurement per mouse, right? To avoid the non-identifiability between `sigma`

and `sd_Intercept_ID`

, I would use the `Condition | se(error_condition)`

term here where `error_condition`

would be some a-priori theoretical estimate of the individual (within-mouse) variability in `Condition`

with hypothetical repeated measurements. This obviously only makes sense if you actually can make a defensible estimate of this variability from theory.

I guess you were getting divergent transitions with less extreme settings, right? I *think* with the tweaks above, you *should* be able to have less extreme settings / revert to default on most. Also note that your posterior intervals for all quantities of interest (correlations and population-level effects) are pretty wide and basically centered around zero, i.e. even though you are able to fit the model, you are unable to learn much about its parameters from your data. I would expect there to be strong correlations between them on a `pairs`

plot and it is possible that with the less flexible model (as I suggested above), you would be able to constrain the parameters and learn something useful.

Does that make sense?

Yes, I understand your explanation!

Hmm–what about the model looks suspicious to you?

So–I changed the names of Resp1 and 2 so that it would be clearer to me where the error terms were coming from and what they were affecting. Apologies for not being consistent! The body size trait is the same as in previous models–I didn’t mention this for semantical simplicity (but I guess it does make a mathematical difference!), but the exact trait is residual body mass (RBM), so it’s mass that has been scaled by skeletal length. Thus, this trait is centered around zero and can have negative values. Sorry for not explaining earlier.

ttburk:

`se(error, sigma=TRUE)`

I would keep

`sigma=FALSE`

- as I tried to show before, estimating`sigma`

AND`1 | p | ID`

at the same time is likely problematic. Further I think we can somewhat safely avoid estimating`sigma`

, so great!

I see; I misunderstood your first message to mean “while it would be better to have both, the fitting is tricky and thus is likely impossible to do (but do it if you can)”.

However, I am actually having a hard time fitting models *with* `sigma=FALSE`

–I get the following message, no luck yet figuring out a workaround (but I’m sure I will find a useful link on discourse soon):

```
Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
```

ttburk:

`(1 | p| gr(ID, cov = A) ) + (1 | q | ID2)`

I

think(and here, please really do check me) that while using`rescor=TRUE`

you can avoid the`(1 | q | ID2)`

term, because the`rescor`

should model the non-genetic correlation and`se`

should model the within-mouse variability.

Okay! Again, I think I misunderstood as I did above ^

So

`Condition`

is a trait where you only have one measurement per mouse, right? To avoid the non-identifiability between`sigma`

and`sd_Intercept_ID`

, I would use the`Condition | se(error_condition)`

term here where`error_condition`

would be some a-priori theoretical estimate of the individual (within-mouse) variability in`Condition`

with hypothetical repeated measurements. This obviously only makes sense if you actually can make a defensible estimate of this variability from theory.

Got it!

I guess you were getting divergent transitions with less extreme settings, right? I

thinkwith the tweaks above, youshouldbe able to have less extreme settings / revert to default on most. Also note that your posterior intervals for all quantities of interest (correlations and population-level effects) are pretty wide and basically centered around zero, i.e. even though you are able to fit the model, you are unable to learn much about its parameters from your data. I would expect there to be strong correlations between them on a`pairs`

plot and it is possible that with the less flexible model (as I suggested above), you would be able to constrain the parameters and learn something useful.Does that make sense?

Yes, this makes sense! Haven’t figured out how to fit it as specified though because of the issue mentioned above. Will work on this. Thanks again very much for your time and care on this problem.

Hmm–what about the model looks suspicious to you?

That was just an introduction to the individual points I raised below :-)

the exact trait is residual body mass (RBM), so it’s mass that has been scaled by skeletal length. Thus, this trait is centered around zero and can have negative values. Sorry for not explaining earlier.

No worries. There might be some room to make the model better by taking the process of how this RBM thing is calculated into account, but that’s probably a second order concern, let’s first get your model to work well…

However, I am actually having a hard time fitting models

with`sigma=FALSE`

–I get the following message, no luck yet figuring out a workaround (but I’m sure I will find a useful link on discourse soon):`Chain 1: Rejecting initial value: Chain 1: Log probability evaluates to log(0), i.e. negative infinity.`

Oh, bummer. If this message only appears with `sigma = FALSE`

, my first guess would be that some of the values of `error`

are invalid - are there any zeroes, infinities, etc.?

Good luck!

Model working with default priors and only 3 divergent transitions with 4 chains at 750 warmup 5000 iterations! Should be simple (hopefully) to fix divergent transitions with prior specification or other parameters.

```
bf_song <- bf(Song |se(error_song,sigma=FALSE) ~ Sex + Site + (1|p|gr(ID, cov = A)) )#+(1|q|ID2))
bf_cond <- bf(Condition |se(error_cond,sigma=FALSE) ~ Sex + Site + (1|p|gr(ID, cov = A)) )#+(1|q|ID2))
fit <- brm(bf_song + bf_cond,
data=mice, data2=list(A=covmat),
family=gaussian(),
control = list(adapt_delta=0.9,stepsize=0.001, max_treedepth=10),
warmup = 750, iter = 5000,
chains = 4, cores = 1,thin=1,prior=priors)
```

Song = song length

Condition = RBM aka residual body mass

```
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: Song | se(error_song, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A))
Condition | se(error_cond, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A))
Data: mice (Number of observations: 80)
Samples: 4 chains, each with iter = 5000; warmup = 750; thin = 1;
total post-warmup samples = 17000
Group-Level Effects:
~ID (Number of levels: 80)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept) 0.36 0.05 0.27 0.47 1.00 7512 10951
sd(Condition_Intercept) 1.09 0.09 0.93 1.29 1.00 3200 5798
cor(Song_Intercept,Condition_Intercept) 0.14 0.16 -0.17 0.44 1.01 763 1589
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Song_Intercept 0.03 0.22 -0.40 0.45 1.00 9138 10565
Condition_Intercept -0.50 0.52 -1.50 0.52 1.00 2938 5410
Song_SexM 0.19 0.16 -0.13 0.51 1.00 12381 13267
Song_SitePC -0.08 0.39 -0.84 0.69 1.00 15039 12655
Song_SitePH 0.21 0.30 -0.37 0.79 1.00 10302 11256
Song_SiteQERC 0.07 0.20 -0.32 0.45 1.00 8154 10254
Condition_SexM 0.33 0.32 -0.29 0.97 1.00 2856 6049
Condition_SitePC -0.79 1.08 -2.90 1.29 1.00 10800 11691
Condition_SitePH 0.90 0.73 -0.54 2.32 1.00 5512 8395
Condition_SiteQERC 0.31 0.50 -0.68 1.29 1.00 2983 5088
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Song 0.00 0.00 0.00 0.00 1.00 17000 17000
sigma_Condition 0.00 0.00 0.00 0.00 1.00 17000 17000
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(Song,Condition) 0.04 0.33 -0.59 0.65 1.00 4655 4690
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```