Interpretation of cor term from multivariate animal models

jonnations · July 28, 2020, 6:07pm

Hi,

Nice to see a fellow mammalogists on here! Scotinomys are super cool.

I’m nowhere near as knowledgeable as most of the people posting on this thread, and I don’t know much about the quantitative genetics side of this research, but I have fit a lot of phylogenetic multilevel models in brms, some with smaller sample sizes that you have (that’s often the best we can do in non-model systems). Looking at you priors and results I can pick up one one pattern I’ve regularly seen in my own models.

First, these phylo or animal models are often really hard to fit, mostly because of an identifiability issue between the sigma of a variable and the sd(Intercept) from the relationship correlation matrix. The brms phylogenetics vignette uses a nice example where this is not an issue. I have yet to find an empirical dataset that performs without any issues like this tarsus/back example.

I can see that the sigma value on your Resp2 and the sd(Resp2_Intercept) are not jiving. I’m guessing what is happening is that they are finding nice optima, but occasionally the chains are jumping up (or down) and crossing to the optima of the other variable. You should be able to clearly see this in your chain plots using plot(model). So your sd(Intercept)Resp2 posterior may have a nice peak around 0.4, and your sigma_Resp2 a peak around 0.9, but occasionally sigma plummets towards 0.4 and simultaneously sd jumps up towards 0.9, then everything goes back to “normal.” I bet this is the source of your divergent transitions.

A similar problem may happen if you fit this model using frequentist tools, except you may never know about it! One school of thought that I have picked up on in this forum and also reading (and re-reading) McElreath’s Statistical Rethinking is that regularizing priors are a good way to test for and deal with these issues. My way to diagnose and sometimes deal with this particular low ESS/Divergent transition issue would be try setting the sigma Resp2 prior to something like normal(1, 0.2) and the sd Intercept Resp2 prior to normal(0, 0.2) or otherwise nudge one of them to be lower and one of them to be higher. This should discourage this identifiability issue you are having.

In my experience, using highly regularizing priors on the phylo/animal group-level parameter can help boost ESS and eliminate divergent transitions, but typically population-level effects and rescor values don’t really change much following these adjustments. So if you are getting confusing correlation outputs, this may or may not help those. Seeing lkj(20) definitely caught my attention though! Maybe you can play around with correlation priors once you work on that sigma/sd issue. Also, though some may disagree, I wouldn’t be shy to fit tighter priors throughout all your brms models. One big reason I switched to brms from MCMCglmm is the transparency of the priors (for a non-statistician like me) and the ease of adjusting them. If all variables are mean centered and scaled to unity, I typically start with a normal(0,1) on all my population-level effects. Good luck!

ttburk · July 28, 2020, 6:47pm

Hi Martin,

Thanks again for your thorough response! Your answers have been extremely helpful, and I greatly appreciate your time. I have a couple quick follow up questions, if you have time to answer them…Modeling is not my strength, so I apologize if I’m being a little slow to understand.

martinmodrak:

I think those reveal the source of the divergences - for many families, individual level random effects can easily be non-identified when you have only one observation. For th e case of (1 || ID) you have gaussian likelihood and add i.i.d. gaussian to each row separately. But since gaussian + gaussian is another gaussian, there is no way to distinguish between the variance of the varying intercept and the variance of the likelihood, only their sum (and hence the sum of squares of the standard deviations) is identified. The additional covariance structure is then probably insufficient to constrain the posterior to avoid this degeneracy.
[…]
And if this is indeed the case, I fear the problem is insurmountable unless you would have multiple measurements for at least some of the individuals to let you distinguish between the measurement variability and between-mice variability. MCMCglmm has to encounter exactly the same problem, but (I’ve heard that) Gibbs samplers simply lack a good way to diagnose the problem, while with NUTS you get divergences that signal the issue.

I could “un-collapse” my observations for repeated measures of song (i.e. each song trait is a composite of multiple songs per individual). (It isn’t clear to me exactly what this would do–I didn’t quite understand your explanation–but I’ll be rereading your response and doing some supplemental reading to better understand.)

I would like to clarify my understanding about modeling the phenotypic/non-genetic correlations–that is what the rescor term in gaussian models is doing, correct? But because skew normal does not allow for this, we must explicitly add it to our models with the (1|p|ID) term (which doesn’t seem to work due to the ‘duplicated terms’ error I received).

I am actually beginning to fit a few bivariate models that are running well (or seeming to run well based just on the divergent transitions being 0 and looking at the diagnostic plots; haven’t looked more closely yet. What I changed are as follows:

The two traits I’m attempting to model (I realized the model I presented to you was among those with the most severely constrained sample size, though of the highest biological interest). Using other traits that were not constrained in this way already brought transitions down by quite a bit.
Tighter priors: I switched from Intercept ~ Normal(0,10) to Intercept ~ Normal (0,1) (which actually @jonnations just independently suggested as I am writing this response!)
Switching to gaussian family so I can model rescor. While for many of the song traits a skew_normal distribution was a better fit to the data for univariate models, as I mentioned in earlier posts in this thread, the additional term (1|p|ID) was not permitted (and I haven’t yet found a workaround).

Does this all seem reasonable?

jonnations:

My way to diagnose and sometimes deal with this particular low ESS/Divergent transition issue would be try setting the sigma Resp2 prior to something like normal(1, 0.2) and the sd Intercept Resp2 prior to normal(0, 0.2) or otherwise nudge one of them to be lower and one of them to be higher. This should discourage this identifiability issue you are having.

In my experience, using highly regularizing priors on the phylo/animal group-level parameter can help boost ESS and eliminate divergent transitions, but typically population-level effects and rescor values don’t really change much following these adjustments. So if you are getting confusing correlation outputs, this may or may not help those. Seeing lkj(20) definitely caught my attention though! Maybe you can play around with correlation priors once you work on that sigma / sd issue. Also, though some may disagree, I wouldn’t be shy to fit tighter priors throughout all your brms models. One big reason I switched to brms from MCMCglmm is the transparency of the priors (for a non-statistician like me) and the ease of adjusting them. If all variables are mean centered and scaled to unity, I typically start with a normal(0,1) on all my population-level effects. Good luck!

Thank you for the suggestions @jonnations! I will also reread the relevant sections of McElreath’s book. Is the lkj term that I specified weird? (I was playing with increasing the value and seeing what was happening with my original model.) I don’t really know what is “reasonable”, but admittedly I only just started the lit search/discourse search for more info on this topic. Suggestions would be very much appreciated, however.

martinmodrak · July 28, 2020, 7:16pm

You are very welcome :-)

No worries, this stuff is hard. Also a) I can set boundaries if questions require too much time from me b) I am actually part-time employed by Stan to care for the forums, including answering questions (way more fun than my actual research as I get to discuss the interesting parts but don’t really have to do the annoying implementation/preprocessin/cleaning work ;-).

Un collapsing should help (collapsing is rarely beneficial and often detrimental for inference, although it can help speed). But you would still expect problems in the response you can’t uncollapse :-(

I’d be happy to explain more, but I’d like to confirm I am not wrong before that - does the pairs plot show the pattern I described? (if you have issues getting the pairs plot to work, do ask!)

rescor does much less than (1 | p | ID) - the former only says that the residuals are correlated but the variance of the individual responses stays the same. The latter also adds varying intercept per individual, i.e. an additional source of variance of the response. I can see how using rescor might be preferable and resolve your issues - because we actually want just the correlation and don’t really need the additional variance.

To avoid the duplicated effect you can add a column ID2 = ID and use (1|p|ID2) (yeah it looks like cheating, but there’s nothing wrong with it - brms AFAIK forbids double use only because it breaks its parameter naming, not because it is somehow fundamentally bad :-)).

Good to hear! normal(0,1) priors are usually reasonable if the scale of the response is roughly a few units. So check the units - if you have something like mouse body size in milimeters, this would be indefensibly tight (but for size in cms could be OK) BTW since your traits are strictly positive, it might also make sense to consider lognormal or gamma families (as those enforce positive outcome - and are skewed - by definition ) - note that those change the interpretation of your coefficents as now the coefficients are on the log scale (and hence much tighter priors are usually warranted).

Totes agree! :-)

This seems (to me, without domain knowledge) a bit dangerous - the non-identifiability you are dealing with is that the data do not contain enough information to tell which of the variabilities (measurement or the varying intercept) are larger. Unless you have theory to let you say a priori which is larger, such a strong prior is IMHO indefensible and may lead to unjustified inferences from your data.

ttburk · July 28, 2020, 8:42pm

:) Here’s the requested clarifications:

Yes–I see a strong negative correlation between sd_intercepts and sigma for both Resp1 and Resp2. The following plots are for my current model described in my last post (but only one chain).

This is such a good suggestion! I am implementing a model right now (takes a while to run).

Yes–all response variables are scaled such that SD = 1, mean = 0! Glad to hear your opinion on this.

martinmodrak · July 29, 2020, 6:11am

Great. So here’s an example - let’s say we measure threee song lengths for three mice and we have the model Resp ~ (1 || ID). This model has three “main” parameters: Intercept (which we will not care about here), the “global” or “residual” variability sigma and the “between-mice” variability sd_Intercept_ID. Let’s say our data looks like this:

Example 1:
MouseA: 13 12 14
MouseB: 25 22 21
MouseC: 31 28 30

Without even running the model we can say that between-mice variability sd_Intercept_ID is much larger than the residual variability sigma.

But the data could have looked differently:

Example 2:
MouseA: 13 28 21
MouseB: 25 12 30
MouseC: 31 22 14

Here, the mice actually don’t differ much from each other, so sd_Intercept_ID will be smaller than sigma which becomes large to encompass the measurements change notably even within the same mouse. Makes sense so far?

The point is that in both examples, the first observations are exactly the same, so if we observed just the first column, we are unable to distinguish between “large sd_Intercept_ID, small sigma” and “small sd_Intercept_SD, large sigma” (and everything in between).

Technical aside for completeness: if you observe only one value per mouse, the model only informs the total sd which is sqrt(sd_Intercept_SD^2 + sigma^2), and that’s why the pairs plot for the two looks roughly like a circle arc. Also, for some non-gaussian families you could - at least in principle - identify sd_Intercept_ID even when you have only one observation per individual, because the gaussian varying intercept can create a somewhat different variability pattern than the variability introduced by the family, but for gaussian family (and few others) the case is hopeless even in theory, because (with a bit of sloppy notation) normal(normal(mu, sd1), sd2) is exactly the same as normal(mu, sqrt(sd1 ^ 2 + sd2 ^ 2)).

Now for your data, it might actually make sense to take a different approach for each of the responses: The song length is not fixed per individual, there is (I assume) substantial within-individual variability and you have multiple measurements of the song length so you can identify both the “within-individual” and “between-individual” variability. For body size, I would expect much smaller within-individual variability (although there probably is some due to measurement imprecisions, and I’ve heard people’s height changes slightly over the course of day so mice’s probably do so as well). While you can’t directly quantify the within-individual variability as you have only a single measurement, I guess you can easily put some quite strict bounds on it using your knowledge of the domain. In a single-response model this could be achived by putting a narrow prior on sigma.

There would be some technical challenges for putting both in a single brms model. A slightly sub-optimal but probably easiest would be to use the se() addition term. You would have Resp1 | se(error) ~ ... This effectively fixes the sigma at a specific value separately for each row in the dataset which will no longer be estimated. The error represents a column in the data containing the standard error of the mean (sd(x) / sqrt(n)), so for song length, you would put average song length as the response and the observed standard error of the mean as error. For body size you would put the single measurement as the response and put the theoretically derived measurement error as error. (this paragraph is pure speculation on my side, I’ve never built such a model, but I think it should work).

If your responses are all positive, than the natural trasformation would IMHO be be taking the log (and potentially scaling then, but that might not be necessary). The log is also likely to reduce the skew, so you might be able to get away with gaussian family and use rescor (which I now believe to be quite beneficial).

As I said, this will change the interpretation of the coefficients but I think that this is actually more natural - say you get estimate of sd_intercept_ID roughly 5 for a model on the original scale: this means that between-mice variability is something like +/- 2*5. If the mean population song length is say 30, this is unproblematic but what if the mean song length is 8, this would imply that some mice have average song length of -2 …

If you work on the log scale and you get estimate of sd_intercept_ID roughly 0.55 ~= log(3)/2, this means that the between-mice variability is roughly between “the song is shorter by a factor of 3” and “the song is thirce as long” which makes sense regardless of the population mean…

Does that make sense?

ttburk · August 1, 2020, 11:10pm

Hi Martin–thanks for your reply! And sorry for the delay in my response, was at a conference this week… and also had a little reading to do before I could figure out how to implement some of the things you mentioned.

Yes–I’m with you thus far!

martinmodrak:

There would be some technical challenges for putting both in a single brms model. A slightly sub-optimal but probably easiest would be to use the se() addition term. You would have Resp1 | se(error) ~ ... This effectively fixes the sigma at a specific value separately for each row in the dataset which will no longer be estimated. The error represents a column in the data containing the standard error of the mean (sd(x) / sqrt(n)), so for song length, you would put average song length as the response and the observed standard error of the mean as error. For body size you would put the single measurement as the response and put the theoretically derived measurement error as error. (this paragraph is pure speculation on my side, I’ve never built such a model, but I think it should work).

To check that I understand you correctly, you are recommending the se and recor terms together in addition to the (1|p|ID) term to model individual variability, the correlated residuals, and varying intercepts, respectively?

Here is what I am fitting:

mice$Song <- log(mice[,trait1])
mice$Song <- scale(mice$Song)
mice$Condition <- scale(mice[,trait2]) # this trait has neg and pos values, so log not appropriate

bf_song <- bf(Song | se(error, sigma=TRUE) ~ Sex + Site + (1 | p| gr(ID, cov = A) ) + (1 | q | ID2) ) 
bf_cond <- bf(Condition ~ Sex + Site + (1 | p| gr(ID, cov = A) ) + (1 | q | ID2) ) 

fit <- brm(bf_song + bf_cond,
           data=mice, data2=list(A=covmat),
           family=gaussian(),
           control = list(adapt_delta=0.9,stepsize=0.001, max_treedepth=20),
           warmup = 200, iter = 1000,
           chains = 1, cores = 1,thin=1,prior=priors)

And here are the results:

Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: Song | se(error, sigma = TRUE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) + (1 | q | ID2) 
         Condition ~ Sex + Site + (1 | p | gr(ID, cov = A)) + (1 | q | ID2) 
   Data: mice (Number of observations: 81) 
Samples: 1 chains, each with iter = 1000; warmup = 200; thin = 1;
         total post-warmup samples = 800

Group-Level Effects: 
~ID (Number of levels: 81) 
                                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept)                          0.35      0.21     0.01     0.74 1.02       69      104
sd(Condition_Intercept)                     0.44      0.27     0.02     0.99 1.02       32      112
cor(Song_Intercept,Condition_Intercept)    -0.06      0.21    -0.46     0.38 1.00      585      592

~ID2 (Number of levels: 81) 
                                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept)                          0.32      0.21     0.02     0.71 1.01       46      179
sd(Condition_Intercept)                     0.53      0.28     0.04     1.01 1.03       38      110
cor(Song_Intercept,Condition_Intercept)    -0.03      0.20    -0.42     0.38 1.00      456      553

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Song_Intercept         -0.19      0.25    -0.66     0.30 1.00      887      659
Condition_Intercept    -0.16      0.29    -0.71     0.41 1.00     1134      670
Song_SexM              -0.01      0.27    -0.55     0.51 1.00     1042      717
Song_SiteMH            -0.25      0.29    -0.84     0.33 1.00      902      635
Song_SitePC            -1.05      0.73    -2.47     0.40 1.00     1445      800
Song_SitePH             0.12      0.41    -0.72     0.90 1.00      736      564
Condition_SexM          0.28      0.31    -0.34     0.87 1.00     1038      669
Condition_SiteMH       -0.32      0.35    -1.00     0.39 1.01      990      562
Condition_SitePC       -1.23      1.11    -3.50     0.95 1.00      941      523
Condition_SitePH        0.66      0.54    -0.48     1.68 1.00     1034      694

Family Specific Parameters: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Song          0.38      0.20     0.05     0.73 1.05       35      147
sigma_Condition     0.64      0.26     0.19     1.07 1.02       23       54

Residual Correlations: 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(Song,Condition)    -0.04      0.17    -0.34     0.35 1.00      618      664

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

no divergent transitions…(but with more iterations/chains, perhaps?)

If this is correct, I think I need a little help still in how to interpret this. But first things first–have I interpreted the information you provided me above into an appropriate model?

Finally:

Mental math isn’t always my favorite–but yes, I understand your explanation!!

Thanks again!

martinmodrak · August 4, 2020, 9:10am

So did I manage to get across why - with single observation per mouse - you are unable to estimate both sigma and sd_Intercept_ID? (honestly want to know if this example is good pedagogy and whether I can reuse it elsewhere).

Yes.

Some of this looks suspicious to me (but remember, you know your data better than I do, and also I didn’t even try to run the model, so don’t get fooled by the apparent certainty of my writing).

Just to be clear - we are no longer looking at body weight and song length, right? (otherwise I can’t see how you would get negative values).

I would keep sigma=FALSE - as I tried to show before, estimating sigma AND 1 | p | ID at the same time is likely problematic. Further I think we can somewhat safely avoid estimating sigma, so great!

I think (and here, please really do check me) that while using rescor=TRUE you can avoid the (1 | q | ID2) term, because the rescor should model the non-genetic correlation and se should model the within-mouse variability.

So Condition is a trait where you only have one measurement per mouse, right? To avoid the non-identifiability between sigma and sd_Intercept_ID, I would use the Condition | se(error_condition) term here where error_condition would be some a-priori theoretical estimate of the individual (within-mouse) variability in Condition with hypothetical repeated measurements. This obviously only makes sense if you actually can make a defensible estimate of this variability from theory.

I guess you were getting divergent transitions with less extreme settings, right? I think with the tweaks above, you should be able to have less extreme settings / revert to default on most. Also note that your posterior intervals for all quantities of interest (correlations and population-level effects) are pretty wide and basically centered around zero, i.e. even though you are able to fit the model, you are unable to learn much about its parameters from your data. I would expect there to be strong correlations between them on a pairs plot and it is possible that with the less flexible model (as I suggested above), you would be able to constrain the parameters and learn something useful.

Does that make sense?

ttburk · August 4, 2020, 4:28pm

Yes, I understand your explanation!

Hmm–what about the model looks suspicious to you?

So–I changed the names of Resp1 and 2 so that it would be clearer to me where the error terms were coming from and what they were affecting. Apologies for not being consistent! The body size trait is the same as in previous models–I didn’t mention this for semantical simplicity (but I guess it does make a mathematical difference!), but the exact trait is residual body mass (RBM), so it’s mass that has been scaled by skeletal length. Thus, this trait is centered around zero and can have negative values. Sorry for not explaining earlier.

I see; I misunderstood your first message to mean “while it would be better to have both, the fitting is tricky and thus is likely impossible to do (but do it if you can)”.

However, I am actually having a hard time fitting models with sigma=FALSE–I get the following message, no luck yet figuring out a workaround (but I’m sure I will find a useful link on discourse soon):

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.

Okay! Again, I think I misunderstood as I did above ^

Got it!

Yes, this makes sense! Haven’t figured out how to fit it as specified though because of the issue mentioned above. Will work on this. Thanks again very much for your time and care on this problem.

martinmodrak · August 4, 2020, 8:46pm

That was just an introduction to the individual points I raised below :-)

No worries. There might be some room to make the model better by taking the process of how this RBM thing is calculated into account, but that’s probably a second order concern, let’s first get your model to work well…

ttburk:

However, I am actually having a hard time fitting models with sigma=FALSE –I get the following message, no luck yet figuring out a workaround (but I’m sure I will find a useful link on discourse soon):
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.

Oh, bummer. If this message only appears with sigma = FALSE, my first guess would be that some of the values of error are invalid - are there any zeroes, infinities, etc.?

Good luck!

ttburk · August 7, 2020, 11:47am

Model working with default priors and only 3 divergent transitions with 4 chains at 750 warmup 5000 iterations! Should be simple (hopefully) to fix divergent transitions with prior specification or other parameters.

bf_song <- bf(Song  |se(error_song,sigma=FALSE) ~ Sex + Site + (1|p|gr(ID, cov = A)) )#+(1|q|ID2)) 
bf_cond <- bf(Condition |se(error_cond,sigma=FALSE)  ~ Sex + Site + (1|p|gr(ID, cov = A)) )#+(1|q|ID2))
fit <- brm(bf_song + bf_cond,
           data=mice, data2=list(A=covmat),
           family=gaussian(),
           control = list(adapt_delta=0.9,stepsize=0.001, max_treedepth=10),
           warmup = 750, iter = 5000,
           chains = 4, cores = 1,thin=1,prior=priors)

Song = song length
Condition = RBM aka residual body mass

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: Song | se(error_song, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) 
         Condition | se(error_cond, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) 
   Data: mice (Number of observations: 80) 
Samples: 4 chains, each with iter = 5000; warmup = 750; thin = 1;
         total post-warmup samples = 17000

Group-Level Effects: 
~ID (Number of levels: 80) 
                                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept)                          0.36      0.05     0.27     0.47 1.00     7512    10951
sd(Condition_Intercept)                     1.09      0.09     0.93     1.29 1.00     3200     5798
cor(Song_Intercept,Condition_Intercept)     0.14      0.16    -0.17     0.44 1.01      763     1589

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Song_Intercept          0.03      0.22    -0.40     0.45 1.00     9138    10565
Condition_Intercept    -0.50      0.52    -1.50     0.52 1.00     2938     5410
Song_SexM               0.19      0.16    -0.13     0.51 1.00    12381    13267
Song_SitePC            -0.08      0.39    -0.84     0.69 1.00    15039    12655
Song_SitePH             0.21      0.30    -0.37     0.79 1.00    10302    11256
Song_SiteQERC           0.07      0.20    -0.32     0.45 1.00     8154    10254
Condition_SexM          0.33      0.32    -0.29     0.97 1.00     2856     6049
Condition_SitePC       -0.79      1.08    -2.90     1.29 1.00    10800    11691
Condition_SitePH        0.90      0.73    -0.54     2.32 1.00     5512     8395
Condition_SiteQERC      0.31      0.50    -0.68     1.29 1.00     2983     5088

Family Specific Parameters: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Song          0.00      0.00     0.00     0.00 1.00    17000    17000
sigma_Condition     0.00      0.00     0.00     0.00 1.00    17000    17000

Residual Correlations: 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(Song,Condition)     0.04      0.33    -0.59     0.65 1.00     4655     4690

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

ttburk · August 7, 2020, 9:51pm

Got the model to run without divergences by adjusting the max tree depth to 15!

> summary(fit)
 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: Song | se(error_song, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) 
         Condition | se(error_cond, sigma = FALSE) ~ Sex + Site + (1 | p | gr(ID, cov = A)) 
   Data: mice (Number of observations: 80) 
Samples: 4 chains, each with iter = 5000; warmup = 750; thin = 1;
         total post-warmup samples = 17000

Group-Level Effects: 
~ID (Number of levels: 80) 
                                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Song_Intercept)                          0.36      0.05     0.27     0.47 1.00     7457    10479
sd(Condition_Intercept)                     1.09      0.09     0.93     1.29 1.00     3009     5591
cor(Song_Intercept,Condition_Intercept)     0.14      0.15    -0.17     0.43 1.00      854     1889

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Song_Intercept          0.03      0.22    -0.41     0.46 1.00     9347    11013
Condition_Intercept    -0.51      0.52    -1.53     0.48 1.00     3152     5410
Song_SexM               0.19      0.17    -0.14     0.51 1.00    13033    13224
Song_SitePC            -0.09      0.38    -0.83     0.69 1.00    15065    11939
Song_SitePH             0.21      0.29    -0.36     0.79 1.00    10274    11456
Song_SiteQERC           0.07      0.20    -0.31     0.46 1.00     8439    10707
Condition_SexM          0.33      0.33    -0.32     0.98 1.00     2950     5458
Condition_SitePC       -0.78      1.07    -2.87     1.34 1.00    12042    11363
Condition_SitePH        0.90      0.72    -0.51     2.31 1.00     5084     8453
Condition_SiteQERC      0.32      0.50    -0.65     1.32 1.00     2897     5603

Family Specific Parameters: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_Song          0.00      0.00     0.00     0.00 1.00    17000    17000
sigma_Condition     0.00      0.00     0.00     0.00 1.00    17000    17000

Residual Correlations: 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(Song,Condition)     0.04      0.30    -0.54     0.60 1.00     4758     5861

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

So–if I’m not mistaken, I can proceed to interpreting the genetic correlations–and I think I can use the value of rescor to understand the degree of phenotypic correlation and the cor term to understand the degree of genetic correlation. I hope someone will yell at me if I am wrong…Thanks again to all who responded, particularly Martin

dpascall · August 8, 2020, 3:11pm

The degree of phenotypic correlation (after accounting for the population-level effects) is actually the sum of the genetic and the residual correlations.

See tutorial 2 here:
https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.1365-2656.2009.01639.x

martinmodrak · August 10, 2020, 1:54pm

Kind of :-) Looking at the estimates, the posteriors for basically all the parameters are quite wide and don’t even constrain the sign of the parameters, so I don’t think there would be much to actually interpret, except that your data didn’t provide enough information.

The fact that you needed to ramp up treedepth signals that some model modifications might still be warranted, but at this point, I doubt it’s a good use of anybody’s time - I don’t think you can defensibly answer your questions with a simpler model and you are able to inefficiently fit the complex model and see that the posterior is very wide…

Still hope this was a worthwhile endeavour for you…

jonnations · September 4, 2020, 12:58am

Late reply, just saw this. Thanks for your comment here @martinmodrak. I imagine other people have dealt with the issue highlighted in this post, as I have run across it in similar phylogenetic/animal regression datasets, large (500+ species) to small (20 species). sigma and sd_intercept posteriors find (mostly) stable peaks, but occasionally (~1% of samples) cross from one to the other, then immediately bounce back. This can lead to divergent transitions. Tightening priors like I described can make the divergent transitions “disappear” via brute force, however the other parameters in the model (b_, intercept, etc) don’t change, as they don’t seem to care where the error comes from, they just need know it’s there. Having no formal training, I’ve been under the impression that priors can help when we have beliefs but the data falters. Your emphasis on domain knowledge is appreciated, such as secondary verification of heritability/phylogenetic signal from other tools, or just keen observation of the system. The issue that’s challenging to understand from an introductory perspective is the (fuzzy?) line where updating priors to improve model performance goes from good to dangerous.

Thank you and ttburk for the hard work on this post! It’s chock full of helpful info.

martinmodrak · September 6, 2020, 3:27pm

The fact that the parameters you care about don’t change is definitely some comfort. While it is possible the degeneracy/non-identifiability really only affects the sigma/sd terms and doesn’t “spill” into the rest of the model, I don’t think this is guaranteed in the general case. Though I don’t understand these models well enough to be sure either way.

You are right that priors can help if data falters, but unless you can defend the priors without referring to your particular experiment (e.g. being able to a-priori say that the measurement variability is larger than between-individual variability), you always run the risk of influencing the inferences. If the model has issues without the strong prior, it can then be very challenging to check whether the priors do have influence on some of the inferences you care about. Divergences can signal that the sampler cannot explore a part of the posterior, and since divergent transitions are rejected, it can act somewhat like an additional prior that puts very low probability on the problematic region. So the similarity between results from the model with wide priors and divergences the model with narrow priors provides no guarantees that the prior didn’t actually influence your inferences.

I think the line where updating priors becomes dangerous is where the prior cannot be safely anchored in knowledge outside of the experiment you ran. This is obviously a bit fuzzy as reasonable people can disagree about which outside knowledge is relevant and how to express it in forms of prior. I think that you are entering the dangerous zone the moment you can’t be certain that you could defend the priors easily against a skeptical expert.

In any case, I think this specific problem (where you cannot distinguish between two sources of variability) is actually best handled by modelling only a single source of variability - this approach is unlikely to change your inferences about other parameters and makes it explicit you can only say something relevant about the total variability.

Does that make sense?

jonnations · September 6, 2020, 4:53pm

This is a very helpful explanation. I haven’t thought about divergences this way.

Yes, that makes sense, but it’s something I’ll have to play around with in practice to conceptually verify. I’m guessing that that is a technique that is used in other types of models with multiple error sources that are problematic? One benefit of these phylo/animal models is that, ideally (but sometimes not in reality), one could differentiate between the sources of error, which then tells us something about trait evolution. But if non-identifiability reigns, the overall error structure can be included as a single parameter, and the focal results of the hypothesis test are unchanged (effect of x on y), then this idea (in this case, dropping sigma) seems like a good second-place option.

Great convo :)

ron · September 8, 2020, 1:39pm

The phenotypic covariance matrix is the sum of the genetic and residual covariance matrices.

dpascall · September 8, 2020, 1:44pm

Yep. This is actually right. The sum of the correlations isn’t the same as the correlation of the sum of the covariances. That’s my error.

Topic		Replies	Views
Hierarchical brms model: interpretation of cor(Intercept, predictor) Modeling brms	1	517	August 25, 2021
BRMS missing genetic correlation brms	4	502	January 11, 2023
Brms & covariance among random effects brms	7	1873	October 5, 2023
Multivariate model interpreting residual correlation and group correlation brms interpret-results	14	3830	September 12, 2020
Correlation between random intercept and residual in a multivariate model brms	1	559	November 28, 2019

Interpretation of cor term from multivariate animal models

Related topics