Please also provide the following information in addition to your question:

- Operating System: MacOS 10.14.5 Mojave
- brms Version: 2.9.0

I’m trying to fit a symmetric dyadic model (akin to the social relations model of Kenny et al. (1979)) in brms. It has the form y_ij ~ b * X_ij + a_i + a_j, where i and j are two different individuals, y_ij is the similarity between them on some dimension, and X_ij is their similarity on some other dimension(s). The specification in brms I’m using is:

`brm(similarity ~ x_similarity + (1 | mm(ego, alter)))`

ego and alter are the identities of the two individuals, but which one is ego and which is alter is arbitrary, hence the multi-membership (mm) term. (If it’s clearer, the model is essentially the same one described here: Distance matrix regression.)
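To make the dyadic layout concrete, each row of my data looks something like this (toy values only; `df` and the numbers are made up for illustration):

```r
# Toy illustration of the dyadic data structure: one row per pair,
# with ego/alter assignment arbitrary within each pair.
df <- data.frame(
  ego          = c("A", "A", "B"),
  alter        = c("B", "C", "C"),
  similarity   = c(0.12, 0.45, 0.30),  # cosine similarity of the two texts
  x_similarity = c(0.20, 0.50, 0.25)   # similarity on the other dimension
)
```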

Sampling is fine for all of the parameters except the Intercept and the group-level sd. Despite running 4 chains with 500 iterations each (300 warmup), I can’t get the ESS for the Intercept and the random-effect sd above 10, or their Rhats anywhere near 1.01.

The outcome variable is the cosine similarity between two texts, so it ranges from 0 to 1. I expect the betas to be quite small, so I am currently using the following priors:

```r
mod_prior <- c(
  prior(student_t(3, 0, .05),  class = b),
  prior(student_t(3, 0, .005), class = sd),
  prior(student_t(3, 0, .5),   class = sigma),
  prior(student_t(3, 0, .5),   class = Intercept)
)
```
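For completeness, the full fitting call looks roughly like this (`df` stands in for my actual data frame):

```r
library(brms)

# Sketch of the fitting call described above: 4 chains, 500 iterations
# each with 300 warmup, all chains initialized at zero.
fit <- brm(
  similarity ~ x_similarity + (1 | mm(ego, alter)),
  data   = df,
  family = gaussian(),
  prior  = mod_prior,
  chains = 4,
  iter   = 500,
  warmup = 300,
  inits  = 0
)
```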

However, I have also previously tried setting the scale of the sd hyperprior to .5 or .05 without much change in convergence. I set it so low because, regardless of the prior, the sampling behavior for the sd parameter is really bizarre: it simply keeps drifting toward 0, i.e.

sd_hyperprior.pdf (14.5 KB)

Even with this very strong prior, it doesn’t seem to have reached anything like the typical set by the end of the 500 iterations. Does this mean I just need a longer warmup? I should mention that I am running this with inits = 0, although by the time sampling starts the sd has clearly drifted well away from that.

It is somewhat surprising to me that these random effects are so hard to fit, since each individual appears in the dataset as ego or alter at least 130 times (and many appear around 2,000 times, as there are 2,300 individuals with pairwise similarities computed between many of them). Because the dataset is so large (~2 million rows), the model takes quite a while to fit, so I can’t simply draw 10x as many samples to get the ESS where it needs to be. Are there any other obvious changes I could make to improve convergence for these random effects? This seems like a fairly straightforward model, and the population-level effects are well estimated.

The only other potentially strange thing is that the similarity is constrained to be between 0 and 1, but I am using a Gaussian model. I normally wouldn’t worry about this kind of thing, but there is some bunching near 0, i.e.

similarity_hist_raw.pdf (4.7 KB)

Would it be worth incorporating this into the model (e.g. using a truncated Gaussian instead)?
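Concretely, I’m wondering about something like the following (just a sketch; `df` stands in for my data and I haven’t tried this yet):

```r
library(brms)

# Same model, but with the Gaussian likelihood truncated to [0, 1]
# via brms's trunc() addition term, matching the support of the
# cosine-similarity outcome.
fit_trunc <- brm(
  similarity | trunc(lb = 0, ub = 1) ~ x_similarity + (1 | mm(ego, alter)),
  data   = df,
  family = gaussian()
)
```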