Hi all, I am trying to run the following meta-regression model in brms, but have run into a problem selecting an appropriate error distribution:
meta_fit_species <- brm(LRR | se(LRR_SE) ~ SpeciesNames * DaysSinceFlood + (1 | Barrier), data = meta_df_conditional, family = student, chains = 4, cores = 4, backend = "cmdstanr", iter = 4000, warmup = 2000, seed = 123, control = list(adapt_delta = 0.9), save_pars = save_pars(all = TRUE), sample_prior = "yes")
The data I have are absolute values of log response ratios and their associated standard errors. I took the absolute value because negative and positive effect sizes are equivalent. The data are heavily right-skewed and also contain a large number of outliers. The number of outliers is in line with expectations, i.e. they are unlikely to be the result of error.
I believe that a gamma distribution is the best fit for the data, but in brms I can only include the standard error in the response when fitting a normal, skew-normal, or Student-t distribution. None of these is ideal: the skew-normal fails to capture the outliers, and the Student-t fails to capture the skew. LOO is much better with the Student-t, but posterior predictive checks look very bad for both.
Is there a way to deal with this in brms, such as creating a custom family, or do I need to find another approach? I would also like to include a variance-covariance matrix in the model with fcor(), which is possible with a Student-t distribution but not with a skew-normal. I suspect this wouldn’t be possible with a custom family.
For some extra context, my study involves comparing fish communities in fragmented rivers, and I am using a meta-regression to synthesise results from 6 individual river models fit in brms. Since I have run these models myself, I have access to all raw data and posterior draws, and have calculated log-response-ratio comparisons between factor levels (river fragments). If there is a way to take better advantage of all the data available to me, rather than simplifying the data to log response ratios, I am all ears. I am aware that a hierarchical model is probably a better approach, but it is unfortunately computationally infeasible. Each individual river model takes over a week to run on my university’s HPC, so a hierarchical model would not come close to finishing within the 200-hour time limit imposed on jobs.
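To make the reduction from posterior draws to effect sizes concrete, here is a minimal base R sketch of how one LRR and LRR_SE pair can be derived for a single comparison between two river fragments. The draws are simulated stand-ins, and the object names (draws_upstream, draws_downstream) are hypothetical. Taking the absolute value of the posterior mean and using the posterior SD as the standard error are assumptions made for illustration; other choices are possible.

```r
# Simulated stand-ins for posterior draws of expected abundance in two
# river fragments; in practice these would be extracted from a fitted
# river model rather than generated here.
set.seed(123)
n_draws <- 4000
draws_upstream   <- rlnorm(n_draws, meanlog = log(20), sdlog = 0.15)
draws_downstream <- rlnorm(n_draws, meanlog = log(35), sdlog = 0.20)

# Log response ratio computed draw by draw, so that the posterior
# uncertainty of both fragments propagates into the comparison.
lrr_draws <- log(draws_downstream / draws_upstream)

# Summaries of the kind passed to the meta-regression.
lrr_signed <- mean(lrr_draws)   # signed posterior mean
LRR        <- abs(lrr_signed)   # assumption: absolute value of the mean
LRR_SE     <- sd(lrr_draws)     # assumption: posterior SD used as the SE

summary_row <- data.frame(LRR = LRR, LRR_SE = LRR_SE)
print(summary_row)
```

Everything in lrr_draws beyond these two numbers (skew, tails, correlation between comparisons that share a fragment) is discarded at this step, which is the information I would like to retain if possible.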