Can someone offer some advice for summarizing a hierarchical model with thousands of parameters?
I fit a model with 3000+ parameters in RStan. I have multiple varying intercepts and slopes, but obviously using print() for a summary of the posterior is impractical.
Any resources on how to handle this will be much appreciated!
In a hierarchical model, you can summarise the dispersion of the cluster-varying parameters by the standard deviation of their distribution. It’s also common to compute the total variance (i.e. the variance of the cluster-varying parameters + residual variance), and define ratios of variance components, e.g. the varying-intercept variance divided by the total variance provides the classic intraclass correlation coefficient
It is also often useful to summarize a model via it’s predictions (e.g. how would the predictions if all data were assigned to a single condition, keeping all other predictors the same)
The above answers are good suggestions for how to adequately summarize the 3000+ parameters (+1 to both).
You also might be asking a practical question about how to see summaries of the marginal posterior distributions for a subset of parameters of interest without dumping summaries of 3000+ parameters into your R console. In this case, try something like:
variables_of_interest <- c("variable_name_1", "variable_name_2")
draws_summary <- posterior::summarize_draws(your_draws_array)
print(draws_summary[draws_summary$variable %in% variables_of_interest,], n = 30) # n=30 prints (up to) 30 rows of the tibble
# or if you want to get every element of a particular random effect you can do
print(draws_summary[grep("var_name\\[", draws_summary$variable),], n = 30)
# or you can do things like
print(draws_summary[draws_summary$rhat > 1.01,], n = 30)