Are complex surveys feasible in brms?

Hi all,

To enrich the discussion about brms, let me share this recent USAID report (August 2020) on multilevel modeling (MLM) with their complex surveys:

Multilevel Modeling Using DHS Surveys: A Framework to Approximate...

In brief, the approach seems suited to the WeMix package, but I will give it a try.



There’s a lot of great discussion here! I recently posted a review article on one-level models for survey data: (Pseudo) Bayesian Inference for Complex Survey Data

For multi-level models, there is an additional level of complexity if the sampling design is also multi-stage and the sampled units correspond to entities in the model hierarchy. For example, sampling schools and then students, or hospitals and then patients. There are competing approaches (are we surprised?!), but they all seem to need the sampling weights for each stage, which are often not available. This challenge is well known in the survey literature, but most approaches are not satisfactory. There’s a lot of potential in taking a Bayesian (or pseudo-Bayesian) approach. We have a couple of preprints in this space that might be interesting.
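To make the role of stage-level weights concrete, here is a minimal Python sketch of the weighted pseudo-likelihood idea for a two-stage design. It is an illustration under stated assumptions, not the method from the preprints: the inclusion probabilities, the toy data, the normal outcome model, and all function names are hypothetical. Each unit's overall weight is the product of the inverse inclusion probabilities at each stage, and each unit's log-likelihood contribution is multiplied by its normalized weight.

```python
import math

def stage_weights(p_cluster, p_unit_given_cluster):
    """Return (cluster weight, within-cluster weight, overall weight).

    Each weight is the inverse of the corresponding inclusion probability.
    """
    w_cluster = 1.0 / p_cluster
    w_unit = 1.0 / p_unit_given_cluster
    return w_cluster, w_unit, w_cluster * w_unit

def normalize(weights):
    """Rescale weights so that they sum to the sample size."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def pseudo_loglik(ys, weights, mu, sigma):
    """Weighted (pseudo) log-likelihood: sum_i w_i * log p(y_i | mu, sigma)."""
    return sum(w * normal_logpdf(y, mu, sigma) for y, w in zip(ys, weights))

# Hypothetical toy sample:
# (outcome, P(school sampled), P(student sampled | school sampled))
sample = [
    (2.0, 0.50, 0.1),
    (3.0, 0.50, 0.1),
    (5.0, 0.25, 0.1),
    (6.0, 0.25, 0.1),
]

ys = [row[0] for row in sample]
overall = [stage_weights(row[1], row[2])[2] for row in sample]
w = normalize(overall)

unweighted_mean = sum(ys) / len(ys)
weighted_mean = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

print("normalized weights:", w)
print("unweighted mean:", unweighted_mean)
print("weighted mean:", weighted_mean)
```

With a fixed standard deviation, the weighted log-likelihood is maximized at the weighted mean rather than the unweighted one, which is the correction the weights are meant to provide under an informative design. In a Stan program, the analogous step would be adding each unit's log density multiplied by its weight to `target`.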

The general approach with perhaps some theory missing

Some more theory for a super simple model

Both papers review some of the competing methods, so take a look!

We’ve been using Stan (of course!), but with multiple weights we write custom models and use RStan rather than packages like brms.


I wanted to add to the discussion a “new” (newly published, but drafts have seemingly been available for a while) article: “Bayesian Hierarchical Weighting Adjustment and Survey Inference” and the accompanying code. My reading of the article is that it gets toward a solution that @maurosc3ner was hoping for: a relatively “automatic” model for survey inference, though one that requires population counts for post-stratification cells. @jonah is one of the authors, so feel free to correct me if I’m misunderstanding the implication of the model.

The paper describes a model for cell means in a regression model, but it seems like you should be able to extend the same model for global-local shrinkage to cell-specific regression coefficients as well.
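As a rough illustration of why population counts for the post-stratification cells are needed, here is a minimal Python sketch of the post-stratification step alone. It is not the paper's model: the cell labels, cell estimates, and population counts are made up, and in practice the cell estimates would come from the fitted hierarchical model (for example, one estimate per posterior draw) rather than being fixed numbers.

```python
def poststratify(cell_estimates, cell_counts):
    """Population-level estimate: sum_j N_j * theta_j / sum_j N_j.

    cell_estimates: dict mapping cell label -> estimated cell mean (theta_j)
    cell_counts:    dict mapping cell label -> population count (N_j)
    """
    if set(cell_estimates) != set(cell_counts):
        raise ValueError("cell_estimates and cell_counts must have the same cells")
    total = sum(cell_counts.values())
    return sum(cell_estimates[c] * cell_counts[c] for c in cell_counts) / total

# Hypothetical cells defined by crossing two design variables.
cell_estimates = {
    "urban_young": 0.40,
    "urban_old": 0.60,
    "rural_young": 0.20,
    "rural_old": 0.50,
}
cell_counts = {
    "urban_young": 3000,
    "urban_old": 2000,
    "rural_young": 4000,
    "rural_old": 1000,
}

estimate = poststratify(cell_estimates, cell_counts)
print("post-stratified estimate:", estimate)
```

Applying the same function to each posterior draw of the cell estimates would give a posterior distribution for the population quantity.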


Based on this paper by Si et al., multilevel modeling for these survey designs seems possible. Has a brms implementation been developed since the paper came out?

@maurosc3ner I’ve been sitting on this paper for a while and only now thought to mention it here: Fully Bayesian Estimation under Dependent and Informative Cluster Sampling by Luis G. Leon-Novelo and Terrence D. Savitsky.

I believe it checks the boxes:

  1. fully Bayesian
  2. estimates regression parameters
  3. only requires cluster indicators and weights for the observed data (not for the entire population of clusters)

I’m not aware of a publicly available implementation of the model, but I believe you could code up a function that works identically to the survey package but with Stan as the backend.


Hi Corey,
I read the conversation in this thread with much interest and came across the paper and code you shared here. I work extensively with DHS data and am now planning to move to Bayesian methods for all my modeling. I found your approach the most useful. I plan to fit a multinomial logistic regression using the brms package and was hoping to review your code for the paper you mentioned here. I am sending you a separate email to request it. I hope you don’t mind.

FYI, I am a public health researcher with only applied knowledge of brms and Stan, and I am hoping to learn from those who have a deeper understanding of both.

@tds151 is an author and could probably share the Stan model and code.



I am currently working on survival analysis rather than surveys, but I will save this info and test it when I get some extra time.