Are complex surveys feasible in brms?

Hi all,

Is there any way/advice to handle complex surveys such as NHANES or DHS (PSU+STRATA) in brms?

I normally work with the survey package in R, but I would like to move to brms: firstly because Bayes, and secondly because my research is pointing in a more multilevel direction right now.
This is an example for one survey:

svydesign(id = ~v001+v002, 
                 strata = ~strata,
                 weights = ~v005,
                 data = dhsSurvey)

thanks in advance,



Tagging @bgoodri @jonah @lauren and @paul.buerkner


It seems that it is hard to say something about this without knowing the survey package (which I do not).

So if you could describe with a bit more detail what your data look like, and how you typically perform an analysis, this would make it easier for others to try to help.

One can use weights and analyse nested designs in brms, but I can’t say more because I don’t understand what the svydesign call you describe does (which looks more like an intermediate step of the analysis than the final step).


Hi Guido, thanks for answering,

Basic summary
Although I am not an expert, let me try to expand the details for you. Briefly, DHS surveys are based on cross-sectional, representative samples of individuals and households, with indicators on health, nutrition, and HIV serostatus collected over time. These surveys use a two-stage sampling design: a set of defined survey locations (primary sampling units, or PSUs) is drawn within defined strata, and observations are statistically weighted accordingly. Similar designs are used in NHANES (CDC), as well as in school surveys (PISA).
The weights are important to me for two reasons. First, I am interested in making inferences about the population, not the sample. Second, in national datasets certain subpopulations (e.g., Hispanic, urban/rural, or Asian students) are oversampled to ensure that the sample size is large enough for subgroup analysis. Without weights, those groups would be over- (or under-) represented, leading to misleading standard errors for covariates.
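To illustrate the oversampling point with a toy, non-DHS example (all numbers made up): a group that is 10% of the population but 50% of the sample pulls the unweighted mean toward itself, while design weights recover the population quantity.

```r
set.seed(1)
# Hypothetical sample: group "a" is 10% of the population but 50% of the sample
sample_df <- data.frame(
  group = rep(c("a", "b"), each = 100),
  y     = c(rnorm(100, mean = 10), rnorm(100, mean = 20))
)
# Design weight = population share / sample share
sample_df$w <- ifelse(sample_df$group == "a", 0.10 / 0.50, 0.90 / 0.50)

mean(sample_df$y)                        # unweighted: pulled toward group "a" (~15)
weighted.mean(sample_df$y, sample_df$w)  # weighted: close to 0.1*10 + 0.9*20 = 19
```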

I normally run this pipeline with the survey package (Lumley et al.) in a frequentist framework:

x <- data.frame(
  psu_variable    = sample(1:3,  200, replace = TRUE),
  strata_variable = sample(1:10, 200, replace = TRUE),
  weight_variable = sample(1:2,  200, replace = TRUE),
  some_variable   = sample(1:50, 200, replace = TRUE),
  outcome         = sample(0:1,  200, replace = TRUE)
)

let’s suppose I have a DHS for South Africa:

x$country_name <- 'ZAF'

1. Design creation:

x_des <- svydesign(~ psu_variable, strata = ~ strata_variable,
                   weights = ~ weight_variable, data = x, nest = TRUE)

2. Descriptive statistics for any variable:

svymean(~ some_variable, x_des)

3. A logistic regression on the individual-level covariate:

m0.w.svy <- svyglm(outcome ~ some_variable, design = x_des,
                   family = quasibinomial())

4. Finally, I post-stratify to reflect the country’s relative population (here).

I also have several covariates that could complement the analysis, but they are at Admin1-level resolution (e.g., malaria incidence, friction maps…). My ultimate goal is to add them properly into a brms model while taking care of the standard errors and the survey design.

Something like this:

brm(formula = ?,   # weights? design = (psu, strata)?
    data = your_dataframe,
    prior = bprior,
    family = bernoulli(link = "logit"),
    chains = 4, warmup = 500, iter = 2000, cores = 4,
    seed = 2020, inits = "0",
    control = list(adapt_delta = 0.8, max_treedepth = 10))

Thanks in advance,


Hei Esteban,

Thanks for the detailed description.

It is straightforward to use weights in brms:

yi | weights(wei) ~ predictors

(This is just the example from the brm documentation).

I do not know what the purpose of the “design” statement in the analysis with the survey package is, which means that I can’t tell if or how this could be implemented in brms. Can you provide additional information about this?

PS: More generally about weights: these are sometimes criticized by Bayesians, because using weights is not considered fully Bayesian, and because using one fixed set of weights removes uncertainty about the weights from the analysis. An alternative approach is multilevel regression and poststratification (MRP, see e.g. here). I add this only for completeness, not to discourage the use of weights.


Hi, I describe one way of doing this, using nested random effects for the survey design, in this RPub.


You made me curious, and I had a look at your document. I am not confident that the brms and svyglm analyses in your document can be expected to generate the same results, because the brms analysis uses only weights and no design information.

A quick search led me to this StackExchange post, where Lumley (2010) is quoted as follows:

In a model-based analysis it would be necessary to specify the random part of the model correctly to get correct standard errors, but all our standard error estimates are design-based and so are valid regardless of the model. It is worth noting that the “sandwich”, or “model-robust”, or “heteroskedasticity-consistent” standard errors sometimes used in model-based regression analysis are almost identical to the design-based standard errors we will use; the main difference being in the handling of stratification.

So the design information is used to obtain “better” standard errors for estimates obtained by frequentist estimation. I don’t know how this would be implemented for Bayesian estimation. (Using MRP as suggested above might allow one to obtain valid SEs, but I know too little about complex surveys to have an opinion on this.)


Yes, you’re right, they don’t give the same results at all. But the weights are effectively weighting the likelihood contribution of each person (you need to scale the weights first, as is done in the survey library), and the random effects then control for the homogeneity within the sample strata, so you are effectively modeling the stratification of the survey design. The standard errors of the parameters are another matter: the frequentist sandwich (robust) standard errors are something I’ve never been able to reconcile with the Bayesian paradigm, so I agree with you on that one. There’s a chapter in Snijders and Bosker’s Multilevel Analysis (2nd ed.) that describes modeling the survey design in this manner.
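The weight-scaling step mentioned here can be sketched in base R (the variable names are mine, not from the thread):

```r
# Sketch: rescale raw design weights so they sum to the sample size n,
# as is commonly done before weighting a (pseudo-)likelihood.
# `w_raw` stands in for the raw design weights (e.g., DHS v005 / 1e6).
w_raw    <- c(1.2, 0.8, 2.5, 0.5, 1.0)
n        <- length(w_raw)
w_scaled <- w_raw * n / sum(w_raw)
sum(w_scaled)  # equals n, so the effective likelihood "sample size" stays n
```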


Disclaimer: I am continuing the conversation to learn myself.

OK, so svyglm uses a hierarchical model under the hood to account for homogeneity within sample strata?

I am asking because in your post the estimated fixed effects of the lmer model (hierarchical design + weights) and the svyglm model are not the same. Maybe what is going on is that the svyglm and brms results are more similar because brms does a better job at estimating the variance of the random intercept*? Or maybe svyglm estimates the hierarchical model (if it uses one) differently than lmer?

  • *ML estimates of the random-effects variance are biased towards 0, which is consistent with the higher value in brms compared to lmer. But I don’t know if this explains the difference in the fixed effects.

No, svyglm uses the weighted likelihood and clustered (sandwich) standard errors. It also corrects the degrees of freedom for tests to (usually) the number of strata rather than the sample size.
The idea of using a hierarchical model is basically an alternative perspective on controlling for the effects of the survey design (non-randomness).


Hi Esteban,
We recently published an article using brms with the DHS data for Ethiopia, and I’m happy to share our code with you if you like. Our basic approach was:

options(mc.cores = parallel::detectCores())

mdatall$weight <- mdatall$v005 / 1000000   # per the DHS user manual

fit1b <- brm(outcome | weights(weight) ~ scale(x1) + (1 | region) + (1 | strata),
             data = mdatall, family = bernoulli,
             chains = 2, cores = 2, iter = 4000, warmup = 1500)

Paper is open access here


Per Esteban’s original question: I doubt that there is an implementation of design-adjusted, model-based estimation in brms that is as simple to use as the design-based survey models. My understanding is that design-based estimators do not assume an underlying distribution of values across the design variables and so are “automatic” in a sense. Per Roderick Little, “Design weighting… has the virtue of simplicity, and by avoiding an explicit model it has an aura of robustness to model misspecification” (“Comment: Struggles with Survey Weighting and Regression Modeling” 2007, p. 1).

By contrast, model-based analyses require assumptions about the relationships between the outcome and the design variables, i.e., a specification of a model for the design variables. Currently, there is no single model that gives the “best” estimates in all cases and can be applied automatically. See “Bayesian Nonparametric Weighted Sampling Inference” by Si, Pillai, and Gelman (2015) for a comparison of different models for the sampling weights.

Guido stated that weighting the likelihood (thus, a pseudolikelihood) is “not considered fully Bayesian.” This is because it is not “generative,” i.e. it does not define “a proper probability distribution for the parameters” (Bayesian Data Analysis, Gelman et al., 3rd ed., p. 336). Bob Carpenter wrote up a nice illustration of this issue. One solution is to estimate parameters within cells (ideally with shrinkage) and post-stratify on known cell counts (MRP). The alternative is to model the relationship between the design variables and the outcome (see again the article by Si, Pillai, and Gelman, above, or Chapter 8 of BDA).
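As a minimal, made-up illustration of the poststratification step (in practice the cell estimates would come from a multilevel model and the cell counts from a census; all numbers here are hypothetical):

```r
# Cell-level estimates, e.g. posterior means of an outcome probability
cell_est <- c(young_urban = 0.30, young_rural = 0.40,
              old_urban   = 0.20, old_rural   = 0.25)
# Known population counts per cell
cell_N   <- c(young_urban = 1000, young_rural = 3000,
              old_urban   = 2000, old_rural   = 4000)
# Population-level estimate = population-weighted average of cell estimates
sum(cell_est * cell_N) / sum(cell_N)  # 0.29
```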

And, as Corey’s RPubs example shows, weighting the likelihood doesn’t produce the same standard errors as the design-based survey estimates. Instead, it produces basically the same standard errors as the model-based frequentist analysis in lmer. In that way, weighting is not sufficient to get wide enough standard errors and accurate coverage.

Note that the scale of the weights matters when estimating models in Stan. This is not true of frequentist analyses, except for population totals (see page 100 of “Applied Survey Data Analysis” by Heeringa, West, and Berglund, 2017). This is important to know in general, since weights oftentimes represent the number of units in the population. In Corey’s example, the weights are scaled to sum to N prior to estimation in brms, so the results are similar to the lmer results.

One “non-Bayesian” workaround may be to estimate a design effect using the survey package and then scale the weights down according to that design effect (e.g., if the design effect is 2, have the weights sum to N/2). This is in theory similar to the post-estimation adjustment approach in frequentist methods, where you scale the standard errors according to the design effect. In my experience this produces results similar to the survey package estimates, but it never replicates them perfectly. I haven’t seen any mathematical proof that this works, nor have I seen it suggested elsewhere, so proceed with caution.
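A sketch of this workaround, reusing the shape of the toy data from earlier in the thread (the `deff = TRUE` argument and the `deff()` extractor are from the survey package; the rescaled `w_adj` column is my own naming):

```r
library(survey)

# Toy data frame in the same shape as earlier in the thread
set.seed(2020)
x <- data.frame(
  psu_variable    = sample(1:3,  200, replace = TRUE),
  strata_variable = sample(1:10, 200, replace = TRUE),
  weight_variable = sample(1:2,  200, replace = TRUE),
  outcome         = sample(0:1,  200, replace = TRUE)
)
x_des <- svydesign(~ psu_variable, strata = ~ strata_variable,
                   weights = ~ weight_variable, data = x, nest = TRUE)

# Estimated design effect for the outcome mean
d_eff <- deff(svymean(~ outcome, x_des, deff = TRUE))

# Rescale the weights to sum to n / deff before passing them to brm()
n <- nrow(x)
x$w_adj <- x$weight_variable * (n / d_eff) / sum(x$weight_variable)
```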


Hi all,

First, let me thank you for the time and effort all of you have put into this thread. I have tested several approaches and obtain similar estimates between lmer, brms, and INLA, but not similar standard errors, as pointed out by Simon:

… weighting the likelihood doesn’t produce the same standard errors as the design-based survey estimates. Instead, it produces basically the same standard errors as the model-based frequentist analysis in lmer . In that way, weighting is not sufficient to get wide enough standard errors and accurate coverage.

I don’t want to be obsessed with MLM + brms or picky about the standard errors. But after two years of my PhD dealing with these DHS data, and being happy with the survey package and the conservative s.e. of design-based surveys, I strongly think that MLMs might be a good fit for evaluating random effects of population-level covariates (not only geospatial ones) given the nested structure of DHS datasets (DHS survey locations are nested within Admin1 areas, which are themselves nested within countries). In addition, more and more population-health datasets are design-based (e.g., NHANES, BRFSS), so this might extend to many other problems. That is why I am still curious about what seems to be an open question in several forums.
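For what it’s worth, the nesting described here could be sketched in brms roughly as follows (all variable names are hypothetical, and this only expresses the hierarchy and weights, not the design-based standard errors discussed above):

```r
library(brms)

# Hypothetical columns: cluster (DHS survey location / PSU) nested in
# admin1, nested in country; w_scaled = design weights rescaled to sum to n
fit <- brm(
  outcome | weights(w_scaled) ~ some_variable + admin1_covariate +
    (1 | country / admin1 / cluster),
  data   = dhs_data,
  family = bernoulli()
)
```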

I am following the approach recommended by Skinner and Wakefield [URL]. I am also aware of Corey’s point of view about adding the design variables as fixed effects; I will check your paper.

In the meantime, I am still using survey+poststratification but any advice is welcome.

Best regards,


Just for posterity, this was an interesting read:

Also see the last sentence 🙂


Very good read, especially for people coming from disciplines where weighting isn’t common. In the social sciences, all surveys collected by national/federal sources have weights so that the sample can be representative of the population; that is what the original post was about. The problem is that many traditional statistics students never see this kind of thing in their studies, so there is confusion around these topics.

This area (Bayesian statistics on complex survey samples) is very near and dear to my heart, and I’d be happy to continue the discussion about making correct inferences from models that use these data in a Bayesian modeling paradigm.


I’m down for a Bayesian complex-survey discussion group if people are! I’ve been interested in this topic for a while, doing a calculation here and there, and it would be nice to have people to chat with about this. Tagging @lauren and @jonah, as others are already on this thread.

@lauren runs a great MRP discussion group via zoom every Wednesday. The topics covered include complex survey topics in general (not exclusively MRP). I recommend joining the group, which just entails reaching out to @lauren for an invite (all are welcome). In case @lauren doesn’t see the post here, I know other people have reached out to her via twitter (@jazzystats) to get the link to the group.


Hi folks,

Yes, we do run a reading group, where we cover surveys, Bayesian inference, and the like. You’re all welcome; the current method is to reach out to me or Rohan Alexander (UofT), and we will send you an email through the Google group.
Sorry I missed the topic tag initially. Broadly, the answer is no: brms won’t do the same thing with weights as a design-based analysis in the survey package, for the reasons outlined earlier. I also agree with Lumley that Bayesian stats, random effects, and sampling design/weights really hurt my head!


No, absolutely not! I had no intention of doing that; I’m contacting both off this post.

As there seems to be some interest in this, I’ll also point to Roderick Little’s work on “Calibrated Bayes”, which seems highly relevant. (I just stumbled over this while procrastinating.)

Edit: OK, most of you might already be aware of that work …