MRP: use 2nd-level parameters when generating probabilities?

Hi everyone. My question is about generating the probabilities to be reweighted in an MRP analysis. I’m not sure whether my confusion is about the MRP process or about the technical aspects of Stan, but I would appreciate any insights.


For context, I have survey responses from students at a few schools. As is common with surveys, only a small percentage of those asked replied. But because we have administrative data, we have enough demographic information on both the respondents and the population of interest to poststratify the responses back to our intended population. The survey consisted of a number of questions with “choose all that apply to you” options. We treat each question option as a single 0/1 outcome, to which we fit a multilevel model and poststratify.

The Stan model code is below. We use a QR reparameterization, collapse the 0/1 outcomes into binomial clicks/views within each 1st-level demographic cell, and center/scale the RHS variables (x and z) to speed up computation.

After fitting the models, we use R to recover the model parameters, multiply by a design matrix, and inv-logit transform to generate predicted probabilities of selecting the answer within each demographic cell (gender, 3 categories of age, and 4 race/ethnicity categories). The head of the design matrix looks like this, where int is aligned with the model’s alpha parameter:

      int female age_m age_h black hispc orace
 [1,]   1      0     0     0     1     0     0
 [2,]   1      0     0     0     0     1     0
 [3,]   1      0     0     0     0     0     1
 [4,]   1      0     0     0     0     0     0
 [5,]   1      0     1     0     1     0     0
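As a sketch, the cell-probability step might look like this (Python, though the actual analysis is in R; all parameter draws and values below are made up for illustration — the intercept column picks up a given school's alpha draw):

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# First few rows of the cell design matrix shown above.
# Columns: int, female, age_m, age_h, black, hispc, orace
design = np.array([
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
], dtype=float)

# Hypothetical values for a single posterior draw and a single school j
alpha_j = -0.4                                       # school j's intercept draw
beta = np.array([0.2, 0.1, -0.1, 0.3, -0.2, 0.05])   # 1st-level coefficient draws

coefs = np.concatenate(([alpha_j], beta))  # int column aligns with alpha_j
pi = inv_logit(design @ coefs)             # predicted probability per cell
```

In practice this is repeated over every posterior draw (and every school) to carry forward the full posterior uncertainty into the cell probabilities.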

We then reweight with cell population counts per MRP process:

\theta_{\text{pop}} = \frac{\sum_j \pi_j N_j}{\sum_j N_j}
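That reweighting step is just a population-weighted average of the cell probabilities; a minimal Python sketch with made-up numbers:

```python
import numpy as np

pi = np.array([0.12, 0.30, 0.25, 0.18])   # cell probabilities (one posterior draw)
N = np.array([150, 40, 25, 300])          # administrative cell counts (hypothetical)

theta_pop = np.sum(pi * N) / np.sum(N)    # poststratified population estimate
```

Doing this per posterior draw yields a posterior distribution for theta_pop, from which credible intervals are taken.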


Based on the way we’ve parameterized the model,

// 2nd level (school-level)
alpha ~ normal(z * gamma + mu_alpha, sigma_alpha);
// 1st level (student-level cells)
clicks ~ binomial_logit(views, Qx * theta + alpha[school]);

should we include the 2nd-level parameters (gamma) and the 2nd-level covariates (z) in the design matrix when computing the cell probabilities? Or should we include only alpha and beta, the 1st-level parameters, as I’ve shown above?
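To make the two variants concrete, here is roughly what the two linear predictors look like for one cell in school j (Python sketch; all values are hypothetical, and the dimensions are shrunk for illustration):

```python
import numpy as np

# Hypothetical draws/values for one demographic cell in school j
alpha_j = -0.4                          # school j's varying intercept draw
beta = np.array([0.2, 0.1, -0.1])       # 1st-level coefficients (K = 3 here)
gamma = np.array([0.5, -0.3])           # 2nd-level coefficients (L = 2 here)
x_cell = np.array([1.0, 0.0, 1.0])      # cell's row of the 1st-level matrix
z_j = np.array([0.8, 1.2])              # school j's row of the 2nd-level matrix

# Variant 1: 1st-level parameters only (alpha[j] + x * beta)
eta1 = alpha_j + x_cell @ beta

# Variant 2: additionally add the 2nd-level regression term (z[j] * gamma)
eta2 = alpha_j + x_cell @ beta + z_j @ gamma
```

The two variants differ only by the z[j] * gamma term, which is what produces the different interval widths I describe below.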

I’ve done it both ways and the posterior credible intervals are much wider when including the 2nd-level parameters and covariates, which is contrary to my understanding of the benefits of 2nd-level predictors in a MRP analysis.

Again, thanks for any help you can offer.

Stan code

data {
  int<lower=1> N; // rows in 1st-level matrix
  int<lower=1> K; // cols in 1st-level matrix
  int<lower=1> J; // rows in 2nd-level matrix
  int<lower=1> L; // cols in 2nd-level matrix
  int views[N];   // number of cell members who saw question
  int clicks[N];  // number of cell members who clicked box
  int school[N];  // school indicators (at 1st level)
  matrix[N,K] x;  // 1st-level matrix (cells)
  matrix[J,L] z;  // 2nd-level matrix (school-level covariates)
}
transformed data {
  // Using the QR decomposition for 1st-level (x), 
  // but not 2nd-level (z) b/c there are more columns than rows

  // set up matrices for {Q}, {R}, and {R}^-1
  matrix[N,K] Qx;
  matrix[K,K] Rx;
  matrix[K,K] Rxi;

  // thin and scale the QR decompositions
  Qx = qr_thin_Q(x) * sqrt(N - 1);
  Rx = qr_thin_R(x) / sqrt(N - 1);
  Rxi = inverse(Rx);
}
parameters {
  real mu_alpha;             // 2nd-level 
  real<lower=0> sigma_alpha; // 2nd-level
  vector[J] alpha;	         // J intercepts for schools
  vector[L] gamma;           // L second-level parameters
  vector[K] theta;           // K first-level parameters
}
model {
  // Priors:
  mu_alpha ~ normal(0,1);
  gamma ~ normal(0,1);
  theta ~ normal(0,1);
  sigma_alpha ~ cauchy(0,2.5);
  // Likelihood 2nd level: intercepts distributed normally as function of
  alpha ~ normal(z * gamma + mu_alpha, sigma_alpha);

  // Likelihood 1st level: binomial (clicks/views, prob) with logit link; 
  clicks ~ binomial_logit(views, Qx * theta + alpha[school]);
}
generated quantities {
  // recover beta
  vector[K] beta;      // init vectorized parameters
  beta = Rxi * theta;  // \beta = {R}^-1\theta
}



Sorry, it looks like your question fell through the cracks a bit. Did you manage to resolve it? Maybe @lauren is not busy and can help?