Marginal effects for nonlinear multilevel models

I’m trying to calculate marginal effects for a multilevel nonlinear (beta) model using cmdstanpy. I have continuous predictors at both the top and lower levels of the model, roughly following this example.

Marginal effects of the lower-level coefficients seem pretty straightforward, either by feeding fake data to model.generate_quantities() or by computing them directly in the generated quantities block (see code below).
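
For a logit link the derivative even has a closed form, since the derivative of inv_logit(u) is mu * (1 - mu); if I'm not mistaken, the finite differences in my code below could be replaced with something like this (untested, using the names from the model below):

// analytic marginal effects of the lower-level covariates:
// d mu / d x_i = beta_i * mu * (1 - mu) for the logit link
for (n in 1:N) {
  for (i in 1:L+1) {
    me[n][i] = beta[group_num[n]][i] * mu[n] * (1 - mu[n]);
  }
}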

For marginal effects of top-level coefficients, what is the best way to proceed? Fake data doesn’t work here: the generated quantities block takes the lower-level coefficients (betas) as fixed, but the top-level predictors act only through the betas. All the examples I see (e.g. here) assume that there are no top-level predictors (only groups).

My attempt (somewhat convoluted) is to recompute the betas in generated quantities. See below. Is there a better way to do it?

Also, is there any equivalent to brms for Python, or another package that allows the easy computation of marginal effects?

Thank you!

data {
  int<lower=0> J;                     // number of groups
  int<lower=0> N;                     // number of observations
  int<lower=0> L;                     // number of lower-level covariates (excl. intercept)
  int<lower=0> M;                     // number of top-level covariates (excl. intercept)
  array[N] int<lower=1,upper=J> group_num;  // group membership of each observation
  vector[J] Z1;                       // top-level predictor (length J: one value per group)
  vector[N] X1;                       // lower-level predictor
  vector[N] y;                        // outcome in (0, 1)
}

transformed data {
  // roughly following https://mc-stan.org/docs/stan-users-guide/multivariate-hierarchical-priors.html
  // individual (lower-level) predictor matrix, with intercept column
  matrix[N, L+1] X;
  for (i in 1:N) {
    X[i, 1] = 1;
    X[i, 2] = X1[i];
  }
  // group-level predictors, with intercept
  array[J] row_vector[M+1] Z;
  for (j in 1:J) {
    Z[j][1] = 1.0;
    Z[j][2] = Z1[j];
  }
}

parameters {
  corr_matrix[L+1] Omega;             // correlation matrix for the lower-level coefficients
  vector<lower=0>[L+1] tau;           // coefficient scales (standard deviations)
  matrix[M+1,L+1] gamma;              // top-level coefficients (including the intercept)
  array[J] vector[L+1] beta;          // lower-level coefficients (including the intercept)
  real<lower=0> phi;                  // dispersion
}

transformed parameters {
  vector[N] mu;                       // mean of the beta_proportion outcome
  vector[N] xb;                       // linear predictor
  
  for (n in 1:N) {
      xb[n] = X[n] * beta[group_num[n]];
  }

  mu = inv_logit(xb);

}

model {
  // Weakly informative priors for covariance matrix and second-level parameters
  tau ~ cauchy(0,0.5);
  Omega ~ lkj_corr(2.0);
  to_vector(gamma) ~ cauchy(0, 1);
  phi ~ cauchy(0, 30);           
      
  {
    array[J] row_vector[L+1] z_gamma;
    for (j in 1:J) {
      z_gamma[j] = Z[j] * gamma;
    }
    beta ~ multi_normal(z_gamma, quad_form_diag(Omega, tau));
  }

  y ~ beta_proportion(mu, phi);

}

generated quantities {
  vector[N] y_hat;                    // fitted mean (equals mu)

  // marginal effects
  array[N] vector[L+1] me;            // of lower-level covariates
  array[N] vector[M+1] me_group;      // of top-level covariates (note: M+1, not L+1)
  
  for (n in 1:N) {
    y_hat[n] = inv_logit(X[n] * beta[group_num[n]]);
  }

  // marginal effects of lower-level variables (finite differences)
  for (i in 1:L+1) {
    real yh2;
    for (n in 1:N) {
      // adding 0.0001 to covariate i adds 0.0001 * beta_i to the linear predictor
      yh2 = inv_logit(X[n] * beta[group_num[n]] + 0.0001 * beta[group_num[n]][i]);
      me[n][i] = (yh2 - y_hat[n]) / 0.0001;
    }
  }

  // marginal effects of top-level variables
  {
    real yh2;
    for (m in 1:M+1) {                // loop over top-level covariates
      array[J] vector[L+1] me_beta = beta;
      for (i in 1:L+1) {
        for (j in 1:J) {
          // add epsilon to the m-th top-level variable and propagate it through
          // the betas; equivalent to adding 0.0001 to Z[j][m]
          me_beta[j][i] = me_beta[j][i] + 0.0001 * gamma[m, i];
        }
      }
      for (n in 1:N) {
        yh2 = inv_logit(X[n] * me_beta[group_num[n]]);
        me_group[n][m] = (yh2 - y_hat[n]) / 0.0001;
      }
    }
  }

}

Hi, @amillb, and welcome to the Stan forums.

Can you say a bit more about what you mean by “marginal effects”? I’m not familiar with the term, but it looks like you’re generating the variable called me with

    me[n][i] = (yh2 - y_hat[n]) / 0.0001;

where it looks like y_hat is some kind of expectation for a logistic regression outcome. I’m also not sure why there is rescaling by 0.0001. Usually you want to eliminate that and work on natural scales.

Also, something like a cauchy(0, 30) isn’t giving you much of a prior compared to just a flat prior. You’ll find things sample better with weakly informative priors that don’t allow huge outliers like the Cauchy.
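
For example, something along these lines (just a sketch; the right scales depend on your data):

tau ~ normal(0, 1);                 // half-normal, given the lower bound of 0
to_vector(gamma) ~ normal(0, 1);
phi ~ exponential(0.1);             // keeps mass away from extreme precisions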

You can also make a lot of this code more efficient with vectorization, e.g., by defining yh2 as inv_logit(X * me_beta[group_num]) to turn this into a much more efficient matrix multiply. Similarly, you can set me_group[, m] = (yh2 - y_hat) / 0.0001, and so on. It’s more efficient because it reduces the size of the autodiff expression graph, not because it eliminates a loop.
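
Untested, but if me_group is declared as a matrix so that its columns can be assigned, the generated quantities could look something like this (Stan can’t multiply a matrix by an array of vectors directly, so I packed the per-observation coefficients into a matrix and used rows_dot_product):

generated quantities {
  vector[N] y_hat;
  matrix[N, M+1] me_group;            // matrix so that whole columns can be assigned
  {
    matrix[N, L+1] B;                 // coefficient row per observation
    for (n in 1:N) {
      B[n] = beta[group_num[n]]';
    }
    y_hat = inv_logit(rows_dot_product(X, B));
    for (m in 1:M+1) {
      // shifting Z[, m] by eps shifts each coefficient i by eps * gamma[m, i]
      vector[N] yh2 = inv_logit(rows_dot_product(X, B + 0.0001 * rep_matrix(gamma[m], N)));
      me_group[, m] = (yh2 - y_hat) / 0.0001;
    }
  }
}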

Hi @Bob_Carpenter,

Many thanks for replying; the suggestions for priors and for code efficiency are also appreciated.

By marginal effect, I mean the change in the dependent variable that results from a “small” change in a specified independent variable (i.e., the derivative). See, for example, the discussion here.
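
In symbols, for the $k$-th covariate,

$$\mathrm{ME}_k(x) = \frac{\partial\, \mathrm{E}[y \mid x]}{\partial x_k} \approx \frac{\mathrm{E}[y \mid x + \epsilon e_k] - \mathrm{E}[y \mid x]}{\epsilon},$$

where $e_k$ is the unit vector for covariate $k$ and I use $\epsilon = 0.0001$.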

The division by 0.0001 is there because 0.0001 is added to the independent variable in the following line, so the ratio approximates the derivative. Because the effect of the top-level variables is channeled through the lower-level coefficients beta, I call the adjusted coefficients me_beta.

    me_beta[j][i] = me_beta[j][i] + 0.0001 * gamma[m, i];

y_hat and yh2 are both the predicted outcome (y), based on the beta functional form. y_hat is the prediction for the actual values of X and Z; yh2 is the prediction with 0.0001 added to the m-th group-level predictor Z[j][m] for every group j. The marginal effect is the difference between the two, divided by 0.0001. Does that make sense?
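
In equations: the model says $\beta_j \sim \mathrm{N}(\gamma^\top z_j, \Sigma)$, so (holding the group-level residual fixed) adding $\epsilon$ to the $m$-th group-level predictor $z_{jm}$ shifts the $i$-th coefficient by $\epsilon\,\gamma_{mi}$, i.e. by $\epsilon$ times the $m$-th row of gamma. The finite difference I compute is then

$$\mathrm{me\_group}_{nm} = \frac{\mathrm{inv\_logit}\big(x_n (\beta_{j[n]} + \epsilon\,\gamma_m^\top)\big) - \mathrm{inv\_logit}\big(x_n \beta_{j[n]}\big)}{\epsilon},$$

where $j[n]$ is the group of observation $n$.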

Adam

PS I have lurked in the Stan forums for many years and found the discussions very helpful. Until now, my questions have been asked and answered already.