# Marginal effects for nonlinear multilevel models

I’m trying to calculate marginal effects for a multilevel nonlinear (beta) model using cmdstanpy. I have continuous predictors at both the top and lower levels of the model, roughly following this example.

Marginal effects of the lower-level coefficients seem pretty straightforward, either through feeding fake data to `model.generate_quantities()`, or computing them directly in the generated quantities block (see code below).
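For reference, with a logit link the lower-level marginal effect also has a closed form, d mu / d x_i = beta_i * mu * (1 - mu), so the finite-difference version can be sanity-checked outside Stan. A minimal NumPy sketch (all names and shapes here are illustrative, not taken from the model below):

```python
# Sketch: analytic vs. finite-difference lower-level marginal effects
# for one posterior draw (hypothetical names; shapes are illustrative).
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, K = 5, 2                    # observations, coefficients (incl. intercept)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = rng.normal(size=K)      # one posterior draw of beta for one group

mu = inv_logit(X @ beta)
# With a logit link, d mu / d x_1 = beta_1 * mu * (1 - mu)
me_analytic = beta[1] * mu * (1.0 - mu)

# Finite-difference version, as in the generated quantities block
eps = 1e-4
Xp = X.copy()
Xp[:, 1] += eps
me_fd = (inv_logit(Xp @ beta) - mu) / eps
```

In practice one would repeat this over all posterior draws and summarize, but the per-draw agreement is the point here.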

For marginal effects of top-level coefficients, what is the best way to do this? Fake data doesn’t work here, because the generated quantities block takes the lower-level coefficients (the betas) as fixed, while the top-level predictors act only through the betas. All the examples I see (e.g. here) assume that there are no top-level predictors (only groups).

My attempt (somewhat convoluted) is to recompute the betas in generated quantities. See below. Is there a better way to do it?
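The perturb-and-propagate idea can be checked numerically outside Stan: adding eps to a top-level predictor z_m shifts each beta_i by eps * gamma[m, i], and by the chain rule the resulting derivative is mu * (1 - mu) * (x · gamma[m]). A minimal NumPy sketch, with illustrative names and the group-level noise around the betas ignored:

```python
# Sketch of the "recompute the betas" idea in NumPy (illustrative names/shapes).
# Perturbing a top-level predictor z_m by eps shifts each beta_i by
# eps * gamma[m, i], so the effect can be propagated through the betas.
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
L1, M1 = 2, 2                      # L+1 lower-level and M+1 top-level columns
gamma = rng.normal(size=(M1, L1))  # one posterior draw of the top-level coefficients
z = np.array([1.0, 0.3])           # top-level predictor row (incl. intercept)
x = np.array([1.0, -0.7])          # lower-level predictor row (incl. intercept)
eps = 1e-4
m = 1                              # index of the top-level covariate of interest

beta = gamma.T @ z                 # betas implied by z (conditional mean only)
mu = inv_logit(x @ beta)

# Perturb-and-propagate, matching the Stan code: add eps * gamma[m] to beta
me_fd = (inv_logit(x @ (beta + eps * gamma[m])) - mu) / eps

# Chain rule: d mu / d z_m = mu * (1 - mu) * (x . gamma[m])
me_analytic = mu * (1.0 - mu) * (x @ gamma[m])
```

The chain-rule identity holds for any beta, so it applies equally when beta is a sampled draw rather than its conditional mean.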

Also, is there any equivalent to `brms` for Python, or another package that allows the easy computation of marginal effects?

Thank you!

``````stan
data {
  int<lower=0> J;
  int<lower=0> N;
  int<lower=0> L;                     // number of lower-level covariates (excl. intercept)
  int<lower=0> M;                     // number of top-level covariates (excl. intercept)
  array[N] int<lower=1,upper=J> group_num;
  vector[J] Z1;                       // use length J rather than N
  vector[N] X1;
  vector[N] y;
}

transformed data {
  // roughly following https://mc-stan.org/docs/stan-users-guide/multivariate-hierarchical-priors.html
  // individual predictor matrix
  matrix[N, L+1] X;
  for (i in 1:N) {
    X[i, 1] = 1;
    X[i, 2] = X1[i];
  }
  // group-level predictors
  array[J] row_vector[M+1] Z;
  for (j in 1:J) {
    Z[j][1] = 1.0;
    Z[j][2] = Z1[j];
  }
}

parameters {
  corr_matrix[L+1] Omega;             // decomposed correlation matrix for individual-level predictors
  vector<lower=0>[L+1] tau;           // scale of diagonals
  matrix[M+1, L+1] gamma;             // top-level coefficients (including the intercept)
  array[J] vector[L+1] beta;          // lower-level coefficients (including the intercept)
  real<lower=0> phi;                  // dispersion
}

transformed parameters {
  vector[N] mu;
  vector[N] xb;

  for (n in 1:N) {
    xb[n] = X[n] * beta[group_num[n]];
  }

  mu = inv_logit(xb);
}

model {
  // Weakly informative priors for covariance matrix and second-level parameters
  tau ~ cauchy(0, 0.5);
  Omega ~ lkj_corr(2.0);
  to_vector(gamma) ~ cauchy(0, 1);
  phi ~ cauchy(0, 30);

  {
    array[J] row_vector[L+1] z_gamma;
    for (j in 1:J) {
      z_gamma[j] = Z[j] * gamma;
    }
    // hierarchical prior on beta, centered on the top-level regression
    // (per the multivariate hierarchical priors chapter linked above)
    beta ~ multi_normal(z_gamma, quad_form_diag(Omega, tau));
  }

  y ~ beta_proportion(mu, phi);
}

generated quantities {
  vector[N] y_hat;

  // marginal effects
  array[N] vector[L+1] me;
  array[N] vector[L+1] me_group;

  for (n in 1:N) {
    y_hat[n] = inv_logit(X[n] * beta[group_num[n]]);
  }

  // marginal effects of lower-level variables
  for (i in 1:L+1) {
    real yh2;
    for (n in 1:N) {
      yh2 = inv_logit(X[n] * beta[group_num[n]] + 0.0001 * beta[group_num[n]][i]);
      me[n][i] = (yh2 - y_hat[n]) / 0.0001;
    }
  }

  // marginal effects of top-level variables
  {
    real yh2;
    for (m in 1:M+1) {  // loop over top-level covariates
      array[J] vector[L+1] me_beta = beta;
      for (i in 1:L+1) {
        for (j in 1:J) {
          // add epsilon to the top-level variable and propagate it through
          // the betas; equivalent to adding 0.0001 to Z[j][m]
          me_beta[j][i] = me_beta[j][i] + 0.0001 * gamma[m, i];
        }
      }
      for (n in 1:N) {
        yh2 = inv_logit(X[n] * me_beta[group_num[n]]);
        me_group[n][m] = (yh2 - y_hat[n]) / 0.0001;
      }
    }
  }
}
``````

Hi, @amillb, and welcome to the Stan forums.

Can you say a bit more about what you mean by “marginal effects”? I’m not familiar with the term, but it looks like you’re generating the variable called `me` with

``````stan
me[n][i] = (yh2 - y_hat[n]) / 0.0001;
``````

where it looks like `y_hat` is some kind of expectation for a logistic regression outcome. I’m also not sure why there is rescaling by 0.0001. Usually you want to eliminate that and work on natural scales.

Also, something like a `cauchy(0, 30)` isn’t giving you much of a prior compared to just a flat prior. You’ll find things sample better with weakly informative priors that don’t allow huge outliers like the Cauchy.

You can also make a lot of this code more efficient with vectorization, e.g., by defining `yh2` as `inv_logit(X * me_beta[group_num])` to turn this into a much more efficient matrix multiply. Similarly, you can set `me_group[ , m] = (yh2 - y_hat) / 0.0001`, and so on. It’s more efficient because it reduces the size of the autodiff expression graph, not because it eliminates a loop.
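In NumPy terms, the per-observation loop is just a row-wise dot product that can be computed in one shot; a small sketch with made-up shapes showing that the two forms agree:

```python
# NumPy analogue of replacing the per-observation loop with one vectorized
# row-wise dot product (illustrative names and shapes).
import numpy as np

rng = np.random.default_rng(2)
N, J, K = 6, 3, 2
X = rng.normal(size=(N, K))
beta = rng.normal(size=(J, K))              # one coefficient vector per group
group_num = rng.integers(0, J, size=N)      # 0-based group index per observation

# Loop version, mirroring the Stan transformed parameters block
xb_loop = np.empty(N)
for n in range(N):
    xb_loop[n] = X[n] @ beta[group_num[n]]

# Vectorized version: gather each row's beta, then take row-wise dot products
xb_vec = np.einsum('nk,nk->n', X, beta[group_num])
```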

Many thanks for replying; the suggestions for priors and for code efficiency are also appreciated.

By marginal effect, I mean the change in the dependent variable that results from a "small" change in a specified independent variable (i.e., the derivative). See, for example, the discussion here.

The rescaling by 0.0001 is because 0.0001 is added to the independent variable in the following line, so the division then approximates the derivative. Because the effect of the top-level variables is channeled through the lower-level coefficients `beta`, I call these adjusted coefficients `me_beta`.

``````stan
me_beta[j][i] = me_beta[j][i] + 0.0001 * gamma[m, i];
``````

`y_hat` and `yh2` are both the predicted outcome (y), based on a beta functional form. `y_hat` is the predicted outcome for the actual values of `X` and `Z`. `yh2` is the predicted outcome with 0.0001 added to the `m`-th group-level predictor in `Z` for every observation. The marginal effect is the difference between the two, divided by 0.0001. Does that make sense?
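For what it's worth, the finite-difference approximation can be verified numerically for the inverse-logit link used in the model (illustrative values):

```python
# Quick numeric check that (f(x + eps) - f(x)) / eps approximates f'(x)
# for the inverse-logit function.
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.4
eps = 1e-4
fd = (inv_logit(x + eps) - inv_logit(x)) / eps
exact = inv_logit(x) * (1.0 - inv_logit(x))   # d/dx inv_logit(x)
```

The forward-difference error is first order in eps, so with eps = 1e-4 the two agree to several decimal places.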