# Using bernoulli_logit_glm in a hierarchical model

I’ve been practising the hierarchical logistic regression example from the documentation. I wanted to combine it with profiling in cmdstanr. However I’m getting tripped up with the types of different variables that are input to `bernoulli_logit_glm_lpmf` etc.

Below is a working example using `bernoulli_logit`. However to get this to work I needed to perform the matrix multiplication `x_beta_ll[n] = x[n] * beta[ll[n]]` outside the function. My understanding is that this is one of the things that `bernoulli_logit_glm_lpmf` does for you! So how do I set up the `beta` vector so that is it the right length? perhaps in a transformed parameters block?

### reproducible example

``````data {
int<lower=1> D; // how many predictors -- includes intercept
int<lower=0> N; // how many observations
int<lower=1> L; // how many groups
array[N] int<lower=0, upper=1> y;
array[N] int<lower=1, upper=L> ll; // code number for each group, 1, 2, 3, .. up to L
array[N] row_vector[D] x;
}
parameters {
array[D] real mu;
array[D] real<lower=0> sigma;
array[L] vector[D] beta; // beta, L lists of D vectors
}
model {

vector[N] x_beta_ll;

profile("priors"){
for (d in 1:D) {
mu[d] ~ normal(0, 1); // tweaked from example
sigma[d] ~ exponential(1); //adding a prior here
for (l in 1:L) {
beta[l, d] ~ normal(mu[d], sigma[d]);
}
}
}
profile("like"){

for (n in 1:N) {
x_beta_ll[n] = x[n] * beta[ll[n]];
}
y ~ bernoulli_logit(x_beta_ll);
}
}
``````

note the intercept column is added manually (see `spider_data_example` below)

R code to run the model. This features a toy dataset excerpted from the beloved `spiders` dataset in `mvabund`:

``````prep_spider_datasmall <- data.frame(
stringsAsFactors = FALSE,
soil_dry_c = c(-0.139189285714286,
0.578010714285714,0.0859107142857143,0.202810714285715,
0.544210714285714,0.909710714285714,
0.706810714285715,-0.139189285714286,0.578010714285714,
0.0859107142857143,0.202810714285715,0.544210714285714,
0.909710714285714,0.706810714285715),
species = c("Alopacce","Alopacce",
"Alopacce","Alopacce","Alopacce","Alopacce",
"Alopacce","Alopcune","Alopcune","Alopcune",
"Alopcune","Alopcune","Alopcune","Alopcune"),
abd = c(25L,0L,15L,2L,1L,
0L,2L,10L,2L,20L,6L,20L,6L,7L),
pa = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1),
species_num = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
inter = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

spider_data_example <- list(D = 2,
N = nrow(prep_spider_datasmall),
L = max(prep_spider_datasmall\$species_num),
y = prep_spider_datasmall\$pa,
ll = prep_spider_datasmall\$species_num,
x = with(prep_spider_datasmall, cbind(inter, soil_dry_c)))

mulivar_logit <- cmdstan_model("logistic_probit/multivariate_logit_multilevel.stan",
stanc_options = list("warn-pedantic" = TRUE))
``````
1 Like

Unfortunately the `glm` distributions aren’t able to be applied in a vectorised fashion when there are random effects involved (only random intercepts), as the distribution assumes a common `beta` for each individual.

You can either use the `bernoulli_logit_glm` in an un-vectorised fashion, or you can use the `bernoulli_logit` distribution but construct the linear predictor in a vectorised fashion.

To do this, you just need to declare both `x` and `beta` as `matrix` types, and then you can use `rows_dot_product`:

``````    y ~ bernoulli_logit(rows_dot_product(x,beta[ll]));
``````

Additionally, this will be faster if you can declare the inputs with the opposite dimensions, and then you can use `columns_dot_product` instead: PSA: where possible, use columns_dot_product rather than rows_dot_product

4 Likes

See also the code here, which has optimized code; it doesn’t use reduce-sum but it should be straightforward to add that. There is also code here that will be faster for the subset of scenarios where groups (using your terminology; “individuals” in the terminology of the code) are associated with at least two or more levels of each predictor. I’ll post soon the converse subset, where groups are associated with only one level of every predictor.

2 Likes

You could also expand the design matrix to include columns for the covariate multiplied by an indicator for each individual (this can happen in `transformed data`), and then pass in the random effect terms associated with each individual to `bernoulli_logit_glm` as ordinary coefficients. No idea if this would be competitive efficiency-wise with the other solutions or not.

2 Likes