I’ve been practising the hierarchical logistic regression example from the documentation. I wanted to combine it with profiling in cmdstanr. However I’m getting tripped up with the types of different variables that are input to bernoulli_logit_glm_lpmf
etc.
Below is a working example using bernoulli_logit
. However to get this to work I needed to perform the matrix multiplication x_beta_ll[n] = x[n] * beta[ll[n]]
outside the function. My understanding is that this is one of the things that bernoulli_logit_glm_lpmf
does for you! So how do I set up the beta
vector so that is it the right length? perhaps in a transformed parameters block?
reproducible example
data {
int<lower=1> D; // how many predictors -- includes intercept
int<lower=0> N; // how many observations
int<lower=1> L; // how many groups
array[N] int<lower=0, upper=1> y;
array[N] int<lower=1, upper=L> ll; // code number for each group, 1, 2, 3, .. up to L
array[N] row_vector[D] x;
}
parameters {
array[D] real mu;
array[D] real<lower=0> sigma;
array[L] vector[D] beta; // beta, L lists of D vectors
}
model {
vector[N] x_beta_ll;
profile("priors"){
for (d in 1:D) {
mu[d] ~ normal(0, 1); // tweaked from example
sigma[d] ~ exponential(1); //adding a prior here
for (l in 1:L) {
beta[l, d] ~ normal(mu[d], sigma[d]);
}
}
}
profile("like"){
for (n in 1:N) {
x_beta_ll[n] = x[n] * beta[ll[n]];
}
y ~ bernoulli_logit(x_beta_ll);
}
}
note the intercept column is added manually (see spider_data_example
below)
R code to run the model. This features a toy dataset excerpted from the beloved spiders
dataset in mvabund
:
prep_spider_datasmall <- data.frame(
stringsAsFactors = FALSE,
soil_dry_c = c(-0.139189285714286,
0.578010714285714,0.0859107142857143,0.202810714285715,
0.544210714285714,0.909710714285714,
0.706810714285715,-0.139189285714286,0.578010714285714,
0.0859107142857143,0.202810714285715,0.544210714285714,
0.909710714285714,0.706810714285715),
species = c("Alopacce","Alopacce",
"Alopacce","Alopacce","Alopacce","Alopacce",
"Alopacce","Alopcune","Alopcune","Alopcune",
"Alopcune","Alopcune","Alopcune","Alopcune"),
abd = c(25L,0L,15L,2L,1L,
0L,2L,10L,2L,20L,6L,20L,6L,7L),
pa = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1),
species_num = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
inter = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
spider_data_example <- list(D = 2,
N = nrow(prep_spider_datasmall),
L = max(prep_spider_datasmall$species_num),
y = prep_spider_datasmall$pa,
ll = prep_spider_datasmall$species_num,
x = with(prep_spider_datasmall, cbind(inter, soil_dry_c)))
mulivar_logit <- cmdstan_model("logistic_probit/multivariate_logit_multilevel.stan",
stanc_options = list("warn-pedantic" = TRUE))