# Help calculating log-likelihood with bernoulli_logit_glm to use with loo

I am working with binary outcome data from an experiment in which each trial presents a combination of signal manipulations and participants give three binary judgements.

I’m modelling this as a multivariate model with three correlated outcomes.

I’d like to use loo to do some model comparisons, but am struggling to get a combined/joint log likelihood of all three outcomes per observation.

I’m bringing the outcomes in as three arrays:

data {
int<lower=1> N; // total number of observations
array[N] int Y1; // response variable
array[N] int Y2; // response variable
array[N] int Y3; // response variable
array[N] int<lower=1> J; // Subject idx
int<lower=1> K; //number of predictors
matrix[N, K] X; // predictor matrix
}

In generated quantities, I am calculating a log_lik for each outcome separately:

generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;

for (n in 1:N) {
log_lik1[n] = bernoulli_logit_glm_lpmf(Y1| X, mu_y1_pred[n], b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y2 | X, mu_y2_pred[n], b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y3 | X, mu_y3_pred[n], b_y3);
}
}

I’m calling loo with

loo(fit$draws("log_lik1"), r_eff = relative_eff(fit$draws("log_lik1")))

But every k value is too high.

When I fit the same model and data with brms and call loo(brms_fit) all k values are fine (< 0.5). Even when I specify only one response with resp = "Y1".

Have I coded this incorrectly? Is there a way to calculate a joint log-likelihood as well?

Yes, just use

log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];

That defines log_lik to be the joint log likelihood of Y1, Y2, and Y3. But I’m guessing you really meant Y1[n] and so on. Otherwise you get a vectorized value for the entire container. Of course, I’m only guessing that Y1 is a container because I don’t see how any of this makes sense otherwise. In general, it’s much easier for us to help if you supply the whole model so we can see all the variable declarations.
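In the generated quantities block, a minimal sketch of that (assuming log_lik1[n], log_lik2[n], and log_lik3[n] already hold the pointwise values for observation n):

```stan
generated quantities {
  // ... pointwise log_lik1, log_lik2, log_lik3 computed as elsewhere ...
  vector[N] log_lik;
  for (n in 1:N) {
    // joint log likelihood of the three outcomes for trial n
    log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
  }
}
```

This works because the three outcomes for trial n are conditionally independent given the parameters, so their joint log density is the sum of the three pointwise terms.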

Thank you very much @Bob_Carpenter

I am still getting high Pareto k values for all observations.

Here is the full model code:

functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
array[N] int Y1; // response variable
array[N] int Y2; // response variable
array[N] int Y3; // response variable
array[N] int<lower=1> J; // grouping indicator for subjects
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
vector[N] Z_X1;
vector[N] Z_X2;
vector[N] Z_X3;
vector[N] Z_X4;
// data for group-level effects subject
int<lower=1> N_subj; // number of subjects
int<lower=1> M; // number of coefficients per subject
int<lower=1> NC; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
parameters {
vector[K] b_y1; // population-level effects
vector[K] b_y2; // population-level effects
vector[K] b_y3; // population-level effects
vector<lower=0>[M] sd_1; // group-level standard deviations
matrix[M, N_subj] z; // standardized group-level effects
cholesky_factor_corr[M] L; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_subj, M] r; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_subj] r_y1_x1;
vector[N_subj] r_y1_x2;
vector[N_subj] r_y1_x3;
vector[N_subj] r_y1_x4;
vector[N_subj] r_y2_x1;
vector[N_subj] r_y2_x2;
vector[N_subj] r_y2_x3;
vector[N_subj] r_y2_x4;
vector[N_subj] r_y3_x1;
vector[N_subj] r_y3_x2;
vector[N_subj] r_y3_x3;
vector[N_subj] r_y3_x4;
// compute actual group-level effects
r = scale_r_cor(z, sd_1, L);
r_y1_x1 = r[ : , 1];
r_y1_x2 = r[ : , 2];
r_y1_x3 = r[ : , 3];
r_y1_x4 = r[ : , 4];
r_y2_x1 = r[ : , 5];
r_y2_x2 = r[ : , 6];
r_y2_x3 = r[ : , 7];
r_y2_x4 = r[ : , 8];
r_y3_x1 = r[ : , 9];
r_y3_x2 = r[ : , 10];
r_y3_x3 = r[ : , 11];
r_y3_x4 = r[ : , 12];
}
model {
real lprior = 0; // prior contributions to the log posterior
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu_y1 = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] mu_y2 = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] mu_y3 = rep_vector(0.0, N);
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y1[n] += r_y1_x1[J[n]]
* Z_X1[n]
+ r_y1_x2[J[n]]
* Z_X2[n]
+ r_y1_x3[J[n]]
* Z_X3[n]
+ r_y1_x4[J[n]]
* Z_X4[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y2[n] += r_y2_x1[J[n]]
* Z_X1[n]
+ r_y2_x2[J[n]]
* Z_X2[n]
+ r_y2_x3[J[n]]
* Z_X3[n]
+ r_y2_x4[J[n]]
* Z_X4[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y3[n] += r_y3_x1[J[n]]
* Z_X1[n]
+ r_y3_x2[J[n]]
* Z_X2[n]
+ r_y3_x3[J[n]]
* Z_X3[n]
+ r_y3_x4[J[n]]
* Z_X4[n];
}
target += bernoulli_logit_glm_lupmf(Y1 | X, mu_y1, b_y1);
target += bernoulli_logit_glm_lupmf(Y2 | X, mu_y2, b_y2);
target += bernoulli_logit_glm_lupmf(Y3 | X, mu_y3, b_y3);
}
// priors not including constants
lprior += normal_lupdf(b_y1[1] | 0, 1);
lprior += normal_lupdf(b_y1[2] | 0, 1);
lprior += normal_lupdf(b_y1[3] | 0, 1);
lprior += normal_lupdf(b_y1[4] | 0, 1);
lprior += normal_lupdf(b_y1[5] | 1, 1);
lprior += normal_lupdf(b_y1[6] | 0, 1);
lprior += normal_lupdf(b_y2 | 0, 1);
lprior += normal_lupdf(b_y3 | 0, 1);
lprior += normal_lupdf(sd_1 | 1, 1);
lprior += lkj_corr_cholesky_lupdf(L | 1);
target += lprior;
target += std_normal_lupdf(to_vector(z));
}
generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;
vector[N] log_lik;
// For predictions on observed matrix
vector[N] mu_y1_pred;
vector[N] mu_y2_pred;
vector[N] mu_y3_pred;
vector[N] Xb_y1 = X * b_y1;
vector[N] Xb_y2 = X * b_y2;
vector[N] Xb_y3 = X * b_y3;
vector[N] prob_y1;
vector[N] prob_y2;
vector[N] prob_y3;
array[N] int y1_pred;
array[N] int y2_pred;
array[N] int y3_pred;
// compute group-level correlations
corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
vector<lower=-1, upper=1>[NC] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
for (n in 1:N) { // predictions from observed data
mu_y1_pred[n] = r_y1_x1[J[n]]
* Z_X1[n]
+ r_y1_x2[J[n]]
* Z_X2[n]
+ r_y1_x3[J[n]]
* Z_X3[n]
+ r_y1_x4[J[n]]
* Z_X4[n];
mu_y2_pred[n] = r_y2_x1[J[n]]
* Z_X1[n]
+ r_y2_x2[J[n]]
* Z_X2[n]
+ r_y2_x3[J[n]]
* Z_X3[n]
+ r_y2_x4[J[n]]
* Z_X4[n];
mu_y3_pred[n] = r_y3_x1[J[n]]
* Z_X1[n]
+ r_y3_x2[J[n]]
* Z_X2[n]
+ r_y3_x3[J[n]]
* Z_X3[n]
+ r_y3_x4[J[n]]
* Z_X4[n];
// Predicted probabilities
prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
}
// Predictions
y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}

This is modeling all of the Y1 at once. I’m guessing the problem is that you want the log likelihoods for the individual Y1[n]. After calculating, just add those to the target using target += ...;.

You do something like that later:

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);

but my guess here is that you’re not pulling out individual entries of X. Shouldn’t Y1[n] depend only on X[n]? So I think the problem is that you want to decompose what bernoulli_logit_glm does and calculate individual entries.
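For reference, a sketch of that decomposition for a single row, assuming mu_y1_pred[n] plays the role of the intercept term alpha in bernoulli_logit_glm(y | x, alpha, beta):

```stan
// pointwise equivalent of the glm call for row n only:
// bernoulli_logit_glm evaluates y[n] at logit scale alpha[n] + x[n] * beta
log_lik1[n] = bernoulli_logit_lpmf(Y1[n] | mu_y1_pred[n] + X[n] * b_y1);
```

Passing the full X instead computes the log likelihood of the entire data set in every element of log_lik, which is consistent with the enormous p_loo and uniformly bad Pareto k values reported below.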

Thanks again @Bob_Carpenter !

Since X is a matrix, how can I index the rows? Do I need to_row_vector?

If X is a matrix, X[n] is its n-th row. That can also be written less parsimoniously as X[n, ] or X[n, :] or X[n, 1:cols(X)] or row(X, n).
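As a sketch (the local variable names here are just for illustration; K is the number of columns of X):

```stan
row_vector[K] x_n  = X[n];        // n-th row of X
row_vector[K] x_n2 = X[n, : ];    // equivalent
row_vector[K] x_n3 = row(X, n);   // equivalent
real eta = X[n] * b_y1;           // row_vector times vector yields a real
```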

I tried indexing with [n], but get errors when compiling:

Ill-typed arguments supplied to function 'bernoulli_logit_glm_lpmf':
(int, row_vector, real, vector)
Available signatures:
(int, matrix, vector, vector) => real
The second argument must be matrix but got row_vector
(int, matrix, real, vector) => real
The second argument must be matrix but got row_vector
(array[] int, matrix, vector, vector) => real
The first argument must be array[] int but got int
(array[] int, row_vector, real, vector) => real
The first argument must be array[] int but got int
(array[] int, row_vector, vector, vector) => real
The first argument must be array[] int but got int

I modified that section to:

// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred, b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred, b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred, b_y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];

That compiles, but sampling then throws an exception at every iteration:

Chain 4 Exception: bernoulli_logit_glm_lpmf: Intercept[2] is nan, but must be finite! (in '/tmp/RtmpR6gote/model-134901b9ca0a4.stan', line 205, column 11 to column 95)

This compiles and runs without error, but the results do not at all agree with those from brms

generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;
vector[N] log_lik;
// For predictions on observed matrix
vector[N] mu_y1_pred;
vector[N] mu_y2_pred;
vector[N] mu_y3_pred;
vector[N] Xb_y1 = X * b_y1;
vector[N] Xb_y2 = X * b_y2;
vector[N] Xb_y3 = X * b_y3;
vector[N] prob_y1;
vector[N] prob_y2;
vector[N] prob_y3;
array[N] int y1_pred;
array[N] int y2_pred;
array[N] int y3_pred;
// compute group-level correlations
corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
vector<lower=-1, upper=1>[NC] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
for (n in 1:N) { // predictions from observed data
mu_y1_pred[n] = r_y1_x1[J[n]]
* Z_X1[n]
+ r_y1_x2[J[n]]
* Z_X2[n]
+ r_y1_x3[J[n]]
* Z_X3[n]
+ r_y1_x4[J[n]]
* Z_X4[n];
mu_y2_pred[n] = r_y2_x1[J[n]]
* Z_X1[n]
+ r_y2_x2[J[n]]
* Z_X2[n]
+ r_y2_x3[J[n]]
* Z_X3[n]
+ r_y2_x4[J[n]]
* Z_X4[n];
mu_y3_pred[n] = r_y3_x1[J[n]]
* Z_X1[n]
+ r_y3_x2[J[n]]
* Z_X2[n]
+ r_y3_x3[J[n]]
* Z_X3[n]
+ r_y3_x4[J[n]]
* Z_X4[n];
// Predicted probabilities
prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
}
// Predictions
y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}

loo(Fit$draws("log_lik"), r_eff = relative_eff(Fit$draws("log_lik")), ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

Estimate      SE
elpd_loo -1583067.8 11689.6
p_loo      628518.5 11654.4
looic     3166135.6 23379.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)        0    0.0%  <NA>
(0.5, 0.7]   (ok)          0    0.0%  <NA>
(0.7, 1]   (bad)         0    0.0%  <NA>
(1, Inf)   (very bad) 1292  100.0%  0
See help('pareto-k-diagnostic') for details.

As I should have done in the initial post, here is a fully reproducible example:

Compile the model:

stan_file_variables <- write_stan_file("
functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
array[N] int Y1; // response variable
array[N] int Y2; // response variable
array[N] int Y3; // response variable
array[N] int<lower=1> J; // grouping indicator for subjects
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
vector[N] Z_Int;
vector[N] Z_X2;
vector[N] Z_X3;
vector[N] Z_X4;
// data for group-level effects subject
int<lower=1> N_subj; // number of subjects
int<lower=1> M; // number of coefficients per subject
int<lower=1> NC; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
parameters {
vector[K] b_y1; // population-level effects
vector[K] b_y2; // population-level effects
vector[K] b_y3; // population-level effects
vector<lower=0>[M] sd_1; // group-level standard deviations
matrix[M, N_subj] z; // standardized group-level effects
cholesky_factor_corr[M] L; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_subj, M] r; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_subj] r_y1_x1;
vector[N_subj] r_y1_x2;
vector[N_subj] r_y1_x3;
vector[N_subj] r_y1_x4;
vector[N_subj] r_y2_x1;
vector[N_subj] r_y2_x2;
vector[N_subj] r_y2_x3;
vector[N_subj] r_y2_x4;
vector[N_subj] r_y3_x1;
vector[N_subj] r_y3_x2;
vector[N_subj] r_y3_x3;
vector[N_subj] r_y3_x4;
// compute actual group-level effects
r = scale_r_cor(z, sd_1, L);
r_y1_x1 = r[ : , 1];
r_y1_x2 = r[ : , 2];
r_y1_x3 = r[ : , 3];
r_y1_x4 = r[ : , 4];
r_y2_x1 = r[ : , 5];
r_y2_x2 = r[ : , 6];
r_y2_x3 = r[ : , 7];
r_y2_x4 = r[ : , 8];
r_y3_x1 = r[ : , 9];
r_y3_x2 = r[ : , 10];
r_y3_x3 = r[ : , 11];
r_y3_x4 = r[ : , 12];
}
model {
real lprior = 0; // prior contributions to the log posterior
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu_y1 = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] mu_y2 = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] mu_y3 = rep_vector(0.0, N);
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y1[n] += r_y1_x1[J[n]]
* Z_Int[n]
+ r_y1_x2[J[n]]
* Z_X2[n]
+ r_y1_x3[J[n]]
* Z_X3[n]
+ r_y1_x4[J[n]]
* Z_X4[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y2[n] += r_y2_x1[J[n]]
* Z_Int[n]
+ r_y2_x2[J[n]]
* Z_X2[n]
+ r_y2_x3[J[n]]
* Z_X3[n]
+ r_y2_x4[J[n]]
* Z_X4[n];
}
for (n in 1 : N) {
// add more terms to the linear predictor
mu_y3[n] += r_y3_x1[J[n]]
* Z_Int[n]
+ r_y3_x2[J[n]]
* Z_X2[n]
+ r_y3_x3[J[n]]
* Z_X3[n]
+ r_y3_x4[J[n]]
* Z_X4[n];
}
target += bernoulli_logit_glm_lupmf(Y1 | X, mu_y1, b_y1);
target += bernoulli_logit_glm_lupmf(Y2 | X, mu_y2, b_y2);
target += bernoulli_logit_glm_lupmf(Y3 | X, mu_y3, b_y3);
}
// priors not including constants
lprior += normal_lupdf(b_y1[1] | 0, 1);
lprior += normal_lupdf(b_y1[2] | 0, 1);
lprior += normal_lupdf(b_y1[3] | 0, 1);
lprior += normal_lupdf(b_y1[4] | 0, 1);
lprior += normal_lupdf(b_y1[5] | 1, 1);
lprior += normal_lupdf(b_y1[6] | 0, 1);
lprior += normal_lupdf(b_y2 | 0, 1);
lprior += normal_lupdf(b_y3 | 0, 1);
lprior += normal_lupdf(sd_1 | 1, 1);
lprior += lkj_corr_cholesky_lupdf(L | 1);
target += lprior;
target += std_normal_lupdf(to_vector(z));
}
generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;
vector[N] log_lik;
// For predictions on observed matrix
vector[N] mu_y1_pred;
vector[N] mu_y2_pred;
vector[N] mu_y3_pred;
vector[N] Xb_y1 = X * b_y1;
vector[N] Xb_y2 = X * b_y2;
vector[N] Xb_y3 = X * b_y3;
vector[N] prob_y1;
vector[N] prob_y2;
vector[N] prob_y3;
array[N] int y1_pred;
array[N] int y2_pred;
array[N] int y3_pred;
// compute group-level correlations
corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
vector<lower=-1, upper=1>[NC] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
for (n in 1:N) { // predictions from observed data
mu_y1_pred[n] = r_y1_x1[J[n]]
* Z_Int[n]
+ r_y1_x2[J[n]]
* Z_X2[n]
+ r_y1_x3[J[n]]
* Z_X3[n]
+ r_y1_x4[J[n]]
* Z_X4[n];
mu_y2_pred[n] = r_y2_x1[J[n]]
* Z_Int[n]
+ r_y2_x2[J[n]]
* Z_X2[n]
+ r_y2_x3[J[n]]
* Z_X3[n]
+ r_y2_x4[J[n]]
* Z_X4[n];
mu_y3_pred[n] = r_y3_x1[J[n]]
* Z_Int[n]
+ r_y3_x2[J[n]]
* Z_X2[n]
+ r_y3_x3[J[n]]
* Z_X3[n]
+ r_y3_x4[J[n]]
* Z_X4[n];
// Predicted probabilities
prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
}
// Predictions
y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}
")
Ex_Mod <- cmdstan_model(stan_file_variables)

Create data list from .csv:
Ex_Data.csv (88.0 KB)

## Create data list ----
Ex_Stan_Data <- lst(N = nrow(Ex_Data),
                    Y1 = Ex_Data$Y1,
                    Y2 = Ex_Data$Y2,
                    Y3 = Ex_Data$Y3,
                    X = model.matrix(~ 0 + X1 + X2 + X3 + X4,
                                     data = Ex_Data),
                    K = ncol(X),
                    K_zi = 53,
                    Z_Int = array(rep(1, nrow(Ex_Data)),
                                  dimnames = list(rep(1, nrow(Ex_Data)))),
                    Z_X2 = array(Ex_Data$X2, dimnames = list(Ex_Data$X2)),
                    Z_X3 = array(Ex_Data$X3, dimnames = list(Ex_Data$X3)),
                    Z_X4 = array(Ex_Data$X4, dimnames = list(Ex_Data$X4)),
                    J = Ex_Data %>%
                      mutate(subject = as.numeric(factor(subject))) %>%
                      pull(subject),
                    N_subj = length(unique(Ex_Data$subject)),
                    M = 4 * 3, # number of group-level coefficients * outcomes
                    NC = (4 * 3) * ((4 * 3) - 1) / 2, # M * (M - 1) / 2
                    prior_only = 0)

Sample from model:

# Sample from compiled model ----
Ex_Fit <- Ex_Mod$sample(
  data = Ex_Stan_Data,
  iter_warmup = 1000,
  iter_sampling = 1000,
  chains = 4,
  max_treedepth = 12
)

PPC plot

Loo results:

Ex_Fit$loo(ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

Estimate       SE
elpd_loo -6179529.1  53762.4
p_loo     4265220.3  40488.5
looic    12359058.2 107524.8
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)        0    0.0%  <NA>
(0.5, 0.7]   (ok)          0    0.0%  <NA>
(0.7, 1]   (bad)         0    0.0%  <NA>
(1, Inf)   (very bad) 1292  100.0%  0
See help('pareto-k-diagnostic') for details

The PPC plot looks great, but the loo results look terrible.

The same model fit in brms gives very good loo results.

fmla_y1 <- bf(Y1 ~ 0 + X1 +
X2 +
X3 +
X4 +
(1 + X2 + X3 + X4 | p | subject),
center = FALSE)

fmla_y2 <- bf(Y2 ~ 0 + X1 +
X2 +
X3 +
X4 +
(1 + X2 + X3 + X4 | p | subject),
center = FALSE)

fmla_y3 <- bf(Y3 ~ 0 + X1 +
X2 +
X3 +
X4 +
(1 + X2 + X3 + X4 | p | subject),
center = FALSE)

priors <- c(
set_prior("normal(0,1)",
class = "b",
resp = "Y1"),

set_prior("normal(1,1)",
class = "b",
coef = "X2",
resp = "Y1"),

set_prior("normal(1,1)",
class = "sd",
resp = "Y1"),

set_prior("normal(0,1)",
class = "b",
resp = "Y2"),

set_prior("normal(1,1)",
class = "sd",
resp = "Y2"),

set_prior("normal(0,1)",
class = "b",
resp = "Y3"),

set_prior("normal(1,1)",
class = "sd",
resp = "Y3")
)

brm_mod <- brm(fmla_y1 + fmla_y2 + fmla_y3,
               family = bernoulli(),
               data = Ex_Data,
               prior = priors,
               init = 0,
               iter = 2000,
               warmup = 1000,
               chains = 4,
               cores = ncores,
               normalize = FALSE,
               backend = "cmdstanr",
               control = list(max_treedepth = 12))

loo(brm_mod, ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

Estimate   SE
elpd_loo  -2297.4 37.3
p_loo        74.8  2.1
looic      4594.9 74.5
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Just based on a quick look, should this

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);

be this?

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X[n], mu_y1_pred[n], b_y1);

That gives me the following errors:

Compiling Stan program...
Semantic error in '/tmp/Rtmp4BMJ83/model-3f428791c47a5.stan', line 189, column 30 to column 89:
-------------------------------------------------
187:                  prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
188:                  // Log-likelihoods
189:                  log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X[n], mu_y1_pred[n], b_y1);
^
190:                  log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X[n], mu_y2_pred[n], b_y2);
191:                  log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X[n], mu_y3_pred[n], b_y3);
-------------------------------------------------

Ill-typed arguments supplied to function 'bernoulli_logit_glm_lpmf':
(int, row_vector, real, vector)
Available signatures:
(int, matrix, vector, vector) => real
The second argument must be matrix but got row_vector
(int, matrix, real, vector) => real
The second argument must be matrix but got row_vector
(array[] int, matrix, vector, vector) => real
The first argument must be array[] int but got int
(array[] int, row_vector, real, vector) => real
The first argument must be array[] int but got int
(array[] int, row_vector, vector, vector) => real
The first argument must be array[] int but got int

make: *** [make/program:50: /tmp/Rtmp4BMJ83/model-3f428791c47a5.hpp] Error 1

Error: An error occured during compilation! See the message above for more information.

Maybe @paul.buerkner can help explain what brms does in this case?

I’ve looked through the brms code, but what I’ve gleaned and put into generated quantities isn’t working.

Perhaps there is some other problem with the model? The way you add the log-likelihoods together seems correct, and it is the same approach brms takes in R.

Thanks, @paul.buerkner

That’s what’s confusing me.

If I pass the brms model to loo all k are <0.5.

But, if I take the code and data generated by brms from the fitted model (standata() and stancode()) and then fit those directly in cmdstanr, all observations have high k.

This is what I’ve added to generated quantities in the code generated by brms

generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;
vector[N] log_lik;
// For predictions on observed matrix
vector[N] mu_y1_pred;
vector[N] mu_y2_pred;
vector[N] mu_y3_pred;
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1, upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M_1) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
for (n in 1:N) { // predictions from observed data
mu_y1_pred[n] = r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
+ r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
+ r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
+ r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
mu_y2_pred[n] = r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
+ r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
+ r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
+ r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
mu_y3_pred[n] = r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
+ r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
+ r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
+ r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y_Y1[n] | X_Y1, mu_y1_pred[n], b_Y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y_Y2[n] | X_Y2, mu_y2_pred[n], b_Y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y_Y3[n] | X_Y3, mu_y3_pred[n], b_Y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
}
}

This is the full code generated by brms

functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
int<lower=1> N_Y1; // number of observations
array[N_Y1] int Y_Y1; // response variable
int<lower=1> K_Y1; // number of population-level effects
matrix[N_Y1, K_Y1] X_Y1; // population-level design matrix
int<lower=1> N_Y2; // number of observations
array[N_Y2] int Y_Y2; // response variable
int<lower=1> K_Y2; // number of population-level effects
matrix[N_Y2, K_Y2] X_Y2; // population-level design matrix
int<lower=1> N_Y3; // number of observations
array[N_Y3] int Y_Y3; // response variable
int<lower=1> K_Y3; // number of population-level effects
matrix[N_Y3, K_Y3] X_Y3; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N_Y1] int<lower=1> J_1_Y1; // grouping indicator per observation
array[N_Y2] int<lower=1> J_1_Y2; // grouping indicator per observation
array[N_Y3] int<lower=1> J_1_Y3; // grouping indicator per observation
// group-level predictor values
vector[N_Y1] Z_1_Y1_1;
vector[N_Y1] Z_1_Y1_2;
vector[N_Y1] Z_1_Y1_3;
vector[N_Y1] Z_1_Y1_4;
vector[N_Y2] Z_1_Y2_5;
vector[N_Y2] Z_1_Y2_6;
vector[N_Y2] Z_1_Y2_7;
vector[N_Y2] Z_1_Y2_8;
vector[N_Y3] Z_1_Y3_9;
vector[N_Y3] Z_1_Y3_10;
vector[N_Y3] Z_1_Y3_11;
vector[N_Y3] Z_1_Y3_12;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {

}
parameters {
vector[K_Y1] b_Y1; // population-level effects
vector[K_Y2] b_Y2; // population-level effects
vector[K_Y3] b_Y3; // population-level effects
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_Y1_1;
vector[N_1] r_1_Y1_2;
vector[N_1] r_1_Y1_3;
vector[N_1] r_1_Y1_4;
vector[N_1] r_1_Y2_5;
vector[N_1] r_1_Y2_6;
vector[N_1] r_1_Y2_7;
vector[N_1] r_1_Y2_8;
vector[N_1] r_1_Y3_9;
vector[N_1] r_1_Y3_10;
vector[N_1] r_1_Y3_11;
vector[N_1] r_1_Y3_12;
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_Y1_1 = r_1[ : , 1];
r_1_Y1_2 = r_1[ : , 2];
r_1_Y1_3 = r_1[ : , 3];
r_1_Y1_4 = r_1[ : , 4];
r_1_Y2_5 = r_1[ : , 5];
r_1_Y2_6 = r_1[ : , 6];
r_1_Y2_7 = r_1[ : , 7];
r_1_Y2_8 = r_1[ : , 8];
r_1_Y3_9 = r_1[ : , 9];
r_1_Y3_10 = r_1[ : , 10];
r_1_Y3_11 = r_1[ : , 11];
r_1_Y3_12 = r_1[ : , 12];
}
model {
real lprior = 0; // prior contributions to the log posterior
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N_Y1] mu_Y1 = rep_vector(0.0, N_Y1);
// initialize linear predictor term
vector[N_Y2] mu_Y2 = rep_vector(0.0, N_Y2);
// initialize linear predictor term
vector[N_Y3] mu_Y3 = rep_vector(0.0, N_Y3);
for (n in 1 : N_Y1) {
// add more terms to the linear predictor
mu_Y1[n] += r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
+ r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
+ r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
+ r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
}
for (n in 1 : N_Y2) {
// add more terms to the linear predictor
mu_Y2[n] += r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
+ r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
+ r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
+ r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
}
for (n in 1 : N_Y3) {
// add more terms to the linear predictor
mu_Y3[n] += r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
+ r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
+ r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
+ r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
}
target += bernoulli_logit_glm_lupmf(Y_Y1 | X_Y1, mu_Y1, b_Y1);
target += bernoulli_logit_glm_lupmf(Y_Y2 | X_Y2, mu_Y2, b_Y2);
target += bernoulli_logit_glm_lupmf(Y_Y3 | X_Y3, mu_Y3, b_Y3);
}
// priors not including constants
lprior += normal_lupdf(b_Y1[1] | 0, 1);
lprior += normal_lupdf(b_Y1[2] | 0, 1);
lprior += normal_lupdf(b_Y1[3] | 0, 1);
lprior += normal_lupdf(b_Y1[4] | 1, 1);
lprior += normal_lupdf(b_Y1[5] | 0, 1);
lprior += normal_lupdf(b_Y1[6] | 0, 1);
lprior += normal_lupdf(b_Y2 | 0, 1);
lprior += normal_lupdf(b_Y3 | 0, 1);
lprior += normal_lupdf(sd_1 | 1, 1);
lprior += lkj_corr_cholesky_lupdf(L_1 | 1);
target += lprior;
target += std_normal_lupdf(to_vector(z_1));
}
generated quantities {
vector[N] log_lik1;
vector[N] log_lik2;
vector[N] log_lik3;
vector[N] log_lik;
// For predictions on observed matrix
vector[N] mu_y1_pred;
vector[N] mu_y2_pred;
vector[N] mu_y3_pred;
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1, upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1 : M_1) {
for (j in 1 : (k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
for (n in 1:N) { // predictions from observed data
mu_y1_pred[n] = r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
+ r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
+ r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
+ r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
mu_y2_pred[n] = r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
+ r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
+ r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
+ r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
mu_y3_pred[n] = r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
+ r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
+ r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
+ r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
// Log-likelihoods
log_lik1[n] = bernoulli_logit_glm_lpmf(Y_Y1[n] | X_Y1, mu_y1_pred[n], b_Y1);
log_lik2[n] = bernoulli_logit_glm_lpmf(Y_Y2[n] | X_Y2, mu_y2_pred[n], b_Y2);
log_lik3[n] = bernoulli_logit_glm_lpmf(Y_Y3[n] | X_Y3, mu_y3_pred[n], b_Y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
}
}

@avehtari is on the right track. He just missed wrapping things up into the appropriate containers. So that should be:

bernoulli_logit_glm_lpmf({Y1[n]} | [X[n]], mu_y1_pred[n], b_y1);

I realized this, too. Although, looking at the available signatures, a row_vector x is allowed in some of them, so maybe this should also work without the wrapper.

You’re right. There are many more signatures now; I was looking at the 2.18 docs. Here’s the latest:

Those should support Aki’s approach. But you need at least Stan 2.23 or 2.25 depending on the call, which are beyond RStan on CRAN (uses Stan 2.21). You could use either rstan from GitHub (uses Stan 2.26) or cmdstanr (uses the latest, Stan 2.30).
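As a sketch, either of these forms should match the newer signatures (assuming mu_y1_pred[n] is the per-observation group-level term, as in the model above):

```stan
// wrap the scalar outcome in a one-element array; pass the row directly,
// which uses the (array[] int, row_vector, real, vector) signature
log_lik1[n] = bernoulli_logit_glm_lpmf({ Y1[n] } | X[n], mu_y1_pred[n], b_y1);

// or promote the row to a one-row matrix and the intercept to a vector,
// which uses the (array[] int, matrix, vector, vector) signature
log_lik1[n] = bernoulli_logit_glm_lpmf({ Y1[n] } | [ X[n] ],
                                       rep_vector(mu_y1_pred[n], 1), b_y1);
```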


Thanks!

I’m using CmdStan 2.30. I’ll give it a go wrapping things into a { } container.