Help calculating log-likelihood with bernoulli_logit_glm to use with loo

I am working with binary outcome data from an experiment where each trial presents a combination of signal manipulations and participants give 3 binary judgements.

I’m modelling this as a multivariate response with 3 correlated outcomes.

I’d like to use loo to do some model comparisons, but am struggling to get a combined/joint log likelihood of all three outcomes per observation.

I’m bringing the outcomes in as three arrays:

data {
  int<lower=1> N; // total number of observations
  array[N] int Y1; // response variable
  array[N] int Y2; // response variable
  array[N] int Y3; // response variable
  array[N] int<lower=1> J; // Subject idx
  int<lower=1> K; //number of predictors
  matrix[N, K] X; // predictor matrix
}

In generated quantities, I am calculating a log_lik for each outcome separately:

generated quantities {
   vector[N] log_lik1;
   vector[N] log_lik2;
   vector[N] log_lik3;
       
  for (n in 1:N) { 
           log_lik1[n] = bernoulli_logit_glm_lpmf(Y1 | X, mu_y1_pred[n], b_y1);
           log_lik2[n] = bernoulli_logit_glm_lpmf(Y2 | X, mu_y2_pred[n], b_y2);
           log_lik3[n] = bernoulli_logit_glm_lpmf(Y3 | X, mu_y3_pred[n], b_y3);
  }
}

I’m calling loo with

loo(fit$draws("log_lik1"), r_eff=relative_eff(fit$draws("log_lik1")))

But every k value is too high.

When I fit the same model and data with brms and call loo(brms_fit) all k values are fine (< 0.5). Even when I specify only one response with resp = "Y1".

Have I coded this incorrectly? Is there a way to calculate a joint log-likelihood as well?

Yes, just use

log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];

That defines log_lik to be the joint log likelihood of Y1, Y2, and Y3. But I’m guessing you really meant Y1[n] and so on. Otherwise you get a vectorized value for the entire container. Of course, I’m only guessing that Y1 is a container because I don’t see how any of this makes sense otherwise. In general, it’s much easier for us to help if you supply the whole model so we can see all the variable declarations.
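To see the difference, here’s a minimal sketch, assuming Y1 is array[N] int and eta1 is a hypothetical vector[N] of linear predictors:

// scalar: the log likelihood of observation n only
log_lik1[n] = bernoulli_logit_lpmf(Y1[n] | eta1[n]);

// vectorized: a single real, the log likelihood of the whole container Y1,
// assigned identically to every entry of log_lik1
log_lik1[n] = bernoulli_logit_lpmf(Y1 | eta1);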

Thank you very much @Bob_Carpenter

I am still getting high Pareto k values for all observations.

Here is the full model code:

functions {
        /* compute correlated group-level effects
         * Args:
         *   z: matrix of unscaled group-level effects
         *   SD: vector of standard deviation parameters
         *   L: cholesky factor correlation matrix
         * Returns:
         *   matrix of scaled group-level effects
         */
        matrix scale_r_cor(matrix z, vector SD, matrix L) {
                // r is stored in another dimension order than z
                return transpose(diag_pre_multiply(SD, L) * z);
        }
}
data {
        int<lower=1> N; // total number of observations
        array[N] int Y1; // response variable
        array[N] int Y2; // response variable
        array[N] int Y3; // response variable
        array[N] int<lower=1> J; // grouping indicator for subjects
        int<lower=1> K; // number of population-level effects
        matrix[N, K] X; // population-level design matrix
        vector[N] Z_X1;
        vector[N] Z_X2;
        vector[N] Z_X3;
        vector[N] Z_X4;
        // data for group-level effects subject
        int<lower=1> N_subj; // number of subjects
        int<lower=1> M; // number of coefficients per subject
        int<lower=1> NC; // number of group-level correlations
        int prior_only; // should the likelihood be ignored?
}
parameters {
        vector[K] b_y1; // population-level effects
        vector[K] b_y2; // population-level effects
        vector[K] b_y3; // population-level effects
        vector<lower=0>[M] sd_1; // group-level standard deviations
        matrix[M, N_subj] z; // standardized group-level effects
        cholesky_factor_corr[M] L; // cholesky factor of correlation matrix
}
transformed parameters {
        matrix[N_subj, M] r; // actual group-level effects
        // using vectors speeds up indexing in loops
        vector[N_subj] r_y1_x1;
        vector[N_subj] r_y1_x2;
        vector[N_subj] r_y1_x3;
        vector[N_subj] r_y1_x4;
        vector[N_subj] r_y2_x1;
        vector[N_subj] r_y2_x2;
        vector[N_subj] r_y2_x3;
        vector[N_subj] r_y2_x4;
        vector[N_subj] r_y3_x1;
        vector[N_subj] r_y3_x2;
        vector[N_subj] r_y3_x3;
        vector[N_subj] r_y3_x4;
        // compute actual group-level effects
        r = scale_r_cor(z, sd_1, L);
        r_y1_x1 = r[ : , 1];
        r_y1_x2 = r[ : , 2];
        r_y1_x3 = r[ : , 3];
        r_y1_x4 = r[ : , 4];
        r_y2_x1 = r[ : , 5];
        r_y2_x2 = r[ : , 6];
        r_y2_x3 = r[ : , 7];
        r_y2_x4 = r[ : , 8];
        r_y3_x1 = r[ : , 9];
        r_y3_x2 = r[ : , 10];
        r_y3_x3 = r[ : , 11];
        r_y3_x4 = r[ : , 12];
}
model {
        real lprior = 0; // prior contributions to the log posterior
        // likelihood not including constants
        if (!prior_only) {
                // initialize linear predictor term
                vector[N] mu_y1 = rep_vector(0.0, N);
                // initialize linear predictor term
                vector[N] mu_y2 = rep_vector(0.0, N);
                // initialize linear predictor term
                vector[N] mu_y3 = rep_vector(0.0, N);
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y1[n] += r_y1_x1[J[n]]
                        * Z_X1[n]
                        + r_y1_x2[J[n]]
                        * Z_X2[n]
                        + r_y1_x3[J[n]]
                        * Z_X3[n]
                        + r_y1_x4[J[n]]
                        * Z_X4[n];
                }
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y2[n] += r_y2_x1[J[n]]
                        * Z_X1[n]
                        + r_y2_x2[J[n]]
                        * Z_X2[n]
                        + r_y2_x3[J[n]]
                        * Z_X3[n]
                        + r_y2_x4[J[n]]
                        * Z_X4[n];
                }
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y3[n] += r_y3_x1[J[n]]
                        * Z_X1[n]
                        + r_y3_x2[J[n]]
                        * Z_X2[n]
                        + r_y3_x3[J[n]]
                        * Z_X3[n]
                        + r_y3_x4[J[n]]
                        * Z_X4[n];
                }
                target += bernoulli_logit_glm_lupmf(Y1 | X, mu_y1, b_y1);
                target += bernoulli_logit_glm_lupmf(Y2 | X, mu_y2, b_y2);
                target += bernoulli_logit_glm_lupmf(Y3 | X, mu_y3, b_y3);
        }
        // priors not including constants
        lprior += normal_lupdf(b_y1[1] | 0, 1);
        lprior += normal_lupdf(b_y1[2] | 0, 1);
        lprior += normal_lupdf(b_y1[3] | 0, 1);
        lprior += normal_lupdf(b_y1[4] | 0, 1);
        lprior += normal_lupdf(b_y1[5] | 1, 1);
        lprior += normal_lupdf(b_y1[6] | 0, 1);
        lprior += normal_lupdf(b_y2 | 0, 1);
        lprior += normal_lupdf(b_y3 | 0, 1);
        lprior += normal_lupdf(sd_1 | 1, 1);
        lprior += lkj_corr_cholesky_lupdf(L | 1);
        target += lprior;
        target += std_normal_lupdf(to_vector(z));
}
generated quantities {
        vector[N] log_lik1;
        vector[N] log_lik2;
        vector[N] log_lik3;
        vector[N] log_lik;
        // For predictions on observed matrix
        vector[N] mu_y1_pred;
        vector[N] mu_y2_pred;
        vector[N] mu_y3_pred;
        vector[N] Xb_y1 = X * b_y1;
        vector[N] Xb_y2 = X * b_y2;
        vector[N] Xb_y3 = X * b_y3;
        vector[N] prob_y1;
        vector[N] prob_y2;
        vector[N] prob_y3;
        array[N] int y1_pred;  
        array[N] int y2_pred; 
        array[N] int y3_pred; 
        // compute group-level correlations
        corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
        vector<lower=-1, upper=1>[NC] cor_1;
        // extract upper diagonal of correlation matrix
        for (k in 1 : M) {
                for (j in 1 : (k - 1)) {
                        cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
                }
        }
        for (n in 1:N) { // predictions from observed data
                mu_y1_pred[n] = r_y1_x1[J[n]]
                        * Z_X1[n]
                        + r_y1_x2[J[n]]
                        * Z_X2[n]
                        + r_y1_x3[J[n]]
                        * Z_X3[n]
                        + r_y1_x4[J[n]]
                        * Z_X4[n];
                mu_y2_pred[n] = r_y2_x1[J[n]]
                        * Z_X1[n]
                        + r_y2_x2[J[n]]
                        * Z_X2[n]
                        + r_y2_x3[J[n]]
                        * Z_X3[n]
                        + r_y2_x4[J[n]]
                        * Z_X4[n];
                mu_y3_pred[n] = r_y3_x1[J[n]]
                        * Z_X1[n]
                        + r_y3_x2[J[n]]
                        * Z_X2[n]
                        + r_y3_x3[J[n]]
                        * Z_X3[n]
                        + r_y3_x4[J[n]]
                        * Z_X4[n];
                // Predicted probabilities
                prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
                prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
                prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
                // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
        }
        // Predictions
        y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
        y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
        y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}





target += bernoulli_logit_glm_lupmf(Y1 | X, mu_y1, b_y1);

This is modeling all of the Y1 at once. I’m guessing the problem is that you want the log likelihoods for the individual Y1[n]. After calculating those, just add them to the target using target += ...;.

You do something like that later:

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);

but my guess here is that you’re not pulling out individual entries of X. Shouldn’t Y1[n] depend only on X[n]? So I think the problem is that you want to decompose what bernoulli_logit_glm does and calculate individual entries.
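For example, one way to decompose it (a sketch; the glm lpmf with arguments (y | x, alpha, beta) is the ordinary Bernoulli-logit lpmf evaluated at alpha + x * beta):

// per-observation log likelihood using only row n of X
log_lik1[n] = bernoulli_logit_lpmf(Y1[n] | mu_y1_pred[n] + X[n] * b_y1);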

Thanks again @Bob_Carpenter !

Since X is a matrix, how can I index the rows? Do I need to_row_vector?

If X is a matrix, X[n] is its n-th row. That can also be written less parsimoniously as X[n, :] or X[n, 1:cols(X)] or row(X, n).
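For instance (a quick sketch, with K and b_y1 as declared in your model):

row_vector[K] x_n = X[n];   // the n-th row of X as a row_vector
real eta_n = X[n] * b_y1;   // row_vector times vector yields a real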

I tried indexing with [n], but got errors on compiling:

Ill-typed arguments supplied to function 'bernoulli_logit_glm_lpmf':
(int, row_vector, real, vector)
Available signatures:
(int, matrix, vector, vector) => real
  The second argument must be matrix but got row_vector
(int, matrix, real, vector) => real
  The second argument must be matrix but got row_vector
(array[] int, matrix, vector, vector) => real
  The first argument must be array[] int but got int
(array[] int, row_vector, real, vector) => real
  The first argument must be array[] int but got int
(array[] int, row_vector, vector, vector) => real
  The first argument must be array[] int but got int
(Additional signatures omitted)

I modified that section to:

 // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred, b_y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred, b_y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred, b_y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];

Which compiles, but then sampling gives errors for every iteration:

Chain 4 Exception: bernoulli_logit_glm_lpmf: Intercept[2] is nan, but must be finite! (in '/tmp/RtmpR6gote/model-134901b9ca0a4.stan', line 205, column 11 to column 95)

This compiles and runs without error, but the results do not agree at all with those from brms:

generated quantities {
        vector[N] log_lik1;
        vector[N] log_lik2;
        vector[N] log_lik3;
        vector[N] log_lik;
        // For predictions on observed matrix
        vector[N] mu_y1_pred;
        vector[N] mu_y2_pred;
        vector[N] mu_y3_pred;
        vector[N] Xb_y1 = X * b_y1;
        vector[N] Xb_y2 = X * b_y2;
        vector[N] Xb_y3 = X * b_y3;
        vector[N] prob_y1;
        vector[N] prob_y2;
        vector[N] prob_y3;
        array[N] int y1_pred;  
        array[N] int y2_pred; 
        array[N] int y3_pred; 
        // compute group-level correlations
        corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
        vector<lower=-1, upper=1>[NC] cor_1;
        // extract upper diagonal of correlation matrix
        for (k in 1 : M) {
                for (j in 1 : (k - 1)) {
                        cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
                }
        }
        for (n in 1:N) { // predictions from observed data
                mu_y1_pred[n] = r_y1_x1[J[n]]
                        * Z_X1[n]
                        + r_y1_x2[J[n]]
                        * Z_X2[n]
                        + r_y1_x3[J[n]]
                        * Z_X3[n]
                        + r_y1_x4[J[n]]
                        * Z_X4[n];
                mu_y2_pred[n] = r_y2_x1[J[n]]
                        * Z_X1[n]
                        + r_y2_x2[J[n]]
                        * Z_X2[n]
                        + r_y2_x3[J[n]]
                        * Z_X3[n]
                        + r_y2_x4[J[n]]
                        * Z_X4[n];
                mu_y3_pred[n] = r_y3_x1[J[n]]
                        * Z_X1[n]
                        + r_y3_x2[J[n]]
                        * Z_X2[n]
                        + r_y3_x3[J[n]]
                        * Z_X3[n]
                        + r_y3_x4[J[n]]
                        * Z_X4[n];
                // Predicted probabilities
                prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
                prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
                prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
                // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
        }
        // Predictions
        y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
        y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
        y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}

loo(Fit$draws("log_lik"), r_eff=relative_eff(Fit$draws("log_lik")), ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

           Estimate      SE
elpd_loo -1583067.8 11689.6
p_loo      628518.5 11654.4
looic     3166135.6 23379.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)        0    0.0%  <NA>      
 (0.5, 0.7]   (ok)          0    0.0%  <NA>      
   (0.7, 1]   (bad)         0    0.0%  <NA>      
   (1, Inf)   (very bad) 1292  100.0%  0         
See help('pareto-k-diagnostic') for details.

As I should have done in the initial post, here is a fully reproducible example:

Compile model

stan_file_variables <- write_stan_file("
functions {
        /* compute correlated group-level effects
         * Args:
         *   z: matrix of unscaled group-level effects
         *   SD: vector of standard deviation parameters
         *   L: cholesky factor correlation matrix
         * Returns:
         *   matrix of scaled group-level effects
         */
        matrix scale_r_cor(matrix z, vector SD, matrix L) {
                // r is stored in another dimension order than z
                return transpose(diag_pre_multiply(SD, L) * z);
        }
}
data {
        int<lower=1> N; // total number of observations
        array[N] int Y1; // response variable
        array[N] int Y2; // response variable
        array[N] int Y3; // response variable
        array[N] int<lower=1> J; // grouping indicator for subjects
        int<lower=1> K; // number of population-level effects
        matrix[N, K] X; // population-level design matrix
        vector[N] Z_Int;
        vector[N] Z_X2;
        vector[N] Z_X3;
        vector[N] Z_X4;
        // data for group-level effects subject
        int<lower=1> N_subj; // number of subjects
        int<lower=1> M; // number of coefficients per subject
        int<lower=1> NC; // number of group-level correlations
        int prior_only; // should the likelihood be ignored?
}
parameters {
        vector[K] b_y1; // population-level effects
        vector[K] b_y2; // population-level effects
        vector[K] b_y3; // population-level effects
        vector<lower=0>[M] sd_1; // group-level standard deviations
        matrix[M, N_subj] z; // standardized group-level effects
        cholesky_factor_corr[M] L; // cholesky factor of correlation matrix
}
transformed parameters {
        matrix[N_subj, M] r; // actual group-level effects
        // using vectors speeds up indexing in loops
        vector[N_subj] r_y1_x1;
        vector[N_subj] r_y1_x2;
        vector[N_subj] r_y1_x3;
        vector[N_subj] r_y1_x4;
        vector[N_subj] r_y2_x1;
        vector[N_subj] r_y2_x2;
        vector[N_subj] r_y2_x3;
        vector[N_subj] r_y2_x4;
        vector[N_subj] r_y3_x1;
        vector[N_subj] r_y3_x2;
        vector[N_subj] r_y3_x3;
        vector[N_subj] r_y3_x4;
        // compute actual group-level effects
        r = scale_r_cor(z, sd_1, L);
        r_y1_x1 = r[ : , 1];
        r_y1_x2 = r[ : , 2];
        r_y1_x3 = r[ : , 3];
        r_y1_x4 = r[ : , 4];
        r_y2_x1 = r[ : , 5];
        r_y2_x2 = r[ : , 6];
        r_y2_x3 = r[ : , 7];
        r_y2_x4 = r[ : , 8];
        r_y3_x1 = r[ : , 9];
        r_y3_x2 = r[ : , 10];
        r_y3_x3 = r[ : , 11];
        r_y3_x4 = r[ : , 12];
}
model {
        real lprior = 0; // prior contributions to the log posterior
        // likelihood not including constants
        if (!prior_only) {
                // initialize linear predictor term
                vector[N] mu_y1 = rep_vector(0.0, N);
                // initialize linear predictor term
                vector[N] mu_y2 = rep_vector(0.0, N);
                // initialize linear predictor term
                vector[N] mu_y3 = rep_vector(0.0, N);
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y1[n] += r_y1_x1[J[n]]
                        * Z_Int[n]
                        + r_y1_x2[J[n]]
                        * Z_X2[n]
                        + r_y1_x3[J[n]]
                        * Z_X3[n]
                        + r_y1_x4[J[n]]
                        * Z_X4[n];
                }
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y2[n] += r_y2_x1[J[n]]
                        * Z_Int[n]
                        + r_y2_x2[J[n]]
                        * Z_X2[n]
                        + r_y2_x3[J[n]]
                        * Z_X3[n]
                        + r_y2_x4[J[n]]
                        * Z_X4[n];
                }
                for (n in 1 : N) {
                        // add more terms to the linear predictor
                        mu_y3[n] += r_y3_x1[J[n]]
                        * Z_Int[n]
                        + r_y3_x2[J[n]]
                        * Z_X2[n]
                        + r_y3_x3[J[n]]
                        * Z_X3[n]
                        + r_y3_x4[J[n]]
                        * Z_X4[n];
                }
                target += bernoulli_logit_glm_lupmf(Y1 | X, mu_y1, b_y1);
                target += bernoulli_logit_glm_lupmf(Y2 | X, mu_y2, b_y2);
                target += bernoulli_logit_glm_lupmf(Y3 | X, mu_y3, b_y3);
        }
        // priors not including constants
        lprior += normal_lupdf(b_y1[1] | 0, 1);
        lprior += normal_lupdf(b_y1[2] | 0, 1);
        lprior += normal_lupdf(b_y1[3] | 0, 1);
        lprior += normal_lupdf(b_y1[4] | 0, 1);
        lprior += normal_lupdf(b_y1[5] | 1, 1);
        lprior += normal_lupdf(b_y1[6] | 0, 1);
        lprior += normal_lupdf(b_y2 | 0, 1);
        lprior += normal_lupdf(b_y3 | 0, 1);
        lprior += normal_lupdf(sd_1 | 1, 1);
        lprior += lkj_corr_cholesky_lupdf(L | 1);
        target += lprior;
        target += std_normal_lupdf(to_vector(z));
}
generated quantities {
        vector[N] log_lik1;
        vector[N] log_lik2;
        vector[N] log_lik3;
        vector[N] log_lik;
        // For predictions on observed matrix
        vector[N] mu_y1_pred;
        vector[N] mu_y2_pred;
        vector[N] mu_y3_pred;
        vector[N] Xb_y1 = X * b_y1;
        vector[N] Xb_y2 = X * b_y2;
        vector[N] Xb_y3 = X * b_y3;
        vector[N] prob_y1;
        vector[N] prob_y2;
        vector[N] prob_y3;
        array[N] int y1_pred;  
        array[N] int y2_pred; 
        array[N] int y3_pred; 
        // compute group-level correlations
        corr_matrix[M] Cor_1 = multiply_lower_tri_self_transpose(L);
        vector<lower=-1, upper=1>[NC] cor_1;
        // extract upper diagonal of correlation matrix
        for (k in 1 : M) {
                for (j in 1 : (k - 1)) {
                        cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
                }
        }
        for (n in 1:N) { // predictions from observed data
                mu_y1_pred[n] = r_y1_x1[J[n]]
                        * Z_Int[n]
                        + r_y1_x2[J[n]]
                        * Z_X2[n]
                        + r_y1_x3[J[n]]
                        * Z_X3[n]
                        + r_y1_x4[J[n]]
                        * Z_X4[n];
                mu_y2_pred[n] = r_y2_x1[J[n]]
                        * Z_Int[n]
                        + r_y2_x2[J[n]]
                        * Z_X2[n]
                        + r_y2_x3[J[n]]
                        * Z_X3[n]
                        + r_y2_x4[J[n]]
                        * Z_X4[n];
                mu_y3_pred[n] = r_y3_x1[J[n]]
                        * Z_Int[n]
                        + r_y3_x2[J[n]]
                        * Z_X2[n]
                        + r_y3_x3[J[n]]
                        * Z_X3[n]
                        + r_y3_x4[J[n]]
                        * Z_X4[n];
                // Predicted probabilities
                prob_y1[n] = inv_logit(mu_y1_pred[n] + Xb_y1[n]);
                prob_y2[n] = inv_logit(mu_y2_pred[n] + Xb_y2[n]);
                prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
                // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X, mu_y2_pred[n], b_y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X, mu_y3_pred[n], b_y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
        }
        // Predictions
        y1_pred = bernoulli_logit_glm_rng(X, mu_y1_pred, b_y1);
        y2_pred = bernoulli_logit_glm_rng(X, mu_y2_pred, b_y2);
        y3_pred = bernoulli_logit_glm_rng(X, mu_y3_pred, b_y3);
}
")
Ex_Mod <- cmdstan_model(stan_file_variables)

Create data list from .csv:
Ex_Data.csv (88.0 KB)

Ex_Data <- read.csv("Ex_Data.csv")

## Create data list ----
Ex_Stan_Data <- lst(N = nrow(Ex_Data),
                     Y1 = Ex_Data$Y1,
                     Y2 = Ex_Data$Y2,
                     Y3 = Ex_Data$Y3,
                     X = model.matrix(~ 0 + X1 +
                                              X2 +
                                              X3 +
                                              X4,
                                      data = Ex_Data),
                     K = ncol(X),
                     K_zi = 53,
                     Z_Int = array(rep(1, nrow(Ex_Data)),
                                   dimnames = list(rep(1, nrow(Ex_Data)))),
                     Z_X2 = array(Ex_Data$X2, 
                                    dimnames = list(Ex_Data$X2)),
                     Z_X3 = array(Ex_Data$X3,
                                    dimnames = list(Ex_Data$X3)),
                     Z_X4 = array(Ex_Data$X4, 
                                     dimnames = list(Ex_Data$X4)),
                     J = Ex_Data %>%
                             mutate(subject = as.numeric(factor(subject))) %>%
                             pull(subject),
                     N_subj = length(unique(Ex_Data$subject)),
                     M = 4*3, # Number of group-level coefficients * outcomes
                     NC = (4*3)*((4*3) - 1)/2, # (M * (M-1))/2
                     prior_only = 0
)

Sample from model:

# Sample from compiled model ----
Ex_Fit <- Ex_Mod$sample(
        data = Ex_Stan_Data,       
        iter_warmup = 1000,            
        iter_sampling = 1000,        
        chains = 4,    
        max_treedepth = 12, 
        adapt_delta = 0.99)

PPC plot (image omitted)

Loo results:

Ex_Fit$loo(ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

           Estimate       SE
elpd_loo -6179529.1  53762.4
p_loo     4265220.3  40488.5
looic    12359058.2 107524.8
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)        0    0.0%  <NA>      
 (0.5, 0.7]   (ok)          0    0.0%  <NA>      
   (0.7, 1]   (bad)         0    0.0%  <NA>      
   (1, Inf)   (very bad) 1292  100.0%  0         
See help('pareto-k-diagnostic') for details

The PPC plot looks great, but the loo results look terrible.

The same model fit in brms gives very good loo results.

fmla_y1 <- bf(Y1 ~ 0 + X1 +
                         X2 +
                         X3 +
                         X4 +
                         (1 + X2 + X3 + X4 | p | subject),
                 center = FALSE)

fmla_y2 <- bf(Y2 ~ 0 + X1 +
                      X2 +
                      X3 +
                      X4 +
                      (1 + X2 + X3 + X4 | p | subject),
              center = FALSE)

fmla_y3 <- bf(Y3 ~ 0 + X1 +
                      X2 +
                      X3 +
                      X4 +
                      (1 + X2 + X3 + X4 | p | subject),
              center = FALSE)


priors <- c(
        set_prior("normal(0,1)",
                  class = "b",
                  resp = "Y1"),
        
        set_prior("normal(1,1)",
                  class = "b",
                  coef = "X2",
                  resp = "Y1"),
        
        set_prior("normal(1,1)",
                  class = "sd",
                  resp = "Y1"),
        
        set_prior("normal(0,1)",
                  class = "b",
                  resp = "Y2"),
        
        set_prior("normal(1,1)",
                  class = "sd",
                  resp = "Y2"),
        
        set_prior("normal(0,1)",
                  class = "b",
                  resp = "Y3"),
        
        set_prior("normal(1,1)",
                  class = "sd",
                  resp = "Y3")
)


brm_mod <- brm(fmla_y1 + fmla_y2 + fmla_y3,
                  family = bernoulli(),
                  Ex_Data,
                  prior = priors,
                  init = 0,
                  iter = 2000,
                  warmup = 1000,
                  chains = 4,
                  cores = ncores,
                  normalize = FALSE,
                  backend = "cmdstan",
                  control = list(max_treedepth = 12,
                                 adapt_delta = 0.9)
)

loo(brm_mod, ncores = ncores)

Computed from 4000 by 1296 log-likelihood matrix

         Estimate   SE
elpd_loo  -2297.4 37.3
p_loo        74.8  2.1
looic      4594.9 74.5
------
Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

Just based on a quick look, should this

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X, mu_y1_pred[n], b_y1);

be

log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X[n], mu_y1_pred[n], b_y1);

That gives me the following errors:

Compiling Stan program...
Semantic error in '/tmp/Rtmp4BMJ83/model-3f428791c47a5.stan', line 189, column 30 to column 89:
   -------------------------------------------------
   187:                  prob_y3[n] = inv_logit(mu_y3_pred[n] + Xb_y3[n]);
   188:                  // Log-likelihoods
   189:                  log_lik1[n] = bernoulli_logit_glm_lpmf(Y1[n] | X[n], mu_y1_pred[n], b_y1);
                                       ^
   190:                  log_lik2[n] = bernoulli_logit_glm_lpmf(Y2[n] | X[n], mu_y2_pred[n], b_y2);
   191:                  log_lik3[n] = bernoulli_logit_glm_lpmf(Y3[n] | X[n], mu_y3_pred[n], b_y3);
   -------------------------------------------------

Ill-typed arguments supplied to function 'bernoulli_logit_glm_lpmf':
(int, row_vector, real, vector)
Available signatures:
(int, matrix, vector, vector) => real
  The second argument must be matrix but got row_vector
(int, matrix, real, vector) => real
  The second argument must be matrix but got row_vector
(array[] int, matrix, vector, vector) => real
  The first argument must be array[] int but got int
(array[] int, row_vector, real, vector) => real
  The first argument must be array[] int but got int
(array[] int, row_vector, vector, vector) => real
  The first argument must be array[] int but got int
(Additional signatures omitted)

make: *** [make/program:50: /tmp/Rtmp4BMJ83/model-3f428791c47a5.hpp] Error 1

Error: An error occured during compilation! See the message above for more information.

Sorry, I think I misread your model

Maybe @paul.buerkner can help explain what ‘brms’ does in this case?

I’ve looked through the ‘brms’ code, but what I’ve gleaned and put into generated quantities isn’t working.

Perhaps there is some other problem with the model? The way you add the log-likelihoods together seems to be correct and is the same approach that brms uses in R.
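Summing works here because, conditional on the parameters (including the group-level effects), the three responses for a given observation are independent, so the pointwise joint likelihood factorizes:

$\log p(Y_{1n}, Y_{2n}, Y_{3n} \mid \theta) = \log p(Y_{1n} \mid \theta) + \log p(Y_{2n} \mid \theta) + \log p(Y_{3n} \mid \theta)$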

Thanks, @paul.buerkner

That’s what’s confusing me.

If I pass the brms model to loo all k are <0.5.

But, if I take the code and data generated by brms from the fitted model (standata() and stancode()) and then fit those directly in cmdstanr, all observations have high k.

This is what I’ve added to generated quantities in the code generated by brms

generated quantities {
        vector[N] log_lik1;
        vector[N] log_lik2;
        vector[N] log_lik3;
        vector[N] log_lik;
        // For predictions on observed matrix
        vector[N] mu_y1_pred;
        vector[N] mu_y2_pred;
        vector[N] mu_y3_pred;
        // compute group-level correlations
        corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
        vector<lower=-1, upper=1>[NC_1] cor_1;
        // extract upper diagonal of correlation matrix
        for (k in 1 : M_1) {
                for (j in 1 : (k - 1)) {
                        cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
                }
        }
        for (n in 1:N) { // predictions from observed data
                mu_y1_pred[n] = r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
                  + r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
                  + r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
                  + r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
                mu_y2_pred[n] = r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
                  + r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
                  + r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
                  + r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
                mu_y3_pred[n] = r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
                  + r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
                  + r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
                  + r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
                // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y_Y1[n] | X_Y1, mu_y1_pred[n], b_Y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y_Y2[n] | X_Y2, mu_y2_pred[n], b_Y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y_Y3[n] | X_Y3, mu_y3_pred[n], b_Y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
        }
}

This is the full code generated by brms

functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N; // total number of observations
  int<lower=1> N_Y1; // number of observations
  array[N_Y1] int Y_Y1; // response variable
  int<lower=1> K_Y1; // number of population-level effects
  matrix[N_Y1, K_Y1] X_Y1; // population-level design matrix
  int<lower=1> N_Y2; // number of observations
  array[N_Y2] int Y_Y2; // response variable
  int<lower=1> K_Y2; // number of population-level effects
  matrix[N_Y2, K_Y2] X_Y2; // population-level design matrix
  int<lower=1> N_Y3; // number of observations
  array[N_Y3] int Y_Y3; // response variable
  int<lower=1> K_Y3; // number of population-level effects
  matrix[N_Y3, K_Y3] X_Y3; // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1; // number of grouping levels
  int<lower=1> M_1; // number of coefficients per level
  array[N_Y1] int<lower=1> J_1_Y1; // grouping indicator per observation
  array[N_Y2] int<lower=1> J_1_Y2; // grouping indicator per observation
  array[N_Y3] int<lower=1> J_1_Y3; // grouping indicator per observation
  // group-level predictor values
  vector[N_Y1] Z_1_Y1_1;
  vector[N_Y1] Z_1_Y1_2;
  vector[N_Y1] Z_1_Y1_3;
  vector[N_Y1] Z_1_Y1_4;
  vector[N_Y2] Z_1_Y2_5;
  vector[N_Y2] Z_1_Y2_6;
  vector[N_Y2] Z_1_Y2_7;
  vector[N_Y2] Z_1_Y2_8;
  vector[N_Y3] Z_1_Y3_9;
  vector[N_Y3] Z_1_Y3_10;
  vector[N_Y3] Z_1_Y3_11;
  vector[N_Y3] Z_1_Y3_12;
  int<lower=1> NC_1; // number of group-level correlations
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  
}
parameters {
  vector[K_Y1] b_Y1; // population-level effects
  vector[K_Y2] b_Y2; // population-level effects
  vector[K_Y3] b_Y3; // population-level effects
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  matrix[M_1, N_1] z_1; // standardized group-level effects
  cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1; // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_Y1_1;
  vector[N_1] r_1_Y1_2;
  vector[N_1] r_1_Y1_3;
  vector[N_1] r_1_Y1_4;
  vector[N_1] r_1_Y2_5;
  vector[N_1] r_1_Y2_6;
  vector[N_1] r_1_Y2_7;
  vector[N_1] r_1_Y2_8;
  vector[N_1] r_1_Y3_9;
  vector[N_1] r_1_Y3_10;
  vector[N_1] r_1_Y3_11;
  vector[N_1] r_1_Y3_12;
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_Y1_1 = r_1[ : , 1];
  r_1_Y1_2 = r_1[ : , 2];
  r_1_Y1_3 = r_1[ : , 3];
  r_1_Y1_4 = r_1[ : , 4];
  r_1_Y2_5 = r_1[ : , 5];
  r_1_Y2_6 = r_1[ : , 6];
  r_1_Y2_7 = r_1[ : , 7];
  r_1_Y2_8 = r_1[ : , 8];
  r_1_Y3_9 = r_1[ : , 9];
  r_1_Y3_10 = r_1[ : , 10];
  r_1_Y3_11 = r_1[ : , 11];
  r_1_Y3_12 = r_1[ : , 12];
}
model {
  real lprior = 0; // prior contributions to the log posterior
  // likelihood not including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N_Y1] mu_Y1 = rep_vector(0.0, N_Y1);
    // initialize linear predictor term
    vector[N_Y2] mu_Y2 = rep_vector(0.0, N_Y2);
    // initialize linear predictor term
    vector[N_Y3] mu_Y3 = rep_vector(0.0, N_Y3);
    for (n in 1 : N_Y1) {
      // add more terms to the linear predictor
      mu_Y1[n] += r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
                  + r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
                  + r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
                  + r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
    }
    for (n in 1 : N_Y2) {
      // add more terms to the linear predictor
      mu_Y2[n] += r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
                  + r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
                  + r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
                  + r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
    }
    for (n in 1 : N_Y3) {
      // add more terms to the linear predictor
      mu_Y3[n] += r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
                  + r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
                  + r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
                  + r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
    }
    target += bernoulli_logit_glm_lupmf(Y_Y1 | X_Y1, mu_Y1, b_Y1);
    target += bernoulli_logit_glm_lupmf(Y_Y2 | X_Y2, mu_Y2, b_Y2);
    target += bernoulli_logit_glm_lupmf(Y_Y3 | X_Y3, mu_Y3, b_Y3);
  }
  // priors not including constants
  lprior += normal_lupdf(b_Y1[1] | 0, 1);
  lprior += normal_lupdf(b_Y1[2] | 0, 1);
  lprior += normal_lupdf(b_Y1[3] | 0, 1);
  lprior += normal_lupdf(b_Y1[4] | 1, 1);
  lprior += normal_lupdf(b_Y1[5] | 0, 1);
  lprior += normal_lupdf(b_Y1[6] | 0, 1);
  lprior += normal_lupdf(b_Y2 | 0, 1);
  lprior += normal_lupdf(b_Y3 | 0, 1);
  lprior += normal_lupdf(sd_1 | 1, 1);
  lprior += lkj_corr_cholesky_lupdf(L_1 | 1);
  target += lprior;
  target += std_normal_lupdf(to_vector(z_1));
}
generated quantities {
        vector[N] log_lik1;
        vector[N] log_lik2;
        vector[N] log_lik3;
        vector[N] log_lik;
        // For predictions on observed matrix
        vector[N] mu_y1_pred;
        vector[N] mu_y2_pred;
        vector[N] mu_y3_pred;
        // compute group-level correlations
        corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
        vector<lower=-1, upper=1>[NC_1] cor_1;
        // extract upper diagonal of correlation matrix
        for (k in 1 : M_1) {
                for (j in 1 : (k - 1)) {
                        cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
                }
        }
        for (n in 1:N) { // predictions from observed data
                mu_y1_pred[n] = r_1_Y1_1[J_1_Y1[n]] * Z_1_Y1_1[n]
                  + r_1_Y1_2[J_1_Y1[n]] * Z_1_Y1_2[n]
                  + r_1_Y1_3[J_1_Y1[n]] * Z_1_Y1_3[n]
                  + r_1_Y1_4[J_1_Y1[n]] * Z_1_Y1_4[n];
                mu_y2_pred[n] = r_1_Y2_5[J_1_Y2[n]] * Z_1_Y2_5[n]
                  + r_1_Y2_6[J_1_Y2[n]] * Z_1_Y2_6[n]
                  + r_1_Y2_7[J_1_Y2[n]] * Z_1_Y2_7[n]
                  + r_1_Y2_8[J_1_Y2[n]] * Z_1_Y2_8[n];
                mu_y3_pred[n] = r_1_Y3_9[J_1_Y3[n]] * Z_1_Y3_9[n]
                  + r_1_Y3_10[J_1_Y3[n]] * Z_1_Y3_10[n]
                  + r_1_Y3_11[J_1_Y3[n]] * Z_1_Y3_11[n]
                  + r_1_Y3_12[J_1_Y3[n]] * Z_1_Y3_12[n];
                // Log-likelihoods
                log_lik1[n] = bernoulli_logit_glm_lpmf(Y_Y1[n] | X_Y1, mu_y1_pred[n], b_Y1);
                log_lik2[n] = bernoulli_logit_glm_lpmf(Y_Y2[n] | X_Y2, mu_y2_pred[n], b_Y2);
                log_lik3[n] = bernoulli_logit_glm_lpmf(Y_Y3[n] | X_Y3, mu_y3_pred[n], b_Y3);
                log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];
        }
}

@avehtari is on the right track. He just missed wrapping things up into the appropriate containers. So that should be:

bernoulli_logit_glm_lpmf({Y1[n]} | [X[n]], mu_y1_pred[n], b_y1);
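Spelled out for all three outcomes inside the generated quantities loop, that would be something like this (a sketch using the declarations from the model above; {Y1[n]} builds a 1-element array and [X[n]] a 1-row matrix):

log_lik1[n] = bernoulli_logit_glm_lpmf({Y1[n]} | [X[n]], mu_y1_pred[n], b_y1);
log_lik2[n] = bernoulli_logit_glm_lpmf({Y2[n]} | [X[n]], mu_y2_pred[n], b_y2);
log_lik3[n] = bernoulli_logit_glm_lpmf({Y3[n]} | [X[n]], mu_y3_pred[n], b_y3);
log_lik[n] = log_lik1[n] + log_lik2[n] + log_lik3[n];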

I realized this, too. Although, looking at the available signatures, a row_vector X is allowed in some of them, so maybe this should also work without the matrix wrapper around X[n].

You’re right. There are many more signatures now; I was looking at the 2.18 doc. The latest functions reference lists signatures that should support Aki’s approach. But you need at least Stan 2.23 or 2.25 depending on the call, which is beyond the RStan on CRAN (Stan 2.21). You could use either rstan from GitHub (Stan 2.26) or cmdstanr (the latest, Stan 2.30).
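For example, using the (array[] int, row_vector, real, vector) signature from the error listing above, this should compile on a recent Stan (a sketch):

// y wrapped as a 1-element array; X[n] passed as a plain row_vector
log_lik1[n] = bernoulli_logit_glm_lpmf({Y1[n]} | X[n], mu_y1_pred[n], b_y1);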


Thanks!

I’m using CmdStan 2.30. I’ll give it a go wrapping into a { } container.