Estimating Polychoric Correlation Matrix with Stan

Hi all,

I’m working on a Bayesian approach to “generalizability analysis” with an ordinal data set. I used brms to fit cumulative (ordinal) models and then calculated the variance partitioning from those fits.
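For concreteness, the kind of model I mean is roughly this (a sketch only; response, person, item, and the data frame ratings are placeholder names, not my actual data):

library(brms)

# cumulative (ordinal probit) model with crossed person and item effects;
# the posterior variance components are what feed the variance partitioning
fit <- brm(
  response ~ 1 + (1 | person) + (1 | item),
  family = cumulative("probit"),
  data = ratings
)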

Generalizability studies typically report a reliability statistic for the scale, most often Cronbach’s alpha. My view, though, is that the typical approach is specific to frequentist models of metric/interval data.

After some reading, I think calculating an “ordinal alpha” based on the polychoric correlation matrix would be the best approach.

There are some existing packages in R to calculate polychoric correlation matrices (e.g., polycor::polychor, psych::polychoric), but these are restricted to maximum likelihood estimation.
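For context, the conventional (non-Bayesian) version of that workflow looks roughly like this; a sketch only, where items is a hypothetical data frame of ordinal item responses:

library(psych)

pc <- polychoric(items)   # polychoric correlation matrix and thresholds
pc$rho                    # the K x K polychoric correlation matrix
alpha(pc$rho)             # "ordinal alpha": Cronbach's alpha computed from that matrix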

I’m wondering if anyone is aware of a Stan-based approach for estimating a polychoric correlation matrix. I’d like a Stan/Bayesian approach so that it fits with the rest of the analysis and provides distributions of the estimates.

I don’t have the programming skills to implement this, so I’m wondering if anyone has already attempted the estimation.

As an appendix, this is the implementation of the polychor() function in the polycor package:

polychor <-
    function (x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999){
        # negative log-likelihood of the contingency table as a function of rho
        # (and, in the ML case, the row/column cutpoints)
        f <- function(pars) {
            if (length(pars) == 1){
                rho <- pars
                if (abs(rho) > maxcor) rho <- sign(rho)*maxcor
                row.cuts <- rc
                col.cuts <- cc
            }
            else {
                rho <- pars[1]
                if (abs(rho) > maxcor) rho <- sign(rho)*maxcor
                row.cuts <- pars[2:r]
                col.cuts <- pars[(r+1):(r+c-1)]
                if (any(diff(row.cuts) < 0) || any(diff(col.cuts) < 0)) return(Inf)
            }
            # binBvn() (internal to polycor) returns the bivariate-normal cell probabilities
            P <- binBvn(rho, row.cuts, col.cuts)
            - sum(tab * log(P))
        }
        tab <- if (missing(y)) x else table(x, y)
        zerorows <- apply(tab, 1, function(x) all(x == 0))
        zerocols <- apply(tab, 2, function(x) all(x == 0))
        zr <- sum(zerorows)
        if (0 < zr) warning(paste(zr, " row", suffix <- if(zr == 1) "" else "s",
                                  " with zero marginal", suffix," removed", sep=""))
        zc <- sum(zerocols)
        if (0 < zc) warning(paste(zc, " column", suffix <- if(zc == 1) "" else "s",
                                  " with zero marginal", suffix, " removed", sep=""))
        tab <- tab[!zerorows, ,drop=FALSE]  
        tab <- tab[, !zerocols, drop=FALSE] 
        r <- nrow(tab)
        c <- ncol(tab)
        if (r < 2) {
            warning("the table has fewer than 2 rows")
            return(NA)
        }
        if (c < 2) {
            warning("the table has fewer than 2 columns")
            return(NA)
        }
        n <- sum(tab)
        # cutpoints from the cumulative marginal proportions
        # (used directly in the two-step case, as starting values for ML)
        rc <- qnorm(cumsum(rowSums(tab))/n)[-r]
        cc <- qnorm(cumsum(colSums(tab))/n)[-c]
        if (ML) {
            result <- optim(c(optimise(f, interval=c(-1, 1))$minimum, rc, cc), f,
                            control=control, hessian=std.err)
            if (result$par[1] > 1){
                result$par[1] <- maxcor
                warning(paste("inadmissible correlation set to", maxcor))
            }
            else if (result$par[1] < -1){
                result$par[1] <- -maxcor
                warning(paste("inadmissible correlation set to -", maxcor, sep=""))
            }
            if (std.err) {
                chisq <- 2*(result$value + sum(tab * log((tab + 1e-6)/n)))
                df <- length(tab) - r - c
                result <- list(type="polychoric",
                               rho=result$par[1],
                               row.cuts=result$par[2:r],
                               col.cuts=result$par[(r+1):(r+c-1)],
                               var=solve(result$hessian),
                               n=n,
                               chisq=chisq,
                               df=df,
                               ML=TRUE)
                class(result) <- "polycor"
                return(result)
            }
            else return(as.vector(result$par[1]))
        }
        else if (std.err){
            result <- optim(0, f, control=control, hessian=TRUE, method="BFGS")
            if (result$par > 1){
                result$par <- maxcor
                warning(paste("inadmissible correlation set to", maxcor))
            }
            else if (result$par < -1){
                result$par <- -maxcor
                warning(paste("inadmissible correlation set to -", maxcor, sep=""))
            }
            chisq <- 2*(result$value + sum(tab *log((tab + 1e-6)/n)))
            df <- length(tab) - r - c 
            result <- list(type="polychoric",
                           rho=result$par,
                           var=1/result$hessian,
                           n=n,
                           chisq=chisq,
                           df=df,
                           ML=FALSE)
            class(result) <- "polycor"
            return(result)
        }
        else optimise(f, interval=c(-maxcor, maxcor))$minimum
    }

Another thing is that the polychor function produces maximum likelihood estimates of the pairwise correlation between two variables, but a matrix assembled from those pairwise estimates for K variables may not be positive definite.
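For example (just to illustrate the problem, not part of polycor), a matrix R filled in from pairwise polychor() calls can be checked and, if necessary, bent to the nearest positive-definite correlation matrix:

# R is a hypothetical K x K matrix assembled from pairwise polychor() estimates
eigen(R, only.values = TRUE)$values                    # a negative eigenvalue means R is not positive definite
R_pd <- as.matrix(Matrix::nearPD(R, corr = TRUE)$mat)  # nearest positive-definite correlation matrix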

To do the Bayesian analogue, you basically need the stuff at [link to the truncated multivariate normal code] and to estimate the correlation matrix, the latent z variables, and the cutpoints simultaneously. Hopefully, that is enough to get started.

Thanks, Ben!

The link provided and code for the tobit model are well above my understanding, but provide a good starting place!

I should have included in my initial post that estimating the polychoric correlation is the first step in estimating an ordinal alpha (reliability) coefficient.

I’m trying to take a Bayesian approach to what is presented and discussed in this paper, which provides an R-based example workflow in its appendix.

Hi @JLC

Sorry to revive this thread, but I think I got an implementation working and wanted to share it in case anyone else has trouble.

I used the truncated multivariate normal code from @bgoodri above, as well as a modified Dirichlet prior for the cutpoints that I adapted from @betanalpha’s awesome blog post.

Here is the R code to generate some sample data:

library(tidyverse)
library(cmdstanr)
library(bayesplot)

# true latent correlation matrix (rho = 0.5)
sigma <- c(1, 0.5, 0.5, 1) %>% matrix(nrow = 2, byrow = TRUE)
dat <- mvtnorm::rmvnorm(1e3, sigma = sigma)

plot(dat,pch='.')

# true cutpoints for each latent dimension (three ordinal categories each)
cut_points <- list(
    c(-2, 1),
    c(0, 1.7)
)

# map each latent pair to ordinal categories 0/1/2 by counting crossed cutpoints
discretize_data <- function(x){
  c(
    sum(x[1] > cut_points[[1]]),
    sum(x[2] > cut_points[[2]])
  )
}

discrete_data <- dat %>% apply(., 1, discretize_data) %>% t()
table(discrete_data[,1],discrete_data[,2])


data_list <- list(
  D = ncol(discrete_data), N = nrow(discrete_data),
  y = discrete_data
)
fp <- file.path('***The path to your stan code***')
mod <- cmdstan_model(fp, force_recompile = getOption("cmdstanr_force_recompile", default = F))
poly_cor <- mod$sample(data = data_list,
                           chains = 2,
                           parallel_chains = 2,
                           iter_warmup = 2000,
                           adapt_delta = 0.8,
                           save_warmup = F,
                           iter_sampling = 2000)

And here is the Stan code:

functions{
    // log-probability contribution of one observation under the latent multivariate
    // normal: the auxiliary uniforms u implement the truncated-normal trick referenced above
    real trunc_norm(array[] int y_data, array[] vector cut_points, matrix chol_mat, array[] real u, int D_y, real static_mu){
        real j_log_collection;
        j_log_collection = 0;
        real prev;
        vector[D_y] z;
        prev = 0;
        for (d in 1:D_y) {
            if (y_data[d] == 2){
                real bound_lower;
                real t;
                bound_lower = Phi((cut_points[d,2] - (static_mu + prev)) / (chol_mat[d,d]));
                t = bound_lower + (1 - bound_lower) * u[d];
                z[d] = inv_Phi(t);
                j_log_collection += log1m(bound_lower);
            }
            else if (y_data[d] == 1){
                real bound_lower;
                real bound_upper;
                real t;
                bound_lower = Phi((cut_points[d,1] - (static_mu + prev)) / (chol_mat[d,d]));
                bound_upper = Phi((cut_points[d,2] - (static_mu + prev)) / (chol_mat[d,d]));
                t = bound_lower + (bound_upper - bound_lower)*u[d];
                z[d] = inv_Phi(t);
                j_log_collection += log(bound_upper - bound_lower);
            }
            else if (y_data[d] == 0){
                real bound_upper;
                real t;
                bound_upper = Phi((cut_points[d,1] - (static_mu + prev)) / (chol_mat[d,d]));
                t = bound_upper*u[d];
                z[d] = inv_Phi(t);
                j_log_collection += log(bound_upper);
            }
            if (d < D_y){
                prev = chol_mat[d + 1, 1:d] * head(z,d);
            }
        }
    return(j_log_collection);
    }

    real induced_dirichlet_lpdf(vector c, vector alpha, real gamma){
        int K = num_elements(c) + 1;
        vector[K - 1] cuml = Phi(c - gamma);
        vector[K] p;
        matrix[K,K] J = rep_matrix(0,K,K);

        p[1] = cuml[1];
        for (k in 2:(K-1)){
            p[k] = cuml[k] - cuml[k-1];
        }
        p[K] = 1 - cuml[K-1];

        for (k in 1:K) J[k,1] = 1;

        for (k in 2:K){
            real rho = exp(std_normal_lpdf(c[k-1] - gamma));
            J[k,k] = -rho;
            J[k - 1, k] = rho;
        }
        return dirichlet_lpdf(p | alpha) + log_determinant(J);
    }
}

data {
  int<lower=1> D;
  int<lower=0> N;
  array[N, D] int<lower=0, upper=10> y;
//   array[D] matrix[D,D] L_Omega;
}
parameters {
  cholesky_factor_corr[D] L_Omega;       // Cholesky factor of the polychoric correlation matrix
  array[N,D] real<lower=0, upper=1> u;   // auxiliary uniforms for the truncated-normal trick
  array[D] ordered[D] c_points;          // cutpoints; D per variable matches 3 categories only because D == 2 here
}
model {
    L_Omega ~ lkj_corr_cholesky(4);
    // induced Dirichlet prior on each variable's cutpoints (written out for D = 2)
    target += induced_dirichlet_lpdf(c_points[1] | rep_vector(1, D + 1), 0);
    target += induced_dirichlet_lpdf(c_points[2] | rep_vector(1, D + 1), 0);

    for (n in 1:N) target += trunc_norm(y[n], c_points, L_Omega, u[n], D, 0);
}
generated quantities {
   corr_matrix[D] Omega;
   Omega = multiply_lower_tri_self_transpose(L_Omega);
}

The above code does a pretty good job recovering the cut_points and the correlation matrix:

# A tibble: 5 x 10
  variable         mean  median     sd    mad      q5     q95  rhat ess_bulk ess_tail
  <chr>           <dbl>   <dbl>  <dbl>  <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 c_points[1,1] -2.02   -2.02   0.0881 0.0895 -2.17   -1.88    1.00    8672.    2688.
2 c_points[1,2]  0.920   0.919  0.0454 0.0464  0.847   0.994   1.00    5049.    2837.
3 c_points[2,1]  0.0172  0.0168 0.0394 0.0398 -0.0468  0.0827  1.00    9414.    3447.
4 c_points[2,2]  1.67    1.67   0.0688 0.0687  1.56    1.79    1.00    6424.    2501.
5 Omega[1,2]     0.480   0.480  0.0418 0.0412  0.409   0.548   1.00    3458.    3262.
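
Since the original goal was an ordinal alpha, the posterior draws of Omega can also be pushed through the standardized-alpha formula directly. A sketch assuming the poly_cor fit and data_list from above (with only K = 2 items this is just a function of the single correlation, but the same code works for larger K):

# posterior distribution of "ordinal alpha" from the draws of Omega;
# standardized alpha from a K x K correlation matrix R is (K/(K-1)) * (1 - K / sum(R))
omega_draws <- poly_cor$draws("Omega", format = "draws_matrix")
K <- data_list$D

alpha_draws <- apply(omega_draws, 1, function(row) {
  R <- matrix(row, nrow = K)   # Omega is symmetric, so element order doesn't matter
  (K / (K - 1)) * (1 - K / sum(R))
})
quantile(alpha_draws, c(0.05, 0.5, 0.95))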

Not sure how well it would scale to higher-dimensional data or more cutpoints, but this simple code works pretty well! Hope that helps!
