# Estimating Polychoric Correlation Matrix with Stan

Hi all,

I’m working on a Bayesian approach to “generalizability analysis” with an ordinal data set. I used `brms` to fit cumulative models and then calculated the “variance partitioning”.

Generalizability studies typically report a reliability statistic for the scale, most often Cronbach’s alpha. But my view is that this typical approach is specific to frequentist models of metric/interval data.

After some reading, I think calculating an “ordinal alpha” based on the polychoric correlation matrix would be the best approach.
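For context, as I understand it, ordinal alpha is just Cronbach’s alpha computed from the polychoric correlation matrix rather than the Pearson covariance matrix. Because a correlation matrix has unit variances, it reduces to the standardized-alpha form: with $k$ items and mean off-diagonal polychoric correlation $\bar{r}$,

$$
\alpha_{\text{ordinal}} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}},
$$

so a posterior distribution for the correlation matrix would give a posterior for alpha directly.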

There are some existing packages in R that calculate polychoric correlation matrices (e.g., `polycor::polychor`, `psych::polychoric`), but these are restricted to maximum-likelihood estimation.

I’m wondering if anyone is aware of a Stan-based approach to estimating a polychoric correlation matrix. I’d like a Stan/Bayesian approach so that it fits with the rest of the analysis and provides full posterior distributions of the estimates.

I don’t have the programming skills to implement this, so I’m wondering if anyone has already attempted the estimation.

As an appendix, this is the implementation of `polychor` in the `polycor` package:

```r
polychor <- function (x, y, ML=FALSE, control=list(), std.err=FALSE, maxcor=.9999){
  # Negative log-likelihood of the two-way table as a function of rho
  # (and, under ML, the row/column cutpoints). binBvn() is an internal
  # polycor function that computes cell probabilities from the bivariate
  # normal between cutpoints.
  f <- function(pars) {
    if (length(pars) == 1){
      rho <- pars
      if (abs(rho) > maxcor) rho <- sign(rho)*maxcor
      row.cuts <- rc
      col.cuts <- cc
    }
    else {
      rho <- pars[1]
      if (abs(rho) > maxcor) rho <- sign(rho)*maxcor
      row.cuts <- pars[2:r]
      col.cuts <- pars[(r+1):(r+c-1)]
      if (any(diff(row.cuts) < 0) || any(diff(col.cuts) < 0)) return(Inf)
    }
    P <- binBvn(rho, row.cuts, col.cuts)
    - sum(tab * log(P))
  }
  tab <- if (missing(y)) x else table(x, y)
  zerorows <- apply(tab, 1, function(x) all(x == 0))
  zerocols <- apply(tab, 2, function(x) all(x == 0))
  zr <- sum(zerorows)
  if (0 < zr) warning(paste(zr, " row", suffix <- if(zr == 1) "" else "s",
    " with zero marginal", suffix, " removed", sep=""))
  zc <- sum(zerocols)
  if (0 < zc) warning(paste(zc, " column", suffix <- if(zc == 1) "" else "s",
    " with zero marginal", suffix, " removed", sep=""))
  tab <- tab[!zerorows, , drop=FALSE]
  tab <- tab[, !zerocols, drop=FALSE]
  r <- nrow(tab)
  c <- ncol(tab)
  if (r < 2) {
    warning("the table has fewer than 2 rows")
    return(NA)
  }
  if (c < 2) {
    warning("the table has fewer than 2 columns")
    return(NA)
  }
  n <- sum(tab)
  # starting cutpoints from the marginal proportions
  rc <- qnorm(cumsum(rowSums(tab))/n)[-r]
  cc <- qnorm(cumsum(colSums(tab))/n)[-c]
  if (ML) {
    result <- optim(c(optimise(f, interval=c(-1, 1))$minimum, rc, cc), f,
      control=control, hessian=std.err)
    if (result$par[1] > 1){
      result$par[1] <- maxcor
      warning(paste("inadmissible correlation set to ", maxcor, sep=""))
    }
    else if (result$par[1] < -1){
      result$par[1] <- -maxcor
      warning(paste("inadmissible correlation set to -", maxcor, sep=""))
    }
    if (std.err) {
      chisq <- 2*(result$value + sum(tab * log((tab + 1e-6)/n)))
      df <- length(tab) - r - c
      result <- list(type="polychoric",
                     rho=result$par[1],
                     row.cuts=result$par[2:r],
                     col.cuts=result$par[(r+1):(r+c-1)],
                     var=solve(result$hessian),
                     n=n,
                     chisq=chisq,
                     df=df,
                     ML=TRUE)
      class(result) <- "polycor"
      return(result)
    }
    else return(as.vector(result$par[1]))
  }
  else if (std.err){
    result <- optim(0, f, control=control, hessian=TRUE, method="BFGS")
    if (result$par > 1){
      result$par <- maxcor
      warning(paste("inadmissible correlation set to ", maxcor, sep=""))
    }
    else if (result$par < -1){
      result$par <- -maxcor
      warning(paste("inadmissible correlation set to -", maxcor, sep=""))
    }
    chisq <- 2*(result$value + sum(tab * log((tab + 1e-6)/n)))
    df <- length(tab) - r - c
    result <- list(type="polychoric",
                   rho=result$par,
                   var=1/result$hessian,
                   n=n,
                   chisq=chisq,
                   df=df,
                   ML=FALSE)
    class(result) <- "polycor"
    return(result)
  }
  else optimise(f, interval=c(-maxcor, maxcor))$minimum
}
```

Another thing to note is that `polychor` produces maximum-likelihood estimates of the pairwise correlation between two variables, so the overall correlation matrix among K variables, assembled from those pairwise estimates, may not be positive definite.
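To see why, here is a small numerical illustration (in Python for brevity; the numbers are made up, not from any data set): three pairwise correlations that are each admissible on their own can form a matrix that is not positive definite.

```python
# Hypothetical 3-item example: each pairwise correlation lies in [-1, 1],
# but the assembled 3 x 3 matrix is not a valid correlation matrix.
r12, r13, r23 = 0.9, -0.9, 0.9

# Sylvester's criterion: positive definiteness requires every leading
# principal minor to be positive. The 1x1 and 2x2 minors are fine here,
# but the full 3x3 determinant comes out negative.
det = (1 * (1 - r23 ** 2)
       - r12 * (r12 - r13 * r23)
       + r13 * (r12 * r23 - r13))
print(det)  # negative, so the matrix is not positive definite
```

Estimating the whole matrix jointly, with a prior over correlation matrices, avoids this problem by construction.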

To do the Bayesian analogue, you basically need the stuff at

and to estimate the correlation matrix, the latent z variables, and the cutpoints simultaneously. Hopefully, that is enough to get started.
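Schematically, that joint model could look something like the following Stan sketch. This is a rough adaptation of the truncated-normal reparameterization used for the multivariate probit example in the Stan User’s Guide, not tested code; the priors and the assumption that every item has the same number of categories are placeholders.

```stan
// Sketch: joint Bayesian estimation of a polychoric correlation matrix.
// K ordinal items scored 1..C; latent z ~ multi_normal(0, Omega) with
// Omega a correlation matrix; per-item ordered cutpoints.
data {
  int<lower=1> N;                          // respondents
  int<lower=2> K;                          // items
  int<lower=2> C;                          // categories per item
  array[N, K] int<lower=1, upper=C> y;     // ordinal responses
}
parameters {
  cholesky_factor_corr[K] L;               // Cholesky factor of the polychoric matrix
  array[K] ordered[C - 1] cut;             // cutpoints per item
  array[N] vector<lower=0, upper=1>[K] u;  // nuisance variables -> truncated normals
}
model {
  L ~ lkj_corr_cholesky(2);                // placeholder prior
  for (k in 1:K)
    cut[k] ~ normal(0, 2);                 // placeholder prior
  for (n in 1:N) {
    vector[K] z_raw;
    real prev = 0;
    for (k in 1:K) {
      // probability bounds for the standardized innovation z_raw[k],
      // given the interval the observed category implies for z[k]
      real a = (y[n, k] == 1) ? 0 : Phi((cut[k][y[n, k] - 1] - prev) / L[k, k]);
      real b = (y[n, k] == C) ? 1 : Phi((cut[k][y[n, k]] - prev) / L[k, k]);
      z_raw[k] = inv_Phi(a + (b - a) * u[n, k]);
      target += log(b - a);                // likelihood of category y[n, k]
      if (k < K)
        prev = L[k + 1, 1:k] * z_raw[1:k]; // running conditional mean
    }
  }
}
generated quantities {
  corr_matrix[K] Omega = multiply_lower_tri_self_transpose(L);
}
```

The `log(b - a)` term can underflow for extreme cutpoints, so a production version would want a more numerically careful formulation.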

Thanks, Ben!

The link provided and the code for the tobit model are well above my current understanding, but they give me a good starting place!

I should have said in my initial post that estimating the polychoric correlation matrix is the first step in estimating an ordinal alpha (reliability) coefficient.

I’m trying to take a Bayesian approach to what is presented and discussed in this paper, which provides an R-based example workflow in its appendix.