I have an outcome (predicted) variable in a regression model that is a vector of correlation coefficients and is therefore bounded in [-1, 1]. What distributional family and link function would you recommend for modeling this variable? I considered rescaling the variable to [0, 1] and using the beta family (or one of its inflated extensions), but I was wondering whether there are better alternatives. Thanks in advance for suggestions/clarifications.

I might be missing something, but is it possible to work on the $\boldsymbol{X\beta}$ scale and define

$$\boldsymbol{\rho} = 2\,\Phi(\boldsymbol{X\beta} + \boldsymbol{\varepsilon}) - 1,$$

where $\Phi(\cdot)$ is the standard normal CDF (applied elementwise)?
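A minimal sketch of that mapping in Python, using only the standard library. The function names and the example linear-predictor values are hypothetical and chosen for illustration; nothing here fits a model, it only shows that 2Φ(η) − 1 sends any real η into (-1, 1) and sends η = 0 to ρ = 0.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_to_corr(eta: float) -> float:
    """Map an unbounded linear predictor eta to (-1, 1) via 2 * Phi(eta) - 1."""
    return 2.0 * std_normal_cdf(eta) - 1.0


# Hypothetical values of X @ beta + eps, for illustration only.
etas = [-3.0, -0.5, 0.0, 0.5, 3.0]
rhos = [probit_to_corr(eta) for eta in etas]

for eta, rho in zip(etas, rhos):
    print(f"eta = {eta:+.1f}  ->  rho = {rho:+.4f}")
```

The mapping is monotone and odd in η, so the sign of the linear predictor carries over to the sign of the implied correlation.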

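I would be tempted to map them to the $[0, 1]$ interval and model them with a beta distribution, but to do the mapping with a logistic transform, $\operatorname{logistic}(x) = \frac{e^x}{1+e^x}$, rather than an absolute-value transform, so that the sign is preserved (if you care about it): $\rho < 0 \iff \operatorname{logistic}(\rho) < 0.5$. One caveat: applied directly to $\rho \in [-1, 1]$, the logistic only reaches about $[0.27, 0.73]$, whereas the linear rescaling $(\rho + 1)/2$ covers all of $[0, 1]$ with the same sign property. Haven’t tested this out, however!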
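As a quick check on the two candidate rescalings, the sketch below evaluates both at a few correlation values. The function names are hypothetical, and the linear rescaling is the one the original question alludes to, not something proposed in this reply. It only compares the transforms; it does not fit a beta regression.

```python
import math


def logistic(x: float) -> float:
    """Logistic function e^x / (1 + e^x), written in the equivalent 1 / (1 + e^-x) form."""
    return 1.0 / (1.0 + math.exp(-x))


def rescale_linear(rho: float) -> float:
    """Linear map from [-1, 1] onto [0, 1]."""
    return (rho + 1.0) / 2.0


# Illustrative correlation values, including both endpoints.
for rho in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(
        f"rho = {rho:+.1f}  "
        f"logistic = {logistic(rho):.4f}  "
        f"linear = {rescale_linear(rho):.4f}"
    )
```

Both maps send negative correlations below 0.5 and positive ones above it. Note that a plain beta likelihood has support on the open interval (0, 1), so exact values of 0 or 1 after linear rescaling would need an inflated variant or a small adjustment.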


Do you require the matrix of correlations to be positive definite?

If you’re just interested in a transform, then `tanh(x)` maps the real line to (-1, 1); it is the [-1, 1] counterpart of the inverse logit, since `tanh(x) = 2 * inv_logit(2 * x) - 1`.
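A small standard-library sketch of the `tanh` route, assuming a hypothetical set of unbounded values `xs`. It checks numerically that `tanh` is a shifted and scaled inverse logit, and that its inverse, `atanh` (the Fisher z-transformation), recovers the unbounded value, so a model could be fit on the `atanh` scale and mapped back.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_inv_logit(x: float) -> float:
    """tanh written as a rescaled inverse logit: 2 * inv_logit(2x) - 1."""
    return 2.0 * inv_logit(2.0 * x) - 1.0


# Hypothetical unbounded values, for illustration only.
xs = [-2.0, -0.5, 0.0, 0.5, 2.0]

for x in xs:
    rho = math.tanh(x)        # unbounded scale -> (-1, 1)
    z = math.atanh(rho)       # Fisher z: (-1, 1) -> unbounded scale
    print(
        f"x = {x:+.1f}  tanh = {rho:+.4f}  "
        f"2*inv_logit(2x)-1 = {tanh_via_inv_logit(x):+.4f}  atanh = {z:+.4f}"
    )
```

`atanh` is undefined at exactly ±1, so observed correlations of exactly -1 or 1 would need separate handling.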
