# Constraining covariance matrix to two parameters (Contemporaneous Correlation)

I am trying to estimate a regression model in rstan implementing contemporaneous correlation structure analogous to a PCSE estimator outlined in the linked paper.

I have a regression model with n observations and student-t distributed errors which looks like:

\epsilon \sim \text{MV Student-t}(\mu,\Sigma,\nu)

With \mu the mean, \nu degrees of freedom, and n \times n covariance

\Sigma = \begin{pmatrix} \sigma^2 & r & r & \dots & r \\ r &\sigma^2 & r & \dots & r \\ \vdots & & \ddots & & \vdots \\ r & r & r & \dots & \sigma^2 \end{pmatrix}

I am hoping to constrain the covariance matrix such that \sigma^2 and r are constant.

Ultimately, we want to use this model to predict out-of-sample under the assumption that there is additional uncertainty due to the fact that outcomes are correlated for reasons unseen by the analyst. Note, we are making the assumption of homoscedasticity and homogenous covariance, which is both reasonable in our specific modeling context and also makes the model computationally tractable.

My current implementation follows the advice here to help improve computational performance, but I still need to transform the matrix into an n\times n matrix which is unwieldy since it propagates many duplicate parameters and involves the creation of a large matrix.

An implementation on synthetic data is presented below:


# Parameters for Synethic Data
N       <- 400
rho     <- 0.6
sigma   <- 0.15
sigma_b <- 0.05
rhosig2 <- rho*sigma**2
D       <- 4
df_sim  <- 5
# Covariance structure.
cov_mat_true <- diag(N)*sigma+ (matrix(rhosig2,nrow=N,ncol=N)-diag(N)*rhosig2)

# create data
X       <- sapply(1:D,FUN=function(x,n) runif(n),n=N)
beta    <- mvtnorm::rmvnorm(n=1,sigma = diag(D)*sigma_b)
epsilon <- t(mvtnorm::rmvt(n=1,sigma = cov_mat_true))

y <- X%*%t(beta)+ epsilon

data <- data.frame(y,X)

# put data in stan form
stan_data <- within(list(), {
Y <- dplyr::select(data,y )[[1]]
X <- model.matrix(~ X1+X2+X3+X4, data = data)
N <- length(Y)
K <- ncol(X)
})
# compile model
mod <- rstan::stan_model("contempCorr.stan")

# estimate model
test_fit <- rstan::sampling(mod,
data    = stan_data,
iter    = 4000,
warmup  = 3000,
verbose = T,
chains  = 4,
include = FALSE,
pars    = "Sigma",


with â€ścontempCorr.stanâ€ť :


functions {
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K;  // number of  regression coefs
matrix[N,K] X;
}
transformed data {
}
parameters {
vector[K] b;  // coefs
real<lower=1> nu;  // degrees of freedom or shape
cholesky_factor_corr[2] Lcorr;// cholesky factor (L_u matrix for R)
real<lower=0> Sigma_scale;
}
transformed parameters {
corr_matrix[2] R; // correlation matrix
cov_matrix[2] Sigma_small; // VCV matrix
cov_matrix[N] Sigma; // VCV matrix
R     = multiply_lower_tri_self_transpose(Lcorr);
// covariance matrix
for (i in 1:N){
for (j in 1:N){
if(i==j){
Sigma[i,j] = Sigma_small[1,1];
}
if(i!=j){
Sigma[i,j] = Sigma_small[1,2];
}
}
}
}
model {
// likelihood including all constants

// initialize linear predictor term
vector[N] mu = X * b;
target += multi_student_t_lpdf(Y | nu, mu, Sigma);

// priors including all constants
target += normal_lpdf(b[1] | 0, 0.1);
target += normal_lpdf(b[2] | 0, 0.1);
target += normal_lpdf(b[3] | 0, 0.1);
target += normal_lpdf(b[4] | 0, 0.1);
target += exponential_lpdf(Sigma_scale | 1/0.05);
target += lkj_corr_cholesky_lpdf(Lcorr|2.0); // prior for cholesky factor
target += gamma_lpdf(nu | 3, 0.5)
- 1 * gamma_lccdf(1 | 3, 0.5);
}
generated quantities {

}


1 Like