Model with Kronecker cov matrix


#1

Hi All,

I am relatively new to Stan, and my problem concerns the model parameter estimates I get from applying Stan.

Study:
For each patient we observe two clinical endpoints over three visits; no grouping factors. We specify two covariance matrices: 1) between the endpoints Sigma (2x2), and 2) within endpoint, Tigma (3x3); in both cases we want them to be unstructured. Our patient covariance matrix VCOV (6x6) is Kronecker product of these two.

Below you can find:
a) Part of the simulated data: group1:group6 are indicator columns for each biomarker and for the biomarker-visit pairs, with visit 1 as baseline.
b) The code for the model, data, and sampling.

So far:
The model compiled. For beta0 and Sigma the chains mixed relatively well, with outcomes in line with (or acceptably close to) the simulation input. Unfortunately, for Tigma the chains did not mix well and the outcome levels were rather far from the simulation input. It did not converge even when the prior scale matrix T exactly matched the matrix used in the simulations. As you can see, I've used inverse-Wishart priors for the covariance matrices. Before this I tried fitting the model with proc mcmc in SAS and with JAGS, but without success, I guess due to the priors, as the Gibbs sampler failed to exploit conjugacy. Any ideas why Tigma fails? Is it the model specification itself, or is the problem inherent to the chosen priors? In SAS, proc mixed gave me estimates in line with the simulation inputs.

Any help or insight is much appreciated.

Robert

> head(DataSim, 10)
   SUBJID    CVORRSN VISIT CVTSTCD group1 group2 group3 group4 group5 group6
1       1 0.39027051 DAY 1   AK001      1      0      0      0      0      0
2       1 0.12891371 DAY 1    PASP      0      1      0      0      0      0
3       1 0.46329558 DAY 2   AK001      1      0      1      0      0      0
4       1 0.08407614 DAY 2    PASP      0      1      0      1      0      0
5       1 0.34342493 DAY 3   AK001      1      0      0      0      1      0
6       1 0.18086933 DAY 3    PASP      0      1      0      0      0      1
7       2 0.27029459 DAY 1   AK001      1      0      0      0      0      0
8       2 0.18741587 DAY 1    PASP      0      1      0      0      0      0
9       2 0.58242061 DAY 2   AK001      1      0      1      0      0      0
10      2 0.38295721 DAY 2    PASP      0      1      0      1      0      0

The model:

data{
int<lower=1> N;
int<lower=1> NSUBJID;
int<lower=0, upper=1> group1[N];
int<lower=0, upper=1> group2[N];
int<lower=0, upper=1> group3[N];
int<lower=0, upper=1> group4[N];
int<lower=0, upper=1> group5[N];
int<lower=0, upper=1> group6[N];
int<lower=0> df_Sigma;
int<lower=0> df_Tigma;
int<lower=0> df_VCOV;
real y[N];
real mu1;
real s1;
real mu2;
real s2;
int<lower=1> ID[N];
cov_matrix[2] S;
cov_matrix[3] T;
}

parameters{
vector[6] beta[NSUBJID];
vector[6] beta0;
cov_matrix[2] Sigma;
cov_matrix[3] Tigma;
cov_matrix[6] VCOV;
real<lower=0> sigmaErr;
}

model{

real mu[N];
matrix[6,6] Omega;

for(i in 1:N){
  mu[i] = beta[ID[i],1] * group1[i] + 
          beta[ID[i],2] * group2[i] + 
          beta[ID[i],3] * group3[i] + 
          beta[ID[i],4] * group4[i] + 
          beta[ID[i],5] * group5[i] + 
          beta[ID[i],6] * group6[i];
  y[i] ~ normal(mu[i], sigmaErr);
}

sigmaErr ~ uniform(0, 2);

for(j in 1:NSUBJID){
  beta[j, 1:6] ~ multi_normal(beta0[1:6], VCOV);
}
beta0[1] ~ normal(mu1, s1);
beta0[2] ~ normal(mu2, s2);
beta0[3] ~ normal(0, 0.1);
beta0[4] ~ normal(0, 0.1);
beta0[5] ~ normal(0, 0.1);
beta0[6] ~ normal(0, 0.1);

Sigma ~ inv_wishart(df_Sigma,S);
Tigma ~ inv_wishart(df_Tigma,T);

// Elementwise Kronecker product covariance matrix:
for (k in 1:2){   
for (l in 1:2){
for (m in 1:3){ 
for (n in 1:3){
Omega[(k-1)*3+m, (l-1)*3+n] = Sigma[k,l] * Tigma[m,n];
}
}
}
}

VCOV ~ inv_wishart(df_VCOV, Omega);
} 

Data and sampling (the control settings are my subjective choices):

DataSim_Stan <- list(N=nrow(DataSim),
                     ID = DataSim $ SUBJID,
                     NSUBJID=length(unique(DataSim $ SUBJID)),
                     group1=DataSim $ group1,
                     group2=DataSim $ group2,
                     group3=DataSim $ group3,
                     group4=DataSim $ group4,
                     group5=DataSim $ group5,
                     group6=DataSim $ group6,
                     y = DataSim $ CVORRSN,
                     S = diag(0.1, 2),
                     T=diag(0.1,3),
                     mu1=0.5, s1=0.1, mu2=0.5, s2=0.1,
                     df_Sigma=2+1, df_Tigma=3+1, df_VCOV=6+1
                    )

theta_Stan = sampling(object=stan.compiled, 
                      data=DataSim_Stan,
                      pars=c('beta0', 'Sigma', 'Tigma'),
                      warmup = 5e2, 
                      iter = 1e3, 
                      thin=1, 
                      chains = 1, 
                      init='random',
                      control=list(adapt_delta=0.9, 
                                   max_treedepth=15, 
                                   stepsize=1e-3))

#2

Based on what you described, I believe that VCOV should be declared and defined in a transformed parameters block rather than in the parameters block. In other words, it is not a separate unknown but is conditionally deterministic given Sigma and Tigma.

Except Stan does not have a Kronecker product function, so you would have to fill in its 36 elements manually. In this scenario, there would not be any prior on VCOV, just on Sigma and Tigma.
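For example, a generic helper could go in a functions block. A minimal sketch (kron here is just an illustrative name, not a built-in), generalizing the nested loops you already wrote:

functions {
  // Kronecker product of A (p x q) and B (r x s): a (p*r) x (q*s) matrix
  matrix kron(matrix A, matrix B) {
    matrix[rows(A) * rows(B), cols(A) * cols(B)] K;
    for (i in 1:rows(A))
      for (j in 1:cols(A))
        for (m in 1:rows(B))
          for (n in 1:cols(B))
            K[(i - 1) * rows(B) + m, (j - 1) * cols(B) + n] = A[i, j] * B[m, n];
    return K;
  }
}

With that, the transformed parameters block would only need VCOV = kron(Sigma, Tigma);.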

Also, inverse Wishart priors are bad. There is a lot in the Stan user manual about parameterizing a covariance matrix as the product of a diagonal matrix of standard deviations, a correlation matrix, and the diagonal matrix of standard deviations again. Then you can put an lkj_corr() prior on the correlation matrix (or, better yet, an lkj_corr_cholesky prior on the Cholesky factor of the correlation matrix) and whatever prior you want on the standard deviations.
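A sketch of that parameterization for the 2x2 piece (the 3x3 one is analogous; the prior on the standard deviations below is only a placeholder):

parameters {
  cholesky_factor_corr[2] L_Sigma;  // Cholesky factor of the 2x2 correlation matrix
  vector<lower=0>[2] tau_Sigma;     // standard deviations
}
transformed parameters {
  matrix[2, 2] Sigma;
  // Sigma = diag(tau) * R * diag(tau), with R rebuilt from its Cholesky factor
  Sigma = quad_form_diag(multiply_lower_tri_self_transpose(L_Sigma), tau_Sigma);
}
model {
  L_Sigma ~ lkj_corr_cholesky(2);
  tau_Sigma ~ normal(0, 1);         // half-normal, given the <lower=0> constraint
}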


#3

Hi Ben, I appreciate your prompt response!

Could you have a look at the model below and let me know what you think? For Sigma and Tigma I still do not get the sort of estimates I expect. Thanks a lot.

data{
int<lower=1> N;
int<lower=1> NSUBJID;
int<lower=0, upper=1> group1[N];
int<lower=0, upper=1> group2[N];
int<lower=0, upper=1> group3[N];
int<lower=0, upper=1> group4[N];
int<lower=0, upper=1> group5[N];
int<lower=0, upper=1> group6[N];
real y[N];
real mu1;
real s1;
real mu2;
real s2;
int<lower=1> ID[N];
}

parameters{
vector[6] beta[NSUBJID];
vector[6] beta0;
corr_matrix[2] Sigma0;        // Correlation matrix for Sigma
corr_matrix[3] Tigma0;        // Correlation matrix for Tigma
vector<lower=0>[2] tau_Sigma; // prior scale for Sigma
vector<lower=0>[3] tau_Tigma; // prior scale for Tigma
real<lower=0> sigmaErr;
}

transformed parameters{
matrix[2,2] Sigma;
matrix[3,3] Tigma;
cov_matrix[6] VCOV;

Sigma = quad_form_diag(Sigma0, tau_Sigma);
Tigma = quad_form_diag(Tigma0, tau_Tigma);

// Elementwise Kronecker product covariance matrix:
for (k in 1:2){   
  for (l in 1:2){
    for (m in 1:3){ 
      for (n in 1:3){
        VCOV[(k-1)*3+m, (l-1)*3+n] = Sigma[k,l] * Tigma[m,n];
      }
    }
  }
}
}

model{

real mu[N];

for(i in 1:N){
  mu[i] = beta[ID[i],1] * group1[i] + 
          beta[ID[i],2] * group2[i] + 
          beta[ID[i],3] * group3[i] + 
          beta[ID[i],4] * group4[i] + 
          beta[ID[i],5] * group5[i] + 
          beta[ID[i],6] * group6[i];
  y[i] ~ normal(mu[i], sigmaErr);
}

sigmaErr ~ uniform(0, 2);

for(j in 1:NSUBJID){
  beta[j, 1:6] ~ multi_normal(beta0[1:6], VCOV);
}
beta0[1] ~ normal(mu1, s1);
beta0[2] ~ normal(mu2, s2);
beta0[3] ~ normal(0, 0.1);
beta0[4] ~ normal(0, 0.1);
beta0[5] ~ normal(0, 0.1);
beta0[6] ~ normal(0, 0.1);

// Priors on covariance matrices:
tau_Sigma ~ cauchy(0, 2.5);
tau_Tigma ~ cauchy(0, 2.5);
Sigma0 ~ lkj_corr(2);
Tigma0 ~ lkj_corr(3);
}


#4

I suspect those half-Cauchy priors on the standard deviations are too heavy-tailed.

Also, you can take y[i] ~ normal(mu[i], sigmaErr); out of that loop and put this

y ~ normal(mu, sigmaErr);

right after the loop.
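In context, the relevant part of the model block would look something like this (a sketch; only the sampling statement moves):

real mu[N];

for(i in 1:N){
  mu[i] = beta[ID[i],1] * group1[i] + 
          beta[ID[i],2] * group2[i] + 
          beta[ID[i],3] * group3[i] + 
          beta[ID[i],4] * group4[i] + 
          beta[ID[i],5] * group5[i] + 
          beta[ID[i],6] * group6[i];
}
y ~ normal(mu, sigmaErr);  // vectorized: one statement instead of N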


#5

Thank you Ben,

In the meantime I have redeveloped my model. I guess there was a mistake in the previous version: VCOV was wrongly declared as the var-cov matrix of the parameter beta. The new model is for the changed data structure.

Below:

  1. Changed the data structure to wide format
  2. Changed model code.

My problem is that it still does not converge to the true parameters used to simulate the data (DataSim), in particular for Sigma and Tigma; for beta it does return the true values. I have noticed that the outcomes depend strongly on the priors on Sigma and Tigma.

Recap of my objective: Sigma represents the var-cov matrix between the biomarkers, and Tigma represents the within-biomarker var-cov matrix. I am interested in estimating both matrices, assuming both have unstructured forms.

I will appreciate your help!

Robert

  1. Data structure:
> head(DataSim[,1:7])
  SUBJID   AK001_1   AK001_2   AK001_3    PASP_1    PASP_2    PASP_3
1      1 0.6995278 0.9585548 1.0075110 0.6773591 0.8568844 0.6124973
2      2 0.5454959 0.7843840 0.6561844 0.2228429 0.6704886 0.6542814
3      3 0.7893287 0.7811709 0.4713912 0.5130966 0.8449378 0.3997927
4      4 0.8186514 0.7457569 0.5996530 0.4058711 0.4612626 0.4863368
5      5 0.7142902 0.7250247 0.6657069 0.7024159 0.7521864 0.7027360
6      6 0.6409087 0.6771914 0.6934551 0.4702058 0.5289813 0.4141213
.
.

where the subscript ‘_1’ means the biomarker value observed at visit 1, ‘_2’ at visit 2, and so on.

  2. Re-developed model:
data{
int<lower=1> NSUBJID;
vector[6] y[NSUBJID];
real mu1;
real s1;
real mu2;
real s2;
real<lower=0> Sigma_location;
real<lower=0> Tigma_location;
real<lower=0> Sigma_scale;
real<lower=0> Tigma_scale;
real<lower=0> v1;
real<lower=0> v2;
}

parameters{
vector[6] beta;
corr_matrix[2] Sigma0;        // Correlation matrix for Sigma
corr_matrix[3] Tigma0;        // Correlation matrix for Tigma
vector<lower=0>[2] tau_Sigma; // prior scale for Sigma
vector<lower=0>[3] tau_Tigma; // prior scale for Tigma
}

transformed parameters{
cov_matrix[2] Sigma;
cov_matrix[3] Tigma;
matrix[6,6] VCOV;
vector[6] Mu;

// Visit 1 as baseline
Mu[1] = beta[1];
Mu[2] = beta[1] + beta[3];
Mu[3] = beta[1] + beta[4];
Mu[4] = beta[2];
Mu[5] = beta[2] + beta[5];
Mu[6] = beta[2] + beta[6];

Sigma = quad_form_diag(Sigma0, tau_Sigma);
Tigma = quad_form_diag(Tigma0, tau_Tigma);

// Elementwise Kronecker product covariance matrix:
for (k in 1:2){   
  for (l in 1:2){
    for (m in 1:3){ 
      for (n in 1:3){
        VCOV[3*(k-1)+m, 3*(l-1)+n] = Sigma[k,l] * Tigma[m,n];
      }
    }
  }
}

}

model{

beta[1] ~ normal(mu1, s1);
beta[2] ~ normal(mu2, s2);
beta[3] ~ normal(0, 0.1);
beta[4] ~ normal(0, 0.1);
beta[5] ~ normal(0, 0.1);
beta[6] ~ normal(0, 0.1);

for(j in 1:NSUBJID){
  y[j] ~ multi_normal(Mu, VCOV);
}

// Priors on covariance matrices (along recommendations in Stan manual):

// Standard deviations:
tau_Sigma ~ cauchy(Sigma_location, Sigma_scale);
tau_Tigma ~ cauchy(Tigma_location, Tigma_scale);

// Correlation matrix:
Sigma0 ~ lkj_corr(v1);
Tigma0 ~ lkj_corr(v2);
}

Stan data and sampler:

DataSim_STAN <- list(y = as.matrix(DataSim[, 2:(6+1)]),
                     NSUBJID=length(unique(DataSim $ SUBJID)),
                     Sigma_location = 0.1,
                     Tigma_location = 0.5,
                     Sigma_scale = 0.1,
                     Tigma_scale = 0.1,
                     v1=2, 
                     v2=1,
                     mu1=0.5, s1=0.1, mu2=0.5, s2=0.1
                    )

theta_STAN = sampling(object=stan.compiled, 
                      data=DataSim_STAN,
                      pars=c('beta', 'Sigma', 'Tigma', 'Sigma0', 'Tigma0'),
                      warmup = 5e2, 
                      iter = 1e3, 
                      thin=1,
                      chains = 1, 
                      init='random',
                      algorithm='NUTS',
                      control=list(adapt_delta=0.95, max_treedepth = 30))

#6

I don’t see anything wrong with the code, but I would not be surprised if it is sensitive to the priors chosen.


#7

Thank you again for the prompt response. Is it an issue (if I can say so) with the dimensionality of the likelihood?
Robert


#8

I wouldn’t really say it is due to the dimensionality of anything. But each element of Sigma and Tigma affects the likelihood in a nonlinear way. And you have a pretty strong prior on the correlation matrices and a heavy-tailed prior on each of the standard deviations.
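If you want to see how much the priors are driving the results, one thing you could try (a sketch only; the particular values are placeholders, not recommendations) is lighter-tailed priors on the standard deviations and a flatter LKJ shape:

// in the model block, instead of the cauchy / lkj_corr choices above
tau_Sigma ~ normal(0, 0.5);   // half-normal, given the <lower=0> constraint
tau_Tigma ~ normal(0, 0.5);
Sigma0 ~ lkj_corr(1);         // shape 1 is uniform over correlation matrices
Tigma0 ~ lkj_corr(1);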