Issues with covariance estimation for matrix derived product

Hi Stan community,

I am fitting a model in stan where a parameter of interest needs to be derived based on products in matrix X and need to be assigned to matrix Y, so the covariance can be estimated with other parameters in matrix Y. In the transformed parameters block I specified the following (example):

 matrix[Nind,3] Y; // Estimates per individual derived from model equation for parameter 1 & 2
 matrix[Nind,2] X; // Estimates per individual derived from model equation

for (ind in 1:Nind)
 Y[ind,3] = (constant^2 * X[ind,1]) + X[ind,2]; // estimate parameter Y3

This seems to be working fine in order to derive the expected values for Y3 based on the expected estimates from the equation.

However, the covariances/correlations that are estimated in the generated quantities block deviate a lot from the values set in my simulations. I couldn’t find any other posts similar to this issue, so is there something I’m doing wrong with the model specifications?

All the best,
Corné

Can you maybe share the rest of the code and simulation? There’s not much to go on here!

What you did provide can be the following one-liner:

Y[ , 3] = constant^2 * X[ , 1] + X[ , 2];

This only defines the third column of the matrix Y.

Hi @Bob_Carpenter,

I should have done so in the first place, but thought the error originated in the small piece of code in the previous post, but here is the full model code pasted below. The matrix product loop is in the transformed parameters section. The 2 correlations that are miss-estimated compared to the simulated data are cor_P2 and cor_P3, which are based on i[,3], which is the matrix product derived by the for loop. I have attached the simulated data below, let me know if you have any further questions.
Sim_PsiPhiW_CdG.csv (2.3 MB)

data {
  int<lower=1> No;  // total number of observations
  int<lower=1> individual[No]; // individual identity for each observation
  int<lower=1> opponent1[No]; // individual identity for each observation
  int<lower=1> opponent2[No]; // individual identity for each observation
  int<lower=1> Na; // number of individuals
  real xi[No]; // focal x
  real xjk[No]; // social partners x
  real z[No];  // phenotypic observations
}
parameters {
  real mu_x; // intercept x
  real mu; // overall intercept
  real psi; // psi slope for z1
  // i-matrix components
  vector<lower=0>[3] sd_I; // permanent individual standard deviation
  matrix[Na,3] iz; // standardised permanent individual effects
  cholesky_factor_corr[3] LI; // Permanent environment effect
  // xe-matrix components
  vector<lower=0>[2] sd_xe; // individual standard deviation
  matrix[Na,2] ixe; // standardised  individual effects
  cholesky_factor_corr[2] Lxe; // Permanent environment effect
  // Sigmas likelihood function
  real<lower=0> sigma_ez; // residual sigma
  real<lower=0> sigma_ex; // residual sigma x
}
transformed parameters {
  // Make I-matrix 
  matrix[Na,3] i; // permanent individual effects
  i = iz * diag_pre_multiply(sd_I, LI)';
  // Make xe-matrix 
  matrix[Na,2] xe; // permanent individual effects
  xe = ixe * diag_pre_multiply(sd_xe, Lxe)';
  // Expected trait values for each observation
  vector[No] e_x; // predicted values x 
  vector[No] z_exp; // expected phenotypic values
  // Expected trait values for each observation
  for (o in 1:No) 
     e_x[o]  = mu_x + xe[individual[o],1];   // errors in variable model for x
  // Partition the phenotypes into I_DIE & I_IIE
  for (o in 1:No)
    z_exp[o] = mu + i[individual[o],1] + (psi + i[individual[o],2])*xjk[o] + xe[opponent1[o],2] + xe[opponent2[o],2];
  // Derive phi/social impact per individual for z1
  for (ind in 1:Na)
 i[ind,3] = (psi^2 * xe[ind,1]) + xe[ind,2]; 
}
model {
  // Priors
  mu_x ~ normal(0, 5); // Weakly informative prior on the intercept of x
  mu ~ normal(0, 5); // Weakly informative prior on the intercept
  psi ~ normal(0, 5); // Weakly informative prior on the slope
  to_vector(sd_I) ~ exponential(3); // Ditto, and note that these distributions are truncated to be positive (given the declaration of sd)
  to_vector(sd_xe) ~ exponential(3); 
  // Random effect definition (i.e. lower levels of the hierarchy of effects)
  to_vector(iz) ~ normal(0,1); // implies i ~ normal(0,sd_I)
  to_vector(ixe) ~ normal(0,1); 
  LI ~ lkj_corr_cholesky(3); //Prior PE correlation
  Lxe ~ lkj_corr_cholesky(3); //Prior PE correlation
  // Cholesky multinormal likelihood function to estimate residual correlations
    for (o in 1:No)
  z[o] ~ normal(z_exp[o], sigma_ez);
  //Errors in variable likelihood
  for (o in 1:No)
    xi[o] ~ normal(e_x[o], sigma_ex);
}
generated quantities {
  //Variances 
  real var_DIE = sd_I[1]^2;
  real var_psi = sd_I[2]^2;
  real var_epsilon = sd_xe[2]^2;
  real var_phi = sd_I[3]^2;
  real var_x = sd_xe[1]^2;
  // Residual variation
  real var_res = sigma_ez^2; 
  // Total variation 
  real<lower=0> var_P = var_DIE + var_psi + var_epsilon + var_x + var_res;
  // Repeatabilities
  real<lower=0> rep_DIE = var_DIE/var_P; // rep DIE
  real<lower=0> rep_psi = var_psi/var_P; // rep psi
  real<lower=0> rep_epsilon = var_epsilon/var_P; // rep epsilon
  real<lower=0> rep_phi = var_phi/var_P; // rep phi
  // Correlations
  matrix[3, 3]  Omega_P;
  Omega_P = LI * LI';
  real cor_P1 = Omega_P[1,2];
  real cor_P2 = Omega_P[1,3];
  real cor_P3 = Omega_P[2,3];
  matrix[2, 2]  Omega_xe;
  Omega_xe = Lxe * Lxe';
  real cor_xe = Omega_xe[1,2];
}

Now that I’ve reread the original question, are you checking the coverage of posterior intervals? It takes a lot of data to get tight posteriors on covariance matrices in dimensions higher than a handful.

Can you simulate data according to the model and recover the right correlations?

P.S. You can declare and define on one line:

matrix[2, 2]  Omega_xe = Lxe * Lxe';

This can be sped up:

  for (o in 1:No)
  z[o] ~ normal(z_exp[o], sigma_ez);

to

z ~ normal(z_exp, sigma_ez);

Thanks @Bob_Carpenter for the reply!

The correlation coefficients that I set in the simulations are all 0.5, however the model fit yields correlation coefficients close to zero with a 95% CI that ranges from ~-0.6 to ~0.6. The variance attributed to the i[,3] also doesn’t match with the simulated values. I recalculated the variance in the model according to the rules of variance which matches the simulated variance. The simulation has ~6000 observations for 245 individuals (the unit where the correlations are estimated), and for the other correlations the 95% CI deviates about 0.1 to 0.2 from the median correlation coefficient. I have never simulated data according to the model to recover the right correlations, what do you exactly mean with that?

Interestingly, when I extract the observations in the i matrix from the posterior it looks like i[,3] is estimated correctly. Furthermore, the correlations of i[,1]-i[,3] and i[,2]-i[,3] are around 0.5 based on a simple correlation of the posterior medians . So perhaps something is going wrong because i[,3] is assigned at a later stage? Would you know of a way to resolve this?

P.S. thanks for the other tips! Is there a way to also vectorise the model equations? I tried this before, but failed because I assign individual values to the matrices which need to be indexed for the individual identifier (i.e. individual or opponent id).