Uncoupling with block-diagonal covariance matrix

I’m trying to sample from a model that has a block-diagonal covariance (in my particular case, the blocks are identical).
I wonder whether I can split the problem into 3 blocks, i.e. whether the two following versions are equivalent:

**full version**

transformed parameters {
  row_vector[Mm] lambdaCoeff;
  row_vector[3*Mn] xreal;
  cov_matrix[Mn] pcordsigma;      // covariance for a single coordinate
  cov_matrix[3*Mn] sigmacovar;    // covariance for the whole system
  lambdaCoeff = lambdaHat * diagSigma;
  xreal = mu + lambdaCoeff * Umodes;   // (1 x 3*Mn) + (1 x Mm)(Mm x 3*Mn) = (1 x 3*Mn)
  // exponential kernel: pcordsigma[m,n] = nu^2 * exp(-dij[m,n] / l)
  pcordsigma = rep_matrix(0.0, Mn, Mn);
  for (m in 1:Mn) {
    pcordsigma[m, m] = nu * nu;
  }
  for (m in 1:(Mn - 1)) {
    for (n in (m + 1):Mn) {
      real mdijsq = -dij[m, n];
      pcordsigma[m, n] = nu * nu * exp(mdijsq / l);
      pcordsigma[n, m] = pcordsigma[m, n];
    }
  }
  // block-diagonal: three identical diagonal blocks, zero cross-block covariance
  sigmacovar = rep_matrix(0.0, 3*Mn, 3*Mn);
  sigmacovar[1:Mn, 1:Mn] = pcordsigma;
  sigmacovar[(Mn+1):(2*Mn), (Mn+1):(2*Mn)] = pcordsigma;
  sigmacovar[(2*Mn+1):(3*Mn), (2*Mn+1):(3*Mn)] = pcordsigma;
}
model {
  lambdaHat ~ std_normal();   // N(0,1): less informative than the prior described in the paper, but it should have no influence on the final result
  l ~ uniform(lmin, lmax);
  nu ~ uniform(numin, numax);
  // likelihood
  y ~ multi_normal(xreal, sigmacovar);
}
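
(In both versions the parameters block is assumed to look roughly like the sketch below — the declarations are only inferred from the code above. Note that the bounds on l and nu have to match the uniform priors, otherwise Stan can initialize outside their support.)

parameters {
  row_vector[Mm] lambdaHat;            // mode coefficients, standard-normal prior
  real<lower=lmin, upper=lmax> l;      // kernel length scale
  real<lower=numin, upper=numax> nu;   // kernel marginal standard deviation
}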

**split version**

transformed parameters {
  row_vector[Mm] lambdaCoeff;
  row_vector[Mn] xreal1;
  row_vector[Mn] xreal2;
  row_vector[Mn] xreal3;
  cov_matrix[Mn] pcordsigma;      // covariance for a single coordinate
  lambdaCoeff = lambdaHat * diagSigma;
  xreal1 = mu1 + lambdaCoeff * Umodes1;   // (1 x Mn) + (1 x Mm)(Mm x Mn) = (1 x Mn)
  xreal2 = mu2 + lambdaCoeff * Umodes2;
  xreal3 = mu3 + lambdaCoeff * Umodes3;
  // exponential kernel: pcordsigma[m,n] = nu^2 * exp(-dij[m,n] / l)
  pcordsigma = rep_matrix(0.0, Mn, Mn);
  for (m in 1:Mn) {
    pcordsigma[m, m] = nu * nu;
  }
  for (m in 1:(Mn - 1)) {
    for (n in (m + 1):Mn) {
      real mdijsq = -dij[m, n];
      pcordsigma[m, n] = nu * nu * exp(mdijsq / l);
      pcordsigma[n, m] = pcordsigma[m, n];
    }
  }
}
model {
  lambdaHat ~ std_normal();   // N(0,1): less informative than the prior described in the paper, but it should have no influence on the final result
  l ~ uniform(lmin, lmax);
  nu ~ uniform(numin, numax);
  // likelihood
  y1 ~ multi_normal(xreal1, pcordsigma);
  y2 ~ multi_normal(xreal2, pcordsigma);
  y3 ~ multi_normal(xreal3, pcordsigma);
}

and, if equivalent, which one is more efficient.

Moreover, if in the future I want to use map-reduce (map_rect) to speed up execution, would it be possible to write a function that acts on a single block and then call it on each of the 3 blocks (i.e., with one shard per block)?

Thanks for the help,

Cesare

It looks right to me, and the second version is definitely more efficient. The matrix solves needed to evaluate the multivariate normal density scale as O(N^3), so three solves of one third the size cost 3 × (N/3)^3 = N^3/9, i.e. about 1/9th of the computation.
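
The equivalence itself is just the block-diagonal structure at work: with zero cross-block covariance, the joint log density factors over the blocks,

$$
\Sigma = \begin{pmatrix} \Sigma_0 & 0 & 0 \\ 0 & \Sigma_0 & 0 \\ 0 & 0 & \Sigma_0 \end{pmatrix}
\quad\Longrightarrow\quad
\log \mathcal{N}(y \mid \mu, \Sigma) = \sum_{k=1}^{3} \log \mathcal{N}(y_k \mid \mu_k, \Sigma_0),
$$

where $y_k$ and $\mu_k$ are the Mn-length blocks of $y$ and $\mu$, and $\Sigma_0$ is your pcordsigma.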

Since every block's covariance matrix is the same, you can save more time by computing the Cholesky decomposition of pcordsigma once and switching to multi_normal_cholesky, like:

model {
  matrix[Mn, Mn] L = cholesky_decompose(pcordsigma);
  lambdaHat ~ std_normal();
  l ~ uniform(lmin, lmax);
  nu ~ uniform(numin, numax);
  y1 ~ multi_normal_cholesky(xreal1, L);
  y2 ~ multi_normal_cholesky(xreal2, L);
  y3 ~ multi_normal_cholesky(xreal3, L);
} 

That should be a little faster yet.
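
As for the map_rect question: yes, one shard per block works. Below is a minimal sketch, not a drop-in implementation — the function name block_lp, the packing of the data arrays, and the newer array syntax are all my choices. Each shard gets its block of y plus the packed dij matrix as real data, its block mean as the shard-specific parameter vector, and nu and l as shared parameters, and rebuilds its own covariance:

functions {
  // Log likelihood of one block, evaluated on one shard.
  // phi   : shared parameters, phi = [nu, l]'
  // theta : shard-specific parameters, here the block mean (length Mn)
  // x_r   : real data for this shard: y_k (Mn values), then dij packed row-major (Mn*Mn values)
  // x_i   : integer data for this shard: x_i[1] = Mn
  vector block_lp(vector phi, vector theta, data array[] real x_r, data array[] int x_i) {
    int Mn = x_i[1];
    real nu = phi[1];
    real l = phi[2];
    vector[Mn] y_k = to_vector(x_r[1:Mn]);
    matrix[Mn, Mn] K;
    for (m in 1:Mn) {
      K[m, m] = nu * nu;
      for (n in (m + 1):Mn) {
        real d = x_r[Mn + (m - 1) * Mn + n];   // dij[m, n]
        K[m, n] = nu * nu * exp(-d / l);
        K[n, m] = K[m, n];
      }
    }
    return [multi_normal_cholesky_lpdf(y_k | theta, cholesky_decompose(K))]';
  }
}

You'd assemble x_r and x_i once in transformed data (one row per shard) and replace the likelihood with something like target += sum(map_rect(block_lp, [nu, l]', {xreal1', xreal2', xreal3'}, x_r, x_i));. Note each shard now repeats the O(Mn^3) Cholesky of the same matrix, so you trade the shared factorization above for parallelism.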

Is this some sort of Gaussian process?


Thanks for your reply.
It's for an image de-noising model whose covariance matrix is similar to that of a Gaussian process.
