# How to increase speed (reduce calculation time) for covariance matrices?

[edit: escaped code]

Dear Stan-community,

I have a model with a covariance matrix as a function of another variable. The calculation time is high and I would like to reduce it.

The model consists of a univariate variable Z (genralized normally distributed).
Z~GN(mean, alpba, beta),
and W is multivariate normally distributed
W~N(mean, COV)
with mean and covariance COV, where COV is a function of Z:
COV[W_i, W,j] = rho_ij * D_i( Z) * D_j( Z)
D_i(Z) is the standard dev and is a linear function of Z.

In the case, when rho_ij is the diagonal matrix (rho_ii=1, rest 0, i.e. un-corellated W), then the run time is 5.5h. If rho is a correlation matrix with rho_ij in [-1,1], then the calculation time is 62h.

Some remarks:
Number of records is 15’000,
pystan uses one core only,
Number of chain is set to chain=1 (increasing the number of chains did not reduce time)

My main question is how can the calculation time be reduced.

# MODEL

``````functions{

real func_linear(real z, real zmin, real zmax, real rho_zmin,  real rho_zmax ){
//  linear interpolation between zmin and zmax
real y;
y =  rho_zmin + (z - zmin) * (rho_zmax - rho_zmin) / (zmax - zmin)   ;
return y;
}

real gen_normal_lpdf( vector x, real mu_, real alpha_, real beta_ ) {
vector[rows(x)] lp;
int N;

N = rows(x);
for(i in 1:N){
// pdf according to wikipedia
// https://en.wikipedia.org/wiki/Generalized_normal_distribution
lp[ i ] = log(beta_) - (log(2*alpha_)+lgamma(1/beta_)) - pow( fabs(x[i]-mu_)/alpha_, beta_ );
}
// returns loglikelihood to maximise it (not neg. likelihood!)
return sum( lp );
}
}

data {
int < lower = 1 > data_n;   // number of values
int < lower = 1 > axes_n;   //

matrix[data_n, axes_n] W;   //  data of ratios values
vector[data_n] Z;           //  data of total

real <  upper = 1 > prior_W_mean;          // mean priordistribution
real <  upper = 2 > prior_W_std;           // std dev prior
real <  upper = 1 > prior_W_cov;           // CoV for priors

real < lower = 0 > Z_0;        // start value for Z (Z_min * 0.95), used e.g. for D[A_i(Z)]
real < lower = 0 > Z_1;        // end value for Z (Z_max * 1.05)
real < lower = 0 > Z_beta;     // beta parameter for the Z variable as input
real < lower = 0, upper = 100 > prior_Z_mean;          // prior for mean
real < lower = 0, upper = 10 > prior_Z_std;           // prior for std
}

transformed data {
vector < upper = 35> [data_n] W1;
vector < upper = 35> [data_n] W2;
vector < upper = 35> [data_n] W3;
vector < upper = 35> [data_n] W4;

W1 = W[:, 1];
W2 = W[:, 2];
W3 = W[:, 3];
W4 = W[:, 4];
}

parameters {
real < lower = 0, upper = axes_n * 35  >  Z_mean;       //  mean of total

real <  upper = 1 > W1_mean;               // mean
real <  upper = 1 > W2_mean;               //
real <  upper = 1 > W3_mean;               //
real <  upper = 1 > W4_mean;               //

real < lower = 0, upper = 1 > W1_std_c;   //    Wi_std for Z0
real < lower = 0, upper = 1 > W2_std_c;   //

real < lower = 0, upper = 1 > W1_std_d;   //    Wi_std for Z1
real < lower = 0, upper = 1 > W2_std_d;   //

real < lower = -1, upper = 1 > rho_12;               //    rho
real < lower = -1, upper = 1 > rho_13;               //
real < lower = -1, upper = 1 > rho_14;               //
real < lower = -1, upper = 1 > rho_23;               //
real < lower = -1, upper = 1 > rho_24;               //
real < lower = -1, upper = 1 > rho_34;               //

real < lower = 0, upper = axes_n * 20 >     Z_alpha;  //    parameter alpha for Z
}

transformed parameters{
matrix < lower = -1000, upper = 1000 > [axes_n, axes_n] COVMAT [data_n];    //  covariance matrix
cholesky_factor_cov [axes_n] L_COVMAT [data_n] ;                            //  choleski of COVMAT

vector < upper = 1 > [axes_n]  W_mean;          //  vector of mean values

vector < lower = 0, upper = 2 > [data_n] W1_std;   //  std dev
vector < lower = 0, upper = 2 > [data_n] W2_std;   //
vector < lower = 0, upper = 2 > [data_n] W3_std;   //
vector < lower = 0, upper = 2 > [data_n] W4_std;   //

W_mean= W1_mean;     //  building the vector with mean values
W_mean= W2_mean;
W_mean= W3_mean;
W_mean= W4_mean;

for (i in 1:data_n){

//  calculate STD as a function of Z
W1_std[i] = func_linear( Z[i], Z_0, Z_1, W1_std_c, W1_std_d) ;
W2_std[i] = func_linear( Z[i], Z_0, Z_1, W2_std_c, W2_std_d) ;
W3_std[i] = W2_std[i] ;  // use W2 values
W4_std[i] = W1_std[i] ;  // use W1 values

COVMAT[i,1,1] = 1          * W1_std[i] * W1_std[i] + 1e-10;
COVMAT[i,1,2] = rho_12 * W1_std[i] * W2_std[i];
COVMAT[i,1,3] = rho_13 * W1_std[i] * W3_std[i];
COVMAT[i,1,4] = rho_14 * W1_std[i] * W4_std[i];
COVMAT[i,2,1] = COVMAT[i,1,2];
COVMAT[i,2,2] = 1          * W2_std[i] * W2_std[i] + 1e-10;
COVMAT[i,2,3] = rho_23 * W2_std[i] * W3_std[i];
COVMAT[i,2,4] = rho_24 * W2_std[i] * W4_std[i];
COVMAT[i,3,1] = COVMAT[i,1,3];
COVMAT[i,3,2] = COVMAT[i,2,3];
COVMAT[i,3,3] = 1          * W3_std[i] * W3_std[i] + 1e-10;
COVMAT[i,3,4] = rho_34 * W3_std[i] * W4_std[i];
COVMAT[i,4,1] = COVMAT[i,1,4];
COVMAT[i,4,2] = COVMAT[i,2,4];
COVMAT[i,4,3] = COVMAT[i,3,4];
COVMAT[i,4,4] = 1          * W4_std[i] * W4_std[i] + 1e-10;

L_COVMAT[i,:,:] = cholesky_decompose( COVMAT[i,:,:] );

}   // end loop over data
}

model {
//  priors

target += normal_lpdf( W1_std_c | prior_W_std, 1 );
target += normal_lpdf( W2_std_c | prior_W_std, 1 );
target += normal_lpdf( W1_std_d | prior_W_std, 1 );
target += normal_lpdf( W2_std_d | prior_W_std, 1 );

//  priors Wi_std
for (i in 1: data_n) {
target += normal_lpdf( W1_std[i] | prior_W_std, 1 );
target += normal_lpdf( W2_std[i] | prior_W_std, 1 );
target += normal_lpdf( W3_std[i] | prior_W_std, 1 );
target += normal_lpdf( W4_std[i] | prior_W_std, 1 );
}

//  priors Wi_mean
target += normal_lpdf( W1_mean | prior_W_mean, fabs( prior_W_mean ) * prior_W_cov );
target += normal_lpdf( W2_mean | prior_W_mean, fabs( prior_W_mean ) * prior_W_cov );
target += normal_lpdf( W3_mean | prior_W_mean, fabs( prior_W_mean ) * prior_W_cov );
target += normal_lpdf( W4_mean | prior_W_mean, fabs( prior_W_mean ) * prior_W_cov );

//  priors Z
target += normal_lpdf( Z_mean | prior_Z_mean, prior_W_cov * prior_Z_mean );
target += normal_lpdf( Z_alpha | prior_Z_std * tgamma(1/Z_beta) / tgamma(3/Z_beta),  prior_W_cov*sqrt( axes_n ) * prior_Z_std * tgamma(1/Z_beta) / tgamma(3/Z_beta) );

//  Likelihood
target += gen_normal_lpdf( Z | Z_mean, Z_alpha, Z_beta );

for (i in 1: data_n) {
W[i,:] ~ multi_normal_cholesky_lpdf(W_mean, L_COVMAT[i,:,:] );
}
}
``````

Sorry it’s taken so long for someone to get back to you. These hard models are hard to respond to.

With a recent enough version of Stan, you can rewrite your `gen_normal_lpdf to be faster as

``````return sum(log(beta_) - log(2 * alpha) - lgamma(inv(beta)) - pow(fabs(x - mu_) / alpha_, beta_));
``````

It avoids things like recalculating `log(beta_)` in each loop instance and uses more efficient linear algebra operations.

If this is a standard deviation, it needs to be declared with `lower=0` as well. We generally don’t recommend hard interval constraints as they usually don’t do what people think they’re going to do. The boundaries will bias the posterior means compared to unconstrained variables by pushing away probability mass.

Rather than explicitly parameterizing a covariance matrix, we strongly suggest using a Cholesky factor of a correlation matrix plus a scale vector, which can be given independent priors. The problem with explicit parameterization here is that there’s nothing to keep things positive definite other than rejection.

This is just me, but I’d also suggest not naming parameters things like `mean` as those are properties of random variables, not conventional names (e.g., in `normal(mu, sigma)`, the parameter mu is called a “location”).

2 Likes

Dear Bob, thank you these many hints. I ended up using a different model. But still these are very helpful comments.
Oliver