Sampling covariance matrix with additional constrain

Hi all,

I’m trying to use STAN to implement an information aggregation solution from this paper [https://pdfs.semanticscholar.org/fdb7/1aa9104995acf5936bff79351a36f58e8b06.pdf]

Basically, it assumes that (real-valued un-constrained) forecasts and actuals would follow a multivariate normal distribution

Let

  • F = [f1, f2, …, fn]’ be vector of forecasts

  • a be actual (scalar)

[F,a]’ ~ multi_normal(0, Sigma)

where for all f:

  • cov(f,f) = cov(f,a) = (rho(f,a) * sigma(a))^2

  • rho(f,a) > 0

If my linear algebra is not wrong, this would imply the following additional constrain:

  • r’ * inv(Omega) * r = 1

where

  • Omega is the correlation matrix of all forecasts (F)
  • r = [rho(f1,a), rho(f2,a),…,rho(fn,a)]’ is the vector of all correlations between forecasts and actual

Is it possible to implement this in Stan?

Many thanks in advance!

Andy

\boldsymbol{\Omega}^{-1} has a Cholesky factor, \mathbf{L} such that \mathbf{L}\mathbf{L}^\top = \boldsymbol{\Omega}^{-1}. Thus, you can write the constraint as \mathbf{r} \mathbf{L}\mathbf{L}^\top \mathbf{r}^\top = 1, where \mathbf{r} is (now) a row vector of correlations between forcasts and actual values. If we let \mathbf{z} = \mathbf{L}^\top \mathbf{r}^\top, then this becomes \mathbf{z}^\top \mathbf{z} = 1, so \mathbf{z} is a unit vector.

All this suggests a reparameterization where \mathbf{z} and \mathbf{L} are declared in the parameters block while \mathbf{\Omega}^{-1} and \mathbf{r}^\top = \mathbf{L}^{-\top} \mathbf{z} are found in the transformed parameters block. But you would have to figure out the details of how that would work in your model.

Hi Ben,

Many thanks for the suggestion. My apologies that It wasn’t clear from my original post, but, in addition, the target model also require:

  • r > 0 for each r

Is there any reparameterization that would support this?

Thanks,
Andy

Here might be a better approach

parameters {
  simplex[?] r_raw;
  ...
}
transformed parameters {
  vector[?] r  = r_raw / quad_form(r_raw, inverse_spd(Omega)));
}

Many thanks for the suggestion, it works!

I tested my implementation on fake data generated with Stan from this cov matrix.

     [,1] [,2] [,3]
[1,]  0.5 0.30 0.50
[2,]  0.3 0.25 0.25
[3,]  0.5 0.25 1.00

But, I want to ask if the n_eff is too low? If so, how can I improve it?

Inference for Stan model: t806h.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                     mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
m                    0.45    0.00 0.03  0.39  0.43  0.45  0.47  0.50  1803    1
L_Omega[1,1]         1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_Omega[1,2]         0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_Omega[2,1]         0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
L_Omega[2,2]         0.57    0.00 0.04  0.50  0.55  0.57  0.60  0.64  2042    1
r_raw[1]             0.60    0.00 0.01  0.57  0.59  0.60  0.60  0.62  1675    1
r_raw[2]             0.40    0.00 0.01  0.38  0.40  0.40  0.41  0.43  1675    1
Omega[1,1]           1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
Omega[1,2]           0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
Omega[2,1]           0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
Omega[2,2]           1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
rho[1]               0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
rho[2]               0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
rho[3]               1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[1,1]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[1,2]    0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
f_info_Omega[1,3]    0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_Omega[2,1]    0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
f_info_Omega[2,2]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[2,3]    0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
f_info_Omega[3,1]    0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_Omega[3,2]    0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
f_info_Omega[3,3]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Sigma[1,1]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[1,2]    0.28    0.00 0.03  0.22  0.26  0.28  0.30  0.34  2467    1
f_info_Sigma[1,3]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[2,1]    0.28    0.00 0.03  0.22  0.26  0.28  0.30  0.34  2467    1
f_info_Sigma[2,2]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[2,3]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[3,1]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[3,2]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[3,3]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_L_Sigma[1,1]  0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_L_Sigma[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[2,1]  0.39    0.00 0.03  0.33  0.37  0.39  0.41  0.46  2159    1
f_info_L_Sigma[2,2]  0.27    0.00 0.02  0.24  0.26  0.27  0.28  0.31  4045    1
f_info_L_Sigma[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[3,1]  0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_L_Sigma[3,2] -0.18    0.00 0.04 -0.26 -0.21 -0.18 -0.15 -0.09  1903    1
f_info_L_Sigma[3,3]  0.68    0.00 0.04  0.61  0.66  0.68  0.71  0.75  2249    1
lp__                51.99    0.03 1.24 48.92 51.40 52.29 52.92 53.43  1510    1

here is my Stan code:

data {
  int N_p;
  int N_f;

  vector[N_f+1] y[N_p];
}

transformed data {
  int N_f_1;
  vector[N_f+1] zero;
  
  N_f_1 = N_f+1;
  zero  = rep_vector(0.0, N_f_1);
}
parameters {
  real<lower=0, upper=1>    m;
  cholesky_factor_corr[N_f] L_Omega;
  simplex[N_f] r_raw;
}

transformed parameters {
  matrix[N_f,N_f] Omega;
  vector[N_f_1] rho;
  matrix[N_f_1,N_f_1] f_info_Omega;
  matrix[N_f_1,N_f_1] f_info_Sigma;
  matrix[N_f_1,N_f_1] f_info_L_Sigma;

  Omega = multiply_lower_tri_self_transpose(L_Omega);
  f_info_Omega[1:N_f,1:N_f] = Omega;

  rho[1:N_f] = r_raw / quad_form(inverse_spd(Omega), r_raw) * m;
  rho[N_f_1] = 1.0;

  f_info_Omega[     ,N_f_1] = rho;
  f_info_Omega[N_f_1,1:N_f] = rho[1:N_f]';//'

  f_info_Sigma = quad_form_diag(f_info_Omega,rho);

  f_info_L_Sigma = cholesky_decompose(f_info_Sigma);
}

model {
  m ~ beta(2,2);
  y ~ multi_normal_cholesky(zero, f_info_L_Sigma);
}

looks fine