Sampling covariance matrix with additional constrain


#1

Hi all,

I’m trying to use STAN to implement an information aggregation solution from this paper [https://pdfs.semanticscholar.org/fdb7/1aa9104995acf5936bff79351a36f58e8b06.pdf]

Basically, it assumes that (real-valued un-constrained) forecasts and actuals would follow a multivariate normal distribution

Let

  • F = [f1, f2, …, fn]’ be vector of forecasts

  • a be actual (scalar)

[F,a]’ ~ multi_normal(0, Sigma)

where for all f:

  • cov(f,f) = cov(f,a) = (rho(f,a) * sigma(a))^2

  • rho(f,a) > 0

If my linear algebra is not wrong, this would imply the following additional constrain:

  • r’ * inv(Omega) * r = 1

where

  • Omega is the correlation matrix of all forecasts (F)
  • r = [rho(f1,a), rho(f2,a),…,rho(fn,a)]’ is the vector of all correlations between forecasts and actual

Is it possible to implement this in Stan?

Many thanks in advance!

Andy


#2

\boldsymbol{\Omega}^{-1} has a Cholesky factor, \mathbf{L} such that \mathbf{L}\mathbf{L}^\top = \boldsymbol{\Omega}^{-1}. Thus, you can write the constraint as \mathbf{r} \mathbf{L}\mathbf{L}^\top \mathbf{r}^\top = 1, where \mathbf{r} is (now) a row vector of correlations between forcasts and actual values. If we let \mathbf{z} = \mathbf{L}^\top \mathbf{r}^\top, then this becomes \mathbf{z}^\top \mathbf{z} = 1, so \mathbf{z} is a unit vector.

All this suggests a reparameterization where \mathbf{z} and \mathbf{L} are declared in the parameters block while \mathbf{\Omega}^{-1} and \mathbf{r}^\top = \mathbf{L}^{-\top} \mathbf{z} are found in the transformed parameters block. But you would have to figure out the details of how that would work in your model.


#3

Hi Ben,

Many thanks for the suggestion. My apologies that It wasn’t clear from my original post, but, in addition, the target model also require:

  • r > 0 for each r

Is there any reparameterization that would support this?

Thanks,
Andy


#4

Here might be a better approach

parameters {
  simplex[?] r_raw;
  ...
}
transformed parameters {
  vector[?] r  = r_raw / quad_form(r_raw, inverse_spd(Omega)));
}

#5

Many thanks for the suggestion, it works!

I tested my implementation on fake data generated with Stan from this cov matrix.

     [,1] [,2] [,3]
[1,]  0.5 0.30 0.50
[2,]  0.3 0.25 0.25
[3,]  0.5 0.25 1.00

But, I want to ask if the n_eff is too low? If so, how can I improve it?

Inference for Stan model: t806h.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                     mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
m                    0.45    0.00 0.03  0.39  0.43  0.45  0.47  0.50  1803    1
L_Omega[1,1]         1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_Omega[1,2]         0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_Omega[2,1]         0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
L_Omega[2,2]         0.57    0.00 0.04  0.50  0.55  0.57  0.60  0.64  2042    1
r_raw[1]             0.60    0.00 0.01  0.57  0.59  0.60  0.60  0.62  1675    1
r_raw[2]             0.40    0.00 0.01  0.38  0.40  0.40  0.41  0.43  1675    1
Omega[1,1]           1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
Omega[1,2]           0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
Omega[2,1]           0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
Omega[2,2]           1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
rho[1]               0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
rho[2]               0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
rho[3]               1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[1,1]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[1,2]    0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
f_info_Omega[1,3]    0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_Omega[2,1]    0.82    0.00 0.03  0.77  0.80  0.82  0.84  0.86  2040    1
f_info_Omega[2,2]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Omega[2,3]    0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
f_info_Omega[3,1]    0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_Omega[3,2]    0.48    0.00 0.03  0.42  0.46  0.48  0.50  0.54  2591    1
f_info_Omega[3,3]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_Sigma[1,1]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[1,2]    0.28    0.00 0.03  0.22  0.26  0.28  0.30  0.34  2467    1
f_info_Sigma[1,3]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[2,1]    0.28    0.00 0.03  0.22  0.26  0.28  0.30  0.34  2467    1
f_info_Sigma[2,2]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[2,3]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[3,1]    0.50    0.00 0.04  0.42  0.47  0.50  0.53  0.58  2722    1
f_info_Sigma[3,2]    0.23    0.00 0.03  0.18  0.21  0.23  0.25  0.29  2570    1
f_info_Sigma[3,3]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
f_info_L_Sigma[1,1]  0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_L_Sigma[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[1,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[2,1]  0.39    0.00 0.03  0.33  0.37  0.39  0.41  0.46  2159    1
f_info_L_Sigma[2,2]  0.27    0.00 0.02  0.24  0.26  0.27  0.28  0.31  4045    1
f_info_L_Sigma[2,3]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
f_info_L_Sigma[3,1]  0.71    0.00 0.03  0.65  0.69  0.71  0.73  0.76  2678    1
f_info_L_Sigma[3,2] -0.18    0.00 0.04 -0.26 -0.21 -0.18 -0.15 -0.09  1903    1
f_info_L_Sigma[3,3]  0.68    0.00 0.04  0.61  0.66  0.68  0.71  0.75  2249    1
lp__                51.99    0.03 1.24 48.92 51.40 52.29 52.92 53.43  1510    1

here is my Stan code:

data {
  int N_p;
  int N_f;

  vector[N_f+1] y[N_p];
}

transformed data {
  int N_f_1;
  vector[N_f+1] zero;
  
  N_f_1 = N_f+1;
  zero  = rep_vector(0.0, N_f_1);
}
parameters {
  real<lower=0, upper=1>    m;
  cholesky_factor_corr[N_f] L_Omega;
  simplex[N_f] r_raw;
}

transformed parameters {
  matrix[N_f,N_f] Omega;
  vector[N_f_1] rho;
  matrix[N_f_1,N_f_1] f_info_Omega;
  matrix[N_f_1,N_f_1] f_info_Sigma;
  matrix[N_f_1,N_f_1] f_info_L_Sigma;

  Omega = multiply_lower_tri_self_transpose(L_Omega);
  f_info_Omega[1:N_f,1:N_f] = Omega;

  rho[1:N_f] = r_raw / quad_form(inverse_spd(Omega), r_raw) * m;
  rho[N_f_1] = 1.0;

  f_info_Omega[     ,N_f_1] = rho;
  f_info_Omega[N_f_1,1:N_f] = rho[1:N_f]';//'

  f_info_Sigma = quad_form_diag(f_info_Omega,rho);

  f_info_L_Sigma = cholesky_decompose(f_info_Sigma);
}

model {
  m ~ beta(2,2);
  y ~ multi_normal_cholesky(zero, f_info_L_Sigma);
}

#6

looks fine