Aim
I am attempting to decompose a 2D air-quality matrix (1500 time points × 80 compounds) into a small number of factors (4-5) with some autocorrelation restrictions on the time series. We call this model BAMF (Bayesian Autocorrelated Matrix Factorisation; Rusanen et al., 2023), and we refer to the time-series matrix as G and to the compound-fingerprints matrix as F (see Figure 1). We want to apply the regularized horseshoe to the compound fingerprints so that an individual compound's contribution can be shrunk to zero for certain factors while staying large for the others.
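In matrix form, the factorisation we are after is
X_{n \times m} \approx G_{n \times p} \cdot F_{p \times m},
with n = 1500 time points, m = 80 compounds, and p = 4-5 factors.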
I would appreciate advice on what might be wrong with our horseshoe implementation, possible solutions for these issues, and, if needed, ideas for tests to diagnose the problem. The model and input data are attached.
Method
We want to apply the regularized horseshoe prior, as described in Piironen and Vehtari (2017), to our matrix F so that compounds can have zero contributions to certain factors. For this purpose, we set the following prior on F:
F_{kj} \sim Normal(0,\sigma_{HS}\cdot \tau \cdot \tilde{\lambda}_{kj})
where the indices k and j run over factors and compounds, respectively, \sigma_{HS} is a parameter constrained to be positive, and \tau and \tilde{\lambda} are defined as:
\tau \sim Cauchy^+ (0, \sigma_{HS} \cdot scale_{global}),
\tilde{\lambda} = \sqrt{\frac{c^2 \cdot \lambda ^2}{c^2 + \tau^2 \cdot \lambda^2}}
where
\lambda \sim Cauchy^+(0,1),
scale_{global} = \frac{p_0}{(p-p_0)\cdot \sqrt{m}},
c = slab_{scale} \cdot \sqrt{c_{aux}},
c_{aux} \sim \Gamma^{-1}(0.5\cdot slab_{df},\, 0.5\cdot slab_{df}),
and slab_{scale} and slab_{df} are 2.5 and 4, respectively, as indicated in the article; m is the number of compounds, p is the number of factors, and p_0 is the desired number of non-zero components per factor.
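Note the limiting behaviour of \tilde{\lambda}: when \tau^2 \lambda^2 \ll c^2, \tilde{\lambda} \approx \lambda and the prior acts like the original horseshoe, whereas when \tau^2 \lambda^2 \gg c^2, \tilde{\lambda} \approx c / \tau and the prior approaches the slab Normal(0, \sigma_{HS} \cdot c), which is what keeps the non-zero contributions finite.
For reference, Piironen and Vehtari (2017) implement this prior with a non-centered parameterisation. Below is a minimal sketch of that formulation for a generic coefficient vector beta; the variable names are illustrative, and sigma_hs is passed as data here for simplicity (in our model it is a parameter):
data {
  int<lower=1> m;                 // number of coefficients
  real<lower=0> scale_global;     // prior scale for tau
  real<lower=0> slab_scale;       // slab scale (2.5 in our case)
  real<lower=0> slab_df;          // slab degrees of freedom (4 in our case)
  real<lower=0> sigma_hs;         // noise scale, fixed here for simplicity
}
parameters {
  vector[m] z;                    // auxiliary standard-normal variables
  real<lower=0> tau;              // global shrinkage
  vector<lower=0>[m] lambda;      // local shrinkage
  real<lower=0> caux;             // slab auxiliary variable
}
transformed parameters {
  real<lower=0> c = slab_scale * sqrt(caux);
  vector<lower=0>[m] lambda_tilde
      = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  // Implies beta ~ normal(0, sigma_hs * tau * lambda_tilde), but the
  // sampler works on the better-behaved standard-normal z instead:
  vector[m] beta = sigma_hs * tau * (z .* lambda_tilde);
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, scale_global * sigma_hs);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
}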
Problem
As you can see in Figure 2, the horseshoe applied to the fingerprints of the 5 factors (x-axis) is not working as expected over the range of compounds shown (y-axis). Regular BAMF is in blue; BAMF with the regularized horseshoe (BAMF+HS) is in orange. The orange distributions do not show stronger regularization across factors: I would expect some components to shrink to zero while the large contributions in certain factors are preserved, but this is not happening, or not strongly enough. Moreover, not even plain BAMF yields the expected clean Gaussians, which suggests there may be an issue there as well.
I tried to strengthen the horseshoe by reducing p_0, even down to one, but this brought no improvement. I believe the issues could be:
a) our parameters and the F matrix are not stable enough after the 2000 samples (of which 1000 are warm-up);
b) the horseshoe priors are set up incorrectly and/or another prior would suit our problem better.
I lean towards (a), since I found huge variability in tau, lambda_tilde, and c across samples (Figure 3). R-hat is low in both cases, BAMF and BAMF+HS, though a bit lower for BAMF+HS (Figure 4). My guess is that, because these parameters are not stable, the prior is not shrinking properly. If that is the case, how could I solve this?
Model
//
// BAMF + horseshoe prior on all m/z's
// 7/5/2024
// Student-t distributions changed to Cauchys.
// Lambda is m/z dependent.
//
functions {
  // Stack an array of p vectors (each of length m) into a p x m matrix,
  // one vector per row.
  matrix vector_array_to_matrix(array[] vector x) {
    matrix[size(x), rows(x[1])] y;
    for (i in 1:size(x))
      y[i] = x[i]';
    return y;
  }
}
data {
  int<lower=1> n;                  // number of data points
  int<lower=1> m;                  // number of dimensions (compounds)
  int<lower=1> p;                  // number of components (factors)
  matrix[n, m] X;                  // data matrix
  matrix<lower=0>[n, m] sigma;     // separate sigma for each of the observations (!)
  vector<lower=0>[n-1] timesteps;  // time elapsed between consecutive values in X (arbitrary unit, but the unit choice also affects other priors)
  real<lower=0> minerr;            // minimal error
}
transformed data {
  vector[n*m] Xv = to_vector(X);
  vector[n*m] sigmav = to_vector(sigma);
  real<lower=0> p_zero = 3;        // expected number of non-zero components per factor
  real<lower=0> scale_global;      // scale for the global shrinkage parameter tau
  real<lower=0> slab_scale;        // slab scale for the regularized horseshoe
  real<lower=0> slab_df;           // slab degrees of freedom for the regularized horseshoe
  scale_global = p_zero / ((p - p_zero) * sqrt(n));
  slab_scale = 2.5;
  slab_df = 4;
}
parameters {
  matrix<lower=0>[n, p] G;         // score (time-series) matrix
  array[p] simplex[m] F;           // profile (fingerprint) simplexes
  vector<lower=0>[p] alpha_a;      // random-walk scale, per-unit-time part
  vector<lower=minerr>[p] alpha_b; // random-walk scale, constant part
  real<lower=0> tau;               // global shrinkage parameter
  matrix<lower=0>[p, m] lambda;    // local shrinkage parameters
  real<lower=0> caux;              // regularisation (slab auxiliary) parameter
  real<lower=0> logsigma;          // log of the horseshoe noise scale
}
transformed parameters {
  real<lower=0> sigma_hs;             // horseshoe noise std
  matrix<lower=0>[p, m] lambda_tilde; // 'truncated' local shrinkage parameters
  real<lower=0> c;                    // slab scale
  matrix[n, m] Z;                     // latent function values
  sigma_hs = exp(logsigma);
  c = slab_scale * sqrt(caux);
  lambda_tilde = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  Z = G * vector_array_to_matrix(F);
}
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, scale_global * sigma_hs);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
  // Regularized horseshoe prior on the fingerprint entries:
  for (k in 1:p) {
    for (j in 1:m) {
      F[k, j] ~ normal(0, sigma_hs * tau * lambda_tilde[k, j]);
    }
  }
  // Autocorrelation prior: each column of G follows a Cauchy random walk
  // whose scale grows with the time elapsed between observations.
  for (j in 1:p) {
    G[2:, j] ~ cauchy(G[:(n-1), j], alpha_a[j] * timesteps + alpha_b[j]);
  }
  Xv ~ normal(to_vector(Z), sigmav); // likelihood; Z = G * F from transformed parameters
}
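As one test of whether the prior alone behaves as intended, a prior-only run of the same program could help: comment out the likelihood line so that the samples of F show what the horseshoe prior does before it sees any data (a sketch; everything outside the model block stays unchanged):
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, scale_global * sigma_hs);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
  for (k in 1:p) {
    for (j in 1:m) {
      F[k, j] ~ normal(0, sigma_hs * tau * lambda_tilde[k, j]);
    }
  }
  for (j in 1:p) {
    G[2:, j] ~ cauchy(G[:(n-1), j], alpha_a[j] * timesteps + alpha_b[j]);
  }
  // Likelihood disabled for the prior-only check:
  // Xv ~ normal(to_vector(Z), sigmav);
}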
You can find the .stan file attached to this post and the input data file here. Since we use a synthetic dataset for model testing, the “truth” for F and G is also attached. Let me know if you need me to provide anything else.
Thanks.
References
Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051.
Rusanen, A., Björklund, A., Manousakas, M., Jiang, J., Kulmala, M. T., Puolamäki, K., & Daellenbach, K. R. (2023). A novel probabilistic source apportionment approach: Bayesian auto-correlated matrix factorization. Atmospheric Measurement Techniques Discussions, 2023, 1-28.
bamf_HS_lambda_jk.stan (2.7 KB)
F_truth.txt (5.0 KB)
G_truth.txt (108.5 KB)