Transformed parameters converge while raw parameters do not


Hi,

I have some trouble understanding the information I get from the summary table. My parameters have \hat{R} values far from unity, while my transformed parameters appear to behave quite well.
Below is a more in-depth explanation of my approach, followed by the implemented .stan model and, finally, the summary table.

Actual question

If anyone could tell me why only the transformed parameters behave well, and why approach 3 is so much slower than approaches 1 and 2, it would be greatly appreciated.
Before anyone points it out: I am aware that the tree depth is saturated, but methods 1 and 2 never go beyond leapfrog step three.

In depth

I am trying to solve the following inverse problem: \textbf{d}=\textbf{Gm}
Here \textbf{d} (size n) is the observation vector, \textbf{G} (size n\times m) the data kernel, and \textbf{m} (size m) the model vector.
The parameters in \textbf{m} are dependent, and from a set of simulations \textbf{M} (each column a realization of \textbf{m}) I have an approximate covariance matrix \Sigma_M.
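For reference, here is a minimal NumPy sketch of how such a \Sigma_M can be estimated from standardized simulations. The sizes and the ensemble itself are made up for illustration; only the standardize-then-covariance recipe is from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation ensemble M: each column one realization of m
mMax, nSim = 8, 500
M = rng.normal(size=(mMax, nSim)) * 3.0 + 1.0

# Standardize each row (parameter) before estimating the covariance
mu_M = M.mean(axis=1, keepdims=True)
sigma_M = M.std(axis=1, keepdims=True)
M_std = (M - mu_M) / sigma_M

# Approximate (mMax x mMax) covariance matrix of the standardized parameters
Sigma_M = np.cov(M_std)
```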

I have sampled in the following ways:

  1. Each element in \textbf{m} independent and normally distributed
  2. \textbf{m} following a multivariate normal distribution with covariance \Sigma_M
  3. \textbf{m} projected into eigenspace, using the eigen-decomposition of \Sigma_M

With method 3 I ran into a problem.
I sample \textbf{q} in eigen-space and project it back to determine the likelihood.
\textbf{m} = \textbf{Tq}.*\sigma_M + \mu_M
Here \textbf{T} is the transformation matrix to and from eigenspace, .* denotes element-wise multiplication, and \sigma_M and \mu_M are the standard deviation and mean of each row in \textbf{M}, respectively.
The scaling with \sigma_M and \mu_M is needed because \textbf{M} is standardized before \Sigma_M is computed.
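The projection step above can be sketched in NumPy as follows. All sizes and values (the stand-in \Sigma_M, \sigma_M, \mu_M) are hypothetical; the sketch only illustrates building \textbf{T} from the eigen-decomposition and mapping \textbf{q} back to \textbf{m}.

```python
import numpy as np

rng = np.random.default_rng(1)
mMax, p = 8, 7  # stand-in sizes

# Stand-in for Sigma_M: a symmetric positive semi-definite matrix
A = rng.normal(size=(mMax, mMax))
Sigma_M = A @ A.T / mMax

# Eigen-decomposition; the p leading eigenvectors form the columns of T
eigval, eigvec = np.linalg.eigh(Sigma_M)
order = np.argsort(eigval)[::-1]
T = eigvec[:, order[:p]]  # mMax x p transfer matrix

# Project a sample q back from eigenspace and undo the standardization
q = rng.normal(size=p)
sigma_M = rng.uniform(0.5, 2.0, size=mMax)  # stand-in row std of M
mu_M = rng.normal(size=mMax)                # stand-in row mean of M
m = (T @ q) * sigma_M + mu_M
```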

I found that this approach worked quite badly, and realized that much of the information was “trapped” in \mu_M, which was kept constant.
I then tried giving \mu_M a distribution. This resulted in \textbf{m} having \hat{R}\approx1, but not \mu_M and \textbf{q}.
It also required far more computational time than method 2.

Model

data {
  int<lower=0> L;	// Length of data-array
  int<lower=0> n;	// SH-degree
  int<lower=0> mMax;	// Number of model parameters
  int<lower=0> p;	// Number of truncated model parameters
  vector[L] d;		// Data vector
  matrix[L, mMax] G;	// Design matrix
  vector[mMax] igrfMean;
  vector[mMax] igrfStd;
  vector[mMax] orgStd;
  matrix[mMax, p] T;	// Transfer matrix
}

parameters {
  vector[p] q;	// Eigenspace coefficients
  vector[mMax] mu; // Mean parameters
}

transformed parameters {
  vector[mMax] m;
  m = ((T*q) .* orgStd)+(mu .* igrfStd + igrfMean);
}

model {
  // Prior on the eigenspace coefficients
  q ~ normal(0, 1);

  // Prior on the mean
  mu ~ normal(0, 1);

  // Likelihood, assuming unit observation noise
  d ~ normal(G * m, 1);
}
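Note that the likelihood gives the residuals a standard normal distribution, i.e. it assumes unit-variance observation noise. A minimal NumPy sketch of the forward model and the residual vector evaluated in the model block, with hypothetical sizes and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
L_len, mMax = 20, 8  # hypothetical sizes

G = rng.normal(size=(L_len, mMax))
m = rng.normal(size=mMax)

# Synthetic data with unit observation noise, matching res ~ normal(0, 1)
d = G @ m + rng.normal(size=L_len)

res = d - G @ m  # the residual vector the likelihood is placed on
```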

Output

WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:3859 of 4000 iterations saturated the maximum tree depth of 10 (96.5 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation
WARNING:pystan:Chain 4: E-BFMI = 0.0599
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model
Inference for Stan model: None_0a54820ed44399940f2d61834cc82db1.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
q[1]    1.71    0.04   0.06   1.67   1.67   1.68   1.75   1.82      2  69.36
q[2]    4.59    0.03   0.04   4.51   4.56   4.61   4.62   4.62      2  73.53
q[3]   -1.34    0.02   0.03  -1.39  -1.36  -1.33  -1.33  -1.33      2  73.05
q[4]   -0.26    0.03   0.04  -0.29  -0.29  -0.28  -0.24  -0.19      2  79.27
q[5]   -7.73    0.01   0.02  -7.74  -7.74  -7.74  -7.72   -7.7      2  68.54
q[6]    1.52    0.04   0.06   1.48   1.48   1.49   1.56   1.63      2   66.2
q[7]   -6.04    0.03   0.05  -6.07  -6.07  -6.06   -6.0  -5.95      2  73.13
mu[1]  -0.18     0.9   1.28   -1.8  -1.25  -0.34   0.92   1.75      2 1249.7
mu[2]    2.5    3.34   4.73  -0.68  -0.61   0.01    6.1  10.67      2 1245.3
mu[3]   4.35    4.37   6.19  -1.03    0.2   1.77   8.96   14.9      2 502.12
mu[4] 122.34  115.22  163.1   8.89  17.22  38.15 235.88 418.68      2  68.87
mu[5]   1.86    3.16   4.47  -1.57  -1.08  -0.28   5.22   9.55      2 2346.3
mu[6]   6.41    7.78  11.01  -1.07  -0.42   0.61  14.39  25.45      2 1439.7
mu[7]  -0.39     0.9   1.27  -1.63  -1.61  -0.63   0.88   1.34      2 1396.7
mu[8]  -2884   62.72  88.78  -2945  -2941  -2929  -2822  -2723      2  74.26
m[1]  -5.2e4  3.4e-5 2.1e-3 -5.2e4 -5.2e4 -5.2e4 -5.2e4 -5.2e4   3869    1.0
m[2]  3138.7  4.1e-5 2.6e-3 3138.7 3138.7 3138.7 3138.7 3138.7   3934    1.0
m[3]  680.29  4.1e-5 2.5e-3 680.29 680.29 680.29  680.3  680.3   3800    1.0
m[4]  -172.9  2.6e-5 1.6e-3 -172.9 -172.9 -172.9 -172.9 -172.9   3816    1.0
m[5]   -3074  3.1e-5 1.9e-3  -3074  -3074  -3074  -3074  -3074   3695    1.0
m[6]  5004.7  3.0e-5 1.9e-3 5004.7 5004.7 5004.7 5004.7 5004.7   3931    1.0
m[7]  -715.2  3.5e-5 2.2e-3 -715.2 -715.2 -715.2 -715.2 -715.2   4094    1.0
m[8]  1198.8  3.4e-5 2.2e-3 1198.8 1198.8 1198.8 1198.8 1198.8   4253    1.0
lp__  -5.2e4  201.52  285.2 -5.2e4 -5.2e4 -5.2e4 -5.1e4 -5.1e4      2  72.49