Sampling dependent parameters


#1

Hi,
I am trying to sample spherical harmonics.
Initially I thought of the model coefficients as independent and Gaussian distributed. I was successful in sampling to degree 25 (675 dimensions).
I have now been trying to think of the coefficients as dependent by using some simulations (from someone else) to define a covariance matrix for the coefficients. My plan was then to use a singular value decomposition of the covariance matrix to project to and from eigen-space. In that way I could sample independent coefficients and project them back to the original space.
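In symbols, with \Sigma denoting that coefficient covariance and [U,S,V]=\mathrm{svd}(\Sigma) (so \Sigma = U S U^\top, since \Sigma is symmetric), taking T = U\sqrt{S} and sampling q \sim \mathcal{N}(0, I) should give \mathrm{Cov}(Tq) = T T^\top = U S U^\top = \Sigma.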

But when trying my new prior I cannot get above degree 4 (24 dimensions) with similar data and sampling inputs before Rhat values fall below 0.9 or rise above 1.1.

Can anyone see if there is anything wrong with my model?

I have included the models below and attached data as rdump.

Covariance model:

data {
  int<lower=0> L;	// Length of data array
  int<lower=0> n;	// Spherical-harmonic degree
  int<lower=0> mMax;	// Number of model parameters
  vector[L] d;		// Data array
  matrix[L, mMax] G;	// Design matrix
  vector[mMax] orgMean; // Mean of original model data
  vector[mMax] orgStd; // Standard deviation of original model data
  matrix[mMax, mMax] T;	// Transfer matrix
}

parameters {
  vector[mMax] q;	// Declare model array (eigen-space)
}

transformed parameters {
  vector[mMax] m;
  m = (T*q+orgMean).*orgStd; // Project back to original space
  // q is projected to the standardized original space
  // Then it is scaled using orgMean and orgStd
}

model {
  vector[L] res;		// Declare residual array

  // Prior distribution
  for (i in 1:mMax) {
    q[i] ~ normal(0, 1); // Assume normal distribution
  }

  // Likelihood
  res = d-G*m;
  res ~ normal(0,1);
}

Original model assuming independent coefficients:

data {
  int<lower=0> L;	// Length of data array
  int<lower=0> n;	// Spherical-harmonic degree
  int<lower=0> mMax;	// Number of model parameters
  vector[L] d;		// Data array
  matrix[L, mMax] G;	// Design matrix
  vector[mMax] limits;
}

parameters {
  vector[mMax] m;	// Declare model array
}

model {
  vector[L] res;		// Declare residual array

  // Prior distribution
  for (i in 1:mMax) {
    m[i] ~ normal(0, limits[i]/2);
  }

  // Likelihood
  res = d-G*m;
  res ~ normal(0,1);
}

#2

It sounds like you’ve improved your prior on the parameters you’re sampling?

So previously you said:

q \sim \mathcal{N}(0, I)

And now you have:

q \sim \mathcal{N}(\mu, \Sigma)

Where \mu and \Sigma are given as data?

You can sample this directly in Stan:

q ~ multi_normal(mu, Sigma);

Or you can take the Cholesky decomposition of this, LL^T = \Sigma, and then do:

data {
  int<lower=0> mMax;
  vector[mMax] mu;
  matrix[mMax, mMax] Sigma;
}
transformed data {
  // Cholesky factor of Sigma; named L_Sigma here so it doesn't
  // collide with the data length L in your model
  matrix[mMax, mMax] L_Sigma = cholesky_decompose(Sigma);
}
parameters {
  vector[mMax] z;
}
transformed parameters {
  vector[mMax] q = mu + L_Sigma * z;
}
model {
  z ~ normal(0, 1);
}

And that has the effect of giving you the right prior on q. You could also replace the Cholesky factor in the LL^T construction with a matrix square root computed from an eigendecomposition.
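
For instance, a minimal sketch of that variant, assuming Sigma is again passed as data (eigenvectors_sym, eigenvalues_sym, and diag_matrix are built-in Stan functions):

transformed data {
  // Symmetric eigendecomposition: Sigma = U * diag(s) * U'
  matrix[mMax, mMax] U = eigenvectors_sym(Sigma);
  vector[mMax] s = eigenvalues_sym(Sigma);
  // Matrix square root M = U * diag(sqrt(s)) * U', so M * M' = Sigma
  matrix[mMax, mMax] M = U * diag_matrix(sqrt(s)) * U';
}

and then q = mu + M * z in transformed parameters, exactly as above.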

The thing that stood out to me in your code was the diagonal orgStd. Is that right? It’s convenient if so, but is it right?

That help at all?


#3

Thank you for the answer!
I did do an eigendecomposition, just outside Stan; I was not aware that I could do it inside the Stan model.
I do believe that orgStd is correct.
Having the original model data D (size m\times n, m being coefficients and n observations), I standardized each row before the eigendecomposition: D'_i=\frac{D_i-\mu_{i, org}}{\sigma_{i, org}}, with i indexing each row/time series.
Then I took the eigendecomposition of the covariance matrix: [U,S,V]=\mathrm{svd}(\mathrm{cov}(D')).
T in the model is then T=V\sqrt{S}.
That’s why I thought (T q+\mu_{org}).*\sigma_{org} would bring me from eigen- back to original-space.
Is this logic flawed?

Now to your suggestions:
I have now tried m ~ multi_normal(mu, Sigma); which successfully sampled degree 4 (24 dimensions). I have had one unsuccessful attempt with degree 10 (120 dimensions), but I am now trying to constrain it by using five times more data. It does seem to require far more data to constrain compared to treating the coefficients as independent.
I have not yet tried the Cholesky decomposition, but might do it later today.

The reason I wanted to sample the parameters in an independent space is that they aren't entirely Gaussian distributed. I therefore hoped that, if this worked out, I could figure out a way to replace q ~ normal(0, 1); with a more appropriate, possibly custom, distribution. Maybe by fitting a spline?
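
Something along these lines is what I have in mind, with a two-component normal mixture as a hypothetical stand-in for whatever fitted density I end up with (the weight and scales here are made up for illustration):

model {
  // Hypothetical custom prior via a direct log-density increment;
  // the mixture is only a placeholder for a fitted density
  for (i in 1:mMax)
    target += log_mix(0.7,
                      normal_lpdf(q[i] | 0, 1),
                      normal_lpdf(q[i] | 0, 3));
}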


#4

Standardizing data can definitely make sense, and transformations of the data in certain regressions can make the posterior easier to work with. It depends on the model: https://mc-stan.org/docs/2_18/stan-users-guide/QR-reparameterization-section.html . Perhaps that is what you're looking for?
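
For reference, a minimal sketch of that QR reparameterization, reusing your d/G/mMax names and assuming unit observation noise as in your models (qr_thin_Q and qr_thin_R are built-in):

data {
  int<lower=0> L;	// Length of data array
  int<lower=0> mMax;	// Number of model parameters
  matrix[L, mMax] G;	// Design matrix
  vector[L] d;		// Data array
}
transformed data {
  // Thin QR decomposition of G, scaled as in the linked guide section
  matrix[L, mMax] Q_ast = qr_thin_Q(G) * sqrt(L - 1);
  matrix[mMax, mMax] R_ast = qr_thin_R(G) / sqrt(L - 1);
  matrix[mMax, mMax] R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[mMax] theta;	// Coefficients on Q_ast
}
model {
  theta ~ normal(0, 1);		// Placeholder prior; adapt as needed
  d ~ normal(Q_ast * theta, 1);	// Unit noise, as in your models
}
generated quantities {
  vector[mMax] m = R_ast_inverse * theta;	// Coefficients on the original G
}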

What led you to try these transformations? The QR thing is for treedepth issues; the non-centered parameterization thing is for changing-curvature (funnel) problems.

The posterior, in the end, is gonna do whatever it wants. The prior can be independent, or whatnot, but we don’t have a huge amount of control over the posterior. Sampling statements in Stan (things with the ~ syntax) are increments to the log density, not actually random number draws (see here: https://mc-stan.org/docs/2_18/reference-manual/sampling-statements-section.html)