Sampling dependent parameters

#1

Hi,
I am trying to sample spherical harmonics.
Initially I thought of the model coefficients as independent and Gaussian distributed. I was successful in sampling to degree 25 (675 dimensions).
I have now been trying to think of the coefficients as dependent by using some simulations (from someone else) to define a covariance matrix for the coefficients. My plan was then to use a singular value decomposition of the covariance matrix to project to and from eigen-space. In that way I could sample independent coefficients and project them back to the original space.

But when trying my new prior I cannot get above degree 4 (24 dimensions), when using similar data and sampling inputs, before reaching Rhat < 0.9 and Rhat > 1.1.

Can anyone see if there is anything wrong with my model?

I have included the models below and attached data as rdump.

Covariance model:

data {
  int<lower=0> L;	// Length of data-array
  int<lower=0> n;	// SH-degree
  int<lower=0> mMax;	// Amount of model parameters
  vector[L] d;		// Declare size of data-array
  matrix[L, mMax] G;	// Declare size of design matrix
  vector[mMax] orgMean; // Mean of original model data
  vector[mMax] orgStd; // Standard deviation of original model data
  matrix[mMax, mMax] T;	// Transfer matrix
}

parameters {
  vector[mMax] q;	// Declare model array (eigen-space)
}

transformed parameters {
  vector[mMax] m;
  m = (T*q.*orgStd)+orgMean; // Project back to original space
  // q in projected to standardized orginal space
  // Then it is scaled using orgMean and orgStd
}

model {
  vector[L] res;		// Declare residual array

  // Prior distribution
  for (i in 1:mMax) {
    q[i] ~ normal(0, 1); // Assume normal distribution
  }

  // Likelihood
  res = d-G*m;
  res ~ normal(0,1);
}

Original model assuming independent coefficients:

data {
  int<lower=0> L;	// Length of data-array
  int<lower=0> n;	// SH-degree
  int<lower=0> mMax;	// Amount of model parameters
  vector[L] d;		// Declare size of data-array
  matrix[L, mMax] G;	// Declare size of design matrix
  vector[mMax] limits;
}

parameters {
  vector[mMax] m;	// Declare model array
}

model {
  vector[L] res;		// Declare residual array

  // Prior distribution
  for (i in 1:mMax) {
    m[i] ~ normal(0, limits[i]/2);
  }

  // Likelihood
  res <- d-G*m;
  res ~ normal(0,1);
}
#2

It sounds like you’ve improved your prior on the parameters you’re sampling?

So previously you said:

q \sim \mathcal{N}(0, I)

And now you have:

q \sim \mathcal{N}(\mu, \Sigma)

Where \mu and \Sigma are given as data?

You can sample this directly in Stan:

q ~ multi_normal(mu, Sigma);

Or you can take the Cholesky decomposition of this, LL^T = \Sigma, and then do:

transformed data {
  matrix[mMax, mMax] L = cholesky_decompose(Sigma);
}
parameters {
  vector[mMax] z;
}
transformed parameters {
  vector[mMax] q = mu + L * z;
}
model {
  z ~ normal(0, 1);
}

And that has the effect of giving you the right prior on q. You could also do the LL^T thing a matrix square root using an eigendecomposition.

The thing that stood out to me in your code was the diagonal orgStd. Is that right? It’s convenient if so, but is it right?

That help at all?

1 Like
#3

Thank you for the answer!
I did do an eigendecomposition, just outside STAN, was not aware that I could do in the STAN model.
I do believe that orgStd is correct.
Having the original model data D (size m\times n, m being coefficients and n observations) I standardized before doing the eigendecomposition D'_i=\frac{D_i-\mu_{i, org}}{\sigma_{i, org}}, i indexing each row/time series.
Then the eigendecomposition of the covariance matrix [U,S,V]=svd(cov(D')).
T in the model is then T=V\sqrt{S}.
That’s why I thought (T q+\mu_{org}).*\sigma_{org} would bring me from eigen- back to original-space.
Is this logic flawed?

Now to your suggestions:
I have now tried m ~ multi_norm(mu, sigma); which did successfully sample degree 4 (24 dimensions). I have had one unsuccessful attempt with degree 10 (120 dimensions). But am trying to constrain it by using 5 times more data. It does seem to require far more data to constrain compared to thinking of the coefficients as independent.
I have not yet tried the Cholesky decomposition, but might do it later today.

The reason I wanted to sample the parameters in an independent space is because they aren’t entirely Gaussian distributed. I therefore hoped that if this worked out I could figure out a way to replace q ~ normal(0,1); with a more appropriate and possible custom distribution. Maybe by fitting a spline?

#4

Standardizing data could definitely make sense, and doing transformations of the data in certain regressions can make the posterior easier to work with. Depends on the model: https://mc-stan.org/docs/2_18/stan-users-guide/QR-reparameterization-section.html . Perhaps that is what you’re looking for?

What led you to try these transformations? The QR thing you do for treedepth issues. The non-centering parameters thing you do to avoid changing curvature (funnel) problems.

The posterior, in the end, is gonna do whatever it wants. The prior can be independent, or whatnot, but we don’t have a huge amount of control over the posterior. Sampling statements in Stan (things with the ~ syntax) are increments to the log density, not actually random number draws (see here: https://mc-stan.org/docs/2_18/reference-manual/sampling-statements-section.html)

#5

I am trying to solve a similar forward problem as in the QR-reparameterization example \textbf{d}=\textbf{Gm}.
Initially I thought of \textbf{m} as independent, which worked, but is definitely incorrect. And yes, the posterior will do its own thing, but that doesn’t mean I shouldn’t improve my prior.
For that reason I wanted to treat \textbf{m} as dependent in the prior.
The easiest way I could think of it was to instead look at a set of transformed parameters and see them as independent.

Using multi_norm() works great. Maybe my problem is that I don’t fully understand the role of sampling statements. Normally I have to implement my own sampling algorithm. And I feel that using multi_norm() is applying a black-box within which I don’t know what happens.

#6

Oh okay, there are a few things:

y ~ multi_normal(mu, Sigma)

translates to:

target += multi_normal_lpdf(y | mu, Sigma)

multi_normal_lpdf is just the log of the PDF here: https://en.wikipedia.org/wiki/Multivariate_normal_distribution

The reason we’re incrementing this target thing is because that’s what HMC needs to do the sampling. “target” in Stan is just the log of the likelihood times the prior. For descriptions of HMC, check out: https://arxiv.org/pdf/1206.1901.pdf and https://arxiv.org/pdf/1701.02434.pdf

It’s sometimes best to start by writing the likelihood times prior as just a product of probabilities. The tilde shorthand can get confusing quickly.

#7

Yes, but I have not been able to find the content of: multi_norm_lpdf()
It should just be a function like any other?

I am glad you say that, because I do not understand where the likelihood in multi_norm() comes from. The forward problem in nowhere specified in it, only the prior, which is why I typically add
residuals ~ normal(0,1)

If I where to do it manually should I calculate both likelihood and prior

// Pseudo code
parameters {
  vector m
}
model {
  real prior = calc_prior(m)
  real likelihood = calc_likelihood(d,m) // Here d is some declared data
  target += log(prior*likelihood)
}

Thanks for the articles. I have read the one by Betancourt, but not the other. Maybe I should have a look at that one.

#8

Here’s the function definition: https://github.com/stan-dev/math/blob/develop/stan/math/prim/mat/prob/multi_normal_lpdf.hpp

There’s some stuff in there that will probably look weird. Let me know if you have questions. Just keep in mind it’s evaluating the log of the multivariate normal pdf and it’s doing it in a way to make sure the automatic differentiation can be done efficiently.

The only problem with what you wrote is that you’d want to do everything on the log scale, but otherwise it’s good. So:

real prior = calc_log_prior(m);
real likelihood = calc_log_likelihood(d,m); // Here d is some declared data
target += prior + likelihood;

The log scale is important to avoid numerical underflows (things rounding to zero accidentally).

The Radford one has nice R code in it for HMC (check Figure 2) which I like.

#9

I have implemented my own calculations of multi_norm(), just to prove it to myself. But am not sure why size_y and size_vec are doing. I understand the first term as to be multiplied by the amount of parameters, which is either size_y or size_vec.

if (include_summand&lt;propto&gt;::value)
    lp += NEG_LOG_SQRT_TWO_PI * size_y * size_vec;

if (include_summand&lt;propto, T_covar_elem&gt;::value)
    lp -= 0.5 * log_determinant_ldlt(ldlt_Sigma) * size_vec;

And am working through Gelman’s article.
In my initial question I didn’t understand why applying an eigen-decomposition didn’t work.

I think the problem is standardization before determining the covariance matrix and then decomposing it. The mean, \mu, that is extracted though standardization contain a lot of information, which is never disputed.
It should then be treated as a variable, but that would double the amount of variables being sampled.