Speeding up multivariate normal model

Hi all,

I am trying to use Stan to fit a multivariate normal model to my data. Currently I have a response vector tF (y1 in the Stan file) of size 200, a design matrix (X1) of dimension 200 by 9, and a regression matrix H of dimension 200 by 26.

It takes around 30 minutes in RStudio and I was wondering if you have any suggestions for how I could speed it up. Here is my Stan file: MySpeed1MyVersion.stan (1.3 KB)

Thank you in advance.

And here is the R code that I use to fit my model in Stan: StanUserGroup.R (637 Bytes)

And the data that you could use for fitting: StanUser.data.R (131.7 KB)

Hmm, maybe tighten up the priors on the length scale and see if that changes anything? This looks like a Gaussian process (https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations#priors-for-gaussian-processes).

lognormal(0.1, 0.5) seems like it could be pretty heavy-tailed. The latest manual (2.16) has some updated material on length-scale priors you could look at. I think by default the new examples use gamma(4, 4)s and such.
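For instance, the swap might look roughly like this (a sketch only; `rho` is a placeholder name for whatever your length-scale parameter is called):

```stan
parameters {
  real<lower=0> rho;  // length scale (illustrative name)
}
model {
  // gamma(4, 4) keeps mass away from zero and has much lighter
  // tails than lognormal(0.1, 0.5)
  rho ~ gamma(4, 4);
}
```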

You’ll probably want to tighten up your priors on beta as well. There can be identifiability issues between what is explained by the mean and what is left to the covariance in these models, I think.

Dunno how much mileage you’ll get out of that though. Give it a go!


What does the posterior look like? Are the parameters all reasonably well identified?

We’d generally recommend using scale as a parameter—that is, sigma, not sigma_sq.

I don’t know enough about the kind of GP you’re trying to put together to understand the half-normal prior you put on sigma_sq, but the precise shape rarely matters as much as where the bulk of the probability mass is, whether it goes to zero at zero, and how long the tails are.
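Concretely, parameterizing by the scale would be something like this (a sketch, not your exact model; the half-normal prior is just an example):

```stan
parameters {
  real<lower=0> sigma;  // scale, not variance
}
model {
  sigma ~ normal(0, 1);  // half-normal via the lower bound
  // use square(sigma) anywhere the code previously used sigma_sq
}
```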

for (k in 2:Np) {
  beta[k] ~ normal(0, 10);
}

can be vectorized to

beta[2:Np] ~ normal(0, 10);

but that’s not going to do much for you. Why no prior on beta[1]?

If you don’t want to save all those transformed parameters, you can make them local variables in the model.
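For example (sketch only; the names and sizes just mirror your model, and I'm assuming the mean is H * beta):

```stan
model {
  // declared here rather than in transformed parameters, so it is
  // recomputed each iteration but never written to the output
  matrix[N1, N1] Sigma;
  Sigma = cov_exp_quad(XScaled, sqrt(sigma_sq), 1 / sqrt(2));
  y1 ~ multi_normal(H * beta, Sigma);
}
```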

//  matrix[N1, N1] L;

This had better stay commented out, or it'll create an improper posterior (an unused, unconstrained parameter gets no prior and nothing tying it to the data).

delta = to_row_vector(delta_par);
for (i in 1:N1)
  XScaled[i] = X1[i] ./ delta;

Why not just declare delta as a row-vector parameter rather than taking a vector then transposing it?
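That is, something like this in the parameters block (a sketch; the lower bound assumes your delta components should be positive, and Np stands in for its length):

```stan
parameters {
  row_vector<lower=0>[Np] delta;  // no to_row_vector() needed later
}
```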

Nugget = diag_matrix(rep_vector(nugget, N1));
Sigma = cov_exp_quad(XScaled, sqrt(sigma_sq), pow(sqrt(2), -1)) + sigma_sq * Nugget;

This is bad news. Don’t ever create diagonal matrices unless absolutely necessary. It is far, far more efficient to avoid defining Nugget and do this:

Sigma = cov_exp_quad(XScaled, sqrt(sigma_sq), 1 / sqrt(2));
for (k in 1:rows(Sigma))
  Sigma[k, k] += sigma_sq * nugget;

It avoids creating a quadratic number of zero values on the autodiff stack.

Does cov_exp_quad not have any jitter built in? I haven’t used it.

Yes, it is a Gaussian process model. Thank you for the great suggestion on the priors for the length-scale parameters; lognormal(0.1, 0.5) is indeed heavy-tailed, and changing it does make a difference.

I think you are right that specifying a huge diagonal matrix is unnecessary. I actually managed to implement your suggestions together with a Cholesky factorization of the covariance matrix, and now the model takes only 2 minutes to run.
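For anyone reading later, the Cholesky part is roughly this (a sketch; mu stands for the mean vector in my model, and Sigma is assumed to be built already):

```stan
model {
  matrix[N1, N1] L;
  L = cholesky_decompose(Sigma);
  // uses the factor directly instead of letting multi_normal
  // re-factor Sigma internally
  y1 ~ multi_normal_cholesky(mu, L);
}
```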
However, I have some quick questions for you:

  1. Can you set a lower bound for the parameters in a row vector, like you can for integers and reals?
  2. Also, could I have a sneak peek inside the function cov_exp_quad, and if so, how could I do it in Stan?

Thanks for all your great suggestions! They really helped with the speed! :)

Yes, see the manual. For example:

row_vector<lower=x>[N] a;

The definition is in the Stan manual, and the source is on GitHub in stan-dev/math.

Here’s the one that does derivatives:


And without: