Long time to run

Dear Stan developers:

I have constructed a Bayesian semiparametric model with penalised splines and autoregressive errors for my simulated data. It ran fine a couple of weeks ago (no divergences after warmup, no other warning messages, and the output looked sensible). I then had about 10 days off without working on this particular model, and when I came back to run it again, on both my desktop computer and my laptop, it would no longer finish. I keep getting the message:
Gradient evaluation took 0.296 seconds
1000 transitions using 10 leapfrog steps per transition would take 2960 seconds.
Adjust your expectations accordingly!

And then it just takes forever to run.

I am a bit confused, as the model was running okay previously but now seems to be in trouble. Where could things potentially go wrong in this case? I have started debugging by omitting the model block and only generating the univariate B-spline basis functions and the bivariate B-spline basis functions for the interaction terms, to see whether it still gets stuck.
My data are not particularly large: only 101 observations, with 12 basis functions in both the univariate and bivariate B-splines and first-order difference penalties for smoothness.
bayesian semi-parametric model with penalised splines and AR(1) errors.R (2.4 KB)

Any suggestions and ideas are most welcome.

Thanks in advance,

Jaslene

My Stan model code:
b_spline_int_AR(1)_penalty_model_v2.stan (9.2 KB)

That seems very slow. I’d check to make sure you are compiling with the proper optimization flags (e.g., -O3 in your CXXFLAGS).

You can also use the multi_normal_prec function to avoid inverting the precision matrix only to have it inverted back again inside the density.

Thank you. I will look into the multi_normal_prec function to avoid the repeated matrix inversion.
It is not strictly necessary to construct the univariate and bivariate B-spline basis functions and penalty matrices inside the Stan program (currently they are built in the transformed data block); I could simply supply them as data. I suspect this might be part of the problem: with 101 observations and 12 basis functions, one of the penalty matrices is 144 x 144 (though it is quite sparse), and the bivariate B-spline basis matrix for the interaction term is 101 x 144, so the matrices grow as the numbers of observations and basis functions increase. Could this be one of the reasons the model runs slowly?

The large matrix could be the issue: inversion is O(n^3). Right now you are inverting it to pass into the multi_normal function, which then inverts it again to evaluate the density, so the whole inversion step can be avoided entirely and that will likely speed things up a lot.
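For example, rather than computing Sigma = inverse(Omega) and then calling multi_normal(mu, Sigma), you can pass the precision matrix straight in. A sketch with placeholder names:

// Omega is the precision matrix; no explicit inverse is needed here,
// and none is computed inside the density either.
theta ~ multi_normal_prec(mu, Omega);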

You may also be able to exploit the sparsity by coding up the log-likelihood directly and avoiding the matrix multiplications. Recent versions of Stan also have some sparse matrix functionality, but I have not had a chance to use it.

Thanks, I will also have a look at the recent Stan documentation.

The only sparsity we have is sparse matrix times dense vector, the workhorse of sparse GLMs.
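If you do go that route, the relevant built-in is the compressed-row-storage multiply. A sketch, assuming the non-zero values w, column indices v, and row-start pointers u of the (sparse) basis matrix B have been precomputed and passed in as data:

// mu = B * beta, where B is num_data x num_basis stored in CSR form (w, v, u).
vector[num_data] mu;
mu = csr_matrix_times_vector(num_data, num_basis, w, v, u, beta);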

You don’t ever want to create diagonal matrices and multiply like this:

K_X[i,j] * diag_matrix(rep_vector(1,num_basis))[k,l]

Use diag_post_multiply instead.
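For instance, if the intent is to scale the columns of K_X by some vector w (w here is just a placeholder), this does the same job without ever materialising the num_basis x num_basis diagonal matrix:

// Equivalent to K_X * diag_matrix(w), but without building the diagonal matrix.
K_X_scaled = diag_post_multiply(K_X, w);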

The real key is cutting down numbers of operations here. But that’s only in transformed data, so it’s no big deal as it only happens once and there are no derivatives.

Rather than multi_normal, it’s more efficient (both in terms of time and arithmetic stability) to use Cholesky factors of covariance matrices via multi_normal_cholesky. Rather than inverting, there is also the multi_normal_prec distribution (the precision is the inverse of the covariance).
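A sketch of the Cholesky route, with placeholder names, assuming the covariance Sigma is fixed so the factorisation can be done once in transformed data:

transformed data {
  matrix[num_basis, num_basis] L_Sigma;
  L_Sigma = cholesky_decompose(Sigma);  // L_Sigma * L_Sigma' == Sigma
}
model {
  theta ~ multi_normal_cholesky(mu, L_Sigma);
}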

Those gamma(1, 0.005) priors can be bad news because they tend to pull estimates away from zero quite strongly; Andrew’s written about this at length and his papers are cited in the priors discussion in the regression chapter of the manual.

You want to vectorize this

for (i in 1:num_data) {
  Y[i] ~ normal(Y_hat[i] + error, sigma);
  error = kappa * (Y[i] - Y_hat[i]);
}

as

Y ~ normal(Y_hat + kappa * (Y - Y_hat), sigma);

What you have is wrong because error is getting defined after it’s used. I have no idea what you’re trying to do here, but this looks suspicious given that Y occurs on both the left and right of the ~.

This doesn’t do anything given that you’ve properly declared constraints on kappa—every variable is implicitly given a uniform prior on all values satisfying its constraints:

  kappa ~ uniform (-1, 1) ;

but you’ll need to make Y_hat a vector to allow the vector arithmetic (vector plus scalar broadcasts the scalar to each dimension).
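As for kappa, a declaration like this already implies the flat prior over (-1, 1), so the uniform statement can simply be dropped:

parameters {
  real<lower=-1, upper=1> kappa;  // AR(1) coefficient, constrained to the stationary region
}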

Hi Bob, thanks for your detailed reply. I will definitely look into some of the functions you mentioned above to reduce the number of operations in my programme, and perhaps change the priors on the lambda terms in my model block.

Regarding the part
for (i in 1:num_data) {
  Y[i] ~ normal(Y_hat[i] + error, sigma);
  error = kappa * (Y[i] - Y_hat[i]);
}

which you suggested changing to

Y ~ normal(Y_hat + kappa * (Y - Y_hat), sigma);

Actually, what I want to model here is
Y_i = a0 * X_i + a * f(X_i) + b * f(X_i, Z_i) + error_i,
where f(X) and f(X, Z) are modelled by a univariate B-spline and a bivariate B-spline with penalty priors.
Instead of i.i.d. errors, the errors here follow an AR(1) process:
error_i = kappa * error_{i-1} + w_i, where w_i ~ N(0, sigma^2).

I was looking at an old post, https://groups.google.com/forum/#!topic/stan-users/A42gKWdLLTQ, about fitting an AR process to the error terms, and that is what I have tried to implement. Is this the right (or at least an efficient) way of modelling the AR process for the errors?
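Concretely, the likelihood I am aiming for is roughly this (a sketch, not my exact code; the first observation gets the stationary marginal of the AR(1) process):

// error[i] = kappa * error[i-1] + w[i], with sd(w[i]) = sigma
Y[1] ~ normal(Y_hat[1], sigma / sqrt(1 - kappa * kappa));
for (i in 2:num_data)
  Y[i] ~ normal(Y_hat[i] + kappa * (Y[i - 1] - Y_hat[i - 1]), sigma);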

The key may be to move to a non-centered parameterization if you have a time series model; otherwise, you get a lot of prior dependency among the parameters. That’s fine if you have a lot of data, but you usually don’t in a time series. So rather than

alpha[t] ~ normal(alpha[t-1], sigma);

you can use

alpha[2:T] = alpha[1] + cumulative_sum(alpha_raw[2:T] * sigma);
alpha_raw ~ normal(0, 1);

and treat alpha[1] separately, maybe with

alpha[1] = alpha_raw[1] * sigma1;
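Putting those pieces together, a minimal sketch of the non-centered construction (the block structure and names like T and sigma1 are assumptions, not taken from your model):

data {
  int<lower=2> T;
}
parameters {
  vector[T] alpha_raw;   // standard-normal innovations
  real<lower=0> sigma;   // innovation scale for t > 1
  real<lower=0> sigma1;  // scale of the initial state
}
transformed parameters {
  vector[T] alpha;
  alpha[1] = alpha_raw[1] * sigma1;
  alpha[2:T] = alpha[1] + cumulative_sum(alpha_raw[2:T] * sigma);
}
model {
  alpha_raw ~ normal(0, 1);  // implies alpha[t] ~ normal(alpha[t - 1], sigma) for t > 1
  // ... likelihood in terms of alpha ...
}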