Long time to run

Dear Stan developers:

I have constructed a Bayesian semiparametric model with penalised splines and autoregressive errors for my simulated data. It ran fine a couple of weeks ago (no divergences after warmup, no other warning messages, and the output looked sensible). I then had about 10 days off without working on this particular model, and when I came back to run it again, on both my desktop computer and my laptop, it would no longer finish. I keep getting the message:
Gradient evaluation took 0.296 seconds
1000 transitions using 10 leapfrog steps per transition would take 2960 seconds.
Adjust your expectations accordingly!

And then it just takes forever to run.

I am a bit confused, as the model was running okay previously but now seems to be in trouble. Where could things potentially go wrong in this case? I have started debugging by omitting the model block and only generating the univariate B-spline basis functions and the bivariate B-spline basis functions for the interaction terms, to see whether it still gets stuck.
My data are not particularly large: only 101 observations, with 12 basis functions in both the univariate and bivariate B-splines and first-order difference penalties for smoothness.
bayesian semi-parametric model with penalised splines and AR(1) errors.R (2.4 KB)

Any suggestions and ideas are most welcome.

Thanks in advance,

Jaslene

My Stan model code:
b_spline_int_AR(1)_penalty_model_v2.stan (9.2 KB)

That seems very slow. I’d check to make sure you are compiling with the proper optimization flags (e.g., -O3 in your CXXFLAGS).

You can also use the multi_normal_prec function to avoid inverting the precision matrix only to have it inverted back again inside the density.

Thank you. I will look into the multi_normal_prec function to avoid the repeated matrix inversion.
It is not strictly necessary to construct the univariate and bivariate B-spline basis functions and penalty matrices inside the Stan program (currently they are built in the transformed data block); I could simply supply them as data. I suspect this might be part of the problem: with 101 observations and 12 basis functions, one of the penalty matrices is 144 x 144 (though it is quite sparse), and the bivariate B-spline basis matrix for the interaction term is 101 x 144, so the matrices grow as the numbers of observations and basis functions increase. Could this be one of the reasons the model runs slowly?

The large matrix could be the issue: inversion is O(n^3). Right now you are inverting it to pass into the multi_normal function, which then inverts it again to evaluate the density, so the whole inversion step can be avoided entirely and that will likely speed things up a lot.
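For example, rather than computing Sigma = inverse(Omega) and then calling multi_normal(mu, Sigma), you can pass the precision matrix straight in. A sketch with placeholder names:

// Omega is the precision matrix; no explicit inverse is needed here,
// and none is computed inside the density either.
theta ~ multi_normal_prec(mu, Omega);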

You may also be able to exploit the sparsity by coding up the log-likelihood directly and avoiding the matrix multiplications. Recent versions of Stan also have some sparse matrix functionality, but I have not had a chance to use it.

Thanks, I will also have a look at the recent Stan documentation.

The only sparsity we have is sparse matrix times dense vector, the workhorse of sparse GLMs.
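If you do go that route, the relevant built-in is the compressed-row-storage multiply. A sketch, assuming the non-zero values w, column indices v, and row-start pointers u of the (sparse) basis matrix B have been precomputed and passed in as data:

// mu = B * beta, where B is num_data x num_basis stored in CSR form (w, v, u).
vector[num_data] mu;
mu = csr_matrix_times_vector(num_data, num_basis, w, v, u, beta);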

You don’t ever want to create diagonal matrices and multiply like this:

K_X[i,j] * diag_matrix(rep_vector(1,num_basis))[k,l]

Use diag_post_multiply instead.
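For instance, if the intent is to scale the columns of K_X by some vector w (w here is just a placeholder), this does the same job without ever materialising the num_basis x num_basis diagonal matrix:

// Equivalent to K_X * diag_matrix(w), but without building the diagonal matrix.
K_X_scaled = diag_post_multiply(K_X, w);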

The real key is cutting down numbers of operations here. But that’s only in transformed data, so it’s no big deal as it only happens once and there are no derivatives.

Rather than multi_normal, it’s more efficient (both in terms of time and arithmetic stability) to use Cholesky factors of covariance matrices via multi_normal_cholesky. Rather than inverting, there is also the multi_normal_prec distribution (the precision is the inverse of the covariance).
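A sketch of the Cholesky route, with placeholder names, assuming the covariance Sigma is fixed so the factorisation can be done once in transformed data:

transformed data {
  matrix[num_basis, num_basis] L_Sigma;
  L_Sigma = cholesky_decompose(Sigma);  // L_Sigma * L_Sigma' == Sigma
}
model {
  theta ~ multi_normal_cholesky(mu, L_Sigma);
}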

Those gamma(1, 0.005) priors can be bad news because they tend to pull estimates away from zero quite strongly; Andrew’s written about this at length and his papers are cited in the priors discussion in the regression chapter of the manual.

You want to vectorize this

for (i in 1:num_data) {
  Y[i] ~ normal(Y_hat[i] + error, sigma);
  error = kappa * (Y[i] - Y_hat[i]);
}

as

Y ~ normal(Y_hat + kappa * (Y - Y_hat), sigma);

What you have is wrong because error is getting defined after it’s used. I have no idea what you’re trying to do here, but this looks suspicious given that Y occurs on both the left and right of the ~.

This doesn’t do anything given that you’ve properly declared constraints on kappa—every variable is implicitly given a uniform prior on all values satisfying its constraints:

  kappa ~ uniform (-1, 1) ;

but you’ll need to make Y_hat a vector to allow the vector arithmetic (vector plus scalar broadcasts the scalar to each dimension).
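As for kappa, a declaration like this already implies the flat prior over (-1, 1), so the uniform statement can simply be dropped:

parameters {
  real<lower=-1, upper=1> kappa;  // AR(1) coefficient, constrained to the stationary region
}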

Hi Bob, thanks for your detailed reply. I will definitely look into some of the functions you mentioned above to reduce the number of operations in my programme, and perhaps change the priors on the lambda terms in my model block.

Regarding the part
for (i in 1:num_data) {
  Y[i] ~ normal(Y_hat[i] + error, sigma);
  error = kappa * (Y[i] - Y_hat[i]);
}

which you suggested changing to

Y ~ normal(Y_hat + kappa * (Y - Y_hat), sigma);

Actually, what I want to model here is
Y_i = a0 * X_i + a * f(X_i) + b * f(X_i, Z_i) + error_i,
where f(X) and f(X, Z) are modelled by a univariate B-spline and a bivariate B-spline with penalty priors.
Instead of i.i.d. errors, the errors here follow an AR(1) process:
error_i = kappa * error_{i-1} + w_i, where w_i ~ N(0, sigma^2).

I was looking at an old post, https://groups.google.com/forum/#!topic/stan-users/A42gKWdLLTQ, about fitting an AR process to the error terms, and that is what I have tried to implement. Is this the right (or at least an efficient) way of modelling the AR process for the errors?
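Concretely, the likelihood I am aiming for is roughly this (a sketch, not my exact code; the first observation gets the stationary marginal of the AR(1) process):

// error[i] = kappa * error[i-1] + w[i], with sd(w[i]) = sigma
Y[1] ~ normal(Y_hat[1], sigma / sqrt(1 - kappa * kappa));
for (i in 2:num_data)
  Y[i] ~ normal(Y_hat[i] + kappa * (Y[i - 1] - Y_hat[i - 1]), sigma);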

The key may be to move to a non-centered parameterization if you have a time series model; otherwise, you get a lot of prior dependency among the parameters. That’s fine if you have a lot of data, but you usually don’t in a time series. So rather than

alpha[t] ~ normal(alpha[t-1], sigma);

you can use

alpha[2:T] = alpha[1] + cumulative_sum(alpha_raw[2:T] * sigma);
alpha_raw ~ normal(0, 1);

and treat alpha[1] separately, maybe with

alpha[1] = alpha_raw[1] * sigma1;
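Putting those pieces together, a minimal sketch of the non-centered construction (the block structure and names like T and sigma1 are assumptions, not taken from your model):

data {
  int<lower=2> T;
}
parameters {
  vector[T] alpha_raw;   // standard-normal innovations
  real<lower=0> sigma;   // innovation scale for t > 1
  real<lower=0> sigma1;  // scale of the initial state
}
transformed parameters {
  vector[T] alpha;
  alpha[1] = alpha_raw[1] * sigma1;
  alpha[2:T] = alpha[1] + cumulative_sum(alpha_raw[2:T] * sigma);
}
model {
  alpha_raw ~ normal(0, 1);  // implies alpha[t] ~ normal(alpha[t - 1], sigma) for t > 1
  // ... likelihood in terms of alpha ...
}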