"Fixed_param" mode agonisingly slow

Hello,

I have a Gaussian Process model that’s agonisingly slow in “fixed_param” mode on RStan 2.19.2 in Windows.

When I run Stan in NUTS sampling mode it takes a few minutes for 100 iterations.

I have inserted print statements at the end of the transformed data block and the start of the generated quantities block.

What happens is:

  • The print statement at the end of the transformed data block appears in a fraction of a second
  • Stan then prints out SAMPLING FOR MODEL 'GP' NOW (CHAIN 1).
  • Nothing happens for a long time, although the CPU seems to be active (the Windows task manager shows RStudio doing about one core's worth of work)
  • After a long time, Stan prints my print statement at the start of the generated quantities block, Beginning generated quantities...
  • Stan then claims that the sampling is done in 8 seconds
Chain 1:  Elapsed Time: 0 seconds (Warm-up)
Chain 1:                8.769 seconds (Sampling)
Chain 1:                8.769 seconds (Total)

The overall run time is 26 minutes.

My generated quantities block has one main loop over the number of simulations I want. The output is a matrix that's 150 rows × 5000 columns (150 rows = number of simulations, 5000 columns = data points).

Here’s my call to Stan: stanFit_predict <- sampling(stanModel_predict, datForStan_predict, chains=1, iter=1, warmup=0, algorithm="Fixed_param")

Any idea what’s going on here? Why does it take ~26 minutes to go from the transformed data block to start executing the generated quantities block?

(There’s no parameters block or transformed parameters block and the model block is empty).

Thanks in advance!


Something seems really weird. Could you post the Stan program?

I won't post the whole thing because it contains some quite lengthy functions, but here's the core code:

data{
  int<lower=1> ntrain; // Number of data points
  int<lower=1> npred; // Number of predictions to make
  vector[ntrain] y; // Data
  int<lower=1> nx; // Number of x predictors
  matrix[ntrain,nx] Xtrain;
  matrix[npred,nx] Xpred;
  int<lower=0,upper=1> enableRegressors; // Should separate ARD kernel be used for regressors
  vector[ntrain] t; // Time periods for the training data
  real<lower=0> timeLengthPrior; // Guess at the temporal autocorrelation length scale

  // Outputs from fitted GP model
  int nsim; // Number of simulations in the input data
  real sigma[nsim]; // Observational noise

  vector[nx] L_X[nsim]; // Length scale of the X variables
  real A_ARD_X[nsim]; // Amplitude of the GP for the predictors
  vector[nx] beta_X[nsim]; // Linear regression coefficients

  real A_time[nsim]; // Amplitude of the temporal fluctuations
  real l_time[nsim]; // Length scale of the temporal fluctuations

  real A_RQ[nsim]; // Amplitude
  real l_RQ[nsim]; // Length scale
  real alpha_RQ[nsim]; // Alpha

  real A_RQ_2[nsim]; // Amplitude
  real l_RQ_2[nsim]; // Length scale
  real alpha_RQ_2[nsim]; // Alpha
}

// ############################# Transformed data ##########################
transformed data{
  real nugget = 1e-5; // Nugget to be added to covariance matrices to prevent ill conditioning

  matrix[ntrain+npred, nx] X; // Combined Xtrain and Xpred
  matrix[ntrain+npred, nx] X_rescaled; // Rescaled version of X

  vector[nx] Xtrain_colmin; // Minimum of each column of Xtrain - used for rescaling
  vector[nx] Xtrain_colmax; // Maximum of each column of Xtrain - used for rescaling

  vector[ntrain] y_standardised; // Normalise y to have mean zero and standard deviation one - will accelerate convergence

  real ysd; // Standard deviation of the data
  real ymean; // Mean of the data - used for mildly informative prior

  // Combine Xtrain and Xpred into one matrix
  for(k in 1:nx){
    X[1:ntrain,k] = Xtrain[,k]; // Xtrain
    X[(ntrain+1):(ntrain+npred),k] = Xpred[,k]; // Xpred
  }

  // Calculate the scale of the X predictors
  for(k in 1:nx){
    Xtrain_colmin[k] = min(Xtrain[,k]);
    Xtrain_colmax[k] = max(Xtrain[,k]);
  }

  // Rescale X based on the Xtrain min and max so that predictors lie in the range [-0.5, 0.5]
  for(k in 1:nx){
    for(i in 1:(ntrain+npred)){
      X_rescaled[i,k] = (X[i,k]-Xtrain_colmin[k]+1e-9)/(Xtrain_colmax[k]-Xtrain_colmin[k]+1e-9)-0.5; // Need small number to avoid divide by zero errors if data all constant
    }
  }

  ymean = mean(y);
  ysd = sd(y);

  // Standardise y data
  for(i in 1:ntrain){
    y_standardised[i] = (y[i] - ymean)/ysd;
  }

  print("...Done with transformed data...")

}

model{
}

generated quantities{
  matrix[nsim, npred] ypred; // Predictions

  print("Beginning generated quantities...")
  {

    for(p in 1:nsim){
      {
        vector[ntrain+npred] meanfunc; // Include the predictions for simplicity now
        real sigma_sq_use; // Square of observational noise

        matrix[ntrain, ntrain] K; // Covariance matrix for the training data
        matrix[ntrain, npred] K_fitpred; // Covariance between the training data and predictions
        vector[npred] ypred_standardised;

        print("Prediction iteration ",p," of ",nsim)

        // Calculate the mean function
        sigma_sq_use = square(sigma[p]); // Squared observational noise
        meanfunc =  X_rescaled*beta_X[p];

        // Calculate covariance matrix for fit
        K = calc_covariance(1, ntrain, 1, ntrain, L_X[p], X_rescaled[1:ntrain,],
          A_time[p], l_time[p], t,
          A_RQ[p], alpha_RQ[p], l_RQ[p],
          A_RQ_2[p], alpha_RQ_2[p], l_RQ_2[p],
          A_ARD_X[p], enableRegressors, 0);

        // Calculate covariance for the cross terms
        K_fitpred = calc_covariance(1, ntrain, ntrain+1, ntrain+npred, L_X[p], X_rescaled,
          A_time[p], l_time[p], t,
          A_RQ[p], alpha_RQ[p], l_RQ[p],
          A_RQ_2[p], alpha_RQ_2[p], l_RQ_2[p],
          A_ARD_X[p], enableRegressors, 0);

        // Now predict the expected value of the GP
        ypred_standardised = gp_pred_nonoise(K, K_fitpred, y_standardised,
          meanfunc[1:ntrain], meanfunc[(ntrain+1):(ntrain+npred)], sigma[p], nugget);
        for(i in 1:npred){
          ypred[p,i] = ypred_standardised[i]*ysd+ymean; // Get back onto the original scale
        }
      }
    }
  }
}

The functions that sit under calc_covariance() calculate the various covariance terms in the GP.

Anything indexed by nsim relates to parameters that are fitted through a separate model sharing very similar code. That is, I'm passing in the MCMC iterations from that model (fitted in different Stan code) and then using the loop over nsim to make predictions at new points based on the posterior.

For reference, nsim=100, ntrain=150 and npred=5000.

(The same code and inputs with npred=5000, when run through the original NUTS sampling, have no issues.)
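Roughly, the posterior draws get passed in along these lines (a simplified sketch rather than my exact code; stanFit_gp is a placeholder for the fit of the first model, and some of the scalar values are placeholders too):

library(rstan)

draws <- extract(stanFit_gp) # posterior draws from the separately fitted GP model

datForStan_predict <- list(
  ntrain = nrow(Xtrain), npred = nrow(Xpred), nx = ncol(Xtrain),
  y = y, Xtrain = Xtrain, Xpred = Xpred,
  enableRegressors = 1, t = t, timeLengthPrior = 1, # placeholder values
  nsim    = length(draws$sigma),   # one entry per retained posterior draw
  sigma   = draws$sigma,           # -> real sigma[nsim]
  L_X     = draws$L_X,             # draws x nx matrix -> vector[nx] L_X[nsim]
  A_ARD_X = draws$A_ARD_X,
  beta_X  = draws$beta_X,
  A_time  = draws$A_time,  l_time  = draws$l_time,
  A_RQ    = draws$A_RQ,    l_RQ    = draws$l_RQ,    alpha_RQ   = draws$alpha_RQ,
  A_RQ_2  = draws$A_RQ_2,  l_RQ_2  = draws$l_RQ_2,  alpha_RQ_2 = draws$alpha_RQ_2
)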

This might be the issue with R relying on nested lists for arrays of arrays.

The data are passed in as matrices. Do you think there's an issue in the conversion? I can always read them in as matrices and copy them into arrays of arrays so as not to break the type definitions in my code.

I just meant that R's lists-of-lists are slow when you try to iterate over them. When rstan copies data over to Stan (and back), it might be really slow if either the input or the output has nested lists. This has come up before on the forums. You could throw out most of your model and just generate equivalent-sized I/O to test whether that is the issue. IDK what the alternative is for you, so I'm not sure how much effort it's worth to you.
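For example, something along these lines (an untested sketch) would isolate the cost of copying the output back to R, independent of the GP calculations:

library(rstan)

# Dummy model with no real computation: its only job is to hand an
# nsim x npred matrix back to rstan from generated quantities.
dummy_code <- "
data {
  int<lower=1> nsim;
  int<lower=1> npred;
}
model {
}
generated quantities {
  matrix[nsim, npred] ypred;
  for (p in 1:nsim)
    for (i in 1:npred)
      ypred[p, i] = normal_rng(0, 1);
}
"

dummy_model <- stan_model(model_code = dummy_code)
system.time(
  dummy_fit <- sampling(dummy_model, data = list(nsim = 150, npred = 5000),
                        chains = 1, iter = 1, warmup = 0, algorithm = "Fixed_param")
)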

OK, I’ve solved the problem. It’s not related to the fixed parameters mode - it’s apparently related to trying to allocate large arrays that are retained as output for rstan.

Things I tried:

  • Moving almost all the workload to the transformed data block
  • Reading in as a matrix and then copying into an array of vectors
  • Putting in a fake parameter and then trying NUTS sampling rather than Fixed_param

The problem turns out to be this line in the generated quantities block:

matrix[nsim, npred] ypred; // Predictions

With iter=1, nsim=50 and npred=5000 this is surprisingly slow and takes more than ten minutes.

By doing the averaging over nsim within Stan, in the generated quantities block, and returning an niter × npred array, the mean predictions take about ten seconds and the averaging itself takes about 0.1 s.

This means that I can't inspect the posterior distribution of the predictions externally, but can only look at derived quantities like the mean and standard deviation. This is annoying but not fatal for me.

I don't know why setting up a 1 × 50 × 5000 array would be so slow. At 8 bytes per entry it's only 2 MB.

Here’s the new generated quantities block:

generated quantities{
  vector[npred] ypred; // Mean of the predictions over the simulations
  vector[npred] ypred_sd; // Standard deviation of the predictions over the simulations

  print("Beginning generated quantities...")
  {
    /* NOTE: The averaging is done within Stan rather than outside Stan; otherwise a very large
    number of elements has to be retained in memory as rstan output, which is very slow. */
    matrix[nsim, npred] ypred_sim; // Predictions over all simulations

    // Generate predicted values based on each simulation iteration
    for(p in 1:nsim){
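      // (The covariance and GP prediction code from the original block, which fills in
      // ypred_standardised, is omitted here.)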
      for(i in 1:npred){
        ypred_sim[p,i] = ypred_standardised[p,i]*ysd+ymean; // Get back onto the original scale
      }
    }

    // Generate mean and sd of predictions over the simulation iterations for each data point
    for(i in 1:npred){
      ypred[i] = mean(ypred_sim[,i]);
      ypred_sd[i] = sd(ypred_sim[,i]);
    }
  }
}
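On the R side the summaries then come back as length-npred quantities with something like this (sketch):

pred <- rstan::extract(stanFit_predict, pars = c("ypred", "ypred_sd"))
ypred_mean <- as.vector(pred$ypred)   # a 1 x npred array (since iter = 1), flattened
ypred_sd   <- as.vector(pred$ypred_sd)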

Thanks so much for putting in the work to document this. This has come up before with rstan, and it's clearly fixable once rstan's output is allowed to change (I think Ben has been saying rstan3, due to whatever constraints he's dealing with… IIRC it's that he didn't want to change the data output formats or something like that).
