How can I speed up my reduce_sum Stan code?

I have 30,000 samples to run in my program. I use the reduce_sum method, but it still takes a long time to run.

My code is below. Can I rewrite the loop inside reduce_sum as a vectorized statement?

functions {
  real partial_sum(array[] int slice_y,
                   int start, int end,
                   matrix x1, matrix x2, array[] vector thresh,
                   vector beta1, array[] vector beta2, array[] int g) {
    real lp = 0;
    for (i in start:end)
      lp += ordered_logistic_lpmf(slice_y[i - start + 1] |
              x1[i] * beta1 + x2[i] * beta2[g[i]], thresh[g[i]]);
    return lp;
  }

  real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
    int K = num_elements(c) + 1;
    vector[K - 1] sigma = inv_logit(phi - c);
    vector[K] p;
    matrix[K, K] J = rep_matrix(0, K, K);
    // Induced ordinal probabilities
    p[1] = 1 - sigma[1];
    for (k in 2:(K - 1))
      p[k] = sigma[k - 1] - sigma[k];
    p[K] = sigma[K - 1];
    // Baseline column of Jacobian
    for (k in 1:K) J[k, 1] = 1;
    // Diagonal entries of Jacobian
    for (k in 2:K) {
      real rho = sigma[k - 1] * (1 - sigma[k - 1]);
      J[k, k] = -rho;
      J[k - 1, k] = rho;
    }
    return dirichlet_lpdf(p | alpha)
           + log_determinant(J);
  }
}

data {
  int<lower=1> N;                   // number of observations
  int<lower=1> K;                   // number of ordinal categories
  int<lower=1> D;
  int<lower=1> DN;
  array[N] int<lower=1, upper=K> y; // observed ordinals
  matrix[N, D - DN] x1;
  matrix[N, DN] x2;
  array[N] int g;                   // group index per observation
  int<lower=1> P;                   // P different thresholds
}
parameters {
  vector[D - DN] beta1;
  vector[DN] beta21;
  real<lower=0> sigma;
  vector<lower=0>[DN] Omega;
  array[P] ordered[K - 1] thresh;
  array[P - 1] vector[DN] yita;
  //cholesky_factor_corr[DN] L_Omega;
}


transformed parameters {
  array[P] vector[DN] beta2;
  beta2[1] = beta21;
  for (i in 2:P)
    beta2[i] = beta2[i - 1] + Omega .* yita[i - 1];
}
model {
  //vector[DN] Zero = rep_vector(0, DN);
  beta1 ~ normal(0, 10);
  beta21 ~ normal(0, 10);
  thresh[1] ~ induced_dirichlet(rep_vector(1, K), 0);
  sigma ~ normal(0, 1);
  Omega ~ normal(0, 1);
  //yita[1] ~ normal(0, 1);
  for (i in 1:(P - 1)) {
    (thresh[i + 1] - thresh[i]) ~ normal(0, sigma);
    //(beta2[i + 1] - beta2[i]) ~ normal(0, Omega);
    yita[i] ~ normal(0, 1);
  }
  target += reduce_sum(partial_sum, y, 1, x1, x2, thresh, beta1, beta2, g);
}

Any suggestions?

You can do a few things within partial_sum, though that may not be the bottleneck. You can sort the data so that the groups g are contiguous, which minimizes slicing of the array. You can also make lp an array and sum just once in the return statement:

int len = end - start + 1;
array[len] real lp;
for (i in 1:len) {
  lp[i] = ...;
}
return sum(lp);

cmdstanr has profiling that you can use to see what parts of the model are bottlenecks.
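For example, you could wrap the prior and likelihood contributions in profile blocks (available since Stan 2.26) and then read the timings from the fit with cmdstanr's fit$profiles(). A minimal sketch of the model block, assuming the same data and parameters as above:

model {
  profile("priors") {
    beta1 ~ normal(0, 10);
    // ... remaining prior statements ...
  }
  profile("likelihood") {
    target += reduce_sum(partial_sum, y, 1, x1, x2, thresh, beta1, beta2, g);
  }
}

If the "likelihood" row dominates the total time, the reduce_sum body is worth optimizing; if the gradient time is dominated elsewhere, parallelizing further will not help much.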

Also, in reduce_sum you are setting the grainsize to 1, which tells Stan to decide the split automatically. If memory copies, rather than CPU work, are the limit, you can start with a more conservative split: for example, divide the number of observations by the number of physical cores per chain. That way you use as many physical cores as you have available while minimizing memory copies. Or, depending on how many groups you have, you might slice the data across groups; then inside partial_sum there is no per-observation loop at all.
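A sketch of that group-sliced variant, assuming the data are sorted by g and you precompute hypothetical per-group start/end row indices gs and ge in transformed data (these names are not from the original model):

functions {
  // Slice over group indices rather than observations; each group's
  // likelihood is one vectorized ordered_logistic_lpmf call.
  real partial_sum_groups(array[] int slice_groups, int start, int end,
                          array[] int y, matrix x1, matrix x2,
                          array[] vector thresh, vector beta1,
                          array[] vector beta2,
                          array[] int gs, array[] int ge) {
    real lp = 0;
    for (j in 1:size(slice_groups)) {
      int p = slice_groups[j];
      lp += ordered_logistic_lpmf(y[gs[p]:ge[p]] |
              x1[gs[p]:ge[p]] * beta1 + x2[gs[p]:ge[p]] * beta2[p],
              thresh[p]);
    }
    return lp;
  }
}

You would then call it as target += reduce_sum(partial_sum_groups, group_ids, 1, y, x1, x2, thresh, beta1, beta2, gs, ge), where group_ids is simply the array {1, ..., P}. With only P slice elements, each partial does one matrix-vector multiply and one vectorized lpmf per group, which is usually much cheaper than N scalar lpmf calls.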