Rstan versus cmdstan run times

I am trying to understand why rstan models take much longer to run than the same model built and compiled with cmdstan (through cmdstanr) in R. One model takes 0.1-0.2 seconds with cmdstan but 2-3 seconds with rstan for the same number of iterations. For context, this is in an R package where an MCMC step is included in model fitting. Currently it uses cmdstanr for that step, but I wanted to compile the Stan program as part of the package, so I looked at rstan, ideally incorporating the Stan program into the broader C++ program by making use of rstan’s stan_fit class (although I haven’t got this to work yet). An example of the Stan program is at the bottom. I have tried compiling the R package with rstan using compiler options like -fno-math-errno -O3, but it comes nowhere close to cmdstanr’s run times. Is there any way of getting rstan to run comparably to cmdstanr?

An example of the Stan program:

functions {
  real partial_sum1_lpdf(array[] real y, int start, int end) {
    return std_normal_lpdf(y[start:end]);
  }
  real partial_sum2_lpdf(array[] real y, int start, int end, vector mu, vector sigma) {
    return normal_lpdf(y[start:end] | mu[start:end], sigma[start:end]);
  }
}
data {
  int N;
  int Q;
  vector[N] Xb;
  matrix[N, Q] Z;
  array[N] real y;
  vector[N] sigma;
}
parameters {
  array[Q] real gamma;
}
model {
  int grainsize = 1;
  // standard normal prior on gamma, evaluated through reduce_sum
  target += reduce_sum(partial_sum1_lpdf, gamma, grainsize);
  // normal likelihood for y with mean Xb + Z * gamma and sd sqrt(sigma)
  target += reduce_sum(partial_sum2_lpdf, y, grainsize, Xb + Z * to_vector(gamma), sqrt(sigma));
}
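
For anyone wanting to reproduce the comparison, a rough timing sketch might look like the following. It is only an illustration: the helper name time_backends, the file path model.stan, and the simulated data are all hypothetical and not taken from my package. It reads the sampler-reported times from each interface (fit$time() in cmdstanr, get_elapsed_time() in rstan), so compilation time is not counted.

```r
# Hypothetical helper: fit the same Stan file with both interfaces and
# return the sampler-reported elapsed time in seconds for one chain each.
# Compilation time is not part of either number.
time_backends <- function(stan_file, data,
                          iter_warmup = 500, iter_sampling = 500, seed = 1) {
  out <- c(cmdstanr = NA_real_, rstan = NA_real_)

  if (requireNamespace("cmdstanr", quietly = TRUE)) {
    mod <- cmdstanr::cmdstan_model(stan_file)
    fit <- mod$sample(data = data, chains = 1, seed = seed,
                      iter_warmup = iter_warmup,
                      iter_sampling = iter_sampling,
                      refresh = 0)
    # $time()$chains has one row per chain with warmup, sampling and total
    out["cmdstanr"] <- sum(fit$time()$chains$total)
  }

  if (requireNamespace("rstan", quietly = TRUE)) {
    sm <- rstan::stan_model(file = stan_file)
    # In rstan, iter counts warmup plus sampling iterations
    fit <- rstan::sampling(sm, data = data, chains = 1, seed = seed,
                           warmup = iter_warmup,
                           iter = iter_warmup + iter_sampling,
                           refresh = 0)
    # get_elapsed_time() returns a chains x (warmup, sample) matrix
    out["rstan"] <- sum(rstan::get_elapsed_time(fit))
  }

  out
}

# Simulated data matching the data block above (values are arbitrary)
set.seed(1)
N <- 100
Q <- 5
sim_data <- list(
  N = N, Q = Q,
  Xb = rnorm(N),
  Z = matrix(rnorm(N * Q), N, Q),
  y = rnorm(N),
  sigma = rep(1, N)
)

stan_file <- "model.stan"  # hypothetical path to the program above
if (file.exists(stan_file)) {
  print(time_backends(stan_file, sim_data))
}
```

Using one chain, the same seed, and the same warmup and sampling counts on both sides keeps the two numbers as comparable as possible.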

What version of RStan?
What version of CmdStan is CmdStanR running?
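
In case it helps, something along these lines should print the relevant versions. It is a small sketch rather than an official recipe: report_stan_versions is a made-up helper name, and each lookup is guarded so the script still runs if one of the packages is missing.

```r
# Hypothetical helper: collect version information for both Stan interfaces.
report_stan_versions <- function() {
  out <- list(r = as.character(getRversion()))

  if (requireNamespace("rstan", quietly = TRUE)) {
    out$rstan <- as.character(utils::packageVersion("rstan"))
    # Version of the Stan library that this rstan build uses
    out$rstan_stan <- rstan::stan_version()
  }

  if (requireNamespace("cmdstanr", quietly = TRUE)) {
    out$cmdstanr <- as.character(utils::packageVersion("cmdstanr"))
    # cmdstan_version() errors when no CmdStan installation is found
    out$cmdstan <- tryCatch(cmdstanr::cmdstan_version(),
                            error = function(e) NA_character_)
  }

  out
}

versions <- report_stan_versions()
str(versions)
```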