Efficiency of constrained Poisson regression

I have a large model (~100k parameters, ~1.6M observations) with a Poisson GLM likelihood. The relevant bits are:

...
parameters {
    vector[n_coeffs] gamma_raw;
    ...
}

transformed parameters {
    vector[n_coeffs] gamma = 0.1*gamma_raw;
    ...
}

model {
  ...
  y ~ poisson_log_glm(X, alpha, gamma);
} 

The scaling is so that gamma_raw is on unit scale. n_coeffs is small (~40); the high-dimensional part is alpha, and a lot of work goes on in transformed parameters to set it up.

This model fits well enough, but a couple of the gammas are not as expected. I realised that domain knowledge implies some of them should be greater than others, or greater than zero. I have therefore imposed constraints as follows:

parameters {
    positive_ordered[2] zeta_euro;
    ordered[4]          zeta_intl;
    vector[2]           zeta_rest;
    vector[n_coeffs-8]  gamma_raw;
    ...
}

transformed parameters {
    vector[8]          zeta;
    vector[n_coeffs-8] gamma = 0.1*gamma_raw;

    zeta = 0.1*append_row(append_row(zeta_euro, zeta_intl), zeta_rest);
    ...
}

model {
  ...
  y ~ poisson_log_glm(X, alpha, append_row(zeta, gamma));
} 

This fits well and the zetas come out as expected, but it takes 2-3x longer in wall-clock time, even though treedepth, acceptance rate and step size are all essentially the same (treedepth exactly equal). I know that append_row makes copies, but it is only ~40 values here. That leaves the constraints: are they inherently costly, or can I do something clever to get back close to the previous fitting times?
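For what it is worth, here is a minimal sketch of one way to sidestep the nested append_row calls: assemble the full coefficient vector once in transformed parameters by assigning segments. The name beta and the exact segment indices are assumptions for illustration, not the actual model:

transformed parameters {
    // Hypothetical combined coefficient vector, filled segment by segment,
    // so the model block can pass a single vector to the GLM without
    // building it via append_row there.
    vector[n_coeffs] beta;
    beta[1:2]        = 0.1*zeta_euro;   // constrained: positive and ordered
    beta[3:6]        = 0.1*zeta_intl;   // constrained: ordered
    beta[7:8]        = 0.1*zeta_rest;   // unconstrained pair
    beta[9:n_coeffs] = 0.1*gamma_raw;   // remaining unconstrained coefficients
    ...
}

model {
  ...
  y ~ poisson_log_glm(X, alpha, beta);
}

Stan's profile() blocks (available in recent versions) could also be used to check whether the constraint transforms or the GLM likelihood dominate the run time.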

Update: my apologies to anyone who spent time thinking about this. The problem was my local hardware; in the cloud, the constrained model runs just as fast as the unconstrained one.