Run time

If you are using algorithm="NUTS", the default, I’d say it is very (extremely?) unlikely that the number of integration steps stays constant along the entire chain, that’s how regular HMC algorithm works, and probably the main motivation for a dynamic criterion for the number of time steps for each MCMC iteration is that there will not normally be a fixed number that will work most efficiently everywhere in parameter space.
So that is weird, but I think there are other you could give you better advice on that than I could if that keeps happening.

I haven’t sampled spherical harmonics before, but I did have an issue with greatly varying run times in PyStan for a model with many parameters. I found that some of the chains that ended faster were actually converging around a lower value of the posterior (this thread is about a different issue, but the PyStan traces illustrate what I’m talking about). I think that can be attributed to the step size and mass matrix optimized for the different chains, which was not good enough for some of them.

You said Rhat ~ 1, so that is probably not exactly the case, but it can still be that the traces are different during burnin and that could explain the difference in run time – a chain that takes longer to converge may be using very small time steps and wasting computation. So maybe it’s useful to look at the complete trace of 'lp__' and 'stepsize__' and see if that could be a reason.