I would first check whether the posterior is pathological in some way: improving the posterior geometry can improve speed by a much larger factor than optimizing the computation. What is the distribution of treedepth across your iterations? (You can use nuts_params to get it in R; I'm not sure about Python, but there certainly is a way.)
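For the Python side, here is a minimal sketch of how the treedepth distribution could be tabulated once you have the per-iteration values. The helper name `treedepth_distribution` is hypothetical, the example values are made up, and the way of extracting the values from CmdStanPy mentioned in the comment is my assumption, so please check it against the interface you actually use.

```python
from collections import Counter


def treedepth_distribution(treedepths, max_treedepth=10):
    """Tabulate treedepth values (hypothetical helper, not part of any library).

    Returns a dict mapping treedepth -> number of iterations, and the
    fraction of iterations that reached max_treedepth (Stan's default is 10).
    """
    depths = [int(d) for d in treedepths]
    if not depths:
        raise ValueError("no treedepth values supplied")
    counts = dict(sorted(Counter(depths).items()))
    at_max = sum(1 for d in depths if d >= max_treedepth) / len(depths)
    return counts, at_max


# Assumption: with CmdStanPy the values could come from something like
#   fit.method_variables()["treedepth__"].flatten()
# Here we use made-up numbers purely for illustration.
example = [3, 3, 4, 4, 4, 5, 10, 10]
counts, frac_at_max = treedepth_distribution(example)
print(counts)
print(f"fraction at max treedepth: {frac_at_max:.2f}")
```

If most iterations sit at or near the maximum treedepth, that would point towards a geometry problem rather than slow computation per gradient evaluation.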
Most of the hints in Divergent transitions - a primer are also applicable to debugging pathological geometries, especially simplifying the model and trying to understand it better. I admit I have basically no experience with the math of the hazard/competing risk models, so I can't really help that much directly…
Also, if you end up deciding that you want to improve computation, then I would start by using the profiler (see e.g. Profiling Stan programs with CmdStanR • cmdstanr) to see which parts are actually slow.