Requesting a suggestion for a faster implementation of a model involving mixture distribution

I would first check whether the posterior is pathological in some way: improving the posterior geometry can improve speed by a much larger factor than optimizing the computation. What is the distribution of treedepth across your iterations? (You can use nuts_params to get it in R; I'm not sure about Python, but there certainly is a way.)
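To make the treedepth check concrete, here is a minimal Python sketch of the kind of summary I have in mind. The function name `treedepth_distribution` and the example values are made up for illustration. I am assuming you can get the per-iteration treedepths out of your fit as a flat sequence; with CmdStanPy I believe `fit.method_variables()["treedepth__"]` gives them as a draws-by-chains array, but please check that against the docs for your version. The default cap of 10 matches Stan's default `max_treedepth`.

```python
from collections import Counter


def treedepth_distribution(treedepths, max_treedepth=10):
    """Return the share of iterations at each tree depth and the share at the cap.

    `treedepths` is any flat iterable of per-iteration tree depths
    (flatten a draws-by-chains array first, e.g. with `.ravel()`).
    """
    depths = [int(d) for d in treedepths]
    if not depths:
        raise ValueError("no treedepth values supplied")
    counts = Counter(depths)
    total = len(depths)
    # Share of iterations at each observed depth, ordered by depth.
    table = {depth: counts[depth] / total for depth in sorted(counts)}
    # Iterations that reached the cap were stopped early by the sampler.
    frac_at_max = sum(1 for d in depths if d >= max_treedepth) / total
    return table, frac_at_max


# Made-up values for illustration only, not output from a real fit.
example = [3, 3, 4, 4, 4, 5, 10, 10]
table, frac_at_max = treedepth_distribution(example)
for depth, share in table.items():
    print(f"treedepth {depth}: {share:.1%}")
print(f"share at max_treedepth: {frac_at_max:.1%}")
```

If a large share of iterations sits at or near the maximum treedepth, that points to a geometry problem rather than slow code, since each extra level of depth roughly doubles the number of gradient evaluations per iteration.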

Most of the hints at Divergent transitions - a primer are also applicable for debugging pathological geometries, especially simplifying the model and trying to understand it better. I admit I have basically no experience with the math of hazard/competing risk models, so I can’t really help that much directly…

Also, if you end up deciding that you want to improve the computation, then I would start by using the profiler (see e.g. Profiling Stan programs with CmdStanR • cmdstanr) to see which parts are actually slow.