If the generative processes are fully independent and share no parameters (e.g. covarying random effects), then it’s fine to model them separately. That can even be preferable: if there are problems in the posterior geometry, separating the models will easily and definitively localize those problems to one side or the other. A potential advantage of fitting jointly is that, if using brms or similar, you get all the machinery to predict the response in one step.

I agree that the continuous data looks a bit heavy-tailed here. I might start with a Student-t observational model with a relatively conservative prior on \nu^{-1}, such as \text{normal}(\nu^{-1} \mid 0, 0.11), which keeps most of the prior mass on \nu above 4-ish.
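As a rough check on that prior, here is a minimal Python sketch that computes how much prior mass lands on \nu > 4. It assumes the normal prior is truncated at zero (a half-normal, since \nu^{-1} is positive); the function name `prior_mass_nu_above` is just illustrative.

```python
import math

def prior_mass_nu_above(nu_min, scale):
    """P(nu > nu_min) when nu^{-1} ~ half-normal(0, scale) (assumed truncation at zero)."""
    # nu > nu_min is equivalent to nu^{-1} < 1 / nu_min.
    # The half-normal CDF at x is erf(x / (scale * sqrt(2))).
    return math.erf((1.0 / nu_min) / (scale * math.sqrt(2.0)))

p_above_4 = prior_mass_nu_above(4.0, 0.11)
print(f"P(nu > 4) = {p_above_4:.3f}")
```

Under that assumption the result is about 0.98, so the prior does not strictly exclude \nu < 4 but puts very little mass there.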

The separation here is a consequence of mixing discrete and continuous processes. Under the continuous model an exact zero has probability zero, so every observed zero must have come from the discrete component. Consider what would happen for two discrete processes, such as zero-inflating a Poisson model. Here the observation y = 0 has a non-zero probability of arising from either model, so we can’t precisely assign that observation to one model or the other. Instead we have to fit the joint model, which allows for the possibility that the zero came from either component.
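To spell the separation out, here is a sketch in notation that is my own assumption rather than anything from the thread: \theta is the probability of a zero, f(y \mid \phi) is the continuous density, and z_n = 1 when y_n = 0 and z_n = 0 otherwise.

$$
p(y \mid \theta, \phi) =
\begin{cases}
\theta, & y = 0 \\
(1 - \theta) \, f(y \mid \phi), & y \neq 0
\end{cases}
$$

$$
\log p(y_{1:N} \mid \theta, \phi)
= \sum_{n=1}^{N} \log \text{Bernoulli}(z_n \mid \theta)
+ \sum_{n : y_n \neq 0} \log f(y_n \mid \phi)
$$

The first sum involves only \theta and the second only \phi, so with independent priors the posterior factors and the two pieces can be fit separately. For a zero-inflated Poisson the zero term is instead \theta + (1 - \theta) e^{-\lambda}, which couples \theta and \lambda and does not split this way.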

That’s the difference between zero-inflation and the hurdle model. With zero-inflated Poisson, there are two potential sources of a zero: the Poisson or the inflation. With the hurdle model, the zero always comes from the zero component of the mixture.
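The difference is easy to see numerically. Below is a minimal Python sketch of the two probability mass functions for the Poisson case; `theta` (the inflation or hurdle probability) and `lam` (the Poisson rate) are names I am assuming for illustration, and the hurdle version uses a zero-truncated Poisson for the positive counts.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability mass at y."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, theta, lam):
    """Zero-inflated Poisson: a zero can come from the inflation or from the Poisson."""
    if y == 0:
        return theta + (1.0 - theta) * poisson_pmf(0, lam)
    return (1.0 - theta) * poisson_pmf(y, lam)

def hurdle_pmf(y, theta, lam):
    """Hurdle Poisson: zeros come only from the hurdle; positives from a zero-truncated Poisson."""
    if y == 0:
        return theta
    return (1.0 - theta) * poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))

theta = 0.3
for lam in (0.5, 2.0):
    # P(y = 0) changes with lam under zero-inflation but not under the hurdle.
    print(lam, zip_pmf(0, theta, lam), hurdle_pmf(0, theta, lam))
```

In the hurdle version the probability of a zero is `theta` regardless of `lam`, which is why the zero part and the count part can be estimated independently; in the zero-inflated version the zero probability depends on both parameters.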