Are you suggesting that I model the transition matrix (e.g., with a Dirichlet regression) and a mixture of time distributions in which each component is associated with a state? Now that I think about it, if the transition matrix is independent of the dwell times, I can model transitions and dwell times separately, right?
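To make the "separately" idea concrete, here is a minimal sketch, assuming the visited state is observed for every dwell time and that transitions and dwell times really are independent. It uses plain row-normalized counts rather than a Dirichlet regression, and a per-state mean rather than a fitted mixture; the function name `fit_separately` and the toy data are made up for illustration.

```python
import numpy as np

def fit_separately(states, dwell_times, n_states):
    """Estimate the transition matrix and per-state dwell-time means independently."""
    states = np.asarray(states)
    dwell_times = np.asarray(dwell_times, dtype=float)

    # Transition part: count jumps between consecutive visited states.
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros.
    trans = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

    # Dwell-time part: one mean per state, computed without looking at transitions.
    mean_dwell = np.array([
        dwell_times[states == s].mean() if np.any(states == s) else np.nan
        for s in range(n_states)
    ])
    return trans, mean_dwell

# Toy (made-up) sequence of visited states and the time spent on each visit.
states = [0, 1, 0, 2, 0, 1]
dwell_times = [2.0, 1.0, 4.0, 3.0, 6.0, 3.0]
trans, mean_dwell = fit_separately(states, dwell_times, n_states=3)
```

The two estimates never touch each other's data, which is only justified under the independence assumption; if the dwell time depends on the next state, the two parts would have to be modeled jointly.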
Thanks for your input. I realize I need to think a bit longer and harder about the problem.