Hey Jens!
I think the important thing is whether you already have threading in your model or not. In terms of chains, each one needs a sufficiently long warmup plus some post-warmup draws, so spamming a lot of chains is not the right way to go. It also matters how accurate the results need to be, e.g. are you interested in tail quantities? That should help you figure out how many chains make sense for your model. Then multiply that by the number of threads per chain that make sense for your model, and you get a (really rough) rule of thumb for the number of cores you want. I’d recommend the compute-optimized (CPU) machines.
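To make that rule of thumb concrete, here is a tiny Python sketch. The function name and the numbers are made up for illustration and are not a recommendation for your model:

```python
def cores_needed(chains: int, threads_per_chain: int) -> int:
    """Rough core count: every chain runs in parallel and
    each chain uses its own group of threads."""
    if chains < 1 or threads_per_chain < 1:
        raise ValueError("chains and threads_per_chain must be >= 1")
    return chains * threads_per_chain


# Hypothetical example: 4 chains, 4 threads per chain.
print(cores_needed(chains=4, threads_per_chain=4))  # 16
```

Without threading in the model, threads per chain is 1, so the number of cores is simply the number of chains.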
Also, you might want to check out this thread if you haven’t already:
Mitzi mentions the folk theorem there (computational problems often point to a problem with the model), and in my experience speeding up a model is often impossible if the model is incorrect. So double-check everything and look for efficiency gains in the code (and then add threading).
I don’t know enough about HMMs to say anything about the compute time of 3-state vs. 2-state models, sorry.
On a final note: If you are looking for speed you should definitely check out CmdStanR! :)
Hope this was at least a bit helpful.
Cheers,
Max