This is surely helpful for the community, thanks. I am also still quite new to Stan and Torsten, so having these resources is very nice. As pointed out above, I would rather accept some non-optimized code in the beginning while I learn the concepts, and then gradually increase the complexity. I have also read your paper about within-chain parallelization; it is very helpful and well written! However, I have realized that implementing this would be one step too early, before I have fully understood some more fundamental aspects of Stan/Torsten. But it is on my list, and I am happy to tackle it at a more advanced stage in the future.