Unfortunately I know too little about Stan’s internals to really judge that, but I believe it shouldn’t be too difficult. Our tool can be adapted quite quickly (at least partially).
For the weakest level of support, with basic monitoring and basic warnings, you only need to send HTTP requests to the tool: an initial request that contains the file, the algorithm used, the number of burn-in samples and the number of samples. After that you can either send one request per sample or batch them every X samples (or every X seconds, it doesn’t matter to the tool). For PyMC this was really easy because the PyMC backends have a `record` method that is called on every sampler iteration, and I am sure something similar could be done for Stan. For Stan specifically, given that Stan programs are fully compiled (to C++, if I am not mistaken), adding some HTTP requests to the sampler when it’s compiled with a flag like `live_debugging` should be straightforward?
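To make the request flow concrete, here is a rough Python sketch of the "one initial request, then batched samples" idea. The endpoint paths (`/init`, `/samples`), the payload field names and the `LiveReporter` class are all made up for illustration; the tool’s real routes and schema may differ.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List

# Hypothetical endpoint paths; the real tool's routes may differ.
INIT_PATH = "/init"
SAMPLES_PATH = "/samples"

Transport = Callable[[str, bytes], None]


def http_transport(base_url: str) -> Transport:
    """Return a function that POSTs a JSON body to base_url + path."""
    def send(path: str, body: bytes) -> None:
        req = urllib.request.Request(
            base_url + path,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            resp.read()
    return send


class LiveReporter:
    """Buffers draws and sends them every `batch_size` draws or `max_delay_s` seconds."""

    def __init__(self, send: Transport, batch_size: int = 50,
                 max_delay_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._batch_size = batch_size
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._buffer: List[Dict] = []
        self._last_flush = clock()

    def start(self, model_file: str, algorithm: str,
              n_burnin: int, n_samples: int) -> None:
        # Initial request: file, algorithm, burn-in and sample counts.
        # (Assumed field names, not the tool's actual schema.)
        self._post(INIT_PATH, {
            "file": model_file,
            "algorithm": algorithm,
            "n_burnin": n_burnin,
            "n_samples": n_samples,
        })

    def record(self, draw: Dict) -> None:
        # Meant to be called once per sampler iteration.
        self._buffer.append(draw)
        too_many = len(self._buffer) >= self._batch_size
        too_old = self._clock() - self._last_flush >= self._max_delay_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered; call once more when sampling ends.
        if self._buffer:
            self._post(SAMPLES_PATH, {"samples": self._buffer})
            self._buffer = []
        self._last_flush = self._clock()

    def _post(self, path: str, payload: Dict) -> None:
        self._send(path, json.dumps(payload).encode("utf-8"))
```

The transport is passed in so the batching logic can be exercised without a running server; for real use you would construct it with something like `LiveReporter(http_transport("http://localhost:8000"))`, where the address is again only a placeholder.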
The nutpie feature seems neat too, but I believe it would be easier to implement with a subscriber-like/callback interface.
To get the model graph and support for the funnel warnings we currently rely on LASAPP (lasapp/lasapp on GitHub, the replication package for "Language-Agnostic Static Analysis of Probabilistic Programming"). I believe it has no Stan support, but we really only need an interface that our debugger can call that provides (a) a model graph, and (b) the RVs that have a scale parameter that is influenced by other RVs.
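As an illustration of what (a) and (b) amount to, here is a small Python sketch. It assumes the model graph is given as a dict from each RV to its parent RVs, and that we separately know which names feed each RV’s scale argument. Both input formats and the function names are my own invention for this example, not LASAPP’s or the debugger’s actual interface.

```python
from typing import Dict, List, Set


def ancestors(graph: Dict[str, List[str]], node: str) -> Set[str]:
    """All RVs that `node` depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = list(graph.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def funnel_candidates(graph: Dict[str, List[str]],
                      scale_inputs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each RV to the other RVs that influence its scale parameter.

    graph:        rv -> parent rvs (assumed format)
    scale_inputs: rv -> names appearing in its scale argument (assumed format)
    """
    known_rvs = set(graph)
    result: Dict[str, List[str]] = {}
    for rv, inputs in scale_inputs.items():
        influencing: Set[str] = set()
        for name in inputs:
            influencing.add(name)
            influencing |= ancestors(graph, name)
        # Keep only actual RVs, and never report an RV as influencing itself.
        influencing &= known_rvs
        influencing.discard(rv)
        if influencing:
            result[rv] = sorted(influencing)
    return result


# Eight-schools-like structure: theta ~ normal(mu, tau), y ~ normal(theta, sigma)
graph = {"mu": [], "tau": [], "theta": ["mu", "tau"], "y": ["theta"]}
scale_inputs = {"theta": ["tau"], "y": ["sigma"]}  # sigma is data, not an RV
print(funnel_candidates(graph, scale_inputs))  # {'theta': ['tau']}
```

Here `theta` is flagged because its scale `tau` is itself an RV, which is the classic funnel setup; `y` is not flagged because its scale is fixed data.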
And lastly, for warnings tailored to Stan, we need to add Stan to the languages in the tool itself, which shouldn’t be too complicated (see the PyMC language definition: InferlogHolmes-Appendix/InferLogHolmes/extension/webview-src/ppl-debugger-webview/src/PPL/pymc.ts on the main branch of ipa-lab/InferlogHolmes-Appendix on GitHub). Otherwise it might be confusing when a warning suggests to “change the target_accept to a higher value”, while Stan calls this adapt_delta. Also, the code change suggestions wouldn’t be great without it 😅
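Roughly, a per-language definition just needs to carry the language’s own names so the same warning can be phrased correctly. The sketch below is in Python for brevity (the actual definition in the tool is the TypeScript file linked above), and the `LanguageDef` fields and `divergence_warning` function are hypothetical, not the tool’s real structure. The Stan fix is written as a CmdStanPy-style call purely as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageDef:
    """Hypothetical per-language terminology used when rendering warnings."""
    name: str
    accept_param: str   # what the language calls the target acceptance rate
    fix_template: str   # example code change shown to the user


PYMC = LanguageDef("PyMC", "target_accept", "pm.sample(target_accept={value})")
# CmdStanPy-style call, chosen only as an example of a Stan interface.
STAN = LanguageDef("Stan", "adapt_delta", "model.sample(adapt_delta={value})")


def divergence_warning(lang: LanguageDef, value: float = 0.95) -> str:
    """Render the same warning using the language's own parameter name."""
    fix = lang.fix_template.format(value=value)
    return (f"Divergences detected: consider changing {lang.accept_param} "
            f"to a higher value, e.g. {fix}")


print(divergence_warning(PYMC))
print(divergence_warning(STAN))
```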
But I also want to add that, given MCMC Monitor already exists for Stan and already seems more focused on people actually using it than on evaluating an idea (like our tool), maybe adding the warnings and sampler stats there would be the easier road overall.