Wish list for Stan interfaces

Bob_Carpenter · June 15, 2017, 8:58pm

@sakrejda posted this on the logging topic, but I’m moving it to its own topic to avoid hijacking the logging topic.

My [Krzysztof’s] wish list for an interface is something like:

these loggers
sane input format, binary and text
typed output for things like the mass matrix / warmup / config messages
streamed binary output that can be read during the run
utilities to read/validate config
config you can read from a file
threading
job scheduling based on config files
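The "config you can read from a file" and "utilities to read/validate config" items can be made concrete with a minimal C++ sketch. It assumes a hypothetical `key = value` text format with `#` comments. `parse_config`, `validate_config`, and any key names used with them are made up for illustration; this is not CmdStan's argument parser or any existing Stan API.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Strips leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
  const char* ws = " \t\r";
  std::string::size_type first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  std::string::size_type last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Parses a hypothetical "key = value" format: '#' starts a comment,
// blank lines are skipped, and malformed lines or duplicate keys throw.
std::map<std::string, std::string> parse_config(std::istream& in) {
  std::map<std::string, std::string> config;
  std::string line;
  int line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::string::size_type hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);
    line = trim(line);
    if (line.empty()) continue;
    std::string where = "line " + std::to_string(line_no) + ": ";
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) throw std::runtime_error(where + "missing '='");
    std::string key = trim(line.substr(0, eq));
    if (key.empty()) throw std::runtime_error(where + "empty key");
    if (!config.emplace(key, trim(line.substr(eq + 1))).second)
      throw std::runtime_error(where + "duplicate key " + key);
  }
  return config;
}

// Collects every problem instead of stopping at the first one;
// an empty result means the config passed validation.
std::vector<std::string> validate_config(
    const std::map<std::string, std::string>& config,
    const std::vector<std::string>& required,
    const std::vector<std::string>& integer_keys) {
  std::vector<std::string> problems;
  for (const std::string& key : required)
    if (config.find(key) == config.end())
      problems.push_back("missing required key: " + key);
  for (const std::string& key : integer_keys) {
    auto it = config.find(key);
    if (it == config.end()) continue;  // absence is reported above if required
    const std::string& v = it->second;
    if (v.empty() || v.find_first_not_of("0123456789") != std::string::npos)
      problems.push_back("not a non-negative integer: " + key);
  }
  return problems;
}
```

Reporting every problem at once is one possible design choice. It lets an interface show the user a complete list of config errors before anything is run.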

I could do most of this in an interface by going off on my own, but I think
it’s better to push core Stan in a direction that makes it possible.

Bob_Carpenter · June 15, 2017, 9:10pm

Do you have more concrete proposals?

these loggers: the current interface is low level. What do you want the high-level logging interface in each of the interfaces to look like? Config by file? Config by static/global variable? Options for writing to files, to the screen, to sockets, or to databases?
sane input format: you’ll have to be more specific. I think everyone agrees that the R dump format has to go, but as far as I know there’s no concrete proposal on the table for a replacement. If we want to go with something like JSON, someone will need to propose a schema for representing all of our data types. And when you say binary and text, I take it you mean two different formats? If so, should they be convertible?
typed output: not sure what you mean here. What do you consider to be output? For example, in R, if you have a matrix variable a in Stan and a fit object, then extract(fit)$a gives you structured output, but it’s indexed first by draw and then by the two matrix indices. Or do you mean file-based output?
streamed binary output: is that also typed in the sense of the previous item, or do you want multiple formats? I think everyone agrees that streaming output is desirable, if only to avoid blowing out memory in R or Python and to make it easy to move output among interfaces (the R tools have much richer visualizations than currently exist in Python).
config you can read from a file: config for calling sampling and optimization? Would that file then point to another file with inputs, or do you imagine the data being in the same file? What about inits for mass matrices or parameters? Would this also configure MPI, GPUs, etc. (make-like config in addition to calling a sampler once everything’s built)?
threading: this can’t be done without tweaking the underlying memory handling in the autodiff library to make the global stacks thread local. There’s a performance hit per thread, but it allows you to run without copying data.
job scheduling: I’m not sure what you’re thinking here. Do you mean multiple fits scheduled over a cluster? Scheduled over multiple cores?
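One possible reading of the "streamed binary output that can be read during the run" item is sketched below. It assumes a hypothetical format: a `uint32` column count followed by one fixed-size record of doubles per draw, in native byte order. `DrawWriter` and `DrawReader` are illustrative names and are not part of Stan. The writer flushes after every draw. The reader only consumes complete records, so it can poll a file that is still being written.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stream layout (native byte order): a uint32 column count,
// then one fixed-size record of doubles per draw.
class DrawWriter {
 public:
  DrawWriter(std::ostream& out, std::uint32_t num_cols)
      : out_(out), num_cols_(num_cols) {
    out_.write(reinterpret_cast<const char*>(&num_cols_), sizeof(num_cols_));
    out_.flush();
  }

  void write_draw(const std::vector<double>& draw) {
    if (draw.size() != num_cols_)
      throw std::invalid_argument("draw has the wrong number of columns");
    out_.write(reinterpret_cast<const char*>(draw.data()),
               static_cast<std::streamsize>(draw.size() * sizeof(double)));
    out_.flush();  // make each draw visible to a concurrent reader right away
  }

 private:
  std::ostream& out_;
  std::uint32_t num_cols_;
};

// Returns every complete draw available so far; a partially written
// record (or header) is left in place for the next call.
class DrawReader {
 public:
  explicit DrawReader(std::istream& in) : in_(in) {}

  std::vector<std::vector<double>> read_available() {
    std::vector<std::vector<double>> draws;
    if (!header_read_) {
      std::streampos start = in_.tellg();
      if (!in_.read(reinterpret_cast<char*>(&num_cols_), sizeof(num_cols_))) {
        rewind_to(start);
        return draws;
      }
      header_read_ = true;
    }
    if (num_cols_ == 0) return draws;  // zero-width records carry no data
    std::vector<double> draw(num_cols_);
    const auto bytes = static_cast<std::streamsize>(num_cols_ * sizeof(double));
    while (true) {
      std::streampos start = in_.tellg();
      if (!in_.read(reinterpret_cast<char*>(draw.data()), bytes)) {
        rewind_to(start);  // incomplete record: retry on the next call
        break;
      }
      draws.push_back(draw);
    }
    return draws;
  }

 private:
  void rewind_to(std::streampos pos) {
    in_.clear();  // drop eof/fail set by the short read
    in_.seekg(pos);
  }

  std::istream& in_;
  std::uint32_t num_cols_ = 0;
  bool header_read_ = false;
};
```

Fixed-size records make partial writes easy to detect. A real format would presumably also need column names, an explicit byte order, and typed blocks for things like the adapted mass matrix.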

For loggers, there’s no interface to control the logging level, and everything just dumps to one level, so you’d first have to fix the underlying C++ code.
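Once the C++ side supports levels, controlling them could look roughly like this sketch. `LogLevel` and `LeveledLogger` are hypothetical types, not Stan's existing logger classes. The logger drops any message below a configurable threshold.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical severity levels, ordered from most to least verbose.
enum class LogLevel { debug, info, warn, error };

// Writes messages at or above a threshold level and silently drops the rest.
class LeveledLogger {
 public:
  LeveledLogger(std::ostream& out, LogLevel threshold)
      : out_(out), threshold_(threshold) {}

  void set_threshold(LogLevel level) { threshold_ = level; }

  // Returns true if the message was written.
  bool log(LogLevel level, const std::string& message) {
    if (level < threshold_) return false;
    out_ << '[' << name(level) << "] " << message << '\n';
    return true;
  }

  bool debug(const std::string& m) { return log(LogLevel::debug, m); }
  bool info(const std::string& m) { return log(LogLevel::info, m); }
  bool warn(const std::string& m) { return log(LogLevel::warn, m); }
  bool error(const std::string& m) { return log(LogLevel::error, m); }

 private:
  static const char* name(LogLevel level) {
    switch (level) {
      case LogLevel::debug: return "debug";
      case LogLevel::info: return "info";
      case LogLevel::warn: return "warn";
      case LogLevel::error: return "error";
    }
    return "unknown";
  }

  std::ostream& out_;
  LogLevel threshold_;
};
```

Because the destination is just a `std::ostream&`, an interface could point the logger at the console, a file, or a string buffer without changing the filtering logic.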

For binary inputs, you’d need to implement a var_context. @betanalpha wants to change how those work, so you need to coordinate with him.
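For binary inputs, here is a minimal sketch of a hypothetical self-describing layout that a binary `var_context` implementation could parse. Each record holds a variable name, its dimensions, and its flattened values as doubles in native byte order. `BinaryVar`, `write_var`, and `read_vars` are invented for illustration. The sketch does not adapt the result to the actual `var_context` class or handle integer data, since that interface may change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A real-valued variable: its dimensions plus its values, flattened.
// A scalar has no dimensions and exactly one value.
struct BinaryVar {
  std::vector<std::uint32_t> dims;
  std::vector<double> values;
};

void write_u32(std::ostream& out, std::uint32_t x) {
  out.write(reinterpret_cast<const char*>(&x), sizeof(x));
}

std::uint32_t read_u32(std::istream& in) {
  std::uint32_t x = 0;
  if (!in.read(reinterpret_cast<char*>(&x), sizeof(x)))
    throw std::runtime_error("truncated input");
  return x;
}

// Hypothetical record layout (native byte order): name length, name bytes,
// number of dimensions, each dimension, then the flattened values as doubles.
void write_var(std::ostream& out, const std::string& name, const BinaryVar& var) {
  std::size_t count = 1;
  for (std::uint32_t d : var.dims) count *= d;
  if (count != var.values.size())
    throw std::invalid_argument("dims do not match value count for " + name);
  write_u32(out, static_cast<std::uint32_t>(name.size()));
  out.write(name.data(), static_cast<std::streamsize>(name.size()));
  write_u32(out, static_cast<std::uint32_t>(var.dims.size()));
  for (std::uint32_t d : var.dims) write_u32(out, d);
  out.write(reinterpret_cast<const char*>(var.values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
}

// Reads records until end of input; truncated records and duplicate
// names throw rather than yielding partially filled variables.
std::map<std::string, BinaryVar> read_vars(std::istream& in) {
  std::map<std::string, BinaryVar> vars;
  while (in.peek() != std::char_traits<char>::eof()) {
    std::string name(read_u32(in), '\0');
    if (!in.read(&name[0], static_cast<std::streamsize>(name.size())))
      throw std::runtime_error("truncated name");
    BinaryVar var;
    std::uint32_t ndims = read_u32(in);
    std::size_t count = 1;
    for (std::uint32_t i = 0; i < ndims; ++i) {
      var.dims.push_back(read_u32(in));
      count *= var.dims.back();
    }
    var.values.resize(count);
    if (!in.read(reinterpret_cast<char*>(var.values.data()),
                 static_cast<std::streamsize>(count * sizeof(double))))
      throw std::runtime_error("truncated values for " + name);
    if (!vars.emplace(name, std::move(var)).second)
      throw std::runtime_error("duplicate variable " + name);
  }
  return vars;
}
```

Storing the dimensions next to the values makes the format self-describing, so a reader can answer "which variables exist and what shape are they" without a separate schema.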

For threading, you’d also need to modify the underlying C++ code and provide a way to control at compile time whether to use thread-local or global memory.
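The compile-time switch could look roughly like this sketch. A hypothetical `SKETCH_GLOBAL_STACK` macro selects a single global stack, and the default gives each thread its own `thread_local` stack. `OpStack` is only a stand-in for the autodiff library's memory stacks, not its real data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical compile-time switch: building with -DSKETCH_GLOBAL_STACK gives
// one shared stack (cheaper to access, single-threaded use only); the default
// gives every thread its own thread_local stack.
#ifdef SKETCH_GLOBAL_STACK
#define SKETCH_STACK_STORAGE
#else
#define SKETCH_STACK_STORAGE thread_local
#endif

// Stand-in for the autodiff library's memory stacks: it just records op ids.
struct OpStack {
  std::vector<int> ops;
};

// Returns the stack for the calling thread (or the single global stack).
inline OpStack& op_stack() {
  static SKETCH_STACK_STORAGE OpStack stack;
  return stack;
}

// Records n operations on the calling thread's stack and returns its size.
std::size_t record_ops(int n) {
  for (int i = 0; i < n; ++i) op_stack().ops.push_back(i);
  return op_stack().ops.size();
}

// Runs workers that each record ops_per_thread operations, returning the
// stack size each worker saw. Only race-free with thread-local stacks.
std::vector<std::size_t> run_workers(int num_threads, int ops_per_thread) {
  std::vector<std::size_t> sizes(static_cast<std::size_t>(num_threads));
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t)
    workers.emplace_back([&sizes, t, ops_per_thread] {
      sizes[static_cast<std::size_t>(t)] = record_ops(ops_per_thread);
    });
  for (std::thread& w : workers) w.join();
  return sizes;
}
```

With thread-local storage, each worker sees only its own operations, and the data each thread reads can stay shared and uncopied. Built with `-DSKETCH_GLOBAL_STACK`, the same `run_workers` call would be a data race on one shared vector, which is why that mode only makes sense for single-threaded runs.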
