My feeling is that in order to get natural return objects we need to have three stages, which I repeat:
1. compile Stan program (translate to C++, compile and link C++): results in compiled model that can be reused with other data
2. add data to program: results in density function that can be called for log density and derivatives and transformations
3. sample: results in a sample of multiple chains consisting of multiple draws
Stage (3) is made up of multiple calls to sample individual chains with different chain IDs that then get put back together, whereas (1) consists of translating the Stan program to C++, compiling the C++, and dynamically linking it.
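To make the three stages concrete, here's a minimal Python sketch of what separate return objects could look like. Everything in it is hypothetical: `compile_model`, `CompiledModel`, `with_data`, `LogDensity`, `sample_chain`, and `sample` are names I made up for illustration, not the RStan or PyStan API. The "compile" step just stores the program text, the log density is hard-coded for a normal model with unknown mean and a flat prior, and random-walk Metropolis stands in for HMC.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LogDensity:
    """Stage 2 result (hypothetical): a log density with the data baked in."""
    log_prob: Callable[[float], float]


@dataclass(frozen=True)
class CompiledModel:
    """Stage 1 result (hypothetical): reusable with any data set."""
    program: str

    def with_data(self, data: Dict[str, List[float]]) -> LogDensity:
        # Stage 2: bind data. The density is hard-coded here for
        # y ~ normal(mu, 1) with a flat prior on mu; the program text is ignored.
        y = list(data["y"])
        return LogDensity(lambda mu: -0.5 * sum((v - mu) ** 2 for v in y))


def compile_model(program: str) -> CompiledModel:
    # Stage 1: stand-in for translating to C++, compiling, and linking.
    return CompiledModel(program)


def sample_chain(density: LogDensity, chain_id: int,
                 num_draws: int, seed: int = 0) -> List[float]:
    # One chain of random-walk Metropolis (an assumption; not HMC),
    # with the random stream determined by the seed and the chain ID.
    rng = random.Random(seed * 1000 + chain_id)
    mu = 0.0
    draws = []
    for _ in range(num_draws):
        proposal = mu + rng.gauss(0.0, 1.0)
        log_ratio = density.log_prob(proposal) - density.log_prob(mu)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            mu = proposal
        draws.append(mu)
    return draws


def sample(density: LogDensity, num_chains: int = 4,
           num_draws: int = 1000, seed: int = 0) -> List[List[float]]:
    # Stage 3: one call per chain ID, results put back together.
    return [sample_chain(density, chain_id, num_draws, seed)
            for chain_id in range(1, num_chains + 1)]


# Each stage returns its own object, so the compiled model
# can be reused with different data without recompiling.
model = compile_model(
    "data { int N; vector[N] y; } parameters { real mu; } "
    "model { y ~ normal(mu, 1); }"
)
density_a = model.with_data({"y": [1.0, 2.0, 3.0]})
density_b = model.with_data({"y": [10.0]})
chains = sample(density_a, num_chains=3, num_draws=50)
```

The point of the sketch is only the shape of the return values: a compiled model, a data-bound density, and a list of chains, each of which can be documented and tested on its own.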
If I understand correctly, RStan currently provides (1) with `stan_model()`, (2+3) with `optimizing()` by passing in the result of `stan_model()`, and (1+2+3) combined in the top-level call to `stan()` (which only does sampling with HMC). I believe Allen is suggesting providing (1+2) and (3) in Python.
I agree with Allen that it's bad to combine output. That's why my own preference is to separate (1), (2), and (3). If you want to start combining them, then things like the compiled and linked code will be an implicit global.
At this point we're going around in circles, so I think you guys need to make a decision. I don't think either Python or R hew very strongly to any kind of programming idiom or notion of modularity across their popular packages. They both tend to jam smaller operations together into large composed calls, whereas I come from a more Unix, C/C++, Java religious background where modularity is king (because it's easier to document and test and gives the user more flexibility to compose their own operations). If we look at particular Python packages,
- PyMC3 cooks the data and model together;
- scikit-learn compiles the model with some data (in the form of fixed model parameters), but then leaves the data contributing to the likelihood for another call;
- Edward has a concept of a model with lazy data, so it can presumably customize which data is baked into the model and which is provided at runtime.
I just don't see consistent usage we can follow.