Proper Way to Include Stan Model in Python Package


#1

Original question posted on Stack Overflow. Please let me know if I can copy and paste your answer to Stack Overflow, or you can do the same (for your credit).

I am writing a package in Python (3.x) for working with a particular data set. The package reads this data, accesses it, plots it, etc. I have also defined Bayesian models using Stan and PyStan, which I would like to include in this package.

pystan works by pointing to a particular .stan file which specifies the model:

def compile_model(filename, model_name=None, verbose=False, **kwargs):
    '''
    This will automatically cache models - great if you're just running a
    script on the command line.
    See http://pystan.readthedocs.io/en/latest/avoiding_recompilation.html
    Code by Aki Vehtari: see
    https://github.com/avehtari/BDA_py_demos/blob/new_pystan_demos/utilities_and_data/stan_utility.py
    '''
    from hashlib import md5
    import pystan
    import pickle

    # Read the Stan program and hash it so the cache is keyed on its content.
    with open(filename) as f:
        model_code = f.read()
    code_hash = md5(model_code.encode('ascii')).hexdigest()
    if model_name is None:
        cache_fn = 'cached-model-{}.pkl'.format(code_hash)
    else:
        cache_fn = 'cached-{}-{}.pkl'.format(model_name, code_hash)
    try:
        # Reuse a previously compiled model if a matching pickle exists.
        with open(cache_fn, 'rb') as f:
            sm = pickle.load(f)
    except Exception:
        # No usable cache: compile the model and store it for next time.
        sm = pystan.StanModel(model_code=model_code)
        with open(cache_fn, 'wb') as f:
            pickle.dump(sm, f)
    else:
        if verbose:
            print('Using cached StanModel')
    return sm

I would like to know, in the context of building a package in python, what is the proper way to point to the filename so that I can call from a fitting function something like:

sm = compile_model(filename=stan_file)

(i.e. how to specify stan_file). My package’s directory structure is quite simple – simplified version is:

.
├── LICENSE
├── README.md
├── pgkname
│   ├── PyFile.py
│   ├── PyFile2.py
│   └── __init__.py
├── stan
│   └── mod1.stan
└── setup.py

Clarification: the .stan file I would like to link to is ./stan/mod1.stan


#2

You’re trying to build a package that one might distribute on PyPI, right?

The main obstacle to distributing a compiled Stan model is that it has to be compiled separately for each platform. A model compiled on macOS will not work on Linux, and vice versa.

A simple way around this problem would involve compiling (and saving) the desired model during installation with some sort of setuptools hook.

I hope this answer helps. I wish I had better news.


#3

Yes – I actually am OK with having the model compile the first time the user runs it. My problem is actually just a simpler one of how to correctly point to the stan file – do relative paths work?


#4

Are the models static?

Why not include them as a variable/class (str) inside the package?

modulename.model_str.stan_model1 == "functions { ...."
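
A minimal sketch of this idea, assuming a hypothetical module such as model_str.py inside the package. The Stan program below is only a placeholder, and cache_filename is an illustrative helper that mirrors the hash-based naming used in compile_model above; neither comes from the original package.

```python
import hashlib

# Hypothetical contents of a module such as pkgname/model_str.py.
# The Stan program is a placeholder, not the model from the question.
stan_model1 = """
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
"""


def cache_filename(model_code, model_name=None):
    """Build a hash-based cache file name, as compile_model does."""
    code_hash = hashlib.md5(model_code.encode('ascii')).hexdigest()
    if model_name is None:
        return 'cached-model-{}.pkl'.format(code_hash)
    return 'cached-{}-{}.pkl'.format(model_name, code_hash)


# With PyStan 2 installed, the string can be handed to the compiler directly:
#   import pystan
#   sm = pystan.StanModel(model_code=stan_model1)
```

Because the model lives in a regular Python module, it is installed with the rest of the package and no file path has to be resolved at run time.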

Edit: A relative path should work, or you can even pass a file object. See how the stanc function reads in the data.

Edit 2: If external files are needed, maybe the most robust way to read in a file is to use an absolute path built from the module's location: os.path.join(os.path.dirname(modulename.__file__), 'model_dir', 'stan_model_file.stan')
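
A small sketch of the absolute-path approach. A plain relative path is resolved against the current working directory, not against the package, so it only works when the user happens to run from the right place. Note also that __file__ is the path of the module file itself, so its directory has to be taken first. The helper name stan_file_path is hypothetical, and the usage shown in the comments assumes the stan directory has been moved inside the package directory so that it is installed alongside the code.

```python
import os


def stan_file_path(module_file, *parts):
    """Return an absolute path built from the directory containing module_file."""
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(package_dir, *parts)


# Inside the package (for example in PyFile.py) this might be used as:
#   stan_file = stan_file_path(__file__, 'stan', 'mod1.stan')
#   sm = compile_model(filename=stan_file)
```

For the .stan file to be present after installation, it would also need to be declared as package data in setup.py (for example through the package_data argument of setuptools.setup).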


#5

I only know how to do this for R, but you might look at how Facebook does their Python version of prophet:


#6

Oops, that code is not written by me. That code was written by Michael Betancourt and I picked it up from his case study and I didn’t notice that the particular file didn’t have the copyright notice included (licence info was only in the directory). I’ve now added the correct copyright and license (Michael Betancourt, BSD3) notice to that file.


#7

Got it. Thanks for clarifying! I should have looked to the directory’s license file.


#8

To give credit where credit is due, @seantalts wrote that particular function based on an example by @ariddell!