Modularity for making Stan more Pythonic and Rthonic

The function I posted above also isn’t as efficient as it should be because it has to do 1 iteration of sampling every time foo is called. If you just wanted foo as a function of params (with data always the same) then we would only need to do 1 iteration once before creating foo, not every time. That’s an artifact of the current RStan implementation that dates back a long time but should be better in v3.

It should just require 0 iterations, right?

In theory yes but currently we have to run for more than zero or it will error for reasons that predate me, maybe even Ben, not sure.

OK, I guess we can set treedepth=1 so it will take minimal time, but in any case the purpose of this exercise (we can talk and elaborate it a bit) is to demonstrate the functionality that we’d like to have in the future. So it’s fine if the implementation is a bit clunky.

We can talk more.

If you’re willing to wrangle some R lambdas you can already do this.

my_compile <- function(program_file) {
  function(data, params) {
    fit <- stan(program_file, chains = 0)
    log_prob(fit, unconstrain_parameters(fit, params))
  }
}

and then use it like this:

posterior <- my_compile('my_program.stan')
lp <- posterior(c(1.2, 3.9, 0.3))

That still requires you to provide the constrained parameters in order as a list. I don’t know if there’s an easy way to have structured parameters the way we’d use for inits. You can also drop the unconstrain_parameters thing in the function and do everything on the unconstrained scale.
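It may be that rstan's unconstrain_pars() already covers the structured case, since as far as I can tell it accepts a named list on the constrained scale shaped the same way as inits. A rough, untested sketch:

# assumes `fit` is a stanfit created as above (chains = 0, same data)
params <- list(theta = 0.3)              # named list on the constrained scale, like inits
upars  <- unconstrain_pars(fit, params)  # map to the unconstrained parameter vector
lp     <- log_prob(fit, upars)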

@Bob_Carpenter I see a data argument to the anonymous function but I don’t see data getting used anywhere. In order to be able to change the data I think you’d need something like this:

stan_compile <- function(file) {
  comp <- stan_model(file)
  function(data, params) {
    suppressWarnings({
      x <- sampling(comp, data = data, chains = 0)
    })

    # should only be used if no constrained parameters since grad_log_prob()
    # actually wants unconstrained params
    grad_and_target <- grad_log_prob(x, upars = params)

    out <- list()
    out$target <- attr(grad_and_target, "log_prob")
    attr(grad_and_target, "log_prob") <- NULL
    out$gradient <- grad_and_target
    out
  }
}
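A usage sketch for this version might look like the following (my_data is a placeholder for whatever data list the program needs, and per the comment in the code, params should already be on the unconstrained scale):

# hypothetical usage of stan_compile(); untested
posterior <- stan_compile("my_program.stan")
out <- posterior(data = my_data, params = c(0.3))
out$target    # log density at the supplied unconstrained value(s)
out$gradient  # gradient of the log density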

Thanks for catching that. The data needs to go in the call to stan().
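For concreteness, a corrected sketch (still untested) just threads the data through, here using rstan's unconstrain_pars() for the constrained-to-unconstrained mapping:

my_compile <- function(program_file) {
  function(data, params) {
    # chains = 0 skips sampling but still gives a fit object we can query
    fit <- stan(program_file, data = data, chains = 0)
    log_prob(fit, unconstrain_pars(fit, params))
  }
}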

Why are you suppressing warnings?

I got really lost in this:

out <- list()
out$target <- attr(grad_and_target, "log_prob")
attr(grad_and_target, "log_prob") <- NULL
out$gradient <- grad_and_target
out

Why not just return grad_and_target? Given that you’re going to make a two-element list, then why not pull out just the log density value for $target and just the gradient vector for $gradient? As is, the gradient gets set to the gradient and target after nulling out the target. Is it more efficient in R to code it how you did? Will out$gradient just be a vector even though it looks like it starts as an object (or list?).

I guess because the target is stored as an attribute in grad_and_target, but for a user it’s easier to see that’s available if it’s returned as a separate element in a list.

out$gradient will have the same format as grad_and_target, but its log_prob attribute has been nulled out, since it’s returned separately.

I think there’s a few extra switches that can cut down on the initialization (and there’s some extras in my code that are unnecessary!). For ctsem I use something like the function below – using fast=FALSE is still faster than what is in this thread, fast=TRUE version doesn’t give full functionality of the stanfit class, but sufficient for running:

myfit$log_prob(blah blah all arguments must be explicitly named)

stan_reinitsf <- function(model, data, fast = FALSE) {
  if (fast) {
    # construct the fit object directly from the C++ module, skipping sampling()
    sf <- new(model@mk_cppmodule(model), data, 0L, rstan:::grab_cxxfun(model@dso))
  } else {
    suppressMessages(suppressWarnings(
      sf <- sampling(model, data = data, iter = 0, chains = 0, init = 0,
                     check_data = FALSE, control = list(max_treedepth = 0),
                     save_warmup = FALSE, test_grad = FALSE)
    ))
  }
  return(sf)
}
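A usage sketch, assuming model is a compiled stanmodel and standata its data list; with fast=FALSE you get back a regular stanfit, so the usual rstan functions apply:

sf  <- stan_reinitsf(model, standata)    # fast = FALSE: a regular stanfit
u0  <- rep(0, rstan::get_num_upars(sf))  # some point on the unconstrained scale
lp  <- rstan::log_prob(sf, upars = u0)
glp <- rstan::grad_log_prob(sf, upars = u0)

# fast = TRUE instead returns the raw C++ module object; its methods are
# called with `$` and, as noted above, every argument must be explicitly named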

Basically what @mcol said. I wanted to have a list with slots target and gradient as the returned object because that seemed nice, and I didn’t want the gradient slot having an attribute “log_prob” so I NULLed that out.

I could have just done this:

out <- list()
out$target <- log_prob(x, upars = params)
out$gradient <- grad_log_prob(x, upars = params)

but then I’m computing log_prob when I call both functions and there’s also that attribute that gets printed:

print(out)
$target
[1] -6.931472

$gradient
[1] -3
attr(,"log_prob")
[1] -6.931472

I tried doing things step-by-step as I couldn’t understand the code @jonah wrote.

> model <- stan_model(model_code = "parameters { real<lower = 0, upper = 1> theta; }")

> fit <- sampling(model, chains = 0)
the number of chains is less than 1; sampling not done

> glp <- grad_log_prob(fit, upars = c(0.3))

> glp
[1] -0.148885
attr(,"log_prob")
[1] -1.40871

> str(glp)
 num -0.149
 - attr(*, "log_prob")= num -1.41

I’ve never heard of attr but it looks like it’s something attached to a number. So the real question is why did you choose that return type for grad_log_prob and if it’s good enough there, why isn’t it good enough for this function?

So I went the rest of the way to see what happens:

> out <- list()

> out$target <- attr(glp, "log_prob")

> attr(glp, "log_prob") <- NULL

> out$gradient <- glp

> str(out)
List of 2
$ target : num -1.41
$ gradient: num -0.149

It looks like this truly does remove the attribute, at least as far as str() is concerned.

Is there a way to just get a number out of a number and leave behind all the attributes and so on? If we had such a function, let's say numerical_val_only, this would be much cleaner, with

    grad_and_target <- grad_log_prob(x, upars = params)
    out <- list()
    out$target <- attr(grad_and_target, "log_prob")
    attr(grad_and_target, "log_prob") <- NULL
    out$gradient <- grad_and_target
    out

coded as

glp <- grad_log_prob(x, upars = params)
list(target = attr(glp, "log_prob"),
     gradient = numerical_val_only(glp))

That way, nothing gets modified. It’s generally a lot easier to reason about and test code that

  1. constructs well-formed objects whole (like the return list here rather than building them up incrementally with setters/getters as with the out variable in the original code), and
  2. treats objects as immutable once constructed (like glp here assuming we can somehow pull out the numerical value without destroying the return type).

It’s not always possible in high-performance code, but when you can do this, you’ll find it makes your life a whole lot easier during testing as the number of side effects goes down and you no longer need tests for valid state everywhere.

I didn’t ;). I’ve been around for a while but that predates me! I wasn’t saying it should be that way, just trying to work with what’s already there.

I like that. Using attr(x, name) <- NULL is the standard way to do it in R, but I agree it's suboptimal. Unfortunately, removing all attributes of an object doesn't give you the numerical value you want, because R also uses attributes to store information about dimensions. So in R the dim attribute needs to be preserved in order to keep a matrix a matrix (i.e., to not flatten it). So numerical_val_only could remove all attributes except a few. Maybe something like this:

# Remove attributes except for information about dimensions
numerical_val_only <- function(x) {
  dims <- dim(x)
  nms <- dimnames(x)
  attributes(x) <- NULL
  structure(x, dim = dims, dimnames = nms)
}

Now this works for me:

glp <- grad_log_prob(x, upars = params)
list(target = attr(glp, "log_prob"),
     gradient = numerical_val_only(glp))

$target
[1] -6.931472

$gradient
[1] -3
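And a quick check that it keeps dimensions intact (a made-up matrix, not Stan output):

m <- matrix(1:4, nrow = 2)
attr(m, "log_prob") <- -1.5   # pretend extra attribute
str(numerical_val_only(m))
#  int [1:2, 1:2] 1 2 3 4     # still a 2 x 2 matrix, extra attribute dropped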

Python example for log_prob and grad_log_prob.

There are two examples. The first wraps the fit with a function and cannot be saved for later use; the second uses a class, which can be pickled as long as the class definition is saved in an external Python file and imported.

Setting up data

import pystan
import datetime
import numpy as np

model_code = """
data {
    int<lower=1> N;
    matrix<lower=0>[N,N] Y;
} parameters {
    vector<lower=0>[N] x;
}
model {
    for (n in 1:N) {
        for (m in 1:N) {
            Y[n,m] ~ normal(x[n], x[m]);
        }
    }
}
"""

data = {
    "N" : 10,
    "Y" : abs(np.random.randn(10, 10)) + 10
}

Example 1: Function wrap

A simple example wrapping fit.
(This "fails" if initialization fails for the sampling. To fix that, you can manually set up the fit class, as in the class example below.)

def wrap_log_prob(fit):
    
    def log_prob(pars, lp=True, grad_lp=True, adjust_transform=True):
        upars = fit.unconstrain_pars(pars)
        returns = []
        if lp:
            returns.append(fit.log_prob(upars, adjust_transform))
        if grad_lp:
            returns.append(fit.grad_log_prob(upars, adjust_transform))
        return tuple(returns)

    par_dim_pair = ('"{}": {}'.format(item, tuple(dim) if isinstance(dim, list) else "") for item, dim in zip(fit.model_pars, fit.par_dims))
        
    doc = ("Calculate log_prob and grad_log_prob for Stan model with data.\n    pars : dict\n    Model parameters (name, shape):\n        {model_pars}\n"
           "    lp : bool\n        return log_density\n    grad_lp : bool\n        return gradient of the log_density\n"
           "    adjust_transform : bool\n        Whether we add the term due to the transform from constrained\n        space to"
           " unconstrained space implicitly done in Stan.").format(model_pars="\n        ".join(par_dim_pair))
        
    log_prob.__doc__ = doc
    
    return log_prob

stan_model = pystan.StanModel(model_code=model_code)
fit = stan_model.sampling(iter=1)
log_prob = wrap_log_prob(fit)
print(log_prob({"x" : np.ones(10)}))

# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
#        1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
#         988.48122581, 1058.35411879]))

Example 2: Class wrap

A slightly more bloated example using a class and manually creating the fit object without sampling (this could also be done with the previous method):

import pystan
import datetime

class LogProb:
    
    def __init__(self, data, *, model=None, fit=None, model_code=None, file=None):
        """Wrap log_prob and grad_log_prob for Stan model with data.

        Parameters
        ----------
        data : dict
        model : StanModel, optional
            PyStan StanModel object.
        fit : StanFit4Model, optional
            PyStan fit object.
        model_code : str, optional
            model code as a string.
        file : str, optional
            path to .stan file.
        """
        if model is None and fit is None:
            if all(item is None for item in (model_code, file)):
                raise TypeError("Needs one defined {model, fit, model_code, file}")
            
        self._model = model
        self._data = data
        self._fit = fit
        self._model_code = model_code
        self._file = file
        
        if fit is None:
            if model is None:
                model = pystan.StanModel(file=file, model_code=model_code)
            seed = pystan.misc._check_seed(None)
            fit = model.fit_class(data, seed)
            
            fnames_oi = fit._get_param_fnames_oi()
            n_flatnames = len(fnames_oi)
            fit.sim = {'samples': [],
                       'chains': 0,
                       'iter': 0,
                       'warmup': 0,
                       'thin': 1,
                       'n_save': 0,
                       'warmup2': 0,
                       'permutation': [],
                       'pars_oi': fit._get_param_names_oi(),
                       'dims_oi': fit._get_param_dims_oi(),
                       'fnames_oi': fnames_oi,
                       'n_flatnames': n_flatnames}
            fit.model_name = model.model_name
            fit.model_pars = fit._get_param_names()[:-1]
            fit.par_dims = fit._get_param_dims()[:-1]
            fit.mode = 2
            fit.inits = {}
            fit.stan_args = []
            fit.stanmodel = model
            fit.date = datetime.datetime.now()
            
            self._model = model
            self._fit = fit
        else:
            self._model = fit.stanmodel
            
        self._update_doc()
            
    def __call__(self, pars, lp=True, grad_lp=True, adjust_transform=True):
        upars = self._fit.unconstrain_pars(pars)
        returns = []
        if lp:
            returns.append(self._fit.log_prob(upars, adjust_transform))
        if grad_lp:
            returns.append(self._fit.grad_log_prob(upars, adjust_transform))
        return tuple(returns)
    
    def _update_doc(self):
        
        par_dim_pair = ('"{}": {}'.format(item, tuple(dim) if isinstance(dim, list) else "") for item, dim in zip(self._fit.model_pars, self._fit.par_dims))
        
        doc = ("Calculate log_prob and grad_log_prob for Stan model with data.\n    pars : dict\n    Model parameters (name, shape):\n        {model_pars}\n"
               "    lp : bool\n        return log_density\n    grad_lp : bool\n        return gradient of the log_density\n"
               "    adjust_transform : bool\n        Whether we add the term due to the transform from constrained\n        space to"
               " unconstrained space implicitly done in Stan.").format(model_pars="\n        ".join(par_dim_pair))
        
        self.__doc__ = doc
    
    def __repr__(self):
        return "LogProb and GradLogProb for PyStan model: {}".format(self._fit.model_name)

log_prob = LogProb(data, model_code=model_code)

# run log_prob and grad_log_prob for one point
print(log_prob({"x" : np.ones(10)}))

# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
#        1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
#         988.48122581, 1058.35411879]))

Save (pickle) the log_prob (class instance) and reuse later

The class wrapper enables one to pickle the function and reuse it later.
To get this to work, save the LogProb class in a file called log_prob_class.py
and keep it in the same folder as the script:

from log_prob_class import LogProb
log_prob = LogProb(data, model_code=model_code)

import gzip
import pickle

with gzip.open("pickled_log_prob.pickle.gzip", "wb") as f:
    pickle.dump(log_prob, f, protocol=-1)

and later in a new python run

import gzip
import pickle

with gzip.open("pickled_log_prob.pickle.gzip", "rb") as f:
    log_prob = pickle.load(f)

import numpy as np
print(log_prob({"x" : np.ones(10)}))

# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
#        1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
#         988.48122581, 1058.35411879]))

Got it. Would've chosen the list, but had to adapt to the legacy return.

I also like the refactored function better in that I find it clearer what each piece of the code is doing. When you have things like named functions, it becomes self documenting even if you don’t know the low-level nuances of the language (as I obviously don’t for R).
