The function I posted above also isn’t as efficient as it should be because it has to do 1 iteration of sampling every time foo is called. If you just wanted foo as a function of params (with data always the same) then we would only need to do 1 iteration once before creating foo, not every time. That’s an artifact of the current RStan implementation that dates back a long time but should be better in v3.
It should just require 0 iterations, right?
In theory yes but currently we have to run for more than zero or it will error for reasons that predate me, maybe even Ben, not sure.
OK, I guess we can set treedepth=1 so it will take minimal time, but in any case the purpose of this exercise (we can talk and elaborate it a bit) is to demonstrate the functionality that we’d like to have in the future. So it’s fine if the implementation is a bit clunky.
We can talk more.
If you’re willing to wrangle some R lambdas you can already do this.
my_compile <- function(program_file) {
function(data, params) {
fit <- stan(program_file, chains = 0)
log_prob(fit, unconstrain_parameters(fit, params))
}
}
and then use is
posterior <- my_compile('my_program.stan')
lp <- posterior(c(1.2, 3.9, 0.3))
That still requires you to provide the constrained parameters in order as a list. I don’t know if there’s an easy way to have structured parameters the way we’d use for inits. You can also drop the unconstrain_parameters
thing in the function and do everything on the unconstrained scale.
@Bob_Carpenter I see a data
argument to the anonymous function but I don’t see data
getting used anywhere. In order to be able to change the data
I think you’d need something like this:
stan_compile <- function(file) {
comp <- stan_model(file)
function(data, params) {
suppressWarnings({
x <- sampling(comp, data = data, chains = 0)
})
# should only be used if no constrained parameters since grad_log_prob()
# actually wants unconstrained params
grad_and_target <- grad_log_prob(x, upars = params)
out <- list()
out$target <- attr(grad_and_target, "log_prob")
attr(grad_and_target, "log_prob") <- NULL
out$gradient <- grad_and_target
out
}
}
Thanks for catching that. The data needs to go in the call to stan()
.
Why are you suppressing warnings?
I got really lost in this:
out <- list()
out$target <- attr(grad_and_target, "log_prob")
attr(grad_and_target, "log_prob") <- NULL
out$gradient <- grad_and_target
out
Why not just return grad_and_target
? Given that you’re going to make a two-element list, then why not pull out just the log density value for $target
and just the gradient vector for $gradient
? As is, the gradient gets set to the gradient and target after nulling out the target. Is it more efficient in R to code it how you did? Will out$gradient
just be a vector even though it looks like it starts as an object (or list?).
I guess because the target is stored as an attribute in grad_and_target
, but for a user it’s easier to see that’s available if it’s returned as a separate element in a list.
out$gradient
will have the same format as grad_and_target
, but its log_prob
attribute has been nulled out, since it’s returned separately.
I think there’s a few extra switches that can cut down on the initialization (and there’s some extras in my code that are unnecessary!). For ctsem I use something like the function below – using fast=FALSE is still faster than what is in this thread, fast=TRUE version doesn’t give full functionality of the stanfit class, but sufficient for running:
myfit$log_prob(blah blah all arguments must be explicitly named)
stan_reinitsf <- function(model, data,fast=FALSE){
if(fast) sf <- new(model@mk_cppmodule(model),data,0L,rstan:::grab_cxxfun(model@dso))
if(!fast) suppressMessages(suppressWarnings(sf<-sampling(model,iter=0,chains=0,init=0,data=data,check_data=FALSE, control=list(max_treedepth=0),save_warmup=FALSE,test_grad=FALSE)))
return(sf)
}
Basically what @mcol said. I wanted to have a list with slots target
and gradient
as the returned object because that seemed nice, and I didn’t want the gradient slot having an attribute “log_prob” so I NULLed that out.
I could have just done this:
out <- list()
out$target <- log_prob(x, upars = params)
out$gradient <- grad_log_prob(x, upars = params)
but then I’m computing log_prob when I call both functions and there’s also that attribute that gets printed:
print(out)
$target
[1] -6.931472
$gradient
[1] -3
attr(,"log_prob")
[1] -6.931472
I tried doing things step-by-step as I couldn’t understand the code @jonah wrote.
> model <- stan_model(model_code = "parameters { real<lower = 0, upper = 1> theta; }")
> fit <- sampling(model, chains = 0)
the number of chains is less than 1; sampling not done
> glp <- grad_log_prob(fit, upars = c(0.3))
> glp
[1] -0.148885
attr(,"log_prob")
[1] -1.40871
> str(glp)
num -0.149
- attr(*, "log_prob")= num -1.41
I’ve never heard of attr
but it looks like it’s something attached to a number. So the real question is why did you choose that return type for grad_log_prob
and if it’s good enough there, why isn’t it good enough for this function?
So I went the rest of the way to see what happens:
> out <- list()
> out$target <- attr(glp, "log_prob")
> attr(glp, "log_prob") <- NULL
> out$gradient <- glp
> str(out)
List of 2
$ target : num -1.41
$ gradient: num -0.149
It looks like this truly does remove the attribute, at least as far as str()
is concerned.
Is there a way to just get a number out of a number and leave behind all the attributes and something. If we had such a function, let’s say numerical_val_only
, this would be much cleaner with
grad_and_target <- grad_log_prob(x, upars = params)
out <- list()
out$target <- attr(grad_and_target, "log_prob")
attr(grad_and_target, "log_prob") <- NULL
out$gradient <- grad_and_target
out
coded as
glp <- grad_log_prob(x, upars = params)
list(target = attr(glp, "log_prob"),
gradient = numerical_val_only(glp))
That way, nothing gets modified. It’s generally a lot easier to reason about and test code that
- constructs well-formed objects whole (like the return list here rather than building them up incrementally with setters/getters as with the
out
variable in the original code), and - treats objects as immutable once constructed (like
glp
here assuming we can somehow pull out the numerical value without destroying the return type).
It’s not always possible in high-performance code, but when you can do this, you’ll find it makes your life a whole lot easier during testing as the number of side effects goes down and you no longer need tests for valid state everywhere.
I didn’t ;). I’ve been around for a while but that predates me! I wasn’t saying it should be that way, just trying to work with what’s already there.
I like that. Using attr(x, name) <- NULL
is the standard way to do it in R, but I agree it’s suboptimal. Unfortunately removing all attributes of an object doesn’t give you the numerical value you want because R also uses attributes to store information about dimensions. So in R the dim
attribute needs to be preserved in order to keep a matrix a matrix (i,e., to not flatten it). So numerical_val_only
could remove all attributes except a few. Maybe something like this:
# Remove attributes except for information about dimensions
numerical_val_only <- function(x) {
dims <- dim(x)
nms <- dimnames(x)
attributes(x) <- NULL
structure(x, dim = dims, dimnames = nms)
}
Now this works for me:
glp <- grad_log_prob(x, upars = params)
list(target = attr(glp, "log_prob"),
gradient = numerical_val_only(glp))
$target
[1] -6.931472
$gradient
[1] -3
Python example for log_prob and grad_log_prob.
There are two examples. First wraps with function and can not be saved for the later use and the other uses class which can be pickled if saved in external python file and imported with import
.
Setting up data
import pystan
import datetime
import numpy as np
model_code = """
data {
int<lower=1> N;
matrix<lower=0>[N,N] Y;
} parameters {
vector<lower=0>[N] x;
}
model {
for (n in 1:N) {
for (m in 1:N) {
Y[n,m] ~ normal(x[n], x[m]);
}
}
}
"""
data = {
"N" : 10,
"Y" : abs(np.random.randn(10, 10)) + 10
}
Example 1: Function wrap
Simple example wrapping fit
(THIS “fails”, if init fails for the sampling. To fix this, you can manually setup the fit class (as in the class example)).
def wrap_log_prob(fit):
def log_prob(pars, lp=True, grad_lp=True, adjust_transform=True):
upars = fit.unconstrain_pars(pars)
returns = []
if lp:
returns.append(fit.log_prob(upars, adjust_transform))
if grad_lp:
returns.append(fit.grad_log_prob(upars, adjust_transform)))
return tuple(returns)
par_dim_pair = ('"{}": {}'.format(item, tuple(dim) if isinstance(dim ,list) else "") for item, dim in zip(fit.model_pars, fit.par_dims))
doc = ("Calculate log_prob and grad_log_prob for Stan model with data.\n pars : dict\n Model parameters (name, shape):\n {model_pars}\n"
" lp : bool\n return log_density\n grad_lp : bool\n return gradient of the log_density\n"
" adjust_transform : bool\n Whether we add the term due to the transform from constrained\n space to"
" unconstrained space implicitly done in Stan.").format(model_pars="\n ".join(par_dim_pair))
log_prob.__doc__ = doc
return log_prob
stan_model = pystan.StanModel(model_code=model_code)
fit = stan_model.sampling(iter=1)
log_prob = wrap_log_prob(fit)
print(log_prob({"x" : np.ones(10)}))
# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
# 1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
# 988.48122581, 1058.35411879]))
Example 2: Class wrap
A bit more bloated example using class and manually creating fit without sampling (this could be done also with the previous method)
import pystan
import datetime
class LogProb:
def __init__(self, data, *, model=None, fit=None, model_code=None, file=None):
"""Wrap log_prob and grad_log_prob for Stan model with data.
Parameters
----------
data : dict
model : StanModel, optional
PyStan StanModel object.
fit : StanFit4Model, optional
PyStan fit object.
model_code : str, optional
model code as a string.
file : str, optional
path to .stan file.
"""
if model is None and fit is None:
if all(item is None for item in (model_code, file)):
raise TypeError("Needs one defined {model, fit, model_code, file}")
self._model = model
self._data = data
self._fit = fit
self._model_code = model_code
self._file = file
if fit is None:
if model is None:
model = pystan.StanModel(file=file, model_code=model_code)
seed = pystan.misc._check_seed(None)
fit = model.fit_class(data, seed)
fnames_oi = fit._get_param_fnames_oi()
n_flatnames = len(fnames_oi)
fit.sim = {'samples': [],
'chains': 0,
'iter': 0,
'warmup': 0,
'thin': 1,
'n_save': 0,
'warmup2': 0,
'permutation': [],
'pars_oi': fit._get_param_names_oi(),
'dims_oi': fit._get_param_dims_oi(),
'fnames_oi': fnames_oi,
'n_flatnames': n_flatnames}
fit.model_name = model.model_name
fit.model_pars = fit._get_param_names()[:-1]
fit.par_dims = fit._get_param_dims()[:-1]
fit.mode = 2
fit.inits = {}
fit.stan_args = []
fit.stanmodel = model
fit.date = datetime.datetime.now()
self._model = model
self._fit = fit
else:
self._model = fit.stanmodel
self._update_doc()
def __call__(self, pars, lp=True, grad_lp=True, adjust_transform=True):
upars = self._fit.unconstrain_pars(pars)
returns = []
if lp:
returns.append(self._fit.log_prob(upars, adjust_transform))
if grad_lp:
returns.append(self._fit.grad_log_prob(upars, adjust_transform)))
return tuple(returns)
def _update_doc(self):
par_dim_pair = ('"{}": {}'.format(item, tuple(dim) if isinstance(dim ,list) else "") for item, dim in zip(self._fit.model_pars, self._fit.par_dims))
doc = ("Calculate log_prob and grad_log_prob for Stan model with data.\n pars : dict\n Model parameters (name, shape):\n {model_pars}\n"
" lp : bool\n return log_density\n grad_lp : bool\n return gradient of the log_density\n"
" adjust_transform : bool\n Whether we add the term due to the transform from constrained\n space to"
" unconstrained space implicitly done in Stan.").format(model_pars="\n ".join(par_dim_pair))
self.__doc__ = doc
def __repr__(self):
return "LogProb and GradLogProb for PyStan model: {}".format(self._fit.model_name)
log_prob = LogProb(data, model_code=model_code)
# run log_prob and grad_log_prob for one point
print(log_prob({"x" : np.ones(10)}))
# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
# 1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
# 988.48122581, 1058.35411879]))
Save (pickle) the log_prob (class instance) and reuse later
The class wrapper enable one to pickle and reuse the function later.
To get this work, save LogProb class under a file log_prob_class.py
and keep it in the same folder as the script
from log_prob_class import LogProb
log_prob = LogProb(data, model_code=model_code)
import gzip
import pickle
with gzip.open("pickled_log_prob.pickle.gzip", "wb") as f:
pickle.dump(log_prob, f, protocol=-1)
and later in a new python run
import gzip
import pickle
with gzip.open("pickled_log_prob.pickle.gzip", "rb") as f:
log_prob = pickle.load(log_prob, f)
import numpy as np
print(log_prob({"x" : np.ones(10)}))
# (-4795.230045738186,
# array([1042.93067947, 1003.14750843, 1086.25723069, 1050.24616967,
# 1017.67878138, 1113.06497286, 1044.31842895, 1063.42547567,
# 988.48122581, 1058.35411879]))
Got it. Would’ve chosen the list, had to adapt the legacy return.
I also like the refactored function better in that I find it clearer what each piece of the code is doing. When you have things like named functions, it becomes self documenting even if you don’t know the low-level nuances of the language (as I obviously don’t for R).