It appears that Rstan has some of this functionality via the grad_log_prob() function for the stanfit object: log_prob and grad_log_prob functions — log_prob-methods • rstan.
So the idea is that you create a custom lp in a .stan file, use Rstan to sample with 0 chains, which throws a warning but still creates the stanfit object. Then use grad_log_prob() to get an S4 class function. Since the Stan model is called with the full data as an argument (vector for y and matrix for X) this seems to partially resolve the question of applying to a vector of input data.
I recently used some awkward string manipulations in R along with the deriv() function to create functions for the gradient and hessian of lp for a logistic regression for a toy aymptotics example. I know this is a simple example that others have asked about (Automatic differentiation with stan math - #16 by bbbales2) but it demonstrate the usefulness of stan and autodiff which readily applies to a general lp. See comparisons below.
One current limitation with this Rstan solution is that only the gradient of the lp and not the hessian is provided. Is there a way to get the hessian via Rstan, or is a more custom approach via Rcpp and StanHeaders the best option?
Another possible limitation is if we generated new data (y,X), currently we’d need to create another stanfit object with this data as input. Is there a way to create an object like that produced by grad_log_prob() that also accepts both the parameters and the data as input?
RCode:
#compare deriv and Rstan for evaluating gradient and hessian
#Logistic regression with d covariates using deriv#
##Define lp and calculate grad and hessian##
#change d > 0 and repeat
d <- 1
xbeta <- paste("beta0 +", paste("beta", 1:d, "*", "x", 1:d, sep = "", collapse = " + "), sep = "")
beta <- paste("beta", 0:d, sep = "")
xd <- paste("x", 1:d, sep = "")
frm.str <- paste("~ (y *log((1+ exp(-1*(",
xbeta,
")) )^-1) + (1-y)*log(1-(1+ exp(-1*(",
xbeta,
")) )^-1))", sep = "")
arg <- c("y", xd, beta)
#R function for gradient and hessian
d1th <- deriv(as.formula(frm.str), namevec =beta, function.arg = arg, hessian = TRUE, func = TRUE)
#Generate y and set beta
n <- 100
beta0 <- 1
for(i in 1:d){
eval(parse(text = paste("x", i," <- rnorm(n,0,1)", sep = "") ) )
eval(parse(text = paste("beta", i," <- 1/i", sep = "") ) )
}
#linear predictor
mu <- eval(parse(text = xbeta))
theta <- plogis(mu)
#generate y
y1 <- rbinom(n = n, size = 1, prob = theta)
#evaluate over a vector of y, x values, beta
argv <- paste(c("y1", paste("x", 1:d, sep = ""), paste("beta", 0:d, sep = "")), collapse = " , ")
Dth <- eval(parse(text = paste("d1th(", argv ,")", collapse =" ") ) )
#sum over observations to get total gradient and hessian
lphat1 <- sum(Dth)
Hhat1 <- colSums(attr(Dth, "hessian"), dims = 1)
Ghat1 <- colSums(attr(Dth, "gradient"))
Output using deriv()
lphat1
[1] -44.42416
Ghat1
beta0 beta1
5.9741254 0.5186645
Using Rstan
#Reproduce in Rstan?
library(rstan)
mod <- stan_model('logistic.stan')# compile stan code
y <- y1
X <- eval(parse(
text = paste(c("cbind(1, ",paste("x", 1:d, sep = "", collapse =", "), " )"), collapse = "")
))
k <- dim(X)[2]
n <- length(y)
out_stan <- sampling(object = mod, data = list(y = as.vector(y), X = X, k = k, n = n),
pars = c("beta"),
chains = 0,
)
betav <- eval(parse(
text = paste(c("c(", paste(beta, collapse = ", "), ")"), collapse ="")
))
tfun <- function(beta) log_prob(out_stan, beta)
tgrfun <- function(beta) grad_log_prob(out_stan, beta)
tfun(betav)
tgrfun(betav)#compare to lphat1 and Ghat1
Output using Rstan
tgrfun(betav)#compare to lphat1 and Ghat1
[1] 5.9741254 0.5186645
attr(,“log_prob”)
[1] -44.42416
Stan file (logistic.stan)
/* Logistic regression*/
data {
int<lower=1> n; // number of observations
int<lower=1> k; // number of linear predictors
int<lower=0, upper = 1> y[n]; // Response variable
matrix[n, k] X; // coefficient matrix
}
parameters{
vector[k] beta; /* regression coefficients from linear predictor */
}
transformed parameters{
vector[n] mu;
mu = X * beta; /* linear predictor */
} /* end transformed parameters block */
model{
/*improper prior on beta in (-inf,inf)*/
/* directly update the log-probability for sampling */
target += bernoulli_logit_lpmf(y | mu);
} /* end model{} block */