A gradient block?


#1

Some of the models I fit quite regularly have fairly straightforward analytical gradients but are very expensive to evaluate via autodiff. I know it’s in theory possible to hack together my own version of Stan with these analytical gradients defined on the back end, but I’d probably mess that up. Something like this would be extremely handy:

data {
  ...
}
parameters {
  real par1;
  real par2;
}
gradients {
  par1 = some expression;
  // gradient wrt par2 to be left to autodiff
}
model {
  ...
}

This isn’t much of a question, but would something like this be possible in theory?


#2

Can you wrap it into its own function and then code the gradients by hand? You don’t need to recompile Stan to do this:
https://cran.r-project.org/web/packages/rstan/vignettes/external.html

I’ve done this for some parts of a model that have easy analytical gradients and are evaluated in the inner loop of the likelihood function.


#3

Can you provide an example?
I would be nice to have an example prototype about how to extend Stan in C++.


#4

Sure.

I needed to calculate \frac x {1 - x} or \exp(\mbox{logit}(x)) in a hot path, but could not find it in the Stan math library. It has derivative \frac 1 {(1-x)^2}

So I added this code to a separate .hpp

and included it as described in the tutorial.

inline double exp_logit(double u) {
  return u / (1 - u);
}

inline var exp_logit(const var& u) {
  const double uv = u.val();
  const double m_u = (1-uv);
  const double m_u_sq = m_u * m_u;
  
  
  return var(new precomp_v_vari(exp_logit(u.val()),
                                u.vi_,
                                1 / m_u_sq));
}
struct exp_logit_fun {
  template <typename T>
  static inline T fun(const T& x) {
    return exp_logit(x);
  }
};

template <typename T>
inline typename apply_scalar_unary<exp_logit_fun, T>::return_t
exp_logit(const T& x) {
  return apply_scalar_unary<exp_logit_fun, T>::apply(x);
}

template <typename T0__>
Eigen::Matrix<typename boost::math::tools::promote_args<T0__>::type, 
Eigen::Dynamic,1>
exp_logit(const Eigen::Matrix<T0__, Eigen::Dynamic,1>& x, std::ostream* 
pstream__) {
 return exp_logit(x);
}

struct exp_logit_functor__ {
template <typename T0__>
    Eigen::Matrix<typename boost::math::tools::promote_args<T0__>::type, 
Eigen::Dynamic,1>
  operator()(const Eigen::Matrix<T0__, Eigen::Dynamic,1>& x, std::ostream* 
pstream__) const {
return exp_logit(x, pstream__);
}
};

This is pretty much all the Stan C++ code I’ve ever written, so by no means take this to be best practices!


#5

User defined gradients are in Stan Road Map https://github.com/stan-dev/stan/wiki/Stan-Road-Map,
but while waiting for them, what @aaronjg did doesn’t seem that complicated (given that the process of including additional .hpp is not too complicated for you)


#6

I’d trust 0% of users to keep the gradients block properly updated as they evolve their model but I can see the attraction.


#7

User’s autodiff could be cross-checked by Stan’s version at initial phase.


#8

And for random draws after sampling stops. This is also in the plan.


#9

We have to trust our users. There’s so many ways to screw up a Stan program that we can’t really provide much protection.

I’d trust a fair bit more than 0% because some users are careful with software and test as they go.

And as others have noted, we wouldn’t do this without finite diff or autodiff tests.

What we’re likely to add is a way to define a function with gradients, not just define gradients w.r.t. the log density for parameters.

Then users will only define gradients w.r.t. constrained parameters, so we’ll need to chain the transforms and Jacobians for any constrained parameters.


#10

@Bob_Carpenter That sounds even better than a gradients block. Exciting!