Bypassing automatic differentiation with this simple hack?

I feel a bit silly asking this question, but…

I would like to try to compute the gradient of a linear Gaussian state space model analytically, in parallel with the filtering recursions. I'd prefer to avoid coding the whole thing in C++, so I came up with this idea.

I do all the computation, the log likelihood and its gradient, in Stan, and then pass the results through a small C++ function along these lines (I don't know C++, so this is only a sketch):

double ll(const std::vector<double>& ll_pars_grad, std::ostream* pstream__) {
  // the first element carries the log likelihood computed in Stan
  return ll_pars_grad[0];
}

var ll(const std::vector<var>& ll_pars_grad, std::ostream* pstream__) {
  // zeros for the "gradient" of the likelihood and gradient inputs
  std::vector<double> grad(ll_pars_grad.size(), 0.0);
  // copy the precomputed partials into the parameter slots
  // (ind_grad stands for the indices of those entries)
  for (int i : ind_grad)
    grad[i] = ll_pars_grad[i].val();
  // I'm guessing at the vector analogue of precomp_v_vari here
  return var(new precomp_v_vari(ll_pars_grad[0].val(), ll_pars_grad, grad));
}

My function would take as inputs the log likelihood, its gradient, and the parameters declared in the parameters block. The double-valued overload simply returns the likelihood. The var-valued overload returns the precomputed gradient values as its gradient, with zeros for the "gradient" with respect to the input likelihood and the input gradient themselves. The output of the function is then simply added to target in the model block.

Would this work? Or would the gradient evaluation algorithm still go through all the previous transformations?

Could you be clearer about what you’re trying to do? Is this supposed to be called from a Stan program? What would a minimal Stan program look like? That would give us some context for answering without just trying to explain Stan’s auto-diff from scratch :/

I would like to use Stan's NUTS sampler to infer the posteriors of the parameters of a linear Gaussian state space model. If I use the Kalman filter to evaluate the likelihood, the gradient gets complicated and is slow to evaluate with automatic differentiation.

I want to test whether computing the gradients analytically would make things faster. I know this requires an external C++ function. Since the only reasonable way to compute the gradient is in parallel with the likelihood, it seems I would need to do everything in one big C++ function that takes the parameters as input, forms the model matrices and their gradients wrt the parameters, loops through the Kalman filter, updates the gradients in parallel, and returns the likelihood and the gradients. As I have no experience with C++, I would like to avoid this.
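
Concretely, the kind of all-in-one function I mean might look roughly like the sketch below for a scalar local-level model (x_t = x_{t-1} + w_t, y_t = x_t + v_t, with theta = (q, r) the state and observation noise variances). This is untested, the function name is made up, and I'm only guessing that precomputed_gradients is the right way to return the result:

#include <stan/math.hpp>
#include <cmath>
#include <vector>

using stan::math::var;

// Untested sketch: run the Kalman filter for a scalar local-level model
// and accumulate the analytic gradient of the log likelihood wrt
// theta = (q, r) in the same loop.
var local_level_ll(const std::vector<var>& theta,
                   const std::vector<double>& y,
                   std::ostream* pstream__) {
  double q = theta[0].val(), r = theta[1].val();
  double m = 0.0, P = 1e7;                 // diffuse-ish initial state
  double dm[2] = {0.0, 0.0}, dP[2] = {0.0, 0.0};
  double ll = 0.0;
  std::vector<double> grad(2, 0.0);
  for (size_t t = 0; t < y.size(); ++t) {
    double P_pred = P + q;                 // predict step (mean unchanged)
    double dPp[2] = {dP[0] + 1.0, dP[1]};  // d P_pred / d(q, r)
    double S = P_pred + r;                 // innovation variance
    double dS[2] = {dPp[0], dPp[1] + 1.0};
    double e = y[t] - m;                   // innovation
    double K = P_pred / S;                 // Kalman gain
    ll += -0.5 * (std::log(2.0 * M_PI * S) + e * e / S);
    for (int k = 0; k < 2; ++k) {          // sensitivity recursions
      double de = -dm[k];
      double dK = (dPp[k] * S - P_pred * dS[k]) / (S * S);
      grad[k] += -0.5 * dS[k] / S - e * de / S
                 + 0.5 * e * e * dS[k] / (S * S);
      dm[k] += dK * e + K * de;
      dP[k] = (1.0 - K) * dPp[k] - dK * P_pred;
    }
    m += K * e;
    P = (1.0 - K) * P_pred;
  }
  return stan::math::precomputed_gradients(ll, theta, grad);
}

Generalizing this to time-varying model matrices and missing observations is exactly the part I'd rather not write by hand.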

My question is: is there a way to do all the computations in Stan, in the transformed parameters block, and then call a small C++ function that takes as inputs the untransformed parameters, the computed log likelihood, and its computed gradients wrt the parameters, and simply returns the likelihood unchanged? If this output is then added to target in the model block, will Stan take as gradients the gradients given by this small C++ function, ignoring the transformations of the parameters done before this function is called?

I hope this makes my problem clearer. And please don't feel obliged to explain Stan's auto-diff from scratch; a short answer will do :)

I'd like to test my hack with a simple model, but the problem now is: can the type of function I am thinking of, i.e. the equivalent of

double
sinc(const double& x, std::ostream* pstream__) {
  return x != 0.0 ? sin(x) / x : 1.0;
}

var
sinc(const var& x, std::ostream* pstream__) {
  double x_ = x.val();
  double f = x_ != 0.0 ? sin(x_) / x_ : 1.0;
  // d/dx sin(x)/x = (cos(x) - sin(x)/x) / x
  double dfdx_ = x_ != 0.0 ? (cos(x_) - sin(x_) / x_) / x_ : 0.0;
  return var(new precomp_v_vari(f, x.vi_, dfdx_));
}

be written with vector-valued parameters? Is it possible to form the var-valued return variable from vectors?
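
(Edit: digging around the math library, it looks like stan::math::precomputed_gradients accepts a vector of operands together with a vector of partials, so perhaps something like this untested toy example would do, where my_fn and its sum-of-sines value are made up just to show the pattern:)

// assumes the same includes as the sketches above
var my_fn(const std::vector<var>& x, std::ostream* pstream__) {
  std::vector<double> grad(x.size());
  double f = 0.0;
  for (size_t n = 0; n < x.size(); ++n) {
    double x_ = x[n].val();
    f += std::sin(x_);       // toy value: sum of sines
    grad[n] = std::cos(x_);  // analytic partial for each input
  }
  return stan::math::precomputed_gradients(f, x, grad);
}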

Are you sure you can't just use the Gaussian dynamic linear model (gaussian_dlm_obs)? It's basically a Kalman filter.

The transition matrix is time varying. I also need to handle missing observations.

Thanks for the explanation: if you do the calculations for the gradient in Stan, every intermediate quantity you create goes onto the autodiff stack, so you'd be adding a lot of unrelated gradient calculations on top of your analytic ones. So no, I don't think you can use the strategy you are suggesting.

My suggestion is to implement the model in pure Stan first (there are a few Kalman filter implementations kicking around if you search the forums); once you nail down the changes you need for your specific case, it may either a) just work fast enough, or b) require only small changes to the C++ code of an existing function. Don't forget you can also call gaussian_dlm_obs in a loop in Stan with a new transition matrix for each time step.