Logspline density estimation

ShaunMcD · January 31, 2019, 9:08pm

Hi there,

I’m playing with the idea of logspline density estimation in Stan. See these notes (http://www.econ.uiuc.edu/~roger/courses/574/lectures/L18.pdf) for a reasonably non-technical explanation. The idea is that given a sample X = \left(X_1, \ldots, X_n\right), we estimate their density f by exponentiating a linear combination of B-spline basis functions:

f(x) = \frac{\exp{\sum_{j=1}^P \theta_j B_j(t)}}{c(\theta)},

with normalizing constant c(\theta) = \int \exp{\sum_{j=1}^P \theta_j B_j(t)}\mathrm{d}t. Presumably the Stan code would be something like this:

data{
int n; //Size of X-sample
int P; //Dimensionality of spline basis
matrix[n, P] B; //Basis function evaluations at sample X-points

parameters{
vector[P] theta; //Spline coefficients
... //Other smoothing stuff
}

model {
  target += B*theta - log(c);
... //Suitable priors for theta and other smoothing stuff
}

The challenge is with \log(c(\theta)). Although it is a “normalizing constant” for the density, it is not constant with respect to the parameters \theta and so I think it must be included in the log posterior. However, it doesn’t have a closed form - presumably it should therefore be estimated numerically, but I’m not sure if there’s a way to do this efficiently. In particular I imagine the autodiff of the numerical approximation would be a mess.

Does anyone have any experience with this kind of model in Stan - or any kind of density estimation involving some normalizing function? Any insight is greatly appreciated.

bbbales2 · February 2, 2019, 3:56am

Correct

It is, but it can work surprisingly well haha.

The easiest way to do this robustly for now is probably with the ODE integrators: 9.2 Ordinary Differential Equation Solvers | Stan Functions Reference

There’s a 1D quadrature thing that’d be better for this, but it’s been stalled in pull requests since I haven’t pushed it forward (sorry :D!).

You might have some mileage coding up a quadrature manually, but if you do that it’d probably be worth computing an error estimate in generated quantities and tracking it (compute the error estimate by doubling the quadrature node count or whatever and comparing that estimate against the one you use in your model).

ShaunMcD · February 8, 2019, 11:22pm

Thanks Ben!

I’ll give the ODE method a try…although my use case actually involves estimating a lot of densities, so it may become infeasible after all.

Is there a built-in way to get an error estimate from the RKF45 scheme? Since the integral is really “part of the likelihood”, I suppose that a numerical approximation would result in some kind of “approximate posterior” (in addition to the usual MCMC uncertainty).

bbbales2 · February 9, 2019, 1:57am

The ode integrator functions have tolerance arguments that you can use.

Yeah, but just keep the tolerance low enough so that you don’t have to worry about this stuff :D.

@maxbiostat working through doing some custom C++ code that uses integrals and such in a model over here: Problem implementing Zipf (power law) distribution -- external C++ . You might be able to use that here.

It’d probably be faster than the ode integrator for this. This boost integral code is the same stuff that’s being used in the yet unfinished 1d integrator stuff.

Edited for clarity… Twice!

Topic		Replies	Views
Sampling from a density up to normalisation constant General	3	684	January 8, 2019
Hoping for some guidance / help with implementing custom log likelihood and gradient for research project (details below) General	23	2220	October 18, 2021
Custom log-likelihood function speed up Modeling rstan , techniques , math	5	1114	December 15, 2020
Using CmdStan to evaluate an array of log densities Other	5	1333	June 17, 2017
Density functions but not for incrementing log-likelihood General	13	1053	August 26, 2017

Logspline density estimation

Related topics