Tagging @avehtari, @anon75146577, @seantalts, @betanalpha.
Latent Gaussian models
Suppose we have a latent Gaussian model
\phi \sim \pi(\phi) \\
\theta \sim \mathrm{Normal}(\mu(\phi), \Sigma(\phi)) \\
y_{j \in g(i)} \sim \pi(y | \theta_i, \phi)
When doing a nested Laplace approximation to marginalize out \theta, we can show that, up to an additive constant, \log p(\phi | y) \approx \log p(\phi) + \alpha, where:
\alpha = \log p(y | \theta^*, \phi) - \frac{1}{2} \left( \log \det \Sigma_\phi + \log |\det H| + [\theta^* - \mu(\phi)]^T \Sigma^{-1}_\phi [\theta^* - \mu(\phi)] \right)
where \theta^* is the mode of p(\theta | y, \phi), and H = \nabla^2_\theta \log p(\theta | y, \phi) is the Hessian with respect to \theta, evaluated at \theta^*.
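As a quick numerical sanity check of this formula, the sketch below (stdlib-only Python; the toy model, function names, and tolerances are mine, not part of the proposal) compares \alpha against brute-force quadrature on a one-dimensional Poisson-normal model, where the additive constants cancel and \alpha should match \log p(y) directly:

```python
import math

# Toy model: y ~ Poisson(exp(theta)), theta ~ Normal(mu, sigma2).

def log_lik(theta, y):
    # Poisson log likelihood with a log link: y*theta - exp(theta) - log(y!)
    return y * theta - math.exp(theta) - math.lgamma(y + 1)

def laplace_alpha(y, mu, sigma2, iters=50):
    theta = mu  # Newton iterations for the mode theta*
    for _ in range(iters):
        grad = y - math.exp(theta) - (theta - mu) / sigma2
        hess = -math.exp(theta) - 1.0 / sigma2  # Hessian of log p(theta | y)
        theta -= grad / hess
    H = -hess  # positive curvature at the mode, so log is well defined
    return (log_lik(theta, y)
            - 0.5 * (math.log(sigma2) + math.log(H)
                     + (theta - mu) ** 2 / sigma2))

def log_marginal_quadrature(y, mu, sigma2, lo=-10.0, hi=10.0, n=20000):
    # Brute-force trapezoidal integral of p(y | theta) p(theta) d(theta)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        theta = lo + k * h
        log_prior = (-0.5 * math.log(2 * math.pi * sigma2)
                     - (theta - mu) ** 2 / (2 * sigma2))
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.exp(log_lik(theta, y) + log_prior)
    return math.log(total * h)
```

For y = 3, mu = 0, sigma2 = 1, the two values agree closely, as expected for a log-concave likelihood.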
Function specification
The mode is found using an algebraic solver, and the Hessian is known analytically for certain observational models, notably the Poisson, normal, binomial, and negative binomial. In these cases, we can write a Stan function that automates the calculation of \alpha. A Stan model might then look like:
functions {
// functors that return individual elements of a vector / matrix
real mu_phi(real phi, real y, int i) { }
real sigma_phi(real phi, real y, int i, int j) { }
}
data {
int M; // number of groups
int N; // number of observations
int/real y[N]; // int or real, depending on the observational model
int index[N]; // group index for each observation
}
parameters {
real phi;
}
transformed parameters {
// vector mu_phi = ...;
// matrix sigma_phi = ...;
static vector[M] theta_0 = rep_vector(0, M); // `static` is a proposed keyword
}
model {
phi ~ p(.);
target += laplace_lgp_poisson(theta_0, mu_phi, sigma_phi, phi, y, index);
// or
target += laplace_lgp_binomial(theta_0, mu_phi, sigma_phi, phi, y, index);
}
Remark 1: the laplace functions take functors as arguments, so they are a bit of a pain to implement. It might be worth waiting for Stan 3 and using lambda functions.
Remark 2: sigma_phi and mu_phi are functors that compute the individual elements of a matrix. This allows more localization, reduces memory usage, and generates smaller autodiff expression trees. See the function spec for Gaussian processes.
Remark 3: in many cases, \Sigma_\phi is sparse.
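To illustrate Remark 2 concretely, here is a minimal sketch (Python rather than Stan, with a squared-exponential kernel chosen purely as an example) of an element-wise covariance functor; a lazy or sparse consumer only evaluates the entries it needs:

```python
import math

def sigma_phi_element(phi, x, i, j):
    # Hypothetical element functor: entry (i, j) of a squared-exponential
    # covariance matrix, with phi playing the role of the length scale.
    d = x[i] - x[j]
    return math.exp(-0.5 * (d / phi) ** 2)

def build_cov(phi, x):
    # A dense consumer can still assemble the full matrix on demand,
    # but nothing forces it to: each entry is an independent evaluation.
    n = len(x)
    return [[sigma_phi_element(phi, x, i, j) for j in range(n)]
            for i in range(n)]
```

Each call touches only the inputs for entry (i, j), which is what keeps the per-element autodiff expression trees small.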
Finding the curvature
We would like to generalize to a broader class of likelihoods. The issue is computing H = \nabla^2 \log p(y | \theta, \phi) - \Sigma_\phi^{-1}. Automatically computing the Hessian of an arbitrary function can be costly, though it is feasible. In practice (according to Dan), modelers work out the Hessian by hand and provide it. We could allow users to specify the Hessian and pass it to laplace_lgp_general.
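To make the cost trade-off concrete, here is a sketch (1-D Python; the model and helper names are illustrative, not the proposed C++ interface) comparing a user-supplied analytic Hessian with a finite-difference fallback:

```python
import math

def log_lik_poisson(theta, y):
    # Poisson log likelihood with a log link; its analytic Hessian
    # with respect to theta is simply -exp(theta).
    return y * theta - math.exp(theta) - math.lgamma(y + 1)

def fd_hessian(f, theta, h=1e-4):
    # Central second difference: three likelihood evaluations per entry.
    # In D dimensions this scales as O(D^2) evaluations, which is why
    # letting users pass an analytic Hessian is attractive.
    return (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h ** 2

theta = 1.3
analytic = -math.exp(theta)
approx = fd_hessian(lambda t: log_lik_poisson(t, 4), theta)
```

The finite-difference version is accurate enough here, but the evaluation count grows quadratically with the dimension of \theta, while a user-supplied Hessian costs nothing extra.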
Finding the mode
Finding the mode is a high-dimensional root-finding problem. Since the problem is (approximately) convex, a candidate approach is a Newton solver, in particular the KINSOL implementation; see the earlier post. In particular, we want to consider parallelizing the solver, which can be done separately from the autodiff calculation.
We also want to use an initial guess \theta_0 (theta_0 in the code), based on the solutions from previous iterations, to gain a speedup. Sean and I discussed this a little bit. Note this is applicable to any problem that uses the solver. Basically, theta_0 is declared as a static variable and persists across iterations. This could be a reasonable scheme if the leapfrog steps are small enough when simulating Hamiltonian trajectories.
static vector theta_0 = 0;
theta_0 = algebra_solver(system, theta_0, parm, x_r, x_i);
See link. To safeguard against misuse, we could accept only static variables as the argument for certain functions (e.g. initial guesses).
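As a rough prototype of the warm-start idea (stdlib Python; the solver, model, and tolerances are stand-ins for the real KINSOL setup), reusing the previous solution as theta_0 cuts the Newton iteration count when the solution moves only slightly between leapfrog steps:

```python
import math

def newton_mode(y, mu, sigma2, theta0, tol=1e-10, max_iter=100):
    # Newton solve for the mode of p(theta | y, phi) in a 1-D
    # Poisson-normal model; returns (mode, iterations used).
    theta = theta0
    for it in range(1, max_iter + 1):
        grad = y - math.exp(theta) - (theta - mu) / sigma2
        if abs(grad) < tol:
            return theta, it
        hess = -math.exp(theta) - 1.0 / sigma2
        theta -= grad / hess
    return theta, max_iter

# Cold start: begin from zero on every call.
mode, n_cold = newton_mode(5, 0.0, 1.0, 0.0)
# Warm start: begin from (a slightly perturbed copy of) the previous
# solution, mimicking what a persistent `static` theta_0 would provide.
mode_w, n_warm = newton_mode(5, 0.0, 1.0, mode + 1e-3)
```

Because Newton converges quadratically near the solution, a warm start that lands close to the new mode saves most of the iterations.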