A sufficient implementation of the multivariate normal lpdf

markjrieke · June 7, 2026, 12:21pm

I worked out a sufficient-statistic implementation of the multivariate normal. Compute the element-wise sums $\sum \mathbf{y} \mathbf{y}^{\text{T}}$ and $\sum \mathbf{y}^{\text{T}}$ over the observations (xxts and xts in the function below), then pass them to the function along with the number of observations, n. This can give a dramatic speed boost, depending on the dataset size and the number of unique cells you can summarize to.
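For reference, the identity behind this can be sketched as follows. It assumes $n$ observations $\mathbf{y}_i$ of dimension $K$ that share one mean and covariance; the index $i$ is introduced here for illustration, and $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ correspond to mu and Sigma in the function.

```latex
\begin{aligned}
\sum_{i=1}^{n} \log \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  &= -\frac{nK}{2}\log(2\pi) - \frac{n}{2}\log|\boldsymbol{\Sigma}|
     - \frac{1}{2}\sum_{i=1}^{n} (\mathbf{y}_i - \boldsymbol{\mu})^{\text{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{y}_i - \boldsymbol{\mu}) \\
  &= -\frac{nK}{2}\log(2\pi) - \frac{n}{2}\log|\boldsymbol{\Sigma}|
     - \frac{1}{2}\operatorname{tr}\!\Big(\boldsymbol{\Sigma}^{-1} \sum_{i=1}^{n} \mathbf{y}_i \mathbf{y}_i^{\text{T}}\Big)
     + \Big(\sum_{i=1}^{n} \mathbf{y}_i^{\text{T}}\Big) \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
     - \frac{n}{2}\, \boldsymbol{\mu}^{\text{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
\end{aligned}
```

The data enter the second line only through the two sums and $n$, which is why they can be computed once up front.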

I wrote about the implementation here. There is very likely a better version of this function that could be written, but it works as-is for now!

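real multi_normal_sufficient_lpdf(
  matrix xxts,      // sum of y * y' over the observations (K x K)
  row_vector xts,   // sum of y' over the observations (1 x K)
  real n,           // number of observations
  vector mu,
  matrix Sigma
) {
  int K = size(mu);
  matrix[K,K] invs = inverse(Sigma);
  real lp = 0.0;
  lp += (-n * K) / 2 * log(2 * pi());    // normalizing constant
  lp += -n/2 * log_determinant(Sigma);
  lp += -0.5 * trace(Sigma \ xxts);      // -1/2 * tr(Sigma^-1 * xxts)
  lp += trace(invs * mu * xts);          // equals xts * Sigma^-1 * mu
  lp += -n/2 * (mu' * invs * mu);
  return lp;
}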
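A minimal sketch of how the sufficient statistics might be computed and passed in, assuming a single cell with all N observations stored as rows of a matrix Y. The names N, K, and Y and the data layout are assumptions for illustration, not taken from the original post, and the priors are omitted.

```stan
functions {
  // Copied unchanged from the post above.
  real multi_normal_sufficient_lpdf(matrix xxts, row_vector xts, real n,
                                    vector mu, matrix Sigma) {
    int K = size(mu);
    matrix[K, K] invs = inverse(Sigma);
    real lp = 0.0;
    lp += (-n * K) / 2 * log(2 * pi());
    lp += -n / 2 * log_determinant(Sigma);
    lp += -0.5 * trace(Sigma \ xxts);
    lp += trace(invs * mu * xts);
    lp += -n / 2 * (mu' * invs * mu);
    return lp;
  }
}
data {
  int<lower=1> N;      // number of observations (assumed name)
  int<lower=1> K;      // dimension of each observation (assumed name)
  matrix[N, K] Y;      // one observation per row (assumed layout)
}
transformed data {
  // Sufficient statistics, computed once before sampling.
  matrix[K, K] xxts = crossprod(Y);              // Y' * Y = sum_i y_i * y_i'
  row_vector[K] xts = rep_row_vector(1, N) * Y;  // column sums = sum_i y_i'
  real n = N;
}
parameters {
  vector[K] mu;
  cov_matrix[K] Sigma;
}
model {
  // Priors on mu and Sigma omitted for brevity.
  target += multi_normal_sufficient_lpdf(xxts | xts, n, mu, Sigma);
}
```

With several cells, the same statistics would be computed per cell and the function called once for each.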

avehtari · June 8, 2026, 6:33pm

In the current version you have three calls (inverse, log_determinant, and the \ left division), each of which does the same O(K^3) factorization of Sigma, where K is its dimension. The following code does it only once:

real multi_normal_sufficient_lpdf(
  matrix xxts,
  row_vector xts,
  real n,
  vector mu,
  matrix Sigma
) {
  int K = size(mu);

  // Single factorization: Sigma = L * L'
  matrix[K, K] L = cholesky_decompose(Sigma);
  vector[K] Linv_mu = mdivide_left_tri_low(L, mu);         // L^-1 * mu
  vector[K] Linv_xts = mdivide_left_tri_low(L, xts');      // L^-1 * xts'
  matrix[K, K] Linv_xxts = mdivide_left_tri_low(L, xxts);  // L^-1 * xxts
  real log_det_Sigma = 2 * sum(log(diagonal(L)));
  // tr(L^-1 * xxts * L^-T) = tr(Sigma^-1 * xxts), using the symmetry of xxts
  real tr_Sinv_xxts = trace(mdivide_left_tri_low(L, Linv_xxts'));
  real lp = 0.0;
  lp += -0.5 * n * K * log(2 * pi());
  lp += -0.5 * n * log_det_Sigma;
  lp += -0.5 * tr_Sinv_xxts;
  lp += dot_product(Linv_xts, Linv_mu);   // xts * Sigma^-1 * mu
  lp += -0.5 * n * dot_self(Linv_mu);     // -n/2 * mu' * Sigma^-1 * mu
  return lp;
}

So this should be up to about 3 times faster, at least when the factorization dominates the cost.

Bob_Carpenter · June 9, 2026, 3:00pm

@avehtari, when will this implementation be faster than what we actually do in Stan? In the Stan implementation, we do all the factoring up front, then iterate over the inputs, each of which spawns an mdivide_left_ldlt. Is there a size at which it’d be better to implement it this way? If so, I’m guessing it’s something @stevebronder could knock out in no time.

Here’s what we do now (the autodiff’s all in prim now for functions that don’t involve autodiff):

github.com/stan-dev/math: stan/math/prim/prob/multi_normal_lpdf.hpp (develop branch)

#ifndef STAN_MATH_PRIM_PROB_MULTI_NORMAL_LPDF_HPP
#define STAN_MATH_PRIM_PROB_MULTI_NORMAL_LPDF_HPP

#include <stan/math/prim/meta.hpp>
#include <stan/math/prim/err.hpp>
#include <stan/math/prim/fun/as_column_vector_or_scalar.hpp>
#include <stan/math/prim/fun/constants.hpp>
#include <stan/math/prim/fun/dot_product.hpp>
#include <stan/math/prim/fun/eval.hpp>
#include <stan/math/prim/fun/log.hpp>
#include <stan/math/prim/fun/log_determinant_ldlt.hpp>
#include <stan/math/prim/fun/max_size_mvt.hpp>
#include <stan/math/prim/fun/mdivide_left_ldlt.hpp>
#include <stan/math/prim/fun/size_mvt.hpp>
#include <stan/math/prim/fun/sum.hpp>
#include <stan/math/prim/fun/to_ref.hpp>
#include <stan/math/prim/fun/transpose.hpp>
#include <stan/math/prim/fun/vector_seq_view.hpp>
#include <stan/math/prim/functor/partials_propagator.hpp>

(The file listing is truncated here.)

avehtari · June 9, 2026, 4:35pm

Yeah, I did consider also mentioning multi_normal_lpdf, but in a hurry I focused on pointing out that the original code did the matrix factorization three times. I didn’t have time to think about how to use it for the sufficient-statistic version, but that would be a useful addition to this thread.

avehtari · June 9, 2026, 5:25pm

OK, I checked, and it really is the sufficient-statistic part that makes multi_normal_lpdf not applicable; if it were used inside this function, it would add another unnecessary factorization.

Otherwise, multi_normal_lpdf also does just one factorization, but it uses an LDL factorization (aka square-root-free Cholesky decomposition) instead of Cholesky. I’m just used to using Cholesky, but LDL is often a better choice, so what multi_normal_lpdf is doing is just fine.
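For reference, an LDL factorization supplies the same two ingredients as Cholesky. This is a sketch in standard notation; the symbols $\mathbf{D}$, $d_k$, and $\mathbf{b}$ are introduced here for illustration, and $\mathbf{L}$ below is unit lower triangular, so it is not the same matrix as the Cholesky factor L in the code above.

```latex
\begin{aligned}
\boldsymbol{\Sigma} &= \mathbf{L}\,\mathbf{D}\,\mathbf{L}^{\text{T}},
  \qquad \mathbf{D} = \operatorname{diag}(d_1, \dots, d_K) \\
\log|\boldsymbol{\Sigma}| &= \sum_{k=1}^{K} \log d_k \\
\boldsymbol{\Sigma}^{-1}\mathbf{b} &= \mathbf{L}^{-\text{T}}\,\mathbf{D}^{-1}\,\mathbf{L}^{-1}\,\mathbf{b}
\end{aligned}
```

No square roots are needed to form $\mathbf{D}$, which is where the "square-root-free" name comes from.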

jmh530 · June 9, 2026, 5:47pm

This version could also be trivially changed to take L as an input instead of Sigma. The Sigma version could then just calculate L and pass it to the L version of the function.
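A minimal sketch of that split, reusing the Cholesky-based computation from earlier in the thread. The name multi_normal_cholesky_sufficient_lpdf is hypothetical, chosen here by analogy with multi_normal_cholesky, and the code has not been benchmarked.

```stan
functions {
  // Hypothetical name. L is the lower-triangular Cholesky factor,
  // i.e. Sigma = L * L'.
  real multi_normal_cholesky_sufficient_lpdf(matrix xxts, row_vector xts,
                                             real n, vector mu, matrix L) {
    int K = rows(mu);
    vector[K] Linv_mu = mdivide_left_tri_low(L, mu);
    vector[K] Linv_xts = mdivide_left_tri_low(L, xts');
    matrix[K, K] Linv_xxts = mdivide_left_tri_low(L, xxts);
    real lp = -0.5 * n * K * log(2 * pi());
    lp += -n * sum(log(diagonal(L)));  // -n/2 * log|Sigma|
    lp += -0.5 * trace(mdivide_left_tri_low(L, Linv_xxts'));
    lp += dot_product(Linv_xts, Linv_mu);
    lp += -0.5 * n * dot_self(Linv_mu);
    return lp;
  }

  // The Sigma version only factorizes and delegates.
  real multi_normal_sufficient_lpdf(matrix xxts, row_vector xts, real n,
                                    vector mu, matrix Sigma) {
    return multi_normal_cholesky_sufficient_lpdf(
      xxts | xts, n, mu, cholesky_decompose(Sigma));
  }
}
```

A model that already parameterizes the covariance through a Cholesky factor could call the L version directly and skip the factorization entirely.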

And the blog post also has the math for doing a version of the function that takes the inverse of the covariance matrix as an input.
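A minimal sketch of what a precision-matrix version could look like, derived directly from the log density rather than taken from the blog post. The function name and the argument name Omega (the inverse of Sigma) are assumptions for illustration.

```stan
functions {
  // Hypothetical name. Omega is the precision matrix, Omega = inverse(Sigma).
  real multi_normal_prec_sufficient_lpdf(matrix xxts, row_vector xts,
                                         real n, vector mu, matrix Omega) {
    int K = rows(mu);
    vector[K] Omega_mu = Omega * mu;
    real lp = -0.5 * n * K * log(2 * pi());
    lp += 0.5 * n * log_determinant(Omega);  // log|Sigma| = -log|Omega|
    // tr(Omega * xxts) equals the element-wise sum because both are symmetric.
    lp += -0.5 * sum(Omega .* xxts);
    lp += xts * Omega_mu;                    // xts * Omega * mu
    lp += -0.5 * n * dot_product(mu, Omega_mu);
    return lp;
  }
}
```

No linear solves are needed in this form; the only remaining factorization is the one inside log_determinant.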
