Surprising memory usage with eigendecomposition

#1

I have made a simple model that constructs a symmetric Jacobian for a linear ODE system and applies eigenvalues_sym and eigenvectors_sym, for a 164-dimensional system. The expressions are very simple, yet memory usage saturates immediately: top shows 44 GB allocated on a 16 GB system, even though the input Rdump file is only 400 KB.

Are there known memory issues with these functions? For reference, I’ve posted the code on this gist.

Edit: I checked the scaling of memory usage with problem size, and I’m sure I can’t use this as-is. Since I only need the leading eigenvalues and eigenvectors, I will find another way to compute them.
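For what it’s worth, if only the leading eigenpair is needed, power iteration sidesteps the autodiffed eigensolver entirely, since it’s just repeated matrix-vector products. A NumPy sketch (the function name and tolerances here are my own, not Stan code):

```python
import numpy as np

def leading_eigenpair(A, max_iters=10000, tol=1e-12):
    """Power iteration: largest-magnitude eigenpair of a symmetric matrix A."""
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n)      # starting vector
    lam = 0.0
    for _ in range(max_iters):
        w = A @ v
        lam_new = v @ w              # Rayleigh quotient estimate
        v = w / np.linalg.norm(w)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, v
```

Written with a fixed number of matrix-vector products, the same idea should autodiff cheaply, since each step is ordinary matrix algebra.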

0 Likes

#2

Not a bad idea.

Those functions are autodiffed now, though I’m surprised they blow up that badly.

If you scan through this, it shows you how to do eigenvalues_sym: http://htmlpreview.github.io/?https://github.com/bbbales2/stancon_2018/blob/master/rus.html

I don’t think the derivatives are totally right if you have duplicate eigenvalues and such, but that should get you started.

Sounds like you’d need to implement the eigenvector derivatives yourself too. I liked this derivation: http://gifi.stat.ucla.edu/speed/deleeuw_R_07c.pdf

0 Likes

#3

Thanks for the links! I see you have N=10 using the built-in eigensolver. I also did some scaling tests, and that size works fine. It’s really above N=50 that I see excessive memory usage, which now makes sense looking at the implementation.

By comparison, the Python autograd package implements the gradient of eigh (the equivalent of eigenvectors_sym) explicitly:

```python
import autograd.numpy as anp

def T(x):
    return anp.swapaxes(x, -1, -2)   # transpose of the last two axes

_dot = anp.matmul                    # (batched) matrix product

def grad_eigh(ans, x, UPLO='L'):
    """Gradient for eigenvalues and vectors of a symmetric matrix."""
    N = x.shape[-1]
    w, v = ans              # Eigenvalues, eigenvectors.
    def vjp(g):
        wg, vg = g          # Gradient w.r.t. eigenvalues, eigenvectors.
        w_repeated = anp.repeat(w[..., anp.newaxis], N, axis=-1)
        # F[i, j] = 1 / (w[j] - w[i]) off the diagonal, 0 on the diagonal.
        off_diag = anp.ones((N, N)) - anp.eye(N)
        F = off_diag / (T(w_repeated) - w_repeated + anp.eye(N))
        # Eigenvalue term plus eigenvector term, mapped back to matrix space.
        return _dot(v * wg[..., anp.newaxis, :] + _dot(v, F * _dot(T(v), vg)), T(v))
    return vjp
```

which comes from this paper and corresponds to eq. (12) in your second link.
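As a quick sanity check on that formula, here’s a plain-NumPy re-implementation (my own code, not autograd’s). For a symmetric matrix, the gradient of the sum of the eigenvalues is the identity matrix, so pulling back wg = 1, vg = 0 through the vjp should return I:

```python
import numpy as np

def T(a):
    return np.swapaxes(a, -1, -2)

def eigh_vjp(w, v, wg, vg):
    """Same formula as grad_eigh above, in plain NumPy."""
    N = w.shape[-1]
    w_rep = np.repeat(w[..., np.newaxis], N, axis=-1)
    off_diag = np.ones((N, N)) - np.eye(N)
    F = off_diag / (T(w_rep) - w_rep + np.eye(N))
    return (v * wg[..., np.newaxis, :] + v @ (F * (T(v) @ vg))) @ T(v)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                       # random symmetric test matrix
w, v = np.linalg.eigh(A)
g = eigh_vjp(w, v, np.ones(5), np.zeros((5, 5)))
print(np.allclose(g, np.eye(5)))        # True: d(sum of eigenvalues)/dA = I
```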

Would this be a useful PR to Stan’s math library?

Edit: the derivative formulas make sense to me now; I’m just not sure how easy they would be to implement in Stan.

1 Like

#4

Sorry for the radio silence. I screwed up and wasn’t looking at the unread tab to find unread messages :/.

To answer the usefulness question: it’d be super important for eigenvalue problems, though I don’t know how common these are in models. Maybe they’d be more common if they were faster. I think the eigenvalues_sym function and such are primarily used in the Riemannian stuff, and there it’s higher-order autodiff (maybe that explains why this never got a fast reverse-mode implementation?).

The detail I haven’t investigated here is duplicate eigenvalues. I dunno if the derivatives fall apart there or what.
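The thing I’d worry about concretely: the eigenvector part of the vjp above divides by the eigenvalue gaps w_j − w_i, so the F factor grows without bound as two eigenvalues coalesce. A made-up NumPy illustration:

```python
import numpy as np

# Two nearly-duplicate eigenvalues with a gap of 1e-12.
w = np.array([1.0, 1.0 + 1e-12, 3.0])
N = len(w)
w_rep = np.repeat(w[:, np.newaxis], N, axis=1)
off_diag = np.ones((N, N)) - np.eye(N)
F = off_diag / (w_rep.T - w_rep + np.eye(N))  # F[i, j] = 1 / (w[j] - w[i])
print(np.abs(F).max())   # ~1e12: the eigenvector vjp blows up as the gap closes
```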

0 Likes

#5

This paper contains all of the information. I’m not really sure what to do: in practice I’d expect non-simple eigenvalues to be fairly rare (just because of numerical error), so it’s not completely clear to me how much we have to worry about this.

Anyway, the simple case would be easy to program up explicitly (the formulas are in the Giles paper).
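For reference, here is my transcription of the symmetric-case adjoint from the Giles paper (valid only for distinct eigenvalues; conventions may differ by a symmetrization of the result). With $A = U \Lambda U^T$:

$$
\bar{A} = U\left(\bar{\Lambda} + F \circ \left(U^T \bar{U}\right)\right) U^T,
\qquad
F_{ij} =
\begin{cases}
(\lambda_j - \lambda_i)^{-1}, & i \neq j,\\
0, & i = j,
\end{cases}
$$

where $\circ$ is the elementwise product. This is exactly what the autograd snippet earlier in the thread computes.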

Maybe someone like @betanalpha has feelings? At least about how we can test how important the non-simple case is.

2 Likes

#6

My understanding is that in the case of degenerate, but real, eigenvalues, the typical eigendecomposition algorithms will just return an arbitrary set of orthogonal vectors spanning the degenerate subspace, and the derivatives will then be with respect to that arbitrary basis but otherwise well-defined.

In the Riemannian code I don’t use stan::math::eigenvalue_sym but rather use the Eigen calls directly and then implement the gradients as described in https://arxiv.org/pdf/1212.4693.pdf. These might not be applicable to the general case, however, since I compute the gradients of the quadratic form and log determinant directly, and the degeneracy might not propagate through those operations.
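For context, the two derivatives mentioned there follow from standard matrix-calculus identities: the derivative of log det A in a symmetric direction D is tr(A^-1 D), and the derivative of x^T A^-1 x is -x^T A^-1 D A^-1 x. A quick NumPy finite-difference check of both (my own sketch, not the Riemannian code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # symmetric positive-definite "metric"
D = rng.standard_normal((n, n))
D = (D + D.T) / 2                # symmetric perturbation direction dA/dt
x = rng.standard_normal(n)
Ainv = np.linalg.inv(A)

# Analytic directional derivatives at t = 0.
g_logdet = np.trace(Ainv @ D)            # d/dt log det(A + t D)
g_quad = -x @ Ainv @ D @ Ainv @ x        # d/dt x^T (A + t D)^{-1} x

# Central finite differences for comparison.
eps = 1e-6
fd_logdet = (np.linalg.slogdet(A + eps * D)[1]
             - np.linalg.slogdet(A - eps * D)[1]) / (2 * eps)
fd_quad = (x @ np.linalg.solve(A + eps * D, x)
           - x @ np.linalg.solve(A - eps * D, x)) / (2 * eps)

print(abs(g_logdet - fd_logdet), abs(g_quad - fd_quad))  # both tiny
```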

1 Like

#7

Yes. They’re just autodiffing through the iterative algorithm. The right thing to do is to replace it with a reasonable matrix algorithm using the generalized adjoint-vector product.

0 Likes