Surprising memory usage with eigendecomposition


#1

I have made a simple model that constructs a symmetric Jacobian for a linear ODE system and applies eigenvalues_sym and eigenvectors_sym to it, for a 164-dimensional system. The expressions are very simple, yet memory usage saturates immediately: top shows 44 GB allocated on a 16 GB system, even though the input Rdump file is only 400 KB.

Are there known memory issues with these functions? For reference, I’ve posted the code on this gist.

edit: I checked the scaling of memory usage with problem size, and I'm sure I can't use this as is. Since I only need the leading eigenvalues and eigenvectors, I will find another way to compute them.
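For example, something along these lines (a rough sketch using SciPy's Lanczos-based eigsh; the placeholder matrix and the number of eigenpairs are arbitrary). This runs outside the model, so it only helps if the eigenpairs can be treated as data rather than differentiated through.

import numpy as np
from scipy.sparse.linalg import eigsh

# Placeholder standing in for the real 164 x 164 symmetric Jacobian.
N = 164
A = np.random.randn(N, N)
A = 0.5 * (A + A.T)

# eigsh uses Lanczos iteration to compute only k eigenpairs,
# avoiding the full decomposition.
k = 5
vals, vecs = eigsh(A, k=k, which='LA')  # 'LA' = largest algebraic eigenvalues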


#2

Not a bad idea.

Those functions are autodiffed now, though I’m surprised they blow up that badly.

If you scan through this, it shows you how to do eigenvalues_sym: http://htmlpreview.github.io/?https://github.com/bbbales2/stancon_2018/blob/master/rus.html

I don’t think the derivatives are totally right if you have duplicate eigenvalues and such, but that should get you started.

Sounds like you’d need to implement the eigenvectors yourself too. I liked this derivation: http://gifi.stat.ucla.edu/speed/deleeuw_R_07c.pdf .
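For reference, the formulas both links end up with are compact. Here is a small numpy sketch of the forward-mode sensitivities for a symmetric matrix, assuming all eigenvalues are distinct (which is exactly the caveat above); the function and variable names are just for illustration.

import numpy as np

def eigh_jvp(A, dA):
    """Sensitivities of eigenvalues/eigenvectors of symmetric A
    under a symmetric perturbation dA (distinct eigenvalues assumed)."""
    w, V = np.linalg.eigh(A)            # columns of V are eigenvectors
    dW = V.T @ dA @ V                   # dA projected onto the eigenbasis
    dw = np.diag(dW)                    # d lambda_i = v_i' dA v_i
    # F[i, j] = 1 / (lambda_j - lambda_i) off the diagonal, 0 on it
    denom = w[np.newaxis, :] - w[:, np.newaxis] + np.eye(len(w))
    F = (1.0 - np.eye(len(w))) / denom
    dV = V @ (F * dW)                   # dv_j = sum_{i != j} (v_i' dA v_j) / (lambda_j - lambda_i) v_i
    return dw, dV

The division by (lambda_j - lambda_i) is where repeated eigenvalues break things.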


#3

Thanks for the links! I see you used N=10 with the built-in eigensolver. I also did some scaling tests, and that size works fine; it's really above N=50 that I see excessive memory usage, which now makes sense looking at the implementation (every intermediate quantity in the solver ends up as a node on the autodiff stack).

By comparison, the Python autograd package implements the gradient of eigh (the equivalent of eigenvectors_sym) explicitly:

import autograd.numpy as anp
from functools import partial

# Helpers used below (defined alongside grad_eigh in autograd.numpy.linalg):
def T(x): return anp.swapaxes(x, -1, -2)            # batched transpose
_dot = partial(anp.einsum, '...ij,...jk->...ik')    # batched matrix product

def grad_eigh(ans, x, UPLO='L'):
    """Gradient for eigenvalues and vectors of a symmetric matrix."""
    N = x.shape[-1]
    w, v = ans              # Eigenvalues, eigenvectors.
    def vjp(g):
        wg, vg = g          # Gradient w.r.t. eigenvalues, eigenvectors.
        w_repeated = anp.repeat(w[..., anp.newaxis], N, axis=-1)
        off_diag = anp.ones((N, N)) - anp.eye(N)
        F = off_diag / (T(w_repeated) - w_repeated + anp.eye(N))
        return _dot(v * wg[..., anp.newaxis, :] + _dot(v, F * _dot(T(v), vg)), T(v))
    return vjp

which is from this paper, but corresponds to (12) in your second link.
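As a quick sanity check of that rule, one can differentiate through autograd's eigh on a small symmetric test matrix and compare against a directional finite difference (the scalar loss, matrix size, and variable names here are arbitrary):

import numpy as np
import autograd.numpy as anp
from autograd import grad

N = 6
weights = np.arange(float(N))           # arbitrary weights for the eigenvector term

def loss(x):
    a = 0.5 * (x + x.T)                 # symmetrize so the eigh gradient applies cleanly
    w, v = anp.linalg.eigh(a)
    # Arbitrary scalar in both eigenvalues and eigenvectors
    # (v[:, -1] ** 2 is used so the eigenvector sign does not matter).
    return anp.sum(w ** 2) + anp.sum(weights * v[:, -1] ** 2)

x = np.random.randn(N, N)
d = np.random.randn(N, N)
g = grad(loss)(x)

# Directional finite-difference check.
eps = 1e-6
fd = (loss(x + eps * d) - loss(x - eps * d)) / (2 * eps)
print(np.sum(g * d), fd)                # the two numbers should agree closely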

Would this be a useful PR to Stan’s math library?

edit: the derivative formulas make sense to me now; I'm just not sure how easy they would be to implement in Stan.