Issues with inference in infinite factor model

I implemented a minimal version of ADVI in TensorFlow (I’m aware of Edward but I needed something a bit more low-level):

The implementation passes basic tests but I wanted to check on a more involved model by comparing my inference results to Stan’s ADVI and NUTS ones.

I considered the IFM of Bhattacharya and Dunson:

The results from Stan are off, for instance the covariance matrix omega is not recovered in a 10-dimensional case (I generate the data) and as I am a complete beginner in Stan programming I blame this entirely on some implementation mistake. Further evidence of this is that Stan’s ADVI and NUTS inference results shows some level of agreement.

Follow my implementation:

data {
    int<lower=0> N;
    int<lower=0> P;
    int<lower=0> H;
    vector[P]    y[N];
parameters {
    real<lower=0>         a1;
    real<lower=0>         a2;
    vector<lower=0>[H]    delta;
    matrix<lower=0>[P, H] phi;
    matrix[P, H]          lam;
    vector<lower=0>[P]    s2i;
transformed parameters {
    vector[H]    theta;
    vector[P]    s2;
    matrix[P, P] omega;

    theta[1] = delta[1];
    for (h in 2:H)
        theta[h] = theta[h-1] * delta[h];

    s2 = 1.0 ./ s2i;

    omega = diag_matrix(s2) + lam * lam';
model {
    vector[P] zero;
    for (p in 1:P)
        zero[p] = 0.0;

    a1 ~ gamma(2.0, 1.0);

    a2 ~ gamma(2.0, 1.0);

    delta[1] ~ gamma(a1, 1.0);
    for (h in 2:H)
        delta[h] ~ gamma(a2, 1.0);

    for (p in 1:P)
        for (h in 1:H)
            phi[p, h] ~ gamma(3.0 / 2.0, 3.0 / 2.0);

    for (p in 1:P)
        for (h in 1:H)
            lam[p, h] ~ normal(0.0, 1.0 / phi[p, h] / theta[h]);

    for (p in 1:P)
        s2i[p] ~ gamma(1.0, 0.3);

    for (n in 1:N)
        y[n] ~ multi_normal(zero, omega);

Are there any obvious issues, aside from missed vectorization opportunities in the distributions ?

I’m also aware that the 2-level hierarchical structure might not help inference, but I consistently recover omega in my ADVI implementation and the algorithm should be very close, so I must be doing something silliy here…

As an aside, is it expected for zero to be initialized to nan ?

It turns out there were no mistake in the scripts above (well, aside from me ignoring that normal() is parametrized by scale and not variance in Stan).
The disagreement was caused by a bug in the post-processing of the csv files generated by Stan for plotting purposes.

Case closed.

1 Like

Thanks for replying with the resolution.

There are other inefficiencies than just missed vectorizations, but it sounds like you don’t care about efficiency.

I am actually going to implement another model soon which is not too dissimilar, so it would helpful if you could point out what are the other efficiency issues with the code above that I am unaware of.
Thank you in advance!