Vectorized != loop

Hi All,

I’m a newbie to Stan and statistical modeling, and have been successfully avoiding posting dumb questions to this forum… until now.
I’m experimenting with a very simple autoregression model in 2D space-time and noticed that if I use a loop vs. the recommended vectorization, I get different results for parameter ‘b’. I’m specifying the seed and initial values for the parameters in my call to Stan sampling (sm.sampling(data=tg_dat, init=init_vals, seed=10, …)).
The model code and a figure showing the differences are attached; I can provide data to run the model if necessary, but I’m guessing this is a stupid error on my part that can be easily addressed.
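Roughly, the comparison looks like this (a simplified sketch, not the exact script: the .stan file names and the chains/iter settings are placeholders, and tg_dat / init_vals are the data dict and initial values mentioned above):

    import pystan

    # Two copies of the model, identical except for the b ~ normal(...) line
    # (looped vs. vectorized). File names are placeholders for this sketch.
    sm_loop = pystan.StanModel(file="ar1_looped.stan")
    sm_vec = pystan.StanModel(file="ar1_vectorized.stan")

    # Same data, same initial values, same seed for both fits
    # (chains/iter here are illustrative, not the actual settings).
    fit_loop = sm_loop.sampling(data=tg_dat, init=init_vals, seed=10, chains=4, iter=2000)
    fit_vec = sm_vec.sampling(data=tg_dat, init=init_vals, seed=10, chains=4, iter=2000)

    # Compare the posterior draws of b between the two fits.
    b_loop = fit_loop.extract()["b"]
    b_vec = fit_vec.extract()["b"]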

Thanks a lot for any help,
CC


            data {
              int<lower=0> N;  // number of locations
              int<lower=0> K; // number of times
              matrix[K,N] z; // observations
              matrix[N,N] D; // covariance
              vector[K]   T; // times (centered at 0)
            }
            parameters {
              real r;
              real phi;
              real sigma_2;

              vector[N] b;
              real mu;
              real pi_2;

            }

            transformed parameters {
                matrix[K,N] yhat;
                matrix[N,N] SIG;
                SIG   = sigma_2*exp(-phi*D);

                for (n in 1:N) {
                    yhat[1,n] = 0;
                    for (k in 2:K) {
                        //yhat[k,n] = r * z[k-1,n];
                        yhat[k,n] = r * (z[k-1,n] - b[n]*T[k-1]) + b[n]*T[k];
                    }
                }
            }

            model {
                // priors
                r   ~ uniform(0, 1);
                phi ~ lognormal(-7, 5);
                sigma_2 ~ inv_gamma(0.5, 0.0004033337821613455);

                // for (n in 1:N) { b[n] ~ normal(mu, pi_2); } // COMPARING THIS WITH LINE BELOW
                b ~ normal(mu, pi_2);

                mu      ~ normal(0.0034202389013296727, 0.0001013002175191524);
                pi_2    ~ inv_gamma(0.5, 5.06501087595762e-07);

                // model
                for (k in 1:K)
                  z[k] ~ multi_normal(yhat[k], SIG); // likelihood
            }


Just eyeballing it, I can’t see a reason for them to differ. If you could post some fake/toy data then I could debug locally.

How big is the difference? Can you post the summary from each run?


@andrjohns thanks for taking a look. I’ll get some toy data up there tomorrow. Are you able to see the plot I added? It shows the difference for each kept run. The difference looks to be on the order of 0.0001.

Oh, a difference that small is completely fine/expected. The vectorised calculations on the backend are different to a simple loop, so there will be some minor variation between the two approaches; there’s nothing to worry about here.


Actually, could you still post the full test code for how you generated that plot? This is still surprising to me. There might be some differences, but if this plot is just vectorized vs. non-vectorized, that’s bad.

Also beware that in Stan the second argument to a normal is the standard deviation, not the variance, so you probably want something like b ~ normal(mu, sqrt(pi_2)) if pi_2 is meant to be a variance.


There are vertical lines visible in the plot at b (looped) = 0.0034, which suggests there might be sampling/convergence issues?


Thanks for all the feedback everyone.
I’m attaching a full script for PyStan and an updated plot (increased sampling, and fixed a sqrt, thanks @bbbales2).
Look forward to additional comments.
Sorry the code is very ugly.

CC

AR1_2D_Test.py (4.9 KB)
b_cmp2 (updated plot comparing looped vs. vectorized b)

Had a look. There must be something going on in the reproducibility part of this. Like maybe some weird seed thing or something.

I took the two versions of the model and then computed gradients at a random point in unconstrained space and they seemed to be the same:

mod = stan("testmodel.stan", data = list(N = N, K = K, z = z, D = D, T = T_), iter = 2, chains = 1)
mod2 = stan("testmodel2.stan", data = list(N = N, K = K, z = z, D = D, T = T_), iter = 2, chains = 1)

P = get_num_upars(mod2)
upar = rnorm(P)
> grad_log_prob(mod, upar)
[1]  3.191917e+03  4.279200e-10  2.308689e+03 -2.610314e+03  1.364886e+03 -4.076887e-01 -1.430043e+00
attr(,"log_prob")
[1] -2300.139
> grad_log_prob(mod2, upar)
[1]  3.191917e+03  4.279200e-10  2.308689e+03 -2.610314e+03  1.364886e+03 -4.076887e-01 -1.430043e+00
attr(,"log_prob")
[1] -2300.139
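The same check can also be done from PyStan if that’s easier for you. Roughly (a sketch using methods on the PyStan 2 fit object as I remember them, with fit_loop / fit_vec as placeholder names for fits of the looped and vectorized variants; double-check the method names against your version):

    import numpy as np

    # Evaluate the log density and its gradient for both model variants at the
    # same random point in unconstrained space; they should agree.
    # fit_loop / fit_vec are placeholder names for the two fits.
    np.random.seed(10)
    upar = np.random.normal(size=len(fit_vec.unconstrained_param_names()))

    print(fit_loop.log_prob(upar), fit_loop.grad_log_prob(upar))
    print(fit_vec.log_prob(upar), fit_vec.grad_log_prob(upar))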

Also, you should add constraints to your parameters to match the support of the distributions you want to sample from. You need both the constraint (in the parameters block) and the prior (in the model block), or you’ll get errors at runtime.

So like if you do:

r ~ uniform(0.5, 1.5);

Then if r is a parameter you should constrain it like:

real<lower = 0.5, upper = 1.5> r;

Similarly, standard deviation/variance parameters should be lower-bounded at zero, like:

real<lower = 0.0> pi_2;

Thanks for taking a look. I’m a bit confused by what you mean by reproducibility. I see that you weren’t able to reproduce the issue on a simpler case, but were you able to on the larger test I supplied?

Either way, I guess I’m not sure where this leaves us. It doesn’t seem to be an issue for my results, so I have been ignoring it. Let me know if there is anything else you would like me to do.

Thanks for the tip on the constraints!

So it seems your code is working fine; it just looks like a convergence issue. If you look at the two charts you gave, they might be showing symptoms of convergence problems. I’m not sure if you have done a trace plot of your log likelihood (ll) across chains. It might show that your warmup isn’t long enough, or that your data has peculiarities that allow a potentially unstable or multimodal solution for b, i.e. a linear time trend for each location might not represent your data well, giving a couple of different ways to fit.

In the end, it looks like you should have a look at the trace of the ll for each chain to verify good mixing; then your model will likely do better with a longer warmup and more iterations. This is just from looking at how clustered the data on the charts seems to be; it should be a bit smoother.
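From Python, a minimal sketch of that check might look like this (fit here stands for your PyStan fit object, and the plotting is just one way to do it):

    import matplotlib.pyplot as plt

    # Trace of the log posterior (lp__) for each chain as a quick mixing check.
    # With permuted=False, extract() returns an (iterations, chains, parameters)
    # array of post-warmup draws, and lp__ is the last "parameter".
    draws = fit.extract(permuted=False)
    lp = draws[:, :, -1]
    for c in range(lp.shape[1]):
        plt.plot(lp[:, c], alpha=0.7, label="chain %d" % c)
    plt.xlabel("post-warmup iteration")
    plt.ylabel("lp__")
    plt.legend()
    plt.show()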

Efficiency-wise, I suggest that the likelihood loop at the end of your model make use of map_rect() or reduce_sum() to reduce the run time if it seems a bit slow for you.

Thanks. I haven’t done much diagnostic checking yet - this is actually just a subset of the model I put together for this example. I tried running the full model and that’s a whole new beast, as it has one or more issues that are causing it to take forever. Still getting to the bottom of my own mistakes, but you will probably see a new thread started sometime soon :).

Yes,

You are using Python, but it looks like you need to borrow ShinyStan from R for diagnostics.

Unfortunately there doesn’t seem to be an easy way to use ShinyStan from Python at this stage:
Good workflow for getting samples from PyStan (python) into ShinyStan (R)? - General - The Stan Forums (mc-stan.org)

So that might be the general issue you are facing. It might be smart to try importing your data into R, as you can then make use of brms and ShinyStan, which could speed up your development considerably. They’re easy to pick up and, I’d suggest, totally worth the effort.

Oh, I meant trying to start two different models at exactly the same point with the same RNG seed, etc.

In this case the two different models should be the same up to order of addition and whatnot, so it seems like it might be possible to get them to match.

But anyway, the easier way to make this check is to just compare the gradients, and those match. This makes me think there isn’t a problem here (and the reason the two chain experiments show differences is some problem with starting two models at exactly the same point, or some slight numerical wiggles, or something like that).
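The order-of-addition point is easy to see in isolation, by the way: floating-point addition isn’t associative, so summing the same terms in a different order can shift the result in the last few bits, and those tiny differences can then get amplified over a long chain.

    # Floating-point addition is not associative, so reordering the same terms
    # can change the result slightly.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
    print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))    # 0.6000000000000001 0.6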
