I’m a newbie to stan and statistical modeling, and have been successfully avoiding posting dumb questions to this forum… until now.
I’m experimenting with a very simple autoregression model in 2D space-time and noticed that if I use a loop vs. the recommended vectorization, I get different results for parameter ‘b’. I’m specifying the seed and initial values for parameters in my call to Stan sampling (sm.sampling(data=tg_dat, init=init_vals, seed=10, …)).
The model code and a figure showing the differences are attached; I can provide data to run the model if necessary, but am guessing this is a stupid error on my part that can be easily addressed.
Thanks a lot for any help,
CC
data {
  int<lower=0> N;   // number of locations
  int<lower=0> K;   // number of times
  matrix[K,N] z;    // observations
  matrix[N,N] D;    // covariance
  vector[K] T;      // times (centered at 0)
}
parameters {
  real r;
  real phi;
  real sigma_2;
  vector[N] b;
  real mu;
  real pi_2;
}
transformed parameters {
  matrix[K,N] yhat;
  matrix[N,N] SIG;
  SIG = sigma_2 * exp(-phi * D);
  for (n in 1:N) {
    yhat[1,n] = 0;
    for (k in 2:K) {
      //yhat[k,n] = r * z[k-1,n];
      yhat[k,n] = r * (z[k-1,n] - b[n]*T[k-1]) + b[n]*T[k];
    }
  }
}
model {
  // priors
  r ~ uniform(0, 1);
  phi ~ lognormal(-7, 5);
  sigma_2 ~ inv_gamma(0.5, 0.0004033337821613455);
  // for (n in 1:N) { b[n] ~ normal(mu, pi_2); }   // COMPARING THIS WITH LINE BELOW
  b ~ normal(mu, pi_2);
  mu ~ normal(0.0034202389013296727, 0.0001013002175191524);
  pi_2 ~ inv_gamma(0.5, 5.06501087595762e-07);
  // model
  for (k in 1:K)
    z[k] ~ multi_normal(yhat[k], SIG);   // likelihood
}
@andrjohns thanks for taking a look. I’ll get some toy data up there tomorrow. Are you able to see the plot I added? It shows the difference for each kept run. The difference looks to be on the order of 0.0001.
Oh, a difference that small is completely fine/expected. The vectorised calculations on the backend are different from a simple loop, so there will be some minor variation between the two approaches; there’s nothing to worry about here.
Actually could you still post full test code for how you generated stuff for that plot? This is still surprising to me. There might be some differences, but if this plot is just vectorized vs. non-vectorized that’s bad.
Also beware that in Stan the second argument to a normal is the standard deviation, not the variance.
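For example, if pi_2 is intended to be a variance, the vectorised statement would need a square root, something like this (a sketch using the model’s own names):
b ~ normal(mu, sqrt(pi_2));   // normal() takes the sd, so pass the square root of the variance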
Thanks for all the feedback everyone.
I’m attaching a full script for pystan and an updated plot (increased sampling, and fixed a sqrt; thanks @bbbales2).
Look forward to additional comments.
Sorry the code is very ugly.
Had a look. There must be something going on in the reproducibility part of this. Like maybe some weird seed thing or something.
I took the two versions of the model and then computed gradients at a random point in unconstrained space and they seemed to be the same:
mod = stan("testmodel.stan", data = list(N = N, K = K, z = z, D = D, T = T_), iter = 2, chains = 1)
mod2 = stan("testmodel2.stan", data = list(N = N, K = K, z = z, D = D, T = T_), iter = 2, chains = 1)
P = get_num_upars(mod2)
upar = rnorm(P)
# compare gradients of both models at the same random unconstrained point
g1 = grad_log_prob(mod, upar)
g2 = grad_log_prob(mod2, upar)
max(abs(g1 - g2))
Also, you should add constraints to your parameters to match the support of the distributions you want to sample. You need both the constraint in the parameters block and the distribution in the model block; if they don’t match, you’ll get errors at runtime.
So like if you do:
r ~ uniform(0.5, 1.5);
Then if r is a parameter you should constrain it like:
real<lower = 0.5, upper = 1.5> r;
Similarly, standard deviation/variance parameters should be lower-bounded by zero, like:
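real<lower = 0> sigma_2;
real<lower = 0> pi_2;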
Thanks for taking a look. I’m a bit confused by what you mean by reproducibility. I see that you weren’t able to reproduce the issue on a simpler case but were you able to on the larger test I supplied?
Either way, I guess I’m not sure where this leaves us. It seems not to be an issue for my results so I have been ignoring it. Let me know if there is anything else you would like me to do.
So it seems your code is working fine. It just looks like a convergence issue. If you look at the two charts you gave, they look like they might have symptoms of convergence issues. I’m not sure if you have done a trace plot of your log likelihood (ll) across chains. It might show that your warmup isn’t long enough, or that your data has peculiarities that allow a potentially unstable or multimodal solution for b, i.e. a linear time trend for each location might not represent your data well, giving a couple of different ways to fit.
In the end, it looks like you should have a look at the trace of the ll for each chain to verify good mixing; then it is likely your model will do better with a longer warmup and more iterations. This is just from looking at how clustered the data on the charts seems to be. It should be a bit smoother.
Efficiency-wise, I suggest that the likelihood loop in the last lines of your code make use of map_rect() or reduce_sum() to reduce run time if it seems a bit slow for you; a rough sketch is below.
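For example, a minimal reduce_sum() sketch for the likelihood loop might look like the following (assuming a recent Stan version with reduce_sum and the array keyword syntax; partial_ll, k_idx, and grainsize are placeholder names, not from the original model):

functions {
  // sum the log likelihood over a slice of the time indices
  // (start and end are required by the reduce_sum signature but unused here)
  real partial_ll(array[] int k_slice, int start, int end,
                  matrix z, matrix yhat, matrix SIG) {
    real lp = 0;
    for (k in k_slice)
      lp += multi_normal_lpdf(z[k] | yhat[k], SIG);
    return lp;
  }
}
transformed data {
  int grainsize = 1;            // let the scheduler pick slice sizes
  array[K] int k_idx;
  for (k in 1:K) k_idx[k] = k;
}
model {
  // ... priors as before ...
  target += reduce_sum(partial_ll, k_idx, grainsize, z, yhat, SIG);
}

To actually see a speedup you would also need to build the model with threading enabled and set the number of threads per chain.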
Thanks. I haven’t done much diagnostic checking yet - this is actually just a subset of the model I put together for this example. I tried running the full model and that’s a whole new beast, as it has one or more issues that are causing it to take forever. Still getting to the bottom of my own mistakes, but you will probably see a new thread started sometime soon :).
So that might be the general issue you are facing. It might be smart to try importing your data into R, as you can then make use of brms and shinystan, which could speed up your development time considerably. They’re easy to pick up and totally worth the effort, I suggest.
Oh I meant trying to start two different models at exactly the same point with the same rng seed etc.
In this case the two different models should be the same up to order of addition and whatnot, so it seems like it might be possible to get them to match.
But anyway, the easier way to make this check is to just compare the gradients, and those match. This makes me think there isn’t a problem here (and the reason the two chain experiments show differences is some problem with starting two models at exactly the same point, or some slight numerical wiggles, or something like that).