First of all, apologizes for the length of this post. We (myself and Paqui Corpas, my PhD student), Mitzi and Bob have been changing e-mails for more than a month about issues related to this example and we feel that now could be a good moment to get more people involved in this conversation. My main goal with all this is applying Stan to large disease mapping problems making it possible to deal with larger datasets than those studied in WinBUGS. I have been a WinBUGS user for disease mapping problems for more than a decade and I feel that I should make room for new tools to apply them to these large datasets. I have some experience also with INLA but it seems a bit restrictive for some of the models that I am using and, sincerely, I prefer MCMC, so Stan has been the first alternative tool that I have considered.
We have read with interest Mitzi’s work on ICAR-based spatial models. First, I would like to note one issue on Mitzi’s current implementation of the BYM model (file: bym_predictor_plus_offset.stan file in the /stan-dev/example-models Github repo at: https://github.com/stan-dev/example-models/tree/master/knitr/car-iar-poisson). Lines 28 to 32 of that model state:
real<lower=0> sigma_phi = inv(sqrt(tau_phi)); // convert precision to sigma
phi[1:(N - 1)] = phi_std_raw;
phi[N] = -sum(phi_std_raw);
phi = phi * sigma_phi;
Thus, if we understand it appropriately, phi is a re-scaled ICAR, with standard deviation sigma_phi. On the other hand, line 37 in the model states:
target += -0.5 * dot_self(phi[node1] - phi[node2]);
Which corresponds to an ICAR distribution of standard deviation equal to 1 (instead of sigma_phi) so this does not agree with the previous lines of code. If we are right (we would appreciate your opinion on this), this could be fixed by making something like
real<lower=0> sigma_phi = inv(sqrt(tau_phi)); // convert precision to sigma
phi_std[1:(N - 1)] = phi_std_raw;
phi_std[N] = -sum(phi_std_raw);
vector[N] phi = phi_std * sigma_phi;
target += -0.5 * dot_self(phi_std[node1] - phi_std[node2]);
since phi_std (but not phi) follows an ICAR(1) distribution. Regretfully, we haven’t gotten this to work in Stan (it crashes surely because our proficiency is still low with Stan), although we have programmed a slightly modified version of this (phi is not defined as a transformed_parameter but it is computed within the model) that do works, file: Resul-BYM-WithPriors-Stan.pdf in the Github repo fisabio/BYM-stan at https://github.com/fisabio/BYM-stan. The structure of this (and all subsequent) file(s) is the same: 1.- syntax of the model run 2.- syntax used to run the model 3.- Descriptive results.
We have run the original “bym_predictor_plus_offset.stan” model on several of our datasets (same region of study but different causes of death), results are shown in file Resul-ModelMitzi-Stan.pdf in the Github repo above fisabio/BYM-stan, from now on all the documents that we will mention will be available there. We see that posterior summaries of sigma_phi are identical for all datasets where convergence is achieved. Note that the prior mass of the values sampled for sigma_phi, around 0.016, is rather low (prior used for tau_phi is a gamma(1, 1) which places most of the prior mass for sigma_phi above 0.3). So this prior does not explain these identical results and reinforce us to think that the question raised above could really be a concerning issue. On the other hand, we have implemented the changes mentioned above on Mitzi’s current model (model and results are available in Resul-BYM-WithPriors-Stan.pdf). In this case we do not find identical posterior results on sigma_phi, evidencing more sensible results.
Nevertheless, although results in this last model seem sensible most of the causes have considerable part of the posterior mass of sigma_phi below 0.3, despite of the gamma prior used puts low probability there. This is why we have changed those gamma informative priors to uniform priors on the sds, which was the original priors that we wanted to implement originally. We do not want to get into the appropriateness of these or other priors here since our interest is more computational. Nevertheless, we will use these uniform priors on the sds and flat improper priors on the random effects since we have years of extensive experience with them in WinBUGS and we have retrieved in all cases apparently sensible results. Thus, we have implemented the last of the models mentioned in Stan but with flat improper priors for both fixed effects (intercept) and sds of random effects (model and results at Resul-BYM-NoPriors-Stan.pdf). We have also implemented that same model in WinBUGS (model and results at Resul-BYM-WinBUGS.pdf). Results for both models, when convergence is achieved, are mostly identical, at least for the sds so we consider these two implementations of the BYM model in both different packages to be equivalent.
Now we go into the original question that motivated us to get in contact with Bob and Mitzi, Stan’s efficiency in large datasets. The datasets in our example correspond to mortality data in Spain at municipal level (around 8.000 units of analysis). So this seems a large dataset where Stan should show its advantage in comparison to WinBUGS. I apologize for not sharing these datasets but confidentiality issues (small area data) do not allow me, at this moment, to share them. As mentioned, we run the BYM model with flat priors in both WinBUGS and Stan for our datasets (results in files Resul-BYM-NoPriors-Stan.pdf and Resul-BYM-WinBUGS.pdf). As shown, when convergence is achieved, posterior inference on the sds coincides for both implementations. This is good news. Nevertheless we are a bit surprised of Stan’s computational performance as compared to that of WinBUGS. First, we see that for six of our datasets (datasets: 9,10,11,28,30,31) convergence is not achieved in Stan. WinBUGS, on the other hand, just had 3 datasets with (minor in some cases) convergence problems (datasets: 13,17,37): n.eff<100. Moreover, although Stan run for 4400 iterations and WinBUGS run for 11000, WinBUGS runs much faster than Stan. To be fair, the number of effective of iterations of each software should be taken into account in this comparison. But even in that case Stan’s performance is just similar to that of WinBUGS. Specifically, for Stan (results in document Resul-BYM-NoPriors-Stan.pdf), leaving out the 6 datasets with bad convergence, we find that 19.2 seconds are due for increasing the n.eff in unit (per average in all datasets). On the other hand, for WinBUGS (results in document Resul-BYM-WinBUGS.pdf), leaving out the 3 datasets with bad convergence, we find that 18.9 seconds are needed per unit of n.eff. Therefore, WinBUGS seems at least for these datasets as competitive as Stan. Just to let you know min.n.eff in the results mean the minimum n.eff for all the variables retrieved in each model and for comparability the same number of cores (3) have been used for running the Stan and WinBUGS version of the BYM model.
Maybe one could think that priors used could be having some effect on computing times. In our opinion priors are not so important. Results in Resul-BYM-WithPriors-Stan.pdf show computing times for the BYM stan models with the gamma informative priors in Mitzi’s model and Results in Resul-BYM-NoPriors-Stan.pdf show computing times for that model but with improper flat uniform priors for fixed effects and sds of random effects. Results for both models show no significant difference between both models so, this does not seem at a first glance a decisive issue in terms of computational efficiency. A more important issue in our opinion could be differences between the different chains sampled. As shown in the stan implementation of BYM (file Resul-BYM-NoPriors-Stan.pdf), differences between different causes of death can be very important, meanwhile this does not happen in WinBUGS (file Resul-BYM-WinBUGS.pdf). We have seen that when Stan takes longer to run is due to some of the chains sampled, which takes much longer than the rest of them. Is there any way to control this in Stan such as fixing the starting points of the chains as for WinBUGS? Do you have any other suggestion for making Stan more efficient in this context? We are very confident in Stan capabilities for large disease mapping studies but at our first attempt we haven’t really found that hypothetical advantage. Any guidance or suggestion in this sense will be really welcome.