New Stan-based R package 'walker' for dynamic regression models

I have been doing some testing with Stan for a while now in order to make some comparisons between HMC and pseudo-marginal based MCMC algorithms, and as a byproduct (which is not related to the pseudo-marginal stuff at all) I ended up creating an R package called walker, which can be used for Bayesian inference of dynamic regression models. These models are essentially standard linear regression models, except that the regression coefficients are allowed to vary over “time” according to independent random walks. Compared to the most obvious implementation, the idea here is to marginalise over the regression coefficients using a Kalman filter, leading to a faster algorithm which also mixes and scales better.
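To be concrete, the model being fitted is roughly the following state-space model (the notation here is my own shorthand, not from the package docs):

y_t = x_t^\top \beta_t + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma_y^2)
\beta_{t+1} = \beta_t + \eta_t, \quad \eta_t \sim N(0, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2))

so each of the k regression coefficients follows its own random walk, and conditional on the standard deviations the coefficients can be integrated out with the Kalman filter.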

The package with a short vignette can be found on GitHub: https://github.com/helske/walker, and I will likely submit it to CRAN soon as well. Comments and especially pull requests are welcome. The interface for analyzing the results is still pretty rough, as I haven’t built custom methods for the post-processing, and the time series aspect of the model/results sometimes needs a bit of special consideration when using the standard methods (extract, summary, etc.) of stanfit objects.

Enjoy.

Is there something specific you’d like us to look at, such as the coding of Stan models or how it’s integrated into R? If so, some pointers into the repo would help—I poked around a bit, but didn’t find any raw Stan models, just generated C++.

The Stan models are in the exec directory. The “basic” version is here: https://github.com/helske/walker/blob/master/exec/rw_model_naive.stan, and the Kalman filter based model here: https://github.com/helske/walker/blob/master/exec/rw_model.stan

It would be nice to get confirmation that the above models are coded properly in terms of efficiency, so that I’m not doing anything overly stupid which would make the comparisons between the different approaches unfair. I did have some problems with mixing vectors and arrays and needed to add some extra transformed parameters because of that; I’m not sure if there is an easier way.

In terms of efficiency, the sampling statement (~) forms of the distributions are more efficient because they will drop all the constant terms. So while you’ll get the same effect from

target += foo_lpdf(y | theta);

and

y ~ foo(theta);

the latter is more efficient. That’s relevant on line 30 of rw_model_naive.stan.

You also want to rewrite this to vectorize:

for(t in 1:n) {
    target += normal_lpdf(beta_raw[, t] | 0, 1);
    target += normal_lpdf(y[t] | xreg[t, ] * beta[, t], sigma[1]);    
  }

and replace it with

to_vector(beta_raw) ~ normal(0, 1);
{
  vector[n] mu;
  for (t in 1:n) mu[t] = xreg[t, ] * beta[, t];
  y ~ normal(mu, sigma[1]);
}
I didn’t understand the second term and why you have xreg[t, ] * beta[, t] and only use sigma[1], but I wasn’t looking very closely.

Whenver there’s a a[ , i] it would be better to rranspose and use a[i]—it keeps memory local and there are no copies. You might want to read the efficiency chapter of the manual, which goes over most of this with examples.

You can replace

  for(i in 1:k) sigma_vec[i] = sigma[1 + i];

with

sigma_vec = sigma[2: ];

But overall, you’re better off not hacking all the scales into a single vector and then pulling out the first one one place and the others elsewhere.

You don’t need the tmp assignment in

for(t in 2:n) {
    tmp = beta[, t - 1];
    beta[, t] = tmp + sigma_vec .* beta_raw[, t];  
  }

you can just use beta[ , t - 1] directly.

Thanks a lot! I forgot about the sampling statement, as in the first iteration I needed all the constants as well.

About the last part, that is due to the deep-copy issue (deep copy warning): by using tmp, only copies of single columns are needed instead of copying the whole matrix, I think.
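
For reference, the direct form you suggested,

for (t in 2:n)
  beta[, t] = beta[, t - 1] + sigma_vec .* beta_raw[, t];

is what produces that warning, since beta appears on both sides of the assignment.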

Thanks again Bob, I managed to speed up the above “naive” model considerably by implementing your suggestions, as well as by modifying a couple of other things, such as transposing xreg in the transformed data block and then using just row_vector[n] mu = columns_dot_product(xreg, beta); to construct mu. The computation time of the example model in the vignette dropped from 179s to 96s. Tweaking the Kalman filter based model produced small gains as well, from 19s to 15s.
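
Roughly, that change looks like this (just a sketch; xreg_t is my name here for the transposed matrix):

transformed data {
  matrix[k, n] xreg_t = xreg';   // transpose once so its columns line up with beta
}

and then in the model block

row_vector[n] mu = columns_dot_product(xreg_t, beta);  // mu[t] = xreg[t, ] * beta[, t]
y ~ normal(mu, sigma[1]);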

But if a is a matrix, isn’t it better to work on columns than rows?

Glad to hear that helped. I tend to forget about columns_dot_product and its ilk, which are in place for just this purpose.

Yes, if you’re going to copy a vector from a matrix, then it’s better to work on columns because that’ll preserve memory locality. But it still requires a copy. If you need to access rows or vectors of a matrix and you don’t need to also access the whole matrix, then it’s better to use an array of vectors

vector[N] a[M];
... a[m] ...

which doesn’t need to do make a copy. There’s more information in the arrays vs. vectors/matrices chapter of the manual.

In any case, copying is usually a relatively minor cost if the model’s at all complicated.

Dear Helske,

I have a few latent parameters in a GLM. Currently I use a GP as a ‘smoother’, but I’d like to try your Kalman smoother for comparison. Do you have an easy example? I looked in your code; would I model y ~ normal(0, 1), and the priors too?

Andre

Sorry, I’m not sure if I understand your question. There are some examples in the vignette (http://htmlpreview.github.io/?https://github.com/helske/walker/blob/master/walker_html/walker.html) and couple more in the R docs (type ?walker in R console).

Note that walker only supports Gaussian models, not GLMs, although you might get reasonable results in some cases anyway even when your observations are non-Gaussian (like Poisson with high counts). It is in theory possible to extend a similar marginalization approach to non-Gaussian cases as well, and I might implement that in a future version, but it needs a bit more work (i.e. time).

Well, I had some time after all (not really, but…). The latest GitHub version of the package contains a new function, walker_glm, which can be used for time-varying GLMs (currently only the Poisson case is implemented, but at least binomial and negative binomial will/should be available at some point).

I haven’t tested the function much yet, and I haven’t really though about best way to present the results, as compared to walker, walker_glm outputs weighted samples from the posterior (there is a parameter weights in the output).

Dear Helske,
thank you. Meanwhile I figured out http://www.utdallas.edu/~pxb054000/code/pests.r and did a translation to Stan, which had some stability problems. I will try your solution now. I’m not so familiar with the Kalman filter.

Thanks again, Andre