# Discrete convolution by direct summation - can this be made more efficient?

I have a function that does essentially the same thing as R’s filter() with method="convolution" (I think there’s a very similar function in scipy). Here’s the function:

```stan
functions {
  matrix convolve(matrix X, vector kernel) {
    int nr = rows(X);
    int nc = cols(X);
    int kl = rows(kernel);
    int kl1 = kl - 1;
    matrix[nr-kl1, nc] Y;
    for (j in 1:nc) {
      for (i in 1:(nr-kl1)) {
        Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
      }
    }
    return Y;
  }
}
```
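To pin down the semantics, here is a hypothetical pure-Python version of the same direct summation (the function name is mine, not from any library). Note that, like the Stan function, it slides the kernel forward without reversing it; that coincides with convolution proper here because the test kernel used below is symmetric.

```python
# Illustrative pure-Python analogue of the Stan convolve():
# slide the kernel down each column and take dot products.
# Output has nr - kl + 1 rows ("valid" overlap only).
def convolve_cols(X, kernel):
    nr, nc = len(X), len(X[0])
    kl = len(kernel)
    out_rows = nr - kl + 1
    Y = [[0.0] * nc for _ in range(out_rows)]
    for j in range(nc):
        for i in range(out_rows):
            Y[i][j] = sum(X[i + k][j] * kernel[k] for k in range(kl))
    return Y

X = [[1.0], [2.0], [3.0], [4.0]]
kernel = [0.25, 0.5, 0.25]
print(convolve_cols(X, kernel))  # [[2.0], [3.0]]
```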

And here is a comparison to filter():

```r
> require(rstan)
> require(rbenchmark)
>
> kernel <- c(1:6,5:1)
> kernel <- kernel/sum(kernel)
> set.seed(1234)
> X <- matrix(rnorm(3000*40),3000, 40)
> npvd <- stanc("npvd.stan")

starting worker pid=6184 on localhost:11959 at 18:14:19.897
> expose_stan_functions(npvd)
>
> benchmark(Y1 <- convolve(X, kernel), Y2 <- filter(X, kernel, method="convolution"),
+           replications=1000)
test replications
1                       Y1 <- convolve(X, kernel)         1000
2 Y2 <- filter(X, kernel, method = "convolution")         1000
elapsed relative user.self sys.self user.child sys.child
1   5.990    1.099     5.951    0.032          0         0
2   5.451    1.000     5.427    0.013          0         0
> all.equal(Y1, Y2[6:2995,])
[1] TRUE
>
```

So this isn’t bad: the Stan version essentially ties R’s filter(), which appears to wrap some R bookkeeping code around calls to C. Is there an obvious way to make it better?

The Stan code for the entire model is below. Basically it does linear regression on a set of predictors convolved with a simplex kernel. The predictors started out as eigenvectors from a principal components analysis, so they are unit scaled. Yes, the weird scaling of the observed data was intentional.

The model works very well and is producing results I believe. I’d be happy if it were faster, and there may be other ways besides the convolution to achieve that. For example, a cross-validation exercise on the principal components told me I should use 42 of them, which might be overkill. The kernel could also use a tighter prior (I think it should have centrally concentrated mass), though I don’t know whether that would help with speed.

One other question. Variable declaration and assignment in a single statement works (I’m using rstan 2.14.1 at the moment). I haven’t seen any example code where this is done and don’t recall an explicit statement in the language manual that this is OK. Is this just a matter of programming style? A new feature? A deprecated feature?

Code follows:

```stan
functions {
  matrix convolve(matrix X, vector kernel) {
    int nr = rows(X);
    int nc = cols(X);
    int kl = rows(kernel);
    int kl1 = kl - 1;
    matrix[nr-kl1, nc] Y;
    for (j in 1:nc) {
      for (i in 1:(nr-kl1)) {
        Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
      }
    }
    return Y;
  }
}
data {
  int<lower=1> nf;            // number of flux values
  int<lower=1> kl;            // kernel length
  int<lower=1> nc;            // number of predictors
  vector[nf] gflux;
  vector<lower=0>[nf] g_std;
  matrix[nf+kl-1, nc] X;
}
transformed data {
  vector[nf] y;
  vector[nf] y_std;
  real mu_g = mean(gflux);
  y = (gflux - mu_g) / mu_g;  // center and scale by the mean flux
  y_std = g_std / mu_g;
}
parameters {
  simplex[kl] kernel;
  vector[nc] beta;
  real alpha;
}
model {
  matrix[nf, nc] X_c;

  X_c = convolve(X, kernel);

  alpha ~ normal(0., 5.);
  beta ~ normal(0., 50.);
  y ~ normal(alpha + X_c * beta, y_std);
}
generated quantities {
  vector[nf] gfit;
  vector[nf] gflux_rep;
  vector[nf] log_lik;

  gfit = mu_g + mu_g * (alpha + convolve(X, kernel) * beta);
  for (i in 1:nf) {
    gflux_rep[i] = normal_rng(gfit[i], g_std[i]);
    log_lik[i] = normal_lpdf(gflux[i] | gfit[i], g_std[i]);
  }
}
```

It doesn’t look like there’s an FFT yet (https://github.com/stan-dev/math/issues/20), so your options are probably just to rewrite things using matrix-vector operations as much as possible. I think you can change:

```stan
for (j in 1:nc) {
  for (i in 1:(nr-kl1)) {
    Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
  }
}
```

Into something like:

```stan
row_vector[kl] row_kernel = kernel';
for (i in 1:(nr-kl1)) {
  Y[i, ] = row_kernel * X[i:(i+kl1), ];
}
```

(Notice the transpose on the kernel.) Make sure to check this in R first, to confirm I didn’t goof on the translation…
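As a quick cross-check of the algebra (sketched in Python rather than R; all names here are mine, for illustration only): a row vector times a kl-row submatrix should reproduce the per-column dot products of the original double loop.

```python
# Check that row_kernel * X[i:(i+kl1), ] matches elementwise dot products.
def row_times_submatrix(kernel, sub):
    """Row vector (length kl) times a kl x nc submatrix -> length-nc row."""
    return [sum(kernel[k] * sub[k][j] for k in range(len(kernel)))
            for j in range(len(sub[0]))]

X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
kernel = [0.25, 0.5, 0.25]
kl, nc = len(kernel), len(X[0])
out = len(X) - kl + 1

Y_fast = [row_times_submatrix(kernel, X[i:i + kl]) for i in range(out)]
Y_slow = [[sum(X[i + k][j] * kernel[k] for k in range(kl)) for j in range(nc)]
          for i in range(out)]
assert Y_fast == Y_slow
print(Y_fast)  # [[2.0, 20.0], [3.0, 30.0]]
```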

That worked! Thanks!

I get a better than factor of two improvement in execution time of the function with your suggested modification and the full model run on real data is ~30-40% faster.

Benchmark with the new function temporarily added to the code as "convolve2":

```r
> require(rstan)
> require(rbenchmark)
>
> kernel <- c(1:6,5:1)
> kernel <- kernel/sum(kernel)
> set.seed(1234)
> X <- matrix(rnorm(3000*40),3000, 40)
> npvd <- stanc("npvd.stan")

starting worker pid=7975 on localhost:11959 at 08:55:48.189
> expose_stan_functions(npvd)
>
> benchmark(Y1 <- convolve(X, kernel),
+           Y2 <- convolve2(X, kernel),
+           Y3 <- filter(X, kernel, method="convolution"),
+           replications=1000)
test replications
1                       Y1 <- convolve(X, kernel)         1000
2                      Y2 <- convolve2(X, kernel)         1000
3 Y3 <- filter(X, kernel, method = "convolution")         1000
elapsed relative user.self sys.self user.child sys.child
1   5.943    2.617     5.886    0.046          0         0
2   2.271    1.000     2.251    0.017          0         0
3   5.206    2.292     5.183    0.013          0         0
> all.equal(Y1, Y3[6:2995,])
[1] TRUE
> all.equal(Y1, Y2)
[1] TRUE
```

Revised function:

```stan
matrix convolve(matrix X, vector kernel) {
  int nr = rows(X);
  int nc = cols(X);
  int kl = rows(kernel);
  int kl1 = kl - 1;
  row_vector[kl] kernelp = kernel';
  matrix[nr-kl1, nc] Y;
  for (i in 1:(nr-kl1)) {
    Y[i, ] = kernelp * X[i:(i+kl1), ];
  }
  return Y;
}
```

Good to hear!

The other way to do this is to keep what you were doing but copy things around so that all the multiplication and such happens in big matrix-vector products:

```stan
row_vector[kl] row_kernel = kernel';
matrix[kl, nr - kl1] Xtmp;
for (j in 1:nc) {
  for (i in 1:(nr - kl1)) {
    Xtmp[, i] = X[i:(i + kl1), j];
  }
  Y[, j] = row_kernel * Xtmp;
}
```

Dunno if it’d perform any better/worse. Good luck!

Thanks again. It took me a while to figure out what you were suggesting. As written, that code throws an error because the parser sees a row vector being assigned to a vector. This works, though:

```stan
Y[, j] = (row_kernel * Xtmp)';
```
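For reference, here is the same big-matvec idea sketched in Python (function and variable names are mine, purely illustrative): gather each sliding window of column j into a kl × (nr − kl + 1) matrix, then one row-vector-times-matrix product yields the whole output column, transposed back into place.

```python
# Illustrative "copy windows into a big matrix" version of the convolution.
def convolve_matvec(X, kernel):
    nr, nc = len(X), len(X[0])
    kl = len(kernel)
    out = nr - kl + 1
    Y = [[0.0] * nc for _ in range(out)]
    for j in range(nc):
        # Xtmp[k][i] = X[i + k][j]: window i stacked as column i of Xtmp
        Xtmp = [[X[i + k][j] for i in range(out)] for k in range(kl)]
        row = [sum(kernel[k] * Xtmp[k][i] for k in range(kl))
               for i in range(out)]        # row_kernel * Xtmp
        for i in range(out):               # the transpose: row -> column j
            Y[i][j] = row[i]
    return Y

print(convolve_matvec([[1.0], [2.0], [3.0], [4.0]], [0.25, 0.5, 0.25]))
# [[2.0], [3.0]]
```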

This is ~60-80% slower than your first suggestion and just ties my first naive attempt, but it was worth a try!
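Since, per the issue linked above, there is no FFT in the Stan math library yet, here is a sketch (in Python, stdlib only; a naive O(N²) DFT stands in for the FFT) of what a transform-based path would look like: zero-pad both signals, transform, multiply pointwise, invert, then keep the fully-overlapping "valid" part. With the symmetric kernels used in this thread, this matches the direct sliding-window summation.

```python
import cmath

def dft(x, sign=-1):
    # Naive discrete Fourier transform; an FFT computes the same thing faster.
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def conv_via_dft(x, h):
    N = len(x) + len(h) - 1                 # length of the linear convolution
    xp = list(x) + [0.0] * (N - len(x))     # zero-pad to avoid wrap-around
    hp = list(h) + [0.0] * (N - len(h))
    prod = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [v.real / N for v in dft(prod, sign=+1)]  # inverse DFT

x = [1.0, 2.0, 3.0, 4.0]
h = [0.25, 0.5, 0.25]
full = conv_via_dft(x, h)                   # "full" convolution, length 6
valid = full[len(h) - 1:len(x)]             # fully-overlapping part only
print([round(v, 6) for v in valid])         # [2.0, 3.0]
```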
