Discrete convolution by direct summation - can this be made more efficient?

Michael_Peck · June 19, 2017, 11:49pm

I have a function that does essentially the same thing as R’s filter() with method=“convolution” (I think there’s a very similar function in scipy). Here’s the function:

functions {
  matrix convolve(matrix X, vector kernel) {
    int nr = rows(X);
    int nc = cols(X);
    int kl = rows(kernel);
    int kl1 = kl-1;
    matrix[nr-kl1, nc] Y;
    for (j in 1:nc) {
      for (i in 1:(nr-kl1)) {
        Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
      }
    }
    return Y;
  }
}

And here is a comparison to filter():

> require(rstan)
> require(rbenchmark)
> 
> kernel <- c(1:6,5:1)
> kernel <- kernel/sum(kernel)
> set.seed(1234)
> X <- matrix(rnorm(3000*40),3000, 40)
> npvd <- stanc("npvd.stan")

starting worker pid=6184 on localhost:11959 at 18:14:19.897
> expose_stan_functions(npvd)
> 
> benchmark(Y1 <- convolve(X, kernel), Y2 <- filter(X, kernel, method="convolution"),
+           replications=1000)
                                             test replications
1                       Y1 <- convolve(X, kernel)         1000
2 Y2 <- filter(X, kernel, method = "convolution")         1000
  elapsed relative user.self sys.self user.child sys.child
1   5.990    1.099     5.951    0.032          0         0
2   5.451    1.000     5.427    0.013          0         0
> all.equal(Y1, Y2[6:2995,])
[1] TRUE
>

So this isn’t bad – the Stan version essentially ties R’s filter(), which appears to wrap some R bookkeeping code around C calls. Is there an obvious way to make it better?

The Stan code for the entire model is below. Basically it does linear regression on a set of predictors convolved with a simplex kernel. The predictors started out as eigenvectors from a principal components analysis and those are unit scaled. Yes the weird scaling of the observed data was intentional.

The model works very well and is producing results I believe. I’d be happy if it were faster. There may be other ways besides the convolution to do that. For example I did some sort of cross-validation exercise on the principal components that told me I should use 42 of them. That might be overkill. The kernel could use a tighter prior (I think it should have centrally concentrated mass) – I don’t know if that would help with speed though.

One other question. Variable declaration and assignment in a single statement works (I’m using rstan 2.14.1 at the moment). I haven’t seen any example code where this is done and don’t recall an explicit statement in the language manual that this is OK. Is this just a matter of programming style? A new feature? A deprecated feature?

Code follows:

functions {
  matrix convolve(matrix X, vector kernel) {
    int nr = rows(X);
    int nc = cols(X);
    int kl = rows(kernel);
    int kl1 = kl-1;
    matrix[nr-kl1, nc] Y;
    for (j in 1:nc) {
      for (i in 1:(nr-kl1)) {
        Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
      }
    }
    return Y;
  }
}
data {
  int<lower=1> nf;  //number of flux values
  int<lower=1> kl; //kernel length
  int<lower=1> nc;
  vector[nf] gflux;
  vector<lower=0>[nf] g_std;
  matrix[nf+kl-1, nc] X;
}
transformed data {
  vector[nf] y;
  vector[nf] y_std;
  real mu_g = mean(gflux);
  y = (gflux-mu_g)/mu_g;
  y_std = g_std/mu_g;
}
parameters {
  simplex[kl] kernel;
  vector[nc] beta;
  real alpha;
}
model {
  matrix[nf, nc] X_c;
  
  X_c = convolve(X, kernel);
  
  alpha ~ normal(0., 5.);
  beta ~ normal(0., 50.);
  y ~ normal(alpha + X_c * beta, y_std);
}
generated quantities {
  vector[nf] gfit;
  vector[nf] gflux_rep;
  vector[nf] log_lik;
  
  gfit = mu_g + mu_g*(alpha + convolve(X, kernel) * beta);
  for (i in 1:nf) {
    gflux_rep[i] = normal_rng(gfit[i], g_std[i]);
    log_lik[i] = normal_lpdf(gflux[i] | gfit[i], g_std[i]);
  }
}

bbbales2 · June 20, 2017, 12:36am

It doesn’t look like there’s an FFT yet (https://github.com/stan-dev/math/issues/20), so your options are probably just to try to rewrite things using Matrix-vector stuff as much as possible. I think you can change:

for (j in 1:nc) {
  for (i in 1:(nr-kl1)) {
    Y[i, j] = dot_product(X[i:(i+kl1), j], kernel);
  }
}

Into something like:

row_vector[kl] row_kernel = kernel';
for (i in 1:(nr-kl1)) {
  Y[i,] = row_kernel * X[i:(i+kl1),];
}

(Notice the transpose on the kernel). Make sure and check this in R first to make sure I didn’t goof on the translation…

Michael_Peck · June 20, 2017, 2:37pm

That worked! Thanks!

I get a better than factor of two improvement in execution time of the function with your suggested modification and the full model run on real data is ~30-40% faster.

Benchmark with new function temporarily added to code as “convolve2”:

> require(rstan)
> require(rbenchmark)
> 
> kernel <- c(1:6,5:1)
> kernel <- kernel/sum(kernel)
> set.seed(1234)
> X <- matrix(rnorm(3000*40),3000, 40)
> npvd <- stanc("npvd.stan")

Zernike loaded at startup
starting worker pid=7975 on localhost:11959 at 08:55:48.189
> expose_stan_functions(npvd)
> 
> benchmark(Y1 <- convolve(X, kernel), 
+           Y2 <- convolve2(X, kernel),
+           Y3 <- filter(X, kernel, method="convolution"),
+           replications=1000)
                                             test replications
1                       Y1 <- convolve(X, kernel)         1000
2                      Y2 <- convolve2(X, kernel)         1000
3 Y3 <- filter(X, kernel, method = "convolution")         1000
  elapsed relative user.self sys.self user.child sys.child
1   5.943    2.617     5.886    0.046          0         0
2   2.271    1.000     2.251    0.017          0         0
3   5.206    2.292     5.183    0.013          0         0
> all.equal(Y1, Y3[6:2995,])
[1] TRUE
> all.equal(Y1, Y2)
[1] TRUE

Revised function:

  matrix convolve(matrix X, vector kernel) {
    int nr = rows(X);
    int nc = cols(X);
    int kl = rows(kernel);
    int kl1 = kl-1;
    row_vector[kl] kernelp = kernel';
    matrix[nr-kl1, nc] Y;
    for (i in 1:(nr-kl1)) {
        Y[i, ] = kernelp * X[i:(i+kl1), ];
    }
    return Y;
  }

bbbales2 · June 20, 2017, 5:24pm

Good to hear!

The other way to do this is just do what you were doing but copy things around to put all the multiplication n’ such into big matvecs:

row_vector[ikr] row_kernel = kernel';
matrix[ikr, nr - kl1] Xtmp;
for(j in 1:nc) {
  for(i in 1:(nr - kl1)) {
    Xtmp[,i] = X[i:(i + kl1), j]
  }

  Y[, j] = row_kernel * Xtmp;
}

Dunno if it’d perform any better/worse. Good luck!

Michael_Peck · June 21, 2017, 10:27pm

Thanks again. It took me a while to figure out what you were suggesting. That code throws an error because the parser sees a row vector being assigned to a vector. This works though:

Y[, j] = (row_kernel * Xtmp)';

This is slower by ~60-80% than your first suggestion and just ties my first naive attempt, but it was worth a try!

Topic		Replies	Views
2D convolution in stan Developers features , math	5	1069	July 22, 2020
Dot products of vectors to perform 1D convolution General	11	1996	April 7, 2021
1D and 2D convolution functions in stan Modeling features , math , ecology , population-dynamics	1	649	September 19, 2022
Does Stan has a function similar to the Apply function in R? General	3	1210	March 5, 2019
Vectorization of sum of real*matrix terms - any ideas? Modeling	10	1616	May 16, 2017

Discrete convolution by direct summation - can this be made more efficient?

Related topics