Difference in `fft` implementation between stan and numpy/scipy

Thank you for the Fourier implementation in the newest release! Stan defines the fft u\in\mathbb{C}^N of a complex vector v\in\mathbb{C}^N as

u_n = \sum_{m < n} v_m \cdot \exp\left(\frac{-n \cdot m \cdot 2 \cdot \pi \cdot \sqrt{-1}}{N}\right).

Numpy instead defines it as

u_n=\sum_{m=0}^{n-1} v_m\exp\left(\frac{-2\pi i mn}{N}\right).

These are identical up to the difference in summation (over all indices for numpy and only over “time” indices less than “frequency” indices in Stan). Of course, not everything needs to follow the same conventions as numpy, but I just wanted to understand the summation convention in Stan because it differs from previous definitions I’ve seen, e.g. on wikipedia.

The results also seem to differ when comparing test data.

// test_fft.stan
data {
    int n;
    vector[n] x;

generated quantities {
   complex_vector[n] y = fft(x);
# test_fft.py
import cmdstanpy
import numpy as np

model = cmdstanpy.CmdStanModel(stan_file="test_fft.stan")
n = 3
x = np.random.normal(0, 1, n)
fit = model.sample({"n": n, "x": x}, fixed_param=True, iter_warmup=0, iter_sampling=1)
stan_fft, = fit.stan_variable("y")
np_fft = np.fft.fft(x)
for label, fft in [("stan", stan_fft), ("np", np_fft)]:
    print("real", np.round(fft.real, 3))
    print("imag", np.round(fft.imag, 3))

# output
real [3.143 0.    1.075]
imag [ 0.501  1.075 -0.501]

real [3.143 1.075 1.075]
imag [ 0.     0.501 -0.501]

python: 3.10.0 (default, Jan 15 2022, 13:32:36) [Clang 12.0.5 (clang-1205.0.22.9)]
python-bits: 64
OS: Darwin
OS-release: 21.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LOCALE: ('en_US', 'UTF-8')
cmdstan_folder: /Users/till/.cmdstan/cmdstan-2.30.1
cmdstan: (2, 30)
cmdstanpy: 1.0.4
pandas: 1.4.3
xarray: None
tdqm: None
numpy: 1.22.1
ujson: 5.4.0

Do you know why the values differ? As the signal is real, I would’ve expected the zero-frequency term to have zero imaginary part (as numpy does).

Stan wraps the Eigen C++ library for the implementation of FFTs. They have a few notes on different choices they’ve made here.
I’m not sure if this accounts for the difference you’ve noted or not

1 Like

Thank you for the link. They mention scaling which should not affect the results up to a global rescaling. The other difference is how they handle real FFTs where only half the spectrum is required. Neither should lead to the differences observed above, I think.

1 Like

Oh my, it seems this is a bug not in Stan, but in cmdstanpy:

generated quantities {
   complex_vector[n] y = fft(x);


Chain [1] [3.14295,1.0746,1.0746]
Chain [1] [0,0.501066,-0.501066]

This is not good! Thank you for finding this

1 Like

Aha, nice catch! Any idea why cmdstanpy is returning the odd values?

I’ll extract the real and imaginary parts in separate vectors for the time being.

There isn’t a better answer than “we didn’t check our work here” sadly. The CmdStanPy output assumes that complex numbers in the output are stored like an extra final dimension of size 2. But they’re not really, they’re stored strided alongside the other dimensions.

To make this concrete, a vector of 3 elements was being read in as if it was stored as x.1.real, x.2.real, x.3.real, x.1.imag. x.2.imag, x.3.imag, but really it is actually written to disk as x.1.real, x.1.imag, x.2.real, x.2.imag, ...

We should have tested this, but we didn’t. Our tests only asserted the result had the right shape, not the right order, which I’ll rectify as I fix the bug. My guess is people haven’t used complex outputs very much yet in Stan, so nobody noticed it was broken.

1 Like

Thank you for the additional context and the great work with CmdStanPy!

1 Like

Thank you again for being curious and finding the bug!

I suppose getting the same output now still doesn’t answer your question of why the definition in the docs is nonstandard. For that I’ll ping @Bob_Carpenter, who wrote the docs