FFTW Tutorial

# FFTW Tutorial #

This is a basic C project (Makefile, but also for the Eclipse IDE) I use for exploring FFTW 3.3.9.

One- and two-dimensional discrete Fourier transforms (DFTs) of random data are computed using both FFTW and straight-forward naive algorithms in order to illustrate explicitly what kind of symmetries and scaling properties FFTW implies in its inputs and outputs.

## One-Dimensional Examples #

This tutorial starts by computing one-dimensional (1D) DFTs of random input data.

### 1D complex-to-complex #

The first example is basically a self-contained version of the corresponding example in the FFTW manual.

We want to compute the complex-valued one-dimensional DFT here, which is specified in section 4.8.1 of the FFTW reference manual.

$$Y_k = \sum\limits_{j=0}^{n-1} X_j \, e^{\pm 2 \pi j k \sqrt{-1}/n} \textrm{ for } k=0,1,…,(n-1)$$

The sign in the exponent of the basis function specifies the direction in which the Fourier transform is to be computed: -1 indicates a “forward” transform and +1 indicates a “backward” transform. These values are available via the FFTW_FORWARD and FFTW_BACKWARD preprocessor macros.

In order to compute the DFT, complex-valued products of the following form need to be evaluated:

$$X_j \, e^{\pm 2 \pi j k \sqrt{-1}/n}$$

Eulers formula comes in handy now (where $i$ is the imaginary unit with ${i}^2=-1$):

$$e^{i \varphi} = \cos(\varphi) + i \sin(\varphi)$$

The angle argument phi can be identified in above formulas for the DFT:

$$\varphi = \pm 2 \pi j k / n$$

Now the complex-valued product can be computed using only real-valued variables:

\begin{align} ~& \left( \mathcal{Re}(X_j) + i \, \mathcal{Im}(X_j) \right) \cdot \left( \cos(\varphi) + i \sin(\varphi) \right) \\ =& \phantom{+ i}~ \left( \mathcal{Re}(X_j) \cos(\varphi) - \mathcal{Im}(X_j) \sin(\varphi) \right) \\ ~& + i \left( \mathcal{Re}(X_j) \sin(\varphi) + \mathcal{Im}(X_j) \cos(\varphi) \right) \end{align}

FFTW implements all this math internally and the explicit formulation was only used to build a basis for the computations to follow. Below is the example code showing how to compute the 1d c2c DFT using both FFTW and a manual implementation. The size of the DFT is specified via the variable n and the direction (forward or backward) is specified via the variable dir. Complex-valued arrays in, ref_out and fftw_out are allocated to hold the input (X_k) and outputs (Y_k) of the DFT. A plan for the corresponding DFT is created using fftw_plan_dft_1d. Only after this, the input data is written into the in array. The reference output is computed now (before calling fftw_execute), since in some later examples (most notably multi-dimensional c2r transforms), FFTW overwrites the input data and for good practise, we keep this in mind already now. Note that this is an out-of-place transform, since in and fftw_out are allocated to be separate arrays. Next the FFTW transform can be executed via fftw_execute. This fills the corresponding output array fftw_out, which is subsequently compared against the reference output in ref_out. A conservative tolerance of 1e-12 is specified to make the example work also for weird input data (as generated by the PRNG). The actual deviations are usually much smaller and can be observed in the screen output from compare_1d_cplx. Finally, the fftw_plan is destroyed and the memory is released using fftw_free (which has to be used if the array was allocated using fftw_alloc_*).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include <complex.h>
#include <fftw3.h>

#include "util.h"

int test_1d_c2c(int n) {

int dir = FFTW_BACKWARD;
double real, imag, phi;

fftw_complex *in = fftw_alloc_complex(n);
fftw_complex *ref_out = fftw_alloc_complex(n);
fftw_complex *fftw_out = fftw_alloc_complex(n);

fftw_plan p = fftw_plan_dft_1d(n, in, fftw_out, dir, FFTW_ESTIMATE);

// fill the input array with random data
fill_random_1d_cplx(n, in);

// compute the reference output
for (int k = 0; k < n; ++k) {
ref_out[k] = 0.0;
for (int j = 0; j < n; ++j) {
phi = dir * 2.0 * M_PI * j * k / ((double) n);

real = creal(in[j]) * cos(phi) - cimag(in[j]) * sin(phi);
imag = creal(in[j]) * sin(phi) + cimag(in[j]) * cos(phi);
ref_out[k] += real + I * imag;
}
}

// compute the DFT of in using FFTW
fftw_execute(p);

// compare reference output with FFTW output
double eps = 1e-12;
int status = compare_1d_cplx(n, ref_out, fftw_out, eps);

fftw_destroy_plan(p);
fftw_free(in);
fftw_free(ref_out);
fftw_free(fftw_out);

return status;
}

int main(int argc, char** argv) {
int status = 0;
status += test_1d_c2c(32);
status += test_1d_c2c(33);
return status;
}


The code is available in the file src/test_1d_c2c.c.

### 1D complex-to-real and real-to-complex #

The next two examples deal with DFTs of purely real data (r2c) and DFTs which produce purely real data (c2r). These are covered in the official FFTW tutorial as well as in the FFTW reference manual.

In case either the input array or the output array are constrained to be purely real, the corresponding complex-valued output or input array features Hermitian symmetry (where the n-periodicity has been included as well):

$$Y_{n-k} = Y_{-k} = Y^{*}_k$$

For the case of a DFT of real-valued X_j and complex-valued Y_k with Hermitian symmetry, the Fourier sum is written out excplicitly as follows:

\begin{align} %TODO: fix underbrace Y_{-k} =& \sum\limits_{j=0}^{n-1} X_j \Bigl( \cos(-2 \pi j k / n) + i \sin(-2 \pi j k / n) \Bigr) \\ =& \sum\limits_{j=0}^{n-1} X_j \Bigl( \cos(2 \pi j k / n) - i \sin(2 \pi j k / n) \Bigr) = Y^{*}_k \end{align}

The figure below illustrates the structure of the complex-valued Fourier space arrays occuring in the DFT for both even-valued (n=6) and odd-valued (n=7) sizes of the DFT.

The size required to contain all information required for the transform from or to a real-valued array is contained in the first n/2+1 (division by 2 rounded down) entries of the complex array, indicated by the red bars in above figure. For both even and odd values of n, Hermitian symmetry implies Y_0 = Y*_0 and thus Y_0 is always real. In the case of even n, we can intuitively observe that Y_m = Y*_-m where m=n/2 (the element at the Nyquist frequency, and thus also the last element of the complex-valued Fourier-space array) is also purely real.

The DFT formulation includes all elements of the Fourier-space array from 0 to n-1. Now that only parts of these coefficients are taken into account, they have to be weighted appropriately to recover the results that would have been obtained by using the full array without taking advantage of its symmetry properties. For odd n, all components of the Fourier-space array except the DC element at k=0 have to be weighted with a factor of 2. For even n, all components of the Fourier-space array except the DC element at k=0 and the Nyquist element at k=n/2 have to be weighted with a factor of 2. The elements that need to be weighted by a factor of 2 are highlighted by solid blue lines in above illustration. The redundant elements that are not explicitly needed are indicated by dashed blue lines.

The backward transformation from complex-valued Fourier space to real space is demonstrated in src/test_1d_c2r.c. The relevant portion of the source code is here:

int nCplx = n / 2 + 1;
for (int k = 0; k < n; ++k) {

// start with DC component, which is purely real due to Hermitian symmetry
ref_out[k] = creal(in[0]);

int loopEnd = nCplx;

// special case for even n
if (n % 2 == 0) {
// Nyquist element is purely real as well
phi = 2.0 * M_PI * (nCplx - 1) * k / ((double) n);
ref_out[k] += creal(in[nCplx - 1]) * cos(phi);

loopEnd = nCplx-1;
}

// middle elements are handled the same for even and odd n
for (int j = 1; j < loopEnd; ++j) {
phi = 2.0 * M_PI * j * k / ((double) n);

real = creal(in[j]) * cos(phi) - cimag(in[j]) * sin(phi);
ref_out[k] += 2.0 * real;
}
}


Note that the integer division used to compute nCplx (correctly) rounds down. The rest is a relatively straight-forward implementation of above verbose algorithm. The DC component is always taken to be real. Depending on whether n is even or odd, the number of elements to take into account with both real and imaginary component (loopEnd) is adjusted. The (purely real) Nyquist element at n/2 is added separately if n is even. All other elements are weighted by a factor of 2 and only the real part of the complex product of input Fourier coefficient and complex-valued basis function is actually computed.

The forward transform from real space to Fourier space is comparably simple to implement:

for (int k = 0; k < nCplx; ++k) {

// DC component is always real
ref_out[k] = in[0];

for (int j = 1; j < n; ++j) {
phi = -2.0 * M_PI * j * k / ((double) n);

real = in[j] * cos(phi);
imag = in[j] * sin(phi);
ref_out[k] += real + I * imag;
}
}


Note that in this case, only the non-redundant part of the complex-values Fourier coefficients need to be computed from a real-valued input. The separate handling of the DC component is not strictly necessary, since cos(0)=1 and sin(0)=0 and thus the DC component would get no imaginary contribution. The full example is available in the file src/test_1d_r2c.c.

It becomes evident in above examples that the sign of a c2r DFT is always FFTW_BACKWARD and the sign of a r2c DFT is always FFTW_FORWARD.

### 1D real-to-real #

Certain symmetries can be assumed for a given real input array of which a DFT is to be computed that lead to the output array being purely real as well. This is another gain of a factor of 2 in speed and memory usage over r2c/c2r transforms. Depending on even (odd) parity of the input array, the transform outputs have even (odd) parity. They are therefore called Discrete Cosine Transform (DCT) and Discrete Sine Transfrom (DST), respectively.

The logical size of the corresponding DFT is denoted as N. The actual array size given to FFTW is denoted by n. For the DCT/DST types implemented in FFTW, N is always even. Note that this does not pose any restrictions on the input array sizes n. One can think of the logical DFT input array as one that FFTW ‘sees’ internally and computes a regular DFT of. The resulting output array is purely real and features the named symmetry properties, since the logical input array was ‘constructed’ from the given input array to have the desired symmetry properties.

In below example plots used to illustrate these symmetry properties/assumptions, random Fourier coefficients have been sampled and transformed back to real-space using the named inverse transforms. This allows to ’evaluate’ the input data also in the samples given in the (small) input arrays. In these plots, red dashed vertical lines indicate even symmetry (f(x)=f(-x)) about the indicated position and blue dashed vertical lines indicate odd symmetry (f(x)=-f(-x)) about the indicated position. The grey-shaded area in the background indicated the range of samples that are included in the input array. The x axis labels denote the indices in the actual input array given to FFTW (only valid within the range of the grey boxes).

The nomenclature works as follows: The first letter is R to indicate real-valued data. The second letter distinguished between E for even-parity data and O for odd-parity data. The following DFT is for discrete Fourier transform (who guessed…). The next two digits indicate wheter (1) or not (0) the input (first digit) or the output (second digit) data is shifted by half a sample. Think of this in terms of parity: whether the symmetry axis is located at a sample (no shifting necessary) or between two samples (shifting necessary). The shifting becomes necessary when formulating the symmetry properties over sampled data that has integer indices vs. symmetry axis that are possibly located at half-integer locations. The parity and symmetry properties of the output array are those of the input array for the inverse transform.

For all transforms, a periodicity of N is assumed for the logical input array as X_j = X_{N+j} where X is the input data array.

Here is a quick overview table to indicate the assumed symmetries in the input array for the following types of r2r DFTs:

type actual r2r input logically-equivalent DFT input
REDFT00 a b c d e a b c d e d c b
REDFT10 a b c d  a b c d d c b a
REDFT01 a b c d  a b c d 0 -d -c -b
REDFT11 a b c d  a b c d -d -c -b -a
RODFT00 a b c  0 a b c 0 -c -b -a
RODFT10 a b c d  a b c d -d -c -b -a
RODFT01 a b c d  a b c d c b a 0
RODFT11 a b c d  a b c d d c b a

For the DCTs, please also consider https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition and in particular https://upload.wikimedia.org/wikipedia/commons/a/ae/DCT-symmetries.svg .

For the DSTs, please also consider https://en.wikipedia.org/wiki/Discrete_sine_transform#Definition and in particular https://upload.wikimedia.org/wikipedia/commons/3/31/DST-symmetries.svg .

#### REDFT00 (DCT-I) #

In case of the real-valued even-parity DFT with no shifts in either input or output array (REDFT00), also called the DCT-I, the corresponding logical DFT size is given by N = 2(n-1), corresponding to n = N/2+1.

The formal definition of the REDFT00 is given below:

$$Y_k = X_0 + (-1)^k X_{n-1} + 2 \sum\limits_{j=1}^{n-2} X_j \cos\left[ \pi j k / (n-1) \right]$$

The inverse of this transform is REDFT00 itself. The input array is assumed to have even symmetry around j=0 and even symmetry also around j=n−1.

In above figure, the lowercase letters a to e refer to the input data abcde for the size-5 REDFT00, which is logically equivalent to a size-8 DFT with real-valued input data abcdedcb.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against REDFT00. In the following code, in is the input array (size n) given to REDFT00 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from REDFT00 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
for (int i=0; i<n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to even symmetry around n-1
for (int i=0; i<n-1; ++i) {
in_logical[n+i] = in[n-2-i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}


The first n values should be identical between REDFT00 and the generalized DFT:

double delta;
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


The remaining values should have even symmetry around n-1:

for (int i = 0; i < n - 2; ++i) {
delta = creal(out_logical[n + i]) - out[n - 2 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n+i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n+i, delta);
}
}


The full example can be found in src/test_1d_redft00.c.

#### REDFT10 (DCT-II) #

In case of the real-valued even-parity DFT with shifted input data (REDFT10), also called the DCT-II, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2. This function is commonly known as “the” DCT.

The formal definition of the REDFT10 is given below:

$$Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \cos\left[ \pi (j+1/2) k / n \right]$$

The inverse of this transform is REDFT01. The input array is assumed to have even symmetry around j=-0.5 and even symmetry also around j=n−0.5.

In above figure, the lowercase letters a to e refer to the input data abcd for the size-4 REDFT10, which is logically equivalent to a size-8 DFT with real-valued input data abcddcba.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against REDFT10. In the following code, in is the input array (size n) given to REDFT10 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from REDFT10 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to even symmetry around n-0.5
for (int i = 0; i < n; ++i) {
in_logical[n + i] = in[n - 1 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}


The first n values should be identical between REDFT10 and the generalized DFT:

for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


Odd symmetry around n implies that the value at n is zero:

if (fabs(creal(out_logical[n])) > eps) {
printf("error: delta of [%d] is %g\n", n, creal(out_logical[n]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n, creal(out_logical[n]));
}


The remaining values should have odd symmetry around n:

for (int i = 1; i < n; ++i) {
delta = creal(out_logical[n + i]) - (-out[n - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}


The full example can be found in src/test_1d_redft10.c.

#### REDFT01 (DCT-III) #

In case of the real-valued even-parity DFT with shifted output data (REDFT01), also called the DCT-III, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2. This function is commonly known as “the” inverse DCT (IDCT).

The formal definition of the REDFT01 is given below:

$$Y_k = X_0 + 2 \sum\limits_{j=1}^{n-1} X_j \cos\left[ \pi j (k+1/2) / n \right]$$

The inverse of this transform is REDFT10. The input array is assumed to have even symmetry around j=0 and odd symmetry around j=n.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against REDFT01. In the following code, in is the input array (size n) given to REDFT01 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from REDFT01 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to odd symmetry around n
for (int i = 1; i < n; ++i) {
in_logical[n + i] = -in[n - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}


The first n values should be identical between REDFT01 and the generalized DFT:

for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


The remaining values should have even symmetry around n-0.5:

for (int i = 0; i < n; ++i) {
delta = creal(out_logical[n + i]) - out[n - 1 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}


The full example can be found in src/test_1d_redft01.c.

#### REDFT11 (DCT-IV) #

In case of the real-valued even-parity DFT with both shifted input and output data (REDFT11), also called the DCT-IV, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2.

The formal definition of the REDFT11 is given below:

$$Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \cos\left[ \pi (j+1/2) (k+1/2) / n \right]$$

The inverse of this transform is REDFT11 itself. The input array is assumed to have even symmetry around j=-0.5 and odd symmetry around j=n-0.5.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against REDFT11. In the following code, in is the input array (size n) given to REDFT11 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from REDFT11 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to odd symmetry around n-0.5
for (int i = 0; i < n; ++i) {
in_logical[n + i] = -in[n - 1 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}


The first n values should be identical between REDFT11 and the generalized DFT:

for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


The remaining values should have odd symmetry around n-0.5:

for (int i = 0; i < n; ++i) {
delta = creal(out_logical[n + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}


The full example can be found in src/test_1d_redft11.c.

#### RODFT00 (DST-I) #

In case of the real-valued odd-parity DFT with no shifts in either input or output array (RODFT00), also called the DST-I, the corresponding logical DFT size is given by N = 2(n+1), corresponding to n = N/2-1. Note that the periodicity of N of the logical input array in combination with odd symmetry X_j = -X_{n-j} leads to X_0 = -X_0 which is equivalent to X_0 = 0. This first always-zero element of the input array is not explicitly included in the input to FFTW and the input array thus has a size of one less and the indices of the symmetry axis shift by 1.

The formal definition of the RODFT00 is given below:

$$Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1) (k+1) / (n+1) \right]$$

The inverse of this transform is RODFT00 itself. The input array is assumed to have odd symmetry around j=-1 and odd symmetry also around j=n.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against RODFT00. In the following code, in is the input array (size n) given to RODFT00 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from RODFT00 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
// except the first 0 (and the rest of the array being shifted)
in_logical[0] = 0.0;
for (int i = 0; i < n; ++i) {
in_logical[i + 1] = in[i];
}

// second half is filled according to odd symmetry around (n+1)
in_logical[n + 1] = 0.0;
for (int i = 0; i < n; ++i) {
in_logical[n + 2 + i] = -in[n - 1 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}


Odd symmetry about 0 implies that the imaginary part of the first output should be zero:

if (fabs(cimag(out_logical[0])) > eps) {
printf("error: delta of [%d] is %g\n", 0, -cimag(out_logical[0]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", 0, -cimag(out_logical[0]));
}


The next n values should have the output of RODFT00 in their negative imaginary parts with one index offset to account for the first zero in the input:

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i + 1]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i + 1, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i + 1, delta);
}
}


Odd symmetry about n in the input to RODFT00 implies odd symmetry in the output of the logically-equivalent DFT about n+1 due to the index shift by one:

if (fabs(cimag(out_logical[n + 1])) > eps) {
printf("error: delta of [%d] is %g\n", n + 1,
-cimag(out_logical[n + 1]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 1, -cimag(out_logical[n + 1]));
}


The remaining values should have odd symmetry around n+1 (note again that the real output from RODFT00 needs to be compared against the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + 2 + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + 2 + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 2 + i, delta);
}
}


The full example can be found in src/test_1d_rodft00.c.

#### RODFT10 (DST-II) #

In case of the real-valued odd-parity DFT with shifted input data (RODFT10), also called the DST-II, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2. This function is commonly known as “the” DST.

The formal definition of the RODFT10 is given below:

$$Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1/2) (k+1) / n \right]$$

The inverse of this transform is RODFT01. The input array is assumed to have odd symmetry around j=-0.5 and odd symmetry also around j=n-0.5.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against RODFT10. In the following code, in is the input array (size n) given to RODFT10 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from RODFT10 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to odd symmetry around (n-0.5)
for (int i = 0; i < n; ++i) {
in_logical[n + i] = -in[n - 1 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}


The output of RODFT10 is shifted by one index to the left with respect to the logically-equivalent DFT, since the first input is constrained to be zero (resulting in odd parity about -1 in the input to RODFT10). Odd symmetry about 0 implies that the imaginary part of the first output should be zero:

if (fabs(cimag(out_logical[0])) > eps) {
printf("error: delta of [%d] is %g\n", 0, -cimag(out_logical[0]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", 0, -cimag(out_logical[0]));
}


The next n values should have the output of RODFT10 in their negative imaginary parts with one index offset to account for the first zero in the input:

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i + 1]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i + 1, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i + 1, delta);
}
}


The remaining values have even symmetry around n-1 (note again that the real output from RODFT10 needs to be compared against the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

for (int i = 0; i < n - 1; ++i) {
delta = -cimag(out_logical[n + 1 + i]) - out[n - 2 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + 1 + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 1 + i, delta);
}
}


The full example can be found in src/test_1d_rodft10.c.

#### RODFT01 (DST-III) #

In case of the real-valued odd-parity DFT with shifted output data (RODFT01), also called the DST-III, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2. This function is commonly known as “the” inverse DST (IDST).

The formal definition of the RODFT01 is given below:

$$Y_k = (-1)^kX_{n-1} + 2 \sum\limits_{j=0}^{n-2} X_j \sin \left[ \pi (j+1) (k+1/2) / n \right]$$

The inverse of this transform is RODFT10. The input array is assumed to have odd symmetry around j=-1 and even symmetry around j=n-1.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against RODFT01. In the following code, in is the input array (size n) given to RODFT01 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from RODFT01 and out_logical is the output array (size N) from a generic 1D DFT.

The input of RODFT01 is shifted by one index to the left with respect to the logically-equivalent DFT, since the first input is constrained to be zero (resulting in odd parity about -1 in the input to RODFT01). Here is how the symmetric input is generated:

// a zero in the first entry is needed to satisfy
// odd symmetry about -1 in the shifted input array
in_logical[0] = 0.0;

// the first half of the array is identical up to shift
for (int i = 0; i < n; ++i) {
in_logical[i + 1] = in[i];
}

// second half is filled according to even symmetry around (n-1)
for (int i = 0; i < n - 1; ++i) {
in_logical[n + 1 + i] = in[n - 2 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}


The first n values should have the output of RODFT01 in their negative imaginary parts:

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


The remaining values have odd symmetry around n-0.5 (note again that the real output from RODFT01 needs to be compared against the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}


The full example can be found in src/test_1d_rodft01.c.

#### RODFT11 (DST-IV) #

In case of the real-valued odd-parity DFT with both shifted input and output data (RODFT11), also called the DST-IV, the corresponding logical DFT size is given by N = 2n, corresponding to n = N/2.

The formal definition of the RODFT11 is given below:

$$Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1/2) (k+1/2) / n \right]$$

The inverse of this transform is RODFT11 itself. The input array is assumed to have odd symmetry around j=-0.5 and even symmetry around j=n-0.5.

In order to demonstrate the use of this method, the logically equivalent DFT input is filled appropriately and its output is checked against RODFT11. In the following code, in is the input array (size n) given to RODFT11 and in_logical is the (complex-valued) input array (size N) handed to a generic 1D DFT. Similarly, out is the output array (size n) from RODFT11 and out_logical is the output array (size N) from a generic 1D DFT.

Here is how the symmetric input is generated:

// the first half of the array is identical up to shift
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}

// second half is filled according to even symmetry around (n-0.5)
for (int i = 0; i < n; ++i) {
in_logical[n + i] = in[n - 1 - i];
}


The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}


The first n values should have the output of RODFT11 in their negative imaginary parts:

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}


The remaining values have even symmetry around n-0.5 (note again that the real output from RODFT11 needs to be compared against the negative imaginary parts in the output of the logically equivalent DFT):

for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + i]) - out[n - 1 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}


The full example can be found in src/test_1d_rodft11.c.

## Two-Dimensional Examples #

The next section deals with the computation of two-dimensional (2D) DFTs of random input data.

### 2D complex-to-complex #

A first example for the computation of two-dimensional Fourier transforms using FFTW is given here:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include <complex.h>
#include <fftw3.h>

#include "util.h"

int test_2d_c2c(int n0, int n1) {
int dir = FFTW_FORWARD;

fftw_complex *in = fftw_alloc_complex(n0 * n1);
fftw_complex *out = fftw_alloc_complex(n0 * n1);
fftw_complex *ref_out = fftw_alloc_complex(n0 * n1);

fftw_plan p = fftw_plan_dft_2d(n0, n1, in, out, dir, FFTW_ESTIMATE);

// random input
fill_random_2d_cplx(n0, n1, in);

// manually compute DFT for reference
int idx_k, idx_j;
double phi;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;

ref_out[idx_k] = 0.0;

for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;

phi = dir * 2.0 * M_PI * (
k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );

ref_out[idx_k] += in[idx_j] * cexp(I * phi);
}
}
}
}

fftw_execute(p);

// compare outputs
double eps = 1.0e-12;
int status = compare_2d_cplx(n0, n1, ref_out, out, eps);

return status;
}

int main(int argc, char **argv) {
int status = 0;
status += test_2d_c2c(4, 4);
status += test_2d_c2c(4, 5);
status += test_2d_c2c(5, 4);
status += test_2d_c2c(5, 5);
return status;
}


As usual, arrays are allocated to the specified size and filled with random data. The two-dimensional complex-valued DFT is computed manually for reference and the outputs are compared against the result of FFTW. The product of the one-dimensional DFTs along the two dimensions is computed here via addition of the corresponding phases in the exponent of the complex-valued basis function.

The code can be found in src/test_2d_c2c.c.

### 2D complex-to-real #

The two-dimensional c2r transform can make use of the Hermitian symmetry (see above) to reduce the computational work in the last dimension of the input data by about a factor of 2. The input data thus has to have a shape of (n0xn1_cplx) where n1_cplx = n1/2+1 (division by 2 rounded down) and (n0xn1) is the logical size of the DFT and also the size of the real-valued output arrays.

The arrays are allocated as follows:

fftw_complex *in = fftw_alloc_complex(n0 * n1_cplx);
double *out = fftw_alloc_real(n0 * n1);
double *ref_out = fftw_alloc_real(n0 * n1);


The manual transform that is computed for reference is done as follows:

int idx_k, idx_j;
double phi, real;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;

ref_out[idx_k] = 0.0;

for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1_cplx; ++j1) {
idx_j = j0 * n1_cplx + j1;

phi = 2.0 * M_PI * (  k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );

// output is purely real,
// so compute only real part
real = creal(in[idx_j]) * cos(phi) - cimag(in[idx_j]) * sin(phi);

ref_out[idx_k] += real;

// n1/2+n1%2 is n1/2 if n1 is even
// and it is n1/2+1 if n1 is odd
if (j1 > 0 && j1 < n1 / 2 + n1 % 2) {
ref_out[idx_k] += real;
}
}
}
}
}


The full example can be found in src/test_2d_c2r.c.

### 2D real-to-complex #

The two-dimensional r2c transform can make use of the Hermitian symmetry (see above) to reduce the computational work in the last dimension of the output data by about a factor of 2. The real-valued input data has to have a shape of (n0xn1), which is also the logical size of the DFT. The output data then has a shape of (n0xn1_cplx) where n1_cplx = n1/2+1 (division by 2 rounded down).

The arrays are allocated as follows:

double *in = fftw_alloc_real(n0 * n1);
fftw_complex *out = fftw_alloc_complex(n0 * n1_cplx);
fftw_complex *ref_out = fftw_alloc_complex(n0 * n1_cplx);


The manual transform that is computed for reference is done as follows:

int idx_k, idx_j;
double phi;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1_cplx; ++k1) {
idx_k = k0 * n1_cplx + k1;

ref_out[idx_k] = 0.0;

for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;

phi = -2.0 * M_PI * (  k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );

ref_out[idx_k] += in[idx_j] * cexp(I*phi);
}
}
}
}


The full example can be found in src/test_2d_r2c.c.

### 2D real-to-real #

Two-dimensional real-to-real DFTs can be defined with any of the one-dimensional transforms in each of the two dimensions. This implies that for eight one-dimensional transforms there is a total of 64 possible combinations that exceed the scope of this tutorial and also do not contribute much insight.

Note that FFTW computes the separable product of the transforms along each dimension and not the true multi-dimensional DFT. This can be understood intuitively by the addition formulas for sine and cosine:

\begin{align*} \sin(u \pm v) =& \sin(u) \cos(v) \pm \cos(u) \sin(v) \\ \cos(u \pm v) =& \cos(u) \cos(v) \mp \sin(u) \sin(v) \end{align*}

FFTW’s two-dimensional real-to-real DFTs can be used to compute the products on the right-hand side of above equation and two such DFts have to be added together to arrive at the true two-dimensional DFT with the Fourier kernel as on the left-hand side of above equation.

In the following, two examples are presented. First, a mixed r2r DFT with REDFT10 in the first dimension and RODFT10 in the second dimension is computed. Second, a true two-dimensional DFT in computed using two FFTW calls.

#### 2D REDFT10+RODFT10 #

An example of a two-dimensional transform is presented with REDFT10 in the first dimension and RODFT10 in the second dimension.

The basis functions for the two dimensions are computed separately:

// REDFT10 in first dimension
basis_0 = 2.0 * cos(M_PI * (j0 + 0.5) * k0 / ((double) n0));

// RODFT10 in second dimension
basis_1 = 2.0 * sin(M_PI * (j1 + 0.5) * (k1 + 1.0) / ((double) n1));


The contribution to the output array is computed as follows:

ref_out[idx_k] += in[idx_j] * basis_0 * basis_1;


The full example can be found in src/test_2d_r2r_e10_o10.c.

#### True 2D IDCT using FFTW #

A true two-dimensional inverse discrete cosine transform is computed using FFTW in this example:

$$Y_{k_0,k_1} = \sum\limits_{j_0=0}^{n_0-1} \sum\limits_{j_1=0}^{n_1-1} X_{j_0,j_1} \cos \left( \pi \left( j_0 \frac{k_0 + \frac{1}{2}}{n_0} + j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \right)$$

FFTW can only compute the separable product of one-dimensional transforms for multi-dimensional transform. Above kernel function thus has to be split up by application of the addition formula for sine and cosine:

\begin{align*} ~& \cos \left( \pi \left( j_0 \frac{k_0 + \frac{1}{2}}{n_0} + j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \right) \\ =& \cos \left( \pi j_0 \frac{k_0 + \frac{1}{2}}{n_0} \right) \cos \left( \pi j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) - \sin \left( \pi j_0 \frac{k_0 + \frac{1}{2}}{n_0} \right) \sin \left( \pi j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \end{align*}

The left half of above equation is computed using a two-dimensional REDFT01 and the right half is computed using a two-dimensional RODFT01. The outputs of these transforms are subtracted from each other to yield the desired result.

First consider the naive implementation in plain C.

The input and output arrays are allocated and filled with random data:

double *in = fftw_alloc_real(n0 * n1);
double *ref_out = fftw_alloc_real(n0 * n1);

fill_random_2d_real(n0, n1, in);


The two-dimensional IDCT is computed from the input data:

double basis;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;

ref_out[idx_k] = 0.0;

for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;

basis = cos(M_PI * (  (k0 + 0.5) * j0 / ((double) n0)
+ (k1 + 0.5) * j1 / ((double) n1) ) );

ref_out[idx_k] += in[idx_j] * basis;
}
}
}
}


Next, the same transform is implemented using FFTW.

Separate input and output array are allocated in this educational example:

double *in1 = fftw_alloc_real(n0 * n1);
double *in2 = fftw_alloc_real(n0 * n1);
double *out1 = fftw_alloc_real(n0 * n1);
double *out2 = fftw_alloc_real(n0 * n1);


The two two-dimensional transforms are planned:

fftw_plan p1 = fftw_plan_r2r_2d(n0, n1, in1, out1, FFTW_REDFT01, FFTW_REDFT01, FFTW_ESTIMATE);
fftw_plan p2 = fftw_plan_r2r_2d(n0, n1, in2, out2, FFTW_RODFT01, FFTW_RODFT01, FFTW_ESTIMATE);


The next step is to fill the two input arrays in1 and in2 for FFTW with appropriate data so that the results of FFTW are equivalent to those of the naive implementation presented above. This step is not entirely trivial and presented in detail next.

A two-dimensional loop over the input data in is set up and all inputs to FFTW are initialized to be zero. Then, the two input arrays are filled separately.

for (int j0=0; j0<n0; ++j0) {
for (int j1=0; j1<n1; ++j1) {
idx_j = j0 * n1 + j1;

// make sure that all entries are zero
in1[idx_j] = 0.0;
in2[idx_j] = 0.0;

/* fill in1 */

/* fill in2 */
}
}


The input data for REDFT01 has to be multiplied by 0.5 for each dimension where the index is above 0 to cancel the factor of 2 in the definition of REDFT01 for direct correspondence with the desired true two-dimensional IDCT. This is the /* fill in1 */ step in above loop:

double factor = 1.0;
if (j0 > 0) {
factor *= 0.5;
}
if (j1 > 0) {
factor *= 0.5;
}

in1[idx_j] = factor * in[idx_j];


The input data for RODFT01 also needs to have entries multiplied by 0.5 for each dimension in which the index is less than n-1 to cancel the factor of 2 in the definition of RODFT01 for direct correspondence with the desired true two-dimensional IDCT. Additionally, the input data needs to be shifted left/down by one index due to the j+1 appearing in the definition of RODFT01. This is the /* fill in2 */ step in above loop:

if (j0 > 0 && j1 > 0) {
int my_j0 = j0-1;
int my_j1 = j1-1;

idx_j_2 = my_j0 * n1 + my_j1;

factor = 1.0;
if (my_j0 < n0-1) {
factor *= 0.5;
}
if (my_j1 < n1-1) {
factor *= 0.5;
}

in2[idx_j_2] = factor * in[idx_j];
}


In the desired truely-2D IDCT, there is no contribution from the second term in the split-basis approach used to implement this transform using FFTW if either j0==0 or j1==0, since the respecitve arguments to sin are zero in this case. This motivates the check for j0 > 0 && j1 > 0 above. The indices j0 and j1 in the REDFT01 input are transformed to the corresponding indices my_j0 and my_j1 in the input to RODFT01. The corresponding linear index idx_j_2 in in2 is computed from my_j0 and my_j1. Factors of 0.5 are required for each dimension in which the index is less than n-1 to cancel the factor of 2 in the definition of RODFT01.

The FFTW plans can be executed after computing the reference output and it remains to combine the outputs by subtracting the contribution from the two-dimensional RODFT01 to arrive at the final result in out1:

for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;

// subtract because cos(u+v) = cos(u)*cos(v) - sin(u)*sin(v)
out1[idx_k] -= out2[idx_k];
}
}


The full example can be found in src/test_2d_r2r_true2d.c.

## Real-world applications #

In this section, FFTW is used to evaluate the geometry of the magnetic axis and a flux surface of a stellarator ideal magnetohydrodynamic (MHD) plasma equilibrium as computed by the Variational Moments Equilibrium Code (VMEC).

Here is a plot of a few of the flux surfaces of the Wendelstein 7-X stellarator:

Nested flux surfaces are shown in grey, red, green and blue. The black line indicates the magnetic axis.

The geometry of the magnetic axis and that of a flux surface are real-valued. Therefore, c2r DFTs are sufficient to compute the backward transforms to real space.

### Geometry of the Magnetic Axis in a Stellarator #

The real-space geometry of the magnetic axis (a general closed curve) is given via a one-dimensional DFT evaluated using real-valued arithmetics:

\begin{align*} R_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} R_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) + R_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) \\ Z_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} Z_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) + Z_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) \end{align*}

where ζ is the toroidal angle per field period in radians. For a five-fold symmetric stellarator like W7-X, ζ ranges from 0 to 5 * 2π around the whole machine. Conversely, the first (unique) toroidal segment of the geometry is contained within a range of ζ from 0 to 2π.

In this example the geometry of the magnetic axis is evaluated at regular intervals in ζ:

$$\zeta_l = 2 \pi l / n_\zeta$$

where l ranges from 0 to n_ζ-1.

Inserting this into the Fourier series for, e.g., the R coordinate, leads to a DFT:

$$R_{\mathrm{ax}, l} = \sum\limits_{n=0}^{N} \left( R_{\mathrm{ax}, n}^\mathrm{cos} - i , R_{\mathrm{ax}, n}^\mathrm{sin} \right) e^{2 \pi i n l / n_\zeta}$$

Note that typically the number of Fourier coefficients included in computation of the MHD equilibrium is (much) less than the number of grid points, i.e., N < n_ζ. The physics happens in real space and certains aspects of a computation might require a fine spatial resolution (i.e., many grid points in real space) to accurately resolve, e.g., gradients while the overall complexity of the solution (gouverned by the number of included Fourier coefficients) does not need to be extremely high. FFTW only supports (logically) equally-sized input and output arrays and thus, if less Fourier coefficients than desired grid points are to be used, the Fourier coefficient array needs to be filled with zeros in the remaining area.

The axis geometry is defined as follows:

// number of Fourier coefficients
int ntor = 12;

// cos-parity Fourier coefficients for magnetic axis
double R_ax_cos[13] = { 5.63, 0.391, 0.0123, 1.21e-3, 4.89e-6, -5.12e-5,
-6.57e-5, 2.27e-6, -9.28e-5, -5.32e-7, 6.67e-5, 5.72e-5, 2.38e-5 };

// sin-parity Fourier coefficients for magnetic axis
double R_ax_sin[13] = { 0.0, 0.0727, 6.34e-3, 5.84e-3, 9.77e-4, 5.32e-5,
8.48e-5, 5.57e-5, 5.56e-5, 5.53e-6, 7.74e-7, 1.03e-5, 8.75e-6 };


The number of grid points in the toroidal direction at which the magnetic axis geometry is to be evaluated is specified via the n_zeta parameter. It needs to be checked that n_zeta is large enough, i.e., above the Nyquist limit corresponding to the specified number of Fourier coefficients. Then, the input and output arrays can be allocated:

int nCplx = n_zeta/2+1;
if (nCplx<ntor+1) {
printf("error: number of grid points too low.\n");
return;
}

fftw_complex *in = fftw_alloc_complex(nCplx);
double *out = fftw_alloc_real(n_zeta);

fftw_plan p = fftw_plan_dft_c2r_1d(n_zeta, in, out, FFTW_ESTIMATE);


Now that the input array is allocated, the available Fourier coefficients can be copied over and (if present) the remainder of the input array is set to zero:

// copy over available Fourier coefficients
for (int n=0; n<=ntor; ++n) {
in[n] = R_ax_cos[n] - I * R_ax_sin[n];
}

// zero out remaining input Fourier coefficients
for (int n=ntor+1; n<nCplx; ++n) {
in[n] = 0.0;
}


The full example can be found in src/app_magn_axis.c.

Assuming stellarator symmetry, half of the Fourier coefficients can be omitted and the transform reduces to the one-dimensional IDCT and IDST, respectively:

\begin{align*} R_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} R_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) \\ Z_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} Z_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) \end{align*}

The FFTW documentation explicitly advises against using R*DFT00 transforms for this purpose due to numerical stability issues which are currently circumvented within FFTW by using a less efficient algorithm. If the data is required anyway on the whole domain from 0 to 2π, it is probably easiest and fastest (in terms of development work) to simply use a c2r DFT and provide only real data in the input.

On the other hand, in case the evaluation locations are not required to be located on “full” grid points in the toroidal direction, REDFT01 and RODFT01 can be used. This is the subject of the now-starting second half of this example.

The number of grid points as well as the output array size are given as follows:

int nCplx = n_zeta / 2;
if (nCplx < ntor + 1) {
printf("error: number of grid points too low.\n");
return;
}


Only the cosine Fourier coefficients are copied over into the (real) input array:

// copy over available Fourier coefficients
in[0] = R_ax_cos[0];
for (int n = 1; n <= ntor; ++n) {
in[n] = R_ax_cos[n];
}

// zero out remaining input Fourier coefficients
for (int n = ntor + 1; n < nCplx; ++n) {
in[n] = 0.0;
}


The second half of the output array (of size n_zeta for equivalent with the first half of this example) is filled by applying the even symmetry of the output data implicitly used in REDFT01:

for (int n=0; n<nCplx; ++n) {
out[n_zeta - 1 - n] = out[n];
}


The output data is compared in the following figure with the equivalent result from using a c2r transform:

The violet curve labelled ‘DFT’ is the output from the c2r DFT applied in the first half of this example. The green curve labelled ‘REDFT01’ is the output from the REDFT01 applied in the second half of this example. Note that the locations of the REDFT01 output needed to be shifted to the right by half an index to yield overlapping curves.

The full example can be found in src/app_magn_axis_stellsym.c.

### Geometry of a Flux Surface in a Stellarator #

The real-space geometry of the flux surface is a general toroidal surface which is conveniently parameterized using poloidal and toroidal angle-like coordinates. The surface geometry is then given via a two-dimensional DFT:

\begin{align*} R(\theta, \zeta) =& \phantom{+}~ \sum\limits_{n= 0}^{N} R_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) + R_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) \\ ~& + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} R_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) + R_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) \\ Z(\theta, \zeta) =& \phantom{+}~ \sum\limits_{n= 0}^{N} Z_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) + Z_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) \\ ~& + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} Z_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) + Z_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) \end{align*}

where (θ,ζ) are the poloidal and toroidal angle-like variables in radians, respectively. For a five-fold symmetric stellarator like W7-X, ζ ranges from 0 to 5 * 2π around the whole machine. Conversely, the first (unique) toroidal segment of the geometry is contained within a range of ζ from 0 to 2π. The poloidal angle-like coordinate θ ranges from 0 to 2π once the short way around the torus (wristband-like).

The two-dimensional DFT above can be simplified (shown here for R only) by introducing rectangular two-dimensional Fourier coefficient matrices:

$$R_{m,n} = \begin{cases} 0 & m=0, n<0 \\ R_{0,n}^\mathrm{cos} - i R_{0,n}^\mathrm{sin} & m=0,n \geq 0 \\ R_{m,n}^\mathrm{cos} - i R_{m,n}^\mathrm{sin} & \textrm{else} \end{cases}$$

In this example the geometry of the flux surface is evaluated on a regular grid in θ and ζ:

$$\zeta_l = 2 \pi l / n_\zeta$$

$$\theta_k = 2 \pi k / n_\theta$$

where l ranges from 0 to n_ζ-1 and k ranges from 0 to n_θ-1.

Inserting this into the Fourier series for, e.g., the R coordinate, leads to a DFT:

$$R_{l,k} = \sum\limits_{m= 0}^{M} \sum\limits_{n=-N}^{N} R_{m,n} , e^{2 \pi i ( n l / n_\zeta + m k / n_\theta ) }$$

The R coordinate is a real-valued quantity which implies that a two-dimensional c2r DFT provided by FFTW can be used to perform the backward transform in order to evaluate the flux surface geometry. One further issue consists in the fact that the definition of the Fourier geometry employed in VMEC uses an angle argument (m θ - n ζ) where the two-dimensional DFT written out above uses (m θ + n ζ). The Fourier coefficients coming from VMEC have to be inserted into the positions corresponding to (-n) in the input array given to FFTW in order to resolve this pecularity.

The next part concers the code to copy over the Fourier coefficients staring from the VMEC output in arrays rmnc and zmns into the input to FFTW.

The index in the FFTW input corresponding to given mode numbers n and m can be computed as follows:

if (n <= 0) {
idx_in = -n * nyq_pol + m;
} else {
idx_in = (n_zeta - n) * nyq_pol + m;
}


This code is abbreviated in the following as /* compute idx_in */.

For the first coefficients with m=0, the copying is performed as follows:

int idx_vmec = 0;
int m = 0;
for (int n = 0; n <= ntor; ++n) {

/* compute idx_in */

in_R[idx_in] =     rmnc[idx_vmec];
in_Z[idx_in] = I * zmns[idx_vmec];

idx_vmec++;
}


Note that VMEC weights the coefficients for m=0 with positive n ζ in contrast to the following coefficients with m>0. The cosine-parity quantities like R are not influence by this, but the sine-parity quantities like Z need to get the sign of their value flipped due to the odd symmetry sin(-n zeta) = -sin(n zeta).

The second part of the copying adresses the Fourier coefficients with m>0:

for (m = 1; m < mpol; ++m) {
for (int n = -ntor; n <= ntor; ++n) {

/* compute idx_in */

in_R[idx_in] =  0.5 *     rmnc[idx_vmec];
in_Z[idx_in] = -0.5 * I * zmns[idx_vmec];

idx_vmec++;
}
}


Here, the coefficients are scaled by a factor of 0.5. FFTW implies that coefficientss for m<0 were present in the logically equivalent DFT input, which is not the case for VMEC output. Thus, they are (in this case errornously) scaled internally by a factor of 2, which needs to be counteracted by a scaling factor of 0.5 in the input.

Assuming stellarator symmetry, half of the Fourier coefficients can be omitted and the transform reduces to the two-dimensional IDCT and IDST:

\begin{align*} R(\theta, \zeta) =& \sum\limits_{n= 0}^{N} R_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} R_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) \\ Z(\theta, \zeta) =& \sum\limits_{n= 0}^{N} Z_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} Z_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) \end{align*}

This is currently resolved by simply omitting the stellarator-asymmetric terms in the input to FFTW.

The full example code can be found in src/app_flux_surface.c. A Python script is provided to plot the resulting real-space geometry in src/plot_lcfs_realspace.py.

## Allocation of arrays #

Throughout this example collection, the proposed convenience wrapper functions provided by FFTW for allocating real- and complex-valued arrays are used:

int n = 32;
int nOut = n/2+1;
double *in = fftw_alloc_real(n);
fftw_complex *out = fftw_alloc_complex(nOut);


where n is the real-space size of the DFT and nOut is the number of Fourier coefficients resulting from a r2c DFT. The corresponding “raw” allocation code would look like this:

double *in = (double*) fftw_malloc(sizeof(double) * n);
fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * nOut);


Note that above code is equivalent to the standard C way of allocating memory using malloc:

double *in = (double*) malloc(sizeof(double) * n);
fftw_complex *out = (fftw_complex*) malloc(sizeof(fftw_complex) * nOut);


except that the FFTW routines ensure proper memory alignment for exploiting SIMD instructions of modern CPUs.

## Utility functions #

In order to keep the examples short, a separate header file src/util.h is provided. It contains methods to operate on one- and two-dimensional arrays (the latter stored in row-major order) of real (double) and complex (fftw_complex) numbers. The following operations are supported:

• fill with random numbers between 0 and 1: e.g. fill_random_1d_cplx
• element-wise check for approximate equality: e.g. compare_1d_real
• write into a text file: e.g. dump_1d_real
• compute generalized DFT (shift and non-linear phase): dft_1d_cplx