# FFTW Tutorial #

This is a basic C project (Makefile, but also for the Eclipse IDE) I use for exploring FFTW 3.3.9.

One- and two-dimensional discrete Fourier transforms (DFTs) of random data are computed using both FFTW and straight-forward naive algorithms in order to illustrate explicitly what kind of symmetries and scaling properties FFTW implies in its inputs and outputs.

## One-Dimensional Examples #

This tutorial starts by computing one-dimensional (1D) DFTs of random input data.

### 1D complex-to-complex #

The first example is basically a self-contained version of the corresponding example in the FFTW manual.

We want to compute the complex-valued one-dimensional DFT here, which is specified in section 4.8.1 of the FFTW reference manual.

$$ Y_k = \sum\limits_{j=0}^{n-1} X_j \, e^{\pm 2 \pi j k \sqrt{-1}/n} \textrm{ for } k=0,1,…,(n-1) $$

The sign in the exponent of the basis function specifies the direction in which the Fourier transform is to be computed:
`-1`

indicates a “forward” transform and `+1`

indicates a “backward” transform.
These values are available via the `FFTW_FORWARD`

and `FFTW_BACKWARD`

preprocessor macros.

In order to compute the DFT, complex-valued products of the following form need to be evaluated:

$$ X_j \, e^{\pm 2 \pi j k \sqrt{-1}/n} $$

Eulers formula comes in handy now (where $i$ is the imaginary unit with ${i}^2=-1$):

$$ e^{i \varphi} = \cos(\varphi) + i \sin(\varphi) $$

The angle argument `phi`

can be identified in above formulas for the DFT:

$$ \varphi = \pm 2 \pi j k / n $$

Now the complex-valued product can be computed using only real-valued variables:

\begin{align} ~& \left( \mathcal{Re}(X_j) + i \, \mathcal{Im}(X_j) \right) \cdot \left( \cos(\varphi) + i \sin(\varphi) \right) \\ =& \phantom{+ i}~ \left( \mathcal{Re}(X_j) \cos(\varphi) - \mathcal{Im}(X_j) \sin(\varphi) \right) \\ ~& + i \left( \mathcal{Re}(X_j) \sin(\varphi) + \mathcal{Im}(X_j) \cos(\varphi) \right) \end{align}

FFTW implements all this math internally and the explicit formulation was only used to build a basis for the computations to follow.
Below is the example code showing how to compute the 1d `c2c`

DFT using both FFTW and a manual implementation.
The size of the DFT is specified via the variable `n`

and the direction (forward or backward) is specified via the variable `dir`

.
Complex-valued arrays `in`

, `ref_out`

and `fftw_out`

are allocated to hold the input (*X_k*) and outputs (*Y_k*) of the DFT.
A plan for the corresponding DFT is created using `fftw_plan_dft_1d`

.
Only after this, the input data is written into the `in`

array.
The reference output is computed now (before calling `fftw_execute`

), since in some later examples (most notably multi-dimensional `c2r`

transforms),
FFTW overwrites the input data and for good practise, we keep this in mind already now.
Note that this is an out-of-place transform, since `in`

and `fftw_out`

are allocated to be separate arrays.
Next the FFTW transform can be executed via `fftw_execute`

.
This fills the corresponding output array `fftw_out`

, which is subsequently compared against the reference output in `ref_out`

.
A conservative tolerance of `1e-12`

is specified to make the example work also for weird input data (as generated by the PRNG).
The actual deviations are usually much smaller and can be observed in the screen output from `compare_1d_cplx`

.
Finally, the `fftw_plan`

is destroyed and the memory is released using `fftw_free`

(which has to be used if the array was allocated using `fftw_alloc_*`

).

```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>
#include <fftw3.h>
#include "util.h"
int test_1d_c2c(int n) {
int dir = FFTW_BACKWARD;
double real, imag, phi;
fftw_complex *in = fftw_alloc_complex(n);
fftw_complex *ref_out = fftw_alloc_complex(n);
fftw_complex *fftw_out = fftw_alloc_complex(n);
fftw_plan p = fftw_plan_dft_1d(n, in, fftw_out, dir, FFTW_ESTIMATE);
// fill the input array with random data
fill_random_1d_cplx(n, in);
// compute the reference output
for (int k = 0; k < n; ++k) {
ref_out[k] = 0.0;
for (int j = 0; j < n; ++j) {
phi = dir * 2.0 * M_PI * j * k / ((double) n);
real = creal(in[j]) * cos(phi) - cimag(in[j]) * sin(phi);
imag = creal(in[j]) * sin(phi) + cimag(in[j]) * cos(phi);
ref_out[k] += real + I * imag;
}
}
// compute the DFT of in using FFTW
fftw_execute(p);
// compare reference output with FFTW output
double eps = 1e-12;
int status = compare_1d_cplx(n, ref_out, fftw_out, eps);
fftw_destroy_plan(p);
fftw_free(in);
fftw_free(ref_out);
fftw_free(fftw_out);
return status;
}
int main(int argc, char** argv) {
int status = 0;
status += test_1d_c2c(32);
status += test_1d_c2c(33);
return status;
}
```

The code is available in the file `src/test_1d_c2c.c`

.

### 1D complex-to-real and real-to-complex #

The next two examples deal with DFTs of purely real data (`r2c`

) and DFTs which *produce* purely real data (`c2r`

).
These are covered in the official FFTW tutorial
as well as in the FFTW reference manual.

In case either the input array or the output array are constrained to be purely real, the corresponding complex-valued output or input array
features Hermitian symmetry (where the `n`

-periodicity has been included as well):

$$ Y_{n-k} = Y_{-k} = Y^{*}_k $$

For the case of a DFT of real-valued *X_j* and complex-valued *Y_k* with Hermitian symmetry,
the Fourier sum is written out excplicitly as follows:

$$\begin{align} %TODO: fix underbrace Y_{-k} =& \sum\limits_{j=0}^{n-1} X_j \Bigl( \cos(-2 \pi j k / n) + i \sin(-2 \pi j k / n) \Bigr) \\ =& \sum\limits_{j=0}^{n-1} X_j \Bigl( \cos(2 \pi j k / n) - i \sin(2 \pi j k / n) \Bigr) = Y^{*}_k \end{align}$$

The figure below illustrates the structure of the complex-valued Fourier space arrays
occuring in the DFT for both even-valued (`n=6`

) and odd-valued (`n=7`

) sizes of the DFT.

The size required to contain all information required for the transform from or to a real-valued array
is contained in the first `n/2+1`

(division by 2 rounded down) entries of the complex array, indicated by the red bars in above figure.
For both even and odd values of `n`

, Hermitian symmetry implies *Y_0* = *Y*_0* and thus *Y_0* is always real.
In the case of even `n`

, we can intuitively observe that *Y_m* = *Y*_-m* where *m*=`n/2`

(the element at the Nyquist frequency,
and thus also the last element of the complex-valued Fourier-space array) is also purely real.

The DFT formulation includes all elements of the Fourier-space array from 0 to `n-1`

.
Now that only parts of these coefficients are taken into account, they have to be weighted appropriately
to recover the results that would have been obtained by using the full array without taking advantage of its symmetry properties.
For odd `n`

, all components of the Fourier-space array except the DC element at *k*=0 have to be weighted with a factor of 2.
For even `n`

, all components of the Fourier-space array except the DC element at *k*=0 and the Nyquist element at *k*=`n/2`

have to be weighted with a factor of 2.
The elements that need to be weighted by a factor of 2 are highlighted by solid blue lines in above illustration.
The redundant elements that are not explicitly needed are indicated by dashed blue lines.

The backward transformation from complex-valued Fourier space
to real space is demonstrated in `src/test_1d_c2r.c`

.
The relevant portion of the source code is here:

```
int nCplx = n / 2 + 1;
for (int k = 0; k < n; ++k) {
// start with DC component, which is purely real due to Hermitian symmetry
ref_out[k] = creal(in[0]);
int loopEnd = nCplx;
// special case for even n
if (n % 2 == 0) {
// Nyquist element is purely real as well
phi = 2.0 * M_PI * (nCplx - 1) * k / ((double) n);
ref_out[k] += creal(in[nCplx - 1]) * cos(phi);
loopEnd = nCplx-1;
}
// middle elements are handled the same for even and odd n
for (int j = 1; j < loopEnd; ++j) {
phi = 2.0 * M_PI * j * k / ((double) n);
real = creal(in[j]) * cos(phi) - cimag(in[j]) * sin(phi);
ref_out[k] += 2.0 * real;
}
}
```

Note that the integer division used to compute `nCplx`

(correctly) rounds down.
The rest is a relatively straight-forward implementation of above verbose algorithm.
The DC component is always taken to be real.
Depending on whether `n`

is even or odd, the number of elements to take into account with both real and imaginary component (`loopEnd`

) is adjusted.
The (purely real) Nyquist element at `n/2`

is added separately if `n`

is even.
All other elements are weighted by a factor of 2 and only the real part
of the complex product of input Fourier coefficient and complex-valued basis function is actually computed.

The forward transform from real space to Fourier space is comparably simple to implement:

```
for (int k = 0; k < nCplx; ++k) {
// DC component is always real
ref_out[k] = in[0];
for (int j = 1; j < n; ++j) {
phi = -2.0 * M_PI * j * k / ((double) n);
real = in[j] * cos(phi);
imag = in[j] * sin(phi);
ref_out[k] += real + I * imag;
}
}
```

Note that in this case, only the non-redundant part of the complex-values Fourier coefficients need to be computed
from a real-valued input. The separate handling of the DC component is not strictly necessary, since `cos(0)=1`

and `sin(0)=0`

and thus the DC component would get no imaginary contribution.
The full example is available in the file `src/test_1d_r2c.c`

.

It becomes evident in above examples that the sign of a `c2r`

DFT is always `FFTW_BACKWARD`

and
the sign of a `r2c`

DFT is always `FFTW_FORWARD`

.

### 1D real-to-real #

Certain symmetries can be assumed for a given real input array of which a DFT is to be computed
that lead to the output array being purely real as well. This is another gain of a factor of 2 in speed
and memory usage over `r2c`

/`c2r`

transforms.
Depending on even (odd) parity of the input array, the transform outputs have even (odd) parity.
They are therefore called Discrete Cosine Transform (DCT) and Discrete Sine Transfrom (DST), respectively.

The logical size of the corresponding DFT is denoted as *N*.
The actual array size given to FFTW is denoted by `n`

.
For the DCT/DST types implemented in FFTW, *N* is always even.
Note that this does not pose any restrictions on the input array sizes `n`

.
One can think of the logical DFT input array as one that FFTW ‘sees’ internally and computes a regular DFT of.
The resulting output array is purely real and features the named symmetry properties,
since the logical input array was ‘constructed’ from the given input array to have the desired symmetry properties.

In below example plots used to illustrate these symmetry properties/assumptions,
random Fourier coefficients have been sampled and transformed back to real-space using the named inverse transforms.
This allows to ’evaluate’ the input data also in the samples given in the (small) input arrays.
In these plots, red dashed vertical lines indicate even symmetry (*f(x)=f(-x)*) about the indicated position
and blue dashed vertical lines indicate odd symmetry (*f(x)=-f(-x)*) about the indicated position.
The grey-shaded area in the background indicated the range of samples that are included in the input array.
The x axis labels denote the indices in the actual input array given to FFTW (only valid within the range of the grey boxes).

The nomenclature works as follows:
The first letter is **R** to indicate real-valued data.
The second letter distinguished between **E** for even-parity data and **O** for odd-parity data.
The following **DFT** is for discrete Fourier transform (who guessed…).
The next two digits indicate wheter (**1**) or not (**0**) the input (first digit) or the output (second digit) data is shifted by half a sample.
Think of this in terms of parity: whether the symmetry axis is located at a sample (no shifting necessary) or between two samples (shifting necessary).
The shifting becomes necessary when formulating the symmetry properties over sampled data that has integer indices
vs. symmetry axis that are possibly located at half-integer locations.
The parity and symmetry properties of the output array are those of the input array for the inverse transform.

For all transforms, a periodicity of *N* is assumed for the *logical* input array as *X_j = X_{N+j}* where *X* is the input data array.

Here is a quick overview table to indicate the assumed symmetries in the input array for the following types of `r2r`

DFTs:

type | actual `r2r` input |
logically-equivalent DFT input |
---|---|---|

REDFT00 | `a b c d e` |
`a b c d e d c b` |

REDFT10 | `a b c d ` |
`a b c d d c b a` |

REDFT01 | `a b c d ` |
`a b c d 0 -d -c -b` |

REDFT11 | `a b c d ` |
`a b c d -d -c -b -a` |

RODFT00 | `a b c ` |
`0 a b c 0 -c -b -a` |

RODFT10 | `a b c d ` |
`a b c d -d -c -b -a` |

RODFT01 | `a b c d ` |
`a b c d c b a 0` |

RODFT11 | `a b c d ` |
`a b c d d c b a` |

For the DCTs, please also consider https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition and in particular https://upload.wikimedia.org/wikipedia/commons/a/ae/DCT-symmetries.svg .

For the DSTs, please also consider https://en.wikipedia.org/wiki/Discrete_sine_transform#Definition and in particular https://upload.wikimedia.org/wikipedia/commons/3/31/DST-symmetries.svg .

#### REDFT00 (DCT-I) #

In case of the real-valued even-parity DFT with no shifts in either input or output array (REDFT00),
also called the DCT-I, the corresponding logical DFT size is given by *N* = 2(`n`

-1), corresponding to `n`

= *N*/2+1.

The formal definition of the REDFT00 is given below:

$$ Y_k = X_0 + (-1)^k X_{n-1} + 2 \sum\limits_{j=1}^{n-2} X_j \cos\left[ \pi j k / (n-1) \right] $$

The inverse of this transform is REDFT00 itself.
The input array is assumed to have even symmetry around *j=0* and even symmetry also around *j=n−1*.

In above figure, the lowercase letters *a* to *e* refer to the input data *abcde* for the size-5 REDFT00,
which is logically equivalent to a size-8 DFT with real-valued input data *abcdedcb*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `REDFT00`

.
In the following code, `in`

is the input array (size `n`

) given to `REDFT00`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `REDFT00`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
for (int i=0; i<n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to even symmetry around n-1
for (int i=0; i<n-1; ++i) {
in_logical[n+i] = in[n-2-i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}
```

The first `n`

values should be identical between `REDFT00`

and the generalized DFT:

```
double delta;
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

The remaining values should have even symmetry around `n-1`

:

```
for (int i = 0; i < n - 2; ++i) {
delta = creal(out_logical[n + i]) - out[n - 2 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n+i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n+i, delta);
}
}
```

The full example can be found in `src/test_1d_redft00.c`

.

#### REDFT10 (DCT-II) #

In case of the real-valued even-parity DFT with shifted input data (REDFT10),
also called the DCT-II, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.
This function is commonly known as “the” DCT.

The formal definition of the REDFT10 is given below:

$$ Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \cos\left[ \pi (j+1/2) k / n \right] $$

The inverse of this transform is REDFT01.
The input array is assumed to have even symmetry around *j=-0.5* and even symmetry also around *j=n−0.5*.

In above figure, the lowercase letters *a* to *e* refer to the input data *abcd* for the size-4 REDFT10,
which is logically equivalent to a size-8 DFT with real-valued input data *abcddcba*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `REDFT10`

.
In the following code, `in`

is the input array (size `n`

) given to `REDFT10`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `REDFT10`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to even symmetry around n-0.5
for (int i = 0; i < n; ++i) {
in_logical[n + i] = in[n - 1 - i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}
```

The first `n`

values should be identical between `REDFT10`

and the generalized DFT:

```
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

Odd symmetry around `n`

implies that the value at `n`

is zero:

```
if (fabs(creal(out_logical[n])) > eps) {
printf("error: delta of [%d] is %g\n", n, creal(out_logical[n]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n, creal(out_logical[n]));
}
```

The remaining values should have odd symmetry around `n`

:

```
for (int i = 1; i < n; ++i) {
delta = creal(out_logical[n + i]) - (-out[n - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}
```

The full example can be found in `src/test_1d_redft10.c`

.

#### REDFT01 (DCT-III) #

In case of the real-valued even-parity DFT with shifted output data (REDFT01),
also called the DCT-III, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.
This function is commonly known as “the” inverse DCT (IDCT).

The formal definition of the REDFT01 is given below:

$$ Y_k = X_0 + 2 \sum\limits_{j=1}^{n-1} X_j \cos\left[ \pi j (k+1/2) / n \right] $$

The inverse of this transform is REDFT10.
The input array is assumed to have even symmetry around *j=0* and odd symmetry around *j=n*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `REDFT01`

.
In the following code, `in`

is the input array (size `n`

) given to `REDFT01`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `REDFT01`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to odd symmetry around n
for (int i = 1; i < n; ++i) {
in_logical[n + i] = -in[n - i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely real-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}
```

The first `n`

values should be identical between `REDFT01`

and the generalized DFT:

```
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

The remaining values should have even symmetry around `n-0.5`

:

```
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[n + i]) - out[n - 1 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}
```

The full example can be found in `src/test_1d_redft01.c`

.

#### REDFT11 (DCT-IV) #

In case of the real-valued even-parity DFT with both shifted input and output data (REDFT11),
also called the DCT-IV, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.

The formal definition of the REDFT11 is given below:

$$ Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \cos\left[ \pi (j+1/2) (k+1/2) / n \right] $$

The inverse of this transform is REDFT11 itself.
The input array is assumed to have even symmetry around *j=-0.5* and odd symmetry around *j=n-0.5*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `REDFT11`

.
In the following code, `in`

is the input array (size `n`

) given to `REDFT11`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `REDFT11`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to odd symmetry around n-0.5
for (int i = 0; i < n; ++i) {
in_logical[n + i] = -in[n - 1 - i];
}
```

```
for (int i = 0; i < N; ++i) {
if (fabs(cimag(out_logical[i])) > eps) {
printf("error: imag of [%d] is %g\n", i, cimag(out_logical[i]));
status = 1;
} else {
printf("imag of [%d] is %g\n", i, cimag(out_logical[i]));
}
}
```

The first `n`

values should be identical between `REDFT11`

and the generalized DFT:

```
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

The remaining values should have odd symmetry around `n-0.5`

:

```
for (int i = 0; i < n; ++i) {
delta = creal(out_logical[n + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}
```

The full example can be found in `src/test_1d_redft11.c`

.

#### RODFT00 (DST-I) #

In case of the real-valued odd-parity DFT with no shifts in either input or output array (RODFT00),
also called the DST-I, the corresponding logical DFT size is given by *N* = 2(`n`

+1), corresponding to `n`

= *N*/2-1.
Note that the periodicity of *N* of the logical input array in combination with odd symmetry *X_j = -X_{n-j}*
leads to *X_0 = -X_0* which is equivalent to *X_0 = 0*.
This first always-zero element of the input array is not explicitly included in the input to FFTW
and the input array thus has a size of one less and the indices of the symmetry axis shift by 1.

The formal definition of the RODFT00 is given below:

$$ Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1) (k+1) / (n+1) \right] $$

The inverse of this transform is RODFT00 itself.
The input array is assumed to have odd symmetry around *j=-1* and odd symmetry also around *j=n*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `RODFT00`

.
In the following code, `in`

is the input array (size `n`

) given to `RODFT00`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `RODFT00`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
// except the first 0 (and the rest of the array being shifted)
in_logical[0] = 0.0;
for (int i = 0; i < n; ++i) {
in_logical[i + 1] = in[i];
}
// second half is filled according to odd symmetry around (n+1)
in_logical[n + 1] = 0.0;
for (int i = 0; i < n; ++i) {
in_logical[n + 2 + i] = -in[n - 1 - i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}
```

Odd symmetry about `0`

implies that the imaginary part of the first output should be zero:

```
if (fabs(cimag(out_logical[0])) > eps) {
printf("error: delta of [%d] is %g\n", 0, -cimag(out_logical[0]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", 0, -cimag(out_logical[0]));
}
```

The next `n`

values should have the output of `RODFT00`

in their negative imaginary parts
with one index offset to account for the first zero in the input:

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i + 1]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i + 1, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i + 1, delta);
}
}
```

Odd symmetry about `n`

in the input to `RODFT00`

implies
odd symmetry in the output of the logically-equivalent DFT about `n+1`

due to the index shift by one:

```
if (fabs(cimag(out_logical[n + 1])) > eps) {
printf("error: delta of [%d] is %g\n", n + 1,
-cimag(out_logical[n + 1]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 1, -cimag(out_logical[n + 1]));
}
```

The remaining values should have odd symmetry around `n+1`

(note again that the real output from `RODFT00`

needs to be compared against
the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + 2 + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + 2 + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 2 + i, delta);
}
}
```

The full example can be found in `src/test_1d_rodft00.c`

.

#### RODFT10 (DST-II) #

In case of the real-valued odd-parity DFT with shifted input data (RODFT10),
also called the DST-II, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.
This function is commonly known as “the” DST.

The formal definition of the RODFT10 is given below:

$$ Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1/2) (k+1) / n \right] $$

The inverse of this transform is RODFT01.
The input array is assumed to have odd symmetry around *j=-0.5* and odd symmetry also around *j=n-0.5*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `RODFT10`

.
In the following code, `in`

is the input array (size `n`

) given to `RODFT10`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `RODFT10`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to odd symmetry around (n-0.5)
for (int i = 0; i < n; ++i) {
in_logical[n + i] = -in[n - 1 - i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}
```

The output of `RODFT10`

is shifted by one index to the left with respect to the logically-equivalent DFT,
since the first input is constrained to be zero (resulting in odd parity about `-1`

in the input to `RODFT10`

).
Odd symmetry about `0`

implies that the imaginary part of the first output should be zero:

```
if (fabs(cimag(out_logical[0])) > eps) {
printf("error: delta of [%d] is %g\n", 0, -cimag(out_logical[0]));
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", 0, -cimag(out_logical[0]));
}
```

The next `n`

values should have the output of `RODFT10`

in their negative imaginary parts
with one index offset to account for the first zero in the input:

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i + 1]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i + 1, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i + 1, delta);
}
}
```

The remaining values have even symmetry around `n-1`

(note again that the real output from `RODFT10`

needs to be compared against
the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

```
for (int i = 0; i < n - 1; ++i) {
delta = -cimag(out_logical[n + 1 + i]) - out[n - 2 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + 1 + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + 1 + i, delta);
}
}
```

The full example can be found in `src/test_1d_rodft10.c`

.

#### RODFT01 (DST-III) #

In case of the real-valued odd-parity DFT with shifted output data (RODFT01),
also called the DST-III, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.
This function is commonly known as “the” inverse DST (IDST).

The formal definition of the RODFT01 is given below:

$$ Y_k = (-1)^kX_{n-1} + 2 \sum\limits_{j=0}^{n-2} X_j \sin \left[ \pi (j+1) (k+1/2) / n \right] $$

The inverse of this transform is RODFT10.
The input array is assumed to have odd symmetry around *j=-1* and even symmetry around *j=n-1*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `RODFT01`

.
In the following code, `in`

is the input array (size `n`

) given to `RODFT01`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `RODFT01`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

The input of `RODFT01`

is shifted by one index to the left with respect to the logically-equivalent DFT,
since the first input is constrained to be zero (resulting in odd parity about `-1`

in the input to `RODFT01`

).
Here is how the symmetric input is generated:

```
// a zero in the first entry is needed to satisfy
// odd symmetry about -1 in the shifted input array
in_logical[0] = 0.0;
// the first half of the array is identical up to shift
for (int i = 0; i < n; ++i) {
in_logical[i + 1] = in[i];
}
// second half is filled according to even symmetry around (n-1)
for (int i = 0; i < n - 1; ++i) {
in_logical[n + 1 + i] = in[n - 2 - i];
}
```

The checks are a little bit more involved. The logically equivalent DFT output should be purely imaginary-valued:

```
for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}
```

The first `n`

values should have the output of `RODFT01`

in their negative imaginary parts:

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

The remaining values have odd symmetry around `n-0.5`

(note again that the real output from `RODFT01`

needs to be compared against
the negative imaginary parts in the output of the logically equivalent DFT shifted by one full index):

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + i]) - (-out[n - 1 - i]);
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}
```

The full example can be found in `src/test_1d_rodft01.c`

.

#### RODFT11 (DST-IV) #

In case of the real-valued odd-parity DFT with both shifted input and output data (RODFT11),
also called the DST-IV, the corresponding logical DFT size is given by *N* = 2`n`

, corresponding to `n`

= *N*/2.

The formal definition of the RODFT11 is given below:

$$ Y_k = 2 \sum\limits_{j=0}^{n-1} X_j \sin \left[ \pi (j+1/2) (k+1/2) / n \right] $$

The inverse of this transform is RODFT11 itself.
The input array is assumed to have odd symmetry around *j=-0.5* and even symmetry around *j=n-0.5*.

In order to demonstrate the use of this method,
the logically equivalent DFT input is filled appropriately and its output is checked against `RODFT11`

.
In the following code, `in`

is the input array (size `n`

) given to `RODFT11`

and `in_logical`

is the (complex-valued) input array (size *N*) handed to a
generic 1D DFT.
Similarly, `out`

is the output array (size `n`

) from `RODFT11`

and `out_logical`

is the output array (size *N*) from a generic 1D DFT.

Here is how the symmetric input is generated:

```
// the first half of the array is identical up to shift
for (int i = 0; i < n; ++i) {
in_logical[i] = in[i];
}
// second half is filled according to even symmetry around (n-0.5)
for (int i = 0; i < n; ++i) {
in_logical[n + i] = in[n - 1 - i];
}
```

```
for (int i = 0; i < N; ++i) {
if (fabs(creal(out_logical[i])) > eps) {
printf("error: real of [%d] is %g\n", i, creal(out_logical[i]));
status = 1;
} else {
printf("real of [%d] is %g\n", i, creal(out_logical[i]));
}
}
```

The first `n`

values should have the output of `RODFT11`

in their negative imaginary parts:

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[i]) - out[i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", i, delta);
}
}
```

The remaining values have even symmetry around `n-0.5`

(note again that the real output from `RODFT11`

needs to be compared against
the negative imaginary parts in the output of the logically equivalent DFT):

```
for (int i = 0; i < n; ++i) {
delta = -cimag(out_logical[n + i]) - out[n - 1 - i];
if (fabs(delta) > eps) {
printf("error: delta of [%d] is %g\n", n + i, delta);
status = 1;
} else {
printf("match of [%d] (delta=%g)\n", n + i, delta);
}
}
```

The full example can be found in `src/test_1d_rodft11.c`

.

## Two-Dimensional Examples #

The next section deals with the computation of two-dimensional (2D) DFTs of random input data.

### 2D complex-to-complex #

A first example for the computation of two-dimensional Fourier transforms using FFTW is given here:

```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>
#include <fftw3.h>
#include "util.h"
int test_2d_c2c(int n0, int n1) {
int dir = FFTW_FORWARD;
fftw_complex *in = fftw_alloc_complex(n0 * n1);
fftw_complex *out = fftw_alloc_complex(n0 * n1);
fftw_complex *ref_out = fftw_alloc_complex(n0 * n1);
fftw_plan p = fftw_plan_dft_2d(n0, n1, in, out, dir, FFTW_ESTIMATE);
// random input
fill_random_2d_cplx(n0, n1, in);
// manually compute DFT for reference
int idx_k, idx_j;
double phi;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;
ref_out[idx_k] = 0.0;
for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;
phi = dir * 2.0 * M_PI * (
k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );
ref_out[idx_k] += in[idx_j] * cexp(I * phi);
}
}
}
}
fftw_execute(p);
// compare outputs
double eps = 1.0e-12;
int status = compare_2d_cplx(n0, n1, ref_out, out, eps);
return status;
}
int main(int argc, char **argv) {
int status = 0;
status += test_2d_c2c(4, 4);
status += test_2d_c2c(4, 5);
status += test_2d_c2c(5, 4);
status += test_2d_c2c(5, 5);
return status;
}
```

As usual, arrays are allocated to the specified size and filled with random data. The two-dimensional complex-valued DFT is computed manually for reference and the outputs are compared against the result of FFTW. The product of the one-dimensional DFTs along the two dimensions is computed here via addition of the corresponding phases in the exponent of the complex-valued basis function.

The code can be found in `src/test_2d_c2c.c`

.

### 2D complex-to-real #

The two-dimensional `c2r`

transform can make use of the Hermitian symmetry
(see above)
to reduce the computational work in the *last* dimension of the input data by about a factor of 2.
The input data thus has to have a shape of (`n0`

x`n1_cplx`

) where `n1_cplx = n1/2+1`

(division by 2 rounded down)
and (`n0`

x`n1`

) is the logical size of the DFT and also the size of the real-valued output arrays.

The arrays are allocated as follows:

```
fftw_complex *in = fftw_alloc_complex(n0 * n1_cplx);
double *out = fftw_alloc_real(n0 * n1);
double *ref_out = fftw_alloc_real(n0 * n1);
```

The manual transform that is computed for reference is done as follows:

```
int idx_k, idx_j;
double phi, real;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;
ref_out[idx_k] = 0.0;
for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1_cplx; ++j1) {
idx_j = j0 * n1_cplx + j1;
phi = 2.0 * M_PI * ( k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );
// output is purely real,
// so compute only real part
real = creal(in[idx_j]) * cos(phi) - cimag(in[idx_j]) * sin(phi);
ref_out[idx_k] += real;
// add symmetric entries twice
// n1/2+n1%2 is n1/2 if n1 is even
// and it is n1/2+1 if n1 is odd
if (j1 > 0 && j1 < n1 / 2 + n1 % 2) {
ref_out[idx_k] += real;
}
}
}
}
}
```

The full example can be found in `src/test_2d_c2r.c`

.

### 2D real-to-complex #

The two-dimensional `r2c`

transform can make use of the Hermitian symmetry
(see above)
to reduce the computational work in the *last* dimension of the output data by about a factor of 2.
The real-valued input data has to have a shape of (`n0`

x`n1`

), which is also the logical size of the DFT.
The output data then has a shape of (`n0`

x`n1_cplx`

) where `n1_cplx = n1/2+1`

(division by 2 rounded down).

The arrays are allocated as follows:

```
double *in = fftw_alloc_real(n0 * n1);
fftw_complex *out = fftw_alloc_complex(n0 * n1_cplx);
fftw_complex *ref_out = fftw_alloc_complex(n0 * n1_cplx);
```

The manual transform that is computed for reference is done as follows:

```
int idx_k, idx_j;
double phi;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1_cplx; ++k1) {
idx_k = k0 * n1_cplx + k1;
ref_out[idx_k] = 0.0;
for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;
phi = -2.0 * M_PI * ( k0 * j0 / ((double) n0)
+ k1 * j1 / ((double) n1) );
ref_out[idx_k] += in[idx_j] * cexp(I*phi);
}
}
}
}
```

The full example can be found in `src/test_2d_r2c.c`

.

### 2D real-to-real #

Two-dimensional real-to-real DFTs can be defined with any of the one-dimensional transforms in each of the two dimensions. This implies that for eight one-dimensional transforms there is a total of 64 possible combinations that exceed the scope of this tutorial and also do not contribute much insight.

Note that FFTW computes the *separable product* of the transforms along each dimension
and not the true multi-dimensional DFT.
This can be understood intuitively by the addition formulas for sine and cosine:

\begin{align*} \sin(u \pm v) =& \sin(u) \cos(v) \pm \cos(u) \sin(v) \\ \cos(u \pm v) =& \cos(u) \cos(v) \mp \sin(u) \sin(v) \end{align*}

FFTW’s two-dimensional real-to-real DFTs can be used to compute the products on the right-hand side of above equation and two such DFts have to be added together to arrive at the true two-dimensional DFT with the Fourier kernel as on the left-hand side of above equation.

In the following, two examples are presented.
First, a mixed `r2r`

DFT with `REDFT10`

in the first dimension and `RODFT10`

in the second dimension is computed.
Second, a true two-dimensional DFT in computed using two FFTW calls.

#### 2D REDFT10+RODFT10 #

An example of a two-dimensional transform is presented
with `REDFT10`

in the first dimension and `RODFT10`

in the second dimension.

The basis functions for the two dimensions are computed separately:

```
// REDFT10 in first dimension
basis_0 = 2.0 * cos(M_PI * (j0 + 0.5) * k0 / ((double) n0));
// RODFT10 in second dimension
basis_1 = 2.0 * sin(M_PI * (j1 + 0.5) * (k1 + 1.0) / ((double) n1));
```

The contribution to the output array is computed as follows:

```
ref_out[idx_k] += in[idx_j] * basis_0 * basis_1;
```

The full example can be found in `src/test_2d_r2r_e10_o10.c`

.

#### True 2D IDCT using FFTW #

A true two-dimensional inverse discrete cosine transform is computed using FFTW in this example:

$$ Y_{k_0,k_1} = \sum\limits_{j_0=0}^{n_0-1} \sum\limits_{j_1=0}^{n_1-1} X_{j_0,j_1} \cos \left( \pi \left( j_0 \frac{k_0 + \frac{1}{2}}{n_0} + j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \right) $$

FFTW can only compute the separable product of one-dimensional transforms for multi-dimensional transform. Above kernel function thus has to be split up by application of the addition formula for sine and cosine:

\begin{align*} ~& \cos \left( \pi \left( j_0 \frac{k_0 + \frac{1}{2}}{n_0} + j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \right) \\ =& \cos \left( \pi j_0 \frac{k_0 + \frac{1}{2}}{n_0} \right) \cos \left( \pi j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) - \sin \left( \pi j_0 \frac{k_0 + \frac{1}{2}}{n_0} \right) \sin \left( \pi j_1 \frac{k_1 + \frac{1}{2}}{n_1} \right) \end{align*}

The left half of above equation is computed using a two-dimensional `REDFT01`

and the right half is computed using a two-dimensional `RODFT01`

.
The outputs of these transforms are subtracted from each other to yield the desired result.

First consider the naive implementation in plain `C`

.

The input and output arrays are allocated and filled with random data:

```
double *in = fftw_alloc_real(n0 * n1);
double *ref_out = fftw_alloc_real(n0 * n1);
fill_random_2d_real(n0, n1, in);
```

The two-dimensional IDCT is computed from the input data:

```
double basis;
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;
ref_out[idx_k] = 0.0;
for (int j0 = 0; j0 < n0; ++j0) {
for (int j1 = 0; j1 < n1; ++j1) {
idx_j = j0 * n1 + j1;
basis = cos(M_PI * ( (k0 + 0.5) * j0 / ((double) n0)
+ (k1 + 0.5) * j1 / ((double) n1) ) );
ref_out[idx_k] += in[idx_j] * basis;
}
}
}
}
```

Next, the same transform is implemented using FFTW.

Separate input and output array are allocated in this educational example:

```
double *in1 = fftw_alloc_real(n0 * n1);
double *in2 = fftw_alloc_real(n0 * n1);
double *out1 = fftw_alloc_real(n0 * n1);
double *out2 = fftw_alloc_real(n0 * n1);
```

The two two-dimensional transforms are planned:

```
fftw_plan p1 = fftw_plan_r2r_2d(n0, n1, in1, out1, FFTW_REDFT01, FFTW_REDFT01, FFTW_ESTIMATE);
fftw_plan p2 = fftw_plan_r2r_2d(n0, n1, in2, out2, FFTW_RODFT01, FFTW_RODFT01, FFTW_ESTIMATE);
```

The next step is to fill the two input arrays `in1`

and `in2`

for FFTW
with appropriate data so that the results of FFTW are equivalent
to those of the naive implementation presented above.
This step is not entirely trivial and presented in detail next.

A two-dimensional loop over the input data `in`

is set up
and all inputs to FFTW are initialized to be zero.
Then, the two input arrays are filled separately.

```
for (int j0=0; j0<n0; ++j0) {
for (int j1=0; j1<n1; ++j1) {
idx_j = j0 * n1 + j1;
// make sure that all entries are zero
in1[idx_j] = 0.0;
in2[idx_j] = 0.0;
/* fill in1 */
/* fill in2 */
}
}
```

The input data for `REDFT01`

has to be multiplied by `0.5`

for each dimension where the index is above `0`

to cancel
the factor of `2`

in the definition of `REDFT01`

for direct correspondence with the desired true two-dimensional IDCT.
This is the `/* fill in1 */`

step in above loop:

```
double factor = 1.0;
if (j0 > 0) {
factor *= 0.5;
}
if (j1 > 0) {
factor *= 0.5;
}
in1[idx_j] = factor * in[idx_j];
```

The input data for `RODFT01`

also needs to have entries
multiplied by `0.5`

for each dimension in which the index is less than `n-1`

to cancel the factor of `2`

in the definition of `RODFT01`

for direct correspondence with the desired true two-dimensional IDCT.
Additionally, the input data needs to be shifted left/down by one index
due to the `j+1`

appearing in the definition of `RODFT01`

.
This is the `/* fill in2 */`

step in above loop:

```
if (j0 > 0 && j1 > 0) {
int my_j0 = j0-1;
int my_j1 = j1-1;
idx_j_2 = my_j0 * n1 + my_j1;
factor = 1.0;
if (my_j0 < n0-1) {
factor *= 0.5;
}
if (my_j1 < n1-1) {
factor *= 0.5;
}
in2[idx_j_2] = factor * in[idx_j];
}
```

In the desired truely-2D IDCT, there is no contribution from the second term
in the split-basis approach used to implement this transform using FFTW
if either `j0==0`

or `j1==0`

, since the respecitve arguments to `sin`

are zero in this case.
This motivates the check for `j0 > 0 && j1 > 0`

above.
The indices `j0`

and `j1`

in the `REDFT01`

input are transformed to the corresponding indices `my_j0`

and `my_j1`

in the input to `RODFT01`

. The corresponding linear index `idx_j_2`

in `in2`

is computed from `my_j0`

and `my_j1`

.
Factors of `0.5`

are required for each dimension in which the index is less than `n-1`

to cancel the factor of `2`

in the definition of `RODFT01`

.

The FFTW plans can be executed after computing the reference output
and it remains to combine the outputs by subtracting the contribution
from the two-dimensional `RODFT01`

to arrive at the final result in `out1`

:

```
for (int k0 = 0; k0 < n0; ++k0) {
for (int k1 = 0; k1 < n1; ++k1) {
idx_k = k0 * n1 + k1;
// subtract because cos(u+v) = cos(u)*cos(v) - sin(u)*sin(v)
out1[idx_k] -= out2[idx_k];
}
}
```

The full example can be found in `src/test_2d_r2r_true2d.c`

.

## Real-world applications #

In this section, FFTW is used to evaluate the geometry of the magnetic axis and a flux surface of a stellarator ideal magnetohydrodynamic (MHD) plasma equilibrium as computed by the Variational Moments Equilibrium Code (VMEC).

Here is a plot of a few of the flux surfaces of the Wendelstein 7-X stellarator:

Nested flux surfaces are shown in grey, red, green and blue. The black line indicates the magnetic axis.

The geometry of the magnetic axis and that of a flux surface are real-valued.
Therefore, `c2r`

DFTs are sufficient to compute the backward transforms to real space.

### Geometry of the Magnetic Axis in a Stellarator #

The real-space geometry of the magnetic axis (a general closed curve) is given via a one-dimensional DFT evaluated using real-valued arithmetics:

\begin{align*} R_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} R_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) + R_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) \\ Z_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} Z_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) + Z_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) \end{align*}

where ζ is the toroidal angle per field period in radians. For a five-fold symmetric stellarator like W7-X, ζ ranges from 0 to 5 * 2π around the whole machine. Conversely, the first (unique) toroidal segment of the geometry is contained within a range of ζ from 0 to 2π.

In this example the geometry of the magnetic axis is evaluated at regular intervals in ζ:

$$ \zeta_l = 2 \pi l / n_\zeta $$

where *l* ranges from `0`

to `n_ζ-1`

.

Inserting this into the Fourier series for, e.g., the *R* coordinate, leads to a DFT:

$$ R_{\mathrm{ax}, l} = \sum\limits_{n=0}^{N} \left( R_{\mathrm{ax}, n}^\mathrm{cos} - i , R_{\mathrm{ax}, n}^\mathrm{sin} \right) e^{2 \pi i n l / n_\zeta} $$

Note that typically the number of Fourier coefficients included in computation
of the MHD equilibrium is (much) less than the number of grid points, i.e., *N* < `n_ζ`

.
The physics happens in real space and certains aspects of a computation
might require a fine spatial resolution (i.e., many grid points in real space) to accurately
resolve, e.g., gradients while the overall complexity of the solution
(gouverned by the number of included Fourier coefficients) does not need to be extremely high.
FFTW only supports (logically) equally-sized input and output arrays and thus,
if less Fourier coefficients than desired grid points are to be used,
the Fourier coefficient array needs to be filled with zeros in the remaining area.

The axis geometry is defined as follows:

```
// number of Fourier coefficients
int ntor = 12;
// cos-parity Fourier coefficients for magnetic axis
double R_ax_cos[13] = { 5.63, 0.391, 0.0123, 1.21e-3, 4.89e-6, -5.12e-5,
-6.57e-5, 2.27e-6, -9.28e-5, -5.32e-7, 6.67e-5, 5.72e-5, 2.38e-5 };
// sin-parity Fourier coefficients for magnetic axis
double R_ax_sin[13] = { 0.0, 0.0727, 6.34e-3, 5.84e-3, 9.77e-4, 5.32e-5,
8.48e-5, 5.57e-5, 5.56e-5, 5.53e-6, 7.74e-7, 1.03e-5, 8.75e-6 };
```

The number of grid points in the toroidal direction at which the magnetic axis geometry
is to be evaluated is specified via the `n_zeta`

parameter.
It needs to be checked that `n_zeta`

is large enough, i.e.,
above the Nyquist limit corresponding to the specified number of Fourier coefficients.
Then, the input and output arrays can be allocated:

```
int nCplx = n_zeta/2+1;
if (nCplx<ntor+1) {
printf("error: number of grid points too low.\n");
return;
}
fftw_complex *in = fftw_alloc_complex(nCplx);
double *out = fftw_alloc_real(n_zeta);
fftw_plan p = fftw_plan_dft_c2r_1d(n_zeta, in, out, FFTW_ESTIMATE);
```

Now that the input array is allocated, the available Fourier coefficients can be copied over and (if present) the remainder of the input array is set to zero:

```
// copy over available Fourier coefficients
for (int n=0; n<=ntor; ++n) {
in[n] = R_ax_cos[n] - I * R_ax_sin[n];
}
// zero out remaining input Fourier coefficients
for (int n=ntor+1; n<nCplx; ++n) {
in[n] = 0.0;
}
```

The full example can be found in `src/app_magn_axis.c`

.

Assuming stellarator symmetry, half of the Fourier coefficients can be omitted and the transform reduces to the one-dimensional IDCT and IDST, respectively:

\begin{align*} R_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} R_{\mathrm{ax}, n}^\mathrm{cos} \cos \left( n \zeta \right) \\ Z_\mathrm{ax}(\zeta) =& \sum\limits_{n=0}^{N} Z_{\mathrm{ax}, n}^\mathrm{sin} \sin \left( n \zeta \right) \end{align*}

The FFTW documentation
explicitly advises against using `R*DFT00`

transforms for this purpose due to numerical stability issues
which are currently circumvented within FFTW by using a less efficient algorithm.
If the data is required anyway on the whole domain from 0 to 2π,
it is probably easiest and fastest (in terms of development work)
to simply use a `c2r`

DFT and provide only real data in the input.

On the other hand, in case the evaluation locations are not required to be located on
“full” grid points in the toroidal direction, `REDFT01`

and `RODFT01`

can be used.
This is the subject of the now-starting second half of this example.

The number of grid points as well as the output array size are given as follows:

```
int nCplx = n_zeta / 2;
if (nCplx < ntor + 1) {
printf("error: number of grid points too low.\n");
return;
}
```

Only the cosine Fourier coefficients are copied over into the (real) input array:

```
// copy over available Fourier coefficients
in[0] = R_ax_cos[0];
for (int n = 1; n <= ntor; ++n) {
in[n] = R_ax_cos[n];
}
// zero out remaining input Fourier coefficients
for (int n = ntor + 1; n < nCplx; ++n) {
in[n] = 0.0;
}
```

The second half of the output array (of size `n_zeta`

for equivalent with the first half of this example)
is filled by applying the even symmetry of the output data implicitly used in `REDFT01`

:

```
for (int n=0; n<nCplx; ++n) {
out[n_zeta - 1 - n] = out[n];
}
```

The output data is compared in the following figure with the equivalent result from using a `c2r`

transform:

The violet curve labelled ‘DFT’ is the output from the `c2r`

DFT applied in the first half of this example.
The green curve labelled ‘REDFT01’ is the output from the `REDFT01`

applied in the second half of this example.
Note that the locations of the `REDFT01`

output needed to be shifted to the right by half an index to yield overlapping curves.

The full example can be found in `src/app_magn_axis_stellsym.c`

.

### Geometry of a Flux Surface in a Stellarator #

The real-space geometry of the flux surface is a general toroidal surface which is conveniently parameterized using poloidal and toroidal angle-like coordinates. The surface geometry is then given via a two-dimensional DFT:

\begin{align*} R(\theta, \zeta) =& \phantom{+}~ \sum\limits_{n= 0}^{N} R_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) + R_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) \\ ~& + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} R_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) + R_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) \\ Z(\theta, \zeta) =& \phantom{+}~ \sum\limits_{n= 0}^{N} Z_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) + Z_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) \\ ~& + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} Z_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) + Z_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) \end{align*}

where (θ,ζ) are the poloidal and toroidal angle-like variables in radians, respectively. For a five-fold symmetric stellarator like W7-X, ζ ranges from 0 to 5 * 2π around the whole machine. Conversely, the first (unique) toroidal segment of the geometry is contained within a range of ζ from 0 to 2π. The poloidal angle-like coordinate θ ranges from 0 to 2π once the short way around the torus (wristband-like).

The two-dimensional DFT above can be simplified (shown here for *R* only)
by introducing rectangular two-dimensional Fourier coefficient matrices:

$$ R_{m,n} = \begin{cases} 0 & m=0, n<0 \\ R_{0,n}^\mathrm{cos} - i R_{0,n}^\mathrm{sin} & m=0,n \geq 0 \\ R_{m,n}^\mathrm{cos} - i R_{m,n}^\mathrm{sin} & \textrm{else} \end{cases} $$

In this example the geometry of the flux surface is evaluated on a regular grid in θ and ζ:

$$ \zeta_l = 2 \pi l / n_\zeta $$

$$ \theta_k = 2 \pi k / n_\theta $$

where *l* ranges from `0`

to `n_ζ-1`

and *k* ranges from `0`

to `n_θ-1`

.

Inserting this into the Fourier series for, e.g., the *R* coordinate, leads to a DFT:

$$ R_{l,k} = \sum\limits_{m= 0}^{M} \sum\limits_{n=-N}^{N} R_{m,n} , e^{2 \pi i ( n l / n_\zeta + m k / n_\theta ) } $$

The *R* coordinate is a real-valued quantity
which implies that a two-dimensional `c2r`

DFT provided by FFTW can be used to
perform the backward transform in order to evaluate the flux surface geometry.
One further issue consists in the fact that the definition
of the Fourier geometry employed in VMEC uses an angle argument *(m θ - n ζ)*
where the two-dimensional DFT written out above uses *(m θ + n ζ)*.
The Fourier coefficients coming from VMEC
have to be inserted into the positions corresponding to *(-n)*
in the input array given to FFTW in order to resolve this pecularity.

The next part concers the code to copy over the Fourier coefficients
staring from the VMEC output in arrays `rmnc`

and `zmns`

into the input to FFTW.

The index in the FFTW input corresponding to given mode numbers `n`

and `m`

can be computed as follows:

```
if (n <= 0) {
idx_in = -n * nyq_pol + m;
} else {
idx_in = (n_zeta - n) * nyq_pol + m;
}
```

This code is abbreviated in the following as `/* compute idx_in */`

.

For the first coefficients with `m=0`

,
the copying is performed as follows:

```
int idx_vmec = 0;
int m = 0;
for (int n = 0; n <= ntor; ++n) {
/* compute idx_in */
in_R[idx_in] = rmnc[idx_vmec];
in_Z[idx_in] = I * zmns[idx_vmec];
idx_vmec++;
}
```

Note that VMEC weights the coefficients for `m=0`

with positive `n ζ`

in contrast to the following coefficients with `m>0`

.
The cosine-parity quantities like *R* are not influence by this,
but the sine-parity quantities like *Z* need to get the sign of their value flipped
due to the odd symmetry `sin(-n zeta) = -sin(n zeta)`

.

The second part of the copying adresses the Fourier coefficients
with `m>0`

:

```
for (m = 1; m < mpol; ++m) {
for (int n = -ntor; n <= ntor; ++n) {
/* compute idx_in */
in_R[idx_in] = 0.5 * rmnc[idx_vmec];
in_Z[idx_in] = -0.5 * I * zmns[idx_vmec];
idx_vmec++;
}
}
```

Here, the coefficients are scaled by a factor of `0.5`

.
FFTW implies that coefficientss for `m<0`

were present in the logically equivalent DFT input,
which is not the case for VMEC output. Thus, they are (in this case errornously)
scaled internally by a factor of `2`

, which needs to be counteracted by a scaling factor of `0.5`

in the input.

Assuming stellarator symmetry, half of the Fourier coefficients can be omitted and the transform reduces to the two-dimensional IDCT and IDST:

\begin{align*} R(\theta, \zeta) =& \sum\limits_{n= 0}^{N} R_{0,n}^\mathrm{cos} \cos \left( n \zeta \right) + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} R_{m,n}^\mathrm{cos} \cos \left( m \theta - n \zeta \right) \\ Z(\theta, \zeta) =& \sum\limits_{n= 0}^{N} Z_{0,n}^\mathrm{sin} \sin \left( n \zeta \right) + \sum\limits_{m=1}^{M} \sum\limits_{n=-N}^{N} Z_{m,n}^\mathrm{sin} \sin \left( m \theta - n \zeta \right) \end{align*}

This is currently resolved by simply omitting the stellarator-asymmetric terms in the input to FFTW.

The full example code can be found in `src/app_flux_surface.c`

.
A Python script is provided to plot the resulting real-space geometry in `src/plot_lcfs_realspace.py`

.

## Allocation of arrays #

Throughout this example collection, the proposed convenience wrapper functions provided by FFTW for allocating real- and complex-valued arrays are used:

```
int n = 32;
int nOut = n/2+1;
double *in = fftw_alloc_real(n);
fftw_complex *out = fftw_alloc_complex(nOut);
```

where `n`

is the real-space size of the DFT and `nOut`

is the number of Fourier coefficients resulting from a `r2c`

DFT.
The corresponding “raw” allocation code would look like this:

```
double *in = (double*) fftw_malloc(sizeof(double) * n);
fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * nOut);
```

Note that above code is equivalent to the standard C way of allocating memory using `malloc`

:

```
double *in = (double*) malloc(sizeof(double) * n);
fftw_complex *out = (fftw_complex*) malloc(sizeof(fftw_complex) * nOut);
```

except that the FFTW routines ensure proper memory alignment for exploiting SIMD instructions of modern CPUs.

## Utility functions #

In order to keep the examples short, a separate header file `src/util.h`

is provided.
It contains methods to operate on one- and two-dimensional arrays (the latter stored in row-major order)
of real (`double`

) and complex (`fftw_complex`

) numbers.
The following operations are supported:

- fill with random numbers between 0 and 1: e.g.
`fill_random_1d_cplx`

- element-wise check for approximate equality: e.g.
`compare_1d_real`

- write into a text file: e.g.
`dump_1d_real`

- compute generalized DFT (shift and non-linear phase):
`dft_1d_cplx`