Nearest-neighbor and linear interpolations in 1D with CUDA.

Vitality Learning
Oct 8, 2023

The CPU and GPU codes regarding this post are available at the Vitality Learning GitHub repository.

We present some theory behind interpolation, in particular nearest-neighbor and linear interpolation, and then present CPU (Python) and GPU (PyCUDA) implementations thereof.

The interpolation problem

The interpolation problem can be stated as follows: knowing a function f(x) at the N+1 discrete points

a = x_0 < x_1 < … < x_N = b,

retrieve the values of the function at any point of the (a, b) interval. Typically, an interpolation scheme sets

f(x) ≈ g(x; c_0, c_1, …, c_N),

where g is the interpolation function and

c_0, c_1, …, c_N

are N+1 interpolation parameters which are fixed by enforcing that the function to interpolate f and the interpolating function g are coincident at the interpolation points, namely

g(x_k; c_0, c_1, …, c_N) = f(x_k),   k = 0, 1, …, N.

Often, the interpolating function g is expressed as a combination of basis functions φ_k:

g(x) = Σ_{k=0}^{N} c_k φ_k(x).

Many times, the function f can be well approximated by polynomials, i.e.

g(x) = Σ_{k=0}^{N} c_k x^k.

In the latter case, the interpolation parameters are worked out by constraining that

Σ_{k=0}^{N} c_k x_m^k = f(x_m),   m = 0, 1, …, N,

so that the following matrix relation is established:

V c = f,   with V_{mk} = x_m^k,

where the relevant matrix V is a Vandermonde matrix, which is invertible since the interpolation points are distinct.
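As a quick sanity check of the Vandermonde construction, the following NumPy sketch (the sample points and the quadratic are arbitrary choices) recovers the coefficients of a polynomial from its samples:

```python
import numpy as np

# Sample a known quadratic at three distinct points
x = np.array([0.0, 1.0, 2.0])
f = 3.0 - 2.0 * x + x**2              # f(x) = 3 - 2x + x^2

# Vandermonde matrix V[m, k] = x_m**k
V = np.vander(x, increasing=True)

# Solve V c = f for the interpolation parameters c_k
c = np.linalg.solve(V, f)
# c recovers the coefficients [3, -2, 1]
```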

Lagrange interpolation

In the case of Lagrange interpolation, we set

g(x) = Σ_{k=0}^{N} f(x_k) l_k(x),

where the k-th basis function l_k is a polynomial of degree N whose zeros are the interpolation points except for the k-th one. In other words,

l_k(x_m) = δ_{km},

where the right hand side is the Kronecker delta.

By the fundamental theorem of algebra, thanks to the knowledge of its zeros, each basis polynomial can be factored as

l_k(x) = C Π_{m≠k} (x − x_m),

where C is a constant to be determined. The constant C can be found by enforcing that the polynomial is unitary at the k-th sampling point, namely l_k(x_k) = 1, so that

C = 1 / Π_{m≠k} (x_k − x_m).

As a consequence, the interpolating function g can be expressed as

g(x) = Σ_{k=0}^{N} f(x_k) Π_{m≠k} (x − x_m) / (x_k − x_m).

The remainder of the interpolation process is such that, for a sufficiently smooth f, the function f can be expressed as

f(x) = g(x) + R_N(x),

where

R_N(x) = f^(N+1)(ξ) / (N+1)! · Π_{k=0}^{N} (x − x_k)

for some ξ in (a, b).
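The Lagrange construction can be sketched in a few lines of NumPy (function names here are illustrative). Since the target below is itself a polynomial of degree at most N, the interpolation is exact:

```python
import numpy as np

def lagrange_basis(x_nodes, k, x):
    """Evaluate the k-th Lagrange basis polynomial l_k at the points x."""
    l = np.ones_like(x)
    for m, xm in enumerate(x_nodes):
        if m != k:
            l *= (x - xm) / (x_nodes[k] - xm)
    return l

def lagrange_interp(x_nodes, f_nodes, x):
    """g(x) = sum_k f(x_k) l_k(x)."""
    return sum(fk * lagrange_basis(x_nodes, k, x)
               for k, fk in enumerate(f_nodes))

x_nodes = np.array([0.0, 0.5, 1.0])
f_nodes = x_nodes**2                  # f(x) = x^2: degree <= N, so g == f
g = lagrange_interp(x_nodes, f_nodes, np.array([0.25, 0.75]))
# g equals [0.0625, 0.5625], i.e. exactly x^2 at the query points
```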

Nearest-neighbor interpolation

Nearest-neighbor interpolation.

Nearest-neighbor interpolation is a zeroth-order polynomial interpolation that assigns to each interpolation point the value of the nearest sample in the original data. It is a simple method that does not involve any weighted averaging, unlike linear interpolation (see below).

Linear interpolation

Linear interpolation can be drawn from general Lagrange interpolation in the case N = 1. In particular, for x in the interval (x_0, x_1),

g(x) = f(x_0) (x_1 − x) / (x_1 − x_0) + f(x_1) (x − x_0) / (x_1 − x_0).

Linear interpolation.

CPU implementation

In the following, we present the CPU (Python) implementations of nearest-neighbor and linear interpolation. The parts common to both implementations will be presented with reference to nearest-neighbor interpolation only.

Nearest-neighbor CPU interpolation: Python implementation

Let us begin by breaking down the code implementing nearest-neighbor interpolation in 1D in Python.

The code inputs are the vector x of the sampling points, the vector y of the sample values and the vector xi of the interpolation points; the output is the vector yi of the interpolated values.

The main steps are the following.

1. Reshaping Vectors

The vectors x, xi, and y are first reshaped to make sure they are column vectors.

2. Calculate Spacing

The reciprocal of the spacing between adjacent sample points in the x vector is then calculated.

3. Adjust xi values

The values in xi are shifted by subtracting the minimum value of x. This is done to make sure that the indices calculated later are non-negative.

4. Initialize Output

The output vector yi is initialized with NaN values.

5. Calculate Nearest-Lower-Neighbors Indices

The indices of the nearest-lower-neighbors for each value in xi are calculated.

6. Perform Interpolation

The code finally checks for out-of-bounds indices and, for the valid ones, assigns to each interpolation point the value of its nearest sample. The result is stored in the yi vector.
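The steps above can be assembled into a minimal NumPy sketch (names are illustrative; the repository code may differ in the details):

```python
import numpy as np

def interp_nearest(x, y, xi):
    """Steps 1-6: nearest-neighbor interpolation on a uniform grid."""
    # 1. Reshape the inputs to column vectors
    x, y, xi = (np.asarray(v, dtype=float).reshape(-1, 1) for v in (x, y, xi))
    # 2. Reciprocal of the (uniform) sampling step
    ndx = 1.0 / (x[1, 0] - x[0, 0])
    # 3. Shift xi so that indices are non-negative
    xis = xi - x.min()
    # 4. Initialize the output with NaNs
    yi = np.full(xi.shape, np.nan)
    # 5. Index of the nearest sample (lower neighbor plus rounding)
    ind = np.round(xis * ndx).astype(int)
    # 6. Assign the nearest sample value for the in-bounds points
    valid = (ind[:, 0] >= 0) & (ind[:, 0] <= x.size - 1)
    yi[valid, 0] = y[ind[valid, 0], 0]
    return yi

yi = interp_nearest([0.0, 1.0, 2.0, 3.0], [0.0, 10.0, 20.0, 30.0],
                    [0.2, 1.6, 2.4])
# yi.ravel() is [0., 20., 20.]: each query snaps to its closest sample
```

Out-of-range queries are left as NaN, matching step 4.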

Linear CPU interpolation: Python implementation

Let us now turn to breaking down the code implementing linear interpolation in 1D in Python.

The code inputs are again the vector x of the sampling points, the vector y of the sample values and the vector xi of the interpolation points; the output is the vector yi of the interpolated values.

Steps 1–5 are the same as for the nearest-neighbor interpolation.

6. Perform Interpolation

The out-of-bounds indices are checked and linear interpolation is performed for the valid indices according to the above interpolation formula. The result is once again stored in the yi vector.
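A minimal NumPy sketch of the linear case (again with illustrative names), sharing steps 1-5 with the nearest-neighbor version and replacing step 6 with the weighted average:

```python
import numpy as np

def interp_linear(x, y, xi):
    """Linear interpolation on a uniform grid (steps 1-5 as before)."""
    x, y, xi = (np.asarray(v, dtype=float).reshape(-1, 1) for v in (x, y, xi))
    ndx = 1.0 / (x[1, 0] - x[0, 0])
    xis = xi - x.min()
    yi = np.full(xi.shape, np.nan)
    ind = np.floor(xis[:, 0] * ndx).astype(int)   # nearest lower neighbor
    valid = (ind >= 0) & (ind < x.size - 1)
    k = ind[valid]
    a = xis[valid, 0] * ndx - k                   # fractional offset in [0, 1)
    # 6. Weighted average of the two neighboring samples
    yi[valid, 0] = (1.0 - a) * y[k, 0] + a * y[k + 1, 0]
    return yi

yi = interp_linear([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 1.25])
# yi.ravel() is [5., 12.5]
```

Note that in this sketch a query falling exactly on the last sample is treated as out of bounds; the repository code may handle that edge differently.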

GPU implementation

The GPU implementation available at the Vitality Learning GitHub repository is written in PyCUDA. Examples of setting up a PyCUDA code are available in the post Five different ways to sum vectors in PyCUDA. In the sequel, we will illustrate only the CUDA kernels of interest.

Concerning the CUDA kernels, we first present three different versions of linear interpolation: the first uses data stored in global memory, the second exploits texture fetching and the third texture filtering. Finally, we will also present a version performing nearest-neighbor interpolation with texture filtering.

In all the considered implementations, each thread is responsible for interpolating one interpolation point.

Linear GPU interpolation: global memory

An overview of the kernel follows:

1. Thread identification

Each thread is associated with a unique global identifier based on its position within the block and the block's position within the grid.

2. Boundary check

Ensures that each thread works on a valid output (interpolation) point (within the range 0,...,M-1).

3. Relative position calculation

Calculates the position relative to the first input position. See step #3 of the CPU case.

4. Nearest lower neighbor index calculation

Calculates the index of the nearest lower neighbor using rounding towards zero.

5. Input values retrieval

Retrieves the values of the nearest input neighbors needed for interpolation.

6. Weight calculation and linear interpolation

Calculates weights for linear interpolation and assigns the result to the corresponding interpolation point.
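A kernel consistent with the six steps above could read as follows. This is a sketch: the kernel and parameter names are illustrative, not necessarily those of the repository; M is the number of output points, N the number of input samples and x0 the first sampling abscissa.

```cuda
// Reciprocal of the input sampling step, shared by all threads
__constant__ float ndx_const;

__global__ void linear_interpolation_kernel(const float * __restrict__ d_yin,
                                            const float * __restrict__ d_xout,
                                            float       * __restrict__ d_yout,
                                            const float x0, const int M, const int N)
{
    // 1. Thread identification
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;

    // 2. Boundary check: one thread per output (interpolation) point
    if (tid >= M) return;

    // 3. Position relative to the first input sample
    const float xi = (d_xout[tid] - x0) * ndx_const;

    // 4. Nearest lower neighbor (rounding towards zero)
    const int ind = (int)xi;
    if ((ind < 0) || (ind >= N - 1)) return;

    // 5. Retrieve the two neighboring input samples
    const float y0 = d_yin[ind];
    const float y1 = d_yin[ind + 1];

    // 6. Weights and linear interpolation
    const float a = xi - (float)ind;
    d_yout[tid] = (1.0f - a) * y0 + a * y1;
}
```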

The developed Python code sets up constant memory for ndx_const. The use of constant memory ensures that the ndx_const value is efficiently cached and shared among all threads on the GPU.

Furthermore, a Python function wraps the call to the compiled CUDA kernel. The value of ndx is computed outside that function and moved to constant memory.
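Assuming the kernel declares `__constant__ float ndx_const` and `mod` is the compiled `pycuda.compiler.SourceModule`, the copy to constant memory could look like this sketch (it requires a CUDA-capable GPU to run):

```python
import numpy as np
import pycuda.driver as cuda

# Locate the __constant__ symbol in the compiled module
ndx_const_ptr, nbytes = mod.get_global("ndx_const")
# Copy the precomputed reciprocal spacing ndx into constant memory
cuda.memcpy_htod(ndx_const_ptr, np.float32(ndx))
```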

Linear GPU interpolation: texture fetch

We now present the solution using texture fetching, proceeding incrementally as before. Using texture memory for data fetching in CUDA can lead to improved memory access patterns and caching benefits, especially in scenarios where memory accesses have spatial locality. It is a powerful feature that CUDA provides for optimizing memory access in GPU kernels.

Compared to the global-memory version, the kernel additionally requires a texture reference declaration d_yintexture for 1D float data. The direct memory accesses used above are then replaced with texture fetching: the values of the nearest input neighbors are fetched from the texture using the calculated indices.
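A sketch of the changed lines, under CUDA's legacy texture-reference API (ind is the lower-neighbor index computed as before):

```cuda
// Module-scope texture reference for the 1D float input samples
texture<float, 1, cudaReadModeElementType> d_yintexture;

// Inside the kernel: texture fetches replace the direct global-memory reads
const float y0 = tex1Dfetch(d_yintexture, ind);
const float y1 = tex1Dfetch(d_yintexture, ind + 1);
```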

Moreover, the Python function calling the compiled CUDA kernel needs two additional instructions. The first retrieves the texture reference object associated with the texture named d_yintexture from the CUDA module (mod). The second binds the CUDA array d_yin to the texture reference d_yintexture. This establishes the association between the texture reference and the specific CUDA array, allowing subsequent texture fetches in CUDA kernels to use the data from d_yin.
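In PyCUDA, these two instructions could read as follows (a sketch, assuming d_yin is a pycuda.driver.Array; a GPUArray would use bind_to_texref_ext instead):

```python
d_yintexture = mod.get_texref("d_yintexture")   # texture reference from the module
d_yintexture.set_array(d_yin)                   # bind the CUDA array to it
```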

Linear GPU interpolation: texture filtering

To conclude the roundup of approaches for linear interpolation, we now present the case of employing linear texture filtering.

The CUDA kernel is structured as before, with two changes. First, it declares a texture reference named d_yintexture_filtering for 1D float data, set up for linear filtering. Second, the texture fetching used before is replaced by texture filtering: a single fetch returns the already-interpolated value. The (float)xi + 0.5f part of the fetch applies a half-pixel offset, which is needed to achieve the right alignment in texture filtering.
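A sketch of the filtered fetch (xi is the relative coordinate computed in step 3; the hardware performs the two-texel weighted average):

```cuda
// Texture reference set up for hardware linear filtering
texture<float, 1, cudaReadModeElementType> d_yintexture_filtering;

// Inside the kernel: one filtered fetch replaces the two fetches and the
// explicit weighting; the 0.5f half-pixel offset aligns with texel centers
d_yout[tid] = tex1D(d_yintexture_filtering, (float)xi + 0.5f);
```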

Two additional settings appear in the Python function calling the CUDA kernel. The first configures the texture flags so that texels are addressed at integer (unnormalized) coordinates, indicating that the texture contains data at integer locations. The second sets the texture filter mode to linear, which enables hardware interpolation between texels during texture fetching.
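One possible PyCUDA incarnation of these settings (a sketch; the exact flag handling in the repository may differ):

```python
d_yintexture_filtering = mod.get_texref("d_yintexture_filtering")
d_yintexture_filtering.set_format(cuda.array_format.FLOAT, 1)    # fetches return floats
d_yintexture_filtering.set_filter_mode(cuda.filter_mode.LINEAR)  # hardware interpolation
```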

Nearest-neighbor GPU interpolation: texture filtering

To perform a nearest-neighbor interpolation, instead of a linear one, using texture filtering, it is enough to employ the same CUDA kernel used for linear texture filtering, changing only the host-side set-up instructions so that the filter mode is point (nearest texel) rather than linear.
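On the host side, the only change with respect to the linear-filtering set-up is the filter mode (a sketch):

```python
# Point filtering returns the nearest texel instead of a linear blend
d_yintexture_filtering.set_filter_mode(cuda.filter_mode.POINT)
```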

--

Vitality Learning

We have been teaching, researching and consulting on parallel programming on Graphics Processing Units (GPUs) since CUDA was first released. We also play with Matlab and Python.