A short note on performing matrix multiplications in PyCUDA
Among the most common technical and scientific numerical operations, matrix multiplication occupies one of the top positions.
In Python, matrix multiplication is readily available through the dot routine of the numpy library. But how can matrix multiplications be performed on a GPU using PyCUDA?
PyCUDA offers the possibility of interfacing code with already available CUDA libraries. This is fortunate since cuBLAS performs matrix multiplications on GPUs very efficiently.
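For reference, the CPU baseline mentioned above can be sketched as follows. The matrices A and B hold arbitrary illustrative values; they are an assumption of this sketch and are not taken from the original text.

```python
import numpy as np

# Small illustrative matrices (values chosen arbitrarily for this sketch)
A = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.float64)        # shape (2, 3)
B = np.array([[7, 8, 1, 5],
              [9, 10, 0, 9],
              [11, 12, 5, 5]], dtype=np.float64)   # shape (3, 4)

# CPU matrix product; the result has shape (2, 4)
C = np.dot(A, B)
print(C)
```

This CPU result is a convenient reference against which any GPU result can be checked.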
A first, CUDA-like possibility
A first way to interface PyCUDA with the cuBLAS library is to employ the cublas module of the scikit-cuda package. It would also be possible to link the cublas.dll library directly, but this would require somewhat trickier operations on the user's side. Accordingly, this possibility is not considered here.
The following code is an example of matrix multiplication with PyCUDA using cuBLAS. As can be seen, the cuBLAS call closely resembles the cuBLAS syntax that CUDA users adopt.
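A minimal sketch of what such code could look like is given here. It assumes a CUDA-capable GPU, PyCUDA's gpuarray module and the cublasCreate, cublasDgemm and cublasDestroy wrappers of scikit-cuda. The function name matmul_cublas and the matrix values are illustrative. The GPU imports are kept inside the function and the call is guarded, so the script also runs, skipping the GPU part, on a machine without CUDA. In this sketch the final layout fix is applied on the host, which is only one of the possible ways to do it.

```python
import numpy as np

def matmul_cublas(A, B):
    """Return A @ B computed on the GPU with cuBLAS dgemm (via scikit-cuda)."""
    # GPU imports stay local so the script still loads without a CUDA stack.
    import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
    import pycuda.gpuarray as gpuarray
    import skcuda.cublas as cublas

    # cuBLAS expects column-major storage
    A = np.asfortranarray(A, dtype=np.float64)
    B = np.asfortranarray(B, dtype=np.float64)
    m, k = A.shape
    k_b, n = B.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")

    A_gpu = gpuarray.to_gpu(A)
    B_gpu = gpuarray.to_gpu(B)
    C_gpu = gpuarray.zeros((m, n), np.float64)

    handle = cublas.cublasCreate()
    try:
        # C = alpha * A * B + beta * C, with alpha = 1 and beta = 0.
        # For column-major data the leading dimensions are the row counts.
        cublas.cublasDgemm(handle, 'n', 'n', m, n, k,
                           1.0, A_gpu.gpudata, m,
                           B_gpu.gpudata, k,
                           0.0, C_gpu.gpudata, m)
    finally:
        cublas.cublasDestroy(handle)

    # The buffer holds C column by column: reinterpret it with order='F'
    return C_gpu.get().ravel().reshape((m, n), order='F')


# Illustrative CPU matrices, stored in column-major order
A = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64, order='F')
B = np.array([[7, 8, 1, 5], [9, 10, 0, 9], [11, 12, 5, 5]],
             dtype=np.float64, order='F')
C_ref = np.dot(A, B)  # CPU reference

try:
    C = matmul_cublas(A, B)
    print("matches numpy:", np.allclose(C, C_ref))
except Exception as exc:  # e.g. no GPU, or PyCUDA / scikit-cuda not installed
    print("GPU run skipped:", exc)
```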
Nevertheless, it should be remembered that cuBLAS assumes a Fortran-like column-major matrix ordering. This is why, when defining the A and B CPU matrices, column-major ordering is specified through the order='F' memory layout. Once the matrix multiplication has been performed on the GPU, the result is likewise stored in column-major order. Therefore, before presenting the result, the layout of the result matrix C_gpu must be set accordingly, again with an order='F' specification.
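The effect of the memory layout can be illustrated with numpy alone. The following sketch, with illustrative variable names, shows that the same logical matrix is stored differently in row-major and column-major order, and that a column-major buffer is read back correctly only with order='F'.

```python
import numpy as np

A_c = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.float64)   # row-major (C order)
A_f = np.asfortranarray(A_c)                    # column-major (Fortran order)

# Same logical matrix, different order of the elements in memory
print(np.array_equal(A_c, A_f))   # True
print(A_c.ravel(order='K'))       # [1. 2. 3. 4. 5. 6.]
print(A_f.ravel(order='K'))       # [1. 4. 2. 5. 3. 6.]

# A flat column-major buffer, as a cuBLAS routine would produce it
buf = A_f.ravel(order='K')
wrong = buf.reshape(A_c.shape)              # read as row-major: scrambled
right = buf.reshape(A_c.shape, order='F')   # read as column-major: correct
print(wrong)   # [[1. 4. 2.] [5. 3. 6.]]
print(right)   # [[1. 2. 3.] [4. 5. 6.]]
```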
A second, simpler possibility
A second, simpler way to perform matrix multiplications on the GPU through cuBLAS routines is to use the dot function of the linalg module of the scikit-cuda package. This routine is the GPU counterpart of numpy's dot. The following is an example code.
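A minimal sketch of such an example is given here, assuming a CUDA-capable GPU and the init and dot functions of scikit-cuda's linalg module. The function name matmul_linalg and the matrix values are illustrative, and the GPU call is guarded so that the script also runs on a machine without CUDA.

```python
import numpy as np

def matmul_linalg(A, B):
    """Return A @ B computed on the GPU with skcuda.linalg.dot."""
    # GPU imports stay local so the script still loads without a CUDA stack.
    import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
    import pycuda.gpuarray as gpuarray
    import skcuda.linalg as linalg

    linalg.init()  # set up the libraries used by skcuda.linalg

    A_gpu = gpuarray.to_gpu(A)
    B_gpu = gpuarray.to_gpu(B)

    # No handle, dimensions or leading dimensions need to be passed
    C_gpu = linalg.dot(A_gpu, B_gpu)
    return C_gpu.get()


# Illustrative CPU matrices in numpy's default row-major layout
A = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
B = np.array([[7, 8, 1, 5], [9, 10, 0, 9], [11, 12, 5, 5]], dtype=np.float64)
C_ref = np.dot(A, B)  # CPU reference

try:
    C = matmul_linalg(A, B)
    print("matches numpy:", np.allclose(C, C_ref))
except Exception as exc:  # e.g. no GPU, or PyCUDA / scikit-cuda not installed
    print("GPU run skipped:", exc)
```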
The code is now much simpler than before, since linalg.dot handles the matrix layout automatically and many input parameters have default values that need not be specified. The complete syntax is available at the scikit-cuda dot documentation page.
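As an illustration of the optional parameters, the sketch below uses the transa argument of linalg.dot to compute the product of a transposed matrix with the matrix itself. It assumes the transa keyword of skcuda.linalg.dot, which defaults to 'N'. The function name gram_linalg and the matrix values are illustrative, and the GPU call is again guarded.

```python
import numpy as np

def gram_linalg(A):
    """Return A.T @ A computed on the GPU with skcuda.linalg.dot."""
    # GPU imports stay local so the script still loads without a CUDA stack.
    import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
    import pycuda.gpuarray as gpuarray
    import skcuda.linalg as linalg

    linalg.init()
    A_gpu = gpuarray.to_gpu(A)

    # transa='T' asks for the first operand to be transposed
    G_gpu = linalg.dot(A_gpu, A_gpu, transa='T')
    return G_gpu.get()


A = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
G_ref = np.dot(A.T, A)  # CPU reference, shape (3, 3)

try:
    G = gram_linalg(A)
    print("matches numpy:", np.allclose(G, G_ref))
except Exception as exc:  # e.g. no GPU, or PyCUDA / scikit-cuda not installed
    print("GPU run skipped:", exc)
```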