A short note on performing matrix multiplications in PyCUDA

Vitality Learning
2 min read · May 15, 2019

Among the most common technical and scientific numerical operations, matrix multiplication occupies one of the top positions.

In Python, matrix multiplication is immediately available through the dot routine of the NumPy library. But how can we perform matrix multiplications on a GPU using PyCUDA?

PyCUDA offers the possibility of interfacing codes with already available CUDA libraries. This is fortunate since, as is well known, cuBLAS performs matrix multiplications on GPUs in an extremely effective and fast way.

A first, CUDA-like possibility

A first way to interface PyCUDA with the cuBLAS library is to employ the cublas module of the scikit-cuda package. It would actually be possible to link the cublas.dll library directly, but this would require somewhat trickier operations on the user's side. Accordingly, this possibility is not considered here.

The following code represents an example of matrix multiplication with PyCUDA using cuBLAS. As can be seen, the cuBLAS call closely mirrors the syntax that CUDA users adopt in C/C++.

Nevertheless, it should be remembered that cuBLAS assumes a Fortran-like column-major matrix ordering. This explains why, when defining the A and B CPU matrices, the column-major ordering has been specified through the order='F' memory layout. Obviously, once the matrix multiplication has been performed on the GPU, the result will still have a column-major layout. Therefore, before presenting the result, the result matrix C_gpu must again be interpreted with an order='F' specification.

A second, simpler possibility

A second, simpler possibility to perform matrix multiplications on the GPU through cuBLAS routines is to exploit the dot function of the linalg module of the scikit-cuda package. Such a routine is the GPU counterpart of NumPy's dot. The following is an example code.

The code is now much simpler than before since linalg.dot automatically handles the matrix layout, and many input parameters have defaults that do not need to be specified. The complete syntax is available on the scikit-cuda dot documentation page.

Vitality Learning

We have been teaching, researching, and consulting on parallel programming on Graphics Processing Units (GPUs) since the release of CUDA. We also work with MATLAB and Python.