# A short notice on performing matrix multiplications in PyCUDA

Matrix multiplication occupies one of the top positions among the most common numerical operations in technical and scientific computing.

In Python, matrix multiplication is immediately available through the `dot` routine of the `numpy` library. But how can matrix multiplications be performed on a GPU using `PyCUDA`?

`PyCUDA` offers the possibility of interfacing Python code with already available `CUDA` libraries. This is fortunate since, as is well known, `cuBLAS` performs matrix multiplications on GPUs in an extremely effective and fast way.

**A first, CUDA-like possibility**

A first way to interface `PyCUDA` with the `cuBLAS` library is to employ the `cublas` module of the `scikit-cuda` package. It would actually be possible to link the `cublas.dll` library directly, but this would require somewhat trickier operations on the user's side. Accordingly, this possibility is not considered here.

The following code shows an example of matrix multiplication with `PyCUDA` using `cuBLAS`. As can be seen, the `cuBLAS` call closely resembles the `cuBLAS` syntax that `CUDA` users adopt.

Nevertheless, it should be remembered that `cuBLAS` assumes a Fortran-like, column-major matrix ordering. This is why, when defining the `A` and `B` CPU matrices, column-major ordering has been specified through the `order='F'` memory layout. Obviously, once the matrix multiplication has been performed on the GPU, the result will still have a column-major layout. Therefore, before presenting the result, the layout of the result matrix `C_gpu` must be accounted for, again with an `order='F'` specification.

**A second, simpler possibility**

A second, simpler possibility to perform matrix multiplications on the GPU through `cuBLAS` routines is to exploit the `dot` function of the `linalg` module of the `scikit-cuda` package. Such a routine is the GPU counterpart of `numpy`'s `dot`. The following is an example code.

The code is now much simpler than before, since `linalg.dot` automatically handles the matrix layout, and many input parameters have default values and need not be specified. The complete syntax is available on the scikit-cuda `dot` documentation page.