PyCUDA, Google Colab and the GPU
In this post, we introduce the PyCUDA library and the Google Colaboratory environment, and present a short PyCUDA example that can also be run on Google Colab.
What is PyCUDA?
PyCUDA is a library developed by Andreas Klöckner et al. that allows writing CUDA code and compiling, optimizing and using it as ordinary Python functions, in a way that is fully transparent to the user. The user does not need to deal with the CUDA compiler unless they explicitly request it.
PyCUDA uses the concept of GPU run-time code generation (RTCG), enabling the execution of low-level code launched from the high-level scripting language offered by Python. The use of RTCG increases the user's productivity in several ways.
A first advantage of RTCG is the possibility of low-level programming: CUDA kernels are written only for the portions of the code to be accelerated, while the remaining parts can use all the functionalities of a high-level language, such as graphics or I/O. RTCG also enables run-time code optimization instead of compile-time optimization. The former occurs at a more favorable time, when all the information about the machine on which the code must run is available. Moreover, the result of the compilation process is cached and reused where possible, so that recompilation is triggered only when necessary. This is illustrated in the figure below, where the compilation and caching operations in the gray box are performed transparently to the user.
Finally, it is possible to fully exploit the potential of the CUDA libraries, thanks to the many wrappers already available in public libraries, or by constructing such wrappers oneself.
A second advantage of RTCG is the possibility of using, within certain limits, a high-level, mathematics-like syntax for GPU execution.
The potentialities of the PyCUDA library are illustrated with simple examples in the post “Five different ways to sum vectors in PyCUDA”.
Let us now say a few words about the Google Colaboratory environment.
Google Colaboratory, or Colab, is a totally free development environment based on Jupyter Notebook.
Jupyter Notebook is a free, open-source web application for creating and sharing documents containing code, equations, text, plots, tables and images; it also makes it easy to share code on the GitHub platform. In particular, the code can be modified at any time and executed in real time. The work can later be exported as a Python or ipynb source file, where ipynb is a format that stores the entire content of a Jupyter Notebook session, including the inputs and outputs of the computations, the images and the comments; an ipynb file can in turn be exported as HTML, PDF or LaTeX.
Google Colab supports Python 2.7 and 3.6, requires no configuration, and accommodates CPU, GPU or Tensor Processing Unit (TPU) execution, depending on the needs. It hosts libraries such as PyTorch, TensorFlow, Keras and OpenCV, so it is widely used for Machine Learning, Deep Learning and Computer Vision experiments. It is, however, also possible to install other modules if necessary. To use Google Colab, it is enough to have a Google account, and all the work can be saved on Google Drive.
The first screen shown when Google Colab is launched is a welcome project presenting the different possibilities offered by the platform.
It is possible to create a new notebook, for example in Python 3.6, by selecting New notebook from the File menu. To let the current session use the GPU, it is enough to click on Change runtime type in the Runtime menu and select GPU from the Hardware accelerator drop-down menu. From now on, it is understood that this selection must be made to correctly run the example shown below.
Dumping the GPU properties
A first, very simple example dumps the properties of the GPU card in use. The example is shown in full below.
import pycuda.driver as cuda
import pycuda.autoinit

print("%d device(s) found." % cuda.Device.count())

dev = cuda.Device(0)
print("Device: %s" % dev.name())
print(" Compute Capability: %d.%d" % dev.compute_capability())
print(" Total Memory: %s KB" % (dev.total_memory() // 1024))

atts = [(str(att), value)
        for att, value in dev.get_attributes().items()]
atts.sort()
for att, value in atts:
    print(" %s: %s" % (att, value))
However, before launching the code, PyCUDA must be installed in the Google Colab environment. This can be done with the following snippet:
!pip install pycuda
Going back to the code above, it illustrates the normal workflow of a PyCUDA code.
In particular, the first step is to load the libraries, as in a standard Python code. In the above example, two libraries are imported:
- pycuda.driver: it contains functions for memory handling (allocation, deallocation and transfers), for dumping information about the GPU card, etc.; in the example, pycuda.driver is given the shorthand cuda;
- pycuda.autoinit: it is imported without a shorthand; this import takes care of device initialization, context creation and memory cleanup.
The first operation performed in the listing above is counting the number of available devices by means of cuda.Device.count(). The second is storing a handle to the only available GPU, namely GPU number 0, in the dev variable. Then the GPU name returned by dev.name(), the compute capability returned by dev.compute_capability() and the total memory returned by dev.total_memory() (in bytes, converted to KB before printing) are displayed. Finally, all the GPU attributes are sorted in alphabetical order and shown on screen. An example of the output is reported below:
1 device(s) found.
Device: Tesla P100-PCIE-16GB
Compute Capability: 6.0
Total Memory: 16671616 KB