Google Summer of Code - Blog #4!
Hello! Welcome to the latest blog post in my Google Summer of Code series. With this blog, we’ll marking the end of the second phase of evaluations! Additionally, this blog is going to be a little different from all the previous ones as we finally start to work and discuss about Python and the actual RADIS code base.
Honestly, I am actually really glad that we’re finally at this stage. The monotonity of Cython and working on the same, huge Cython+CuPy file with more than a thousand lines of code was getting very frustrating :P
In this blog, I shall be discussing more about the way we will integrate the GPU code with RADIS. But before we do that, let us first understand how RADIS handles this part right now, performing computations purely on the CPU. In order to compute the spectra of a molecule, we make use of module defined in RADIS known as Line-by-Line
Module or LBL for short. This module contains numerous methods which are used for calculations of spectras. One of the most standard methods for computing the spectras is known as calc_spectrum
, which takes in inputs such as the molecule, isotopes, temperature, pressure, waverange over which the spectra needs to be calculated (and many other depending on the user’s requirements) and returns the result in the form of Spectrum
object which neatly packs the information in single object which can then be processed as needed. This is what the user sees when they use RADIS. However, for us to develop and contribute, we also need to understand what goes on under the hood when a method such as calc_spectrum
is called.
The first thing that occurs when calc_spectrum
is called is the validation and conversion of physical quantities (such as temprature and pressure) into default units that are assumed in the rest of the code. This is followed by the instantiation of an object from the SpectrumFactory
class, which will contain all the information about the spectrum to be computed. This is all standard, something that will happen each time calc_spectrum
gets called. However, from here on, things start to get interesting as we look into the databank. Let us first understand what the purpose of a databank is: in order to compute a spectra for a specfic molecule, we need information about it. This could include data like line positions, line intensities, pressure-broadening parameters, etc. We don’t have to understand what each of these quantities mean or represent (to be honest, I don’t either) but the main idea we need to internalize is that we can’t compute a spectra without any information. So while it might be fair to say that a spectra can literally be generated from thin air, metaphorically that does not hold true.
This data, used to compute the spectra, can often be huge and therefore we try to avoid moving it around as much as possible. RADIS has implemented cache features in order to minimize the computations to be done on the raw databank. The important idea we need to focus on, is that once a databank has been loaded, we don’t have to load it once more if we ever call calc_spectrum
again for the same data (with maybe different waverange, etc.) This is done using extra memory: the original data that is loaded in the memory is saved as a dataframe, and every call made to calc_spectrum
first creates a copy of this data and works on it instead. This preserves the original dataset and allows us to use it again in case it is required, as the cost of the extra memory needed to save the copy. This was also the first problem I had to tackle: the GPU code we were going to use made use of npy
formatted arrays to load the data, whereas RADIS was designed to load h5
or par
files. This meant I had to modify the databank loader RADIS used to support npy files. This was not a very significant problem however, as we could simply load the data separately instead of using the databank. The only issue with this approach would have been the repeated loading of data everytime the function call was made, but since we were using a small dataset, it wasn’t a significant issue and we decided to push it back and first focus on getting the GPU code to work with independent data being loaded.
Once we had the data, the next step was to perform the computations. Now I won’t go into the details in this part since the mathematics behind these computations was something which: 1. I didn’t understand completely, and 2. I didn’t need to understand completely. Given my project already had a proof-of-concept code with the mathematics inplace, and my job was to make that code compatible with RADIS, I didn’t need to understand the nuances of it. I did go through the paper draft to understand the general idea behind the parallel approach we used, but I didn’t even try with the serial code since I had absolutely nothing to do with it. This mathematics was actually abstracted and encapsulated in another method eq_spectrum
. Once this method finished execution, we would have the spectrum ready with all the information ready for the user. Within this method, there was one specific method that was of particular interest to us: _calc_broadening
, responsible for the broadening step in the spectra calculation pipeline. This method was the bottleneck in the entire process, and the GPU code was primarily helping us speed up this particular portion of the process. My job here was to create another analogous method eq_spectrum_gpu
which would do the exact same job as eq_spectrum
, except instead of calling the sequence of serial methods to compute the spectra, I’d simply call the GPU code which was ready for us, and obtain the result. Some further processing of this result (which is computationally very inexpensive compared the rest of process), we have our spectra ready. It is a lot simpler and significantly easier than the CPU equivalent since we off-load the entire process to the Cython-compiled binaries and don’t have to worry about anything else. One problem I had with this (which is still not resolved as of writing this) is how do we fill in the missing information gaps in the Spectrum object, which in case of the CPU code were filled sequentially as the pipeline proceeded. Since our GPU code would only return the final result, I am not sure if we would be able to recover the preceding information bits which we didn’t obtain. Hopefully I’ll have the answer to it tomororrow when Erwan replies. That would pretty much finish my job here.
However, another important bit of information that we’re skimming over is how we call the compiled binary file. This part is in fact what I will be working on over the coming week. The idea is that although right now I am using a simple binary file located in the same file to run the code, it is possible (and highly likely) that the same binary file will not work properly (or at all) in another system it is not meant for. As a consequence, the entire method might crash. In order to resolve this, we will be looking into placing the source code instead of the binary file in the RADIS codebase, and everytime a user needs to use the GPU version of calc_spectrum
, the code will first, automatically, compile the Cython file on their system, obtain the binary file meant for their system, and use that instead. The details regarding the implementation of this bit are still not clear as it is something I will be working on next week!
The next blog will cover the implementation details about this runtime-compilation setup along with a discussion on what is next! Thank you!