Week 3 & 4: First blood

“Magic”

In the last blog post, there was a discussion about creating a script to convert the ROOT file into an HDF5 file with the CTA ML data format. So, during weeks 3 and 4 I was working on making this script. There were a few issues in this conversion.

In the current DL1DataHandler, the event number is created conveniently in accordance with CTA data. MAGIC data, however, stores event numbers differently: there are two arrays for each camera, one for the EvtNumber and another for the StereoEvtNumber, with the StereoEvtNumber array mapped from the EvtNumber array. So I took all the stereo events and stored their values in the HDF5 file. Mono studies can also be done on these stereo events. Since MAGIC doesn’t currently do mono analysis on events triggering only one telescope, that part is omitted for now.
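To make the idea concrete, here is a minimal, standalone sketch of that stereo selection, assuming uproot and h5py. The tree and branch names are placeholders for illustration, not the actual MAGIC ROOT layout, and the real conversion goes through the DL1DataHandler machinery rather than a script like this.

```python
# A minimal sketch of the stereo-event selection, assuming uproot (v4+) and h5py.
# "Events", "EvtNumber" and "StereoEvtNumber" are placeholder tree/branch names,
# not the actual MAGIC ROOT layout.
import uproot
import h5py

with uproot.open("magic_run_M1.root") as f:
    tree = f["Events"]
    arrays = tree.arrays(["EvtNumber", "StereoEvtNumber"], library="np")

# Events that triggered both telescopes carry a non-zero StereoEvtNumber,
# so the mapping array doubles as a selection mask (an assumption here).
stereo_mask = arrays["StereoEvtNumber"] > 0

with h5py.File("magic_run_dl1.h5", "w") as out:
    out.create_dataset("event_id", data=arrays["StereoEvtNumber"][stereo_mask])
    # ...the remaining DL1 variables are selected and written the same way.
```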

Now that this mapping was figured out, we also mapped all the variables required in the HDF5 file to their counterparts in the ROOT file. Once everything was set up, I tried reading the converted file using the DL1DataReader. The first run yielded amazing results.

Read more…

What we've been working on these days!

Hey, folks! I hope everyone is okay out there. Today, I am going to explain a little bit about repeat ground track orbits and the value that lies behind them. Orbits with repeating ground tracks play a significant role in space engineering. Ground tracks that repeat according to a pattern have meaningful applications in remote sensing missions, reconnaissance missions, and numerous rendezvous and docking opportunities with an orbiting spacecraft. Since they overfly the same points on the planet’s surface every repeat cycle, they are especially valuable for missions studying gravity, the atmosphere, or the movement of the polar ice caps.
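To give a flavor of the mechanics (this is my own illustrative sketch, not project code): a ground track repeats when the satellite completes a whole number of revolutions in a whole number of the planet’s rotations, which pins down the orbital period and, through Kepler’s third law, the semi-major axis. Perturbations such as the J2 nodal drift are ignored here for simplicity.

```python
# Illustrative sketch only: unperturbed two-body relation between a repeat
# ground track ratio and the orbit's semi-major axis (J2 drift etc. ignored).
import math

MU_EARTH = 3.986004418e14   # Earth's gravitational parameter [m^3/s^2]
SIDEREAL_DAY = 86164.1      # Earth's rotation period [s]

def repeat_semi_major_axis(n_revs: int, m_days: int) -> float:
    """Semi-major axis [m] of an orbit whose ground track repeats after
    n_revs revolutions in m_days sidereal days."""
    period = m_days * SIDEREAL_DAY / n_revs                 # required orbital period
    return (MU_EARTH * (period / (2 * math.pi)) ** 2) ** (1.0 / 3.0)

# Example: a 14:1 repeat cycle (14 revolutions per sidereal day)
a = repeat_semi_major_axis(14, 1)
print(f"semi-major axis ~ {a / 1e3:.0f} km")
```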

mind-blown

Read more…

2 Weeks at DiRAC

Biweekly GSoC Updates Blog Post

Data Intensive Research in Astrophysics and Cosmology

In all my previous internships and contract work, I have worked only with the web stack, and specifically on the web front end.
All through the past years, I have been part of creating large-scale, consumer-facing web apps, mostly with JavaScript and related frameworks like React, Vue, and Angular.

Read more…

Week 1 & 2: Coding Officially Begins!

Hey Sid, did the coding period officially begin?

The community bonding period ended at the end of last month and the coding period officially began. I started working on the basic structure of the package and on setting up the (not so user-friendly, at least from an astronomer’s perspective) interface for the image reduction methods.

Read more…

GSOC 2020: The Coding period commences!

Sunrise in Gujarat, near Vadodara city

So we got started with the coding period. I had a couple of meetings with my mentors and a few full community meetings where I discussed what I was working on and what needed to be done.

The major work in this fortnight was the pull request refactoring the Dataretriever clients’ QueryResponse tables, PR #4213. This enabled the simple clients to show more metadata, such as SatelliteNumber, Detector, Level, etc., in their response tables. All this information was extracted from the URLs of the desired files using the parser.
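For illustration, a minimal query against one of the simple, scraper-based clients looks roughly like the sketch below; the exact set of extra columns depends on the client and the sunpy version, so treat the output as indicative only.

```python
# Hedged example: query a simple dataretriever client and inspect its
# QueryResponse table; after the refactor it carries extra metadata columns
# parsed from the file URLs (exact columns vary by client and sunpy version).
from sunpy.net import Fido, attrs as a

result = Fido.search(a.Time("2020/06/01", "2020/06/02"), a.Instrument("lyra"))
print(result)   # the response table now shows e.g. the data Level alongside the time range
```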

What did it change? Earlier, things looked like this:

Read more…

Chapter 1: Apricity

An endeavour to better understand our Sun’s choleric disposition
The Bhagirathi Massif. The mountain is named after Bhagiratha, the legendary king of the Ikshvaku dynasty who brought the River Ganges to Earth from the heavens. It symbolizes the flow of divine knowledge, or the knowledge of liberation (Ganga), into human consciousness (Earth) by the grace of God (Shiva) and the austere efforts of enlightened masters (Bhagiratha).
ॐ भूर् भुवः स्वः।
तत्सवितुर्वरेण्यं भर्गो॑ देवस्य धीमहि।
धियो यो नः प्रचोदयात् ॥

There is some beauty in the fact that the essence of my undertaking needs not more than 24 letters of explanation.

What I have written above in the Devanagari script is one of the most important and highly revered Vedic hymns, the Gayatri Mantra. As translated by Dr S. Radhakrishnan, it states,

“We meditate on the effulgent glory of the divine Light; may he inspire our understanding.”

The goal of my project is to study solar flares. The effulgent glory, the flares, that the divine Light, our Sun, produces. I shall meditate on them over the summer and better my understanding and appreciation of the mechanisms that govern them.

Read more…

GSoC 2020: glue-solar project 1.1

The official coding period of GSoC 2020 began on June 2nd (HKT), and the glue-solar work is currently underway according to plan, as discussed with my mentors Stuart Mumford and Nabil Freij during the community bonding period that preceded it. The tasks we proposed and would like to see implemented this summer include, but are not limited to, the following:

1. Modify the existing glue 1D Profile viewer to provide sliders for extra dimensions (it currently collapses them)
2. Clean up the UX/UI (icon) for the pixel extraction tool (perhaps also upstreaming it from glue-solar to glue)

Read more…

Google Summer of Code - Blog #1!

Hey there, welcome to the second blog of the series, and the first one to document the coding period. The community bonding period, which I described in my previous blog, ended on 31st May and paved the way for the official coding period of the Google Summer of Code. These past two weeks were my first where I spent most of my time working on the actual code that will be a part of my project. My primary objective over these two weeks was to study the proof of work code that implements the spectral matrix algorithm to compute the spectra, and to execute it on a GPU. This was followed by a period of studying the different mechanisms with which RADIS calculates spectra and understanding the differences between each of them. This was important because implementing GPU-compatible methods for all these distinct pipelines is my final objective, and it is essential for me to understand the differences between these methods at the very outset of my project. Finally, the remaining time was spent on back-and-forth discussions with my mentors about the various languages and libraries that could have been possible choices for undertaking this project. Once we had made our decision, I spent the time going through the library’s documentation, source code, and tutorials to familiarize myself with these tools.
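For context, this is roughly what a spectrum calculation through RADIS’s standard, documented CPU interface looks like; the GPU work is about accelerating the line-by-line computation that sits behind a call like this, not about changing this API. The parameter values below are arbitrary, chosen only for illustration.

```python
# Baseline (CPU) spectrum calculation via RADIS's public calc_spectrum API.
# Values are arbitrary; this is context for the GPU work, not the GPU code itself.
from radis import calc_spectrum

s = calc_spectrum(
    1900, 2300,           # wavenumber range [cm-1]
    molecule="CO",
    isotope="1,2,3",
    pressure=1.01325,     # bar
    Tgas=700,             # K
    mole_fraction=0.1,
    path_length=1,        # cm
    databank="hitran",    # fetches the line database on first use
)
s.apply_slit(0.5, "nm")   # instrumental slit function
s.plot("radiance")
```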

The first major objective for these two weeks focussed on studying and executing the proof of work code. This was a single CUDA C file which demonstrated how using a spectral matrix to compute a spectrum on a GPU could offer performance boosts of multiple orders of magnitude over the naive methods. I initially planned on executing the code and running it on my personal computer, but the idea was quickly dismissed for reasons I already discussed in my previous blog. As a result, I ended up using Google Colab for this experimentation, which came with its own fair share of discomforts. The first, and most significant, of these was the lack of persistent storage on Colab, which forced us to resort to Google Drive for saving our database instead. This was costly in terms of both the time it took to store the data on the cloud and the overall performance of the code, as the time taken to load the data into memory increased significantly compared to a single CPU-GPU system like my personal laptop. This, however, was not detrimental to the fundamental objective, since the benchmarking could be done for each part of the code separately, and thus it did not affect the execution of the device code or its performance in any way.

Another task which popped up when using Colab to run CUDA was to set up the system so it could run native CUDA C files along with the Python code (a short sketch of this kind of setup appears at the end of this post). This fortunately was not very difficult to solve, and a couple of Google searches gave us the list of all the necessary packages we needed to compile and execute C files on Colab. Once that was set up, the only thing left for me to do was to transfer the data from my laptop to Google Drive. This once again posed a problem that I had not anticipated: uploading 8 GB of data takes much longer than downloading the same amount of data! As soon as that realization hit me, I decided to adopt another approach. I copied the code that I had used to download the data from the FTP server to my local storage and ran it on Google Colab! This allowed me to redownload the entire dataset (which in the raw format was ~30 GB) directly onto my Drive instead. The process was much faster than I had anticipated, and I soon had the raw data on my Drive. After running another couple of scripts to format and repartition the data into separate NumPy arrays, I was ready to go.

Execution of the code went smoothly, except for a few hiccups surrounding the matplotlibcpp library that was being used to plot the output spectra. I wasn’t able to solve this problem immediately like the others, so I talked to my mentors about it. They advised me not to worry too much about it for now, as it really wasn’t the critical part of the project. The major part, the kernel that was supposed to run on the GPU, ran as expected, and the results we obtained by timing the kernel performance were very positive! Now that I had successfully executed the code, what followed was a series of different runs of the same code, this time with the aim of testing how far we could take this GPU-compatible code. To give some numbers here, the original proof of work code that crunched the 8 GB processed database computed a total of 240 million lines in less than a second! To be more specific, it took 120 ms on average to achieve that number. To put that into perspective, a naive implementation of the same code, one that does not make use of the optimizations we did here, would take 10,000x longer to produce the same results!
That in itself makes the naive approach an impractical solution to the problem. Compared to the current RADIS implementation, the performance gain was still significant, with up to a 50x gain in the time spent computing the spectra. To see how far we could take this code, we also tried it with a bunch of different ranges from the same dataset. While the original code was tested on a range that spanned from 1750 to 2400 cm-1 in wavenumber, we took it as far as 1250-3050 cm-1. Surprisingly, the code scaled pretty well with the increase in the number of lines being computed, going from the original 120 ms taken to compute 240M lines to ~220 ms to compute 330M lines. Testing such a wide range and getting such positive results was sufficient proof for us to wrap up the analysis part and move on to the actual implementation.
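As a small aside on the Colab setup mentioned above, here is one hedged way to compile and run a standalone CUDA C file from the Python side of a notebook, assuming a GPU runtime where nvcc is already available; the file name is just a placeholder for the proof of work source, and the actual setup we used may have differed.

```python
# Sketch only: compile and run a standalone CUDA C file from a Colab notebook,
# assuming a GPU runtime with nvcc on the PATH. "spectral_matrix.cu" is a
# placeholder name, not the actual proof of work file.
import subprocess

# Compile the CUDA source into an executable.
subprocess.run(["nvcc", "-O3", "spectral_matrix.cu", "-o", "spectral_matrix"], check=True)

# Run it; timings printed by the kernel benchmark show up in the cell output.
subprocess.run(["./spectral_matrix"], check=True)
```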

Read more…