GSOC 2020: Metadata searches using Fido

Missed Comet NEOWISE due to annoying cloud cover in Vadodara for two weeks straight :(

And that’s the central theme of the project :)

It is now possible to query clients that return metadata tables using Fido. SunPy’s Fido is a unified interface for searching and downloading solar physics data. In other words, it is a consistent and easy way to query most forms of solar physics data. It searches various archives and web services based on the search attributes specified in the query.

SunPy currently supports three metadata clients: the JSOC Client, the HEK Client, and the Helio Client.
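
To give a feel for what a "unified interface" means here, the following is a toy sketch in plain Python. It is not SunPy's implementation: the class names (`ToyClient`, `ToyFido`), the attribute names, and the records are all made up for illustration. The only idea borrowed from the text is that a single search call fans out to whichever clients understand the attributes in the query.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClient:
    """Hypothetical stand-in for a metadata client (not a SunPy class)."""
    name: str
    handles: frozenset          # attribute names this client understands
    records: list = field(default_factory=list)

    def can_handle(self, query: dict) -> bool:
        # A client is eligible only if it understands every attribute asked for.
        return set(query) <= self.handles

    def search(self, query: dict) -> list:
        # Return the records whose fields match every attribute in the query.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in query.items())]


class ToyFido:
    """Hypothetical dispatcher: one entry point, many clients."""

    def __init__(self, clients):
        self.clients = list(clients)

    def search(self, **query) -> dict:
        # Ask each eligible client and collect results keyed by client name.
        return {c.name: c.search(query)
                for c in self.clients if c.can_handle(query)}


# Illustrative records only; they do not come from any real archive.
hek = ToyClient("hek", frozenset({"event_type", "date"}),
                [{"event_type": "FL", "date": "2020-07-01"},
                 {"event_type": "AR", "date": "2020-07-01"}])
jsoc = ToyClient("jsoc", frozenset({"series", "date"}),
                 [{"series": "hmi.M_45s", "date": "2020-07-01"}])

fido = ToyFido([hek, jsoc])
results = fido.search(event_type="FL", date="2020-07-01")
```

In this sketch the query mentions `event_type`, which only the toy HEK client understands, so only that client is asked. A query on `date` alone would reach both clients.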

Read more…

Week 7 & 8: Full Throttle Ahead!

In the last blog post, I discussed creating a MAGICEventSource and calling it inside DL1DataWriter to produce HDF5 files in the CTA ML data format from the ROOT files.

I have integrated the MAGICEventSource into writer.py and also updated it to ctapipe v0.8.0. This code is housed at PR #90. There were a few small issues with the metadata and transformations. The current ctapipe MCHeader container has fields that need to be filled from the input file, but the ROOT file has separate RunHeaders, so we faced a dilemma over which metadata to store.

Finally, we decided to create our own MCHeader container inside the MAGICEventSource and use this to store metadata instead of using the ctapipe container.

Read more…

GSoC 2020: glue-solar project 2.2

The end of the 2nd Coding Period of this year’s Google Summer of Code has finally arrived. I cannot help but notice the many things I have learned and built over the past two months, both for the glue Graphical User Interface (GUI) and for its solar physics plugin glue-solar. One of the tricks I have learned is writing tests for the GUI programming I have done in the process, which is using Qt for Python to port the entire PySide module to Qt5. My main approach is quite simple: imitate, modify, test. Since glue has a fairly well-developed codebase, it is not hard to find sample code snippets within it to draw on when adding new code. The GUI unit tests are no exception to this rule. I would like to use this opportunity to share the experience with more novice contributors to the software, so that someone somewhere down the line may be able to benefit from it.

The first and foremost concern we should have regarding any type of unit testing is what we should check for, functionality-wise. Let me take some code I have written over the past few days, which adds NDData (a data structure native to Astropy) support to glue and enables the loading of various types of astronomical data more readily, such as standard FITS files. As I have discussed with my mentors via the glue-solar IRC channel, we have observed that NDData was, much like the laser back in the 1960s (as reported by the NYTimes at the time), a solution in search of a problem before its wider adoption by the astronomy community for LSST, DKIST and CCDProc data. Now we are in the process of integrating it into glue. The original idea, at least in principle, is to use the simple and fluid structure of NDData to help process, for example, FITS data, since there are no generic NDData files in existence at all. This facilitates the manipulation of not only the data component, but also its units, mask, uncertainty, and meta attributes, which are quite common in the handling of astronomical data (pun intended).

With this motivation in mind, we have added a nddata.py module to the glue/core/data_factories directory in a PR at glue. To complete the PR, it is standard practice to add tests where applicable, so we have added a testing module called test_nddata.py in the glue/core/data_factories/tests directory. It serves not only as a routine, but also as a check on whether the code has been properly debugged, and it caught all of the major known bugs I had inadvertently introduced to the codebase before testing.

The GUI unit test I have written is as follows:

Read more…

On the move!

Hey, folks! Here's Meuge, your usual host from the last few months.


In the last few weeks, we did a little bit of everything: analyzing, coding, and searching for feasible solutions to reach the best result. As a result, we came up with a new feature in Poliastro's Earth module. The API may change in the future, so keep that in mind by the time you read this. So, I bet you might be wondering, what's all the fuss?

Read more…

Week 5 & 6: Half-time

The first evaluation is successfully complete!

In the last blog post, I discussed adding a script to the DL1DataHandler to convert ROOT files to HDF5. But, in order to maintain symmetry with the current DL1DataWriter, we decided to instead create a MAGICEventSource from this root2hdf5 format. It initializes the DL1 data container with ROOT data, which is then sent to the CTAMLDataDumper to be dumped into an HDF5 file. This will be analogous to the SimTelEventSource that is currently being used for CTA data from simtel files.

So, I have been working on creating this MAGICEventSource. The basic code for the class structure is complete, and the containers are initialized correctly. All that remains now is to add some code to the DL1DataWriter and CTAMLDataDumper to accept and recognize MAGIC data and use the MAGICEventSource instead of the SimTelEventSource.
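
The symmetry argument can be illustrated with a toy sketch in plain Python. None of the names below (`ToyEvent`, `ToySimTelSource`, `ToyMAGICSource`, `dump_events`) belong to ctapipe or the DL1DataHandler; they are hypothetical stand-ins. The point is only that if every source yields the same kind of container, the dumper does not need to know which input format it is reading.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ToyEvent:
    """Hypothetical stand-in for a DL1 data container."""
    event_id: int
    telescope: str
    image: List[float]


class ToySimTelSource:
    """Pretend source for simtel-style input (illustrative only)."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self) -> Iterator[ToyEvent]:
        for i, image in enumerate(self.rows):
            yield ToyEvent(i, "CTA", list(image))


class ToyMAGICSource:
    """Pretend source for ROOT-style input (illustrative only)."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self) -> Iterator[ToyEvent]:
        for i, image in enumerate(self.rows):
            yield ToyEvent(i, "MAGIC", list(image))


def dump_events(source: Iterable[ToyEvent]) -> list:
    """Pretend dumper: consumes any source and returns the rows it would write."""
    return [(ev.event_id, ev.telescope, sum(ev.image)) for ev in source]


# The same dumper works with either source.
magic_rows = dump_events(ToyMAGICSource([[1.0, 2.0], [0.5, 0.5]]))
simtel_rows = dump_events(ToySimTelSource([[3.0]]))
```

In the real code the remaining work is on the writer side, which has to recognize MAGIC input and pick the matching source; the sketch sidesteps that by constructing the source by hand.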

Read more…

GSOC 2020: End of the First Half

Comet C/2020 F3 (NEOWISE) will be visible in India for the next 20 days!

So now I’m halfway through the Summer of Code journey. The last two weeks have been full of code reviews, code refactoring, and documenting stuff. I also helped new contributors to SunPy get started, so I interacted more with the community this time.

Making HEC Fido Compatible

HEC stands for Heliophysics Event Catalogue. Heliophysics events are a large variety of phenomena that:

  • Originate or occur on the Sun.
  • Propagate through the interplanetary medium.
  • Interact with the geospace and planetary analogs.

An illustration of a heliophysics event: Earth’s magnetic field shielding our planet from solar particles. Credit: NASA/GSFC/SVS

HEC allows complex searches for events stored in indexed catalogs. SunPy has a HECClient which allows you to interface with Helio web services.

Read more…

Detour: A Reflection

Look at where we are, look at where we started.

This post marks the halfway point of the GSoC journey. Instead of my usual full-blown update posts, I want to celebrate this milestone with just a small, solemn reflection on the work we have accomplished so far.

We have Search Events up and running, which maps the Sunspotter observations to the Heliophysics Events Knowledgebase (HEK). This has broadened our dataset and will allow us to do timeseries analysis in the near future.

We also have a Sunspotter class that handles the various CSVs and the AR FITS files.

Read more…

Google Summer of Code - Blog #3!

Hey! Continuing from where we left off last time, in this blog I’ll be talking about the work I did between the 1st and 15th of July. As I mentioned in the previous blog, my primary task for this period was to get the Cython/CuPy version of our proof-of-work code to produce the correct output. I had initially expected the work to be relatively easy, as the output was already there; I simply had to check the logic of the program to identify why it wasn’t correct. However, the work turned out to be a lot more painful than I’d have imagined.

I briefly touched upon this in the last blog as well, but the first thing I had to do in order to finish this task was to ensure that the debugging tools and methods were in place. This, fortunately, was one of the easier parts of the problem. The problem was twofold, and thus required two different solutions.

First, loading the dataset into RAM was a challenging task for my computer. I am not sure exactly why that was the case, but it was nowhere near as performant as the C++ code, which loaded the exact same data in a relative breeze. I looked around the internet to understand what could possibly be slowing down the loading step so much, which was quite simply a single line: v0 = np.load(dir+path+'v0.npy'). Each file was around 400 MB, and there were 8 of them in total, pushing the total memory required to around 3 GB. The solution was relatively straightforward, and felt almost as if it had been hiding in plain sight when I did find it. The core idea is that when we load a numpy array the way I was doing, without having previously declared the variable as a C type, Cython quite naturally assumes it to be a pure Python variable and therefore fails to deliver the performance boost it promises on compilation. This, however, was a trivial issue to resolve. All I had to do to get the advantage promised by Cython was to declare the numpy arrays before loading the datasets into them. This was done with a simple line: cdef np.ndarray[dtype=np.float32_t, ndim=1] v0 = np.zeros(N_points, dtype=np.float32). That was it! With just this single addition, the array was now a C-type variable and was processed significantly faster than the older pure numpy arrays.

Unfortunately, I didn’t benchmark the difference, as there are still some things about this part that I am not super confident about. I am not sure if it’s due to an observation bias or some other external factor, but I felt that the speed of loading the data varied quite significantly even with the same binaries. I am not sure if this is due to some caching or optimization being done under the hood by the compiler itself, but whatever it is, it would make aimless benchmarking without controlling for these external factors a futile exercise.

The second issue that was making debugging difficult was the inability of my GPU to automatically kill the Python/CuPy tasks once the program finished execution. I searched around Stack Overflow and found that this is actually a rather common issue with CuPy. As a result, it didn’t take long before I found a makeshift solution for this problem as well. All I had to do once the program had finished execution was to call some specific CuPy methods to free the memory, and it worked just fine! With these two issues sorted, I had a much better setup in place to debug the code without being forced to restart the computer or wait 10 minutes for the RAM to clear up every time the program finished execution!
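
One way to make the loading times less of a guessing game is to time the same load several times and look at the first run separately from the rest. The sketch below does that in plain Python with NumPy; it is not taken from the project. The helper `time_loads`, the tiny array, and the temporary file are assumptions made so the example runs quickly.

```python
import os
import tempfile
import time

import numpy as np


def time_loads(path, repeats=5):
    """Load the same .npy file several times and return each wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        arr = np.load(path)
        timings.append(time.perf_counter() - start)
    return arr, timings


# Small made-up array so the sketch runs quickly; the files in the post were ~400 MB.
original = np.arange(100_000, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "v0.npy")
    np.save(path, original)
    loaded, timings = time_loads(path)

# Keep the first load apart from the later ones when comparing runs.
first_load = timings[0]
median_later = float(np.median(timings[1:]))
```

Note that a file written moments earlier is probably still in the operating system's cache, so even the first load here is unlikely to be a true cold read. Measuring that would mean clearing the cache between runs, which this sketch does not attempt.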

Read more…