A Month into GSoC

It’s been almost a month since the start of GSoC’s coding period, and the work, I’m glad to write, is progressing at a steady and satisfactory pace.

The Developments

Last time around, my first ever not-so-meaningless contribution to open source had just been merged, and I was really happy about it. It also got me over the initial anxiety and intimidation I had been feeling towards open source, which, I think, has helped speed things along.

I started working on the optional features of my project around two weeks ago, but I had to scrap the initial implementation: it turned out to be very, very slow and had to be completely replaced with a more efficient, if somewhat less straightforward, approach. Now, after two weeks of experimenting and iterating, a new pull request is open with the re-implemented, efficient version of the feature. It's still a few minor commits away from its final form, but the core functionality works as expected, and if everything goes to plan, which is never a guarantee, a hefty part of my proposal’s objectives will be complete.

Read more…

Insights on implementing JAX in Stingray - GSoC coding period!

In the last blog, I wrote an introduction to JAX and automatic differentiation. In this one, I'll lay out my plan for the next stage of implementation. Currently, I am working on the modeling notebook (https://github.com/StingraySoftware/notebooks/blob/main/Modeling/ModelingExamples.ipynb), re-designing it to use JAX, especially to make optimization more robust by having JAX compute gradients of the likelihood function.

My mentor Daniela highlighted that the current NumPy-based implementation is not robust. The plan is to keep working on the current modeling notebook, replacing NumPy with jax.numpy and making use of JAX's grad, jit, vmap, and random functionality.
When it comes to the re-design, the first task is understanding the current design and the possible drawbacks and issues of the corresponding packages, which I am trying out now. One such challenge is importing emcee into the Jupyter notebook for sampling. Despite making sure the dependency is installed in the current virtual environment, and then importing emcee into the notebook, it still acts weird and shows an error: "emcee not installed! Can't sample!" It looks like a clash of dependencies.
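A quick way to diagnose this kind of clash is to check which interpreter the notebook kernel is actually running, and whether that interpreter can see the package at all. A minimal sketch using only the standard library (the package name is just an argument; pass "emcee" or anything else):

```python
import importlib.util
import sys

def diagnose(package_name):
    """Report the running interpreter and whether it can import a package."""
    print("Kernel interpreter:", sys.executable)
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        print(f"{package_name} is NOT visible to this interpreter")
    else:
        print(f"{package_name} found at: {spec.origin}")
    return spec is not None

diagnose("emcee")
```

If this prints "NOT visible", the notebook kernel is most likely tied to a different environment than the one pip installed the package into, which would explain the "emcee not installed!" error despite a seemingly successful install.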

Trying to have fun while it lasts!

For now, the plan is to solve every bug I face along the way, then move on to understanding how everything connects, and finally to put together a report on optimization using JAX. Stay tuned for more on how JAX can accelerate and augment the current modeling framework.

Read more…

astropy@GSoC Blog Post #3, Week 3

So, it's the start of the third week now. I will be virtually meeting Aarya and Moritz again tomorrow.

For the past few weeks, I have been pushing commits to a draft PR, https://github.com/astropy/astropy/pull/11835, on GitHub. I wanted to have something working early in the project so that I could pinpoint accurately when something doesn't work, which is why I started by directly adding the cdspyreadme code within Astropy. In parallel, I am also writing the code from scratch; as more of the required features from cdspyreadme get integrated into cds.py, the files and code added earlier will be removed.

As for reading/writing the Machine Readable Table format: I wrote briefly in my GSoC proposal that I could attempt it as an extension. I don't have an opinion on whether or not it should have its own format classes, etc. However, since the title of my GSoC project is to add a CDS format writer to Astropy, I would prefer to work on the CDS format writer first and then on the MRT format. The MRT header appears to be a bit simpler than the CDS header anyway, so the extension shouldn't pose much difficulty.

So, in a nutshell, this is my workflow:

  • Try out directly using cdspyreadme from within Astropy.
  • Add CdsData.write method.
  • Add a ByteByByte writer.
  • Write features to add complete ReadMe to the Header, starting off with having both ReadMe and Data in a single file.
  • Have features for writing separate CDS ReadMe and Data file.
  • Further work on some specific table columns, for instance, those containing Units and Coordinates.
  • Add appropriate tests along the way.
  • Resolve other issues that come up.
  • MRT format reader/writer.

I have completed the first three tasks and will now work on the fourth. I think that by the time this is finished, a separate CDSColumn.py won't be required. In the meantime, I can open another PR which adds the data writer.
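For readers unfamiliar with the format, the "Byte-by-byte" section of a CDS ReadMe describes the fixed-width layout of the accompanying data file, column by column. A hypothetical illustration of its general shape (the file name, columns, and byte ranges here are made up for this sketch, not taken from the PR):

```
Byte-by-byte Description of file: table.dat
--------------------------------------------------------------------------------
   Bytes Format Units  Label   Explanations
--------------------------------------------------------------------------------
   1- 10 A10    ---    Name    Object identifier
  12- 16 F5.2   mag    Vmag    V band magnitude
  18- 22 F5.2   deg    RAdeg   Right ascension (J2000)
--------------------------------------------------------------------------------
```

Generating this table, with correct byte ranges and FORTRAN-style format codes derived from the table's columns, is essentially what the "ByteByByte writer" step above covers.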

Let's see how it goes!

Read more…


Rotation and Coordinates

The official “coding period” of GSoC finally began a couple of days ago. Looking at where we started off with Sunkit-Pyvista and where we are today makes me feel a tad bit happy! 😄

Weeks 1 and 2 were initially set aside for me to finish adding rotation functionality to the library, which started off great, but ended up causing some confusion 😅.

This was quickly sorted out: we decided not to implement the rotation functionality after all and moved on, learning that not everything goes according to plan and that it's okay for things to not work out at times.

Read more…

Chapter 1: First Flight

Hey! Missed me? I’m back with another blog, the first related to the coding period. I’ve got some progress and interesting observations to share!

Ready -> Set -> Code -> Analyze

The first thing I did in the coding period was to analyse the problem and work out a feasible approach to resolving it.

Read more…

About my Google Summer of Code Project: Part 1

I had been eyeing Google Summer of Code last year (and the year before that), but never really got around to doing anything about it. It’s a wonderful learning experience, and being in my final year of college, this was the last opportunity I was going to get. So I decided to give it a shot.

I started late, sometime during late February. I picked out a few organizations that looked interesting to me. openastronomy particularly caught my eye because I was working on another project of mine related to Astronomy. In fact we were using one of the Python libraries under openastronomy. Now this is an umbrella organization, which means that there are multiple sub-organizations — sunpy, astropy, radis, poliastro, and a few more. They’re used extensively by the scientific community in their research. The project I selected was under sunpy, which is a Python library for solar data analysis. The project is about resampling data to increase or decrease its resolution — more on that later. Again, I had recently performed this operation in one of my projects, so it seemed only natural for me to go with this one. I worked on some issues on GitHub and submitted PRs, tried to get a hold of the codebase, put together a proposal, got feedback from the project mentors and friends, and submitted it. After about a month of impatient waiting, I received an email saying that my proposal was accepted! Awesome!

Now, I plan to continue writing these blogs throughout the project and since this is the first one, let me take a moment to talk about the project. So, there’s a sunpy-affiliated package called ndcube, which exists to provide users an easier way of handling coordinates. Astronomical data like images taken from cameras are usually stored as n-dimensional arrays. A dimension could represent spatial or temporal axes. In such an array, the pixel coordinates map to some coordinates in the real world. These could be RA and Dec, or in the case of solar data, Helioprojective Latitude and Longitude. Nevertheless, there needs to be a mapping from the pixels to the real world. This is given by the World Coordinate System, which is just a set of (complicated) mathematical transformations. ndcube is a package that correlates the actual data with its transformations in such a way that you can manipulate the data, and the transformations will continue to remain consistent. It can be used with any type of data like images, spectra, timeseries data, and so on.
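To make the pixel-to-world idea concrete, here is a toy linear mapping in plain Python. This is not ndcube's or astropy's actual API, and real WCS transformations are far more general (projections, rotations, distortions), but for a simple linear axis they reduce to exactly this: a reference pixel, a reference world value, and a scale per pixel.

```python
# Toy linear pixel -> world mapping, illustrating the WCS idea.
# crpix: reference pixel, crval: world value at that pixel,
# cdelt: world units per pixel (names borrowed from FITS WCS keywords).

def pixel_to_world(pixel, crpix, crval, cdelt):
    """world = crval + (pixel - crpix) * cdelt, applied per axis."""
    return [cv + (p - cp) * cd
            for p, cp, cv, cd in zip(pixel, crpix, crval, cdelt)]

# Hypothetical 2D solar image: reference pixel (50, 50) maps to
# Helioprojective (0, 0) arcsec, with 0.6 arcsec per pixel.
world = pixel_to_world(pixel=(60, 45),
                       crpix=(50, 50),
                       crval=(0.0, 0.0),
                       cdelt=(0.6, 0.6))
print(world)  # → [6.0, -3.0]
```

The point of ndcube is that it keeps this mapping glued to the data array, so that operations like slicing or resampling the data automatically keep the transformation consistent.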

Read more…

GSoC Progress Report? Almost Done!

With only two weeks of the coding period gone, it feels good to say that the bulk of the work associated with the two milestones is almost done. "Almost" because the newly added functionality is yet to be battle-tested, and while there is always room for change and improvement, my focus will now shift from the main objective to the optional objectives.

The Project

Scientific jargon ahead!

The study and interpretation of time-series data have become an integral part of modern-day astronomical studies and a common approach for characterizing the properties of time-series data is to estimate the power spectrum of the data using the periodogram. But the periodogram as an estimate suffers from being:

Read more…

JAX-based automatic differentiation: Introduction of modern statistical modeling to Stingray

I assume everyone reading this is already aware of two classical approaches to differentiation, namely symbolic and finite differentiation. Symbolic differentiation operates on expanded mathematical expressions, which can lead to expression swell and inefficient code, while finite differentiation suffers from truncation and round-off errors. Efficient calculation of derivatives is crucial when it comes to training neural networks or mathematical modeling using Bayesian inference. Both classical methods are slow at computing partial derivatives of a function with respect to many inputs, as is needed for gradient-based optimization algorithms. Here, automatic differentiation comes to the rescue. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions.
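To make this concrete, here is a tiny sketch of forward-mode automatic differentiation using dual numbers, in pure Python. This is not how JAX is implemented internally (JAX uses tracing and XLA compilation), but it shows the core idea: every value carries its derivative along through the computation, so the result is exact up to floating point, with no truncation error.

```python
class Dual:
    """A value paired with its derivative: forward-mode autodiff."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).dot

# d/dx (x**2 + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # → 7.0
```

The derivative falls out of the arithmetic rules themselves; no symbolic expression is ever built, and no step size needs to be chosen.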

Photo Source: Wikipedia

Stingray is an astrophysical spectral-timing package, a Python library built to perform time-series analysis and related tasks on astronomical light curves. JAX is a Python library designed for high-performance numerical computing. Its API for numerical functions is based on NumPy, a collection of functions used in scientific computing. Since both Python and NumPy are widely used and familiar, JAX is simple, flexible, and easy to adopt. It can differentiate through a large subset of Python’s features, including loops, ifs, recursion, and closures, and it can even take derivatives of derivatives. Such modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management.

JAX utilizes the grad function transformation to convert a function into a function that returns the original function’s gradient, just like Autograd. Beyond that, JAX offers a function transformation jit for just-in-time compilation of existing functions and vmap and pmap for vectorization and parallelization, respectively.
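A minimal sketch of these transformations in action (this assumes JAX is installed; the function here is just an illustrative quadratic, not Stingray's actual likelihood):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x ** 2 + 3.0 * x)

# grad: transform f into a function that returns df/dx
g = jax.grad(f)
print(g(jnp.array(2.0)))           # 2x + 3 at x=2 → 7.0

# jit: just-in-time compile f for faster repeated evaluation
fast_f = jax.jit(f)
print(fast_f(jnp.arange(3.0)))     # (0+0) + (1+3) + (4+6) → 14.0

# vmap: vectorize a scalar function over a batch without a Python loop
batched_g = jax.vmap(jax.grad(lambda x: x ** 2 + 3.0 * x))
print(batched_g(jnp.arange(3.0)))  # [3. 5. 7.]
```

Because these transformations compose, one can for instance jit a grad of a vmapped function, which is exactly the kind of pipeline that makes gradient-based fitting of a likelihood fast.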

Read more…