Moving towards the right approach
The last few weeks have been a whirlwind of ideas, coding, and rethinking as we figured out the best way to build my dashboard project.
Our First Idea:
Hey everyone, welcome to the second episode of my Google Summer of Code project series, where I’m working on partial decompression for large datasets.
So, what’s the big catch here? Well, the CO₂ dataset I’m working with is about 6 GB in its compressed .bz2 form, and when you decompress it, it explodes into 50 GB. Most systems struggle to load that much data into memory or parse it into a DataFrame, whether because of storage limits, memory, or swap issues.
And obviously not everyone wants the whole 50 GB anyway. Usually, people need just a 1 GB chunk from somewhere inside. So decompressing the entire thing just to fetch a small part is a massive waste of time and resources.
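To make the idea concrete, here is a minimal sketch of reading only a slice of the decompressed data using Python's standard `bz2` module. The function name `read_decompressed_range` and its parameters are my own for illustration, not the project's actual code. It is also not true random access: everything before the requested offset is still decompressed (and thrown away), but the rest of the archive is never read and the full output is never held in memory. It assumes a single-stream .bz2 file.

```python
import bz2
import io


def read_decompressed_range(fileobj, start, length, chunk_size=64 * 1024):
    """Return `length` decompressed bytes starting at decompressed offset `start`.

    Hypothetical helper: streams compressed bytes from `fileobj` and stops as
    soon as the requested range is covered. Data before `start` is still
    decompressed, just not kept. Assumes a single bz2 stream.
    """
    decompressor = bz2.BZ2Decompressor()
    end = start + length
    position = 0  # number of decompressed bytes seen so far
    pieces = []
    while position < end and not decompressor.eof:
        compressed = fileobj.read(chunk_size)
        if not compressed:
            break  # input ran out before the requested range was covered
        data = decompressor.decompress(compressed)
        # Translate the wanted range into offsets inside this piece of output.
        lo = max(start - position, 0)
        hi = min(end - position, len(data))
        if lo < hi:
            pieces.append(data[lo:hi])
        position += len(data)
    return b"".join(pieces)


# Small in-memory demo standing in for the real 6 GB file.
payload = b"".join(b"line %06d\n" % i for i in range(100_000))
archive = io.BytesIO(bz2.compress(payload))
chunk = read_decompressed_range(archive, start=500_000, length=33)
```

For a real file you would pass `open(path, "rb")` instead of the `BytesIO` object. Skipping the leading part without decompressing it would need an index of bz2 block boundaries, which this sketch does not attempt.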
The past couple of weeks have been really productive. After the initial planning and community bonding period, I’ve finally started working on the actual implementation. The transition from understanding the theory to getting hands-on with the code has been challenging but rewarding.
After writing my last blog post and going through the codebase more thoroughly, I discovered there are several existing classes and methods that can be reused for this project. This has led to a revised approach that builds on what’s already working well in RADIS.
The core approach is the same as originally planned, but now the rovibrational populations are calculated using the existing RovibParFuncCalculator. This lets me reuse code while adding the electronic state functionality we need.
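As a rough illustration of what "populations from a partition function" means, here is a generic Boltzmann sketch. It does not use `RovibParFuncCalculator` or any RADIS API; the function name, the level list, and the temperature are made up for the example, and energies are assumed to be in cm⁻¹.

```python
import math

# Second radiation constant hc/k_B in cm*K, used to turn cm^-1 energies
# into the dimensionless exponent E / (k_B * T).
C2_CM_K = 1.438776877


def boltzmann_populations(levels, temperature):
    """Return (fractions, Q) for levels given as (degeneracy, energy_cm1).

    Hypothetical helper: each fraction is g * exp(-c2 * E / T) / Q, where Q
    (the partition function) is the sum of the unnormalised weights.
    """
    weights = [g * math.exp(-C2_CM_K * e / temperature) for g, e in levels]
    q = sum(weights)
    return [w / q for w in weights], q


# Illustrative levels only, not real spectroscopic data.
example_levels = [(1, 0.0), (3, 100.0), (5, 300.0)]
fractions, q = boltzmann_populations(example_levels, temperature=300.0)
```

The fractions always sum to 1, and at very high temperature they tend towards the ratio of the degeneracies.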
In my continued journey with Stingray.jl
During GSoC 2025, this phase focused on a core aspect of high-energy astrophysics: time filtering using GTIs (Good Time Intervals) and BTIs (Bad Time Intervals). After a productive discussion with my mentor @matteobachetti during our meeting, I dove into implementing and refining functionality around GTIs, an essential tool in the timing analysis of astrophysical data.
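For readers new to the idea, here is a small sketch of GTI filtering in plain Python (used here for illustration; it is not the Stingray.jl code). The function names are hypothetical, and the sketch assumes the GTIs are sorted, non-overlapping, and half-open `[start, stop)`; the real library may use a different boundary convention.

```python
from bisect import bisect_right


def filter_by_gtis(times, gtis):
    """Keep the event times that fall inside any [start, stop) interval.

    Hypothetical helper; assumes `gtis` is sorted and non-overlapping.
    """
    starts = [start for start, _ in gtis]
    kept = []
    for t in times:
        # Index of the last GTI whose start is <= t, or -1 if there is none.
        i = bisect_right(starts, t) - 1
        if i >= 0 and t < gtis[i][1]:
            kept.append(t)
    return kept


def btis_from_gtis(gtis, t_start, t_stop):
    """Return the gaps between GTIs within [t_start, t_stop] as BTIs."""
    btis = []
    cursor = t_start
    for start, stop in gtis:
        if start > cursor:
            btis.append((cursor, start))
        cursor = max(cursor, stop)
    if cursor < t_stop:
        btis.append((cursor, t_stop))
    return btis


# Illustrative values only.
gtis = [(0.0, 10.0), (20.0, 30.0)]
times = [1.0, 9.5, 12.0, 20.0, 29.9, 30.0, 35.0]
good_events = filter_by_gtis(times, gtis)
bad_intervals = btis_from_gtis(gtis, 0.0, 40.0)
```

BTIs are simply the complement of the GTIs over the observation span, which is why the second function only needs to walk the gaps.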
As part of the ongoing development, we’ve now begun focusing on the frontend side of the project, improving the user experience and preparing for more database integrations.
To ensure a clean development workflow, I started by:
A major enhancement was implementing dark mode across the entire frontend, not just the MUI (Material UI) components, but also the plotting graph area.
Hey everyone!
If you’ve read my first blog, you already know how this journey started — with nervous excitement, inspiring mentors, and my deep love for astronomy. Since then, things have only gotten more interesting — and yes, more challenging, but in the best way possible!
Finally, I have been contributing to scientific open source for about a year now, and it has taught me a lot, I mean a lot. I still remember searching for a simple documentation issue so I could get it merged and call myself a “contributor”, haha, and that got me started in scientific open source. Since then, I have been able to make some truly meaningful contributions to many projects, and here I am writing a blog for GSoC with Radis, a Pythonic library providing a fast line-by-line code for high-resolution infrared molecular spectra, under the OpenAstronomy umbrella.
So my project is cool ngl, and it is titled “Fast Parsing of Large Databases and Execution Bottlenecks”. Basically, there is a large, highly compressed CO₂ spectroscopic database: a 6 GB file that decompresses to about 50 GB and takes at least 2.5 hours to parse and convert into a DataFrame. As you might expect, my project is about significantly reducing the parsing time and finding a workaround for storing only the “necessary” parts of the decompressed file.
Radis is Pythonic, and Python gets really slow if it is not used the way it is meant to be used, so my initial thought was to first clean up the existing code and use vectorised operations; with Numba on top, we should be able to see some real optimisation. But then I realized the current implementation already uses the right vectorized operations on DataFrames, and Pandas’ vectorized methods are already implemented as optimized C/Cython loops. So there isn’t much more to do here other than replace a few sources of overhead with cheaper operations. After that, as discussed with my mentors, I can use a C++ Single Instruction Multiple Data (SIMD) mechanism to parse the file and put a Python interface on top of it with pybind11, Cython, or another option, but this will cost us portability, as compilation becomes something that has to be considered. The other approach is to use the Vulkan API from Python, as it supports CPU as well as GPU parallelism and is cross-platform, so it will work on all CPU architectures.
A month has passed since the GSoC results came out, and honestly, the excitement still hasn’t quite settled. That announcement day was pretty intense: my mind was racing, second-guessing everything. But wow, things worked out even better than I’d hoped.
The community bonding period hit right during my final exams, which was tricky timing. So I focused on what mattered most: getting to know my mentors, joining the weekly calls, and really wrapping my head around the project from a theoretical standpoint.
I’m working on something called “Electronic Spectra for RADIS.” Basically, RADIS is this awesome library for modeling rovibrational spectra, but it’s missing one key piece: electronic transitions. That’s where I come in! I want to add that missing functionality and make RADIS even more powerful.
The first weeks of GSoC were all about getting familiar with the project, understanding its current state, and laying the groundwork for upcoming work. We began by listing the key features we plan to implement and prioritizing them based on impact and feasibility.
Before diving into development, we focused on improving the current state of the codebase:
This cleanup helped fix some bugs and made the repository more maintainable moving forward.
Wow! I still can't believe I'm actually in Google Summer of Code! This was a dream, and now here I am, officially past the Community Bonding Period and diving headfirst into the Coding Phase.
What’s Happening Right Now?
Right now, I’m sketching out the basic structure of the application interface. The first big task? Fetching event files from specific sources. Sounds simple, right? Well… not quite.