The Autumn of my GSoC project.
“Roads go ever ever on,
Over rock and under tree,
By caves where never sun has shone,
By streams that never find the sea;
Over snow by winter sown,
And through the merry flowers of June,
Over grass and over stone,
And under mountains in the moon.”
~ The Hobbit, J.R.R. Tolkien
When I first read The Hobbit, I wondered what going on an adventure would feel like.
Would I be fighting goblins and striding through enchanted forests with wizards?
Would there be a treasure? Would there be a Dragon?
And as I grew older, would or could there be a happy ending?
Well, an adventure it indeed was.
Building a domain-specific Deep Learning toolkit meant learning about packaging, testing, and best practices, all just to set the package up!
I did fight goblins: the bugs in my PyTorch pipelines. Those NaNs, oh, those pesky NaNs! Like goblins, they popped up in the input data when I least expected them.
Luckily, I had my wizard mentors, David, Sophie, and Monica, on my side, and together we marched through the enchanted (or were they cursed?) forests of SWPC (the Space Weather Prediction Center) and HEK (the Heliophysics Events Knowledgebase) to get the data we wanted.
It was a fascinating experience. One that I will cherish forever.
I wish I could extend this summer, but my time is up, and we have achieved all our goals for this project.
After all, all adventures must come to an end.
“The Road goes ever on and on
Out from the door where it began.
Now far ahead the Road has gone,
Let others follow it who can!
Let them a journey new begin,
But I at last with weary feet
Will turn towards the lighted inn,
My evening-rest and sleep to meet”
And so, as we come to the end of my GSoC journey, let’s talk about the treasure!
1) The Search Events Object
Data describing the same physical phenomenon often has slightly different values when it comes from different sources.
This could be due to noise, or to different parameter choices in the preprocessing techniques employed.
This creates problems when two datasets need to be compared or combined, especially when each holds data that the other lacks.
We faced this problem when we needed NOAA’s data characterizing the Active Regions, which was not available in the Sunspotter dataset.
Although both datasets described the same observations, their values did not match exactly.
Their multidimensional nature also prevented matching by simple sorting.
In my repository, Pythia, we created a general Search Events Table Matching algorithm that solves this problem.
Although it is general enough to be used with any tabular dataset, we plan on making it more ‘intelligent’ so that it requires minimal preprocessing from the user.
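The core idea, pairing rows that agree only approximately across several columns, can be illustrated with a small sketch. This is not Pythia’s implementation: it is a hypothetical greedy nearest-neighbour matcher, and the function name `match_events`, the column names, and the per-column tolerances are all assumptions made for illustration. Each difference is divided by its column’s tolerance so that columns with different units become comparable, and a pair is accepted only if every column lies within tolerance.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]  # one table row: column name -> numeric value


def match_events(
    left: Sequence[Row],
    right: Sequence[Row],
    tolerances: Dict[str, float],
) -> List[Tuple[int, Optional[int]]]:
    """Greedily pair each row of `left` with its closest unused row of `right`.

    Hypothetical helper: only the columns named in `tolerances` are compared.
    Each difference is divided by that column's tolerance, so a scaled value
    of 1.0 means "exactly at the tolerance". Rows with no partner get None.
    """
    if not tolerances:
        raise ValueError("at least one column tolerance is required")

    used = set()  # indices of `right` rows that are already matched
    pairs: List[Tuple[int, Optional[int]]] = []
    for i, lrow in enumerate(left):
        best_j: Optional[int] = None
        best_dist = math.inf
        for j, rrow in enumerate(right):
            if j in used:
                continue
            scaled = [abs(lrow[c] - rrow[c]) / tol for c, tol in tolerances.items()]
            if max(scaled) > 1.0:
                continue  # outside the tolerance on at least one column
            dist = math.sqrt(sum(s * s for s in scaled))  # scaled Euclidean distance
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
        pairs.append((i, best_j))
    return pairs


# Made-up rows; the column names and tolerances are hypothetical.
sunspotter = [
    {"x": 120.0, "y": -340.0, "area": 410.0},
    {"x": -55.0, "y": 210.0, "area": 90.0},
]
noaa = [
    {"x": -53.5, "y": 211.0, "area": 95.0},
    {"x": 121.2, "y": -338.7, "area": 400.0},
]
tolerances = {"x": 5.0, "y": 5.0, "area": 20.0}

print(match_events(sunspotter, noaa, tolerances))  # [(0, 1), (1, 0)]
```

Even though the two made-up tables are in different orders and no value matches exactly, each row finds its counterpart. Because the matcher is greedy, its result can depend on row order; an optimal one-to-one assignment (for example, the Hungarian algorithm) would avoid that at a higher computational cost.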
2) Flare Forecasting
After significant preprocessing, we were able to get a good enough dataset to feed to our deep learning pipeline.
We wanted to predict whether an Active Region would flare within six hours of its observation.
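One way to turn this question into binary training labels is sketched below. It assumes that each observation has a timestamp and that each Active Region has a list of flare start times. The function `flare_label` is hypothetical, and the choice of window boundaries (strictly after the observation, up to and including the six-hour mark) is an illustrative assumption, not necessarily how Pythia defines its labels.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Forecast horizon from the text: "within six hours of its observation".
FORECAST_WINDOW = timedelta(hours=6)


def flare_label(
    observed_at: datetime,
    flare_starts: Iterable[datetime],
    window: timedelta = FORECAST_WINDOW,
) -> int:
    """Return 1 if any flare of this Active Region starts in
    (observed_at, observed_at + window], otherwise 0."""
    end = observed_at + window
    return int(any(observed_at < t <= end for t in flare_starts))


# Hypothetical timestamps for a single Active Region.
obs = datetime(2000, 1, 1, 12, 0)
print(flare_label(obs, [datetime(2000, 1, 1, 15, 30)]))  # 1: flare 3.5 h later
print(flare_label(obs, [datetime(2000, 1, 1, 19, 0)]))   # 0: flare 7 h later
print(flare_label(obs, []))                               # 0: no flares at all
```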
We drew inspiration from architectures popular in the flare-forecasting literature, though, to our knowledge, our approach was the first to combine modern Deep Learning techniques in building its Convolutional Nets.
With some PyTorch magic, we reached a prediction accuracy of around 84% on the test set for this binary classification (will the Active Region flare or not?). A machine learning study of flare forecasting over a fixed time window from observation is unique in itself.
We also implemented an Autoencoder to get a low-dimensional representation of the Active Regions, so that it could be combined with other scalar measurements.
We were also curious whether the absolute orientation of the Active Regions with respect to the Sun had any bearing on their flaring activity.
While we expected a connection, we were surprised to see that orientation mattered more for Active Regions that did flare than for those that did not. More work is required before we can state our findings on this front quantitatively.
What began as a project to analyze the Sunspotter dataset has grown far beyond its original scope.
With the power of PyTorch Lightning and SunPy, Pythia is shaping up to be a general-purpose Deep Learning toolkit for Solar Physics.
The barrier to entry for using Deep Learning in Solar Physics research is quite high for researchers who lack a Deep Learning background, or the time to learn the many nuances of modern Deep Learning frameworks.
With Pythia, we plan to provide a scikit-learn-like interface, with the power and muscle of PyTorch and the elegance and order of PyTorch Lightning.
Although the expansion and generalization are still underway, with more use cases and help from the community, Pythia will surely help make modern Deep Learning more accessible to the Solar Physics community.
“Roads go ever ever on
Under cloud and under star,
Yet feet that wandering have gone
Turn at last to home afar.
Eyes that fire and sword have seen
And horror in the halls of stone
Look at last on meadows green
And trees and hills they long have known.”
Unlike Bilbo’s Dragon, mine isn’t evil like Smaug.
It isn’t greedy for gold, nor too large, nor too scary.
It tells me it prefers gardening over desolating lands as a hobby.
And so far, it has been a fairly steady participant in our friendship.
A little secret that not many know is that my dragon doesn’t have a physical existence.
I confide in my reader and trust them to keep this secret, a secret.
The Dragon I am talking about, of course, is this newfound sense of spirit and the freedom that I have acquired over this summer.
This summer has been magical for me.
This summer, I worked on what I would call a dream project, using AI in Natural Sciences.
I interacted with scientists from NASA, UCL, Trinity, etc., all of whom are heroes to me.
I co-founded an Open Source organization, The WildfirePy Project, to do my part in studying, understanding, and preventing wildfires.
And finally, I got my first job! Becoming financially independent, and working on projects that make a difference!
Even in my wildest dreams, I never imagined such fortune and serendipity.
In all honesty, I’m not sure if the Dragon will stay, or fly away.
I haven’t yet started counting on its company when I need it.
But I hope it stays. :)
GSoC, and Open Source in general, have been life-changing for me.
More than anything, they have given me a way to express myself, my love for science, my love for code, and my desire to be part of something bigger than myself.
And to act on it.
Open source is nourishment to my soul, and I shall forever be grateful to Google Summer of Code for helping me get started with it.
And to SunPy for making my life more luminous!
“The Road goes ever on and on
Down from the door where it began.
Now far ahead the Road has gone,
And I must follow, if I can,
Pursuing it with eager feet,
Until it joins some larger way
Where many paths and errands meet.
And whither then? I cannot say.”
Well, I wouldn’t know.
I am just getting started!