GSOC 2020: The Coding period commences!
The coding period has started. I had a couple of meetings with my mentors and a few full community meetings, where I discussed what I was working on and what still needed to be done.
The major work this fortnight was refactoring the QueryResponse tables of the Dataretriever clients in PR #4213. This lets the simple clients show more metadata, such as SatelliteNumber, Detector and Level, in their response tables. All of this information is extracted by the parser from the URL of each matching file.
What did it change? Earlier, things looked like this:
And now:
Those np.nan Wavelength values annoyed me the most. Now the response table not only shows the correct wavelengths but also reflects the other details. All columns that were not relevant to a client were removed.
Was that all? No. We needed to find better ways of implementing the same feature. Earlier I used a _get_metadata_for_url method to extract all details from the URL, and it was implemented separately for every client. Following suggestions from my mentors, I implemented it in a better way: extracting all the information in the scraper itself. After completing all the tests, we discovered that there could be an even better way of doing this: removing the client-specific _get_url_for_timerange() method altogether! I used the registered attrs to achieve this. All attrs are iterated over to build the list of all possible directories, and then the only thing the scraper has to do is pattern matching.
The idea is to link the scraper and GenericClient closely so that each client class needs minimal client-specific code. I’ll push the changes once I fix the tests that fail because of this change and add documentation for the API.
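To illustrate the pattern-matching idea described above, here is a minimal sketch using only Python's re module. It is not the SunPy scraper or its parser: the URL layout, the example.org host and the extract_metadata helper are all hypothetical, chosen only to show how named groups in a pattern can yield fields like SatelliteNumber, Level, Detector and Wavelength directly from a file URL.

```python
import re

# Hypothetical URL layout (an assumption, not a real archive).
# Each named group corresponds to one metadata column in the response table.
PATTERN = re.compile(
    r"https://example\.org/goes(?P<SatelliteNumber>\d{2})/"
    r"(?P<Level>l\d[a-z]?)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<Detector>[a-z]+)_(?P<Wavelength>\d+)\.fits"
)


def extract_metadata(url):
    """Return a dict of metadata parsed from the URL, or None if it does not match."""
    match = PATTERN.fullmatch(url)
    if match is None:
        return None
    return match.groupdict()


url = "https://example.org/goes16/l2/2020/06/15/suvi_171.fits"
meta = extract_metadata(url)
print(meta["SatelliteNumber"], meta["Detector"], meta["Wavelength"])
```

With a single pattern per client, a client class only has to declare its pattern, and the matching code can live in one shared place.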
All four ground client PRs were closed after a discussion with the SunPy and VSO communities. I updated my GONG Synoptic Client pull request and have resolved all reviews so far. This will enable SunPy to access the magnetogram synoptic map archives from NSO-GONG; the issue was originally opened in pfsspy. I also recently worked on a fix for the wrong GOES satellite number issue in PR #4288. Using **kwargs in the _get_overlap_urls method fixed the bug.
Other PRs were also opened and updated in this period, and they were merged before SunPy’s 2.0 release. In PR #4099, I reduced the time for a goes_suvi client test from 8–10 seconds to 1.5–2 seconds on my system. I had to explore why the scraper took so much time for the test. Another one, PR #4132, prevents a future bug in the scraper’s filelist method: it now checks whether the href of any <a> tag in a webpage is None.
PR #4011 was also updated; it will restore the ability to post-search filter the responses from VSO. I also went through the JSOC codebase and fido_factory.py to understand the complexities of implementing the Fido post-search filter in SunPy. It is the next target in my project. Just as a glimpse, this is how the VSO response will look after post-search filtering. I have also added a concatenation routine by overloading the + operator.
I’m enjoying my summer now. Whenever I face difficulties, I talk with my mentors to get them resolved. I faced some connectivity issues due to thunderstorms in my city, but now everything is back to normal. The weather is pleasant now, so I can engage more!
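The href check mentioned above can be sketched with the standard library alone. This is not the code from PR #4132 or SunPy's filelist method. LinkCollector and list_files are hypothetical names, and html.parser is used here only to keep the example self-contained. The point is that an anchor tag may have no href, or an href with no value, and both cases must be skipped before any string method is called on it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors without a usable href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # A missing href, or a bare href with no value, both yield None here.
        href = dict(attrs).get("href")
        if href is not None:
            self.links.append(href)


def list_files(html, suffix=".fits"):
    """Return the hrefs in the page that end with the given suffix."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.links if href.endswith(suffix)]


page = '<a name="top">Top</a><a href="a.fits">a</a><a href="readme.txt">r</a><a href>x</a>'
print(list_files(page))
```

Without the None check, calling endswith on the anchor that has no href would raise an AttributeError.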
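As a rough illustration of concatenation via the + operator combined with post-search filtering, here is a toy sketch. ResponseTable and its filter method are hypothetical and far simpler than the VSO response classes in SunPy. Rows are plain dicts and the filter only tests for equality.

```python
class ResponseTable:
    """Toy stand-in for a query response: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __add__(self, other):
        # Concatenate two tables into a new one; leave both operands untouched.
        if not isinstance(other, ResponseTable):
            return NotImplemented
        return ResponseTable(self.rows + other.rows)

    def filter(self, **conditions):
        """Keep only rows whose columns equal every given condition."""
        kept = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]
        return ResponseTable(kept)


aia = ResponseTable([
    {"Instrument": "AIA", "Wavelength": 171},
    {"Instrument": "AIA", "Wavelength": 193},
])
eit = ResponseTable([{"Instrument": "EIT", "Wavelength": 171}])

combined = aia + eit                      # uses __add__
subset = combined.filter(Wavelength=171)  # post-search filter
print(len(combined), len(subset))
```

Returning NotImplemented for unsupported operands lets Python raise a TypeError on its own instead of failing somewhere inside the method.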
Looking forward to making more PRs in the next fortnight!
CARPE NOCTEM!