GSoC 2020: Generalization of Clients
This fortnight, I iterated over different designs for the Dataretriever clients so that their implementation can be simpler and more general. Generalization here means that subclasses inherit most of their methods from the base class, which minimizes the number of near-duplicate methods in the subclasses.
Show Method for QueryResponse
I got PR #4309 merged, which resolved the long-standing Issue #556. A simple show() method in ~sunpy.net.base_client.BaseQueryResponse lets QueryResponse objects for the Dataretriever, VSO, and JSOC clients specify which columns are shown in the result.
This returns an ~astropy.table.Table instance, so table operations can also be easily performed on the result.
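To illustrate the idea of column selection, here is a minimal sketch in plain Python. TinyResponse and its rows are hypothetical stand-ins; the real show() lives on BaseQueryResponse and returns an astropy Table rather than a list of dicts.

```python
class TinyResponse:
    """Hypothetical stand-in for a query response; not sunpy's class."""

    def __init__(self, rows):
        # Each row is a dict mapping column name -> value.
        self.rows = list(rows)

    def show(self, *cols):
        """Return only the requested columns, or all columns if none are given."""
        if not cols:
            return [dict(row) for row in self.rows]
        return [{col: row[col] for col in cols} for row in self.rows]


response = TinyResponse([
    {"Start Time": "2020-01-01 00:00", "Instrument": "EVE", "Level": 0},
    {"Start Time": "2020-01-02 00:00", "Instrument": "EVE", "Level": 0},
])
selected = response.show("Start Time", "Instrument")
```

Here selected keeps the two requested columns for every row, while calling show() with no arguments returns every column.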
_extract_files_meta method in Scraper
PR #4313 was merged into sunpy:master; it allows the scraper to extract the metadata stored in file URLs. This method will prove very useful for refactoring the whole ~sunpy.net.dataretriever submodule.
A new module, parse, was added to ~sunpy.extern. It lets us specify an extractor that parses a file URL and returns a dict containing the values of attrs such as Wavelength, Time, and Level. The method also returns the URL itself, which means the clients' fetch() methods need no changes.
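The idea of pulling metadata out of a file URL can be sketched with the standard library alone. The snippet below uses re with named groups as a stand-in for the parse-style extractor; the URL layout, the pattern, and the extract_meta helper are made up for illustration and are not the scraper's actual API.

```python
import re

# Hypothetical URL layout: .../<yyyymmdd>_EVE_L<level>.txt
PATTERN = re.compile(
    r".*/(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_EVE_L(?P<Level>\d)\.txt$"
)


def extract_meta(url, pattern=PATTERN):
    """Return a dict of the values encoded in the URL, or None if it does not match."""
    match = pattern.match(url)
    if match is None:
        return None
    # Every named group in this pattern is numeric, so convert them all to int.
    meta = {key: int(value) for key, value in match.groupdict().items()}
    # Keep the URL next to the metadata so a later download step can use it.
    meta["url"] = url
    return meta


meta = extract_meta("https://example.org/data/2020/20200615_EVE_L0.txt")
```

Returning the URL together with the extracted values mirrors the point made above: whatever downloads the files can keep working from the URL and never needs to know how the metadata was obtained.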
Playing with post filters
Last fortnight I was working on post-filters and the concatenation of responses for VSO. Last week I dabbled a bit with post-filters for the attrs used in all net clients. Applying a single-dispatch decorator to filter_results enabled post-filtering in Dataretriever and VSO. It is pretty similar to the way it is done for ~sunpy.net.vso.attrs.
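As a rough sketch of how single dispatch can drive post-filtering, the example below uses functools.singledispatch from the standard library. The Level and Instrument classes, the row layout, and the filter bodies are assumptions made for illustration; they are not sunpy's attrs or its actual filter_results implementation.

```python
from functools import singledispatch


class Level:
    """Hypothetical attr holding a data level."""

    def __init__(self, value):
        self.value = value


class Instrument:
    """Hypothetical attr holding an instrument name."""

    def __init__(self, value):
        self.value = value


@singledispatch
def filter_results(attr, rows):
    """Fallback: anything without a registered filter leaves the rows untouched."""
    return rows


@filter_results.register(Level)
def _(attr, rows):
    # Keep only rows whose level equals the requested one.
    return [row for row in rows if row["Level"] == attr.value]


@filter_results.register(Instrument)
def _(attr, rows):
    # Instrument names are compared case-insensitively.
    return [row for row in rows if row["Instrument"].lower() == attr.value.lower()]


rows = [
    {"Instrument": "EVE", "Level": 0},
    {"Instrument": "EVE", "Level": 1},
    {"Instrument": "LYRA", "Level": 1},
]
level_one = filter_results(Level(1), rows)
eve_level_one = filter_results(Instrument("eve"), level_one)
unchanged = filter_results("not an attr", rows)
```

The dispatch happens on the type of the first argument, so supporting a new attr means registering one more small function instead of extending a chain of isinstance checks.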
Redesigning GenericClient
Draft PR #4321 holds the work in progress on the new, generalized implementation. QueryResponseBlock has been removed from client.py, since a few changes to QueryResponse let us do everything with a dictionary. Similarly, many methods were removed or changed with the aim of simplifying the two generic classes.
Not only the base class but also the client classes were simplified. A simple client only needs to define its required attrs, optional attrs, a baseurl, and an extractor, which together make search work.
After this refactoring, the example dataretriever source client class would look something like this:
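The sketch below is an illustration of that shape, not the code from the draft PR. The GenericClient stand-in, the register_values name, the regex-based extractor, and the example.org URLs are all assumptions chosen to keep the snippet self-contained.

```python
import re


class GenericClient:
    """Hypothetical, heavily simplified stand-in for the generic base class."""

    baseurl = ""            # common URL prefix of the client's files
    extractor = ""          # regex whose named groups are the attrs found in a URL
    required = frozenset()  # attrs every query must provide
    optional = frozenset()  # attrs a query may provide

    @classmethod
    def register_values(cls):
        """Return {attr name: supported values}; subclasses override this."""
        return {}

    @classmethod
    def search(cls, urls, **query):
        """Return the metadata of every URL that matches the query."""
        missing = cls.required - set(query)
        unknown = set(query) - cls.required - cls.optional
        if missing or unknown:
            raise ValueError(f"missing={sorted(missing)}, unknown={sorted(unknown)}")
        supported = cls.register_values()
        for name, value in query.items():
            if name in supported and value not in supported[name]:
                raise ValueError(f"{name}={value!r} is not supported")
        pattern = re.compile(cls.extractor)
        results = []
        for url in urls:
            match = pattern.match(url) if url.startswith(cls.baseurl) else None
            if match is None:
                continue
            meta = match.groupdict()
            # Keep the file only when every queried attr agrees with the URL.
            if all(meta.get(name) == value for name, value in query.items()):
                results.append({**meta, "url": url})
        return results


class DemoClient(GenericClient):
    """Everything a simple client has to define in this sketch."""

    baseurl = "https://example.org/data/"
    extractor = r".*/(?P<Instrument>[a-z]+)_L(?P<Level>\d)_(?P<Date>\d{8})\.fits$"
    required = frozenset({"Instrument"})
    optional = frozenset({"Level"})

    @classmethod
    def register_values(cls):
        return {"Instrument": ("demo",), "Level": ("0", "1")}


urls = [
    "https://example.org/data/demo_L0_20200601.fits",
    "https://example.org/data/demo_L1_20200601.fits",
    "https://example.org/other/readme.txt",
]
found = DemoClient.search(urls, Instrument="demo", Level="1")
```

All of the search logic sits in the base class; the subclass only contributes data.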
Only a class method and a few class attributes are sufficient for defining a simple DR client!
Hooks for translating attrs
Some clients deviate from the generalization. For those clients, my mentors and I decided in a meeting to design post-hooks and pre-hooks for the scraper that translate the attrs provided in a search into their representation in the URL. While working on this, I discovered a few bugs in fermi_gbm and other clients that the scraper hooks should address. Responses for detector numbers 10 and 11 were never returned, because in the URL they are represented by na and nb respectively. Translators will fix this as part of a pre-hook, before the query is passed to the scraper.
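A translator of this kind could look roughly like the sketch below. The function names and the query layout are hypothetical, and the tokens n0 to n9 for detectors 0 to 9 are an assumption; only the mapping of 10 to na and 11 to nb comes from the description above.

```python
def detector_to_url_token(number):
    """Translate a detector number (0-11) into the token assumed to appear in URLs."""
    if not 0 <= number <= 11:
        raise ValueError(f"unknown detector number: {number}")
    # Hexadecimal formatting writes 10 as "a" and 11 as "b", giving "na" and "nb".
    return "n" + format(number, "x")


def pre_hook(query):
    """Return a copy of the query with attrs rewritten as they appear in URLs."""
    translated = dict(query)
    if "Detector" in translated:
        translated["Detector"] = detector_to_url_token(translated["Detector"])
    return translated


translated = pre_hook({"Detector": 10, "Instrument": "gbm"})
```

Because the hook returns a new dict, the user's original query stays untouched and the scraper only ever sees URL-ready values.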
Moving Rhessi out from Generic
Rhessi doesn't fit the generalization, because its metadata can't simply be extracted from the file URL, so it is being implemented as a subclass of base_client.
Every week we move closer and closer to Generalization :). I am enjoying the meetings where my mentors and I discuss the pros and cons of different designs of GenericClient!