Ideas page for GSoC 2015
For each participating project, the ideas are organized from easiest to hardest.
Astropy core package
Implement Distribution Support for Quantity
Suggested Mentor(s): Erik Tollerud
Difficulty: Beginner to Intermediate
Astronomy knowledge needed: none, but statistics knowledge/background useful
Programming skills: Python
Description
The Quantity class is powerful but doesn’t have particularly useful support for uncertainties on quantities or other statistical approaches to thinking about numbers. A very straightforward way to make progress on this would be to create a subclass of Quantity called “Distribution” (or similar) that represents a probability density function of a quantity as Monte-Carlo-sampled arrays. This project would involve implementing this subclass, propagating operations while combining distributions, as well as tools for extracting useful information from such distributions. If there is time, this could also involve expanding this system to support common analytically-representable distributions such as Gaussian and Poisson distributions.
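As a rough illustration of the Monte-Carlo idea, the sketch below represents each uncertain value as an array of samples and propagates operations sample by sample. It uses plain NumPy rather than Quantity, so units are ignored, and the class name `SampledDistribution` and all of its methods are hypothetical rather than an existing Astropy API.

```python
import numpy as np


class SampledDistribution:
    """Hypothetical stand-in: a value represented by Monte Carlo samples."""

    def __init__(self, samples):
        self.samples = np.asarray(samples, dtype=float)

    @classmethod
    def normal(cls, mean, std, n_samples=100_000, seed=None):
        # Draw samples from a Gaussian with the given mean and standard deviation.
        rng = np.random.default_rng(seed)
        return cls(rng.normal(mean, std, n_samples))

    def _combine(self, other, op):
        # Operations act sample by sample, which propagates the uncertainty.
        if isinstance(other, SampledDistribution):
            other = other.samples
        return SampledDistribution(op(self.samples, other))

    def __add__(self, other):
        return self._combine(other, np.add)

    def __mul__(self, other):
        return self._combine(other, np.multiply)

    def mean(self):
        return self.samples.mean()

    def std(self):
        return self.samples.std(ddof=1)

    def percentile(self, q):
        return np.percentile(self.samples, q)


a = SampledDistribution.normal(10.0, 0.3, seed=1)
b = SampledDistribution.normal(5.0, 0.4, seed=2)
total = a + b  # independent errors should add in quadrature: about 0.5
```

Summary tools such as `mean`, `std` and `percentile` then read directly off the sample array, with no error-propagation formulae required.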
Implement image rasterization methods for models
Suggested Mentor(s): Christoph Deil
Difficulty: Intermediate
Astronomy knowledge needed: Basic
Programming skills: Python, Cython
Description
When fitting models to binned data, evaluating the model at the bin centers leads to incorrect results if the model changes significantly within a bin. For example, think of an image where the point spread function (PSF) is only slightly wider than the pixel size and you want to distinguish small galaxies from stars.
Currently, Astropy models have an evaluate method that can be used to evaluate them on a grid of pixel centers, and there’s also an oversampling function to get a better representation of the expected flux in pixels. It would be useful to add methods that allow fast and precise rasterization of models, similar to what graphics libraries do (sparse subsampling or resampling of models evaluated on grids that are appropriate for each model, or anti-aliasing).
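To make the difference concrete, here is a minimal 1D sketch that compares evaluating a narrow Gaussian at the pixel centers with averaging it over sub-pixel positions. It is plain NumPy, not astropy.modeling code, and the function names and the oversampling factor are assumptions chosen for illustration.

```python
import numpy as np


def gaussian(x, amplitude=1.0, mean=0.0, stddev=0.4):
    # A narrow profile: the width is well below the pixel size of 1.
    return amplitude * np.exp(-0.5 * ((x - mean) / stddev) ** 2)


def evaluate_centers(model, centers):
    """Evaluate the model once per pixel, at the pixel center."""
    return model(centers)


def evaluate_oversampled(model, centers, bin_width=1.0, factor=10):
    """Average the model over `factor` evenly spaced sub-pixel positions."""
    # Offsets are the centers of `factor` equal sub-bins, in units of a pixel.
    offsets = (np.arange(factor) + 0.5) / factor - 0.5
    sub_x = centers[:, np.newaxis] + offsets[np.newaxis, :] * bin_width
    return model(sub_x).mean(axis=1)


centers = np.arange(-3.0, 4.0)  # pixel centers at -3, -2, ..., 3
at_centers = evaluate_centers(gaussian, centers)
oversampled = evaluate_oversampled(gaussian, centers)

true_flux = 0.4 * np.sqrt(2.0 * np.pi)  # analytic integral of the Gaussian
```

In this setup the summed oversampled image recovers the analytic flux closely, while the center-only evaluation overestimates it because the peak pixel is assigned the peak value rather than the pixel average.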
There are different options for how to proceed with this project, e.g. possibly add optional extension, sampling grid and bounding box information to the Astropy model classes, or contribute rasterisation code to astropy.modeling or photutils, or expand the existing resampling code in the reproject package. The student should be interested in model fitting and image rasterisation as well as profiling and extensive testing of a given method to make it “just work” for the end user.
Add indexing capability to Table object
Suggested Mentor(s): Tom Aldcroft (Astropy), Stuart Mumford (SunPy)
Difficulty: Intermediate
Astronomy knowledge needed: none
Programming skills: Python, Cython, familiarity with database algorithms
Description
The Table class is the core astropy class for storing and manipulating tabular data. Currently it supports a limited set of database-like capabilities including table joins and grouping. A natural extension of this is to provide the ability to create and maintain an index on one or more columns as well as a table primary key. With these indexed columns available then certain selection and query operations could be highly optimized. The challenge is to maintain the integrity of the indexes as column or table properties change, using state of the art algorithms for high performance.
There are various uses of this functionality, such as supporting time series data, where the index column would allow you to sort the Table correctly as well as performing operations such as truncations and merges while maintaining the integrity of the time series. Other uses include catalogs of positions in the night sky where an index column of astropy coordinate objects would maintain the uniqueness of every position.
To summarize:
- Add method to create an index for a specified column
- Add code to maintain these indexes when the table is modified
- Add method to designate a column as a primary key (possibly maintaining table in sort order for that key)
- Optimize existing table operations to use indexes where possible
- Add new methods to select table rows based on column values
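The core of these steps can be sketched with a sorted list of (value, row) pairs and binary search. This is only a toy illustration using the standard library; the class `ColumnIndex` and its methods are hypothetical and do not reflect how Table would actually implement indexing.

```python
import bisect


class ColumnIndex:
    """Hypothetical sorted index over one column of a table."""

    def __init__(self, values):
        # Store (value, row_number) pairs sorted by value.
        self._entries = sorted((v, i) for i, v in enumerate(values))

    def add(self, value, row):
        """Keep the index consistent when a row is appended to the table."""
        bisect.insort(self._entries, (value, row))

    def rows_between(self, low, high):
        """Row numbers whose value lies in the closed interval [low, high]."""
        start = bisect.bisect_left(self._entries, (low, -1))
        stop = bisect.bisect_right(self._entries, (high, float("inf")))
        return [row for _, row in self._entries[start:stop]]

    def rows_equal(self, value):
        return self.rows_between(value, value)


times = [5.0, 1.0, 3.0, 3.0, 9.0]
index = ColumnIndex(times)
index.add(4.0, 5)  # a new row 5 with time 4.0 was appended to the table

in_range = index.rows_between(3.0, 5.0)
matches = index.rows_equal(3.0)
```

Range and equality selections then cost a binary search rather than a scan of the full column. The hard part named above, keeping the index valid under arbitrary column and table modifications, is not addressed by this sketch.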
Unify and improve file handling
Suggested Mentor(s): Michael Droettboom
Difficulty: Intermediate to Expert
Astronomy knowledge needed: none
Programming skills: Python, Unix features
Description
We have a number of packages that read and write data to files and file-like objects. While there was some initial effort to unify this code in get_readable_fileobj and others, in general each package is handling its own file I/O. This sort of code is notoriously difficult to get right across versions of Python and the different platforms we support, so it would be beneficial to remove this duplication. This also means that some features, such as gzip handling or URL handling, are not universally available or are inconsistent across packages. Once this is unified, we can move on to some more advanced features that don’t exist anywhere in astropy, such as HTTP Range fetching (see astropy/#3446), and OS-level file locking to make multiprocessing applications that write to files more robust.
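As an example of the kind of logic that would live in one shared place, the sketch below opens either a filename or a file-like object and transparently decompresses gzip data by checking the two-byte gzip magic number. The helper `open_readable` is hypothetical and is not the implementation of get_readable_fileobj; it also assumes the input is seekable.

```python
import contextlib
import gzip
import io


@contextlib.contextmanager
def open_readable(name_or_obj):
    """Hypothetical helper: yield a binary file object, decompressing gzip."""
    opened_here = isinstance(name_or_obj, str)
    raw = open(name_or_obj, "rb") if opened_here else name_or_obj
    try:
        magic = raw.read(2)
        raw.seek(0)  # assumes a seekable stream
        if magic == b"\x1f\x8b":  # gzip magic number
            with gzip.GzipFile(fileobj=raw, mode="rb") as fileobj:
                yield fileobj
        else:
            yield raw
    finally:
        # Only close files this helper opened itself.
        if opened_here:
            raw.close()


payload = gzip.compress(b"SIMPLE  =  T")
with open_readable(io.BytesIO(payload)) as f:
    compressed_result = f.read()
with open_readable(io.BytesIO(b"plain text")) as f:
    plain_result = f.read()
```

Callers get the same interface whether the data is compressed or not, which is the consistency the project is after.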
Implement missing astropy.modeling functionality
Suggested Mentor(s): Christoph Deil
Difficulty: Intermediate to expert
Astronomy knowledge needed: Basic
Programming skills: Python
Description
Implement some basic features that are still missing in the astropy.modeling package:
- Fit parameter errors (symmetric and profile likelihood)
- Poisson fit statistic
- PSF-convolved models
- Model parameter and fit result serialisation, e.g. to YAML, JSON or XML (some astronomers use XML)
For the parameter error and Poisson fit statistic part some statistics background is needed, as well as interest in discussing and finding a good API for these things.
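One common choice of Poisson fit statistic is the Cash statistic, C = 2 Σ (μ_i − n_i ln μ_i), where n_i are the observed counts and μ_i the model predictions. The text above does not say which statistic is intended, so this choice is an assumption, and the function below is a standalone sketch rather than an astropy.modeling API.

```python
import math


def cash_statistic(counts, model):
    """Cash statistic C = 2 * sum(mu - n * ln(mu)); the ln(n!) term is dropped."""
    total = 0.0
    for n, mu in zip(counts, model):
        if mu <= 0:
            raise ValueError("model prediction must be positive")
        total += mu - n * math.log(mu)
    return 2.0 * total


counts = [3, 0, 5, 2]


def constant_model(rate):
    # Simplest possible model: the same expected count in every bin.
    return [rate] * len(counts)


# Brute-force scan over the rate; a real fitter would use an optimizer.
rates = [0.5 + 0.01 * i for i in range(500)]
best_rate = min(rates, key=lambda r: cash_statistic(counts, constant_model(r)))
```

For a constant model the minimum falls at the mean of the counts, which is a handy sanity check for any implementation.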
An optional fun application at the end of this project (if model and fit result serialisation is implemented) could be to develop an interactive image fitting GUI (e.g. with IPython widgets in the web browser) for common 2D Astropy models, showing data, model and residual images and letting the user adjust model parameters and display fit statistics and results interactively.
Implement framework for handling velocities and velocity transforms in astropy.coordinates
Suggested Mentor(s): Adrian Price-Whelan & Erik Tollerud
Difficulty: Intermediate to Expert
Astronomy knowledge needed: understanding of coordinate transformations, some knowledge of astronomical coordinate systems would be useful
Programming skills: Python
Description
The coordinates subpackage currently only supports transforming positional coordinates, but it would be useful to develop a consistent framework for also transforming velocities (e.g., proper motion to proper motion, or proper motion to cartesian) with full support for barycentric, galactocentric, and LSR motion. This project could be:
- working with us to develop a consistent API for handling velocities within coordinates,
- developing a trial implementation of an API,
- actually doing core development to implement the new features, or
- some combination of all of the above.
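For a sense of the simplest building block, the sketch below converts a proper motion and a distance into tangential velocity components. It covers only this one conversion, with no frame transformations or barycentric, galactocentric or LSR corrections, and the function name and argument conventions are assumptions rather than a proposed API.

```python
import math

AU_KM = 149_597_870.7           # astronomical unit in km
JULIAN_YEAR_S = 365.25 * 86400  # Julian year in seconds

# A proper motion of 1 arcsec/yr at a distance of 1 pc corresponds to a
# transverse motion of 1 AU per year, so this is the conversion factor to km/s.
K = AU_KM / JULIAN_YEAR_S


def tangential_velocity(pm_ra_cosdec, pm_dec, distance_pc):
    """Tangential velocity components in km/s.

    Proper motions are in arcsec/yr (the RA component already multiplied by
    cos(dec)) and the distance is in parsec.
    """
    return K * pm_ra_cosdec * distance_pc, K * pm_dec * distance_pc


v_ra, v_dec = tangential_velocity(0.1, -0.05, 20.0)
speed = math.hypot(v_ra, v_dec)
```

A full framework would attach units to these values and carry them through the same transformation graph that positions already use.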
Implement Public API for ERFA
Suggested Mentor(s): Erik Tollerud
Difficulty: Intermediate to Expert
Astronomy knowledge needed: None required, but may be helpful for understanding ERFA functionality
Programming skills: Python, Cython, C
Description
Some of the major functionality for Astropy uses the ERFA C library (adapted from the IAU SOFA library) as the back-end for computational “heavy-lifting”. Members of the community have expressed a desire to use this lower-level python wrapper around ERFA for other purposes that may not be directly relevant for Astropy. So this project would involve making the necessary changes to make the ERFA python API public. This includes:
- Getting the documentation up to the astropy standard (currently it is mostly auto-generated verbatim from the C comments).
- Implementing a more complete test suite for the python side of the code.
- Possibly moving it to a separate package as part of the liberfa GitHub organization. This would also include making the necessary changes to ensure everything continues to work in Astropy.
- Any other steps necessary to ensure the resulting package (or sub-package of Astropy) is stable and relatively easy to use.
Packages affiliated with Astropy
Develop an affiliated package for observation planning / scheduling
Suggested Mentor(s): Christoph Deil
Difficulty: Beginner
Astronomy knowledge needed: Intermediate
Programming skills: Python
Description
Now that Astropy can transform from horizontal (altitude/azimuth) to sky coordinates it’s possible to develop tools for observation planning / scheduling (see here for an example). It would be nice to start developing an affiliated package that can be used by observers and observatories to plan and schedule observations. This project could go in a few different directions, including:
- creating typical tables and plots for observation planning
- optimising scheduling of observations for given target lists and telescope slew speed / exposure lengths for a given night or even month / year
- contributing sun / moon rise / set functionality to astropy coordinates
- a desktop or web GUI
The project could start with a look at the functionality of existing tools and then gather input on the astropy mailing list about what the community wants. The student should have an interest in coordinates, observation planning / scheduling and plotting / GUIs.
Contribute gamma-ray data analysis methods to Gammapy
Suggested Mentor(s): Christoph Deil, Axel Donath
Difficulty: Beginner to intermediate
Astronomy knowledge needed: Basic
Programming skills: Python
Description
Gammapy is an Astropy-affiliated package to simulate and analyse data from gamma-ray telescopes such as Fermi, H.E.S.S. and CTA. A lot of basic functionality is still missing. Specifically, we think that contributing to one of the sub-packages gammapy.background (background modeling), gammapy.detect (source detection methods) or gammapy.spectrum (spectral analysis methods) would be a good GSoC project if you are interested in implementing specific established data analysis algorithms used in gamma-ray astronomy (e.g. adaptive-ring, reflected region or template background estimation, or spectrum forward-folding or unfolding methods). No prior gamma-ray data experience or knowledge is needed.
Astropy Acknowledgement/Citation Generator
Suggested Mentor(s): Erik Tollerud
Difficulty: Beginner to Intermediate
Astronomy knowledge needed: none, although some experience with astronomy citation practices might be useful
Programming skills: Python and LaTeX/BibTeX
Description
Some parts of Astropy and affiliated packages use algorithms or tools that have been published in the scientific literature (this includes Astropy itself). To encourage citing these works, it would be useful if Astropy had a feature to allow attaching citations to methods, functions, or packages. This would then allow a user to simply run a function along the lines of “write_citations” and have it print or write a file that tells them what papers to cite. Bonus points if this actually can show BibTeX or LaTeX bibliography entries that can be just dropped into papers with minimal effort on the part of the user.
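One possible shape for such a feature is a decorator that records a reference whenever the decorated function is actually called, plus a function that reports what was collected. Everything in this sketch is hypothetical: the `cite` decorator, the module-level registry and the placeholder reference strings are illustrations only, and `write_citations` simply borrows the name suggested above.

```python
import functools

_CITATIONS = {}  # function name -> reference, filled in as functions are used


def cite(reference):
    """Hypothetical decorator attaching a reference to a function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Record the citation only when the function is really called.
            _CITATIONS[func.__qualname__] = reference
            return func(*args, **kwargs)
        return wrapper
    return decorator


def write_citations():
    """Return one line per function used in this session."""
    return "\n".join(f"{name}: {ref}" for name, ref in sorted(_CITATIONS.items()))


@cite("Placeholder reference A (not a real paper)")
def fancy_algorithm(x):
    return 2 * x


@cite("Placeholder reference B (not a real paper)")
def unused_algorithm(x):
    return x


fancy_algorithm(21)
report = write_citations()
```

Recording at call time means the report lists only what the user's analysis really touched. Storing BibTeX entries instead of plain strings would cover the bonus goal.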
Adding further spectral standards to specutils
Suggested Mentor(s): Adam Ginsburg & Wolfgang Kerzendorf
Difficulty: Intermediate
Programming skills: Python
Description
Specutils is a package within the astropy collection that deals with operations on spectra. Apart from imaging, spectra are the second main data product in astronomy. While imaging data is collected by hooking a giant DSLR to the end of a telescope and sticking coloured glass (a filter) between the telescope and the DSLR, spectra are obtained by breaking light up into its components and then observing the resulting distribution. These data are saved in a variety of formats.
Currently, we are able to read and write a subset of the standards that are out there. As a project, we suggest implementing the remaining unsupported standards. All of the code is in Python and a good understanding of classes is needed for this project.
Improve pyregion and pyds9
Suggested Mentor(s): Christoph Deil
Difficulty: Intermediate
Astronomy knowledge needed: Basic
Programming skills: Python
Description
The pyregion package is very useful for working with ds9 and CIAO region files. It is now at https://github.com/astropy/pyregion but it is unfinished, and someone has to improve and polish it. In particular, the region file parser is very slow (see pyregion#48), and someone interested in parsing should find out why and make it fast. There are several other things to do, e.g. using astropy coordinates everywhere and implementing tests so that it is compatible with ds9 to a very high accuracy. The package could also be extended with Python functions to read / write / visualise MOC files or to unify and improve the existing Python interfaces to ds9. The student should be interested in sky coordinates and regions, parsing, visualisation, and writing tests and docs; for the ds9 interfaces some Cython coding is probably needed.
Revamp astropython.org web site
Suggested Mentor(s): Tom Aldcroft
Difficulty: Intermediate
Astronomy knowledge needed: Basic / none
Programming skills: Python, web development (JavaScript etc.)
Description
The http://www.astropython.org site is one of the top two generic informational / resource sites about Python in astronomy. This site uses Google App Engine and is basically all custom code built around the bloggart engine. Currently it is getting a bit stale for a few reasons:
- There is no good mechanism for guest posting to expand the community of people contributing.
- It is painful to add content because of the antiquated entry interface, which now seems to work only on Firefox.
- The comment system is lacking (no feedback to comment authors etc.).
- The website code itself is convoluted and difficult to maintain / improve.
The proposal is to start over with all modern tools to bring fresh energy and involvement into this project. All details of how to do this are to be determined, but one requirement is to migrate all the current content. Part of this would be re-evaluating current resources as well as digging around to freshen up the resource list.
ChiantiPy
GUI Spectral Explorer
Suggested Mentor(s): Ken Dere
Difficulty: Intermediate
Astronomy knowledge needed: A basic understanding of astrophysical spectroscopy
Programming skills: Python
Description
The goal of this project is to provide a graphical user interface that enables a user to explore observed spectra and compare them with theoretical spectra. The basis for the theoretical spectra is the CHIANTI atomic database for astrophysical spectroscopy, which was first released in 1997. Programmatic access to the database, which is freely available, is provided by the ChiantiPy package, a pure Python package. It is highly object oriented, with each ion, such as Fe XVII, being the basic object. Higher level objects are often assembled from a collection of ions, such as when calculating a spectrum. ChiantiPy uses the CHIANTI database to calculate line and continuum intensities as a function of temperature and electron density. This can be done for a set of elemental abundances in CHIANTI or for a user-provided set of elemental abundances. At present, if a user wants to compare CHIANTI theoretical spectra with observations, it must be done on a case-by-case basis. A GUI explorer, written in Python and preferably PyQt or Wx based, will provide an integrated tool to import observed spectra and plot them alongside theoretical spectra. It will further allow the user to understand which spectral lines contribute to various spectral line profiles and how the predicted spectra vary as a function of temperature and density.
It will be necessary to develop techniques to import observed spectra from a variety of sources. Typical sources are FITS files, HDF5 files, or CSV files. It will also be important to allow users to import their data through modules of their own.
SunPy
Improvements to the SunPy Database
Suggested Mentor(s): Stuart Mumford, Steven Christe
Difficulty: Beginner
Astronomy knowledge needed: None
Programming skills: Python, some database knowledge would be helpful, but not required.
Description
The database module provides functionality for users to manage collections of files on disk in a way that does not rely on folder structure and file name.
The database allows users to find files on disk either by physical parameters, such as wavelength and time, or by properties of the instrument, such as name and spacecraft.
It also allows more complex queries by enabling searches of the raw meta data associated with the files.
The improvements to the database functionality that would be implemented by this project include:
- Integration of the new UnifiedDownloader code into the database search, to replace the direct VSO integration currently present. (The VSO is a repository of solar physics data; SunPy’s VSO API has been wrapped by UnifiedDownloader.)
- Support for relative paths in the database module #783 to allow a centralised database with multiple users, all referencing a central file store mounted with different absolute paths on each client.
- Supporting all data supported by the sunpy.lightcurve module in the database. The major hurdle here is the lack of standardisation in the files used by this data.
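The relative-path idea can be illustrated with a short sketch: the database stores paths relative to the shared file store, and each client resolves them against its own mount point. The two helper functions and the example mount points are hypothetical and are not part of the sunpy.database API.

```python
from pathlib import PurePosixPath


def to_stored_path(absolute_path, file_store_root):
    """Path to record in the database: relative to the shared file store."""
    return str(PurePosixPath(absolute_path).relative_to(file_store_root))


def to_local_path(stored_path, file_store_root):
    """Resolve a stored relative path against this client's mount point."""
    return str(PurePosixPath(file_store_root) / stored_path)


# One client adds a file; the store is mounted at /mnt/solar on that machine.
stored = to_stored_path("/mnt/solar/aia/2014/image.fits", "/mnt/solar")

# Another client mounts the same store somewhere else.
on_other_client = to_local_path(stored, "/Volumes/data/solar")
```

With this split, the per-client root becomes a configuration value and the database contents stay identical for every user.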
There are various other maintenance tasks which need undertaking (https://github.com/sunpy/sunpy/labels/Database) which would be a good way for someone interested in this project to familiarise themselves with the codebase.
Integrating ChiantiPy and SunPy
Suggested Mentor(s): Dan Ryan, Ken Dere
Difficulty: Beginner
Astronomy knowledge needed: Some knowledge of spectra.
Programming skills: Python.
Description
The CHIANTI atomic physics database is a valuable resource for solar physics. The CHIANTI database holds a large amount of information on the physical properties of different elements in different ionisation states and enables the calculation of various parameters from this information. Using CHIANTI it is possible to calculate the spectra of various types of solar plasma (e.g., flare, quiet sun, etc.) from the observed elemental abundances and ionisation states. These synthetic spectra are essential for comparison with the data observed by various instruments, both to calculate the response functions of the instruments and to derive physical parameters such as temperature from the properties of the observed plasma.
Currently, no SunPy code uses ChiantiPy, the Python interface to the CHIANTI database. This project would develop the routines to be included in SunPy to use ChiantiPy for the various physical calculations desired. The first potential use of ChiantiPy in SunPy is in the sunpy.instr.goes module, where data tables calculated using CHIANTI are currently downloaded from the Solar Software (SSW) distribution. These data tables should be created using SunPy.
Other potential applications of ChiantiPy in SunPy include:
- Conversion of ChiantiPy spectra objects to SunPy Spectra objects.
- Calculation of AIA temperature response functions from ChiantiPy contribution functions.
Expected Outcomes: This project would facilitate SunPy becoming independent from Solar SoftWare (SSW) in producing and maintaining files required by the sunpy.instr.goes module for determining the thermodynamic properties of the emitting plasma observed by GOES. It would also allow SunPy users to calculate spectra exclusively through Python without relying on SSW.
Support for analysis of Solar Energetic Particles
Suggested Mentor(s): David Pérez-Suárez
Difficulty: Beginner
Astronomy knowledge needed: None
Programming skills: Python.
Description
SunPy is able to read lightcurves from several sources (GOES X-ray, LYRA, NoRH, …), but other instruments are not yet supported. SoHO/ERNE (Energetic and Relativistic Nuclei and Electron experiment on board SoHO) measures one of the important effects in space weather, Solar Energetic Particles (SEP). The data from this instrument (as for GOES particle measurements) come as plain-text CSV files with header information. This project should add ERNE to the SunPy supported instruments by reading these files in as a lightcurve object and allowing users to perform the basic operations used when such data is analysed, e.g. energy range binning, visualisation, …
Lightcurve Refactor
Suggested Mentor(s): Stuart Mumford, Dan Ryan, Andrew Inglis
Difficulty: Intermediate
Astronomy knowledge needed: None
Programming skills: Python
Description
The Lightcurve class is one of the three core datatypes in SunPy, along with Map and Spectra.
Lightcurve is designed to read in, process and store meta data related to solar physics time series data.
Currently, Lightcurve uses the pandas library as its underlying data structure; however, this is subject to change in the future.
Much like the map submodule, lightcurve needs to be able to read in various supported data formats (such as FITS, ASCII and others in the future), store their meta data and give users easy and unified access to this metadata independently of the original source of the data.
As currently implemented (as of 0.5), the lightcurve module performs three core tasks:
- Download the raw data
- Read this data into a pandas dataframe
- Store the meta data obtained with the data.
As of the SunPy 0.6 release the first stage will be moved out of lightcurve and into the net subpackage as part of the UnifiedDownloader (name subject to change) Pull Request.
This leaves lightcurve in a similar position to map, where the data acquisition is not part of the core data type and is managed separately.
This enables the implementation of a factory class like Map for the lightcurve module.
Expected Outcomes
Someone undertaking this project will complete the following tasks:
- Become familiar with the UnifiedDownloader code; if it has not been accepted into the SunPy codebase, complete the remaining tasks for this to be achieved.
- Re-write any new lightcurve sources that were not included in the UnifiedDownloader code as sources for UnifiedDownloader.
- Write a factory class for lightcurve similar to the sunpy.map.Map class. This class will be a generic constructor for lightcurve, allowing the user to instantiate any one of the many subclasses of GenericLightcurve present in sunpy.lightcurve.sources. The API design for the factory class is here: https://github.com/sunpy/sunpy-SEP/pull/6
- Design and develop a robust method of dealing with lightcurve meta data, which can handle joining different parts of timeseries from different files, each with their own meta data. (See #1122)
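The factory pattern described in these tasks can be sketched as a registry of source classes that each declare which data they can handle. This is a generic illustration only: the classes below are simplified stand-ins, the `is_datasource_for` hook and the `instrument` meta key are assumptions, and the actual API design is the one in the pull request linked above.

```python
class GenericLightcurve:
    """Simplified stand-in for a base class; subclasses register themselves."""

    registry = []

    def __init__(self, data, meta):
        self.data = data
        self.meta = meta

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        GenericLightcurve.registry.append(cls)

    @classmethod
    def is_datasource_for(cls, meta):
        # The generic class never claims data; it is the fallback.
        return False


class GOESLightcurve(GenericLightcurve):
    @classmethod
    def is_datasource_for(cls, meta):
        return meta.get("instrument") == "GOES"


class LYRALightcurve(GenericLightcurve):
    @classmethod
    def is_datasource_for(cls, meta):
        return meta.get("instrument") == "LYRA"


def lightcurve_factory(data, meta):
    """Hypothetical generic constructor: pick the first matching source class."""
    for candidate in GenericLightcurve.registry:
        if candidate.is_datasource_for(meta):
            return candidate(data, meta)
    return GenericLightcurve(data, meta)


goes = lightcurve_factory([1.0, 2.0], {"instrument": "GOES"})
unknown = lightcurve_factory([1.0, 2.0], {"instrument": "something else"})
```

New sources then only need to define a subclass. The factory itself does not change when an instrument is added.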
IRIS, 4D Cubes and GUI
Suggested Mentor(s): Steven Christe (NASA GSFC, SunPy), Nabil Freij (Sheffield University)
Difficulty: Intermediate to Expert
Astronomy knowledge needed: None
Programming skills: Python and basic knowledge of GUI design.
Description
Recently, a new Sun-observing satellite called IRIS was launched. It performs high-resolution, multi-wavelength observations of the solar atmosphere. As a result, the data is saved as 4D cubes with the structure [Time, Wavelength, Spatial]. This format is also used by other ground- and space-based telescopes.
Traditionally (which is a powerful thing in astronomy), data analysis is done using a programming language called IDL. A GUI called CRISPEX was created in this language and is used to do simple but effective analysis.
This project aims to create a smaller-scale version that uses Ginga as a backend. Ginga is a file viewer that was created with astrophysics in mind. It allows basic manipulation of FITS files, which are the standard data container in astrophysics. A Python plugin will be created and integrated into Ginga, allowing the user to open 3D/4D datasets and perform basic analysis, such as slit extraction.
To achieve this, a previous ESA summer project created a cube class. While it was finished, it was never integrated into SunPy. The code was created to hold and manipulate complex datatypes. It is similar in style to the SunPy Map class and follows that convention. It does, however, have extra features that let the user extract specific data formats, for example a spectrum. The student will need to become familiar with this code, as small tweaks are needed before it is added to SunPy.
Finally, the plugin will be created using Python. A background in Qt would be ideal but is not necessary. Ginga supports multiple backends for the GUI, but we plan to use Qt.
Plugin Features:
- Open FITS file and call the correct SunPy Map or Cube class.
- Solar coordinate integration.
- Perform slit extraction with the ability to choose a time and/or wavelength range.
SunPy Feature:
- Full IRIS support.
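Slit extraction with a time and wavelength range can be sketched with NumPy indexing. The sketch assumes a cube with axis order (time, wavelength, y, x) and nearest-pixel sampling along a straight slit; the function name and its arguments are hypothetical and unrelated to the existing cube class or to Ginga.

```python
import numpy as np


def extract_slit(cube, start, end, n_points, time_range=None, wave_range=None):
    """Sample a (time, wavelength, y, x) cube along a straight slit.

    `start` and `end` are (y, x) pixel positions and the ranges are
    (first, last + 1) index pairs. The result has shape
    (n_time, n_wavelength, n_points).
    """
    t0, t1 = time_range if time_range is not None else (0, cube.shape[0])
    w0, w1 = wave_range if wave_range is not None else (0, cube.shape[1])
    # Nearest-pixel positions along the slit; no interpolation is done.
    ys = np.rint(np.linspace(start[0], end[0], n_points)).astype(int)
    xs = np.rint(np.linspace(start[1], end[1], n_points)).astype(int)
    return cube[t0:t1, w0:w1, ys, xs]


# A small synthetic cube: 2 time steps, 3 wavelengths, 4 x 5 pixels.
cube = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
slit = extract_slit(cube, start=(0, 0), end=(3, 3), n_points=4, wave_range=(1, 3))
```

A real plugin would work in solar coordinates rather than pixel indices and would probably interpolate between pixels.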
yt
Enable volume rendering of octree datasets
Suggested Mentor(s): Matthew Turk, Sam Skillman
Difficulty: Intermediate
Astronomy knowledge needed: None
Programming skills: Familiarity with Python and Cython, and a familiarity with data structures such as octrees and B-trees.
Description
At present, volume rendering in yt works best with patch-based AMR datasets. Extending this to support octree datasets will enable a much greater diversity of data types and formats to be visualized in this way.
This would include several specific, concrete actions:
- Development of viewpoint traversal-ordering for Octree datasets
- Refactoring grid traversal methods to travel along the octree data structure without explicit parentage links (i.e., using built-in neighbor-finding functions)
- Optimizing for parallel decomposition of octrees in this way
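For the first item, one well-known way to order the eight children of an octree node front to back is to find the octant nearest the viewpoint and visit the others by flipping its bits. The sketch below illustrates that idea only; the bit convention for child indices is an assumption and none of this is taken from yt's code.

```python
def front_to_back_children(node_center, viewpoint):
    """Return the 8 child octant indices in a front-to-back order.

    Assumed convention: bit 0 of a child index is set if the child lies on
    the +x side of the node center, bit 1 for +y and bit 2 for +z.
    """
    nearest = 0
    for axis in range(3):
        if viewpoint[axis] >= node_center[axis]:
            nearest |= 1 << axis
    # k counts which half-spaces differ from the viewer's. Visiting k in
    # increasing order never places an octant before one that can hide it.
    return [nearest ^ k for k in range(8)]


order = front_to_back_children((0.5, 0.5, 0.5), (2.0, -1.0, 0.7))
```

Applying the same ordering recursively at every level gives a full traversal without storing any per-view sort keys.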
Implementation of deep image format
Suggested Mentor(s): Matthew Turk, Kacper Kowalik
Difficulty: Advanced
Astronomy knowledge needed: None
Programming skills: Familiarity with Python and Cython, and a familiarity with z-buffering.
Description
Deep image compositing can be used to create a notion of depth. This could be utilized for multi-level rendering, such as rendering of semi-transparent streamlines inside volumes.
This would require:
- Developing a sparse image format data container
- Utilizing aforementioned container for multi-level rendering
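A minimal picture of such a container: store, for each pixel that has any content, a list of samples with a depth, colour and opacity, and flatten them front to back with the standard "over" operator. The `DeepImage` class and its methods are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class DeepImage:
    """Hypothetical sparse deep image: only pixels with samples are stored."""

    def __init__(self):
        # (row, col) -> list of (depth, rgb, alpha) samples
        self._samples = defaultdict(list)

    def add_sample(self, pixel, depth, rgb, alpha):
        self._samples[pixel].append((depth, rgb, alpha))

    def flatten(self, pixel):
        """Composite a pixel's samples front to back; return (rgb, alpha)."""
        color = [0.0, 0.0, 0.0]
        transmittance = 1.0  # fraction of light still passing through
        samples = sorted(self._samples.get(pixel, []), key=lambda s: s[0])
        for _, rgb, alpha in samples:
            for c in range(3):
                color[c] += transmittance * alpha * rgb[c]
            transmittance *= 1.0 - alpha
        return tuple(color), 1.0 - transmittance


image = DeepImage()
# An opaque blue volume sample behind a half-transparent red streamline.
image.add_sample((0, 0), depth=2.0, rgb=(0.0, 0.0, 1.0), alpha=1.0)
image.add_sample((0, 0), depth=1.0, rgb=(1.0, 0.0, 0.0), alpha=0.5)
flat_color, flat_alpha = image.flatten((0, 0))
empty_color, empty_alpha = image.flatten((5, 5))
```

Because samples keep their depth until flattening, separately rendered layers can be merged in any order and still composite correctly.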
Volume Traversal
Suggested Mentor(s): Matthew Turk, Sam Skillman
Difficulty: Advanced
Astronomy knowledge needed: None
Programming skills: Familiarity with Python and Cython, and a familiarity with data structures such as octrees and B-trees.
Description
Currently yt uses several objects that utilize brick decomposition, i.e. a process by which overlapping grids are broken apart until a full tessellation of the domain (or data source) is created with no overlaps. This is done by the kD-tree decomposition. This project aims to enhance current capabilities by providing easy ways to create volume traversal mechanisms. There are two components to this: handling tiles of data, and creating fast methods for passing through the data and moving between tiles.
This would require:
- Creating flexible (in terms of ordering) iterator over the “tiles” that compose a given data object
- Designing and implementing object for storing values returned by aforementioned iterator, that would:
  - Cache a slice of the grid or data object that it operates on
  - Filter particles from the data object it operates on
  - Provide a mechanism for identifying neighbor objects from a given face index.
  - Provide mechanisms for generating vertex-centered data or cell-centered data quickly
- Implement a mechanism for integrating paths through tiles, that would:
  - define a method for determining when a ray has left an object
  - define a method for selecting the next brick to traverse or connect to
  - update the value of a ray’s direction
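The ray-related items can be illustrated with the standard slab test for an axis-aligned box, which gives both the parameter at which a ray leaves a tile and the face it leaves through. The exit face in turn identifies the neighbouring brick to visit next. The function below is a generic sketch with hypothetical names, not yt code, and it assumes a non-zero direction vector.

```python
import math


def ray_exit(origin, direction, box_min, box_max):
    """Slab test against an axis-aligned box.

    Returns (t_exit, axis, side): the ray parameter at which the ray leaves
    the box, the axis of the exit face and +1 or -1 for its upper or lower
    face. Returns None if the ray misses the box.
    """
    t_enter, t_exit = -math.inf, math.inf
    exit_axis, exit_side = None, None
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        if d == 0.0:
            # Ray is parallel to this pair of faces.
            if not box_min[axis] <= o <= box_max[axis]:
                return None
            continue
        t_lo = (box_min[axis] - o) / d
        t_hi = (box_max[axis] - o) / d
        t_near, t_far = min(t_lo, t_hi), max(t_lo, t_hi)
        t_enter = max(t_enter, t_near)
        if t_far < t_exit:
            t_exit = t_far
            exit_axis, exit_side = axis, (1 if d > 0 else -1)
    if t_enter > t_exit or t_exit < 0:
        return None
    return t_exit, exit_axis, exit_side


unit_min, unit_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
hit = ray_exit((0.5, 0.5, 0.5), (1.0, 0.25, 0.0), unit_min, unit_max)
miss = ray_exit((2.0, 2.0, 0.5), (1.0, 0.0, 0.0), unit_min, unit_max)
```

Here the ray starts inside the unit box and leaves through its upper x face, so a traversal would continue into the neighbour across that face.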