Ideas page for GSoC 2016

Browse ideas for the following projects:

For each participating project, the ideas are organized from easiest to hardest.


If you are interested in one of the following Astropy Project ideas please see the Astropy GSoC 2016 Guidelines for additional information that is specific to Astropy.

Implement Scheduling capabilities for Astroplan

Suggested Mentor(s): Erik Tollerud, Eric Jeschke, Josh Walawender

Difficulty: Beginner to Intermediate

Astronomy knowledge needed: Basic understanding of how astronomy observations work, practical experience a plus

Programming skills: Python


astroplan is an Astropy affiliated package that provides tools for planning observations. One valuable feature that astroplan could provide is basic scheduling capabilities for an observing run. Many large observatories have their own schedulers, but this package would be targeted at the needs of the typical individual or small-collaboration observing run. While some initial efforts have occurred, this project would involve expanding those efforts into a full-fledged API and implementing both the interface and the actual scheduler(s).
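As a rough illustration of what a very basic scheduler might do, the sketch below greedily fills a night with the highest-priority requests that still fit in the remaining time. It is plain Python and not astroplan's API: the `Block` class, its fields and `greedy_schedule` are hypothetical names, and a real scheduler would also have to handle observability constraints such as airmass and target rise and set times.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One requested observation (hypothetical structure, not astroplan's)."""
    name: str
    duration_min: int   # exposure plus overheads, in minutes
    priority: int       # lower number means more important


def greedy_schedule(blocks, night_length_min):
    """Place blocks in priority order, skipping any that no longer fit."""
    schedule = []
    start = 0
    for block in sorted(blocks, key=lambda b: b.priority):
        if start + block.duration_min <= night_length_min:
            schedule.append((start, block.name))
            start += block.duration_min
    return schedule


requests = [
    Block("M31", 120, priority=2),
    Block("Vega", 30, priority=1),
    Block("M51", 400, priority=3),
    Block("M13", 60, priority=4),
]
plan = greedy_schedule(requests, night_length_min=480)
for start_min, name in plan:
    print(f"t+{start_min:3d} min: {name}")
```

In this made-up example the 400-minute request is dropped because it no longer fits after the two higher-priority blocks, which is exactly the kind of trade-off a more capable scheduler would try to optimise.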

Ephemerides for Solar System objects in Astropy

Suggested Mentor(s): Marten van Kerkwijk, Erik Tollerud

Difficulty: Beginner to Intermediate

Astronomy knowledge needed: Some understanding of astronomical coordinate systems, basic knowledge of solar system dynamics (or ability to learn as-needed to implement the specific algorithms required)

Programming skills: Python, some knowledge of C might be helpful


An often-requested missing feature in Astropy is the ability to compute ephemerides: the on-sky location of Solar System objects like the planets, asteroids, or artificial satellites. This project would involve implementing just this feature. This will likely start with implementing a get_moon function similar to the existing get_sun to familiarize the student with the important concepts in the astropy.coordinates subpackage. The larger part of the project will likely involve using the orbital elements that the JPL Solar System dynamics group has already compiled (there is already a package to read these files: JPLEphem), and translating those into the Astropy coordinates framework. The student will implement these algorithms and also collaborate with the mentors and Astropy community to develop an API to access this machinery.

Implement Public API for ERFA

Suggested Mentor(s): Erik Tollerud, Tom Aldcroft

Difficulty: Intermediate to Expert

Astronomy knowledge needed: None required, but may be helpful for understanding ERFA functionality

Programming skills: Python, Cython, C


Some of the major functionality for Astropy uses the ERFA C library (adapted from the IAU SOFA library) as the back-end for computational “heavy-lifting”. Members of the community have expressed a desire to use this lower-level python wrapper around ERFA for other purposes that may not be directly relevant for Astropy. So this project would involve making the necessary changes to make the ERFA python API public. This includes:

  • Getting the documentation up to the astropy standard (currently it is mostly auto-generated verbatim from the C comments).
  • Implementing a more complete test suite for the python side of the code.
  • Possibly moving it to a separate package as part of the liberfa GitHub organization. This would also include making the necessary changes to ensure everything continues to work in Astropy.
  • Any other steps necessary to ensure the resulting package (or sub-package of Astropy) is stable and relatively easy to use.

Web development for Gammapy

Suggested Mentor(s): Christoph Deil, Johannes King

Difficulty: Intermediate to Expert

Astronomy knowledge needed: None.

Programming skills: Scientific python (Numpy, Scipy, Astropy), Web development (Python backend, Javascript frontend)


Gammapy is a Python package for professional gamma-ray astronomers. We are looking for a web developer with good Python, HTML and Javascript skills who is interested in building web pages and apps to display and browse gamma-ray data and maybe even launch Gammapy analyses. There are a few different projects we’d like to see realised, depending on your interests and skills. One option is to build a much-improved version of TeVCat (a web page for browsing a TeV catalog) that includes more image and catalog data and interactivity (maps that pan & zoom, search field for source name), with the general public as well as professional gamma-ray astronomers as the target audience. This project would mostly be front-end development, plus Python scripts to prepare the images and catalogs in suitable formats. Another option is to write several small static site generator scripts or Python web apps that let us browse the gamma-ray data and analysis results, basically a web GUI for Gammapy. That project would mostly be Python web app development, and you would have to learn a bit more about Gammapy before GSoC starts.

Data analysis for Gammapy

Suggested Mentor(s): Christoph Deil, Johannes King

Difficulty: Intermediate to Expert

Astronomy knowledge needed: Some, e.g. sky coordinates and projections. Experience with X-ray or gamma-ray data analysis (e.g. Fermi-LAT) is a plus, but not a requirement.

Method knowledge needed: Some experience in data analysis (e.g. images, regions) and statistics (e.g. Poisson noise).

Programming skills: Python (including pytest and Sphinx) and scientific python (Numpy, Scipy, Astropy)


Gammapy is a Python package for professional gamma-ray astronomers. We are looking for someone who is interested in working on a few distinct data analysis tasks, each taking a few weeks of the GSoC total time. Gammapy is a very young project, and there’s a lot to do. Examples of what needs to be done include implementing new algorithms (e.g. image reprojection, source detection, region-based analysis), bringing existing prototype algorithms to production (improve API and implementation, add tests and docs), as well as grunt work that’s needed to go towards production quality and a Gammapy 1.0 release this fall (e.g. setting up continuous integration for example IPython notebooks or adding more tests). To get an idea of what is going on in Gammapy and what still needs to be done, please check out the project on Github and browse the documentation a bit (or try out the examples). If this looks interesting to you, send us an email and let us know what your skills and interests are.

Implement PSF photometry for fitting several overlapping objects at once

Suggested Mentor(s): Moritz Guenther, Brigitta Sipocz

Difficulty: Intermediate to Expert

Astronomy knowledge needed: basic understanding of what photometry is

Programming skills: Python


The photutils package is an Astropy affiliated package that provides tools for photometry (measuring how bright a source is).

There are several ways to do photometry, and the package currently implements aperture photometry (just add up all the flux in some area of an image) and single-source point-spread-function (PSF) fitting (fit a function such as a Gaussian to the image). In many situations, sources may overlap in the image, e.g. when observing a dense star cluster, so that we need to fit many functions at once. However, the simple brute-force approach of “just fit a model with hundreds of parameters” when there are hundreds of stars usually fails.
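To make the need for simultaneous fitting concrete, here is a minimal one-dimensional sketch in plain Python. It assumes the positions and width of two blended Gaussian profiles are already known, so only the two amplitudes are fitted and the problem reduces to a 2x2 linear least-squares system. This is a simplification chosen for illustration, not the photutils API or the algorithm this project would select; real PSF photometry also fits positions and is nonlinear. All function names are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Unit-amplitude 1D Gaussian profile standing in for a PSF."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def fit_two_amplitudes(xs, data, c1, c2, sigma):
    """Solve the 2x2 normal equations for both amplitudes at once."""
    g1 = [gaussian(x, c1, sigma) for x in xs]
    g2 = [gaussian(x, c2, sigma) for x in xs]
    s11 = sum(a * a for a in g1)
    s22 = sum(b * b for b in g2)
    s12 = sum(a * b for a, b in zip(g1, g2))
    t1 = sum(a * d for a, d in zip(g1, data))
    t2 = sum(b * d for b, d in zip(g2, data))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def fit_one_amplitude(xs, data, center, sigma):
    """Least-squares amplitude of a single profile, ignoring any neighbour."""
    g = [gaussian(x, center, sigma) for x in xs]
    return sum(a * d for a, d in zip(g, data)) / sum(a * a for a in g)


# Two blended sources with true amplitudes 10 and 4, centres 2 pixels apart.
xs = list(range(21))
sigma, c1, c2 = 1.5, 9.0, 11.0
data = [10.0 * gaussian(x, c1, sigma) + 4.0 * gaussian(x, c2, sigma) for x in xs]

joint = fit_two_amplitudes(xs, data, c1, c2, sigma)
alone = fit_one_amplitude(xs, data, c1, sigma)
print("joint fit:", joint)
print("source 1 fitted alone:", alone)
```

On this noise-free data the joint fit recovers both amplitudes, while fitting the brighter source on its own overestimates its flux because light from the neighbour is attributed to it.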

This project includes looking at other astronomy codes to see how they tackle the problem; select, modify and improve an algorithm that fits into the astropy modelling framework; implement this in python; and, if it turns out that speed is a problem, move speed-critical parts to Cython. To verify that the new code works, we will compare it to the solutions of established PSF photometry codes.

See for a discussion of some problems and possible solutions that will be addressed in this project.

Bridge sherpa and astropy fitting

Suggested Mentor(s): D. Burke, T. Aldcroft, H. M. Guenther

Difficulty: Expert or better

Astronomy knowledge needed: fitting functions and statistics

Programming skills: Python, C, Cython


Both astropy and Sherpa provide modelling and fitting capabilities; however, Sherpa’s features are much more advanced. Sherpa provides far more built-in models, a larger choice of optimizers and a real variety of fit statistics. Unfortunately, Sherpa is less well known and, for historical reasons, the object-oriented user interface is less polished than the functional state-based interface. The main goal is to bring Sherpa’s optimizers and fit statistic functions to astropy; the stretch goal is to develop a bridge between both packages such that a user can use astropy models completely interchangeably with Sherpa models and fitters. Sherpa models should look like astropy models to astropy, to enable situations where the model is made out of three components (a user-defined model, an astropy model and a Sherpa model) and is then fitted to astropy data using the Sherpa fitters.

This project requires the student to become proficient in two major packages (not an easy task!), but with code written in just a few weeks of GSoC it will give astropy users access to fitting capabilities that required many years of developer time and that are infeasible to redevelop from scratch.

Enhancements to Ginga, a Toolkit for Building Scientific Image Viewers

Suggested Mentor(s): Eric Jeschke, Pey-Lian Lim, Nabil Freij

Difficulty: Beginning to Advanced, depending on project choices

Astronomy knowledge needed: Some, depending on project choices

Programming skills: Python and scientific python (Numpy, Scipy, Astropy), git version control

Desirable: OpenCL, Javascript/web sockets, C/C++ programming, experience in image or array processing, concurrent programming, experience in using GUI toolkits, github-based workflow


Ginga is a toolkit for constructing scientific image viewers in Python, with an emphasis toward astronomy. Ginga is being used at a number of observatories and institutes for observation and instrument control, quick look, custom data reduction and analysis tasks. The general aim is to build upon this toolkit improving its current features and to expand this toolkit in order for scientists to be able to easily accomplish preliminary data analysis.

We are looking for an individual to work on a few select project areas, depending on skill level and interest. Each project area itself would form a small part of the overall GSoC project. Essentially it would be a large pick and mix, but do not let this put you off. This method would allow a range of different contributions of your choosing to be made to the Ginga toolkit.


  • Improve and expand Ginga’s unit test suite and coverage
  • Improve documentation and tutorials, including via Jupyter notebooks and video voice-overs
  • Improve our “native app” packaging for Mac, Unix and Windows
  • Improving LineProfile and Slit plugins
  • Enhance existing plugins by adding GUIs for some common tasks like configuring catalog sources, which are currently done by editing config files
  • Add support for loading broken FITS files by “fingerprinting” them


  • Improve Ginga backends for web browsers (native javascript/web sockets and/or Jupyter notebooks and/or Bokeh server)
  • Enhancements to “traditional” GUI backends (e.g. add support for gtk3, AGG support for python 3, improvements to Qt-based widgets)
  • Graft the astropy-helpers package into Ginga
  • Adding support for calculating approximate line-of-sight velocities
  • Enhance existing plugins for data analysis tasks, usually featuring astropy or affiliated packages


  • Implement an OpenCL module that leverages CPU and GPU resources for accelerating some common image processing operations (scaling, transformations, rotations) on numpy image arrays. Benchmark against current CPU based solutions.
  • Improving IO speeds by optimizing use of, lazy reads, file caching hints, optimizing concurrency, etc.
  • Adding support for a binary file format used by a very popular ground-based solar telescope and extending it to support Stokes data products

If you are interested in working on any of these aspects, or want to propose some other work on Ginga, please sign in to Github and comment on Assist the Ginga Project.


Improve Python bindings to CasaCore measures

Suggested Mentor(s): Ger van Diepen, Tammo Jan Dijkema

Difficulty: Intermediate

Astronomy knowledge needed: Some understanding of astronomical coordinate systems and transformations

Programming skills: Python, some C++


CasaCore contains many features to perform astronomical coordinate transformations, for example from B1950 to J2000, or from J2000 to Azimuth-Elevation. Moreover, it can compute ephemerides, which may make it useful for many other projects. The current python binding python-casacore contains a python binding to the measures library, but this binding is not very programmer-friendly, and is thus not much used. An interface to measures exists within CasaCore that makes converting coordinates much easier. This interface was written with TaQL in mind. This project concerns turning the TaQL measures interface into a python measures interface, thus making casacore measures easily accessible from Python.

Frequency conversions for TaQL / python-casacore

Suggested Mentor(s): Ger van Diepen, Tammo Jan Dijkema

Difficulty: Beginner / Intermediate

Astronomy knowledge needed: Some understanding of use of astronomical frequencies (regarding Doppler shifts etc.)

Programming skills: C++


The casacore measures module contains code for converting frequencies between various reference frames (e.g. Rest frequency, Geocentric, Topocentric, Galactocentric). Having this module available in TaQL would make it much more convenient to perform these kinds of conversions. Example code exists for other conversions.

This project concerns writing such a converter for the Doppler and Frequency conversions. It will require tweaking in boost-python, but since the example code is available for other measures, it should not be too hard.
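For readers unfamiliar with Doppler conversions, the sketch below shows the three common velocity conventions relating an observed frequency to a rest frequency. It is plain Python for illustration only: it does not use casacore or TaQL, the function names are hypothetical, and it deliberately leaves out the reference-frame part of the problem (which requires the observer's motion).

```python
C_KM_S = 299792.458  # speed of light in km/s


def radio_velocity(freq, rest_freq):
    """Radio convention: v = c * (f0 - f) / f0."""
    return C_KM_S * (rest_freq - freq) / rest_freq


def optical_velocity(freq, rest_freq):
    """Optical convention: v = c * (f0 - f) / f."""
    return C_KM_S * (rest_freq - freq) / freq


def relativistic_velocity(freq, rest_freq):
    """Relativistic: v = c * (f0^2 - f^2) / (f0^2 + f^2)."""
    return C_KM_S * (rest_freq**2 - freq**2) / (rest_freq**2 + freq**2)


HI_REST_MHZ = 1420.40575  # rest frequency of the neutral hydrogen line
observed_mhz = 1419.0

for convert in (radio_velocity, optical_velocity, relativistic_velocity):
    print(convert.__name__, round(convert(observed_mhz, HI_REST_MHZ), 2), "km/s")
```

The three conventions agree at low velocities and diverge as the shift grows, which is why a converter has to be explicit about which one it uses.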

General python-casacore cleanup

Suggested Mentor(s): Gijs Molenaar, Ger van Diepen

Difficulty: Intermediate

Astronomy knowledge needed: none

Programming skills: python


The current python-casacore code is already much improved over the previous “pyrap” implementation. This python binding to casacore is now python 3 compatible, contains some unit tests, etc. But some work remains to be done:

  • Remove all compile warnings
  • Modernise the code, add missing features, and perhaps make it more ‘pythonic’.
  • Improve test coverage (24% at the moment)

This is a typical project for learning how to write good code.

Table plotting for python-casacore

Suggested Mentor(s): Ger van Diepen, Tammo Jan Dijkema

Difficulty: Beginner

Astronomy knowledge needed: Some idea about astronomical units

Programming skills: Python

Radio interferometric data sets are almost always stored in casacore “Measurement Sets”. These can be queried through TaQL. It would be nice to have a plotting routine in python-casacore to easily plot two columns against each other, with nicely formatted axes etc. (possibly using wcsaxes).

This would, at the very least, make a nice extension to the taql jupyter kernel underneath


Solar Storms forecasting server

Suggested Mentors: Antonio del Mastro, Olena Persianova

Difficulty: Intermediate to Hard

Astronomy knowledge needed: None beforehand, the student will be required to research relevant publications.

Programming skills: advanced Python; basic Theano or TensorFlow; basic Django or Flask; experience with some ANN library, such as Keras, theanets or Lasagne.


Solar storms are responsible for disruption of satellite communication and damage to electronic equipment in space. The storms also have to be taken into account for EVA and habitat maintenance activities, as the higher levels of radiation they bring have a detrimental effect on the crew members’ health.

Prediction of these storms is essential to prevent said damage. A lot of astronomical data is generated on a daily basis, and this could be used in conjunction with machine learning methods to predict solar storms.

In this project, the student will be required to:

  • Using a machine learning approach, predict the duration and intensity of solar storms:
    • The student should use preferably an artificial neural networks approach (although alternatives, such as random forests, SVM, bayesian models or HMMs, can be considered).
    • The predictions should be given 24-48 hours in advance of a storm (depending on viability).
    • The student should evaluate training and test data provided by IMS, or find a suitable dataset if the data provided is unsuitable.
    • The student should evaluate an approach suggested by the IMS to test the model’s performance, or propose a testing procedure of his/her own.
  • Provide information on a dynamically updated web page, using preferably Django or Flask, which should at least include:
    • The real-time and historical sensor values, as plots when appropriate.
    • Useful statistics about the sensors (TBD).
    • The model’s predictions.
    • Useful statistics about the predictions (e.g. RMSE)
  • Incorporate the prediction model and the web page into the ERAS ecosystem, which means building Tango device servers (at least one for the predictor, more if necessary).
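The RMSE mentioned in the list above as an example statistic can be computed as in the following sketch. The function name and the numbers are made up for illustration; this is not the testing procedure suggested by IMS.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up storm intensities, for illustration only.
predicted = [2.0, 3.5, 5.0, 4.0]
observed = [2.5, 3.0, 6.0, 4.0]
print(f"RMSE = {rmse(predicted, observed):.3f}")
```

Because the errors are squared before averaging, a single badly missed storm dominates the score, which may or may not be the behaviour wanted when comparing models.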

Currently, a few features are being used for the prediction of solar storms, among others:

  1. Radio flux
  2. Sunspot area
  3. Sunspot Number
  4. X-ray Background Flux

We recommend that the student research the viability of using more features in the model.

Some other resources to get started:

NASA’s Solar Storm and Space Weather FAQ

Space Weather Prediction Center’s Historical SWPC Products and Data Displays

Note: If you are interested in this project, please apply via the Python Software Foundation


Image compression and efficient table reading in FITSIO.jl

Suggested Mentor(s): Kyle Barbary, Ryan Giordan

Difficulty: Intermediate to Expert

Astronomy knowledge needed: none

Programming skills: Julia, some C


FITS (Flexible Image Transport System) format files are the standard containers for imaging and tabular data in astronomy. The FITSIO.jl package provides support for reading and writing these files in Julia. It is implemented as a high-level, yet efficient, wrapper for the C library cfitsio. This project would involve improving the available functionality and performance of FITSIO.jl. Some desired features, such as I/O of compressed images, reading and writing subsets of tables and appending to existing tables, already have implementations in cfitsio. Implementing these would involve understanding the cfitsio API and writing wrappers in Julia using ccall. A more advanced feature is fast I/O of large tables with multiple columns. No C API exists for this; it would involve understanding the memory layout of FITS tables, pointer arithmetic and byteswapping. It will be interesting to see whether this can be done efficiently in Julia or if a C shim is required. Finally, benchmarks should be implemented to ensure that FITSIO.jl performance is as good as possible.
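The memory-layout and byteswapping issue can be sketched briefly. The example below is written in Python rather than Julia, purely to illustrate the idea; it uses neither cfitsio nor FITSIO.jl, and the two-column row layout and the `read_column` helper are hypothetical. FITS binary tables store rows contiguously with big-endian values, so reading a single column means striding through the buffer and converting byte order.

```python
import struct

# Hypothetical table layout: one 32-bit integer ID and one 64-bit float flux.
ROW_FORMAT = ">id"                      # ">" means big-endian, no padding
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 4 + 8 = 12 bytes per row

# Build a small in-memory "table" of three rows, as it would sit on disk.
rows = [(1, 10.5), (2, 20.25), (3, -3.0)]
raw = b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)


def read_column(buffer, field_format, field_offset, row_size):
    """Read one column by striding through the row-major byte buffer."""
    n_rows = len(buffer) // row_size
    return [
        struct.unpack_from(">" + field_format, buffer, i * row_size + field_offset)[0]
        for i in range(n_rows)
    ]


ids = read_column(raw, "i", 0, ROW_SIZE)
fluxes = read_column(raw, "d", 4, ROW_SIZE)
print(ids, fluxes)
```

A fast implementation would avoid the per-element unpacking shown here and instead swap bytes in bulk, which is where the performance question raised above comes in.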

For a more detailed list of features, see this issue.


Lightcurve Refactor

Suggested Mentor(s): Stuart Mumford, Dan Ryan, Andrew Inglis, Jack Ireland

Difficulty: Beginner

Astronomy knowledge needed: None

Programming skills: Python


The Lightcurve class is one of the three core datatypes in SunPy, along with Map and Spectra. Lightcurve is designed to read in, process and store metadata related to solar physics time series data. Currently, Lightcurve uses the pandas library as its underlying data structure; however, this is subject to change in the future.

Much like the map submodule, lightcurve needs to be able to read in various supported data formats (such as FITS, ascii and others in the future), store their metadata and give users unified access to this metadata independently of the original source of the data.

As currently implemented (as of 0.6) the lightcurve module performs three core tasks:

  1. Download the raw data
  2. Read this data into a pandas dataframe
  3. Store the metadata obtained with the data.

As of the SunPy 0.7 release the first stage will be moved out of lightcurve and into the net subpackage as part of the UnifiedDownloader Pull Request. This leaves lightcurve in a similar position to map where the data acquisition is not part of the core data type and is managed separately.

The objective of this project is to re-implement the core of the lightcurve submodule, such that it no longer contains the code to download data from the internet. The lightcurve module should be able to open files from disk that have been downloaded using the new UnifiedDownloader submodule. The lightcurve factory must be able to read files from multiple sources, some of which can be auto-detected and some of which cannot. The lightcurve module must also be able to combine multiple files into a single timeseries.

Expected Outcomes

Someone undertaking this project will complete the following tasks:

  1. Become familiar with the UnifiedDownloader code. If it has not been accepted into the SunPy codebase, complete the remaining tasks for this to be achieved.
  2. Write a factory class for lightcurve similar to the class. This class will be a generic constructor for lightcurve allowing the user to instantiate any one of the many subclasses of GenericLightcurve present in sunpy.lightcurve.sources. The API design for the factory class is in SEP 7.
  3. Design and develop a robust method of dealing with lightcurve metadata, which can handle joining different parts of timeseries from different files, each with their own metadata. (See #1122)
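The factory idea in task 2 can be sketched as a small registry that inspects metadata to choose a subclass, with an explicit override for files that cannot be auto-detected. This is a hedged illustration only, not the API specified in SEP 7: `LightcurveFactory`, `register`, the `source` keyword and the `GOES` class are hypothetical, and `GenericLightcurve` here is a stand-in for the real class.

```python
class GenericLightcurve:
    """Stand-in base class; the real one lives in sunpy.lightcurve."""

    def __init__(self, data, meta):
        self.data = data
        self.meta = meta

    @classmethod
    def is_datasource_for(cls, meta):
        return False


class LightcurveFactory:
    """Pick a registered subclass by inspecting the file's metadata."""

    def __init__(self, default=GenericLightcurve):
        self.default = default
        self.registry = []

    def register(self, cls):
        self.registry.append(cls)
        return cls  # lets register() double as a class decorator

    def __call__(self, data, meta, source=None):
        if source is not None:
            # Explicit override for files that cannot be auto-detected.
            matches = [c for c in self.registry if c.__name__.lower() == source.lower()]
        else:
            matches = [c for c in self.registry if c.is_datasource_for(meta)]
        if len(matches) > 1:
            raise ValueError("ambiguous source: " + ", ".join(c.__name__ for c in matches))
        cls = matches[0] if matches else self.default
        return cls(data, meta)


make_lightcurve = LightcurveFactory()


@make_lightcurve.register
class GOES(GenericLightcurve):  # hypothetical source class
    @classmethod
    def is_datasource_for(cls, meta):
        return meta.get("TELESCOP", "").startswith("GOES")


lc = make_lightcurve([1.0, 2.0], {"TELESCOP": "GOES 15"})
print(type(lc).__name__)
```

Keeping detection logic on each source class means new instruments can be added without touching the factory itself.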

A successful proposal for this project will demonstrate that the applicant has understood the mechanism behind the Map factory as already implemented in SunPy and presents a timeline of what things need to change in Lightcurve to mirror the design of Map and follow the design for Lightcurve in SEP 7.

Implementing AIA response functions in SunPy

Suggested Mentor(s): Drew Leonard, Will Barnes

Difficulty: Beginner

Astronomy knowledge needed: Some knowledge of coronal emission processes would be beneficial.

Programming skills: Python.


The CHIANTI atomic physics database is a valuable resource for solar physics. The CHIANTI database holds a large amount of information on the physical properties of different elements in different ionisation states and enables the calculation of various parameters from this information. Using CHIANTI it is possible to calculate the spectra of various types of solar plasma (e.g., flare, quiet sun, etc.) from the observed elemental abundances and ionisation states. These synthetic spectra are essential for calculating response functions of various instruments. An instrument’s wavelength response function describes how much light emitted at a given wavelength is measured by the instrument. Similarly, the temperature response function describes the instrument’s sensitivity to light emitted by plasma at a particular temperature. These response functions play a vital role in correctly interpreting observations, as does proper calculation of these functions.

Currently, SunPy has no implementation of instrument response functions. This project would develop the routines necessary to calculate response functions using the Python interface to the CHIANTI database, ChiantiPy. The primary implementation of this would be to produce default wavelength and temperature response functions for the Atmospheric Imaging Assembly instrument. A detailed discussion of the AIA response functions can be found in Boerner et al 2012.

Other potential applications of ChiantiPy in SunPy include:

  1. Generalisation of the code to produce response functions using arbitrary values of physical parameters (elemental abundances, etc.).
  2. Calculation of response functions for other instruments.
  3. Conversion of ChiantiPy spectra objects to SunPy Spectra objects.

Expected Outcomes: This project would facilitate SunPy becoming independent from Solar SoftWare (SSW) for analysing AIA data, particularly with respect to inferring plasma properties such as temperature and density.

A successful proposal will outline a schedule for implementing at least a single set of temperature and wavelength response functions for AIA, and the response functions for arbitrary plasma conditions would be a bonus. Familiarity with CHIANTI, ChiantiPy and SSW’s implementation of the response functions will help to properly assess how long will be required to recreate them in SunPy.

Real time data access and visualisation tools

Suggested Mentor(s): David Perez-Suarez, Jack Ireland

Difficulty: Beginner-Intermediate

Astronomy knowledge needed: none

Programming skills: Python


Real-time data is very useful for space weather operations. SunPy provides access to data through different virtual observatories or services, or by accessing data archives directly. Fido (formerly called UnifiedDownloader) provides a single point of access to them all. However, this needs to be extended to other data archives, and logic needs to be implemented so that, depending on the time range requested, it downloads the data from the real-time archives or from the full archive.
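One possible shape for the archive-selection logic is sketched below. It assumes, purely for illustration, that data newer than a fixed window (seven days here) is only available from the real-time archive; the function name, the archive labels and the window length are all hypothetical and not part of Fido.

```python
from datetime import datetime, timedelta


def split_by_archive(start, end, now, realtime_window=timedelta(days=7)):
    """Return (archive_name, start, end) pieces covering [start, end]."""
    if end <= start:
        raise ValueError("end must be after start")
    cutoff = now - realtime_window
    pieces = []
    if start < cutoff:
        pieces.append(("full-archive", start, min(end, cutoff)))
    if end > cutoff:
        pieces.append(("realtime", max(start, cutoff), end))
    return pieces


now = datetime(2016, 3, 20)
for piece in split_by_archive(datetime(2016, 3, 10), datetime(2016, 3, 19), now):
    print(piece)
```

A request that straddles the cutoff is split into two pieces, so each part can be sent to the appropriate client and the results combined afterwards.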

Additionally, this project should produce some visualisation tools to combine data from different sources. Some examples are overlays of active regions on top of solar images (like in SolarMonitor), GOES X-ray flux with active region numbers on the detected flares (like in Latest Events), and the latest features available from HEK on top of a map (e.g. iSolSearch).

In summary, this project has two objectives:

  1. Implementation of real time archives and logic on Fido.
  2. Creation of visualisation tools to represent real-time data.

Familiarisation with the unidown branch and the matplotlib library will help you to create a proper timeline of how much time it will take to implement, test and document each part of the project.

Improvements to the SunPy Database

Suggested Mentor(s): Stuart Mumford, Simon Liedtke, Steven Christe

Difficulty: Intermediate

Astronomy knowledge needed: None

Programming skills: Python, some database design knowledge would be helpful.


The database module provides functionality to users to manage collections of files on disk in a way not reliant upon folder structure and file name. The database allows users to find files on disk by either physical parameters, such as wavelength and time or properties of the instrument such as name and spacecraft. It also allows more complex queries by enabling searches of the raw meta data associated with the files.

The SunPy database will also act as a proxy for some web services supported by SunPy. When used like this, the database module takes a user query, downloads the data from the web service, stores it in the database, and then returns the query results to the user. SunPy contains clients to various web services. The first and primary web service SunPy supported was the Virtual Solar Observatory (VSO), and this is the web service the database was originally designed to support. Since the original development of the database module, the database has also been extended to support the HEK client.

The SunPy web clients use a system named attrs (an abbreviation for attributes) to compose queries. This attrs system is also used by the database to perform queries on the database, with some of the attrs shared between the VSO client and the database. Recently, a new downloader front end (originally named UnifiedDownloader, now affectionately known as Fido) has been developed. It provides a factory class with which various download clients (such as the VSO) can register, providing information about which attrs and attr values each client supports. Using this approach, the Fido downloader provides a single interface to the many different services SunPy supports. The first part of this project will be to update the database module to support the new Fido interface, specifically by using Fido inside the database to retrieve data.
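The general idea of composing queries from attribute objects can be shown with a toy example. The classes below are hypothetical and much simpler than SunPy's actual attrs; they only demonstrate how `&` and `|` can build a query tree that is then evaluated against records.

```python
class Attr:
    """Base class for query attributes combinable with & and |."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)


class Value(Attr):
    def __init__(self, key, value):
        self.key, self.value = key, value

    def matches(self, record):
        return record.get(self.key) == self.value


class And(Attr):
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, record):
        return all(p.matches(record) for p in self.parts)


class Or(Attr):
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, record):
        return any(p.matches(record) for p in self.parts)


records = [
    {"instrument": "AIA", "wavelength": 171},
    {"instrument": "AIA", "wavelength": 193},
    {"instrument": "EIT", "wavelength": 171},
]
query = Value("instrument", "AIA") & (Value("wavelength", 171) | Value("wavelength", 304))
selected = [r for r in records if query.matches(r)]
print(selected)
```

Because the query is an ordinary object, the same expression could in principle be handed either to a web client or to a local database, which is the property the database module relies on.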

The second part of the project will be to update the caching mechanism implemented in the database module. The current caching system serialises the user’s VSO query and stores it as JSON. When the user requests another query, it is compared to the cache of serialised queries and, if a match is found, the results from the cached query are returned. This mechanism is limiting: if the user requests 100 records in query A and 100 records in query B, but 50 of the records are common to both queries (i.e. two overlapping time windows), then those 50 records will be re-downloaded because the cache of query A will not match query B. The updated caching system will store the records a query returns (before the data is downloaded) and then link the results of a query to the records in the database (once the data has been downloaded). Then, when records are retrieved from a web service, any records that are already stored in the cache table can be skipped and returned from the database instead. This will allow the caching of partial queries rather than whole queries as is currently implemented.
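The record-level caching described above comes down to set operations on record identifiers. The sketch below replays the query A and query B example with integer IDs standing in for records; the function name and the use of a plain set as the cache are assumptions for illustration, not the database module's design.

```python
def records_to_download(requested_ids, cached_ids):
    """Split a query's record IDs into cache hits and records still needed."""
    requested = set(requested_ids)
    hits = requested & cached_ids
    missing = requested - cached_ids
    return hits, missing


cache = set()  # IDs of records whose data is already in the database

# Query A: records 0..99. Nothing is cached yet, so all 100 are downloaded.
hits_a, missing_a = records_to_download(range(0, 100), cache)
cache |= missing_a

# Query B: records 50..149 overlaps A by 50 records.
hits_b, missing_b = records_to_download(range(50, 150), cache)
cache |= missing_b

print(len(missing_a), len(hits_b), len(missing_b))
```

With whole-query caching, query B would have triggered 100 downloads; here only the 50 records not already held are fetched.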

This project aims to achieve the following things:

  1. Update the current implementation of the database using the VSO attributes to use the slightly refactored Fido attributes and use Fido inside the database to download data from the VSO.
  2. Implement a new caching mechanism based on the results of queries with Fido rather than the current caching, which is based upon the VSO query.

A successful proposal will schedule updates to the database package in small sections, rather than in one large pull request. The work should be understood and broken down into individual sections.

There are various other maintenance tasks which need undertaking, and these would be a good way for someone interested in this project to familiarise themselves with the codebase.

GUI to use LCT tools

Suggested Mentor(s): Jose Iván Campos Rozo (National Astronomical Observatory, National University of Colombia), Santiago Vargas Domínguez (National Astronomical Observatory, National University of Colombia), David Pérez Suárez.

Difficulty: Intermediate

Astronomy knowledge needed: None

Programming skills: Python, basic knowledge of qt4, pyqt4, qt designer


The Local Correlation Tracking (LCT, November & Simon, 1988) technique is a robust method used to study the dynamics of structures in a time series of images. By tracking pixel displacements, using a correlation window, LCT can determine proper motions and generate flow maps of horizontal velocities. This procedure is used to study the dynamics of plasma in the solar photosphere at different spatial scales, e.g. the analysis of granular and supergranular convective cells, meridional flows, etc. A widget implemented in Python was developed. It generates a user-friendly graphical user interface (GUI) to control various parameters for the process of calculating flow maps of proper motions for a series of filtergrams (data cube). Our purpose is to implement this tool in SunPy using its structure and to improve it with some more options, i.e. masks, statistics, histograms, contours and multi-plots. Although an initial version is already developed, our proposal is to focus on the efficient integration of the code in the SunPy libraries. The code (without widget files yet) is

Expected Outcomes: To integrate the code efficiently into the SunPy libraries.


If you are interested in one of the yt ideas, please see the GSoC 2016 Guidelines on the yt bitbucket wiki.

All projects in this section are for yt, an analysis and visualization environment for particle and mesh-based volumetric data. It has readers for most astrophysical simulation codes, as well as a few nuclear engineering simulation codes. It can handle data produced by particle-based codes, as well as data produced by codes that use various types of mesh structures, including uniform and adaptively refined meshes as well as unstructured and semistructured meshes. yt is able to analyze and visualize these datasets with substantially different on-disk and in-memory formats using a common language.

To learn more about how to use yt to interact with simulation data, take a look at the quickstart guide, as well as the rest of the yt documentation. We also provide a listing of sample test datasets that can be loaded by yt. We use a variety of public communication channels, including mailing lists, IRC, and a Slack channel that can be joined by anyone interested in yt development.

For more information about contributing to yt, take a look at our developer guide. To see discussions about past yt projects, take a look at the yt enhancement proposal (YTEP) listing.

Integrate yt plots with interactive matplotlib backends

Suggested Mentor(s): Nathan Goldbaum, Matthew Turk

Difficulty: Intermediate

Knowledge needed: Familiarity with matplotlib. Knowledge of matplotlib’s object-oriented API a plus.

Programming skills: Python. GUI programming.


Currently, all yt plotting objects have a show() method that displays a version of the plot in Jupyter notebooks. This works for the most part and is relatively simple due to Jupyter’s data model. However, this reliance on the notebook fails for users who work primarily from the command line, either in the vanilla Python interpreter or the IPython command line application. We receive many requests from confused users who expect a GUI window to pop up when they run show() and do not understand why it errors out in the regular Python interpreter or appears to do nothing in IPython.

This project would have a student modify yt’s plotting objects to hook into matplotlib’s interactive backends so that plots can be optionally displayed using a GUI plotting window. Optimally, we would also enable callbacks so that zooming and selecting does the “right thing”, generating high resolution data when it is available.

This is constrained by maintaining backward compatibility: by default yt should not fail when generating plots on headless devices (e.g. when connecting over SSH to a supercomputer).


  • A proof of concept demonstrating how to hook into matplotlib’s interactive backends using the matplotlib object-oriented API or, failing that, showing how to fall back gracefully to pyplot instead of the object-oriented API.

  • A YTEP describing the proposed approach for modifying yt’s plotting infrastructure to support matplotlib’s interactive plotting backends.

  • The implementation for the YTEP submitted as a bitbucket pull request to the main yt repository.
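
The backward-compatibility constraint above could be approached by choosing a backend before any figure is created. The following is a minimal sketch under stated assumptions, not yt's implementation: the helper names are hypothetical, the check for the DISPLAY and WAYLAND_DISPLAY environment variables is only a heuristic for Linux sessions, other platforms are simply assumed to have a display, and "TkAgg" is an arbitrary choice among matplotlib's interactive backends.

```python
import os
import sys


def is_headless(environ=None, platform=None):
    """Heuristic: a Linux session without a display server is headless."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return not (environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    # Assumption: other platforms always have a display available.
    return False


def choose_backend(interactive=False, environ=None, platform=None,
                   gui_backend="TkAgg"):
    """Return a matplotlib backend name.

    The non-interactive "Agg" backend stays the default, so plots can
    still be generated over SSH; a GUI backend is chosen only when the
    user asks for one and a display appears to be available.
    """
    if interactive and not is_headless(environ, platform):
        return gui_backend
    return "Agg"
```

The returned name could then be passed to matplotlib.use() before plots are created. A heuristic like this can be wrong in both directions, so the proof of concept would still need to handle a GUI backend that fails to initialise.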

Improve test coverage and test performance

Suggested Mentor(s): Kacper Kowalik, Nathan Goldbaum

Difficulty: Beginner to Advanced, depending on where the student takes the project

Knowledge needed: Familiarity with the nose testing package.

Programming skills: Python, Cython


Currently yt’s test suite is split between unit tests (which take about 45 minutes to run) and answer tests, which are normally only run on a continuous integration server. Altogether the tests only cover about a third of the yt codebase, so much of the code in yt needs test coverage. Additionally, the tests take a long time to run, and we would like to reduce the test runtime while simultaneously increasing code coverage.

This project could go in a number of directions:

  • Implement a way to retrofit the current tests for different geometries (e.g. cartesian, cylindrical, and spherical coordinates) and data styles (e.g. particle data, as well as various kinds of mesh data, including uniform resolution, octree, patch AMR, and unstructured meshes). Ideally this would allow us to test all functionality for all possible data styles. This will require learning and improving the “Stream” frontend, which allows the ingestion of in-memory data into yt.

  • Identify areas of the code that are not well tested and devise tests for them. This will require measuring the test coverage of yt’s Python and Cython components. The student working on this will need to gain familiarity with untested or undertested parts of the codebase and add new tests. Optimally the new tests will make use of new reusable infrastructure that will be helpful for tests across the yt codebase.

  • Improve volume rendering and visualization unit tests. Right now visualization tests rely heavily on answer testing and image comparison. It would be more flexible and easier to understand when things go wrong if the tests instead compared with a predicted answer using some sort of simplified geometry or via introspection.


  • Develop a framework for measuring test coverage in yt’s Python and Cython components. Triage the reports to look for areas that are user facing and have poor test coverage.

  • Make a number of pull requests adding tests across the yt codebase.

  • Modify existing testing infrastructure or develop new test infrastructure to improve testing of yt functionality on different data types.
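
One way to picture the retrofitting idea is a decorator that runs a single test body against every combination of geometry and data style. This is only a sketch under stated assumptions: make_fake_dataset is a hypothetical stand-in for a factory built on the Stream frontend, the dictionary it returns is a placeholder for a real dataset object, and the lists of geometries and data styles are taken from the examples above rather than from yt itself.

```python
import itertools

GEOMETRIES = ("cartesian", "cylindrical", "spherical")
DATA_STYLES = ("uniform_grid", "octree", "patch_amr", "particles",
               "unstructured")


def make_fake_dataset(geometry, data_style):
    """Hypothetical factory; a real one would build an in-memory dataset."""
    return {"geometry": geometry, "data_style": data_style,
            "density": [1.0, 2.0, 3.0]}


def for_all_datasets(test_func):
    """Run test_func once per (geometry, data style) pair.

    Failures are collected so that one report names every failing
    combination. Returns the number of combinations that were run.
    """
    def wrapper():
        failures = []
        combos = list(itertools.product(GEOMETRIES, DATA_STYLES))
        for geometry, style in combos:
            try:
                test_func(make_fake_dataset(geometry, style))
            except AssertionError as err:
                failures.append((geometry, style, str(err)))
        assert not failures, "failed for: %r" % (failures,)
        return len(combos)
    # Keep the original name so test collectors report it sensibly.
    wrapper.__name__ = test_func.__name__
    return wrapper


@for_all_datasets
def test_density_is_positive(ds):
    assert min(ds["density"]) > 0
```

Collecting failures instead of stopping at the first one makes it easier to see which data styles a piece of functionality does not yet support.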

Domain contexts and domain-specific fields

Suggested Mentor(s): Britton Smith, Matthew Turk

Difficulty: Beginner to Intermediate

Knowledge needed: Undergraduate-level physics knowledge. Domain-specific knowledge of astronomy, hydrodynamics, finite-element methods, GIS, meteorology, geophysics, or oceanography is a plus.

Programming skills: Python

The original focus of yt was to analyze datasets from astrophysical simulations. However, use of yt has been expanding to other scientific domains, such as nuclear physics, meteorology, and geophysics. Still, much of the infrastructure within yt is built upon the assumption that the datasets being loaded are astrophysical and hydrodynamic in nature. This assumption informs the choice of derived fields made available to the user as well as the default unit system. For example, fields such as “Jeans mass” and “X-ray emissivity” in CGS units are of little use to an earthquake simulation.

The goal of this project is to develop a system for domain contexts: sets of fields and unit systems associated with specific scientific domains. Rather than having all fields be made available to all datasets, each dataset is given a domain context, which specifies the relevant fields and the most meaningful unit system. Domain contexts could also be subclassed to provide further specificity, for example, cosmology as a subclass of astrophysics.


  • For each of the existing frontends, identify the relevant field plugins. Create a data structure to associate with each frontend that lists only the relevant plugins. Take the field plugin loading machinery, which currently just loops over all plugins, and have it only load plugins relevant to the loaded frontend.

  • With the above as an example, identify and document all of the places in the code where the domain is assumed to be astronomy. Use this to come up with a set of attributes that minimally describe a scientific domain, i.e., list of field plugins, unit system, etc.

  • Write up a YTEP describing the proposed design and ideas for implementation. The YTEP should identify an initial set of domain contexts, sort fields into domain contexts, and sketch how frontends should declare the domain contexts they need.

  • Create a domain context class with the identified attributes. Implement a base, an astronomy, and possibly a nuclear engineering domain context and associate them with the existing frontends.
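
The steps above could lead to something like the following sketch. It is not a proposed design, only an illustration under stated assumptions: the class names, the attribute names, the plugin names, and the unit-system labels are all hypothetical, and the real set of attributes is exactly what the project is meant to work out.

```python
class DomainContext:
    """Hypothetical base context: field plugins plus a unit system."""
    name = "base"
    unit_system = "mks"
    field_plugins = ("geometric",)

    @classmethod
    def all_plugins(cls):
        """Plugins declared by this context and all of its parents."""
        seen = []
        # Walk from the most general class to the most specific one.
        for klass in reversed(cls.__mro__):
            for plugin in vars(klass).get("field_plugins", ()):
                if plugin not in seen:
                    seen.append(plugin)
        return tuple(seen)


class AstrophysicsContext(DomainContext):
    name = "astrophysics"
    unit_system = "cgs"
    field_plugins = ("fluid", "astro")


class CosmologyContext(AstrophysicsContext):
    # Inherits the unit system and the plugins of astrophysics.
    name = "cosmology"
    field_plugins = ("cosmology",)


class NuclearEngineeringContext(DomainContext):
    name = "nuclear_engineering"
    field_plugins = ("fluid",)


def select_plugins(context, available):
    """Keep only the available plugins that the context asks for."""
    wanted = context.all_plugins()
    return [plugin for plugin in available if plugin in wanted]
```

With this layout, a loader that currently loops over every plugin would call select_plugins first, so a nuclear engineering dataset would never see fields such as “Jeans mass”.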