Dataset Tracking with yt

In this post I’d like to discuss a bit of work in progress to highlight some exciting new features that we hope to have working in yt sometime soon.

On any machine that runs yt, there is a file created in the users home directory named ~/.yt/parameter_files.csv that yt uses internally to keep track of datasets it has seen. This is just a simple text file containing comma-separated entries with a few pieces of information about datasets, like their location on disk and the last date and time they were ‘seen’ by yt. To keep this file from exploding, it’s kept at some maximum number of entries. But, clearly, text is not the ideal way to store this kind of information for anything over a few hundred entries. Recently Matt has been working on updating this system to use a SQLite database, which should have several advantages over the text file in terms of speed and disk usage.

This got me thinking about what could be done to extend this local listing of datasets into something more useful, globally. What if there was a way to view any and all datasets ever seen by yt in one convenient place? It could be searchable over a number of attributes, including creation date and when it was last seen by yt, and it would list which machine the dataset is stored on. Finally, this functionality should be transparent to the user once it is set up (with minimal effort) - the global listing of datasets should just be updated automatically in the background as part of the normal workflow.

Over a couple days last week I did a quick and dirty implementation of this using Amazon AWS SimpleDB and a simple web-cgi script I wrote in Python. The advantages of SimpleDB are that it is “in the cloud” (sheesh) and very inexpensive. In fact, for small databases with low usage levels, it is free. (As an aside, Amazon is very generous with academic grants, which could be used for this or other yt-related services.) The Python script is very simple and can be cloned off of BitBucket. The script can be run on any computer with a webserver and Python (which includes Macs and Linux machines), and I envision a website (perhaps mydb.yt-project.org, for example) being created where a user can login from anywhere to view their datasets easily.

The entire thing is not finished yet: the updates to SimpleDB are not automatic, nor have we settled on a final list of which attributes to store in the listing. However, in two days I was able to get enough working to show what I think are the key killer features of the system in a screencast which I’ve linked below. I should note that in the time since I made the screencast, I have made a few improvements. In particular, the numerical columns can now be sorted correctly.

I’m excited about the prospects for a simple system like this!

Want to contribute a post? Check out our contributor guide

Check out our documentation here

The yt-project homepage

Our quickstart notebooks on getting started

The yt source repository

yt has many extension packages to help you in your scientific workflow! Check these out, or create your own.

ytini

ytini is set of tools and tutorials for using yt as a tool inside the 3D visual effects software Houdini or a data pre-processor externally to Houdini.

Trident

Trident is a full-featured tool that projects arbitrary sightlines through astrophysical hydrodynamics simulations for generating mock spectral observations of the IGM and CGM.

pyXSIM

pyXSIM is a Python package for simulating X-ray observations from astrophysical sources.

ytree

Analyze merger tree data from multiple sources. It’s yt for merger trees!

yt_idv

yt_idv is a package for interactive volume rendering with yt! It provides interactive visualization using OpenGL for datasets loaded in yt. It is written to provide both scripting and interactive access.

widgyts

widgyts is a jupyter widgets extension for yt, backed by rust/webassembly to allow for browser-based, interactive exploration of data from yt.

yt_astro_analysis

yt_astro_analysis is the yt extension package for astrophysical analysis.

Make your own!!

Finally, check out our development docs on writing your own yt extensions!

Are you interested in contributing to the yt blog?

Check out our post on contributing to the blog for a guide! https://yt-project.github.io/blog/posts/contributing/

We welcome contributions from all members of the yt community. Feel free to reach out if you need any help.

The yt hub at https://girder.hub.yt/ has a ton of resources to check out, whether you have yt installed or not.

The collections host all sorts of data that can be loaded with yt. Some have been used in publications, and others are used as sample frontend data for yt. Maybe there’s data from your simulation software?

The rafts host the yt quickstart notebooks, where you can interact with yt in the browser, without needing to install it locally. Check out some of the other rafts too, like the widgyts release notebooks – a demo of the widgyts yt extension pacakge; or the notebooks from the CCA workshop – a user’s workshop on using yt.

Dataset Tracking with yt
@ Stephen Skory | Monday, Sep 12, 2011 | 3 minute read | Update at Monday, Sep 12, 2011

Aboout the yt Project

yt extension modules