Registration for the 2012 yt workshop, to be held at the FLASH center in Chicago from January 24-26, is now open.
Please register here: http://yt-project.org/workshop2012/
A useful new addition to yt is the boolean data container. These are hybrid data containers built by relating already-defined data containers to each other using boolean operators. Nested boolean logic, using parentheses, is also supported. The boolean data container (or volume) is made by constructing a list of volumes interspersed with operators given as strings. Below are some examples of what can be done with boolean data containers.
The "OR" operator combines the volumes of two data containers into one. The two initial volumes may or may not overlap, so the combined volume may consist of several disjoint pieces. Here is an example showing the construction of a boolean volume from two disjoint spheres:
sp1 = pf.h.sphere([0.3]*3, .15)
sp2 = pf.h.sphere([0.7]*3, .25)
bool = pf.h.boolean([sp1, "OR", sp2])
Here is a short video showing the result:
The "AND" operator produces the intersection of two volumes. Put another way, it produces a new volume defined by all cells that lie in both of the initial volumes. Here is an example of the intersection of a sphere and a cube:
re1 = pf.h.region([0.5]*3, [0.0]*3, [0.7]*3)
sp1 = pf.h.sphere([0.5]*3, 0.5)
bool = pf.h.boolean([re1, "AND", sp1])
Here is a short video showing the result:
The "NOT" operator is the only operator for which the order of the operands matters (it is not commutative), and it is read from left to right. For example, if there are multiple "NOT" operators, the first "NOT" on the left and the two volumes on either side of it are considered first. The new volume constructed is the part of the first data container that the second data container does not cover. This can be thought of as subtracting the second volume from the first. Here is an example of a cubical region having a corner cut out of it:
re1 = pf.h.region([0.5]*3, [0.]*3, [1.]*3)
re2 = pf.h.region([0.5]*3, [0.5]*3, [1.]*3)
bool = pf.h.boolean([re1, "NOT", re2])
Here is a short video showing the result:
It is possible to use nested logic using parentheses. When nested logic is used, the order of logical operations begins at the inner-most nested level and proceeds outwards, always respecting the left to right ordering for "NOT" operations. This may be used to create truly fantastic volumes. Here is an example of a piece of Swiss cheese created from two cubical regions and two spheres. The second sphere sp2 wraps around the periodic boundaries and impacts the largest cube in more than one place.
re1 = pf.h.region([0.5]*3, [0.]*3, [1.]*3)
re2 = pf.h.region([0.5]*3, [0.5]*3, [1.]*3)
sp1 = pf.h.sphere([0.5, 0.7, 0.5], .25)
sp2 = pf.h.sphere([0.1]*3, .25)
bool = pf.h.boolean([re1, "NOT", "(", re2, "AND", sp1, ")", "NOT", sp2])
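One way to picture what these operators do is as set operations on cells. The following is a minimal pure-Python sketch, not yt's implementation: region and sphere here are hypothetical helpers (with a simpler signature than pf.h.region) that return sets of cell indices on a small 10x10x10 grid, and the sphere uses a periodic distance so that sp2 wraps around the domain boundaries.

```python
import itertools

N = 10  # cells per side of the unit cube (small, for illustration only)


def cell_centers(n=N):
    """Yield (index, center) for every cell of an n^3 grid on the unit cube."""
    for idx in itertools.product(range(n), repeat=3):
        yield idx, tuple((i + 0.5) / n for i in idx)


def region(left, right):
    """Hypothetical helper: cells whose centers lie in the box [left, right)."""
    return {idx for idx, c in cell_centers()
            if all(lo <= x < hi for lo, x, hi in zip(left, c, right))}


def sphere(center, radius):
    """Hypothetical helper: cells within radius of center, with periodic wrap."""
    def periodic_sep(a, b):
        d = abs(a - b)
        return min(d, 1.0 - d)

    return {idx for idx, c in cell_centers()
            if sum(periodic_sep(x, y) ** 2 for x, y in zip(c, center))
            <= radius ** 2}


re1 = region([0.0] * 3, [1.0] * 3)
re2 = region([0.5] * 3, [1.0] * 3)
sp1 = sphere([0.5, 0.7, 0.5], 0.25)
sp2 = sphere([0.1] * 3, 0.25)

union = sp1 | sp2          # "OR": cells in either volume
intersection = re2 & sp1   # "AND": cells in both volumes
difference = re1 - re2     # "NOT": cells of re1 not covered by re2

# [re1, "NOT", "(", re2, "AND", sp1, ")", "NOT", sp2]
# The parenthesized group is evaluated first, then the NOTs left to right.
swiss_cheese = (re1 - (re2 & sp1)) - sp2
```

In this picture, the left-to-right rule for "NOT" is just the usual left-to-right evaluation of repeated set differences.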
For those wondering how the movies were made, I've posted the script here. Note that blocks of comments will need to be turned on/off to get the desired boolean data container.
yt now has a Google Plus page. Here we'll post smaller, less blog-worthy items, hold video conferencing 'hangouts', and so on. Encircle away! And if you post something you'd like to be reshared, just be sure to explicitly share it with '+yt' so we know.
A few of us worked this past week on a couple of yt projects and made what we think is significant progress. Two of the items we focused on were testing and parallelism.
For testing, we've broadened the test suite to include many more functions and derived quantities. We now have 548 tests that include (off- and on-axis) slices, (off- and on-axis) projections, phase distributions, halo finding, volume rendering, and geometrical region cuts such as rectangular solids, spheres, and disks. We use both plain and derived fields for these tests so that they cover as many bases as possible. With this framework, we are now able to keep a gold standard of the test results for any dataset, then test later changes against this standard. These tests can check for bitwise identicality or allow for some tolerance. For a full list of tests, you can run python yt/tests/runall.py -l, and use --help to look at the usage. We will soon be updating the documentation to provide more information on how to set up the testing framework, but I think all of us agree that this will make it much easier to test our changes to make sure bugs have not crept in.
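The gold-standard idea can be sketched in a few lines. This is only an illustration under assumed names (compare_to_gold and the result dictionaries are hypothetical, not part of yt's test framework), and it uses exact float equality as a stand-in for a bitwise check:

```python
import math


def compare_to_gold(gold, new, rel_tol=0.0):
    """Return the names of results that differ from the gold standard.

    With rel_tol == 0.0 every value must match exactly; otherwise each
    value may differ by the given relative tolerance.
    """
    failures = []
    for name, gold_values in gold.items():
        new_values = new.get(name)
        if new_values is None or len(new_values) != len(gold_values):
            failures.append(name)
            continue
        for g, n in zip(gold_values, new_values):
            if rel_tol == 0.0:
                same = (g == n)
            else:
                same = math.isclose(g, n, rel_tol=rel_tol)
            if not same:
                failures.append(name)
                break
    return failures


# Stored results from a trusted version of the code (made-up numbers).
gold = {"slice_density": [1.0, 2.0, 3.0], "projection_density": [10.0, 20.0]}
# Results from the version under test.
new = {"slice_density": [1.0, 2.0, 3.0], "projection_density": [10.0, 20.000001]}

exact_failures = compare_to_gold(gold, new)
tolerant_failures = compare_to_gold(gold, new, rel_tol=1e-6)
```

Here the exact comparison flags the projection, while the tolerant comparison accepts it.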
The second big change I'd like to talk about is the way we now handle parallelism in yt. Previously, methods that employed parallelism through MPI calls would first inherit from ParallelAnalysisInterface, which had access to a ton of MPI functions that all work off of MPI.COMM_WORLD. In our revamp we wanted to accomplish two things: 1) merge duplicate MPI calls that differed only in the type of values they work on, and do some overall cleanup; 2) allow for nested levels of parallelism, where two (or more) separate communicators are able to use barriers and collective operations such as allreduce. To do this, we worked in a two-step process. First we took things like:
def _mpi_allsum(self, data):
def _mpi_Allsum_double(self, data):
def _mpi_Allsum_long(self, data):
def _mpi_allmax(self, data):
def _mpi_allmin(self, data):
and packed them into a single function:
def mpi_allreduce(self, data, dtype=None, op='sum'):
When a numpy array is passed to this new mpi_allreduce, dtype is determined from the array properties. If the data is a dictionary, then it is passed to mpi4py's allreduce function that acts on dictionaries. This greatly reduced the number of lines in parallel_analysis_interface (1376 to 915), even after adding in additional functionality.
The second step was bundling all of these functions into a new class called Communicator. This Communicator object is initialized with an MPI communicator that is no longer restricted to COMM_WORLD. Using this as the fundamental MPI object, we then built a CommunicationSystem object that manages these communicators. A global communication_system instance is created and, if the environment is mpi4py-capable, initialized with COMM_WORLD at the top of the system. If not, an empty communicator is created that has passthroughs for all the MPI functions.
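To make the structure concrete, here is a rough sketch of how such a pair of classes could be laid out. It is not yt's actual code: the push and pop methods are assumptions about how communicators might be managed, and the sketch only uses mpi4py's generic object allreduce rather than the array-aware path described above.

```python
try:
    from mpi4py import MPI  # optional; the sketch also runs without it
except ImportError:
    MPI = None


class Communicator:
    """Illustrative stand-in for the Communicator class described above."""

    def __init__(self, comm=None):
        # comm is an mpi4py communicator, or None in a serial environment.
        self.comm = comm

    def mpi_allreduce(self, data, dtype=None, op="sum"):
        # dtype mirrors the signature quoted earlier; it is unused here.
        if self.comm is None:
            # Passthrough: with a single process the reduction is the data.
            return data
        mpi_op = {"sum": MPI.SUM, "max": MPI.MAX, "min": MPI.MIN}[op]
        return self.comm.allreduce(data, op=mpi_op)


class CommunicationSystem:
    """Keeps a stack of communicators (the push/pop API is hypothetical)."""

    def __init__(self):
        world = MPI.COMM_WORLD if MPI is not None else None
        self.communicators = [Communicator(world)]

    def push(self, comm):
        self.communicators.append(Communicator(comm))

    def pop(self):
        return self.communicators.pop()

    @property
    def current(self):
        return self.communicators[-1]


communication_system = CommunicationSystem()
total = communication_system.current.mpi_allreduce(5, op="sum")
```

Run on a single process, with or without mpi4py installed, the reduction simply returns the local value.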
Using this new framework we are now able to take advantage of multiple communicators. There are two use cases that we have implemented so far:
parallel_objects is a method in parallel_analysis_interface.py for iterating over a set of objects such that a group of processors work on each object. This could be used, for example, to run N projections each with M processors, allowing for a parallelism of NxM.
workgroups allows users to set up multiple MPI communicators with a non-uniform number of processors, each working on a separate task. This capability lives within the ProcessorPool and Workgroup objects in parallel_analysis_interface.py.
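The bookkeeping behind both use cases amounts to splitting ranks into groups and handing objects to groups. The sketch below shows one possible scheme in plain Python; the function names are hypothetical and this is not how parallel_objects is necessarily implemented. With mpi4py, each group could then obtain its own communicator through Comm.Split.

```python
def assign_workgroups(nprocs, ngroups):
    """Split ranks 0..nprocs-1 into ngroups contiguous, near-equal groups."""
    base, extra = divmod(nprocs, ngroups)
    groups = []
    start = 0
    for g in range(ngroups):
        size = base + (1 if g < extra else 0)  # first groups absorb leftovers
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def objects_for_group(objects, group_index, ngroups):
    """Round-robin: group g handles objects g, g + ngroups, g + 2*ngroups, ..."""
    return objects[group_index::ngroups]


# N = 4 projections, each worked on by M = 2 processors (8 ranks in total).
projections = ["proj_x", "proj_y", "proj_z", "proj_offaxis"]
groups = assign_workgroups(nprocs=8, ngroups=4)
schedule = {tuple(ranks): objects_for_group(projections, g, len(groups))
            for g, ranks in enumerate(groups)}
```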
These are just the first two that we tried out and we are very excited about the new possibilities.
With these changes, there was one implementation change that has already come up once on the mailing list. When you implement a new class that you'd like to have access to the communication objects, you must first inherit from ParallelAnalysisInterface, and then make sure that __init__ makes a call to: ParallelAnalysisInterface.__init__(self)
At that point, your new class will have access to the mpi calls through the self.comm object. For example, to perform a reduction one would do:
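A minimal sketch of this pattern follows. The two base classes are stand-ins written so that the example runs without MPI or yt, and MassCounter is a hypothetical analysis class; only the inheritance, the __init__ call, and the self.comm.mpi_allreduce call reflect what is described above.

```python
class SerialCommunicator:
    """Stand-in passthrough communicator (hypothetical, for illustration)."""

    def mpi_allreduce(self, data, dtype=None, op="sum"):
        return data  # one process: nothing to combine


class ParallelAnalysisInterface:
    """Stand-in for yt's base class of the same name."""

    def __init__(self, comm=None):
        self.comm = comm if comm is not None else SerialCommunicator()


class MassCounter(ParallelAnalysisInterface):
    """Hypothetical analysis class that sums a quantity across processors."""

    def __init__(self, local_masses):
        # Required so that self.comm is set up.
        ParallelAnalysisInterface.__init__(self)
        self.local_masses = local_masses

    def total_mass(self):
        local_total = sum(self.local_masses)
        # Combine the per-processor totals into a global total.
        return self.comm.mpi_allreduce(local_total, op="sum")


counter = MassCounter([1.0, 2.5, 4.0])
global_total = counter.total_mass()
```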
As I said before, this will be documented soon, but hopefully this will help for now.
Sam, Britton, Cameron, Jeff, and Matt
I'm pleased to announce the 2012 yt Workshop at the FLASH Center in Chicago, January 24-26.
The workshop will be aimed at both users and developers of yt. We will begin with intensive user training, moving from basic usage to advanced and parallel usage. Users are encouraged to bring their ideas and prototypes for new analysis routines as there will be opportunities to work with more experienced developers. We will then address how to modify, extend and contribute to yt, and transition to a developers workshop. In the developers portion of the workshop, we will discuss ideas for improvements to the code and then break into groups to implement new features. Users are highly encouraged to stay and participate in development.
The FLASH Center has graciously offered to host the workshop. We have identified a hotel in downtown Chicago (near the river, just off Michigan Avenue) where we are able to book double-occupancy rooms for $99/night, pre-tax. We are actively pursuing funding opportunities, but as of yet have not secured funding for participant costs; if we are able to do so, it will likely cover hotel stays for a limited number of individuals willing to share rooms for the four nights of the workshop (Jan. 23-26).
As we prepare hotel reservations, funding applications and other technicalities, we need to get a sense of not only how many people are potentially going to attend, but also their current career stage, funding availability and so on. If you are interested in attending the workshop, we would greatly appreciate it if you would visit the following URL and fill out the Google Form: http://goo.gl/xElrB . If you have already filled it out, no need to do it again!
Once we have the details of the conference settled, further information will be forthcoming regarding registration, accommodations, and possible financial support.
For specific questions regarding the workshop, please email John ZuHone at jzuhone [at] milkyway.gsfc.nasa.gov.
A few of the yt developers have been experimenting with screencasts to show off new features or demonstrate how to do some things. Sam has prepared a screencast on volume rendering, and I have prepared one on getting started with developing. Check them out below, and please feel free to leave comments and let us know what you think -- not just about the screencasts, but about what they demonstrate, and if you think any of the concepts or routines could be made easier.
In this post I'd like to discuss a bit of work in progress to highlight some exciting new features that we hope to have working in yt sometime soon.
On any machine that runs yt, there is a file created in the user's home directory named ~/.yt/parameter_files.csv that yt uses internally to keep track of datasets it has seen. This is just a simple text file containing comma-separated entries with a few pieces of information about datasets, like their location on disk and the last date and time they were 'seen' by yt. To keep this file from exploding, it's kept at some maximum number of entries. But, clearly, text is not the ideal way to store this kind of information for anything over a few hundred entries. Recently Matt has been working on updating this system to use a SQLite database, which should have several advantages over the text file in terms of speed and disk usage.
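As a rough illustration of what a SQLite-backed store could look like, here is a small sketch using Python's built-in sqlite3 module. The table name, columns, and helper functions are assumptions made for the example, not the schema yt uses.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the dataset store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parameter_files ("
        " location TEXT PRIMARY KEY,"
        " last_seen REAL NOT NULL)"
    )
    return conn


def mark_seen(conn, location, when=None):
    """Record that a dataset was seen, replacing any earlier entry for it."""
    when = time.time() if when is None else when
    conn.execute(
        "INSERT OR REPLACE INTO parameter_files (location, last_seen)"
        " VALUES (?, ?)",
        (location, when),
    )
    conn.commit()


def most_recent(conn, limit=10):
    """Return (location, last_seen) rows, most recently seen first."""
    cursor = conn.execute(
        "SELECT location, last_seen FROM parameter_files"
        " ORDER BY last_seen DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


conn = open_store()
mark_seen(conn, "/data/run1/DD0010/DD0010", when=100.0)
mark_seen(conn, "/data/run2/DD0042/DD0042", when=200.0)
mark_seen(conn, "/data/run1/DD0010/DD0010", when=300.0)  # seen again
recent = most_recent(conn)
```

Because location is the primary key, seeing a dataset again updates its row instead of appending a new line, so the store does not grow with repeated use.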
This got me thinking about what could be done to extend this local listing of datasets into something more useful, globally. What if there was a way to view any and all datasets ever seen by yt in one convenient place? It could be searchable over a number of attributes, including creation date and when it was last seen by yt, and it would list which machine the dataset is stored on. Finally, this functionality should be transparent to the user once it is set up (with minimal effort) - the global listing of datasets should just be updated automatically in the background as part of the normal workflow.
Over a couple days last week I did a quick and dirty implementation of this using Amazon AWS SimpleDB and a simple web-cgi script I wrote in Python. The advantages of SimpleDB are that it is "in the cloud" (sheesh) and very inexpensive. In fact, for small databases with low usage levels, it is free. (As an aside, Amazon is very generous with academic grants, which could be used for this or other yt-related services.) The Python script is very simple and can be cloned off of BitBucket. The script can be run on any computer with a webserver and Python (which includes Macs and Linux machines), and I envision a website (perhaps mydb.yt-project.org, for example) being created where a user can login from anywhere to view their datasets easily.
The entire thing is not finished yet: the updates to SimpleDB are not automatic, nor have we settled on a final list of which attributes to store in the listing. However, in two days I was able to get enough working to show what I think are the key killer features of the system in a screencast which I've linked below. I should note that in the time since I made the screencast, I have made a few improvements. In particular, the numerical columns can now be sorted correctly.
I'm excited about the prospects for a simple system like this!
This last week, following the release of version 2.2 of yt, I spent a bit of time looking at speed improvements. There were several places that the code was unacceptably slow:
- 1D profiles (as noted in our method paper, even)
- Ghost-zone generation
- RAMSES grid data loading
The first of these was relatively easy to fix. In the past, 1D profiles (unlike 2D profiles) were calculated using pure-Python mechanisms; numpy was used for digitization, then inverse binning was conducted by the numpy 'where' command, and these binnings were used to generate the overall histogram. However, with 2D and 3D profiles, we used specialized C code written expressly for our purposes. This last week I found myself waiting too long for profiles, and I wrote a specialized C function that conducts binning in one dimension. This sped up my profiling code by a factor of 3-4, depending on the specific field being profiled.
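The structure of a single-pass binning routine can be shown in plain Python. This is only a sketch of the idea (one loop that accumulates each value directly into its bin, instead of digitizing and then searching per bin); the function name and the handling of out-of-range values are assumptions, and the real routine is written in C.

```python
import bisect


def bin_profile_1d(bin_edges, bin_field, weight_field):
    """Sum weight_field into bins of bin_field in a single pass.

    Returns len(bin_edges) - 1 totals. Values outside
    [bin_edges[0], bin_edges[-1]) are skipped.
    """
    nbins = len(bin_edges) - 1
    totals = [0.0] * nbins
    for x, w in zip(bin_field, weight_field):
        # Index of the bin whose left edge is the last edge <= x.
        i = bisect.bisect_right(bin_edges, x) - 1
        if 0 <= i < nbins:
            totals[i] += w
    return totals


edges = [0.0, 1.0, 2.0, 3.0]
density = [0.5, 1.5, 1.7, 2.9, 3.5]    # field being binned
cell_mass = [1.0, 2.0, 3.0, 4.0, 5.0]  # field being accumulated
profile = bin_profile_1d(edges, density, cell_mass)
print(profile)  # [1.0, 5.0, 4.0]
```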
The second, ghost zone generation, was harder. To generate a 'smoothed' grid, interpolation is performed cascading down from the root grid to the final grid, allowing for a buffer region. This helps to avoid dangling nodes. Ideally, filling ghost zones would be easier and require less interpolation; however, as we do not restrict the characteristics of the mesh in such a way as to ease this, we have to use the most general case. After spending some time looking over the code, though, I realized that the most general method of interpolation was being used -- one which allowed for interpolation from a regular grid onto arbitrary shapes. After writing a specialized regular-grid to regular-grid interpolator (and ensuring consistency and identicality of results) I saw a speedup of a factor of about 2.5-3 in generating ghost zones; this has applications from volume rendering to finite differencing and so on.
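The reason a regular-to-regular interpolator can be faster is that the source cells bracketing each target point follow from simple arithmetic, with no searching. The sketch below illustrates this in one dimension with linear interpolation; both of those choices are assumptions made for brevity, and the function is not yt's interpolator.

```python
import math


def regrid_linear_1d(coarse, coarse_left, coarse_dx, fine_left, fine_dx, n_fine):
    """Interpolate cell-centered coarse values onto a fine regular grid.

    Both grids are uniform, so the bracketing coarse cells are found by
    arithmetic on the position. Points beyond the outermost coarse cell
    centers are linearly extrapolated from the nearest pair of cells.
    """
    fine = []
    for j in range(n_fine):
        x = fine_left + (j + 0.5) * fine_dx      # fine cell center
        s = (x - coarse_left) / coarse_dx - 0.5  # position in coarse-center units
        i = min(max(int(math.floor(s)), 0), len(coarse) - 2)
        t = s - i
        fine.append((1.0 - t) * coarse[i] + t * coarse[i + 1])
    return fine


# Coarse cells of width 1 starting at x = 0, sampling f(x) = 2x + 1
# at the cell centers 0.5, 1.5, 2.5, 3.5.
coarse_values = [2.0, 4.0, 6.0, 8.0]
# Fine cells of width 0.5 starting at x = 1 (centers 1.25, 1.75, 2.25, 2.75).
fine_values = regrid_linear_1d(coarse_values, 0.0, 1.0, 1.0, 0.5, 4)
print(fine_values)  # [3.5, 4.5, 5.5, 6.5]
```

Since the sampled function is linear, the interpolated values reproduce it exactly, which makes for a convenient sanity check.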
Finally, in the past, RAMSES grids following regridding were allowed to cross domains (i.e., processor files). By rewriting the regridding process to only allow regrids to exist within a single domain, I was able to speed up the process of loading data, allowing it to preload data for things like projections, as well. Next this will be used as a load balancer, and it will also ease the process of loading particles from disk. I am optimistic that this will also enable faster, more targeted reads that bring down peak memory usage.
Hopefully over the next few months more optimization can be conducted. If you want to test out how long something takes, particularly if it's a long-running task, I recommend using pyprof2html, which you can install with pip install pyprof2html. Then profile your script and convert the output:
$ python2.7 -m cProfile -o my_slow_script.cprof my_slow_script.py
$ pyprof2html my_slow_script.cprof
This will create a directory called 'html', which has a nice presentation of where things are slow. If you send the .cprof file to the mailing list, we can take a look, too, and see if there are some obvious places to speed things up.
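If you would rather get a quick text summary without leaving Python, the standard library's cProfile and pstats modules can be used directly. In this small sketch, slow_sum is just a placeholder for whatever code you want to time:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Placeholder for a slow piece of analysis code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200000)
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```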
(Please feel encouraged to forward this message to any other interested parties.)
We are proud to announce the release of yt version 2.2. This release includes several new features, bug fixes, and numerous improvements to the code base and documentation. At the new yt homepage, http://yt-project.org/ , an installation script, a cookbook, documentation and a guide to getting involved can be found. We are particularly proud of the new GUI, entitled "Reason," which allows real-time exploration of datasets, and which can be used (locally or remotely over SSH) with no dependencies other than a web browser. A basic demonstration of its usage can be found at: http://vimeo.com/groups/ytgallery/videos/28506477 .
yt is a community-developed analysis and visualization toolkit for astrophysical simulation data. yt provides full support for Enzo, Orion, Nyx, and FLASH codes, with preliminary support for the RAMSES, ART, and Maestro codes. It can be used to create many common types of data products such as:
- Arbitrary Data Selection
- Cosmological Analysis
- Halo finding
- Parallel AMR Volume Rendering
- Gravitationally Bound Objects Analysis
There are a few major additions since yt-2.1 (Released April 8, 2011), including:
- New web GUI "Reason," designed for efficient remote usage over SSH tunnels
- Command-line submission to the yt Hub (http://hub.yt-project.org/)
- Absorption line spectrum generator for cosmological simulations
- Support for the Nyx code
- An order of magnitude speed improvement in the RAMSES support
- Experimental interoperability with ParaView
- Quad-tree projections, speeding up the process of projecting by up to an order of magnitude and providing better load balancing
- "mapserver" for in-browser, Google Maps-style slice and projection visualization
- Many bug fixes and performance improvements
With this release, we also unveil the yt Hub, an astrophysical simulation-specific location for sharing scripts, analysis and visualization tools, documents and repositories used to generate publications. The yt Hub has been designed to allow programmatic access from the command line, and we encourage you to browse the current offerings and contribute your own. The yt Hub can be found at http://hub.yt-project.org/ .
Get Involved: http://yt-project.org/doc/advanced/developing.html
If you can’t wait to get started, install with:
$ wget http://hg.yt-project.org/yt/raw/stable/doc/install_script.sh
$ bash install_script.sh
Development has been sponsored by the NSF, the DOE, and university funding. We invite you to get involved with developing and using yt!
Please forward this announcement to interested parties.
The yt development team
In keeping with the project rename, we've moved this blog from its old home at blog.enzotools.org to its new home at blog.yt-project.org. But we've put in a few redirects, and the RSS feed hasn't moved, so you shouldn't need to do anything different to get here. We've also enabled anonymous commenting, so feel free to comment below. (But, of course, letting us know who you are would certainly help with keeping in touch!)