yt development: 2.0, Cython, and physics module wrapping

Author: Matthew Turk
Published on: Jan 24, 2011, 1:10:50 PM

This is the second blog entry in the weekly series, with some updates on what took place last week with respect to yt development. The most exciting item is the last one, which is the start of what I want to focus on for the next couple of months or years: integration of physics modules with analysis code, and then the ultimate inversion of that relationship.

yt-2.0

This week saw the release of yt 2.0, whose most prominent feature is a completely reorganized code base: ready for expansion, with clearer entry points for modification and a faster import time. The "stable" branch was updated to this reorganized code base, and so all new installations should have this same basic layout. The new layout looks something like this:

yt/
yt/data_objects
yt/utilities
yt/frontends
yt/gui
yt/visualization
yt/analysis_modules

Each of these contains Python files, possibly submodules, and so on. But in contrast with the old layout, where things were named lagos and raven and so on, I think it's safe to say we've traded whimsy for utility.

Cython

(If you'll forgive me, this is taken from an email I wrote to yt-users.)

Cython is a Python-like dialect that compiles down to C code. It understands NumPy arrays and Python objects natively, and can produce code that operates at speeds competitive with raw, hand-written C code. We use it in yt to write things like the volume renderer, fast interpolation, level set identification, PNG writing, and so on. You can see most of the Cython code in yt/utilities/_amr_utils/*.pyx.

In the past, the Cython code was converted to C before being put into the repository. This is visible in yt/utilities/amr_utils.c. If you take a second to open this up, you'll note one thing above all: it's generated code, notoriously awful to read and 100% impossible to maintain. That's why it's never touched directly, and only generated by Cython. The problem is that even small changes to the Cython source cascade into ridiculously long changesets and diffs in the generated file, which grow the mercurial repository by far more than is appropriate.

To get around this, I have added an install-time dependency on the Cython package. To ensure that this causes no problems during the transition, installation of Cython was added to the install script a while ago, and I have additionally added a check to setup.py: if Cython is not found, setup.py should install it. The only cases where this should cause a problem are those where yt was installed with elevated (sudo) permissions. Now, every time yt is rebuilt, the C code that was previously in yt/utilities/amr_utils.c is regenerated from the Cython code in yt/utilities/_amr_utils/*.pyx. This means that we no longer need to update amr_utils.c in the mercurial repository.
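As a rough illustration of what such a check might look like, here is a minimal sketch. It is not the code in yt's setup.py; the function names have_module and cython_status are made up for this example, and it only reports whether Cython is importable rather than installing anything.

```python
import importlib.util


def have_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def cython_status():
    """Describe what a build script could do next (hypothetical helper)."""
    if have_module("Cython"):
        return "Cython found: regenerate the C sources from the .pyx files"
    return "Cython missing: install it before building"


if __name__ == "__main__":
    print(cython_status())
```

Checking for the module up front, rather than letting the import fail halfway through the build, makes it possible to give a clear message or trigger an install step before any compilation starts.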

Adding Cython brings with it a number of interesting things we can do; these include much faster iteration on improvements to, say, the volume renderer. Additionally, there are some other very interesting things one can do with the speed improvements Cython brings, which hopefully we can explore in the future.

(Here ends the email extraction)

One possible place to explore using Cython is in writing faster single-cell kernels on the fly. I have been thinking about this for things like calculating flux across isocontours, where the kernel may need to be redefined inside a running session. This could be very useful for looking at the collapse state of star-forming clumps in a calculation. Another would be the subject of the next section, which is generating interfaces to physics modules.

Physics Module Wrapping

As a bit of background, one of the goals I am attempting to reach is to be able to run individual physics modules (chemistry, cooling, possibly even hydrodynamics) on a given set of cells; this will enable further exploration of things like the cooling time, the chemical timescales and so on. More importantly, it's the first step toward being able to run an actual simulation inside yt, which is the goal of this project over the next several years. More immediately, I am planning to further develop the Enzo chemistry solvers, and being able to call them from yt is necessary to do that rapidly. A quick dig through the early stages of yt will reveal that this worked, once upon a time, with the older Enzo solvers; however, in order to drive forward my near-term solver development I will not be able to use those wrappers.

This last week I spent some time attempting to write a semi-general wrapper for Enzo's physics solvers (in this case, the chemistry solver) that could be called from Python. I ran into several difficulties, which I will chronicle here, but I believe I have come upon a solution; unfortunately it is a solution that may take longer to implement than I had hoped.

Before the development of the branch of Enzo that would become 2.0, constructing wrappers around Fortran modules was much simpler. At that time the solver code was slightly more sanitized than it is now: F77 comments were used more uniformly ("c" instead of "!"), there were few end-of-line comments, few lines ran over 80 characters, and so on. Additionally, the transition to Enzo 2.0 brought with it uniform 64-bit aware code; for Fortran modules, this relies on "-fdefault-real-8" in the GNU compilers and "-r8" in the Intel compilers. These flags promote all default "REAL" variables to 8-byte precision within the Fortran code.
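To make the consequence of that promotion concrete, here is a small sketch using only Python's standard struct module. It does not involve Enzo or any Fortran code; it simply assumes a caller that still believes a REAL is 4 bytes wide and shows what happens when it reads memory that actually holds 8-byte values.

```python
import struct

# Two values laid out the way promoted Fortran code would store them:
# 8 bytes per REAL, little-endian.
values = [1.0, 2.0]
buf = struct.pack("<2d", *values)

# A wrapper that still assumes 4-byte REALs interprets the same 16 bytes
# as four single-precision numbers instead of two double-precision ones.
as_real4 = struct.unpack("<4f", buf)

# Reading with the matching 8-byte width recovers the original values.
as_real8 = struct.unpack("<2d", buf)

if __name__ == "__main__":
    print("read as 4-byte reals:", as_real4)
    print("read as 8-byte reals:", as_real8)
```

The buffer is identical in both cases; only the assumed width differs. A generated wrapper has to agree with the compiler flags about that width, which is the piece that is hard to guarantee when the promotion happens outside the source code.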

f2py is a module that comes with NumPy, and which can parse and generate wrappers for Fortran code. However, the default promotion presents a difficulty, and I think that we cannot use f2py for this. I found the next-generation f2py to pose a similar problem, and I was unable to get fwrap, the new Cython-using wrapper, into a working state. I think that direct wrapping of Fortran is outside the scope of my near-term goals. However, what I was able to locate was a piece of software called cppheader, based on PLY, that can parse C++ header files in a general fashion. I believe that I can use this to construct general wrappers using Cython for the entire Grid object, thus exposing the Fortran solvers indirectly. I have a set of hand-constructed wrappers that I wrote last summer, but a generative method will be easier to maintain. This will remove the complexity that arises from the precision handling in Enzo. Constructing wrappers for other simulation platforms will likely take a similar form.
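The generative idea can be sketched in a few lines. This is only an illustration under heavy assumptions: it uses a plain regular expression in place of cppheader, the header text is invented (SolveChemistry and ComputeCoolingTime are hypothetical method names, not Enzo's actual interface), and it handles only the simplest one-line method declarations.

```python
import re

# Hypothetical header text; class and method names are illustrative only.
HEADER = """
class grid {
public:
  int SolveChemistry(float dt);
  float ComputeCoolingTime(float *cooling_time);
};
"""

# Matches simple one-line declarations of the form "<type> <name>(<args>);".
METHOD_RE = re.compile(
    r"^[ \t]*(\w[\w \t\*]*?)[ \t]+(\w+)[ \t]*\(([^)]*)\)[ \t]*;", re.M
)


def cython_extern(header_name, class_name, header_text):
    """Emit Cython extern declarations for the methods found in header_text."""
    lines = [
        'cdef extern from "%s":' % header_name,
        "    cdef cppclass %s:" % class_name,
    ]
    for ret, name, args in METHOD_RE.findall(header_text):
        lines.append("        %s %s(%s)" % (ret, name, args))
    return "\n".join(lines)


if __name__ == "__main__":
    # "Grid.h" is a placeholder file name for this example.
    print(cython_extern("Grid.h", "grid", HEADER))
```

A real header parser would also have to deal with templates, default arguments, overloads, and preprocessor macros, which is the reason to lean on an existing parser rather than regular expressions. The appeal of generating the declarations is that they can be rebuilt whenever the C++ headers change, instead of being kept in sync by hand.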

That wraps it up for this week.