Relicensing yt from GPLv3 to BSD

Today, I hit the merge button on a relicensing of yt from GPLv3 to the 3-clause BSD license. Below is a blog post, largely drawn from an email I sent to yt-dev several months ago, describing why this process began and what it means for both users and developers.

This was an effort we started several months ago, following discussions at the SciPy 2013 conference. Nathan Goldbaum, Sam Skillman and I discussed it somewhat at length, and eventually decided to begin exploring this with the other key stakeholders of yt. It wasn't a decision or process that we took lightly, nor was it particularly seamless, but it was mostly smooth and I think overall it is going to be a positive change for the community.

When I originally created yt in the Summer of 2006 (then called Raven), I placed it under the then-new GPLv3 license. This license brought with it an ideology that I support -- free (as in freedom) and open source software -- and it attempts to ensure that the spread of software spreads those freedoms as well. This is done through the terms of licensing. While there are several subtleties to how this plays out, and while the goals of FLOSS and the GPL align very well with my own, at some point I began to believe that yt would be better suited under a permissive, non-copyleft license than under the GPLv3. As such, through discussions with the other developers, and through consent from every contributor to yt over its history, we have relicensed it under the 3-clause BSD license.

Why?

In the scientific software community, for the most part codes and platforms are released under a permissive, BSD-like license. This is not universally true, but within the scientific python ecosystem (including projects such as AstroPy, NumPy, IPython and so on), BSD-like licenses are especially prevalent. These licenses place no restrictions on redistribution, passing on freedoms to end users, or making a piece of software closed-source. A side effect is that if a piece of software is BSD licensed, it cannot rely on GPL'd software without itself being subject to those terms. Specifically, a BSD licensed package that requires an import of a GPL'd package may then be subject to the GPL -- this is why it has been termed "viral" in the past. As examples, many BSD-licensed packages exist in the scientific software community: VisIt, ParaView, MayaVi, NumPy, Matplotlib, IPython, Python itself, mpi4py, h5py, SymPy, SciPy, most of the scikits, scikits-learn, NetworkX and so on. Collaboration with these projects is currently one-way because of our license.

When I initially decided on a license for yt (seven years ago) it seemed appropriate to use the licensing terms as a mechanism to encourage contributions upstream. However, within the current ecosystem, it is clear that because of the virality of the GPL and the prevailing mindsets of developers, it is actually an impediment to receiving contributions and gaining mindshare. John Hunter described it very clearly in his "BSD Pitch".

While John focuses on commercial utilization, I believe that within the scientific python ecosystem the picture can be broadened to include any piece of software that is under a permissive license. yt cannot be used or relied upon as a primary component without that piece of software then becoming subject to the terms of the GPL. Additionally, some private and public institutions are averse to providing code under the GPL, specifically version 3 of the GPL.

By transitioning to a permissive license, we may be able to receive more contributions and collaborate more widely. As a few examples, this could include more direct collaborations with packages such as Glue, IPython, VisIt and ParaView, as well as the use and exposure of yt methods and operations in other, permissively-licensed packages. Deep integration with permissively-licensed simulation codes, for example, will benefit from this. Furthermore, individuals who otherwise could not contribute code under the GPL (due to employer restrictions) will be able to contribute code under a permissive license.

The GPL is designed to prevent turning FLOSS code proprietary. Changing to a BSD license does not allow another entity to prevent us from continuing to develop or make available any yt code. It simply means that others can utilize it however they see fit.

I believe that we stand to gain considerably more than we stand to lose from this transition. (Interestingly enough, Wolfgang Bangerth and Timo Heister came to similar conclusions in section 3.4 of their article What Makes Computational Open Source Software Libraries Successful?) More to the point, a few years ago on the yt-dev mailing list we came up with a mission statement for yt, which now adorns our homepage. I think we can better serve that mission statement by enabling broader collaborations within the scientific software ecosystem.

This is not motivated by any desire to create a proprietary distribution of yt -- in fact, exactly the opposite. I believe that in the current ecosystem of scientific software, yt will be more sustainable if it is under a permissive license. I hope we continue to scale.

The New License

As of changeset 7a7ca4d (in the main yt branch) and 7b180c7 (in the yt-3.0 branch), yt is now available under the 3-clause BSD license. In addition to this, the author lists have been removed from the files; this was a suggestion from the other developers, to encourage a different representation of authorship.

In beginning this process, I consulted with several individuals whom I consider role models in the community -- not only other long-term, core yt developers, but also Fernando Perez, Anthony Scopatz, Matthew Terry and Katy Huff. To accomplish the relicensing, we had a public discussion (on yt-dev) of the advantages and disadvantages of the relicensing. I'm deeply grateful to Fernando for an extremely thoughtful email exchange in which he described his own motivations for moving IPython from LGPL to BSD, as well as a set of guidelines and suggestions for relicensing yt. The process was inspired by the IPython licensing and credit system, and I hope we continue to learn from their successes over time.

To relicense, I personally emailed each individual contributor to yt explaining the reason, linking to documents describing each license, and asking them to publicly state their consent to relicense. Links to each mailing list entry were posted in a Google Spreadsheet. Once these messages had been collected, we were able to change the license on all of the header files.

At some point in the future, we will likely put out a 2.6 release. This release will be a long-term stable release, and it will be available under the BSD license described above. But as of today, checkouts of the code are already available under that license.

Future

I have come to believe very strongly that, through this relicensing and by rethinking how we fit into the ecosystem of scientific software, we as a project can do more to support the goals of open science and open source, and build a stronger community.

Fun stuff is ahead.

Author: Matthew Turk
Published on: Sep 12, 2013, 11:53:18 PM
Permalink - Source code

OBJ File Exporter for Surfaces

OBJ and MTL Files

If the ability to maneuver around an isosurface of your 3D simulation in Sketchfab cost you half a day of work (let's be honest, 2 days), prepare to be even less productive. With a new OBJ file exporter, you can now upload multiple surfaces of different transparencies in the same file. The following code snippet produces two files which contain the vertex info (surfaces.obj) and color/transparency info (surfaces.mtl) for a 3D galaxy simulation:

from yt.mods import *

pf = load("/data/workshop2012/IsolatedGalaxy/galaxy0030/galaxy0030")
rho = [2e-27, 1e-27]
trans = [1.0, 0.5]
filename = './surfaces'

sphere = pf.h.sphere("max", (1.0, "mpc"))
for i,r in enumerate(rho):
    surf = pf.h.surface(sphere, 'Density', r)
    surf.export_obj(filename, transparency = trans[i], color_field='Temperature', plot_index = i)

The calling sequence is fairly similar to the export_ply function previously used to export 3D surfaces. However, one can now specify a transparency for each surface of interest, and each surface is enumerated in the OBJ files with plot_index. This means one could potentially add surfaces to a previously created file by setting plot_index to the number of previously written surfaces.
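
For instance, a hedged sketch of appending a third, lower-density surface to the files written above (the contour value and transparency here are arbitrary; the two earlier surfaces occupy plot_index 0 and 1):

surf = pf.h.surface(sphere, 'Density', 5e-28)
surf.export_obj(filename, transparency=0.3, color_field='Temperature',
                plot_index=2)  # 2 = number of surfaces already written to the files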

One tricky thing: the header of the OBJ file points to the MTL file (with the header command mtllib). This means if you move one or both of the files you may have to change the header to reflect their new directory location.
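
If you would rather patch the file than edit the header by hand, here is a minimal sketch, assuming the filenames from the example above and a made-up new location for the MTL file:

# Rewrite the mtllib line in surfaces.obj to point at the MTL file's new location.
lines = open("surfaces.obj").readlines()
with open("surfaces.obj", "w") as f:
    for line in lines:
        if line.startswith("mtllib"):
            line = "mtllib /new/path/surfaces.mtl\n"  # hypothetical new path
        f.write(line)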

A Few More Options

There are a few extra inputs for formatting the surface files you may want to use.

(1) Setting dist_fac will divide all the vertex coordinates by this factor. By default, the vertices are scaled by the physical bounds of your sphere.

(2) Setting color_field_max and/or color_field_min will scale the colors of all surfaces between this min and max. By default, the colors of each surface are scaled to that surface's own min and max values. A combined example follows.
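
Here is a rough sketch of how these options combine with the earlier call; the numeric values are arbitrary, and the keyword names are taken from the descriptions above, so double-check them against your version of yt:

surf.export_obj(filename, transparency=trans[i], color_field='Temperature',
                dist_fac=3.0857e24,     # e.g. divide coordinates by 1 Mpc in cm
                color_field_min=1.0e4,  # clamp all surfaces to a shared color range
                color_field_max=1.0e7,
                plot_index=i)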

Uploading to SketchFab

To upload to Sketchfab one only needs to zip the OBJ and MTL files together, and then upload via your dashboard prompts in the usual way. For example, zipping and uploading the two files produced by the script above yields an interactive model of the galaxy's density surfaces.

Importing to MeshLab and Blender

The new OBJ formatting will produce multi-colored surfaces in both MeshLab and Blender, a feature not possible with the previous PLY exporter. To see colors in MeshLab, go to the "Render" tab and select "Color -> Per Face". Note that in both MeshLab and Blender, unlike Sketchfab, you can't see transparencies until you render.

...One More Option

If you've started poking around the actual code instead of skipping off to lose a few days running around your own simulations, you may have noticed there are a few more options than those listed above -- specifically, a few related to something called "Emissivity." This allows you to output one more type of variable on your surfaces. For example:

from yt.mods import *

pf = load("/data/workshop2012/IsolatedGalaxy/galaxy0030/galaxy0030")
rho = [2e-27, 1e-27]
trans = [1.0, 0.5]
filename = './surfaces'

def _Emissivity(field, data):
    return (data['Density']*data['Density']*np.sqrt(data['Temperature']))
add_field("Emissivity", function=_Emissivity, units=r"\rm{g K}/\rm{cm}^{6}")

sphere = pf.h.sphere("max", (1.0, "mpc"))
for i,r in enumerate(rho):
    surf = pf.h.surface(sphere, 'Density', r)
    surf.export_obj(filename, transparency = trans[i],
                    color_field='Temperature', emit_field = 'Emissivity',
                    plot_index = i)

will output the same OBJ and MTL as in our previous example, but it will scale an emissivity parameter by our new field. Technically, this makes our outputs not really OBJ files at all, but a new sort of hybrid file; however, we needn't worry too much about that for now.

This parameter is useful if you want to upload your files in Blender and have the embedded rendering engine do some approximate ray-tracing on your transparencies and emissivities. This does take some slight modifications to the OBJ importer scripts in Blender. For example, on a Mac, you would modify the file "/Applications/Blender/blender.app/Contents/MacOS/2.65/scripts/addons/io_scene_obj/import_obj.py", in the function "create_materials" with:

# ...

                 elif line_lower.startswith(b'tr'):  # translucency
                     context_material.translucency = float_func(line_split[1])
                 elif line_lower.startswith(b'tf'):
                     # rgb, filter color, blender has no support for this.
                     pass
                 elif line_lower.startswith(b'em'): # MODIFY: ADD THIS LINE
                     context_material.emit = float_func(line_split[1]) # MODIFY: THIS LINE TOO
                 elif line_lower.startswith(b'illum'):
                     illum = int(line_split[1])

# ...

To use this in Blender, you might create a Blender script like the following:

import bpy
from math import *

bpy.ops.import_scene.obj(filepath='./surfaces.obj') # will use new importer

# set up lighting = indirect
bpy.data.worlds['World'].light_settings.use_indirect_light = True
bpy.data.worlds['World'].horizon_color = [0.0, 0.0, 0.0] # background = black
# have to use approximate, not ray tracing for emitting objects ...
#   ... for now...
bpy.data.worlds['World'].light_settings.gather_method = 'APPROXIMATE'
bpy.data.worlds['World'].light_settings.indirect_factor = 20.  # turn up all emissivities

# set up camera to be on -x axis, facing toward your object
scene = bpy.data.scenes["Scene"]
scene.camera.location = [-0.12, 0.0, 0.0] # location
scene.camera.rotation_euler = [radians(90.), 0.0, radians(-90.)] # face to (0,0,0)

# render
scene.render.filepath = '/Users/jillnaiman/surfaces_blender'  # needs full path
bpy.ops.render.render(write_still=True)

The above bit of code would produce an image like so:

http://blog.yt-project.org/attachments/surfaces_blender.png

Note that the hottest stuff is brightly shining, while the cool stuff is less so (making the inner isodensity contour barely visible from the outside of the surfaces).

If the Blender image caught your fancy, you'll be happy to know there is a greater integration of Blender and yt in the works, so stay tuned!

Author: jnaiman
Published on: Mar 30, 2013, 12:50:40 AM
Permalink - Source code

The First Development Workshop

March 6-8, we held a yt-dev workshop at UCSC. Thanks to everyone who attended, as well as Joel Primack and HIPACC for sponsoring us. This was the first development-oriented workshop we've ever held, and it was a gigantic success! To see a full photo album, visit our Google Plus Page.

http://blog.yt-project.org/attachments/devworkshop.jpg

With local organization by Nathan Goldbaum, Chris Moody and Ji-hoon Kim at UCSC, over twenty people were able to participate, working on a diverse set of projects. The workshop was structured around introducing topics through lightning talks and then sprinting on those topics during breakout sessions.

The lightning talks were set up to be one slide (or ten notebook cells, if you used the IPython notebook!) presenting a concept with ideas for going into the breakout sessions. These were isolated ideas -- shovel-ready, with working guidelines. We had talks from John ZuHone, Casey Stark, Kacper Kowalik, Cameron Hummels, Doug Rudd, Britton Smith, Jeff Oishi, Nathan Goldbaum and Chris Moody. Following this, we split into a series of breakout groups.

We also had talks from Joel Primack presenting the AGORA project and from Hari Krishnan about his work to integrate yt and VisIt. On the final day, we also had a show-and-tell, where a member of each working group presented what they worked on and what they accomplished.

Grid Data Format

This breakout group focused on building a portable library in C for reading and writing data that takes the form of structured, rectilinear grids. The idea here is to adhere to the GDF format, which specifies a self-describing HDF5 layout for files. By developing this portable library, the group hopes to be able to use yt to make initial conditions, write them out in the already-existing GDF writer in yt, and then link simulation codes against this library to read them back in and run simulations. It would also enable directly converting between simulation code outputs.
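
Since GDF is simply a self-describing HDF5 layout, files written in it can be inspected directly with h5py. The snippet below is a small, hypothetical sketch (the filename is made up, and the actual group names are defined by the GDF specification rather than by this example):

import h5py

f = h5py.File("initial_conditions.gdf", "r")
for name in f:
    print name, dict(f[name].attrs)  # top-level groups and their metadata attributes
f.close()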

Units and Arrays

This group ambitiously took on the task of incorporating Casey's library dimensionful into yt. This library uses SymPy to build a units system that can convert between known units as well as attach those units to array objects. The hope here is to eliminate much of the existing unit handling and field duplication (i.e., CellMass and CellMassMsun) and provide easier methods for deploying unit conversions. This work may take some time to integrate into yt 3.0, but it will be a feature of that codebase eventually. By the end of the workshop, this group had a working implementation.

Halo Catalogs

The current system for handling halos and halo catalogs in yt is a bit ad hoc, where objects are passed around and data held onto for some time. This is inefficient and doesn't lend itself well to agile and flexible halo finding. This working group developed a rough outline of a YTEP that described halo finding and halo catalogs as a state of flow -- rather than performing analysis after the fact, analysis will now be performed during the course of halo finding. This will be done by supplying a set of callbacks that will be executed on each halo in turn. An ontology for describing halo catalogs was also developed, which is being fleshed out in a YTEP.

Octree Improvements

Some representatives of the NMSU-ART code were present at the workshop. This working group focused on cleaning up the rough edges of the Octree support in yt-3.0, as well as applying it to some actual data. Chris spent quite a bit of time polishing up star particle IO, fixing rough corners in the octree code, and testing things. Kenza gave a show-and-tell talk on the final day showing how she was able to load up a galaxy formation simulation and plot streamlines, all in a few lines of code.

ARTIO

For extremely large Octree datasets, it's simply infeasible to load the entire mesh into memory. The ARTIO working group worked to develop opaque data chunks in yt-3.0, so that yt can iterate over chunks, build data objects, perform processing, and conduct visualization all without needing to know how the data is laid out on disk or knowing anything about the mesh at all! This is an extremely powerful concept, and will hopefully present a number of opportunities for applying this elsewhere in the future.

Volume Rendering

The method by which volume renderings are created in yt is a bit clunky right now. Many arguments are pushed into the constructor for the Camera object, and there's limited flexibility after that. This working group built off of the still-pending YTEP-0010 to build a new system for creating volume renderings with a stateful Scene, cameras that move, volumes that don't, and so on. Cameron, Mark and Will demoed a working implementation of a Scene object on the final day. Jill was also able to demo some initial integration of yt with Blender on the final day -- showing how to script the camera path via Python, display surfaces extracted with yt, and make movies from them.

Thank you!

Thanks to everyone who attended! It was a great time, and I think we're all excited to have another one sometime in the future.

Author: Matthew Turk
Published on: Mar 18, 2013, 5:11:12 PM
Permalink - Source code

yt 2.5 released!

We’re proud to announce the release of version 2.5 of the yt Project, http://yt-project.org/. The new version includes many new features, refinements of existing features, and numerous bugfixes. We encourage all users to upgrade to take advantage of the changes.

yt is a community-developed analysis and visualization toolkit, primarily directed at astrophysical hydrodynamics simulations. It provides full support for output from the Enzo, FLASH, Orion, and Nyx codes, with preliminary support for several others. It provides access to simulation data using an intuitive python interface, can perform many common visualization tasks, and offers a framework for conducting data reductions and analysis of simulation data.

The most visible changes with the 2.5 release include:

For a complete list of changes in this release, please visit the Changelog (http://yt-project.org/docs/2.5/changelog.html).

Information about the yt project, including installation instructions, can be found on the homepage: http://yt-project.org/

Development of yt has been sponsored by the NSF, the DOE, and various universities. We develop yt in the open and encourage contributions from users who extend and improve the code. We invite you to get involved with developing and using yt!

Author: John ZuHone
Published on: Mar 2, 2013, 4:57:44 AM
Permalink - Source code

yt-dev 2013

We are proud to officially announce the upcoming 2013 yt developer workshop. This three day event, to be held on the campus of the University of California, Santa Cruz on March 6th through 8th, will bring together a diverse group of students, researchers, and developers.

Up to this point, yt development has proceeded largely over the internet, leveraging e-mail lists, our online code repository, and IRC to track, discuss, and evaluate changes to the code. Unfortunately, this model makes it difficult to jump in to development as a newcomer. This workshop will not only allow experienced developers to collaborate in person on new features, but will also be a means for new developers to learn what yt is about and begin contributing. The workshop will include several training sessions, constituting a primer in distributed version control, test-driven development, and best practices for scientific programming.

The workshop will also serve as a venue to plan and execute some of the major new features we are planning for the upcoming yt 3.0 release, including full support for outputs of Lagrangian codes like Gadget, Gasoline and AREPO, better support for oct-based codes like ART and RAMSES, initial conditions generation, the new Grid Data Format, a new way of handling units and unit conversion, non-cartesian geometries, and advanced graphical browser widgets inside the IPython notebook.

More information and a registration form for the workshop are available on the workshop website. We have limited funding support for hotels and airfare. Funds will be preferentially distributed to students and the level of individual support will depend on demand.

If you have questions or concerns about the workshop, please feel free to contact the organizers at workshop2013@yt-project.org.

On behalf of the organizing committee,

Nathan Goldbaum

Author: Nathan Goldbaum
Published on: Jan 16, 2013, 6:42:26 PM
Permalink - Source code

Particle Generators

Notebook Download

Generating particle initial conditions is now possible in yt. The following shows how to generate particle fields from pre-defined particle lists, lattice distributions, and distributions based on density fields.

First, we define a gridded density field where the particle density field has been "cloud-in-cell" (CIC) interpolated to the grid, and define a function that assigns a set of particle indices based on a number of particles and a starting index. This is for a case where we want to add particles to an already existing set but make sure they have unique indices.

In[1]:

from yt.mods import *
from yt.utilities.particle_generator import *
import yt.utilities.initial_conditions as ic
import yt.utilities.flagging_methods as fm
from yt.frontends.stream.api import refine_amr
from yt.utilities.lib import CICDeposit_3

def _pgdensity(field, data):
    blank = np.zeros(data.ActiveDimensions, dtype='float32')
    if data.NumberOfParticles == 0: return blank
    CICDeposit_3(data["particle_position_x"].astype(np.float64),
                 data["particle_position_y"].astype(np.float64),
                 data["particle_position_z"].astype(np.float64),
                 data["particle_gas_density"].astype(np.float32),
                 np.int64(data.NumberOfParticles),
                 blank, np.array(data.LeftEdge).astype(np.float64),
                 np.array(data.ActiveDimensions).astype(np.int32),
                 np.float64(data['dx']))
    return blank
add_field("particle_density_cic", function=_pgdensity,
          validators=[ValidateGridType()],
          display_name=r"$\mathrm{Particle}\/\mathrm{Density}$")

def add_indices(npart, start_num):
    return np.arange(npart) + start_num

Next, we'll set up a uniform grid with some random density data:

In[2]:

domain_dims = (128, 128, 128)
dens = 0.1*np.random.random(domain_dims)
fields = {"Density": dens}
ug = load_uniform_grid(fields, domain_dims, 1.0)

As a first example, we'll generate particle fields from pre-existing NumPy arrays. First, we define a list of particle field names, and then assign random positions to the particles in one corner of the grid. We then call FromListParticleGenerator, which generates the particles. assign_indices assigns the indices (using numpy.arange by default). apply_to_stream applies the particle fields to the grid.

In[3]:

num_particles1 = 10000
field_list = ["particle_position_x","particle_position_y",
              "particle_position_z","particle_gas_density"]
x = np.random.uniform(low=0.0, high=0.5, size=num_particles1) # random positions
y = np.random.uniform(low=0.0, high=0.5, size=num_particles1) # random positions
z = np.random.uniform(low=0.0, high=0.5, size=num_particles1) # random positions
pdata = {'particle_position_x':x,
         'particle_position_y':y,
         'particle_position_z':z}
particles1 = FromListParticleGenerator(ug, num_particles1, pdata)
particles1.assign_indices()
particles1.apply_to_stream()
yt : [INFO     ] 2013-01-01 21:24:32,484 Adding Density to list of fields
yt : [INFO     ] 2013-01-01 21:24:32,486 Adding particle_position_z to list of fields
yt : [INFO     ] 2013-01-01 21:24:32,487 Adding particle_index to list of fields
yt : [INFO     ] 2013-01-01 21:24:32,487 Adding particle_position_x to list of fields
yt : [INFO     ] 2013-01-01 21:24:32,488 Adding particle_position_y to list of fields

Now that the particles are part of the parameter file, they may be manipulated and plotted:

In[4]:

slc = SlicePlot(ug, 2, ["Density"], center=ug.domain_center)
slc.set_cmap("Density","spring")
slc.annotate_particles(0.2, p_size=10.0) # Display all particles within a thick slab 0.2 times the domain width
slc.show()
http://blog.yt-project.org/attachments/ParticleGenerator_files/ParticleGenerator_ipynb_fig_00.png

Now let's try adding a particle distribution in a lattice-shaped spatial arrangement. Let's choose ten particles on a side, and place them in a small region away from the random particles. We'll use the special add_indices function we defined earlier to assign indices that are all different from those of the already existing particles.

In[5]:

pdims = np.array([10,10,10]) # number of particles on a side in each dimension
ple = np.array([0.6,0.6,0.6]) # left edge of particle positions
pre = np.array([0.9,0.9,0.9]) # right edge of particle positions
particles2 = LatticeParticleGenerator(ug, pdims, ple, pre, field_list)
particles2.assign_indices(function=add_indices, npart=np.product(pdims),
                          start_num=num_particles1)
particles2.apply_to_stream()
yt : [INFO     ] 2013-01-01 21:24:33,957 Adding particle_gas_density to list of fields

We now have both sets of particles:

In[6]:

slc = SlicePlot(ug, 2, ["Density"], center=ug.domain_center)
slc.set_cmap("Density","spring")
slc.annotate_particles(0.2, p_size=10.0)
slc.show()
http://blog.yt-project.org/attachments/ParticleGenerator_files/ParticleGenerator_ipynb_fig_01.png

And by sorting all of the indices we can check that all of them are unique, as advertised:

In[7]:

dd = ug.h.all_data()
indices = np.sort(np.int32(dd["particle_index"]))
print "All indices unique = ", np.all(np.unique(indices) == indices)
All indices unique =  True

Now let's get fancy. We will use the initial conditions capabilities of yt to apply a spherically symmetric density distribution based on the "beta-model" functional form, and set up a refinement method based on overdensity. Then, we will call refine_amr to apply this density distribution and refine the grid wherever the overdensity exceeds some value.

In[8]:

fo = [ic.BetaModelSphere(1.0,0.1,0.5,[0.5,0.5,0.5],{"Density":(10.0)})]
rc = [fm.flagging_method_registry["overdensity"](4.0)]
pf = refine_amr(ug, rc, fo, 3)

Now, we have an interesting density field to serve as a distribution function for particle positions. What we do next is define a spherical region over which particle positions will be generated based on the local grid density. We also will map the grid density to a particle density field using cloud-in-cell interpolation. Finally, when we apply these particles, we will set the optional argument clobber=True, which will remove the particles we already created.

In[9]:

num_particles3 = 100000
map_dict = {"Density": "particle_gas_density"} # key is grid field, value is particle field
sphere = pf.h.sphere(pf.domain_center, (0.5, "unitary"))
particles3 = WithDensityParticleGenerator(pf, sphere, num_particles3,
                                          field_list)
particles3.assign_indices()
particles3.map_grid_fields_to_particles(map_dict) # Map density fields to particle fields
particles3.apply_to_stream(clobber=True) # Get rid of all pre-existing particles

Now we'll plot up both the grid density field and the "particle_density_cic" field (defined at the top of the script), which is mapped from the particles onto the grid. We also overplot the particle positions. These should roughly correspond to the non-zero values of "particle_density_cic", but there will be some discrepancies due to the fact that they are taken from a thick slab and only a slice of the grid-based field is shown.

In[10]:

slc = SlicePlot(pf, 2, ["Density","particle_density_cic"], center=pf.domain_center)
slc.set_log("Density", True)
slc.set_log("particle_density_cic", True)
slc.set_cmap("all", "spring")
slc.annotate_grids()
slc.annotate_particles(0.01,p_size=3)
slc.show()
http://blog.yt-project.org/attachments/ParticleGenerator_files/ParticleGenerator_ipynb_fig_02.png
http://blog.yt-project.org/attachments/ParticleGenerator_files/ParticleGenerator_ipynb_fig_03.png

Author: John ZuHone
Published on: Jan 4, 2013, 7:05:00 AM
Permalink - Source code

2012 In Review

2012 was an amazing year for yt. Whether measured by improvements to the code or community activity, it has been the busiest and most productive yet.

Here are a few stats:

Thanks for everything, and here's to making 2013 even better!

Author: Matthew Turk
Published on: Dec 31, 2012, 11:21:30 PM
Permalink - Source code

3D Surfaces and Sketchfab

Surfaces

For a while now, yt has had the ability to extract isosurfaces from volumetric data using a marching cubes algorithm. The surfaces could be exported in OBJ format, values could be sampled at the center of each face of the surface, and the flux of a given field could be calculated over the surface. This means you could, for instance, extract an isocontour in density and calculate the mass flux over that isocontour. It also means you could export a surface from yt and view it in something like Blender, MeshLab, or even on your Android or iOS device in MeshPad or MeshLab Android. One important caveat with marching cubes is that with adaptive mesh refinement data, you will see cracks across refinement boundaries unless a "crack-fixing" step is applied to match up these boundaries. yt does not perform such an operation, and so there will be seams visible in 3D views of your isosurfaces.

These were implemented as methods on data objects -- extract_isocontours, calculate_isocontour_flux -- which returned just numbers or values. However, I've recently created a new object called AMRSurface that makes this process much easier. You can create one of these objects by specifying a source data object and a field over which to identify a surface at a given value. For example:

from yt.mods import *
pf = load("/data/workshop2012/IsolatedGalaxy/galaxy0030/galaxy0030")
sphere = pf.h.sphere("max", (1.0, "mpc"))
surface = pf.h.surface(sphere, "Density", 1e-27)

This object, surface, can now be queried for values on the surface. For instance:

print surface["Temperature"].min(), surface["Temperature"].max()

will return the values 11850.7476943 and 13641.0663899. These values are interpolated to the face centers of every triangle that constitutes a portion of the surface. Note that reading a new field requires re-calculating the entire surface, so it's not the fastest operation. You can get the vertices of the triangles by looking at the .vertices property.

Exporting to a File

If you want to export this to a PLY file you can call the routine export_ply, which will write to a file and optionally sample a field at every face or vertex, outputting a color value to the file as well. This file can then be viewed in MeshLab, Blender or on the website Sketchfab.com. But if you want to view it on Sketchfab, there's an even easier way!
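
As a hedged example of that call (the keyword names here mirror the Sketchfab export shown below and may differ slightly between yt versions):

surface.export_ply("galaxy_surface.ply",
                   color_field="Temperature",  # sample this field for per-face colors
                   color_map="hot",
                   color_log=True)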

Exporting to Sketchfab

Sketchfab is a website that uses WebGL, a relatively new technology for displaying 3D graphics in any browser. It's very fast and typically requires no plugins. Plus, it means that you can share data with anyone and they can view it immersively without having to download the data or any software packages! Sketchfab provides a free tier for up to 10 models, and these models can be embedded in websites.

There are lots of reasons to want to export to Sketchfab. For instance, if you're looking at a galaxy formation simulation and you publish a paper, you can include a link to the model in that paper (or in the arXiv listing) so that people can explore and see what the data looks like. You can also embed a model in a website with other supplemental data, or you can use Sketchfab to discuss morphological properties of a dataset with collaborators. It's also just plain cool.

The AMRSurface object includes a method to upload directly to Sketchfab, but it requires that you get an API key first. You can get this API key by creating an account and then going to your "dashboard," where it will be listed on the right hand side. Once you've obtained it, put it into your ~/.yt/config file under the heading [yt] as the variable sketchfab_api_key. If you don't want to do this, you can also supply it as an argument to the function export_sketchfab.
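
For reference, the relevant stanza of ~/.yt/config would look something like this, with your own key substituted:

[yt]
sketchfab_api_key = <your API key here>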

Now you can run a script like this:

from yt.mods import *
pf = load("redshift0058")
dd = pf.h.sphere("max", (200, "kpc"))
rho = 5e-27

bounds = [(dd.center[i] - 100.0/pf['kpc'],
           dd.center[i] + 100.0/pf['kpc']) for i in range(3)]

surf = pf.h.surface(dd, "Density", rho)

upload_id = surf.export_sketchfab(
    title = "RD0058 - 5e-27",
    description = "Extraction of Density (colored by Temperature) at 5e-27 " \
                + "g/cc from a galaxy formation simulation by Ryan Joung."
    color_field = "Temperature",
    color_map = "hot",
    color_log = True,
    bounds = bounds
)

and yt will extract a surface, convert it to a format that Sketchfab.com understands (PLY, in a zip file) and then upload it using your API key. For this demo, I've used data kindly provided by Ryan Joung from a simulation of galaxy formation. Once uploaded, the model can be viewed, shared, and embedded anywhere using the embed code from Sketchfab.

As a note, Sketchfab has a maximum model size of 50MB for the free account. 50MB is pretty hefty, though, so it shouldn't be a problem for most needs. We're working on a way to optionally upload links to the Sketchfab models on the yt Hub, but for now, if you want to share a cool model we'd love to see it!

Thanks to Sketchfab for such a cool service, and for helping us out along the way with their API.

Author: Matthew Turk
Published on: Dec 5, 2012, 3:24:42 PM
Permalink - Source code

The Rockstar Halo Finder in yt

Over the last few weeks, Matt Turk, Christopher Moody, and Stephen Skory have been working to improve the integration of the Rockstar halo finder in yt. Rockstar was written primarily by Peter Behroozi; its main website links to the source and to the most current edition of the method paper, which includes a timing and scaling study.

Rockstar is a six dimensional halo finder, meaning that it considers both particle position and momentum when locating dark matter halos. It is also capable of locating bound substructure in halos and producing a detailed merger tree. As of this writing its main deficit is that it cannot handle simulations with varying particle mass. This means that in simulations that include star particles, the star particles must be excluded for the purposes of halo finding. Also, Rockstar cannot analyze "zoom-in" or "nested" simulations with various values of dark matter particle mass.

Here is a brief list of the main improvements:

The full documentation on how to run Rockstar is available in the yt documentation.
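
For orientation, a minimal invocation at the time looked roughly like the following. The import path and the need to launch under MPI (Rockstar wants separate server, reader, and writer tasks) are from memory, so treat this as a sketch and defer to the documentation linked above:

from yt.mods import *
from yt.analysis_modules.halo_finding.rockstar.api import RockstarHaloFinder

# Run under MPI, e.g.: mpirun -np 4 python run_rockstar.py --parallel
pf = load("Enzo_64/DD0043/data0043")  # hypothetical dataset path
rh = RockstarHaloFinder(pf)
rh.run()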

Examples of Substructure Location

One of the compelling features of Rockstar is the ability to identify bound substructure of halos. Below are two images showing the halos identified by HOP and Rockstar over-plotted on a projection of gas density. Note that the circles mean different things in the two cases. In the case of HOP, the circles show the radius from the center of mass to the most distant particle, while for Rockstar it is from the center of mass to the calculated virial radius.

Paying attention to the central region of the halo, notice how Rockstar identifies the small in-falling subhalos that HOP doesn't. This is not surprising because HOP is not designed to detect substructure.

HOP:

HOP Output

Rockstar:

Rockstar Output

Note that in the Rockstar image, the halos on the periphery are not encircled due to the way the image was prepared.

Author: Stephen Skory <s@skory.us>
Published on: Nov 26, 2012, 8:22:16 PM
Permalink - Source code

What's Up With yt 3.0?

This is a long blog post! The short of it is:

As with all of yt, 3.0 is being developed completely in the open. We're trying out JIRA at http://yt-project.atlassian.net/ for tracking progress. The main development repository is at https://bitbucket.org/yt_analysis/yt-3.0 and discussions have been ongoing on the yt-dev mailing list. Down below you can find some contributor ideas and information!

Why 3.0?

The very first pieces of yt to be written are now a bit over six years old. When it started, it was a very simple Python wrapper around pyHDF, designed to make slices, then export those slices to ASCII where they were plotted by a plotting package called HippoDraw. It grew a bit to include projections and sphere selection over the course of a few months, and eventually became the community project it is today.

But, despite those initial steps being a relatively long time ago, many vestiges of that early design remain in yt. For instance, the output of print_stats on an AMR hierarchy object is largely unchanged since that time.

Most importantly, however, yt needs to continue to adapt to best serve analysis and visualization needs in the community. To do that, yt 3.0 has been conceived as a project to rethink some of the basic assumptions and principles in yt. In doing so, we will be able to support new codes of different types and larger datasets, and most importantly it will enable us to grow the community of users and developers. In many ways, the developments in yt 3.0 will serve to clarify and simplify the code base, but without sacrificing speed or memory efficiency. By changing the version number from 2.X to 3.0, we also send the signal that things may not work the same way -- and in fact, there may be API incompatibilities along the way. But they won't be changed without need, and we're trying to reduce disruption as much as possible.

yt 3.0 is designed to allow support for non-cartesian coordinates, non-grid data (SPH, unstructured mesh), and to remove many of the "Enzo-isms" that populate the code base. This brings with it a lot of work, but also a lot of opportunity.

If you have ideas, concerns or comments, email yt-dev!

What's Going In To 3.0?

We've slated a large number of items to be put into 3.0, as well as a large number of system rewrites. By approaching this piecemeal, we hope to address one feature or system at a time so that the code can remain in a usable state.

Geometry selection

In the 2.X series, all geometric selection (spheres, regions, disks) is conducted by looking first at grids, then points, and choosing which items go in. This also involves a large amount of numpy array concatenation, which isn't terribly good for memory.

The geometry selection routines have all been rewritten in Cython. Each geometric selection routine implements a selection method for grids and points. This allows non-grid based codes (such as particle-only codes) to use the same routines without a speed penalty. These routines all live inside yt/geometry/selection_routines.pyx, and adding a new routine is relatively straightforward.
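
To make the idea concrete, here is a toy illustration in plain Python/NumPy -- not yt's actual Cython classes or method names -- of a selector that offers both a coarse grid-level test and a fine point-level test, so that grid-based and particle-only codes can share the same selection object:

import numpy as np

class SphereSelector:
    def __init__(self, center, radius):
        self.center = np.asarray(center, dtype="float64")
        self.radius = radius

    def select_grids(self, left_edges, right_edges):
        # Coarse test: keep any grid whose bounding box overlaps the sphere's bounding box.
        return np.all((right_edges >= self.center - self.radius) &
                      (left_edges <= self.center + self.radius), axis=1)

    def select_points(self, positions):
        # Fine test: exact per-point (or per-cell-center) membership.
        r2 = np.sum((positions - self.center) ** 2, axis=1)
        return r2 <= self.radius ** 2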

The other main change with how geometry is handled is that data objects no longer know how the data is laid out on disk or in memory. In the past, data objects all had a _grids attribute. But, in 3.0, this can no longer be relied upon -- because we don't want all the data formats to have grids! Data is now laid out in format-neutral "chunks," which are designed to support selection based on spatial locality, IO convenience, or another arbitrary method. This allows the new GeometryHandler class to define how data should be read in off disk, and it reduces the burden on the data objects to understand how to best access data.

For instance, the GridGeometryHandler understands how to batch grid IO for best performance and how to feed that to the code-specific IO handler to request fields. This new method allows data objects to specifically request particular fields, understand which fields are being generated, and most importantly not need to know anything about how data is being read off disk.

It also allows dependencies for derived fields to be calculated before any IO is performed. Presently, if the field VelocityMagnitude is requested of a data object, the data object will read the three fields x-velocity, y-velocity and z-velocity (or their frontend-specific aliases -- see below for discussion of "Enzo-isms") independently. The new system allows these to be read in bulk, which cuts the number of trips to disk to a third and potentially reduces the cost of generating the field considerably.

Finally, it allows data objects to expose different chunking mechanisms, which simplifies parallelism and allows parallel analysis to respect a single, unified interface.

Geometry selection is probably the biggest change in 3.0, and the one that will enable yt to read particle codes in the same way it reads grid codes.

Removing Enzo-isms

yt was originally designed to read Enzo data. It wasn't until Jeff Oishi joined the project that we thought about expanding it beyond Enzo, to the code Orion, and at the time it was decided that we'd alias fields and parameters from Orion to the corresponding field names and parameters in Enzo. The Orion fields and parameters would still be available, but the canonical mechanism for referring to them from the perspective of derived fields would be the Enzo notation.

When we developed yt 2.0, we worked hard to remove many of the Enzo-isms from the parameter part of the system: instead of accessing items like pf["HubbleConstantNow"] (a clear Enzo-ism, with the problem that it's also not tab completable), we changed to explicitly accessing pf.hubble_constant.

But the fields were still Enzo-isms: Density, Temperature, etc. For 3.0, we decided this will change. The standard for fields used in yt is still under discussion, but we are moving towards following PEP-8 like standards, with lowercase and underscores, and going with explicit field names over implicit field names. Enzo fields will be translated to this (but of course still accessible in the old way) and all derived fields will use this naming scheme.
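
In practice that means field access along these lines; the lowercase name is illustrative only, since (as noted) the final standard was still under discussion:

dd = pf.h.all_data()
rho_old = dd["Density"]   # yt 2.X / Enzo-flavored CamelCase name
rho_new = dd["density"]   # prospective yt 3.0 style: lowercase with underscores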

Non-Cartesian Coordinates

From its inception, yt has only supported cartesian coordinates explicitly. There are surprisingly few places where this becomes directly important: the volume traversal, a few fields that describe cell volumes, and the pixelizer routines.

Thanks to hard work by Anthony Scopatz and John ZuHone, we have now abstracted out most of these items. This work is still ongoing, but we have implemented a few of the basic items necessary to provide full support for cylindrical, polar and spherical coordinates. Below is a slice through a polar disk simulation, rendered with yt.

http://blog.yt-project.org/attachments/cylindrical_pixelizer.png

Unit Handling and Parameter Access

Units in yt have always been in cgs, but we would like to make it easier to convert fields and lengths. The first step in this direction is to use Casey Stark's project dimensionful ( http://caseywstark.com/blog/2012/code-release-dimensionful/ ). This project is ambitious and uses the package SymPy ( http://sympy.org ) for manipulating symbols and units, and it seems ideal for our use case. Fields will now carry with them units, and we will ensure that they are correctly propagated.
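
To give a flavor of the approach, here is a very loose sketch using plain SymPy symbols. This is not dimensionful's or yt's actual API, just the underlying idea of carrying units through arithmetic:

from sympy import Symbol

cm = Symbol("cm", positive=True)
g = Symbol("g", positive=True)
s = Symbol("s", positive=True)
km = 1.0e5 * cm  # a derived unit expressed in terms of a base unit

density = 3.0 * g / cm**3
velocity = 2.5 * km / s
print density * velocity**2  # units combine and simplify symbolically: g/(cm*s**2)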

Related to this is how to access parameters. In the past, parameter files (pf) have been overloaded to provide dict-like access to parameters. This was degenerate with accessing units and conversion factors. In 3.0, you will need to explicitly access pf.parameters to access them.
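
Concretely, using the Enzo-ism mentioned above as the parameter name (the exact key depends on the frontend):

# yt 2.X: dict-like access on the parameter file, degenerate with unit lookups
# hubble = pf["HubbleConstantNow"]

# yt 3.0: parameters are reached explicitly, separate from units and conversions
hubble = pf.parameters["HubbleConstantNow"]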

Multi-Fluid and Multi-Particle Support

In yt 3.0, we want to be able to support simulations with separate populations of fluids and particles. As an example, in many cosmology simulations, both dark matter and stars are simulated. As it stands in yt 2.X, separating the two for analysis requires selecting the entire set of all particles and discarding those particles not part of the population of interest. Some simulation codes allow for subselecting particles in advance, but the means of addressing different particle types was never clear. For instance, it's not ideal to create new derived fields for each type of particle -- we want to re-use derived field definitions between particle types.

Some codes, such as Piernik (the code Kacper Kowalik, one of the yt developers, uses) also have support for multiple fluids. There's currently no clear way to address different types of fluid, and this suffers from the same issue the particles do.

In 3.0, fields are now specified by two characteristics -- a fluid or particle type and a field name -- both of which have a default, which means you don't have to change anything if you don't have a multi-fluid or multi-particle simulation. But if you do, you can now access particles and fluids like this:

sp = pf.h.sphere("max", (10.0, 'kpc'))
total_star_mass = sp["Star", "ParticleMassMsun"].sum()

Furthermore, these field definitions can be accessed anywhere that allows a field definition:

sp = pf.h.sphere("max", (10.0, 'kpc'))
total_star_mass = sp.quantities["TotalQuantity"](("Star", "ParticleMassMsun"))

For codes that do allow easy subselection (like the sometime-in-the-future Enzo 3.0) this will also insert the selection of particle types directly in the IO frontend, preventing unnecessary reads or allocations of memory.

By using multiple fluids directly, we can define fields for angular momentum, mass and so on only once, but apply them to different fluids and particle types.

Supporting SPH and Octree Directly

One of the primary goals that this has all been designed around is supporting non-grid codes natively. This means reading Octree data directly, without the costly step of regridding it, as is done in 2.X. Octree data will be regarded as Octrees, rather than patches with cells in them. This can be seen in the RAMSES frontend and the yt/geometry/oct_container.pyx file, where the support for querying and manipulating Octrees can be found.

A similar approach is being taken with SPH data. However, as many of the core yt developers are not SPH simulators, we have enlisted people from the SPH community for help in this. We have implemented particle selection code (using Octrees for lookups) and are already able to perform limited quantitative analysis on those particles, but the next phase of using information about the spatial extent of particles is still to come. This is an exciting area, and one that requires careful thought and development.

How Far Along Is It?

Many of the items above are still in their infancy. However, several are already working. As it stands, RAMSES can be read and analyzed directly, but not volume rendered. The basics of reading SPH particles and quickly accessing them are done, but those particles cannot yet be regarded as a fluid with spatial extent or visualized in a spatial manner. Geometry selection is largely done, with the exception of boolean objects and covering grids. Units are still in their infancy, but the removal of Enzo-isms has begun. Finally, non-cartesian coordinates are somewhat but not completely functional; FLASH cylindrical datasets should be available, but they still require some work to analyze properly.

Why Would I Want To Use It?

The best part of many of these changes is that they're under the hood. But they also provide for cleaner scripts and a reduction in the effort to get started. And many of these improvements carry with them substantial speedups.

For example, reading a large data region off disk from an Enzo dataset is now nearly 50% faster than in 2.X, and the memory overhead is considerably lower (as we get rid of many intermediate allocations.) Using yt to analyze Octree data such as RAMSES and NMSU-ART is much more straightforward, and it requires no costly regridding step.

Perhaps the best reason to want to move to 3.0 is that it's going to be the primary line of development. Eventually 2.X will be retired, and hopefully the support of Octree and SPH code will help grow the community and bring new ideas and insight.

How Can I Help?

The first thing you can do is try it out! If you clone it from http://bitbucket.org/yt_analysis/yt-3.0 you can build it and test it. Many operations on patch based AMR will work (in fact, we run the testing suite on 3.0, and as of right now only covering grid tests fail) and you can also load up RAMSES data and project, slice, and analyze it.

If you run into any problems, please report them to either yt-users or yt-dev! And if you want to contribute, whether that be in the form of brainstorming, telling us your ideas about how to do things, or even contributing code and effort, please stop by either the #yt channel on chat.freenode.org or yt-dev, where we can start a conversation about how to proceed.

Thanks for making it all the way down -- 3.0 is the future of yt, and I hope to continue sharing new developments and status reports.

Author: Matthew Turk
Published on: Nov 15, 2012, 9:05:33 PM
Permalink - Source code