auto3dem Tutorial

Using the auto3dem Package on the IU Bloomington Clusters

Tim Baker's lab at the University of California, San Diego has made publicly available several data sets designed to show people how to use the auto3dem package to generate reconstructions of icosahedral viruses (and other macro-molecular complexes that have icosahedral or 532 point group symmetry). The EMC keeps copies of two of these data sets on an IUB computing cluster disk array: a very well-behaved ~2000 particle Reovirus data set that can be used to generate an ~10 Å resolution reconstruction using three commands (!) and an ~600 particle bacteriophage P22 data set that illustrates the entire reconstruction process in more detail.

These data sets come with all the instructions necessary to use them. Anyone should be able to copy the data sets to a machine that runs the auto3dem package, follow the instructions found with each and produce the correct final results. The additional details provided here are intended to show users how to do the data processing using the IUB computing clusters (specifically the Karst cluster) and to show some of the ways the EMC has implemented for additional analysis of the image processing steps and results. In addition, because the auto3dem package is continually changing and the documentation in the various demos does not necessarily reflect the recent changes, the description here provides commands and commentary that work with the current auto3dem release installed on the cluster (version 4.05, as of spring 2015).

NOTE: The data sets are stored on DCWAN as compressed archives (tar files, aka tarballs) and must be un-compressed before use. All these details are described in the individual sections below, but the linux commands used are only briefly described. Information about the individual linux commands and a brief overall description of how linux works are also offered. Further questions about the use of the computing clusters should be directed to David Morgan at dagmorga@indiana.edu. In the sections below, each command begins with a $ (the usual linux prompt) and any comment about the command is placed on the same line, separated from it by a # character (the pound sign).

It will be convenient in the discussion below to have a simple understanding of how auto3dem works and the "units of data/information" that it utilizes. The auto3dem package is an example of one of the many different model-based procedures that are used in cryoTEM single particle reconstruction schemes. Other packages (e.g., EMAN, IMAGIC, SPIDER and XMIPP) use their own type of model-based alignment and reconstruction algorithms, and the details of auto3dem's Polar Fourier Transform (PFT) method will not be discussed here. The initial publication that describes the PFT procedure is a great place to learn the procedure's details, and David Morgan would be happy to share what he knows with anyone who asks.

When such model-based alignment procedures are used, a reference three-dimensional (3d) model is projected over a range of angles, individual images (or class-averages, though the rest of this discussion will use the term "individual images" to refer to either) are aligned to each projection of the 3d model and the "best match" between an individual image and a specific projection of the 3d model is selected. After every individual image has been compared to each of the model projections, the individual images are assigned the projection angles of the best matched 3d model projection and a new 3d model is then generated from the individual images and these projection angles. Every model-based alignment and 3d reconstruction procedure follows this sort of general scheme, though there is considerable variation in the details of how the different steps are accomplished. In the discussions below, the procedures of 3d model projection and their comparison to the individual images will simply be referred to as an alignment or a cycle of aligning/alignment.

The auto3dem package performs exactly this sort of model-based alignment and reconstruction and is designed to be used in an iterative fashion (i.e., once a new 3d model has been generated, a new set of projections is calculated, the individual images are again aligned to these new projections and those results are used to create an even newer 3d model). Iteration occurs for any number of cycles of alignment and 3d model creation, and it is the user's responsibility to determine when enough iterations have been performed. The user can also adjust parameters that control details of the alignment and reconstruction steps, with the goal of getting a better and better 3d model.
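
The cycle described above can be sketched as a simple loop. This is only a schematic: project_model, align_images and build_model are hypothetical stand-ins that print what each step would do, not auto3dem programs, and the cycle count of 3 is arbitrary.

```shell
# Schematic of an iterative, model-based refinement run.
# The three step functions are hypothetical placeholders (NOT auto3dem
# commands); each one only reports the step it represents.
project_model() { echo "cycle $1: project model_$(( $1 - 1 )) over the search angles"; }
align_images()  { echo "cycle $1: align each image to the projections and keep the best match"; }
build_model()   { echo "cycle $1: reconstruct model_$1 from images + assigned angles"; }

refine() {
    n_cycles=$1                 # the user decides how many cycles are enough
    cycle=1
    while [ "$cycle" -le "$n_cycles" ]; do
        project_model "$cycle"  # reproject the current 3d model
        align_images  "$cycle"  # compare every image to every reprojection
        build_model   "$cycle"  # make the model used by the next cycle
        cycle=$(( cycle + 1 ))
    done
}

refine 3
```

Each pass consumes the model from the previous pass (model_0 being the starting reference), which is the only structural point the sketch is meant to convey.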

Fundamental Data Manipulated by the auto3dem Package

The following list contains names and descriptions of the fundamental units of either data or information that are manipulated by the auto3dem package. The descriptions both describe a particular unit and show the fundamental ways in which the units are inter-related. These names will be used in the discussions below and the reader is urged to refer to the descriptions in this list whenever necessary:

  • micrographs (or CCD camera images) - either digital scans of negatives that were recorded using a transmission electron microscope (TEM) or digital images acquired on a TEM using a CCD camera or a direct electron-detecting camera
  • particle co-ordinates - pairs of x- and y-co-ordinates that describe the location in a micrograph of individual icosahedral particles; some micrographs will have only a few particles while others will have 100's; please keep in mind that it is easier in the long run to start any cryoTEM single particle reconstruction project with a small number of micrographs that contain lots of individual particles
  • individual images (aka boxed particles) - images of icosahedral particles that have been selected from (aka boxed out of) the micrographs using a set of particle co-ordinates; auto3dem expects these boxed particles to be square (i.e., the x- and y-dimensions must be equal) and that their dimension is an odd number; also, auto3dem expects the contrast in these images to show protein as dark
  • alignment data (aka alignment files or alignment angles, which can refer to only the angular data itself or can be used as shorthand for the full set of alignment data) - each individual image has an associated set of parameters that describe the alignment of that image to a 3d model; if there are no 3d models, the alignment data can be all zeroes or even non-existent (such "blank" alignment files are referred to as dummy alignment files in the discussion below); this alignment information consists of three alignment angles that describe the orientation of a boxed particle relative to the 3d model, an x/y co-ordinate pair that describes the center of the boxed particle and at least one of several quality scores for the comparison between an individual image and the best matched reprojection; there are additional parameters that can describe an individual image (e.g., a delta magnification parameter that is not 1.000 is sometimes used to indicate that a particular image is slightly smaller or larger than a 3d model to which it was compared); in addition, the alignment files contain information specific to the micrograph from which the boxed particles arose; this information includes microscope information (accelerating voltage, spherical aberration coefficient, defocus and astigmatism), and camera/scanner information (pixel size)
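  • 3d model (aka reference model) - the three-dimensional volume against which the individual images are aligned; the 3d model is the source of the reprojections (aka reprojected images) used in each alignment cycle, and a new 3d model is generated from the individual images and their alignment data at the end of each cycle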
  • reprojections - during an alignment cycle, the current 3d model is projected at a set of user (or program) specified angles that evenly fill the icosahedral unit cell; these projected images are called reprojections or reprojected images; the set of reprojections is compared to each individual image and the best match is saved; this best match determines a projection angle for each of the individual images, and that information along with the actual images is used to create a new 3d model
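
The two size constraints on boxed particles listed above (square, with an odd dimension) are easy to test before any processing starts. The following is a minimal sketch: check_box is a hypothetical helper, not part of auto3dem, and the box sizes are made-up values. It assumes the x- and y-dimensions are already known, since reading them from a PIF header is not covered here.

```shell
# Check a boxed-particle size against the two constraints in the list above:
# the box must be square and its edge length must be odd.
# check_box is a hypothetical helper; the sizes below are illustrative only.
check_box() {
    nx=$1
    ny=$2
    if [ "$nx" -ne "$ny" ]; then
        echo "not square: ${nx} x ${ny}"
        return 1
    fi
    if [ $(( nx % 2 )) -eq 0 ]; then
        echo "even dimension: ${nx}"
        return 1
    fi
    echo "ok: ${nx} x ${ny}"
}

check_box 251 251            # square and odd
check_box 250 250 || true    # square but even
check_box 251 255 || true    # not square
```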

NOTE: Images and volumes are stored in what is called the Purdue Image Format, or more often simply PIF. These files use .pif as their file extension. In some of the discussion below, some of the PIF files will be converted into MRC formatted files. This is simply a matter of convenience for the user and is not usually required in order to use the auto3dem package.

Reovirus Data

This data set contains about 2000 images of Reovirus particles. The particles have already been boxed out of the initial micrographs. The defocus of each micrograph has also been previously determined. Since all this information is available with the data set, the only things that the user needs to do are:

  1. generate an initial 3d model
  2. iteratively refine the boxed particles against the updated 3d model of each iteration

Each of these steps can be accomplished with simple commands to the auto3dem package, though using Karst to do this work involves a bit more effort. The first steps shown below involve creating an area to do the work, and copying the data into that area. The file names used in the following description are completely arbitrary, but will be used consistently. Here is the set of commands that will create a working area in your home directory called auto3demTutorial, copy the compressed data set into that area and uncompress it:

$ cd # move to your home directory
$ mkdir auto3demTutorial # create the working area
$ cd auto3demTutorial # move into the new area
$ cp /N/dcwan/projects/cryoem/Common/auto3demTutorial/ReoVirus.tgz ./ # copies the archived data set to this new area
$ tar -xzvf ReoVirus.tgz # uncompress the archive
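
As an optional extra, a copied archive can be checked before it is unpacked. The sketch below uses two standard options, gzip -t (test the compressed data) and tar -tzf (list the contents without extracting). inspect_tgz is a hypothetical helper name, and the small Example.tgz archive is built on the spot only so the sketch can run anywhere.

```shell
# Optional checks before unpacking a .tgz archive.
# Usage: inspect_tgz ARCHIVE    (inspect_tgz is a hypothetical helper name)
inspect_tgz() {
    archive=$1
    gzip -t "$archive" || return 1   # verify the compressed data is intact
    tar -tzf "$archive"              # list the contents without extracting
}

# Demonstration on a throwaway archive; in the tutorial working area the
# equivalent call would be: inspect_tgz ReoVirus.tgz
demo=$(mktemp -d)
mkdir -p "$demo/Example/dat"
echo "placeholder" > "$demo/Example/README"
tar -czf "$demo/Example.tgz" -C "$demo" Example
inspect_tgz "$demo/Example.tgz"
rm -rf "$demo"
```

If gzip -t reports an error, the copy is likely incomplete and should be repeated before running tar -xzvf.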

There is now a new directory in the working area that contains the uncompressed data set. A listing of the working area shows that uncompressing the archive created a new directory called ReoVirus. After moving into that new directory, two listing commands show what has been created in this new area (the first ls command) and in a sub-directory (the second ls command). In the output from these ls commands, dagmorga and chem are designations (user name and group affiliation) for the user who did this work (David Morgan). Your output will show similar information about you.

$ ls -l
total 753856
drwxr-xr-x 3 dagmorga chem 4096 Mar 18 2010 ReoVirus
-rw-r--r-- 1 dagmorga chem 385971321 Mar 24 09:51 ReoVirus.tgz
$ cd ReoVirus
$ ls -l
total 0
-rw-r--r-- 1 dagmorga chem 327 Mar 18 2010 README
-rw-r--r-- 1 dagmorga chem 674 Mar 18 2010 commands
drwxr-xr-x 2 dagmorga chem 4096 Mar 18 2010 dat
$ ls -l dat
total 977344
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6602.dat_000
-rw-r--r-- 1 dagmorga chem 61866932 Mar 18 2010 6602_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6604.dat_000
-rw-r--r-- 1 dagmorga chem 60856868 Mar 18 2010 6604_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6606.dat_000
-rw-r--r-- 1 dagmorga chem 55301516 Mar 18 2010 6606_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6622.dat_000
-rw-r--r-- 1 dagmorga chem 48231068 Mar 18 2010 6622_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6623.dat_000
-rw-r--r-- 1 dagmorga chem 60604352 Mar 18 2010 6623_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6624.dat_000
-rw-r--r-- 1 dagmorga chem 50503712 Mar 18 2010 6624_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6628.dat_000
-rw-r--r-- 1 dagmorga chem 56059064 Mar 18 2010 6628_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6629.dat_000
-rw-r--r-- 1 dagmorga chem 51008744 Mar 18 2010 6629_boxn.pif
-rw-r--r-- 1 dagmorga chem 335 Mar 18 2010 6630.dat_000
-rw-r--r-- 1 dagmorga chem 55806548 Mar 18 2010 6630_boxn.pif

The last command shows the items in the dat directory: the files with extension .pif are the individual images of Reovirus capsids and the files with extension .dat_000 are alignment files that describe the images. These are "dummy" alignment files in that they only contain information about the micrograph from which the particles were boxed, while normal auto3dem alignment files would contain per-particle alignment information for each of the boxed particles.
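
Because each image stack is expected to have a matching alignment file, a quick pairing check can catch an incomplete copy. This sketch assumes the naming pattern visible in the listing above (NNNN_boxn.pif alongside NNNN.dat_000). check_pairs is a hypothetical helper, not an auto3dem command.

```shell
# Report any image stack in a directory that lacks a dummy alignment file.
# Assumed naming pattern (taken from the listing above):
#   NNNN_boxn.pif  <->  NNNN.dat_000
# check_pairs is a hypothetical helper, not an auto3dem command.
check_pairs() {
    dir=$1
    missing=0
    for pif in "$dir"/*_boxn.pif; do
        [ -e "$pif" ] || continue             # directory holds no .pif files
        base=$(basename "$pif" _boxn.pif)     # e.g. 6602
        if [ ! -f "$dir/$base.dat_000" ]; then
            echo "missing alignment file for $base"
            missing=$(( missing + 1 ))
        fi
    done
    echo "$missing stack(s) without an alignment file"
    [ "$missing" -eq 0 ]                      # non-zero status if any are missing
}

# Run from inside the ReoVirus directory, where the dat sub-directory lives.
if [ -d dat ]; then
    check_pairs dat || echo "some stacks need attention"
fi
```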

The auto3dem package has a command that sets things up to create an unbiased initial 3d model for further work. This reference model is constructed using only a small subset of the boxed particles. The initial 3d model creation step assigns random alignment angles to a small set of individual images and then aligns those images against the 3d model. Although the initial assignment of alignment angles was totally random, iterating the alignment and model creation will frequently generate a 3d model that is close enough to the actual structure that it can be used as the initial 3d model for alignment and 3d model generation of the full set of boxed particles. Because this idea of using random alignment angles is only successful some (unknown) fraction of the time, the auto3dem command repeats this random assignment of angles numerous times and automatically determines the best 3d model. The overall process is called random model creation (abbreviated rmc or RMC in various places). The user must normally decide whether the best 3d model is actually useful, but for this well-behaved Reovirus data set, the RMC process will definitely work. The only command necessary to setup everything for the RMC process is setup_rmc.
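
The "repeat several times and keep the best" part of this process can be illustrated with a few lines of shell. This is a sketch of the selection idea only: pick_best, the trial names and the scores are all hypothetical, and the assumption that a higher score is better is made here purely for illustration. How auto3dem actually ranks its random models is not described in this tutorial.

```shell
# Keep the best of several independent trials.
# Input: one "trial_name score" pair per line.
# ASSUMPTION for this sketch: a higher score means a better model.
# pick_best and the values below are hypothetical, not auto3dem output.
pick_best() {
    awk '{
             if (NR == 1 || $2 + 0 > best) { best = $2 + 0; line = $0 }
         }
         END { if (NR > 0) print line }'
}

printf '%s\n' "trial_1 0.42" "trial_2 0.71" "trial_3 0.55" | pick_best
```

With the made-up scores above, the line for trial_2 is the one printed.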

The setup_rmc command is actually a Perl script from the auto3dem package that runs many other programs from the package. It also allows the user to change the default parameters, though for the Reovirus exercise, we are simply going to use the defaults. The following shows the results of running this command:

$ setup_rmc

Bacteriophage P22 Data

Content coming soon...