Using the auto3dem Package on the IUB Clusters

Under Construction!

Tim Baker's lab at UC San Diego has made publicly available several data sets designed to show people how to use the auto3dem package to generate reconstructions of icosahedral viruses (and other macro-molecular complexes that have icosahedral or 532 point group symmetry). The EMC keeps copies of two of these data sets on an IUB computing cluster disk array: a very well-behaved ~2000 particle Reovirus data set that can be used to generate an ~10 Å resolution reconstruction using three commands (!) and an ~600 particle bacteriophage P22 data set that illustrates the entire reconstruction process in more detail.

These data sets come with all the instructions necessary to use them. Anyone should be able to copy the data sets to a machine that runs the auto3dem package, follow the instructions found with each and produce the correct final results. The additional details provided here are intended to show users how to do the data processing on the IUB computing clusters (specifically the Karst cluster) and to show some of the additional analyses of the image processing steps and results that the EMC has implemented. In addition, because the auto3dem package is continually changing and the documentation in the various demos does not necessarily reflect recent changes, the description here provides commands and commentary that work with the auto3dem release currently installed on the cluster (version 4.05, as of spring 2015).

NOTE: The data sets are stored on DCWAN as compressed archives (tar files, aka tarballs) and must be uncompressed before use. These details are covered in the individual sections below, but the Linux commands used are only briefly described. Information about the individual Linux commands can be found here and a brief overall description of how Linux works is here. Further questions about the use of the computing clusters should be directed to David Morgan. In the following description of what to do, Linux commands (including all their arguments) will be printed in bold. These commands will begin with a $ (the usual Linux prompt) and any comment about a command will be placed on the same line, separated from it by a # character (the pound sign).

It will be convenient in the discussion below to have a simple understanding of how auto3dem works and the "units of data/information" that it utilizes. The auto3dem package is one of the many different model-based procedures that are used in cryoTEM single particle reconstruction schemes. Other packages (e.g., EMAN, IMAGIC, SPIDER and XMIPP) use their own type of model-based alignment and reconstruction algorithms, and the details of auto3dem's Polar Fourier Transform (PFT) method will not be discussed here. The initial publication that describes the PFT procedure is a great place to learn the procedure's details, and David Morgan would be happy to share what he knows with anyone who asks.

When such model-based alignment procedures are used, a reference three-dimensional (3d) model is projected over a range of angles, individual images (or class-averages, though the rest of this discussion will use the term "individual images" to refer to either) are aligned to each projection of the 3d model and the "best match" between an individual image and a specific projection of the 3d model is selected. After every individual image has been compared to each of the model projections, the individual images are assigned the projection angles of the best matched 3d model projection and a new 3d model is then generated from the individual images and these projection angles. Every model-based alignment and 3d reconstruction procedure follows this sort of general scheme, though there is considerable variation in the details of how the different steps are accomplished. In the discussions below, the projection of the 3d model and the comparison of those projections to the individual images will simply be referred to as an alignment or a cycle of aligning/alignment (and will be shown in bold when relevant).

The auto3dem package performs exactly this sort of model-based alignment and reconstruction and is designed to be used in an iterative fashion (i.e., once a new 3d model has been generated, a new set of projections is calculated, the individual images are again aligned to these new projections and those results are used to create an even newer 3d model). Iteration occurs for any number of cycles of alignment and 3d model creation, and it is the user's responsibility to determine when enough iterations have been performed. The user can also adjust parameters that control details of the alignment and reconstruction steps, with the goal of getting a better and better 3d model.

The following list contains names and descriptions of the fundamental units of either data or information that are manipulated by the auto3dem package. The descriptions both describe a particular unit and show the fundamental ways in which the units are inter-related. These names (in bold) will be used in the discussions below and the reader is urged to refer to the descriptions in this list whenever necessary:

NOTE2: Images and volumes are stored in what is called the Purdue Image Format, more often simply called PIF. These files use .pif as their file extension. In some of the discussion below, some of the .pif files will be converted into MRC formatted files. This is simply a matter of convenience for the user and is not usually necessary in order to use the auto3dem package.

Reovirus Data

This data set contains about 2000 images of Reovirus particles. The particles have already been boxed out of the initial micrographs. The defocus of each micrograph has also been previously determined. Since all this information is available with the data set, the only things that the user needs to do are:

  1. generate an initial 3d model
  2. iteratively refine the boxed particles against the updated 3d model of each iteration

Each of these steps can be accomplished with simple commands to the auto3dem package, though using Karst to do this work involves a bit more effort. The first steps shown below involve creating an area to do the work, and copying the data into that area. The file names used in the following description are completely arbitrary, but will be used consistently. Here is the set of commands that will create a working area in your home directory called auto3demTutorial, copy the compressed data set into that area and uncompress it:

                  $ cd                          # move to your home directory
                  $ mkdir auto3demTutorial      # create the working area
                  $ cd auto3demTutorial         # move into the new area

                  $ cp /N/dcwan/projects/cryoem/Common/auto3demTutorial/ReoVirus.tgz ./
                                                # copies the archived data set to this new area
                  $ tar -xzvf ReoVirus.tgz      # uncompress the archive
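
Before unpacking a large archive it can be useful to confirm that the copy is readable and to see what it will create. The following is only a sketch of one way to do that with the standard tar listing option; preview_archive is a hypothetical helper name chosen for this page and is not part of the auto3dem package.

```shell
# Sketch: inspect a gzip-compressed tar archive without extracting it.
# preview_archive is a hypothetical helper, not an auto3dem command.
preview_archive() {
    archive="$1"
    # -t lists the contents, -z handles gzip compression, -f names the file
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "cannot read archive: $archive" >&2
        return 1
    fi
    # print the distinct top-level entries that extraction would create
    tar -tzf "$archive" | sed -e 's|^\./||' -e '/^$/d' | cut -d/ -f1 | sort -u
}
```

After pasting the function into the shell, it could be used as $ preview_archive ReoVirus.tgz. For the Reovirus archive this is expected to print ReoVirus, the directory that appears after unpacking.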

There is now a new directory in the working area that contains the uncompressed data set. A listing of the working area shows that uncompressing the archive created a new directory called ReoVirus. After moving into that new directory, two listing commands show what has been created in this new area (the first ls command) and in a sub-directory (the second ls command). In the output from these ls commands, dagmorga and chem are designations (user name and group affiliation) for the user who did this work (David Morgan). Your output will show similar information about you.

                  $ ls -l
                  total 753856
                  drwxr-xr-x 3 dagmorga chem      4096 Mar 18  2010 ReoVirus
                  -rw-r--r-- 1 dagmorga chem 385971321 Mar 24 09:51 ReoVirus.tgz
                  $ cd ReoVirus
                  $ ls -l
                  total 0
                  -rw-r--r-- 1 dagmorga chem  327 Mar 18  2010 README
                  -rw-r--r-- 1 dagmorga chem  674 Mar 18  2010 commands
                  drwxr-xr-x 2 dagmorga chem 4096 Mar 18  2010 dat
                  $ ls -l dat 
                  total 977344
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6602.dat_000
                  -rw-r--r-- 1 dagmorga chem 61866932 Mar 18  2010 6602_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6604.dat_000
                  -rw-r--r-- 1 dagmorga chem 60856868 Mar 18  2010 6604_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6606.dat_000
                  -rw-r--r-- 1 dagmorga chem 55301516 Mar 18  2010 6606_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6622.dat_000
                  -rw-r--r-- 1 dagmorga chem 48231068 Mar 18  2010 6622_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6623.dat_000
                  -rw-r--r-- 1 dagmorga chem 60604352 Mar 18  2010 6623_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6624.dat_000
                  -rw-r--r-- 1 dagmorga chem 50503712 Mar 18  2010 6624_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6628.dat_000
                  -rw-r--r-- 1 dagmorga chem 56059064 Mar 18  2010 6628_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6629.dat_000
                  -rw-r--r-- 1 dagmorga chem 51008744 Mar 18  2010 6629_boxn.pif
                  -rw-r--r-- 1 dagmorga chem      335 Mar 18  2010 6630.dat_000
                  -rw-r--r-- 1 dagmorga chem 55806548 Mar 18  2010 6630_boxn.pif

The last command shows the items in the dat directory: the files with extension .pif are the individual images of Reovirus capsids and the files with extension .dat_000 are alignment files that describe the images. These are "dummy" alignment files in that they only contain information about the micrograph from which the particles were boxed, while normal auto3dem alignment files would contain per-particle alignment information for each of the boxed particles.
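
Since each image file is expected to have a companion alignment file, a quick consistency check of the dat directory can catch an incomplete copy. The sketch below assumes the naming pattern visible in the listing (NNNN_boxn.pif paired with NNNN.dat_000); check_pairs is a hypothetical helper name and is not part of the auto3dem package.

```shell
# Sketch: confirm every image stack has a matching alignment file.
# check_pairs is a hypothetical helper, not an auto3dem command.
# Assumed naming: NNNN_boxn.pif (images) <-> NNNN.dat_000 (alignment file)
check_pairs() {
    dir="$1"
    missing=0
    for pif in "$dir"/*_boxn.pif; do
        [ -e "$pif" ] || continue          # pattern matched no files
        base=$(basename "$pif" _boxn.pif)  # e.g. 6602_boxn.pif -> 6602
        if [ ! -f "$dir/$base.dat_000" ]; then
            echo "missing alignment file for $base"
            missing=$((missing + 1))
        fi
    done
    echo "$missing unpaired image file(s)"
    [ "$missing" -eq 0 ]                   # success only if nothing is missing
}
```

From inside the ReoVirus directory this could be run as $ check_pairs dat. If the directory matches the listing, no missing alignment files should be reported.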

The auto3dem package has a command that sets things up to create an unbiased initial 3d model for further work. This reference model is constructed using only a small subset of the boxed particles. The initial 3d model creation step assigns random alignment angles to a small set of individual images and then aligns those images against the resulting 3d model. Although the initial assignment of alignment angles was totally random, iterating the alignment and model creation will frequently generate a 3d model that is close enough to the actual structure that it can be used as the initial 3d model for alignment and 3d model generation of the full set of boxed particles. Because this idea of using random alignment angles is successful only some (unknown) fraction of the time, the auto3dem command repeats this random assignment of angles numerous times and automatically determines the best 3d model. The overall process is called random model creation (abbreviated rmc or RMC in various places). The user must normally decide whether the best 3d model is actually useful, but for this well-behaved Reovirus data set, the RMC process will definitely work. The only command necessary to set up everything for the RMC process is setup_rmc.
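
Before starting, it may be worth confirming that the shell can actually find setup_rmc, since how the auto3dem programs are made available depends on the local installation. The following is a minimal sketch using the standard shell built-in command -v; have_command is a hypothetical helper name and is not part of the auto3dem package.

```shell
# Sketch: report whether a program is visible on the current PATH.
# have_command is a hypothetical helper, not an auto3dem command.
have_command() {
    if command -v "$1" > /dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}
```

It could be used as $ have_command setup_rmc. If the command is reported as not found, the auto3dem installation is probably not yet set up in your login environment.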

The setup_rmc command is actually a Perl script from the auto3dem package that runs many other programs from the package. It also allows the user to change the default parameters, though for the Reovirus exercise, we are simply going to use the defaults. The following shows the results of running this command:

                  $ setup_rmc




Bacteriophage P22 Data