histogramMRC.sh Script


In the following description of the histogramMRC.sh script, the script name and its content are shown in bold. Interspersed throughout the script are comments not contained in the actual script that describe what various pieces of the script are doing and that refer to the page that describes the overall action of this script. In these comments, things such as text that a user might type or text that is also contained in the script itself are also presented in bold.

NOTE: Comparison of the script shown here with a current version of the script from someplace else may show differences. Especially with regard to spacing and indentation, the script shown here and a script running on a Linux machine may easily differ. It is also possible that changes will have been made to the script on a computer that reflect changes in available software or the actual operating system. Such differences can be ignored in terms of using what is shown here as an explanation of scripting.

This is a tcsh script and so it must start with #!/bin/tcsh. The -f option simply tells the tcsh shell to start without loading any resource or startup files, meaning that things start faster. The script then has a very short explanation (marked using the comment delimiter #) of what the script does in order to give a reader of the script an idea of what should happen. Throughout the script, comments are marked using initial #'s.

          #! /bin/tcsh -f
          #
          # Script to generate a histogram from an MRC file
          #
          # ----------------------------------------------------------------------

The next line defines a variable, ECHO, that holds the echo command together with its -e flag. The flag tells echo to interpret backslash escape sequences (such as \n for a new line and \t for a tab), which helps with formatting the output. This is not necessary in many cases, but is used in most of the EMC's scripts on Karst.

          set ECHO = '/bin/echo -e'
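
As a small illustration of what the -e flag changes, the following sketch prints the same string with and without the flag. It is written in Bourne-shell syntax (sh or bash) rather than tcsh so that it can be run on its own, and it assumes the GNU coreutils version of /bin/echo found on most Linux systems; the variable names MSG, plain and formatted are made up for this example.

```shell
#!/bin/sh
# Assumes GNU coreutils /bin/echo; other echo implementations may behave differently.
MSG='\tProper usage is:\n'

# Without -e the backslash sequences are printed literally.
plain=$(/bin/echo "$MSG")

# With -e, \t becomes a tab and \n becomes a newline.
formatted=$(/bin/echo -e "$MSG")

printf 'without -e: [%s]\n' "$plain"
printf 'with -e:    [%s]\n' "$formatted"
```

The square brackets in the output make it easy to see where the tab was inserted (the trailing newline is removed by the command substitution).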

The following block is a nested if-then-else construction that allows the user to type histogramMRC.sh help in order to obtain some information about the script's operation. The same construction prints a standard how-to-use message when the script is invoked with the wrong number of command line arguments (e.g., histogramMRC.sh filename supplies one argument where two or three are required, so the user is told the proper way to invoke the script). Whatever action is produced using this if-then-else construction, the exit status will be 1 (see the comment for the exit(0) statement for an explanation of the use of exit status values). All these actions are referred to elsewhere as step 1.

          if ( $#argv != 2 && $#argv != 3 ) then
             if ( $?1 && $1 == "help" ) then
                 $ECHO "\n\thistogramMRC.sh\n"
                 $ECHO "\t\t This script uses both the IMOD program header"
                 $ECHO "\t\t and the image2000/image2010 program histok.exe"
                 $ECHO "\t\t to generate a histogram of an input MRC file.\n"
                 $ECHO "\t\t The script assumes that the extension of the "
                 $ECHO "\t\t MRC file is 'mrc' and odd things will happen if"
                 $ECHO "\t\t this is not the case.\n"
                 $ECHO "\t\t The only inputs to the script are the name of"
                 $ECHO "\t\t the input file and a flag to produce an output"
                 $ECHO "\t\t histogram that contains either linear or log"
                 $ECHO "\t\t scaling along the y-axis.\n"
                 $ECHO "\t\t Simple instructions for using this script follow:\n"
               else
                 $ECHO "\n\tIncorrect number of arguments ($#argv) !\n"
                 $ECHO "\tProper usage is:\n"
             endif
             $ECHO "\t   histogramMRC.sh input linear/log <show>\n"
             $ECHO "\t       where input is the name of the input MRC file"
             $ECHO "\t             linear/log are the 2 choices for the 'mode'"
             $ECHO "\t                of the output (linear or log scaling )"
             $ECHO "\t           & <show> is an optional argument to display"
             $ECHO "\t                the output before the script closes\n"
             exit(1)
          endif
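
The logic of this block can be tried out in isolation with the sketch below. It is written in Bourne-shell syntax (sh or bash) rather than tcsh, the function name check_usage is hypothetical, and the help text is reduced to a placeholder; it is meant only to show how the argument count selects between the help message, the error message and normal operation.

```shell
#!/bin/sh
# check_usage is a hypothetical helper name, not part of histogramMRC.sh.
check_usage() {
    # Two or three arguments are acceptable; anything else is a usage error.
    if [ "$#" -ne 2 ] && [ "$#" -ne 3 ]; then
        if [ "${1:-}" = "help" ]; then
            echo "help text would be printed here"
        else
            echo "Incorrect number of arguments ($#) !"
        fi
        echo "usage: histogramMRC.sh input linear/log <show>"
        return 1
    fi
    return 0
}

for args in "help" "alignedTEM.mrc" "alignedTEM.mrc linear"; do
    # $args is deliberately unquoted so that it splits into separate arguments
    if check_usage $args; then
        echo "-> arguments accepted: $args"
    else
        echo "-> the script would exit with status 1"
    fi
done
```

Note that both the help request and the usage error end in the same non-zero status, which mirrors the single exit(1) in the script.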

The following lines assign the first two command line arguments ( $1 and $2 ) to variables ( IMAGE and MODE, respectively) that are used in the remainder of the script and that are more intuitively obvious as to their values. The script then sets the names for a display program ( GV is set to the program xpdf ) and a file converter ( CONVERTER is set to the program ps2pdf ). Setting these near the beginning of the script makes it easier to modify the script for use on a computer system that (for example) does not have these two programs, but has different programs that would accomplish similar things.

          set IMAGE = $1
          set MODE  = $2

          # set the viewer; originally a postscript viewer, but it's now easier to convert to PDF...
          set GV = xpdf
          set CONVERTER = ps2pdf

The following if-then-else construction tests if there were three command line arguments when the file was run ( if ( $#argv == 3 ) then ). If that is true (the first part of the if clause), a variable named SHOW is set to the value show. The string values here are written inside quotation marks; tcsh does not strictly require the quotes for a single word (the unquoted set GV = xpdf above works as well), but quoting is a safe habit, particularly for values that contain spaces. If the test is false (the else clause), that same SHOW variable is set to no. Note that the value of the third command line argument is irrelevant, and it is only the existence of a third argument that is important in this particular script. This section of the script is referred to as step 2 elsewhere.

          if ( $#argv == 3 ) then
             set SHOW  = "show"
           else
             set SHOW = "no"
          endif

The following test statement looks for whether the input image file exists and has a non-zero size ( -s ). The if clause that follows prints an error message and causes the script to exit with exit status 2 if the file does not exist or has a size of zero. This part of the script is referred to as step 3 elsewhere.

          test -s ${IMAGE}
          if ( $status != 0 ) then
             $ECHO "\n\tFile $IMAGE does not exist...\n"
             exit(2)
          endif
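
The behavior of test -s can be checked with a few scratch files, as in the following sketch. It uses Bourne-shell syntax (sh or bash) rather than tcsh, and the helper name describe and the file names are invented for the example. It shows that an empty file is treated the same way as a missing one, even though the script's error message only mentions a file that does not exist.

```shell
#!/bin/sh
# Create a scratch directory holding one empty and one non-empty file.
workdir=$(mktemp -d)
: > "$workdir/empty.mrc"
echo "data" > "$workdir/full.mrc"

# describe is a hypothetical helper that reports what test -s decides.
describe() {
    if test -s "$1"; then
        echo "usable"
    else
        echo "missing or empty"
    fi
}

describe "$workdir/full.mrc"     # exists and has content
describe "$workdir/empty.mrc"    # exists but has size zero
describe "$workdir/absent.mrc"   # does not exist

rm -rf "$workdir"
```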

This next if clause ensures that the second command line argument (set to MODE above) has only the values of linear or log. If MODE has some other value, the script prints an error message and exits (with exit status 3 this time). This part of the script is referred to as step 4 elsewhere.

          if ( $MODE != "linear" && $MODE != "log" ) then
             $ECHO "\n\tDisplay mode ($MODE) is not valid...\n"
             exit(3)
          endif

Since the program that is going to calculate the histogram does not understand the words linear or log, the next if-then-else construction sets the value of variable MODE to what the program will understand. Note that since the script previously made certain that the only values for MODE were linear or log, the if clause actually works by testing for whether the value of MODE is linear. If so, MODE is reset to 0 (which the histogram program will understand as "make the histogram using a linear scale"). If MODE is not linear (and therefore must be log), MODE is reset to 1 (which the histogram program will understand as "make the histogram using a logarithmic scale").

          if ( $MODE == "linear" ) then
              set MODE = 0
            else
              set MODE = 1
          endif
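
The validation and the translation of MODE can also be pictured as a single lookup. The sketch below, in Bourne-shell syntax (sh or bash) rather than tcsh, uses a hypothetical function mode_to_flag that prints 0 for linear, prints 1 for log, and fails with status 3 for anything else, which matches the exit status the script uses for an invalid mode.

```shell
#!/bin/sh
# mode_to_flag is a hypothetical helper name used only for this sketch.
mode_to_flag() {
    case "$1" in
        linear) echo 0 ;;
        log)    echo 1 ;;
        *)      echo "Display mode ($1) is not valid..." >&2
                return 3 ;;
    esac
}

for mode in linear log cubic; do
    if flag=$(mode_to_flag "$mode"); then
        echo "$mode -> $flag"
    else
        echo "$mode rejected"
    fi
done
```

The script itself keeps the two steps separate, which has the advantage that the error message can still show the word the user typed before MODE is overwritten with a number.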

The program that calculates the image's histogram can actually work on sub-regions of the image (and the image can be three-dimensional). Inputs to the program include the starting and stopping positions in the x-, y- and z-dimensions. In this particular script, the entire image will be used for the histogram. However, instead of using 0's as inputs to the histogram program, the script sets the variables Xstart, Ystart and Zstart to the value 0, which would make it easier in the future to use this script to histogram a portion of an image using input from the command line.

          # use variables here in case I want to make this input...
          set Xstart = 0
          set Ystart = 0
          set Zstart = 0

The stopping positions for the program that calculates the histogram are (in the case of using the entire image) the maximum values for the x-, y- and z-dimensions. This information is carried in the header of the MRC file and the following constructions are just a complicated way to read that information from the file's header:

The program header (from the IMOD suite of programs) reads the MRC header and would normally spew all that output to the terminal window. Instead, the output is fed to a grep command that looks for the line of output associated with the file's dimensions ( | grep "Number of columns" ). Since the script cannot use the entire line of information, but rather the three individual dimensions, this line of output is passed to a sed command that replaces every period that is followed by a space with a single ^ ( | sed -e "s/\. /^/g" ). In the string of .'s that precedes the numbers, only the last period is followed by a space, so the ^ ends up marking the point just before the dimensions. In the script, the | (pipe) character actually appears at the end of the previous line ( | \ ), where the \ (backslash) character indicates that the following line is to be treated as a continuation of the current one.

This altered line of output is passed to a pair of cut commands that first remove everything before and including the ^ ( | cut -f2 -d^ ) and then select the sub-region of the remaining characters that contains the relevant information (e.g., | cut -c1-7 selects the part of the remaining output that only contains the x-dimension). Finally, all these individual steps are used to assign the final value to the three different variables (e.g., set Ydim = `header $IMAGE...`, where the paired ` (back quotes) characters tell the script to execute the commands within the back quotes before anything else is done). All this is messy (and compact due to piping from one command into another), but it works. Bear in mind that it only works because the script writer knew exactly how the output of the header command would appear, and how to select the relevant information from that output. This part of the script is referred to as step 5 elsewhere.

          # get some constants from image.  This is a bit ugly but should account for
          #       small changes in the formatting of the output from header
          set Xdim = `header $IMAGE | grep "Number of columns" | \
                       sed -e "s/\. /^/g" | cut -f2 -d^ | cut -c1-7 `
          set Ydim = `header $IMAGE | grep "Number of columns" | \
                       sed -e "s/\. /^/g" | cut -f2 -d^ | cut -c9-15 `
          set Zdim = `header $IMAGE | grep "Number of columns" | \
                       sed -e "s/\. /^/g" | cut -f2 -d^ | cut -c17-23 `
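
To see what each stage of the pipeline does without having IMOD installed, the same sed and cut commands can be applied to a sample line. In the sketch below (Bourne-shell syntax for sh or bash, not tcsh), the sample line is an assumption about what the dimension line from header looks like, with three right-aligned numbers in 8-character columns; the real output may differ. The function name extract_dim is hypothetical, and the final tr command stands in for the whitespace trimming that tcsh performs when the back-quoted output is assigned with set.

```shell
#!/bin/sh
# Hypothetical sample line, assumed to mimic the dimension line printed by header.
sample=' Number of columns, rows, sections .....    2048    2048       1'

extract_dim() {
    # $1 is the character range handed to the last cut command (e.g. 1-7)
    printf '%s\n' "$sample" | grep "Number of columns" |
        sed -e "s/\. /^/g" | cut -f2 -d^ | cut -c"$1" | tr -d ' '
}

Xdim=$(extract_dim 1-7)
Ydim=$(extract_dim 9-15)
Zdim=$(extract_dim 17-23)

echo "Xdim=$Xdim Ydim=$Ydim Zdim=$Zdim"
```

With this sample line the three values come out as 2048, 2048 and 1. If the numbers in the real header output were wider than the assumed columns, the fixed character ranges would cut digits off, which is why the approach depends on knowing the output format in advance.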

The actual histogram is calculated in this next part of the script. The histok.exe program is part of the MRC Image2010 package, which was not initially written to run using command line arguments. For that reason, the program is started ( histok.exe ), all the output is directed to a sort of write-only memory called /dev/null ( >> /dev/null ) and input to the program is accepted line by line until the characters eot are reached ( << eot ). This use of the << eot construction (or actually "<< AnySortOfText" ) is a common way of sending input to programs that cannot be run from the command line alone, but rather expect the user to supply input after the program starts to execute.

In the case of histok.exe, the file to process is set using the line before the program is started ( setenv IN $IMAGE, another relic from how the MRC package was initially constructed) and the expected input is a line containing the x-, y- and z-ranges to process ( $Xstart, $Xdim, $Ystart, $Ydim, $Zstart, $Zdim ) and a second line that contains only $MODE and that tells the program to use either a linear or a log scale for the output (and recall that the value of MODE was previously set to 0 or 1, since this program does not understand the words linear or log). Also bear in mind that when sending information to the histok.exe program, it is necessary to send not the name of the variable (e.g., MODE) but rather the value of the variable (e.g., $MODE). This part of the script is referred to as step 6 elsewhere.

          # now use Image2010's histok.exe to do a histogram of the entire image
          setenv IN $IMAGE
          histok.exe >> /dev/null << eot
          $Xstart, $Xdim, $Ystart, $Ydim, $Zstart, $Zdim
          $MODE
          eot
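
The << eot construction can be tried without histok.exe by substituting any program that reads from its standard input. In the sketch below (Bourne-shell syntax for sh or bash, not tcsh, although the << eot part looks the same in both), the function fake_histok is a hypothetical stand-in that simply reads two lines and reports them, and the variable values are example numbers.

```shell
#!/bin/sh
# Example values; in the real script these come from the MRC header and the command line.
Xstart=0; Ystart=0; Zstart=0
Xdim=2048; Ydim=2048; Zdim=1
MODE=0

# fake_histok is a hypothetical stand-in for histok.exe: it reads two lines of input.
fake_histok() {
    read -r range
    read -r mode
    echo "range=[$range] mode=[$mode]"
}

# The variables are replaced by their values before the lines reach the program.
result=$(fake_histok << eot
$Xstart, $Xdim, $Ystart, $Ydim, $Zstart, $Zdim
$MODE
eot
)

echo "$result"
```

The stand-in receives the numbers, not the variable names, which is the point made above about sending $MODE rather than MODE.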

If the histok.exe program runs properly, a PostScript file called HISTO.PS will be created. The following part of the script deals with that output, first checking that the file was created and has a non-zero size ( test -s HISTO.PS ). The next part of the script is an if-then-else construction that deals with whether the file HISTO.PS was created (additional things happen) or not (the script ends with a message and exit status 4). The test for the PostScript output is referred to as step 7 elsewhere.

If the file was successfully created (the else clause), an output filename is constructed that drops the .mrc extension from the input filename ( set OUTPUT = `echo $IMAGE | sed -e "s\.mrc\\g"` ) and that has the string _histo appended to what remains ( set OUTPUT = ${OUTPUT}_histo ).

The program associated earlier with the variable CONVERTER is searched for ( which $CONVERTER >> /dev/null, where the output is being sent to /dev/null again), and the outcome of that search is fed into another if-then-else construction. If the conversion program is available (i.e., if the which command is successful), that program is used to convert HISTO.PS to PDF format ( $CONVERTER HISTO.PS $OUTPUT.pdf ), and the user is told the name of the output file ( $OUTPUT.pdf ). This part of the script is referred to as step 8 elsewhere.

If the conversion program can not be found, the HISTO.PS file is simply renamed to $OUTPUT.ps (i.e., a name based on the original input file but with the .ps extension indicating a PostScript file). When this happens, the user is notified that it was not possible to find the conversion program and is given the output file name (and the script terminates with exit status 5). This part of the script is referred to as step 9 elsewhere.

          test -s HISTO.PS
          if ( $status == 1 ) then
              $ECHO "\n\tScript failed to generate histogram...\n"
              exit(4)
            else
              set OUTPUT = `echo $IMAGE | sed -e "s\.mrc\\g"`
              set OUTPUT = ${OUTPUT}_histo
              which $CONVERTER >> /dev/null
              if  ( $status == 0 ) then
                  $CONVERTER HISTO.PS $OUTPUT.pdf
                  $ECHO "\n\tOutput histogram saved as $OUTPUT.pdf\n"
                else
                  $ECHO "\n\tUnable to find the requested converter ($CONVERTER)...\n"
                  $ECHO "\n\tOutput histogram saved as $OUTPUT.ps\n"
                  mv HISTO.PS $OUTPUT.ps
                  exit(5)
              endif
          endif
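
The file naming and the search for the converter can be sketched separately from the histogram calculation. The example below is in Bourne-shell syntax (sh or bash) rather than tcsh, and the helper names output_base and have_program are hypothetical. Two substitutions are assumptions rather than the script's own text: the sed expression is written with the conventional / delimiter and anchored to the end of the name, and command -v is used in place of which to look for a program.

```shell
#!/bin/sh
# output_base is a hypothetical helper: drop a trailing .mrc, then append _histo.
output_base() {
    base=$(echo "$1" | sed -e 's/\.mrc$//')
    echo "${base}_histo"
}

# have_program is a hypothetical helper that succeeds if $1 can be found on the PATH.
have_program() {
    command -v "$1" > /dev/null 2>&1
}

OUTPUT=$(output_base alignedTEM.mrc)
CONVERTER=ps2pdf

if have_program "$CONVERTER"; then
    echo "would run: $CONVERTER HISTO.PS $OUTPUT.pdf"
else
    echo "would run: mv HISTO.PS $OUTPUT.ps"
fi
```

For the input name alignedTEM.mrc the base name becomes alignedTEM_histo; which of the two messages is printed depends on whether ps2pdf is installed on the machine running the sketch.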

The final block of the script deals with whether to display the histogram file or not: if the variable SHOW is set to a value of show, then the script searches for the display program associated with variable GV (a step referred to as step 10 elsewhere) and does different things depending upon whether the program can be found or not (using exactly the same logic that deals with the conversion program's existence). If the display program can be found, the histogram output file is displayed (and the script only continues when the user shuts down the display program), and if it can not be found, the script relays that information to the user. This display part of the script is referred to as step 11 elsewhere.

          if ( $SHOW == "show" ) then
            which $GV >> /dev/null
            if ( $status == 0 ) then
               $GV $OUTPUT.pdf
              else
               $ECHO "\n\tUnable to find the requested postscript viewer ($GV)...\n"
            endif
          endif

Finally, the script terminates with exit status 0. If this script happened to be run inside another script, it would be possible to test for this value as a way of ensuring that the histogramMRC.sh script had run properly (meaning that additional steps in this second script that require the output of histogramMRC.sh could be performed). Such a script could also report the non-zero exit status to the user and could even have different output for exit status 1 (an indication that the script never really started), 2 (an indication that the input image did not exist), 3 (an indication that the second command line argument could not be understood), 4 (an indication that the HISTO.PS file was not created for some reason) or 5 (an indication that the script was unable to find the PostScript to PDF converter and so the output was a PostScript file).

          exit(0)
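
A calling script of the kind described above might translate the exit status into a message. The following sketch is in Bourne-shell syntax (sh or bash) rather than tcsh; the function explain_status is hypothetical, the message wording is invented, and a subshell that exits with status 2 stands in for an actual run of histogramMRC.sh.

```shell
#!/bin/sh
# explain_status is a hypothetical helper for a calling script.
explain_status() {
    case "$1" in
        0) echo "histogram created" ;;
        1) echo "usage error or help requested" ;;
        2) echo "input image missing or empty" ;;
        3) echo "mode was neither linear nor log" ;;
        4) echo "HISTO.PS was not created" ;;
        5) echo "no converter found; output left as PostScript" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Stand-in for running the real script: a subshell that exits with status 2.
status=0
( exit 2 ) || status=$?

echo "exit status $status: $(explain_status "$status")"
```

In a real calling script, the stand-in line would be replaced by the call to histogramMRC.sh, and later steps that need the PDF output would run only when the status is 0.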

Here is a series of example outputs from the histogramMRC.sh script. Each example starts with the command a user types, shown after the $ prompt:


        
        $ histogramMRC.sh help

        histogramMRC.sh

                 This script uses both the IMOD program header
                 and the image2000/image2010 program histok.exe
                 to generate a histogram of an input MRC file.

                 The script assumes that the extension of the
                 MRC file is 'mrc' and odd things will happen if
                 this is not the case.

                 The only inputs to the script are the name of
                 the input file and a flag to produce an output
                 histogram that contains either linear or log
                 scaling along the y-axis.

                 Simple instructions for using this script follow:

           histogramMRC.sh input linear/log <show>

               where input is the name of the input MRC file
                     linear/log are the 2 choices for the 'mode'
                        of the output (linear or log scaling )
                   & <show> is an optional argument to display
                        the output before the script closes
                        

        
        $ histogramMRC.sh alignedTEM.mrc

        Incorrect number of arguments (1) !

        Proper usage is:

           histogramMRC.sh input linear/log <show>

               where input is the name of the input MRC file
                     linear/log are the 2 choices for the 'mode'
                        of the output (linear or log scaling )
                   & <show> is an optional argument to display
                        the output before the script closes
                        

        
        $ histogramMRC.sh alignedTEM.mrc linear

        Output histogram saved as alignedTEM_histo.pdf
        

        
        $ histogramMRC.sh alignedTEM.mrc log show

        Output histogram saved as alignedTEM_histo.pdf
        

Note that this last command caused the program xpdf to display the histogram that was created by the script. However, there is no additional output to the terminal window indicating this has happened.