Command line programs are classes, too!

Most OOP discussions focus on GUI or domain-specific development areas, completely ignoring the workhorse of computing: command line programs. This article examines CommandLineApp, a base class for creating command line programs as objects, with option and argument validation, help text generation, and more.

Although many of the hot new development topics are centered on web technologies like AJAX, regular command line programs are still an important part of most systems. Many system administration tasks still depend on command line programs, for example. Often, a problem is simple enough that there is no reason to build a graphical or web user interface when a straightforward command line interface will do the job. Command line programs are less glamorous than programs with fancy graphics, but they are still the workhorses of modern computing.

The Python standard library includes two modules for working with command line options. The getopt module presents an API that has been in use for decades on some platforms and is commonly available in many programming languages, from C to bash. The optparse module is more modern than getopt, and offers features such as type validation, callbacks, and automatic help generation. Both modules elect to use a procedural-style interface, though, and as a result neither has direct support for treating your command line application as a first class object. There is no facility for sharing common options between related programs using getopt. And, while it is possible to reuse optparse.OptionParser instances in different programs, it is not as natural as inheritance.
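To make the procedural style concrete, here is a minimal sketch of option parsing with getopt. It is written in Python 3 syntax, and the parse_args helper and the settings dictionary are invented for illustration; they are not part of csvcat or CommandLineApp.

```python
import getopt


def parse_args(argv):
    """Parse a hypothetical command line and return (settings, args)."""
    settings = {'dialect': 'excel', 'skip_headers': False}
    # 'd:' declares -d with a required argument; 'dialect=' does the
    # same for the long form, while 'skip-headers' is a plain switch.
    opts, args = getopt.getopt(argv, 'd:', ['dialect=', 'skip-headers'])
    for switch, value in opts:
        if switch in ('-d', '--dialect'):
            settings['dialect'] = value
        elif switch == '--skip-headers':
            settings['skip_headers'] = True
    return settings, args


settings, args = parse_args(['--skip-headers', '-d', 'excel-tab', 'a.csv'])
print(settings, args)
```

Every program written this way repeats the same loop over the parsed options, which is the kind of repetition a base class can absorb.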

CommandLineApp is a base class for command line programs. It handles the repetitive aspects of interacting with the user on the command line such as parsing options and arguments, generating help messages, error handling, and printing status messages. To create your application, just make a subclass of CommandLineApp and concentrate on your own code. All of the information about switches, arguments, and help text necessary for your program to run is derived through introspection. Common options and behavior can be shared by applications through inheritance.

csvcat Requirements

Recently, I needed to combine data from a few different sources, including a database and a spreadsheet, to summarize the results. I wanted to import the merged data into a spreadsheet where I could perform the analysis. All of the sources were able to save data to comma-separated-value (CSV) files; the challenge was merging the files together. Using the csv module in the Python standard library, and CommandLineApp, I wrote a small program to read multiple CSV files and concatenate them into a single output file. The program, csvcat, is a good illustration of how to create applications with CommandLineApp.

The requirements for csvcat were fairly simple. It needed to read one or more CSV files and combine them, without repeating the column headers that appeared in each input source. In some cases, the input data included columns I did not want, so I needed to be able to select the columns to include in the output. No sort feature was needed, since I was going to import it into a spreadsheet when I was done and I could sort the data after importing it. To make the program more generally useful, I also included the ability to select the output format using a csv module feature called “dialects”.
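As a rough illustration of what a dialect changes, the following Python 3 sketch writes the same rows with two of the dialects that ship with the csv module. The render helper and the sample rows are assumptions made for this example.

```python
import csv
import io

rows = [['Title 1', 'Title 2'], ['1', 'a']]


def render(rows, dialect):
    """Write rows to an in-memory buffer using the named csv dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect=dialect)
    writer.writerows(rows)
    return buffer.getvalue()


excel_text = render(rows, 'excel')      # comma-separated fields
tab_text = render(rows, 'excel-tab')    # tab-separated fields
print(excel_text)
print(tab_text)
```

Only the dialect name differs between the two calls, which is why a single command line option is enough to expose the feature.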

Analyzing the Help

Listing 1 shows the help output for the final version of csvcat, produced by running csvcat --help. Listing 2 shows the source for the program. All of the information in the help output is derived from the csvcat class through introspection. The help text follows a fairly standard layout. It begins with a description of the application, followed by increasingly detailed descriptions of the syntax, arguments, and options. Application-specific help such as examples and argument ranges appears at the end.

Listing 1

Concatenate comma separated value files.


SYNTAX:

  csvcat [<options>] filename [filename...]

    -c col[,col...], --columns=col[,col...]
    -d name, --dialect=name
    --debug
    -h
    --help
    --quiet
    --skip-headers
    -v
    --verbose=level


ARGUMENTS:

    The names of comma separated value files, such as might be
    exported from a spreadsheet or database program.


OPTIONS:

    -c col[,col...], --columns=col[,col...]
        Limit the output to the specified columns. Columns are
        identified by number, starting with 0.

    -d name, --dialect=name
        Specify the output dialect name. Defaults to "excel".

    --debug
        Set debug mode to see tracebacks.

    -h
        Displays abbreviated help message.

    --help
        Displays verbose help message.

    --quiet
        Turn on quiet mode.

    --skip-headers
        Treat the first line of each file as a header, and only
        include one copy in the output.

    -v
        Increment the verbose level. Higher levels are more verbose.
        The default is 1.

    --verbose=level
        Set the verbose level.

EXAMPLES:


To concatenate 2 files, including all columns and headers:

  $ csvcat file1.csv file2.csv

To concatenate 2 files, skipping the headers in the second file:

  $ csvcat --skip-headers file1.csv file2.csv

To concatenate 2 files, including only the first and third columns:

  $ csvcat --col 0,2 file1.csv file2.csv


OUTPUT DIALECTS:

    excel-tab
    excel

Listing 2

#!/usr/bin/env python
"""Concatenate csv files.
"""

import csv
import sys
import CommandLineApp

class csvcat(CommandLineApp.CommandLineApp):
    """Concatenate comma separated value files.
    """

    EXAMPLES_DESCRIPTION = '''
To concatenate 2 files, including all columns and headers:

  $ csvcat file1.csv file2.csv

To concatenate 2 files, skipping the headers in the second file:

  $ csvcat --skip-headers file1.csv file2.csv

To concatenate 2 files, including only the first and third columns:

  $ csvcat --col 0,2 file1.csv file2.csv
'''

    def showVerboseHelp(self):
        CommandLineApp.CommandLineApp.showVerboseHelp(self)
        print
        print 'OUTPUT DIALECTS:'
        print
        for name in csv.list_dialects():
            print '\t%s' % name
        print
        return

    skip_headers = False
    def optionHandler_skip_headers(self):
        """Treat the first line of each file as a header,
        and only include one copy in the output.
        """
        self.skip_headers = True
        return

    dialect = "excel"
    def optionHandler_dialect(self, name):
        """Specify the output dialect name.
        Defaults to "excel".
        """
        self.dialect = name
        return
    optionHandler_d = optionHandler_dialect

    columns = []
    def optionHandler_columns(self, *col):
        """Limit the output to the specified columns.
        Columns are identified by number, starting with 0.
        """
        self.columns.extend([int(c) for c in col])
        return
    optionHandler_c = optionHandler_columns

    def getPrintableColumns(self, row):
        """Return only the part of the row which should be printed.
        """
        if not self.columns:
            return row

        # Extract the column values, in the order specified.
        response = ()
        for c in self.columns:
            response += (row[c],)
        return response

    def getWriter(self):
        return csv.writer(sys.stdout, dialect=self.dialect)

    def main(self, *filename):
        """
        The names of comma separated value files, such as might be
        exported from a spreadsheet or database program.
        """
        headers_written = False

        writer = self.getWriter()

        # process the files in order
        for name in filename:
            f = open(name, 'rt')
            try:
                reader = csv.reader(f)

                if self.skip_headers:
                    if not headers_written:
                        # This row must include the headers for the output
                        headers = reader.next()
                        writer.writerow(self.getPrintableColumns(headers))
                        headers_written = True
                    else:
                        # We have seen headers before, and are skipping,
                        # so do not write the first row of this file.
                        ignore = reader.next()

                # Process the rest of the file
                for row in reader:
                    writer.writerow(self.getPrintableColumns(row))
            finally:
                f.close()
        return

if __name__ == '__main__':
    csvcat().run()

The program description is taken from the docstring of the csvcat class. Before it is printed, the text is split into paragraphs and reformatted using textwrap, to ensure that it is no wider than 80 columns of text.
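The paragraph-by-paragraph reformatting can be sketched as follows in Python 3. The format_help_text function, its width parameter, and the sample text are assumptions for illustration; the version in CommandLineApp may differ in detail.

```python
import textwrap


def format_help_text(text, prefix='', width=70):
    """Re-wrap each blank-line-separated paragraph to fit within width."""
    text = textwrap.dedent(text).strip()
    paragraphs = text.split('\n\n')
    wrapped = [
        textwrap.fill(para, width=width,
                      initial_indent=prefix,
                      subsequent_indent=prefix)
        for para in paragraphs
    ]
    return '\n\n'.join(wrapped)


sample = 'First paragraph ' * 10 + '\n\n' + 'Second.'
out = format_help_text(sample, prefix='    ', width=40)
print(out)
```

Wrapping each paragraph separately keeps the blank lines between them, so the author can break a docstring wherever it reads well in the source.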

The program description is followed by a syntax summary for the program. The options listed in the syntax section correspond to methods with names that begin with optionHandler_. For example, optionHandler_skip_headers() indicates that csvcat should accept a --skip-headers option on the command line.

The names of any non-optional arguments to the program appear in the syntax summary. In this case, csvcat needs the names of the files containing the input data. At least one file name is necessary, and multiple names can be given, as indicated by the fact that the filename argument to main() (line 78) uses the variable argument notation: *filename. A longer description of the arguments, taken from the docstring of the main() method (lines 79-82), follows the syntax summary. As with the general program summary, the description of the arguments is reformatted with textwrap to fit the screen.
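One way such an argument check could be expressed in current Python is with inspect.signature. This is a sketch under assumptions, not the code from CommandLineApp: the argument_count_ok function and the Demo class are hypothetical, and default values on main() are ignored for brevity.

```python
import inspect


def argument_count_ok(main, args):
    """Return True when the number of args fits the bound method main."""
    params = list(inspect.signature(main).parameters.values())
    named = [p for p in params
             if p.kind is inspect.Parameter.POSITIONAL_OR_KEYWORD]
    has_varargs = any(p.kind is inspect.Parameter.VAR_POSITIONAL
                      for p in params)
    if has_varargs:
        # Extra values are absorbed by *args, so only a minimum applies.
        return len(args) >= len(named)
    return len(args) == len(named)


class Demo:
    """Hypothetical application used only for this illustration."""

    def main(self, template, *filename):
        return 0


app = Demo()
checks = [
    argument_count_ok(app.main, []),
    argument_count_ok(app.main, ['t.csv']),
    argument_count_ok(app.main, ['t.csv', 'a.csv', 'b.csv']),
]
print(checks)
```

Checking the count before calling main() lets the framework report a usage error instead of surfacing a TypeError from inside the application.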

Options and Their Arguments

Following the argument description is a detailed explanation of all of the options to the program. CommandLineApp examines each option handler method to build the option description, including the name of the option, alternative names for the same option, and the name and description of any arguments the option accepts. There are three variations of option handlers, based on the arguments used by the option.

The simplest kind of option does not take an argument at all, and is used as a “switch” to turn a feature on or off. The method optionHandler_skip_headers (lines 38-43) is an example of such a switch. The method takes no argument, so CommandLineApp recognizes that the option being defined does not take an argument either. To create the option name, the prefix is stripped from the method name, each underscore is converted to a dash (-), and the result is prefixed with -- (or a single - for one-letter names); optionHandler_skip_headers becomes --skip-headers.
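The naming rule is small enough to sketch in a few lines of Python 3. The switch_for function and the Demo class below are hypothetical and exist only to show the mapping from method names to switches.

```python
PREFIX = 'optionHandler_'


def switch_for(method_name):
    """Turn a handler method name into its command line switch."""
    name = method_name[len(PREFIX):].replace('_', '-')
    if len(name) == 1:
        return '-' + name      # one-letter names become short options
    return '--' + name         # everything else becomes a long option


class Demo:
    def optionHandler_skip_headers(self):
        "Hypothetical switch."

    def optionHandler_d(self, name):
        "Hypothetical short option."

    def main(self):
        "Not an option handler, so it is ignored."


switches = [switch_for(n) for n in sorted(dir(Demo)) if n.startswith(PREFIX)]
print(switches)
```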

Other options accept a single argument. For example, the --dialect option requires the name of the CSV output dialect. The method optionHandler_dialect (lines 46-51) takes one argument, called name. The suggested syntax for the option, as seen in Listing 1, is --dialect=name. The name of the method’s argument is used as the name of the argument to the option in the help text.

The -d option has the same meaning as --dialect, because optionHandler_d is an alias for optionHandler_dialect (line 52). CommandLineApp recognizes aliases, and combines the forms in the documentation so the alternative forms -d name and --dialect=name are described together.

It is often useful for an option to take multiple arguments, as with --columns. The user could repeat the option on the command line, but it is more compact to allow them to list multiple values in one argument list. When CommandLineApp sees an option handler method that takes a variable argument list, it treats the corresponding option as accepting a list of arguments. When the option appears on the command line, the string argument is split on any commas and the resulting list of strings is passed to the option handler method.

For example, optionHandler_columns (lines 55-60) takes a variable length argument named col. The option --columns can be followed by several column numbers, separated by commas. The option handler is called with the list of values pre-parsed. In the syntax description, the argument is shown repeating: --columns=col[,col…].
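The three handler variations can be told apart by inspecting the method signature. The sketch below uses inspect.signature from Python 3 rather than the older call used in the listings; describe_handler and the Demo class are assumptions made for illustration.

```python
import inspect


def describe_handler(func):
    """Return (arg_name, is_variable) for an unbound option handler."""
    # Skip the first parameter, which is self.
    params = list(inspect.signature(func).parameters.values())[1:]
    for p in params:
        if p.kind is inspect.Parameter.VAR_POSITIONAL:
            return p.name, True
    if params:
        return params[-1].name, False
    return None, False


class Demo:
    def __init__(self):
        self.columns = []

    def optionHandler_skip_headers(self):
        pass

    def optionHandler_dialect(self, name):
        pass

    def optionHandler_columns(self, *col):
        self.columns.extend(int(c) for c in col)


app = Demo()
arg_name, is_variable = describe_handler(Demo.optionHandler_columns)
raw_value = '2,0'  # what the user typed after --columns
if is_variable:
    # Split on commas and pass the pieces as separate arguments.
    app.optionHandler_columns(*raw_value.split(','))
print(arg_name, is_variable, app.columns)
```

The parameter name doubles as the placeholder shown in the help text, so choosing a descriptive name in the handler improves the documentation for free.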

For all cases, the docstring from the option handler method serves as the help text for the option. The text of the docstring is reformatted using textwrap so both the code and help output are easy to read without extra effort on the part of the developer.

Application-specific Detailed Help

The general syntax and option description information is produced in the same way for all CommandLineApp programs. There are times when an application needs to include additional information in the help output, though, and there are two ways to add such information.

The first way is by providing examples of how to use the program on the command line. Although it is optional, including examples of how to apply different combinations of arguments to your program to achieve various results enhances the usefulness of the help as a reference manual. When the EXAMPLES_DESCRIPTION class attribute is set, it is used as the source for the examples. Unlike the other documentation strings, the EXAMPLES_DESCRIPTION is printed directly without being reformatted. This preserves the indentation and other formatting of the examples, so the user sees an accurate representation of the program’s inputs and outputs.

Occasionally, a program may need to include information in its help output which cannot be statically defined in a docstring or derived by CommandLineApp. At the very end of its help, csvcat includes a list of available CSV dialects which can be used with the --dialect option. Since the list of dialects must be constructed at runtime based on what dialects have been registered with the csv module, csvcat overrides showVerboseHelp() to print the list itself (lines 27-35).

Using csvcat

The inputs to csvcat are any number of CSV files, and the output is CSV data printed to standard output. To test csvcat during development, I created two small files with test data. Each file contains three columns of data: a number, a string, and a date.

$ cat testdata1.csv
"Title 1","Title 2","Title 3"
1,"a",08/18/07
2,"b",08/19/07
3,"c",08/20/07

The second file does not include quotes around any of the string fields. I chose to include this variation because csvcat does not quote its output, so using unquoted test data simulates re-processing the output of csvcat.

$ cat testdata2.csv
Title 1,Title 2,Title 3
40,D,08/21/07
50,E,08/22/07
60,F,08/23/07

The simplest use of csvcat is to print the contents of an input file to standard output. Notice that the output does not include quotes around the string fields.

$ csvcat testdata1.csv
Title 1,Title 2,Title 3
1,a,08/18/07
2,b,08/19/07
3,c,08/20/07

It is also possible to select which columns should be included in the output using the --columns option. Columns are identified by their number, beginning with 0. Column numbers can be listed in any order, so it is possible to reorder the columns of the input data, if needed.

$ csvcat --columns 2,0 testdata1.csv
Title 3,Title 1
08/18/07,1
08/19/07,2
08/20/07,3

Switching to tab-separated columns instead of comma-separated is easily accomplished by using the --dialect option. There are only two dialects available by default, but the csv module API supports registering additional dialects.

$ csvcat --dialect excel-tab testdata1.csv
Title 1 Title 2 Title 3
1       a       08/18/07
2       b       08/19/07
3       c       08/20/07
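Registering an additional dialect takes one call to csv.register_dialect(). In this Python 3 sketch, the dialect name "pipes" and its settings are invented for the example; note that newer Python releases also ship a "unix" dialect alongside the two mentioned above.

```python
import csv
import io

# Register a hypothetical dialect that separates fields with '|'.
csv.register_dialect('pipes', delimiter='|', lineterminator='\n')

buffer = io.StringIO()
csv.writer(buffer, dialect='pipes').writerow(['1', 'a', '08/18/07'])
line = buffer.getvalue()

# The new name now appears next to the built-in dialects.
names = csv.list_dialects()
print(line, names)
```

Because csvcat builds its dialect list at runtime, a dialect registered before the program runs would show up in its help output and be usable with --dialect.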

For my project, there were input files with several columns, but only two of them needed to be included in the output. Each file had a single row of column headers. I only wanted one set of headers in the output, so the headers from subsequent files needed to be skipped. And the output had to be in a format I could import into a spreadsheet, for which the default “excel” dialect worked fine. The data was merged with a command like this:

$ csvcat --skip-headers --columns 2,0 testdata1.csv testdata2.csv
Title 3,Title 1
08/18/07,1
08/19/07,2
08/20/07,3
08/21/07,40
08/22/07,50
08/23/07,60

Running a CommandLineApp Program

Most of the work for csvcat is being done in the main() method. To invoke the application, however, the caller does not invoke main() directly. The program should be started by calling run(), so the options are validated and exceptions from main() are handled. The run() method is one of several methods that are not intended to be overridden by derived classes, since they implement the core features of a command line program. The source for CommandLineApp appears in Listing 3.

Listing 3

#!/usr/bin/env python
# CommandLineApp.py
"""Base class for building command line applications.
"""

import getopt
import inspect
import os
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
import sys
import textwrap


class CommandLineApp(object):
    """Base class for building command line applications.

    Define a docstring for the class to explain what the program does.

    Include descriptions of the command arguments in the docstring for
    main().

    When the EXAMPLES_DESCRIPTION class attribute is not empty, it
    will be printed last in the help message when the user asks for
    help.
    """

    EXAMPLES_DESCRIPTION = ''

    # If true, always ends run() with sys.exit()
    force_exit = True

    # The name of this application
    _app_name = os.path.basename(sys.argv[0])

    _app_version = None

    def __init__(self, commandLineOptions=sys.argv[1:]):
        "Initialize CommandLineApp."
        self.command_line_options = commandLineOptions
        self.supported_options = self.scanForOptions()
        return

    def main(self, *args):
        """Main body of your application.

        This is the main portion of the app, and is run after all of
        the arguments are processed.  Override this method to implement
        the primary processing section of your application.
        """
        pass

    def handleInterrupt(self):
        """Called when the program is interrupted via Control-C
        or SIGINT.  Returns exit code.
        """
        sys.stderr.write('Canceled by user.\n')
        return 1

    def handleMainException(self, err):
        """Invoked when there is an error in the main() method.
        """
        if self.debugging:
            import traceback
            traceback.print_exc()
        else:
            self.errorMessage(str(err))
        return 1

    ## HELP

    def showHelp(self, errorMessage=None):
        "Display help message when error occurs."
        print
        if self._app_version:
            print '%s version %s' % (self._app_name, self._app_version)
        else:
            print self._app_name
        print

        # If they made a syntax mistake, just
        # show them how to use the program.  Otherwise,
        # show the full help message.
        if errorMessage:
            print ''
            print 'ERROR: ', errorMessage
            print ''
            print ''
            print '%s\n' % self._app_name
            print ''

        txt = self.getSimpleSyntaxHelpString()
        print txt
        print 'For more details, use --help.'
        print
        return

    def showVerboseHelp(self):
        "Display the full help text for the command."
        txt = self.getVerboseSyntaxHelpString()
        print txt
        return

    ## STATUS MESSAGES

    def statusMessage(self, msg='', verbose_level=1, error=False, newline=True):
        """Print a status message to output.

        Arguments

            msg=''            -- The status message string to be printed.

            verbose_level=1   -- The verbose level to use.  The message
                              will only be printed if the current verbose
                              level is >= this number.

            error=False       -- If true, the message is considered an error and
                              printed as such.

            newline=True      -- If true, print a newline after the message.

        """
        if self.verbose_level >= verbose_level:
            if error:
                output = sys.stderr
            else:
                output = sys.stdout
            output.write(str(msg))
            if newline:
                output.write('\n')
            # some log mechanisms don't have a flush method
            if hasattr(output, 'flush'):
                output.flush()
        return

    def errorMessage(self, msg=''):
        'Print a message as an error.'
        self.statusMessage('ERROR: %s\n' % msg, verbose_level=0, error=True)
        return

    ## DEFAULT OPTIONS

    debugging = False
    def optionHandler_debug(self):
        "Set debug mode to see tracebacks."
        self.debugging = True
        return

    _run_main = True
    def optionHandler_h(self):
        "Displays abbreviated help message."
        self.showHelp()
        self._run_main = False
        return

    def optionHandler_help(self):
        "Displays verbose help message."
        self.showVerboseHelp()
        self._run_main = False
        return

    def optionHandler_quiet(self):
        'Turn on quiet mode.'
        self.verbose_level = 0
        return

    verbose_level = 1
    def optionHandler_v(self):
        """Increment the verbose level.
        Higher levels are more verbose.
        The default is 1.
        """
        self.verbose_level = self.verbose_level + 1
        self.statusMessage('New verbose level is %d' % self.verbose_level,
                           3)
        return

    def optionHandler_verbose(self, level=1):
        """Set the verbose level.
        """
        self.verbose_level = int(level)
        self.statusMessage('New verbose level is %d' % self.verbose_level,
                           3)
        return

    ## INTERNALS (Subclasses should not need to override these methods)

    def run(self):
        """Entry point.

        Process options and execute callback functions as needed.
        This method should not need to be overridden, if the main()
        method is defined.
        """
        # Process the options supported and given
        options = {}
        for info in self.supported_options:
            options[ info.switch ] = info
        parsed_options, remaining_args = self.callGetopt(self.command_line_options,
                                                         self.supported_options)
        exit_code = 0
        try:
            for switch, option_value in parsed_options:
                opt_def = options[switch]
                opt_def.invoke(self, option_value)

            # Perform the primary action for this application,
            # unless one of the options has disabled it.
            if self._run_main:
                main_args = tuple(remaining_args)

                # We could just call main() and catch a TypeError,
                # but that would not let us differentiate between
                # application errors and a case where the user
                # has not passed us enough arguments.  So, we check
                # the argument count ourself.
                num_args_ok = False
                argspec = inspect.getargspec(self.main)
                expected_arg_count = len(argspec[0]) - 1

                if argspec[1] is not None:
                    num_args_ok = True
                    if len(argspec[0]) > 1:
                        num_args_ok = (len(main_args) >= expected_arg_count)
                elif len(main_args) == expected_arg_count:
                    num_args_ok = True

                if num_args_ok:
                    exit_code = self.main(*main_args)
                else:
                    self.showHelp('Incorrect arguments.')
                    exit_code = 1

        except KeyboardInterrupt:
            exit_code = self.handleInterrupt()

        except SystemExit, msg:
            exit_code = msg.args[0]

        except Exception, err:
            exit_code = self.handleMainException(err)
            if self.debugging:
                raise

        if self.force_exit:
            sys.exit(exit_code)
        return exit_code

    def scanForOptions(self):
        "Scan through the inheritance hierarchy to find option handlers."
        options = []

        methods = inspect.getmembers(self.__class__, inspect.ismethod)
        for method_name, method in methods:
            if method_name.startswith(OptionDef.OPTION_HANDLER_PREFIX):
                options.append(OptionDef(method_name, method))

        return options

    def callGetopt(self, commandLineOptions, supportedOptions):
        "Parse the command line options."
        short_options = []
        long_options = []
        for o in supportedOptions:
            if len(o.option_name) == 1:
                short_options.append(o.option_name)
                if o.arg_name:
                    short_options.append(':')
            elif o.arg_name:
                long_options.append('%s=' % o.switch_base)
            else:
                long_options.append(o.switch_base)

        short_option_string = ''.join(short_options)

        try:
            parsed_options, remaining_args = getopt.getopt(
                commandLineOptions,
                short_option_string,
                long_options)
        except getopt.error, message:
            self.showHelp(message)
            if self.force_exit:
                sys.exit(1)
            raise
        return (parsed_options, remaining_args)

    def _groupOptionAliases(self):
        """Return a sequence of tuples containing
        (option_names, option_defs)
        """
        # Figure out which options are aliases
        option_aliases = {}
        for option in self.supported_options:
            method = getattr(self, option.method_name)
            existing_aliases = option_aliases.setdefault(method, [])
            existing_aliases.append(option)

        # Sort the groups in order
        grouped_options = []
        for options in option_aliases.values():
            names = [ o.option_name for o in options ]
            grouped_options.append( (names, options) )
        grouped_options.sort()
        return grouped_options

    def _getOptionIdentifierText(self, options):
        """Return the option identifier text.

        For example:

          -h
          -v, --verbose
          -f bar, --foo bar
        """
        option_texts = []
        for option in options:
            option_texts.append(option.getSwitchText())
        return ', '.join(option_texts)

    def getArgumentsSyntaxString(self):
        """Look at the arguments to main to see what the program accepts,
        and build a syntax string explaining how to pass those arguments.
        """
        syntax_parts = []
        argspec = inspect.getargspec(self.main)
        args = argspec[0]
        if len(args) > 1:
            for arg in args[1:]:
                syntax_parts.append(arg)
        if argspec[1]:
            syntax_parts.append(argspec[1])
            syntax_parts.append('[' + argspec[1] + '...]')
        syntax = ' '.join(syntax_parts)
        return syntax

    def getSimpleSyntaxHelpString(self):
        """Return syntax statement.

        Return a simplified form of help including only the
        syntax of the command.
        """
        buffer = StringIO()

        # Show the name of the command and basic syntax.
        buffer.write('%s [<options>] %s\n\n' %
                         (self._app_name, self.getArgumentsSyntaxString())
                     )

        grouped_options = self._groupOptionAliases()

        # Assemble the text for the options
        for names, options in grouped_options:
            buffer.write('    %s\n' % self._getOptionIdentifierText(options))

        return buffer.getvalue()

    def _formatHelpText(self, text, prefix):
        if not text:
            return ''
        buffer = StringIO()
        text = textwrap.dedent(text)
        for para in text.split('\n\n'):
            formatted_para = textwrap.fill(para,
                                           initial_indent=prefix,
                                           subsequent_indent=prefix,
                                           )
            buffer.write(formatted_para)
            buffer.write('\n\n')
        return buffer.getvalue()

    def getVerboseSyntaxHelpString(self):
        """Return the full description of the options and arguments.

        Show a full description of the options and arguments to the
        command in something like UNIX man page format. This includes

          - a description of each option and argument, taken from the
                __doc__ string for the optionHandler method for
                the option

          - a description of what additional arguments will be processed,
                taken from the arguments to main()

        """
        buffer = StringIO()

        class_help_text = self._formatHelpText(inspect.getdoc(self.__class__),
                                               '')
        buffer.write(class_help_text)

        buffer.write('\nSYNTAX:\n\n  ')
        buffer.write(self.getSimpleSyntaxHelpString())

        main_help_text = self._formatHelpText(inspect.getdoc(self.main), '    ')
        if main_help_text:
            buffer.write('\n\nARGUMENTS:\n\n')
            buffer.write(main_help_text)

        buffer.write('\nOPTIONS:\n\n')

        grouped_options = self._groupOptionAliases()

        # Describe all options, grouping aliases together
        for names, options in grouped_options:
            buffer.write('    %s\n' % self._getOptionIdentifierText(options))

            help = self._formatHelpText(options[0].help, '        ')
            buffer.write(help)

        if self.EXAMPLES_DESCRIPTION:
            buffer.write('EXAMPLES:nn'.replace('nn', '\n\n'))
            buffer.write(self.EXAMPLES_DESCRIPTION)
        return buffer.getvalue()


class OptionDef(object):
    """Definition for a command line option.

    Attributes:

      method_name - The name of the option handler method.
      option_name - The name of the option.
      switch      - Switch to be used on the command line.
      arg_name    - The name of the argument to the option handler.
      is_variable - Is the argument expected to be a sequence?
      default     - The default value of the option handler argument.
      help        - Help text for the option.
      is_long     - Is the option a long value (--) or short (-)?
    """

    # Option handler method names start with this value
    OPTION_HANDLER_PREFIX = 'optionHandler_'

    # For *args arguments to option handlers, how to split the argument values
    SPLIT_PARAM_CHAR = ','

    def __init__(self, methodName, method):
        self.method_name = methodName
        self.option_name = methodName[len(self.OPTION_HANDLER_PREFIX):]
        self.is_long = len(self.option_name) > 1

        self.switch_base = self.option_name.replace('_', '-')
        if len(self.switch_base) == 1:
            self.switch = '-' + self.switch_base
        else:
            self.switch = '--' + self.switch_base

        argspec = inspect.getargspec(method)

        self.is_variable = False
        args = argspec[0]
        if len(args) > 1:
            self.arg_name = args[-1]
        elif argspec[1]:
            self.arg_name = argspec[1]
            self.is_variable = True
        else:
            self.arg_name = None

        if argspec[3]:
            self.default = argspec[3][0]
        else:
            self.default = None

        self.help = inspect.getdoc(method)
        return

    def getSwitchText(self):
        """Return the description of the option switch.

        For example: --switch=arg or -s arg or --switch=arg[,arg]
        """
        parts = [ self.switch ]
        if self.arg_name:
            if self.is_long:
                parts.append('=')
            else:
                parts.append(' ')
            parts.append(self.arg_name)
            if self.is_variable:
                parts.append('[%s%s...]' % (self.SPLIT_PARAM_CHAR, self.arg_name))
        return ''.join(parts)


    def invoke(self, app, arg):
        """Invoke the option handler.
        """
        method = getattr(app, self.method_name)
        if self.arg_name:
            if self.is_variable:
                opt_args = arg.split(self.SPLIT_PARAM_CHAR)
                method(*opt_args)
            else:
                method(arg)
        else:
            method()
        return

if __name__ == '__main__':
    CommandLineApp().run()

The available and supported options are examined when the instance is initialized (lines 40-44). By default, the contents of sys.argv are used as the options and arguments passed in from the command line to the program. When writing automated tests for your program, you can supply a different set of options by passing a list of strings to __init__() as commandLineOptions. The options supported by the program are determined by scanning the class for option handler methods. No options are actually evaluated until run() is called.

When the program is run, the first thing it does is use getopt to validate the options it has been given (line 201). In callGetopt(), the arguments needed by getopt are constructed based on the option handlers discovered for the class (lines 262-288). Options are processed in the order they are passed on the command line (lines 205-207), and the option handler method for each option encountered is called. When an option handler requires an argument that is not provided on the command line, getopt detects the error. When an argument is provided, the option handler is responsible for determining whether the value is the correct type or otherwise valid. When the argument is not valid, the option handler can raise an exception with an error message to be printed for the user.
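As a rough illustration of that step, the sketch below builds the short and long option specifications that getopt.getopt() expects from a table of handler names. The helper build_getopt_args() and the sample handler table are hypothetical stand-ins, not the actual callGetopt() code, and the sketch is written for Python 3 rather than the Python 2 used in the listings.

```python
import getopt

PREFIX = 'optionHandler_'

def build_getopt_args(handlers):
    """Build (shortopts, longopts) for getopt from handler info.

    `handlers` maps a handler method name to True when the option
    takes an argument. The table is a hypothetical stand-in for the
    OptionDef objects discovered by introspection.
    """
    short_opts = ''
    long_opts = []
    for method_name, takes_arg in sorted(handlers.items()):
        name = method_name[len(PREFIX):].replace('_', '-')
        if len(name) == 1:
            # getopt marks short options that need a value with ':'
            short_opts += name + (':' if takes_arg else '')
        else:
            # getopt marks long options that need a value with '='
            long_opts.append(name + ('=' if takes_arg else ''))
    return short_opts, long_opts

handlers = {
    'optionHandler_v': False,
    'optionHandler_skip_headers': False,
    'optionHandler_dialect': True,
}
short_opts, long_opts = build_getopt_args(handlers)

# Options come back in command line order, followed by the leftovers.
options, remaining = getopt.getopt(
    ['-v', '--dialect=excel', 'input.csv'], short_opts, long_opts)
print(options)    # [('-v', ''), ('--dialect', 'excel')]
print(remaining)  # ['input.csv']
```

Passing --dialect without a value to the same call raises getopt.GetoptError, which is the missing-argument detection described above.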

After all of the options are handled, the remaining arguments to the program are checked to be sure there are enough to satisfy the requirements, based on the argspec of the main() function. The number of arguments is checked explicitly to avoid having to handle a TypeError if the user does not pass the right number of arguments on the command line. If CommandLineApp depended on catching a TypeError when it passed too few arguments to main(), it could not tell the difference between a coding error and a user error. If a mistake inside main() caused a TypeError to occur, it might look like the user had passed an incorrect number of arguments to the program.
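One way to perform that kind of check is sketched here. The function check_arg_count() and the Demo class are hypothetical, and the code uses inspect.getfullargspec(), the Python 3 counterpart of getargspec(). Rejecting extra arguments is an assumption of this sketch; the text above only describes checking that there are enough.

```python
import inspect

def check_arg_count(main, args):
    """Return an error message if args cannot satisfy main(), else None."""
    spec = inspect.getfullargspec(main)
    names = list(spec.args)
    # The argspec of a method still lists 'self'; it is not a user argument.
    if names and names[0] == 'self':
        names = names[1:]
    num_defaults = len(spec.defaults or ())
    min_args = len(names) - num_defaults
    if len(args) < min_args:
        return 'expected at least %d argument(s), got %d' % (
            min_args, len(args))
    # Assumption: extra arguments are an error unless main() takes *args.
    if spec.varargs is None and len(args) > len(names):
        return 'expected at most %d argument(s), got %d' % (
            len(names), len(args))
    return None

class Demo(object):
    def main(self, message, count=1):
        return 0

app = Demo()
print(check_arg_count(app.main, []))         # too few: an error message
print(check_arg_count(app.main, ['hello']))  # None, the call is safe
```

Because the count is verified before main() is called, any TypeError raised later can be treated as a programming error rather than a user error.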

Error Handling

When an exception is raised during option processing or inside main(), the exception is caught by one of the except clauses on lines 236-245 and given to an error handling method. Subclasses can change the error handling behavior by overriding these methods.

KeyboardInterrupt exceptions are handled by calling handleInterrupt(). The default behavior is to print a message that the program has been interrupted and cause the program to exit with an error code. A subclass could override the method to clean up an in-progress task, background thread, or other operation which otherwise might not be automatically stopped when the KeyboardInterrupt is received.

When a lower level library tries to exit the program, SystemExit may be raised. CommandLineApp traps the SystemExit exception and exits normally, using the exit status taken from the exception. If the force_exit attribute of the application is false, run() returns instead of exiting (lines 247-249). Trapping attempts to exit makes it easier to integrate CommandLineApp programs with unittest or other testing frameworks. The test can instantiate the application, set force_exit to a false value, then run it. If any errors occur, a status code is returned but the test process does not exit.
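The pattern can be sketched in a few lines. MiniApp below is a hypothetical stand-in that reproduces only the exit handling, not the real run() method, and it is written for Python 3.

```python
import sys

class MiniApp(object):
    """Hypothetical stand-in; only the exit handling is sketched."""

    force_exit = True

    def main(self):
        # Pretend a lower level library decided to exit the program.
        sys.exit(3)

    def run(self):
        try:
            exit_code = self.main()
        except SystemExit as err:
            # Take the status from the exception instead of exiting here.
            exit_code = err.code
        if self.force_exit:
            sys.exit(exit_code)
        return exit_code

# What a test might do: disable the real exit, then inspect the status.
app = MiniApp()
app.force_exit = False
status = app.run()
print(status)
```

With force_exit left at its default, the same call to run() would end the process with status 3.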

All other types of exceptions are handled by calling handleMainException() and passing the exception as an argument. The default implementation of handleMainException() (lines 62-70) prints a simple error message based on the exception, unless debugging mode is turned on. Debugging mode prints the entire traceback for the exception.

$ csvcat file_does_not_exist.csv
ERROR: [Errno 2] No such file or directory:
'file_does_not_exist.csv'

Option Definitions

The standard library module inspect provides functions for performing introspection operations on classes and objects at runtime. The API supports basic querying and type checking so it is possible, for example, to get a list of the methods of a class, including all inherited methods.

CommandLineApp.scanForOptions() uses inspect to scan an application class for option handler methods (lines 251-260). All of the methods of the class are retrieved with inspect.getmembers(), and those whose name starts with optionHandler_ are added to the list of supported options. Since most command line options use dashes instead of underscores, but method names cannot contain dashes, the underscores in the option handler method names are converted to dashes when creating the option name.
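A simplified version of that scan is shown below. The function scan_for_options() and the two sample classes are hypothetical, and the real method builds OptionDef instances rather than a plain dictionary. The sketch is written for Python 3.

```python
import inspect

PREFIX = 'optionHandler_'

def scan_for_options(cls):
    """Return {switch: method_name} for every option handler on cls."""
    options = {}
    # getmembers() also reports methods inherited from base classes.
    for name, member in inspect.getmembers(cls, callable):
        if not name.startswith(PREFIX):
            continue
        base = name[len(PREFIX):].replace('_', '-')
        switch = ('-' if len(base) == 1 else '--') + base
        options[switch] = name
    return options

class Base(object):
    def optionHandler_v(self):
        "Increase verbosity."

class Child(Base):
    def optionHandler_skip_headers(self):
        "Skip the header row."

found = scan_for_options(Child)
print(sorted(found))  # ['--skip-headers', '-v']
```

Child picks up -v from Base without any extra code, which is how common options are shared through inheritance.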

The __init__() method of the OptionDef class (lines 440-469) does all of the work of determining the command line switch name and what type of arguments the switch takes. The option handler method is examined with inspect.getargspec(), and the result is used to initialize the OptionDef.

An “argspec” for a function is a tuple made up of four values: a list of the names of all regular arguments to the function, including self if the function is a method; the name of the argument to receive the variable argument values, if any; the name of the argument to receive the keyword arguments, if any; and a list of the default values for the arguments, in the order they appear in the list of argument names.

The argspecs for the option handlers in csvcat illustrate the variations of interest to OptionDef. First, optionHandler_skip_headers:

>>> import Listing2
>>> import inspect
>>> print inspect.getargspec(
... Listing2.csvcat.optionHandler_skip_headers)
(['self'], None, None, None)

Since the only positional argument to the method is self, and there is no variable argument name given, the option handler is treated as a simple command line switch without any arguments.

The optionHandler_dialect, on the other hand, does include an additional argument:

>>> print inspect.getargspec(
... Listing2.csvcat.optionHandler_dialect)
(['self', 'name'], None, None, None)

The name argument is listed in the argspec as a single regular argument. The result, when a program is run, is that while the options are being processed by CommandLineApp and OptionDef, the value for name is passed directly to the option handler method (line 497).

The optionHandler_columns method illustrates variable argument handling:

>>> print inspect.getargspec(
... Listing2.csvcat.optionHandler_columns)
(['self'], 'col', None, None)

The col argument from optionHandler_columns is named in the argspec as the variable argument identifier. Since optionHandler_columns accepts variable arguments, the OptionDef splits the argument value into a list of strings, and the list is passed to the option handler method (lines 494-495) using the variable argument syntax.

The other variable argument configuration, using unidentified keyword arguments, does not make sense for an option handler. The user of the command line program has no standard way to specify named arguments to options, so they are not supported by OptionDef.
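The three supported handler shapes can be exercised together in a short sketch. It is written for Python 3, where inspect.getargspec() is no longer available, so it uses inspect.getfullargspec() instead. The invoke() function and Demo class are hypothetical and only mirror the dispatch logic described above.

```python
import inspect

SPLIT_PARAM_CHAR = ','

def invoke(handler, arg=None):
    """Call handler according to the shape of its argspec."""
    spec = inspect.getfullargspec(handler)
    if len(spec.args) > 1:
        # One named argument besides self: pass the value through.
        return handler(arg)
    if spec.varargs:
        # *args handler: split the value and unpack the pieces.
        return handler(*arg.split(SPLIT_PARAM_CHAR))
    # Plain switch: no argument at all.
    return handler()

class Demo(object):
    def optionHandler_skip_headers(self):
        return 'skip'

    def optionHandler_dialect(self, name):
        return name

    def optionHandler_columns(self, *col):
        return list(col)

demo = Demo()
print(invoke(demo.optionHandler_skip_headers))       # skip
print(invoke(demo.optionHandler_dialect, 'excel'))   # excel
print(invoke(demo.optionHandler_columns, '1,3,5'))   # ['1', '3', '5']
```

The values reach the handler as strings in every case, so any conversion to numbers or other types is still up to the handler.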

Status Messages

In addition to command line option and argument parsing, and error handling, CommandLineApp provides a “status message” interface for giving varying levels of feedback to the user. Status messages are printed by calling self.statusMessage() (line 108). Each message must indicate the verbose level setting at which the message should be printed. If the current verbose level is at or higher than the desired level, the message is printed. Otherwise, it is ignored. The -v, --verbose, and --quiet flags let the user control the verbose_level setting for the application, and are defined in the CommandLineApp so that all subclasses inherit them.

Listing 4

#!/usr/bin/env python
# Illustrate verbose level controls.

import CommandLineApp

class verbose_app(CommandLineApp.CommandLineApp):
    "Demonstrate verbose level controls."

    def main(self):
        for i in range(1, 10):
            self.statusMessage('Level %d' % i, i)
        return

if __name__ == '__main__':
    verbose_app().run()

Listing 4 contains another sample application which uses statusMessage() to illustrate how the verbose level setting is applied. The default verbose level is 1, so when the program is run without any additional arguments only a single message is printed:

$ python Listing4.py
Level 1
$

The --quiet option silences all status messages by setting the verbose level to 0:

$ python Listing4.py --quiet
$

Using the -v option increases the verbose setting, one level at a time. The option can be repeated on the command line:

$ python Listing4.py -v
Level 1
Level 2
$ python Listing4.py -vv
New verbose level is 3
Level 1
Level 2
Level 3
$

And the --verbose option sets the verbose level directly to the desired value:

$ python Listing4.py --verbose 4
New verbose level is 4
Level 1
Level 2
Level 3
Level 4
$

Error messages can be printed to the standard error stream using the errorMessage() method (lines 138-141). The message is prefixed with the word “ERROR”, and error messages are always printed, no matter what verbose level is set. Most programs will not need to use errorMessage() directly, because raising an exception is sufficient to have an error message displayed for the user.
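The level check and the always-printed error path might look roughly like the sketch below. Messenger is a hypothetical class, not the CommandLineApp source. The default message level of 1, the output stream parameters, and treating level 0 as "quiet" are assumptions made for the example.

```python
import io
import sys

class Messenger(object):
    """Hypothetical sketch of the status and error message logic."""

    verbose_level = 1

    def __init__(self, out=None, err=None):
        # Streams are injectable so the sketch is easy to test.
        self.out = out if out is not None else sys.stdout
        self.err = err if err is not None else sys.stderr

    def statusMessage(self, msg='', verbose_level=1):
        # Print only when the current level reaches the message's level.
        if self.verbose_level >= verbose_level:
            self.out.write(msg + '\n')

    def errorMessage(self, msg=''):
        # Errors ignore the verbose level entirely.
        self.err.write('ERROR: %s\n' % msg)

out, err = io.StringIO(), io.StringIO()
app = Messenger(out, err)
app.verbose_level = 0  # assumed to be the "quiet" setting
app.statusMessage('Level 1', 1)
app.errorMessage('something failed')
print(repr(out.getvalue()))
print(repr(err.getvalue()))
```

Even with every status message suppressed, the error text still reaches the error stream.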

CommandLineApp and Inheritance

When creating a suite of related programs, it is usually desirable for all of the programs to use the same options and, in many cases, share other common behavior. For example, when working with a database the connection and transaction must be managed reliably. Rather than re-implementing the same database handling code in each program, you can use CommandLineApp to create an intermediate base class for your programs and share a single implementation. Listing 5 includes a skeleton base class called SQLiteAppBase for working with an sqlite3 database in this way.

Listing 5

#!/usr/bin/env python
# Base class for sqlite programs.

import sqlite3
import CommandLineApp

class SQLiteAppBase(CommandLineApp.CommandLineApp):
    """Base class for accessing sqlite databases.
    """

    dbname = 'sqlite.db'
    def optionHandler_db(self, name):
        """Specify the database filename.
        Defaults to 'sqlite.db'.
        """
        self.dbname = name
        return

    def main(self):
        # Subclasses can override this to control the arguments
        # used by the program.
        self.db_connection = sqlite3.connect(self.dbname)
        try:
            self.cursor = self.db_connection.cursor()
            exit_code = self.takeAction()
        except:
            # throw away changes
            self.db_connection.rollback()
            raise
        else:
            # save changes
            self.db_connection.commit()
        return exit_code

    def takeAction(self):
        """Override this in the actual application.
        Return the exit code for the application
        if no exception is raised.
        """
        raise NotImplementedError('Not implemented!')

if __name__ == '__main__':
    SQLiteAppBase().run()

SQLiteAppBase defines a single option handler for the --db option to let the user choose the database file (line 12). The default database is a file in the current directory called “sqlite.db”. The main() method establishes a connection to the database (line 22), opens a cursor for working with the connection (line 24), then calls takeAction() to do the work (line 25). When takeAction() raises an exception, all database changes it may have made are discarded and the transaction is rolled back (line 28). When there is no error, the transaction is committed and the changes are saved (line 32).

Listing 6

#!/usr/bin/env python
# Initialize the database

import time
from Listing5 import SQLiteAppBase

class initdb(SQLiteAppBase):
    """Initialize a database.
    """

    def takeAction(self):
        self.statusMessage('Initializing database %s' % self.dbname)
        # Create the table
        self.cursor.execute("CREATE TABLE log (date text, message text)")
        # Log the actions taken
        self.cursor.execute(
            "INSERT INTO log (date, message) VALUES (?, ?)",
            (time.ctime(), 'Created database'))
        self.cursor.execute(
            "INSERT INTO log (date, message) VALUES (?, ?)",
            (time.ctime(), 'Created log table'))
        return

if __name__ == '__main__':
    initdb().run()

A subclass of SQLiteAppBase can override takeAction() to do some actual work using the database connection and cursor created in main(). Listing 6 contains one such program, called initdb. In initdb, the takeAction() method creates a “log” table (line 14) using the database cursor established in the base class. It then inserts two rows into the new table, using the same cursor. There is no need for initdb to commit the transaction, since the base class will do that after takeAction() returns without raising an exception.

$ python Listing6.py
Initializing database sqlite.db

Listing 7

#!/usr/bin/env python
# Show the contents of the log

from Listing5 import SQLiteAppBase

class showlog(SQLiteAppBase):
    """Show the contents of the log.
    """

    substring = None
    def optionHandler_message(self, substring):
        """Look for messages with the substring.
        """
        self.substring = substring
        return

    def takeAction(self):
        if self.substring:
            pattern = '%' + self.substring + '%'
            c = self.cursor.execute(
                "SELECT * FROM log WHERE message LIKE ?;",
                (pattern,))
        else:
            c = self.cursor.execute("SELECT * FROM log;")

        for row in c:
            print '%-30s %s' % row
        return 0

if __name__ == '__main__':
    showlog().run()

The showlog program in Listing 7 also uses SQLiteAppBase. It reads records from the log table and prints them out to the screen. When no options are given, it uses the cursor opened by the base class to find all of the records in the “log” table (line 24), and print them:

$ python Listing7.py
Sat Aug 25 19:09:41 2007       Created database
Sat Aug 25 19:09:41 2007       Created log table

The --message option to showlog can be used to filter the output to include only records whose message column matches the pattern given. When a message substring is specified, the select statement is altered to include only messages containing the substring (lines 19-20). In this example, only log messages with the word “table” in the message are printed:

$ python Listing7.py --message table
Sat Aug 25 19:09:41 2007       Created log table

The updatelog program in Listing 8 inserts new records into the database. Each time updatelog is called, the message passed on the command line is saved as an instance attribute by main() (line 15) so it can be used later when a new row is inserted into the log table (line 20) by takeAction().

Listing 8

#!/usr/bin/env python
# Add a message to the log

import time
from Listing5 import SQLiteAppBase

class updatelog(SQLiteAppBase):
    """Add to the contents of the log.
    """

    def main(self, message):
        """Provide the new message to add to the log.
        """
        # Save the message for use in takeAction()
        self.message = message
        return SQLiteAppBase.main(self)

    def takeAction(self):
        self.cursor.execute(
            "INSERT INTO log (date, message) VALUES (?, ?)",
            (time.ctime(), self.message))
        return

if __name__ == '__main__':
    updatelog().run()

$ python Listing8.py "another new message"
$ python Listing7.py
Sat Aug 25 19:09:41 2007       Created database
Sat Aug 25 19:09:41 2007       Created log table
Sat Aug 25 19:10:29 2007       another new message

As with initdb, because the base class commits changes to the database after takeAction() returns, updatelog does not need to manage the database connection in any way. Since all of the example programs use the database connection and cursor created by their base class, they could be updated to use a PostgreSQL or MySQL database by modifying the base class, without having to make those changes to each program separately.

Future Work

I have been using CommandLineApp in my own work for several years now, and continue to find ways to enhance it. The two primary features I would still like to add are the ability to print the help for a command in formats other than plain text, and automatic type conversion for arguments.

It is difficult to prepare attractive printed documentation from plain text help output like what is produced by the current version of CommandLineApp. Parsing the text output directly is not necessarily straightforward, since the embedded help may contain characters or patterns that would confuse a simple parser. A better solution is to use the option data gathered by introspection to generate output in a format such as DocBook, which could then be converted to PDF or HTML using other tool sets specifically designed for that purpose. There is a prototype of a program to create DocBook output from an application class, but it is not robust enough to be released – yet.

CommandLineApp is based on the older option parsing module, getopt, rather than the new optparse. This means it does not support some of the newer features available in optparse, such as type conversion for arguments. Type conversion could be added to CommandLineApp by inferring the types from default values for arguments. The OptionDef already discovers default values, but they are not used. The OptionDef.invoke() method needs to be updated to look at the default for an option before calling the option handler. If the default is a type object, it can be used to convert the incoming argument. If the default is a regular object, the type of the object can be determined using type(). Then, once the type is known, the argument can be converted.
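The conversion rule described above could be prototyped along these lines. The function convert_argument() is hypothetical and is not part of CommandLineApp. It only shows the idea of inferring the type from the default.

```python
def convert_argument(arg, default):
    """Convert the string arg based on the handler's default value."""
    if default is None:
        # No default means no type information, so keep the string.
        return arg
    if isinstance(default, type):
        # The default is itself a type object, such as int.
        return default(arg)
    # Otherwise convert using the type of the default value.
    return type(default)(arg)

print(convert_argument('4', int))     # 4
print(convert_argument('2.5', 1.0))   # 2.5
print(convert_argument('name', None)) # name
```

A real implementation would need special cases. For example, bool('False') is True in Python, so boolean defaults cannot be handled by calling the type directly, and a failed conversion should probably be reported to the user as an invalid option value.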

Conclusion

I hope this article encourages you to think about your command line programs in a different light, and to treat them as first class objects. Using inheritance to share code is so common in other areas of development that it is hardly given a second thought in most cases. As has been shown with the SQLiteAppBase programs, the same technique can be just as powerful when applied to building command line programs, saving development time and testing effort as a result. CommandLineApp has been used as the foundation for dozens of types of programs, and could be just what you need the next time you have to write a new command line program.