os – Portable access to operating system specific features.

Purpose: Portable access to operating system specific features.
Python Version: 1.4 (or earlier)

The os module provides a wrapper for platform specific modules such as posix, nt, and mac. The API for functions available on all platforms should be the same, so using the os module offers some measure of portability. Not all functions are available on all platforms, however. Many of the process management functions described in this summary are not available for Windows.

The Python documentation for the os module is subtitled “Miscellaneous operating system interfaces”. The module includes mostly functions for creating and managing running processes or filesystem content (files and directories), with a few other random bits of functionality thrown in besides.

Note

Some of the example code below will only work on Unix-like operating systems.

Process Owner

The first set of functions covered here is used for determining and changing the process owner ids. These are mostly useful to authors of daemons or special system programs that need to change permission level rather than running as root. I won’t try to explain all of the intricate details of Unix security, process owners, etc. in this brief post. See the references at the end of this article for more details.

Let’s start with a script to show the real and effective user and group information for a process, and then change the effective values. This is similar to what a daemon would need to do when it starts as root during a system boot, to lower the privilege level and run as a different user. If you download the examples to try them out, you should change the TEST_GID and TEST_UID values to match your user.



import os

TEST_GID = 501
TEST_UID = 527

def show_user_info():
    print 'Effective User  :', os.geteuid()
    print 'Effective Group :', os.getegid()
    print 'Actual User      :', os.getuid(), os.getlogin()
    print 'Actual Group    :', os.getgid()
    print 'Actual Groups   :', os.getgroups()
    return

print 'BEFORE CHANGE:'
show_user_info()
print

try:
    os.setegid(TEST_GID)
except OSError:
    print 'ERROR: Could not change effective group.  Re-run as root.'
else:
    print 'CHANGED GROUP:'
    show_user_info()
    print

try:
    os.seteuid(TEST_UID)
except OSError:
    print 'ERROR: Could not change effective user.  Re-run as root.'
else:
    print 'CHANGE USER:'
    show_user_info()
    print

When run as myself (527, 501) on OS X, I see this output:

$ python os_process_user_example.py
BEFORE CHANGE:
Effective User  : 527
Effective Group : 501
Actual User      : 527 dhellmann
Actual Group    : 501
Actual Groups   : [501, 101, 500, 102, 98, 80]

CHANGED GROUP:
Effective User  : 527
Effective Group : 501
Actual User      : 527 dhellmann
Actual Group    : 501
Actual Groups   : [501, 101, 500, 102, 98, 80]

CHANGE USER:
Effective User  : 527
Effective Group : 501
Actual User      : 527 dhellmann
Actual Group    : 501
Actual Groups   : [501, 101, 500, 102, 98, 80]

Notice that the values do not change. Since I am not running as root, processes I start cannot change their effective owner values. If I do try to set the effective user id or group id to anything other than my own, an OSError is raised.

Now let’s look at what happens when we run the same script using sudo to start out with root privileges:

$ sudo python os_process_user_example.py
BEFORE CHANGE:
Effective User  : 0
Effective Group : 0
Actual User      : 0 dhellmann
Actual Group    : 0
Actual Groups   : [0, 1, 2, 8, 29, 3, 9, 4, 5, 80, 20]

CHANGED GROUP:
Effective User  : 0
Effective Group : 501
Actual User      : 0 dhellmann
Actual Group    : 0
Actual Groups   : [501, 1, 2, 8, 29, 3, 9, 4, 5, 80, 20]

CHANGE USER:
Effective User  : 527
Effective Group : 501
Actual User      : 0 dhellmann
Actual Group    : 0
Actual Groups   : [501, 1, 2, 8, 29, 3, 9, 4, 5, 80, 20]

In this case, since we start as root, we can change the effective user and group for the process. Once we change the effective UID, the process is limited to the permissions of that user. Since a process that is no longer running as root cannot change its effective group, we need to change the group first and then the user.

Besides finding and changing the process owner, there are functions for determining the current and parent process id, finding and changing the process group and session ids, as well as finding the controlling terminal id. These can be useful for sending signals between processes or for complex applications such as writing your own command line shell.
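Here is a minimal sketch that collects several of those identifiers for the current process. It assumes a Unix-like system, since os.getsid() and os.ctermid() are not available on Windows. Note that os.ctermid() returns the filename of the controlling terminal (typically /dev/tty) rather than a numeric id.

```python
import os

def process_ids():
    """Collect the identifiers mentioned above for the current process (Unix only)."""
    return {
        'pid': os.getpid(),      # this process
        'ppid': os.getppid(),    # the process that started it
        'pgrp': os.getpgrp(),    # process group id
        'sid': os.getsid(0),     # session id; 0 means "the calling process"
        'ctty': os.ctermid(),    # filename of the controlling terminal
    }

info = process_ids()
for key in sorted(info):
    print('%-5s %s' % (key, info[key]))
```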

Process Environment

Another feature of the operating system exposed to your program through the os module is the environment. Variables set in the environment are visible as strings which can be read through os.environ or os.getenv(). Environment variables are commonly used for configuration values such as search paths, file locations, and debug flags. Let’s look at an example of retrieving an environment variable, and passing a value through to a child process.

import os

print 'Initial value:', os.environ.get('TESTVAR', None)
print 'Child process:'
os.system('echo $TESTVAR')

os.environ['TESTVAR'] = 'THIS VALUE WAS CHANGED'

print
print 'Changed value:', os.environ['TESTVAR']
print 'Child process:' 
os.system('echo $TESTVAR')

del os.environ['TESTVAR']

print
print 'Removed value:', os.environ.get('TESTVAR', None)
print 'Child process:' 
os.system('echo $TESTVAR')

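The os.environ object follows the standard Python mapping API for retrieving and setting values. Changes to os.environ are exported for child processes. The output below was captured with stdout redirected, so Python buffered its own print output until the script exited while the shell started by each os.system() call wrote directly. That is why the echo output (a blank line, THIS VALUE WAS CHANGED, and another blank line) appears first.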

$ python os_environ_example.py

THIS VALUE WAS CHANGED

Initial value: None
Child process:

Changed value: THIS VALUE WAS CHANGED
Child process:

Removed value: None
Child process:
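Two related techniques are worth a quick sketch: os.getenv() accepts a default for unset variables, and a child can be given its own modified copy of the environment without touching os.environ in the parent. This assumes the subprocess module (Python 2.4+) for starting the child, and PYMOTW_UNSET_EXAMPLE is a made-up name assumed not to be set.

```python
import os
import subprocess
import sys

# os.getenv() falls back to a default instead of raising KeyError.
# PYMOTW_UNSET_EXAMPLE is a hypothetical name assumed not to be set.
default_value = os.getenv('PYMOTW_UNSET_EXAMPLE', 'default value')
print('getenv with default: %s' % default_value)

# Copy the environment and change only the copy
child_env = dict(os.environ)
child_env['TESTVAR'] = 'only for the child'

# A child Python interpreter reports what it sees
code = "import os; print(os.environ.get('TESTVAR'))"
proc = subprocess.Popen([sys.executable, '-c', code],
                        env=child_env, stdout=subprocess.PIPE)
child_output = proc.communicate()[0].decode('ascii').strip()

print('child sees: %s' % child_output)
print('parent sees: %s' % os.environ.get('TESTVAR'))
```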

Process Working Directory

A concept from operating systems with hierarchical filesystems is the notion of the “current working directory”. This is the directory on the filesystem the process uses as the default location when files are accessed with relative paths.

import os

print 'Starting:', os.getcwd()
print os.listdir(os.curdir)

print 'Moving up one:', os.pardir
os.chdir(os.pardir)

print 'After move:', os.getcwd()
print os.listdir(os.curdir)

Note the use of os.curdir and os.pardir to refer to the current and parent directories in a portable manner. The output should not be surprising:

$ python os_cwd_example.py
Starting: /Users/dhellmann/Documents/PyMOTW/src/PyMOTW/os
['__init__.py', '__init__.pyc', 'index.rst', 'os_access.py', 'os_cwd_example.py', 'os_directories.py', 'os_environ_example.py', 'os_exec_example.py', 'os_fork_example.py', 'os_kill_example.py', 'os_popen.py', 'os_popen2.py', 'os_popen2_seq.py', 'os_popen3.py', 'os_popen4.py', 'os_process_id_example.py', 'os_process_user_example.py', 'os_spawn_example.py', 'os_stat.py', 'os_stat_chmod.py', 'os_stat_chmod_example.txt', 'os_symlinks.py', 'os_system_background.py', 'os_system_example.py', 'os_system_shell.py', 'os_wait_example.py', 'os_waitpid_example.py', 'os_walk.py']
Moving up one: ..
After move: /Users/dhellmann/Documents/PyMOTW/src/PyMOTW
['__init__.py', '__init__.pyc', 'abc', 'about.rst', 'anydbm', 'array', 'articles', 'asynchat', 'asyncore', 'atexit', 'base64', 'BaseHTTPServer', 'bisect', 'builtins.rst', 'bz2', 'calendar', 'cc-by-nc-sa.png', 'cgitb', 'cmd', 'collections', 'commands', 'compileall', 'compression.rst', 'ConfigParser', 'contents.rst', 'contextlib', 'Cookie', 'copy', 'copyright.rst', 'cryptographic.rst', 'csv', 'data_types.rst', 'datetime', 'dbhash', 'dbm', 'decimal', 'dev_tools.rst', 'difflib', 'dircache', 'dis', 'docs', 'dumbdbm', 'EasyDialogs', 'exceptions', 'feed.png', 'file_access.rst', 'file_formats.rst', 'filecmp', 'fileinput', 'fnmatch', 'fractions', 'frameworks.rst', 'functools', 'gdbm', 'generic_os.rst', 'getopt', 'getpass', 'gettext', 'glob', 'grp', 'gzip', 'hashlib', 'heapq', 'history.rst', 'hmac', 'i18n.rst', 'imaplib', 'imp', 'importing.rst', 'inspect', 'internet_data.rst', 'internet_protocols.rst', 'ipc.rst', 'itertools', 'json', 'language.rst', 'linecache', 'locale', 'logging', 'mail.png', 'mailbox', 'mhlib', 'miscelaneous.rst', 'mmap', 'multiprocessing', 'numeric.rst', 'operator', 'optional_os.rst', 'optparse', 'os', 'ospath', 'pdf_contents.rst', 'persistence.rst', 'pickle', 'pipes', 'pkgutil', 'platform', 'plistlib', 'pprint', 'profile', 'profilers.rst', 'pwd', 'pyclbr', 'pydoc', 'Queue', 'readline', 'resource', 'rlcompleter', 'robotparser', 'runtime_services.rst', 'sched', 'shelve', 'shlex', 'shutil', 'signal', 'SimpleXMLRPCServer', 'smtpd', 'smtplib', 'SocketServer', 'string', 'string_services.rst', 'StringIO', 'struct', 'subprocess', 'sys', 'tabnanny', 'tarfile', 'tempfile', 'textwrap', 'threading', 'time', 'timeit', 'trace', 'traceback', 'unittest', 'unix.rst', 'urllib', 'urllib2', 'urlparse', 'uuid', 'warnings', 'weakref', 'webbrowser', 'whichdb', 'xmlrpclib', 'zipfile', 'zipimport', 'zlib']
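To see how the working directory affects relative paths, here is a small sketch that temporarily changes into a scratch directory (created with tempfile, an assumption not used in the original example), resolves a relative name with os.path.abspath(), and restores the original directory in a finally block so the caller is not affected.

```python
import os
import tempfile

def resolve_in(directory, relative_name):
    """Return what relative_name refers to while directory is the cwd."""
    original = os.getcwd()
    os.chdir(directory)
    try:
        # abspath() joins a relative name onto the current working directory
        return os.path.abspath(relative_name)
    finally:
        os.chdir(original)   # always restore the caller's working directory

scratch = tempfile.mkdtemp()
print(resolve_in(scratch, 'data.txt'))
print(resolve_in(scratch, os.path.join(os.pardir, 'data.txt')))
os.rmdir(scratch)
```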

Pipes

The os module provides several functions for managing the I/O of child processes using pipes. The functions all work essentially the same way, but return different file handles depending on the type of input or output desired. For the most part, these functions are made obsolete by the new-ish subprocess module (added in 2.4), but there is a good chance you will encounter them if you are maintaining existing code.

The most commonly used pipe function is popen(). It creates a new process running the command given and attaches a single stream to the input or output of that process, depending on the mode argument. While popen functions work on Windows, some of these examples assume some sort of Unix-like shell. The descriptions of the streams also assume Unix-like terminology:

  • stdin - The “standard input” stream for a process (file descriptor 0) is readable by the process. This is usually where terminal input goes.
  • stdout - The “standard output” stream for a process (file descriptor 1) is writable by the process, and is used for displaying non-error information to the user.
  • stderr - The “standard error” stream for a process (file descriptor 2) is writable by the process, and is used for conveying error messages.

import os

print 'popen, read:'
pipe_stdout = os.popen('echo "to stdout"', 'r')
try:
    stdout_value = pipe_stdout.read()
finally:
    pipe_stdout.close()
print '\tstdout:', repr(stdout_value)

print '\npopen, write:'
pipe_stdin = os.popen('cat -', 'w')
try:
    pipe_stdin.write('\tstdin: to stdin\n')
finally:
    pipe_stdin.close()

$ python os_popen.py
        stdin: to stdin
popen, read:
        stdout: 'to stdout\n'

popen, write:

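The caller can only read from OR write to the streams associated with the child process, which limits the usefulness. (The “stdin: to stdin” line at the top of the output was written by cat itself; the script's own output was buffered because stdout was redirected when it was captured, so it appeared only when the script exited.) The other popen variants provide additional streams so it is possible to work with stdin, stdout, and stderr as needed.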

For example, popen2() returns a write-only stream attached to stdin of the child process, and a read-only stream attached to its stdout.

import os

print 'popen2:'
pipe_stdin, pipe_stdout = os.popen2('cat -')
try:
    pipe_stdin.write('through stdin to stdout')
finally:
    pipe_stdin.close()
try:
    stdout_value = pipe_stdout.read()
finally:
    pipe_stdout.close()
print '\tpass through:', repr(stdout_value)

This simplistic example illustrates bi-directional communication. The value written to stdin is read by cat (because of the '-' argument), then written back to stdout. Obviously a more complicated process could pass other types of messages back and forth through the pipe, even serialized objects.

$ python os_popen2.py
os_popen2.py:36: DeprecationWarning: os.popen2 is deprecated.  Use the subprocess module.
  pipe_stdin, pipe_stdout = os.popen2('cat -')
popen2:
        pass through: 'through stdin to stdout'
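os.popen2(), os.popen3(), and os.popen4() were removed in Python 3, so new code would use the subprocess module instead. A minimal sketch of the same pass-through with subprocess.Popen, assuming a Unix-like system where cat is available:

```python
import subprocess

# Same plumbing as os.popen2(['cat', '-']): a pipe in and a pipe out
proc = subprocess.Popen(['cat', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
# communicate() writes the input, closes stdin, then reads stdout to EOF
stdout_value, _ = proc.communicate(b'through stdin to stdout')
print('\tpass through: %r' % (stdout_value,))
```

Because communicate() closes stdin for us, there is no separate close() step to forget.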

In most cases, it is desirable to have access to both stdout and stderr. The stdout stream is used for message passing and the stderr stream is used for errors, so reading from it separately reduces the complexity for parsing any error messages. The popen3() function returns 3 open streams tied to stdin, stdout, and stderr of the new process.

import os

print 'popen3:'
pipe_stdin, pipe_stdout, pipe_stderr = os.popen3('cat -; echo ";to stderr" 1>&2')
try:
    pipe_stdin.write('through stdin to stdout')
finally:
    pipe_stdin.close()
try:
    stdout_value = pipe_stdout.read()
finally:
    pipe_stdout.close()
print '\tpass through:', repr(stdout_value)
try:
    stderr_value = pipe_stderr.read()
finally:
    pipe_stderr.close()
print '\tstderr:', repr(stderr_value)

Notice that we have to read from and close each stream separately. There are some issues related to flow control and sequencing when dealing with I/O for multiple processes. The I/O is buffered, and if the caller expects to be able to read all of the data from a stream then the child process must close that stream to indicate the end-of-file. For more information on these issues, refer to the Flow Control Issues section of the Python library documentation.

$ python os_popen3.py
os_popen3.py:36: DeprecationWarning: os.popen3 is deprecated.  Use the subprocess module.
  pipe_stdin, pipe_stdout, pipe_stderr = os.popen3('cat -; echo ";to stderr" 1>&2')
popen3:
        pass through: 'through stdin to stdout'
        stderr: ';to stderr\n'
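One common way around those flow control problems is to let subprocess.Popen.communicate() service all of the pipes at once. Here is a sketch of the popen3() example rewritten that way, assuming a Unix-like shell since the command string relies on shell syntax:

```python
import subprocess

proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
                        shell=True,   # the command string uses shell syntax
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
# communicate() services stdout and stderr together, so a child that fills
# one pipe while the parent is blocked reading the other cannot deadlock
stdout_value, stderr_value = proc.communicate(b'through stdin to stdout')
print('\tpass through: %r' % (stdout_value,))
print('\tstderr: %r' % (stderr_value,))
```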

And finally, popen4() returns 2 streams, stdin and a merged stdout/stderr. This is useful when the results of the command need to be logged, but not parsed directly.

import os

print 'popen4:'
pipe_stdin, pipe_stdout_and_stderr = os.popen4('cat -; echo ";to stderr" 1>&2')
try:
    pipe_stdin.write('through stdin to stdout')
finally:
    pipe_stdin.close()
try:
    stdout_value = pipe_stdout_and_stderr.read()
finally:
    pipe_stdout_and_stderr.close()
print '\tcombined output:', repr(stdout_value)

$ python os_popen4.py
os_popen4.py:36: DeprecationWarning: os.popen4 is deprecated.  Use the subprocess module.
  pipe_stdin, pipe_stdout_and_stderr = os.popen4('cat -; echo ";to stderr" 1>&2')
popen4:
        combined output: 'through stdin to stdout;to stderr\n'
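The merged stream that popen4() provides can be sketched with subprocess by pointing stderr at the stdout pipe using subprocess.STDOUT. As before, this assumes a Unix-like shell:

```python
import subprocess

proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)  # send stderr into the stdout pipe
combined, _ = proc.communicate(b'through stdin to stdout')
print('\tcombined output: %r' % (combined,))
```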

Besides accepting a single string command to be given to the shell for parsing, popen2(), popen3(), and popen4() also accept a sequence of strings (command, followed by arguments). In this case, the arguments are not processed by the shell.

import os

print 'popen2, cmd as sequence:'
pipe_stdin, pipe_stdout = os.popen2(['cat', '-'])
try:
    pipe_stdin.write('through stdin to stdout')
finally:
    pipe_stdin.close()
try:
    stdout_value = pipe_stdout.read()
finally:
    pipe_stdout.close()
print '\tpass through:', repr(stdout_value)

$ python os_popen2_seq.py
os_popen2_seq.py:36: DeprecationWarning: os.popen2 is deprecated.  Use the subprocess module.
  pipe_stdin, pipe_stdout = os.popen2(['cat', '-'])
popen2, cmd as sequence:
        pass through: 'through stdin to stdout'
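The difference between the two forms is easiest to see with a shell variable. This sketch uses subprocess rather than os.popen2(), and GREETING is a hypothetical variable set only for the demonstration: passed as a sequence, '$GREETING' reaches echo untouched; passed as a string through the shell, it is expanded first.

```python
import os
import subprocess

# GREETING is a made-up variable used only for this demonstration
env = dict(os.environ, GREETING='hello')

def run(cmd, shell):
    """Run cmd and return everything it wrote to stdout."""
    proc = subprocess.Popen(cmd, shell=shell, env=env, stdout=subprocess.PIPE)
    return proc.communicate()[0]

# As a sequence, no shell is involved: the argument is passed literally
literal = run(['echo', '$GREETING'], shell=False)
# As a single string with shell=True, the shell expands the variable
expanded = run('echo $GREETING', shell=True)
print(literal)
print(expanded)
```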

File Descriptors

The os module includes the standard set of functions for working with low-level file descriptors (integers representing open files owned by the current process). This is a lower-level API than is provided by file() objects. I am going to skip over describing them here, since it is generally easier to work directly with file() objects. Refer to the library documentation for details if you do need to use file descriptors.
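For reference, a minimal sketch of what the descriptor API looks like: os.open() returns an integer, and os.write() and os.read() work on raw bytes and may transfer less than requested, so both are wrapped in loops. The temporary directory and file name are assumptions of the sketch.

```python
import os
import tempfile

def fd_round_trip(data):
    """Write bytes to a new file and read them back using only descriptors."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, 'fd_example.bin')

    # os.open() returns a plain integer, not a file object
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = data
        while remaining:
            written = os.write(fd, remaining)  # may write fewer bytes than asked
            remaining = remaining[written:]
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:            # an empty result means end-of-file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)

    os.unlink(path)
    os.rmdir(directory)
    return b''.join(chunks)

print(fd_round_trip(b'written with os.write()'))
```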

Filesystem Permissions

The function os.access() can be used to test the access rights a process has for a file.

import os

print 'Testing:', __file__
print 'Exists:', os.access(__file__, os.F_OK)
print 'Readable:', os.access(__file__, os.R_OK)
print 'Writable:', os.access(__file__, os.W_OK)
print 'Executable:', os.access(__file__, os.X_OK)

Your results will vary depending on how you install the example code, but it should look something like this:

$ python os_access.py
Testing: os_access.py
Exists: True
Readable: True
Writable: True
Executable: False

The library documentation for os.access() includes 2 special warnings. First, there isn’t much sense in calling os.access() to test whether a file can be opened before actually calling open() on it. There is a small, but real, window between the 2 calls during which the permissions on the file could change. The other warning applies mostly to networked filesystems which extend the POSIX permission semantics. Some filesystem types may respond to the POSIX call that a process has permission to access a file, then report a failure when the attempt is made using open() for some reason not tested via the POSIX call. All in all, it is better to call open() with the required mode and catch the IOError raised if there is a problem.
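A sketch of that "just try it" approach: attempt the open() and inspect the errno of the exception if it fails. The helper name read_if_possible() is hypothetical, and the temporary file exists only to exercise it.

```python
import errno
import os
import tempfile

def read_if_possible(filename):
    """Attempt the open() directly instead of asking os.access() first."""
    try:
        f = open(filename, 'r')
    except IOError as err:   # IOError is an alias of OSError in Python 3
        # err.errno says why the open failed (ENOENT, EACCES, ...)
        return None, errno.errorcode.get(err.errno, 'unknown')
    try:
        return f.read(), None
    finally:
        f.close()

# Usage: one file that exists and one that does not
fd, existing = tempfile.mkstemp()
os.write(fd, b'hello')
os.close(fd)
found = read_if_possible(existing)
missing = read_if_possible(existing + '.missing')
os.unlink(existing)
print(found)
print(missing)
```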

More detailed information about the file can be accessed using os.stat() or os.lstat() (if you want the status of something that might be a symbolic link).

import os
import sys
import time

if len(sys.argv) == 1:
    filename = __file__
else:
    filename = sys.argv[1]

stat_info = os.stat(filename)

print 'os.stat(%s):' % filename
print '\tSize:', stat_info.st_size
print '\tPermissions:', oct(stat_info.st_mode)
print '\tOwner:', stat_info.st_uid
print '\tDevice:', stat_info.st_dev
print '\tLast modified:', time.ctime(stat_info.st_mtime)

Once again, your results will vary depending on how the example code was installed. Try passing different filenames on the command line to os_stat.py.

$ python os_stat.py
os.stat(os_stat.py):
        Size: 1516
        Permissions: 0100644
        Owner: 527
        Device: 234881026
        Last modified: Thu Mar 12 07:29:46 2009
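The st_mode value packs the file type and the permission bits together; the stat module has helpers to pull them apart. This sketch also shows the difference between os.stat() and os.lstat() on a symbolic link. It assumes a Unix-like system, since it creates the link with os.symlink() in a temporary directory.

```python
import os
import stat
import tempfile

def describe(path):
    """Compare what os.stat() and os.lstat() report for the same path."""
    followed = os.stat(path)       # follows a symbolic link to its target
    not_followed = os.lstat(path)  # describes the link itself
    return {
        'stat_is_regular': stat.S_ISREG(followed.st_mode),
        'lstat_is_link': stat.S_ISLNK(not_followed.st_mode),
        'permissions': stat.S_IMODE(followed.st_mode),  # just the permission bits
    }

directory = tempfile.mkdtemp()
target = os.path.join(directory, 'target.txt')
link = os.path.join(directory, 'link.txt')
f = open(target, 'w')
f.write('contents')
f.close()
os.chmod(target, 0o640)
os.symlink(target, link)           # Unix-like systems only
result = describe(link)
print(result)

os.unlink(link)
os.unlink(target)
os.rmdir(directory)
```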

On Unix-like systems, file permissions can be changed using os.chmod(), passing the mode as an integer. Mode values can be constructed using constants defined in the stat module. Here is an example which toggles the user’s execute permission bit:

import os
import stat

filename = 'os_stat_chmod_example.txt'
if os.path.exists(filename):
    os.unlink(filename)
f = open(filename, 'wt')
f.write('contents')
f.close()

# Determine what permissions are already set using stat
existing_permissions = stat.S_IMODE(os.stat(filename).st_mode)

if not os.access(filename, os.X_OK):
    print 'Adding execute permission'
    new_permissions = existing_permissions | stat.S_IXUSR
else:
    print 'Removing execute permission'
    # clear only the user execute bit (xor would set it if it were off)
    new_permissions = existing_permissions & ~stat.S_IXUSR

os.chmod(filename, new_permissions)

The script assumes you have the right permissions to modify the mode of the file to begin with:

$ python os_stat_chmod.py
Adding execute permission
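The mode passed to os.chmod() is just an integer built by ORing stat constants together. A small sketch of the bit arithmetic, showing that `& ~` clears a bit whatever its current state, while XOR flips it (and so turns it on if it was already off):

```python
import stat

# rw- for the owner, r-- for the group, --- for everyone else
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP

with_exec = mode | stat.S_IXUSR           # set one bit, leave the rest alone
without_exec = with_exec & ~stat.S_IXUSR  # clear one bit, leave the rest alone
flipped = mode ^ stat.S_IXUSR             # XOR sets the bit here because it was clear

print('%o %o %o %o' % (mode, with_exec, without_exec, flipped))
```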

Directories

There are several functions for working with directories on the filesystem, including creating, listing contents, and removing them.

import os

dir_name = 'os_directories_example'

print 'Creating', dir_name
os.makedirs(dir_name)

file_name = os.path.join(dir_name, 'example.txt')
print 'Creating', file_name
f = open(file_name, 'wt')
try:
    f.write('example file')
finally:
    f.close()

print 'Listing', dir_name
print os.listdir(dir_name)

print 'Cleaning up'
os.unlink(file_name)
os.rmdir(dir_name)

$ python os_directories.py
Creating os_directories_example
Creating os_directories_example/example.txt
Listing os_directories_example
['example.txt']
Cleaning up

There are 2 sets of functions for creating and deleting directories. When creating a new directory with os.mkdir(), all of the parent directories must already exist. When removing a directory with os.rmdir(), only the leaf directory (the last part of the path) is actually removed. In contrast, os.makedirs() and os.removedirs() operate on all of the nodes in the path. os.makedirs() will create any parts of the path which do not exist, and os.removedirs() will remove all of the parent directories (assuming it can).
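A sketch contrasting the two sets, run inside a temporary directory (an assumption of the example). os.removedirs() keeps removing parents until one cannot be removed, so a sentinel file is placed in the base directory to stop it there instead of letting it climb further.

```python
import os
import tempfile

base = tempfile.mkdtemp()
# A file in base keeps os.removedirs() from climbing any higher than base
sentinel = os.path.join(base, 'keep.txt')
open(sentinel, 'w').close()

nested = os.path.join(base, 'a', 'b', 'c')
try:
    os.mkdir(nested)            # fails: the parents a and a/b do not exist yet
    mkdir_failed = False
except OSError:
    mkdir_failed = True

os.makedirs(nested)             # creates a, a/b, and a/b/c in one call
created = os.path.isdir(nested)

os.removedirs(nested)           # removes c, then b, then a; stops at non-empty base
a_removed = not os.path.exists(os.path.join(base, 'a'))
base_kept = os.path.isdir(base)

os.unlink(sentinel)
os.rmdir(base)
print('%s %s %s %s' % (mkdir_failed, created, a_removed, base_kept))
```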

Walking a Directory Tree

The function os.walk() traverses a directory recursively and for each directory generates a tuple containing the directory path, any immediate sub-directories of that path, and the names of any files in that directory. This example shows a simplistic recursive directory listing.

import os, sys

# If we are not given a path to list, use /tmp
if len(sys.argv) == 1:
    root = '/tmp'
else:
    root = sys.argv[1]

for dir_name, sub_dirs, files in os.walk(root):
    print '\n', dir_name
    # Make the subdirectory names stand out with /
    sub_dirs = [ '%s/' % n for n in sub_dirs ]
    # Mix the directory contents together
    contents = sub_dirs + files
    contents.sort()
    # Show the contents
    for c in contents:
        print '\t%s' % c

$ python os_walk.py

/tmp
        .X0-lock
        .X11-unix/
        .keystone_install_lock
        .ksda.20F/
        .s.PGSQL.5432
        .s.PGSQL.5432.lock
        ccc_exclude.2HkEqq
        ccc_exclude.a5JJoz
        ccc_exclude.waawqY
        ccc_exclude.yuqI2O
        com.hp.launchport
        distribute-0.6.10.tar.gz
        distribute-0.6.8.tar.gz
        emacs527/
        example.db
        hb.95600/
        hb.95680/
        launch-eJNg4h/
        launch-lmuC2Y/
        launch-ngEJaX/
        launchd-128.khXXYy/
        pymotw_import_example.shelve
        trace_example.recurse.cover
        var_backups/

/tmp/.ksda.20F
        ksurl

/tmp/.X11-unix
        X0

/tmp/emacs527
        server

/tmp/hb.95600
        19402000
        19402001
        19402002
        19402003
        19402004
        19402005
        19402006
        19402007
        19402008
        19402009

/tmp/hb.95680

/tmp/launch-eJNg4h
        Render

/tmp/launch-lmuC2Y
        Listeners

/tmp/launch-ngEJaX
        :0

/tmp/launchd-128.khXXYy
        sock

/tmp/var_backups
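Because os.walk() works top-down by default, the caller can prune the traversal by modifying the list of sub-directory names in place. This sketch skips any directory named .git (the skip list and the throwaway tree are assumptions for illustration). Note the slice assignment: rebinding sub_dirs to a new list, as the listing script above does, would not affect the walk.

```python
import os
import shutil
import tempfile

def walk_skipping(root, skip=('.git',)):
    """List files below root, without descending into directories named in skip."""
    found = []
    for dir_name, sub_dirs, files in os.walk(root):
        # Slice assignment changes the list os.walk() itself uses, so the
        # skipped directories are never visited (requires topdown=True,
        # the default); rebinding sub_dirs to a new list would not.
        sub_dirs[:] = [d for d in sub_dirs if d not in skip]
        for name in files:
            found.append(os.path.relpath(os.path.join(dir_name, name), root))
    return sorted(found)

# Build a small throwaway tree to try it on
root = tempfile.mkdtemp()
for parts in [('keep', 'a.txt'), ('.git', 'b.txt'), ('c.txt',)]:
    path = os.path.join(root, *parts)
    parent = os.path.dirname(path)
    if not os.path.isdir(parent):
        os.makedirs(parent)
    open(path, 'w').close()

listing = walk_skipping(root)
print(listing)
shutil.rmtree(root)
```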

Running External Commands

Warning

Many of these functions for working with processes have limited portability. For a more consistent way to work with processes in a platform independent manner, see the subprocess module instead.

The simplest way to run a separate command, without interacting with it at all, is os.system(). It takes a single string which is the command line to be executed by a sub-process running a shell.

import os

# Simple command
os.system('ls -l')

$ python os_system_example.py
total 256
-rw-r--r--  1 dhellmann  dhellmann      0 Mar 12  2009 __init__.py
-rw-r--r--  1 dhellmann  dhellmann    110 May 16  2009 __init__.pyc
-rw-r--r--  1 dhellmann  dhellmann  21644 Mar  7 09:13 index.rst
-rw-r--r--  1 dhellmann  dhellmann   1360 Mar 12  2009 os_access.py
-rw-r--r--  1 dhellmann  dhellmann   1347 Mar 12  2009 os_cwd_example.py
-rw-r--r--  1 dhellmann  dhellmann   1499 Mar 12  2009 os_directories.py
-rw-r--r--  1 dhellmann  dhellmann   1573 Mar 12  2009 os_environ_example.py
-rw-r--r--@ 1 dhellmann  dhellmann   1241 Mar 12  2009 os_exec_example.py
-rw-r--r--  1 dhellmann  dhellmann   1267 Mar 12  2009 os_fork_example.py
-rw-r--r--  1 dhellmann  dhellmann   1703 Mar 12  2009 os_kill_example.py
-rw-r--r--  1 dhellmann  dhellmann   1476 Mar 12  2009 os_popen.py
-rw-r--r--  1 dhellmann  dhellmann   1418 Mar 12  2009 os_popen2.py
-rw-r--r--  1 dhellmann  dhellmann   1440 Mar 12  2009 os_popen2_seq.py
-rw-r--r--  1 dhellmann  dhellmann   1569 Mar 12  2009 os_popen3.py
-rw-r--r--  1 dhellmann  dhellmann   1478 Mar 12  2009 os_popen4.py
-rw-r--r--  1 dhellmann  dhellmann   1395 Mar 12  2009 os_process_id_example.py
-rw-r--r--  1 dhellmann  dhellmann   1842 Mar 12  2009 os_process_user_example.py
-rw-r--r--  1 dhellmann  dhellmann   1206 Mar 12  2009 os_spawn_example.py
-rw-r--r--  1 dhellmann  dhellmann   1516 Mar 12  2009 os_stat.py
-rw-r--r--@ 1 dhellmann  dhellmann   1751 Mar 15  2009 os_stat_chmod.py
-rwxr--r--  1 dhellmann  dhellmann      8 Mar  7 09:14 os_stat_chmod_example.txt
-rw-r--r--  1 dhellmann  dhellmann   1419 Mar 12  2009 os_symlinks.py
-rw-r--r--  1 dhellmann  dhellmann   1250 Mar 12  2009 os_system_background.py
-rw-r--r--  1 dhellmann  dhellmann   1191 Mar 12  2009 os_system_example.py
-rw-r--r--@ 1 dhellmann  dhellmann   1214 Mar 15  2009 os_system_shell.py
-rw-r--r--@ 1 dhellmann  dhellmann   1499 Mar 12  2009 os_wait_example.py
-rw-r--r--@ 1 dhellmann  dhellmann   1555 Mar 12  2009 os_waitpid_example.py
-rw-r--r--  1 dhellmann  dhellmann   1643 Mar 12  2009 os_walk.py
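os.system() also returns a value. On Unix it is the shell's wait status rather than the plain exit code, so the code has to be extracted with os.WEXITSTATUS(). A minimal sketch, assuming a Unix-like system with /bin/sh:

```python
import os

# On Unix, os.system() returns a wait status, not the plain exit code
status = os.system('exit 3')
if os.WIFEXITED(status):
    exit_code = os.WEXITSTATUS(status)   # the exit code lives in the high byte
else:
    exit_code = None                     # the shell was killed by a signal
print('status=%d exit code=%s' % (status, exit_code))
```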

Since the command is passed directly to the shell for processing, it can even include shell syntax such as globbing or environment variables:

import os

# Command with shell expansion
os.system('ls -ld $TMPDIR')

$ python os_system_shell.py
drwx------  94 dhellmann  dhellmann  3196 Mar  7 09:14 /var/folders/9R/9R1t+tR02Raxzk+F71Q50U+++Uw/-Tmp-/

Unless you explicitly run the command in the background, the call to os.system() blocks until it is complete. Standard input, output, and error from the child process are tied to the appropriate streams owned by the caller by default, but can be redirected using shell syntax.

import os
import time

print 'Calling...'
os.system('date; (sleep 3; date) &')

print 'Sleeping...'
time.sleep(5)

This is getting into shell trickery, though, and there are better ways to accomplish the same thing, such as starting the child with the subprocess module and not waiting for it to finish.

$ python os_system_background.py
Sun Mar  7 09:14:29 EST 2010
Sun Mar  7 09:14:32 EST 2010
Calling...
Sleeping...
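A sketch of that alternative, with no shell `&` involved: subprocess.Popen() returns as soon as the child has started, poll() reports whether it has finished, and wait() blocks only when the result is actually needed. Using a child Python interpreter that sleeps is an assumption made to keep the example portable.

```python
import subprocess
import sys

# Start a child that sleeps for a second, without waiting for it
child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(1)'])

# poll() returns None while the child is still running
still_running = child.poll() is None

# ... the parent is free to do other work here ...

returncode = child.wait()   # block only when the result is actually needed
print('running at first: %s, exit code: %d' % (still_running, returncode))
```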

Creating Processes with os.fork()

The POSIX functions fork() and exec*() (available under Mac OS X, Linux, and other UNIX variants) are exposed through the os module. Entire books have been written about reliably using these functions, so check your library or bookstore for more details than I will present here.

To create a new process as a clone of the current process, use os.fork():

import os

pid = os.fork()

if pid:
    print 'Child process id:', pid
else:
    print 'I am the child'

Your output will vary based on the state of your system each time you run the example, but it should look something like:

$ python os_fork_example.py
Child process id: 14585
I am the child

After the fork, you end up with 2 processes running the same code. To tell which one you are in, check the return value. If it is 0, you are inside the child process. If it is not 0, you are in the parent process and the return value is the process id of the child process.
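Because the child is a clone, it gets its own copy of the parent's memory, and changes made in one process are not visible in the other. This sketch (Unix only) changes a variable in the child and sends the child's value back over a pipe from os.pipe(), which is an addition not used elsewhere in this article. The child leaves with os._exit() so it never runs the rest of the parent's code.

```python
import os

def fork_and_compare():
    """Return the parent's and the child's view of the same variable after fork()."""
    value = 'parent'
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: change its own copy of value and report it through the pipe
        try:
            os.close(read_fd)
            value = 'child'
            os.write(write_fd, value.encode('ascii'))
        finally:
            # leave immediately, without running the parent's remaining code
            os._exit(0)
    # Parent: pid is the child's process id
    os.close(write_fd)
    child_value = os.read(read_fd, 64).decode('ascii')
    os.close(read_fd)
    os.waitpid(pid, 0)   # reap the child so it does not linger as a zombie
    return value, child_value

print(fork_and_compare())
```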

From the parent process, it is possible to send the child signals. This is a bit more complicated to set up, and uses the signal module, so let’s walk through the code. First we can define a signal handler to be invoked when the signal is received.

import os
import signal
import time

def signal_usr1(signum, frame):
    pid = os.getpid()
    print 'Received USR1 in process %s' % pid

Then we fork, and in the parent pause a short amount of time before sending a USR1 signal using os.kill(). The short pause gives the child process time to set up the signal handler.

print 'Forking...'
child_pid = os.fork()
if child_pid:
    print 'PARENT: Pausing before sending signal...'
    time.sleep(1)
    print 'PARENT: Signaling %s' % child_pid
    os.kill(child_pid, signal.SIGUSR1)

In the child, we set up the signal handler and go to sleep for a while to give the parent time to send us the signal:

else:
    print 'CHILD: Setting up signal handler'
    signal.signal(signal.SIGUSR1, signal_usr1)
    print 'CHILD: Pausing to wait for signal'
    time.sleep(5)

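In a real app, you probably wouldn’t need to (or want to) call sleep, of course. “Forking...” appears twice in the output below because it was captured with stdout redirected: Python buffered the line instead of writing it immediately, fork() copied the unflushed buffer into the child, and each process wrote its copy when it exited.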

$ python os_kill_example.py
Forking...
PARENT: Pausing before sending signal...
PARENT: Signaling 14588
Forking...
CHILD: Setting up signal handler
CHILD: Pausing to wait for signal
Received USR1 in process 14588

As you see, a simple way to handle separate behavior in the child process is to check the return value of fork() and branch. For more complex behavior, you may want more code separation than a simple branch. In other cases, you may have an existing program you have to wrap. For both of these situations, you can use the os.exec*() series of functions to run another program. When you “exec” a program, the code from that program replaces the code from your existing process.

import os

child_pid = os.fork()
if child_pid:
    os.waitpid(child_pid, 0)
else:
    os.execlp('ls', 'ls', '-l', '/tmp/')

$ python os_exec_example.py
total 1616
-rw-------   1 root       wheel        1507 Mar  5 03:00 ccc_exclude.2HkEqq
-rw-------   1 root       wheel        1507 Mar  4 03:00 ccc_exclude.a5JJoz
-rw-------   1 root       wheel        1507 Mar  7 03:00 ccc_exclude.waawqY
-rw-------   1 root       wheel        1507 Mar  6 03:00 ccc_exclude.yuqI2O
srwxrwxrwx   1 dhellmann  wheel           0 Feb 22 17:20 com.hp.launchport
-rw-r--r--   1 dhellmann  wheel      385641 Mar  6 13:49 distribute-0.6.10.tar.gz
-rw-r--r--   1 dhellmann  wheel      390582 Mar  6 13:49 distribute-0.6.8.tar.gz
drwx------   3 dhellmann  wheel         102 Mar  6 12:22 emacs527
-rw-r--r--   1 dhellmann  wheel       12288 Mar  7 09:13 example.db
drwxr-xr-x  12 dhellmann  wheel         408 Mar  6 08:52 hb.95600
drwxr-xr-x   2 dhellmann  wheel          68 Mar  6 08:59 hb.95680
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-eJNg4h
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-lmuC2Y
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-ngEJaX
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launchd-128.khXXYy
-rw-r--r--   1 dhellmann  wheel       12288 Mar  7 09:10 pymotw_import_example.shelve
-rw-r--r--   1 dhellmann  wheel         448 Mar  7 09:11 trace_example.recurse.cover
drwxr-xr-x   2 dhellmann  dhellmann      68 Mar  4 03:15 var_backups

There are many variations of exec*(), depending on what form you might have the arguments in, whether you want the path and environment of the parent process to be copied to the child, etc. Have a look at the library documentation for details.

For all variations, the first argument is a path or filename and the remaining arguments control how that program runs. They are either passed as command line arguments or override the process “environment” (see os.environ and os.getenv).
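As an illustration of how the suffixes combine, this sketch uses os.execvpe(): "v" takes the arguments as a list, "p" searches PATH for the program, and "e" supplies an explicit environment. The child shell exits with a code taken from that environment, and the parent reads it back with os.waitpid(). It assumes a Unix-like system with sh on the PATH; EXAMPLE_CODE is a made-up variable name.

```python
import os

def exit_code_via_exec(code):
    """Fork, then replace the child with a shell that exits with the given code."""
    pid = os.fork()
    if pid == 0:
        try:
            # EXAMPLE_CODE is a hypothetical variable name used only here
            env = dict(os.environ, EXAMPLE_CODE=str(code))
            # v: arguments as a list, p: search PATH for 'sh', e: explicit environment
            os.execvpe('sh', ['sh', '-c', 'exit $EXAMPLE_CODE'], env)
        finally:
            os._exit(127)   # only reached if the exec itself failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

print(exit_code_via_exec(5))
```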

Waiting for a Child

Suppose you are using multiple processes to work around the threading limitations of Python and the Global Interpreter Lock. If you start several processes to run separate tasks, you will want to wait for one or more of them to finish before starting new ones, to avoid overloading the server. There are a few different ways to do that using wait() and related functions.

If you don’t care (or don’t know) which child process will exit first, os.wait() returns as soon as any of them exits:

import os
import sys
import time

for i in range(3):
    print 'PARENT: Forking %s' % i
    worker_pid = os.fork()
    if not worker_pid:
        print 'WORKER %s: Starting' % i
        time.sleep(2 + i)
        print 'WORKER %s: Finishing' % i
        sys.exit(i)

for i in range(3):
    print 'PARENT: Waiting for %s' % i
    done = os.wait()
    print 'PARENT:', done

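Notice that the return value from os.wait() is a tuple containing the process id and exit status (“a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status”). The repeated “PARENT: Forking” lines in the output are an artifact of how it was captured: with stdout redirected, Python buffers printed lines, fork() copies the unflushed buffer into each worker, and sys.exit() in the worker flushes that copy along with the worker's own messages.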

$ python os_wait_example.py
PARENT: Forking 0
WORKER 0: Starting
WORKER 0: Finishing
PARENT: Forking 0
PARENT: Forking 1
WORKER 1: Starting
WORKER 1: Finishing
PARENT: Forking 0
PARENT: Forking 1
PARENT: Forking 2
WORKER 2: Starting
WORKER 2: Finishing
PARENT: Forking 0
PARENT: Forking 1
PARENT: Forking 2
PARENT: Waiting for 0
PARENT: (14594, 0)
PARENT: Waiting for 1
PARENT: (14595, 256)
PARENT: Waiting for 2
PARENT: (14596, 512)
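Rather than shifting bits by hand, the status can be decoded with helpers such as os.WIFEXITED(), os.WEXITSTATUS(), os.WIFSIGNALED(), and os.WTERMSIG(). A sketch for Unix-like systems; it uses os.waitpid() instead of os.wait() so that it only reaps the children it creates itself.

```python
import os
import signal

# The statuses printed above, decoded: exit codes 0, 1 and 2
for status in (0, 256, 512):
    print('%d -> exit code %d' % (status, os.WEXITSTATUS(status)))

# A child that exits normally
pid = os.fork()
if pid == 0:
    os._exit(3)
_, exited_status = os.waitpid(pid, 0)

# A child killed by a signal: the low-order bits hold the signal number instead
pid = os.fork()
if pid == 0:
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # make sure the default action applies
    os.kill(os.getpid(), signal.SIGTERM)
    os._exit(0)                                    # not reached if the signal killed us
_, killed_status = os.waitpid(pid, 0)

print('normal exit: %s, code %d' % (os.WIFEXITED(exited_status),
                                    os.WEXITSTATUS(exited_status)))
print('killed: %s, by signal %d' % (os.WIFSIGNALED(killed_status),
                                    os.WTERMSIG(killed_status)))
```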

If you want a specific process, use os.waitpid().

import os
import sys
import time

workers = []
for i in range(3):
    print 'PARENT: Forking %s' % i
    worker_pid = os.fork()
    if not worker_pid:
        print 'WORKER %s: Starting' % i
        time.sleep(2 + i)
        print 'WORKER %s: Finishing' % i
        sys.exit(i)
    workers.append(worker_pid)

for pid in workers:
    print 'PARENT: Waiting for %s' % pid
    done = os.waitpid(pid, 0)
    print 'PARENT:', done

$ python os_waitpid_example.py
PARENT: Forking 0
WORKER 0: Starting
WORKER 0: Finishing
PARENT: Forking 0
PARENT: Forking 1
WORKER 1: Starting
WORKER 1: Finishing
PARENT: Forking 0
PARENT: Forking 1
PARENT: Forking 2
WORKER 2: Starting
WORKER 2: Finishing
PARENT: Forking 0
PARENT: Forking 1
PARENT: Forking 2
PARENT: Waiting for 14599
PARENT: (14599, 0)
PARENT: Waiting for 14600
PARENT: (14600, 256)
PARENT: Waiting for 14601
PARENT: (14601, 512)

wait3() and wait4() work in a similar manner, but return more detailed information about the child process with the pid, exit status, and resource usage.
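A minimal sketch of os.wait4() on a Unix-like system. The third item in the result is a resource.struct_rusage; the child does some arbitrary arithmetic only so there is CPU time to report.

```python
import os

pid = os.fork()
if pid == 0:
    # Child: do some CPU work so there is usage to report, then exit
    sum(i * i for i in range(200000))
    os._exit(0)

waited_pid, status, usage = os.wait4(pid, 0)
# ru_maxrss is reported in kilobytes on Linux but in bytes on Mac OS X
print('pid=%d status=%d user=%.3fs system=%.3fs maxrss=%d' % (
    waited_pid, status, usage.ru_utime, usage.ru_stime, usage.ru_maxrss))
```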

Spawn

As a convenience, the os.spawn*() family of functions handles the fork() and exec*() calls for you in one statement:

import os

os.spawnlp(os.P_WAIT, 'ls', 'ls', '-l', '/tmp/')

$ python os_spawn_example.py
total 1616
-rw-------   1 root       wheel        1507 Mar  5 03:00 ccc_exclude.2HkEqq
-rw-------   1 root       wheel        1507 Mar  4 03:00 ccc_exclude.a5JJoz
-rw-------   1 root       wheel        1507 Mar  7 03:00 ccc_exclude.waawqY
-rw-------   1 root       wheel        1507 Mar  6 03:00 ccc_exclude.yuqI2O
srwxrwxrwx   1 dhellmann  wheel           0 Feb 22 17:20 com.hp.launchport
-rw-r--r--   1 dhellmann  wheel      385641 Mar  6 13:49 distribute-0.6.10.tar.gz
-rw-r--r--   1 dhellmann  wheel      390582 Mar  6 13:49 distribute-0.6.8.tar.gz
drwx------   3 dhellmann  wheel         102 Mar  6 12:22 emacs527
-rw-r--r--   1 dhellmann  wheel       12288 Mar  7 09:13 example.db
drwxr-xr-x  12 dhellmann  wheel         408 Mar  6 08:52 hb.95600
drwxr-xr-x   2 dhellmann  wheel          68 Mar  6 08:59 hb.95680
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-eJNg4h
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-lmuC2Y
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launch-ngEJaX
drwx------   3 dhellmann  wheel         102 Feb 22 17:20 launchd-128.khXXYy
-rw-r--r--   1 dhellmann  wheel       12288 Mar  7 09:10 pymotw_import_example.shelve
-rw-r--r--   1 dhellmann  wheel         448 Mar  7 09:11 trace_example.recurse.cover
drwxr-xr-x   2 dhellmann  dhellmann      68 Mar  4 03:15 var_backups
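The mode argument controls whether spawn waits. With os.P_WAIT it returns the program's exit code; with os.P_NOWAIT it returns the child's process id immediately, and the caller reaps the child later with os.waitpid(). A sketch, assuming a Unix-like system (the "p" variants such as spawnlp() are not available on Windows) with sh and sleep on the PATH:

```python
import os

# P_WAIT runs the program to completion and returns its exit code directly
code = os.spawnlp(os.P_WAIT, 'sh', 'sh', '-c', 'exit 4')

# P_NOWAIT returns the child's process id right away; reap it later
pid = os.spawnlp(os.P_NOWAIT, 'sleep', 'sleep', '1')
# ... other work could happen here ...
_, status = os.waitpid(pid, 0)

print('P_WAIT exit code: %d' % code)
print('P_NOWAIT child %d exited with %d' % (pid, os.WEXITSTATUS(status)))
```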

See also

os
Standard library documentation for this module.
subprocess
The subprocess module supersedes os.popen().
tempfile
The tempfile module for working with temporary files.
Unix Manual Page Introduction
Includes definitions of real and effective ids, etc.
http://www.scit.wlv.ac.uk/cgi-bin/mansec?2+intro
Speaking UNIX, Part 8
Learn how UNIX multitasks.
http://www.ibm.com/developerworks/aix/library/au-speakingunix8/index.html
Unix Concepts
For more discussion of stdin, stdout, and stderr.
http://www.linuxhq.com/guides/LUG/node67.html
Delve into Unix Process Creation
Explains the life cycle of a UNIX process.
http://www.ibm.com/developerworks/aix/library/au-unixprocess.html
Advanced Programming in the UNIX(R) Environment
Covers working with multiple processes, such as handling signals, closing duplicated file descriptors, etc.
