Programmer's Python: Async - Subprocesses
Written by Mike James   
Monday, 24 June 2024

Input/Output

When you use the run function, the calling process waits for the subprocess to complete and then returns a CompletedProcess object with attributes that provide information about what happened:

  • args – the arguments used to launch the process, i.e. a copy of the first argument passed to run

  • returncode – the exit status of the child process, where 0 usually indicates success

You can use check_returncode(), which raises a CalledProcessError if the return code is non-zero. You can also have the output of the subprocess returned in the stdout and stderr attributes, but for this to work you have to request that the output is captured by setting capture_output to True, for example:

import subprocess
result = subprocess.run(["dir", "/"],
                        capture_output=True)
print("results", result.stdout)

This stops the output from appearing on any console that might be in use and allows you to process the returned data, which by default is a bytes object rather than a string.
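As a minimal sketch of the CompletedProcess attributes and check_returncode(), the following uses the current Python interpreter as the child process. This choice is an assumption made only so that the example doesn't depend on which commands are installed; the child programs are purely illustrative:

```python
import subprocess
import sys

# Assumption: the child is the current Python interpreter (sys.executable).
ok = subprocess.run([sys.executable, "-c", "print('hello')"],
                    capture_output=True)
print(ok.args)        # the argument list that was passed to run
print(ok.returncode)  # 0 because the child exited normally
print(ok.stdout)      # captured output as bytes

# A child that deliberately exits with a non-zero status.
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"],
                        capture_output=True)
try:
    failed.check_returncode()
except subprocess.CalledProcessError as err:
    print("child failed with return code", err.returncode)
```

If you would rather have the exception raised automatically, run also accepts check=True.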

You can also arrange to send some data to the program. The important thing to realize is that such data is treated as if it was typed on the keyboard to the running program. To send input you simply use the input parameter. As an example designed to demonstrate automating user input, the command:

rm test -i

checks that the file exists and asks if it is OK to delete it. The user then types a Y or an N and the file is treated accordingly.

To answer the question automatically, and so delete the file without any user intervention, you need to specify input:

import subprocess
result = subprocess.run(['rm', 'test', '-i'],
                        capture_output=True, input=b"y\n")

The input string will be “fed” into the running program as it demands input. Notice the use of “\n” to supply the newline that the user would generate by pressing Enter. For example, to delete two files:

import subprocess
result = subprocess.run(['rm', 'test1', 'test2', '-i'],
                        capture_output=True, input=b"y\ny\n")

The first “y\n” is used as input to the question to delete the first file and the second is used for the second.
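If you want to experiment without deleting anything, the same idea can be sketched with a stand-in child process. The script in child_code is hypothetical and simply asks two questions in the way an interactive command might:

```python
import subprocess
import sys

# Hypothetical stand-in for an interactive command: it asks two questions
# and reports the answers it received.
child_code = (
    "a = input('Delete test1? ')\n"
    "b = input('Delete test2? ')\n"
    "print()\n"
    "print('answers:', a, b)\n"
)

# Each newline-terminated item in input answers one question, in order.
result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, input=b"y\nn\n")
print(result.stdout)
```

The captured output contains the prompts as well as the final report, because the child writes both to its stdout.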

This raises the issue of what the encoding of the input and output text is. By default all text input and output is performed using a byte sequence, something that is discussed in Programmer’s Python: Everything Is Data, ISBN: 978-1871962598. This is often all you need as most command line programs work in ASCII or the ASCII subset of UTF-8, depending on how you want to think of it. If a command line program goes beyond ASCII or uses a different encoding, then you can either build your own byte sequences that represent the text, or specify the encoding, work with Python strings and allow the system to deal with the encoding/decoding.

Essentially the default mode is to open the pipe that connects the two processes in binary mode. This means that no encoding/decoding is performed and the data is transported between the two processes without modification.

You can open the pipe in text mode and accept the default encoding/decoding by setting the text parameter to True. This means that the data is presented as Python strings, decoded using the locale's preferred encoding, which is often but not always UTF-8, and in addition line endings are all converted to \n. You can set the encoding in use with the encoding parameter, which also selects text mode. A common approach is to first use binary mode to discover exactly what codes are being transferred between the two processes and then change to text mode with the specific encoding.
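The difference between the modes can be sketched as follows. The child here is an illustrative Python one-liner, an assumption of this example, that writes the UTF-8 bytes for "café" directly to its stdout:

```python
import subprocess
import sys

# Illustrative child: writes the UTF-8 encoding of "café\n" as raw bytes.
cmd = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))"]

# Binary mode (the default): you see exactly the bytes that were sent.
raw = subprocess.run(cmd, capture_output=True).stdout
print(raw)

# Text mode with an explicit encoding: the bytes are decoded to a str.
text = subprocess.run(cmd, capture_output=True, encoding="utf-8").stdout
print(text)

# text=True uses the default encoding, which is fine for plain ASCII output.
plain = subprocess.run([sys.executable, "-c", "print('plain')"],
                       capture_output=True, text=True).stdout
print(repr(plain))
```

Inspecting raw first tells you which encoding to pass in the second call.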

Popen

The run command is a simplifying wrapper around the Popen class which gives you much more control over what happens, but at the expense of increased complexity. All of the parameters that we have looked at in using the run command can be used in Popen and they have the same meaning – as run passes them on to an internal call to Popen. The Popen class makes the use of pipes to connect the processes more obvious and direct manipulation of these pipes is the way it gains its flexibility. The fact that Popen is lower level means that there are many features which depend on the operating system you are using.

The most important thing to know about Popen is that it returns immediately and leaves the child process running – unlike run which waits for the child process to complete before returning. You can see that this is the case by comparing the output of:

import subprocess
subprocess.Popen(["dir","/"])
print("finished")

which usually displays finished followed by the directory listing, because Popen returns before the child has produced its output, and:

import subprocess
subprocess.run(["dir","/"])
print("finished")

which displays the directory listing and then finished.

The main process can check what the child process is doing with the Popen.poll() method, which returns None if the child process is still running and its return code if it has terminated.
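A polling loop might look something like the following sketch. The sleeping child and the 0.1 second polling interval are arbitrary choices made for illustration:

```python
import subprocess
import sys
import time

# Illustrative child: does nothing but sleep for half a second.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

polls = 0
while p.poll() is None:   # None means the child is still running
    polls += 1
    time.sleep(0.1)       # the main process could do useful work here

print("child exited with return code", p.returncode, "after", polls, "polls")
```

Once poll returns something other than None, the same value is also available in the returncode attribute.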

You can wait for a child process using Popen.wait(timeout=None), which waits for the process to terminate and returns its return code. If a timeout is specified and it expires first, a TimeoutExpired exception is raised. If a child process refuses to stop you can terminate it using Popen.terminate() or Popen.kill(). These work in the same way as for general processes, see Chapter 3.
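One possible way to combine wait with terminate and kill is sketched below. The timeouts are arbitrary and the long-sleeping child is only there to simulate a process that doesn't finish in time:

```python
import subprocess
import sys

# Illustrative child that would run for a minute if left alone.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

try:
    p.wait(timeout=0.5)          # give the child a chance to finish
except subprocess.TimeoutExpired:
    p.terminate()                # ask the child to stop
    try:
        p.wait(timeout=5)
    except subprocess.TimeoutExpired:
        p.kill()                 # force it to stop
        p.wait()

print("return code:", p.returncode)
```

The return code of a child stopped in this way is non-zero, and its exact value depends on the operating system.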

The fact that Popen returns at once means that the main process can get on with other tasks while the child process runs, but more specifically it allows the main process to interact with the child process. To do this the call to Popen has to specify that input and output from the child process are redirected to the main process via pipes. By default the I/O is not modified and connects to the console and keyboard with variable results depending on the system and the exact operating environment.

To specify where the standard I/O streams should be redirected you need to use the stdin, stdout and stderr parameters. By default these are set to None. If you want to redirect them to a pipe that can be used by the main process then use PIPE. If you want to simply discard the data use DEVNULL. A more complex option is to specify a valid file object that will be used to store the data for later processing. You can also redirect the stderr stream to stdout by setting the stderr parameter to STDOUT. PIPE, DEVNULL and STDOUT are constants defined in the subprocess module.
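The redirection options can be tried out with a sketch like the following, in which an illustrative child, an assumption of this example, writes one line to each of its output streams:

```python
import subprocess
import sys

# Illustrative child: one line to stdout and one line to stderr.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

# Capture stdout in a pipe and discard stderr.
with subprocess.Popen(child, stdout=subprocess.PIPE,
                      stderr=subprocess.DEVNULL) as p:
    out = p.stdout.read()      # read until the child closes its stdout
print(out)

# Merge stderr into the stdout pipe.
with subprocess.Popen(child, stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT) as p2:
    merged = p2.stdout.read()
print(merged)
```

Using Popen in a with statement closes the pipes and waits for the child when the block ends. The order of the two lines in merged isn't guaranteed as the streams are buffered differently.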

To understand how this all works you have to understand that a command line program generally writes all its output to the stdout stream and reads any data it needs from stdin. Any error reports are written to stderr. When these streams are redirected to pipes everything works in exactly the same way, but the main process now plays the role of the user. The output data will sit in the pipe until the main process reads it. The child process will also wait for the main process to write data to the pipe if it performs an input operation. What this means is that the main process can interact with the child process by reading data and responding to it by writing data to the stdin pipe.
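As a rough sketch of the main process playing the role of the user, the following starts a hypothetical child that converts each line it reads to upper case. It assumes that the child flushes its output after every line; a child that buffers its output would leave the main process waiting at readline:

```python
import subprocess
import sys

# Hypothetical child: echoes each input line in upper case, flushing so
# that the reply reaches the pipe at once.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n"
)

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     text=True)

p.stdin.write("hello\n")     # "type" a line to the child
p.stdin.flush()              # make sure it actually reaches the pipe
reply = p.stdout.readline()  # read the child's response
print(repr(reply))

p.stdin.close()              # end of input lets the child's loop finish
p.wait()
p.stdout.close()
```

Closing stdin is what tells this particular child that there is no more input.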

This sounds easy, but in practice it is generally much more difficult as humans are much more flexible in their interpretation of data than a program can easily mimic. Also notice that any program that doesn’t simply use stdin, stdout and stderr isn’t going to be controllable via Popen. In particular, any program that has a GUI is completely unsuitable.

The simplest, but not the most flexible, way to communicate with the child process is to use the communicate method:

Popen.communicate(input=None, timeout=None)

This sends the data assigned to input, if any, to stdin, reads stdout and stderr until they reach end-of-file and waits for the process to end. It returns a tuple, (stdout_data, stderr_data), in which an item is None if the corresponding stream wasn’t redirected to a pipe. If the timeout occurs first a TimeoutExpired exception is raised. In this sense using Popen with communicate doesn’t give you anything beyond the run function. For example, the dir program can be written to use Popen:

import subprocess
p = subprocess.Popen(["dir", "/"], stdout=subprocess.PIPE)
# stderr isn't redirected to a pipe, so stderr_data is None
stdout_data, stderr_data = p.communicate()
print(stdout_data)
print("finished")

and it works in much the same way.
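A sketch of communicate that uses both input and timeout is given below. The child is an illustrative Python one-liner and the ten-second timeout is an arbitrary choice:

```python
import subprocess
import sys

# Illustrative child: reads one line and prints it in upper case.
p = subprocess.Popen([sys.executable, "-c", "print(input().upper())"],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
try:
    out, err = p.communicate(input=b"hello\n", timeout=10)
except subprocess.TimeoutExpired:
    p.kill()                     # the child is taking too long
    out, err = p.communicate()   # collect whatever output there was

print(out, p.returncode)
```

If the timeout expires the child isn't stopped automatically, which is why the handler kills it before calling communicate again.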



Last Updated ( Monday, 24 June 2024 )