Programmer's Python Async - Process-Based Parallelism
Written by Mike James
Monday, 28 November 2022
Page 3 of 3
Waiting for the First to Complete

The above example waits for p1, p2 and p3 to finish. What is more difficult is to wait until one of the processes is complete, i.e. wait for p1 or p2 or p3, whichever completes first. The easiest way to do this is to make use of the Connection object, which is introduced later as a way of communicating between processes. The technique relies on the sentinel attribute, which returns a handle to a system object that becomes "ready" when the process is complete. This is a low-level feature whose implementation depends on the operating system. The good news is that at the Python level it works in the same way under Linux and Windows. The multiprocessing.connection.wait function will wait on a list of sentinel handles until one of them becomes "ready". It returns a list of the sentinel handles that have become ready, for example:

```python
import multiprocessing
import multiprocessing.connection
import random
import time

def myProcess():
    time.sleep(random.randrange(1, 4))

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=myProcess)
    p2 = multiprocessing.Process(target=myProcess)
    p3 = multiprocessing.Process(target=myProcess)
    p1.start()
    p2.start()
    p3.start()
    waitList = [p1.sentinel, p2.sentinel, p3.sentinel]
    res = multiprocessing.connection.wait(waitList)
    print(res)
    print(waitList.index(res[0]) + 1)
```

The first part of the program simply creates three processes which sleep for random times, to serve as an example of waiting for the first process to complete. The final part of the program builds a list of sentinel values, one per process. Then we use the wait function to suspend the parent thread until one of the child processes completes. The return value is a list of the sentinel values that are "ready", and these values are easily converted into the numbers of the processes that have finished. Notice that the program only takes the first sentinel value in the list. In practice you might want to process them all.
Also, as all the processes in this example are non-daemon, they all run to completion even after the main process ends. As the set of sentinel values only has to be an iterable (refer to Programmer's Python: Everything Is An Object, ISBN: 978-1871962741 if you are not familiar with this distinction), you could write it as:

```python
waitDict = {p1.sentinel: p1,
            p2.sentinel: p2,
            p3.sentinel: p3}
res = multiprocessing.connection.wait(waitDict)
```

This has the advantage of making the Process object corresponding to the process that finished first easier to find, i.e. waitDict[res[0]] is that Process object.

Computing Pi

As a simple example, suppose you want to compute the mathematical constant pi to a few digits using the well-known series:

pi = 4*(1 - 1/3 + 1/5 - 1/7 ... )

This is very easy to implement, as we only need to generate the odd denominators, but to get pi to a reasonable number of digits you have to compute a lot of terms. In other words, this series is very slow to converge. The simple-minded synchronous approach is to write something like:

```python
def myPi(m, n):
    pi = 0
    for k in range(m, n + 1):
        s = 1 if k % 2 else -1
        pi += s / (2 * k - 1)
    print(4 * pi)
```

This computes the series from the mth to the nth term. The reason for this elaboration is that it allows us to compute different parts of the series in different processes. Of course, myPi(1, N) computes the full series up to the Nth term. If you try this out:

```python
if __name__ == '__main__':
    N = 10000000
    t1 = time.perf_counter()
    myPi(1, N)
    t2 = time.perf_counter()
    print((t2 - t1) * 1000)
```

you will find that it takes about 1700 ms to compute pi to five digits on a medium-speed Windows PC and 4500 ms on a four-core Raspberry Pi 4. We can easily modify the calculation by splitting the sum into two portions and using a separate process for one half of the sum:

```python
if __name__ == '__main__':
    N = 10000000
    p1 = multiprocessing.Process(target=myPi,
```

Running this reduces the time to 1200 ms on the PC and 2500 ms on the Pi 4. If you try these programs out using an IDE or a debugger then you may well discover that there is no significant speed gain.
As before, this is because of the way programs are run under the debugger; again, to appreciate the speed increase try running them from the command line. Notice that the integer arithmetic in the computation uses Python's unlimited precision, Bignum, arithmetic, so the number of terms is unlimited, although the floating point sum itself is limited to roughly 16 significant digits. To know more about Python's novel approach to large numbers see Chapter 2 of Programmer's Python: Everything Is Data, ISBN: 978-1871962598. Thus far we haven't explored any way that data can be exchanged between processes, so our only option is to print the results from each one. The subject of sharing data between isolated processes is a complicated one and is postponed until Chapter 7. Processes may be isolated from one another, but they do share a single terminal instance, so each print sends its data to the same output. Increasing the number of processes to four on the Pi decreases the time to 1600 ms, which demonstrates the diminishing returns of using parallelism. The complete program is:

```python
import time
import multiprocessing

def myPi(m, n):
    pi = 0
    for k in range(m, n + 1):
        s = 1 if k % 2 else -1
        pi += s / (2 * k - 1)
    print(4 * pi)

if __name__ == '__main__':
    N = 10000000
    p1 = multiprocessing.Process(target=myPi,
```

However, if you increase N to 100000000 the single-process version takes 45 s and the four-process version takes just 12 s. For longer-running processes the initial overheads matter less.

The rest of the chapter is not included in this extract.
Last Updated (Wednesday, 30 November 2022)