Programmer's Python Async - Process-Based Parallelism
Written by Mike James   
Monday, 28 November 2022
Waiting for the First to Complete

The above example waits for p1, p2 and p3 to finish. What is more difficult is to wait until one of the processes is complete, i.e. wait for p1 or p2 or p3, whichever completes first. The easiest way to do this is to make use of the Connection object, which is introduced later as a way of communicating between processes. The technique relies on the sentinel attribute, which returns a handle to a system object that becomes “ready” when the process is complete. This is a low-level feature whose implementation depends on the operating system. The good news is that at the Python level it works in the same way under Linux and Windows.

The multiprocessing.connection.wait function will wait on a list of sentinel handles until one of them becomes “ready”. It returns a list of sentinel handles that have become ready, for example:

import multiprocessing
import multiprocessing.connection
import random
import time
def myProcess():
    time.sleep(random.randrange(1,4))
if __name__ == '__main__':
    p1=multiprocessing.Process(target=myProcess)
    p2=multiprocessing.Process(target=myProcess)
    p3=multiprocessing.Process(target=myProcess)
    p1.start()
    p2.start()
    p3.start()
    waitList= [p1.sentinel,p2.sentinel,p3.sentinel]
    res=multiprocessing.connection.wait(waitList)
    print(res) 
    print(waitList.index(res[0])+1)

The first part of the program simply creates three processes that sleep for random times, which serves as an example of waiting for the first process to complete. The final part of the program builds a list of sentinel values, one per process. Then we use the wait function to suspend the parent process until one of the child processes completes. The return value is a list of the sentinel values that are “ready” and these values are easily converted into the numbers of the processes that have finished. Notice that the program only takes the first sentinel value in the list; in practice you might want to process them all. Also, as all the processes in this example are non-daemon, they all run to completion even after the main process ends.

As the set of sentinel values only has to be an iterable (refer to Programmer’s Python: Everything Is An Object, ISBN: 978-1871962741 if you are not familiar with this distinction), you could equally well write it as:

waitDict= {p1.sentinel:p1, p2.sentinel:p2,
                           p3.sentinel:p3}
res=multiprocessing.connection.wait(waitDict)
print(waitDict[res[0]])

This has the advantage of making the Process object corresponding to the process that finished first easier to find, i.e. waitDict[res[0]] is the process object.

Computing Pi

As a simple example, suppose you want to compute the mathematical constant pi to a few digits using the well-known formula:

pi=4*(1-1/3+1/5-1/7 ... )

This is very easy to implement: we just need to generate the odd integers. However, to get pi to a reasonable number of digits you have to compute a lot of terms; in other words, this series is very slow to converge. The simple-minded synchronous approach is to write something like:

def myPi(m,n):
    pi=0
    for k in range(m,n+1):
        s= 1 if k%2 else -1 
        pi += s / (2 * k - 1)
    print(4*pi)

This computes the series from the mth to the nth term. The reason for this elaboration is that it allows us to compute different parts of the series in different processes. Of course,

myPi(1,N)

computes the full series up to the Nth term.
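The claim that the split ranges add up to the full series can be checked directly. The following partialPi is a hypothetical return-based variant of myPi, introduced here only for the check and not part of the original program:

```python
# partialPi returns the partial sum instead of printing it,
# which makes the partition easy to verify
def partialPi(m, n):
    total = 0
    for k in range(m, n + 1):
        s = 1 if k % 2 else -1
        total += s / (2 * k - 1)
    return total

N = 1000
whole = partialPi(1, N)
# the same range split into two parts gives the same sum
parts = partialPi(1, N // 2) + partialPi(N // 2 + 1, N)
print(abs(whole - parts) < 1e-12)   # the partition is exact up to rounding
print(4 * whole)                    # roughly 3.14 for N=1000
```

This is why each process can be given its own sub-range and the printed partial results combined by hand.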

If you try this out:

if __name__ == '__main__':
    N=10000000
    t1=time.perf_counter()
    myPi(1,N)
    t2=time.perf_counter()
    print((t2-t1)*1000)

You will find that it takes about 1700 ms to compute pi to five digits on a medium-speed Windows PC and 4500 ms on a four-core Raspberry Pi 4.

We can easily modify the calculation by splitting the sum into two portions and using a separate process for one half of the sum:

if __name__ == '__main__':
    N=10000000
    p1=multiprocessing.Process(target=myPi,
                               args=(N//2+1,N))
    t1=time.perf_counter()
    p1.start()
    myPi(1,N//2)
    p1.join()
    t2=time.perf_counter()
    print((t2-t1)*1000)

Running this reduces the time to 1200 ms on the PC and 2500 ms on the Pi 4.

If you try these programs out using an IDE or a debugger then you may well discover that there is no significant speed gain. As before, this is because of the way programs are run under the debugger. To appreciate the speed increase, try running them from the command line.

Notice that although Python’s integers have unlimited precision, Bignum arithmetic, the division in each term produces a standard float, so this program can only give pi to the usual 15 or so significant digits. To use this very slowly converging series to compute more decimal places you would need something like the decimal or fractions modules. To know more about Python’s novel approach to large numbers see Chapter 2 of Programmer’s Python: Everything Is Data, ISBN: 978-1871962598.

Thus far we haven’t explored any way that data can be exchanged between processes, so our only option is to print the results from each one. The subject of sharing data between isolated processes is a complicated one and is postponed until Chapter 7. Processes may be isolated from one another, but they do share a single terminal and so print sends the data from each of them to the same output.

Increasing the number of processes to four on the Pi decreases the time to 1600 ms, which demonstrates the diminishing returns of using parallelism. The complete program is:

import time
import multiprocessing
def myPi(m,n):
    pi=0
    for k in range(m,n+1):
        s= 1 if k%2 else -1 
        pi += s / (2 * k - 1)
    print(4*pi)    
    
if __name__ == '__main__':
    N=10000000
    p1=multiprocessing.Process(target=myPi,
                               args=(N//4+1,N//4*2))
    p2=multiprocessing.Process(target=myPi,
                               args=(N//4*2+1,N//4*3))
    p3=multiprocessing.Process(target=myPi,
                               args=(N//4*3+1,N))
    t1=time.perf_counter()
    p1.start()
    p2.start()
    p3.start()
    myPi(1,N//4)
    p1.join()
    p2.join()
    p3.join()
    t2=time.perf_counter()
    print((t2-t1)*1000)

However, if you increase N to 100000000 the single-process version takes 45 s and the four-process version takes just 12 s. For longer running processes the initial overheads matter less.

In the chapter but not included in this extract:

  • Fork v Spawn
  • Forkserver

Summary

  • A process runs as a program in its own right, isolated from other processes.

  • Using multiple processes running on multiple cores can speed up a program.

  • The Process class can be used to create a child process and run a target function.

  • It is best practice to follow the three rules:

      1. Always use if __name__ == '__main__': to ensure that the setup code doesn’t get run in the child process.

      2. Do not define or modify any global resources in the setup code as these will not always be available in the child process.

      3. Prefer to use parameters to pass initial data to the child process rather than global constants.

  • Daemon processes, i.e. daemon=True, stop running when their parent ends. Non-daemon processes, the default, continue to run until they complete.

  • You can use join() to wait for a process to finish.

  • multiprocessing.connection.wait can be used to wait for the first process to finish.

  • Python uses three different ways to start new processes – fork, spawn and forkserver.

  • Fork is the default on Linux and works by cloning the parent process and starting the target function running.

  • Spawn is the default on Windows and works by creating a new process, loading Python into it and loading and running the parent program, before starting the target.

  • Forkserver creates a server loaded with Python and any modules that the child process needs. When a child process is created the server is forked and then the parent program is loaded and run before the target is started.

  • You can set the start method to be used to one of the supported methods on a particular operating system.

 

This article is an extract from the book Programmer's Python: Async (Threads, processes, asyncio & more) by Mike James, now available as a print book from Amazon.
Last Updated ( Wednesday, 30 November 2022 )