The process pool is the way to make your Python program run faster, even if it is CPU-bound. Find out how to use the pool in this extract from Programmer's Python: Async.
Programmer's Python: Async Threads, processes, asyncio & more
Is now available as a print book: Amazon
Contents
1) A Lightning Tour of Python
2) Asynchronous Explained
3) Process-Based Parallelism
       Extract 1 - Process-Based Parallelism
4) Threads
       Extract 1 - Threads
5) Locks and Deadlock
6) Synchronization
7) Sharing Data
       Extract 1 - Pipes & Queues
8) The Process Pool
       Extract 1 - The Process Pool
9) Process Managers
10) Subprocesses ***NEW!
11) Futures
       Extract 1 - Futures
12) Basic Asyncio
       Extract 1 - Basic Asyncio
13) Using asyncio
       Extract 1 - Asyncio Web Client
14) The Low-Level API
       Extract 1 - Streams & Web Clients
Appendix I Python in Visual Studio Code
There are overheads in creating processes and threads, and one strategy to reduce the cost of creating them is to use a pool of pre-created items. In most other languages it is the idea of a “thread pool” which is important, but in Python the GIL acts as a deterrent to using many threads. As only one thread can be running Python code at any given time, there isn’t a huge advantage in splitting a program into multiple threads. While Python does support a thread pool class, ThreadPool in multiprocessing.pool, it isn’t much used, and the newer and more widely used thread pool features in concurrent.futures are described in Chapter 11.
As processes provide a way to improve performance, it is the process pool which is more important. You can create a pool of processes ready to perform jobs which you can submit later, using:
multiprocessing.pool.Pool(processes, initializer, initargs,
                          maxtasksperchild, context)
All of the parameters are optional and often all you need to do is specify the number of processes to create. Notice that, unlike a typical thread pool, this is not a system process pool that you can assume is already constructed. The processes are created for you when you use the Pool constructor, and they are destroyed when the pool is closed or, if you set maxtasksperchild, replaced by fresh processes after completing that many jobs.
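As a minimal sketch of the basic pattern, with a trivial worker function, square, invented for illustration, the following creates a pool of four processes, each replaced after ten jobs, and uses Pool’s map method to submit jobs to it. The close and join calls are explained below:

import multiprocessing.pool

def square(x):
    return x * x

if __name__ == "__main__":
    # four pre-created worker processes, each replaced
    # after it has completed ten jobs
    myPool = multiprocessing.pool.Pool(4, maxtasksperchild=10)
    print(myPool.map(square, range(8)))
    # displays [0, 1, 4, 9, 16, 25, 36, 49]
    myPool.close()
    myPool.join()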
It isn’t a good idea to keep a process around for too long and give it lots of jobs to do because processes tend to accumulate resources which are only freed when the process ends. A good balance of re-use and re-creation of processes is desirable. If you don’t specify the number of processes to create then the number of CPUs as reported by os.cpu_count is used. This makes sense for CPU-bound processes, but is less suitable if the processes perform I/O that they have to wait for.
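You can check what that default would be on your machine:

import os
# the number of processes Pool creates if you don't specify one
print(os.cpu_count())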
You can also specify an initializer function, which will be called with initargs as its arguments when each process is started. Notice that this only happens once per process, even if the process is reused for multiple jobs.
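For example, in this sketch, with made-up init and job functions, the message is printed once by each of the two worker processes as it starts, not once for each of the six jobs:

import multiprocessing.pool
import os

def init(tag):
    # runs once in each worker process when it starts,
    # however many jobs the process goes on to perform
    print(tag, "starting in process", os.getpid())

def job(x):
    return x + 1

if __name__ == "__main__":
    myPool = multiprocessing.pool.Pool(2, initializer=init,
                                          initargs=("worker",))
    print(myPool.map(job, range(6)))
    myPool.close()
    myPool.join()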
As Pool sets up global resources, you cannot simply allow Python to clean up automatically when the object is garbage collected – you need to explicitly use its close or terminate method to free the resources. The difference between the two is that close allows the processes to finish the jobs they are working on before ending them, whereas terminate stops the processes immediately. After calling close or terminate you can call join to wait for all of the processes to finish. Notice that it doesn’t make sense to call join if you haven’t used close or terminate, as the processes don’t necessarily end when their jobs are complete.
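A sketch of the orderly shutdown pattern, again with a made-up worker function:

import multiprocessing.pool

def job(x):
    return x * 10

if __name__ == "__main__":
    myPool = multiprocessing.pool.Pool(2)
    results = myPool.map(job, range(4))
    myPool.close()   # accept no new jobs, let current work finish
    myPool.join()    # block until all of the processes have ended
    print(results)   # displays [0, 10, 20, 30]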
The safest way to use Pool is as a context manager in a with statement:
with multiprocessing.pool.Pool(2) as myPool:
    # use myPool
    pass
Notice that this calls the terminate method, not close, when the with ends, which means you need to make sure that everything has completed before leaving the with.
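For example, in this sketch, with a made-up job function, calling get on the result of apply_async blocks until the job is finished, so the work is complete before the with ends and terminate is called:

import multiprocessing.pool

def job(x):
    return x * 2

if __name__ == "__main__":
    with multiprocessing.pool.Pool(2) as myPool:
        result = myPool.apply_async(job, (21,))
        # get blocks until the job is done, so nothing is
        # still running when the with block calls terminate
        print(result.get())   # displays 42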