Programmer's Python Async - Streams & Web Clients
Written by Mike James   
Monday, 07 November 2022
The Response

When the server receives the GET request it finds the specified file and sends it to the client using the same socket connection. The first part of the message sent to the client is a set of headers which we need to read and process. The first line of any response is the status line, which for a successful request is:

HTTP/1.1 200 OK\r\n

which gives the HTTP version and the status code, which we can assume is going to be 200, i.e. no error. If you want to write a complete client you need to extract the status code and react to it. In our simple demonstration we can read it and ignore it:

    headers=""
    line = await reader.readline()
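
If you do want to react to the status code, the status line is easy to split apart. The following is a minimal sketch rather than part of the original program; the function name parse_status_line is hypothetical and it assumes the server sends a well-formed status line:

```python
def parse_status_line(line: bytes) -> tuple[str, int, str]:
    """Split a status line such as b'HTTP/1.1 200 OK\\r\\n' into its parts."""
    text = line.decode("ascii").rstrip("\r\n")
    # The reason phrase may contain spaces or be missing, so split at most twice.
    parts = text.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


version, code, reason = parse_status_line(b"HTTP/1.1 404 Not Found\r\n")
if code != 200:
    print(f"Request failed: {code} {reason}")
```

In the download coroutine you could pass the result of the first readline to a function like this and raise an exception for anything other than 200.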

Next we need to read the headers that the server has sent. These arrive one to a line and the end is marked by a blank line, just like the headers we sent to the server:

    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line=="\r\n":
            break
        headers+=line

This loop reads each line in turn, converts it to a Python string using ASCII encoding and builds up a complete string of headers. The loop ends when we read a blank line.
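
You can try the header loop without a network connection. The sketch below is an illustration under some assumptions: read_headers and demo are hypothetical names, and a StreamReader fed by hand stands in for a real connection. It also stops if readline returns an empty result, which is what happens if the server closes the connection before sending the blank line:

```python
import asyncio


async def read_headers(reader: asyncio.StreamReader) -> str:
    """Collect header lines up to, but not including, the blank line."""
    headers = ""
    while True:
        line = (await reader.readline()).decode("ascii")
        # "\r\n" is the blank line; "" means the connection closed early.
        if line in ("\r\n", ""):
            break
        headers += line
    return headers


async def demo() -> str:
    # Create the reader inside a coroutine so that it uses the running loop.
    reader = asyncio.StreamReader()
    reader.feed_data(
        b"Content-Type: text/html; charset=UTF-8\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello"
    )
    reader.feed_eof()
    return await read_headers(reader)


print(asyncio.run(demo()))
```

The body, hello, is left unread in the stream, ready for the next step.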

We need to process the headers because the Content-Length header tells us how many bytes to read to get the content, i.e. the HTML that makes up the page. We need this because we cannot read data expecting an EOF signal, because there isn’t one. The socket stays open in case you have another request to send to the server. If you do wait for an EOF then you will usually wait a long time before the server times out.

We need to read the Content-Length header to find the number of bytes to read. We could use some simple string manipulation to extract the header we want, but there is a standard way to parse HTTP headers, even if it is obscure because it is part of the email module. It turns out that email messages use the same header format as HTTP and hence you can use email.message_from_string to parse HTTP headers:

def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())

This utility function returns all of the headers as a dictionary keyed on the header names with values of the strings they are set to. Now we can use this to get the Content-Length header:

    headers = parseHeaders(headers)
    length = int(headers["Content-Length"])
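
One thing to keep in mind is that HTTP header names are not case-sensitive, but the keys of a Python dictionary are. The following sketch, which uses made-up header values, shows that a lookup in the dictionary can miss a header that the Message object returned by email.message_from_string would find:

```python
import email

# Example headers, with the length header deliberately in lower case.
raw = (
    "Content-Type: text/html; charset=UTF-8\r\n"
    "content-length: 1256\r\n"
)
message = email.message_from_string(raw)

# A plain dict is case-sensitive, so this lookup misses the lowercase header.
as_dict = dict(message.items())
print("Content-Length" in as_dict)

# The Message object matches header names without regard to case.
length = int(message.get("Content-Length", "0"))
print(length)
```

If you expect to meet servers that send lowercase header names, returning the Message object itself is a possible alternative to converting it to a dictionary.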

As we now know the number of bytes to read, the rest of the procedure is simple:

    # readexactly waits for all length bytes; read(length) may return fewer
    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line
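
The difference between read(n) and readexactly(n) matters when the body arrives in more than one packet: read(n) returns as soon as some data is available, up to n bytes, whereas readexactly(n) waits until it has all n. The sketch below is an illustration only; it simulates the packets by feeding a StreamReader by hand, and the name demo is hypothetical:

```python
import asyncio


async def demo():
    # Only the first packet has arrived when read is called.
    reader = asyncio.StreamReader()
    reader.feed_data(b"hello")
    partial = await reader.read(10)  # returns the 5 bytes that are buffered

    # Same situation, but the second packet arrives a little later.
    reader2 = asyncio.StreamReader()
    reader2.feed_data(b"hello")
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, reader2.feed_data, b"world")
    full = await reader2.readexactly(10)  # waits for all 10 bytes
    return partial, full


partial, full = asyncio.run(demo())
print(partial, full)
```

If the connection closes before n bytes have arrived, readexactly raises asyncio.IncompleteReadError, which a complete client would need to handle.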

This time we decode the content using utf8 because this is what most modern web pages use for their content. To check, we should decode the Content-Type header which in this case reads:

Content-Type: text/html; charset=UTF-8

So the content is HTML and it is UTF-8 encoded.
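
The email module can also extract the character set for you. This is a minimal sketch assuming that the headers are available as a single string; decode_body is a hypothetical helper and the fallback to utf-8 is an assumption, not something the HTTP standard guarantees:

```python
import email


def decode_body(body: bytes, headers: str) -> str:
    message = email.message_from_string(headers)
    # get_content_charset returns the charset parameter in lower case,
    # or the fallback given here if the header does not name one.
    charset = message.get_content_charset("utf-8")
    return body.decode(charset)


headers = "Content-Type: text/html; charset=UTF-8\r\n"
print(decode_body("<p>café</p>".encode("utf-8"), headers))
```

If the server names a character set that Python doesn't know about, decode raises a LookupError, so a robust client would need to catch this.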

To demonstrate all of this we need a coroutine to start things off:

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        download('http://www.example.com/'),
        download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())

This creates two tasks to download the same page, starts them both off asynchronously and waits for them to complete. Whenever one of the tasks has to wait for data to be available it releases the main thread and the other gets a chance to run and so on. As a result main mostly has little to do and you can increase the number of downloads without increasing the time it takes by much. For example, adding an additional download on a test machine to the asynchronous program increases the time it takes by about 30 ms, whereas for a synchronous program it adds 220 ms. This means that downloading 100 pages takes about 3 seconds asynchronously, but 21 seconds doing the job synchronously.
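
You can see the same effect without using the network at all. In the sketch below asyncio.sleep stands in for the time spent waiting for the server, so the timings it prints are simulated and are not the measurements quoted above. The names fake_download, sequential, concurrent and compare are hypothetical:

```python
import asyncio
import time


async def fake_download(delay: float) -> str:
    # Waiting on sleep releases the thread, just as waiting for data does.
    await asyncio.sleep(delay)
    return "done"


async def sequential(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        await fake_download(delay)  # each wait starts after the previous one ends
    return time.perf_counter() - start


async def concurrent(n: int, delay: float) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_download(delay) for _ in range(n)))  # waits overlap
    return time.perf_counter() - start


async def compare(n: int = 5, delay: float = 0.1):
    return await sequential(n, delay), await concurrent(n, delay)


seq, con = asyncio.run(compare())
print(f"sequential: {seq*1000:.0f} ms, concurrent: {con*1000:.0f} ms")
```

The sequential version takes roughly n times the delay, whereas the concurrent version takes roughly one delay however many downloads you add.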

The complete program is:

import asyncio
import urllib.parse
import time
import email

def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())

async def download(url):
    url = urllib.parse.urlsplit(url)
    reader, writer = await asyncio.open_connection(
        url.hostname, 443, ssl=True)
    request = (
        f"GET /index.html HTTP/1.1\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )

    writer.write(request.encode('ascii'))
    headers = ""
    # read and ignore the status line
    line = await reader.readline()
    # collect the headers up to the blank line
    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line == "\r\n":
            break
        headers += line
    headers = parseHeaders(headers)
    length = int(headers["Content-Length"])
    # readexactly waits for all length bytes; read(length) may return fewer
    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        download('http://www.example.com/'),
        download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())

In chapter but not in this extract

  • Server
  • A Web Server
  • SSL Server
  •  Using Streams
  • Converting Blocking To Non-blocking
  • Running in Threads
  • Why Not Just Use Threads?
  • CPU-Bound Tasks
  • Asyncio-Based Modules
  • Working With Other Event Loops – Tkinter
  • Subprocesses

 Summary

  • The asyncio module makes network connections easy and asynchronous.

  • Network communication is via streams – StreamReader and StreamWriter – which work like more sophisticated Pipes.

  • Implementing a web client is easy, but there is no high-level function which downloads an HTML page. You have to work with the HTTP protocol.

  • The email module has many useful functions for working with HTTP.

  • Creating an SSL client is a matter of changing a single line in the program.

  • Creating a web server is only slightly more difficult in that you have to support multiple potential clients.

  • Converting the server to SSL requires the generation and installation of a certificate.

  • You can use raw sockets which do not support streams. The only reason for doing this is to implement a custom protocol.

  • To convert a blocking synchronous function into a non-blocking asynchronous function all you have to do is run it on another thread and release the original thread to service the event loop.

  • The asyncio module provides a function that allows you to run a function on another thread asynchronously.

  • You can use additional threads to run CPU-bound functions asynchronously.

  • There are additional modules that provide asynchronous versions of standard operations, usually by running them on an additional thread.

  • A particular problem is coexisting with modules that implement their own event loop such as tkinter. There are two approaches – to find an update function which can be called from an asyncio event loop or to use a separate thread to run each event loop.

  • The asyncio module provides a very easy way to run subprocesses without having to worry about blocking the thread or dealing with buffers.
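
As an illustration of the points about running a blocking function on another thread, here is a minimal sketch using asyncio.to_thread, which is available from Python 3.9. The function blocking_task is a hypothetical stand-in for any blocking call and isn't taken from the chapter:

```python
import asyncio
import time


def blocking_task(seconds: float) -> str:
    # time.sleep blocks the thread it runs on, unlike asyncio.sleep.
    time.sleep(seconds)
    return f"slept {seconds}s"


async def main():
    # Each call runs on a worker thread, leaving the event loop free.
    return await asyncio.gather(
        asyncio.to_thread(blocking_task, 0.1),
        asyncio.to_thread(blocking_task, 0.1),
    )


results = asyncio.run(main())
print(results)
```

Because the two calls run on separate threads their waits overlap, so the whole program takes about 0.1 seconds rather than 0.2.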

Programmer's Python:
Async
Threads, processes, asyncio & more

Is now available as a print book: Amazon

Contents

1)  A Lightning Tour of Python

Python's Origins, Basic Python, Data Structures, Control Structures – Loops, Space Matters, Conditionals and Indenting, Pattern Matching, Everything Is An Object – References, Functions, Objects and Classes, Inheritance, Main and Modules, IDEs for Python, Pythonic – The Meta Philosophy, Where Next, Summary.

2) Asynchronous Explained

A Single Thread, Processes, I/O-Bound and CPU-Bound, Threads, Locking, Deadlock, Processes with Multiple Threads, Single-Threaded Async, Events, Events or Threads, Callback Hell, More Than One CPU – Concurrency, Summary.

3) Process-Based Parallelism

        Extract 1 - Process-Based Parallelism

The Process Class, Daemon, Waiting for Processes, Waiting for the First to Complete, Computing Pi, Fork v Spawn, Forkserve, Controlling Start Method, Summary.

4) Threads

The Thread Class, Threads and the GIL, Threading Utilities, Daemon Threads, Waiting for a Thread, Local Variables, Thread Local Storage, Computing Pi with Multiple Threads, I/O-Bound Threads, Sleep(0), Timer Object, Summary.

5) Locks and Deadlock

Race Conditions, Hardware Problem or Heisenbug, Locks, Locks and Processes, Deadlock, Context Managed Locks, Recursive Lock, Semaphore, Atomic Operations, Atomic CPython, Lock-Free Code, Computing Pi Using Locks, Summary.

 6) Synchronization

Join, First To Finish, Events, Barrier, Condition Object, The Universal Condition Object, Summary.

 7) Sharing Data

The Queue, Pipes, Queues for Threads, Shared Memory, Shared ctypes, Raw Shared Memory, Shared Memory, Manager, Computing Pi, Summary.

8) The Process Pool

Waiting for Pool Processes, Computing Pi using AsyncResult, Map_async, Starmap_async, Immediate Results – imap, MapReduce, Sharing and Locking, Summary.

9) Process Managers

The SyncManager, How Proxies Work, Locking, Computing Pi with a Manager, Custom Managers, A Custom Data Type, The BaseProxy, A Property Proxy, Remote Managers, A Remote Procedure Call, Final Thoughts, Summary.

10) Subprocesses

Running a program, Input/Output, Popen, Interaction, Non-Blocking Read Pipe, Using subprocess, Summary.

11) Futures

Futures, Executors, I/O-Bound Example, Waiting On Futures, Future Done Callbacks, Dealing With Exceptions, Locking and Sharing Data, Locking and Process Parameters, Using initializer to Create Shared Globals, Using a Process Manager to Share Resources, Sharing Futures and Deadlock, Computing Pi with Futures, Process Pool or Concurrent Futures, Summary.

12) Basic Asyncio

        Extract 1 Basic Asyncio

Callbacks, Futures and Await, Coroutines, Await, Awaiting Sleep, Tasks, Execution Order, Tasks and Futures, Waiting On Coroutines, Sequential and Concurrent, Canceling Tasks, Dealing With Exceptions, Shared Variables and Locks, Context Variables, Queues, Summary.

13) Using asyncio

Streams, Downloading a Web Page, Server, A Web Server, SSL Server, Using Streams, Converting Blocking To Non-blocking, Running in Threads, Why Not Just Use Threads, CPU-Bound Tasks, Asyncio-Based Modules, Working With Other Event Loops – Tkinter, Subprocesses, Summary.

14) The Low-Level API

    Extract 1 - Streams & Web Clients

The Event Loop, Using the Loop, Executing Tasks in Processes, Computing Pi With asyncio, Network Functions, Transports and Protocols, A UDP Server, A UDP Client, Broadcast UDP, Sockets, Event Loop Implementation, What Makes a Good Async Operation, Summary.

Appendix I Python in Visual Studio Code

 




Last Updated ( Tuesday, 08 November 2022 )