Programmer's Python Async - Streams & Web Clients
Written by Mike James   
Monday, 07 November 2022
The Response

When the server receives the GET request, it finds the specified file and sends it back to the client over the same socket connection. The first part of the message sent to the client is a set of headers, which we need to read and process. The first line of the response is the status line, which for a successful request is:

HTTP/1.1 200 OK\r\n

which gives the HTTP version and the status code. We can assume the code is going to be 200, i.e. no error. A complete client would need to extract the status code and react to it, but in our simple demonstration we can read the line and ignore it:

    headers = ""
    line = await reader.readline()   # status line, read and ignored
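For a client that does react to the status, a minimal sketch is to split the raw bytes returned by readline into the HTTP version, the numeric code and the reason phrase. The helpers parse_status_line and check_status are hypothetical names, and treating only 2xx codes as success is an assumption; a real client would also have to follow 3xx redirects.

```python
def parse_status_line(raw: bytes):
    """Hypothetical helper: split a raw status line into its three parts."""
    text = raw.decode('ascii').rstrip('\r\n')
    # The reason phrase may itself contain spaces, so split at most twice
    parts = text.split(' ', 2)
    if len(parts) < 2 or not parts[0].startswith('HTTP/'):
        raise ValueError(f"not an HTTP status line: {text!r}")
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) == 3 else ''
    return version, code, reason


def check_status(raw: bytes) -> int:
    """Hypothetical helper: return the code, raising unless it is 2xx (an assumption)."""
    version, code, reason = parse_status_line(raw)
    if not 200 <= code < 300:
        raise RuntimeError(f"server returned {code} {reason}")
    return code


print(parse_status_line(b"HTTP/1.1 200 OK\r\n"))         # ('HTTP/1.1', 200, 'OK')
print(parse_status_line(b"HTTP/1.1 404 Not Found\r\n"))  # ('HTTP/1.1', 404, 'Not Found')
```

In download, the ignored line could then be checked with check_status(line) straight after the first readline.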

Next we need to read the headers that the server has sent. These arrive one to a line, and the end is marked by a blank line, just like the headers we sent to the server:

    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line == "\r\n":
            break
        headers += line

This loop reads each line in turn, converts it to a Python string using ASCII encoding and builds up a complete string of headers. The loop ends when we read a blank line.
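The loop can be tried without a network connection by wrapping it in a coroutine and driving it from an asyncio.StreamReader filled by hand with feed_data. Note that feed_data is normally called by asyncio's transport machinery rather than by user code, so it is used here purely as a testing convenience. The read_headers name and the canned response bytes are made up for the example. The sketch also stops at EOF, signalled by readline returning an empty bytes object, which the loop above does not check for: if the connection closed before the blank line, that loop would spin forever.

```python
import asyncio

# A made-up response: status line, two headers, the blank line, then the body
RESPONSE = (b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/html; charset=UTF-8\r\n"
            b"Content-Length: 5\r\n"
            b"\r\n"
            b"hello")


async def read_headers(reader):
    """Hypothetical helper: skip the status line, then collect header lines."""
    headers = ""
    await reader.readline()              # status line, ignored as in the text
    while True:
        line = (await reader.readline()).decode('ascii')
        # A blank line ends the headers; an empty string means EOF
        if line == "\r\n" or line == "":
            break
        headers += line
    return headers


async def complete():
    reader = asyncio.StreamReader()      # created inside the running event loop
    reader.feed_data(RESPONSE)           # simulate bytes arriving from the server
    headers = await read_headers(reader)
    body = await reader.read(5)          # what follows the blank line is the body
    return headers, body


async def truncated():
    reader = asyncio.StreamReader()
    reader.feed_data(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n")
    reader.feed_eof()                    # connection closed before the blank line
    return await read_headers(reader)


headers, body = asyncio.run(complete())
partial_headers = asyncio.run(truncated())
print(repr(headers))
print(body)
print(repr(partial_headers))
```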

We need to process the headers because the Content-Length header tells us how many bytes to read to get the content, i.e. the HTML that makes up the page. We cannot simply read until we get an EOF signal, because there isn't one: the socket stays open in case you have another request to send to the server. If you do wait for an EOF, you will usually wait a long time, until the server times out and closes the connection.
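The difference between reading to EOF and reading a known length can be simulated with hand-filled asyncio.StreamReader objects. Using feed_data this way is a testing convenience, since it is normally called by asyncio's transports. Because feed_eof is never called, each reader behaves like a connection that is kept open. read() with no size argument waits for EOF and so never finishes. The sketch therefore bounds it with asyncio.wait_for and an arbitrarily chosen 0.1 s timeout, whereas readexactly with the body length returns at once. The body bytes and function names are made up for the example.

```python
import asyncio

BODY = b"<html>hi</html>"   # made-up page body


async def read_until_eof(reader):
    try:
        # read() with no size keeps reading until EOF
        return await asyncio.wait_for(reader.read(), timeout=0.1)
    except asyncio.TimeoutError:
        return None                      # gave up: EOF never arrived


async def read_by_length(reader, length):
    # Stop after the number of bytes that Content-Length would give
    return await reader.readexactly(length)


async def demo():
    reader1 = asyncio.StreamReader()
    reader1.feed_data(BODY)              # whole body sent, but no feed_eof():
    reader2 = asyncio.StreamReader()     # like a connection that is kept open
    reader2.feed_data(BODY)
    return (await read_until_eof(reader1),
            await read_by_length(reader2, len(BODY)))


by_eof, by_length = asyncio.run(demo())
print(by_eof)      # None
print(by_length)   # b'<html>hi</html>'
```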

We need to read the Content-Length header to find the number of bytes to read. We could use some simple string manipulation to extract the header we want, but there is a standard way to parse HTTP headers, even if it is obscure because it is part of the email module. It turns out that HTTP headers use the same format as email headers, and hence you can use email.message_from_string to parse them:

import email

def parseHeaders(headers):
    # HTTP headers share the email header format, so the email parser works
    message = email.message_from_string(headers)
    return dict(message.items())

This utility function returns all of the headers as a dictionary keyed on the header names with values of the strings they are set to. Now we can use this to get the Content-Length header:
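A quick way to see what parseHeaders produces is to give it a header string like the one the loop builds. The header text below is made up. One detail is worth knowing: an email.message.Message looks headers up case-insensitively, but the plain dict returned by parseHeaders does not. As a result, headers["Content-Length"] only works if the server uses exactly that capitalization. HTTP header names are case-insensitive, so a more defensive variant, given the hypothetical name parseHeadersLower, normalizes the keys to lower case.

```python
import email


def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())


def parseHeadersLower(headers):
    """Hypothetical variant: same as parseHeaders but with lower-case keys."""
    message = email.message_from_string(headers)
    return {name.lower(): value for name, value in message.items()}


raw = ("Content-Type: text/html; charset=UTF-8\r\n"
       "content-length: 1256\r\n")   # a server may use any capitalization

print(parseHeaders(raw))
print(parseHeadersLower(raw)["content-length"])   # 1256
```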

    headers = parseHeaders(headers)
    length = int(headers["Content-Length"])

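As we now know the number of bytes to read, the rest of the procedure is simple. Note that reader.read(length) returns at most length bytes and can return fewer if the rest of the body has not arrived yet, so readexactly is the safer call:

    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line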
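StreamReader.read(n) returns up to n bytes: as soon as some data is buffered, it hands that data back, even if fewer than n bytes have arrived. readexactly(n) instead waits until all n bytes are available and raises asyncio.IncompleteReadError if the stream ends first. The sketch below shows the difference using a hand-filled StreamReader. feed_data is normally reserved for asyncio's transports, and the body text and the 10 ms delay are made up.

```python
import asyncio


async def demo():
    reader = asyncio.StreamReader()      # must be created inside the event loop
    reader.feed_data(b"Hello, ")         # only part of a 13-byte body has arrived
    partial = await reader.read(13)      # returns at once with what is buffered

    reader.feed_data(b"Hello, ")         # start the same body again
    # Have the rest of the body "arrive" 10 ms later
    asyncio.get_running_loop().call_later(0.01, reader.feed_data, b"world!")
    whole = await reader.readexactly(13) # waits until all 13 bytes are present
    return partial, whole


partial, whole = asyncio.run(demo())
print(partial)   # b'Hello, '
print(whole)     # b'Hello, world!'
```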

This time we decode the content using UTF-8 because this is what most modern web pages use. To be sure, we should check the Content-Type header, which in this case reads:

Content-Type: text/html; charset=UTF-8

So the content is HTML and it is UTF-8 encoded.
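Rather than assuming UTF-8, the charset can be taken from the Content-Type header. The email.message.Message class that parseHeaders relies on can already split this header apart. get_content_type returns the lower-cased media type (or text/plain if the header is missing), and get_content_charset returns the lower-cased charset parameter or a fallback value. The helper name body_encoding and the UTF-8 fallback are assumptions made for this sketch.

```python
import email


def body_encoding(header_text, default="utf-8"):
    """Hypothetical helper: media type and charset from raw header text."""
    message = email.message_from_string(header_text)
    # get_content_type() falls back to 'text/plain' if there is no Content-Type
    return message.get_content_type(), message.get_content_charset(default)


print(body_encoding("Content-Type: text/html; charset=UTF-8\r\n"))
print(body_encoding("Content-Length: 10\r\n"))
```

The body would then be decoded with the returned charset instead of a hard-coded 'utf8'.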

To demonstrate all of this we need a coroutine to start things off:

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        download('http://www.example.com/'),
        download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())

This creates two tasks to download the same page, starts them both off asynchronously and waits for them to complete. Whenever one of the tasks has to wait for data, it releases the thread and the other task gets a chance to run, and so on. As a result, most of the elapsed time is spent waiting for the network, and because the waits overlap, you can increase the number of downloads without increasing the total time by much. For example, on a test machine, adding one more download to the asynchronous program increases the time it takes by about 30 ms, whereas for a synchronous program it adds 220 ms. This means that downloading 100 pages takes about 3 seconds asynchronously, but about 21 seconds synchronously.
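The effect can be reproduced without a network by replacing each download with asyncio.sleep. Like waiting on a socket, asyncio.sleep hands the thread back to the event loop. Awaiting the downloads one after another behaves like the synchronous version, while gather overlaps the waits. The fake_download coroutine and its 0.1 s delay are hypothetical stand-ins, so the timings are only indicative.

```python
import asyncio
import time


async def fake_download(n, delay=0.1):
    """Hypothetical stand-in for download(): waits as if for the network."""
    await asyncio.sleep(delay)           # releases the thread, like awaiting a read
    return f"page {n}"


async def sequential(count):
    # Each download must finish before the next one starts
    return [await fake_download(n) for n in range(count)]


async def concurrent(count):
    # gather runs the coroutines as tasks, so their waits overlap
    return await asyncio.gather(*(fake_download(n) for n in range(count)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_pages, seq_time = timed(sequential(5))
con_pages, con_time = timed(concurrent(5))
print(f"sequential: {seq_time*1000:.0f} ms, concurrent: {con_time*1000:.0f} ms")
```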

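The complete program is given below. Note that download uses only the hostname from the URL: whatever scheme and path the URL contains, it always connects to port 443 using SSL and always asks for /index.html.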

import asyncio
import urllib.parse
import time
import email

def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())

async def download(url):
    url = urllib.parse.urlsplit(url)
    # Only the hostname is used: always SSL on port 443
    reader, writer = await asyncio.open_connection(
        url.hostname, 443, ssl=True)
    request = (
        f"GET /index.html HTTP/1.1\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )
    writer.write(request.encode('ascii'))

    # Status line, e.g. HTTP/1.1 200 OK, read and ignored
    headers = ""
    line = await reader.readline()
    # Header lines up to the blank line that ends them
    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line == "\r\n":
            break
        headers += line
    headers = parseHeaders(headers)
    length = int(headers["Content-Length"])

    # Read exactly Content-Length bytes of body
    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        download('http://www.example.com/'),
        download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())
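The complete program needs Internet access and always uses SSL on port 443. To exercise the same request and response logic offline, the sketch below starts a tiny local server with asyncio.start_server, which answers any request with a canned page. A plain-TCP variant of download is then pointed at it. The server, the page content, the name fetch and the unencrypted connection are all assumptions made for the example; the operating system picks the port.

```python
import asyncio
import email

PAGE = "<html><body>Hello</body></html>"   # made-up page content


def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())


async def handle(reader, writer):
    # Read and discard the request headers up to the blank line (or EOF)
    while (await reader.readline()) not in (b"\r\n", b""):
        pass
    body = PAGE.encode('utf8')
    writer.write(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Type: text/html; charset=UTF-8\r\n"
                 + f"Content-Length: {len(body)}\r\n\r\n".encode('ascii')
                 + body)
    await writer.drain()
    writer.close()


async def fetch(host, port):
    """Hypothetical plain-TCP variant of download() from the text."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"GET /index.html HTTP/1.1\r\nHost: {host}\r\n\r\n".encode('ascii'))
    await reader.readline()                       # status line, ignored
    headers = ""
    while True:
        line = (await reader.readline()).decode('ascii')
        if line in ("\r\n", ""):                  # blank line or EOF
            break
        headers += line
    length = int(parseHeaders(headers)["Content-Length"])
    body = (await reader.readexactly(length)).decode('utf8')
    writer.close()
    await writer.wait_closed()
    return body


async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]    # port chosen by the OS
    async with server:
        return await asyncio.gather(fetch("127.0.0.1", port),
                                    fetch("127.0.0.1", port))


results = asyncio.run(main())
print(results[0][:25])
```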

In chapter but not in this extract

  • Server
  • A Web Server
  • SSL Server
  • Using Streams
  • Converting Blocking To Non-blocking
  • Running in Threads
  • Why Not Just Use Threads?
  • CPU-Bound Tasks
  • Asyncio-Based Modules
  • Working With Other Event Loops – Tkinter
  • Subprocesses

 Summary

  • The asyncio module makes network connections easy and asynchronous.

  • Network communication is via streams – StreamReader and StreamWriter - which work like more sophisticated Pipes.

  • Implementing a web client is easy, but there is no high-level function which downloads an HTML page. You have to work with the HTTP protocol.

  • The email module can parse HTTP headers, because they use the same format as email headers.

  • Creating an SSL client is a matter of changing a single line in the program.

  • Creating a web server is only slightly more difficult in that you have to support multiple potential clients.

  • Converting the server to SSL requires the generation and installation of a certificate.

  • You can use raw sockets which do not support streams. The only reason for doing this is to implement a custom protocol.

  • To convert a blocking synchronous function into a non-blocking asynchronous function all you have to do is run it on another thread and release the original thread to service the event loop.

  • The asyncio module provides a function that allows you to run a function on another thread asynchronously.

  • You can use additional threads to run CPU-bound functions asynchronously.

  • There are additional modules that provide asynchronous versions of standard operations, usually by running them on an additional thread.

  • A particular problem is coexisting with modules that implement their own event loop such as tkinter. There are two approaches – to find an update function which can be called from an asyncio event loop or to use a separate thread to run each event loop.

  • The asyncio module provides a very easy way to run subprocesses without having to worry about blocking the thread or dealing with buffers.
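The summary's point about turning a blocking function into a non-blocking one by running it on another thread can be sketched with asyncio.to_thread, assuming Python 3.9 or later. This is one such function in asyncio, not necessarily the one the full chapter uses; loop.run_in_executor is the older alternative. Here time.sleep stands in for any blocking call, and blocking_job is a hypothetical name.

```python
import asyncio
import time


def blocking_job(n):
    """Stand-in for a blocking call: run directly, it would freeze the event loop."""
    time.sleep(0.2)
    return n * n


async def main():
    start = time.perf_counter()
    # Each call runs in the default thread pool, leaving the event loop free
    results = await asyncio.gather(*(asyncio.to_thread(blocking_job, n)
                                     for n in range(4)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f} s")
```

Run one after another, the four calls would take about 0.8 s. On separate threads their sleeps overlap.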

Programmer's Python:
Async
Threads, processes, asyncio & more

Is now available as a print book: Amazon

 

Last Updated ( Tuesday, 08 November 2022 )