Programmer's Python Async - Streams & Web Clients
Written by Mike James   
Monday, 07 November 2022

StreamWriter

The two key methods used for writing data are:

  • write(data) - attempts to write the data to the underlying socket immediately; if that fails, the data is queued in an internal write buffer until it can be sent.

  • writelines(data) - writes a list (or any iterable) of bytes to the underlying socket immediately; if that fails, the data is queued in an internal write buffer until it can be sent.

Neither of these is a coroutine as they both always return immediately. However, the drain() coroutine, which waits until it is appropriate to resume writing to the stream, should be called after each write operation, for example:

writer.write(data)
await writer.drain()

The logic of this is that there is no point in performing another write if there is not enough space in the buffer. Instead a better option is to wait until the data has drained out of the buffer, hence the name of the coroutine, and the main thread is released to do something else.
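The write-then-drain pattern can be seen in a minimal, self-contained sketch that talks to a local uppercasing server - the handler, the port choice and the messages are all illustrative, not part of the text above:

```python
import asyncio

# Illustrative server: reads one line and sends it back in upper case
async def handle(reader, writer):
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Port 0 lets the OS pick a free port for the demo
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for chunk in (b"hello ", b"async ", b"world\n"):
        writer.write(chunk)      # returns at once, data may be buffered
        await writer.drain()     # pause until the buffer has drained
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))
```

Notice that each write() returns immediately, while each drain() gives the event loop a chance to run other tasks until the buffer has room again.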

The close() method closes both the stream and the underlying socket and should be used along with the wait_closed() coroutine:

stream.close()
await stream.wait_closed()

The logic is that there is no point in carrying on until the stream has been closed and so you might as well free the main thread. You can also use is_closing() to test whether the stream is closed or is in the process of closing.

The write_eof() method sends the EOF signal to the reader. Not all streams support the EOF signal, so use it in conjunction with the can_write_eof() method, which is not a coroutine and returns True if the underlying transport supports write_eof() and False otherwise.
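As a sketch of how EOF signaling is used in practice, the following sends some data to a local server and signals EOF only if the transport supports it - the server, which simply reports how many bytes arrived, is an illustrative stand-in:

```python
import asyncio

# Illustrative server: reads until EOF, then reports the byte count
async def handle(reader, writer):
    data = await reader.read(-1)          # read until the client's EOF
    writer.write(str(len(data)).encode())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"some data")
    await writer.drain()
    if writer.can_write_eof():            # a plain method, not a coroutine
        writer.write_eof()                # tell the server we are done
    reply = await reader.read(-1)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))
```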

There are also two lower level methods:

  • get_extra_info(name, default=None) - accesses optional transport information

  • transport - an attribute giving the underlying asyncio transport

Notice that the methods provided by StreamWriter are mostly not coroutines. The reason for this is that write methods generally return at once because the data is either immediately sent via the socket or placed in a buffer to be sent over the socket connection. This means there is usually no reason for them to free the main thread as they don’t block. However, as already mentioned, you should use the drain() coroutine to check that there is space in the buffer so that the write operations don’t have to wait.


Downloading a Web Page

We have already used the requests module to download a web page asynchronously using multiple threads. While the requests module isn’t suitable for use with asyncio, it is fairly straightforward to modify it to work asynchronously, as we will see later. There is also a module, aiohttp, based on asyncio, that lets you work at a higher level. However, using streams is easy and instructive.

First we need a coroutine that downloads a web page. This starts by parsing the url and making a connection to the web server:

import asyncio
import urllib.parse

async def download(url):
    url = urllib.parse.urlsplit(url)
    reader, writer = await asyncio.open_connection(
                                     url.hostname, 80)

Of course, most web servers listen on port 80, as specified in the call to open_connection, but this does vary. If you want to use https then change the open_connection to:

reader, writer = await asyncio.open_connection(
                                 url.hostname, 443, ssl=True)

This provides basic SSL security. If you need to specify the certificate to be used or check the server certificate then you need to look into the ssl module.
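As a brief sketch of what looking into the ssl module involves, you can build an SSLContext explicitly and pass it in place of ssl=True. Using ssl.create_default_context() loads the system's CA certificates and turns on hostname checking and certificate verification:

```python
import ssl

# Build an explicit TLS context instead of passing ssl=True.
# create_default_context() enables sensible defaults.
context = ssl.create_default_context()
print(context.check_hostname)   # True - server name must match the cert
print(context.verify_mode)      # VerifyMode.CERT_REQUIRED

# The context is then passed to open_connection in place of ssl=True:
# reader, writer = await asyncio.open_connection(
#                                  url.hostname, 443, ssl=context)
```

The context also has methods such as load_verify_locations() for supplying your own certificates, should the defaults not be enough.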

Now we have a bidirectional TCP connection to the server and a reader and writer ready to send and receive data. What data we actually send and receive depends on the protocol in use. Web servers use HTTP, which is a very simple text-based protocol.

The HTTP protocol is essentially a set of text headers of the form:

headername: headerdata \r\n

that tell the server what to do, and a set of headers that the server sends back to tell you what it has done. You can look up the details of HTTP headers in the documentation – there are a lot of them.

The most basic transaction the client can have with the server is to send a GET request for the server to send back a particular file. Thus the simplest header is:

"GET /index.html HTTP/1.1\r\n\r\n"

which is a request for the server to send index.html. In most cases we need one more header, Host, which gives the domain name of the server. Why do we need it? Simply because HTTP/1.1 requires it, and many websites are hosted by a single server at the same IP address. Which website the server retrieves the file from is governed by the domain name you specify in the Host header.

This means that the simplest set of headers we can send the server is:

"GET /index.htm HTTP/1.1\r\nHOST:example.org\r\n\r\n";

which corresponds to the headers:

GET /index.html HTTP/1.1
Host: example.org

An HTTP request always ends with a blank line. If you don't send the blank line then you will get no response from most servers. In addition, the Host header has to give the domain name with no additional syntax - no slashes and no http: or similar. We can use Python’s f-strings and automatic concatenation, topics covered in Programmer’s Python: Everything Is Data, to create the header data:

    request = (
        f"GET /index.html HTTP/1.1\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )

Now we are ready to send our request to the server:

writer.write(request.encode('ascii'))

Notice that we specify the encoding as ascii because headers are only allowed to contain ASCII characters.
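The pieces so far can be assembled into a runnable sketch. Since contacting a real web server is not practical here, a tiny local server that returns a canned HTTP response stands in for example.org - the handler, the canned reply and the port handling are all illustrative assumptions, not part of the text above:

```python
import asyncio
import urllib.parse

# Illustrative stand-in for a web server: skip the request
# headers, then send a canned HTTP response
async def handle(reader, writer):
    while (await reader.readline()) != b"\r\n":
        pass
    writer.write(b"HTTP/1.1 200 OK\r\n\r\nhello")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def download(url, port):
    url = urllib.parse.urlsplit(url)
    reader, writer = await asyncio.open_connection(url.hostname, port)
    request = (
        f"GET /index.html HTTP/1.1\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )
    writer.write(request.encode('ascii'))
    await writer.drain()
    status = await reader.readline()   # the status line of the response
    writer.close()
    await writer.wait_closed()
    return status

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    status = await download("http://127.0.0.1/index.html", port)
    server.close()
    await server.wait_closed()
    return status

print(asyncio.run(main()))
```

Reading the rest of the server's reply is the subject of the next section.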



Last Updated ( Tuesday, 08 November 2022 )