C Sockets - No Need For A Web Server!

Written by Mike James

Monday, 15 August 2016


The actual address is defined in the sin_addr field - this is an in_addr struct with only one field you should use and rely on, s_addr. This is a 32-bit representation of an IP address. The format is fairly simple. Regard the 32-bit value as four bytes, with each byte coding one value of the "dotted" IP address. That is, if the IP address is w.x.y.z then w, x, y and z are the bytes of s_addr. For example, the IP address of example.com is 93.184.216.34 and converting each value into its byte equivalent in hex gives 5d.b8.d8.22. This would be the hex value we have to store in addr if it wasn't for the fact that s_addr is held in network byte order, most significant byte first, so on a little-endian machine such as an x86 PC the bytes appear in reverse order when written as a single number. That is, the hex equivalent of the IP address is 0x22d8b85d and this is used to initialize the address struct:

addr.sin_addr.s_addr = 0x22d8b85d;

To make all this work you need to add:

#include <sys/types.h>

and

#include <netinet/in.h>
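Rather than reversing the bytes by hand, the same value can be built in a way that doesn't depend on the byte order of the machine. The following is only a sketch of one way to do it: the helper names make_s_addr and parse_s_addr are invented for this illustration, and inet_pton needs the extra header arpa/inet.h:

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* htonl, inet_pton */

/* Pack the four parts of w.x.y.z into a value suitable for s_addr.
   htonl converts from host to network (big-endian) byte order, so
   w always ends up as the first byte in memory. */
uint32_t make_s_addr(uint8_t w, uint8_t x, uint8_t y, uint8_t z) {
    uint32_t host_order = ((uint32_t) w << 24) | ((uint32_t) x << 16) |
                          ((uint32_t) y << 8) | (uint32_t) z;
    return htonl(host_order);
}

/* Convert a dotted string such as "93.184.216.34" instead.
   Returns 0 on success and -1 if the string isn't a valid IPv4 address. */
int parse_s_addr(const char *dotted, uint32_t *out) {
    struct in_addr parsed;
    if (inet_pton(AF_INET, dotted, &parsed) != 1) return -1;
    *out = parsed.s_addr;
    return 0;
}
```

On a little-endian machine make_s_addr(93, 184, 216, 34) gives the same 0x22d8b85d, so the assignment could be written as addr.sin_addr.s_addr = make_s_addr(93, 184, 216, 34); and it would also be correct on a big-endian machine.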

With the address worked out and safely stored we can now make the connection:

connect(sockfd, &addr, sizeof (addr));

This will return zero if it does connect and we do need to test this. You will also get a type warning because the pointer to the addr structure isn't the same type as the one defined in the function. In fact there are many variations on the addr structure which you could pass and it is the standard idiom to cast them to the function's pointer type:

connect(sockfd, (struct sockaddr *) &addr, sizeof (addr));

Finally we need to check for an error:

if (connect(sockfd, (struct sockaddr *) &addr, sizeof (addr)) < 0) return -1;
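When connect fails it returns -1 and sets errno, so it is usually worth saying why it failed rather than just returning. The sketch below wraps the steps so far in a function and reports errors with perror. The name connect_ipv4 and its parameters are made up for this example, and it clears the struct with memset first, which the code above doesn't do:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Open a TCP socket and connect it to an IPv4 address and port.
   s_addr_net must already be in network byte order; port is in host order.
   Returns the connected socket, or -1 after printing the reason for failure. */
int connect_ipv4(uint32_t s_addr_net, uint16_t port) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket");
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof (addr));   /* clear any unused fields */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = s_addr_net;
    if (connect(sockfd, (struct sockaddr *) &addr, sizeof (addr)) < 0) {
        perror("connect");
        close(sockfd);                 /* don't leak the descriptor */
        return -1;
    }
    return sockfd;
}
```

With this in place the connection to the example server would be int sockfd = connect_ipv4(0x22d8b85d, 80); on a little-endian machine.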

As long as there is no error then we can start to send and receive data - but what data?

The answer is that it all depends on the protocol you are using. There is nothing about a socket that tells you what to send. It is a completely general I/O mechanism. You can send anything, but if you don't send what the server is expecting you won't get very far.

The web uses the HTTP protocol and this is essentially a set of text formatted headers that tell the server what to do and a set of headers that the server sends back to tell you what it has done.

The most basic transaction the client can have with the server is to send a GET request for the server to send a particular file. Thus the simplest header is:

char header[] = "GET /index.html HTTP/1.1\r\n\r\n";

which is a request for the server to send index.html. However, in most cases we do need one more header, HOST, which gives the domain name of the server. Why do we need to do this? Simply because HTTP/1.1 says you must and because many web sites are often hosted by a single server at the same IP address. Which website the server retrieves the file from is governed by the domain name you specify in the HOST header.

This means that the simplest set of headers we can send the server is:

char header[] = "GET /index.html HTTP/1.1\r\nHOST:example.org\r\n\r\n";

which corresponds to the headers

GET /index.html HTTP/1.1
HOST:example.org

An HTTP request always ends with a blank line. If you don't send the blank line then you will get no response from most servers. In addition the HOST header has to have the domain name with no additional syntax - no slashes and no http: or similar.
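If you want to request different files from different hosts, the request can be assembled rather than hard coded. This is a minimal sketch using snprintf. The function name build_get_request is invented for the example, and the check on the host string is only a rough guard against the mistakes just described, not a full validation:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Write a minimal HTTP/1.1 GET request for path on host into buf.
   Returns the length of the request, or -1 if the host looks wrong
   or the request doesn't fit in size bytes. */
int build_get_request(char *buf, size_t size, const char *host, const char *path) {
    /* The Host value is a bare domain name: no scheme, no slashes. */
    if (strpbrk(host, "/ \r\n") != NULL) return -1;
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n",            /* the blank line that ends the request */
                     path, host);
    if (n < 0 || (size_t) n >= size) return -1;
    return n;
}
```

For example, build_get_request(header, sizeof (header), "example.org", "/index.html") fills header with the same request as the string literal above, apart from the case of the header name and a space after the colon, neither of which matters to the server.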

With the headers defined we can send our first HTTP request using write as if the socket was just another file to write data to:

int n = write(sockfd, header, strlen(header));

and of course to use the strlen function we need to add

#include <string.h>

and for write, and the read used next, we also need

#include <unistd.h>
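The value returned by write is the number of bytes actually written, which can be fewer than you asked for. For a request this short that is unlikely, but a more careful version keeps writing until everything has gone. The following is a sketch of that loop, with write_all a name chosen just for this example:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>

/* Keep calling write until all len bytes of data have been sent.
   Returns 0 on success and -1 on error (errno is set by write). */
int write_all(int fd, const char *data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t) n;
    }
    return 0;
}
```

The call then becomes write_all(sockfd, header, strlen(header)) and a return value of -1 means the request wasn't sent.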

The server receives the HTTP request and should respond by sending the data corresponding to the file specified i.e. index.html. We can read the response just as if the socket was a file:

char buffer[2048];
n = read(sockfd, buffer, sizeof (buffer) - 1);
if (n > 0) {
    buffer[n] = '\0';   /* read doesn't null-terminate the data */
    printf("%s", buffer);
}

You can make this more complicated by checking the number of bytes read and reading more if the buffer was full but this is a simple and direct way to get the HTML. In fact you get more than the HTML as you get the entire HTTP response including the response headers:

HTTP/1.1 200 OK
Cache-Control: max-age=604800
Content-Type: text/html
Date: Sun, 14 Aug 2016 15:30:44 GMT
Etag: "359670651+gzip+ident"
Expires: Sun, 21 Aug 2016 15:30:44 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (ewr/15F9)
Vary: Accept-Encoding
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 1270

<!doctype html>
<html>
<head>

and so on...

Notice the blank line marking the end of the header - the data payload follows the header.
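Because the blank line is just the byte sequence "\r\n\r\n", you can use it to separate the headers from the payload. The sketch below assumes the response is in a null-terminated buffer. The function names http_status and http_body are made up for this illustration:

```c
#include <stdio.h>
#include <string.h>

/* Return the status code from the first line of a response,
   e.g. 200 for "HTTP/1.1 200 OK", or -1 if the line doesn't parse. */
int http_status(const char *response) {
    int major, minor, status;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &status) != 3) return -1;
    return status;
}

/* Return a pointer to the payload, i.e. the first byte after the blank
   line that ends the headers, or NULL if the blank line hasn't arrived yet. */
const char *http_body(const char *response) {
    const char *blank = strstr(response, "\r\n\r\n");
    return blank ? blank + 4 : NULL;
}
```

If http_body returns NULL the headers haven't all arrived yet and you need to read more data before looking for the HTML.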

The complete program is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(int argc, char** argv) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    addr.sin_addr.s_addr = 0x22d8b85d;
    if (connect(sockfd, (struct sockaddr *) &addr, sizeof (addr)) < 0) return -1;
    char header[] = "GET /index.html HTTP/1.1\r\nHost:example.org\r\n\r\n";
    int n = write(sockfd, header, strlen(header));
    char buffer[2048];
    n = read(sockfd, buffer, sizeof (buffer) - 1);
    if (n > 0) {
        buffer[n] = '\0';   /* read doesn't null-terminate the data */
        printf("%s", buffer);
    }
    return (EXIT_SUCCESS);
}

Of course we can do much better than this simple example. For one thing each socket operation needs to be checked for errors - we only check for the most likely error, that the server refuses the connection.
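As one example of tightening things up, the single read can be replaced by a loop that checks for errors and keeps reading until the buffer is full or the server closes the connection. This is only a sketch and read_all is a name invented for it. Be aware that an HTTP/1.1 server normally keeps the connection open after responding, so this loop may wait until the server times out unless the request also includes a Connection: close header:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>

/* Read from fd until the buffer is full or the other end closes the
   connection. The result is always null-terminated so it can be printed
   with %s. Returns the number of bytes stored, or -1 on a read error. */
ssize_t read_all(int fd, char *buf, size_t size) {
    size_t used = 0;
    if (size == 0) return -1;
    while (used < size - 1) {                 /* leave room for the '\0' */
        ssize_t n = read(fd, buf + used, size - 1 - used);
        if (n < 0) {
            if (errno == EINTR) continue;     /* interrupted: try again */
            return -1;
        }
        if (n == 0) break;                    /* end of stream */
        used += (size_t) n;
    }
    buf[used] = '\0';
    return (ssize_t) used;
}
```

In the program above this would replace the read call with n = read_all(sockfd, buffer, sizeof (buffer)); followed by the printf if n isn't negative.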

Connecting Using A URL

There is also a utility function that will perform a DNS lookup for you, or convert an IP address given as a string, so you don't need to fill in an IP address struct yourself.

Surprisingly this is almost an easier way to do things and it has become the standard way to set up a socket.

The getaddrinfo function not only looks up the host name using DNS, it also constructs all of the structs you need to open a socket and connect. It will also return as many address specifications as you request, IPv4 and IPv6 for example.

The function specification is:

int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **res);

and you need to add

#include <netdb.h>

You pass it the IP address or the DNS name, i.e. either "93.184.216.34" or "www.example.com", as node. The service can be specified as a port number "80" or as a service name "http". The hints struct is used to specify what sort of socket and address you are going to use. The result is a linked list of addrinfo structs pointed at by res. The only slight complication in using getaddrinfo is that you might have more than one result - one for IPv4, say, and one for IPv6 - and then you have to work out which one to actually use. The result structure contains the structs that you need both to open the socket and to connect.

For example, you can set up the hints as:

struct addrinfo hints;
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;

This asks for structs to be made for a TCP IPv4 socket. We can now get the address details we need:

struct addrinfo *servinfo;
int status = getaddrinfo("www.example.com", "80", &hints, &servinfo);

Notice that you could pass the IP address as a string in place of the domain name.

As long as this works the result should be a linked list with a single entry. In this case servinfo points to the first and only addrinfo struct. If there are any additional structs they are pointed at by

servinfo->ai_next

which is NULL if there is no next struct.
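If you want to see what getaddrinfo has returned, you can walk the list by following ai_next until it is NULL. The sketch below prints each address using inet_ntop, which needs arpa/inet.h. The function name print_results is made up for this example:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Print every address in the list returned by getaddrinfo and
   return how many entries there were. */
int print_results(const struct addrinfo *list) {
    int count = 0;
    for (const struct addrinfo *p = list; p != NULL; p = p->ai_next) {
        char text[INET6_ADDRSTRLEN];
        const void *raw = NULL;
        /* The address sits in a different struct for each family. */
        if (p->ai_family == AF_INET)
            raw = &((const struct sockaddr_in *) p->ai_addr)->sin_addr;
        else if (p->ai_family == AF_INET6)
            raw = &((const struct sockaddr_in6 *) p->ai_addr)->sin6_addr;
        if (raw != NULL && inet_ntop(p->ai_family, raw, text, sizeof (text)) != NULL)
            printf("%s %s\n", p->ai_family == AF_INET ? "IPv4" : "IPv6", text);
        count++;
    }
    return count;
}
```

Calling print_results(servinfo) after the getaddrinfo call above lists whatever addresses were found for the hints you supplied.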

Using the single result is easy. To create the socket we use:

int sockfd = socket(servinfo->ai_family, servinfo->ai_socktype, servinfo->ai_protocol);

and to connect to the server we use:

connect(sockfd, servinfo->ai_addr, servinfo->ai_addrlen);

This is so much simpler that you tend to fall into the idiom of writing:

struct addrinfo hints;
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_STREAM;
struct addrinfo *servinfo;
int status = getaddrinfo("www.example.com", "80", &hints, &servinfo);
int sockfd = socket(servinfo->ai_family, servinfo->ai_socktype, servinfo->ai_protocol);
connect(sockfd, servinfo->ai_addr, servinfo->ai_addrlen);

any time you need a socket connected to a given host name or IP address and port.

The only minor complication is that you need to remember to free the linked list once you are finished with it using:

freeaddrinfo(servinfo);
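Putting the pieces together with error checking, one possible shape for a reusable helper is shown below. It is a sketch rather than the definitive way to do it: the name connect_to is invented, and unlike the code above it asks for AF_UNSPEC so that it can try each IPv4 or IPv6 result in turn until one connects:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Resolve host and service, then try each returned address in turn.
   Returns a connected socket, or -1 if nothing could be reached. */
int connect_to(const char *host, const char *service) {
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    struct addrinfo *servinfo;
    int status = getaddrinfo(host, service, &hints, &servinfo);
    if (status != 0) {
        /* getaddrinfo has its own error codes, decoded by gai_strerror */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    int sockfd = -1;
    for (struct addrinfo *p = servinfo; p != NULL; p = p->ai_next) {
        sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd < 0) continue;
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0) break;   /* success */
        close(sockfd);
        sockfd = -1;
    }
    freeaddrinfo(servinfo);             /* the list is no longer needed */
    return sockfd;
}
```

With this, int sockfd = connect_to("www.example.com", "80"); gives you a connected socket or -1, and the linked list is freed whichever way it turns out.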


Last Updated ( Monday, 15 August 2016 )