Peer-to-Peer file sharing
Written by Administrator   
Tuesday, 09 March 2010

BitTorrent

Today one of the most used decentralised P2P networks is BitTorrent. It makes use of the same basic ideas as earlier P2P systems but introduces jargon that is becoming standard for P2P in general. The first machine to share a file is called a "seed". The seed's complete copy is used to supply other machines, the "peers", which download portions of the file on the way to acquiring the whole thing. As soon as a peer has downloaded a portion, that portion becomes a download target in its own right. This means that even though no machine in a set of peers may have the entire file, they can still deliver a complete copy to another peer by each contributing a few portions. Any peer that does eventually acquire a complete copy of the file becomes an additional seed.
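To make the swarm idea concrete, here is a minimal Python sketch – not the real BitTorrent protocol, just the principle that peers holding different portions can complete each other's copies. The peer names and piece counts are invented for illustration.

def swarm_download(peers, total_pieces):
    """peers maps a peer name to the set of piece indices it already holds."""
    progress = True
    while progress:
        progress = False
        for downloader, have in peers.items():
            missing = set(range(total_pieces)) - have
            for piece in missing:
                # Any other peer that already holds this piece can supply it.
                if any(piece in other for name, other in peers.items()
                       if name != downloader):
                    have.add(piece)      # simulate transferring the piece
                    progress = True
    return {name: len(have) == total_pieces for name, have in peers.items()}

# No single peer starts with all four pieces, yet every copy completes.
peers = {"A": {0, 1}, "B": {2}, "C": {3}}
print(swarm_download(peers, 4))   # {'A': True, 'B': True, 'C': True}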

The first seed creates a metadata file, or "torrent" file, which describes the file to be shared. This is uploaded to a tracker – a server which co-ordinates the distribution. Any peer that wants to download the file first has to acquire the appropriate torrent file from a tracker, which also tells it about seeds and other peers with file fragments waiting to be downloaded. The need for a tracker, i.e. a central server, can be avoided by using a modification to the protocol in which every peer acts as a tracker.
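A torrent file is essentially a piece-by-piece description of the shared file plus the address of a tracker. The sketch below builds a simplified, purely illustrative version of that metadata – real torrent files are bencoded and use different field names – although hashing each piece separately really is what lets a peer verify every downloaded portion, whoever supplied it.

import hashlib

def make_torrent_metadata(data, piece_length, tracker_url):
    """Describe 'data' as fixed-size pieces, each with its own hash."""
    pieces = [data[i:i + piece_length]
              for i in range(0, len(data), piece_length)]
    return {
        "announce": tracker_url,          # where to find the tracker
        "piece_length": piece_length,
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
        "length": len(data),
    }

meta = make_torrent_metadata(b"some file contents" * 100, 256,
                             "http://tracker.example.com/announce")
print(len(meta["piece_hashes"]), "pieces described")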

Faster P2P

The key to making a P2P network fast is to have multiple copies of files on different machines and to connect clients to machines that aren't heavily loaded. At the next level of sophistication it should be possible to dynamically move the download of a file from one machine to another. However, this implies that the files stored on different machines are exact byte-for-byte copies. How can you be sure that two files are the same? If there are two video files called "Star Trek" they might be the same movie but recorded at different standards or with different editing.

The solution is to compute checksums, or hash values, on similarly named files. A hash value is computed using every byte in the file, so two files that are exactly the same always give the same hash value. If two files are different then the probability that they have different hash values is very high. Using this method the system can quickly, but not perfectly, determine whether two files can be used as if they were the same. Indeed some networks, eDonkey for example, treat two files with the same hash but different names as identical, and two files with the same name but different hashes as different.
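In outline, a client can fingerprint candidate files like this. This is a generic sketch using SHA-256 – each real network has its own hashing scheme – and the demo files are created on the spot so the snippet is self-contained:

import hashlib

def fingerprint(path):
    """Hash a file in 64KB chunks so large files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Two files with different names but identical contents: only the hash
# reveals that they are interchangeable download sources.
with open("copy1.bin", "wb") as f:
    f.write(b"movie data" * 1000)
with open("copy2.bin", "wb") as f:
    f.write(b"movie data" * 1000)
print(fingerprint("copy1.bin") == fingerprint("copy2.bin"))   # True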

Another technique to speed things up is to allow a file to be downloaded by another user before its own download has completed – see the description of BitTorrent above.

These, and even more advanced methods of automatically organising P2P networks, are likely to be the main areas of development in the future, and surprisingly they are coming from commercial attempts to make P2P a mainstream computing technique.



Peer-to-Peer file sharing

Peer-to-Peer (P2P) file sharing is both a technology and a legal, if not moral, battle. There are good technical reasons, based on efficiency and making the best use of networked resources, for wanting to build P2P systems. It's an important new way of doing things and companies like IBM and even Microsoft have P2P software that you can use. Apart from its technical advantages, P2P manages to get through some loopholes in copyright law and surprisingly this has, in turn, influenced the technology. So what is it that makes P2P so special?

 

File sharing

 

The traditional method of sharing files, or any resource, over a network is to use one machine as a central server. That is, a single machine is dedicated to the task of storing files and making them available to any valid clients. The server is not only responsible for looking after the files but also for checking that a client has permission to access them and then delivering them. A typical example of a central file server is a web server, and it illustrates most of the advantages of the approach. Users know where to find the server and they can easily find out what is stored on it. The disadvantage is that each client that tries to download a file, or a web page in this case, adds an extra load on the server. In addition, all of the data has to flow to and from the server and this creates a communications bottleneck. This might not be a problem when there are only a few clients, but as the number of simultaneous requests for web pages, or files in general, grows there comes a point where the server cannot cope. Increasing the performance of a central server is a difficult task, and we say that the approach doesn't "scale" well.

 

Figure 1: A central file server

 

For a solution that scales well we have to look to P2P architectures that get the same job done. The basic idea of P2P is simple – do away with a central server and let all of the machines store files that the other machines want to access. In this way, when a client wants to download a file it doesn't always get it from the same machine. This means that different parts of the network can share the load of lots of clients trying to download files. Of course, now each client has the task of finding the file and a suitable download location. This is the real problem that has to be solved by any P2P system and, strangely enough, it's the one that the law has something to say about.

 

The Napster solution

 

Napster (1999) was a pioneering P2P file sharing system and it was built with one idea in mind – to freely share music. Individual people stored audio files, typically MP3s, on their own hard disks and shared them directly with other people. The Napster client made it possible to find and download music that other people were offering while sharing the music on your own hard disk at the same time. The problem of locating a file and a suitable server was solved by setting up a central Napster server. When you started the Napster client it connected to the central server and told it what files were available on your machine. When you typed in a query for a piece of music the Napster server listed all the machines storing that file. You picked a machine, and your PC then connected to it and downloaded the music directly.
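A toy version of that central catalogue might look like the following. The class and method names are invented – real Napster spoke its own wire protocol – but the division of labour is the same: the server only answers "who has this file?", while the transfer itself happens directly between peers.

class CentralIndex:
    def __init__(self):
        self.catalogue = {}               # file name -> set of peer addresses

    def register(self, peer_addr, shared_files):
        """Called when a client starts up and announces what it shares."""
        for name in shared_files:
            self.catalogue.setdefault(name, set()).add(peer_addr)

    def search(self, name):
        """Return every peer offering the file; the client downloads directly."""
        return sorted(self.catalogue.get(name, set()))

index = CentralIndex()
index.register("192.168.0.5", ["track1.mp3", "track2.mp3"])
index.register("192.168.0.9", ["track2.mp3"])
print(index.search("track2.mp3"))   # ['192.168.0.5', '192.168.0.9']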

 

Figure 2: The Napster P2P network

 

Surely this is illegal? Napster was designed to take advantage of a loophole in copyright law which allows friends to share music with friends – Napster attempted to extend the concept of friendship to include being friends by virtue of being a Napster user. Surely this isn't P2P because it still makes use of a central server? It is P2P because the central server only holds the catalogue of what is available; the file storage and file transfers are very much peer-to-peer. However, the existence of a central server is a weakness and it eventually led to the downfall of Napster. A judge finally ruled that Napster's users were not friends and that Napster was guilty of encouraging people to pirate music files. The Napster site was closed down (2001), and with it went the entire P2P system. When Napster re-launched itself it was as a pay-for-music site, and others took up the challenge of creating P2P networks that were more difficult to eliminate.

 

The Gnutella way

 

Designed by a group of programmers working for AOL, and posted on their website for free download, the Gnutella network soon became a success. Even when AOL took the offending software off its servers, Gnutella continued to work and grow because it didn't suffer from the same flaw that Napster did. Gnutella did away with central servers altogether by keeping the catalogue on the peers. If you search for a file using Gnutella, it passes the name to one other Gnutella client – it knows the IP address of this machine either because you typed it in or because it was written into the client software. The machine you contact checks to see if it has the file; if not, it passes the request on to a number of other machines. If any machine has the file it sends back a data packet containing the address of the machine that has the file. The request is also passed on to other machines and so it "fans out" across the Gnutella network, potentially reaching a huge number of clients. If each node contacts just three other nodes then after propagating the request 10 times over 88,000 machines have been queried (3 + 3² + … + 3¹⁰ = 88,572). The search is generally limited to a given depth by specifying a TTL, or Time To Live, parameter on the query which is reduced at each hop.
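The fan-out search can be sketched in a few lines. The topology, file name and TTL below are invented, but the mechanism is the one just described: each node answers if it can and forwards the query with a decremented TTL until the query expires.

def flood_search(nodes, start, wanted, ttl):
    """nodes maps a name to {"files": set, "neighbours": list}. Returns hits."""
    hits, visited = set(), set()
    frontier = [(start, ttl)]
    while frontier:
        name, ttl_left = frontier.pop()
        if name in visited or ttl_left < 0:
            continue                      # already asked, or query expired
        visited.add(name)
        if wanted in nodes[name]["files"]:
            hits.add(name)                # a reply would carry this address back
        for neighbour in nodes[name]["neighbours"]:
            frontier.append((neighbour, ttl_left - 1))
    return hits

nodes = {
    "A": {"files": set(),        "neighbours": ["B", "C"]},
    "B": {"files": {"song.mp3"}, "neighbours": ["D"]},
    "C": {"files": set(),        "neighbours": ["D"]},
    "D": {"files": {"song.mp3"}, "neighbours": []},
}
print(sorted(flood_search(nodes, "A", "song.mp3", ttl=2)))   # ['B', 'D']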

 

Figure 3: Gnutella's search pattern

 

The big advantage of the Gnutella approach is that there is no central server for any function. All you need to get started, and to keep working, is the IP address of one machine on the Gnutella network. Addresses of other machines are propagated through the network along with queries. From the clients' point of view the only disadvantage of the approach is that their machines are being used as part of the catalogue and this takes some of their Internet bandwidth.

 

However, a little thought reveals that a bigger problem with this method of searching is that the time it takes to search a reasonable proportion of the files stored on the network increases with the size of the network. As the Gnutella network grew users noticed a slowdown, and a new design, Gnutella2, was created in an attempt to overcome the problem of a distributed catalogue. The solution was to designate some of the peers as "super nodes", which divided the network up into smaller sub-networks.
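The two-tier idea can be sketched as follows – illustrative class names only, not the actual Gnutella2 protocol. Leaves register their file lists with a super node, and only the much smaller super-node tier is flooded with queries:

class SuperNode:
    def __init__(self):
        self.leaf_catalogue = {}          # file name -> set of leaf addresses
        self.peers = []                   # other super nodes

    def register_leaf(self, leaf_addr, files):
        for name in files:
            self.leaf_catalogue.setdefault(name, set()).add(leaf_addr)

    def search(self, name, visited=None):
        """Flood the query across super nodes only; leaves are never asked."""
        visited = visited if visited is not None else set()
        visited.add(id(self))
        hits = set(self.leaf_catalogue.get(name, set()))
        for sn in self.peers:
            if id(sn) not in visited:
                hits |= sn.search(name, visited)
        return hits

a, b = SuperNode(), SuperNode()
a.peers, b.peers = [b], [a]
a.register_leaf("10.0.0.1", ["clip.avi"])
b.register_leaf("10.0.0.2", ["clip.avi"])
print(sorted(a.search("clip.avi")))   # ['10.0.0.1', '10.0.0.2']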

 

Figure 4: The Gnutella2 solution

 

Soon after the open source Gnutella2, or G2, was established, the same idea was tried by the FastTrack network, better known as Kazaa, which is in fact the name of its client. FastTrack introduced other innovations, such as being able to pause and restart a download, but it isn't open source and it charges a licence fee to any company wanting to produce a client. FastTrack is a commercial P2P system and it makes money mainly from advertising. To protect its income FastTrack makes use of encryption, which it changes at regular intervals, to make sure that only licensed clients can connect. It even has files that are "shared" by premium content providers who levy a charge for downloading. This is the most commercial of the P2P networks. Other networks also use a range of enticements to encourage you to share files – usually a points system that entitles you to so much downloading in return for so much uploading.

 

Currently the most sophisticated P2P networks – eDonkey and Overnet, for example – are working to make downloading easier and faster, using the hashing and partial-download techniques described in the Faster P2P section above.


Last Updated ( Saturday, 30 October 2010 )