P2P File Sharing

Peer-to-Peer file sharing is an exciting technology for delivering big content files to users who are distributed around a network. In contrast to traditional file servers, where large amounts of bandwidth are consumed to send out files from a central location, each P2P client (i.e., a user’s PC with client software running) obtains files from other clients who already have copies of the files. This essentially eliminates the need for central file servers—all that is needed is a way for clients to locate other clients who have the files that they need and a method to download those files from the other clients.

Unfortunately, in the world of professional media, P2P has a less than perfect reputation. This is due to some of the applications that have run on P2P networks, including the original Napster, eDonkey and Grokster, to name a few. Although these technologies had legitimate uses, many users employed these networks to share unauthorized copies of copyrighted material, particularly music ripped from CDs. As a result of several major court cases earlier this decade, the operators of these networks were found liable for damages, and all of the aforementioned networks were shut down.


Fig. 1: P2P versus Client Server

Recently, however, the technology has received somewhat of a reprieve, thanks to legitimate services built on applications such as BitTorrent. This particular technology has even been embraced by many of the large Hollywood studios for distributing their online video content. P2P’s bad reputation is truly unfortunate, because the technology can be a highly efficient delivery system for large media files.

HOW P2P WORKS

The concept of P2P is quite simple, as shown in Fig. 1. A traditional client-server architecture is shown on the left side of the illustration. In this setup, each client must contact the server for any and all of the files that it requires, operating just like a standard PC browser connecting to a Web server. For simple Web pages stored in small files this is not a problem, since the chances of all the clients requesting a file at the same time are low. However, when clients request large media files that may each require multiple minutes to download, the chances of contention for the server resources are greatly increased. P2P was designed to address this problem.

On the right side of Fig. 1, a P2P network is shown, with clients able to obtain the files they require from any other client that already has a copy of the file. This essentially eliminates the bottleneck of the server, since clients have a choice of where to obtain their files. If one source is busy or offline, another source can be chosen. Also, the process for downloading the files in P2P can use the same array of protocols used by client-server systems.
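As a rough illustration of that fallback behavior, the sketch below (in Python) tries each known source in turn and downloads from the first peer that is online and holds the file. The peer objects and method names here are hypothetical stand-ins for real network clients, not part of any particular P2P implementation.

```python
# A minimal sketch of P2P source selection with fallback, assuming each
# peer exposes has_file() and send_file() (hypothetical names).

class Peer:
    def __init__(self, name, files, online=True):
        self.name = name
        self.files = files          # filename -> bytes held by this peer
        self.online = online

    def has_file(self, filename):
        return self.online and filename in self.files

    def send_file(self, filename):
        return self.files[filename]

def download(filename, known_peers):
    """Try each known peer in turn; fall back to the next if one is unavailable."""
    for peer in known_peers:
        if peer.has_file(filename):
            return peer.send_file(filename)
    raise LookupError(f"no online peer has {filename!r}")

# Usage: peer B is offline, so the download falls back to peer C.
peers = [
    Peer("A", {}),
    Peer("B", {"movie.mp4": b"..."}, online=False),
    Peer("C", {"movie.mp4": b"..."}),
]
print(len(download("movie.mp4", peers)), "bytes received")
```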

FINDING FILES

One of the big problems for any P2P network is giving clients the ability to locate the content that they need. Since files can be located on any other client in the network, an efficient method of locating them is required for the system to perform well.

In first-generation P2P networks like Napster, a centralized file registry was used: a set of servers holding information about the locations of files on all of the clients that were actively online. When a client wanted to download a particular file, it would contact the central server, which would then return a list of other clients that already had the file. The requesting client would select one of the listed clients and then attempt to make contact and download the file from that client. This scheme worked fairly well, but it had some drawbacks. First, a receiving client needed to maintain contact with the sending client for however long the download took; if contact was broken, the process had to be repeated, possibly with another client. Second, the central server was a single point of vulnerability for the network; when legal challenges finally forced Napster’s central server to be shut down, the network ceased to function.
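The sketch below illustrates this first-generation pattern with a toy, in-memory registry. The class and method names are hypothetical, chosen for illustration only; they are not taken from Napster’s actual protocol.

```python
# A minimal sketch of a Napster-style centralized lookup: a central registry
# maps filenames to the clients currently sharing them, and transfers then
# happen directly between clients.

class CentralRegistry:
    def __init__(self):
        self.index = {}                      # filename -> set of client IDs

    def register(self, client_id, filenames):
        for name in filenames:
            self.index.setdefault(name, set()).add(client_id)

    def lookup(self, filename):
        # The server only returns *locations*; the transfer itself is peer-to-peer.
        return sorted(self.index.get(filename, set()))

registry = CentralRegistry()
registry.register("client-17", ["song.mp3", "demo.wav"])
registry.register("client-42", ["song.mp3"])

sources = registry.lookup("song.mp3")        # ['client-17', 'client-42']
print("download 'song.mp3' directly from one of:", sources)
# If the chosen client disconnects mid-transfer, the requester must start
# over, possibly with another client from the list -- and if the registry
# itself goes away, no new lookups are possible at all.
```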

Second-generation P2P networks have made a jump forward by eliminating the central server. Instead, each client is required to know the IP address of at least one other client. Whenever a user decides to look for a file, a request is issued from that user’s client to any or all of the other clients that it knows about. If one of those clients has the desired file, then the requesting client can download the file directly from that client. If not, then the clients forward the requests to other clients that they are in contact with, and the request propagates through the network. As soon as a source is found for the content, the requesting client can then begin a download. If multiple sources of the file are located, then the requesting client can choose one of the sources. The big benefit of eliminating the central server is the removal of a single point of vulnerability for the P2P system.
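A minimal sketch of this flood-style search appears below, assuming a hop limit (TTL) keeps requests from circulating forever. The objects and method names are hypothetical, and the "network" is just a few in-memory clients.

```python
# A minimal sketch of second-generation, flood-style search: each client
# forwards a query to the peers it knows about until a source is found.

class Client:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []        # other Client objects this one knows about

    def search(self, filename, ttl=4, seen=None):
        """Return the name of a client holding the file, or None."""
        seen = seen if seen is not None else set()
        if self.name in seen or ttl < 0:
            return None
        seen.add(self.name)
        if filename in self.files:
            return self.name
        for peer in self.neighbors:
            hit = peer.search(filename, ttl - 1, seen)
            if hit:
                return hit
        return None

# Build a tiny network: A knows B, B knows C, and only C has the file.
a, b, c = Client("A"), Client("B"), Client("C", files={"clip.mov"})
a.neighbors, b.neighbors = [b], [c]

print(a.search("clip.mov"))    # 'C' -- A can now download directly from C
```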

BitTorrent takes a different approach, wherein the content files are broken up into smaller pieces before they are distributed. To find content, a user navigates with a standard Web browser to a Web site that describes a desired piece of content, which is represented by a “.torrent” file. The user downloads this file, which is not the content itself; rather, it is metadata describing the pieces that make up the content file, along with the address of a “tracker,” a form of server that keeps track of all the clients currently sharing pieces of that file. The user’s BitTorrent application contacts the tracker, which responds with a list of other clients that hold pieces of the file. The client is then able to download the pieces from those other clients, assemble them, and play the content for the user. These tracker systems could be seen as a single point of failure, but for content owners they provide a benefit—they keep track of which clients are sharing which files.
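To make the piece-based idea concrete, here is a highly simplified sketch in which one dictionary stands in for the .torrent metadata, another for the tracker’s peer list, and a third for the swarm of peers. All of the names, structures, and sizes are illustrative assumptions, not BitTorrent’s actual file or wire format.

```python
# A toy sketch of piece-based distribution: split content into pieces,
# record a hash per piece, fetch pieces from several peers, verify, reassemble.

import hashlib

# Metadata standing in for a ".torrent" file: a name plus a hash per piece.
content = b"0123456789" * 50
PIECE_SIZE = 100
pieces = [content[i:i + PIECE_SIZE] for i in range(0, len(content), PIECE_SIZE)]
metadata = {
    "name": "show.mp4",
    "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
}

# A toy "tracker" response: a list of peers currently sharing this content.
tracker_response = ["peer-0", "peer-1", "peer-2"]

# A toy swarm: which pieces each peer happens to hold (spread round-robin here).
swarm = {peer: {} for peer in tracker_response}
for i, piece in enumerate(pieces):
    swarm[tracker_response[i % len(tracker_response)]][i] = piece

# The client collects pieces from the peers, verifies each against the
# metadata, and reassembles the file.
downloaded = {}
for peer, held in swarm.items():
    for index, piece in held.items():
        if hashlib.sha1(piece).hexdigest() == metadata["piece_hashes"][index]:
            downloaded[index] = piece

assembled = b"".join(downloaded[i] for i in range(len(pieces)))
assert assembled == content
print(f"reassembled {metadata['name']} from {len(pieces)} pieces via {len(swarm)} peers")
```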

ISP HEADACHES

All P2P networks can drive significant loads on Internet service provider networks, because files that are transferred to a user’s client have to come from somewhere. In other words, whenever a device is receiving data, another device on the network has to be sending that same data. (This wouldn’t be true if the Internet were multicast enabled, but sadly it isn’t.) Unfortunately for ISPs, both DSL and cable modem technologies are asymmetrical, with more bandwidth in the forward path (toward users) than in the reverse path (from users). Whenever a client is sourcing data, the reverse path from that client can be filled with traffic.

BitTorrent can also create significant bandwidth spikes on forward paths that connect to clients, because multiple clients can simultaneously send data to a single client. This helps speed downloads for that client and helps to distribute popular files rapidly throughout the network. Of course, when this happens, each source client’s return path becomes saturated, and the forward path to the receiving client can also become swamped with data.
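To put rough numbers on the asymmetry problem, the sketch below works through an illustrative calculation. The link speeds are assumptions chosen for the example, not figures from the article or from any particular ISP.

```python
# A back-of-the-envelope sketch of why P2P stresses asymmetrical access links.
# The specific speeds below are illustrative assumptions.

downstream_mbps = 6.0     # assumed "forward" capacity toward a user
upstream_mbps = 0.768     # assumed "reverse" capacity from a user

# One uploading peer can contribute at most its (much smaller) upstream rate...
print(f"single source: {upstream_mbps:.3f} Mb/s toward the downloader")

# ...so filling one downloader's forward path takes many peers, each of
# which loads up its own reverse path while it uploads.
peers_needed = downstream_mbps / upstream_mbps
print(f"peers needed to fill one {downstream_mbps} Mb/s forward path: "
      f"about {peers_needed:.0f}")
```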

Overall, the reasons why content providers and users like P2P are pretty clear: faster downloads, and lower bandwidth charges for the content providers’ servers. It’s also pretty clear why P2P is a headache for ISPs—heavy loads on the low-bandwidth side of their user connections, and traffic spikes in the forward path. Designing a network that serves standard client-server users and P2P users equally well is a substantial engineering challenge indeed.