On P2P Networks and P2P-Based Content Discovery on the Internet
MetadataShow full item record
The Internet has evolved into a medium centered around content: people watch videos on YouTube, share their pictures via Flickr, and use Facebook to keep in touch with their friends. Yet, the only globally deployed service to discover content - i.e., Domain Name System (DNS) - does not discover content at all; it merely translates domain names into locations. The lack of persistent naming, in particular, makes content discovery, instead of domain discovery, challenging. Content Distribution Networks (CDNs), which augment DNSs with location-awareness, also suffer from the same problem of lack of persistent content names. Recently, several infrastructure- level solutions to this problem have emerged, but their fundamental limitation is that they fail to preserve the autonomy of network participants. Specifically, the storage requirements for resolution within each participant may not be proportional to their capacity. Furthermore, these solutions cannot be incrementally deployed. To the best of our knowledge, content discovery services based on peer-to-peer (P2P) networks are the only ones that support persistent content names. These services also come with the built-in advantage of scalability and deployability. However, P2P networks have been deployed in the real-world only recently, and their real-world characteristics are not well understood. It is important to understand these real-world characteristics in order to improve the performance and propose new designs by identifying the weaknesses of existing designs. In this dissertation, we first propose a novel, lightweight technique for capturing P2P traffic. Using our captured data, we characterize several aspects of P2P networks and draw conclusions about their weaknesses. Next, we create a botnet to demonstrate the lethality of the weaknesses of P2P networks. Finally, we address the weaknesses of P2P systems to design a P2P-based content discovery service, which resolves the drawbacks of existing content discovery systems and can operate at Internet-scale. This dissertation includes both previously published/unpublished and co-authored material.