Research Papers

As a graduate student, I read a lot of research papers. This is a collection of those that I think are interesting.

The Datacenter as a Computer

This is a 120-page document describing the design of state-of-the-art, large-scale computing facilities, such as those run by the big Internet companies. It covers everything from facilities issues through the computing hardware to the software infrastructure. It is an excellent guide to how data centers of all sizes should be designed, not just huge facilities. Don't be intimidated by its length: it is very easy to read. Just browse the table of contents and pick the sections that interest you. I particularly enjoyed Chapter 5: Energy and Power Efficiency. ... (130 words)

[ 2009-May-23 18:06 | Permanent Link ]

The Google File System

This paper has received a lot of attention, so I won't talk about it much. I just wanted to point out that many of the people involved in this paper have been involved in distributed file systems for a while. Sanjay Ghemawat was involved in the Harp file system. Howard Gobioff and Fay Chang were involved in the Network-Attached Secure Disk research project at CMU, which, to my non-expert knowledge, is the first project that proposed separating metadata from data in distributed file systems, which is the core design behind GFS. ... (121 words)

[ 2007-April-15 14:12 | Permanent Link ]

The Landscape of Parallel Computing Research

This paper is a survey of parallel computing that happens to be getting a lot of attention on the web right now. On the whole, it is an interesting read for anyone curious about computer architecture. I don't agree with everything in the paper, but I did learn a few things, and it does give me some hope that we will be able to work with the very parallel architectures we will be seeing soon. My favourite part: They present a really good, really simple argument for why it might be good to have asymmetrical architectures with one big, complicated, fast CPU for sequential code, and a whole lot of smaller, simpler CPUs for parallel code. Tim Bray's opinions about this paper are interesting, as are some of the comments on that page. ... (159 words)
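
The asymmetric-architecture argument can be checked with Amdahl's law. What follows is a minimal sketch under my own assumptions, not the paper's exact model: the 10% sequential fraction, the 100 small cores, and the big core being twice as fast are hypothetical numbers, and the function name `speedup` is mine. It also ignores the extra chip area a big core would cost.

```python
def speedup(serial_fraction, parallel_cores, serial_core_speed=1.0):
    """Amdahl's law speedup, relative to running everything on one small core.

    serial_fraction: share of the work that cannot be parallelised.
    parallel_cores: number of small cores running the parallel part.
    serial_core_speed: speed of the core running the sequential part,
        relative to one small core (hypothetical parameter).
    """
    serial_time = serial_fraction / serial_core_speed
    parallel_time = (1.0 - serial_fraction) / parallel_cores
    return 1.0 / (serial_time + parallel_time)


# Hypothetical workload: 10% sequential code, 100 small cores.
symmetric = speedup(0.10, 100)                          # small cores only
asymmetric = speedup(0.10, 100, serial_core_speed=2.0)  # plus one big core

print(f"symmetric:  {symmetric:.1f}x")
print(f"asymmetric: {asymmetric:.1f}x")
```

With these assumed numbers the speedup goes from roughly 9.2x to roughly 16.9x, because the sequential 10% dominates the running time once the parallel part is spread over many cores.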

[ 2007-February-27 23:16 | Permanent Link ]

Rethink the Sync

This is one of the two best paper winners from OSDI'06. The idea it presents is brilliant: get rid of synchronous disk writes, where the program blocks until the data has actually been written to disk. Instead, allow the program to continue, and buffer all externally visible output (such as network communication or screen updates) until the disk write completes. This combines the durability guarantees of synchronous writes with most of the performance of asynchronous writes. I am skeptical about how useful their specific implementation is, but I can definitely see how alternative implementations of this idea would be useful. Evan Martin has a good discussion, and there are also a few questions and answers on the OSDI'06 discussion page. ... (274 words)
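
The idea of holding back output until the write is durable can be illustrated with a toy model. This is a sketch under my own assumptions, not the paper's implementation: the class `ExternallySynchronousFile` and its methods are hypothetical names, and "disk" and "output" are just in-memory lists.

```python
class ExternallySynchronousFile:
    """Toy model: writes return immediately; output is held until they commit."""

    def __init__(self):
        self.pending_writes = []   # issued but not yet durable
        self.durable = []          # data that has reached the simulated disk
        self.held_output = []      # output waiting on pending writes
        self.visible_output = []   # output the outside world has seen

    def write(self, data):
        # Returns without blocking, like an asynchronous write.
        self.pending_writes.append(data)

    def emit(self, message):
        # Output produced while writes are uncommitted is buffered, not shown.
        if self.pending_writes:
            self.held_output.append(message)
        else:
            self.visible_output.append(message)

    def commit(self):
        # Simulates the disk finishing: make writes durable, then release output.
        self.durable.extend(self.pending_writes)
        self.pending_writes.clear()
        self.visible_output.extend(self.held_output)
        self.held_output.clear()


f = ExternallySynchronousFile()
f.write("record 1")          # the program does not block here
f.emit("saved record 1")     # held back: the write is not durable yet
before = list(f.visible_output)
f.commit()                   # the simulated disk write completes
after = list(f.visible_output)
```

In this model an outside observer never sees "saved record 1" before the record is durable, so the program looks synchronous from the outside even though `write` never blocked.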

[ 2007-February-13 20:48 | Permanent Link ]

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications

This is one of the important peer-to-peer papers. My only criticism is that Chord only provides probabilistic guarantees. I don't see how it can be used as a building block for distributed systems that need to provide strong semantics. ... (70 words)

[ 2006-December-24 23:19 | Permanent Link ]

The Chubby Lock Service for Loosely-Coupled Distributed Systems

Google's lock service provides an easy interface for master elections, distributed locks, and naming. This paper states "Chubby was an engineering effort [...]; it was not research." However, I think Burrows is wrong: This is great research, and is exactly the kind that academics are bad at. He presents a novel design which provides a useful abstraction for building distributed systems. Best of all, it has been tested extensively inside Google, and he presents useful information about what worked and what didn't. I would like to read more papers like this. ... (118 words)

[ 2006-October-27 19:45 | Permanent Link ]

Chain Replication: Strong Consistency and Reliability

This paper introduces chain replication, a technique for replicated distributed systems that can maintain strong consistency guarantees such as ACID, while still providing good performance and reliability when things fail. I'm not completely convinced that this is novel or necessarily the best choice, but this paper is a great example of presenting a relatively simple idea and exploring it very well. Definitely a technique worth investigating, although I never like it when people propose a system that relies on a Paxos master election. I guess that is what happens when you make strong consistency guarantees. ... (127 words)
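
To make the technique concrete: as I understand it, updates enter at the head of the chain, are forwarded from replica to replica, and are acknowledged only once the tail has applied them, while reads are served by the tail. Here is a minimal in-process sketch under those assumptions. The `Chain` class and its methods are hypothetical names, and the master that reconfigures the chain after a failure is reduced to a single method call.

```python
class Chain:
    """Toy chain of replicas: index 0 is the head, the last one is the tail."""

    def __init__(self, length):
        self.replicas = [{} for _ in range(length)]

    def write(self, key, value):
        # The update is applied at the head and forwarded down the chain.
        for store in self.replicas:
            store[key] = value
        # The acknowledgement is sent only after the tail has applied it.
        return True

    def read(self, key):
        # Reads go to the tail, which only holds fully replicated updates.
        return self.replicas[-1].get(key)

    def remove_replica(self, index):
        # Stand-in for the master dropping a failed replica from the chain.
        del self.replicas[index]


chain = Chain(3)
acked = chain.write("x", 1)
value_before_failure = chain.read("x")
chain.remove_replica(0)      # the head fails; its successor becomes the head
value_after_failure = chain.read("x")
```

Because every acknowledged update has already reached the tail, losing any single replica in this model does not lose acknowledged data.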

[ 2006-September-14 21:31 | Permanent Link ]

Microreboot: Microkernels for applications

This paper argues that applications should be divided into separate components, so that recovery from errors can take place by restarting small components, instead of having to restart the entire server. I think this is a great idea, but I'm not sure it is a new one. Microkernel operating systems are based completely on this design. Additionally, the common "three tier architecture" decomposes a service into data, application and presentation layers. Still, I agree with the authors that this is an interesting technique for making software more reliable, and this paper does a good job of exploring it. ... (134 words)
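
The component-level restart idea can be sketched in a few lines. This is only an illustration under my own assumptions, not the authors' system: `Component`, `Server`, and the `healthy` flag are hypothetical, and a real component would need its important state stored outside itself for a restart to be safe.

```python
class Component:
    """A restartable piece of the application."""

    def __init__(self, name):
        self.name = name
        self.restarts = 0
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is in a bad state")
        return f"{self.name} handled {request}"

    def restart(self):
        # Restarting discards the bad in-memory state of this component only.
        self.healthy = True
        self.restarts += 1


class Server:
    """Routes requests and restarts only the component that failed."""

    def __init__(self, names):
        self.components = {name: Component(name) for name in names}

    def handle(self, name, request):
        component = self.components[name]
        try:
            return component.handle(request)
        except RuntimeError:
            component.restart()                # restart one component...
            return component.handle(request)   # ...and retry the request


server = Server(["cart", "search"])
server.components["cart"].healthy = False      # simulate a fault in one component
result = server.handle("cart", "add item")
```

Only the faulty component is restarted here; the other component keeps running untouched, which is the whole point compared with restarting the entire server.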

[ 2006-September-04 15:29 | Permanent Link ]

Thesis: Practical Routing in Delay-Tolerant Networks

Evan P. C. Jones, "Practical routing in delay-tolerant networks," Master's thesis, Electrical & Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada, June 2006. This is an extended and revised version of my earlier workshop paper. Questions or comments are welcome. [local copy with minor corrections]

[ 2006-June-08 23:21 | Permanent Link ]

Towards 2020 Science

Microsoft Research Cambridge recently released a report entitled "Towards 2020 Science." This report is the result of a workshop that invited an international group of natural and computer scientists to put together a roadmap for science, and particularly computing, over the next sixteen years before 2020. The major focus is the claim that computer science will transform science, not just as a tool for doing "traditional" science, but as a technique for performing experiments and creating models. They have a downloadable PDF, but they will also mail you a copy of their very high-quality printed version if you request it. I recommend reading it if you are interested in the intersection of science and computing.

[ 2006-April-05 14:22 | Permanent Link ]