Systems Papers You Must Read (imho)
The following list collects papers that I personally consider important pieces of work in systems and recommend that all of my students read. However, the systems research community has grown immensely, especially in the last few years, which makes it virtually impossible to write a comprehensive list of all important papers in the field. As a result, you will notice the absence of entire subfields, such as networking and security, mostly because I am not knowledgeable enough in those areas to compile a comprehensive list.
Feel free to reach out if you know of any paper that should really be part of this list. I am also happy to add links to any material (blog posts, videos, etc.) that makes the papers easier to understand.
Consensus and Replication
- Epidemic Algorithms for Replicated Database Maintenance
- The FLP Impossibility Result
- Consensus in the Presence of Partial Synchrony
- Ittai Abraham wrote a great explanation of different communication models
- Paxos Made Simple
- Practical Byzantine Fault Tolerance
- Chain Replication for Supporting High Throughput and Availability
- Bitcoin-NG: A Scalable Blockchain Protocol
- This work describes the Bitcoin protocol much better than the original whitepaper
- Blogpost
Database Systems and Distributed Storage
- Principles of Transaction-oriented Database Recovery
- Linearizability
- Peter Bailis on Linearizability vs. Serializability
- Dynamo: Amazon's Highly Available Key-value Store
- Secure Untrusted Data Repository (SUNDR)
- Spanner: Google's Globally-Distributed Database
- HyperDex: A Distributed, Searchable Key-Value Store
- Coordination Avoidance in Database Systems
- C-Store: A Column-oriented DBMS
- Technically not published in a systems venue, but a great read
- Seeing is Believing: A Client-Centric Specification of Database Isolation
- Covers all common isolation models in database management systems
Filesystems and Local Storage
- A Case for Redundant Arrays of Inexpensive Disks (RAID)
- The Design and Implementation of a Log-Structured File System
- The Log-Structured Merge-Tree (LSM-Tree)
- The foundation for many modern key-value stores such as RocksDB
- PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
- Rethink the Sync
Operating Systems
- Monitors: An Operating System Structuring Concept
- The UNIX Time-Sharing System
- The Multikernel: A New OS Architecture for Scalable Multicore Systems
- Unikernels: Library Operating Systems for the Cloud
- Logical Attestation: An Authorization Architecture for Trustworthy Computing
- Arrakis: The Operating System is the Control Plane
Cloud and Serverless
- Operating System Support for Virtual Machines
- Xen and the Art of Virtualization
- Firecracker: Lightweight Virtualization for Serverless Applications
Miscellaneous
- End-to-End Arguments in System Design
- Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
- MapReduce: Simplified Data Processing on Large Clusters
- Congestion Avoidance and Control
- The one networking paper you should read even if you do not want to do any networking research