Thursday, October 23, 2014

CodingTMD’s Reading List

The following reading list is selected from the papers I have read over the past three years. It should give you a basic picture of what is happening in industry today and some sense of how to design a distributed system on sound principles.

Feel free to share good papers you have read in the comments. :)

  1.  In Search of an Understandable Consensus Algorithm.  Diego Ongaro, John Ousterhout, 2013
  2. A Simple Totally Ordered Broadcast Protocol. Benjamin Reed, Flavio P. Junqueira, 2008
  3.  Paxos Made Live - An Engineering Perspective. Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone, 2007
  4.   The Chubby Lock Service for Loosely-Coupled Distributed Systems. Mike Burrows, 2006
  5.   Paxos Made Simple. Leslie Lamport, 2001
  6. Impossibility of Distributed Consensus with One Faulty Process. Michael Fischer, Nancy Lynch, Michael Paterson, 1985
  7. The Byzantine Generals Problem. Leslie Lamport, Robert Shostak, Marshall Pease, 1982
  8.   An Algorithm for Concurrency Control and Recovery in Replicated Distributed Databases. PA Bernstein, N Goodman, 1984
  9.   Wait-Free Synchronization. M Herlihy…, 1991
  10. ZooKeeper: Wait-free coordination for Internet-scale systems. P Hunt, M Konar, FP Junqueira, 2010

  1. Highly Available Transactions: Virtues and Limitations. Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica, 2013
  2. Consistency Tradeoffs in Modern Distributed Database System Design. Daniel J. Abadi, 2012
  3. CAP Twelve Years Later: How the “Rules” Have Changed. Eric Brewer, 2012
  4. Optimistic Replication. Yasushi Saito and Marc Shapiro, 2005
  5. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. Seth Gilbert, Nancy Lynch, 2002
  6. Harvest, Yield, and Scalable Tolerant Systems. Armando Fox, Eric A. Brewer, 1999
  7. Linearizability: A Correctness Condition for Concurrent Objects. Maurice P. Herlihy, Jeannette M. Wing, 1990
  8. Time, Clocks, and the Ordering of Events in a Distributed System. Leslie Lamport, 1978

Conflict-free data structures
  1. A Comprehensive Study of Convergent and Commutative Replicated Data Types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
  2. A Commutative Replicated Data Type For Cooperative Editing. Nuno Preguiça, Joan Manuel Marques, Marc Shapiro, Mihai Letia, 2009
  3. CRDTs: Consistency without Concurrency Control. Mihai Letia, Nuno Preguiça, Marc Shapiro, 2009
  4. Conflict-free replicated data types. Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
  5. Designing a commutative replicated data type. Marc Shapiro, Nuno Preguiça, 2007
Distributed programming
  1.  Logic and Lattices for Distributed Programming. Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier, 2012
  2. Dedalus: Datalog in Time and Space. Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears, 2011
  3. MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean, Sanjay Ghemawat, 2004
  4. A Note On Distributed Computing. Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant, 1994
  5. An Overview of the Scala Programming Language. M Odersky, P Altherr, V Cremet, B Emir, S Man, 2004
  6. Erlang. Joe Armstrong, 2010

Implemented and theoretical distributed systems
  1.  A History of The Virtual Synchrony Replication Model. Ken Birman,  2010
  2.  Cassandra — A Decentralized Structured Storage System. Avinash Lakshman, Prashant Malik, 2009
  3.  Dynamo: Amazon’s Highly Available Key-Value Store. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels, 2007
  4.  Stasis: Flexible Transactional Storage. Russell Sears, Eric Brewer, 2006
  5.   Bigtable: A Distributed Storage System for Structured Data. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, 2006
  6.  The Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, 2003
  7.  Lessons from Giant-Scale Services. Eric A. Brewer, 2001
  8.  Towards Robust Distributed Systems. Eric A. Brewer, 2000
  9.  Cluster-Based Scalable Network Services. Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier, 1997
  10. The Process Group Approach to Reliable Distributed Computing. Ken Birman, 1993
  11. Bitcoin: A Peer-to-Peer Electronic Cash System. Satoshi Nakamoto, 2008
  12. The Hadoop Distributed File System.  Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, 2010
  13. Hive – A Petabyte Scale Data Warehouse Using Hadoop. A Thusoo, JS Sarma, N Jain, Z Shao, 2010
  14. Scalable Web Architecture and Distributed Systems. Kate Matsudaira
  15. Kafka: a Distributed Messaging System for Log Processing. J Kreps, N Narkhede, 2011
  16. Storm: Distributed and fault-tolerant real-time computation. Nathan Marz, 2012
  17. Spark: Cluster Computing with Working Sets. M Zaharia, M Chowdhury, MJ Franklin…, 2010
  18. Flat Datacenter Storage.  EB Nightingale, J Elson, J Fan, OS Hofmann, J Howell…, 2012
  19. Ananta: Cloud Scale Load Balancing. P Patel, D Bansal, L Yuan, A Murthy…, 2013
  20.  F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business. Jeff Shute, Stephan Ellner…, 2012
  21. BigTable, Dynamo & Cassandra – A Review. A Kala Karun, S Surendran, 2012
  22. Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency.  B Calder, J Wang, A Ogus, N Nilakantan…, 2011

  1.   The Dangers of Replication and a Solution. J Gray, P Helland, P O'Neil, D Shasha - ACM SIGMOD Record, 1996

Industry Implementation
  1.   Hadoop Architecture and its Usage at Facebook.  Dhruba Borthakur, 2009
  2. Web Search for a Planet: The Google Cluster Architecture. LA Barroso, J Dean, U Hölzle - IEEE Micro, 2003
  3.   HDFS scalability: the limits to growth. Konstantin V. Shvachko, 2010
  4.   Autopilot: Automatic Data Center Management.  Michael Isard, 2007
  5.   Storage Infrastructure behind Facebook Messages: Using HBase at Scale. AS Aiyer, M Bautin, GJ Chen, P Damania, 2012
  6.   Scaling Memcache at Facebook.  R Nishtala, H Fugal, S Grimm, M Kwiatkowski, 2013
  7. Finding a needle in Haystack: Facebook’s photo storage. D Beaver, S Kumar, HC Li, J Sobel, P Vajgel, 2010
  8.  Apache Hadoop Goes Realtime at Facebook. D Borthakur, J Gray, JS Sarma…, 2011
  9.   Data Warehousing and Analytics Infrastructure at Facebook.  A Thusoo, Z Shao, S Anthony, D Borthakur…, 2010
  10. Large Scale Computing @ Linkedin. Bhupesh Bansal, 2009
  11. An Analysis of Facebook Photo Caching. Q Huang, K Birman, R van Renesse, W Lloyd…, 2013
  12. The “Big Data” Ecosystem at LinkedIn. R Sumbaly, J Kreps, S Shah, 2013
  13. Data Infrastructure at LinkedIn. A Auradkar, C Botev, S Das…, 2012

  1. Deep C (and C++). Olve Maudal and Jon Jagger, 2011

  1. Column-Stores vs. Row-Stores: How Different Are They Really? DJ Abadi, SR Madden, N Hachem, 2008
  2.  Hadoop and its evolving ecosystem. J. Yates Monteith, John D. McGregor, and John E. Ingram
  3. Orleans: Cloud Computing for Everyone. S Bykov, A Geller, G Kliot, JR Larus, R Pandya, 2011
  4. Twitter Data Analytics.  Shamanth Kumar, Fred Morstatter, Huan Liu, 2013
  5. MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That’s Not a Nail!  Jimmy Lin, 2012

Data Mining
  1.   Data Mining with Big Data. X Wu, X Zhu, GQ Wu, W Ding, 2014
  2. SAMOA: A Platform for Mining Big Data Streams. G De Francisci Morales, 2013
  3.   Mining Big Data: Current Status, and Forecast to the Future. W Fan, A Bifet, 2013
  4.   Scaling Big Data Mining Infrastructure: The Twitter Experience. J Lin, D Ryaboy, 2013

  1. Cloud Design Patterns.
  2. Data Access for Highly-Scalable Solutions.
  3. Computer Architecture - A Quantitative Approach.
  4. Distributed Systems: Concepts and Design, Fifth Edition. George Coulouris
  5. Beautiful Architecture. Diomidis Spinellis, Georgios Gousios, et al.
  6. Mining Social Media: Tracking Content and Predicting Behavior. Manos Tsagkias
  7.  Seven Databases in Seven Weeks.  Eric Redmond and Jim R. Wilson