30 most influential papers in the world of big data

25th Jun '14, 11:33 AM in Resources

Baiju NT Contributor

Here is a list of some of the most influential papers in the world of big data. We compiled it from recommendations made by big data enthusiasts on various social media channels. If we have missed an important paper, please let us know.

1. Dynamo: Amazon’s Highly Available Key-value Store

2. Bigtable: A Distributed Storage System for Structured Data

3. MapReduce: Simplified Data Processing on Large Clusters

4. The Google File System

5. Cassandra – A Decentralized Structured Storage System

6. Spanner: Google’s Globally-Distributed Database

7. Large-scale Incremental Processing Using Distributed Transactions and Notifications

8. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications

9. Dremel: Interactive Analysis of Web-Scale Datasets

10. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems

11. The Dangers of Replication and a Solution

12. Interpreting the Data: Parallel Analysis with Sawzall

13. RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems

14. Data clustering: 50 years beyond K-means

15. Bayesian semi-supervised learning with support vector machine

16. What is Data Science?

17. Scientific Data Management in the Coming Decade

18. What Next? A Dozen Information-Technology Research Goals

19. Frustratingly Easy Domain Adaptation

20. NoSQL Databases

21. Volley: Automated Data Placement for Geo-Distributed Cloud Services

22. Q-Clouds: Managing Performance Interference Effects for QoS-Aware Clouds

23. Large-scale Incremental Processing Using Distributed Transactions and Notifications

24. MapReduce Online

25. Lithium: Virtual Machine Storage for the Cloud

26. Availability in Globally Distributed Storage Systems

27. Cloud Storage for Cloud Computing

28. The Collective: A Cache-Based System Management Architecture

29. Parallax: Virtual Disks for Virtual Machines

30. Data-Intensive Supercomputing: The Case for DISC

