As computers enter ever more areas of our daily lives, the amount of data they produce has grown enormously. But for this “big data” to be useful, it must first be analyzed, which means it must be stored in a way that allows quick access when required.
Previously, any data that needed to be accessed in a hurry would be stored in a computer’s main memory, or dynamic random access memory (DRAM)—but the size of the datasets now being produced makes this impossible.
Instead, information tends to be stored on multiple hard disks spread across a number of machines connected by an Ethernet network. However, this storage architecture considerably increases the time it takes to access the information, according to Sang-Woo Jun, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.
“Storing data over a network is slow because there is a significant additional time delay in managing data access across multiple machines in both software and hardware,” Jun says. “And if the data does not fit in DRAM, you have to go to secondary storage—hard disks, possibly connected over a network—which is very slow indeed.”