The Big Data sessions at E-Records were — and I don’t use this word lightly — breathtaking. Heart-pounding. Exhilarating. Of course, I only understood why these research presentations were so impressive after the NARA employee sitting next to me explained them, but I hope to convey what I learned in a way that does it justice.
What do I mean by “big data”? Put simply, it just means extremely large and complex data sets. (Honestly, Wikipedia’s definition is helpful here: “big data consists of data sets that grow so large and complex that they become awkward to work with using on-hand database management tools.”)
The National Science Foundation, with supplemental funding from NARA, has sponsored two grant projects to analyze big data and better understand the specific challenges of digital preservation and access. Computer scientists in the Image and Spatial Data Analysis Group at the National Center for Supercomputing Applications (NCSA), a world-class research institution, are studying two issues critical to preserving digital information: 1) what tools archivists need to preserve e-records; and 2) how to provide searchable access to handwritten information, specifically the 125 terabytes of data that make up the digitized 1940 Census. (A toy sketch of what that second problem involves follows below.)
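The presentations stayed at a high level, so to give a flavor of what “searchable access to handwritten information” involves, here is a minimal toy sketch of one common family of techniques, word spotting: each scanned word image is reduced to a fixed-length numeric descriptor, and a search compares descriptors rather than text. Everything here (the profile descriptor, the function names, the fake corpus) is my own illustration under those assumptions, not NCSA’s actual method.

```python
import numpy as np

def word_descriptor(img, bins=16):
    """Reduce a word image (2-D array of ink intensities) to a
    fixed-length "profile" descriptor: the mean ink density in each
    of `bins` vertical slices. Crude, but it turns variable-width
    handwriting into vectors that can be compared numerically."""
    slices = np.array_split(img, bins, axis=1)
    return np.array([s.mean() for s in slices])

def search(query_img, index):
    """Rank every indexed word image by Euclidean distance between
    its descriptor and the query's; the closest match comes first."""
    q = word_descriptor(query_img)
    return sorted((np.linalg.norm(q - d), label) for label, d in index)

# Toy corpus: three fake 32x64 "scanned word" images.
rng = np.random.default_rng(0)
corpus = {name: (rng.random((32, 64)) < 0.2).astype(float)
          for name in ["smith", "jones", "baker"]}

# Index once, then search by image rather than by (unavailable) text.
index = [(name, word_descriptor(img)) for name, img in corpus.items()]
print(search(corpus["jones"], index)[0])  # (0.0, 'jones'): exact match first
```

At 125 terabytes, the matching idea above is presumably the easy part; distributing the indexing and search across supercomputing resources is, I gather, where the real research lives.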