Large web sites like Facebook are constantly under attack from hackers and groups trying to spread malware, which means they gather a lot of data about what attacks look like and where they’re coming from. To help standardize its methods for collecting and analyzing all this data, Facebook built a new framework called ThreatData, which it detailed in a blog post on Tuesday afternoon.
Essentially, ThreatData is composed of systems for ingesting and transforming data feeds from many different sources, storing and analyzing that data for historical and real-time trends (using the Hadoop-based Hive for the former and Scuba for the latter), and then reacting to threats in real time. Blog post author Mark Hammell, a threat researcher at Facebook, explained how ThreatData has been used for everything from detecting a campaign to spread smartphone malware via spam messages to creating a “super anti-virus” program that’s much more thorough than any commercial software.
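To make the ingest, store and react flow a little more concrete, here is a minimal Python sketch of the general pattern. It is not Facebook’s code and does not reflect ThreatData’s actual design: the `ThreatIndicator` record, the two feed formats, the `ThreatStore` class and every function name are hypothetical, and an in-memory set stands in for real storage systems such as Hive and Scuba.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatIndicator:
    """Hypothetical common record that every feed is normalized into."""
    kind: str    # e.g. "ip" or "url"
    value: str   # the address or URL itself
    source: str  # which feed reported it


def from_csv_line(line: str, source: str) -> ThreatIndicator:
    """Parse an assumed 'kind,value' text feed."""
    kind, value = line.strip().split(",", 1)
    return ThreatIndicator(kind.strip().lower(), value.strip(), source)


def from_json_record(text: str, source: str) -> ThreatIndicator:
    """Parse an assumed JSON feed with 'type' and 'indicator' keys."""
    record = json.loads(text)
    return ThreatIndicator(record["type"].lower(), record["indicator"], source)


class ThreatStore:
    """In-memory stand-in for the storage and analysis layer."""

    def __init__(self) -> None:
        self._indicators: set[ThreatIndicator] = set()

    def add(self, indicator: ThreatIndicator) -> None:
        self._indicators.add(indicator)

    def counts_by_kind(self) -> Counter:
        # A simple aggregate of the sort a historical query might compute.
        return Counter(i.kind for i in self._indicators)

    def is_known_bad(self, kind: str, value: str) -> bool:
        # A lookup of the sort a real-time check might perform.
        return any(i.kind == kind and i.value == value for i in self._indicators)


store = ThreatStore()
store.add(from_csv_line("IP, 203.0.113.7", "feed_a"))
store.add(from_json_record('{"type": "url", "indicator": "http://malware.example/x"}', "feed_b"))
store.add(from_json_record('{"type": "IP", "indicator": "203.0.113.7"}', "feed_b"))

if store.is_known_bad("ip", "203.0.113.7"):
    print("blocking 203.0.113.7")
```

The point of the sketch is the normalization step: once every feed is converted into the same record type, the storage and reaction code no longer needs to know where a given indicator came from.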
The image below shows a graph Facebook developed using ThreatData to map malicious and victimized IP addresses, with the pie chart breaking that data down by ISP in the United States.