Using Big Data Techniques to Explode the Keyword Targeting Myth

Keyword targeting is held up as the gold standard for understanding content. But is that true? I decided to test this longstanding notion to see if it still holds up.
Note: This is also a story about using Big Data techniques and how you can employ them yourself for fun and profit. Stay with me; it’s not as hard as you might think!
The Experiment
For my experiment, I crawled several thousand pages from The New York Times. Using The Times’ pages was a natural choice, both because lots of data experiments use this source and because I know they share the keywords they associate with each URL as a metadata field. After doing some de-duplication and other cleaning, I ended up with a data set of 2,142 stories, choosing articles in three main categories: politics, art and entertainment, and business.
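To give a feel for what the de-duplication and cleaning step can look like, here is a minimal sketch in Python. It is an illustration, not the exact code used for this experiment: the record fields (`url`, `section`, `keywords`), the section labels, and the helper names `normalize_url` and `clean_stories` are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical section labels; the labels in a real crawl may differ.
CATEGORIES = {"politics", "arts", "business"}


def normalize_url(url):
    """Lowercase scheme and host, drop query string and fragment,
    and strip any trailing slash so near-identical URLs compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def clean_stories(records):
    """Keep one record per normalized URL, restricted to the chosen
    categories and to pages that actually carry keyword metadata."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("section") not in CATEGORIES:
            continue  # outside the three categories of interest
        if not rec.get("keywords"):
            continue  # no keyword metadata to compare against
        key = normalize_url(rec["url"])
        if key in seen:
            continue  # duplicate of a story we already kept
        seen.add(key)
        cleaned.append({**rec, "url": key})
    return cleaned
```

Calling `clean_stories` on a list of crawled records returns the filtered list, with the first copy of each story kept. Normalizing on the URL alone is the simplest choice; a stricter pass might also compare headlines or body text to catch the same story published under two addresses.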

