Saturday, November 19, 2011

Google Ngram Viewer

I have been playing with the Google Ngram Viewer, which charts how often words and phrases appear in books over time. It is a byproduct of Google's book digitization project (Google Books). It is easy to use and free to anyone.
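
For the curious, here is a rough sketch of the kind of calculation behind a chart like this. As I understand it, the viewer plots each word's share of all the words printed in a given year, not raw counts. The tiny corpus and the function name `relative_frequency` are my own made-up illustration, not Google's data or code.

```python
from collections import Counter

# Hypothetical toy corpus: year -> list of "book" texts (invented, not real data).
TOY_CORPUS = {
    1950: ["baseball is fun", "football and baseball"],
    2000: ["football football football", "baseball"],
}

def relative_frequency(corpus, word):
    """Return {year: share of all words in that year that equal `word`}."""
    result = {}
    for year, texts in corpus.items():
        # Count every lowercase, whitespace-separated token published that year.
        counts = Counter(token for text in texts for token in text.lower().split())
        total = sum(counts.values())
        result[year] = counts[word] / total if total else 0.0
    return result

for sport in ("baseball", "football"):
    print(sport, relative_frequency(TOY_CORPUS, sport))
```

Dividing by the yearly total is what makes different decades comparable, since far more books are published now than a century ago.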

Question One: What is the most written-about sport in America?

I entered baseball, football, basketball, and hockey into the search box and chose American English. I wanted to eliminate publications from the British Commonwealth where football is the game Americans call soccer. My assumption was that I would see baseball overtaken by football in the 1960s. What I found is here: The Results.

It is not what I thought at all. Football was the more popular topic through most of the 20th Century. Baseball passed it in the late 1980s, but football caught back up.

Question Two: Who's more written about between the Beatles and the Rolling Stones? How far behind would the Beach Boys be?

I suspected the Beatles would win. I was not sure how the writing would vary over time. I searched between 1960 and 2008 (the last year for which Google has searchable content). Here is the result: Click Here

It appears that the fame of the two British bands peaked around 2002 or 2003, when the Beatles were written about five times more than the Rolling Stones and ten times more than the Beach Boys.

Question Three: Which Beatle has gotten the most attention in books?

Here is the result: Beatle Measure

George Harrison gets an early lead, which probably means there was another person with the same name. This shows a shortcoming of the Ngram idea: it counts phrases, not the people behind them. Still, I was surprised how much more attention John Lennon has gotten.

You can do this, too. To learn how Ngram can be used for real research, watch this TED lecture.


Glenn said...

Poor Ringo always gets the short stick. His talent continues to go unappreciated!

Robert said...

Stunning. Not necessarily your subjects but the fact that we, the common unwashed masses, can do this type of research.

Thanks so much for enlightening me about this!!!!