Google Books Ngram Viewer

Here is a terrific data visualization from Google based on their digitized books collection. How does it work? Basically, you can test the frequency of various words across time periods from the 1700s to 2010.

For example, you can compare how often kung fu and yoga appear, or pizza versus hot dog. The underlying dataset covers millions, even billions, of scanned words.

Here is my yoga vs kung fu vs judo graph.

http://ngrams.googlelabs.com/info

What’s all this do?

When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books (e.g., “British English”, “English Fiction”, “French”) over the selected years. Let’s look at a sample graph:

This shows trends in three ngrams from 1950 to 2000: “nursery school” (a 2-gram or bigram), “kindergarten” (a 1-gram or unigram), and “child care” (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are “nursery school” or “child care”? Of all the unigrams, what percentage of them are “kindergarten”? Here, you can see that use of the phrase “child care” started to rise in the late 1960s, overtaking “nursery school” around 1970 and then “kindergarten” around 1973. It peaked shortly after 1990 and has been falling steadily since.

(Interestingly, the results are noticeably different when the corpus is switched to British English.)
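To make the y-axis definition above concrete, here is a minimal Python sketch of the same calculation on a toy example. The sample sentence, the whitespace tokenizer, and the function names `ngrams` and `relative_frequency` are all assumptions made for illustration; this is not how Google tokenizes or counts its corpus, and the real viewer computes the figure separately for each year.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(tokens, phrase):
    """Percentage of all n-grams (n = number of words in phrase) equal to phrase."""
    n = len(phrase.split())
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[phrase] / len(grams)


# Hypothetical one-sentence "corpus" standing in for all books from one year.
toy_text = "child care and nursery school and child care and kindergarten"
tokens = toy_text.split()  # naive whitespace tokenization (an assumption)

for phrase in ("child care", "nursery school", "kindergarten"):
    print(f"{phrase}: {relative_frequency(tokens, phrase):.2f}%")
```

Note that the bigrams are measured against the total number of bigrams and the unigram against the total number of unigrams, which is why the two kinds of phrase can share one y-axis.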

Corpora

Below are descriptions of the corpora that can be searched with the Google Books Ngram Viewer. All of these corpora were generated in July 2009; we will update these corpora as our book scanning continues, and the updated versions will have distinct persistent identifiers.

Informal corpus name | Persistent identifier | Description
American English | googlebooks-eng-us-all-20090715 | Same filtering as the English corpus but further restricted to books published in the United States.
British English | googlebooks-eng-gb-all-20090715 | Same filtering as the English corpus but further restricted to books published in Great Britain.
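The two persistent identifiers listed above appear to follow a hyphen-separated pattern ending in a date stamp. The sketch below splits one apart in Python. The field names (`source`, `language`, `region`, `subset`) and the five-part layout are guesses inferred from these two examples only, not a documented format, and the helper `parse_corpus_id` is hypothetical.

```python
from datetime import datetime


def parse_corpus_id(identifier):
    """Split an identifier like 'googlebooks-eng-us-all-20090715' into parts.

    The five-field layout and the field names are assumptions inferred
    from the two identifiers listed in the post.
    """
    parts = identifier.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected identifier layout: {identifier!r}")
    source, language, region, subset, stamp = parts
    return {
        "source": source,
        "language": language,
        "region": region,
        "subset": subset,
        # Trailing field read as a YYYYMMDD date (an assumption).
        "date": datetime.strptime(stamp, "%Y%m%d").date(),
    }


info = parse_corpus_id("googlebooks-eng-gb-all-20090715")
print(info["region"], info["date"].isoformat())
```

Read this way, the trailing stamp agrees with the July 2009 generation date mentioned above, which is why updated corpora would be expected to carry a different identifier.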
