Text Mining Barack Obama using R #rstats

  • We copy and paste President Barack Obama’s “Yes We Can” speech into a text document and read it in. The transcript comes from http://www.nytimes.com/2008/01/08/us/politics/08text-obama.html?pagewanted=all&_r=0 and is pasted into a file in the local directory /home/ajay/Desktop/new. For a word cloud we need a data frame with two columns, one with the words and the other with their frequencies. Note that tm is a powerful package and will read ALL the text documents within that folder, so keep only the files you want to analyze there.
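The two-column structure is simply a word-count table. As a minimal sketch in base R (no tm needed), assuming a simple tokenizer that lower-cases the text and splits on anything that is not a letter or apostrophe, it can be built as follows. The helper name `word_freq` and the sample sentence are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: turn a character vector of text into the
# two-column (word, freq) data frame that wordcloud() expects.
word_freq <- function(text) {
  # Lower-case, then split on runs of characters that are not letters or apostrophes
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]   # drop empty strings left by leading separators
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(counts),
             freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

d <- word_freq("Yes we can. Yes we can! And the people said: yes.")
print(d)
```

tm reaches the same kind of table through a term-document matrix, which scales better to a folder of many speeches.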

library(tm)

library(wordcloud)

txt2 <- "/home/ajay/Desktop/new"

b <- Corpus(DirSource(txt2), readerControl = list(language = "eng"))
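Because DirSource picks up every file in the folder, a stray file silently ends up in the corpus. This sketch uses a fresh temporary directory and two made-up file names in place of /home/ajay/Desktop/new to show that behaviour, and how DirSource’s `pattern` argument (a regular expression on file names) can restrict the corpus to .txt files.

```r
library(tm)

# Hypothetical stand-in for /home/ajay/Desktop/new
dir <- tempfile("speeches")
dir.create(dir)
writeLines("Yes we can.", file.path(dir, "speech1.txt"))
writeLines("Some unrelated notes.", file.path(dir, "notes.md"))

# Every file in the folder becomes a document
all_docs <- Corpus(DirSource(dir), readerControl = list(language = "eng"))
length(all_docs)   # 2

# Keep only files whose names end in .txt
txt_docs <- Corpus(DirSource(dir, pattern = "\\.txt$"),
                   readerControl = list(language = "eng"))
length(txt_docs)   # 1
```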

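# Typical clean-up steps (assumed): collapse whitespace, lower-case, drop punctuation
b <- tm_map(b, stripWhitespace)
b <- tm_map(b, content_transformer(tolower))
b <- tm_map(b, removePunctuation)
# Count terms; wordLengths = c(1, Inf) keeps short words such as "we",
# which TermDocumentMatrix otherwise drops (default minimum length is 3)
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)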

Now it seems we need to remove some of the very commonly occurring words like “the” and “and”. We are not using the standard English stopword list (the tm package provides one; see Chapter 13, Text Mining Case Studies), because the words “we” and “can” are also included in it.
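The trade-off is easy to see on a toy token vector. This base-R sketch compares removing only “the” and “and” with a hand-made `broad_stops` vector that stands in for a fuller stopword list; `broad_stops` is an assumption for illustration, not tm’s actual list.

```r
# Toy tokens from a slogan-like sentence
tokens <- c("yes", "we", "can", "and", "the", "people", "said", "yes", "we", "can")

# Remove only the two words targeted here
narrow_stops <- c("the", "and")
# Hypothetical broader list that also contains a pronoun and a modal verb
broad_stops <- c("the", "and", "we", "can")

keep_narrow <- tokens[!tokens %in% narrow_stops]
keep_broad  <- tokens[!tokens %in% broad_stops]

table(keep_narrow)  # "we" and "can" survive
table(keep_broad)   # only "yes", "people" and "said" are left
```

With the broad list the slogan itself disappears from the counts, which is why a short custom list fits this speech better.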

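# Remove only "the" and "and", keeping "we" and "can"
b <- tm_map(b, removeWords, c("the", "and"))
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)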

But let’s see how the word cloud changes if we remove all English stopwords.

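# Remove the full English stopword list supplied by tm
b <- tm_map(b, removeWords, stopwords("english"))
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)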
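wordcloud() places words at random, so two runs on the same frequencies can look different. As a hedged sketch, using a small made-up data frame `toy` in place of d1 and an assumed output file in the temporary directory, setting a seed plus a few of wordcloud()’s arguments (min.freq, max.words, random.order) gives a reproducible cloud with the most frequent words plotted first, in the centre.

```r
library(wordcloud)

# Made-up frequencies standing in for d1
toy <- data.frame(word = c("yes", "we", "can", "hope", "change"),
                  freq = c(12, 10, 10, 5, 3))

out <- file.path(tempdir(), "obama_cloud.pdf")   # assumed output file
pdf(out)
set.seed(42)                     # fixes the random placement of words
wordcloud(toy$word, toy$freq,
          min.freq = 2,          # skip words seen fewer than 2 times
          max.words = 100,       # cap the number of words plotted
          random.order = FALSE)  # most frequent words first, in the centre
dev.off()
```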

You can draw your own conclusions about the content of this famous speech, depending on your political preferences.

Politicians can give interesting speeches, but they may be full of simple-sounding words...

Citations:

1. Ingo Feinerer (2012). tm: Text Mining Package. R package version 0.5-7.1.

Ingo Feinerer, Kurt Hornik, and David Meyer (2008). Text Mining Infrastructure in R. Journal of Statistical Software 25(5). URL: http://www.jstatsoft.org/v25/i05/

2. Ian Fellows (2012). wordcloud: Word Clouds. R package version 2.0. URL: http://CRAN.R-project.org/package=wordcloud

3. You can see more than 100 of Obama’s speeches at http://obamaspeeches.com/

Quote: Numbers don’t lie, people do.


Author: Ajay Ohri

http://about.me/ajayohri

6 thoughts on “Text Mining Barack Obama using R #rstats”

  1. Very cool, thanks for sharing and with the thought process of removing articles and stop words… You’re right, a speech that seems complex can have some simple redundancy which stays with people.
