Text Mining Barack Obama using R #rstats

  • We copy and paste President Barack Obama’s “Yes We Can” speech into a text document and read it in. For a word cloud we need a data frame with two columns, one with the words and the other with their frequencies. We take the transcript from http://www.nytimes.com/2008/01/08/us/politics/08text-obama.html?pagewanted=all&_r=0 and paste it into a file in the local directory /home/ajay/Desktop/new. Note that tm is a powerful package and will read ALL the text documents within that folder.




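library(tm)
library(wordcloud)
txt2 <- "/home/ajay/Desktop/new"                   # folder holding the speech transcript
b <- Corpus(DirSource(txt2), readerControl = list(language = "en"))
b <- tm_map(b, content_transformer(tolower))       # lower case
b <- tm_map(b, stripWhitespace)                    # collapse extra whitespace
b <- tm_map(b, removePunctuation)                  # drop punctuation
# keep short words such as "we" (the default minimum word length is 3)
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)         # total frequency of each word
d1 <- data.frame(word = names(v1), freq = v1)      # two columns: word and frequency
wordcloud(d1$word, d1$freq)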
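To see what that two-column data frame looks like without the tm machinery, here is a minimal base-R sketch. The helper word_freq() and the sample sentence are hypothetical, for illustration only: it lower-cases the text, strips punctuation, splits on whitespace and counts words with table(), roughly what the tm pipeline above does for the whole speech.

```r
# Minimal base-R sketch of the word/frequency data frame that wordcloud() expects.
# word_freq() and the sample text are hypothetical, for illustration only.
word_freq <- function(text) {
  text <- tolower(text)                          # lower case
  text <- gsub("[[:punct:]]", "", text)          # drop punctuation
  words <- strsplit(text, "[[:space:]]+")[[1]]   # split on whitespace
  words <- words[words != ""]                    # discard empty tokens
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

d1 <- word_freq("Yes we can. Yes we can heal this nation.")
print(d1)
```

The result has the same shape as d1 in the post, so it can be passed straight to wordcloud(d1$word, d1$freq).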

Now it seems we need to remove some of the very commonly occurring words like “the” and “and”. We are not using the standard English stopwords (the tm package provides them; see Chapter 13, Text Mining case studies), because that list also includes the words “we” and “can”.

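b <- tm_map(b, removeWords, c("the", "and"))       # drop very common words such as "the" and "and"; "we" and "can" stay
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)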
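The reason for a custom word list is easy to see on a toy example. In this minimal base-R sketch the token vector and both stopword vectors are hypothetical; generic_stop merely stands in for a standard English list that, as the post notes, also contains “we” and “can”. Filtering with %in% keeps those two words under the custom list and drops them under the generic one.

```r
# Hypothetical tokens standing in for the cleaned speech text
tokens <- c("yes", "we", "can", "and", "the", "nation", "we", "can", "the")

# Custom list: only very common filler words
custom_stop <- c("the", "and")
# Hypothetical stand-in for a generic English stopword list that includes "we" and "can"
generic_stop <- c("the", "and", "we", "can", "a", "of")

keep_custom  <- tokens[!tokens %in% custom_stop]   # drop tokens found in the custom list
keep_generic <- tokens[!tokens %in% generic_stop]  # drop tokens found in the generic list

table(keep_custom)   # "we" and "can" survive
table(keep_generic)  # only "yes" and "nation" remain
```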

But let’s see how the word cloud changes if we remove all English stopwords.

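b <- tm_map(b, removeWords, stopwords("english"))  # remove all standard English stopwords
tdm <- TermDocumentMatrix(b, control = list(wordLengths = c(1, Inf)))
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)
wordcloud(d1$word, d1$freq)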

And you can draw your own conclusions from the content of this famous speech, based on your political preferences.

Politicians can give interesting speeches, but they may be full of simple-sounding words...


1. Ingo Feinerer (2012). tm: Text Mining Package. R package version 0.5-7.1.

Ingo Feinerer, Kurt Hornik, and David Meyer (2008). Text Mining Infrastructure in R. Journal of Statistical Software 25(5). URL:

2. Ian Fellows (2012). wordcloud: Word Clouds. R package version 2.0.


3. You can see more than 100 of Obama’s speeches at http://obamaspeeches.com/

Quote: numbers don't lie, people do.


Multi-State Models


A special issue of the Journal of Statistical Software devoted to Multi-State Models and Competing Risks has come out. It is a must-read for anyone with an interest in pharma analytics or survival analysis, even if you don't know much R.

Here is an extract from “mstate: An R Package for the Analysis of Competing Risks and Multi-State Models”:

Multi-state models are a very useful tool to answer a wide range of questions in survival analysis that cannot, or only in a more complicated way, be answered by classical models. They are suitable for both biomedical and other applications in which time-to-event variables are analyzed. However, they are still not frequently applied. So far, an important reason for this has been the lack of available software. To overcome this problem, we have developed the mstate package in R for the analysis of multi-state models. The package covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results. It can be applied to non- and semi-parametric (Cox) models. The package is also suitable for competing risks models, as they are a special category of multi-state models.
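Competing risks are the simplest multi-state model, and the quantity usually estimated is the cumulative incidence of each cause. As a rough illustration (not the mstate API, which has its own estimation functions), here is a minimal base-R sketch of the nonparametric Aalen-Johansen estimator for competing risks: at each event time, the cumulative incidence of cause k grows by the all-cause Kaplan-Meier survival just before that time multiplied by the fraction of subjects at risk who fail from cause k. The function name cuminc_sketch() and the toy data are hypothetical.

```r
# Minimal sketch of the Aalen-Johansen cumulative incidence for competing risks.
# cuminc_sketch() and the data below are hypothetical, for illustration only.
# status: 0 = censored, 1 = cause 1, 2 = cause 2
cuminc_sketch <- function(time, status) {
  causes <- sort(unique(status[status != 0]))
  event_times <- sort(unique(time[status != 0]))   # distinct times with an event
  surv <- 1                                        # all-cause KM survival just before tj
  cum <- numeric(length(causes))                   # running cumulative incidence per cause
  out <- matrix(NA_real_, length(event_times), length(causes),
                dimnames = list(NULL, paste0("cause", causes)))
  for (j in seq_along(event_times)) {
    tj <- event_times[j]
    n_risk <- sum(time >= tj)                      # subjects still at risk at tj
    d <- sapply(causes, function(k) sum(time == tj & status == k))  # events per cause
    cum <- cum + surv * d / n_risk                 # increment each cause's incidence
    surv <- surv * (1 - sum(d) / n_risk)           # update all-cause survival
    out[j, ] <- cum
  }
  data.frame(time = event_times, out)
}

res <- cuminc_sketch(time = c(1, 2, 3, 4, 5), status = c(1, 2, 1, 0, 1))
print(res)
```

With this toy data the two cumulative incidences end at 0.8 and 0.2 and sum to 1, because the all-cause survival drops to 0 at the last event time.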




Issues for JSS Special Volume 38: Competing Risks and Multi-State Models

Special Issue about Competing Risks and Multi-State Models

Hein Putter
Vol. 38, Issue 1, Jan 2011
Submitted 2011-01-03, Accepted 2011-01-03

Analyzing Competing Risk Data Using the R timereg Package

Thomas H. Scheike, Mei-Jie Zhang
Vol. 38, Issue 2, Jan 2011
Submitted 2009-05-25, Accepted 2010-06-22

p3state.msm: Analyzing Survival Data from an Illness-Death Model

Luís Filipe Meira Machado, Javier Roca-Pardiñas
Vol. 38, Issue 3, Jan 2011
Submitted 2009-06-30, Accepted 2010-03-02

Empirical Transition Matrix of Multi-State Models: The etm Package

Arthur Allignol, Martin Schumacher, Jan Beyersmann
Vol. 38, Issue 4, Jan 2011
Submitted 2009-01-08, Accepted 2010-03-11

Lexis: An R Class for Epidemiological Studies with Long-Term Follow-Up

Martyn Plummer, Bendix Carstensen
Vol. 38, Issue 5, Jan 2011
Submitted 2010-02-09, Accepted 2010-09-16

Using Lexis Objects for Multi-State Models in R

Bendix Carstensen, Martyn Plummer
Vol. 38, Issue 6, Jan 2011
Submitted 2010-02-09, Accepted 2010-09-16

mstate: An R Package for the Analysis of Competing Risks and Multi-State Models

Liesbeth C. de Wreede, Marta Fiocco, Hein Putter
Vol. 38, Issue 7, Jan 2011
Submitted 2010-01-17, Accepted 2010-08-20

Multi-State Models for Panel Data: The msm Package for R

Christopher Jackson
Vol. 38, Issue 8, Jan 2011
Submitted 2009-07-21, Accepted 2010-08-18

JSS-Announce mailing list