Using Twitter Data with R #rstats updated for API changes

Step 1

Install Package twitteR

install.packages("twitteR")
> install.packages("twitteR")
Installing package(s) into ‘/home/R/library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘ROAuth’, ‘rjson’

trying URL 'http://cran.rstudio.com/src/contrib/ROAuth_0.9.3.tar.gz'
Content type 'application/x-gzip' length 6202 bytes
opened URL
==================================================
downloaded 6202 bytes

trying URL 'http://cran.rstudio.com/src/contrib/rjson_0.2.13.tar.gz'
Content type 'application/x-gzip' length 98132 bytes (95 Kb)
opened URL
==================================================
downloaded 95 Kb

trying URL 'http://cran.rstudio.com/src/contrib/twitteR_1.1.7.tar.gz'
Content type 'application/x-gzip' length 121696 bytes (118 Kb)
opened URL
==================================================
downloaded 118 Kb

* installing *source* package ‘ROAuth’ ...
** package ‘ROAuth’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (ROAuth)
* installing *source* package ‘rjson’ ...
** package ‘rjson’ successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c dump.cpp -o dump.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c parser.c -o parser.o
g++ -shared -o rjson.so dump.o parser.o -L/usr/lib/R/lib -lR
installing to /home/R/library/rjson/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
   ‘json_rpc_server.Rnw’ 
** testing if installed package can be loaded

* DONE (rjson)
* installing *source* package ‘twitteR’ ...
** package ‘twitteR’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
Creating a generic function for ‘as.data.frame’ from package ‘base’ in package ‘twitteR’
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (twitteR)

The downloaded source packages are in
	‘/tmp/RtmpvY7yMN/downloaded_packages’

Step 2

Load Package twitteR

library(twitteR)
> library(twitteR)
Loading required package: ROAuth
Loading required package: RCurl
Loading required package: bitops
Loading required package: digest
Loading required package: rjson

Step 3

Using your Twitter Login ID and Password Login to https://dev.twitter.com/apps/  

In case you forget the twitter.com username and password, click on Forgot Password to reset Password

Screenshot from 2013-09-11 13:40:51

Step 4

Create a new app for yourself by navigating to My Applications

Screenshot from 2013-09-11 13:41:56

Step 5

Your Apps are here

https://dev.twitter.com/apps

Click on New Application (button on top right)

Screenshot from 2013-09-11 13:44:16

Step 6

Fill the options here- leave the callback  url blank

Name should be Unique

Description should be atleast 10 Charachters

Website can be a placeholder as of now (or your blog address) Screenshot from 2013-09-11 13:46:03

Agree to Terms and Conditions

Type the Spam Check Number and Letters Screenshot from 2013-09-11 13:46:18

Step 7

Note these  details from your new APP

Consumer Key

Consumer Secret

On the Bottom –

Click on Create your OAuth Token

Finally your APP page should look like this (dont worry i will be deleting this app so you cant hack my twitter yet)

Screenshot from 2013-09-11 13:52:10

Step 8

Go to R

Type the following code  after you changed the two consumer keys (IMPORTANT- You will need to change your Consumer Key and Consumer Secret to the one specific to YOUR app)

NOTE- WordPress makes some changes when you copy and paste code to your blog

(like  adding &8221 to lines 2-4 below- Ignore this please)

THE final formatted code is at very end of the post

library(twitteR)
reqURL <- “https://api.twitter.com/oauth/request_token&#8221;
accessURL <- “https://api.twitter.com/oauth/access_token&#8221;
authURL <- “https://api.twitter.com/oauth/authorize&#8221;
consumerKey <- “2uQlGBBMMXdDffcK2IkAsg
consumerSecret <- “xrGr71kTfdT3ypWFURGxyJOC4Oqf46Rwu4qxyxoEfM”
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
consumerSecret=consumerSecret,
requestURL=reqURL,
accessURL=accessURL,
authURL=authURL)

Screenshot from 2013-09-11 13:41:56Step 9

Do the Twitter Handshake by pasting this

command in R Console

twitCred$handshake()

You will see a message like this from R

> twitCred$handshake()
To enable the connection, please direct your web browser to: http://api.twitter.com/oauth/authorize?oauth_token=pJqojAg2gxmqip3SprJAyOckdcD1nB3MvlbP2dWUDGQ When complete, record the PIN given to you and provide it here:

Step 10

Go to the link above given by R

You will get this message

Screenshot from 2013-09-11 14:01:42Click on blue button

Authorize app

Step 11 Entering the Pin

Now you see a pin here

like this

Screenshot from 2013-09-11 14:02:48-You cant copy and paste it. Write it down and then type in your R console

Step 12

Now register the credentials using

registerTwitterOAuth(twitCred)

you will see this

if done correctly

> registerTwitterOAuth(twitCred) [1] TRUE

123Step 13

Search Twitter using commands like here. Note it returns only 499 tweets

> a=searchTwitter(“#rstats”, n=2000)

Warning message: In doRppAPICall(“search/tweets”, n, params = params, retryOnRateLimit = retryOnRateLimit, : 2000 tweets were requested but the API can only return 499

Step 14 Now you can start analyzing the data

Convert the data into a data frame tweets_df = twListToDF(a)

Install Packages tm (for textmining and wordcloud)

> install.packages(c(“tm”, “wordcloud”))

Load the Packages

library(tm)

library(wordcloud)

Basic Word Cloud can be created using code below

b=Corpus(VectorSource(tweets_df$text), readerControl = list(language = “eng”))

b<- tm_map(b, tolower) #Changes case to lower case

b<- tm_map(b, stripWhitespace) #Strips White Space

b <- tm_map(b, removePunctuation) #Removes Punctuation

inspect(b)

tdm <- TermDocumentMatrix(b)

m1 <- as.matrix(tdm)

v1<- sort(rowSums(m1),decreasing=TRUE)

d1<- data.frame(word = names(v1),freq=v1)

wordcloud(d1$word,d1$freq)

For more detailed analysis on what you can do with Twitter and R, read this http://cran.r-project.org/web/packages/twitteR/twitteR.pdf or this https://sites.google.com/site/miningtwitter/

Step 15 Keep your OAuth keys safely, and do your homework with out bothering your instructor too much.

If you try and copy the paste the code from a website, be sure to change the quotation marks “” manually in your R console

Also see on text mining https://decisionstats.com/2012/03/19/text-mining-barack-obama/

FINAL CODE

install.packages("twitteR")
library(twitteR)
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "rR16FxDLkTYmuVhqH4s4EQ"
consumerSecret <- "xrGr71kTfdT3ypWFURGxyJOC4Oqf46Rwu4qxyxoEfM"
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                             consumerSecret=consumerSecret,
                             requestURL=reqURL,
                             accessURL=accessURL,
                             authURL=authURL)
twitCred$handshake() #Pause here for the Handshake Pin Code
registerTwitterOAuth(twitCred) #Wait till you see True


a=searchTwitter("#rstats", n=2000) #Get the Tweets
 tweets_df = twListToDF(a) #Convert to Data Frame
 install.packages(c("tm", "wordcloud"))
 library(tm) 
library(wordcloud)   
  b=Corpus(VectorSource(tweets_df$text), readerControl = list(language = "eng"))
 b<- tm_map(b, tolower) #Changes case to lower case 
b<- tm_map(b, stripWhitespace) #Strips White Space 
b <- tm_map(b, removePunctuation) #Removes Punctuation
 inspect(b) 
tdm <- TermDocumentMatrix(b) 
m1 <- as.matrix(tdm) 
v1<- sort(rowSums(m1),decreasing=TRUE) 
d1<- data.frame(word = names(v1),freq=v1) 
wordcloud(d1$word,d1$freq)

 

Author: Ajay Ohri

http://about.me/ajayohri

6 thoughts on “Using Twitter Data with R #rstats updated for API changes”

  1. Nice tutorial, just keep in mind that if running code on windows OS you need the code below.

    options(RCurlOptions = list(cainfo = system.file(“CurlSSL”, “cacert.pem”, package = “RCurl”)))
    download.file(url=”http://curl.haxx.se/ca/cacert.pem”, destfile=”cacert.pem”)

    twitCred$handshake(cainfo=”cacert.pem”)

Leave a comment