Part 1 How do people at Facebook use R?
Itamar Rosenn, Facebook
Itamar conveyed how Facebook’s Data Team used R in 2007 to answer two questions about new users: (i) which data points predict whether a user will stay? and (ii) if they stay, which data points predict how active they’ll be after three months?
For the first question, Itamar’s team used recursive partitioning (via the rpart package) to infer that just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile information.
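As a rough illustration of what recursive partitioning looks like in R, here is a minimal sketch on simulated data. The data frame, its column names (sessions, has_profile, n_photos, stayed) and the simulated effect sizes are all invented for this example; they are not Facebook’s data or the Data Team’s actual model.

```r
library(rpart)  # recommended package, ships with standard R installations

set.seed(42)
n <- 2000

# Hypothetical new-user features (invented for illustration only)
users <- data.frame(
  sessions    = rpois(n, lambda = 1.5),  # sessions as a new user
  has_profile = rbinom(n, 1, 0.5),       # 1 if basic profile info was entered
  n_photos    = rpois(n, lambda = 2)     # noise variable with no built-in effect
)

# Simulated outcome: retention depends only on the first two features
p_stay <- plogis(-2 + 2.5 * (users$sessions > 1) + 2 * users$has_profile)
users$stayed <- factor(ifelse(runif(n) < p_stay, "yes", "no"))

# Classification tree: each split is chosen to best separate stayers from leavers
fit <- rpart(stayed ~ sessions + has_profile + n_photos,
             data = users, method = "class")

print(fit)  # text view of the splits

# Variables the tree actually split on
used_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
used_vars
```

Printing the fitted tree shows which variables were chosen for splits, which is the sense in which a small number of data points can turn out to be "predictive".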
For the second question, they fit the data to a logistic model using a least angle regression approach (via the lars package), and found that activity at three months was predicted by variables related to three classes of behavior: (i) how often a user was reached out to by others, (ii) frequency of third party application use, and (iii) what Itamar termed “receptiveness” — related to how forthcoming a user was on the site.
Source: http://www.dataspora.com/2009/02/predictive-analytics-using-r/
There are also cute graphs, like the famous friendship visualization
https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
and studies such as baseball on Facebook
https://www.facebook.com/notes/facebook-data-team/baseball-on-facebook/10150142265858859
which counted the number of posts made the day after a team lost, divided by the team’s total number of wins, since losses for great teams are remarkable and since winning teams’ fans just post more.
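The ratio described above is simple enough to sketch in a few lines of R. The team names and all the numbers below are made up purely to show the arithmetic; they are not taken from the Facebook note.

```r
# Hypothetical season totals; every number here is invented for illustration
teams <- data.frame(
  team             = c("Team A", "Team B", "Team C"),
  posts_after_loss = c(1200, 900, 300),  # posts made the day after a loss
  wins             = c(100, 60, 75)      # total wins in the season
)

# Posts after losses, normalised by wins
teams$loss_posts_per_win <- teams$posts_after_loss / teams$wins

# Teams whose losses draw the most reaction relative to their success come first
teams[order(-teams$loss_posts_per_win), c("team", "loss_posts_per_win")]
```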

But mostly, the Data Team’s work shows up at
https://www.facebook.com/data?sk=notes and https://www.facebook.com/data?v=app_4949752878
and in new packages:
1. jjplot (not much action here!)
https://r-forge.r-project.org/scm/viewvc.php/?root=jjplot
though I liked the promise of jjplot at
http://pleasescoopme.com/2010/03/31/using-jjplot-to-explore-tipping-behavior/
2. Ising models
https://github.com/slycoder/Rflim
https://www.facebook.com/note.php?note_id=10150359708746212
3. R pipe
https://github.com/slycoder/Rpipe
Even the FB interns are cool:
http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
Part 2 How do people with R use Facebook?
Using the API at https://developers.facebook.com/tools/explorer
and code mashes from
http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R
http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html
plus the wonderful troubleshooting code from http://www.brocktibert.com/blog/2012/01/19/358/
which needs to be run first (it downloads the certificate file used for authentication),
and the network package for plotting.
> access_token = "XXXXXXXXXXXX"
Annoyingly, the Facebook token can expire after some time. This can lead to a huge wait and NULL results with OAuth errors.
If that happens, you need to regenerate the token at https://developers.facebook.com/tools/explorer
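One way to fail fast instead of waiting on NULL results is to check each parsed response for an error object before using it. The sketch below is an assumption-laden illustration: the helper names (has_fb_error, check_fb_response) are hypothetical, and the error shape list(error = list(message, type, code)) is my assumption about what the parsed JSON looks like, tested here only against mock lists rather than live API output.

```r
# TRUE when a parsed response carries an error object (assumed payload shape)
has_fb_error <- function(res) {
  is.list(res) && !is.null(res[["error"]])
}

# Stop with a readable message on an empty or failed response, else return it
check_fb_response <- function(res) {
  if (is.null(res)) {
    stop("Empty response: the request may have failed or the token may have expired.")
  }
  if (has_fb_error(res)) {
    err  <- res[["error"]]
    type <- if (is.null(err$type)) "unknown" else err$type
    msg  <- if (is.null(err$message)) "no message" else err$message
    stop(sprintf("Facebook API error (%s): %s", type, msg))
  }
  res
}

# Mock responses standing in for fromJSON() output
ok  <- list(data = list(list(name = "A Friend", id = "1")))
bad <- list(error = list(message = "Error validating access token.",
                         type = "OAuthException", code = 190))

check_fb_response(ok)$data[[1]]$id
msg <- tryCatch(check_fb_response(bad), error = function(e) conditionMessage(e))
msg
```

Wrapping each call, for example check_fb_response(facebook(path = "me/friends", access_token = access_token)), would surface an expired token on the first request rather than after the whole loop has run.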
What we need
> require(RCurl)
> require(rjson)
> download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
Romain’s Famous Facebook Function (altered)
> facebook <- function( path = "me", access_token , options){
+ # the query string must start with "?", whether or not extra options are given
+ if( !missing(options) ){
+ options <- sprintf( "?%s&", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
+ } else {
+ options <- "?"
+ }
+ data <- getURL( sprintf( "https://graph.facebook.com/%s%saccess_token=%s", path, options, access_token ), cainfo="cacert.pem" )
+ fromJSON( data )
+ }
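To see what URL such a function ends up requesting, it can help to separate the URL construction from the network call. The following is a minimal sketch under my own assumptions: build_graph_url is a hypothetical helper (not part of Romain’s code), and percent-encoding the values with utils::URLencode is an addition of mine.

```r
# Hypothetical helper: assemble a Graph API URL from a path, a token and
# optional named query parameters. The token is always appended last.
build_graph_url <- function(path = "me", access_token, options = list()) {
  params <- c(options, list(access_token = access_token))
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  pairs <- paste(names(params), values, sep = "=")
  sprintf("https://graph.facebook.com/%s?%s", path, paste(pairs, collapse = "&"))
}

url1 <- build_graph_url("me/friends", "TOKEN")
url1
# "https://graph.facebook.com/me/friends?access_token=TOKEN"

url2 <- build_graph_url("me", "TOKEN", options = list(fields = "id,name"))
url2
```

Because the function only returns a string, it can be inspected or unit-tested without a valid token, and the result can then be handed to getURL( url, cainfo="cacert.pem" ).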
Now getting the friends list
> friends <- facebook( path="me/friends" , access_token=access_token)
> # extract Facebook IDs
> friends.id <- sapply(friends$data, function(x) x$id)
> # extract names
> friends.name <- sapply(friends$data, function(x) iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
> # short names to initials
> initials <- function(x) paste(substr(x,1,1), collapse="")
> friends.initial <- sapply(strsplit(friends.name," "), initials)
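The initials step can be tried without any Facebook data. Here is a small sketch using made-up names in place of friends.name (sample.names and sample.initials are names I introduced for the example):

```r
# Same helper as above: first letter of each word, pasted together
initials <- function(x) paste(substr(x, 1, 1), collapse = "")

# Made-up names standing in for friends.name
sample.names <- c("Ada Lovelace", "Alan Mathison Turing", "Grace Hopper")

# strsplit() gives one character vector of words per name
sample.initials <- sapply(strsplit(sample.names, " "), initials)
sample.initials
# "AL" "AMT" "GH"
```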
This matrix can take a long time to build, since the loop makes one API call per friend, so you can set N to, say, 40 to test your network first. I needed to press the Escape key to cut short the plotting of all 400 of my friends.
> # friendship relation matrix
> N <- length(friends.id)
> friendship.matrix <- matrix(0,N,N)
> for (i in 1:N) {
+ tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
+ mutualfriends <- sapply(tmp$data, function(x) x$id)
+ # compare against the first N ids only, so the index still fits the N columns when N is reduced
+ friendship.matrix[i,friends.id[1:N] %in% mutualfriends] <- 1
+ }
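The matrix-building logic can be checked offline before spending API calls on it. In this sketch the friend ids and the mutual-friend lists are made up and stand in for what the API would return; the variable names (ids, mutual, keep, m) are mine.

```r
# Made-up friend ids and mutual-friend lists (stand-ins for the API results)
ids    <- c("a", "b", "c", "d")
mutual <- list(a = c("b", "c"), b = c("a"), c = c("a", "d"), d = c("c"))

N    <- 3                    # test on the first 3 friends only
keep <- ids[seq_len(N)]      # the ids that get a row and a column

m <- matrix(0, N, N, dimnames = list(keep, keep))
for (i in seq_len(N)) {
  # mark the columns of friends who are mutual with friend i;
  # ids outside the subset (here "d") are simply ignored
  m[i, keep %in% mutual[[keep[i]]]] <- 1
}

m
rowSums(m)  # mutual friends each person has within the subset
```

Restricting both the rows and the column test to the same subset keeps the matrix square, which is what as.network() expects for an adjacency matrix.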
Plotting using the network package in R (with help from the comments at http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html)
> require(network)
> net1 <- as.network(friendship.matrix)
> plot(net1, label=friends.initial[1:N], arrowhead.cex=0)
(Rgraphviz is tough if you are on Windows 7 like me)
but there is an alternative igraph solution at https://github.com/sciruela/facebookFriends/blob/master/facebook.r
After all that talk: a graph of my Facebook network, with friends’ initials as labels.

Opinion piece-
I hope the plans to make a Facebook R package get fulfilled (just as the twitteR package led to many interesting analyses).
LinkedIn also has an API at http://developer.linkedin.com/apis
I think it would be interesting to plot professional relationships across social networks as well, and I hope to see a LinkedIn package (or blog code) soon.
As for jjplot, I had hoped ggplot and jjplot would merge, or at least get some kind of inclusion in the Deducer GUI. Maybe a Google Summer of Code project, if people are busy!
Also, the geeks at Facebook.com could think of giving something back to the R community, as Google generously does by funding packages like RUnit and Deducer and the Summer of Code, besides sponsoring meetups etc.
(Note: this is part of the research for the upcoming book "R for Business Analytics".)
P.S. I didn’t get time to download all my posts using the R code at
https://gist.github.com/1634662#
or to do specific Facebook Page analysis using R at
http://tonybreyal.wordpress.com/2012/01/06/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/
Updated-
#access token from https://developers.facebook.com/tools/explorer
access_token="AAuFgaOcVaUZAssCvL9dPbZCjghTEwwhNxZAwpLdZCbw6xw7gARYoWnPHxihO1DcJgSSahd67LgZDZD"
require(RCurl)
require(rjson)
# download the file needed for authentication http://www.brocktibert.com/blog/2012/01/19/358/
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
# http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R
facebook <- function( path = "me", access_token, options){
  # the query string must start with "?", whether or not extra options are given
  if( !missing(options) ){
    options <- sprintf( "?%s&", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
  } else {
    options <- "?"
  }
  data <- getURL( sprintf( "https://graph.facebook.com/%s%saccess_token=%s", path, options, access_token ), cainfo="cacert.pem" )
  fromJSON( data )
}
# see http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html
# scrape the list of friends
friends <- facebook( path="me/friends" , access_token=access_token)
# extract Facebook IDs
friends.id <- sapply(friends$data, function(x) x$id)
# extract names
friends.name <- sapply(friends$data, function(x) iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
# short names to initials
initials <- function(x) paste(substr(x,1,1), collapse="")
friends.initial <- sapply(strsplit(friends.name," "), initials)
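# friendship relation matrix
# N <- length(friends.id)          # use this to include every friend
N <- min(200, length(friends.id))  # cap at 200 friends to limit the number of API calls
friendship.matrix <- matrix(0,N,N)
for (i in 1:N) {
  tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
  mutualfriends <- sapply(tmp$data, function(x) x$id)
  # compare against the first N ids only, so the index fits the N columns
  friendship.matrix[i,friends.id[1:N] %in% mutualfriends] <- 1
}
require(network)
net1 <- as.network(friendship.matrix)
# labels must match the N vertices in the network
plot(net1, label=friends.initial[1:N], arrowhead.cex=0)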