Facebook and R

Part 1 How do people at Facebook use R?

Itamar Rosenn, Facebook

Itamar conveyed how Facebook’s Data Team used R in 2007 to answer two questions about new users: (i) which data points predict whether a user will stay? and (ii) if they stay, which data points predict how active they’ll be after three months?

For the first question, Itamar’s team used recursive partitioning (via the rpart package) to infer that just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile information.
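As a rough illustration of that recursive-partitioning step, here is a minimal sketch on simulated data. The column names (sessions, has_profile, noise, stayed) and the retention rule used to simulate the outcome are assumptions invented for this example; they are not Facebook's data or the team's actual model.

```r
library(rpart)

set.seed(42)
n <- 1000

# Hypothetical new-user features (invented for illustration)
users <- data.frame(
  sessions    = rpois(n, lambda = 2),   # sessions as a new user
  has_profile = rbinom(n, 1, 0.5),      # entered basic profile info? (0/1)
  noise       = rnorm(n)                # an irrelevant variable
)

# Simulated outcome: staying is more likely with >1 session and a profile
p_stay <- 0.1 + 0.5 * (users$sessions > 1) + 0.3 * users$has_profile
users$stayed <- factor(ifelse(runif(n) < p_stay, "yes", "no"))

# Classification tree: which variables split stayers from leavers?
fit <- rpart(stayed ~ sessions + has_profile + noise,
             data = users, method = "class")
print(fit)                      # the splits the tree chose
print(fit$variable.importance)  # rough ranking of the predictors
```

On data simulated this way the tree should split on sessions and has_profile and mostly ignore noise, which is the kind of "only a few variables matter" result described above.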

For the second question, they fit the data to a logistic model using a least angle regression approach (via the lars package), and found that activity at three months was predicted by variables related to three classes of behavior: (i) how often a user was reached out to by others, (ii) frequency of third party application use, and (iii) what Itamar termed “receptiveness” — related to how forthcoming a user was on the site.

Source: http://www.dataspora.com/2009/02/predictive-analytics-using-r/

and cute graphs like the famous

https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919

 

and

studying baseball on facebook

https://www.facebook.com/notes/facebook-data-team/baseball-on-facebook/10150142265858859

The analysis counts the number of posts made the day after a team lost and divides it by the team’s total number of wins, since losses for great teams are remarkable and since winning teams’ fans just post more.
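The arithmetic behind that metric is simple enough to sketch in a few lines of R. The team names and counts below are made up for illustration; only the ratio itself (posts the day after a loss, divided by total wins) comes from the description above.

```r
# Hypothetical counts, invented for illustration
teams <- data.frame(
  team             = c("A", "B", "C"),
  posts_after_loss = c(1200, 900, 300),  # posts made the day after a loss
  wins             = c(95, 60, 75)       # total wins
)

# Posts after a loss, normalised by how often the team wins
teams$posts_per_win <- teams$posts_after_loss / teams$wins

# Teams ranked by the metric, highest first
teams[order(-teams$posts_per_win), ]
```

Dividing by wins keeps a team with many fans (who post more anyway) from dominating the ranking just because of its size.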

 

But mostly at

https://www.facebook.com/data?sk=notes and https://www.facebook.com/data?v=app_4949752878

 

and creating new packages

1. jjplot (not much action here!)

https://r-forge.r-project.org/scm/viewvc.php/?root=jjplot

though I liked the promise of jjplot at

http://pleasescoopme.com/2010/03/31/using-jjplot-to-explore-tipping-behavior/

2. Ising models

https://github.com/slycoder/Rflim

https://www.facebook.com/note.php?note_id=10150359708746212

3. R pipe

https://github.com/slycoder/Rpipe

 

even the FB interns are cool

http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

 

Part 2 How do people with R use Facebook?

Using the API at https://developers.facebook.com/tools/explorer

and code mashes from

 

http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R

http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html

along with the wonderful troubleshooting code from http://www.brocktibert.com/blog/2012/01/19/358/

which needs to be run first (it downloads the certificate file needed for authentication)

 

and using the network package

> access_token="XXXXXXXXXXXX"

Annoyingly, the Facebook token can expire after some time. This can lead to a long wait and NULL results with OAuth errors.

If that happens, you need to regenerate the token (at https://developers.facebook.com/tools/explorer).
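One way to notice an expired token early, instead of staring at NULL results, is to check the parsed response before using it. The sketch below is an assumption-laden helper: check_graph_response is a name I made up, and it assumes the parsed JSON carries an error element with type and message fields when the token is rejected. The two responses are simulated lists, not real API output.

```r
# parsed: the list you get from fromJSON() on a Graph API response
# (check_graph_response is a hypothetical helper, not part of any package)
check_graph_response <- function(parsed) {
  if (!is.null(parsed$error)) {
    type <- if (is.null(parsed$error$type)) "unknown" else parsed$error$type
    msg  <- if (is.null(parsed$error$message)) "no message" else parsed$error$message
    stop(sprintf("Graph API error (%s): %s", type, msg), call. = FALSE)
  }
  parsed
}

# Simulated responses (invented for illustration)
ok  <- list(data = list(list(name = "A B", id = "1")))
bad <- list(error = list(message = "Error validating access token.",
                         type = "OAuthException"))

check_graph_response(ok)        # returned unchanged
try(check_graph_response(bad))  # stops with the OAuth message
```

Wrapping each call this way means a dead token stops the loop at the first request rather than after a few hundred slow ones.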

What we need
> require(RCurl)
> require(rjson)
> download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Romain’s Famous Facebook Function (altered)

> facebook <- function( path = "me", access_token , options){
+ if( !missing(options) ){
+ options <- sprintf( "?%s&", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
+ } else {
+ options <- "?"
+ }
+ # start the query string with "?" so access_token is parsed as a parameter
+ data <- getURL( sprintf( "https://graph.facebook.com/%s%saccess_token=%s", path, options, access_token ), cainfo="cacert.pem" )
+ fromJSON( data )
+ }
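To see which URL a request like this ends up asking for, the query-string construction can be pulled out into a small helper that needs no network access. This is only a sketch: build_graph_url is a hypothetical name, it does no URL-encoding of the values, and the token is a placeholder.

```r
# Hypothetical helper: builds the request URL without fetching it
build_graph_url <- function(path = "me", access_token, options = list()) {
  # put the optional parameters first, then the token
  params <- c(options, list(access_token = access_token))
  query  <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  sprintf("https://graph.facebook.com/%s?%s", path, query)
}

build_graph_url("me/friends", "XXXX")
# "https://graph.facebook.com/me/friends?access_token=XXXX"

build_graph_url("me/feed", "XXXX", options = list(limit = 5))
# "https://graph.facebook.com/me/feed?limit=5&access_token=XXXX"
```

Printing the URL before calling getURL is a cheap way to debug a request that keeps coming back empty.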

 

Now getting the friends list
> friends <- facebook( path="me/friends" , access_token=access_token)
> # extract Facebook IDs
> friends.id <- sapply(friends$data, function(x) x$id)
> # extract names
> friends.name <- sapply(friends$data, function(x) iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
> # short names to initials
> initials <- function(x) paste(substr(x,1,1), collapse="")
> friends.initial <- sapply(strsplit(friends.name," "), initials)

This matrix can take a long time to build, so you can change the value of N to, say, 40 to test your network. I needed to press the Escape key to cut short the plotting of all 400 of my friends.
> # friendship relation matrix
> N <- length(friends.id)
> friendship.matrix <- matrix(0,N,N)
> for (i in 1:N) {
+ tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
+ mutualfriends <- sapply(tmp$data, function(x) x$id)
+ # index only the first N friends so a smaller N still matches the N columns
+ friendship.matrix[i,friends.id[1:N] %in% mutualfriends] <- 1
+ }
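If you want to check the matrix-building logic without waiting on hundreds of API calls, the same loop can be run on toy data. In this sketch the IDs and the mutual list are invented stand-ins for what the me/mutualfriends requests would return.

```r
# Toy data standing in for the API results (invented for illustration)
friends.id <- c("11", "22", "33", "44")
mutual <- list(      # mutual[[i]]: IDs of the friends shared with friend i
  c("22", "33"),
  c("11"),
  c("11", "44"),
  c("33")
)

N <- length(friends.id)
friendship.matrix <- matrix(0, N, N, dimnames = list(friends.id, friends.id))
for (i in seq_len(N)) {
  # mark every friend who appears in friend i's mutual-friend list
  friendship.matrix[i, friends.id %in% mutual[[i]]] <- 1
}

friendship.matrix
isSymmetric(friendship.matrix)  # mutual friendship should be symmetric
rowSums(friendship.matrix)      # number of connections per friend
```

A matrix that is not symmetric on real data is a hint that some requests failed partway through (for example because the token expired).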

 

Plotting using the network package in R (with help from the comments at http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html)

> require(network)

> net1 <- as.network(friendship.matrix)

> plot(net1, label=friends.initial[1:N], arrowhead.cex=0)

(Rgraphviz is tough if you are on Windows 7 like me)

but there is an alternative igraph solution at https://github.com/sciruela/facebookFriends/blob/master/facebook.r

 

After all that talk, a graph of my Facebook network, with friends’ initials as labels.

 

Opinion piece-

I hope the plans to make a Facebook R package get fulfilled (just as the twitteR package led to many interesting analyses)

and LinkedIn also has an API at http://developer.linkedin.com/apis

I think it would be interesting to plot professional relationships across social networks as well, and I hope to see a LinkedIn package (or blog code) soon.

As for jjplot, I had hoped ggplot and jjplot would merge, or at least get some kind of inclusion in the Deducer GUI. Maybe a Google Summer of Code project if people are busy!!

Also the geeks at Facebook.com can think of giving something back to the R community, as Google generously does with funding packages like RUnit, Deducer and Summer of Code, besides sponsoring meetups etc.

 

(note – this is part of the research for the upcoming book ” R for Business Analytics”)

 

P.S.

I didn’t get time to download all my posts using the R code at

https://gist.github.com/1634662#

or do specific Facebook Page analysis using R at

http://tonybreyal.wordpress.com/2012/01/06/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/

Updated-

 #access token from https://developers.facebook.com/tools/explorer
access_token="AAuFgaOcVaUZAssCvL9dPbZCjghTEwwhNxZAwpLdZCbw6xw7gARYoWnPHxihO1DcJgSSahd67LgZDZD"
require(RCurl)
 require(rjson)
# download the file needed for authentication http://www.brocktibert.com/blog/2012/01/19/358/
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
# http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R
facebook <- function( path = "me", access_token, options){
if( !missing(options) ){
options <- sprintf( "?%s&", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
} else {
options <- "?"
}
# start the query string with "?" so access_token is parsed as a parameter
data <- getURL( sprintf( "https://graph.facebook.com/%s%saccess_token=%s", path, options, access_token ), cainfo="cacert.pem" )
fromJSON( data )
}

 # see http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html

# scrape the list of friends
friends <- facebook( path="me/friends" , access_token=access_token)
# extract Facebook IDs
friends.id <- sapply(friends$data, function(x) x$id)
# extract names 
friends.name <- sapply(friends$data, function(x)  iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
# short names to initials 
initials <- function(x) paste(substr(x,1,1), collapse="")
friends.initial <- sapply(strsplit(friends.name," "), initials)

# friendship relation matrix
# use N <- length(friends.id) for all friends; a smaller N builds faster
N <- min(200, length(friends.id))
friendship.matrix <- matrix(0,N,N)
for (i in 1:N) {
  tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
  mutualfriends <- sapply(tmp$data, function(x) x$id)
  # compare against the first N friends only, so the index matches the N columns
  friendship.matrix[i,friends.id[1:N] %in% mutualfriends] <- 1
}
require(network)
net1 <- as.network(friendship.matrix)
plot(net1, label=friends.initial[1:N], arrowhead.cex=0)


10 Ways We will miss Steve Jobs

I am not an Apple fanboy. In fact I don’t use a Mac (because Linux works well for me at much cheaper rates).

I am going to miss Steve Jobs like I miss …… still.

1) The Original Pirate – I have liked Steve Jobs ever since I saw Pirates of Silicon Valley. I wanted to be like the Jobs who created jobs http://en.wikipedia.org/wiki/Pirates_of_Silicon_Valley

Artists steal. Yeah baby!

2) Music - iTunes. Improbably, the man who came up with the idea of music @ 99 cents helped more artists earn money in the era of Napster. Music piracy is not dead, but at 99 cents you CAN afford the songs.

3) Aesthetics and Design as competitive barriers. It was all about the interface. People care about interfaces. Shoddy software won’t sell.

4) Portable Music- yes, I once wrote a poem on my first iPod. http://www.decisionstats.com/ode-to-an-ipod/ No, it does not rank among the top ten poems on iPod in SERP.

The Walkman’s evolution was the iPod, and it was everywhere.

5) Big Phones can be cool too- I loved my iPhone and so did everyone. But that’s because making cool phones before that was all about making the tiniest, thinnest phone. Using video chat and web surfing on the iPhone was way cooler than anything before or since.

6) Apps for Money for Geeks. Yes, the Apps marketplace was more enriching to the geek universe than all open source put together.

7) Turtleneck Steve- You knew when Steve Jobs was about to make a presentation because one week before and one week later the whole tech media behaved either like a fanboy or like “we are too cool to be an Apple fanboy but we will report it still”. The man who wrote no code sold more technology than everyone else using just a turtleneck and presentations.

8) Pixar toons- Yes, Pixar toons made sure cartoons were pieces of art and not just funny stuff anymore. This one makes me choke up.

9) Kicking Microsoft butt- Who else but Steve could borrow money from MS and then beat it in every product it wanted to?

10) Not being evil. Steve Jobs made more money for more geeks than anyone, and he made it look good! The original DON’T BE EVIL guy who never needed to say it aloud.

Take a bow Steve Jobs (or touch the first Apple product that comes to your hand after reading this!)

The article was first written on Aug 25, 2011 on the news of Steve Jobs’ resignation. It has been updated to note his departing from this planet as of yesterday.

More fun on Google Plus

I have been posting cool stuff from my G+ stream almost since the social network got released, so here is the next in the series of posts on great stuff I get in my Google Plus stream.

1) Photographers are good sharers
Anna Rumiantseva originally shared this post:
Photos from our recent trip to Santa Fe, NM. These are of Loretto Chapel which has the Miraculous Staircase. This staircase has a mystery to it, as it is said to be built without nails by a carpenter who showed up after the sisters of the chapel prayed for 9 days. It took several months to be built by this carpenter who then left without pay and could not be found. The sisters believe it was St. Joseph himself that built the staircase and answered their prayers.
Please share if you like!
2) Cool Designer Retro Stuff
 the water cooler at my workplace.
3) Social Media Experts-
Jay Jaboneta originally shared this post:
GMA Network launched an online campaign to raise awareness about the responsible use of social media, so please think before you click.

4) No, you can’t share gifs on Facebook
5)  Cool Art
Monica Rocha originally shared this post:
6) Toons
Rupesh Nandy originally shared this post:
Birthdays – Then & Now
8) Geeks rock!
David Smith originally shared this post:
Yet another instance of the Golden Ratio in Nature: Irene.
lastly 9) Digital art
Marcelo Almeida originally shared this post:
behind the smile
behind the smile

 

But Willie Nelson rules them all

Willie Nelson covers Coldplay. Sounds pretty good! This reminds me of Johnny Cash’s cover of Hurt. (Yes, this is a Chipotle ad. It’s still pretty cool.)
youtube.com – Coldplay’s haunting classic ‘The Scientist’ is performed by country music legend Willie Nelson

https://www.youtube-nocookie.com/v/aMfSGt6rHos?version=3&hl=en_US&rel=0

– see earlier posts at

  1. https://decisionstats.com/best-of-google-plus-week-1-top10/
  2. http://www.decisionstats.com/best-of-google-plus-week-2-top-10/
  3.  http://www.decisionstats.com/the-best-of-google-plus-week-3-top-10/
  4. http://www.decisionstats.com/funny-stuff-on-google-plus/
  5. http://www.decisionstats.com/fun-with-google-plus/
Warning: this and earlier posts deal with cute memes that can take a lot of time and energy!

Interview Beth Schultz Editor AllAnalytics.com

Here is an interview with Beth Schultz, Editor in Chief, AllAnalytics.com.

AllAnalytics.com (http://www.allanalytics.com/) is the new online community on predictive analytics, and it’s a bit different in emphasizing quality more than just quantity. Beth is a veteran in tech journalism and communities.

Ajay- Describe your journey in technology journalism and communication. What are the other online communities that you have been involved with?

Beth- I’m a longtime IT journalist, having begun my career covering the telecommunications industry at the brink of AT&T’s divestiture — many eons ago. Over the years, I’ve covered the rise of internal corporate networking; the advent of the Internet and creation of the Web for business purposes; the evolution of Web technology for use in building intranets, extranets, and e-commerce sites; the move toward a highly dynamic next-generation IT infrastructure that we now call cloud computing; and development of myriad enterprise applications, including business intelligence and the analytics surrounding them. I have been involved in developing online B2B communities primarily around next-generation enterprise IT infrastructure and applications. In addition, Shawn Hessinger, our community editor, has been involved in myriad Web sites aimed at creating community for small business owners.

 Ajay- Technology geeks get all the money while journalists get a story. Comments please

Beth- Great technology geeks — those being the ones with technology smarts as well as business savvy — do stand to make a lot of money. And some pursue that to all ends (with many entrepreneurs gunning for the acquisition) while others more or less fall into it. Few journalists, at least few tech journalists, have big dollars in mind. The gratification for journalists comes in being able to meet these folks, hear and deliver their stories — as appropriate — and help explain what makes this particular technology geek developing this certain type of product or service worth paying attention to.

Ajay- Describe what you are trying to achieve with the All Analytics community and how it seeks to differentiate itself from other players in this space.

Beth- With AllAnalytics.com, we’re concentrating on creating the go-to site for CXOs, IT professionals, line-of-business managers, and other professionals to share best practices, concrete experiences, and research about data analytics, business intelligence, information optimization, and risk management, among many other topics. We differentiate ourselves by featuring excellent editorial content from a top-notch group of bloggers, access to industry experts through weekly chats, ongoing lively and engaging message board discussions, and biweekly debates.

We’re a new property, and clearly in rapid building mode. However, we’ve already secured some of the industry’s most respected BI/analytics experts to participate as bloggers. For example, a small sampling of our current lineup includes the always-intriguing John Barnes, a science fiction novelist and statistics guru; Sandra Gittlen, a longtime IT journalist with an affinity for BI coverage; Olivia Parr-Rud, an internationally recognized expert in BI and organizational alignment; Tom Redman, a well-known data-quality expert; and Steve Williams, a leading BI strategy consultant. I blog daily as well, and in particular love to share firsthand experiences of how organizations are benefiting from the use of BI, analytics, data warehousing, etc. We’ve featured inside looks at analytics initiatives at companies such as 1-800-Flowers.com, Oberweis Dairy, the Cincinnati Zoo & Botanical Garden, and Thomson Reuters, for example.

In addition, we’ve hosted instant e-chats with Web and social media experts Joe Stanganelli and Pierre DeBois, and this Friday, Aug. 26, at 3 p.m. ET we’ll be hosting an e-chat with Marshall Sponder, Web metrics guru and author of the newly published book, Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics. (Readers interested in participating in the chat do need to fill out a quick registration form, available here http://www.allanalytics.com/register.asp . The chat is available here http://www.allanalytics.com/messages.asp?piddl_msgthreadid=241039&piddl_msgid=439898#msg_439898 .)

Experts participating in our biweekly debate series, called Point/Counterpoint, have broached topics such as BI in the cloud, mobile BI and whether an analytics culture is truly possible to build.

Ajay- What are some tips you would like to share with aspiring bloggers about writing tech stories?

Beth- I suppose my best advice is this: Don’t write about technology for technology’s sake. Always strive to tell the audience why they should care about a particular technology, product, or service. How might a reader use it to his or her company’s advantage, and what are the potential benefits? Improved productivity, increased revenue, better customer service? Providing anecdotal evidence goes a long way toward delivering that message, as well.

Ajay- What are the other IT world websites that have made a mark on the internet?

Beth- I’d be remiss if I didn’t give a shout out to UBM TechWeb sites, including InformationWeek, which has long charted the use of IT within the enterprise; Dark Reading, a great source for folks interested in securing an enterprise’s information assets; and Light Reading, which takes the pulse of the telecom industry.

 Biography- 

Beth Schultz has more than two decades of experience as an IT writer and editor. Most recently, she brought her expertise to bear writing thought-provoking editorial and marketing materials on a variety of technology topics for leading IT publications and industry players. Previously, she oversaw multimedia content development, writing and editing for special feature packages at Network World. Beth has a keen ability to identify business and technology trends, developing expertise through in-depth analysis and early-adopter case studies. Over the years, she has earned more than a dozen national and regional editorial excellence awards for special issues from American Business Media, American Society of Business Press Editors, Folio.net, and others.

 

DirkE and JD swoon about Shane's MOM in Room 106 while writing R code

In a shadowy room in cyberworld, two geeks plot revenge on a common blogger and upvote each other on Stack Overflow while discussing Shane’s MOM

http://chat.stackoverflow.com/transcript/106/2010/11/15

 

How can you announce this on SO?

Oh…

Sure…go for it.

I’ll downvote it.

🙂

We should also add it into the [r] wiki.

I added it to the wiki.

We should probably try to clean that up a little; some of the other tags have put a lot of effort into it. (e.g. stackoverflow.com/tags/java/info)

Whoa! I didn’t downvote your post.

I was wondering…
Feel free to upvote it to set it back to even.

3:18 PM

I did.

Someone voted to close too.

Some people take themselves way too seriously…

Yup. And not unlike the people constantly call for community-wiki.

BTW I didn’t see the button for CW anymore once it was posted. What am I missing?

I think that I may have seen something about a bug related to that…

Four close votes, and -2 score. Whoa Nelly.

Ha! I’m not overly surprised. Meant to suggest that you use CW…

Ironically, you’re still ahead in the rep. on this question, right? Although I think that it might get downvoted into oblivion before we’re done…

I up-voted. Dirk, I’ve got your back. 😉

 

 

Begin……

3 hours later…

8:09 PM

@DirkEddelbuettel you catch Ajay’s latest? ow.ly/3a8gK

Jeebus

I had actually unsub’ed from his feed. Now I know why. How you’re doing with the Yahoo Pipes app?

Methinks he has some sort of clinical compulsive condition given how every single post has to include a reference that his facvourite software company from NC, and/or members of their management team.

@DirkEddelbuettel I stumbled on that one in Twitter. pipes project has been tabled while I fight some other battles.

I think he’s fishing for SEO sugar with his posts. His use of words seems contrived to include key words over and over

Twitter is so useless, between him and Ed Borasky’s (znmeb) spambots nothing else of value appers.

I guess like so many streams it requires filtering. The basic twitter blocking takes them out pretty quickly

So blocking is common? They ought to show that: “subscribed to N, listened to by M and showing good taste by blocking O asshats”
8:19 PM

ha! yeah that would be good signaling. Not sure how common it is, but I use it mostly for spam bots. I actually have only blocked 2 warm blooded humans (counting Ajay’s multiple accts as one person)

Dirk Eddelbuettel

Oh boy 🙂 Romain has fired a salvo on r-devel: “Depends on what your goal is: getting the job done, or learning about the R/C API”. Hehe.

@JDLong Tell who: One is Shane’s mother, and the other is … ?

JD Long

speaking of shane’s mom, he and Josh deciding to be productive members of real society today?

Blog Update

Some changes at Decisionstats-

1) We are back at Decisionstats.com and Decisionstats.wordpress.com will point to that as well. The SEO effects would be interesting and so would be the Instant Pagerank or LinkRank or whatever Coffee/Percolator they use in Cali to index the site.

2) AsterData is no longer a sponsor- but Predictive Analytics Conference is. Welcome PAWS! I have been a blog partner to PAWS ever since it began- and it’s a great marketing fit. Expect to see a lot of exclusive content and interviews from great speakers at PAWS.

3) The Feedblitz newsletter (now at 404 subscribers) is now a weekly subscription that sends one big email rather than lots of emails through the week. This is because my blogging frequency is moving up as I collect material for a new book on business analytics that I would probably release in 2011 (if all goes well, touchwood). The LinkedIn group would be getting a weekly update announcement. If you are connected to Decisionstats on Analyticbridge, I would soon try to find a way to update the whole post automatically using RSS and Ning.com. Or not. Depends.

4) R continues to be a bigger focus. So will SPSS and maybe JMP. Newer software, or older software that changes more rapidly, would get more coverage. Generally a particular software is covered if it has newer features, or an interesting techie conference, or it gets sued.

5) I will occasionally write a poem or post a video once a week randomly to prove geeks and nerds and analysts can have fun (much more fun actually, don’t we?)

Thanks for reading this. Sept 2010 was the best ever for Decisionstats.com – we crossed 15,000 + visitors and thanks for that again! I promise to bore you less and less as we grow old together on the blog 😉

Twitter Cloud and a note on Cloud Computing

That’s what I use twitter for. If you have a twitter account you can follow me here

http://twitter.com/decisionstats

A couple of weeks ago I accidentally deleted many followers using a Twitter App called Refollow- I was trying to clean up people I follow and checked the wrong tick box-

so please if you feel I unfollowed you- it was a mistake. Seriously.

On cloud computing and Google: rumours ( 🙂 ) are emerging that Google’s push for cloud computing is to turn desktop computing into IBM-like mainframe computing. Except that there are too many players this time. Where are the Department of Justice and antitrust? Does Amazon currently qualify as being too big in cloud computing?

Or the rumours could be spread by Microsoft/ Apple / Amazon competitors etc. Geeks are like that sometimes.