R for Business Analytics- Book by Ajay Ohri

So the cover art is ready, and if you are a reviewer, you can reserve online copies of the book I have been writing for the past 2 years. Special thanks to my mentors, detractors, readers and students- I owe you a beer!

You can also go here-

http://www.springer.com/statistics/book/978-1-4614-4342-1

 

R for Business Analytics

Ohri, Ajay

2012, XVI, 300 p. 208 illus., 162 in color.

Hardcover

ISBN 978-1-4614-4342-1

Due: September 30, 2012

approx. 44,95 € (net)
  • Covers full spectrum of R packages related to business analytics
  • Step-by-step instruction on the use of R packages, in addition to exercises, references, interviews and useful links
  • Background information and exercises are all applied to practical business analysis topics, such as code examples on web and social media analytics, data mining, clustering and regression models

R for Business Analytics looks at some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages. With this information the reader can select the packages that help process analytical tasks with minimum effort and maximum usefulness. The use of Graphical User Interfaces (GUIs) is emphasized in this book to further flatten R's famous learning curve. The book aims to help you kick-start analytics, with chapters on data visualization, code examples on web analytics and social media analytics, clustering, regression models, text mining, data mining models and forecasting. It tries to expose the reader to a breadth of business analytics topics without burying the user in needless depth, and the included references and links allow the reader to pursue those topics further.

 

This book is aimed at business analysts with basic programming skills who want to use R for business analytics. Its scope is neither statistical theory nor graduate-level research in statistics; it is for business analytics practitioners. Business analytics (BA) refers to the field of exploration and investigation of data generated by businesses. Business intelligence (BI) is the seamless dissemination of information through the organization, primarily business metrics, both past and current, used for decision support. Data mining (DM) is the process of discovering new patterns in large data using algorithms and statistical methods. To differentiate between the three: BI is mostly current reports, BA is models to predict and strategize, and DM matches patterns in big data. The R statistical software is the fastest-growing analytics platform in the world and is established in both academia and corporations for robustness, reliability and accuracy.

Content Level » Professional/practitioner

Keywords » Business Analytics – Data Mining – Data Visualization – Forecasting – GUI – Graphical User Interface – R software – Text Mining

Related subjects » Business, Economics & Finance – Computational Statistics – Statistics

TABLE OF CONTENTS

Why R.- R Infrastructure.- R Interfaces.- Manipulating Data.- Exploring Data.- Building Regression Models.- Data Mining using R.- Clustering and Data Segmentation.- Forecasting and Time-Series Models.- Data Export and Output.- Optimizing your R Coding.- Additional Training Literature.- Appendix

Avengers Review

Avengers is the big-ticket blockbuster that heralds summer just as the groundhog denotes spring. An ensemble cast (of superheroes and okay actors), it stars the Hulk (angry green man aka Dr Bruce Banner/Mark Ruffalo), Iron Man (genius billionaire philanthropist playboy aka Tony Stark/Robert Downey Jr), Thor (an Australian-looking Chris Hemsworth), Loki (God of Mischief, played by a German-looking Tom Hiddleston), Captain America, Scarlett Johansson, Jeremy “Hurt Locker” Renner and Samuel L Jackson. You know something’s gotta give when the list of A-list stars(?) in the cast is longer than the plot summary.

Well, Loki the bad guy strikes a deal with some other bad guys from a funnily named world called Assguard (parallel universe!) and tries to find a cube (all-powerful energy, like the cube in Transformers 1), and in return gets an army from the dark side (who look just like Cybertrons and Lord of the Rings orcs combined).

The Avengers, after much dilly-dallying, trying to emote, creating bromances and building up tension, in the end decide to give you what you came looking for: a visual feast of credible-looking CGI to counter the bad guys. The scene stealer is the Hulk. He is kind of cute for a big green guy; if you don’t know what I mean, see the movie!

This is American cinema at its most profoundly intellectual since the Die Hard series, and it’s quite entertaining, especially if you are a geeky comic-book fanboy (like me).

Summer is here and so are the super-heroes!! Unleash the popcorn.

 

How to learn Hacking Part 2

Now that you have read the basics at http://www.decisionstats.com/how-to-learn-to-be-a-hacker-easily/ (please do read it before reading what follows)

 

Here is a list of tutorials that you should study (in order of ease)

1) LEARN BASICS – enough to get you a job, maybe, if that’s all you wanted.

http://www.offensive-security.com/metasploit-unleashed/Main_Page

2) READ SOME MORE-

Lena’s Reverse Engineering Tutorial (use Google to find it). It includes 36 parts covering individual cracking techniques and will teach you the basics of protection bypassing:

01. Olly + assembler + patching a basic reverseme
02. Keyfiling the reverseme + assembler
03. Basic nag removal + header problems
04. Basic + aesthetic patching
05. Comparing on changes in cond jumps, animate over/in, breakpoints
06. “The plain stupid patching method”, searching for textstrings
07. Intermediate level patching, Kanal in PEiD
08. Debugging with W32Dasm, RVA, VA and offset, using LordPE as a hexeditor
09. Explaining the Visual Basic concept, introduction to SmartCheck and configuration
10. Continued reversing techniques in VB, use of decompilers and a basic anti-anti-trick
11. Intermediate patching using Olly’s “pane window”
12. Guiding a program by multiple patching.
13. The use of API’s in software, avoiding doublechecking tricks
14. More difficult schemes and an introduction to inline patching
15. How to study behaviour in the code, continued inlining using a pointer
16. Reversing using resources
17. Insights and practice in basic (self)keygenning
18. Diversion code, encryption/decryption, selfmodifying code and polymorphism
19. Debugger detected and anti-anti-techniques
20. Packers and protectors : an introduction
21. Imports rebuilding
22. API Redirection
23. Stolen bytes
24. Patching at runtime using loaders from lena151 original
25. Continued patching at runtime & unpacking armadillo standard protection
26. Machine specific loaders, unpacking & debugging armadillo
27. tElock + advanced patching
28. Bypassing & killing server checks
29. Killing & inlining a more difficult server check
30. SFX, Run Trace & more advanced string searching
31. Delphi in Olly & DeDe
32. Author tricks, HIEW & approaches in inline patching
33. The FPU, integrity checks & loader versus patcher
34. Reversing techniques in packed software & a S&R loader for ASProtect
35. Inlining inside polymorphic code
36. Keygenning

If you want more free training – hang around this website

http://www.owasp.org/index.php/Cheat_Sheets

OWASP Cheat Sheet Series

Draft OWASP Cheat Sheets

3) SPEND SOME MONEY on TRAINING

http://www.corelan-training.com/index.php/training/corelan-live/

Course overview

Module 1 – The x86 environment

  • System Architecture
  • Windows Memory Management
  • Registers
  • Introduction to Assembly
  • The stack

Module 2 – The exploit developer environment

  • Setting up the exploit developer lab
  • Using debuggers and debugger plugins to gather primitives

Module 3 – Saved Return Pointer Overwrite

  • Functions
  • Saved return pointer overwrites
  • Stack cookies

Module 4 – Abusing Structured Exception Handlers

  • Abusing exception handler overwrites
  • Bypassing Safeseh

Module 5 – Pointer smashing

  • Function pointers
  • Data/object pointers
  • vtable/virtual functions

Module 6 – Off-by-one and integer overflows

  • Off-by-one
  • Integer overflows

Module 7 – Limited buffers

  • Limited buffers, shellcode splitting

Module 8 – Reliability++ & reusability++

  • Finding and avoiding bad characters
  • Creative ways to deal with character set limitations

Module 9 – Fun with Unicode

  • Exploiting Unicode based overflows
  • Writing venetian alignment code
  • Creating and Using venetian shellcode

Module 10 – Heap Spraying Fundamentals

  • Heap Management and behaviour
  • Heap Spraying for Internet Explorer 6 and 7

Module 11 – Egg Hunters

  • Using and tweaking Egg hunters
  • Custom egghunters
  • Using Omelet egghunters
  • Egghunters in a WoW64 environment

Module 12 – Shellcoding

  • Building custom shellcode from scratch
  • Understanding existing shellcode
  • Writing portable shellcode
  • Bypassing Antivirus

Module 13 – Metasploit Exploit Modules

  • Writing exploits for the Metasploit Framework
  • Porting exploits to the Metasploit Framework

Module 14 – ASLR

  • Bypassing ASLR

Module 15 – W^X

  • Bypassing NX/DEP
  • Return Oriented Programming / Code Reuse (ROP) )

Module 16 – Advanced Heap Spraying

  • Heap Feng Shui & heaplib
  • Precise heap spraying in modern browsers (IE8 & IE9, Firefox 13)

Module 17 – Use After Free

  • Exploiting Use-After-Free conditions

Module 18 – Windows 8

  • Windows 8 Memory Protections and Bypass

Training schedules are listed on the Corelan Live training page linked above.

ALSO GET CERTIFIED http://www.offensive-security.com/information-security-training/penetration-testing-with-backtrack/ ($950 cost)

The syllabus is at

http://www.offensive-security.com/documentation/penetration-testing-with-backtrack.pdf

4) HANG AROUND OTHER HACKERS

At http://attrition.org/attrition/

or The Noir  Hat Conferences-

http://blackhat.com/html/bh-us-12/training/bh-us-12-training_complete.html

or read this website

http://software-security.sans.org/developer-how-to/

5) GET A DEGREE

Yes, it is possible.

 

See http://web.jhu.edu/jhuisi/

The Johns Hopkins University Information Security Institute (JHUISI) is the University’s focal point for research and education in information security, assurance and privacy.

Scholarship Information

 

The Information Security Institute is now accepting applications for the Department of Defense’s Information Assurance Scholarship Program (IASP).  This scholarship includes full tuition, a living stipend, books and health insurance. In return each student recipient must work for a DoD agency at a competitive salary for six months for every semester funded. The scholarship is open to American citizens only.

http://web.jhu.edu/jhuisi/mssi/index.html

MASTER OF SCIENCE IN SECURITY INFORMATICS PROGRAM

The flagship educational experience offered by Johns Hopkins University in the area of information security and assurance is represented by the Master of Science in Security Informatics degree.  Over thirty courses are available in support of this unique and innovative graduate program.

———————————————————–

Disclaimer- I haven’t done any of these things. This is just a curated list from Quora, so I am open to feedback.

You use this at your own risk, subject to your own conscience, local legal jurisdiction and your own legal liability.


Book Review- Machine Learning for Hackers

This is a review of the fashionably named book Machine Learning for Hackers by Drew Conway and John Myles White (O’Reilly). The book is about hacking code in R.

 

The preface introduces the reader to the authors’ conception of what machine learning and hacking are all about. If the book had been named Machine Learning for Business Analysts or Data Miners, I am sure the content would have been unchanged, though the popularity (and ambiguity) of the word hacker can often substitute for its usefulness. Indeed, the many wise and learned professors of statistics departments throughout the civilized world would be mildly surprised and bemused to hear their day-to-day activities described as hacking or teaching hackers. The book follows a case-study and example-based approach and uses the ggplot2 package within R almost to the point of ignoring any other graphics system native to R. It can be quite useful for the aspiring reader who wishes to understand and join the booming market for skilled talent in statistical computing.

Chapter 1 has a very useful set of functions for data cleansing and formatting. It walks you through the basics of formatting based on dates and conditions, missing-value and outlier treatment, and using the ggplot2 package in R for graphical analysis. The case study used is an Infochimps dataset with 60,000 recordings of UFO sightings. The case study is lucid and done at an extremely helpful pace, illustrating the powerful and flexible nature of the R functions that can be used for data cleansing. The chapter mentions text editors and IDEs but fails to list them in a tabular format, while listing several other tables, like the packages used in the book. It also jumps straight from installation instructions to functions in R without getting into the various data types within R or specifying where these can be referenced. It thus assumes a higher level of basic programming understanding than the average R book.
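
To give a flavor of the cleansing steps, here is a minimal sketch of my own (not the book’s code), using a made-up data frame of sighting dates rather than the Infochimps data:

library(ggplot2)
# toy stand-in for the UFO data; one malformed date on purpose
ufo <- data.frame(DateOccurred = c("19951009", "19961021", "bad-date", "19971215"),
                  stringsAsFactors = FALSE)
# coerce strings to Date; malformed entries become NA
ufo$DateOccurred <- as.Date(ufo$DateOccurred, format = "%Y%m%d")
# drop rows with missing dates (simple missing-value treatment)
ufo <- subset(ufo, !is.na(DateOccurred))
# quick graphical check with ggplot2
ggplot(ufo, aes(x = DateOccurred)) + geom_histogram()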

Chapter 2 discusses data exploration and has a very clear set of diagrams that explain the various data summary operations that are performed routinely. This is an innovative approach and will help students or newcomers to the field of data analysis. It introduces the reader to type-determination functions as well as different kinds of encoding. The introduction to creating functions is quite elegant and simple, and numerical summary methods are explained adequately. While the chapter explains data exploration with the help of various histogram options in ggplot2, it fails to create a more generic framework for data exploration, or rules to assist the reader in visual data exploration in non-standard data situations. While the examples are very helpful, slightly more depth is needed to step out of the example and into a framework for visual data exploration (or references for the same). A couple of case studies, however elaborately explained, cannot do justice to the vast field of data exploration, especially visual data exploration.
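
Here is a small illustration (my own sketch, not the authors’ code) of the kind of numerical summary function the chapter teaches you to write:

# combine several summary statistics into one named vector
numeric.summary <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
}
numeric.summary(c(2, 4, 4, 5, 7, NA, 9))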

Chapter 3 discusses binary classification for the specific purpose of spam filtering, using a dataset from SpamAssassin. It introduces the reader to the naïve Bayes classifier and the principles of text mining using the tm package in R. Some of the example code could have been better commented for easier readability. Overall it is quite an easy tutorial for creating a naïve Bayes classifier, even for beginners.
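
For a flavor of the text-mining side, here is a hedged sketch of building term counts with the tm package on two toy documents (not the SpamAssassin data); a naïve Bayes classifier turns counts like these into conditional probabilities:

library(tm)
docs <- c("cheap viagra buy now", "meeting agenda for project review")
corpus <- Corpus(VectorSource(docs))
# term-document matrix with basic normalization
tdm <- TermDocumentMatrix(corpus,
                          control = list(removePunctuation = TRUE, tolower = TRUE))
as.matrix(tdm)   # term counts per document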

Chapter 4 discusses the issues in importance ranking and creating recommendation systems, specifically for ordering email messages into important and not important. It introduces the useful grepl, gsub, strsplit, strptime, difftime and strtrim functions for parsing data. The chapter further introduces the reader to the concept of log (and affine) transformations in a lucid and clear way that can help even beginners learn this powerful transformation concept. Again, the code within this chapter is sparsely commented, which can cause difficulties for people not used to reading reams of code (the comments may be part of the code bundled with the book, but I am reading an electronic copy and did not find an easy way to go back and forth between the code and the book). The readability of the chapters would be further enhanced by flow charts explaining the path and process followed, rather than overtly verbose textual descriptions running into multiple pages. The chapters are quite clearly written, but a helpful visual summary would aid both revising the concepts and elucidating the approach. A suggestion for the authors: compile the list of useful functions they introduce in this book as a reference card for R hackers, or at least provide a chapter-wise summary of functions, datasets and packages used.
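
A quick base-R illustration of the parsing functions the chapter introduces (my own toy example, with a hypothetical email subject line):

subject <- "Re: Meeting on 2012-06-14"
grepl("^Re:", subject)                      # flag replies
gsub("^Re: ", "", subject)                  # strip the prefix
strsplit(subject, " ")[[1]]                 # tokenize on spaces
sent <- strptime("2012-06-14 09:30", format = "%Y-%m-%d %H:%M")
difftime(Sys.time(), sent, units = "days")  # age of the message
strtrim(subject, 10)                        # truncate to 10 characters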

Chapter 5 discusses linear regression, and its introduction to regression theory is surprisingly weak. However, the chapter makes up in practical example what it oversimplifies in theory. The chapter on regression is not the finest in this otherwise excellent book. Part of this is the relative lack of organization: correlation is explained after linear regression. Once again, the lack of a function summary and a process flow diagram hinders readability, and a separate section on the metrics that make a regression result good or not so good would be a welcome addition. Functions introduced include lm.
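
For readers new to lm, a minimal example (mine, on the built-in cars dataset rather than the book’s data):

fit <- lm(dist ~ speed, data = cars)  # stopping distance as a function of speed
summary(fit)                          # coefficients, R-squared, residuals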

Chapter 6 showcases generalized additive models (GAM) and polynomial regression, including an introduction to singularity and over-fitting. Functions included in this chapter are transform and poly, while the glmnet package is also used here. The chapter also formally introduces the reader to cross-validation (though examples of it appeared in earlier chapters) and regularization. Logistic regression is introduced at the end of the chapter.
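
A hedged sketch of polynomial regression with poly() on simulated data (not the book’s example), showing how a high degree starts to over-fit:

set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x) + rnorm(50, sd = 0.3)
fit1 <- lm(y ~ poly(x, 3))    # modest polynomial
fit2 <- lm(y ~ poly(x, 15))   # high degree, chasing the noise
c(deg3 = summary(fit1)$r.squared, deg15 = summary(fit2)$r.squared)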

Chapter 7 is about optimization. It describes error metrics in a very easy-to-understand way. It creates a grid, using nested loops over various values of the intercept and slope of a regression equation, and computes the sum of squared errors. It then describes the optim function in detail, including how it works and its various parameters, and introduces the curve function. The chapter then describes ridge regression, including its definition and the hyperparameter lambda. The use of optim to minimize the error in regression is useful learning for the aspiring hacker. Lastly, it describes a case study of breaking codes using the simplistic Caesar cipher, a lexical database and the Metropolis method. Constants introduced in this chapter include .Machine$double.eps.
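
Here is a compact illustration (my own, not the book’s code) of the optim idea: fitting an intercept and slope by minimizing the sum of squared errors, then comparing with lm:

# sum of squared errors for a given intercept/slope pair
sse <- function(par, x, y) sum((y - (par[1] + par[2] * x))^2)
opt <- optim(par = c(0, 0), fn = sse, x = cars$speed, y = cars$dist)
opt$par                              # should land close to...
coef(lm(dist ~ speed, data = cars))  # ...the closed-form solution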

Chapter 8 deals with principal component analysis (PCA) and unsupervised learning. It uses the ymd function from the lubridate package to convert strings to date objects, and the cast function from the reshape package to further manipulate the structure of the data. The princomp function enables PCA in R. The case study creates a stock market index and compares the results with the Dow Jones index.
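
A short princomp sketch (on a built-in dataset, not the book’s stock index):

pca <- princomp(USArrests, cor = TRUE)  # cor = TRUE scales the variables
summary(pca)                            # variance explained by each component
pca$scores[1:3, 1]                      # first principal component, first 3 states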

Chapter 9 deals with multidimensional scaling as well as clustering US senators on the basis of similarity in their voting records on legislation. It showcases matrix multiplication using %*% and the dist function for computing a distance matrix.
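
And a hedged base-R sketch of the multidimensional scaling route, with USArrests standing in for the senators’ voting records:

d <- dist(USArrests)        # pairwise distance matrix
mds <- cmdscale(d, k = 2)   # classical MDS into two dimensions
plot(mds[, 1], mds[, 2], type = "n")
text(mds[, 1], mds[, 2], labels = rownames(USArrests), cex = 0.6)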

Chapter 10 covers k-nearest neighbors for recommendation systems. Packages used include class and reshape, and functions used include cor and log. It also demonstrates creating a custom kNN function that calculates the Euclidean distance between cluster centroids and the data. The case study is the R-package recommendation contest on Kaggle. Overall, it is a simple introduction to creating a recommendation system using k-nearest neighbors, without getting into any of the prepackaged R solutions for association analysis, clustering or recommendation systems.
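
A simple sketch (mine) of the Euclidean-distance computation at the heart of such a custom kNN function:

euclidean <- function(a, b) sqrt(sum((a - b)^2))
center <- colMeans(iris[iris$Species == "setosa", 1:4])  # centroid of one group
point  <- as.numeric(unlist(iris[51, 1:4]))              # a versicolor observation
euclidean(center, point)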

Chapter 11 introduces the reader to social network analysis (and elements of graph theory), using the Erdős number as an interesting example of the social network of mathematicians. The example of hacking Google’s Social Graph API is quite new and intriguing (though a bit obsolete owing to API changes, which should be rectified in the errata or the next edition). However, there exist R packages that should at least be referenced or used within this chapter (like the twitteR package for the Twitter API and the ROAuth package for other social networks). Packages used within this chapter include RCurl, RJSONIO and igraph, and functions used include rbind and ifelse. It also introduces the reader to the advanced software Gephi. The last example builds a recommendation engine for whom to follow on Twitter using R.
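
A minimal igraph sketch (not from the book) of turning an adjacency matrix into a graph object, the structure underlying such social network examples:

library(igraph)
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0), nrow = 3, byrow = TRUE)
g <- graph.adjacency(adj, mode = "undirected")
degree(g)   # number of connections per node
plot(g)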

Chapter 12 is about model comparison and introduces support vector machines. It uses the e1071 package and shows the svm function. It also introduces the concept of tuning hyperparameters within default algorithms. A small problem in understanding the concepts is the misalignment of diagram pages with the relevant code. It concludes by using mean squared error as a method for comparing models built with different algorithms.
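
A short e1071 example (my own, on iris rather than the book’s data) of the svm fit and hyperparameter tuning the chapter describes:

library(e1071)
fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
table(predict(fit, iris), iris$Species)  # in-sample confusion matrix
# grid search over the cost hyperparameter
tuned <- tune.svm(Species ~ ., data = iris, cost = c(0.1, 1, 10))
tuned$best.parameters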

 

Overall, the book is a welcome addition to the library of books on the R programming language; the refreshing flow of material and the practicality of its case studies make it a recommended addition for both academic and corporate business analysts trying to derive insights by hacking lots of heterogeneous data.

Have a look for yourself at-
http://shop.oreilly.com/product/0636920018483.do

Facebook and R

Part 1 How do people at Facebook use R?

Itamar Rosenn, Facebook

Itamar conveyed how Facebook’s Data Team used R in 2007 to answer two questions about new users: (i) which data points predict whether a user will stay? and (ii) if they stay, which data points predict how active they’ll be after three months?

For the first question, Itamar’s team used recursive partitioning (via the rpart package) to infer that just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile information.
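
For readers who want to try the same technique, here is a hedged rpart sketch on a built-in dataset (the Facebook data is obviously not public):

library(rpart)
# recursive partitioning: find the splits most predictive of the outcome
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)   # the tree of splits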

For the second question, they fit the data to a logistic model using a least angle regression approach (via the lars package), and found that activity at three months was predicted by variables related to three classes of behavior: (i) how often a user was reached out to by others, (ii) frequency of third party application use, and (iii) what Itamar termed “receptiveness” — related to how forthcoming a user was on the site.
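
And a minimal lars illustration (mine, not Facebook’s; note that plain lars fits linear models, so this shows only the least angle path, not their logistic variant):

library(lars)
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])  # stand-in predictors
y <- mtcars$mpg                                  # stand-in response
fit <- lars(x, y, type = "lar")  # least angle regression path
coef(fit)                        # coefficients entering step by step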

Source: http://www.dataspora.com/2009/02/predictive-analytics-using-r/

and cute graphs like the famous

https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919

 

and

studying baseball on Facebook

https://www.facebook.com/notes/facebook-data-team/baseball-on-facebook/10150142265858859

by counting the number of posts that occurred the day after a team lost divided by the total number of wins, since losses for great teams are remarkable and since winning teams’ fans just post more.

 

But mostly at

https://www.facebook.com/data?sk=notes and https://www.facebook.com/data?v=app_4949752878

 

and creating new packages

1. jjplot (not much action here!)

https://r-forge.r-project.org/scm/viewvc.php/?root=jjplot

though

I liked the promise of jjplot at

http://pleasescoopme.com/2010/03/31/using-jjplot-to-explore-tipping-behavior/

2. Ising models

https://github.com/slycoder/Rflim

https://www.facebook.com/note.php?note_id=10150359708746212

3. R pipe

https://github.com/slycoder/Rpipe

 

even the FB interns are cool

http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

 

Part 2 How do people with R use Facebook?

Using the API at https://developers.facebook.com/tools/explorer

and code mashes from

 

http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R

http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html

but the wonderful troubleshooting code from http://www.brocktibert.com/blog/2012/01/19/358/

which needs to be added to the code first

 

and using the network package

> access_token="XXXXXXXXXXXX"

Annoyingly, the Facebook token can expire after some time; this can lead to long waits and NULL results with OAuth errors. If that happens, you need to regenerate the token.

What we need
> require(RCurl)
> require(rjson)
> download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

Romain’s Famous Facebook Function (altered)

> facebook <- function( path = "me", access_token, options){
+ if( !missing(options) ){
+ options <- sprintf( "?%s", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
+ } else {
+ options <- ""
+ }
+ data <- getURL( sprintf( "https://graph.facebook.com/%s%s&access_token=%s", path, options, access_token ), cainfo="cacert.pem" )
+ fromJSON( data )
+ }

 

Now getting the friends list
> friends <- facebook( path="me/friends" , access_token=access_token)
> # extract Facebook IDs
> friends.id <- sapply(friends$data, function(x) x$id)
> # extract names
> friends.name <- sapply(friends$data, function(x) iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
> # short names to initials
> initials <- function(x) paste(substr(x,1,1), collapse="")
> friends.initial <- sapply(strsplit(friends.name," "), initials)

This matrix can take a long time to build, so you can change the value of N to, say, 40 to test your network. I needed to press the escape key to cut short the plotting of all 400 of my friends.
> # friendship relation matrix
> N <- length(friends.id)
> friendship.matrix <- matrix(0,N,N)
> for (i in 1:N) {
+ tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
+ mutualfriends <- sapply(tmp$data, function(x) x$id)
+ friendship.matrix[i,friends.id %in% mutualfriends] <- 1
+ }

 

Plotting using the network package in R (with help from the comments at http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html)

> require(network)

> net1 <- as.network(friendship.matrix)

> plot(net1, label=friends.initial, arrowhead.cex=0)

(Rgraphviz is tough if you are on Windows 7 like me)

but there is an alternative igraph solution at https://github.com/sciruela/facebookFriends/blob/master/facebook.r

 

After all that talk, a graph of my Facebook network, with friends’ initials as labels:

 

Opinion piece-

I hope the plans for a Facebook R package get fulfilled (just as the twitteR package led to many interesting analyses)

and LinkedIn also has an API at http://developer.linkedin.com/apis

I think it would be interesting to plot professional relationships across social networks as well. But I hope to see a LinkedIn package (or blog code) soon.

As for jjplot, I had hoped ggplot2 and jjplot would merge, or at least that jjplot would get some kind of inclusion in the Deducer GUI. Maybe a Google Summer of Code project, if people are busy!!

Also, the geeks at Facebook.com could think of giving something back to the R community, as Google generously does by funding packages like RUnit and Deducer and the Google Summer of Code, besides sponsoring meet-ups etc.

 

(Note – this is part of the research for the upcoming book “R for Business Analytics”)

 

PS- I didn’t get time to download all my posts using the R code at

https://gist.github.com/1634662#

or do specific Facebook Page analysis using R at

http://tonybreyal.wordpress.com/2012/01/06/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/

Updated-

 #access token from https://developers.facebook.com/tools/explorer
access_token="AAuFgaOcVaUZAssCvL9dPbZCjghTEwwhNxZAwpLdZCbw6xw7gARYoWnPHxihO1DcJgSSahd67LgZDZD"
require(RCurl)
 require(rjson)
# download the file needed for authentication http://www.brocktibert.com/blog/2012/01/19/358/
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
# http://romainfrancois.blog.free.fr/index.php?post/2012/01/15/Crawling-facebook-with-R
facebook <- function( path = "me", access_token, options){  # access_token has no usable default; pass it explicitly
if( !missing(options) ){
options <- sprintf( "?%s", paste( names(options), "=", unlist(options), collapse = "&", sep = "" ) )
} else {
options <- ""
}
data <- getURL( sprintf( "https://graph.facebook.com/%s%s&access_token=%s", path, options, access_token ), cainfo="cacert.pem" )
fromJSON( data )
}

 # see http://applyr.blogspot.in/2012/01/mining-facebook-data-most-liked-status.html

# scrape the list of friends
friends <- facebook( path="me/friends" , access_token=access_token)
# extract Facebook IDs
friends.id <- sapply(friends$data, function(x) x$id)
# extract names 
friends.name <- sapply(friends$data, function(x)  iconv(x$name,"UTF-8","ASCII//TRANSLIT"))
# short names to initials 
initials <- function(x) paste(substr(x,1,1), collapse="")
friends.initial <- sapply(strsplit(friends.name," "), initials)

# friendship relation matrix
#N <- length(friends.id)
N <- 200
friendship.matrix <- matrix(0,N,N)
for (i in 1:N) {
  tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=access_token)
  mutualfriends <- sapply(tmp$data, function(x) x$id)
  friendship.matrix[i,friends.id %in% mutualfriends] <- 1
}
require(network)
net1<- as.network(friendship.matrix)
plot(net1, label=friends.initial, arrowhead.cex=0)


Interview Prof Benjamin Alamar, Sports Analytics

Here is an interview with Prof Benjamin Alamar, founding editor of the Journal of Quantitative Analysis in Sports, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA.

Ajay- The movie Moneyball recently sparked mainstream interest in analytics in sports. Describe the role of analytics in sports management.

Benjamin- Analytics is impacting sports organizations on both the sport and business sides.
On the sport side, teams are using analytics, including advanced data management, predictive analytics and information systems, to gain a competitive edge. The use of analytics results in more accurate player valuations and projections, as well as more effective strategies against specific opponents.
On the business side, teams are using the tools of analytics to increase revenue in a variety of ways, including dynamic ticket pricing and optimizing the placement of concession stands.
Ajay- In what ways is analytics used in the specific sports you have been part of?

Benjamin- A very typical first step for a team is to utilize the tools of predictive analytics to help inform their draft decisions.

Ajay- What are some of the tools, techniques and software used in sports analytics?
Benjamin- The tools of sports analytics do not differ much from the tools of business analytics. Regression analysis is fairly common, as are other forms of data mining. In terms of software, R is a popular tool, as are Excel and many of the other standard analysis tools.
Ajay- Describe your career journey and how you became involved in sports management. What tips do you have for young students who wish to enter this field?

Benjamin- I got involved in sports through a company called Protrade Sports. Protrade initially was a fantasy sports company that was looking to develop a fantasy game based on advanced sports statistics and utilize a stock market concept instead of traditional drafting. I was hired due to my background in economics to develop the market aspect of the game.

There I met Roland Beech (who now works for the Mavericks) and Aaron Schatz (owner of footballoutsiders.com) and learned about the developing field of sports statistics. I then changed my research focus from economics to sports statistics and founded the Journal of Quantitative Analysis in Sports. Through the journal and my published research, I was able to establish a reputation of doing quality, useable work.

For students, I recommend developing very strong data management skills (SQL and the like) and thinking carefully about what sort of questions a general manager or coach would care about. Being able to demonstrate analytic skills around actionable research will generally attract the attention of pro teams.

About-

Benjamin Alamar, Professor of Sport Management, Menlo College

Benjamin Alamar

Professor Benjamin Alamar is the founding editor of the Journal of Quantitative Analysis in Sports, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA. He has published academic research in football, basketball and baseball, and has presented at numerous conferences on sports analytics. He is also a co-creator of ESPN’s Total Quarterback Rating and a regular contributor to the Wall Street Journal. He has consulted for teams in the NBA and NFL, provided statistical analysis for author Michael Lewis for his recent book The Blind Side, and worked with numerous startup companies in the field of sports analytics. Professor Alamar is also an award-winning economist who has worked academically and professionally in intellectual property valuation, public finance and public health. He received his PhD in economics from the University of California at Santa Barbara in 2001.

Prof Alamar is a speaker at Predictive Analytics World, San Francisco, where he is also conducting a workshop

http://www.predictiveanalyticsworld.com/sanfrancisco/2012/agenda.php#day2-17

2:55-3:15pm

All level tracks Track 1: Sports Analytics
Case Study: NFL, MLB, & NBA
Competing & Winning with Sports Analytics

The field of sports analytics ties together the tools of data management, predictive modeling and information systems to provide sports organizations a competitive advantage. The field is rapidly developing, based on new and expanded data sources, greater recognition of its value, and the past successes of a variety of sports organizations. Teams in the NFL, MLB and NBA, as well as other organizations, have found a competitive edge with the application of sports analytics. The future of sports analytics can be seen by drawing on these past successes and the development of new tools.

You can know more about Prof Alamar at his blog http://analyticfootball.blogspot.in/ or journal at http://www.degruyter.com/view/j/jqas. His detailed background can be seen at http://menlo.academia.edu/BenjaminAlamar/CurriculumVitae

Business Metrics

Business Metrics (a partial extract from my upcoming book “R for Business Analytics”)

Business Metrics are important variables that are collected on a periodic basis to assess the health and sustainability of a business. They should have the following properties-

1) What is a Business Metric- A failure to collect or regularly update the business metric would cause business disruption through incorrect and incomplete decision making.

2) Cost of Business Metrics- The costs of collecting, storing and updating the business metric are less than the opportunity costs of wrong decisions caused by a lack of information on that business metric.

3) Continuity in your Business Metrics- The business metric is continuous and comparable across time periods and business units; if necessary, the assumptions used to smooth the comparisons should be listed in the business metric presentation itself.

4) Simplify your Business Metrics– Business metrics can be derived from other business metrics. If necessary, and to avoid clutter, only the most important business metrics should be presented, or the metrics with the biggest deviation from past trends should be highlighted.

5) Normalize your Business Metrics- The scale of the business metric’s units should be comparable to other business metrics, yet significant enough to emphasize differences in the numbers.

6) Standardize your Business Metrics– The dimensions of business metrics should be increased to enhance comparison and contrast without adding complexity. This means adding an extra dimension for analysis, such as time, geography, employee or business owner, rather than relying on a 2-by-2 comparison.
