Home » Analytics

Category Archives: Analytics

1-click Random Decision Forests

Reblogged from The Official Blog of BigML.com:

Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post

One of the pitfalls of machine learning is that creating a single predictive model has the potential to overfit your data. That is, the performance on your training data might be very good, but the model does not generalize well to new data. Ensemble learning of decision trees, also referred to as forests or simply ensembles,  is a tried-and-true technique for reducing the error of single machine-learned models. 

Read more… 953 more words

A SunBurst of Insight

Reblogged from The Official Blog of BigML.com:

Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post
  • Click to visit the original post

This is a guest post by David Gerster (@gerster), a data scientist and investor in BigML.

I work at a consumer web company, and recently used BigML to understand what drives return visits to our site. I followed Standard Operating Procedure for data mining, sampling a group of users, dividing them into two classes, and creating several features that I hoped would be useful in predicting these classes.

Read more… 620 more words

a nice addition to Big Data Visualization- sunbursts (which I have covered in the Dat Viz chapter of my R book) Great work by BigML.com

Predictive Analytics World goes to Chicago

Message from our Sponsors and my favorite Analytics conference ( only if I could attend a cool analytics conference nearby in Asia (singapore/turkey?)  -sighs) Even useR wont come to Asia ever?-

This is the number 1 conference for analytics in the world and it is next month in Chicago, USA? So you think you have the best analytics software or product or service. Here is where you can find it out!

It’s time to amp-up your analytics strategy. It’s time to beef up your analytics strategy by attending Predictive Analytics World Chicago, June 10-13, 2013. With over 30 case studies from leading organizations across a spectrum of industries, this is the must-attend event for anyone serious about their analytics strategy.

Here’s what your peers had to say about their experience at PAW:

“Great speakers, interesting content, and great networking. PAW conferences are among my favorite analytic events!”
– Karl Rexer, Ph.D. Rexer Analytics“This vendor neutral conference always gives me tangible ideas I can put to work right away.”
– Greg Hayworth, Humana

“Predictive Analytics World did a great job keeping up with the trends in Predictive Modeling. There were also plenty of opportunities to learn about the most valuable resources available to data scientists.”
– Conor Sontag, Marketing Evolution

“People who are in analytics must join Predictive Analytics World and see the state of the art projects.”
– Burak Buyuktombak, Avea Telecommunication Services (Turkey)

And there is more where that came from.

Who’s attending PAW Chicago 2013?

Here are just a few of the many companies attending:

Whose attending PAW Chicago

And many more!

Registration options for all budgets.

PAW Chicago has a variety of conference pass options available to meet budgets of all sizes.

Learn more about pricing and how to register.

Register Now!

2013 Chicago Sponsors
Follow Us on Twitter Be a Fan on Facebook LinkedIn Group Live Twitter Feed

Using a Linux only package in Windows #rstats

Here is some R code for using a R package that has only a tar.gz file available (used to load R packages in Linux) and no Zip file available (used to load R packages in Windows).

Step 1- Download the tar.gz file.

Step 2 Unzip it (twice) using 7zip

Step 3 Change the path variable below to your unzipped, downloaded location for the R sub folder within the package folder .

Step 4 Copy and Paste this in R

Step 5 Start using the R package in Windows (where 75% of the money and clients and businesses still are)

Caveat Emptor- No X Dependencies (ok!)

path="C:\\Users\\KUs\\Desktop\\segue\\R"
b=dir(path)
c=length(b)
for (i in 1:c){source(gsub(" ","",paste(path,"\\",b[i])))}
ls()

 

R2D2

Adding a + to the bit.ly link you get to get analytics on your spammers

Just add a + sign to any bit.ly link and you get to see associated analytics for that link.

you can get information (traffic, referrers, locations, conversations) about any Bit.ly link simply by taking the short URL and adding a “+” at the end (minus the quotes)

Click on the image below and notice the + sign in the URL.

Read more here this can be useful than just fun-

Using Bit.ly for Spying, Link Building and Happiness

Unrelated- I interview Hilary Mason, Analytics legend and Bit.ly Chief Scientist here -

Interview Hilary Mason Chief Scientist bitly

nah

Using R for Cricket Analysis #rstats #IPL

#Downloading the Data for batting across all formats of cricket
library(XML)
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=11;template=results;type=batting"
tables=readHTMLTable(url,stringsAsFactors = F)
#Note we wrote stringsAsFactors=F in this to avoid getting factor variables, 
#since we will need to convert these variables to numeric variables
table2=tables$"Overall figures"
rm(tables)
#Creating new variables from Span
table2$Debut=as.numeric(substr(table2$Span,1,4))
table2$LastYr=as.numeric(substr(table2$Span,6,10))
table2$YrsPlayed=table2$LastYr-table2$Debut
#Creating New Variables. In cricket a not out score is denoted by * which can cause data quality error. 
#This is treated by grepl for finding and gsub for removing the *. 
#Note the double \ to escape regex charachter
table2$HSNotOut=grepl("\\*",table2$HS)
table2$HS2=gsub("\\*","",table2$HS)
#Creating a FOR Loop (!) to convert variables to numeric variables
for (i in 3:17) {
+     table2[, i] <- as.numeric(table2[, i])
+ }

and we see why Sachin Tendulkar is the best (by using ggplot via Deducer)

dmancasestudy5

Also see 

  • Freaknomics Challenge-
    1. Prove match fixing does not and cannot exist in IPL
    2. Create an ideal fantasy team
    
    

 

Using R for Cricket Analysis #rstats

ESPN Crincinfo is the best site for cricket data (you can see an earlier detailed post on the database  here http://decisionstats.com/2012/04/07/cricinfo-statsguru-database-for-statistical-and-graphical-analysis/  ), and using the XML package in R we can easily scrape and manipulate data

Here is the code.

library(XML)
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting"
#Note I can also break the url string and use paste command to modify this url with parameters
tables=readHTMLTable(url)
tables$"Overall figures"

#Now see this- since I only got 50 results in each page, I look at the url of next page

table1=tables$"Overall figures"
url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;page=2;team=6;template=results;type=batting"
tables=readHTMLTable(url)
table2=tables$"Overall figures"

#Now I need to join these two tables vertically

table3=rbind(table1,table2)

Note-I can also automate the web scraping .
Now the data is within R, we can use something like Deducer to visualize.
Created by Pretty R at inside-R.org
Follow

Get every new post delivered to your Inbox.

Join 1,089 other followers

%d bloggers like this: