Category Archives: Analytics
1-click Random Decision Forests
Reblogged from The Official Blog of BigML.com:
One of the pitfalls of machine learning is that creating a single predictive model has the potential to overfit your data. That is, the performance on your training data might be very good, but the model does not generalize well to new data. Ensemble learning of decision trees, also referred to as forests or simply ensembles, is a tried-and-true technique for reducing the error of single machine-learned models.
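To make the idea concrete, here is a minimal sketch of an ensemble of decision trees in R. It is plain bagging (bootstrap resamples plus a majority vote), not BigML's implementation, which the excerpt does not show. The rpart package, the built-in iris data, the seed, and the choice of 25 trees are all assumptions picked for illustration.

```r
# Bagging sketch: many trees, each fit on a bootstrap resample,
# combined by majority vote. Assumes the rpart package is installed.
library(rpart)

set.seed(42)  # arbitrary seed, only for reproducibility

# Hold out 50 rows so we can check how well the models generalize
test_idx <- sample(nrow(iris), 50)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# A single tree, for comparison
single_tree <- rpart(Species ~ ., data = train, method = "class")
single_pred <- predict(single_tree, test, type = "class")

# The ensemble: each tree sees a bootstrap resample of the training rows
n_trees <- 25
forest <- lapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

# One column of predicted labels per tree, then a majority vote per row
votes <- sapply(forest, function(tree) {
  as.character(predict(tree, test, type = "class"))
})
ensemble_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Accuracy on the held-out rows
single_acc   <- mean(single_pred == test$Species)
ensemble_acc <- mean(ensemble_pred == test$Species)
```

Comparing `single_acc` with `ensemble_acc` on held-out rows is the point of the exercise: training accuracy alone would hide the overfitting the paragraph above warns about.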
A SunBurst of Insight
Reblogged from The Official Blog of BigML.com:
This is a guest post by David Gerster (@gerster), a data scientist and investor in BigML.
I work at a consumer web company, and recently used BigML to understand what drives return visits to our site. I followed Standard Operating Procedure for data mining, sampling a group of users, dividing them into two classes, and creating several features that I hoped would be useful in predicting these classes.
Predictive Analytics World goes to Chicago
A message from our sponsors, and my favorite analytics conference. (If only I could attend a cool analytics conference nearby in Asia, say Singapore or Turkey. *sighs* Will even useR! never come to Asia?)
This is the number 1 conference for analytics in the world, and it is next month in Chicago, USA. So you think you have the best analytics software, product, or service? Here is where you can find out!
It’s time to amp up your analytics strategy by attending Predictive Analytics World Chicago, June 10-13, 2013. With over 30 case studies from leading organizations across a spectrum of industries, this is the must-attend event for anyone serious about their analytics strategy.
Here’s what your peers had to say about their experience at PAW:
And there is more where that came from. Who’s attending PAW Chicago 2013? Here are just a few of the many companies attending:
And many more! Registration options for all budgets. PAW Chicago has a variety of conference pass options available to meet budgets of all sizes.
Using a Linux only package in Windows #rstats
Here is some R code for using an R package that has only a tar.gz file available (the format used to install R packages on Linux) and no zip file (the format used to install R packages on Windows).
Step 1: Download the tar.gz file.
Step 2: Unzip it (twice) using 7-Zip: once to strip the gzip layer, and once to unpack the tar archive.
Step 3: Change the path variable below to the location of the R subfolder within your unzipped package folder.
Step 4: Copy and paste this into R.
Step 5: Start using the R package in Windows (where 75% of the money and clients and businesses still are).
Caveat Emptor- No X Dependencies (ok!)
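The code for Step 4 did not survive in this archive, so what follows is only a sketch of one way the steps could work, not the original snippet. It assumes the approach is to `source()` every `.R` file in the package's R subfolder. The helper name `load_r_sources` and the example path are hypothetical.

```r
# Hypothetical helper: run every .R file found in an unpacked package's R/ folder.
# This only defines the package's R functions; it does not process NAMESPACE,
# compiled code, bundled data, or dependencies.
load_r_sources <- function(path, envir = globalenv()) {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    source(f, local = envir)  # evaluate the file inside the chosen environment
  }
  invisible(files)
}

# Example call (the path is an assumption; point it at your own unzipped location):
# load_r_sources("C:/Users/me/Downloads/somepackage/R")
```

Because nothing is installed, the functions last only for the current session, and any package that relies on compiled code will not work this way.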
- WE DO NOT BREAK USERSPACE!
- Torvalds, Linus (2012-12-23). Linus Torvalds - LKML
Adding a + to a bit.ly link to get analytics on your spammers
Just add a + sign to any bit.ly link and you get to see associated analytics for that link.
You can get information (traffic, referrers, locations, conversations) about any Bit.ly link simply by taking the short URL and adding a “+” at the end (minus the quotes).
Click on the image below and notice the + sign in the URL.
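If you want to do this for a batch of links, the trick is easy to script. This is a minimal sketch in R; the function name `bitly_info_url` and the short link in the example are made up for illustration.

```r
# Hypothetical helper: turn a bit.ly short URL into its analytics URL by appending "+"
bitly_info_url <- function(short_url) {
  # Drop any trailing slash or "+" first, so the result ends in exactly one "+"
  cleaned <- sub("[/+]+$", "", short_url)
  paste0(cleaned, "+")
}

# "http://bit.ly/example" is a placeholder, not a real link from this post
info_url <- bitly_info_url("http://bit.ly/example")
```

`sub()` and `paste0()` are vectorised, so the same function works on a whole character vector of links.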
Read more here; this can be more useful than just fun:
Using Bit.ly for Spying, Link Building and Happiness
Unrelated: I interview Hilary Mason, analytics legend and Bit.ly Chief Scientist, here.
Using R for Cricket Analysis #rstats #IPL
    # Downloading the data for batting across all formats of cricket
    library(XML)
    url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=11;template=results;type=batting"
    tables=readHTMLTable(url,stringsAsFactors = FALSE)
    # Note we wrote stringsAsFactors = FALSE in this to avoid getting factor variables,
    # since we will need to convert these variables to numeric variables
    table2=tables$"Overall figures"
    rm(tables)

    # Creating new variables from Span
    table2$Debut=as.numeric(substr(table2$Span,1,4))
    table2$LastYr=as.numeric(substr(table2$Span,6,10))
    table2$YrsPlayed=table2$LastYr-table2$Debut

    # Creating new variables. In cricket a not out score is denoted by *, which can cause a data quality error.
    # This is treated by grepl for finding and gsub for removing the *.
    # Note the double \ to escape the regex character
    table2$HSNotOut=grepl("\\*",table2$HS)
    table2$HS2=gsub("\\*","",table2$HS)

    # Creating a FOR loop (!) to convert variables to numeric variables
    for (i in 3:17) {
      table2[, i] <- as.numeric(table2[, i])
    }

And we see why Sachin Tendulkar is the best (by using ggplot via Deducer).
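The cleaning steps above can be tried offline on a tiny hand-made table. Everything in `toy` below is made up for illustration (player names, spans, and scores are not real Statsguru figures). The `lapply` call at the end is a vectorised alternative to the FOR loop, offered as a sketch rather than as the post's method.

```r
# Made-up rows in the same layout as the scraped columns used above
toy <- data.frame(
  Player = c("Player A", "Player B"),
  Span   = c("1989-2013", "2004-2012"),
  HS     = c("248*", "183"),
  Runs   = c("15000", "8000"),
  stringsAsFactors = FALSE
)

# New variables from Span: first and last year, then years played
toy$Debut     <- as.numeric(substr(toy$Span, 1, 4))
toy$LastYr    <- as.numeric(substr(toy$Span, 6, 9))
toy$YrsPlayed <- toy$LastYr - toy$Debut

# Flag not-out high scores, then strip the * so the score can become numeric
toy$HSNotOut <- grepl("\\*", toy$HS)
toy$HS2      <- as.numeric(gsub("\\*", "", toy$HS))

# Vectorised alternative to the FOR loop: convert the chosen columns in one step
num_cols <- c("Runs")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)
```

Converting the cleaned `HS2` rather than the raw `HS` matters: `as.numeric("248*")` gives `NA` with a warning, which is the data quality error the comments mention.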
Also see
- http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
- http://decisionstats.com/2012/04/07/cricinfo-statsguru-database-for-statistical-and-graphical-analysi
-
Freakonomics Challenge:
- Prove match fixing does not and cannot exist in IPL
- Create an ideal fantasy team
Using R for Cricket Analysis #rstats
ESPN Cricinfo is the best site for cricket data (you can see an earlier detailed post on the database here http://decisionstats.com/2012/04/07/cricinfo-statsguru-database-for-statistical-and-graphical-analysis/ ), and using the XML package in R we can easily scrape and manipulate data.
Here is the code.
    library(XML)
    url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting"
    # Note I can also break the url string and use paste command to modify this url with parameters
    tables=readHTMLTable(url)
    tables$"Overall figures"

    # Now see this: since I only got 50 results in each page, I look at the url of next page
    table1=tables$"Overall figures"
    url="http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;page=2;team=6;template=results;type=batting"
    tables=readHTMLTable(url)
    table2=tables$"Overall figures"

    # Now I need to join these two tables vertically
    table3=rbind(table1,table2)

Note: I can also automate the web scraping. Now the data is within R, we can use something like Deducer to visualize.
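The two notes above (building the URL with paste, and automating the scraping) could be sketched as follows. The function names `build_url` and `scrape_pages` are hypothetical, the parameter names are copied from the URLs in the post, and it is an assumption that the site accepts `page=1` and still serves these pages in the same form.

```r
# Hypothetical helper: build a Statsguru results URL from its parameters
build_url <- function(page = 1, match_class = 1, team = 6) {
  paste0(
    "http://stats.espncricinfo.com/ci/engine/stats/index.html?",
    "class=", match_class, ";page=", page, ";team=", team,
    ";template=results;type=batting"
  )
}

# Hypothetical helper: fetch several pages and stack them vertically.
# `fetch` can be swapped out, so the paging logic can be tested without network access.
scrape_pages <- function(pages,
                         fetch = function(u) XML::readHTMLTable(u)$"Overall figures") {
  do.call(rbind, lapply(pages, function(p) fetch(build_url(page = p))))
}

# Equivalent to the manual two-page version above (needs the XML package and network access):
# table3 <- scrape_pages(1:2)
```

Passing the fetch step as an argument keeps the page loop separate from the scraping call, which makes it easier to adapt if the table name or the site layout changes.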
Created by Pretty R at inside-R.org













