Tag Archives: data
Using Rapid Miner and R for Sports Analytics #rstats
Ajay- Why did you choose Rapid Miner and R? What were the other software alternatives you considered and discarded?
Analyst- We considered most of the other major players in statistics/data mining or enterprise BI. However, we found that the value proposition for an open source solution was too compelling to justify the premium pricing that the commercial solutions would have required. The widespread adoption of R and the variety of packages and algorithms available for it made it an easy choice. We liked RapidMiner as a way to design structured, repeatable processes, and for the ability to optimize learner parameters in a systematic way. It also handled large data sets better than R on 32-bit Windows did. The GUI, particularly when 5.0 was released, made it more usable than R for analysts who weren’t experienced programmers.
Ajay- What analytics do you think Rapid Miner and R are best suited for?
Analyst- We use RM+R mainly for sports analysis so far, rather than for more traditional business applications. It has been quite suitable for that, and I can easily see how it would be used for other types of applications.
Ajay- Any experiences as an enterprise customer? How was the installation process? How good is the enterprise level support?
Analyst- Rapid-I has been one of the most responsive tech companies I’ve dealt with, either in my current role or with previous employers. They are small enough to be able to respond quickly to requests, and in more than one case, have fixed a problem, or added a small feature we needed within a matter of days. In other cases, we have contracted with them to add larger pieces of specific functionality we needed at reasonable consulting rates. Those features are added to the mainline product, and become fully supported through regular channels. The longer consulting projects have typically had a turnaround of just a few weeks.
Ajay- What challenges, if any, did you face in executing a pure open source analytics bundle?
Analyst- As Rapid-I is a smaller company based in Europe, the availability of training and consulting in the USA isn’t as extensive as for the major enterprise software players, and the time zone differences sometimes slow down the communications cycle. There were times when we were the first customer to attempt a specific integration point in our technical environment, and with no prior experiences to fall back on, we had to work with Rapid-I to figure out how to do it. Compared to what traditional software vendors provide, both R and RM tend to have sparse, terse, occasionally incomplete documentation. The situation is getting better, but still lags behind what the traditional enterprise software vendors provide.
Ajay- What are the things you can do in R, and what are the things you prefer to do in Rapid Miner (comparison for technical synergies)?
Analyst- Our experience has been that RM is superior to R at writing and maintaining structured processes, better at handling larger amounts of data, and more flexible at fine-tuning model parameters automatically. The biggest limitation we’ve had with RM compared to R is that R has a larger library of user-contributed packages for additional data mining algorithms. Sometimes we opted to use R because RM hadn’t yet implemented a specific algorithm. The introduction of the R extension has allowed us to combine the strengths of both tools in a very logical and productive way.
In particular, extending RapidMiner with R helped address RM’s weakness in the breadth of algorithms, because it brings the entire R ecosystem into RM (similar to how Rapid-I implemented much of the Weka library early on in RM’s development). Further, because the R user community releases packages that implement new techniques faster than the enterprise vendors can, this helps turn a potential weakness into a potential strength. However, R packages tend to be of varying quality, and are more prone to go stale due to lack of support/bug fixes. This depends heavily on the package’s maintainer and its prevalence of use in the R community. So when RapidMiner has a learner with a native implementation, it’s usually better to use it than the R equivalent.
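To make the point about optimizing learner parameters "in a systematic way" a little more concrete, here is a minimal Python sketch of a grid search. It is not how RapidMiner or R implement it. The toy k-nearest-neighbour learner, the leave-one-out scoring, the sample data and every function name below are assumptions made purely for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of the toy k-NN learner for a given k."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th example
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)


def grid_search(data, candidates):
    """Score every candidate k and return (best_k, scores).

    Ties are broken in favour of the candidate listed first.
    """
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: (feature, label) pairs in two well-separated groups.
toy = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (1.0, "b"), (1.1, "b"), (1.2, "b")]
best_k, scores = grid_search(toy, [1, 3, 5])
print(best_k, scores)
```

The idea is simply that every candidate setting is scored the same way on the same data, so the choice of parameter is repeatable rather than hand-tuned.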
RCOMM 2012 goes live in August
An awesome conference by an awesome software! Rapid Miner remains one of the leading enterprise-grade open source software packages. It can help you do a lot of things, including flow-driven data modeling, web mining and web crawling, which even other software can’t.
Presentations include:
- Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
- Handling Big Data (Andras Benczur, MTA SZTAKI)
- Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
- and more
Here is the complete program:
Program
Time | Tuesday | Wednesday | Thursday | Friday
09:00 – 10:30 | Introductory Speech, Ingo Mierswa (Rapid-I); Resource-aware Data Mining or M2M Mining (Invited Talk), Katharina Morik (TU Dortmund University); Data Analysis: NeurophRM: Integration of the Neuroph framework into RapidMiner | To be announced (Invited Talk), Andras Benczur; Recommender Systems: Extending RapidMiner with Recommender Systems Algorithms; Implementation of User Based Collaborative Filtering in RapidMiner | Parallel Training / Workshop Session: Advanced Data Mining and Data Transformations or | |
10:30 – 11:00 | Coffee Break | Coffee Break | Coffee Break | |
11:00 – 12:30 | Data Analysis: Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner; Customers’ LifeStyle Targeting on Big Data using Rapid Miner; Robust GPGPU Plugin Development for RapidMiner | Extensions: Optimization Plugin For RapidMiner; Image Mining Extension – Year After; Incorporating R Plots into RapidMiner Reports | | |
12:30 – 13:30 | Lunch | Lunch | Lunch | |
13:30 – 15:30 | Parallel Training / Workshop Session: Basic Data Mining and Data Transformations or | Applications: Introduction of RapidAnalytics Enterprise Edition at Telenor Hungary; Application of RapidMiner in Steel Industry Research and Development; A Comparison of Data-driven Models for Forecast River Flow; Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner | Extensions: An Octave Extension for RapidMiner; Unstructured Data: Processing Data Streams with the RapidMiner Streams-Plugin; Automated Creation of Corpuses for the Needs of Sentiment Analysis; Demonstration: News from the Rapid-I Labs. This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines by using those RapidMiner tools. | Certification Exam |
15:30 – 16:00 | Coffee Break | Coffee Break | Coffee Break | |
16:00 – 18:00 | Book Presentation and Game Show: Data Mining for the Masses: A New Textbook on Data Mining for Everyone. Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems; Game Show | User Support: Get some Coffee for free – Writing Operators with RapidMiner Beans; Meta-Modeling Execution Times of RapidMiner operators. Conference day ends at ca. 17:00. | | |
19:30 | Social Event (Conference Dinner) | Social Event (Visit of Bar District) | | |
You should also have a look at https://rapid-i.com/rcomm2012f/index.php?option=com_content&view=article&id=65
The conference is in Budapest, Hungary, Europe.
(Disclaimer- Rapid Miner is an advertising sponsor of Decisionstats.com, in case you did not notice the two banner-sized ads.)
Interview Rob J Hyndman Forecasting Expert #rstats
Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net).
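For readers new to exponential smoothing, here is a minimal Python sketch of the simplest member of that family, simple exponential smoothing, where each new level is a weighted average of the latest observation and the previous level. This is only an illustration and not the method built for the health budget or the full ETS framework (which also covers trend and seasonal components). The function name, the choice to start the level at the first observation and the sample numbers are all assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return (one-step-ahead forecasts, forecast for the next period).

    Update rule: level = alpha * y + (1 - alpha) * level.
    Assumption: the initial level is set to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")

    level = series[0]
    fitted = []
    for y in series:
        fitted.append(level)  # forecast made before y is observed
        level = alpha * y + (1 - alpha) * level
    return fitted, level


# Hypothetical yearly figures, purely for illustration.
fitted, next_forecast = simple_exponential_smoothing([10, 12, 14], alpha=0.5)
print(fitted, next_forecast)
```

A larger alpha makes the forecast react faster to recent observations, while a smaller alpha smooths more heavily over the history.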
Making Big Data Analytics an API call away
In my latest article, I have compared some of Amazon’s database-in-the-cloud offerings with Google’s, especially the Google BigQuery API. With more than 2 years of development under its belt, the Google BigQuery API is a good service to test out if you want to reduce dependencies on database vendors.
Read it at
Google BigQuery API Makes Big Data Analytics Easy
http://blog.programmableweb.com/2012/08/07/google-bigquery-api-makes-big-data-analytics-easy/
New Free Online Book by Rob Hyndman on Forecasting using #Rstats
From the creator of some of the most widely used packages for time series in the R programming language comes a brand new book, and it’s online!
This time the book is free, will be updated, and 7 chapters are ready (to read!). If you do forecasting professionally, now is the time to suggest your own use cases to be featured as the book gets ready by end-2012. The book is intended as a replacement for Makridakis, Wheelwright and Hyndman (Wiley 1998).
The book is written for three audiences:
(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;
(2) undergraduate students studying business;
(3) MBA students doing a forecasting elective.
The book is different from other forecasting textbooks in several ways.
- It is free and online, making it accessible to a wide audience.
- It is continuously updated. You don’t have to wait until the next edition for errors to be removed or new methods to be discussed. We will update the book frequently.
- There are dozens of real data examples taken from our own consulting practice. We have worked with hundreds of businesses and organizations helping them with forecasting issues, and this experience has contributed directly to many of the examples given here, as well as guiding our general philosophy of forecasting.
- We emphasise graphical methods more than most forecasters. We use graphs to explore the data, analyse the validity of the models fitted and present the forecasting results.
A print version and a downloadable e-version of the book will be available to purchase on Amazon, but not until a few more chapters are written.
Contents
(Ajay-Support the open textbook movement!)
If you’ve found this book helpful, please consider helping to fund free, open and online textbooks. (Donations via PayPal.)
Update!
I have been busy-
1) Finally my divorce came through. My advice – don’t do it without a pre-nup! Alimony means all the money.
2) Spending time on Quora after getting bored of LinkedIn, Twitter, Facebook, Google Plus, Tumblr, WordPress
See this answer to-
1) we will change the world
2) if we get 1% of a billion people market, we will be rich
3) if we have got funding, most of the job is done
4) let’s pay ourselves high salaries since we got funded
5) our idea is awesome and can’t be copied, improved, stolen, replicated
6) startups are painless
7) it is a better life than a corporate career
8) long term vision is more important than short term cash burn
9) we will never sell out or exit. never
10) it’s a great idea to make startups with friends
Say hello to me – http://www.quora.com/Ajay-Ohri/answers
3) Writing freelance articles on APIs for Programmable Web
Why write pro? See point 1)
Recent Articles-
http://blog.programmableweb.com/2012/07/30/predict-the-future-with-google-prediction-api/
http://blog.programmableweb.com/2012/08/01/your-store-in-the-cloud-google-cloud-storage-api/
http://blog.programmableweb.com/2012/07/27/the-romney-vs-obama-api/
4) Writing poetry on http://poemsforkush.com/. It now gets 23,000 views a month. I wish I could say my poems were great, but the readers are kind (364 subscribers!) and also Google Image Search is very very kind.
5) Kicking the tires on my next book, “R for Cloud Computing”, and stay tuned for another writing announcement
6) Waiting for Paul Kent, VP, SAS Big Data to reply to my emails for an interview after HE promised me!! You don’t get to 105 interviews without being a bit stubborn!
7) Sighing at the politics engulfing my American friends, especially with regard to Chick-fil-A and Romney’s gaffes. Now that’s what I call a first world problem! Protesting by eating or boycotting chicken sandwiches! In India we had the world’s biggest blackout two days in a row, and no one is attending the Hunger Fast against corruption protests!
8) Watching the Olympics! Our glorious nation of 1.2 billion very smart people has managed to win 1 Bronze till today!! Michael Phelps has won more medals and more gold than the whole of India has since the Olympic Games began!!
9) Consulting to pay the bills. This includes writing R code and making presentations. Why consult when I have writing to do? See point 1)
10) Reading New York Times to get insights on Big Data and Analytics. Trust them- they know what they are doing!





