Obfuscate using Rapid Miner

ob·fus·cate/ˈäbfəˌskāt/

Verb:
  1. Render obscure, unclear, or unintelligible.
  2. Bewilder (someone).

A nice geeky operator in Rapid Miner is the Obfuscator:

This operator can be used to anonymize your data. It is possible to save the obfuscating map into a file which can be used to remap the old values and names. Please use the operator Deobfuscator for this.
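To make the round trip concrete, here is a minimal Python sketch of what obfuscating and deobfuscating a dataset involves, assuming the data is a list of dict rows. It is not RapidMiner's implementation and does not read or write RapidMiner's own map file. The function names (`obfuscate`, `deobfuscate`, `save_map`, `load_map`), the `t` + hex token format and the JSON map file are hypothetical choices for illustration.

```python
import json
import secrets


def obfuscate(rows):
    """Anonymize a list of dict rows: every column name and every distinct
    (column, value) pair is replaced by a random token.
    Returns (obfuscated_rows, reverse_map) where reverse_map is {token: original}."""
    used = set()
    name_tokens, value_tokens, reverse = {}, {}, {}

    def new_token():
        # Retry on the (unlikely) collision so the mapping stays one-to-one.
        while True:
            token = "t" + secrets.token_hex(4)
            if token not in used:
                used.add(token)
                return token

    def name_token(col):
        if col not in name_tokens:
            name_tokens[col] = new_token()
            reverse[name_tokens[col]] = col
        return name_tokens[col]

    def value_token(col, val):
        key = (col, val)
        if key not in value_tokens:
            value_tokens[key] = new_token()
            reverse[value_tokens[key]] = val
        return value_tokens[key]

    obfuscated = [{name_token(c): value_token(c, v) for c, v in row.items()}
                  for row in rows]
    return obfuscated, reverse


def deobfuscate(rows, reverse):
    """Undo obfuscate() using the saved reverse map."""
    return [{reverse[c]: reverse[v] for c, v in row.items()} for row in rows]


def save_map(reverse, path):
    """Write the reverse map to a JSON file so the data can be remapped later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(reverse, f)


def load_map(path):
    """Read a reverse map written by save_map()."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping the token map one-to-one is what makes remapping possible: identical values in a column get the same token (so counts and joins still work on the anonymized data), and as long as you keep the saved map, `deobfuscate` restores the original names and values. Lose the map and the data stays anonymous.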

 

RapidMiner is free to download here (it's open source):

http://rapid-i.com/content/view/26/201/

Time Series for Web Analytics

I am mostly language agnostic, though I dislike shoddy design in software (like SAS Enterprise Guide), shoddy websites (like the outdated design of the http://www.r-project.org/ site), and dishonest marketing that invents buzzwords (or, as they say, excessively dishonest marketing).

At the same time I love nicely designed software (Rattle, Rapid Miner, JMP), great websites for software (like http://rstudio.org/) and suitably targeted marketing (like IBM’s), and I appreciate intellectual honesty in a field where honest men are rare to find (http://www.nytimes.com/2012/08/12/business/how-big-data-became-so-big-unboxed.html?_r=1&hpw).

I digress. Here are some papers I find interesting to read.

Fun with Rapid Miner

I fired up my Rapid Miner 5.1 and discovered a tonne of features that I can't get with (cough) other software. I think I need to get some training in this, or even a live project, so I can write more on Rapid Miner.

You need to view the video in full-screen mode if you want to see what I am doing.

Background music is unrelated!

RCOMM 2012 goes live in August

An awesome conference for awesome software. Rapid Miner remains one of the leading enterprise-grade open source packages; it can help you do a lot of things, including flow-driven data modeling, web mining and web crawling, which even other software can't.

Presentations include:

  • Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
  • Handling Big Data (Andras Benczur, MTA SZTAKI)
  • Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
  • and more

Here is the complete program:

 

Program

Tuesday: Training / Workshop 1
Wednesday: Conference 1
Thursday: Conference 2
Friday: Training / Workshop 2

09:00 – 10:30
Introductory Speech
Ingo Mierswa (Rapid-I)

Resource-aware Data Mining or M2M Mining (Invited Talk)
Katharina Morik (TU Dortmund University)


 

Data Analysis

 

NeurophRM: Integration of the Neuroph framework into RapidMiner
Miloš Jovanović, Jelena Stojanović, Milan Vukićević, Vera Stojanović, Boris Delibašić (University of Belgrade)

To be announced (Invited Talk)
Andras Benczur (MTA SZTAKI)

Recommender Systems

 

Extending RapidMiner with Recommender Systems Algorithms
Matej Mihelčić, Nino Antulov-Fantulin, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

Implementation of User Based Collaborative Filtering in RapidMiner
Sérgio Morais, Carlos Soares (Universidade do Porto)

Parallel Training / Workshop Session

Advanced Data Mining and Data Transformations

or

Development Workshop Part 2

10:30 – 11:00
Coffee Break
11:00 – 12:30
Data Analysis

Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner
Mennatallah Amer, Markus Goldstein (DFKI)

Customers’ LifeStyle Targeting on Big Data using Rapid Miner
Maksim Drobyshev (LifeStyle Marketing Ltd)

Robust GPGPU Plugin Development for RapidMiner
Andor Kovács, Zoltán Prekopcsák (Budapest University of Technology and Economics)

Extensions

 

Optimization Plugin For RapidMiner
Venkatesh Umaashankar, Sangkyun Lee (TU Dortmund University; presented by Hendrik Blom)

 

Image Mining Extension – Year After
Radim Burget, Václav Uher, Jan Mašek (Brno University of Technology)

Incorporating R Plots into RapidMiner Reports
Peter Jeszenszky (University of Debrecen)

12:30 – 13:30
Lunch
13:30 – 15:30
Parallel Training / Workshop Session

Basic Data Mining and Data Transformations

or

Development Workshop Part 1

Applications

 

Introduction of RapidAnalytics Enterprise Edition at Telenor Hungary
t.b.a. (Telenor Hungary and United Consult)

 

Application of RapidMiner in Steel Industry Research and Development
Bengt-Henning Maas, Hakan Koc, Martin Bretschneider (Salzgitter Mannesmann Forschung)

A Comparison of Data-driven Models for Forecast River Flow
Milan Cisty, Juraj Bezak (Slovak University of Technology)

Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner
Gábor Nagy, Tamás Henk, Gergő Barta (Budapest University of Technology and Economics)

Extensions

 

An Octave Extension for RapidMiner
Sylvain Marié (Schneider Electric)

 

Unstructured Data

 

Processing Data Streams with the RapidMiner Streams-Plugin
Christian Bockermann, Hendrik Blom (TU Dortmund)

Automated Creation of Corpuses for the Needs of Sentiment Analysis
Peter Koncz, Jan Paralic (Technical University of Kosice)

 

Demonstration: News from the Rapid-I Labs
Simon Fischer (Rapid-I)

This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines using those RapidMiner tools.

Certification Exam
15:30 – 16:00
Coffee Break
16:00 – 18:00
Book Presentation and Game Show

Data Mining for the Masses: A New Textbook on Data Mining for Everyone
Matthew North (Washington & Jefferson College)

Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.

 

Game Show
Did you miss last year’s game show “Who wants to be a data miner?”? Use RapidMiner for problems it was never created for and beat the clock and the other contestants!

User Support

Get some Coffee for free – Writing Operators with RapidMiner Beans
Christian Bockermann, Hendrik Blom (TU Dortmund)

Meta-Modeling Execution Times of RapidMiner operators
Matija Piškorec, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

Conference day ends at ca. 17:00.

19:30
Social Event (Conference Dinner)
Social Event (Visit of Bar District)

 

For more details, have a look at https://rapid-i.com/rcomm2012f/index.php?option=com_content&view=article&id=65

The conference is in Budapest, Hungary, Europe.

(Disclaimer: Rapid Miner is an advertising sponsor of Decisionstats.com, in case you did not notice the two banner-sized ads.)

 

Making Big Data Analytics an API call away

In my latest article I compare some of Amazon’s cloud database offerings with Google’s, especially the Google BigQuery API. With more than two years of development behind it, the Google BigQuery API is a good service to try out if you want to reduce your dependence on database vendors.
Read it at:
Google BigQuery API Makes Big Data Analytics Easy
http://blog.programmableweb.com/2012/08/07/google-bigquery-api-makes-big-data-analytics-easy/
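As a rough illustration of what "an API call away" means, the Python sketch below builds a request for BigQuery's REST `jobs.query` method (a POST to `.../bigquery/v2/projects/{projectId}/queries`) and flattens the JSON rows it returns. It assumes you already hold an OAuth 2.0 access token authorized for BigQuery; the project ID, token and the helper names `build_query_request` and `rows_to_dicts` are placeholders for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request

# REST endpoint for BigQuery's jobs.query method (API v2).
QUERIES_URL = "https://www.googleapis.com/bigquery/v2/projects/{project}/queries"


def build_query_request(project_id, access_token, sql, timeout_ms=10000):
    """Build (but do not send) a synchronous query request."""
    body = {"query": sql, "useLegacySql": False, "timeoutMs": timeout_ms}
    return urllib.request.Request(
        QUERIES_URL.format(project=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,  # OAuth 2.0 bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_to_dicts(response):
    """Turn a {"schema": {"fields": [...]}, "rows": [{"f": [{"v": ...}]}]}
    response into a list of {column_name: value} dicts."""
    names = [field["name"] for field in response["schema"]["fields"]]
    return [
        {name: cell["v"] for name, cell in zip(names, row["f"])}
        for row in response.get("rows", [])
    ]
```

Sending it is then a matter of `urllib.request.urlopen(req)` and `json.load` on the response. If the response reports `jobComplete` as false, the rows are not there yet and you would poll `jobs.getQueryResults` instead. Note that cell values come back as strings, so numeric columns need converting.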

Understanding OAuth 1.0 for #rstats

The lovely, lovely diagram at https://developer.linkedin.com/documents/oauth-overview is worth a thousand words and errors.

Very useful if you are trying to coax RCurl into doing the job for you.

Credits: Idan Gazit
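The step in that diagram that most often produces errors is signing each request. Below is a minimal Python sketch of the OAuth 1.0 HMAC-SHA1 signature as described in RFC 5849: percent-encode and sort the parameters, build the signature base string, then sign it with the consumer secret and token secret joined by `&`. It assumes each parameter appears only once (a dict), that `params` already holds the `oauth_*` parameters plus any query-string and form-body parameters, and that `oauth_signature` itself is excluded. The helper names are hypothetical, and the sketch has not been checked against any particular provider.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """RFC 3986 encoding as OAuth 1.0 requires: only A-Z a-z 0-9 - . _ ~ stay as-is."""
    return quote(str(value), safe="-._~")


def signature_base_string(method, base_url, params):
    """METHOD & encoded URL & encoded, sorted, normalized parameter string."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(param_string)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """HMAC-SHA1 over the base string, keyed with consumer_secret&token_secret."""
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

The resulting string goes into the `oauth_signature` field of the `Authorization: OAuth ...` header. Before you have an access token (the request-token step), the token secret is empty, which is why the key can end in a bare `&`.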

 

 

Also, there is a great SlideShare deck in Japanese (no, Google Translate didn't work on PDFs, SlideShares and Scribds, why!!), but it is still very lucid on using OAuth with R for Twitter.

Why use OAuth? You get 350 calls per hour for authenticated sessions, compared with 150 calls per hour for unauthenticated ones.
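The difference is easy to quantify: 3600 s / 150 calls allows one call every 24 s, while 3600 s / 350 calls allows one call roughly every 10.3 s. A minimal client-side throttle that spaces calls evenly to stay under such an hourly quota might look like the Python sketch below. The `Throttle` class and its even-spacing policy are assumptions for illustration, not something the Twitter API or the twitteR package provides.

```python
import time


def min_interval_seconds(calls_per_hour):
    """Smallest even spacing between calls that stays within an hourly quota."""
    return 3600.0 / calls_per_hour


class Throttle:
    """Space out API calls evenly so an hourly quota is never exceeded."""

    def __init__(self, calls_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(calls_per_hour)
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.next_allowed = None    # no call made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each API request. Even spacing is deliberately conservative: it never bursts, so you will not hit the limit even if the server counts its hour differently from you.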

I tried, but failed, using registerTwitterOAuth.

There is a real need for a single page where you can go and see which social network/website is using what kind of OAuth, which URL within that website has your API keys, and the accompanying R code for the same. Google Plus, LinkedIn, Twitter and Facebook can all be scraped better using OAuth. Something like this:

 

New Free Online Book by Rob Hyndman on Forecasting using #Rstats

From the creator of some of the most widely used packages for time series in the R programming language comes a brand new book, and it's online!

This time the book is free, it will be updated, and 7 chapters are ready (to read!).

If you do forecasting professionally, now is the time to suggest your own use cases to be featured as the book is completed by the end of 2012. The book is intended as a replacement for Makridakis, Wheelwright and Hyndman (Wiley 1998).

http://otexts.com/fpp/

The book is written for three audiences:

(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;

(2) undergraduate students studying business;

(3) MBA students doing a forecasting elective.

The book is different from other forecasting textbooks in several ways.

  • It is free and online, making it accessible to a wide audience.
  • It is continuously updated. You don’t have to wait until the next edition for errors to be removed or new methods to be discussed. We will update the book frequently.
  • There are dozens of real data examples taken from our own consulting practice. We have worked with hundreds of businesses and organizations helping them with forecasting issues, and this experience has contributed directly to many of the examples given here, as well as guiding our general philosophy of forecasting.
  • We emphasise graphical methods more than most forecasters. We use graphs to explore the data, analyse the validity of the models fitted and present the forecasting results.

A print version and a downloadable e-version of the book will be available to purchase on Amazon, but not until a few more chapters are written.


(Ajay: Support the open textbook movement!)

If you’ve found this book helpful, please consider helping to fund free, open and online textbooks. (Donations via PayPal.)

Look for yourself at http://otexts.com/fpp/