DECISION STATS

Rapid Miner User Conference 2012

One of those cool conferences that are on my bucket list, this time in Hungary (that’s a nice place).

But I am especially interested in seeing how far Radoop has come along!

Disclaimer- Rapid Miner has been a Decisionstats.com sponsor for many years. It is also a very cool software but I like the R Extension facility even more!

—————————————————————

And it is not very expensive either, compared to other user conferences in Europe!

http://rcomm2012.org/index.php/registration/prices

Information about Registration

Early Bird registration until July 20th, 2012.
Normal registration from July 21st, 2012 until August 13th, 2012.
Latest registration from August 14th, 2012 until August 24th, 2012.
Students have to provide a valid Student ID during registration.
The Dinner is included in the All Days and in the Conference packages.
All prices below are net prices. Value added tax (VAT) has to be added if applicable.

Prices for Regular Visitors

Days and Event	Early Bird Rate	Normal Rate	Latest Registration
Tuesday (Training / Development 1)	190 Euro	230 Euro	280 Euro
Wednesday + Thursday (Conference)	290 Euro	350 Euro	420 Euro
Friday (Training / Development 2 and Exam)	190 Euro	230 Euro	280 Euro
All Days *(Full Package)*	610 Euro	740 Euro	900 Euro

Prices for Authors and Students

In case of students, please note that you will have to provide a valid student ID during registration.

Days and Event	Early Bird Rate	Normal Rate	Latest Registration
Tuesday (Training / Development 1)	90 Euro	110 Euro	140 Euro
Wednesday + Thursday (Conference)	140 Euro	170 Euro	210 Euro
Friday (Training / Development 2 and Exam)	90 Euro	110 Euro	140 Euro
All Days *(Full Package)*	290 Euro	350 Euro	450 Euro

http://rcomm2012.org/index.php/program

Program

Time Slot	Tuesday Training / Workshop 1	Wednesday Conference 1	Thursday Conference 2	Friday Training / Workshop 2
09:00 – 10:30		Introductory Speech Ingo Mierswa; Rapid-I Data Analysis NeurophRM: Integration of the Neuroph framework into RapidMiner Miloš Jovanović, Jelena Stojanović, Milan Vukićević, Vera Stojanović, Boris Delibašić (University of Belgrade)	To be announced (Invited Talk) To be announced Recommender Systems Extending RapidMiner with Recommender Systems Algorithms Matej Mihelčić, Nino Antulov-Fantulin, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute) Implementation of User Based Collaborative Filtering in RapidMiner Sérgio Morais, Carlos Soares (Universidade do Porto)	Parallel Training / Workshop Session Advanced Data Mining and Data Transformations or Development Workshop Part 2
10:30 – 12:30		Data Analysis Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner Mennatallah Amer, Markus Goldstein (DFKI) Customers’ LifeStyle Targeting on Big Data using Rapid Miner Maksim Drobyshev (LifeStyle Marketing Ltd) Robust GPGPU Plugin Development for RapidMiner Andor Kovács, Zoltán Prekopcsák (Budapest University of Technology and Economics)	Extensions Image Mining Extension – Year After Radim Burget, Václav Uher, Jan Mašek (Brno University of Technology) Incorporating R Plots into RapidMiner Reports Peter Jeszenszky (University of Debrecen) An Octave Extension for RapidMiner Sylvain Marié (Schneider Electric)
12:30 – 13:30		Lunch	Lunch	Lunch
13:30 – 15:00	Parallel Training / Workshop Session Basic Data Mining and Data Transformations or Development Workshop Part 1	Applications Application of RapidMiner in Steel Industry Research and Development Bengt-Henning Maas, Hakan Koc, Martin Bretschneider (Salzgitter Mannesmann Forschung) A Comparison of Data-driven Models for Forecast River Flow Milan Cisty, Juraj Bezak (Slovak University of Technology) Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner Gábor Nagy, Tamás Henk, Gergő Barta (Budapest University of Technology and Economics)	Unstructured Data Processing Data Streams with the RapidMiner Streams-Plugin Christian Bockermann, Hendrik Blom (TU Dortmund) Automated Creation of Corpuses for the Needs of Sentiment Analysis Peter Koncz, Jan Paralic (Technical University of Kosice) Demonstration News from the Rapid-I Labs Simon Fischer; Rapid-I This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines by using those RapidMiner tools.	Certification Exam
15:00 – 17:00		Book Presentation and Game Show Data Mining for the Masses: A New Textbook on Data Mining for Everyone Matthew North (Washington & Jefferson College) Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems. Game Show Did you miss last year’s game show “Who wants to be a data miner?”? Use RapidMiner for problems it was never created for and beat the time and other contestants!	User Support Get some Coffee for free – Writing Operators with RapidMiner Beans Christian Bockermann, Hendrik Blom (TU Dortmund) Meta-Modeling Execution Times of RapidMiner operators Matija Piškorec, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)
19:00		Social Event (Conference Dinner)	Social Event (Visit of Bar District)

Training: Basic Data Mining and Data Transformations

This is a short introductory training course for users who are not yet familiar with RapidMiner or have only a little experience with it so far. The topics of this training session include

Basic Usage
- User Interface
- Creating and handling RapidMiner repositories
- Starting a new RapidMiner project
- Operators and processes
- Loading data from flat files
- Storing data, processes, and results
Predictive Models
- Linear Regression
- Naïve Bayes
- Decision Trees
Basic Data Transformations
- Changing names and roles
- Handling missing values
- Changing value types by discretization and dichotomization
- Normalization and standardization
- Filtering examples and attributes
Scoring and Model Evaluation
- Applying models
- Splitting data
- Evaluation methods
- Performance criteria
- Visualizing Model Performance

Training: Advanced Data Mining and Data Transformations

This is a short introductory training course for users who already know some basic concepts of RapidMiner and data mining and have already used the software before, for example in the first training on Tuesday. The topics of this training session include

Advanced Data Handling
- Sampling
- Balancing data
- Joins and Aggregations
- Detection and removal of outliers
- Dimensionality reduction
Control process execution
- Remember process results
- Recall process results
- Loops
- Using branches and conditions
- Exception handling
- Definition of macros
- Usage of macros
- Definition of log values
- Clearing log tables
- Transforming log tables to data

Development Workshop Part 1 and Part 2

Want to exchange ideas with the developers of RapidMiner? Or learn more tricks for developing your own operators and extensions? During our development workshops on Tuesday and Friday, we will build small groups of developers, each working on a small development project around RapidMiner. Beginners will get a comprehensive overview of the architecture of RapidMiner before taking the first steps and learning how to write their own operators. Advanced developers will form groups with our experienced developers, identify shortcomings of RapidMiner and develop a new extension which might already be presented during the conference. Unfinished work can be continued in the second workshop on Friday before results are published on the Marketplace or taken home as a starting point for new custom operators.

Talking on Big Data Analytics

I am ~~going~~ being sponsored to a Government of India sponsored talk on Big Data Analytics at Bangalore on Friday the 13th of July. If you are in Bangalore, India you may drop in for a dekko. Schedule and Abstracts (I am on page 7 out of 9).

Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like the RevoScaleR package from Revolution Analytics, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure), and approaches using PiCloud. It will also showcase some of Google’s APIs for Big Data Analysis.

Lastly we will talk on social media analysis, national security use cases (i.e. cyber war) and privacy hazards of big data analytics.

Working with a large number of files for reading into R #rstats

The dir() and list.files() commands list all the files in a particular directory. These files can then be read into R interactively by indexing into the character vector that the two commands return. This is useful when you are working with a large number of files that get generated or re-generated after specific time periods (like web server log files).

> getwd()
[1] "C:/Users/KUs/Documents"
> path="C:/Users/KUs/Desktop/tester"
> dir(path)
[1] "tester.csv" "tester2.csv" "tester3.csv" "tester4.csv"
> setwd(path)
> # first file is tab separated, so sep is "\t"
> read.table(file=dir(path)[1],sep="\t",header=TRUE)
X1 X2 X3 X4
1 to be 2 B

> # fourth file is comma separated
> read.table(file=dir(path)[4],sep=",",header=TRUE)
zoo bee doo bee.1 daa
1 12 32 43 34 qwerty
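
The same indexing trick extends to reading every file in one go. Here is a rough sketch, not the exact session above: the folder is a temporary directory that the code fills with three made-up CSV files, so the file names and the columns id and value are assumptions for illustration only.

```r
# Hypothetical demo folder: a temporary directory with three small CSV files
path <- file.path(tempdir(), "tester")
dir.create(path, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file = file.path(path, paste0("tester", i, ".csv")),
            row.names = FALSE)
}

# List only the CSV files, with full paths so that setwd() is not needed
files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, named by file name
tables <- lapply(files, read.csv)
names(tables) <- basename(files)

# Pick out one table by name instead of by position
tables[["tester2.csv"]]
```

Using full.names = TRUE keeps the working directory untouched, which helps when the script also has to write output somewhere else.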

Happy Higgs- Boson Day

Dark Matters from PHD Comics on Vimeo.

The Higgs Boson Explained from PHD Comics on Vimeo.

Saving Output in R for Presentations

While the SAS language has a beautifully designed ODS (Output Delivery System) for saving output from certain analyses in Excel files (and HTML and others), in R one can simply take the object, pass it to write.table and save it to a csv file using the file parameter of write.table.

As a business analytics consultant, I have to present the output from a Proc Means or Proc Freq (SAS) or a summary/describe/table command (in R) as a final report. Copying and pasting is not feasible, especially for large amounts of text or on remote computers.

Using the following, we can simply save the output in R:

> getwd()
[1] "C:/Users/KUs/Desktop/Ajay"
> setwd("C:/Users/KUs/Desktop")

# We shifted the directory so that we can save output without typing the entire path again for each step.

# I have found the summary command most useful for initial analysis and final display (particularly during the data munging step).

> nams=summary(ajay)

# I assigned the analysis step (summary) to a new object; it could also be summary, names, describe (Hmisc) or table (for frequency analysis).
> write.table(nams,sep=",",file="output.csv")

Note: This is for basic beginners in R who use it for business analytics and deal with a large number of variables.
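
For anyone who wants to try this without my data, here is a self-contained sketch of the same idea. The built-in mtcars data set stands in for the ajay object, and the file goes to a temporary folder rather than the Desktop; both are stand-ins, not part of the original session.

```r
# Built-in mtcars stands in for the data set called `ajay` above
stats <- sapply(mtcars[, c("mpg", "hp", "wt")], summary)

# Write to a temporary file here; in practice use a name such as "output.csv"
out_file <- file.path(tempdir(), "output.csv")
write.csv(stats, file = out_file)

# Read the file back to check what actually landed on disk
check <- read.csv(out_file, row.names = 1)
check
```

sapply() with summary gives one column per variable and one row per statistic, which tends to open more cleanly in a spreadsheet than the printed summary of a whole data frame.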

pps: Note

If you have a large number of files in a local directory to be read in R, you can avoid typing the entire path again and again by modifying the file parameter in the read.table and changing the working directory to that folder

setwd("C:/Users/KUs/Desktop/")
ajayt1=read.table(file="test1.csv",sep=",",header=TRUE)

ajayt2=read.table(file="test2.csv",sep=",",header=TRUE)

and so on…

Maybe there is a better approach somewhere on Stack Overflow or R help, but this will work just as well.

You can then merge the objects created, ajayt1 and ajayt2… (to be continued)
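
Until that follow-up, here is one possible way to combine the objects, as a sketch only. The two small data frames below are made up to stand in for ajayt1 and ajayt2, and the regions table with its id key is also hypothetical.

```r
# Two hypothetical tables standing in for ajayt1 and ajayt2
ajayt1 <- data.frame(id = 1:3, sales = c(10, 20, 30))
ajayt2 <- data.frame(id = 4:5, sales = c(40, 50))

# Same columns in both tables: stack the rows with rbind()
stacked <- rbind(ajayt1, ajayt2)

# Different columns that share a key: join on that key with merge()
regions <- data.frame(id = c(1L, 2L, 4L),
                      region = c("north", "south", "east"))
joined <- merge(stacked, regions, by = "id")

joined
```

rbind() suits files that are slices of the same log, while merge() suits files that describe the same records from different angles. By default merge() keeps only the ids present in both tables.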

Awesome website for #rstats Mining Twitter using R

Just came across this very awesome website.

Did you know there are six kinds of wordclouds in R?

(giggles like a little boy)

https://sites.google.com/site/miningtwitter/questions/talking-about

Simple Wordcloud	Comparison Wordcloud
Tweets about some given topic	Tweets of some given user (ex 1)
Tweets of some given user (ex 2)	Modified tag-cloud

This guy – the force is strong in him

Gaston Sanchez
Data Analysis + Visualization + Statistics + R = FUN

http://www.gastonsanchez.com/about


Contact Info: gaston.stat@gmail.com

About: Currently, I’m a postdoc in Rasmus Nielsen’s Lab in the Center for Theoretical Evolutionary Genomics at the University of California, Berkeley. I’m also collaborating with the Biology Scholars Program (BSP) at UC Berkeley, and I am affiliated to the Program on Reproductive Health and the Environment (PRHE) at UC San Francisco. In my (scarce) free time outside the academic world, I often work on collaborative projects for marketing analytics, statistical consulting, and statistical advising in general.

2012 Web Analytics H1

Decisionstats.com is doing okay, it seems, as per my web analytics software.

Decisionstats.com -2012-06-30

And the poetry traffic is getting a lot more love now!

At http://poemsforkush.com
