Tag Archives: book
RCOMM 2012 goes live in August
An awesome conference for an awesome piece of software. Rapid Miner remains one of the leading enterprise-grade open source tools, and it can help you do a lot of things, including flow-driven data modeling, web mining, and web crawling, that even some other software can’t.
Presentations include:
- Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
- Handling Big Data (Andras Benczur, MTA SZTAKI)
- Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
- and more
Here is the complete program:
Program
Time
|
Tuesday
|
Wednesday
|
Thursday
|
Friday
|
09:00 – 10:30 |
Introductory Speech Ingo Mierswa (Rapid-I) Resource-aware Data Mining or M2M Mining (Invited Talk) Katharina Morik (TU Dortmund University)
Data Analysis
NeurophRM: Integration of the Neuroph framework into RapidMiner |
To be announced (Invited Talk) Andras Benczur Recommender Systems
Extending RapidMiner with Recommender Systems Algorithms Implementation of User Based Collaborative Filtering in RapidMiner |
Parallel Training / Workshop Session
Advanced Data Mining and Data Transformations or |
|
10:30 – 11:00 |
Coffee Break |
Coffee Break |
Coffee Break |
|
11:00 – 12:30 |
Data Analysis
Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner Customers’ LifeStyle Targeting on Big Data using Rapid Miner Robust GPGPU Plugin Development for RapidMiner |
Extensions
Optimization Plugin For RapidMiner
Image Mining Extension – Year After Incorporating R Plots into RapidMiner Reports |
||
12:30 – 13:30 |
Lunch |
Lunch |
Lunch |
|
13:30 – 15:30 |
Parallel Training / Workshop Session
Basic Data Mining and Data Transformations or |
Applications
Introduction of RapidAnalytics Enterprise Edition at Telenor Hungary
Application of RapidMiner in Steel Industry Research and Development A Comparison of Data-driven Models for Forecast River Flow Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner |
Extensions
An Octave Extension for RapidMiner
Unstructured Data
Processing Data Streams with the RapidMiner Streams-Plugin Automated Creation of Corpuses for the Needs of Sentiment Analysis
Demonstration: News from the Rapid-I Labs This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines by using those RapidMiner tools. |
Certification Exam |
15:30 – 16:00 |
Coffee Break |
Coffee Break |
Coffee Break |
|
16:00 – 18:00 |
Book Presentation and Game Show
Data Mining for the Masses: A New Textbook on Data Mining for Everyone Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.
Game Show |
User Support
Get some Coffee for free – Writing Operators with RapidMiner Beans Meta-Modeling Execution Times of RapidMiner operators Conference day ends at ca. 17:00. |
||
19:30 |
Social Event (Conference Dinner) |
Social Event (Visit of Bar District) |
You should also have a look at https://rapid-i.com/rcomm2012f/index.php?option=com_content&view=article&id=65
The conference is in Budapest, Hungary, Europe.
(Disclaimer: Rapid Miner is an advertising sponsor of Decisionstats.com, in case you did not notice the two banner-sized ads.)
Interview Rob J Hyndman Forecasting Expert #rstats
Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.
Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net)
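For readers new to exponential smoothing, here is a minimal Python sketch of its simplest form, simple exponential smoothing. It is not the method Prof Hyndman built for the Australian government, only the most basic member of the family that ETS models generalise. The function name `ses_forecast`, the smoothing weight `alpha=0.5`, and the spending figures are all made up for illustration.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing.

    Returns the forecast for the next period and the list of
    one-step-ahead forecast errors seen along the way.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # start the level at the first observation
    errors = []
    for y in series[1:]:
        errors.append(y - level)  # error of the forecast made one step earlier
        level = alpha * y + (1 - alpha) * level  # blend new data with old level
    return level, errors


# Hypothetical yearly spending figures (illustration only).
spend = [10.0, 12.0, 11.0, 13.0]
forecast, errors = ses_forecast(spend, alpha=0.5)
print(forecast, errors)
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one smooths out more of the noise.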
New Free Online Book by Rob Hyndman on Forecasting using #Rstats
From the creator of some of the most widely used packages for time series in the R programming language comes a brand new book, and it’s online!
This time the book is free, will be updated, and 7 chapters are ready (to read!). If you do forecasting professionally, now is the time to suggest your own use cases to be featured as the book gets ready by end-2012. The book is intended as a replacement for Makridakis, Wheelwright and Hyndman (Wiley 1998).
The book is written for three audiences:
(1) people finding themselves doing forecasting in business when they may not have had any formal training in the area;
(2) undergraduate students studying business;
(3) MBA students doing a forecasting elective.
The book is different from other forecasting textbooks in several ways.
- It is free and online, making it accessible to a wide audience.
- It is continuously updated. You don’t have to wait until the next edition for errors to be removed or new methods to be discussed. We will update the book frequently.
- There are dozens of real data examples taken from our own consulting practice. We have worked with hundreds of businesses and organizations helping them with forecasting issues, and this experience has contributed directly to many of the examples given here, as well as guiding our general philosophy of forecasting.
- We emphasise graphical methods more than most forecasters. We use graphs to explore the data, analyse the validity of the models fitted and present the forecasting results.
A print version and a downloadable e-version of the book will be available to purchase on Amazon, but not until a few more chapters are written.
(Ajay-Support the open textbook movement!)
If you’ve found this book helpful, please consider helping to fund free, open and online textbooks. (Donations via PayPal.)
Little Book of R For Time Series #rstats
I loved this book. Only 75 pages, very lucidly written, and available on GitHub for free. Nice job by Avril Coghlan (a.coghlan@ucc.ie).
Of course, my usual suspects for time series readings are:
1) The seminal pdf (2008!!) by a certain Prof Hyndman
http://www.maths.anu.edu.au/~johnm/courses/r/ASC2008/pdf/Rtimeseries-ohp.pdf
2) JSS Paper: Automatic Time Series Forecasting: The forecast Package for R http://www.jstatsoft.org/v27/i03/paper
3) The CRAN View http://cran.r-project.org/web/views/TimeSeries.html
This view is cluttered and getting more so. Some help for recent converts to R, especially in the field of corporate forecasting or time series for business analytics, would really help.
Avril does an awesome job with this curiously named booklet at http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html
Update!
I have been busy-
1) My divorce finally came through. My advice: don’t do it without a pre-nup! Alimony means all the money.
2) Spending time on Quora after getting bored with LinkedIn, Twitter, Facebook, Google Plus, Tumblr, and WordPress
See this answer to-
1) we will change the world
2) if we get 1% of a billion-people market, we will be rich
3) if we have got funding, most of the job is done
4) let’s pay ourselves high salaries since we got funded
5) our idea is awesome and can’t be copied, improvised, stolen, replicated
6) startups are painless
7) it is a better life than a corporate career
8) long term vision is more important than short term cash burn
9) we will never sell out or exit. never
10) it’s a great idea to make startups with friends
Say hello to me – http://www.quora.com/Ajay-Ohri/answers
3) Writing freelance articles on APIs for Programmable Web
Why write pro? See point 1)
Recent Articles-
http://blog.programmableweb.com/2012/07/30/predict-the-future-with-google-prediction-api/
http://blog.programmableweb.com/2012/08/01/your-store-in-the-cloud-google-cloud-storage-api/
http://blog.programmableweb.com/2012/07/27/the-romney-vs-obama-api/
4) Writing poetry on http://poemsforkush.com/. It now gets 23000 views a month. I wish I could say my poems were great, but the readers are kind (364 subscribers!) and also Google Image Search is very very kind.
5) Kicking the tires on my next book, “R for Cloud Computing”; stay tuned for another writing announcement
6) Waiting for Paul Kent, VP, SAS Big Data, to reply to my emails for an interview after HE promised me!! You don’t get to 105 interviews without being a bit stubborn!
7) Sighing at the politics engulfing my American friends, especially with regard to Chick-fil-A and Romney’s gaffes. Now that’s what I call a first world problem! Protesting by eating or boycotting chicken sandwiches! In India we had the world’s biggest blackout two days in a row, and no one is attending the hunger fast protests against corruption!
8) Watching the Olympics! Our glorious nation of 1.2 billion very smart people has managed to win 1 Bronze till today!! Michael Phelps has won more medals and more gold than the whole of India has since the Olympic Games began!!
9) Consulting to pay the bills. This includes writing R code and making presentations. Why consult when I have writing to do? See point 1)
10) Reading New York Times to get insights on Big Data and Analytics. Trust them- they know what they are doing!
Interview John Myles White , Machine Learning for Hackers
Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.
Ajay- What inspired you guys to write Machine Learning for Hackers? What has been the public response to the book? Are you planning to write a second edition or a next book?
John-We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.
Ajay- What are the key things that a potential reader can learn from this book?
John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVM’s. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
Ajay- Describe your journey as a science student up till your PhD. What are your current research interests and what initiatives have you taken with them?
John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?
John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.
(TIL he has played in several rock bands!)
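To give a flavor of the regularization and cross-validation John hopes readers will pick up, here is a small Python sketch. It is not taken from the book (which uses R); it assumes a toy one-feature ridge regression without an intercept, and the helper names, the data, and the candidate penalties are all made up for illustration.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_slope(xs, ys, lam):
    """Ridge estimate for y ~ w * x (no intercept): w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k=3):
    """Mean squared prediction error over all held-out points."""
    total = 0.0
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_slope(train_x, train_y, lam)  # fit on the other folds only
        total += sum((ys[i] - w * xs[i]) ** 2 for i in fold)
    return total / len(xs)


# Made-up data that roughly follows y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Score a few hypothetical penalty strengths and keep the one with the lowest CV error.
scores = {lam: cv_error(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
best_lam = min(scores, key=scores.get)
print(scores, best_lam)
```

The point is that the penalty is chosen by how well the model predicts data it has not seen, rather than by how well it fits the training data.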
Rapid Miner User Conference 2012
One of those cool conferences that is on my bucket list, this time in Hungary (that’s a nice place).
But I am especially interested in seeing how far Radoop has come along !
Disclaimer- Rapid Miner has been a Decisionstats.com sponsor for many years. It is also a very cool software but I like the R Extension facility even more!
—————————————————————
And not very expensive either, compared to other user conferences in Europe!
http://rcomm2012.org/index.php/registration/prices
Information about Registration
- Early Bird registration until July 20th, 2012.
- Normal registration from July 21st, 2012 until August 13th, 2012.
- Latest registration from August 14th, 2012 until August 24th, 2012.
- Students have to provide a valid Student ID during registration.
- The Dinner is included in the All Days and in the Conference packages.
- All prices below are net prices. Value added tax (VAT) has to be added if applicable.
Prices for Regular Visitors
| Days and Event | Early Bird Rate | Normal Rate | Latest Registration |
|---|---|---|---|
| Tuesday (Training / Development 1) | 190 Euro | 230 Euro | 280 Euro |
| Wednesday + Thursday (Conference) | 290 Euro | 350 Euro | 420 Euro |
| Friday (Training / Development 2 and Exam) | 190 Euro | 230 Euro | 280 Euro |
| All Days (Full Package) | 610 Euro | 740 Euro | 900 Euro |
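Since all prices are net, a quick Python sketch can show what the full package saves over booking the days separately and what a gross price would look like. The net prices are copied from the table above; the 27% VAT rate in the example is only an assumption for illustration, because the page does not say which rate applies to whom, and the function names are made up.

```python
from decimal import Decimal

# Net prices in euro for regular visitors, copied from the table above.
PRICES = {
    "early": {"tuesday": 190, "conference": 290, "friday": 190, "all_days": 610},
    "normal": {"tuesday": 230, "conference": 350, "friday": 230, "all_days": 740},
    "latest": {"tuesday": 280, "conference": 420, "friday": 280, "all_days": 900},
}


def package_saving(rate):
    """Euro saved by the full package compared with booking each part separately."""
    p = PRICES[rate]
    return p["tuesday"] + p["conference"] + p["friday"] - p["all_days"]


def gross_price(net, vat_rate):
    """Net price plus VAT, rounded to cents. vat_rate is a string such as '0.27'."""
    return (Decimal(net) * (1 + Decimal(vat_rate))).quantize(Decimal("0.01"))


for rate in PRICES:
    print(rate, package_saving(rate))

# Assumed VAT rate of 27%, for illustration only.
print(gross_price(PRICES["early"]["all_days"], "0.27"))
```

For regular visitors the full package comes out 60, 70, and 80 euro cheaper than the separate bookings at the early bird, normal, and latest rates respectively.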
Prices for Authors and Students
In case of students, please note that you will have to provide a valid student ID during registration.
| Days and Event | Early Bird Rate | Normal Rate | Latest Registration |
|---|---|---|---|
| Tuesday (Training / Development 1) | 90 Euro | 110 Euro | 140 Euro |
| Wednesday + Thursday (Conference) | 140 Euro | 170 Euro | 210 Euro |
| Friday (Training / Development 2 and Exam) | 90 Euro | 110 Euro | 140 Euro |
| All Days (Full Package) | 290 Euro | 350 Euro | 450 Euro |
Time
|
Tuesday
|
Wednesday
|
Thursday
|
Friday
|
09:00 – 10:30 |
Introductory Speech Ingo Mierswa; Rapid-I
Data Analysis
NeurophRM: Integration of the Neuroph framework into RapidMiner |
To be announced (Invited Talk) To be announced
Recommender Systems
Extending RapidMiner with Recommender Systems Algorithms Implementation of User Based Collaborative Filtering in RapidMiner |
Parallel Training / Workshop Session
Advanced Data Mining and Data Transformations or |
|
10:30 – 12:30 |
Data Analysis
Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner Customers’ LifeStyle Targeting on Big Data using Rapid Miner Robust GPGPU Plugin Development for RapidMiner |
Extensions
Image Mining Extension – Year After Incorporating R Plots into RapidMiner Reports An Octave Extension for RapidMiner |
||
12:30 – 13:30 |
Lunch |
Lunch |
Lunch |
|
13:30 – 15:00 |
Parallel Training / Workshop Session
Basic Data Mining and Data Transformations or |
Applications
Application of RapidMiner in Steel Industry Research and Development A Comparison of Data-driven Models for Forecast River Flow Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner |
Unstructured Data
Processing Data Streams with the RapidMiner Streams-Plugin Automated Creation of Corpuses for the Needs of Sentiment Analysis
Demonstration
News from the Rapid-I Labs This short session demonstrates the latest developments from the Rapid-I lab and will show you how you can build powerful analysis processes and routines by using those RapidMiner tools. |
Certification Exam |
15:00 – 17:00 |
Book Presentation and Game Show
Data Mining for the Masses: A New Textbook on Data Mining for Everyone Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.
Game Show |
User Support
Get some Coffee for free – Writing Operators with RapidMiner Beans Meta-Modeling Execution Times of RapidMiner operators |
||
19:00 |
Social Event (Conference Dinner) |
Social Event (Visit of Bar District) |
Training: Basic Data Mining and Data Transformations
This is a short introductory training course for users who are not yet familiar with RapidMiner or have only a little experience with it so far. The topics of this training session include
- Basic Usage
- User Interface
- Creating and handling RapidMiner repositories
- Starting a new RapidMiner project
- Operators and processes
- Loading data from flat files
- Storing data, processes, and results
- Predictive Models
- Linear Regression
- Naïve Bayes
- Decision Trees
- Basic Data Transformations
- Changing names and roles
- Handling missing values
- Changing value types by discretization and dichotomization
- Normalization and standardization
- Filtering examples and attributes
- Scoring and Model Evaluation
- Applying models
- Splitting data
- Evaluation methods
- Performance criteria
- Visualizing Model Performance
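For readers who have not met the basic data transformations listed above, here is a rough Python sketch of what four of them do to a column of numbers. In RapidMiner these are operators you drag into a process; the code below is not RapidMiner's API, and the function names, the age column, the bin edges, and the threshold are all assumptions made up for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


def discretize(values, edges, labels):
    """Turn numbers into categories; a value equal to an edge goes to the upper bin."""
    return [labels[bisect_right(edges, v)] for v in values]


def dichotomize(values, threshold):
    """Turn numbers into a 0/1 flag: 1 if above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


ages = [20, 30, 40, 60]  # hypothetical attribute
print(normalize(ages))
print(standardize(ages))
print(discretize(ages, edges=[30, 60], labels=["low", "mid", "high"]))
print(dichotomize(ages, threshold=35))
```

Normalization and standardization keep the values numeric but put them on a common scale, while discretization and dichotomization change the value type from numeric to categorical or binary.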
Training: Advanced Data Mining and Data Transformations
This is a short introductory training course for users who already know some basic concepts of RapidMiner and data mining and have already used the software before, for example in the first training on Tuesday. The topics of this training session include
- Advanced Data Handling
- Sampling
- Balancing data
- Joins and Aggregations
- Detection and removal of outliers
- Dimensionality reduction
- Control process execution
- Remember process results
- Recall process results
- Loops
- Using branches and conditions
- Exception handling
- Definition of macros
- Usage of macros
- Definition of log values
- Clearing log tables
- Transforming log tables to data
Development Workshop Part 1 and Part 2
Want to exchange ideas with the developers of RapidMiner? Or learn more tricks for developing your own operators and extensions? During our development workshops on Tuesday and Friday, we will form small groups of developers, each working on a small development project around RapidMiner. Beginners will get a comprehensive overview of the architecture of RapidMiner before taking their first steps and learning how to write their own operators. Advanced developers will form groups with our experienced developers, identify shortcomings of RapidMiner, and develop a new extension which might even be presented during the conference. Unfinished work can be continued in the second workshop on Friday, before results are published on the Marketplace or taken home as a starting point for new custom operators.





