Browsing Update: Dear Decisionstats.com Reader

In view of the recent root-level breach of WordPress, which may mean checking the source code for hidden hacks or Trojans, please note that, effective immediately, Decisionstats.com accepts no responsibility for any viruses or Trojans that you may inadvertently download while on this website. I will be responsible for any deliberate malicious honey traps I put up, but anybody posting an interesting comment with a link on this website can direct you to a phishing site.

All disputes will be subject to the jurisdiction of Tis Hazari Court, Delhi, India, as already mentioned.

New book on BigData Analytics and Data mining using #Rstats with a GUI

I am hoping to put this on my pre-order list or Amazon Wish List. This is the book for the common people who wanted to do data mining but were unable to admit aloud that they didn't know much. It is written by the seminal Australian authority on data mining, Dr Graham Williams, whom I interviewed here: https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

Data Mining for the masses using an ergonomically designed Graphical User Interface.

Thank you, Springer. Thank you, Dr Graham Williams.

http://www.springer.com/statistics/physical+%26+information+science/book/978-1-4419-9889-7

Data Mining with Rattle and R

The Art of Excavating Data for Knowledge Discovery

Series: Use R

Williams, Graham

1st Edition, 2011, XX, 409 p., 150 illus. in color.

  • Softcover, ISBN 978-1-4419-9889-7; due August 29, 2011; 54,95 €

  • Encourages the concept of programming with data – more than just pushing data through tools, but learning to live and breathe the data
  • Accessible to many readers and not necessarily just those with strong backgrounds in computer science or statistics
  • Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever-increasing stores of electronic data that abound today. In performing data mining, many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.

Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on end-to-end process for data mining, Williams guides the reader through various capabilities of the easy-to-use, free, and open source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.

The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

Content Level » Research

Keywords » Data mining

Related subjects » Physical & Information Science

Related- https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

Interviews with the R Community

Authors

Interview: Luis Torgo, Author of Data Mining with R

https://decisionstats.com/2011/01/12/interview-luis-torgo-author-data-mining-with-r/

John Fox, R Commander

https://decisionstats.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

Interview: Dr Graham Williams, Rattle GUI

https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

Hadley Wickham

https://decisionstats.com/2010/01/12/interview-hadley-wickham-r-project-data-visualization-guru/

R for SAS and SPSS Users

https://decisionstats.com/2009/01/21/r-for-sas-and-spss-users-2/

R for Stata Users

https://decisionstats.com/2010/06/29/interview-r-for-stata-users/

R Consulting

Interview: David Katz, Dataspora / David Katz Consulting

https://decisionstats.com/2011/02/11/interview-david-katz-dataspora-david-katz-consulting/

Case Study

(http://www.predictiveanalyticsworld.com/sanfrancisco/2011/agenda.php#day2-16a)

Room: Salon 5 & 6
4:45pm – 5:05pm

Track 2: Social Data and Telecom 
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis

A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL and GAMs (generalized additive models) in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.

Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting
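The abstract does not say which social features were used. As a purely hypothetical illustration of how "who has been calling whom" could carry churn signal, here is a minimal Python sketch (not the speakers' SQL/R code) that computes, for each subscriber, the share of distinct call partners who have already churned. The function name `churned_neighbor_fraction`, the toy call records and the churn labels are all invented for this example.

```python
from collections import defaultdict


def churned_neighbor_fraction(calls, churned):
    """Hypothetical social feature: share of each subscriber's call partners who churned.

    calls: iterable of (caller, callee) pairs, treated here as undirected contacts.
    churned: set of subscriber IDs already known to have churned.
    """
    contacts = defaultdict(set)
    for a, b in calls:
        if a != b:  # ignore self-calls
            contacts[a].add(b)
            contacts[b].add(a)
    # Every subscriber in `contacts` has at least one partner, so no division by zero.
    return {
        sub: sum(1 for n in partners if n in churned) / len(partners)
        for sub, partners in contacts.items()
    }


# Toy call records and churn labels, invented for illustration only.
calls = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
churned = {"B", "D"}
features = churned_neighbor_fraction(calls, churned)
print(features)
```

A per-subscriber score like this could, in principle, be added as one more predictor in a model such as the GAM named in the abstract; treating calls as undirected is an assumption of this sketch.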

Q&A with David Smith, Revolution Analytics

https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

Inference for R

https://decisionstats.com/2009/06/04/inference-for-r/

David Smith, Revolution Computing

https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

Richard Schultz, Revolution Computing

https://decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

Karim Chine, Elastic R

https://decisionstats.com/2009/06/21/interview-karim-chine-biocep-cloud-computing-with-r/

Top R Interviews

 

Here is a list of the top R-related interviews I have done (in no particular order):

1) John Fox, creator of R Commander

https://decisionstats.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

2) Dr Graham Williams, Creator of Rattle

https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

3) David Smith, back when he was Community Director at what was then Revolution Computing

https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

and his second interview

https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

4) Richard Schultz, the first CEO of Revolution Computing (now Revolution Analytics)

https://decisionstats.com/2009/01/31/interviewrichard-schultz-ceo-revolution-computing/

5) Bob Muenchen, author of R for SAS and SPSS Users and R for Stata Users

https://decisionstats.com/2010/06/29/interview-r-for-stata-users/

https://decisionstats.com/2008/10/16/r-for-sas-and-spss-users/

6) Karim Chine, creator of Biocep, cloud computing for R

https://decisionstats.com/2009/06/21/interview-karim-chine-biocep-cloud-computing-with-r/

7) Paul van Eikeren, Inference for R, the first enterprise package to use R from within MS Office

https://decisionstats.com/2009/06/04/inference-for-r/

8) Hadley Wickham, creator of ggplot and R author

https://decisionstats.com/2010/01/12/interview-hadley-wickham-r-project-data-visualization-guru/

That's a lot of R interviews; I need to balance them out a bit, I guess.

Rattle Re-Introduced

The latest version of Rattle just went online.

Here is the change log. Dr Graham Williams is also coming out with a book on using Rattle, the R GUI devoted to data mining.

Source: http://cran.r-project.org/web/packages/rattle/index.html

rattle (2.5.42) unstable; urgency=low

  * Update rattle.info() to recursively identify all dependencies, report
    their version number and any updates available from CRAN and generate
    command to update packages that have updates available. See
    ?rattle.info for the options.

  * Fix bug causing R Dataset option of the Evaluate window to always
    revert to the first named dataset.

  * Fix bug in transforms where weights were not being handled in
    refreshing of the Data tab.

  * Fix a bug in box plots when trying to label outliers when there aren't
    any.

 -- Graham Williams <Graham.Williams@togaware.com>  Sun, 19 Sep 2010 05:01:51 +1000

rattle (2.5.41) unstable; urgency=low

  * Use GtkBuilder for Export dialog.

  * Test use of glade vs GtkBuilder on multiple platforms.

  * Rename rattle.info to rattle.version.

  * Add weight column to data tab.

  * Support weights for nnet, multinom, survival.

  * Add weights information to PMML as a PMML Extension.

  * Ensure GtkFrame is available as a data type whilst waiting for updated
    RGtk2.

  * Bug fix to packageIsAvailable not returning any result.

  * Replace destroy with withdraw for plot window as the former has
    started crashing R.

  * Improve Log formatting for various model build commands.

  * Be sure to include the car package for Anova for multinom models.

  * Release pmml 1.2.24: Bug fix glm binomial regression - note as
    classification model.

 -- Graham Williams <Graham.Williams@togaware.com>  Wed, 15 Sep 2010 14:56:09 +1000

And here is a video I made exploring various Rattle options using Camtasia, a very useful piece of software for screen capture and video tutorials, available from http://www.techsmith.com/download/camtasiatrial.asp

Update: my video skills being quite bad, I replaced it with another video. However, Camtasia is the best screen capture video tool.

Also, an update: Analyticdroid is on hold for now. For more details, see http://rattle.togaware.com/

Data Mining through the Android

Here is something interesting (I probably have to ask someone, or wait for Android to come to India, to do this personally).

It uses Android app development (which is quite easy if you run Linux) and basically runs R in the cloud through the Rattle GUI. Fire away at the data while watching a movie or just on the go!

See this-

http://analyticdroid.togaware.com/

Question- How useful do you think it will be to do this?  Would you like to run R on your mobile?

Interview: Dr Graham Williams

(Updated with comments from Dr Graham in the comments section)


I have often talked about how Rattle, the graphical user interface for the R language, makes learning R and building models quite simple. Rattle's latest version has been released and got extensive publicity, including in KDnuggets. I wrote to its creator, Dr Graham, and he agreed to an extensive interview explaining data mining, its evolution, and the philosophy and logic behind open source languages like R as well as Rattle.

Dr Graham Williams is the author of the Rattle data mining software and Adjunct Professor, University of Canberra and Australian National University.  Rattle is available from rattle.togaware.com.

Ajay – Could you describe your career journey? What made you enter this field, and what experiences helped shape your perspectives? What would your advice be to young professionals entering this field today?

Graham – With a PhD in Artificial Intelligence (topic: combining multiple decision trees to build ensembles) and a strong interest in practical applications, I started out in the late 1980’s developing expert systems for business and government, including bank loan assessment systems and bush fire prediction.

When data mining emerged as a discipline in the early 1990’s, I was involved in setting up the first data mining team in Australia with the government research organization (CSIRO). In 2004 I joined the Australian Taxation Office and provided the technical lead for the deployment of its Analytics team, overseeing the development of a data mining capability. I have been teaching data mining at the Australian National University (and elsewhere) since 1995 and continue to do so.

The business needs for Data Mining and Analytics continue to grow, although courses in Data Mining are still not so common. A data miner combines good backgrounds in Computer Science and Statistics. The Computer Science is too little emphasized, but it is crucial for developing repeatable procedures and good software engineering practices, which I believe to be important in Data Mining.

Data Mining is more than just using a point and click graphical user interface (GUI). It is an experimental endeavor where we really need to be able to follow our nose as we explore through our data, and then capture the whole process in an automatically repeatable manner that can be readily communicated to others. A programming language offers this sophisticated level of communication.

Too often, I see analysts, when given a new dataset that updates last year's data, essentially start from scratch with the data pre-processing, cleaning, and then mining, rather than beginning with last year’s captured processes and tuning them to this year’s data. The GUI generation of software often does not encourage repeatability.
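In Rattle the captured process is the R log. As a language-neutral illustration of the same principle, here is a minimal Python sketch in which last year's pre-processing lives in a function that is simply re-run on this year's data instead of being redone by hand. The `clean` function, the field names and the records are hypothetical, made up for this example.

```python
from statistics import median


def clean(records, numeric_fields):
    """Hypothetical captured pre-processing step.

    Drops records without an id and fills missing numeric values with the
    median of the batch being cleaned. Input records are not modified.
    """
    rows = [dict(r) for r in records if r.get("id") is not None]
    for field in numeric_fields:
        present = [r[field] for r in rows if r.get(field) is not None]
        fill = median(present) if present else 0.0
        for r in rows:
            if r.get(field) is None:
                r[field] = fill
    return rows


# Made-up data for two successive years.
last_year = [
    {"id": 1, "income": 50.0},
    {"id": 2, "income": None},
    {"id": None, "income": 10.0},
    {"id": 3, "income": 70.0},
]
this_year = [
    {"id": 4, "income": None},
    {"id": 5, "income": 80.0},
]

# The same captured step is re-run, not rebuilt, when new data arrives.
prepared_last = clean(last_year, ["income"])
prepared_this = clean(this_year, ["income"])
print(prepared_last)
print(prepared_this)
```

Here the imputation median is recomputed from each year's batch, one simple way of "tuning to this year's data"; reusing last year's fitted values instead would be an equally valid design choice.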

Ajay – What made you get involved with R? What is the advantage of using Rattle versus normal R?

Graham- I have used Clementine and SAS Enterprise Miner over many years (and IBM’s original Intelligent Miner and Thinking Machines’ Darwin, and many other tools that emerged early on with Data Mining). Commercial vendors come and go (even large ones like IBM, in terms of the products they support).

Lock-in is one problem with commercial tools. Another is that many vendors, understandably, won’t put resources into new algorithms until they are well accepted.
Because it is open source, R is robust, reliable, and provides access to the most advanced statistics. Many research Statisticians publish their new algorithms in R. But what is most important is that the source code is always going to be available. Not everyone has the skill to delve into that source code, but at least we have a chance to do so. We also know that there is a team of highly qualified developers whose work is openly peer reviewed. I can monitor their coding changes, if I so wanted. This helps ensure quality and integrity.

Rolling out R to a community of data analysts, though, does present challenges. Since R is primarily a language for statistics, we need to learn to speak that language. That is, we need to communicate with language rather than pictures (or GUI). It is, of course, easier to draw pictures, but pictures can be limiting. I believe a written language allows us to express and communicate ideas better and more formally. But it needs to be with the philosophy that we are communicating those ideas to our fellow humans, not just writing code to be executed by the computer.

Nonetheless, GUIs are great as memory aids, for doing simple tasks, and for learning how to perform particular tasks. Rattle aims to do the standard data mining steps, but to also expose everything that is done as R commands in the log. In fact, the log is designed to be able to be run as an R script, and to teach the user the R commands.

Ajay – What are the advantages of using Rattle instead of SAS or SPSS? What are the disadvantages of using Rattle instead of SAS or SPSS?

Graham- Because it is free and open source, Rattle (and R) can be readily used in teaching data mining. In business it is, initially, useful for people who want to experiment with data mining without the sometimes quite significant up-front costs of the commercial offerings. For serious data mining, Rattle and R offer all of the data mining algorithms offered by the commercial vendors, but also many more. Rattle provides a simple, tab-based user interface which is not as graphically sophisticated as Clementine in SPSS and SAS Enterprise Miner.

But with just 4 button clicks you will have built your first data mining model.

The usual disadvantage quoted for R (and so Rattle) is in the handling of large datasets – SAS and SPSS can handle datasets out of memory although they do slow down when doing so. R is memory based, so going to a 64bit platform is often necessary for the larger datasets. A very rough rule of thumb has been that the 2-3GB limit of the common 32bit processors can handle a dataset of up to about 50,000 rows with 100 columns (or 100,000 rows and 10 columns, etc), depending on the algorithms you deploy. I generally recommend, as quite a powerful yet inexpensive data mining machine, one running on an AMD64 processor, running the Debian GNU/Linux operating system, with as much memory as you can afford (e.g., 4GB to 32GB, although some machines today can go up to 128 GB, but memory gets expensive at that end of the scale).
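To put the rule of thumb in perspective, the raw size of an all-numeric table can be estimated as rows × columns × 8 bytes, since R stores numeric (double) vectors as 8-byte values. The short Python sketch below does that arithmetic for the two shapes Graham mentions and counts how many full raw copies would fit in a 2 GB budget; the helper names are hypothetical.

```python
# R stores numeric (double) vectors as 8-byte IEEE 754 values.
BYTES_PER_DOUBLE = 8


def raw_size_mb(rows, cols, bytes_per_cell=BYTES_PER_DOUBLE):
    """Approximate raw size, in MB (10**6 bytes), of an all-numeric table."""
    return rows * cols * bytes_per_cell / 1e6


def copies_that_fit(rows, cols, budget_gb):
    """How many full raw copies of the table fit in budget_gb (10**9 bytes)."""
    table_bytes = rows * cols * BYTES_PER_DOUBLE
    return int(budget_gb * 1e9 // table_bytes)


# The two shapes quoted in the rule of thumb.
for rows, cols in [(50_000, 100), (100_000, 10)]:
    print(f"{rows} x {cols}: {raw_size_mb(rows, cols):.1f} MB raw, "
          f"{copies_that_fit(rows, cols, 2)} copies in 2 GB")
```

The raw data comes out small (about 40 MB and 8 MB) relative to 2 GB, which suggests the rule of thumb leaves generous room for the intermediate copies and model structures that algorithms create. The sketch ignores R's per-object overhead and non-numeric columns, so treat it as a lower bound.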

Ajay – Rattle is free to download and use, yet it must have taken you some time to build it. What are your revenue streams to support your time and efforts?

Graham – Yes, Rattle is free software: free for anyone to use, free to review the code, free to extend the code, free to use it for whatever purpose. I have been developing Rattle for a few years now, with a number of contributions from other users. Rattle, of course, gets its full power from R. The R community works together to help each other, and others, for the benefit of all. Rattle and R can be the basic toolkit for knowledge workers providing analyses. I know of a number of data mining consultants around the world who are using Rattle to support their day-to-day consultancy work.

As a company, Togaware provides user support, installations of R and Rattle, runs training in using Rattle and in doing data mining. It also delivers data mining projects to clients. Togaware also provides support for incorporating Rattle (and R) into other products (e.g., as RStat for Information Builders).

Ajay – What is your vision of analytics for the future? How do you think the recession of 2008 and slowdown in 2009 will affect the choice of software?

Graham- Watching the growth of data mining and analytics over the past 18 years it does seem that there has been and continues to be a monotonically increasing interest and demand for Analytics. Analytics continues to demonstrate benefit.

The global financial crisis, as others have suggested, should lead organizations to consider alternatives to expensive software. Good quality free and open source software has been available for a while now, but the typical CTO is still more comfortable purchasing expensive software. A purchase gives some sense of (false?) security but formally provides no warranty. My philosophy has been that we should invest in our people, within an organization, and treat software as a commodity, that we openly contribute back into.

Imagine a world where we only use free open source software. The savings made by all will be substantial (consider OpenOffice versus MS/Office license fees paid by governments world wide, or Rattle versus SAS Enterprise Miner annual license fees). A small part of that saving might be expended on ensuring we have staff who are capable of understanding and extending that software to suit our needs, rather than vice versa (i.e., changing our needs to suit the software). We feed our extensions back into the grid of open source software, whilst also benefiting from contributions others are making. Some commercial vendors like to call this “communism” as part of their attempt to discredit open source, but we had better learn to share, for the good of the planet, before we lose it.

(Note from Ajay – If you are curious to try R and have just 15 minutes to try it in, download Rattle from rattle.togaware.com. It has a point-and-click interface and auto-generates R code in its log. Trust me, it would be time well spent.)