Data Mining with R GUI - Rattle #Rstats

Why is Rattle my favorite R package?
Because it lets you do data mining through a very nice interface.
Complicated software need not have complicated interfaces.
Have a look:

(Note- download rattle from

For better visibility, please click the full-screen button, or click the second slideshow (PPS) below, which advances automatically every 5 seconds.

Hearst Data Mining Challenge

Check out the Hearst Data Mining Challenge, a new competition sponsored by DMA, Hearst Magazine, and EXL.




Over the years, the magazine publishing industry has made significant strides in improving subscription-based circulation by developing analytic frameworks that better predict customer response to acquisition and renewal offers. The objective of this contest is to apply the same analytic discipline to effectively predict the “response” of newsstand locations. Specifically, the objective is to predict the number of copies to place in each newsstand location (typically referred to as the draw) so as to optimize the overall contribution of that location.

Data for the competition is provided by CMG and Experian.



HOW TO ENTER: Beginning October 14th, 2010 at 12:01 AM (ET) through December 3rd, 2010 at 11:59 PM (ET) go to the Hearst Challenge website located at (the “Site”) and complete and submit the entry form pursuant to the onscreen instructions. Entrants will be provided a historical sample of newsstand location draw, sales and associated location level data to help develop their predictive algorithm. Hearst will in turn hold back two distinct sets of draw/sales data, one to be used as a validation set by the contestant and one to be used as a final contest evaluation set. Entrants may not include any other external variables for the challenge. Additional details will be provided with the data. Entrants will be able to track their performance against the validation set throughout the course of the challenge via a leader tracking board to be made available on the Site. Entries must include the following documentation:

  • Data file with id variables and expected sales values by store and publication
  • The final model/algorithm code used to score the final data set
  • Any supporting documentation that pertains to the development of the submitted model/algorithm, including variable creation. Variables that were used in the model need to be traced through from input to coefficient/node (if using a tree-based methodology).
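
To make the first item in the list above concrete, here is a minimal Python sketch of a scoring file with id variables and expected sales by store and publication. Everything in it is an assumption for illustration: the column names (store_id, publication_id, expected_sales), the sample records, and the naive "mean of past sales" baseline are all hypothetical. The real file layout was supplied with the contest data, and a serious entry would use a proper predictive model.

```python
import csv
import io
from collections import defaultdict

# Hypothetical historical records: (store_id, publication_id, copies_sold).
# The actual contest data layout is not shown in this post.
HISTORY = [
    ("S001", "PUB_A", 12),
    ("S001", "PUB_A", 16),
    ("S001", "PUB_B", 5),
    ("S002", "PUB_A", 30),
    ("S002", "PUB_A", 34),
    ("S002", "PUB_A", 32),
]


def expected_sales(history):
    """Return the mean historical sales per (store, publication) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of sales, record count]
    for store_id, publication_id, sold in history:
        entry = totals[(store_id, publication_id)]
        entry[0] += sold
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def write_scoring_file(predictions, handle):
    """Write one row per store and publication: id variables, then expected sales."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["store_id", "publication_id", "expected_sales"])
    for (store_id, publication_id), value in sorted(predictions.items()):
        writer.writerow([store_id, publication_id, f"{value:.2f}"])


# Naive baseline (an assumption, not the contest's method): predict the past mean.
predictions = expected_sales(HISTORY)

# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
write_scoring_file(predictions, buffer)
scoring_csv = buffer.getvalue()
print(scoring_csv)
```

Swapping the baseline for a real model only changes expected_sales; the writer still produces one row per store and publication.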

Check out for further details.