Revolution Analytics Product Launches for #rstats in 2011

Revolution Analytics has just published a roadmap detailing its product plans for 2011.

 

In particular, I am excited about the upcoming GUI, the Hadoop packages, the new k-means and data sort/merge functions in RevoScaleR for bigger datasets, and the option of support for community packages like ggplot2 (under the heading “More value in Community Version”).

Contributions to #Rstats by Revolution

I have been watching Revolution Analytics’ products almost since the company’s inception. It has weathered storms, naysayers and critics with a simple and effective strategy: launching good software, building good partnerships, and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.

Here is a listing of the main technical contributions Revolution Analytics products have made to the #rstats project.

1) Useful packages, mostly for parallel processing or more efficient computing, such as

 

2) The RevoScaleR package, to beat R’s memory limits (in my opinion this is probably the best contribution, as it has yet to be replicated in the open source version and is a clear-cut reason for going for the paid version)

http://www.revolutionanalytics.com/products/enterprise-big-data.php

  • XDF file format designed to handle huge data sets efficiently.
  • Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
  • Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
  • Visualize Large Data sets with line plots and histograms.
  • Built-in Statistical Algorithms for direct analysis of huge data sets:
    • Summary Statistics
    • Linear Regression
    • Logistic Regression
    • Crosstabulation
  • On-the-fly data transformations to include derived variables in models without writing new data files.
  • Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.
  • Direct import of fixed-format text data files and SAS data sets into .xdf format
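The “chunk through huge data sets” idea behind these bullets is simple: keep the data on disk, stream it through memory a block of rows at a time, and update running results as you go. Below is a minimal Python sketch of that idea for summary statistics (count, mean, variance). It is not RevoScaleR’s API and does not read .xdf files; it assumes a plain CSV file with a header row, and the function names (`read_chunks`, `chunk_stats`, `combine`, `summarize`) are hypothetical, made up for illustration.

```python
import csv
import io
import math
from itertools import islice


def read_chunks(f, column, chunk_size):
    """Yield lists of floats from `column`, at most chunk_size rows at a time.

    Only the current chunk is held in memory; the file is read lazily.
    """
    reader = csv.DictReader(f)
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            return
        yield [float(row[column]) for row in rows]


def chunk_stats(values):
    """Return (count, mean, M2) for one chunk; M2 = sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def combine(a, b):
    """Merge two (count, mean, M2) summaries with the pairwise update
    of Chan, Golub and LeVeque."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def summarize(f, column, chunk_size=10_000):
    """Return (count, mean, sample variance) of `column`, chunk by chunk."""
    total = (0, 0.0, 0.0)  # empty summary; combine() treats it as an identity
    for chunk in read_chunks(f, column, chunk_size):
        total = combine(total, chunk_stats(chunk))
    n, mean, m2 = total
    if n == 0:
        raise ValueError("no rows to summarize")
    variance = m2 / (n - 1) if n > 1 else math.nan
    return n, mean, variance


if __name__ == "__main__":
    # Toy in-memory "file" with x = 1..100, read 7 rows at a time.
    data = io.StringIO("x\n" + "\n".join(str(i) for i in range(1, 101)))
    print(summarize(data, "x", chunk_size=7))
```

Merging (count, mean, M2) summaries with the pairwise update, instead of keeping a raw sum of squares, avoids the cancellation error the naive variance formula can suffer on large data, and the answer does not depend on the chunk size.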

 

3) RevoDeployR, for API-based R solutions: I think this offering will become more important as time goes on, but it seems to be a lower-visibility offering right now.

http://www.revolutionanalytics.com/products/enterprise-deployment.php

  • Collection of Web services implemented as a RESTful API.
  • JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.
  • .NET client library, including COM interoperability to call R from VBA.
  • Management Console for securely administering servers, scripts and users through HTTP and HTTPS.
  • XML and JSON format for data exchange.
  • Built-in security model for authenticated or anonymous invocation of R Scripts.
  • Repository for storing R objects and R Script execution artifacts.

 

4) Revolution’s IDE (the R Productivity Environment), a faster coding environment than the command line; a GUI by Revolution Analytics is also in the works. Having used the IDE, I find that only its Code Snippets function clearly differentiates it from newer IDEs and GUIs. The code snippets are awesome, though: even someone who doesn’t know much R can set up an analysis quite quickly and accurately.

http://www.revolutionanalytics.com/products/enterprise-productivity.php

  • Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
  • Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
  • Run Selection, Run to Line and Run to Cursor evaluation
  • R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
  • Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
  • Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
  • Customizable Workspace with dockable, floating, and tabbed tool windows.
  • Version Control Plug-in available for the open source Subversion version control software.

 

Marketing contributions from Revolution Analytics:

1) Sponsoring R sessions and user group meetings

2) Evangelizing R at conferences and partnering with companies including JasperSoft, Microsoft, IBM and others (see http://www.revolutionanalytics.com/partners/)

3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.

4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken many more established names, partly due to its decentralized way of working.

 

Did I miss something? Yes: they share their code under the GPL.

 

Let me know via feedback.

Workflows and MyExperiment.org

Here is a great website for sharing workflows: it is called myExperiment.org, and it can include workflows from many different software packages.

myExperiment currently has 4742 members, 270 groups, 1842 workflows, 423 files and 173 packs.

Could it also include workflows from Red-R (for #rstats) or from SAS Enterprise Miner?


Interview: Top Data Mining Blogger on Earth, Sandro Saitta


If you have searched Google for “data mining blog” at any time over the past several years, one blog has come out on top (data mining blog – Google Search: http://bit.ly/kEdPlE).

To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), here is an exclusive interview in which he reveals his secret sauce for cool techie blogging.

Ajay- Describe your journey as a scientist and data miner, from early experiences and schooling to your work, research and blogging.

Sandro- My first experience with data mining was my master’s project. I used decision trees to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave, and all these sensors generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD to share with other researchers.

I then started applying data mining to the stock market in my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company atmosphere was not as good as I had expected, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to tackling new challenges there.

Ajay- Your blog is the top-ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?

Sandro- It’s always difficult to start a blog, since at the beginning you have no readers. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas, expressing concepts which were not always clear to me. I thus learned a lot and also improved my English. Of course, it’s still not perfect, but I hope most people can understand me.

Next come the readers: a few dozen each week at first. To increase this number, I started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research’s visibility in the blogosphere. I think SEO is worthwhile once you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on its content. To be of interest, your content must stand out: be original, informative or provocative, for example. I was also lucky to get good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.

Ajay- What’s your favorite statistical software, and which other software have you worked with? Could you compare and contrast them as well?

Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining very easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of one that is as powerful as SAS and, at the same time, has a high-level programming language behind it.

I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not so fast and certainly not designed for ETL, but its price is low given all its possibilities for data mining. In my view, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.

Ajay- What are your favorite techniques and training resources for teaching the basics of data mining to, say, statisticians or business management graduates?

Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books and still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you really see the issues and challenges in the field. It’s still not experience, but it’s closer. Finally, there are web resources and networks such as KDnuggets, of course, but also AnalyticBridge and LinkedIn.

Ajay- Describe your hobbies and how they help you, if at all, in your professional life.

Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.

Ajay- What are your plans for your website for 2011-2012?

Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves about data mining topics. I will not write about my current applications due to my current employer’s policies. But don’t worry, I still have a lot to write about, whether technical or not. I will also put more emphasis on my experience with data mining, advice for data miners, tips and tricks, and of course book reviews!

Standard blogging disclosure: Sandro awarded me the People’s Choice award on his blog for 2010 and interviewed me. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers, it is platonic and intellectual.

About Sandro Saitta:



Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.

You can contact Mr Saitta on Twitter:

https://twitter.com/#!/dataminingblog

#Rstats gets into Enterprise Cloud Software


Here is an excellent example of how websites should help, rather than hinder, new customers who want to take a demo of the software without being overwhelmed by sweet-talking marketing guys who don’t know the difference between heteroskedasticity, probability, odds and likelihood.

It is made by Zementis (Dr Michael Zeller has been a frequent guest here), and Revolution Analytics is still the best bet in enterprise software for #Rstats.

Now if only Revo could get into the lucrative Department of Energy or Department of Defense business, they could change the world AND earn more revenue than they have so far. But seriously.

Check out http://deployr.revolutionanalytics.com/zementis/ and play with it, or better still, mash it up with some data viz and ROC curves, or extend it with some APIs 😉

East loves gold and USD, and chokes on it

A brief analysis shows how much the Eastern Hemisphere loves gold and USD.

I did the graph in JMP since it is an easier GUI for me to use (I do have some learning disabilities).

https://www.cia.gov/library/publications/the-world-factbook/rankorder/2188rank.html

RANK | COUNTRY | RESERVES OF FOREIGN EXCHANGE AND GOLD | DATE OF INFORMATION
1 | China | $2,622,000,000,000 | 31 December 2010 est.
2 | Japan | $1,096,000,000,000 | 31 December 2010 est.
3 | Russia | $483,100,000,000 | 30 November 2010
4 | Saudi Arabia | $456,200,000,000 | 31 December 2010 est.
5 | Taiwan | $387,200,000,000 | 31 December 2010 est.
6 | Brazil | $290,900,000,000 | 31 December 2010 est.
7 | India | $284,100,000,000 | 31 December 2010 est.
8 | Korea, South | $274,600,000,000 | 31 December 2010 est.
9 | Hong Kong | $268,900,000,000 | 31 December 2010 est.
10 | Switzerland | $236,600,000,000 | 31 December 2010
11 | Singapore | $225,800,000,000 | 31 December 2010 est.
12 | Thailand | $176,100,000,000 | 31 December 2010 est.
13 | Algeria | $150,100,000,000 | 31 December 2010 est.
14 | Mexico | $116,400,000,000 | 31 December 2010 est.
15 | Libya | $107,300,000,000 | 31 December 2010 est.
16 | Malaysia | $106,500,000,000 | 31 December 2010 est.
17 | Poland | $99,760,000,000 | 31 December 2010 est.
18 | Indonesia | $96,210,000,000 | 31 December 2010 est.
19 | Turkey | $78,000,000,000 | 31 December 2010 est.
20 | Iran | $75,060,000,000 | 31 December 2010 est.
21 | Israel | $66,980,000,000 | 31 December 2010 est.
22 | Philippines | $62,370,000,000 | 31 December 2010 est.
23 | Argentina | $53,610,000,000 | 31 December 2010 est.
24 | Romania | $50,510,000,000 | 31 December 2010 est.
25 | Iraq | $45,680,000,000 | 31 December 2010 est.
26 | South Africa | $45,520,000,000 | 31 December 2010 est.
27 | Hungary | $44,990,000,000 | 31 December 2010 est.
28 | Peru | $44,110,000,000 | 31 December 2010
29 | Nigeria | $43,360,000,000 | 31 December 2010 est.
30 | Czech Republic | $42,340,000,000 | 31 December 2010 est.
31 | Lebanon | $41,570,000,000 | 31 December 2010 est.
32 | United Arab Emirates | $39,100,000,000 | 31 December 2010 est.
33 | Australia | $38,620,000,000 | 31 December 2010 est.
34 | Egypt | $35,720,000,000 | 31 December 2010 est.
35 | Ukraine | $32,910,000,000 | 31 December 2010 est.
36 | Kazakhstan | $32,440,000,000 | 31 December 2010 est.
37 | Venezuela | $29,490,000,000 | 31 December 2010 est.
38 | Colombia | $28,500,000,000 | 31 December 2010 est.
39 | Chile | $26,080,000,000 | 31 December 2010 est.
40 | Morocco | $24,570,000,000 | 31 December 2010 est.
41 | Macau | $23,730,000,000 | (none given)
42 | Kuwait | $22,420,000,000 | 31 December 2010 est.
43 | Qatar | $22,410,000,000 | 31 December 2010 est.
44 | Austria | $21,890,000,000 | 31 December 2010 est.
45 | Syria | $17,960,000,000 | 31 December 2010 est.
46 | New Zealand | $17,850,000,000 | 31 December 2010 est.
47 | Bulgaria | $17,270,000,000 | 31 December 2010 est.
48 | Angola | $16,890,000,000 | 31 December 2010 est.
49 | Pakistan | $16,100,000,000 | 31 December 2010 est.
50 | Serbia | $15,100,000,000 | 30 November 2010 est.
51 | Oman | $14,000,000,000 | 31 December 2010 est.
52 | Croatia | $13,790,000,000 | 31 December 2010 est.
53 | Vietnam | $13,000,000,000 | 31 December 2010 est.
54 | Jordan | $12,640,000,000 | 31 December 2010 est.
55 | Tunisia | $11,230,000,000 | 31 December 2010 est.
56 | Turkmenistan | $10,810,000,000 | 31 December 2010 est.
57 | Bangladesh | $10,790,000,000 | 31 December 2010 est.
58 | Uzbekistan | $10,500,000,000 | 31 December 2010 est.
59 | Bolivia | $9,730,000,000 | 31 December 2010 est.
60 | Trinidad and Tobago | $9,659,000,000 | 31 December 2010 est.
61 | Finland | $9,128,000,000 | 31 December 2010 est.
62 | Botswana | $7,834,000,000 | 31 December 2010 est.
63 | Uruguay | $7,700,000,000 | 31 December 2010 est.
64 | Latvia | $7,170,000,000 | 31 December 2010 est.
65 | Lithuania | $6,418,000,000 | 31 December 2010 est.
66 | Azerbaijan | $6,330,000,000 | 31 December 2010 est.
67 | Belarus | $5,755,000,000 | 31 December 2010 est.
68 | Yemen | $5,744,000,000 | 31 December 2010 est.
69 | Guatemala | $5,709,000,000 | 31 December 2010 est.
70 | Sri Lanka | $5,630,000,000 | 31 December 2010 est.
71 | Cuba | $4,847,000,000 | 31 December 2010 est.
72 | Kenya | $4,585,000,000 | 31 December 2010 est.
73 | Costa Rica | $4,584,000,000 | 31 December 2010 est.
74 | Iceland | $4,206,000,000 | 31 December 2010 est.
75 | Bosnia and Herzegovina | $4,200,000,000 | 31 December 2010 est.
76 | Paraguay | $4,130,000,000 | 31 December 2010 est.
77 | Congo, Republic of the | $4,123,000,000 | 31 December 2010 est.
78 | Equatorial Guinea | $4,086,000,000 | 31 December 2010 est.
79 | Cameroon | $4,023,000,000 | 31 December 2010 est.
80 | Cote d’Ivoire | $3,985,000,000 | 31 December 2010 est.
81 | Cambodia | $3,840,000,000 | 31 December 2010 est.
82 | Ghana | $3,800,000,000 | 31 December 2010 est.
83 | Bahrain | $3,766,000,000 | 31 December 2010 est.
84 | Burma | $3,762,000,000 | 31 December 2010 est.
85 | Uganda | $3,743,000,000 | 31 December 2010 est.
86 | Tanzania | $3,687,000,000 | 31 December 2010 est.
87 | Estonia | $3,641,000,000 | 31 December 2010 est.
88 | Ecuador | $3,590,000,000 | 31 December 2010 est.
89 | Panama | $3,525,000,000 | 31 December 2010 est.
90 | Papua New Guinea | $3,017,000,000 | 31 December 2010 est.
91 | El Salvador | $2,882,000,000 | 31 December 2010 est.
92 | Dominican Republic | $2,705,000,000 | 31 December 2010 est.
93 | Gabon | $2,602,000,000 | 31 December 2010 est.
94 | Mauritius | $2,360,000,000 | 31 December 2010 est.
95 | Georgia | $2,350,000,000 | 31 December 2010 est.
96 | Honduras | $2,302,000,000 | 31 December 2010 est.
97 | Zambia | $2,287,000,000 | 31 December 2010 est.
98 | Armenia | $2,247,000,000 | 31 December 2010 est.
99 | Macedonia | $2,217,000,000 | 30 November 2010 est.
100 | Senegal | $2,200,000,000 | 31 December 2010 est.
101 | Ireland | $2,104,000,000 | 31 December 2010
102 | Sudan | $2,063,000,000 | 31 December 2010 est.
103 | Albania | $1,992,000,000 | 31 December 2010 est.
104 | Mozambique | $1,982,000,000 | 31 December 2010 est.
105 | Namibia | $1,961,000,000 | 31 December 2010 est.
106 | Ethiopia | $1,880,000,000 | 31 December 2010 est.
107 | Jamaica | $1,850,000,000 | 31 December 2010 est.
108 | Moldova | $1,710,000,000 | 31 December 2010 est.
109 | Kyrgyzstan | $1,615,000,000 | 31 December 2010 est.
110 | Burkina Faso | $1,588,000,000 | 31 December 2010 est.
111 | Haiti | $1,587,000,000 | 31 December 2010 est.
112 | Nicaragua | $1,580,000,000 | 31 December 2010 est.
113 | Benin | $1,254,000,000 | 31 December 2010 est.
114 | Slovakia | $1,160,000,000 | 31 January 2010 est.
115 | Madagascar | $1,038,000,000 | 31 December 2010 est.
116 | Congo, Democratic Republic of the | $1,010,000,000 | March 2010 est.
117 | Lesotho | $893,000,000 | 31 December 2010 est.
118 | Chad | $868,000,000 | 31 December 2010 est.
119 | Rwanda | $816,000,000 | 31 December 2010 est.
120 | Laos | $756,000,000 | 31 December 2010 est.
121 | Swaziland | $708,000,000 | 31 December 2010 est.
122 | Togo | $686,000,000 | 31 December 2010 est.
123 | Barbados | $620,000,000 | 2007
124 | Malta | $522,000,000 | 31 December 2010 est.
125 | Guyana | $506,000,000 | 31 December 2010 est.
126 | Zimbabwe | $376,000,000 | 31 December 2010 est.
127 | Burundi | $320,000,000 | 31 December 2010 est.
128 | Tajikistan | $303,000,000 | 31 December 2010 est.
129 | Malawi | $301,000,000 | 31 December 2010 est.
130 | Cape Verde | $296,000,000 | 31 December 2010 est.
131 | Suriname | $263,300,000 | 2006
132 | Belize | $219,000,000 | 31 December 2010 est.
133 | Gambia, The | $203,000,000 | 31 December 2010 est.
134 | Seychelles | $193,000,000 | 31 December 2010 est.
135 | Eritrea | $104,000,000 | 31 December 2010 est.
136 | Samoa | $70,150,000 | FY03/04
137 | Sao Tome and Principe | $46,000,000 | 31 December 2010 est.
138 | Tonga | $40,830,000 | FY04/05
139 | Vanuatu | $40,540,000 | 2003
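To make the “East loves gold and USD” point a little more concrete, here is a rough Python sketch that adds up the top 10 rows of the table above and splits them by hemisphere. The hemisphere rule (the sign of the longitude of each country’s capital, or of the city itself for Hong Kong) and the rounded longitudes are assumptions added for this sketch; they are not part of the CIA data, and `east_share` is a hypothetical helper name.

```python
# Top 10 holders of foreign exchange and gold reserves, from the CIA World
# Factbook table above (late-2010 figures), in millions of USD.
# ASSUMPTION: the third field is an approximate, rounded longitude (degrees,
# east positive) of each capital (Hong Kong: the city itself), added only for
# this sketch to classify Eastern vs Western Hemisphere.
TOP10 = [
    ("China", 2_622_000, 116),
    ("Japan", 1_096_000, 140),
    ("Russia", 483_100, 38),
    ("Saudi Arabia", 456_200, 47),
    ("Taiwan", 387_200, 122),
    ("Brazil", 290_900, -48),
    ("India", 284_100, 77),
    ("Korea, South", 274_600, 127),
    ("Hong Kong", 268_900, 114),
    ("Switzerland", 236_600, 7),
]


def east_share(rows):
    """Return (east_total, grand_total, east_fraction), totals in millions of USD.

    A row counts as Eastern Hemisphere when its longitude is > 0.
    """
    total = sum(reserves for _, reserves, _ in rows)
    east = sum(reserves for _, reserves, lon in rows if lon > 0)
    return east, total, east / total


if __name__ == "__main__":
    east, total, share = east_share(TOP10)
    print(f"Top 10 total: ${total / 1e6:.2f} trillion")
    print(f"Eastern Hemisphere: ${east / 1e6:.2f} trillion ({share:.1%})")
```

Under this rule Switzerland counts as East (Bern is about 7°E), so Brazil is the only Western Hemisphere country in the top 10; a continent-based split would give a different, and arguably fairer, answer.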

New book on Big Data analytics and data mining using #Rstats with a GUI


I am hoping to put this on my pre-order list or Amazon wish list. This is the book for the common people who wanted to do data mining but were unable to ask aloud because they didn’t know much. It is written by the seminal Australian authority on data mining, Dr Graham Williams, whom I interviewed at https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

Data Mining for the masses using an ergonomically designed Graphical User Interface.

Thank you, Springer. Thank you, Dr Graham Williams.

http://www.springer.com/statistics/physical+%26+information+science/book/978-1-4419-9889-7

Data Mining with Rattle and R


The Art of Excavating Data for Knowledge Discovery

Series: Use R

Williams, Graham

1st Edition, 2011, XX, 409 p., 150 illus. in color.

  • Softcover, ISBN 978-1-4419-9889-7

    Due: August 29, 2011

    54,95 €
  • Encourages the concept of programming with data – more than just pushing data through tools, but learning to live and breathe the data
  • Accessible to many readers and not necessarily just those with strong backgrounds in computer science or statistics
  • Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever-increasing stores of electronic data that abound today. In performing data mining, many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.

Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on end-to-end process for data mining, Williams guides the reader through various capabilities of the easy to use, free, and open source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.

The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

Content Level » Research

Keywords » Data mining

Related subjects » Physical & Information Science

Related- https://decisionstats.com/2009/01/13/interview-dr-graham-williams/