Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stat!
Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?
Ajay Ohri (AO): I am a business consultant and writer based out of Delhi- India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com which now has almost 10,000 views monthly.
All in all, I wrote about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education ( a masters in business, and a bachelors in mechanical engineering).
My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX)
In business my research interests software marketing strategies (open source, Software as a service, advertising supported versus traditional licensing) and creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore India).
DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?
AO: As per my understanding, SAS stands for SAS language, SAS Institute and SAS software platform. The terms are interchangeably used by people in industry and academia- but there have been some branding issues on this.
I have not worked much with SAS Enterprise Miner , probably because I could not afford it as business consultant, and organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, and SAS ETS- and JMP. Also I worked with SAS BI but as a user to extract information.
You could say my use of SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt for knowledge discovery and data mining, and pattern analysis. Again some of my SAS experience is a bit dated for almost 1 year ago.
I really like specific parts of SAS platform – as in the interface design of JMP (which is better than Enterprise Guide or Base SAS ) -and Proc Sort in Base SAS- I guess sequential processing of data makes SAS way faster- though with computing evolving from Desktops/Servers to even cheaper time shared cloud computers- I am not sure how long Base SAS and SAS Stat can hold this unique selling proposition.
I dislike the clutter in SAS Stat output, it confuses me with too much information, and I dislike shoddy graphics in the rendering output of graphical engine of SAS. Its shoddy coding work in SAS/Graph and if JMP can give better graphics why is legacy source code preventing SAS platform from doing a better job of it.
I sometimes think the best part of SAS is actually code written by Goodnight and Sall in 1970’s , the latest procs don’t impress me much.
SAS as a company is something I admire especially for its way of treating employees globally- but it is strange to see the rest of tech industry not following it. Also I don’t like over aggression and the SAS versus Rest of the Analytics /Data Mining World mentality that I sometimes pick up when I deal with industry thought leaders.
I think making SAS Enterprise Miner, JMP, and Base SAS in a completely new web interface priced at per hour rates is my wishlist but I guess I am a bit sentimental here- most data miners I know from early 2000’s did start with SAS as their first bread earning software. Also I think SAS needs to be better priced in Business Intelligence- it seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.
If you are a new stats or business student, chances are – you may know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than a analytics or data mining software.
I like a lot of things in R- from graphics, to better data mining packages, modular design of software, but above all I like the can do kick ass spirit of R community. Lots of young people collaborating with lots of young to old professors, and the energy is infectious. Everybody is a CEO in R ’s world. Latest data mining algols will probably start in R, published in journals.
Which is better for data mining SAS or R? It depends on your data and your deadline. The golden rule of management and business is -it depends.
Also I have worked with a lot of KXEN, SQL, SPSS.
DMR: Can you tell us more about Decision Stats? You have a traffic of 120′000 for 2010. How did you reach such a success?
AO: I don’t think 120,000 is a success. Its not a failure. It just happened- the more I wrote, the more people read.In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back linking, and I did some black hat experimental stuff. Some of it worked- some didn’t.
In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid , frank and honest about their views while middle managers, public relations, marketing folks can be defensive.
Social Media helped a bit- Twitter, Linkedin, Facebook really helped my network of friends who I suppose acted as informal ambassadors to spread the word.
Again I was constrained by necessity than choices- my middle class finances ( I also had a baby son in 2007-my current laptop still has some broken keys – by my inability to afford traveling to conferences, and my location Delhi isn’t really a tech hub.
The more questions I asked around the internet, the more people responded, and I wrote it all down.
I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.
I tried building other websites but didn’t succeed so i guess I really don’t know. I am not a smart coder, not very clever at writing but I do try to be honest.
Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.
DMR: There is a rumor about a R book you plan to publish in 2011 Can you confirm the rumor and tell us more?
AO: I just signed a contract with Springer for ” R for Business Analytics”. R is a great software, and lots of books for statistically trained people, but I felt like writing a book for the MBAs and existing analytics users- on how to easily transition to R for Analytics.
Like any language there are tricks and tweaks in R, and with a focus on code editors, IDE, GUI, web interfaces, R’s famous learning curve can be bent a bit.
Making analytics beautiful, and simpler to use is always a passion for me. With 3000 packages, R can be used for a lot more things and a lot more simply than is commonly understood.
The target audience however is business analysts- or people working in corporate environments.
Ajay Ohri has been working in the field of analytics since 2004 , when it was a still nascent emerging Industries in India. He has worked with the top two Indian outsourcers listed on NYSE,and with Citigroup on cross sell analytics where he helped sell an extra 50000 credit cards by cross sell analytics .He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics .He regularly writes on analytics topics on his web site www.decisionstats.com and is currently working on open source analytical tools like R besides analytical software like SPSS and SAS.
- Skills of a good data miner (zyxo.wordpress.com)
- Data Mining with WEKA (r-bloggers.com)
- How Data Mining Can Help You Score on the First Date (volokh.com)
- Upcoming webinar on investigative analytics (dbms2.com)
- IBM SPSS 19 Now Available to the Global Academic Community via e-academy’s OnTheHub eStore (prweb.com)
JMP 9 releases on Oct 12- it is a very good reliable data visualization and analytical tool ( AND available on Mac as well)
AND IT is advertising R Graphics as well (lol- I can visualize the look on some ahem SAS fans in the R Project)
Updated Pricing- note I am not sure why they are charging US academics 495$ when SAS On Demand is free for academics. Shouldnt JMP be free to students- maybe John Sall and his people can do a tradeoff analysis for this given JMP’s graphics are better than Base SAS (which is under some pressure from WPS and R)
*Offer good in the U.S. only.
From- the mailer-
|Be First in Line for JMP® 9
Save up to $300 when you pre-order a
single-user license by Oct. 11
Make JMP your analytic hub for visual data discovery with this special offer, good through Oct. 11, 2010. Pre-order a single-user license of JMP 9 – for a discount of up to $300 – and get ready for a leap in data interactivity.
Order now and enjoy the compelling new features of JMP 9 when the software is released Oct. 12. New capabilities in JMP 9 let you:
What if I already have a JMP 8 single-user license?
What if I’m an annual license customer?
What if I work or study in the academic world?
Please feel free to forward this offer to interested colleagues.
Got two or more users?
Remember: Act by Oct. 11!
JMP runs on Macintosh and Windows
Here are some thoughts on using existing statistical software for better analytics and/or business intelligence (reporting)-
1) User Interface Design Matters- Most stats software have a legacy approach to user interface design. While the Graphical User Interfaces need to more business friendly and user friendly- example you can call a button T Test or You can call it Compare > Means of Samples (with a highlight called T Test). You can call a button Chi Square Test or Call it Compare> Counts Data. Also excessive reliance on drop down ignores the next generation advances in OS- namely touchscreen instead of mouse click and point.
Given the fact that base statistical procedures are the same across softwares, a more thoughtfully designed user interface (or revamped interface) can give softwares an edge over legacy designs.
2) Branding of Software Matters- One notable whine against SAS Institite products is a premier price. But really that software is actually inexpensive if you see other reporting software. What separates a Cognos from a Crystal Reports to a SAS BI is often branding (and user interface design). This plays a role in branding events – social media is often the least expensive branding and marketing channel. Same for WPS and Revolution Analytics.
3) Alliances matter- The alliances of parent companies are reflected in the sales of bundled software. For a complete solution , you need a database plus reporting plus analytical software. If you are not making all three of the above, you need to partner and cross sell. Technically this means that software (either DB, or Reporting or Analytics) needs to talk to as many different kinds of other softwares and formats. This is why ODBC in R is important, and alliances for small companies like Revolution Analytics, WPS and Netezza are just as important as bigger companies like IBM SPSS, SAS Institute or SAP. Also tie-ins with Hadoop (like R and Netezza appliance) or Teradata and SAS help create better usage.
4) Cloud Computing Interfaces could be the edge- Maybe cloud computing is all hot air. Prudent business planing demands that any software maker in analytics or business intelligence have an extremely easy to load interface ( whether it is a dedicated on demand website) or an Amazon EC2 image. Easier interfaces win and with the cloud still in early stages can help create an early lead. For R software makers this is critical since R is bad in PC usage for larger sets of data in comparison to counterparts. On the cloud that disadvantage vanishes. An easy to understand cloud interface framework is here ( its 2 years old but still should be okay) http://knol.google.com/k/data-mining-through-cloud-computing#
5) Platforms matter- Softwares should either natively embrace all possible platforms or bundle in middle ware themselves.
Here is a case study SAS stopped supporting Apple OS after Base SAS 7. Today Apple OS is strong ( 3.47 million Macs during the most recent quarter ) and the only way to use SAS on a Mac is to do either
or do a install of Ubuntu on the Mac ( https://help.ubuntu.com/community/MacBook ) and do this
Why does this matter? Well SAS is free to academics and students from this year, but Mac is a preferred computer there. Well WPS can be run straight away on the Mac (though they are curiously not been able to provide academics or discounted student copies 😉 ) as per
Does this give a disadvantage based on platform. Yes. However JMP continues to be supported on Mac. This is also noteworthy given the upcoming Chromium OS by Google, Windows Azure platform for cloud computing.