Spring Cleaning – What I wrote

 

A partial list of writings by me over the years

 

1) Big Data Initiatives in Developing Nations

Can big data, open data, and programs such as the Aadhaar Project enhance lives in underprivileged segments of society? March 2015

http://www.ibmbigdatahub.com/blog/big-data-initiatives-developing-nations

2) Downsides Dampen Open-Source Analytics September 2011 http://www.allanalytics.com/author.asp?section_id=1408&doc_id=233454

 

3) KDNuggets – Articles on Data Science

 

  1. Using Python and R together: 3 main approaches December 2015
  2. Interview: Ingo Mierswa, RapidMiner CEO on “Predaction” and Key Turning Points June 2014
  3. Guide to Data Science Cheat Sheets 2014/05/12
  4. Book Review: Data Just Right 2014/04/03
  5. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  6. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  7. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  8. etcML Promises to Make Text Classification Easy 2014/03/05
  9. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02

Programmable Web- Articles on APIs

 

  1. Keen IO Helps Developers Solve Custom Analytics Needs 06-09-2014
  2. Scoreoid Aims to Gamify the World Using APIs 01-27-2014
  3. Plot.ly’s Plot to Visualize More Data 01-22-2014
  4. LumenData’s Acquisition of Algorithms.io is a Win-Win 01-08-2014
  5. Yactraq API Sees Huge Growth in 2013 01-06-2014
  6. Scrape.it Describes a Better Way to Extract Data 12-20-2013
  7. Exclusive Interview: App Store Analytics API 12-04-2013
  8. APIs Enter 3d Printing Industry 11-29-2013
  9. PW Interview: José Luis Martinez of Textalytics 11-06-2013
  10. PW Interview Simon Chan PredictionIO 11-05-2013
  11. PW Interview: Scott Gimpel Founder and CEO FantasyData.com 10-23-2013
  12. PW Interview Brandon Levy, cofounder and CEO of Stitch Labs 10-08-2013
  13. PW Interview: Jolo Balbin Co-Founder Text Teaser 09-18-2013
  14. PW Interview: Bob Bickel CoFounder Redline13 07-29-2013
  15. PW Interview: Brandon Wirtz CTO Stremor.com 07-04-2013
  16. PW Interview: Andy Bartley, CEO Algorithms.io 06-04-2013
  17. PW Interview: Francisco J Martin, CEO BigML.com 05-30-2013
  18. PW Interview: Tal Rotbart Founder- CTO, SpringSense 05-28-2013
  19. PW Interview: Jeh Daruwala CEO Yactraq API, Behavorial Targeting for videos 05-13-2013
  20. PW Interview: Michael Schonfeld of Dwolla API on Innovation Meeting the Payment Web 05-02-2013
  21. PW Interview: Stephen Balaban of Lamda Labs on the Face Recognition API 04-29-2013
  22. PW Interview: Amber Feng, Stripe API, The Payment Web 04-24-2013
  23. PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API 04-22-2013
  24. Google Mirror API documentation is open for developers 04-18-2013
  25. PW Interview: Ricky Robinett, Ordr.in API, Ordering Food meets API 04-16-2013
  26. PW Interview: Jacob Perkins, Text Processing API, NLP meets API 04-10-2013
  27. Amazon EC2 On Demand Windows Instances -Prices reduced by 20% 04-08-2013
  28. Amazon S3 API Requests prices slashed by half 04-03-2013
  29. PW Interview: Stuart Battersby, Chatterbox API, Machine Learning meets Social 04-02-2013
  30. PW Interview: Karthik Ram, rOpenSci, Wrapping all science APIs 03-20-2013
  31. Viralheat Human Intent API- To buy or not to buy 03-13-2013
  32. Interview Tammer Kamel CEO and Founder Quandl 03-07-2013
  33. YHatHQ API: Calling Hosted Statistical Models 03-04-2013
  34. Quandl API: A Wikipedia for Numerical Data 02-25-2013
  35. Amazon Redshift API is out of limited preview and available! 02-18-2013
  36. Windows Azure Media Services REST API 02-14-2013
  37. Data Science Toolkit Wraps Many Data Services in One API 02-11-2013
  38. Diving into Codeacademy’s API Lessons 01-31-2013
  39. Google APIs finetuning Cloud Storage JSON API 01-29-2013
  40. Interview Hilary Mason Chief Scientist bitly 01-28-2013
  41. Interview: Viralheat CEO Raj Kadam on API Growth 01-22-2013
  42. Google Compute API – Affordable Computing at Google Scale 01-17-2013
  43. Ergast API Puts Car Racing Fans in the Driver’s Seat 12-05-2012
  44. Springer APIs- Fostering Innovation via API Contests 11-20-2012
  45. Statistically programming the web – Shiny,HttR and RevoDeploy API 11-19-2012
  46. Google Cloud SQL API- Bigger ,Faster and now Free 11-12-2012
  47. A Look at the Web’s Most Popular API -Google Maps API 10-09-2012
  48. Cloud Storage APIs for the next generation Enterprise 09-26-2012
  49. Last.fm API: Sultan of Musical APIs 09-12-2012
  50. Socrata Data API: Keeping Government Open 08-29-2012
  51. BigML API Gets Bigger 08-22-2012
  52. Bing APIs: the Empire Strikes Back 08-15-2012
  53. Google Cloud SQL: Relational Database on the Cloud 08-13-2012
  54. Google BigQuery API Makes Big Data Analytics Easy 08-07-2012
  55. Your Store in The Cloud -Google Cloud Storage API 08-01-2012
  56. Predict the future with Google Prediction API 07-30-2012
  57. The Romney vs Obama API 07-27-2012

 

StatisticsViews

http://www.statisticsviews.com/details/feature/8868901/A-Tutorial-on-Python.html

 

CONFERENCES AND TALKS

1) Big Data Big Analytics http://krishnarajpm.com/bigdata/abstract.pdf Workshop on Statistical Machine Learning and Game Theory Approaches for Large Scale Data Analysis, 9 July 2012 – 14 July 2012, at Bangalore, India. Sponsored by Mathematical Sciences, Division of Science and Engineering Research Board, Department of Science & Technology, Government of India (sponsored airfare, hotel accommodation, and honorarium).

SLIDES Big data Big Analytics

2) Data Analytics using the Cloud- Challenges and Opportunities for India at 1st International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2014), March 27-28, 2014, VIT University, Chennai, India. Sponsored by BRNS (flight)

http://chennai.vit.ac.in/isbcc/

SLIDES Data analytics using the cloud challenges and opportunities for india from Ajay Ohri

3) Open Source Analytics at OSSCamp 2014 http://osscamp.in/

http://osscamp.in/events/6/open-source-analytics-overview-r-python-and-others

SLIDES- Open source analytics from Ajay Ohri

4) Society for Industrial and Applied Mathematics- Delhi Technological University Evolute 2015 : Annual Symposium Speaker

5) Talk on Analytics as a profession at Indian Institute of Technology Delhi

Learning R and Teaching R from Ajay Ohri

Workshops

Pre-Placement training workshop for Economics Students, Delhi School of Economics.

A Workshop on R from Ajay Ohri

Books

R for Business Analytics http://www.springer.com/us/book/9781461443421

R for Cloud Computing : A Data Science Approach http://www.springer.com/us/book/9781493917013

Revolution Analytics (Microsoft) Corporate Blog

http://blog.revolutionanalytics.com/2011/08/9-more-ways-to-bring-data-into-r.html

http://blog.revolutionanalytics.com/2012/11/using-r-in-the-human-resources-department.html

 

Journal Articles

Journal of Statistical Software

https://www.jstatsoft.org/article/view/v066b04

Technometrics

Technometrics, Vol. 55 (3), August, 2013

http://amstat.tandfonline.com/doi/abs/10.1080/00401706.2013.822219

 

Major Media

I have been cited by Wired Magazine and ReadWriteWeb for espousing a marketplace for algorithms.

http://www.wired.com/2014/08/algorithmia/

http://readwrite.com/2011/06/01/an-app-store-for-algorithms/

 

Interviews (of Ajay Ohri)

  1. Big Step Interview July 2015  Expert Interview with Ajay Ohri on the Importance of Big Data http://blog.bigstep.com/big-data-experts-interviews/expert-interview-with-ajay-ohri-on-the-importance-of-big-data/
  2. AnalyticsVidhya Feb 2015 Interview with Industry expert – Ajay Ohri, Founder, decisionstats.com http://www.analyticsvidhya.com/blog/2015/02/interview-expert-ajay-ohri-founder-decisionstats-com/
  3. AnalyticsIndia Magazine Nov 2012 Interview – Ajay Ohri, Author “R for Business Analytics” http://analyticsindiamag.com/interview-ajay-ohri-author-r-for-business-analytics/
  4. HRTechEurope More R in HR Nov 2012 http://blog.hrtecheurope.com/more-r-in-hr/
  5. Data Mining Research Jan 2011 Data Mining Research interview: Ajay Ohri http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/
  6. AnalyticBridge Apr 2008 Interview with Ajay Ohri, Data Mining Consultant from India http://www.analyticbridge.com/group/interviews/forum/topics/2004291:Topic:11703

Writing on APIs for Programmable Web

I have been writing freelance on APIs for Programmable Web. Here is an updated list of the articles, many of which would be of interest to analytics users. Note: some of these are interviews, and they are in bold. Note to regular readers: I keep updating this list, and each time I update it I bring it to the front page, then let newer blog posts slide it down!

Scoreoid Aims to Gamify the World Using APIs January 27th, 2014

Plot.ly’s Plot to Visualize More Data January 22nd, 2014

LumenData’s Acquisition of Algorithms.io is a Win-Win January 8th, 2014

Yactraq API Sees Huge Growth in 2013  January 6th, 2014

Scrape.it Describes a Better Way to Extract Data December 20th, 2013

Exclusive Interview: App Store Analytics API December 4th, 2013

APIs Enter 3d Printing Industry November 29th, 2013

PW Interview: José Luis Martinez of Textalytics November 6th, 2013

PW Interview Simon Chan PredictionIO November 5th, 2013

PW Interview: Scott Gimpel Founder and CEO FantasyData.com October 23rd, 2013

PW Interview Brandon Levy, cofounder and CEO of Stitch Labs October 8th, 2013

PW Interview: Jolo Balbin Co-Founder Text Teaser  September 18th, 2013

PW Interview: Bob Bickel CoFounder Redline13 July 29th, 2013

PW Interview: Brandon Wirtz CTO Stremor.com July 4th, 2013

PW Interview: Andy Bartley, CEO Algorithms.io  June 4th, 2013

PW Interview: Francisco J Martin, CEO BigML.com 2013/05/30

PW Interview: Tal Rotbart Founder- CTO, SpringSense 2013/05/28

PW Interview: Jeh Daruwala CEO Yactraq API, Behavorial Targeting for videos 2013/05/13

PW Interview: Michael Schonfeld of Dwolla API on Innovation Meeting the Payment Web  2013/05/02

PW Interview: Stephen Balaban of Lamda Labs on the Face Recognition API  2013/04/29

PW Interview: Amber Feng, Stripe API, The Payment Web 2013/04/24

PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API   2013/04/22

Google Mirror API documentation is open for developers   2013/04/18

PW Interview: Ricky Robinett, Ordr.in API, Ordering Food meets API    2013/04/16

PW Interview: Jacob Perkins, Text Processing API, NLP meets API   2013/04/10

Amazon EC2 On Demand Windows Instances -Prices reduced by 20%  2013/04/08

Amazon S3 API Requests prices slashed by half  2013/04/02

PW Interview: Stuart Battersby, Chatterbox API, Machine Learning meets Social 2013/04/02

PW Interview: Karthik Ram, rOpenSci, Wrapping all science APIs 2013/03/20

Viralheat Human Intent API- To buy or not to buy 2013/03/13

Interview Tammer Kamel CEO and Founder Quandl 2013/03/07

YHatHQ API: Calling Hosted Statistical Models 2013/03/04

Quandl API: A Wikipedia for Numerical Data 2013/02/25

Amazon Redshift API is out of limited preview and available! 2013/02/18

Windows Azure Media Services REST API 2013/02/14

Data Science Toolkit Wraps Many Data Services in One API 2013/02/11

Diving into Codeacademy’s API Lessons 2013/01/31

Google APIs finetuning Cloud Storage JSON API 2013/01/29

2012
Ergast API Puts Car Racing Fans in the Driver’s Seat 2012/12/05
Springer APIs- Fostering Innovation via API Contests 2012/11/20
Statistically programming the web – Shiny,HttR and RevoDeploy API 2012/11/19
Google Cloud SQL API- Bigger ,Faster and now Free 2012/11/12
A Look at the Web’s Most Popular API -Google Maps API 2012/10/09
Cloud Storage APIs for the next generation Enterprise 2012/09/26
Last.fm API: Sultan of Musical APIs 2012/09/12
Socrata Data API: Keeping Government Open 2012/08/29
BigML API Gets Bigger 2012/08/22
Bing APIs: the Empire Strikes Back 2012/08/15
Google Cloud SQL: Relational Database on the Cloud 2012/08/13
Google BigQuery API Makes Big Data Analytics Easy 2012/08/05
Your Store in The Cloud -Google Cloud Storage API 2012/08/01
Predict the future with Google Prediction API 2012/07/30
The Romney vs Obama API 2012/07/27

England rule India- again

If you type the words “business intelligence expert” into Google, the top-ranked result may be http://goo.gl/pCqUh, or Peter James Thomas, about as profound a name as can be, since it spans three of the most important saints in the church.

His current post is on a very non-business-intelligence topic, called Wager. http://peterjamesthomas.com/2011/07/20/wager/

It details how Peter, a virtual friend whom I have never met and who looks suspiciously like Hugh Grant with the hair, and Ajay Ohri (myself) made a wager on which cricket team would emerge victorious in the ongoing Test series. It was a 4-match series, and India needed to win the series, or at least avoid losing it by a margin of 2, to retain their world cricket ranking (in Tests) as number 1.

Sadly, at the end of the third Test, the Indian cricket team has lost the series 3-0, and with it the world number 1 ranking and some serious respect.

What is a Test Match? It is a game of cricket played over 5 days.
Why was Ajay so confident India would win? Because India won the one-day world championship in April 2011. A one-day match is a game of cricket completed in a single day.

There lies the problem. From an analytic point of view, I had been lulled into thinking that past performance was an indicator of future performance, indeed the basis of most analytical assumptions. Quite critically, I managed to overlook the following cricketing points-

1) Cricket performance is different from credit performance. It depends on the people and their fitness.

India’s strike bowler Zaheer Khan was out due to injury, and we did not have an adequate replacement for him. India’s best opener, Virender Sehwag, was out with a shoulder injury for the first two Tests.

Moral- Statistics can be misleading if you do not apply recent knowledge coupled with domain expertise (in this case, cricket).

2) What goes up must come down. Indeed, if a team performed at its best two months back, it is a good sign that cyclicality will bring its performance down.

Moral- Do not depend on regression or time series while ignoring cyclical trends.

3) India’s cricket team is aging. England’s cricket team is youthful.

I should have gotten this one right. One of the big and understated reasons that the Indian economy is booming is that we have the youngest population in the world, with a median age of 28.

Or, as http://en.wikipedia.org/wiki/Demographics_of_India puts it:

India has more than 50% of its population below the age of 25 and more than 65% hovers below the age of 35. It is expected that, in 2020, the average age of an Indian will be 29 years, compared to 37 for China and 48 for Japan; and, by 2030, India’s dependency ratio should be just over 0.4

India’s population is 1.21 billion people, so potentially a much larger pool of athletes, once we put away our laptops, that is.

According to http://en.wikipedia.org/wiki/Demographics_of_UK, the total population of the United Kingdom was 58,789,194 (I don’t have numbers for average age).

Paradoxically, India has the oldest cricket team in the world. This calls for detailed investigation, and some old-timers should give way to newcomers after this drubbing.

Moral- Demographics matter. It is the people who vary more than any variable.

4) The Indian cricket team has played much less Test cricket and much more 20:20 and one day matches. 20:20 is a format in which only twenty overs are bowled per side. In Test Matches 90 overs are bowled every day for 5 days.

Stamina is critical in sports.

Moral- Context is important in extrapolating forecasts.
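
To put the gap between formats in rough numbers, here is a small Python sketch of the arithmetic above. The 20:20 and Test figures are the ones quoted in this post; the 50 overs per side for one-day matches is not stated here and is an assumption added for comparison, and the function and constant names are made up for illustration.

```python
# Scheduled overs per match for each cricket format.
OVERS_PER_SIDE = {
    "t20": 20,      # from the post: twenty overs per side
    "one_day": 50,  # assumption: usual one-day international length
}
TEST_OVERS_PER_DAY = 90  # from the post
TEST_DAYS = 5            # from the post


def scheduled_overs(fmt: str) -> int:
    """Return the maximum scheduled overs in one match of the given format."""
    if fmt == "test":
        return TEST_OVERS_PER_DAY * TEST_DAYS
    # Limited-overs formats: each of the two sides bats once.
    return OVERS_PER_SIDE[fmt] * 2


if __name__ == "__main__":
    for fmt in ("t20", "one_day", "test"):
        print(fmt, scheduled_overs(fmt))
    ratio = scheduled_overs("test") / scheduled_overs("t20")
    print(f"A full Test is scheduled for {ratio:.2f}x the overs of a 20:20 match")
```

Under these figures, a full Test is scheduled for more than ten times the overs of a 20:20 game, which is the stamina point in a nutshell. Real matches often finish early or lose overs to weather, so treat these as upper bounds.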

All said and done, the English cricket team played hard and fair and deserves to be number one. I would love to say more on the Indian cricket team, but I now intend to watch Manchester United play soccer.

Carole-Ann’s 2011 Predictions for Decision Management

For Ajay Ohri on DecisionStats.com

What were the top 5 events in 2010 in your field?
  1. Maturity: the Decision Management space was made up of technology vendors, big and small, that typically focused on one or two aspects of this discipline. Over the past few years, we have seen a lot of consolidation in the industry – first with Business Intelligence (BI) then Business Process Management (BPM) and lately in Business Rules Management (BRM) and Advanced Analytics. As a result the giant Platform vendors have helped create visibility for this discipline. Lots of tiny clues finally bubbled up in 2010 to attest to the increasing activity around Decision Management. For example, more products than ever were named Decision Manager; companies advertised for Decision Managers as a job title in their job section; most people understand what I do when I am introduced in a social setting!
  2. Boredom: unfortunately, as the industry matures, inevitably innovation slows down… At the main BRMS shows we heard complaints here and there that the technology was stalling. We heard it from vendors like Red Hat (Drools) and we heard it from bored end-users hoping for some excitement at Business Rules Forum’s vendor panel. They sadly did not get it.
  3. Scrum: I am not thinking about the methodology here! If you have ever seen a rugby game, you can probably understand why this is the term that comes to mind when I look at the messy & confusing technology landscape. Feet blindly try to kick the ball out while superhuman forces are randomly moving the whole pack – or so it felt when I played! Business Users in search of Business Solutions are facing more and more technology choices that feel like comparing apples to oranges. There is value in all of them and each one addresses a specific aspect of Decision Management but I regret that the industry did not simplify the picture in 2010. On the contrary! Many buzzwords were created or at least made popular last year, creating even more confusion on a muddy field. A few examples: Social CRM, Collaborative Decision Making, Adaptive Case Management, etc. Don’t get me wrong, I *do* like the technologies. I sympathize with the decision maker who is trying to pick the right solution though.
  4. Information: Analytics have been used for years of course but the volume of data surrounding us has been growing to unparalleled levels. We can blame or thank (depending on our perspective) Social Media for that. Sites like Facebook and LinkedIn have made it possible and easy to publish relevant (as well as fluffy) information in real-time. As we all started to get the hang of it and potentially over-publish, technology evolved to enable the storage, correlation and analysis of humongous volumes of data that we could not dream of before. 25 billion tweets were posted in 2010. Every month, over 30 billion pieces of data are shared on Facebook alone. This is not just about vanity and marketing though. This data can be leveraged for the greater good. Carlos pointed to some fascinating facts about catastrophic event response teams getting organized thanks to crowd-sourced information. We are also seeing, in the Decision Management world, more and more applicability for those very technologies that have been developed for the needs of Big Data – I’ll name for example Hadoop, which Carlos (yet again) discussed in his talks at Rules Fest at the end of 2009 and 2010.
  5. Self-Organization: it may be a side effect of the Social Media movement but I must admit that I was impressed by the success of self-organizing initiatives.  Granted, this last trend has nothing to do with Decision Management per se but I think it is a great evolution worth noting.  Let me point to a couple of examples.  I usually attend traditional conferences and tradeshows in which the content can be good but is sometimes terrible.  I was pleasantly surprised by the professionalism and attendance at *un-conferences* such as P-Camp (P stands for Product – an event for Product Managers).  When you think about it, it is already difficult to get a show together when people are dedicated to the tasks.  How crazy is it to have volunteers set one up with no budget and no agenda?  Well, people simply show up to do their part and everyone has fun voting on-site for what seems the most appealing content at the time.  Crowdsourcing applied to shows: it works!  Similar experience with meetups or tweetups.  I also enjoyed attending some impromptu Twitter jam sessions on a given topic.  Social Media is certainly helping people reach out and get together in person or virtually and that is wonderful!

What are the top three trends you see in 2011?

  1. Performance:  I might be cheating here.   I was very bullish about predicting much progress for 2010 in the area of Performance Management in your Decision Management initiatives.  I believe that progress was made but Carlos did not give me full credit for the right prediction…  Okay, I am a little optimistic on timeline…  I admit it…  If it did not fully happen in 2010, can I predict it again in 2011?  I think that companies want to better track their business performance in order to correct the trajectory of course but also to improve their projections.  I see that it is turning into reality already here and there.  I expect it to become a trend in 2011!
  2. Insight: Big Data being available all around us with new technologies and algorithms will continue to propagate in 2011 leading to more widely spread Analytics capabilities. The buzz at Analytics shows on Social Network Analysis (SNA) is a sign that there is interest in those kinds of things. There is tremendous information that can be leveraged for smart decision-making. I think there will be more of that in 2011 as initiatives launched in 2010 mature into material results.
  3. Collaboration: Social Media for the Enterprise is a discipline in the making. Social Media was initially seen for the most part as a Marketing channel. Over the years, companies have started experimenting with external communities and ideation capabilities with moderate success. The few strategic initiatives started in 2010 by “old-fashioned” companies seem to be an indication that we are past the early adopters. This discipline may very well materialize in 2011 as a core capability, well, or at least a new trend. I believe that capabilities such as Chatter, offered by Salesforce, will transform (slowly) how people interact in the workplace and leverage the volumes of social data captured in LinkedIn and other Social Media sites. Collaboration is of course a topic of interest for me personally. I even signed up for Kare Anderson’s collaboration collaboration site – yes, twice the word “collaboration”: it is really about collaborating on collaboration techniques. Even though collaboration does not require Social Media, this medium offers perspectives not available until now.

Brief Bio-

Carole-Ann is a renowned guru in the Decision Management space. She created the vision for Decision Management that is widely adopted now in the industry. Her claim to fame is the strategy and direction of Blaze Advisor, the then-leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience. She speaks often at Industry conferences and has conducted university classes in France and Washington DC.

Leveraging her Master’s degree in Applied Mathematics / Computer Science from a “Grande Ecole” in France, she started her career building advanced systems using all kinds of technologies (expert systems, rules, optimization, dashboarding and cubes, web search, and a beta version of database replication), as well as conducting strategic consulting gigs around change management.

She now tweets as @CMatignon, blogs at blog.sparklinglogic.com and interacts at community.sparklinglogic.com.

She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication.  At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs mostly around change management.

While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM).  She developed a growing interest in Optimization as well as Business Rules.  At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart.  She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).

Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business.  Her technical background kept her very much in touch with technology as she advanced.

She also became addicted to Twitter in the process.  She is active on all kinds of social media, always looking for new digital experience!

Outside of work, Carole-Ann loves spending time with her two boys.  They grow fruits in their Northern California home and cook all together in the French tradition.

Interview Ajay Ohri Decisionstats.com with DMR

From-

http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stats!

Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

Ajay Ohri (AO): I am a business consultant and writer based out of Delhi, India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies, primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com, which now has almost 10,000 views monthly.

All in all, I write about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education (a master’s in business and a bachelor’s in mechanical engineering).

My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
In business, my research interests are software marketing strategies (open source, software as a service, advertising supported versus traditional licensing) and the creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore, India).

DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

AO: As per my understanding, SAS stands for SAS language, SAS Institute and SAS software platform. The terms are interchangeably used by people in industry and academia- but there have been some branding issues on this.
I have not worked much with SAS Enterprise Miner, probably because I could not afford it as a business consultant, and the organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, SAS ETS, and JMP. I also worked with SAS BI, but as a user to extract information.
You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt for knowledge discovery and data mining, and pattern analysis. Again, some of my SAS experience is a bit dated, from almost a year ago.

I really like specific parts of the SAS platform, such as the interface design of JMP (which is better than Enterprise Guide or Base SAS) and Proc Sort in Base SAS. I guess sequential processing of data makes SAS way faster, though with computing evolving from desktops and servers to even cheaper time-shared cloud computers, I am not sure how long Base SAS and SAS Stat can hold this unique selling proposition.

I dislike the clutter in SAS Stat output; it confuses me with too much information. I also dislike the shoddy graphics in the rendering output of the SAS graphical engine. It’s shoddy coding work in SAS/Graph, and if JMP can give better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

I sometimes think the best part of SAS is actually the code written by Goodnight and Sall in the 1970s; the latest procs don’t impress me much.

SAS as a company is something I admire, especially for its way of treating employees globally, but it is strange to see the rest of the tech industry not following it. Also, I don’t like over-aggression and the “SAS versus the rest of the analytics/data mining world” mentality that I sometimes pick up when I deal with industry thought leaders.

My wishlist would be SAS Enterprise Miner, JMP, and Base SAS offered in a completely new web interface priced at per-hour rates, but I guess I am a bit sentimental here: most data miners I know from the early 2000s did start with SAS as their first bread-earning software. Also, I think SAS needs to be better priced in Business Intelligence. It seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.

If you are a new stats or business student, chances are you know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

I like a lot of things in R, from graphics to better data mining packages and the modular design of the software, but above all I like the can-do, kick-ass spirit of the R community. Lots of young people are collaborating with lots of young-to-old professors, and the energy is infectious. Everybody is a CEO in R’s world. The latest data mining algorithms will probably start in R and be published in journals.

Which is better for data mining, SAS or R? It depends on your data and your deadline. The golden rule of management and business is: it depends.

Also I have worked with a lot of KXEN, SQL, SPSS.

DMR: Can you tell us more about Decision Stats? You had traffic of 120,000 for 2010. How did you reach such a success?

AO: I don’t think 120,000 is a success. It’s not a failure. It just happened: the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back linking, and I did some black hat experimental stuff. Some of it worked, some didn’t.

In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank and honest about their views, while middle managers, public relations and marketing folks can be defensive.

Social media helped a bit. Twitter, LinkedIn and Facebook really helped my network of friends, who I suppose acted as informal ambassadors to spread the word.
Again, I was constrained more by necessity than by choice: by my middle-class finances (I also had a baby son in 2007, and my current laptop still has some broken keys :) ), by my inability to afford traveling to conferences, and by my location, as Delhi isn’t really a tech hub.

The more questions I asked around the internet, the more people responded, and I wrote it all down.

I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.

I tried building other websites but didn’t succeed, so I guess I really don’t know. I am not a smart coder, and not very clever at writing, but I do try to be honest.

Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.

DMR: There is a rumor about an R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

AO: I just signed a contract with Springer for “R for Business Analytics”. R is great software, and there are lots of books for statistically trained people, but I felt like writing a book for MBAs and existing analytics users on how to easily transition to R for analytics.

Like any language there are tricks and tweaks in R, and with a focus on code editors, IDE, GUI, web interfaces, R’s famous learning curve can be bent a bit.

Making analytics beautiful, and simpler to use is always a passion for me. With 3000 packages, R can be used for a lot more things and a lot more simply than is commonly understood.
The target audience however is business analysts- or people working in corporate environments.

Brief Bio-
Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his web site www.decisionstats.com and is currently working on open source analytical tools like R besides analytical software like SPSS and SAS.
