Quantifying Analytics ROI


I had a brief Twitter exchange with Jim Davis, Chief Marketing Officer, SAS Institute, on Return on Investment for business analytics projects for customers. I interviewed Jim Davis last year: https://decisionstats.com/2009/06/05/interview-jim-davis-sas-institute/

Now Jim Davis is a big guy, and he was rushing from the launch of SAS Institute’s Social Media Analytics in Japan, through some arguably difficult flying conditions, to be home in America in time for Thanksgiving. That, and I have not been much of a good Blog Boy recently, more swayed by love of open source than love of software per se. I love both equally, given that I am bad at both equally.

Anyway, Jim’s contention (http://twitter.com/Davis_Jim) was that customers should take up business analytics only if there is a positive Return on Investment. I am quoting him here:

What is important is that there be a positive ROI on each and every BA project. Otherwise don’t do it.

That’s not the marketing I was taught in my business school- basically it was sell, sell, sell.

However, I see most BI vendors also go through the “let me meet my sales quota for this quarter” routine, and quantifying customer ROI is simpler maths than predictive analytics, yet there seems to be some information asymmetry around it.
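
As a back-of-the-envelope illustration of that arithmetic, here is a minimal R sketch; the cost and benefit figures are entirely hypothetical and not drawn from any vendor or study.

    # Hypothetical, illustrative numbers only - not from any vendor or study
    license_cost   <- 200000   # upfront licensing
    services_cost  <- 150000   # implementation, training, migration
    annual_benefit <- 120000   # estimated yearly savings or extra revenue
    years          <- 3

    total_cost    <- license_cost + services_cost
    total_benefit <- annual_benefit * years
    roi <- (total_benefit - total_cost) / total_cost
    roi   # about 0.03, i.e. roughly a 3% return over three years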

Here is a paper from Northwestern University on ROI in IT projects.

But overall it would be in the interest of both customers and business analytics vendors to publish aggregated ROI figures.

The opponents of this transparency in ROI would be the market-share leaders, who have locked in their customers through high migration costs (due to complexity) or through contracts.

A recent study found Oracle had a large percentage of unhappy customers who would still renew! SAP had problems when it raised licensing prices arbitrarily (that CEO is now CEO of HP and dodging legal notices from Oracle).

Indeed, Jim Davis’s famous, unsettling call to focus on Business Analytics because Business Intelligence is dead has been implemented more aggressively by IBM, through analytical acquisitions, than by SAS itself, which has been conservative about inorganic growth.

Quantifying ROI should theoretically aid open source software the most (since it is cheapest in upfront licensing) and newer technologies like MapReduce/Hadoop (since they are so fast), but I think the market has a way of factoring these things in, and customers are neither foolish nor unaware of the costs versus benefits of migration.

The counterpoint is that Business Analytics and Business Intelligence are imperfect markets, with duopolies or a handful of big players thriving in the absence of regulation that protects customers.

You get more protection as the customer of a $20 bag of potato chips than as the customer of $200,000 worth of software. Regulators are wary of stepping in to ensure ROI fairness (since most bright techies are either working in the private sector, running their own startups, or invested in startups). Who in government understands analytics and intelligence well enough to ensure that vendor lock-in is prevented and market flexibility preserved? Ensuring ROI on enterprise software is also a lower priority for embattled regulators, unlike the aggressiveness they have shown toward retail and online software.

Who will analyze the analysts, and who can quantify the value of quants (or penalize them for shoddy quantitative analytics)? That is a question we expect to see more of.

Summer School on Uncertainty Quantification


SAMSI/Sandia Summer School on Uncertainty Quantification – June 20-24, 2011

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

The utilization of computer models for complex real-world processes requires addressing Uncertainty Quantification (UQ). Corresponding issues range from inaccuracies in the models to uncertainty in the parameters or intrinsic stochastic features.

This summer school will expose students in the mathematical and statistical sciences to common challenges in developing, evaluating and using complex computer models of processes. It is essential that the next generation of researchers be trained on these fundamental issues, which are too often absent from traditional curricula.

Participants will receive not only an overview of the fast developing field of UQ but also specific skills related to data assimilation, sensitivity analysis and the statistical analysis of rare events.

Theoretical concepts and methods will be illustrated on concrete examples and applications from both nuclear engineering and climate modeling.

The main lecturers are:
Dan Cacuci (N.C. State University): data assimilation and applications to nuclear engineering

Dan Cooley (Colorado State University): statistical analysis of rare events
This short course will introduce the current statistical practice for analyzing extreme events. Statistical practice relies on fitting distributions suggested by asymptotic theory to a subset of data considered to be extreme. Both block maximum and threshold exceedance approaches will be presented for both the univariate and multivariate cases.
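
For readers who want a feel for the threshold-exceedance approach, here is a minimal base-R sketch: it fits an exponential distribution to the exceedances (the shape-zero special case of the generalized Pareto) and derives a return level. A real analysis would use a dedicated package such as evd or extRemes to fit the full GEV/GPD models covered in the course, and the simulated data below are purely illustrative.

    set.seed(1)
    x <- rgamma(5000, shape = 2, scale = 10)   # stand-in for daily observations

    u    <- quantile(x, 0.95)   # threshold: empirical 95th percentile
    exc  <- x[x > u] - u        # exceedances above the threshold
    zeta <- mean(x > u)         # estimated P(X > u)

    # MLE for an exponential model of the exceedances (a GPD with shape = 0)
    rate <- 1 / mean(exc)

    # Return level exceeded on average once every m observations:
    # solve zeta * exp(-rate * (x_m - u)) = 1/m for x_m
    m   <- 365 * 10             # roughly a 10-year return period for daily data
    x_m <- u + log(zeta * m) / rate
    x_m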

Doug Nychka (NCAR): data assimilation and applications in climate modeling
Climate prediction and modeling do not incorporate geophysical data in the same sequential manner as weather forecasting, and comparison to data is typically based on accumulated statistics, such as averages. This arises because a climate model matches the state of the Earth’s atmosphere and ocean “on the average”, and so one would not expect the detailed weather fluctuations to be similar between a model and the real system. An emerging area for climate model validation and improvement is the use of data assimilation to scrutinize the physical processes in a model using observations on shorter time scales. The idea is to find a match between the state of the climate model and observed data that is particular to the observed weather. In this way one can check whether short-time physical processes such as cloud formation or the dynamics of the atmosphere are consistent with what is observed.
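
As a toy illustration of that idea (nudging a model forecast toward observations, weighted by the relative uncertainties), here is a minimal scalar Kalman-style update in R; the dynamics, error variances and signal are invented for the example and have nothing to do with any actual climate model.

    set.seed(42)
    n     <- 50
    truth <- sin(seq(0, 4 * pi, length.out = n))   # the "real" system
    obs   <- truth + rnorm(n, sd = 0.3)            # noisy observations
    model <- numeric(n)                            # assimilated model state
    P <- 1        # assumed forecast error variance (held fixed here;
                  # a full Kalman filter would also update it)
    R <- 0.3^2    # observation error variance

    for (t in 2:n) {
      # forecast step: a deliberately imperfect persistence-plus-drift model
      forecast <- model[t - 1] + 0.05
      # analysis step: blend forecast and observation by their uncertainties
      K        <- P / (P + R)                      # Kalman gain
      model[t] <- forecast + K * (obs[t] - forecast)
    }

    cor(model[-1], truth[-1])   # assimilated state tracks the observed signal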

Dongbin Xiu (Purdue University): sensitivity analysis and polynomial chaos for differential equations
This lecture will focus on numerical algorithms for stochastic simulations, with an emphasis on the methods based on generalized polynomial chaos methodology. Both the mathematical framework and the technical details will be examined, along with performance comparisons and implementation issues for practical complex systems.
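
Here is a minimal sketch of the generalized polynomial chaos idea in R, assuming a single standard-normal input: expand the model output in probabilists’ Hermite polynomials and read the mean and variance off the coefficients. The toy model exp(X) is chosen because its exact moments are known, so the Monte Carlo estimates can be checked.

    set.seed(7)
    x <- rnorm(1e5)   # standard normal input
    y <- exp(x)       # toy "model" output to be expanded

    # Probabilists' Hermite polynomials He_0 .. He_3 evaluated at x
    He <- cbind(1, x, x^2 - 1, x^3 - 3 * x)

    # gPC coefficients: c_k = E[Y * He_k(X)] / k!   (since E[He_k(X)^2] = k!)
    k_fact <- factorial(0:3)
    coefs  <- colMeans(y * He) / k_fact

    pc_mean <- coefs[1]                        # mean = c_0
    pc_var  <- sum(coefs[-1]^2 * k_fact[-1])   # variance from the expansion

    c(pc_mean, exp(0.5))         # estimated vs exact mean
    c(pc_var, exp(2) - exp(1))   # truncated vs exact variance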

The main lectures will be supplemented by discussion sessions and by presentations from UQ practitioners from both the Sandia and Los Alamos National Laboratories.

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

Why do bloggers blog?


Step 1 is to find the internal motivation to create a blog in the first place.

Step 2 is to find what to write about.

Reasons Bloggers Blog-

Basic- Ranting

Examples- I hate how the Facebook Platform team treats me badly with waits and breaks my code.

SAS Marketing won’t give me a big discount to make me look good in front of my boss.

Companies won’t give me their software for free- even though I will use it to make money (and not play Xbox).

I want my vendors to be FOSS but my customers to switch to SaaS.

Google won’t do this- Apple won’t do that- Microsoft won’t do those.

Revolution will give me 4 great packages but not the open source for RevoScaleR (which only 300 people would understand in the first place).

Safety-

I had better kiss up to the Professor and give him a turkey for dinner, as he sits on my thesis committee.

I will recommend Prof X’s lousy book in the hope he recommends my lousy book as a textbook too.

It is safe to laugh when the boss is making a joke- I should comment on her corporate blog and retweet her.

Belonging-

I belong to this great online community of smart people. Let me agree to what they say.

I really believe in EVERYTHING that ALL the 2 MILLION members of the community have to say ALL the TIME.

I belong to this online community because all my friends are on my computer.

Egoistic-

My blog page rank is now X plus delta tau because of sugary key words (2004)

My technorati numbers rise (2005)

I was once on Digg (2007)

I have Z * exp N followers on Twitter and even more on Facebook (2008)

My Klout is increasing on Twitter, and my Stack Overflow reputation’s cup floweth over. (2009)

My Karma on Reddit is more important than my Karma in real life (2010)

Self Actualization-

I have time to kill- and I think I may learn more, meet interesting people and discover something wandering on the internet.

All those who wander are not lost- Wikiquote

I have a story to tell, poems to write, code to give away. A free blog is something a Chinese, an Iranian and a North Korean really, really know the value of.

But after all that, WHY Do Bloggers Blog?

  • Because we are still waiting for Facebook to create the Blog Killer.
  • It’s better than saying I am unemployed and a social loner
  • Reddit Karma feels good. Any Karma of any kind.

Happy Thanksgiving Id

http://en.wikipedia.org/wiki/Eid_al-Adha

Eid al-Adha (Arabic: عيد الأضحى‎ ‘Īdu l-’Aḍḥā), or “Festival of Sacrifice” or “Greater Eid”, is an important religious holiday celebrated by Muslims worldwide to commemorate the willingness of Abraham (Ibrahim) to sacrifice his son Ishmael (Isma’il) as an act of obedience to God, before God intervened to provide him with a ram (uncastrated male sheep) to sacrifice instead.[1]

The meat is divided into three parts to be distributed to others. The family retains one third of the share, another third is given to relatives, friends and neighbors, and the other third is given to the poor & needy.

Eid al-Adha is the latter of two Eid festivals celebrated by Muslims, whose basis comes from Sura 2 (Al-Baqara) Ayah 196 in the Qur’an.

The incident between Abraham and God is also mentioned in the Old Testament.

In 1431 (Islamic calendar), Eid al-Adha fell on November 16, 2010.

http://en.wikipedia.org/wiki/Binding_of_Isaac

The Binding of Isaac, in Genesis 22:1-24 is a story from the Hebrew Bible in which God asks Abraham to sacrifice his son, Isaac, on Mount Moriah.

The narration is referred to as the Akedah (עקדה) or Akedat Yitzchak (עקידת יצחק) in Hebrew and as the Dhabih (ذبيح) in Arabic. The sacrifice itself is called an Olah in Hebrew — for the significance of sacrifices, especially in Biblical times, see korban.

Thanksgiving

http://en.wikipedia.org/wiki/Thanksgiving#cite_note-Encyclop.C3.A6dia_Britannica-0

Thanksgiving Day is a harvest festival celebrated primarily in the United States and Canada.

Thanksgiving was a holiday to express thankfulness, gratitude, and appreciation to God, family and friends for the material possessions and relationships with which all have been blessed.

Traditionally, it has been a time to give thanks for a bountiful harvest. This holiday has since moved away from its religious roots.

Note from Ajay-

Goats are slaughtered on Id and Turkeys on Thanksgiving

Happy Holidays to you.

Related Articles-

https://decisionstats.com/2010/09/18/happy-yom-kippur/

http://www.oyate.org/resources/shortthanks.html

(Id is a holiday in secular India, as we celebrate minority festivals- by constitutional law.)

Statistical Analysis with R- by John M Quick

I was asked to be a technical reviewer for John M Quick’s new R book “Statistical Analysis with R” from Packt Publishing some months ago (very much to my surprise, I confess).

I agreed- and technical reviewer work does take time. It’s like being a midwife, with a whole team trying to get the book to birth.

Statistical Analysis with R is a Beginner’s Guide, so it has nice screenshots, simple case studies, and quizzes to check the student’s or reader’s recall. I remember struggling with the official “beginner’s guide to R”, so this one is different in that it presents a story of a Chinese army and how to use R to plan resources to fight the battle. It’s recommended especially for undergraduate courses- R need not be an elitist language- and given my experience with Asian programming acumen, I am sure it is a matter of time before high schools in India teach basic R in the final years (I learnt quite a shitload of quantum physics as compulsory topics in Indian high school- but I guess we didn’t have Jersey Shore type things to do).

Congrats to author Mr John M Quick- he is doing his PhD in education at ASU- and I am sure both he and his approach to making education simple, informative and fun will go places.

Only bad thing- the name “Statistical Analysis with R” is shared by at least three other books, but I guess Google will catch up to it.

The book is here- https://www.packtpub.com/statistical-analysis-with-r-beginners-guide/book

Amazon goes HPC and GPU: Dirk E to revise his R HPC book


Amazon just gave us tech geek lizards a cluster of a Christmas present- before Google could out-Google them with the end of the betas (cough- it’s under NDA).

Clusters used by Academic Departments now have a great chance to reduce cost without downsizing- but only if the CIO gets the email.

While Professor Goodnight of SAS / North Carolina State University is still playing time-sharing versus mind-sharing games with analytical birdies, his $70 million server farm, set up this past February, is about to get ready.

(I heard they got public subsidies for the environment- but that’s historic for SAS- taking public things private, right Prof? SAS itself began as a publicly funded project, and that was in the 1960s, when they didn’t even have lobbyists.)

In related R news, Dirk E has been thinking of an R HPC book without paying attention to Amazon, but would now have to include Amazon.

(He has been thinking of writing that book for 5 years, but hey, he’s got a day job, consulting gigs with Revo, photo ops at Google, a blog, and packages to maintain without binaries. Dirk E, we await thy book with bated breath.)

Who’s Dirk E? Well, http://dirk.eddelbuettel.com/ is like the Terminator of the R project (in terms of unpronounceable surnames).

Back to the cause du jour-

From http://aws.amazon.com/ec2/hpc-applications/, minus the corporate buzzwords.

Unique to Cluster Compute and Cluster GPU instances is the ability to group them into clusters of instances for use with HPC applications. This is particularly valuable for those applications that rely on protocols like Message Passing Interface (MPI) for tightly coupled inter-node communication.

Cluster Compute and Cluster GPU instances function just like other Amazon EC2 instances but also offer the following features for optimal performance with HPC applications:

  • When run as a cluster of instances, they provide low latency, full bisection 10 Gbps bandwidth between instances. Cluster sizes up through and above 128 instances are supported.
  • Cluster Compute and Cluster GPU instances include the specific processor architecture in their definition to allow developers to tune their applications by compiling applications for that specific processor architecture in order to achieve optimal performance.

The Cluster Compute instance family currently contains a single instance type, the Cluster Compute Quadruple Extra Large with the following specifications:

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

The Cluster GPU instance family currently contains a single instance type, the Cluster GPU Quadruple Extra Large with the following specifications:

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
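
For R users (the Dirk E audience), the quickest way to put the 8 cores of a cc1.4xlarge-class box to work is explicit cluster parallelism of the kind sketched below; this is a generic snow-style example, not anything Amazon-specific, and a true cluster-of-instances MPI setup would additionally need something like Rmpi configured across the nodes.

    library(snow)   # a workhorse cluster package for R HPC circa 2010

    # one worker per core on an 8-core Cluster Compute style instance
    cl <- makeCluster(8, type = "SOCK")

    # embarrassingly parallel example: bootstrap a mean 10,000 times
    x_data    <- rnorm(1e4)
    boot_mean <- function(i) mean(sample(x_data, replace = TRUE))
    clusterExport(cl, "x_data")
    res <- parSapply(cl, 1:10000, boot_mean)

    quantile(res, c(0.025, 0.975))   # bootstrap confidence interval
    stopCluster(cl)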


Sign Up for Amazon EC2

Interview James Dixon Pentaho

Here is an interview with James Dixon, the founder of Pentaho and its self-confessed Chief Geek and CTO. Pentaho has been growing very rapidly, and it makes open source Business Intelligence solutions- basically the biggest chunk of the enterprise software market currently.

Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read: non open source)? Do the Oracle lawsuits over Java bother you from a business perspective?

James-

Pentaho has a full suite of BI software:

* ETL: Pentaho Data Integration

* Reporting: Pentaho Reporting for desktop and web-based reporting

* OLAP: Mondrian ROLAP engine, and Analyzer or Jpivot for web-based OLAP client

* Dashboards: CDF and Dashboard Designer

* Predictive Analytics: Weka

* Server: Pentaho BI Server, handles web-access, security, scheduling, sharing, report bursting etc

We have all of the standard BI functionality.

The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java, a lot of resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages.

Describe the latest Pentaho for Hadoop offering and Hadoop/Hive’s advantage over, say, MapReduce and SQL.

James- The coolest thing is that everything is pluggable:

* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

* Reporting: New content types and report objects can be added. New data sources can be added.

* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

This means it is very easy for Pentaho, partners, customers, and community members to extend our software to do new things.

In addition every engine and component can be fully embedded into a desktop or web-based application. I made a youtube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE

Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.
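
(For contrast, this is roughly the kind of hand-coded job a graphical ETL environment spares you from writing- a Hadoop Streaming word count sketched in R, with a mapper and reducer that read stdin and write tab-separated key/value pairs to stdout. File names and the streaming invocation details are illustrative, not Pentaho’s.)

    #!/usr/bin/env Rscript
    # mapper.R - emit "word<TAB>1" for every word read from stdin
    con <- file("stdin", open = "r")
    while (length(line <- readLines(con, n = 1)) > 0) {
      words <- unlist(strsplit(tolower(line), "[^a-z0-9]+"))
      words <- words[nchar(words) > 0]
      if (length(words)) cat(sprintf("%s\t1\n", words), sep = "")
    }
    close(con)

    #!/usr/bin/env Rscript
    # reducer.R - input arrives sorted by key, so sum each run of identical keys
    con <- file("stdin", open = "r")
    current <- NULL; total <- 0
    while (length(line <- readLines(con, n = 1)) > 0) {
      kv  <- strsplit(line, "\t", fixed = TRUE)[[1]]
      key <- kv[1]; val <- as.numeric(kv[2])
      if (!is.null(current) && key != current) {
        cat(current, total, sep = "\t"); cat("\n")
        total <- 0
      }
      current <- key
      total   <- total + val
    }
    if (!is.null(current)) { cat(current, total, sep = "\t"); cat("\n") }
    close(con)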

Ajay-  Do you think the combination of recession, outsourcing, cost cutting, and unemployment is a suitable environment for companies to cut technology costs by going outside their usual vendor lists and trying open source for a change or for test projects?

James- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software? Please feel free to be as specific as you like.

James- We have all of the everyday, standard reporting features covered.

Over the years the old tools, like Crystal Reports, have become bloated and complicated.

We don’t aim to have 100% of their features, because we’d end up just as complicated.

The 80:20 rule applies here. 80% of the time people only use 20% of their features.

We aim for 80% feature parity, which should cover 95-99% of typical use cases.

Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka. Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server)-

Any  R plans for Pentaho as well?

James- The feature set of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)

About-

As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.