Tale of Two Analytical Interfaces

Occam’s razor (or Ockham’s razor[1]) is often expressed in Latin as the lex parsimoniae(translating to the law of parsimonylaw of economy or law of succinctness). The principle is popularly summarized as “the simplest explanation is more likely the correct one.

Using a simple screenshot- you can see Facebook Analytics for a Facebook page is simpler at explaining who is coming to visit rather than Google Analytics Dashboard (which has not seen the attention of a Visual UI or Graphic Redesign)

And if Facebook is going to take over the internet, well it is definitely giving better analytics in the process. What do you think?

Which Interface is simpler- and gives you better targeting. Ignore the numbers and just see the metrics measured and the way they are presented. Coincidently R is used at Facebook a lot (which has given the jjplot package)- and Google has NOT INVESTED MAJOR MONEY in creating Premium R Packages or Big Data Packages. I am talking investment at the scale Google is known for- not measly meetups.

(the summer of code dont count- it is for students mostly)

(but thanks for the Pizza G Men- and maybe revise that GA interface by putting a razor to some metrics)

GA vs Facebook Analytics


New Google Ad Planner

Dusan's User Interface challenge
Image by moggs oceanlane via Flickr

The new Google Ad Planner is really nice-seems better than old Adwords interface, though needs a UI redesign before it can complete with the clean cut slice and dice of Facebook Ad Planner.

It’s the interface, stupid that makes an Iphone sell more than the Symbian even with 90% functionality. Same reasons why Google Storage is okay but Google Prediction API gets slower liftoff than Amazon Console (now with FREE instances) – though the R interface to Prediction API sure helps.

Prediction API is a terrific tool dying for oxygen out there (and will end up like Wave- I hope not)

Sometimes you need artists as well as engineers to design query tools, G Men- and guess the Double Click anti trust rumours have quietened down enough because why the heck did double click interface integration take so loooong.

( and btw why cant Google just get into the multi billion dashboard business if they can manage ALL the data IN THE INTERNET ——they sure can do it for specific companies- – but wait-

they are probably waiting for AsterData to stop sucking thumbs ,chanting on MapReduce SQL,  MapReduce SQL nursery rhymes and start inventing NEW STUFF again (or atleast creating two product brands from nCluster (when you and I were in school together giggle)

Btw the time Google make up their mind to enter BI or wait for Aster to finish- IBM would have gulped and burped all there it is- and thats the way that market rolls.

Back to Ad s and Mad Men.

Here are some screenshots-of the new Google Ad Planner-

I found it useful to review traffic for third party websites (even better than Google Trends) and thats a definite plus over Facebooks closed dormitory world of ads.

Click on them for some more views or go straight to http://google.com/adplanner and Enjoy Baby!

Which websites attract your target customers?

View a site listing: 

Ad Planner top 1,000 sites

Refine your online advertising with DoubleClick Ad Planner, a free media planning tool that can help you:

Identify websites your target customers are likely to visit

  • Define audiences by demographics and interests.
  • Search for websites relevant to your target audience.
  • Access unique users, page views, and other data for millions of websites from over 40 countries.

Easily build media plans for yourself or your clients

  • Create lists of websites where you’d like to advertise.
  • Generate aggregated website statistics for your media plan.


Take charge of your DoubleClick Ad Planner site listing

View a site listing: 

Ad Planner top 1,000 sites

DoubleClick Ad Planner is a media planning tool where advertisers find sites for their media buys. As a site owner, you can access the DoubleClick Ad Planner Publisher Center and
Market your site
Write a site description to present your audience and unique value to advertisers.
Help advertisers search for you
Choose categories for your site and ad formats you support.
Improve the data that advertisers see
Share your Google Analytics data to reflect the most accurate traffic numbers for your site.


Interview James Dixon Pentaho

Here is an interview with James Dixon the founder of Pentaho, self confessed Chief Geek and CTO. Pentaho has been growing very rapidly and it makes open source Business Intelligence solutions- basically the biggest chunk of enterprise software market currently.

Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read non open source). Do the Oracle lawsuits over Java bother you from a business perspective?


Pentaho has a full suite of BI software:

* ETL: Pentaho Data Integration

* Reporting: Pentaho Reporting for desktop and web-based reporting

* OLAP: Mondrian ROLAP engine, and Analyzer or Jpivot for web-based OLAP client

* Dashboards: CDF and Dashboard Designer

* Predictive Analytics: Weka

* Server: Pentaho BI Server, handles web-access, security, scheduling, sharing, report bursting etc

We have all of the standard BI functionality.

The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java a lot resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages.

Describe the latest Pentaho for Hadoop offering and Hadoop/HIVE ‘s advantage over say Map Reduce and SQL.

James- The coolest thing is that everything is pluggable:

* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

* Reporting: New content types and report objects can be added. New data sources can be added.

* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

This means it is very easy for Pentaho, partners, customers, and community member to extend our software to do new things.

In addition every engine and component can be fully embedded into a desktop or web-based application. I made a youtube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE

Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.

Ajay-  Do you think the combination of recession, outsourcing,cost cutting, and unemployment are a suitable environment for companies to cut technology costs by going out of their usual vendor lists and try open source for a change /test projects.

Jamie- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software. Please feel free to be as specific.

James- We have all of the everyday, standard reporting features covered.

Over the years the old tools, like Crystal Reports, have become bloated and complicated.

We don’t aim to have 100% of their features, because we’d end us just as complicated.

The 80:20 rule applies here. 80% of the time people only use 20% of their features.

We aim for 80% feature parity, which should cover 95-99% of typical use cases.

Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka. Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server)-

Any  R plans for Pentaho as well?

James- The feature set of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)


As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.

R Apache – The next frontier of R Computing

I am currently playing/ trying out RApache- one more excellent R product from Vanderbilt’s excellent Dept of Biostatistics and it’s prodigious coder Jeff Horner.

The big ninja himself

I really liked the virtual machine idea- you can download a virtual image of Rapache and play with it- .vmx is easy to create and great to share-


Basically using R Apache (with an EC2 on backend) can help you create customized dashboards, BI apps, etc all using R’s graphical and statistical capabilities.

What’s R Apache?

As  per


Rapache embeds the R interpreter inside the Apache 2 web server. By doing this, Rapache realizes the full potential of R and its facilities over the web. R programmers configure appache by mapping Universal Resource Locaters (URL’s) to either R scripts or R functions. The R code relies on CGI variables to read a client request and R’s input/output facilities to write the response.

One advantage to Rapache’s architecture is robust multi-process management by Apache. In contrast to Rserve and RSOAP, Rapache is a pre-fork server utilizing HTTP as the communications protocol. Another advantage is a clear separation, a loose coupling, of R code from client code. With Rserve and RSOAP, the client must send data and R commands to be executed on the server. With Rapache the only client requirements are the ability to communicate via HTTP. Additionally, Rapache gains significant authentication, authorization, and encryption mechanism by virtue of being embedded in Apache.

Existing Demos of Architechture based on R Apache-

  1. http://rweb.stat.ucla.edu/ggplot2/ An interactive web dashboard for plotting graphics based on csv or Google Spreadsheet Data
  2. http://labs.dataspora.com/gameday/ A demo visualization of a web based dashboard system of baseball pitches by pitcher by player 








3. http://data.vanderbilt.edu/rapache/bbplot For baseball results – a demo of a query based web dashboard system- very good BI feel.

Whats coming next in R Apache?

You can  download version 1.1.10 of rApache now. There
are only two significant changes and you don’t have to edit your
apache config or change any code (just recompile rApache and

1) Error reporting should be more informative. both when you
accidentally introduce errors in the Apache config, and when your code
introduces warnings and errors from web requests.

I’ve struggled with this one for awhile, not really knowing what
strategy would be best. Basically, rApache hooks into the R I/O layer
at such a low level that it’s hard to capture all warnings and errors
as they occur and introduce them to the user in a sane manner. In
prior releases, when ROutputErrors was in effect (either the apache
directive or the R function) one would typically see a bunch of grey
boxes with a red outline with a title of RApache Warning/Error!!!.
Unfortunately those grey boxes could contain empty lines, one line of
error, or a few that relate to the lines in previously displayed
boxes. Really a big uninformative mess.

The new approach is to print just one warning box with the title
“”Oops!!! <b>rApache</b> has something to tell you. View source and
read the HTML comments at the end.” and then as the title implies you
can read the HTML comment located at the end of the file… after the
closing html. That way, you’re actually reading how R would present
the warnings and errors to you as if you executed the code at the R
command prompt. And if you don’t use ROutputErrors, the warning/error
messages are printed in the Apache log file, just as they were before,
but nicer 😉

2) Code dispatching has changed so please let me know if I’ve
introduced any strange behavior.

This was necessary to enhance error reporting. Prior to this release,
rApache would use R’s C API exclusively to build up the call to your
code that is then passed to R’s evaluation engine. The advantage to
this approach is that it’s much more efficient as there is no parsing
involved, however all information about parse errors, files which
produced errors, etc. were lost. The new approach uses R’s built-in
parse function to build up the call and then passes it of to R. A
slight overhead, but it should be negligible. So, if you feel that
this approach is too slow OR I’ve introduced bugs or strange behavior,
please let me know.


I’m gaining more experience building Debian/Ubuntu packages each day,
so hopefully by some time in 2011 you can rely on binary releases for
these distributions and not install rApache from source! Fingers

Development on the rApache 1.1 branch will be winding down (save bug
fix releases) as I transition to the 1.2 branch. This will involve
taking out a small chunk of code that defines the rApache development
environment (all the CGI variables and the functions such as
setHeader, setCookie, etc) and placing it in its own R package…
unnamed as of yet. This is to facilitate my development of the ralite
R package, a small single user cross-platform web server.

The goal for ralite is to speed up development of R web applications,
take out a bit of friction in the development process by not having to
run the full rApache server. Plus it would allow users to develop in
the rApache enronment while on windows and later deploy on more
capable server environments. The secondary goal for ralite is it’s use
in other web server environments (nginx and IIS come to mind) as a
persistent per-client process.

And finally, wiki.rapache.net will be the new www.rapache.net once I
translate the manual over… any day now.

From –http://biostat.mc.vanderbilt.edu/wiki/Main/JeffreyHorner



Not convinced ?- try the demos above.

IBM SPSS 19: Marketing Analytics and RFM

What is RFM Analysis?

Recency Frequency Monetization is basically a technique to classify your entire customer list. You may be a retail player with thousands of customers or a enterprise software seller with only two dozen customers.

RFM Analysis can help you cut through and focus on the real customer that drives your profit.

As per Wikipedia


RFM is a method used for analyzing customer behavior and defining market segments. It is commonly used in database marketing and direct marketing and has received particular attention in retail.

RFM stands for

  • Recency – How recently a customer has purchased?
  • Frequency – How often he purchases?
  • Monetary Value – How much does he spend?

To create an RFM analysis, one creates categories for each attribute. For instance, the Recency attribute might be broken into three categories: customers with purchases within the last 90 days; between 91 and 365 days; and longer than 365 days. Such categories may be arrived at by applying business rules, or using a data mining technique, such asCHAID, to find meaningful breaks.


Even if you dont know what or how to do a RFM, see below for an easy to do way.

I just got myself an evaluation copy of a fully loaded IBM SPSS 19 Module and did some RFM Analysis on some data- the way SPSS recent version is it makes it very very useful even to non statistical tool- but an extremely useful one to a business or marketing user.

Here are some screenshots to describe the features.

1) A simple dashboard to show functionality (with room for improvement for visual appeal)

2) Simple Intuitive design to inputting data3) Some options in creating marketing scorecards4) Easy to understand features for a business audiences

rather than pseudo techie jargon5) Note the clean design of the GUI in specifying data input type6) Again multiple options to export results in a very user friendly manner with options to customize business report7) Graphical output conveniently pasted inside a word document rather than a jumble of images. Auto generated options for customized standard graphs.8) An attractive heatmap to represent monetization for customers. Note the effect that a scale of color shades have in visual representation of data.9) Comparative plots placed side by side with easy to understand explanation (in the output word doc not shown here)10) Auto generated scores attached to data table to enhance usage. 

Note here I am evaluating RFM as a marketing technique (which is well known) but also the GUI of IBM SPSS 19 Marketing Analytics. It is simple, and yet powerful into turning what used to be a purely statistical software for nerds into a beautiful easy to implement tool for business users.

So what else can you do in Marketing Analytics with SPSS 19.

IBM SPSS Direct Marketing

The Direct Marketing add-on option allows organizations to ensure their marketing programs are as effective as possible, through techniques specifically designed for direct marketing, including:

• RFM Analysis. This technique identifies existing customers who are most likely to respond to a new offer.

• Cluster Analysis. This is an exploratory tool designed to reveal natural groupings (or clusters) within your data. For example, it can identify different groups of customers based on various demographic and purchasing characteristics.

• Prospect Profiles. This technique uses results from a previous or test campaign to create descriptive profiles. You can use the profiles to target specific groups of contacts in future campaigns.

• Postal Code Response Rates. This technique uses results from a previous campaign to calculate postal code response rates. Those rates can be used to target specific postal codes in future campaigns.

• Propensity to Purchase. This technique uses results from a test mailing or previous campaign to generate propensity scores. The scores indicate which contacts are most likely to respond.

• Control Package Test. This technique compares marketing campaigns to see if there is a significant difference in effectiveness for different packages or offers.

Click here to find out more about Direct Marketing.