A new five-page brochure from Revolution Analytics. Not that slick, and with some marketing under-kill (which frankly is a surprise), but I guess Revolution Analytics does not have a full-time graphics designer to help with its collateral.
Take a look if you are curious how and why R is getting more and more ready for business.
Tableau, which has been making waves recently with its great new data visualization tool, announced a partnership with my old friends at AsterData. It's a really cool piece of data vis and very, very fast on the desktop, so I can imagine what speed it can help with AsterData’s MPP Row and Column Zingbang AND Parallel Analytical Functions.
Tableau and AsterData also share the common Stanfordian connection (but it seems software is divided quite equally between Stanford, Harvard dropouts and North Carolina).
It remains to be seen how much each company can leverage the partnership, whether it turns out like the SAS Institute-AsterData partnership last year, or whether it is just to announce connectors so their software can talk to each other.
AsterData remains the guys with the potential, but I would be wrong to say MapReduce–SQL is as hot in December 2010 as it was in June 2009, and the elephant in the room would be Hadoop. That and Google’s continued shyness about cashing in on its principal competency of handling Big Data (but hush – I signed an NDA with the Google Prediction API – so things maaaay change very rapidly on ahem that cloud)
Disclaimer- AsterData was my internship sponsor during my winter training while at Univ of Tenn.
Occam’s razor (or Ockham’s razor[1]) is often expressed in Latin as the lex parsimoniae (translating to the law of parsimony, law of economy or law of succinctness). The principle is popularly summarized as “the simplest explanation is more likely the correct one.”
Using a simple screenshot, you can see that Facebook Analytics for a Facebook page is simpler at explaining who is coming to visit than the Google Analytics Dashboard (which has not seen the attention of a visual UI or graphic redesign).
And if Facebook is going to take over the internet, well it is definitely giving better analytics in the process. What do you think?
Which interface is simpler and gives you better targeting? Ignore the numbers and just see the metrics measured and the way they are presented. Coincidentally, R is used a lot at Facebook (which has given us the jjplot package), and Google has NOT INVESTED MAJOR MONEY in creating premium R packages or Big Data packages. I am talking investment at the scale Google is known for, not measly meetups.
(the Summer of Code doesn't count; it is mostly for students)
(but thanks for the Pizza G Men- and maybe revise that GA interface by putting a razor to some metrics)
Basically, Inside R is a go-to site for tips, tricks and packages, as well as blog posts. It thus enhances R Bloggers, but also adds several other features.
It is an excellent place for R beginners and for learning R. Also, it is moderated (so you won't get the flashy jhing bhang stuff, just your R).
What I really liked is the Pretty R functionality: it's nifty for color-coding R code for posting in your blog, journal or article.
And when you are there, drop them a line about their excellent R support for events (like pizza and sponsorship), their nifty R packages (doSNOW, foreach, RevoScaleR, RevoDeployR), and how much open core makes them look silly.
Come on Revolution, share the open code for the RevoScaleR package. Did you notice any sales dip when you open sourced the other packages? (cue to David Smith to roll his eyes again)
Here is the software matrix that I am trying to develop for analytical software. It should help as a tentative guide for software purchases. It's independent, so unbiased (hopefully), and it will try to bring as much range or sensitivity as possible. The list (rather than matrix) is of the format-
Type of analysis-
Data Visualization (Reporting with Pivot Ability to aggregate, disaggregate)
Reporting without Pivot Ability
Regression - Logistic Regression for Propensity or Risk Models
Often I am asked by clients, friends and industry colleagues about the suitability or unsuitability of particular software for analytical needs. My answer is mostly-
It depends on-
1) Cost of Type 1 error in purchase decision versus Type 2 error in Purchase Decision. (forgive me if I mix up Type 1 with Type 2 error- I do have some weird childhood learning disabilities which crop up now and then)
Here I define Type 1 error as paying more for software when equivalent functionality was available at a lower price, or buying components you do not need, like SPSS Trends (when only SPSS Base is required) or SAS ETS (when only SAS/Stat would do).
The first kind is of course due to the presence of free tools like R, with GUIs such as R Commander and Deducer (Rattle does have a $500 commercial version).
The emergence of software vendors like WPS (for SAS language aficionados), which offer similar functionality to Base SAS, as well as the increasing convergence of business analytics (read predictive analytics) and business intelligence (read reporting), has led to some brand clutter in which every package promises to do everything at all different prices, though each has specific strengths and weaknesses. To add to this, there are comparatively fewer independent business analytics analysts than, say, independent business intelligence analysts.
2) Type 2 Error- In this case the cost is the opportunity cost of delayed projects, delayed business models, or lower accuracy: the consequences of buying lower-priced software which had less functionality than you required.
To compound the magnitude of error 2, you are probably in some kind of vendor lock-in, your software budget is exhausted because of buying too much or inappropriate software and hardware, and you could still do with some added help in business analytics. The fear of making a business-critical error is a substantial reason why open source software has to work harder at proving itself competent. This is because writing great software is not enough; we need great marketing to sell it, and great customer support to sustain it.
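The two purchase errors defined above can be put into numbers with a minimal Python sketch. Everything in it is an assumption made for illustration: the option names, prices, feature sets and the flat per-feature `shortfall_cost` are hypothetical and do not describe any real vendor. Type 1 cost is taken as the amount overpaid relative to the cheapest option that meets the requirement, and Type 2 cost as a flat opportunity cost for every required feature that is missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    price: float          # hypothetical licence cost
    features: frozenset   # capabilities the package offers


def purchase_errors(choice, options, required, shortfall_cost):
    """Return (type1_cost, type2_cost) for buying `choice`.

    type1_cost: money overpaid versus the cheapest adequate option.
    type2_cost: assumed opportunity cost per missing required feature.
    """
    if required <= choice.features:
        # The choice is adequate, so the only possible error is overpaying.
        adequate = [o for o in options if required <= o.features]
        cheapest = min(o.price for o in adequate)
        return choice.price - cheapest, 0.0
    # The choice lacks something we need: no overpayment, but lost opportunity.
    missing = required - choice.features
    return 0.0, shortfall_cost * len(missing)


# Illustrative numbers only.
options = [
    Option("suite_a", 10000.0, frozenset({"reporting", "regression", "forecasting"})),
    Option("base_b", 4000.0, frozenset({"reporting", "regression"})),
    Option("free_c", 0.0, frozenset({"reporting"})),
]
required = frozenset({"reporting", "regression"})

for option in options:
    t1, t2 = purchase_errors(option, options, required, shortfall_cost=5000.0)
    print(f"{option.name}: type 1 cost = {t1}, type 2 cost = {t2}")
```

In practice the opportunity cost would not be a flat figure per feature, so treat the sketch as a way of framing the comparison rather than as a pricing model.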
As business decisions are decisions made within the constraints of time, information and money, I will try to create a software purchase matrix based on my knowledge of known software (and unknown strengths and weaknesses), pricing (versus budgets), and ranges of data handling. I will add in an optimum approach based on known constraints, and add in flexibility for unknown operational constraints.
I will restrict this matrix to analytics software, though you could certainly extend it to other classes of enterprise software including big data databases, infrastructure and computing.
Noted Assumptions- 1) I am vendor neutral and do not suffer from subjective bias or affection for particular software (based on conferences, books, relationships, consulting etc.)
2) All software has bugs, so all of it needs customer support.
3) All software has particular advantages, strengths and weaknesses in terms of functionality.
4) Cost includes total cost of ownership and the opportunity cost of business-analytics-enabled decisions.
5) All software marketing people will praise their own software- sometimes over-selling and mis-selling product bundles.
Software compared are SPSS, KXEN, R, SAS, WPS, Revolution R, SQL Server, and various flavors and sub-components within these. The optimized approach will include parallel programming, cloud computing, hardware costs, and dependent software costs.
1) It is slower with bigger datasets than the SPSS language and the SAS language. If you use bigger datasets, then you should either consider more hardware, or try and wait for some of the ODBC connect packages.
2) It needs more time to learn than the SAS language. Much more time to learn how to do much more.
3) R programmers are paid less than SAS programmers. They prefer it that way. It equates the satisfaction of creating a package in development with a worldwide community with the satisfaction of using a package and earning much more money per hour.
4) It forces you to learn the exact details of what you are doing due to its object oriented structure. Thus you either get no answer or get an exact answer. Your customer pays you by the hour not by the correct answers.
5) You cannot push a couple of buttons or refer to a list of top ten most commonly used commands to finish the project.
6) It is free. And open for all. It is socialism expressed in code. Some of the packages are built by university professors. It is free. Free is bad. Who pays for the mortgage of the software programmers if all software were free? Who pays for the Friday picnics? Who pays for the Good Night cruises?
7) It is free. Your organization will not commend you for saving them money; they will question why you did not recommend this before. And why did you approve all those packages that expire in 2011? R is fReeeeee. Customers feel good while spending money. The more software budgets you approve, the more your salary is. R thReatens all that.
8) It is impossible to install a package you do not need or want. There is no one calling you on the phone to consider one more package or solution. R can make you lonely.
9) R mostly uses the command line. Command line is from the Seventies. Or the Eighties. The GUIs Rcmdr and Rattle are there but still…..
10) R forces you to learn new stuff by the month. You prefer to only earn by the month. Till the day your job got offshored…
Ajay- The above post was reprinted by personal request. It was written in Jan 2009 and may not be truly valid now. It is meant to be taken in good humor, not so seriously.