Certifications in Analytics and Business Intelligence

I sometimes get a chat message on Twitter/ Facebook asking for help on some specific data issue. More often than not it is something like – How do I get started in BI/BA /Data stuff. So here is a list of certifications which I think are quite nice as beginning points or even CV multipliers.

[tweetmeme=”Decisionstats”]

1) Google’s Certifications

http://www.google.com/intl/en/adwords/professionals/

2) SAS Certifications

Quite well established and easily one of the best structured certification programs in the industry.

http://support.sas.com/certify/index.html

3) SPSS

The SPSS certification began last year and it helps provide a valuable skill set for both your practice as well as your resume. Also useful to have a second skill set apart from SAS in terms of statistical software.

http://www.spss.com/certification/

At this point I would like you to pause and think if the above certifications are useful or cost  effective for you as they are broadly general qualifications in statistical platforms as well as in applying them for the web analytics ( a key area for business analytics).

For more specialized certifications here are some more-

1) Microsoft SQL Server

http://www.microsoft.com/learning/en/us/certification/cert-sql-server.aspx

2) TDWI Certification

http://tdwi.org/pages/certification/index.aspx

3) IBM

Not sure how updated these are so caveat emptor!

http://www.redbooks.ibm.com/abstracts/sg245747.html

If you are knowledgeable about IBM’s Business Intelligence solutions and the fundamental concepts of DB2 Universal Database, and you are capable of performing the intermediate and advanced skills required to design, develop, and support Business Intelligence applications

Also IBM Cognos Certifications

http://www-01.ibm.com/software/data/education/cognos-cert.html

4) MicroStrategy

http://www.microstrategy.com/education/Certification/

5) Oracle

Included the all new Sun Certifications as well.

http://certification.oracle.com/

and http://blogs.oracle.com/certification/

6) SAP Certifications

http://www.sap.com/services/education/certification/index.epx

7) Cloudera’s Hadoop Certification

http://www.cloudera.com/developers/learn-hadoop/hadoop-certification/

These are some Business Intelligence and Business Analytics related certifications that I assembled in a list. Many other programs were either too software development specific or did not have a certification for general usage (like many R trainings or company tool specific trainings). Please feel free to add in any suggestions.

Google: Prediction API and other cool stuff

Google just announced it’s tools Big Query and Prediction API for use with it’s new cloud storage device called Google Storage. With this the computing cycle seems to have come a full circle – from mainframe to desktop/servers to cloud. The Prediction API seems interesting but it, and the other services, are quite clearly dependent on market as well as developer enthusiasm. Me thinks, Google knows a thing or two about Big Data, and this one looks like a revenue positive product from Google ( unless they get REST less and let it languish like other great ideas-like Docs,Wave etc)

Also could be interesting is applications from both R, as well as SAS and SPSS to start using this remote data cloud/server farm 😉

With Storage,Querying and Prediction Analysis- Google is definitely in the Infrastructure as a Service business, but success with these services would be crucial to establish it’s name in the formidably lucrative business analytics and business intelligence fields.

http://code.google.com/apis/predict/

http://code.google.com/apis/bigquery/

http://code.google.com/apis/storage/

Software- Appls and Bugs

Some time ago I had written on a Twitter application bubble (actually it was a year ago here at https://decisionstats.wordpress.com/2009/04/05/tweets-viruses-and-bubbles/)

The automatic Twitter follow /unfollow (or atleast the automated unfollow ) was used by Twitter App Refollow.com (which is quite old- so it was a surprise when Twitter blamed the recent 0 followers 0 floowing on a bug which allows automated following) and the RSS automated reader is used by Twitterfeed.com (among others). I accidently created/revealed a bug in 2009  with the hash command #rstats which is used as a search index in twitter’s search engine) when I basically married a lot of RSS feeds pertaining to R and added the #rstats with them to the alternative twitter handle (Rarchive) . I did the same with the #sas with Sascommunity (which I later donated on request back to that community sascommunity.org). Basically this had the temporary effect of skewing search results for these search terms for a day (till Twitter fixed it).

As Twitter evolves from a well funded startup to a business- and tries to become more structured from chaotic flux, such bugs will continue to evolve. Bugs and especially software bugs are meant to be fixed (or squashed). This by no means should be a relection on the health of the software service (here- Twitter). Indeed the biggest worry is a mainstream software that has no flexibility for creative third party applications and thinks that it is bug-free. Perfect software exists in a perfect world- and delusional perfection can be dangerous thinking especially for software with clients (even more for statistical software).

Which stats softwares are you using and how confident you are that the bugs are being resolved openly?

CommeRcial R- Integration in software

Some updates to R on the commercial side.

Revolution Computing is apparently now renamed Revolution Analytics. Hopefully this and the GUI development will help pay more focused attention on working in R in a mainstream office situation. I am still waiting for David Smith’s cheery hey-guys-we-changed-again blog post though at a new site called inside-r.org/ or his old blog site at blog.revolution-computing.com

They probably need to hire more people now – Curt Monash, noted all-things-data software guru has the inside dope here

Techworld writes more here at http://www.techworld.com.au/article/345288/startup_wants_r_alternative_ibm_sas

The company’s software is priced “aggressively” versus IBM and SAS. A single supported workstation costs $2,000 for an annual subscription. Pricing for server-based licenses varies depending on the implementation.

But Revolution Analytics faces a tough challenge from those larger vendors, as well as the likes of XLSolutions, which offers R training and a competing software package, R-Plus.

SPSS though continues to integrate R solidly and also march ahead with Python (which is likely to be the next gen in statistical programming if it keeps up) http://insideout.spss.com/

With the release of Version 18 of IBM SPSS Statistics and the Developer product, easy-to-install versions of the Python and R materials are posted.  In particular, look for the R Essentials link on the main page or from the Plugins page.  It installs the R Plugin, the correct version of R, and a bunch of example R integrations as bundles.  It’s much easier to get going with this now.

Netezza , a business intelligence vendor promises more integration and even a training in R based analytics here

R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics”

My favourite GUI in stats , JMP (also from SAS Institute) is going to deploy R integration as soon as this September – Read more here- http://www.sas.com/news/preleases/JMP-to-R-integrationSGF10.html

Also SAS-IML studio is not lagging behind

The next release of SAS/IML will extend R integration to the server environment – enabling users to deploy results in batch mode and access R from SAS on additional platforms, such as UNIX and Linux.

I am kind of happy at one of the best GUI’s integrating with one of the most innovative stats softwares. It’s like two of your best friends getting married. (see screenshots of the softwares)

All in all- R as a platform making good overall progress from all sides of the corporate software spectrum which can only be good for R developers as well as users/students.

Top 10 Graphical User Interfaces in Statistical Software

Here is a list of top 10 GUIs in Statistical Software. The overall criterion is based on-

  • User Friendly Nature for a New User to begin click and point and learn.
  • Cleanliness of Automated Code or Log generated.
  • Practical application in consulting and corporate world.
  • Cost and Ease of Ownership (including purchase,install,training,maintainability,renewal)
  • Aesthetics (or just plain pretty)

However this list is not in order of ranking- ( as beauty (of GUI) lies in eyes of the beholder). For a list of top 10 GUI in R language only please see –

https://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/

This is only a GUI based list so it excludes notable command line or text editor submit commands based softwares which are also very powerful and user friendly.

  1. JMP –

While critics of SAS Institute often complain on the premium pricing of the basic model (especially AFTER the entry of another SAS language software WPS from http://www.teamwpc.co.uk/products/wps – they should try out JMP from http://jmp.com – it has a 1 month free evaluation, is much less expensive and the GUI makes it very very easy to do basic statistical analysis and testing. The learning curve is surprisingly fast to pick it up (as it should be for well designed interfaces) and it allows for very good quality output graphics as well.

2.SPSS

The original GUI in this class of softwares- it has now expanded to a big portfolio of products. However SPSS 18 is nice with the increasing focus on Python and an early adoptee of R compatible interfaces, SPSS does offer a much affordable solution as well with a free evaluation. See especially http://www.spss.com/statistics/ and http://www.spss.com/software/modeling/modeler-pro/

the screenshot here is of SPSS Modeler

3. WPS

While it offers an alternative to Base SAS and SAS /Access software , I really like the affordability (1 Month Free Evaluation and overall lower cost especially for multiple CPU servers ), speed (on the desktop but not on the IBM OS version ) and the intuitive design as well as extensibility of the Workbench. It may look like an integrated development environment and not a proper GUI, but with all the menu features it does qualify as a GUI in my opinion. Continue reading “Top 10 Graphical User Interfaces in Statistical Software”

Norman Nie: R GUI and More

Here is an interview from Norman Nie, SPSS Founder and CEO, REvolution Computing (R Platform).

Some notable thoughts

For example, SPSS was really among the first to deliver rich GUIs that make it easier to use by more people. This is why one of the first things you’ll see from REvolution is a GUI for R – to make R more accessible and hereby further accelerate adoption.

This is good news if executed- I have often written (in agony actually because I use it) for the need for GUIs for R. My last post on that was here. Indeed the one reason SPSS was easily adopted by business school students (like me) in India in 2001-3 was the much better GUI over SAS ‘s GUIs.

However some self delusion/ PR / cognitive dissonance seems at play at Dr Nie’s words

If you look at the last 40 years of university curriculum, SPSS – the product I helped build – has been the dominant player, even becoming the common thread uniting a diverse range of disciplines, which have in turn been applied to business. Data is ubiquitous: tools and data warehouses allow you to query a given set of data repeatedly. R does these things better than the alternatives out there; it is indeed the wave of the future.

SPSS has been a strong number 2- but it has never overtaken SAS. Part of that is SAS handles much bigger datasets much more easily than SPSS did ( and that is where R’s RAM only size can be a concern). Given the decreasing prices of RAM memory, the BIG-LM like packages, and the shift for cloud based computing(with rampable memory on demand) this can be less of an issue- but analysts generally like to have a straight way of handling bigger datasets. Indeed SAS with vertical focus and the recent social media analytics continues to innovate both itself as well as through its alliance partnerships in the Enterprise software world- and REvolution Computing would further need to tie up or sew these analytical partners especially data warehousing or BI providers to ensure R’s analytical functions can be used where there is maximum value for their usage to the corporate customer as well as the academic customer.

Part 2 of Nie’s interview should be interesting .

2010-2011 would likely see

Round 2 : Red Corner ( Nie)                             Gray Corner (Goodnight)

if

Norman Nie can truly deliver a REvolution in Computing

or else

he becomes number two again the second time around to Jim Goodnight’s software giant.

Towards better Statistical Interfaces

I was just walking about the U Tenn campus thinking about my next month departure from the school back to India when I ran into Bob Muenchen , head of the Stats consulting centre and more famously the author of ” R for SAS and SPSS users” . Bob mentioned that the edition for R for Stata should be ready for next month. It was also his idea for the article on Red R.

In fact what perplexes users of statistical software like me is why complex softwares like R or SAS choose interfaces that are clearly not as well designed in simplicity as they are in statistical rigor. I think SPSS to some extent and JMP to a much greater extent represent well designed user interfaces. While Rattle , R Commander , R Analytical Flow and Red R are examples for R interfaces SAS also invested in the Enterprise class interfaces.

On all these I belive there is a much greater need for say a Pro UI designer and clean it up. I was reading Prof Maeda’s laws of simplicity ( see http://lawsofsimplicity.com ) and just comparing and contrasting that with some of the softwares I end up using.

The Principles of Reduce ( Shrink, Hide , Embody ) and Organize ( Sort , Label , Integrate and Priortize ) need to be looked into by the Chief Software Interface designers for analytics and BI. While attempts to create more and more robust and faster algorithms and prettier dashboards are important is it not important to simplify the process and procedures to do so . The software which is easier to learn and pick up will tend to have an edge over less visually designed softwares. Keeping it simple helped Apple in the retail electronics and software , it needs to be seen who or which enterprise BI or BA software will make attempts to do the same. An ideal stats or BI interface should be simple and powerful enough to be used by decision makers directly on occasion rather rely on the middleware of analysts and consultants solely.