Certifications in Analytics and Business Intelligence

I sometimes get a chat message on Twitter/Facebook asking for help on some specific data issue. More often than not it is something like: how do I get started in BI/BA/data stuff? So here is a list of certifications which I think are quite nice as beginning points or even CV multipliers.


1) Google’s Certifications

http://www.google.com/intl/en/adwords/professionals/

2) SAS Certifications

Quite well established and easily one of the best structured certification programs in the industry.

http://support.sas.com/certify/index.html

3) SPSS

The SPSS certification began last year, and it helps provide a valuable skill set for both your practice and your resume. It is also useful to have a second skill set apart from SAS in terms of statistical software.

http://www.spss.com/certification/

At this point I would like you to pause and think about whether the above certifications are useful or cost-effective for you, as they are broad, general qualifications in statistical platforms as well as in applying them to web analytics (a key area for business analytics).

Here are some more specialized certifications:

1) Microsoft SQL Server

http://www.microsoft.com/learning/en/us/certification/cert-sql-server.aspx

2) TDWI Certification

http://tdwi.org/pages/certification/index.aspx

3) IBM

Not sure how up to date these are, so caveat emptor!

http://www.redbooks.ibm.com/abstracts/sg245747.html

If you are knowledgeable about IBM’s Business Intelligence solutions and the fundamental concepts of DB2 Universal Database, and you are capable of performing the intermediate and advanced skills required to design, develop, and support Business Intelligence applications

Also IBM Cognos Certifications

http://www-01.ibm.com/software/data/education/cognos-cert.html

4) MicroStrategy

http://www.microstrategy.com/education/Certification/

5) Oracle

Includes the all-new Sun certifications as well.

http://certification.oracle.com/

and http://blogs.oracle.com/certification/

6) SAP Certifications

http://www.sap.com/services/education/certification/index.epx

7) Cloudera’s Hadoop Certification

http://www.cloudera.com/developers/learn-hadoop/hadoop-certification/

These are some Business Intelligence and Business Analytics related certifications that I assembled in a list. Many other programs were either too specific to software development or did not have a certification for general usage (like many R trainings or company-specific tool trainings). Please feel free to add any suggestions.

Google: Prediction API and other cool stuff

Google just announced its tools BigQuery and Prediction API for use with its new cloud storage service called Google Storage. With this the computing cycle seems to have come full circle: from mainframe to desktop/servers to cloud. The Prediction API seems interesting, but it, and the other services, are quite clearly dependent on market as well as developer enthusiasm. Methinks Google knows a thing or two about Big Data, and this one looks like a revenue-positive product from Google (unless they get REST less and let it languish like other great ideas such as Docs, Wave etc.)

It could also be interesting to see applications in R, as well as SAS and SPSS, start using this remote data cloud/server farm 😉

With storage, querying and prediction analysis, Google is definitely in the Infrastructure as a Service business, but success with these services would be crucial to establish its name in the formidably lucrative business analytics and business intelligence fields.

http://code.google.com/apis/predict/

http://code.google.com/apis/bigquery/

http://code.google.com/apis/storage/

Graphs

Some graphs from the Official Graphs Gallery at sas.com

http://support.sas.com/sassamples/graphgallery/PROC_G3D_Graph_Types_Plots_Scatter.html

And here is the same from R’s Graph Gallery:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=10

Which one do you like? Sometimes graphics is about imagination and not just software.

Software: Apps and Bugs

Some time ago I wrote about a Twitter application bubble (actually it was a year ago, here at https://decisionstats.wordpress.com/2009/04/05/tweets-viruses-and-bubbles/)

The automatic Twitter follow/unfollow (or at least the automated unfollow) was used by the Twitter app Refollow.com, which is quite old, so it was a surprise when Twitter blamed the recent “0 followers, 0 following” episode on a bug that allows automated following. The automated RSS reader is used by Twitterfeed.com (among others). I accidentally created/revealed a bug in 2009 with the hashtag #rstats (which is used as a search index in Twitter’s search engine) when I basically married a lot of RSS feeds pertaining to R and added #rstats to them on an alternative Twitter handle (Rarchive). I did the same with #sas on Sascommunity (a handle which I later donated on request back to that community, sascommunity.org). Basically this had the temporary effect of skewing search results for these search terms for a day (till Twitter fixed it).
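To make the feed-plus-hashtag idea concrete, here is a minimal Python sketch of how a feed item could be turned into a tagged tweet. It is only an illustration under my own assumptions, not how Twitterfeed or any other service actually works: the function name format_feed_tweet, the sample items and the example.com links are all hypothetical, and the 140 character limit is the Twitter limit at the time of writing.

```python
MAX_TWEET_LENGTH = 140  # assumption: Twitter's character limit at the time of writing


def format_feed_tweet(title, link, hashtag, limit=MAX_TWEET_LENGTH):
    """Build 'title link hashtag', trimming the title so the tweet fits the limit."""
    suffix = f" {link} {hashtag}"
    room = limit - len(suffix)  # characters left over for the title
    if room < 4:
        raise ValueError("link and hashtag leave no room for a title")
    if len(title) > room:
        # Cut the title and mark the cut with three dots.
        title = title[: room - 3].rstrip() + "..."
    return title + suffix


# Hypothetical feed items (title, link); a real setup would read these from RSS feeds.
items = [
    ("New ggplot2 release", "http://example.com/p/1"),
    ("Interview with an R package author", "http://example.com/p/2"),
]

tweets = [format_feed_tweet(title, link, "#rstats") for title, link in items]
```

Every item from every feed gets the same hashtag, which is exactly why a handle fed by many RSS sources can flood the search results for that tag.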

As Twitter evolves from a well-funded startup into a business, and tries to become more structured out of chaotic flux, such bugs will continue to evolve. Bugs, and especially software bugs, are meant to be fixed (or squashed). This should by no means be a reflection on the health of the software service (here, Twitter). Indeed the biggest worry is mainstream software that has no flexibility for creative third-party applications and thinks that it is bug-free. Perfect software exists in a perfect world, and delusional perfection can be dangerous thinking, especially for software with clients (even more so for statistical software).

Which stats softwares are you using, and how confident are you that the bugs are being resolved openly?

The Top Statistical Softwares (GUI)

The list of top Statistical Softwares (GUI) is continued below. You can see the earlier post here

6. R Commander: While initially aimed at being a basic statistics GUI, the tremendous popularity of R Commander and its extensions in the form of plugins have helped make this one of the most widely used GUIs. In short, if you don’t know ANY R and still want to do basic descriptive stats and modeling, this will come in handy. It has an added script window for custom code for advanced users, and extensions like those for the DoE (design of experiments) and QCC (quality control) packages. The e-plugins are a great way to extend this. I suspect the only thing holding it back is the reluctance of Dr Fox and the rest of R Core to fully embrace GUI as a software medium. You can read his earlier interview here: https://decisionstats.wordpress.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

Technically it is possible to convert just about any package to a GUI menu in R Commander using the e-plugins.

7. SAS GUIs

Enterprise (Guide)

SAS Enterprise Guide was the higher-end (and higher-priced) solution to the Enhanced Editor’s lack of menu-driven commands. It works, but many people I know prefer the text editor just as well.


The Enterprise Miner is a separate software and works more like Red R or SPSS Modeler does. Again EM is one of the major DM softwares out there, but the similarity in names is a bit confusing.

Even the Base SAS Enhanced Editor does have some menus for importing data, or querying etc, but it is rarely confused for being a GUI.

8. Oracle Data Miner and Knime

I like both ODM and Knime, but I find the lack of advertising or promotional support puzzling. Both these softwares could do well to combine technical excellence with some marketing. And since they are both free, you can check them out yourself here

Oracle Data Mining

You can download it here (note: the Oracle web site itself is a bit aging 🙂 )

http://www.oracle.com/technology/products/bi/odm/odminer.html

Knime is the open source GUI, which can be found here:

http://www.knime.org/introduction/features

9. RKWard

Another R GUI. It stands out for the comprehensive ways you can customize your code through menus, rather than writing all of it or learning the syntax by rote.

From http://sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page

you can see it below. I recommend this GUI over other GUIs especially if you are new to R and do more data visualization which needs custom graphics.

10. Red R and R JGR/ Deducer

Red R and RJGR/Deducer are both up-and-coming GUIs for R. While Red R is R’s version of Enterprise Miner, Deducer is coming up with a new GUI for ggplot, the powerful graphics package in R.

Some GUIs excluded from this list are Statistica, MatLab and EViews(?), because I don’t really work with them and thought it best to turn them over to someone who knows them better.

Hope this list of GUIs helps you. Note that most of these softwares can be learnt within a quick hour or two if you know basic software skills/data manipulation, so going through the GUI list is a faster way of adding value to your resume/knowledge base as well.


Learning SAS for free

A big, longstanding demand for the SAS Institute to enable better access to its OnDemand program for academics was fulfilled when SAS announced free access to it, GLOBALLY.

This is really nice, as it helps SAS get a huge pool of potential developers and programmers and it helps students learn a valuable skill. In today’s world, having SAS as a language on your resume is probably the fastest, surest way to get a job.

Also, R would have to work harder to retain academics and students/future users. The “our software is free” argument won’t cut it any more.

SAS OnDemand for Academics is an online service for teaching and learning data management and analytics. Users register and access SAS software via the Web and perform processing by connecting to a hosted server at SAS. Through SAS OnDemand for Academics, users have access to multiple SAS applications such as SAS® Enterprise Guide® (which includes access to Base SAS) and SAS® Enterprise Miner™ (which includes SAS Text Miner). Additional SAS software applications will be added over time.

and

SAS is removing a potential barrier to students seeking experience using advanced data analysis to solve classroom and real-world problems. SAS OnDemand for Academics, already used at no cost by professors at some 200 colleges and universities, will be available at no cost to all students worldwide in fall 2010. SAS OnDemand for Academics quickly and easily delivers the power of SAS software to higher education.

Source: http://www.sas.com/news/preleases/ondemandforacademics-nocostSGF10.html

CommeRcial R- Integration in software

Some updates to R on the commercial side.

Revolution Computing is apparently now renamed Revolution Analytics. Hopefully this and the GUI development will help bring more focused attention to working in R in a mainstream office situation. I am still waiting for David Smith’s cheery hey-guys-we-changed-again blog post though, at the new site inside-r.org/ or his old blog site at blog.revolution-computing.com

They probably need to hire more people now. Curt Monash, noted all-things-data software guru, has the inside dope here

Techworld writes more here at http://www.techworld.com.au/article/345288/startup_wants_r_alternative_ibm_sas

The company’s software is priced “aggressively” versus IBM and SAS. A single supported workstation costs $2,000 for an annual subscription. Pricing for server-based licenses varies depending on the implementation.

But Revolution Analytics faces a tough challenge from those larger vendors, as well as the likes of XLSolutions, which offers R training and a competing software package, R-Plus.

SPSS, though, continues to integrate R solidly and also to march ahead with Python (which is likely to be the next gen in statistical programming if it keeps up): http://insideout.spss.com/

With the release of Version 18 of IBM SPSS Statistics and the Developer product, easy-to-install versions of the Python and R materials are posted.  In particular, look for the R Essentials link on the main page or from the Plugins page.  It installs the R Plugin, the correct version of R, and a bunch of example R integrations as bundles.  It’s much easier to get going with this now.

Netezza, a business intelligence vendor, promises more integration and even a training in R-based analytics here

R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics

My favourite GUI in stats, JMP (also from SAS Institute), is going to deploy R integration as soon as this September. Read more here: http://www.sas.com/news/preleases/JMP-to-R-integrationSGF10.html

SAS/IML Studio is not lagging behind either:

The next release of SAS/IML will extend R integration to the server environment – enabling users to deploy results in batch mode and access R from SAS on additional platforms, such as UNIX and Linux.

I am kind of happy at one of the best GUIs integrating with one of the most innovative stats softwares. It’s like two of your best friends getting married. (see screenshots of the softwares)

All in all, R as a platform is making good overall progress from all sides of the corporate software spectrum, which can only be good for R developers as well as users/students.