Interview: Dean Abbott, Abbott Analytics

Here is an interview with noted analytics consultant and trainer Dean Abbott. Dean is scheduled to lead a hands-on workshop on predictive analytics at PAW (Predictive Analytics World Conference) on Oct 18, 2010 in Washington, D.C.

Ajay- Describe your upcoming hands-on workshop at Predictive Analytics World and how it can help people learn more about predictive modeling.

Refer- http://www.predictiveanalyticsworld.com/dc/2010/handson_predictive_analytics.php

Dean- The hands-on workshop is geared toward individuals who know something about predictive analytics but would like to experience the process. It will help people in two regards. First, by going through the data assessment, preparation, modeling and model assessment stages in one day, the attendees will see how predictive analytics works in reality, including some of the pain associated with false starts and mistakes. At the same time, they will experience success with building reasonable models to solve a problem in a single day. I have found that for many, having to actually build the predictive analytics solution is an eye-opener. Demonstrations show the capabilities of a tool, but the greater value for an end-user is developing an intuition for what to do at each stage of the process, which makes the theory of predictive analytics real.

Second, they will gain experience using a top-tier predictive analytics software tool, Enterprise Miner (EM). This is especially helpful for those who are considering purchasing EM, but also for those who have used open source tools and have never experienced the additional power and efficiencies that come with a tool that is well thought out from a business solutions standpoint (as opposed to an algorithm workbench).

Ajay- You are an instructor for software ranging from SPSS, S-Plus, SAS Enterprise Miner, Statistica and CART. Which features of each software do you like best, and which are better suited for particular kinds of data problems?

Dean- I'll add Tibco Spotfire Miner, PolyAnalyst and Unica's Predictive Insight to the list of tools I've taught "hands-on" courses around, and there are at least a half dozen more I demonstrate in lecture courses (JMP, Matlab, WizWhy, R, GGobi, RapidMiner, Orange, Weka, RandomForests and TreeNet to name a few). The development of software is a fascinating undertaking, and each tool has its own strengths and weaknesses.

I personally gravitate toward tools with data flow / icon interface because I think more that way, and I’ve tired of learning more programming languages.

Since the predictive analytics algorithms are roughly the same (backprop is backprop no matter which tool you use), the key differentiators are:

(1) how data can be loaded in and how tightly integrated can the tool be with the database,

(2) how well big data can be handled,

(3) how extensive are the data manipulation options,

(4) how flexible are the model reporting options, and

(5) how can you get the models and/or predictions out.

There are vast differences in the tools on these matters, so when I recommend tools for customers, I usually interview them quite extensively to understand better how they use data and how the models will be integrated into their business practice.

A final consideration is related to the efficiency of using the tool: how much automation can one introduce so that user-interaction is minimized once the analytics process has been defined. While I don’t like new programming languages, scripting and programming often helps here, though some tools have a way to run the visual programming data diagram itself without converting it to code.

Ajay- What are your views on the increasing trend of consolidation and mergers and acquisitions in the predictive analytics space? Does this increase the need for vendor-neutral analysts and consultants, as well as conferences?

Dean- When companies buy a predictive analytics software package, it's a mixed bag. SPSS's purchase of Clementine was ultimately good for predictive analytics, though it took several years for SPSS to figure out what they wanted to do with it. Darwin ultimately disappeared after being purchased by Oracle, but the newer Oracle data mining tool, ODM, integrates better with the database than Darwin did or ever would have been able to.

The biggest trend and pressure for the commercial vendors is the improvements in the Open Source and GNU tools. These are becoming more viable for enterprise-level customers with big data, though from what I’ve seen, they haven’t caught up with the big commercial players yet. There is great value in bringing both commercial and open source tools to the attention of end-users in the context of solutions (rather than sales) in a conference setting, which is I think an advantage that Predictive Analytics World has.

For a vendor-neutral consultant, flux is always a good thing because I have to be proficient in a variety of tools, and it is that breadth that brings value for customers entering the predictive analytics space. But it is very difficult to keep up with the rapidly changing market, and that is something I am weighing myself: how many tools should I keep in my active toolbox?

Ajay- Describe your career and how you came into the predictive analytics space. What are your views on the various MS Analytics programs offered by universities?

Dean- After getting a masters degree in Applied Mathematics, my first job was at a small aerospace engineering company in Charlottesville, VA called Barron Associates, Inc. (BAI); it is still in existence and doing quite well! I was working on optimal guidance algorithms for some developmental missile systems, and statistical learning was a key part of the process, so I cut my teeth on pattern recognition techniques there, and frankly, that was the most interesting part of the job. In fact, most of us agreed that this was the most interesting part: John Elder (Elder Research) was the first employee at BAI and was there at that time. Gerry Montgomery and Paul Hess were there as well and left to form a data mining company called AbTech; they are still in the analytics space.

After working at BAI, I had short stints at Martin Marietta Corp. and PAR Government Systems, where I worked on analytics solutions for DoD, primarily radar and sonar applications. It was while at Elder Research in the 90s that I began working more in the commercial space, in financial and risk modeling, and then in 1999 I began working as an independent consultant.

One thing I love about this field is that the same techniques can be applied broadly, and therefore I can work on CRM, web analytics, tax and financial risk, credit scoring, survey analysis, and many more applications, and cross-fertilize ideas from one domain into other domains.

Regarding MS degrees, let me first write that I am very encouraged that data mining and predictive analytics are being taught in specific classes and programs rather than as just an add-on to an advanced statistics or business class. That said, I have mixed feelings about analytics offerings at universities.

I find that most provide a good theoretical foundation in the algorithms, but are weak in describing the entire process in a business context. For those building predictive models, the model-building stage nearly always takes much less time than getting the data ready for modeling and reporting results. These are cross-discipline tasks, requiring some understanding of both the database world and the business world in order to define the target variable(s) properly and clean up the data so that the predictive analytics algorithms work well.

The programs that have a practicum of some kind are the most useful, in my opinion. There are some certificate programs out there that have more of a business-oriented framework, and the NC State program builds an internship into the degree itself. These are positive steps in the field that I’m sure will continue as predictive analytics graduates become more in demand.

Biography-

DEAN ABBOTT is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods to real-world, data-intensive problems, including fraud detection, response modeling, survey analysis, planned giving, predictive toxicology, signal processing, and missile guidance. In addition, he has developed and evaluated algorithms for use in commercial data mining and pattern recognition products, including polynomial networks, neural networks, radial basis functions, and clustering algorithms, and has consulted with data mining software companies to provide critiques and assessments of their current features and future enhancements.

Mr. Abbott is a seasoned instructor, having taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught both applied and hands-on data mining courses for major software vendors, including Clementine (SPSS, an IBM Company), Affinium Model (Unica Corporation), Statistica (StatSoft, Inc.), S-Plus and Insightful Miner (Insightful Corporation), Enterprise Miner (SAS), Tibco Spotfire Miner (Tibco), and CART (Salford Systems).

Interview: Stephanie McReynolds, Director of Product Marketing, AsterData

Here is an interview with Stephanie McReynolds, who works as Director of Product Marketing with AsterData. I asked her a couple of questions about the new product releases from AsterData in analytics and MapReduce.

Ajay – How does the new Eclipse Plugin help people who are already working with huge datasets but are new to AsterData’s platform?

Stephanie- Aster Data Developer Express, our new SQL-MapReduce development plug-in for Eclipse, makes MapReduce applications easy to develop. With Aster Data Developer Express, developers can develop, test and deploy a complete SQL-MapReduce application in under an hour. This is a significant increase in productivity over the traditional analytic application development process for Big Data applications, which requires significant time coding applications in low-level code and testing applications on sample data.

Ajay – What are the various analytical functions that you have introduced recently? List, say, the top 10.

Stephanie- At Aster Data, we have an intense focus on making the development process easier for SQL-MapReduce applications. Aster Developer Express is a part of this initiative, as is the release of pre-defined analytic functions. We recently launched both a suite of analytic modules and a partnership program dedicated to delivering pre-defined analytic functions for the Aster Data nCluster platform. Pre-defined analytic functions delivered by Aster Data’s engineering team are delivered as modules within the Aster Data Analytic Foundation offering and include analytics in the areas of pattern matching, clustering, statistics, and text analysis– just to name a few areas. Partners like Fuzzy Logix and Cobi Systems are extending this library by delivering industry-focused analytics like Monte Carlo Simulations for Financial Services and geospatial analytics for Public Sector– to give you a few examples.

Ajay – So okay, I want to do a k-means cluster on, say, a million rows (and say 200 columns) using the Aster method. How do I go about it using the new plug-in as well as your product?

Stephanie- The power of the Aster Data environment for analytic application development is in SQL-MapReduce. SQL is a powerful analytic query standard because it is a declarative language. MapReduce is a powerful programming framework because it can support high performance parallel processing of Big Data and extreme expressiveness, by supporting a wide variety of programming languages, including Java, C/C#/C++, .Net, Python, etc. Aster Data has taken the performance and expressiveness of MapReduce and combined it with the familiar declarativeness of SQL. This unique combination ensures that anyone who knows standard SQL can access advanced analytic functions programmed for Big Data analysis using MapReduce techniques.

kMeans is a good example of an analytic function that we pre-package for developers as part of the Aster Data Analytic Foundation. What does that mean? It means that the MapReduce portion of the development cycle has been completed for you. Each pre-packaged Aster Data function can be called using standard SQL, and executes the defined analytic in a fully parallelized manner in the Aster Data database using MapReduce techniques. The result? High performance analytics with the expressiveness of low-level languages accessed through declarative SQL.

Ajay – I see an increasing focus on analytics. Is this part of your product strategy, and how do you see yourself competing with pure analytics vendors?

Stephanie – Aster Data is an infrastructure provider. Our core product is a massively parallel processing database called nCluster that performs at or beyond the capabilities of any other analytic database in the market today. We developed our analytics strategy as a response to demand from our customers who were looking beyond the price/performance wars being fought today and wanted support for richer analytics from their database provider. Aster Data analytics are delivered in nCluster to enable analytic applications that are not possible in more traditional database architectures.

Ajay – Name some recent case studies of implementations of SQL-MapReduce with analytical functions.

Stephanie – There are three new classes of applications that Aster Data Express and Aster Analytic Foundation support: iterative analytics, prediction and optimization, and ad hoc analysis.

Aster Data customers are uncovering critical business patterns in Big Data by performing hypothesis-driven, iterative analytics. They are interactively exploring massive volumes of data—terabytes to petabytes—in a top-down, deductive manner. ComScore, an Aster Data customer that performs website experience analysis, is a good example of a customer performing this type of analysis.

Other Aster Data customers are building applications for prediction and optimization that discover trends, patterns, and outliers in data sets. Examples of these types of applications are propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services. Full Tilt Poker, which is using Aster Data for fraud prevention, is a good example of a customer in this space.

The final class of application that I would like to highlight is ad hoc analysis. Examples of ad hoc analysis that can be performed include social network analysis, advanced clickstream analysis, graph analysis, cluster analysis, and a wide variety of mathematical, trigonometric, and statistical functions. LinkedIn, whose analysts and data scientists have access to all of their customer data in Aster Data, is a good example of a customer using the system in this manner.

While Aster Data customers are using nCluster in a number of other ways, these three new classes of applications are areas in which we are seeing particularly innovative application development.

Biography-

Stephanie McReynolds is Director of Product Marketing at Aster Data, where she is an evangelist for Aster Data’s massively parallel data-analytics server product. Stephanie has over a decade of experience in product management and marketing for business intelligence, data warehouse, and complex event processing products at companies such as Oracle, Peoplesoft, and Business Objects. She holds both a master’s and undergraduate degree from Stanford University.

Q&A with David Smith, Revolution Analytics.

Here is a group of questions and answers that David Smith of Revolution Analytics was kind enough to answer following the launch of RevoScaleR, the new R package for big-data analysis designed to work alongside Hadoop.

Ajay- How does RevoScaleR work from a technical viewpoint in terms of Hadoop integration?

David- The point isn't that there's a deep technical integration between Revolution R and Hadoop; rather, we see them as complementary (not competing) technologies. Hadoop is amazing at reliably (if slowly) processing huge volumes of distributed data; the RevoScaleR package complements Hadoop by providing statistical algorithms to analyze the data processed by Hadoop. The analogy I use is to compare a freight train with a race car: use Hadoop to slog through a distributed data set and use Map/Reduce to output an aggregated, rectangular data file; then use RevoScaleR to perform statistical analysis on the processed data (and use the speed of RevoScaleR to iterate through many model options to find the best one).

Ajay- How is it different from existing R Hadoop packages such as MapReduce and Rhipe?

David- They're complementary. In fact, we'll be publishing a white paper soon by Saptarshi Guha, author of the Rhipe R/Hadoop integration, showing how he uses Hadoop to process vast volumes of packet-level VOIP data to identify call time/duration from the packets, and then do a regression on the table of calls using RevoScaleR. There's a little more detail in this blog post: http://blog.revolutionanalytics.com/2010/08/announcing-big-data-for-revolution-r.html
Ajay- Is it going to be proprietary, free or licensable (open source)?

David- RevoScaleR is a proprietary package, available to paid subscribers (or free to academics) with Revolution R Enterprise. (If you haven't seen it, you might be interested in this Q&A I did with Matt Shotwell: http://biostatmatt.com/archives/533 )
Ajay- Any existing client case studies for terabyte-level analysis using R?

David- The VOIP example above gets close, but most of the case studies we've seen in beta testing have been in the 10s to 100s of GB range. We've tested RevoScaleR on larger data sets internally, but we're eager to hear about real-life use cases in the terabyte range.
Ajay- How can I use RevoScaleR on my dual-core Windows Intel laptop for, say, 5 GB of data?

David- One of the great things about RevoScaleR is that it's designed to work on commodity hardware like a dual-core laptop. You won't be constrained by the limited RAM available, and the parallel processing algorithms will make use of all cores available to speed up the analysis even further. There's an example in this white paper (http://info.revolutionanalytics.com/bigdata.html) of doing linear regression on 13 GB of data on a simple dual-core laptop in less than 5 seconds.
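To make that laptop workflow concrete, here is a minimal, hypothetical sketch of how such an analysis could look with RevoScaleR. The file names, variable names and model formula are illustrative only, and exact argument names may differ between package versions; the general pattern is to convert a large text file to the XDF format once, then run chunked, parallel analyses against it.

```r
# Hypothetical sketch: analyzing a few GB of transaction data on a dual-core laptop.
# Assumes Revolution R Enterprise with the RevoScaleR package; file names,
# variable names and argument details are illustrative, not a verified recipe.
library(RevoScaleR)

# Convert the CSV to RevoScaleR's XDF format once, so later passes stream
# from disk in chunks instead of loading everything into RAM.
rxImport(inData = "transactions.csv", outFile = "transactions.xdf")

# Fit a linear regression; the data file is processed chunk by chunk,
# using the available cores in parallel.
fit <- rxLinMod(amount ~ age + region, data = "transactions.xdf")
summary(fit)
```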
AJ- Thanks to David Smith for this fast response, and congratulations to him, Saptarshi Guha, Dr Norman Nie and the rest of the team at Revolution Analytics on this new product launch.

Interview: R for Stata Users

Here is an interview with Bob Muenchen, author of "R for SAS and SPSS Users" and co-author, with Joe Hilbe, of "R for Stata Users".

Describe your new book R for Stata Users and how it is helpful to users.

Stata is a marvelous software package. Its syntax is well designed, concise and easy to learn. However R offers Stata users advantages in two key areas: education and analysis.

Regarding education, R is quickly becoming the universal language of data analysis. Books, journal articles and conference talks often include R code because it’s a powerful language and everyone can run it. So R has become an essential part of the education of data analysts, statisticians and data miners.

Regarding analysis, R offers a vast array of methods that R users have written. Next to R, Stata probably has more useful user-written add-ons than any other analytic software. The Statistical Software Components collection at Boston College’s Department of Economics is quite impressive (http://ideas.repec.org/s/boc/bocode.html), containing hundreds of useful additions to Stata. However, R’s collection of add-ons currently contains 3,680 packages, and more are being added every week.  Stata users can access these fairly easily by doing their data management in Stata, saving a Stata format data set, importing it into R and running what they need. Working this way, the R program may only be a few lines long.
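As a rough illustration of that workflow, here is a minimal sketch (not taken from the book) of what those few lines of R might look like, assuming the data management has been done in Stata and saved as a .dta file; the file name, variables and model are purely illustrative.

```r
# Minimal sketch of the Stata-to-R hand-off described above.
# Assumes the data were prepared in Stata and saved as "mydata.dta";
# the variable names and model formula are illustrative only.
library(foreign)                      # base R package that reads Stata .dta files

mydata <- read.dta("mydata.dta")      # import the Stata data set as an R data frame

# Run whatever method is needed from base R or an add-on package,
# e.g. a logistic regression:
fit <- glm(purchased ~ age + income, data = mydata, family = binomial)
summary(fit)
```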

In our book, the section “Getting Started Quickly” outlines the most essential 50 pages for Stata users to read to work in this way. Of course the book covers all the basics of R, should the reader wish to learn more. Being enthusiastic programmers, we’ll be surprised if they don’t want to read it all.

There are many good books on R, but as I learned the language I found myself constantly wondering how each concept related to the packages I already knew. So in this book we describe R first using Stata terminology and then using R terminology. For example, when introducing the R data frame, we start out saying that it’s just like a Stata data set: a rectangular set of variables that are usually numeric with perhaps one or two character variables. Then we move on to say that R also considers it a special type of “list” which constrains all its “components” to be equal in length. That then leads into entirely new territory.
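To make the data frame comparison concrete, here is a small sketch (mine, not the book's) showing how R treats a data frame both as a rectangular data set and as a list of equal-length components.

```r
# A data frame looks like a Stata data set: rows are observations,
# columns are (mostly numeric) variables.
df <- data.frame(id     = 1:3,
                 income = c(51, 62, 58),
                 gender = c("m", "f", "f"))

# ...but R also treats it as a special kind of list:
is.list(df)          # TRUE
length(df)           # 3 components, one per variable
sapply(df, length)   # every component has the same length (the number of rows)
df$income            # components are extracted the way list elements are
```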

The entire book is laid out to make learning easy for Stata users. The names used in the table of contents are Stata-based. The reader may look up how to "collapse" a data set by a grouping variable to find that one way R can do that is with the mysteriously named "tapply" function. A Stata user would never have guessed to look for that name. When reading from cover to cover that may not be that big of a deal, but as you go back to look things up it's a huge time saver. The index is similar in that you can look every subject up by its Stata name to find the R function, or vice versa. People see me with both my books near my desk and chuckle that they're there for advertising. Not true! I look details up in them all the time.
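Here is a small sketch of that collapse-to-tapply mapping (my example, not the book's); the variable names are illustrative, and the Stata comparison is only a rough equivalent.

```r
# Collapsing a data set by a grouping variable in R.
# Roughly what Stata's "collapse (mean) income, by(gender)" would do.
df <- data.frame(gender = c("m", "f", "f", "m"),
                 income = c(51, 62, 58, 47))

tapply(df$income, df$gender, mean)                 # named vector of group means

# aggregate() returns the same summary as a data frame, which is often
# closer to what a Stata user expects from collapse:
aggregate(income ~ gender, data = df, FUN = mean)
```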

I didn’t have enough in-depth knowledge of Stata to pull this off by myself, so I was pleased to get Joe Hilbe as a co-author. Joe is a giant in the world of Stata. He wrote several of the Stata commands that ship with the product including glm, logistic and manova. He was also the first editor of the Stata Technical Bulletin, which later turned into the Stata Journal. I have followed his work from his days as editor of the statistical software reviews section in the journal The American Statistician. There he not only edited but also wrote many of the reviews which I thoroughly enjoyed reading over the years. If you don’t already know Stata, his review of Stata 9.0 is still good reading (November 1, 2005, 59(4): 335-348).

Describe the relationship between Stata and R and how it is the same or different from SAS / SPSS and R.

This is a very interesting question. I pointed out in R for SAS and SPSS Users that SAS and SPSS are structured very similarly while R is totally different. Stata, on the other hand, has many similarities to R. Here I’ll quote directly from the book:

• Both include rich programming languages designed for writing new analytic methods, not just a set of prewritten commands.

• Both contain extensive sets of analytic commands written in their own languages.

• The pre-written commands in R, and most in Stata, are visible and open for you to change as you please.

• Both save command or function output in a form you can easily use as input to further analysis.

• Both do modeling in a way that allows you to readily apply your models for tasks such as making predictions on new data sets. Stata calls these postestimation commands and R calls them extractor functions (see the short sketch after this list).

• In both, when you write a new command, it is on an equal footing with commands written by the developers. There are no additional “Developer’s Kits” to purchase.

• Both have legions of devoted users who have written numerous extensions and who continue to add the latest methods many years before their competitors.

• Both can search the Internet for user-written commands and download them automatically to extend their capabilities quickly and easily.

• Both hold their data in the computer’s main memory, offering speed but limiting the amount of data they can handle.
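As a quick illustration of the extractor-function point above, here is a minimal sketch (my example, using R's built-in mtcars data) of fitting a model and then extracting results and predictions from the fitted object.

```r
# Fit a model once, then apply extractor functions to the fitted object,
# much as Stata applies postestimation commands after an estimation command.
fit <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit)                          # extract the coefficients
confint(fit)                       # confidence intervals
newcars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predict(fit, newdata = newcars)    # apply the model to new data
```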

Can the book be used by an R user for learning Stata?

That’s certainly not ideal. The sections that describe the relationship between the two languages would be good to know and all the example programs are presented in both R and Stata form. However, we spend very little time explaining the Stata programs while going into the R ones step by step. That said, I continue to receive e-mails from R experts who learned SAS or SPSS from R for SAS and SPSS Users, so it is possible.

Describe the response to your earlier work, R for SAS and SPSS Users, and whether any new edition is forthcoming.

I am very pleased with the reviews for R for SAS and SPSS Users. You can read them all, even the one really bad one, at http://r4stats.com. We incorporated all the advice from those reviews into R for Stata Users, so we hope that this book will be well received too.

In the first book, Appendix B: A Comparison of SAS and SPSS Products with R Packages and Functions has been particularly popular for helping people find the R packages they need. As it expanded, I moved it to the web site: http://r4stats.com/add-on-modules. All three packages are changing so fast that I sometimes edit that table several times per week!
The second edition of R for SAS and SPSS Users is due to the publisher by the end of February, so it should be in bookstores sometime in April 2011, if all goes as planned. I have a list of thirty new topics to add, and those won't all fit. I have some tough decisions to make!

On a personal note, Ajay, it was a pleasure getting to meet you when you came to UT, especially our chats on the current state of the analytics market and where it might be headed. I love the fact that the Internet allows people to meet across thousands of miles. I look forward to reading more on DecisionStats!

About –

Bob Muenchen has twenty-eight years of experience consulting, managing and teaching in a variety of complex, research oriented computing environments. You can read about him here http://web.utk.edu/~muenchen/RobertMuenchenResume.html

Interview: Jeanne Harris, Co-Author of Analytics at Work and Competing on Analytics

Here is an interview with Jeanne Harris, Executive Research Fellow and a Senior Executive at the Accenture Institute for High Performance and co-author of “Analytics at Work.”


Ajay- Describe your background in analytics and strategy

Jeanne- I’ve been involved in strategy and analytics since the mid 1980s, when I worked as part of a project that resulted in a Harvard Business Review article with Michael Porter entitled Information for Competitive Advantage. Since that time, I have led Accenture’s business intelligence, analytics, performance management, knowledge management, and data warehousing consulting practices. I have worked extensively with clients seeking to improve their managerial information, decision-making, analytical and knowledge management capabilities.

I am currently an Executive Research Fellow and a Senior Executive at the Accenture Institute for High Performance in Chicago. I lead the Institute’s global research agenda in the areas of information, technology, analytics and talent.

My research into building an enterprise analytical capability began in 1999, which led to an article called Data to Knowledge to Results: Building an Analytical Capability in California Management Review the following year.

In 2007, I co-authored “Competing on Analytics” with Tom Davenport, which argued that successful businesses must start making decisions based on data, not instinct.

Ajay- How is "Analytics at Work" an extension of your previous work? How is it different and distinct?

Jeanne- In Competing on Analytics we argued that there are big rewards for organizations that embrace fact-based decision-making. High performers are 5 times as likely to view analytical capabilities as a key element of their strategy. Across the board, high performance is associated with more extensive and sophisticated use of analytical capabilities. Companies like US-based Progressive Insurance, which build their corporate culture around analytics, have a real and sustainable competitive advantage.

As I spoke to clients, I realized that for every company that viewed analytics as their new core competitive strategy, there were many more who just wanted pragmatic advice on how to use analytics to make smarter business decisions and achieve better results.

If an analytical organization could be established by executive fiat we would not have needed to write another book. But like anything worthwhile, putting analytics to work takes effort and thought.

Fortunately, every organization, regardless of its current analytical capability, can benefit by becoming more analytical, gaining better insights, making better decisions and translating that into improved business performance. In Analytics at Work, we describe the elements an organization needs to establish a sustainable, robust, enterprise-wide analytical capability. This book contains a lot of pragmatic “how to” advice gleaned from our research and Accenture’s many client experiences.

Ajay- Do you see Analytics as a tool for cost cutting especially in the present economic scenario?

Jeanne- Certainly, analytics are an important tool for cutting costs and improving efficiency. Optimization techniques and predictive models can anticipate market shifts, enabling companies to move quickly to slash costs and eliminate waste. But there are other reasons analytics are more important than ever:

Manage risk. More precise metrics and risk management models will enable managers to make better decisions, reduce risk and monitor changing business conditions more effectively.

Know what is really working. Rigorous testing and monitoring of metrics can establish whether your actions are really making desired changes in your business or not.

Leverage existing investments (in IT and information) to get more insight, faster execution and more business value in business processes.

Invest to emerge stronger as business conditions improve. High performers take advantage of downturns to retool, gain insights into new market dynamics and invest so that they are prepared for the upturn. Analytics give executives insight into the dynamics of their business and how shifts influence business performance.

Identify and seize new opportunities for competitive advantage and differentiation.

Ajay- What are some analytics vendors and what do you think are their various differentiating points among each other?

Jeanne- Certain tools are ideally suited for some situations and not others.  It’s the same with analytics vendors.  At Accenture, we focus on matching the right tool with the demands of the situation, regardless of the vendor.

Ajay- What areas in an organization do you see Analytics applied the most and where do you think it is currently applied the least?

Jeanne- According to new Accenture research, two-thirds of senior managers in all areas of organizations in the US and UK say their top long-term objective is to develop the ability to model and predict behavior, actions and decisions to the point where individual decisions and offers can be made in real time based on the analysis at hand[1]. Analytics is most frequently used in customer-facing applications to generate customer insights, although in certain industries such as transportation it is also commonly used for logistics and yield management. Right now, analytics is probably most infrequently used in HR, although talent management analytics is a very hot topic.

Ajay- What is the best case study you can think of where analytics made a difference? And name a case study where it backfired.

Jeanne- It is hard to pick one favorite case study, when we wrote a whole book full of them! Harrah's is a great case study of a company that uses analytics for competitive differentiation. Netflix is another company that has built its entire business model around an algorithm. Of course Google is essentially an analytical company too. What is of note is that during previous downturns, companies that thrived used data-derived insights made by informed decision makers to produce lasting competitive advantage.

In the book, we discuss the use and misuse of analytics as it relates to the global financial crisis, which we found to be a fascinating case study.

Ajay- Universities now offer an MS in Analytics. What are your thoughts on an ideal MS (Analytics) curriculum?

Jeanne- Yes, there are several universities around the world with degrees such as:

· Applied Analytics

· Applied Mathematics

· Econometrics

· Statistics

· Informatics

· Operations Research

Examples of universities in the US offering these programs include:

· Kellogg School of Management, Northwestern University

· Miami University (in Ohio)

· Central Michigan University

· Villanova University

· North Carolina State University


Obviously analysts require extensive expertise in quantitative methods and modeling technology. But this is just the starting point. To be effective in business they also require industry and business process knowledge. They need to understand enough about IT to understand how analytics fit into the overall IT infrastructure. They need excellent written and verbal communications skills so their insights are understood.  Analysts must also have collaboration skills to work with business managers to apply insights and achieve better business performance. So relationship and consultative skills are critical. As analytics become more central to the organization, more analysts need to know how to lead, coach and develop other professional analysts.  They also will need to help coach and develop the analytical acumen of other information workers in the organizations.

Leading academic institutions are building more analytics into their business curriculums.  And the best analytics degree programs are adding more training to develop industry & business process acumen, as well as relationship, communication & consultative skills.

Biography-

Jeanne G. Harris has worked with analytics, decision support and business intelligence at Accenture for over 23 years and headed the consulting practice in that area for the firm for several years. She is now Executive Research Fellow and Director of Research for the Accenture Institute for High Performance Business. She co-authored, with Tom Davenport, the seminal, path-breaking book Competing on Analytics, and now Analytics at Work.

Here is a link to the new book. Having read some of it (and still reading it), I recommend it highly as a practical, actionable guide.

Interview: Hadley Wickham, R Project Data Visualization Guru

Here is an interview with the genius behind many of the R Project's graphical packages, Dr Hadley Wickham.

Ajay– Describe the pivotal moments in your career in science, from being a high school science student leading up to your current role as a professor.

Hadley– After high school I went to medical school. After three years and a degree I realised that I really didn’t want to be a doctor so I went back to two topics that I had enjoyed in high school: programming and statistics. I really loved the practice of statistics, digging in to data and figuring out what was going on, but didn’t find the theoretical study of computer science so interesting. That spurred me to get my MSc in Statistics and then to apply to graduate school in the US.

The next pivotal moment occurred when I accepted a PhD offer from Iowa State. I applied to ISU because I was interested in multivariate data and visualisation and heard that the department had a focus on those two topics, through the presence of Di Cook and Heike Hofmann. I couldn't have made a better choice – Di and Heike were fantastic major professors and I loved the combination of data analysis, software development and teaching that they practiced. That in turn led to my decision to look for a job in academia.

Ajay– You have created almost ten R packages, as per your website http://had.co.nz/. Do you think there is potential for a commercial version of data visualization software based on R? What are your views on the current commercial R packages?

Hadley– I think there's a lot of opportunity for the development of user-friendly data visualisation tools based on R. These would be great for novices and casual users, wrapping up the complexities of the command line into an approachable GUI – see Jeroen Ooms' http://yeroon.net/ggplot2 for an example.

Developing these tools is not something that is part of my research endeavors. I’m a strong believer in the power of computational thinking and the advantages that programming (instead of pointing and clicking) brings. Creating visualizations with code makes reproducibility, automation and communication much easier – all of which are important for good science.

Commercial packages fill a hole in the R ecosystem. They make R more palatable to enterprise customers with guaranteed support, and they can offer a way to funnel some of that money back into the R ecosystem. I am optimistic about the future of these endeavors.

Ajay– Clearly with your interest in graphics, you seem to favor visual solutions. Do you also feel that R Project could benefit from better R GUIs or GUIs for specific packages?

Hadley– See above – while GUIs are useful for novices and casual users, they are not a good fit for the demands of science. In my opinion, what R needs more of is better tutorials and documentation, so that people don't need to use GUIs. I'm very excited about the new dynamic html help system – I think it has huge potential for making R easier to use.

Compared to other programming languages, R currently lacks good online (free) introductions for new users. I think this is because many R developers are academics and the incentives aren't there to make freely available documentation. Personally, I would love to make (e.g.) the ggplot2 book openly available under a Creative Commons license, but I would receive no academic credit for doing so.

Ajay– Describe the top 3-5 principles that you explain in your book, ggplot2: Elegant Graphics for Data Analysis. What are the other important topics that you cover in the book?

Hadley– The ggplot2 book gives you the theory to understand the construction of almost any statistical graphic. With this theory in hand, you are much better equipped to create visualisations that are tailored to the exact problem you face, rather than having to rely on a canned set of pre-made graphics.

The book is divided into sections based on the components of this theory, called the layered grammar of graphics, which is based on Lee Wilkinson’s excellent “The Grammar of Graphics”. It’s quite possible to use ggplot2 without understanding these components, but the better you understand, the better your ability to critique and improve your graphics.
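To give a flavour of the layered approach (my own minimal sketch, not an example from the book), the snippet below builds a plot from separate data, aesthetic, geometric, statistical and faceting layers using R's built-in mtcars data; the variable choices are illustrative only.

```r
library(ggplot2)

# Each component of the layered grammar is added as a separate layer:
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetic mappings
  geom_point() +                                              # geometric object
  geom_smooth(method = "lm", se = FALSE) +                    # statistical transformation
  facet_wrap(~ am) +                                          # faceting
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")       # labels
```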

Ajay– What are the five best tutorials that you would recommend for students learning data visualization in R? As a data visualization person do you feel that R could do with more video tutorials?

Hadley– If you want to learn about ggplot2, I’d highly recommend the following two resources:

* The Learning R blog, http://learnr.wordpress.com/
* The ggplot2 mailing list, http://groups.google.com/group/ggplot2

For general data management and manipulation (often needed before you can visualise data) and visualisation using base graphics, Quick-R (http://www.statmethods.net/) is very useful.

Local useR groups can be an excellent resource if you live nearby. Lately, the Bay Area (http://www.meetup.com/R-Users/) and the New York (http://www.meetup.com/nyhackr/) useR groups have had some excellent speakers on visualisation, and they often post slides and videos online.

Ajay– What are your personal hobbies? How important are work-life balance and serendipity for creative, scientific and academic people?

Hadley– When I'm not working, I enjoy reading and cooking. I find it's important to take regular breaks from my research and software development work. When I come back I'm usually bursting with new ideas. Two resources that have helped shape my views on creativity and productivity are Elizabeth Gilbert's TED talk on nurturing creativity (http://www.ted.com/index.php/talks/elizabeth_gilbert_on_genius.html) and "The Creative Habit: Learn It and Use It for Life" by Twyla Tharp (http://amzn.com/0743235266). I highly recommend both of them.

Dr Wickham’s impressive biography can be best seen at http://had.co.nz/

Audio Interview - Dr. Colleen McCue, National Security Expert

During times of national insecurity, I remembered and dug up an interview from SAS Data Mining 2009 in which Dr Colleen McCue talks about how working with SAS and data mining can help.


Interview with Dr Colleen McCue

Dr. Colleen McCue, President & CEO of MC2 Solutions, brings over 18 years of experience in advanced analytics and the development of actionable solutions to complex information processing problems in the applied public safety and national security environment. Dr. McCue's areas of expertise include the application of data mining and predictive analytics to the analysis of crime and intelligence data, with particular emphasis on deployment strategies, surveillance detection, threat and vulnerability assessment and the behavioral analysis of violent crime. Her experience as the Crime Analysis Program Manager for the Richmond, Virginia Police Department and pioneering work in operationally relevant and actionable analytical strategies has been used to support a wide array of national security and public safety clients. Dr. McCue has authored a book on the use of advanced analytics in the applied public safety environment entitled "Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis." Dr. McCue earned her undergraduate degree from the University of Illinois at Chicago and her Doctorate in Psychology from Dartmouth College, and completed a five-year postdoctoral fellowship in the Department of Pharmacology & Toxicology at the Medical College of Virginia, Virginia Commonwealth University. Dr. McCue can be reached at colleen@mc2solutions.net or 804.894.1154.

MC2 Solutions, LLC specializes in the provision of public safety and national security research, analysis and training.

Watch Colleen’s Webinar, Why Just Count Crime When You Can Prevent It?