Talking on Big Data Analytics

I am going  being sponsored to a Government of India sponsored talk on Big Data Analytics at Bangalore on Friday the 13 th of July. If you are in Bangalore, India you may drop in for a dekko. Schedule and Abstracts (i am on page 7 out 9) .

Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase using open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure) , and approaches using PiCloud. In addition it will showcase some of Google APIs for Big Data Analysis.

Lastly we will talk on social media analysis ,national security use cases (i.e. cyber war) and privacy hazards of big data analytics.

Schedule

View more presentations from Ajay Ohri.
Abstracts

View more documents from Ajay Ohri.

 

Interview Jason Kuo SAP Analytics #Rstats

Here is an interview with Jason Kuo who works with SAP Analytics as Group Solutions Marketing Manager. Jason answers questions on SAP Analytics and it’s increasing involvement with R statistical language.

Ajay- What made you choose R as the language to tie important parts of your technology platform like HANA and SAP Predictive Analysis. Did you consider other languages like Julia or Python.

Jason- It’s the most popular. Over 50% of the statisticians and data analysts use R. With 3,500+ algorithms its arguably the most comprehensive statistical analysis language. That said,we are not closing the door on others.

Ajay- When did you first start getting interested in R as an analytics platform?

Jason- SAP has been tracking R for 5+ years. With R’s explosive growth over the last year or two, it made sense for us to dramatically increase our investment in R.

Ajay- Can we expect SAP to give back to the R community like Google and Revolution Analytics does- by sponsoring Package development or sponsoring user meets and conferences?

Will we see SAP’s R HANA package in this year’s R conference User 2012 in Nashville

Jason- Yes. We plan to provide a specific driver for HANA tables for input of the data to native R. This planned for end of 2012. We’ll then review our event strategy. SAP has been a sponsor of Predictive Analytics World for several years and was indeed a founding sponsor. We may be attending the year’s R conference in Nashville.

Ajay- What has been some of the initial customer feedback to your analytics expansion and offerings. 

Jason- We have completed two very successful Pilots of the R Integration for HANA with two of SAP’s largest customers.

About-

Jason has over 15 years of BI and Data Warehousing industry experience. Having worked at Oracle, Business Objects, and now SAP, Jason has been involved in numerous technical marketing roles involving performance management dashboards, information management, text analysis, predictive analytics, and now big data. He has a bachelor’s of science in operations research from the University of Michigan.

 

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and R language.
Blag-  I used to work as an ABAP Consultant for 11 years, but also been involved with programming since the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use an statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay-  Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.

Blag-   Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

Jill Dyche on 2012

In part 3 of the series for predictions for 2012, here is Jill Dyche, Baseline Consulting/DataFlux.

Part 2 was Timo Elliot, SAP at http://www.decisionstats.com/timo-elliott-on-2012/ and Part 1 was Jim Kobielus, Forrester at http://www.decisionstats.com/jim-kobielus-on-2012/

Ajay: What are the top trends you saw happening in 2011?

 

Well, I hate to say I saw them coming, but I did. A lot of managers committed some pretty predictable mistakes in 2011. Here are a few we witnessed in 2011 live and up close:

 

1.       In the spirit of “size matters,” data warehouse teams continued to trumpet the volumes of stored data on their enterprise data warehouses. But a peek under the covers of these warehouses reveals that the data isn’t integrated. Essentially this means a variety of heterogeneous virtual data marts co-located on a single server. Neat. Big. Maybe even worthy of a magazine article about how many petabytes you’ve got. But it’s not efficient, and hardly the example of data standardization and re-use that everyone expects from analytical platforms these days.

 

2.       Development teams still didn’t factor data integration and provisioning into their project plans in 2011. So we saw multiple projects spawn duplicate efforts around data profiling, cleansing, and standardization, not to mention conflicting policies and business rules for the same information. Bummer, since IT managers should know better by now. The problem is that no one owns the problem. Which brings me to the next mistake…

 

3.       No one’s accountable for data governance. Yeah, there’s a council. And they meet. And they talk. Sometimes there’s lunch. And then nothing happens because no one’s really rewarded—or penalized for that matter—on data quality improvements or new policies. And so the reports spewing from the data mart are still fraught and no one trusts the resulting decisions.

 

But all is not lost since we’re seeing some encouraging signs already in 2012. And yes, I’d classify some of them as bona-fide trends.

 

Ajay: What are some of those trends?

 

Job descriptions for data stewards, data architects, Chief Data Officers, and other information-enabling roles are becoming crisper, and the KPIs for these roles are becoming more specific. Data management organizations are being divorced from specific lines of business and from IT, becoming specialty organizations—okay, COEs if you must—in their own rights. The value proposition for master data management now includes not just the reconciliation of heterogeneous data elements but the support of key business strategies. And C-level executives are holding the data people accountable for improving speed to market and driving down costs—not just delivering cleaner data. In short, data is becoming a business enabler. Which, I have to just say editorially, is better late than never!

 

Ajay: Anything surprise you, Jill?

 

I have to say that Obama mentioning data management in his State of the Union speech was an unexpected but pretty powerful endorsement of the importance of information in both the private and public sector.

 

I’m also sort of surprised that data governance isn’t being driven more frequently by the need for internal and external privacy policies. Our clients are constantly asking us about how to tightly-couple privacy policies into their applications and data sources. The need to protect PCI data and other highly-sensitive data elements has made executives twitchy. But they’re still not linking that need to data governance.

 

I should also mention that I’ve been impressed with the people who call me who’ve had their “aha!” moment and realize that data transcends analytic systems. It’s operational, it’s pervasive, and it’s dynamic. I figured this epiphany would happen in a few years once data quality tools became a commodity (they’re far from it). But it’s happening now. And that’s good for all types of businesses.

 

About-

Jill Dyché has written three books and numerous articles on the business value of information technology. She advises clients and executive teams on leveraging technology and information to enable strategic business initiatives. Last year her company Baseline Consulting was acquired by DataFlux Corporation, where she is currently Vice President of Thought Leadership. Find her blog posts on www.dataroundtable.com.

Timo Elliott on 2012

Continuing the DecisionStats series on  trends for 2012, Timo Elliott , Technology Evangelist  at SAP Business Objects, looks at the predictions he made in the beginning of  2011 and follows up with the things that surprised him in 2011, and what he foresees in 2012.

You can read last year’s predictions by Mr Elliott at http://www.decisionstats.com/brief-interview-timo-elliott/

Timo- Here are my comments on the “top three analytics trends” predictions I made last year:

(1) Analytics, reinvented. New DW techniques make it possible to do sub-second, interactive analytics directly against row-level operational data. Now BI processes and interfaces need to be rethought and redesigned to make best use of this — notably by blurring the distinctions between the “design” and “consumption” phases of BI.

I spent most of 2011 talking about this theme at various conferences: how existing BI technology israpidly becoming obsolete and how the changes are akin to the move from film to digital photography. Technology that has been around for many years (in-memory, column stores, datawarehouse appliances, etc.) came together to create exciting new opportunities and even generally-skeptical industry analysts put out press releases such as “Gartner Says Data Warehousing Reaching Its Most Significant Inflection Point Since Its Inception.” Some of the smaller BI vendors had been pushing in-memory analytics for years, but the general market started paying more attention when megavendors like SAP started painting a long-term vision of in-memory becoming a core platform for applications, not just analytics. Database leader Oracle was forced to upgrade their in-memory messaging from “It’s a complete fantasy” to “we have that too”.

(2) Corporate and personal BI come together. The ability to mix corporate and personal data for quick, pragmatic analysis is a common business need. The typical solution to the problem — extracting and combining the data into a local data store (either Excel or a departmental data mart) — pleases users, but introduces duplication and extra costs and makes a mockery of information governance. 2011 will see the rise of systems that let individuals and departments load their data into personal spaces in the corporate environment, allowing pragmatic analytic flexibility without compromising security and governance.

The number of departmental “data discovery” initiatives continued to rise through 2011, but new tools do make it easier for business people to upload and manipulate their own information while using the corporate standards. 2012 will see more development of “enterprise data discovery” interfaces for casual users.

(3) The next generation of business applications. Where are the business applications designed to support what people really do all day, such as implementing this year’s strategy, launching new products, or acquiring another company? 2011 will see the first prototypes of people-focused, flexible, information-centric, and collaborative applications, bringing together the best of business intelligence, “enterprise 2.0”, and existing operational applications.

2011 saw the rise of sophisticated, user-centric mobile applications that combine data from corporate systems with GPS mapping and the ability to “take action”, such as mobile medical analytics for doctors or mobile beauty advisor applications, and collaborative BI started becoming a standard part of enterprise platforms.

And one that should happen, but probably won’t: (4) Intelligence = Information + PEOPLE. Successful analytics isn’t about technology — it’s about people, process, and culture. The biggest trend in 2011 should be organizations spending the majority of their efforts on user adoption rather than technical implementation.

Unsurprisingly, there was still high demand for presentations on why BI projects fail and how to implement BI competency centers.  The new architectures probably resulted in even more emphasis on technology than ever, while business peoples’ expectations skyrocketed, fueled by advances in the consumer world. The result was probably even more dissatisfaction in the past, but the benefits of the new architectures should start becoming clearer during 2012.

What surprised me the most:

The rapid rise of Hadoop / NoSQL. The potentials of the technology have always been impressive, but I was surprised just how quickly these technology has been used to address real-life business problems (beyond the “big web” vendors where it originated), and how quickly it is becoming part of mainstream enterprise analytic architectures (e.g. Sybase IQ 15.4 includes native MapReduce APIs, Hadoop integration and federation, etc.)

Prediction for 2012:

As I sat down to gather my thoughts about BI in 2012, I quickly came up with the same long laundry list of BI topics as everybody else: in-memory, mobile, predictive, social, collaborative decision-making, data discovery, real-time, etc. etc.  All of these things are clearly important, and where going to continue to see great improvements this year. But I think that the real “next big thing” in BI is what I’m seeing when I talk to customers: they’re using these new opportunities not only to “improve analytics” but also fundamentally rethink some of their key business processes.

Instead of analytics being something that is used to monitor and eventually improve a business process, analytics is becoming a more fundamental part of the business process itself. One example is a large telco company that has transformed the way they attract customers. Instead of laboriously creating a range of rate plans, promoting them, and analyzing the results, they now use analytics to automatically create hundreds of more complex, personalized rate plans. They then throw them out into the market, monitor in real time, and quickly cull any that aren’t successful. It’s a way of doing business that would have been inconceivable in the past, and a lot more common in the future.

 

About

 

Timo Elliott

Timo Elliott is a 20-year veteran of SAP BusinessObjects, and has spent the last quarter-century working with customers around the world on information strategy.

He works closely with SAP research and innovation centers around the world to evangelize new technology prototypes.

His popular Business Analytics blog tracks innovation in analytics and social media, including topics such as augmented corporate reality, collaborative decision-making, and social network analysis.

His PowerPoint Twitter Tools lets presenters see and react to tweets in real time, embedded directly within their slides.

A popular and engaging speaker, Elliott presents regularly to IT and business audiences at international conferences, on subjects such as why BI projects fail and what to do about it, and the intersection of BI and enterprise 2.0.

Prior to Business Objects, Elliott was a computer consultant in Hong Kong and led analytics projects for Shell in New Zealand. He holds a first-class honors degree in Economics with Statistics from Bristol University, England

Timo can be contacted via Twitter at https://twitter.com/timoelliott

 Part 1 of this series was from James Kobielus, Forrestor at http://www.decisionstats.com/jim-kobielus-on-2012/

2011 Analytics Recap

Events in the field of data that impacted us in 2011

1) Oracle unveiled plans for R Enterprise. This is one of the strongest statements of its focus on in-database analytics. Oracle also unveiled plans for a Public Cloud

2) SAS Institute released version 9.3 , a major analytics software in industry use.

3) IBM acquired many companies in analytics and high tech. Again.However the expected benefits from Cognos-SPSS integration are yet to show a spectacular change in market share.

2011 Selected acquisitions

Emptoris Inc. December 2011

Cúram Software Ltd. December 2011

DemandTec December 2011

Platform Computing October 2011

 Q1 Labs October 2011

Algorithmics September 2011

 i2 August 2011

Tririga March 2011

 

4) SAP promised a lot with SAP HANA- again no major oohs and ahs in terms of market share fluctuations within analytics.

http://www.sap.com/india/news-reader/index.epx?articleID=17619

5) Amazon continued to lower prices of cloud computing and offer more options.

http://aws.amazon.com/about-aws/whats-new/2011/12/21/amazon-elastic-mapreduce-announces-support-for-cc2-8xlarge-instances/

6) Google continues to dilly -dally with its analytics and cloud based APIs. I do not expect all the APIs in the Google APIs suit to survive and be viable in the enterprise software space.  This includes Google Cloud Storage, Cloud SQL, Prediction API at https://code.google.com/apis/console/b/0/ Some of the location based , translation based APIs may have interesting spin offs that may be very very commercially lucrative.

7) Microsoft -did- hmm- I forgot. Except for its investment in Revolution Analytics round 1 many seasons ago- very little excitement has come from MS plans in data mining- The plugins for cloud based data mining from Excel remain promising yet , while Azure remains a stealth mode starter.

8) Revolution Analytics promised us a GUI and didnt deliver (till yet 🙂 ) . But it did reveal a much better Enterprise software Revolution R 5.0 is one of the strongest enterprise software in the R /Stat Computing space and R’s memory handling problem is now an issue of perception than actual stuff thanks to newer advances in how it is used.

9) More conferences, more books and more news on analytics startups in 2011. Big Data analytics remained a strong buzzword. Expect more from this space including creative uses of Hadoop based infrastructure.

10) Data privacy issues continue to hamper and impede effective analytics usage. So does rational and balanced regulation in some of the most advanced economies. We expect more regulation and better guidelines in 2012.