Talking on Big Data Analytics

I am going  being sponsored to a Government of India sponsored talk on Big Data Analytics at Bangalore on Friday the 13 th of July. If you are in Bangalore, India you may drop in for a dekko. Schedule and Abstracts (i am on page 7 out 9) .

Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase using open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure) , and approaches using PiCloud. In addition it will showcase some of Google APIs for Big Data Analysis.

Lastly we will talk on social media analysis ,national security use cases (i.e. cyber war) and privacy hazards of big data analytics.

Schedule

View more presentations from Ajay Ohri.
Abstracts

View more documents from Ajay Ohri.

 

Interview Jason Kuo SAP Analytics #Rstats

Here is an interview with Jason Kuo who works with SAP Analytics as Group Solutions Marketing Manager. Jason answers questions on SAP Analytics and it’s increasing involvement with R statistical language.

Ajay- What made you choose R as the language to tie important parts of your technology platform like HANA and SAP Predictive Analysis. Did you consider other languages like Julia or Python.

Jason- It’s the most popular. Over 50% of the statisticians and data analysts use R. With 3,500+ algorithms its arguably the most comprehensive statistical analysis language. That said,we are not closing the door on others.

Ajay- When did you first start getting interested in R as an analytics platform?

Jason- SAP has been tracking R for 5+ years. With R’s explosive growth over the last year or two, it made sense for us to dramatically increase our investment in R.

Ajay- Can we expect SAP to give back to the R community like Google and Revolution Analytics does- by sponsoring Package development or sponsoring user meets and conferences?

Will we see SAP’s R HANA package in this year’s R conference User 2012 in Nashville

Jason- Yes. We plan to provide a specific driver for HANA tables for input of the data to native R. This planned for end of 2012. We’ll then review our event strategy. SAP has been a sponsor of Predictive Analytics World for several years and was indeed a founding sponsor. We may be attending the year’s R conference in Nashville.

Ajay- What has been some of the initial customer feedback to your analytics expansion and offerings. 

Jason- We have completed two very successful Pilots of the R Integration for HANA with two of SAP’s largest customers.

About-

Jason has over 15 years of BI and Data Warehousing industry experience. Having worked at Oracle, Business Objects, and now SAP, Jason has been involved in numerous technical marketing roles involving performance management dashboards, information management, text analysis, predictive analytics, and now big data. He has a bachelor’s of science in operations research from the University of Michigan.

 

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and R language.
Blag-  I used to work as an ABAP Consultant for 11 years, but also been involved with programming since the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use an statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay-  Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.

Blag-   Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

HANA Oncolyzer

An interesting use case of technology for better health is HANA Oncolyzer at http://epic.hpi.uni-potsdam.de/Home/HanaOncolyzer

“Build on the newest in-memory technology the HANA Oncolyzer is able to analyze even huge amounts of medical data in shortest time”, says Dr. Alexander Zeier, Deputy Chair of EPIC. Research institutes and university hospital support from HANA Oncolyzer by building the basis for a flexible exchange of information about efficiency of medicines and treatments.

In near future, the tumor’s DNA of all cancer patients needs to be analyzed to support specific patient therapies. These analyses result in medical data in amount of multiple terabytes. “These data need to be analyzed regarding mutations and anomalies in real-time”, says Matthias Steinbrecher at SAP’s Innovation Center in Potsdam. As one of the aims the research prototype HANA Oncolyzer was developed at our chair in cooperation with SAP’s Innovation Center in Potsdam. “The ‘heart’ of our development builds the in-memory technology that supports the parallel analysis of million of data within seconds in main memory”, saysMatthieu Schapranow, Ph.D. cand. at the HPI.

and

research activities result in 500.000 or more data points per patient.

and

With the help of a dedicated iPad application medical doctors can access all data mobile at any location anytime.

 

SAP HANA Contest

From SAP, a new contest based on HANA

http://wiki.sdn.sap.com/wiki/display/events/SAP%20HANA%20InnoJam%20Online%202011?bc=true

Become a champion and join our new developer challenge! Explore SAP HANA and share a critical business problem you want to solve on idea place for SAP HANA InnoJam online and describe your idea. Your participation starts here today, 13th September. The first 100 submitters will play on SAP HANA and get a chance to shine and win big prizes! To be prepared for the new challange, please read All sources you might need for your SAP HANA training, first chapters of the SAP HANA Pocketbook (coming soon) and read more on the official SAP HANA webpage.

Phase 1

  • Share a critical business problem you want to solve on Idea Place, the place is open from September 13th, 2011
  • First 100 submitters go to Phase 2

Phase 2

  • Join SAP HANA Champions community, get started with HANA, build teams
  • Access SAP HANA sandbox
  • Use SAP HANA to solve your business problem
  • Get helped by SAP experts if and when needed

Phase 3

  • Share your solution: written description and a short video
  • Top 8 finalists go to Phase 4

Phase 4

  • 8 finalists present their problem/solution/feedback in front of SAP executives and a panel of judges (early January in Palo Alto)
  • Winner prizes will be identified soon
Also suggested reading-

Brief Interview with James G Kobielus

Here is a brief one question interview with James Kobielus, Senior Analyst, Forrester.

Ajay-Describe the five most important events in Predictive Analytics you saw in 2010 and the top three trends in 2011 as per you.

Jim-

Five most important developments in 2010:

  • Continued emergence of enterprise-grade Hadoop solutions as the core of the future cloud-based platforms for advanced analytics
  • Development of the market for analytic solution appliances that incorporate several key features for advanced analytics: massively parallel EDW appliance, in-database analytics and data management function processing, embedded statistical libraries, prebuilt logical domain models, and integrated modeling and mining tools
  • Integration of advanced analytics into core BI platforms with user-friendly, visual, wizard-driven, tools for quick, exploratory predictive modeling, forecasting, and what-if analysis by nontechnical business users
  • Convergence of predictive analytics, data mining, content analytics, and CEP in integrated tools geared  to real-time social media analytics
  • Emergence of CRM and other line-of-business applications that support continuously optimized “next-best action” business processes through embedding of predictive models, orchestration engines, business rules engines, and CEP agility

Three top trends I see in the coming year, above and beyond deepening and adoption of the above-bulleted developments:

  • All-in-memory, massively parallel analytic architectures will begin to gain a foothold in complex EDW environments in support of real-time elastic analytics
  • Further crystallization of a market for general-purpose “recommendation engines” that, operating inline to EDWs, CEP environments, and BPM platforms, enable “next-best action” approaches to emerge from today’s application siloes
  • Incorporation of social network analysis functionality into a wider range of front-office business processes to enable fine-tuned behavioral-based customer segmentation to drive CRM optimization

About –http://www.forrester.com/rb/analyst/james_kobielus

James G. Kobielus
Senior Analyst, Forrester Research

RESEARCH FOCUS

James serves Business Process & Applications professionals. He is a leading expert on data warehousing, predictive analytics, data mining, and complex event processing. In addition to his core coverage areas, James contributes to Forrester’s research in business intelligence, data integration, data quality, and master data management.

PREVIOUS WORK EXPERIENCE

James has a long history in IT research and consulting and has worked for both vendors and research firms. Most recently, he was at Current Analysis, an IT research firm, where he was a principal analyst covering topics ranging from data warehousing to data integration and the Semantic Web. Prior to that position, James was a senior technical systems analyst at Exostar (a hosted supply chain management and eBusiness hub for the aerospace and defense industry). In this capacity, James was responsible for identifying and specifying product/service requirements for federated identity, PKI, and other products. He also worked as an analyst for the Burton Group and was previously employed by LCC International, DynCorp, ADEENA, International Center for Information Technologies, and the North American Telecommunications Association. He is both well versed and experienced in product and market assessments. James is a widely published business/technology author and has spoken at many industry events