Home » Posts tagged 'version'

Tag Archives: version

Top 7 Business Strategy Models

UPDATED POST- Some Models I use for Business Strategy- to analyze the huge reams of qualitative and uncertain data that business generates. I have added a bonus the Business canvas

  1. Porters 5 forces Model-To analyze industries
  2. Business Canvas
  3. BCG Matrix- To analyze Product Portfolios
  4. Porters Diamond Model- To analyze locations
  5. McKinsey 7 S Model-To analyze teams
  6. Gernier Theory- To analyze growth of organization
  7. Herzberg Hygiene Theory- To analyze soft aspects of individuals
  8. Marketing Mix Model- To analyze marketing mix.

(more…)

Geeks for Privacy: Play Color Cipher and Visual Cryptography

Maybe the guys in Anonymous or Wikileaks can now use visual cryptography while using Snapchat to fool the NSA or CIA

Personally I think a browser with inbuilt backdoors to Tor Relays and data transfer by Bit Torrrents could be worthy a project too.

Quit the bullshit, Google- you are as evil as The Russian Communist Empire

I was just reading up on my weekly to-read list and came across this interesting method. It is called Play Color Cipher-

Each Character ( Capital, Small letters, Numbers (0-9), Symbols on the keyboard ) in the plain text is substituted with a color block from the available 18 Decillions of colors in the world [11][12][13] and at the receiving end the cipher text block (in color) is decrypted in to plain text block. It overcomes the problems like “Meet in the middle attack, Birthday attack and Brute force attacks [1]”.
It also reduces the size of the plain text when it is encrypted in to cipher text by 4 times, with out any loss of content. Cipher text occupies very less buffer space; hence transmitting through channel is very fast. With this the transportation cost through channel comes down.

ColorCipherBlocks

Reference-

http://www.ijcaonline.org/journal/number28/pxc387832.pdf

Visual Cryptography is indeed an interesting topic-

Visual cryptography, an emerging cryptography technology, uses the characteristics of human vision to decrypt encrypted
images. It needs neither cryptography knowledge nor complex computation. For security concerns, it also ensures that hackers
cannot perceive any clues about a secret image from individual cover images. Since Naor and Shamir proposed the basic
model of visual cryptography, researchers have published many related studies.

Visual_crypto_animation_demo

Visual cryptography (VC) schemes hide the secret image into two or more images which are called
shares. The secret image can be recovered simply by stacking the shares together without any complex
computation involved. The shares are very safe because separately they reveal nothing about the secret image.

Visual Cryptography provides one of the secure ways to transfer images on the Internet. The advantage
of visual cryptography is that it exploits human eyes to decrypt secret images .

ESPECIALLY SEE |THIS AND THIS

http://cacr.uwaterloo.ca/~dstinson/VCS-flag.html

and

http://cacr.uwaterloo.ca/~dstinson/VCS-pi.html

Even more fun—– visual cryptography using a series of bar codes – leaving the man in middle guessing how many sub images are there and which if at all is the real message

 

vispixel

References-

Color Visual Cryptography Scheme Using Meaningful Shares

http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna4.pdf

Visual cryptography for color images

http://csis.bits-pilani.ac.in/faculty/murali/netsec-10/seminar/refs/muralikrishna3.pdf

Other Resources

  1. http://users.telenet.be/d.rijmenants/en/visualcrypto.htm
  2. Visual Crypto – One-time Image Create two secure images from one by Robert Hansen
  3. Visual Crypto Java Applet at the University of Regensburg
  4. Visual Cryptography Kit Software to create image layers
  5. On-line Visual Crypto Applet by Leemon Baird
  6. Extended Visual Cryptography (pdf) by Mizuho Nakajima and Yasushi Yamaguchi
  7. Visual Cryptography Paper by Moni Noar and Adi Shamir
  8. Visual Crypto Talk (pdf) by Frederik Vercauteren ESAT Leuven
  9. http://cacr.uwaterloo.ca/~dstinson/visual.html
  10. t the University of Salerno web page on visual cryptogrpahy.
  11. Visual Crypto Page by Doug Stinson
  12. Simple implementation of the visual cryptography scheme based on Moni Naor and Adi Shamir, Visual Cryptography, EUROCRYPT 1994, pp1–12. This technique allows visual information like pictures to be encrypted so that decryption can be done visually.The code outputs two files. Try printing them on two separate transparencies and putting them one on top of the other to see the hidden message. http://algorito.com/algorithm/visual-cryptography

Visual Cryptography 

Ajay- I think a combination of sharing and color ciphers would prove more helpful to secure Internet Communication than existing algorithms. It also levels the playing field from computationally rich players to creative coders.

JMP Student Edition

I really liked the initiatives at JMP/Academic. Not only they offer the software bundled with a textbook, which is both good common sense as well as business sense given how fast students can get confused

(Rant 1 Bundling with textbooks is something I think is Revolution Analytics should think of doing instead of just offering the academic  version for free downloading- it would be interesting to see the penetration of R academic market with Revolution’s version and the open source version with the existing strategy)

From http://www.jmp.com/academic/textbooks.shtml

Major publishers of introductory statistics textbooks offer a 12-month license to JMP Student Edition, a streamlined version of JMP, with their textbooks.

and a glance through this http://www.jmp.com/academic/pdf/jmp_se_comparison.pdf  shows it is a credible and not extremely whittled down version which would be just dishonest.

And I loved this Reference Card at http://www.jmp.com/academic/pdf/jmp10_se_quick_guide.pdf

 

Oracle, SAP- Hana, Revolution Analytics and even SAS/STAT itself can make more reference cards like this- elegant solutions for students and new learners!

More- creative-rants Honestly why do corporate sites use PDFs anymore when they can use Instapaper , or any of these SlideShare/Scribd formats to show information in a better way without diverting the user from the main webpage.

But I digress, back to JMP

 

Resources for Faculty Using JMP® Student Edition

Faculty who select a JMP Student Edition bundle for their courses may be eligible for additional resources, including course materials and training.

Special JMP® Student Edition for AP Statistics

JMP Student Edition is available in a convenient five-year license for qualified Advanced Placement statistics programs.

Try and have a look yourself at http://www.jmp.com/academic/student.shtml

 

 

 

Interview Rob J Hyndman Forecasting Expert #rstats

Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.

Ajay -Describe your journey from being a student of science to a Professor. What were some key turning points along that journey?
 
Rob- I started a science honours degree at the University of Melbourne in 1985. By the end of 1985 I found myself simultaneously working as a statistical consultant (having completed all of one year of statistics courses!). For the next three years I studied mathematics, statistics and computer science at university, and tried to learn whatever I needed to in order to help my growing group of clients. Often we would cover things in classes that I’d already taught myself through my consulting work. That really set the trend for the rest of my career. I’ve always been an academic on the one hand, and a statistical consultant on the other. The consulting work has led me to learn a lot of things that I would not otherwise have come across, and has also encouraged me to focus on research problems that are of direct relevance to the clients I work with.
I never set out to be an academic. In fact, I thought that I would get a job in the business world as soon as I finished my degree. But once I completed the degree, I was offered a position as a statistical consultant within the University of Melbourne, helping researchers in various disciplines and doing some commercial work. After a year, I was getting bored doing only consulting, and I thought it would be interesting to do a PhD. I was lucky enough to be offered a generous scholarship which meant I was paid more to study than to continue working.
Again, I thought that I would probably go and get a job in the business world after I finished my PhD. But I finished it early and my scholarship was going to be cut off once I submitted my thesis. So instead, I offered to teach classes for free at the university and delayed submitting my thesis until the scholarship period ran out. That turned out to be a smart move because the university saw that I was a good teacher, and offered me a lecturing position starting immediately I submitted my thesis. So I sort of fell into an academic career.
I’ve kept up the consulting work part-time because it is interesting, and it gives me a little extra money. But I’ve also stayed an academic because I love the freedom to be able to work on anything that takes my fancy.
Ajay- Describe your upcoming book on Forecasting.
 
Rob- My first textbook on forecasting (with Makridakis and Wheelwright) was written a few years after I finished my PhD. It has been very popular, but it costs a lot of money (about $140 on Amazon). I estimate that I get about $1 for every book sold. The rest goes to the publisher (Wiley) and all they do is print, market and distribute it. I even typeset the whole thing myself and they print directly from the files I provided. It is now about 15 years since the book was written and it badly needs updating. I had a choice of writing a new edition with Wiley or doing something completely new. I decided to do a new one, largely because I didn’t want a publisher to make a lot of money out of students using my hard work.
It seems to me that students try to avoid buying textbooks and will search around looking for suitable online material instead. Often the online material is of very low quality and contains many errors.
As I wasn’t making much money on my textbook, and the facilities now exist to make online publishing very easy, I decided to try a publishing experiment. So my new textbook will be online and completely free. So far it is about 2/3 completed and is available at http://otexts.com/fpp/. I am hoping that my co-author (George Athanasopoulos) and I will finish it off before the end of 2012.
The book is intended to provide a comprehensive introduction to forecasting methods. We don’t attempt to discuss the theory much, but provide enough information for people to use the methods in practice. It is tied to the forecast package in R, and we provide code to show how to use the various forecasting methods.
The idea of online textbooks makes a lot of sense. They are continuously updated so if we find a mistake we fix it immediately. Also, we can add new sections, or update parts of the book, as required rather than waiting for a new edition to come out. We can also add richer content including video, dynamic graphics, etc.
For readers that want a print edition, we will be aiming to produce a print version of the book every year (available via Amazon).
I like the idea so much I’m trying to set up a new publishing platform (otexts.com) to enable other authors to do the same sort of thing. It is taking longer than I would like to make that happen, but probably next year we should have something ready for other authors to use.
Ajay- How can we make textbooks cheaper for students as well as compensate authors fairly
 
Rob- Well free is definitely cheaper, and there are a few businesses trying to make free online textbooks a reality. Apart from my own efforts, http://www.flatworldknowledge.com/ is producing a lot of free textbooks. And textbookrevolution.org is another great resource.
With otexts.com, we will compensate authors in two ways. First, the print versions of a book will be sold (although at a vastly cheaper rate than other commercial publishers). The royalties on print sales will be split 50/50 with the authors. Second, we plan to have some features of each book available for subscription only (e.g., solutions to exercises, some multimedia content, etc.). Again, the subscription fees will be split 50/50 with the authors.
Ajay- Suppose a person who used to use forecasting software from another company decides to switch to R. How easy and lucid do you think the current documentation on R website for business analytics practitioners such as these – in the corporate world.
 
Rob- The documentation on the R website is not very good for newcomers, but there are a lot of other R resources now available. One of the best introductions is Matloff’s “The Art of R Programming”. Provided someone has done some programming before (e.g., VBA, python or java), learning R is a breeze. The people who have trouble are those who have only ever used menu interfaces such as Excel. Then they are not only learning R, but learning to think about computing in a different way from what they are used to, and that can be tricky. However, it is well worth it. Once you know how to code, you can do so much more.  I wish some basic programming was part of every business and statistics degree.
If you are working in a particular area, then it is often best to find a book that uses R in that discipline. For example, if you want to do forecasting, you can use my book (otexts.com/fpp/). Or if you are using R for data visualization, get hold of Hadley Wickham’s ggplot2 book.
Ajay- In a long and storied career- What is the best forecast you ever made ? and the worst?
 
 Rob- Actually, my best work is not so much in making forecasts as in developing new forecasting methodology. I’m very proud of my forecasting models for electricity demand which are now used for all long-term planning of electricity capacity in Australia (see  http://robjhyndman.com/papers/peak-electricity-demand/  for the details). Also, my methods for population forecasting (http://robjhyndman.com/papers/stochastic-population-forecasts/ ) are pretty good (in my opinion!). These methods are now used by some national governments (but not Australia!) for their official population forecasts.
Of course, I’ve made some bad forecasts, but usually when I’ve tried to do more than is reasonable given the available data. One of my earliest consulting jobs involved forecasting the sales for a large car manufacturer. They wanted forecasts for the next fifteen years using less than ten years of historical data. I should have refused as it is unreasonable to forecast that far ahead using so little data. But I was young and naive and wanted the work. So I did the forecasts, and they were clearly outside the company’s (reasonable) expectations, and they then refused to pay me. Lesson learned. It’s better to refuse work than do it poorly.

Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net)

. And now anyone can use the method with the ets() function in the forecast package for R.
About-
Rob J Hyndman is Pro­fessor of Stat­ist­ics in the Depart­ment of Eco­no­met­rics and Busi­ness Stat­ist­ics at Mon­ash Uni­ver­sity and Dir­ector of the Mon­ash Uni­ver­sity Busi­ness & Eco­nomic Fore­cast­ing Unit. He is also Editor-in-Chief of the Inter­na­tional Journal of Fore­cast­ing and a Dir­ector of the Inter­na­tional Insti­tute of Fore­casters. Rob is the author of over 100 research papers in stat­ist­ical sci­ence. In 2007, he received the Moran medal from the Aus­tralian Academy of Sci­ence for his con­tri­bu­tions to stat­ist­ical research, espe­cially in the area of stat­ist­ical fore­cast­ing. For 25 years, Rob has main­tained an act­ive con­sult­ing prac­tice, assist­ing hun­dreds of com­pan­ies and organ­iz­a­tions. His recent con­sult­ing work has involved fore­cast­ing elec­tri­city demand, tour­ism demand, the Aus­tralian gov­ern­ment health budget and case volume at a US call centre.

RevoDeployR and commercial BI using R and R based cloud computing using Open CPU

Revolution Analytics has of course had RevoDeployR, and in a  webinar strive to bring it back to center spotlight.

BI is a good lucrative market, and visualization is a strength in R, so it is matter of time before we have more R based BI solutions. I really liked the two slides below for explaining RevoDeployR better to newbies like me (and many others!)

Integrating R into 3rd party and Web applications using RevoDeployR

Please click here to download the PDF.

Here are some additional links that may be of interest to you:

 

( I still think someone should make a commercial version of Jeroen Oom’s web interfaces and Jeff Horner’s web infrastructure (see below) for making customized Business Intelligence (BI) /Data Visualization solutions , UCLA and Vanderbilt are not exactly Stanford when it comes to deploying great academic solutions in the startup-tech world). I kind of think Google or someone at Revolution  should atleast dekko OpenCPU as a credible cloud solution in R.

I still cant figure out whether Revolution Analytics has a cloud computing strategy and Google seems to be working mysteriously as usual in broadening access to the Google Compute Cloud to the rest of R Community.

Open CPU  provides a free and open platform for statistical computing in the cloud. It is meant as an open, social analysis environment where people can share and run R functions and objects. For more details, visit the websit: www.opencpu.org

and esp see

https://public.opencpu.org/userapps/opencpu/opencpu.demo/runcode/

Jeff Horner’s

http://rapache.net/

Jerooen Oom’s

Interview James G Kobielus IBM Big Data

Here is an interview with  James G Kobielus, who is the Senior Program Director, Product Marketing, Big Data Analytics Solutions at IBM. Special thanks to Payal Patel Cudia of IBM’s communication team,for helping with the logistics for this.

Ajay -What are the specific parts of the IBM Platform that deal with the three layers of Big Data -variety, velocity and volume

James-Well first of all, let’s talk about the IBM Information Management portfolio. Our big data platform addresses the three layers of big data to varying degrees either together in a product , or two out of the three or even one of the three aspects. We don’t have separate products for the variety, velocity and volume separately.

Let us define these three layers-Volume refers to the hundreds of terabytes and petabytes of stored data inside organizations today. Velocity refers to the whole continuum from batch to real time continuous and streaming data.

Variety refers to multi-structure data from structured to unstructured files, managed and stored in a common platform analyzed through common tooling.

For Volume-IBM has a highly scalable Big Data platform. This includes Netezza and Infosphere groups of products, and Watson-like technologies that can support petabytes volume of data for analytics. But really the support of volume ranges across IBM’s Information Management portfolio both on the database side and the advanced analytics side.

For real time Velocity, we have real time data acquisition. We have a product called IBM Infosphere, part of our Big Data platform, that is specifically built for streaming real time data acquisition and delivery through complex event processing. We have a very rich range of offerings that help clients build a Hadoop environment that can scale.

Our Hadoop platform is the most real time capable of all in the industry. We are differentiated by our sheer breadth, sophistication and functional depth and tooling integrated in our Hadoop platform. We are differentiated by our streaming offering integrated into the Hadoop platform. We also offer a great range of modeling and analysis tools, pretty much more than any other offering in the Big Data space.

Attached- Jim’s slides from Hadoop World

Ajay- Any plans for Mahout for Hadoop

Jim- I cant speak about product plans. We have plans but I cant tell you anything more. We do have a feature in Big Insights called System ML, a library for machine learning.

Ajay- How integral are acquisitions for IBM in the Big Data space (Netezza,Cognos,SPSS etc). Is it true that everything that you have in Big Data is acquired or is the famous IBM R and D contributing here . (see a partial list of IBM acquisitions at at http://www.ibm.com/investor/strategy/acquisitions.wss )

Jim- We have developed a lot on our own. We have the deepest R and D of anybody in the industry in all things Big Data.

For example – Watson has Big Insights Hadoop at its core. Apache Hadoop is the heart and soul of Big Data (see http://www-01.ibm.com/software/data/infosphere/hadoop/ ). A great deal that makes Big Insights so differentiated is that not everything that has been built has been built by the Hadoop community.

We have built additions out of the necessity for security, modeling, monitoring, and governance capabilities into BigInsights to make it truly enterprise ready. That is one example of where we have leveraged open source and we have built our own tools and technologies and layered them on top of the open source code.

Yes of course we have done many strategic acquisitions over the last several years related to Big Data Management and we continue to do so. This quarter we have done 3 acquisitions with strong relevance to Big Data. One of them is Vivisimo (http://www-03.ibm.com/press/us/en/pressrelease/37491.wss ).

Vivisimo provides federated Big Data discovery, search and profiling capabilities to help you figure out what data is out there,what is relevance of that data to your data science project- to help you answer the question which data should you bring in your Hadoop Cluster.

 We also did Varicent , which is more performance management and we did TeaLeaf , which is a customer experience solution provider where customer experience management and optimization is one of the hot killer apps for Hadoop in the cloud. We have done great many acquisitions that have a clear relevance to Big Data.

Netezza already had a massively parallel analytics database product with an embedded library of models called Netezza Analytics, and in-database capabilties to massively parallelize Map Reduce and other analytics management functions inside the database. In many ways, Netezza provided capabilities similar to that IBM had provided for many years under the Smart Analytics Platform (http://www-01.ibm.com/software/data/infosphere/what-is-advanced-analytics/ ) .

There is a differential between Netezza and ISAS.

ISAS was built predominantly in-house over several years . If you go back a decade ago IBM acquired Ascential Software , a product portfolio that was the heart and soul of IBM InfoSphere Information Manager that is core to our big Data platform. In addition to Netezza, IBM bought SPSS two years back. We already had data mining tools and predictive modeling in the InfoSphere portfolio, but we realized we needed to have the best of breed, SPSS provided that and so IBM acquired them.

 Cognos- We had some BI reporting capabilities in the InfoSphere portfolio that we had built ourselves and also acquired for various degrees from prior acquisitions. But clearly Cognos was one of the best BI vendors , and we were lacking such a rich tool set in our product in visualization and cubing and so for that reason we acquired Cognos.

There is also Unica – which is a marketing campaign optimization which in many ways is a killer app for Hadoop. Projects like that are driving many enterprises.

Ajay- How would you rank order these acquisitions in terms of strategic importance rather than data of acquisition or price paid.

Jim-Think of Big Data as an ecosystem that has components that are fitted to particular functions for data analytics and data management. Is the database the core, or the modeling tool the core, or the governance tools the core, or is the hardware platform the core. Everything is critically important. We would love to hear from you what you think have been most important. Each acquisition has helped play a critical role to build the deepest and broadest solution offering in Big Data. We offer the hardware, software, professional services, the hosting service. I don’t think there is any validity to a rank order system.

Ajay-What are the initiatives regarding open source that Big Data group have done or are planning?

Jim- What we are doing now- We are very much involved with the Apache Hadoop community. We continue to evolve the open source code that everyone leverages.. We have built BigInsights on Apache Hadoop. We have the closest, most up to date in terms of version number to Apache Hadoop ( Hbase,HDFS, Pig etc) of all commercial distributions with our BigInsights 1.4 .

We have an R library integrated with BigInsights . We have a R library integrated with Netezza Analytics. There is support for R Models within the SPSS portfolio. We already have a fair amount of support for R across the portfolio.

Ajay- What are some of the concerns (privacy,security,regulation) that you think can dampen the promise of Big Data.

Jim- There are no showstoppers, there is really a strong momentum. Some of the concerns within the Hadoop space are immaturity of the technology, the immaturity of some of the commercial offerings out there that implement Hadoop, the lack of standardization for formal sense for Hadoop.

There is no Open Standards Body that declares, ratifies the latest version of Mahout, Map Reduce, HDFS etc. There is no industry consensus reference framework for layering these different sub projects. There are no open APIs. There are no certifications or interoperability standards or organizations to certify different vendors interoperability around a common API or framework.

The lack of standardization is troubling in this whole market. That creates risks for users because users are adopting multiple Hadoop products. There are lots of Hadoop deployments in the corporate world built around Apache Hadoop (purely open source). There may be no assurance that these multiple platforms will interoperate seamlessly. That’s a huge issue in terms of just magnifying the risk. And it increases the need for the end user to develop their own custom integrated code if they want to move data between platforms, or move map-reduce jobs between multiple distributions.

Also governance is a consideration. Right now Hadoop is used for high volume ETL on multi structured and unstructured data sources, or Hadoop is used for exploratory sand boxes for data scientists. These are important applications that are a majority of the Hadoop deployments . Some Hadoop deployments are stand alone unstructured data marts for specific applications like sentiment analysis like.

Hadoop is not yet ready for data warehousing. We don’t see a lot of Hadoop being used as an alternative to data warehouses for managing the single version of truth of system or record data. That day will come but there needs to be out there in the marketplace a broader range of data governance mechanisms , master data management, data profiling products that are mature that enterprises can use to make sure their data inside their Hadoop clusters is clean and is the single version of truth. That day has not arrived yet.

One of the great things about IBM’s acquisition of Vivisimo is that a piece of that overall governance picture is discovery and profiling for unstructured data , and that is done very well by Vivisimo for several years.

What we will see is vendors such as IBM will continue to evolve security features inside of our Hadoop platform. We will beef up our data governance capabilities for this new world of Hadoop as the core of Big Data, and we will continue to build up our ability to integrate multiple databases in our Hadoop platform so that customers can use data from a bit of Hadoop,some data from a bit of traditional relational data warehouse, maybe some noSQL technology for different roles within a very complex Big Data environment.

That latter hybrid deployment model is becoming standard across many enterprises for Big Data. A cause for concern is when your Big Data deployment has a bit of Hadoop, bit of noSQL, bit of EDW, bit of in-memory , there are no open standards or frameworks for putting it all together for a unified framework not just for interoperability but also for deployment.

There needs to be a virtualization or abstraction layer for unified access to all these different Big Data platforms by the users/developers writing the queries, by administrators so they can manage data and resources and jobs across all these disparate platforms in a seamless unified way with visual tooling. That grand scenario, the virtualization layer is not there yet in any standard way across the big data market. It will evolve, it may take 5-10 years to evolve but it will evolve.

So, that’s the concern that can dampen some of the enthusiasm for Big Data Analytics.

About-

You can read more about Jim at http://www.linkedin.com/pub/james-kobielus/6/ab2/8b0 or

follow him on Twitter at http://twitter.com/jameskobielus

You can read more about IBM Big Data at http://www-01.ibm.com/software/data/bigdata/

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and R language.
Blag-  I used to work as an ABAP Consultant for 11 years, but also been involved with programming since the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use an statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay-  Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.

Blag-   Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

Follow

Get every new post delivered to your Inbox.

Join 839 other followers