SAS for R Users

I recently managed to get a copy of SAS University Edition.  Screenshot from 2015-03-04 19:54:34

1) Here were some problems I had to resolve- The download size is 1.5 gb of a zipped file ( a virtual machine image). Since I have a internet broadband based in India it led to many failed attempts before I could get it. The unzipped file is almost 3.5 gb. You can get the download file here http://www.sas.com/en_us/software/university-edition/download-software.html.

Secondly the hardware needed is 64 bit, so I basically upgraded my Dell Computer. This was a useful upgrade for me anyway.

2) You can get an Internet Download Manager to resume downloading in case your Internet connection has issues downloading a 1.5 gb file in one go. For Linux you can see http://flareget.com/download/

and for Windows http://www.internetdownloadmanager.com/download.html

 

3) I chose VM Player for Linux because I am much more comfortable with VM Player ( Desktop free version). I got that from here ~200 MB https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0

Screenshot from 2015-03-04 19:39:17

4) Finally I installed VM Player and Open an Existing Virtual Machine to boot up SAS University Edition  Screenshot from 2015-03-04 19:43:08

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I was able to open the SAS Studio at the IP Address provided.

Screenshot from 2015-03-04 21:52:32

5)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I downloaded a   Dataset from this collection here

https://archive.ics.uci.edu/ml/datasets/Adult

 

6) Then I uploaded it to within the SAS Studio System

Screenshot from 2015-02-28 17:06:34Screenshot from 2015-03-03 12:29:07

7) Lastly I was able to run some basic commandsScreenshot from 2015-03-03 12:27:48

Screenshot from 2015-03-04 21:54:06

I was really impressed by the enhancements made to the interface, the ability to search command help through a drop down, the color coded editor and of course the case insensitive SAS language (though I am not a fan of the semi colon I loved using Ctrl + / for easy commenting and uncommenting)

  1. For a SAS turned R turned SAS coder- here are some views
  2. SAS has different windows for coding, log and output. R generally has one
  3. SAS is case insensitive while R is case sensitive. This is a blessing especially for variable and dataset names.
  4. SAS deals with Datasets than can be considered the same as Rs Data Frame.
  5. R’s flexibility in data types is not really comparable to SAS as it is quite fast enough.
  6. SAS has a Macro Language for repeatable tasks
  7. SQL is embedded within SAS as Proc SQL and in R through sqldf package
  8. You have to pay for each upgrade in SAS ecosystem. I am not clear on the transparent pricing, which components does what and whether they have a cloud option for renting by the hour. How about one web page that lists product description and price.
  9. SAS University Edition is a OS agnostic tool, for that itself it is quite impressive compared to say Academic Edition of Revolution Analytics.
  10. R is object oriented and uses [] and $ notation for sub objects. SAS is divided into two main parts- data and proc steps, and uses the . notation and var system
  11. SAS language has a few basic procs but many many options.
  12. How good a SAS coder you are often depends on what you can do in data manipulation in SAS Data Step
  13. Graphics is still better in R ggplot. But the SAS speed is thrilling.
  14. RAM is limited in the University Edition to 1 GB but I found that still quite fast. However I can upload only a 10 mb file to the SAS Studio for University Edition which I found reasonable for teaching purposes.

 

 

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and R language.
Blag-  I used to work as an ABAP Consultant for 11 years, but also been involved with programming since the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use an statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay-  Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.

Blag-   Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

Oracle launches its version of R #rstats

From-

http://www.oracle.com/us/corporate/press/1515738

Integrates R Statistical Programming Language into Oracle Database 11g

News Facts

Oracle today announced the availability of Oracle Advanced Analytics, a new option for Oracle Database 11g that bundles Oracle R Enterprise together with Oracle Data Mining.
Oracle R Enterprise delivers enterprise class performance for users of the R statistical programming language, increasing the scale of data that can be analyzed by orders of magnitude using Oracle Database 11g.
R has attracted over two million users since its introduction in 1995, and Oracle R Enterprise dramatically advances capability for R users. Their existing R development skills, tools, and scripts can now also run transparently, and scale against data stored in Oracle Database 11g.
Customer testing of Oracle R Enterprise for Big Data analytics on Oracle Exadata has shown up to 100x increase in performance in comparison to their current environment.
Oracle Data Mining, now part of Oracle Advanced Analytics, helps enable customers to easily build and deploy predictive analytic applications that help deliver new insights into business performance.
Oracle Advanced Analytics, in conjunction with Oracle Big Data ApplianceOracle Exadata Database Machine and Oracle Exalytics In-Memory Machine, delivers the industry’s most integrated and comprehensive platform for Big Data analytics.

Comprehensive In-Database Platform for Advanced Analytics

Oracle Advanced Analytics brings analytic algorithms to data stored in Oracle Database 11g and Oracle Exadata as opposed to the traditional approach of extracting data to laptops or specialized servers.
With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
By providing direct and controlled access to data stored in Oracle Database 11g, customers can accelerate data analyst productivity while maintaining data security throughout the enterprise.
Powered by decades of Oracle Database innovation, Oracle R Enterprise helps enable analysts to run a variety of sophisticated numerical techniques on billion row data sets in a matter of seconds making iterative, speed of thought, and high-quality numerical analysis on Big Data practical.
Oracle R Enterprise drastically reduces the time to deploy models by eliminating the need to translate the models to other languages before they can be deployed in production.
Oracle R Enterprise integrates the extensive set of Oracle Database data mining algorithms, analytics, and access to Oracle OLAP cubes into the R language for transparent use by R users.
Oracle Data Mining provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. These predictive models can be deployed in Oracle Database 11g and use Oracle Exadata Smart Scan to rapidly score huge volumes of data.
The tight integration between R, Oracle Database 11g, and Hadoop enables R users to write one R script that can run in three different environments: a laptop running open source R, Hadoop running with Oracle Big Data Connectors, and Oracle Database 11g.
Oracle provides single vendor support for the entire Big Data platform spanning the hardware stack, operating system, open source R, Oracle R Enterprise and Oracle Database 11g.
To enable easy enterprise-wide Big Data analysis, results from Oracle Advanced Analytics can be viewed from Oracle Business Intelligence Foundation Suite and Oracle Exalytics In-Memory Machine.

Supporting Quotes

“Oracle is committed to meeting the challenges of Big Data analytics. By building upon the analytical depth of Oracle SQL, Oracle Data Mining and the R environment, Oracle is delivering a scalable and secure Big Data platform to help our customers solve the toughest analytics problems,” said Andrew Mendelsohn, senior vice president, Oracle Server Technologies.
“We work with leading edge customers who rely on us to deliver better BI from their Oracle Databases. The new Oracle R Enterprise functionality allows us to perform deep analytics on Big Data stored in Oracle Databases. By leveraging R and its library of open source contributed CRAN packages combined with the power and scalability of Oracle Database 11g, we can now do that,” said Mark Rittman, co-founder, Rittman Mead.
Oracle Advanced Analytics — an option to Oracle Database 11g Enterprise Edition – extends the database into a comprehensive advanced analytics platform through two major components: Oracle R Enterprise and Oracle Data Mining. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.

Oracle R Enterprise tightly integrates the open source R programming language with the database to further extend the database with Rs library of statistical functionality, and pushes down computations to the database. Oracle R Enterprise dramatically advances the capability for R users, and allows them to use their existing R development skills and tools, and scripts can now also run transparently and scale against data stored in Oracle Database 11g.

Oracle Data Mining provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. It can be accessed through the SQL Developer extension Oracle Data Miner to build, evaluate, share and deploy predictive analytics methodologies. At the same time the high-performance Oracle-specific data mining algorithms are accessible from R.

BENEFITS

  • Scalability—Allows customers to easily scale analytics as data volume increases by bringing the algorithms to where the data resides – in the database
  • Performance—With analytical operations performed in the database, R users can take advantage of the extreme performance of Oracle Exadata
  • Security—Provides data analysts with direct but controlled access to data in Oracle Database 11g, accelerating data analyst productivity while maintaining data security
  • Save Time and Money—Lowers overall TCO for data analysis by eliminating data movement and shortening the time it takes to transform “raw data” into “actionable information”
Oracle R Hadoop Connector Gives R users high performance native access to Hadoop Distributed File System (HDFS) and MapReduce programming framework.
This is a  R package
From the datasheet at

Revolution Webinar Series #Rstats

Revolution Analytics Webinar-

 

Featured Webinar
David Champagne REGISTER NOW
Presenter David Champagne
CTO, Revolution Analytics
Date Tuesday, December 20th
Time 11:00AM – 11:30AM Pacific 
Click here for the webinar time in your local time zone

Big Data Starts with R

Traditional IT infrastructure is simply unable to meet

the demands of the new “Big Data Analytics” landscape.   Many enterprises are turning to the “R” statistical programming language and Hadoop (both open source projects) as a potential solution. This webinar will introduce the statistical capabilities of R within the Hadoop ecosystem.  We’ll cover:

  • An introduction to new packages developed by Revolution Analytics to facilitate interaction with the data stores HDFS and HBase so that they can be leveraged from the R environment
  • An overview of how to write Map Reduce jobs in R using Hadoop
  • Special considerations that need to be made when working with R and Hadoop.

We’ll also provide additional resources that are available to people interested in integrating R and Hadoop.

 

Upcoming Webinars
Wed, Dec 14th
11:00AM – 11:30AM PT
Revolution R Enterprise – 100% R and MoreR users already know why the R language is the lingua franca of statisticians today: because it’s the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
 Archived Webinars-
Revolution Webinar: New Features in Revolution R Enterprise 5.0 (including RevoScaleR) to Support Scalable Data AnalysisRevolution R Enterprise 5.0 is Revolution Analytics’ scalable analytics platform.  At its core is Revolution Analytics’ enhanced Distribution of R, the world’s most widely-used project for statistical computing.  In this webinar, Dr. Ranney will discuss new features and show examples of the new functionality, which extend the platform’s usability, integration and scalability

 

UseR goes to Nashville, USA

So if Vanderbilt did lose (again) to UT (http://www.govolsxtra.com/news/2011/nov/20/video-tennessee-highlights-vanderbilt-game/) , they have somethign better to look before next season’s football season.

UseR is coming to Tennessee in 2012! This is the premier conference happens annually for R language (>2 mill users), and alternated between Europe and North America every other year.

Details here

http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012

useR! 2012 (12-15 June 2012)
Department of Biostatistics
Vanderbilt University
School of Medicine
Nashville Tennessee USA

 

 

 

 


Pre-conference Survey

If you plan to attend useR! 2012, help us plan by completing a RedCAP Survey.

 


Contact

Stephania McNeal-Goddard
Assistant to the Chair
stephania.mcneal-goddard@vanderbilt.edu
Phone:             615.322.2768
Fax: 615.343.4924
Vanderbilt University School of Medicine
Department of Biostatistics
S-2323 Medical Center North
Nashville, TN 37232-2158

 

 


Abstracts and Tutorial Proposals

Participants are encouraged to submit an abstract to for oral presentation during a Kaleidoscope or Focus session, or for poster presentation. Tutorial proposals are also welcomed.

Deadlines

  • Tutorial Submission: Dec 1 – Jan 31
  • Tutorial Acceptance Notification: Feb 1 – Feb 29
  • Abstract Submission: Dec 1 – Mar 12
  • Abstract Acceptance Notification: Mar 13 – Apr 15

 

 


Registration

 

Deadlines

  • Early Registration: Jan 1 – Feb 29
  • Regular Registration: Mar 1 – May 12
  • Late Registration: May 13 – June 11
  • On-site Registration: June 12 – June 15

 

 


Travel and Lodging Information

Vanderbilt University is located in Nashville, Tennessee, USA.

Air Travel

The nearest major airport to Vanderbilt University is the Nashville International Airport (BNA). The airport is about 10 miles east of the campus and downtown Nashville. The BNA website maintains a list of ground transportation options for air travelers. The approximate taxi fare from the airport to Vanderbilt University is $27. Shuttles and buses are also available from the airport. The latter is economical (approximate fare is $1.60), but the travel time is more than an hour.

Car Travel

Nashville is located at the intersection of three major interstates. Interstate 40 approaches from the east and west, interstate 24 from the northwest and southeast, and interstate 65 from the northeast and south.

Oracle adds R to Big Data Appliance -Use #Rstats

From the press release, Oracle gets on R and me too- NoSQL

http://www.oracle.com/us/corporate/press/512001

The Oracle Big Data Appliance is a new engineered system that includes an open source distribution of Apache™ Hadoop™, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and an open source distribution of R.

From

http://www.theregister.co.uk/2011/10/03/oracle_big_data_appliance/

the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.

This approach to R-Hadoop integration is different from that announced last week between Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as their competitor, which was no doubt no surprise to either.

In any event, the way they do it, the R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can do analysis on individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; they have no idea that they are using MapReduce on unstructured data.

Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers

From

http://www.oracle.com/us/corporate/features/feature-oracle-nosql-database-505146.html

A Horizontally Scaled, Key-Value Database for the Enterprise
Oracle NoSQL Database is a commercial grade, general-purpose NoSQL database using a key/value paradigm. It allows you to manage massive quantities of data, cope with changing data formats, and submit simple queries. Complex queries are supported using Hadoop or Oracle Database operating upon Oracle NoSQL Database data.

Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple programming model. It scales horizontally to hundreds of nodes with high availability and transparent load balancing. Customers might choose Oracle NoSQL Database to support Web applications, acquire sensor data, scale authentication services, or support online serves and social media.

and

from

http://siliconangle.com/blog/2011/09/30/oracle-adopting-open-source-r-to-connect-legacy-systems/

Oracle says it will integrate R with its Oracle Database. Other signs from Oracle show the deeper interest in using the statistical framework for integration with Hadoop to potentially speed statistical analysis. This has particular value with analyzing vast amounts of unstructured data, which has overwhelmed organizations, especially over the past year.

and

from

http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html

Oracle R Enterprise

Integrates the Open-Source Statistical Environment R with Oracle Database 11g
Oracle R Enterprise allows analysts and statisticians to run existing R applications and use the R client directly against data stored in Oracle Database 11g—vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready, deeply integrated environment for advanced analytics. Users can also use analytical sandboxes, where they can analyze data and develop R scripts for deployment while results stay managed inside Oracle Database.