SAS Data Loader for Hadoop eliminates the complexities of writing MapReduce code, with a simple, point-and-click interface that empowers business analysts to prepare, integrate and cleanse big data faster and easier than ever. In addition, data scientists and programmers can run SAS code on Hadoop in parallel for better performance and greater productivity.
- Download and install Cloudera QuickStart VM for CDH 5.3x.
- Download and install either VMware Player 6.0 or later (for Windows) or VMware Fusion for OS X 6.0 (for Mac).
- Download and install your 90-day free trial of SAS Data Loader for Hadoop.
I recently managed to get a copy of SAS University Edition.
1) Here are some problems I had to resolve. The download is a 1.5 GB zipped file (a virtual machine image). Since my broadband connection is based in India, it took many failed attempts before I could get it. The unzipped file is almost 3.5 GB. You can get the download file here: http://www.sas.com/en_us/software/university-edition/download-software.html
Secondly, the hardware needs to be 64-bit, so I upgraded my Dell computer. This was a useful upgrade for me anyway.
2) You can use a download manager to resume the download if your Internet connection has trouble fetching a 1.5 GB file in one go. For Linux, see http://flareget.com/download/ and for Windows, see http://www.internetdownloadmanager.com/download.html
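If you would rather script the resume than install a download manager, the underlying idea is an HTTP Range request: ask the server only for the bytes you do not have yet. Here is a minimal Python sketch of that idea. The function names and the chunk size are my own illustrative choices, and it assumes the server supports Range requests, which I have not verified for the SAS download site.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk.

    Returns the request and the number of bytes already downloaded.
    """
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if already > 0:
        # "bytes=N-" means: send everything from byte offset N to the end.
        request.add_header("Range", f"bytes={already}-")
    return request, already


def resume_download(url, partial_path, chunk_size=1024 * 1024):
    """Continue a partial download, or start over if the server ignores Range.

    Note: if the file on disk is already complete, many servers answer with
    HTTP 416, which urllib raises as an HTTPError.
    """
    request, _ = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 = the server honoured the Range header, so append.
        # Anything else (typically 200) = it is sending the whole file again.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download(url, "download.zip")` again after a dropped connection would pick up from where the file on disk ends, provided the server answers with a 206 status.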
3) I chose VMware Player for Linux because I am much more comfortable with it (the free desktop version). I got it from here (~200 MB): https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0
4) Finally, I installed VMware Player and used Open an Existing Virtual Machine to boot up SAS University Edition.
I was able to open the SAS Studio at the IP Address provided.
I downloaded a Dataset from this collection here
6) Then I uploaded it into SAS Studio.
7) Lastly I was able to run some basic commands
I was really impressed by the enhancements made to the interface: the ability to search command help through a drop-down, the color-coded editor and, of course, the case-insensitive SAS language. I am not a fan of the semicolon, but I loved using Ctrl + / for easy commenting and uncommenting.
- As a SAS-turned-R-turned-SAS coder, here are some views.
- SAS has different windows for code, log and output. R generally has one.
- SAS is case insensitive while R is case sensitive. This is a blessing, especially for variable and dataset names.
- SAS works with datasets, which can be considered the equivalent of R's data frames.
- R is more flexible in its data types than SAS, but SAS is quite fast.
- SAS has a macro language for repeatable tasks.
- SQL is embedded within SAS as PROC SQL, and is available in R through the sqldf package.
- You have to pay for each upgrade in the SAS ecosystem. I am not clear on the pricing, which component does what, or whether there is a cloud option for renting by the hour. How about one web page that lists each product's description and price?
- SAS University Edition is an OS-agnostic tool. For that alone it is quite impressive compared to, say, the Academic Edition of Revolution Analytics.
- R is object oriented and uses [ ] and $ notation for sub-objects. SAS is divided into two main parts, DATA steps and PROC steps, and uses the . notation and the var system.
- The SAS language has a few basic procs but many, many options.
- How good a SAS coder you are often depends on what you can do with data manipulation in the SAS DATA step.
- Graphics are still better in R with ggplot. But the SAS speed is thrilling.
- RAM is limited to 1 GB in the University Edition, but I found it still quite fast. However, I can upload only a 10 MB file to SAS Studio in the University Edition, which I found reasonable for teaching purposes.
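Given the 10 MB upload limit mentioned above, it can save a failed upload to check a file's size first. Here is a small Python sketch. The helper name is made up for illustration, and reading "10 MB" as 10 × 1024 × 1024 bytes is my assumption, since I do not know exactly how SAS Studio counts the limit.

```python
import os

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=UPLOAD_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than the upload limit."""
    return os.path.getsize(path) <= limit_bytes
```

If a dataset fails this check, one option is to sample or split it before uploading.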
I have built a comprehensive learning path for professionals, students and researchers at http://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-r-data-science/
Rather than simply listing resources, I have tried to create a structured path that is not tied to any one source and instead draws on the best sources for each step or phase in the analytics workflow.
There are links to resources by Hadley Wickham, Revolution and Data Camp, as well as videos, live projects, slideshares and tutorials, arranged in a systematic manner.
Have a look and let me know how this can be made better.