Tag Archives: sas
While the SAS language has a beautifully designed ODS (Output Delivery System) for saving the output of an analysis to Excel files (as well as HTML and other formats), in R one can simply assign the output to an object, pass it to write.table, and save it as a CSV file using the file parameter of write.table.
For a business analytics consultant, the output from Proc Means or Proc Freq (SAS), or from a summary/describe/table command (in R), often has to be presented in a final report. Copying and pasting is not feasible, especially for large amounts of text or on remote computers.
Using the following, we can simply save the output in R.
# We changed the working directory, so we can save output without typing the entire path again and again for each step.
# I have found the summary command most useful for initial analysis and final display (particularly during the data munging step).
# I assigned the analysis step (summary) to a new object. It could also be summary, names, describe (Hmisc) or table (for frequency analysis).
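The steps described in the comments above might look roughly like the sketch below. It is only an illustration: the built-in mtcars dataset, the object name ajay_summary, the file name summary_mtcars.csv and the use of tempdir() as the target folder are all assumptions, not part of the original post.

```r
# Assumption: tempdir() stands in for the folder where the report should go.
old_wd <- setwd(tempdir())

# Assign the analysis step to an object (mtcars is a built-in example dataset).
ajay_summary <- summary(mtcars)

# summary() on a data frame returns a table; write it out as comma-separated
# text. row.names = FALSE drops the empty row labels of the summary table.
write.table(ajay_summary, file = "summary_mtcars.csv", sep = ",",
            row.names = FALSE)

# Read the file back to check what was saved, then restore the old directory.
saved <- read.csv("summary_mtcars.csv", check.names = FALSE)
setwd(old_wd)
```

Because the working directory was changed first, only the bare file name is needed in write.table and read.csv.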
Note: This is for beginners in R who use it for business analytics and deal with a large number of variables.
If you have a large number of files in a local directory to be read into R, you can avoid typing the entire path again and again by changing the working directory to that folder and passing only the file name to the file parameter of read.table.
and so on…
Maybe there is a better approach somewhere on Stack Overflow or R-help, but this will work just as well.
You can then merge the objects created, ajayt1 and ajayt2… (to be continued)
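A minimal sketch of this pattern follows. The folder, the two file names, their columns and the choice to merge on a shared id column are hypothetical, created here only so the example runs on its own. Only the object names ajayt1 and ajayt2 come from the post.

```r
# Assumption: a scratch folder holding two small, made-up CSV files.
data_dir <- file.path(tempdir(), "ajay_files")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, sales = c(10, 20, 30)),
          file.path(data_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, region = c("N", "S", "E")),
          file.path(data_dir, "file2.csv"), row.names = FALSE)

# Change the working directory once, then refer to files by name only.
old_wd <- setwd(data_dir)
ajayt1 <- read.table("file1.csv", header = TRUE, sep = ",")
ajayt2 <- read.table("file2.csv", header = TRUE, sep = ",")
setwd(old_wd)

# Merge the two objects on their shared key column (assumed to be "id").
ajay_all <- merge(ajayt1, ajayt2, by = "id")
```

If the files share the same columns rather than a key, rbind(ajayt1, ajayt2) would stack them instead.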
Just got the email-more software is good news!
Revolution R Enterprise 6.0 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster or IBM Platform Computing LSF cluster. RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user.
Detailed information on what’s new in 6.0 and known issues:
And from the manual, lots of function goodies for Big Data:
- IBM Platform LSF Cluster support [Linux only]. The new RevoScaleR function, RxLsfCluster, allows you to create a distributed compute context for the Platform LSF workload manager.
- Azure Burst support added for Microsoft HPC Server [Windows only]. The new RevoScaleR function, RxAzureBurst, allows you to create a distributed compute context to have computations performed in the cloud using Azure Burst.
- The rxExec function allows distributed execution of essentially any R function across cores and nodes, delivering the results back to the user.
- The functions RxLocalParallel and RxLocalSeq allow you to create compute context objects for local parallel and local sequential computation, respectively.
- RxForeachDoPar allows you to create a compute context using the currently registered foreach parallel backend (doParallel, doSNOW, doMC, etc.). To execute rxExec calls, simply register the parallel backend as usual, then set your compute context as follows: rxSetComputeContext(RxForeachDoPar())
- rxSetComputeContext and rxGetComputeContext simplify management of compute contexts.
- rxGlm provides a fast, scalable, distributable implementation of generalized linear models. This expands the list of full-featured high performance analytics functions already available: summary statistics (rxSummary), cubes and cross tabs (rxCube, rxCrossTabs), linear models (rxLinMod), covariance and correlation matrices (rxCovCor), binomial logistic regression (rxLogit), and k-means clustering (rxKmeans). Example: a Tweedie family with 1 million observations and 78 estimated coefficients (categorical data) took 17 seconds with rxGlm compared with 377 seconds for glm on a quad-core laptop.
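To make the idea of distributing an ordinary R function across cores concrete, here is a rough analogue using base R's parallel package. It does not use RevoScaleR or rxExec (those ship only with Revolution R Enterprise). The function sum_squares and the choice of two workers are assumptions made for illustration.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical task: any self-contained R function will do.
sum_squares <- function(n) sum(seq_len(n)^2)

inputs <- c(10, 100, 1000, 10000)

# Run the function on a small local cluster of two worker processes,
# collecting the results back in the calling session.
cl <- makeCluster(2)
parallel_result <- parLapply(cl, inputs, sum_squares)
stopCluster(cl)

# The same computation done sequentially, for comparison.
serial_result <- lapply(inputs, sum_squares)
```

Both calls return a list with one element per input; only where the work runs differs.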
And it is now easier to work with R’s big brother, the SAS language:
RevoScaleR high-performance analysis functions will now conveniently work directly with a variety of external data sources (delimited and fixed format text files, SAS files, SPSS files, and ODBC data connections). New functions are provided to create data source objects to represent these data sources (RxTextData, RxOdbcData, RxSasData, and RxSpssData), which in turn can be specified for the ‘data’ argument for these RevoScaleR analysis functions: rxHistogram, rxSummary, rxCube, rxCrossTabs, rxLinMod, rxCovCor, rxLogit, and rxGlm.
You can analyze a SAS file directly as follows:
# RevoScaleR ships with Revolution R Enterprise
library(RevoScaleR)
# Create a SAS data source with information about variables and
# rows to read in each chunk
sasDataFile <- file.path(rxGetOption("sampleDataDir"), "claims.sas7bdat")
sasDS <- RxSasData(sasDataFile, stringsAsFactors = TRUE, colClasses = c(RowNum = "integer"), rowsPerRead = 50)
# Compute and draw a histogram directly from the SAS file
rxHistogram(~cost|type, data = sasDS)
# Compute summary statistics
rxSummary(~., data = sasDS)
# Estimate a linear model
linModObj <- rxLinMod(cost ~ age + car_age + type, data = sasDS)
# Import a subset into a data frame for further inspection
subData <- rxImport(inData = sasDS, rowSelection = cost > 400,
                    varsToKeep = c("cost", "age", "type"))
The installation instructions and instructions for getting started with Revolution R Enterprise & RevoDeployR for Windows:
Here is a brief interview with Alvaro Tejada Galindo, aka Blag, a developer working with SAP HANA and R at SAP Labs, Montreal. SAP HANA is SAP’s latest offering in BI; it is also a database and a computing environment, and preliminary use cases suggest that using R and HANA together on the cloud can give major productivity gains in both speed and analytical ability.
Ajay- What made the R language a fit for SAP HANA? Did you consider other languages? What is your view on the Julia/Python/SPSS/SAS/Matlab languages?
Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language, as “L” can be used as well, not forgetting “SQLScript”, which is our own version of SQL. I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question, as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS: I haven’t used them, so I’ve got nothing to say there.
Ajay- What is your view on some of the limitations of R that can be overcome by using it with SAP HANA?
Blag- I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.
Ajay- Have you considered other vendors of R, including working with RStudio, Revolution Analytics, and even Oracle R Enterprise?
Blag- I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.
Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results?
Blag- Right now the use of “R” and SAP HANA is very preliminary; I don’t think many people have started working on it. But as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani: “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps”.
Ajay- Does your group in SAP plan to give back to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, package development, etc.?
Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.
| Name: | Alvaro Tejada Galindo |
| Company: | SAP Canada Labs-Montreal |
| Instant Messaging Type: | |
| Instant Messaging ID: | Blag |
| Professional Blog URL: | |
| My Relation to SAP: | employee |
| Short Bio: | Development Expert for the Technology Innovation and Developer Experience team. Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997. |
SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:
- SAP HANA DB (or HANA DB) refers to the database technology itself,
- SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
- SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,
- SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).
R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange, to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation. HANA developers can write R scripts in SQL, and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in SQLScript. R environments need to be deployed to use R within SQLScript.
More blog posts on using SAP and R together:
- Dealing with R and HANA
- HANA meets R
- When SAP HANA met R – First kiss
- Using RODBC with SAP HANA DB
- SAP HANA: My experiences on using SAP HANA with R
and of course the blog that started it all-
A nice conference from the grand old institution of Analytics, SAS Institute’s annual analytic pow-wow.
I especially like some of the trainings, and wonder if they could be stored as e-learning modules for students/academics to review in SAS’s extensive and generous Online Education Program.
Sunday Morning Workshop
SAS Sentiment Analysis Studio: Introduction to Building Models
This course provides an introduction to SAS Sentiment Analysis Studio. It is designed for system designers, developers, analytical consultants and managers who want to understand techniques and approaches for identifying sentiment in textual documents.
Sunday, Oct. 7, 8:30a.m.-12p.m. – $250
Sunday Afternoon Workshops
Business Analytics Consulting Workshops
This workshop is designed for the analyst, statistician, or executive who wants to discuss best-practice approaches to solving specific business problems, in the context of analytics. The two-hour workshop will be customized to discuss your specific analytical needs and will be designed as a one-on-one session for you, including up to five individuals within your company sharing your analytical goal. This workshop is specifically geared for an expert tasked with solving a critical business problem who needs consultation for developing the analytical approach required. The workshop can be customized to meet your needs, from a deep-dive into modeling methods to a strategic plan for analytic initiatives. In addition to the two hours at the conference location, this workshop includes some advanced consulting time over the phone, making it a valuable investment at a bargain price.
Sunday, Oct. 7; 1-3 p.m. or 3:30-5:30 p.m. – $200
Demand-Driven Forecasting: Sensing Demand Signals, Shaping and Predicting Demand
This half-day lecture teaches students how to integrate demand-driven forecasting into the consensus forecasting process and how to make the current demand forecasting process more demand-driven.
Sunday, Oct. 7; 1-5 p.m.
Forecast Value Added Analysis
Forecast Value Added (FVA) is the change in a forecasting performance metric (such as MAPE or bias) that can be attributed to a particular step or participant in the forecasting process. FVA analysis is used to identify those process activities that are failing to make the forecast any better (or might even be making it worse). This course provides step-by-step guidelines for conducting FVA analysis – to identify and eliminate the waste, inefficiency, and worst practices from your forecasting process. The result can be better forecasts, with fewer resources and less management time spent on forecasting.
Sunday, Oct. 7; 1-5 p.m.
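As a small worked illustration of the FVA definition above, the sketch below computes MAPE for three successive steps of a forecasting process and takes the change between steps as the FVA. The numbers, the step names (naive, statistical, override) and the sign convention (positive means the step improved the forecast) are assumptions chosen for illustration; they do not come from the course material.

```r
# Mean absolute percentage error, expressed in percent.
mape <- function(actual, forecast) {
  100 * mean(abs(actual - forecast) / abs(actual))
}

# Hypothetical actuals and the forecast produced at each process step.
actual <- c(100, 200, 100, 200)
forecasts <- list(
  naive       = c(120, 160, 120, 160),
  statistical = c(110, 180, 110, 180),
  override    = c(115, 170, 115, 170)
)

# MAPE of each step, in process order.
mape_by_step <- sapply(forecasts, mape, actual = actual)

# FVA of a step = MAPE of the previous step minus MAPE of this step,
# so a positive value means the step reduced the error.
fva <- -diff(mape_by_step)
```

In this made-up data the statistical step lowers MAPE relative to the naive forecast, while the manual override raises it again, which is the kind of step an FVA analysis is meant to flag.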
SAS Enterprise Content Categorization: An Introduction
This course gives an introduction to methods of unstructured data analysis, document classification and document content identification. The course also uses examples as the basis for constructing parse expressions and resulting entities.
Sunday, Oct. 7; 1-5 p.m.
You can see more on this yourself at -
The noted Diamonds dataset in the ggplot2 package of R is actually culled from the website
However, it has about 54,000 diamonds (53,940 rows), while the whole Diamonds search engine has almost ten times that number. Using iMacros, a Google Chrome plugin, we can scrape that data (or almost any data). The iMacros Chrome plugin is available at
while notes on coding are at
iMacros makes coding as easy as recording a macro, and the code is automatically generated for whatever actions you perform. You can set parameters to extract only specific parts of the website, and the code can be run in a loop (of 9999 times!).
Here is the iMacros code. Note that you need to navigate to the web site
before running it
VERSION BUILD=5100505 RECORDER=CR
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAG POS=6 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:paginate_enabled_next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=test+3
and voila- all the diamonds you need to analyze!
The returned data can be read using standard delimited-data munging in either SAS or R.
More on iMacros from
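On the R side, reading such an extract might look like the sketch below. The exact layout of the file iMacros saves is not shown in this post, so the column names, the sample rows and the "$1,045"-style price formatting are all made up for illustration, and a temporary file stands in for the real extract.

```r
# Assumption: the extract is a comma-delimited file with a header row.
# A tiny stand-in file is written here so the example runs on its own.
extract_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Shape,Carat,Cut,Color,Clarity,Price",
  "Round,0.31,Ideal,E,VS1,\"$1,045\"",
  "Princess,1.02,Very Good,G,SI1,\"$4,870\""
), extract_file)

# Read the delimited text into a data frame.
diamonds_raw <- read.csv(extract_file, stringsAsFactors = FALSE)

# Typical munging step: strip currency symbols and thousands separators
# so the price can be treated as a number.
diamonds_raw$Price <- as.numeric(gsub("[$,]", "", diamonds_raw$Price))
```

For a real extract, inspect a few lines first and adjust sep, header and the cleaning step to match what was actually saved.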
Automate your web browser. Record and replay repetitious work
If you encounter any problems with iMacros for Chrome, please let us know in our Chrome user forum at http://forum.iopus.com/viewforum.php?f=21 Our forum is also the best place for new feature suggestions ---- iMacros was designed to automate the most repetitious tasks on the web. If there’s an activity you have to do repeatedly, just record it in iMacros. The next time you need to do it, the entire macro will run at the click of a button! With iMacros, you can quickly and easily fill out web forms, remember passwords, create a webmail notifier, and more. You can keep the macros on your computer for your own use, use them within bookmark sync / Xmarks or share them with others by embedding them on your homepage, blog, company Intranet or any social bookmarking service as bookmarklet. The uses are limited only by your imagination! Popular uses are as web macro recorder, form filler on steroids and highly-secure password manager (256-bit AES encryption).
JMP, the visual data exploration and statistical quality control software from SAS Institute, launched version 10 of its software today.
JMP 10 includes:
Numerous enhancements to the drag-and-drop Graph Builder, including a new iPad application.
A cutting-edge Control Chart Builder to create process control charts with drag-and-drop ease.
New reliability capabilities, including growth and forecast models.
Additions and improvements for sorting and filtering data, design of experiments, statistical modeling, scripting, add-in and application development, script debugging and more.
From John Sall’s blog post at
Much of the development centered on four focus areas:
1. Graph Builder everywhere. The Graph Builder platform itself has new features like Heatmap and Treemap, an elements palette and properties panel, making the choices more visible. But Graph Builder also has some descendants now, including the new Control Chart Builder, which makes creating control charts an interactive process. In addition, some of the drag-and-drop features that are used to change columns in Graph Builder are also available in Distribution, Fit Y by X, and a few other places. Finally, Graph Builder has been ported to the iPad. For the first time, you can use JMP for exploration and presentation on a mobile device for free. So just think of Graph Builder as gradually taking over in lots of places.
2. Expert-driven design: reliability, measurement systems, and partial least squares analyses.
3. Performance. This release has the most new multithreading so far.
4. Application development
You can read more here -