The Teradata add-on package for R
teradataR is a package (library) that allows R users to easily connect to Teradata, establish data frames (R data formats) over Teradata tables, and call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. The package provides 44 analytical functions and an additional 20 data connection and R infrastructure functions. In addition, we’ve added a function that lists the stored procedures within Teradata and a function that calls them from R.
- 20 functions to enable R infrastructure to operate with Teradata, for example:
  - tdConnect – Connect to Teradata via ODBC
  - td.data.frame – Establish data frame connections to a Teradata table
- 44 in-database analytical functions callable from R. A sample of the functions includes:
  - Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
  - Reorganization functions: join, merge, and samples
  - Transformations: bincode, recode, rescale, sigmoid, zscore, and null replacement
  - K-Means clustering and Score K-Means
  - Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
  - R language features: nrow, ncol, min, max, summary, as.data.frame, and dim
- Tools and R functions that allow users to create their own custom analytic functions that are callable from R:
  - Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure.
  - An analytic process that creates new derived predictive variables can be captured as a stored procedure.
  - An entire process that creates or updates an analytical data set can be captured as a stored procedure.
  - An R function can list all the stored procedures within Teradata.
  - An R function can call a stored procedure that runs in-database.
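The connection and data frame functions listed above might be combined roughly as follows. This is a minimal sketch, not taken from the user guide: the helper name summarize_table is hypothetical, the argument names passed to tdConnect (dsn, uid, pwd) are assumed from the package documentation, and the DSN, credentials, and table name are placeholders you would supply.

```r
# Hypothetical helper: connect, wrap a table as a td data frame,
# and ask Teradata for its dimensions and summary statistics.
summarize_table <- function(dsn, uid, pwd, table) {
  # Attach the package inside the function so this sketch can be
  # sourced even on a machine where teradataR is not installed.
  library(teradataR)

  # Assumed argument names; check ?tdConnect in your installed version.
  tdConnect(dsn = dsn, uid = uid, pwd = pwd)
  on.exit(tdClose())  # close the connection when the function exits

  # A td data frame is a reference to the table, not a local copy.
  tdf <- td.data.frame(table)

  # dim() and summary() dispatch to the td.data.frame methods,
  # which are intended to do their work inside the database.
  list(dims = dim(tdf), summary = summary(tdf))
}
```

A call would look like summarize_table("myDSN", "user", "password", "my_table"), where all four values are placeholders. Calling as.data.frame on a td data frame, by contrast, converts it to an ordinary local data frame, so it is best kept for small results.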
teradataR allows R users to leverage all the benefits of in-database processing with Teradata:
- Eliminate data movement from Teradata to the R framework for key data intensive tasks.
- Leverage the speed of Teradata database’s parallel processing to run analytics against big data.
- Ability to operate within the R console environment.
- Embed your frequently performed tasks to run in-database.
- R and teradataR are free downloads.
This package allows users of R to interact with a Teradata database. R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. Users can use many statistical functions directly against the Teradata system without having to extract the data into memory.
Enhancements in the new 1.0.1 release include:
- teradataR User Guide
- addition of Mac OS X Package
- addition of Red Hat Linux Package (added 2/23/12)
- summary has been enhanced to run faster
- JDBC support added to allow Windows or Mac users to run the package with JDBC
- td.data.frame enhanced to support adding columns and expressions
- td.data.frame enhanced to use Teradata 14.0 Fastpath Transform Functions (see Appendix B)
- td.tapply function added to apply a select group of functions to columns of an array
A new R package for Red Hat Linux has been added to the teradataR 1.0.1 release. This new package provides the same functionality as in the previously released Windows and Mac OS X packages, but is built for Red Hat Linux. This version was built and tested on Red Hat Linux 6.2 32-bit. (The R version for Red Hat Linux is 2.14.1)
Installing this package is the same as for any normal R package: extract it into your R library area, or use the install.packages command with the file path.
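As a rough illustration of the install.packages route, the sketch below installs from a local file. The file path is a hypothetical placeholder, not the actual download name; substitute the path of the package file you downloaded for your platform.

```r
# Hypothetical path to the downloaded package file; adjust to your system.
pkg_file <- "path/to/teradataR_1.0.1.zip"

# repos = NULL tells install.packages to treat the first argument
# as a local file rather than a package name on a repository.
if (file.exists(pkg_file)) {
  install.packages(pkg_file, repos = NULL)
} else {
  message("Package file not found: ", pkg_file)
}
```

After a successful install, library(teradataR) loads the package in the current session.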
With plenty of prolific and enthusiastic developers, the number of packages for R is expected to grow tremendously. Statisticians and analysts using these packages will find innovative ways to use data to answer their research and business questions. And as organizations become more willing to rely on open-source software for mission-critical tasks, R is poised to become an essential tool for analyzing our complex world.
From the user guide:
teradataR allows R users to easily connect to Teradata, establish td data frames (virtual R data frames) to Teradata and to call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions.
A Function List
teradataR-package Allow access to Teradata via R
as.data.frame.td.data.frame Convert td data frame to a data frame
as.td.data.frame Coerce to a td data frame
dim.td.data.frame Dimensions of a td data frame
is.td.data.frame Is an Object a Teradata Data Frame
is.td.expression Is an Object a Teradata Expression
mean.td.data.frame Arithmetic Mean
median.td.data.frame Median Value
predict.kmeans Kmeans Model Prediction
print.td.data.frame Show contents of a td data frame
sum.td.data.frame Sum of column
summary.td.data.frame Summary of Teradata Data Frame
td.bincode Create Table of Bincode Values
td.binomial Binomial Test
td.binomialsign Binomial Sign Test
td.call.sp Locate and call stored procedure
td.cor Correlation Matrix
td.cov Covariance Matrix
td.dagostino.pearson D’Agostino Pearson Test
td.data.frame Teradata Data Frames
td.f.oneway One way F Test
td.factanal Factor Analysis
td.freq Frequency Analysis
td.join Join Tables in Teradata
td.kmeans K-Means Clustering
td.ks Kolmogorov Smirnov Test
td.lilliefors Lilliefors Test
td.merge Merge Rows of Teradata Tables
td.mode Mode Value of Column
td.mwnkw Mann-Whitney/Kruskal Wallis Test
td.nullreplace Replace Null Values
td.quantiles Quantile Values
td.rescale Rescale Values of Column
td.sample Sample Rows
td.shapiro.wilk Shapiro Wilk
td.sigmoid Sigmoid Transformation
td.smirnov Smirnov Test
td.solve Solve a system of equations
td.stats General Statistics
td.t.paired T Test Paired
td.t.unpaired T Test Unpaired
td.t.unpairedi T Test – Unpaired Indicator
td.wilcoxon Wilcoxon Test
td.zscore Zscore Transformation
tdClose Close connection
tdConnect Connect to Teradata database
tdMetadataDB Set metadata database
tdQuery Query Teradata Database
teradataR Allow access to Teradata via R
[.td.data.frame Extract Teradata Data Frame
[<-.td.data.frame Replace value of Teradata Data Frame
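To show how a few entries in this list might fit together, here is a hedged sketch that runs a query and a K-Means clustering in-database. The helper name cluster_table is hypothetical, and the argument order assumed for td.kmeans (td data frame first, number of centers second) and the argument names for tdConnect are assumptions based on the package documentation; verify them with the help pages of your installed version.

```r
# Hypothetical helper: count rows with plain SQL, then cluster in-database.
cluster_table <- function(dsn, uid, pwd, table, k = 3) {
  # Attached inside the function so the sketch can be sourced
  # without teradataR being installed.
  library(teradataR)

  tdConnect(dsn = dsn, uid = uid, pwd = pwd)  # assumed argument names
  on.exit(tdClose())                          # always release the connection

  # tdQuery sends a SQL string to Teradata and returns the result.
  row_count <- tdQuery(paste("SELECT COUNT(*) FROM", table))

  # td.kmeans is assumed to take the td data frame and the number of
  # centers as its first two arguments; passed positionally on purpose.
  tdf   <- td.data.frame(table)
  model <- td.kmeans(tdf, k)

  list(rows = row_count, model = model)
}
```

The table passed in should contain numeric columns suitable for clustering; the function list above also includes predict.kmeans for scoring with the resulting model.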