parallel processing – DECISION STATS

Teradata updates Teradata-R

The Teradata add-on package for R

teradataR is a package or library that allows R users to easily connect to Teradata, establish data frames (R data formats) to Teradata, and call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. The package provides 44 different analytical functions and an additional 20 data connection and R infrastructure functions. In addition, we’ve added a function that lists the stored procedures within Teradata and provides the capability to call them from R.

20 functions that enable R infrastructure to operate with Teradata
tdConnect – Connect to Teradata via ODBC
td.data.frame – Establish data frame connections to a Teradata table
44 in-database analytical functions callable from R. A sample of the functions includes:
Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
Reorganization functions: join, merge and sample
Transformations: bincode, recode, rescale, sigmoid, zscore and null replacement
K-Means clustering and Score K-Means
Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
R language features: nrow, ncol, min, max, summary, as.data.frame, and dim
Tools and R functions that allow users to create their own custom analytic functions that are callable from R.
Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure:
- An analytic process that creates new derived predictive variables can be captured as a stored procedure.
- The entire process of creating or updating an analytical data set can be captured as a stored procedure.
- An R function can list all the stored procedures within Teradata.
- An R function can call a stored procedure that runs in-database.
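The transformation functions listed above (zscore, rescale, sigmoid, bincode, null replacement) are standard column transforms, and teradataR runs them inside Teradata. The following local sketch in plain Python shows what they compute on an in-memory list. It is not teradataR's API. Choices such as sample standard deviation, equal-width bins and mean replacement for nulls are assumptions and may differ from Teradata Warehouse Miner's defaults.

```python
import math
import statistics

def null_replace(values, fill=None):
    """Replace None with `fill`; by default the mean of non-null values (assumed policy)."""
    present = [v for v in values if v is not None]
    if fill is None:
        fill = statistics.mean(present)
    return [fill if v is None else v for v in values]

def zscore(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def rescale(values, lo=0.0, hi=1.0):
    """Min-max rescale into [lo, hi]; assumes the column is not constant."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def sigmoid(values):
    """Logistic transform 1 / (1 + e^-x)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def bincode(values, bins=4):
    """Equal-width binning; returns bin numbers 1..bins (max value goes in the last bin)."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / bins
    return [min(int((v - vmin) / width) + 1, bins) for v in values]

raw = [2.0, None, 4.0, 6.0, 8.0]
clean = null_replace(raw)  # None becomes the mean of 2, 4, 6, 8
print(clean)
print(zscore(clean))
print(rescale(clean))
print(bincode(clean, bins=3))
```

The point of the in-database versions is that the same arithmetic runs where the table lives, so only the transformed result (or nothing at all) has to leave Teradata.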

TeradataR allows R users to leverage all the benefits of in-database processing with Teradata:

Eliminate data movement from Teradata to the R framework for key data intensive tasks.
Leverage the speed of Teradata database’s parallel processing to run analytics against big data.
Ability to operate within the R console environment.
Embed your frequently performed tasks to run in-database.
R and teradataR are free downloads.
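The "eliminate data movement" point is about pushing computation to where the data lives. Here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Teradata. That substitution is an assumption for illustration only; teradataR itself talks to Teradata over ODBC or JDBC, and the `sales` table is hypothetical. The first function pulls every row into client memory before averaging. The second asks the database to aggregate and gets back a single row.

```python
import sqlite3

def make_demo_db(n_rows=10_000):
    """Build an in-memory table standing in for a large warehouse table (hypothetical 'sales')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("north" if i % 2 else "south", float(i % 100)) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def mean_by_extraction(conn):
    """Client-side: every row crosses to the client before averaging. Returns (mean, rows moved)."""
    amounts = [a for (a,) in conn.execute("SELECT amount FROM sales")]
    return sum(amounts) / len(amounts), len(amounts)

def mean_in_database(conn):
    """In-database: the engine aggregates and ships back one row. Returns (mean, rows moved)."""
    (avg,) = conn.execute("SELECT AVG(amount) FROM sales").fetchone()
    return avg, 1

conn = make_demo_db()
print(mean_by_extraction(conn))
print(mean_in_database(conn))
```

Both give the same answer, but the second moves one row instead of ten thousand. On a warehouse-sized table, that difference is the whole argument.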

Source- http://developer.teradata.com/applications/articles/in-database-analytics-with-teradata-r

This package allows users of R to interact with a Teradata database. R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. Users can use many statistical functions directly against the Teradata system without having to extract the data into memory.

Enhancements in this new 1.0.1 release include:

teradataR User Guide
addition of Mac OS X Package
addition of Red Hat Linux Package (added 2/23/12)
summary has been enhanced to run faster
JDBC support added to allow Windows or Mac users to run the package with JDBC
td.data.frame enhanced to allow support for manipulation to add columns and expressions
td.data.frame enhanced to use Teradata 14.0 Fastpath Transform Functions (see Appendix B)
td.tapply function added to apply a select group of functions to columns of an array

From- http://downloads.teradata.com/download/applications/teradata-r

and

A new R package for Red Hat Linux has been added to the teradataR 1.0.1 release. This new package provides the same functionality as in the previously released Windows and Mac OS X packages, but is built for Red Hat Linux. This version was built and tested on Red Hat Linux 6.2 32-bit. (The R version for Red Hat Linux is 2.14.1)

Installing this package is the same as installing any normal R package: just extract it into your R library area, or use the install.packages command with the file path.

from- http://developer.teradata.com/tag/r

and

With plenty of prolific and enthusiastic developers, the number of packages for R is expected to grow tremendously. Statisticians and analysts using these packages will find innovative ways to use data to answer their research and business questions. And as organizations become more willing to rely on open-source software for mission-critical tasks, R is poised to become an essential tool for analyzing our complex world.

Source- http://www.teradatamagazine.com/v09n03/Connections/R-you-ready/

From the user guide-

http://downloads.teradata.com/download/applications/teradata-r

teradataR allows R users to easily connect to Teradata, establish td data frames (virtual R data frames) to Teradata and to call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions.

A Function List
teradataR-package Allow access to Teradata via R
as.data.frame.td.data.frame Convert td data frame to a data frame
as.td.data.frame Coerce to a td data frame
dim.td.data.frame Dimensions of a td data frame
hist.td.data.frame Histograms
is.td.data.frame Is an Object a Teradata Data Frame
is.td.expression Is an Object a Teradata Expression
mean.td.data.frame Arithmetic Mean
median.td.data.frame Median Value
min.td.data.frame Minima
predict.kmeans Kmeans Model Prediction
print.td.data.frame Show contents of a td data frame
sum.td.data.frame Sum of column
summary.td.data.frame Summary of Teradata Data Frame
td.bincode Create Table of Bincode Values
td.binomial Binomial Test
td.binomialsign Binomial Sign Test
td.call.sp Locate and call stored procedure
td.cor Correlation Matrix
td.cov Covariance Matrix
td.dagostino.pearson D’Agostino Pearson Test
td.data.frame Teradata Data Frames
td.f.oneway One way F Test
td.factanal Factor Analysis
td.freq Frequency Analysis
td.hist Histograms
td.join Join Tables in Teradata
td.kmeans K-Means Clustering
td.ks Kolmogorov Smirnov Test
td.lilliefors Lilliefors Test
td.merge Merge Rows of Teradata Tables
td.mode Mode Value of Column
td.mwnkw Mann-Whitney/Kruskal Wallis Test
td.nullreplace Replace Null Values
td.overlap Overlap
td.quantiles Quantile Values
td.rank Rank
td.recode Recode
td.rescale Rescale Values of Column
td.sample Sample Rows
td.shapiro.wilk Shapiro Wilk
td.sigmoid Sigmoid Transformation
td.smirnov Smirnov Test
td.solve Solve a system of equations
td.stats General Statistics
td.t.paired T Test Paired
td.t.unpaired T Test Unpaired
td.t.unpairedi T Test – Unpaired Indicator
td.values Values
td.wilcoxon Wilcoxon Test
td.zscore Zscore Transformation
tdClose Close connection
tdConnect Connect to Teradata database
tdMetadataDB Set metadata database
tdQuery Query Teradata Database
teradataR Allow access to Teradata via R
[.td.data.frame Extract Teradata Data Frame
[<-.td.data.frame Replace value of Teradata Data Frame

December Snowflakes R 2.14.1

Almost missed this one due to Christmas-

R 2.14.1 is out, and so are binaries

so download them here (winduh users!)

http://cran.r-project.org/bin/windows/base/

David S sums it all up here

http://blog.revolutionanalytics.com/2011/12/r-2141-is-released.html

This update makes a few small improvements (such as the ability to accurately count the number of available cores for parallel processing on Solaris and Windows, and improved support of grayscale Postscript and PDF graphics export) and fixes a few minor bugs (such as a correction to BIC calculations in the presence of zero-weight observations).
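Some context on the BIC item. BIC penalizes a fit's log-likelihood by k·log(n), so with zero-weight observations the question is what n should be. The sketch below assumes zero-weight rows should not count toward n, which is how R's nobs() treats weighted lm fits. That is our reading of the fix, not the NEWS text, and the log-likelihood value is invented.

```python
import math

def bic(log_likelihood, n_params, weights):
    """BIC = -2 log L + k log n, where n counts only observations with non-zero weight."""
    n_effective = sum(1 for w in weights if w != 0)
    return -2.0 * log_likelihood + n_params * math.log(n_effective)

weights = [1.0] * 50 + [0.0] * 10  # 10 of the 60 observations carry zero weight
print(bic(-100.0, 3, weights))     # penalty uses n = 50
print(-2.0 * -100.0 + 3 * math.log(len(weights)))  # naive n = 60 overstates the penalty
```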

Binaries are here-

http://cran.r-project.org/bin/windows/base/R-2.14.1-win.exe

Prof Peter D speaks here-

https://stat.ethz.ch/pipermail/r-announce/2011/000548.html

Changes in recent versions are here-

http://cran.r-project.org/bin/windows/base/CHANGES.R-2.14.1.html

Major Changes-

Starting with release 2.14.0, R has direct support for High Performance Computing (the new parallel package).
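R 2.14.0's HPC support arrived as the new parallel package, with functions such as detectCores(), mclapply() and parLapply(). Below is a rough Python analogue of that pattern, using Python's standard multiprocessing rather than R's API. It counts cores, splits the work into near-equal index chunks (loosely like parallel's splitIndices()), farms the chunks out to worker processes, and combines the partial results. The function names are hypothetical.

```python
import os
from multiprocessing import Pool

def detect_cores():
    """Rough analogue of parallel::detectCores(); os.cpu_count() may return None."""
    return os.cpu_count() or 1

def split_indices(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous, near-equal index lists."""
    base, extra = divmod(n_items, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks

def sum_of_squares(indices):
    """Stand-in for an expensive, independent unit of work on one chunk."""
    return sum(i * i for i in indices)

def parallel_sum_of_squares(n_items, workers=None):
    """Send one chunk per worker process, then reduce the partial sums."""
    workers = workers or detect_cores()
    chunks = split_indices(n_items, workers)
    with Pool(processes=workers) as pool:
        partials = pool.map(sum_of_squares, chunks)
    return sum(partials)

if __name__ == "__main__":  # guard required where worker processes re-import this module
    print("cores:", detect_cores())
    print(parallel_sum_of_squares(1_000_000))
```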

Interesting announcement from PiCloud

An interesting announcement from PiCloud, a cloud computing startup focused on Python (as the name suggests). They have basically created a cloud library (or, in R lingo, a package) that lets you call on cloud computing power from the desktop interface itself. This announcement is about multiple IP addresses: each parallel job can now run from a different IP. Real parallel processing, or just a quick trick dressed up in technical jargon? You decide!

s1 cores are comparable in performance to c1 cores with one extra trick up their sleeve: each job running in parallel will have a different IP.

Why is this important?
Using unique IPs is necessary to minimize the automated throttling most sites will impose when seeing fast, repeated access from a single IP.
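Why distinct IPs matter is easiest to see with a toy model of a site's throttling. This minimal sketch assumes a hypothetical fixed-window limiter that allows 10 requests per IP per second. Real sites use many different policies, and none of this is PiCloud code. The same 40 requests are mostly rejected when they come from one IP, but all accepted when spread over four.

```python
from collections import defaultdict

class PerIpRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per IP per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (ip, window index) -> requests accepted so far

    def allow(self, ip, timestamp):
        key = (ip, int(timestamp // self.window))
        if self.counts[key] >= self.limit:
            return False  # throttled
        self.counts[key] += 1
        return True

def accepted(requests, limiter):
    """Count how many (ip, timestamp) requests get through the limiter."""
    return sum(limiter.allow(ip, t) for ip, t in requests)

# 40 requests fired within the same one-second window.
times = [i * 0.01 for i in range(40)]
single_ip = [("10.0.0.1", t) for t in times]
four_ips = [(f"10.0.0.{i % 4 + 1}", t) for i, t in enumerate(times)]

print(accepted(single_ip, PerIpRateLimiter(limit=10, window_seconds=1.0)))  # 10
print(accepted(four_ips, PerIpRateLimiter(limit=10, window_seconds=1.0)))   # 40
```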

How do I use it?
If you’re already using our c1 cores, all you’ll need to do is set the _type keyword.

cloud.call(func, _type='s1')

How much?
$0.04/core/hour

Why don’t other cores have individual IPs?
For other core types, such as c2, multiple cores may be running on a single machine that is assigned only a single IP address. When using s1 cores, you’re guaranteed that each core sits on a different machine.

http://www.picloud.com/

Contribution to #Rstats by Revolution

I have been watching Revolution Analytics’ products almost since the inception of the company. It has managed to sail through storms, naysayers and critics with a simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.

However, this post is a listing of all the technical contributions made by Revolution Analytics to the #rstats project.

1) Useful packages, mostly for parallel processing or more efficient computing, like

foreach (http://cran.r-project.org/web/packages/foreach/index.html),
nws (http://cran.r-project.org/web/packages/nws/),
iterators (http://cran.r-project.org/web/packages/iterators/index.html),
doSMP (http://cran.r-project.org/web/packages/doSMP/index.html),
doSNOW (http://cran.r-project.org/web/packages/doSNOW/index.html),
doMC (http://cran.r-project.org/web/packages/doMC/index.html),
revoIPC (http://cran.r-project.org/web/packages/revoIPC/)

2) RevoScaleR package to beat R’s memory problem (this is probably the best, in my opinion, as it has yet to be replicated in the open source version and is a clear-cut reason for going for the paid version)

http://www.revolutionanalytics.com/products/enterprise-big-data.php

Efficient XDF File Format designed to efficiently handle huge data sets.

Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.

Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.

Visualize Large Data sets with line plots and histograms.

Built-in Statistical Algorithms for direct analysis of huge data sets:

Summary Statistics

Linear Regression

Logistic Regression

Crosstabulation

On-the-fly data transformations to include derived variables in models without writing new data files.

Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.

Direct import of fixed-format text data files and SAS data sets into .xdf format
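The idea of "chunking" through huge data sets is an external-memory pattern: read a block, update a few running sums, discard the block. The sketch below is plain Python, not RevoScaleR's API, and the chunk size and data generator are hypothetical. It fits a simple linear regression y = a + b·x from sufficient statistics accumulated chunk by chunk, so memory use depends on the chunk size, not the data size.

```python
from itertools import islice

def read_in_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def chunked_linear_fit(rows, chunk_size=1000):
    """Fit y = a + b*x from running sums; only one chunk is held in memory at a time."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in read_in_chunks(rows, chunk_size):
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope, n

# A generator stands in for a file too large to load at once.
data = ((float(x), 3.0 + 2.0 * x) for x in range(100_000))
print(chunked_linear_fit(data, chunk_size=5_000))
```

The same accumulate-then-combine shape extends to summary statistics and crosstabs. For very large or badly scaled data, a numerically stabler update (for example, centering or Welford-style accumulation) would be preferable to raw sums.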

3) RevoDeployR for an API-based R solution – I think this feature will become more important as time goes on, but right now it seems a lower-visibility offering.

http://www.revolutionanalytics.com/products/enterprise-deployment.php

Collection of Web services implemented as a RESTful API.

JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.

.NET Client library — includes COM interoperability to call R from VBA

Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.

XML and JSON format for data exchange.

Built-in security model for authenticated or anonymous invocation of R Scripts.

Repository for storing R objects and R Script execution artifacts.
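To make "Web services implemented as a RESTful API" with "JSON format for data exchange" concrete, here is a sketch of what a client call looks like. It uses only Python's standard urllib and builds the request without sending it. RevoDeployR's real endpoint paths and parameter names are not given in this post, so the host, URL path, script name and payload fields below are all hypothetical.

```python
import json
import urllib.request

def build_script_request(base_url, script_name, inputs):
    """Build (but do not send) a JSON POST asking a server to run a named R script."""
    payload = json.dumps({"script": script_name, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/r/script/execute",  # hypothetical endpoint path
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

req = build_script_request("https://deployr.example.com", "score_customers.R", {"customer_id": 42})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

The design point is that any language with an HTTP client and a JSON parser (JavaScript, Java, .NET) can drive R this way, which is what the client libraries listed above wrap.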

4) Revolution’s IDE (or Productivity Environment) for a faster coding environment than the command line. A GUI by Revolution Analytics is in the works. Having used the IDE, I find that only the Code Snippets function is a clear differentiator from newer IDEs and GUIs. The code snippets are awesome, though, and even someone who doesn’t know much R can set up an analysis quite quickly and accurately.

http://www.revolutionanalytics.com/products/enterprise-productivity.php

Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.

Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.

Run Selection, Run to Line and Run to Cursor evaluation

R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.

Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.

Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.

Customizable Workspace with dockable, floating, and tabbed tool windows.

Version Control Plug-in available for the open source Subversion version control software.

Marketing contributions from Revolution Analytics-

1) Sponsoring R sessions and user meets

2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft, IBM and others at http://www.revolutionanalytics.com/partners/

3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.

4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken many more established names, partly due to the decentralized nature of its working.

Did I miss something? Yes, they share their code under the GPL.

Let me know by feedback.

Aster Data still alive; launches SQL-MapReduce Developer Portal

So apparently ole client Aster Data continues to thrive under the gentle touch of Terrific Data.

———————————————————————————————————————————————————

Aster Data today launched the SQL-MapReduce Developer Portal, a new online community for data scientists and analytic developers. For your convenience, I copied the release below and it can also be found here. Please let me know if you have any questions or if there is anything else I can help you with.

Sara Korolevich

Point Communications Group for Aster Data

sarak@pointcgroup.com

Office: 602.279.1137

Mobile: 623.326.0881

Teradata Accelerates Big Data Analytics with First Collaborative Community for SQL-MapReduce®

New online community for data scientists and analytic developers enables development and sharing of powerful MapReduce analytics

San Carlos, California – Teradata Corporation (NYSE:TDC) today announced the launch of the Aster Data SQL-MapReduce® Developer Portal. This portal is the first collaborative online developer community for SQL-MapReduce analytics, an emerging framework for processing non-relational data and ultra-fast analytics.

“Aster Data continues to deliver on its unique vision for powerful analytics with a rich set of tools to make development of those analytics quick and easy,” said Tasso Argyros, vice president of Aster Data Marketing and Product Management, Teradata Corporation. “This new developer portal builds on Aster Data’s continuing SQL-MapReduce innovation, leveraging the flexibility and power of SQL-MapReduce for analytics that were previously impossible or impractical.”

The developer portal showcases the power and flexibility of Aster Data’s SQL-MapReduce – which uniquely combines standard SQL with the popular MapReduce distributed computing technology for processing big data – by providing a collaborative community for sharing SQL-MapReduce expert insights in addition to sharing SQL-MapReduce analytic functions and sample code. Data scientists, quantitative analysts, and developers can now leverage the experience, knowledge, and best practices of a community of experts to easily harness the power of SQL-MapReduce for big data analytics.

A recent report from IDC Research, “Taking Care of Your Quants: Focusing Data Warehousing Resources on Quantitative Analysts Matters,” has shown that by enabling data scientists with the tools to harness emerging types and sources of data, companies create significant competitive advantage and become leaders in their respective industry.

“The biggest positive differences among leaders and the rest come from the introduction of new types of data,” says Dan Vesset, program vice president, Business Analytics Solutions, IDC Research. “This may include either new transactional data sources or new external data feeds of transactional or multi-structured interactional data — the latter may include click stream or other data that is a by-product of social networking.”

Vesset goes on to say, “Aster Data provides a comprehensive platform for analytics and their SQL-MapReduce Developer Portal provides a community for sharing best practices and functions which can have an even greater impact to an organization’s business.”

With this announcement, Aster Data extends its industry leadership in delivering the most comprehensive analytic platform for big data analytics: a platform not only capable of processing massive volumes of multi-structured data, but also providing an extensive set of tools and capabilities that make it simple to leverage the power of MapReduce analytics.

The Aster Data SQL-MapReduce Developer Portal brings the power of SQL-MapReduce to data scientists, quantitative analysts, and analytic developers by making it easy to share and collaborate with experts in developing SQL-MapReduce analytics. This portal builds on Aster Data’s history of SQL-MapReduce innovations, including:

The first deep integration of SQL with MapReduce
The first MapReduce support for .NET
The first integrated development environment, Aster Data Developer Express
A comprehensive suite of analytic functions, Aster Data Analytic Foundation

Aster Data’s patent-pending SQL-MapReduce enables analytic applications and functions that can deliver faster, deeper insights on terabytes to petabytes of data. These applications are implemented using MapReduce but delivered through standard SQL and business intelligence (BI) tools.
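SQL-MapReduce exposes MapReduce-style functions through SQL. The underlying map, shuffle and reduce pattern takes only a few lines of plain Python. This is a generic illustration, not Aster Data's SQL-MapReduce API, and the clickstream records are invented. The mapper emits (page, 1) pairs, the shuffle groups them by key, and the reducer sums each group into page-view counts.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply mapper to every record; each call may emit several (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply reducer to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

clicks = [  # hypothetical clickstream rows: (user, page)
    ("ann", "/home"), ("bob", "/home"), ("ann", "/cart"),
    ("ann", "/checkout"), ("bob", "/search"),
]
pairs = map_phase(clicks, lambda row: [(row[1], 1)])  # emit (page, 1)
page_views = reduce_phase(shuffle(pairs), lambda key, values: sum(values))
print(page_views)
```

In a distributed engine, the map and reduce calls run on the nodes holding each data partition. Only the shuffle moves data between nodes.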

SQL-MapReduce makes it possible for data scientists and developers to empower business analysts with the ability to make informed decisions, incorporating vast amounts of data, regardless of query complexity or data type. Aster Data customers are using SQL-MapReduce for rich analytics including analytic applications for social network analysis, digital marketing optimization, and on-the-fly fraud detection and prevention.

“Collaboration is at the core of our success as one of the leading providers, and pioneers of social software,” said Navdeep Alam, director of Data Architecture at Mzinga. “We are pleased to be one of the early members of The Aster Data SQL-MapReduce Developer Portal, which will allow us the ability to share and leverage insights with others in using big data analytics to attain a deeper understanding of customers’ behavior and create competitive advantage for our business.”

SQL-MapReduce is one of the core capabilities within Aster Data’s flagship product. Aster Data nCluster™ 4.6, the industry’s first massively parallel processing (MPP) analytic platform, has an integrated analytics engine that stores and processes both relational and non-relational data at scale. With Aster Data’s unique analytics framework that supports both SQL and SQL-MapReduce™, customers benefit from rich, new analytics on large data volumes with complex data types. Aster Data analytic functions are embedded within the analytic platform and processed locally with data, which allows for faster data exploration. The SQL-MapReduce framework provides scalable fault-tolerance for new analytics, providing users with superior reliability, regardless of number of users, query size, or data types.

About Aster Data
Aster Data is a market leader in big data analytics, enabling the powerful combination of cost-effective storage and ultra-fast analysis of new sources and types of data. The Aster Data nCluster analytic platform is a massively parallel software solution that embeds MapReduce analytic processing with data stores for deeper insights on new data sources and types to deliver new analytic capabilities with breakthrough performance and scalability. Aster Data’s solution utilizes Aster Data’s patent-pending SQL-MapReduce to parallelize processing of data and applications and deliver rich analytic insights at scale. Companies including Barnes & Noble, Intuit, LinkedIn, Akamai, and MySpace use Aster Data to deliver applications such as digital marketing optimization, social network and relationship analysis, and fraud detection and prevention.

About Teradata
Teradata is the world’s leader in data warehousing and integrated marketing management through its database software, data warehouse appliances, and enterprise analytics. For more information, visit teradata.com.

# # #

Teradata is a trademark or registered trademark of Teradata Corporation in the United States and other countries.

PMML Plugin for Greenplum now available

From a press release from Zementis.

The Universal PMML Plug-in for in-database scoring is available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database. The plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics, as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.
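PMML is an XML format, so "deploying a model" means parsing the model document and evaluating it for each row. The minimal sketch below handles only the simplest case: a PMML RegressionModel with numeric predictors, read with Python's standard ElementTree. The model, field names and coefficients are invented. Real PMML documents carry more elements, and the Zementis plug-in supports far more model types.

```python
import xml.etree.ElementTree as ET

PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="hypothetical demo model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}

def load_regression(pmml_text):
    """Return (intercept, [(field, coefficient, exponent), ...]) from a PMML RegressionModel."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    terms = [
        (pred.get("name"), float(pred.get("coefficient")), int(pred.get("exponent", "1")))
        for pred in table.findall("p:NumericPredictor", NS)
    ]
    return float(table.get("intercept")), terms

def score(model, row):
    """Evaluate intercept + sum(coefficient * value ** exponent) for one input row."""
    intercept, terms = model
    return intercept + sum(coef * row[name] ** exp for name, coef, exp in terms)

model = load_regression(PMML_DOC)
rows = [{"age": 30, "income": 50_000}, {"age": 50, "income": 80_000}]
print([score(model, r) for r in rows])
```

Because the model is data rather than code, the same document can be scored by any engine that understands PMML. This is why no recoding is needed when moving a model into the database.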

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

Visit the PMML Plug-in product page
Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis


Sector/Sphere – Faster than Hadoop/MapReduce at TeraSort

Here is a preview of a relatively young piece of software, Sector and Sphere, which is claimed to be better than Hadoop/MapReduce at the TeraSort benchmark, among others.

http://sector.sourceforge.net/tech.html

System Overview

The Sector/Sphere stack consists of the Sector distributed file system and the Sphere parallel data processing framework. The objective is to support highly effective and efficient large data storage and processing over commodity computer clusters.

Sector/Sphere Architecture

Sector consists of 4 parts, as shown in the above diagram. The security server maintains the system security configurations such as user accounts, data IO permissions, and IP access control lists. The master servers maintain file system metadata, schedule jobs, and respond to users’ requests. Sector supports multiple active masters that can join and leave at run time, and they all actively respond to users’ requests. The slave nodes are racks of computers that store and process data. The slave nodes can be located within a single data center or across multiple data centers with high-speed network connections. Finally, the client includes tools and programming APIs to access and process Sector data.

Sphere: Parallel Data Processing Framework

Sphere allows developers to write parallel data processing applications with a very simple API. It applies user-defined functions (UDFs) to all input data segments in parallel. In a Sphere application, both inputs and outputs are Sector files. Multiple Sphere processing stages can be combined to support more complicated applications, with inputs/outputs exchanged/shared via the Sector file system.

Data segments are processed at their storage locations whenever possible (data locality). Failed data segments may be restarted on other nodes to achieve fault tolerance.

The Sphere framework can be compared to MapReduce, as they both enforce data locality and provide simplified programming interfaces. In fact, Sphere can simulate any MapReduce operation, but Sphere is more efficient and flexible. Sphere can provide better data locality for applications that take a file or multiple files as the minimum input unit, and for applications that involve iterative/combinative processing, which requires coordination of multiple UDFs to obtain the final result.

A Sphere application includes two parts: the client program that organizes inputs (including certain parameters), outputs, and UDFs; and the UDFs that process data segments. Data segmentation, load balancing, and fault tolerance are transparent to developers.
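The Sphere model can be imitated on one machine: apply a UDF to every data segment in parallel, prefer the segment's "home" node, and re-run failed segments elsewhere. In this rough sketch, threads stand in for slave nodes and an injected failure stands in for a lost node. This is not Sphere's actual API, and all names are hypothetical. As in Sphere, segmentation and retry are handled by the runner, so the UDF itself stays trivial.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sphere_like(segments, udf, nodes, max_attempts=3):
    """Apply udf to each segment in parallel; a failing segment is retried on the next node."""
    def process(seg_id):
        last_error = None
        for attempt in range(max_attempts):
            # attempt 0 uses the segment's home node (data locality); later attempts move on
            node = nodes[(seg_id + attempt) % len(nodes)]
            try:
                return udf(node, segments[seg_id])
            except RuntimeError as err:
                last_error = err
        raise last_error

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(process, range(len(segments))))

def word_count_udf(node, segment):
    """Hypothetical UDF: count words in a text segment; node 'n2' is simulated as down."""
    if node == "n2":
        raise RuntimeError(f"{node} is down")
    return len(segment.split())

segments = ["a b c", "d e", "f g h i", "j"]
print(run_sphere_like(segments, word_count_udf, nodes=["n1", "n2", "n3"]))  # [3, 2, 4, 1]
```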

Space: Column-based Distributed Data Table

Space stores data tables in Sector and uses Sphere for parallel query processing. Space is similar to BigTable. Each table is stored by columns and is segmented onto multiple slave nodes. Tables are independent; relationships between tables are not supported. A reduced set of SQL operations is supported, including but not limited to table creation and modification, key-value update and lookup, and select operations based on UDFs.

Supported by the Sector data placement mechanism and the Sphere parallel processing framework, Space can support efficient key-value lookup and certain SQL queries on very large data tables.
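Here is a toy illustration of the Space layout described above: each column is stored separately, rows are split into segments across simulated slave nodes, and a per-segment index gives key-value lookup. It is plain Python with hypothetical names. Hash-partitioning rows by key is an assumption, since the description does not say how Space assigns rows to segments.

```python
import zlib

class ToyColumnTable:
    """Rows hash-partitioned by key onto segments; each segment stores columns separately."""

    def __init__(self, columns, n_segments):
        self.columns = list(columns)
        self.segments = [{c: [] for c in self.columns} for _ in range(n_segments)]
        self.indexes = [{} for _ in range(n_segments)]  # per segment: key -> row position

    def _locate(self, key):
        s = zlib.crc32(str(key).encode()) % len(self.segments)  # deterministic placement
        return self.segments[s], self.indexes[s]

    def upsert(self, key, row):
        """Key-value update: append a new row, or overwrite the existing one in place."""
        seg, index = self._locate(key)
        pos = index.get(key)
        if pos is None:
            index[key] = len(seg[self.columns[0]])
            for c in self.columns:
                seg[c].append(row[c])
        else:
            for c in self.columns:
                seg[c][pos] = row[c]

    def lookup(self, key):
        seg, index = self._locate(key)
        pos = index.get(key)
        return None if pos is None else {c: seg[c][pos] for c in self.columns}

    def select(self, column, predicate):
        """Scan a single column over every segment (the step Sphere would parallelize)."""
        return [v for seg in self.segments for v in seg[column] if predicate(v)]

t = ToyColumnTable(["name", "score"], n_segments=3)
t.upsert("u1", {"name": "ann", "score": 7})
t.upsert("u2", {"name": "bob", "score": 3})
t.upsert("u1", {"name": "ann", "score": 9})
print(t.lookup("u1"))
print(t.select("score", lambda v: v > 5))
```

Storing columns separately means the select scans only the `score` lists. A row-oriented layout would have to read every field.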

Space is currently still in development.

And just when you thought Hadoop was the only way to be on the cloud.

http://sector.sourceforge.net/benchmark.html

The Terasort Benchmark

The table below lists the performance (total processing time in seconds) of the Terasort benchmark of both Sphere and Hadoop. (Terasort benchmark: suppose there are N nodes in the system, the benchmark generates a 10GB file on each node and sorts the total N*10GB data. Data generation time is excluded.) Note that it is normal to see a longer processing time for more nodes because the total amount of data also increases proportionally.

The performance values listed on this page were achieved using the Open Cloud Testbed. Currently the testbed consists of 4 racks. Each rack has 32 nodes, including 1 NFS server, 1 head node, and 30 compute/slave nodes. The head node is a Dell 1950, dual dual-core Xeon 3.0GHz, 16GB RAM. The compute nodes are Dell 1435s, single dual-core AMD Opteron 2.0GHz, 4GB RAM, and 1TB single disk. The 4 racks are located in JHU (Baltimore), StarLight (Chicago), UIC (Chicago), and Calit2 (San Diego). The inter-rack bandwidth is 10GE, supported by CiscoWave deployed over National Lambda Rail.

Sites used	Sphere (s)	Hadoop, 3 replicas (s)	Hadoop, 1 replica (s)
UIC	1265	2889	2252
UIC + StarLight	1361	2896	2617
UIC + StarLight + Calit2	1430	4341	3069
UIC + StarLight + Calit2 + JHU	1526	6675	3702

The benchmark uses the testfs/testdc examples of Sphere and randomwriter/sort examples of Hadoop. Hadoop parameters were tuned to reach good results.
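The table can also be read as throughput. Each node sorts 10 GB, so the data volume grows with the number of racks. The page does not state the node count per run, so this calculation assumes all 30 compute/slave nodes in each rack took part. Under that assumption, the implied aggregate throughput works out as follows.

```python
# Terasort wall-clock times in seconds, copied from the table above.
# Each row: (sites used, racks, Sphere, Hadoop 3 replicas, Hadoop 1 replica)
RESULTS = [
    ("UIC", 1, 1265, 2889, 2252),
    ("UIC + StarLight", 2, 1361, 2896, 2617),
    ("UIC + StarLight + Calit2", 3, 1430, 4341, 3069),
    ("UIC + StarLight + Calit2 + JHU", 4, 1526, 6675, 3702),
]
NODES_PER_RACK = 30  # assumption: all 30 compute/slave nodes per rack took part
GB_PER_NODE = 10     # from the benchmark description

def throughput_gb_per_s(racks, seconds):
    """Total data sorted (N nodes * 10 GB) divided by wall-clock seconds."""
    return racks * NODES_PER_RACK * GB_PER_NODE / seconds

for sites, racks, sphere, hadoop3, hadoop1 in RESULTS:
    total_gb = racks * NODES_PER_RACK * GB_PER_NODE
    print(f"{sites}: {total_gb} GB | "
          f"Sphere {throughput_gb_per_s(racks, sphere):.2f} GB/s, "
          f"Hadoop x3 {throughput_gb_per_s(racks, hadoop3):.2f} GB/s, "
          f"Hadoop x1 {throughput_gb_per_s(racks, hadoop1):.2f} GB/s")
```

Under that assumption, Sphere's aggregate rate rises from about 0.24 GB/s on one rack to about 0.79 GB/s on four.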

Updated on Sep. 22, 2009: We have benchmarked the most recent versions of Sector/Sphere (1.24a) and Hadoop (0.20.1) on a new set of servers. Each server node costs $2,200 and consists of a single Intel Xeon E5410 2.4GHz CPU, 16GB RAM, 4*1TB RAID0 disk, and 1Gb/s NIC. The 120 nodes are hosted on 4 racks within the same data center and the inter-rack bandwidth is 20Gb/s.

The table below lists the performance of sorting 1TB of data using Sector/Sphere version 1.24a and Hadoop 0.20.1. Related Hadoop parameters have been tuned for better performance (e.g., big block size), while Sector/Sphere does not require tuning. In addition, to achieve the highest performance, replication is disabled in both systems (note that replication does not affect the performance of Sphere but will significantly decrease the performance of Hadoop).

Number of Racks	Sphere	Hadoop
1	28m 25s	85m 49s
2	15m 20s	37m 0s
3	10m 19s	25m 14s
4	7m 56s	17m 45s
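The 2009 table can also be read as speedups. This small calculation assumes nothing beyond the published times. It converts each time to seconds, then compares Hadoop against Sphere at each size, and each system against its own one-rack time.

```python
def to_seconds(minutes, seconds):
    """Convert an 'Xm Ys' table entry to seconds."""
    return 60 * minutes + seconds

# (racks, Sphere time, Hadoop time) from the 1 TB sort table above.
runs = [
    (1, to_seconds(28, 25), to_seconds(85, 49)),
    (2, to_seconds(15, 20), to_seconds(37, 0)),
    (3, to_seconds(10, 19), to_seconds(25, 14)),
    (4, to_seconds(7, 56), to_seconds(17, 45)),
]
sphere_1rack, hadoop_1rack = runs[0][1], runs[0][2]
for racks, sphere, hadoop in runs:
    print(f"{racks} rack(s): Hadoop/Sphere = {hadoop / sphere:.2f}x, "
          f"Sphere vs 1 rack = {sphere_1rack / sphere:.2f}x, "
          f"Hadoop vs 1 rack = {hadoop_1rack / hadoop:.2f}x")
```

By this reading, Sphere is roughly 2.2 to 3.0 times faster at every size. Hadoop's time shrinks more steeply as racks are added: about 4.8x from one rack to four, versus about 3.6x for Sphere.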
