Teradata updates Teradata-R

The Teradata add-on package for R

teradataR is a package (library) that allows R users to easily connect to Teradata, establish data frame connections (R data formats) to Teradata tables, and call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. The package provides 44 analytical functions and an additional 20 data connection and R infrastructure functions. In addition, we’ve added functions that list the stored procedures within Teradata and provide the capability to call them from R.

20 functions that enable the R infrastructure to operate with Teradata, including:
- tdConnect – Connect to Teradata via ODBC
- td.data.frame – Establish data frame connections to a Teradata table

44 in-database analytical functions callable from R. A sample of the functions:
- Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
- Reorganization functions: join, merge and samples
- Transformations: bincode, recode, rescale, sigmoid, zscore and null replacement
- K-Means clustering and Score K-Means
- Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
- R language features: nrow, ncol, min, max, summary, as.data.frame, and dim

Tools and R functions that allow users to create their own custom analytic functions that are callable from R:
- Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure.
- The analytic process to create new derived predictive variables can be captured as a stored procedure.
- The entire process to create or update an analytical data set can be captured as a stored procedure.
- An R function can list all the stored procedures within Teradata.
- An R function can call a stored procedure that runs in-database.
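To make the transformation names in the list above concrete, here is a minimal Python sketch of what zscore, rescale, sigmoid and null replacement usually mean. It is only an illustration: the function names are hypothetical, the data is made up, and teradataR performs these steps inside Teradata rather than in client memory. Whether the package uses the population or the sample standard deviation for zscore is an assumption here (population).

```python
import math
from statistics import mean, pstdev


def null_replace(values, replacement):
    """Replace missing entries (None) with a fixed value."""
    return [replacement if v is None else v for v in values]


def zscore(values):
    """Center on the mean and scale by the population standard deviation (assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def rescale(values, lower=0.0, upper=1.0):
    """Linearly map the observed min..max range onto lower..upper."""
    lo, hi = min(values), max(values)
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


def sigmoid(values):
    """Squash each value into (0, 1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


column = [2.0, None, 4.0, 6.0]      # made-up column with one missing value
filled = null_replace(column, 4.0)
print(zscore(filled))
print(rescale(filled))
print(sigmoid(filled))
```

The point of the in-database versions is that the same arithmetic runs where the table lives, so only the result has to travel back to the R console.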

teradataR allows R users to leverage all the benefits of in-database processing with Teradata:

- Eliminate data movement from Teradata to the R framework for key data-intensive tasks.
- Leverage the speed of the Teradata database’s parallel processing to run analytics against big data.
- Operate within the R console environment.
- Embed your frequently performed tasks to run in-database.
- R and teradataR are free downloads.

Source- http://developer.teradata.com/applications/articles/in-database-analytics-with-teradata-r

This package allows users of R to interact with a Teradata database. R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. Users can use many statistical functions directly against the Teradata system without having to extract the data into memory.

Enhancements in the new 1.0.1 release include:

teradataR User Guide
addition of Mac OS X Package
addition of Red Hat Linux Package (added 2/23/12)
summary has been enhanced to run faster
JDBC support added to allow Windows or Mac users to run the package with JDBC
td.data.frame enhanced to allow support for manipulation to add columns and expressions
td.data.frame enhanced to use Teradata 14.0 Fastpath Transform Functions (see Appendix B)
td.tapply function added to apply a select group of functions to columns of an array

From-http://downloads.teradata.com/download/applications/teradata-r

and

A new R package for Red Hat Linux has been added to the teradataR 1.0.1 release. This new package provides the same functionality as in the previously released Windows and Mac OS X packages, but is built for Red Hat Linux. This version was built and tested on Red Hat Linux 6.2 32-bit. (The R version for Red Hat Linux is 2.14.1)

Installing this package is the same as any normal R package; just extract it into your R library area, or use the install.packages command with the file path.

from- http://developer.teradata.com/tag/r

and

With plenty of prolific and enthusiastic developers, the number of packages for R is expected to grow tremendously. Statisticians and analysts using these packages will find innovative ways to use data to answer their research and business questions. And as organizations become more willing to rely on open-source software for mission-critical tasks, R is poised to become an essential tool for analyzing our complex world.

Source-http://www.teradatamagazine.com/v09n03/Connections/R-you-ready/

From the user guide-

http://downloads.teradata.com/download/applications/teradata-r

teradataR allows R users to easily connect to Teradata, establish td data frames (virtual R data frames) to Teradata and to call in-database analytic functions within Teradata. This allows R users to work within their R console environment while leveraging the in-database functions.

A Function List
teradataR-package Allow access to Teradata via R
as.data.frame.td.data.frame Convert td data frame to a data frame
as.td.data.frame Coerce to a td data frame
dim.td.data.frame Dimensions of a td data frame
hist.td.data.frame Histograms
is.td.data.frame Is an Object a Teradata Data Frame
is.td.expression Is an Object a Teradata Expression
mean.td.data.frame Arithmetic Mean
median.td.data.frame Median Value
min.td.data.frame Minima
predict.kmeans Kmeans Model Prediction
print.td.data.frame Show contents of a td data frame
sum.td.data.frame Sum of column
summary.td.data.frame Summary of Teradata Data Frame
td.bincode Create Table of Bincode Values
td.binomial Binomial Test
td.binomialsign Binomial Sign Test
td.call.sp Locate and call stored procedure
td.cor Correlation Matrix
td.cov Covariance Matrix
td.dagostino.pearson D’Agostino Pearson Test
td.data.frame Teradata Data Frames
td.f.oneway One way F Test
td.factanal Factor Analysis
td.freq Frequency Analysis
td.hist Histograms
td.join Join Tables in Teradata
td.kmeans K-Means Clustering
td.ks Kolmogorov Smirnov Test
td.lilliefors Lilliefors Test
td.merge Merge Rows of Teradata Tables
td.mode Mode Value of Column
td.mwnkw Mann-Whitney/Kruskal Wallis Test
td.nullreplace Replace Null Values
td.overlap Overlap
td.quantiles Quantile Values
td.rank Rank
td.recode Recode
td.rescale Rescale Values of Column
td.sample Sample Rows
td.shapiro.wilk Shapiro Wilk
td.sigmoid Sigmoid Transformation
td.smirnov Smirnov Test
td.solve Solve a system of equations
td.stats General Statistics
td.t.paired T Test Paired
td.t.unpaired T Test Unpaired
td.t.unpairedi T Test – Unpaired Indicator
td.values Values
td.wilcoxon Wilcoxon Test
td.zscore Zscore Transformation
tdClose Close connection
tdConnect Connect to Teradata database
tdMetadataDB Set metadata database
tdQuery Query Teradata Database
teradataR Allow access to Teradata via R
[.td.data.frame Extract Teradata Data Frame
[<-.td.data.frame Replace value of Teradata Data Frame

New Plotters in Rapid Miner 5.2

I almost missed this because of my vacation and traveling

Rapid Miner has a tonne of new stuff (Statutory Ethics Declaration: Rapid Miner has been an advertising partner for Decisionstats – see the right margin).

see

http://rapid-i.com/component/option,com_myblog/Itemid,172/lang,en/

Great New Graphical Plotters

and some flashy work

and a great series of educational lectures

A Simple Explanation of Decision Tree Modeling based on Entropies

Link: http://www.simafore.com/blog/bid/94454/A-simple-explanation-of-how-entropy-fuels-a-decision-tree-model

Description of some of the basics of decision trees. Simple and hardly any math, I like the plots explaining the basic idea of the entropy as splitting criterion (although we actually calculate gain ratio differently than explained…)
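For readers who want the arithmetic behind entropy as a splitting criterion, the following is a minimal Python sketch. It uses the textbook definitions of information gain and gain ratio (gain divided by split information); as the comment above notes, RapidMiner calculates gain ratio differently, so this is not RapidMiner's formula. The function names and the toy labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(labels).values()
    )


def split_metrics(labels, groups):
    """Return (information gain, gain ratio) for a candidate split.

    `groups` is the list of label lists that the split produces.
    """
    total = len(labels)
    weighted_child_entropy = sum(len(g) / total * entropy(g) for g in groups)
    gain = entropy(labels) - weighted_child_entropy
    # Split information penalizes splits into many small groups.
    split_info = sum(
        len(g) / total * math.log2(total / len(g)) for g in groups if g
    )
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Toy example: 5 "yes" and 5 "no" labels, split into two mostly pure groups.
labels = ["yes"] * 5 + ["no"] * 5
groups = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
gain, ratio = split_metrics(labels, groups)
print(f"gain = {gain:.4f}, gain ratio = {ratio:.4f}")
```

A tree learner would evaluate these numbers for every candidate split and pick the one with the highest score.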

Logistic Regression for Business Analytics using RapidMiner

Link: http://www.simafore.com/blog/bid/57924/Logistic-regression-for-business-analytics-using-RapidMiner-Part-2

Same as above, but this time for modeling with logistic regression.
Easy to read and covering all basic ideas together with some examples. If you are not familiar with the topic yet, part 1 (see below) might help.

Part 1 (Basics): http://www.simafore.com/blog/bid/57801/Logistic-regression-for-business-analytics-using-RapidMiner-Part-1

Deploy Model: http://www.simafore.com/blog/bid/82024/How-to-deploy-a-logistic-regression-model-using-RapidMiner

Advanced Information: http://www.simafore.com/blog/bid/99443/Understand-3-critical-steps-in-developing-logistic-regression-models

and lastly a new research project for collaborative data mining

http://www.e-lico.eu/

e-LICO Architecture and Components

The goal of the e-LICO project is to build a virtual laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences. The proposed e-lab will comprise three layers: the e-science and data mining layers will form a generic research environment that can be adapted to different scientific domains by customizing the application layer.

Drag a data set into one of the slots. It will be automatically detected as training data, test data or apply data, depending on whether it has a label or not.
Select a goal. The most frequent one is probably “Predictive Modelling”. All goals have comments, so you see what they can be used for.
Select “Fetch plans” and wait a bit to get a list of processes that solve your problem. Once the planning completes, select one of the processes (you can see a preview at the right) and run it. Alternatively, select multiple (selecting none means selecting all) and evaluate them on your data in a batch.

The assistant strives to generate processes that are compatible with your data. To do so, it performs a lot of clever operations, e.g., it automatically replaces missing values if missing values exist and this is required by the learning algorithm or performs a normalization when using a distance-based learner.

You can install the extension directly by using the Rapid-I Marketplace instead of the old update server. Just go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update URL

Of course, Rapid Miner has been one of the most professional open source analytics companies, and they have been doing it for a long time now. I am particularly impressed by the product map (see below) and the graphical user interface.

http://rapid-i.com/content/view/186/191/lang,en/

Product Map


JMP and R – #rstats

An amazing example of R being used successfully in combination (and not in isolation) with other enterprise software is the add-ins functionality of JMP and its R integration.

See the following JMP add-ins which use R

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110454&jmpflag=Y

JMP Add-in: Multidimensional Scaling using R

This add-in creates a new menu command under the Add-Ins Menu in the submenu R Add-ins. The script will launch a custom dialog (or prompt for a JMP data table if one is not already open) where you can cast columns into roles for performing MDS on the data table. The analysis results in a data table of MDS dimensions and associated output graphics. MDS is a dimension reduction method that produces coordinates in Euclidean space (usually 2D, 3D) that best represent the structure of a full distance/dissimilarity matrix. MDS requires that input be a symmetric dissimilarity matrix. Input to this application can be data that is already in the form of a symmetric dissimilarity matrix or the dissimilarity matrix can be computed based on the input data (where dissimilarity measures are calculated between rows of the input data table in R).
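As an illustration of the input MDS expects, here is a small Python sketch that builds a symmetric dissimilarity matrix from the rows of a data table. The add-in does this step in R; the choice of Euclidean distance, the function name and the sample rows are assumptions made for the example.

```python
import math


def dissimilarity_matrix(rows):
    """Pairwise Euclidean distances between rows, as a symmetric matrix."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(rows[i], rows[j])
            d[i][j] = d[j][i] = dist  # fill both triangles to keep symmetry
    return d


# Made-up table with three observations and two numeric columns.
rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for line in dissimilarity_matrix(rows):
    print(line)
```

The diagonal is zero and the matrix equals its transpose, which are the properties an MDS routine relies on.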

Submitted by: Kelci Miclaus	Initiative: All
Application: Add-Ins	Analysis: Exploratory Data Analysis

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110461&jmpflag=Y

Chernoff Faces Add-in

One way to plot multivariate data is to use Chernoff faces. For each observation in your data table, a face is drawn such that each variable in your data set is represented by a feature in the face. This add-in uses JMP’s R integration functionality to create Chernoff faces. An R install and the TeachingDemos R package are required to use this add-in.

Submitted by: Clay Barker	Initiative: All
Application: Add-Ins	Analysis: Data Visualization

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110462&jmpflag=Y

Support Vector Machine for Classification

By simply opening a data table, specifying X, Y variables, selecting a kernel function, and specifying its parameters on the user-friendly dialog, you can build a classification model using Support Vector Machine. Please note that R package ‘e1071’ should be installed before running this dialog. The package can be found from http://cran.r-project.org/web/packages/e1071/index.html.

Submitted by: Jong-Seok Lee	Initiative: All
Application: Add-Ins	Analysis: Exploratory Data Analysis/Mining

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110460&jmpflag=N

Penalized Regression Add-in

This add-in uses JMP’s R integration functionality to provide access to several penalized regression methods. Methods included are the LASSO (least absolute shrinkage and selection operator), LARS (least angle regression), Forward Stagewise, and the Elastic Net. An R install and the “lars” and “elasticnet” R packages are required to use this add-in.

Submitted by: Clay Barker	Initiative: All
Application: Add-Ins	Analysis: Regression

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110456&jmpflag=Y

JMP Add-in: Univariate Nonparametric Bootstrapping

This script performs simple univariate, nonparametric bootstrap sampling by using the JMP to R Project integration. A JMP Dialog is built by the script where the variable you wish to perform bootstrapping over can be specified. A statistic to compute for each bootstrap sample is chosen and the data are sent to R using new JSL functionality available in JMP 9. The boot package in R is used to call the boot() function and the boot.ci() function to calculate the sample statistic for each bootstrap sample and the basic bootstrap confidence interval. The results are brought back to JMP and displayed using the JMP Distribution platform.
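The idea behind the script can be sketched in a few lines of Python. This is not the add-in's code (which calls boot() and boot.ci() in R): the function name, the sample data and the simple index-based quantiles are assumptions, and R's boot.ci interpolates between order statistics, so its numbers will differ slightly.

```python
import random
from statistics import mean


def bootstrap_basic_ci(data, statistic=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap with a basic (reflected) confidence interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    observed = statistic(data)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_q = replicates[int((alpha / 2) * (n_resamples - 1))]
    upper_q = replicates[int((1 - alpha / 2) * (n_resamples - 1))]
    # Basic bootstrap interval: reflect the quantiles around the observed value.
    return observed, (2 * observed - upper_q, 2 * observed - lower_q)


sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7]  # made-up measurements
estimate, (low, high) = bootstrap_basic_ci(sample)
print(f"mean = {estimate:.2f}, 95% basic bootstrap CI = ({low:.2f}, {high:.2f})")
```

Any function of a list can be passed as `statistic`, for example `statistics.median`, which mirrors the add-in's option to choose the statistic computed for each bootstrap sample.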

Submitted by: Kelci Miclaus	Initiative: All
Application: Add-Ins	Analysis: Basic Statistics

Use R for Business- Competition worth $ 20,000 #rstats

All you contest junkies, R lovers and general change the world people, here’s a new contest to use R in a business application

http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-launches-applications-of-r-in-business-contest.php

REVOLUTION ANALYTICS LAUNCHES “APPLICATIONS OF R IN BUSINESS” CONTEST

$20,000 in Prizes for Users Solving Business Problems with R

PALO ALTO, Calif. – September 1, 2011 – Revolution Analytics, the leading commercial provider of R software, services and support, today announced the launch of its “Applications of R in Business” contest to demonstrate real-world uses of applying R to business problems. The competition is open to all R users worldwide and submissions will be accepted through October 31. The Grand Prize winner for the best application using R or Revolution R will receive $10,000.

The bonus-prize winner for the best application using features unique to Revolution R Enterprise – such as its big-data analytics capabilities or its Web Services API for R – will receive $5,000. A panel of independent judges drawn from the R and business community will select the grand and bonus prize winners. Revolution Analytics will present five honorable mention prize winners each with $1,000.

“We’ve designed this contest to highlight the most interesting use cases of applying R and Revolution R to solving key business problems, such as Big Data,” said Jeff Erhardt, COO of Revolution Analytics. “The ability to process higher-volume datasets will continue to be a critical need and we encourage the submission of applications using large datasets. Our goal is to grow the collection of online materials describing how to use R for business applications so our customers can better leverage Big Analytics to meet their analytical and organizational needs.”

To enter Revolution Analytics’ “Applications of R in Business” competition, see the announcement linked above.

Interview Dan Steinberg Founder Salford Systems

Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/ )

Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember?

Dan- When I was in graduate school studying econometrics at Harvard, a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities. Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems, Data Resources, Inc. or were heavily involved in business consulting through their own companies or other influential consultants. Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisors. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.

Ajay- What are the latest products by Salford Systems? Any future product plans or modifications to work on Big Data analytics, mobile computing and cloud computing?

Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.

The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet. The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size. The new methodology will substantially outperform truncation.

The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.

Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem?

Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.

In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.

Ajay- I know you have created your own software. But is there other software that you use or like to use?

Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.

Ajay- Your software is installed at 3500 sites including 400 universities as per http://www.salford-systems.com/company/aboutus/index.html What is the key to managing and keeping so many customers happy?

Dan- First, we have taken great pains to make our software reliable and we make every effort to avoid problems related to bugs. Our testing procedures are extensive and we have experts dedicated to stress-testing software. Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turnaround on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GotoMeeting and other internet based contact permit real time interaction.

Ajay- What do you do to relax and unwind?

Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.

Biography-

http://www.salford-systems.com/blog/dan-steinberg.html

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After earning a PhD in Econometrics at Harvard Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000, and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, computer science journals, and contributes actively to the ongoing research and development at Salford.

Heritage prize= 3mill now open

I am still angry with THE Netflix for the 1 mill I lost out on. No sweat! This time the money is 3 times as much, it is legit, and yes baby, you can change the world, make it a better place and get rich! See details below: http://www.heritagehealthprize.com/c/hhp/Data

HERITAGE HEALTH PRIZE DATA FILES

DATA FILES

HHP_release1.zip (7.28 MB)
HHP_release2.zip (46.58 MB)
SampleEntry.csv (1.61 MB)

You must accept this competition’s rules before you’ll be able to download data files.

IMPORTANT NOTE: The information provided below is intended only to provide general guidance to participants in the Heritage Health Prize Competition and is subject to the Competition Official Rules. Any capitalized term not defined below is defined in the Competition Official Rules. Please consult the Competition Official Rules for complete details.

Heritage Provider Network is providing Competition Entrants with deidentified member data collected during a forty-eight month period that is allocated among three data sets (the “Data Sets”). Competition Entrants will use the Data Sets to develop and test their algorithms for accurately predicting the number of days that the members will spend in a hospital (inpatient or emergency room visit) during the 12-month period following the Data Set cut-off date.

HHP_release2.zip contains the latest files, so you can ignore HHP_release1.zip. SampleEntry.CSV shows you how an entry should look.

Data Sets will be released to Entrants after registration on the Website according to the following schedule:

April 4, 2011: Claims Table – Y1 and DaysInHospital Table – Y2
May 4, 2011: All other Data Sets except Labs Table and Rx Table

From https://www.kaggle.com/

The $3 million Heritage Health Prize opens to entries

It’s been one month since the launch of the Heritage Health Prize. The prize has attracted some great publicity, receiving coverage from the Wall Street Journal, The Economist, Slate and Forbes.

By now, people have had a good chance to poke around the first portion of the data. Now the fun starts! HPN have released two more years’-worth of data, set the accuracy threshold and are opening up the competition to entries. The data are available from the Heritage Health Prize page. Good luck to all participants!

The Deloitte/FIDE Chess Ratings Competition results

The Deloitte/FIDE Chess Ratings Competition attracted one of the strongest fields ever seen in a Kaggle Competition. The competition attracted 189 teams, ranging from chess ratings experts to Netflix Prize winners. As Jeff Sonas wrote on the Kaggle blog last week, the competition has far exceeded his expectations. A big congratulations to the provisional winner, Tim Salimans, an econometrician at Erasmus University in Rotterdam. We look forward to reading about the approaches used by top performers on the Kaggle blog. We also look forward to the results of the FIDE prize, which could see the introduction of a new chess ratings system.

ICDAR 2011 Competition Results

The ICDAR 2011 competition also finished recently. The competition required participants to develop an algorithm that correctly matched handwriting samples. The winners were Lewis Griffin and Andrew Newell from University College London, who achieved Kaggle’s first ever perfect score by managing to match every sample correctly! Andrew and Lewis have posted a description of their winning method on the Kaggle blog.

Revolution R Enterprise

Since R is the most popular language used by Kaggle members, the Revolution Analytics team is making Revolution R Enterprise (the pre-eminent commercial version of R) available free of charge to Kaggle members. Revolution R Enterprise has several advantages over standard R, including the ability to seamlessly handle larger datasets. To get your free copy, visit http://info.revolutionanalytics.com/Kaggle.html.

Kaggle-in-Class

As many of you know, Kaggle offers a free platform, Kaggle-in-Class, for instructors who want to host competitions for their students. For those interested in hearing more about the use of Kaggle-in-Class as a teaching tool, Susan Holmes and Nelson Ray from Stanford University share their experience in a webinar organized by the Consortium for the Advancement of Undergraduate Statistics Education.

Data Mining to change the world of health care

http://www.heritagehealthprize.com/c/hhp

Heritage Health Prize, a $3 million competition to predict who will go to hospital and for how long.

So as not to overwhelm anyone, we will be releasing the data in three waves. Today’s launch allows people to register and download the first instalment, which includes enough data for people to start trying out models. It includes claims data from Y1, information on members and the details of hospitalizations recorded in Y2.

The next instalment will be released on May 4 and will involve the release of a more comprehensive dataset, including claims for later years as well as the test dataset against which entries will be judged. It is at this point that we will open up the competition to entries, reveal the performance threshold and begin posting the leaderboard.

Finally, the last release happens on June 4 and will include some ancillary data of prescriptions and lab tests.

Kaggle members don’t need to sign up again. To register, simply log in and accept the rules before downloading the data.

Finally the Twitter hashtag for the competition is #drflix. Help spread the word.
