#Rstats Credit Scoring using R

I came across a nice, lucid and very readable document at http://cran.r-project.org/doc/contrib/Sharma-CreditScoring.pdf

Credit scoring is a bread-and-butter activity at many analytics shops, and I really liked the way the author explains and executes it. The document can be used by any user regardless of experience.

 

PAW Conferences

Message from Predictive Analytics World:

1. IF YOU ARE IN EUROPE:

PAW London is next week, 30 Nov – 1 Dec, plus a workshop from John Elder on 2 Dec. More info: www.pawcon.com/london

2. IF YOU ARE IN NORTH AMERICA: (Happy Thanksgiving!)

The complete agenda for PAW San Francisco (March 4-10, 2012) has been launched.

Super Early Bird: Savings of $600 over onsite registration until Dec 16

More info: www.pawcon.com/sanfrancisco
Register: www.pawcon.com/sanfrancisco/register.php

Webinar: Using R within Oracle #rstats

Webinar: Using R within Oracle — Nov 30, noon EST

==========================================
Oracle now supports the R open source statistical programming language. Come to this webinar to learn more about using R within an Oracle environment.

— URL for TechCast: https://stbeehive.oracle.com/bconf/confDetails?confID=334B:3BF0:owch:38893C00F42F38A1E0404498C8A6612B0004075AECF7&guest=true&confKey=608880
— Web Conference ID: 303397
— Web Conference Key: 608880
— Dialup: 1-866-682-4770, ID 5548204, passcode 1234

After a steady rise in the past few years, in 2010 the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other (http://www.rexeranalytics.com/Data-Miner-Survey-Results-2010.html).

Several analytic tool vendors have added R-integration to their software. However, Oracle is the largest company to throw their weight behind R. On October 3, Oracle unveiled their integration of R: Oracle R Enterprise (http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html) as part of their Oracle Big Data Appliance announcement (http://www.oracle.com/us/corporate/press/512001).

Oracle R Enterprise allows users to perform statistical analysis with advanced visualization on data stored in Oracle Database. Oracle R Enterprise enables scalable R solutions, while facilitating production deployment of R scripts and Hadoop based solutions, as well as integration of R results with Oracle BI Publisher and OBIEE dashboards.

This TechCast introduces the various Oracle R Enterprise components and features, along with R script demonstrations that interface with Oracle Database.

TechCast presenter: Mark Hornick, Senior Manager, Oracle Advanced Analytics Development.
This TechCast is part of the ongoing TechCasts series coordinated by Oracle BIWA: The BI, Warehousing and Analytics SIG (http://www.oracleBIWA.org).

Product Review – Revolution R 5.0

So I got the email from Revolution R. Version 5.0 is ready for download, and unlike the half-hearted attempts by many software companies, they make it easy for academics and researchers to get their free copy. Free as in speech and free as in beer.

Some thoughts:

1) R's memory problem is now an issue of marketing and branding. Revolution Analytics has bridged this gap beautifully on the technical side, and I quote from their documentation:

The primary advantage 64-bit architectures bring to R is an increase in the amount of memory available to a given R process.
The first benefit of that increase is an increase in the size of data objects you can create. For example, on most 32-bit versions of R, the largest data object you can create is roughly 3GB; attempts to create 4GB objects result in errors with the message “cannot allocate vector of length xxxx.”
On 64-bit versions of R, you can generally create larger data objects, up to R’s current hard limit of 2^31 - 1 elements in a vector (about 2 billion elements). The functions memory.size and memory.limit help you manage the memory used by Windows versions of R.
In 64-bit Revolution R Enterprise, R sets the memory limit by default to the amount of physical RAM minus half a gigabyte, so that, for example, on a machine with 8GB of RAM, the default memory limit is 7.5GB.
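The numbers in that quote are easy to sanity-check. Here is a rough back-of-the-envelope sketch in Python (not R, and not taken from the Revolution documentation). The function names are made up for illustration, and the 8 bytes per element assumes a plain numeric (double) vector.

```python
# Back-of-the-envelope check of the limits quoted above.
# Assumption: a numeric (double) vector, so 8 bytes per element.
MAX_VECTOR_ELEMENTS = 2**31 - 1   # per-vector element limit quoted above
BYTES_PER_DOUBLE = 8
GIB = 1024**3


def max_numeric_vector_gib():
    """Approximate size in GiB of the largest numeric vector allowed by the element limit."""
    return MAX_VECTOR_ELEMENTS * BYTES_PER_DOUBLE / GIB


def default_memory_limit_gb(physical_ram_gb):
    """Default limit as described in the quote: physical RAM minus half a gigabyte."""
    return physical_ram_gb - 0.5


print(MAX_VECTOR_ELEMENTS)                 # 2147483647, i.e. about 2 billion
print(round(max_numeric_vector_gib(), 1))  # 16.0
print(default_memory_limit_gb(8))          # 7.5
```

So a single numeric vector at the element limit would need about 16GB, which is more than the 7.5GB default limit on the 8GB machine in the example.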

2) The user interface is best shown in this presentation: https://docs.google.com/presentation/pub?id=1V_G7r0aBR3I5SktSOenhnhuqkHThne6fMxly_-4i8Ag&start=false&loop=false&delayms=3000

(But I am still hoping for the GUI that Revolution Analytics promised us for Christmas.)

3) The partnership with Microsoft HPC is quite awesome given Microsoft’s track record in enterprise software penetration, but I am also interested in knowing more about the Oracle version of R and what it will do there.

UseR goes to Nashville, USA

So if Vanderbilt did lose (again) to UT (http://www.govolsxtra.com/news/2011/nov/20/video-tennessee-highlights-vanderbilt-game/), they have something better to look forward to before next football season.

UseR is coming to Tennessee in 2012! This is the premier annual conference for the R language (more than 2 million users), and it alternates between Europe and North America.

Details here

http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012

useR! 2012 (12-15 June 2012)
Department of Biostatistics
Vanderbilt University
School of Medicine
Nashville Tennessee USA

Pre-conference Survey

If you plan to attend useR! 2012, help us plan by completing a REDCap survey.

Contact

Stephania McNeal-Goddard
Assistant to the Chair
stephania.mcneal-goddard@vanderbilt.edu
Phone: 615.322.2768
Fax: 615.343.4924
Vanderbilt University School of Medicine
Department of Biostatistics
S-2323 Medical Center North
Nashville, TN 37232-2158

Abstracts and Tutorial Proposals

Participants are encouraged to submit an abstract for oral presentation during a Kaleidoscope or Focus session, or for poster presentation. Tutorial proposals are also welcomed.

Deadlines

  • Tutorial Submission: Dec 1 – Jan 31
  • Tutorial Acceptance Notification: Feb 1 – Feb 29
  • Abstract Submission: Dec 1 – Mar 12
  • Abstract Acceptance Notification: Mar 13 – Apr 15

Registration

Deadlines

  • Early Registration: Jan 1 – Feb 29
  • Regular Registration: Mar 1 – May 12
  • Late Registration: May 13 – June 11
  • On-site Registration: June 12 – June 15

Travel and Lodging Information

Vanderbilt University is located in Nashville, Tennessee, USA.

Air Travel

The nearest major airport to Vanderbilt University is the Nashville International Airport (BNA). The airport is about 10 miles east of the campus and downtown Nashville. The BNA website maintains a list of ground transportation options for air travelers. The approximate taxi fare from the airport to Vanderbilt University is $27. Shuttles and buses are also available from the airport. The bus is economical (approximate fare is $1.60), but the travel time is more than an hour.

Car Travel

Nashville is located at the intersection of three major interstates. Interstate 40 approaches from the east and west, Interstate 24 from the northwest and southeast, and Interstate 65 from the northeast and south.

Business Metrics

Business Metrics (a partial extract from my upcoming book “R for Business Analytics”)

Business metrics are important variables that are collected on a periodic basis to assess the health and sustainability of a business. They should have the following properties:

1) What is a Business Metric- Failing to collect or regularly update the business metric could cause business disruption through incorrect and incomplete decision making.

2) Cost of Business Metrics- The costs of collecting, storing and updating the business metric are less than the opportunity costs of wrong decision making caused by a lack of information on that business metric.

3) Continuity in your Business Metrics- The business metrics are comparable across time periods and business units. If necessary, the assumptions used to smooth the comparisons should be listed in the business metric presentation itself.

4) Simplify your Business Metrics- Business metrics can also be derived from other business metrics. If necessary, and to avoid clutter, only the most important business metrics should be presented, or only the metrics with the biggest deviation from past trends should be mentioned.

5) Normalize your Business Metrics- The scale of the business metric's units should be comparable to that of other business metrics, and fine enough to make meaningful differences in the numbers visible.

6) Standardize your Business Metrics- The dimensions of business metrics should be increased to enhance comparison and contrast without increasing complexity. This means adding an extra dimension for analysis, such as time, geography, employee or business owner, rather than staying with a 2 by 2 comparison.
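One possible reading of the normalization point (5) is to index every metric to its first period, so that metrics measured in very different units can be compared on one scale. The sketch below is in Python rather than R. The function name `index_to_base` and all of the figures are hypothetical and for illustration only. They are not taken from the book.

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first period equals `base` (an index number)."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series whose first value is zero")
    return [base * v / first for v in values]


# Hypothetical quarterly figures on very different scales.
revenue = [120000, 126000, 132000]   # currency units
active_customers = [800, 820, 880]   # head count

print(index_to_base(revenue))           # [100.0, 105.0, 110.0]
print(index_to_base(active_customers))  # [100.0, 102.5, 110.0]
```

After indexing, both series read as percentage change from the first period, so a deviation in one metric can be compared directly with a deviation in the other. This is only one way to normalize. Z-scores or per-unit ratios would be reasonable alternatives.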

Amazon CC2 – The Big Cloud is finally here

Finally, a powerful enough cloud computing instance from Amazon EC2: it is called CC2 and is priced at $3 per hour for Windows instances and $2.40 per hour for Linux.

It would be interesting to see how SAS, IBM SPSS or R can leverage these.

Storage – On the storage front, the CC2 instance type is packed with 60.5 GB of RAM and 3.37 TB of instance storage.

Processing – The CC2 instance type includes 2 Intel Xeon processors, each with 8 hardware cores. We’ve enabled Hyper-Threading, allowing each core to process a pair of instruction streams in parallel. Net-net, there are 32 hardware execution threads and you can expect 88 EC2 Compute Units (ECU’s) from this 64-bit instance type

On a somewhat smaller scale, you can launch your own array of 290 CC2 instances and create a Top500 supercomputer (63.7 teraFLOPS) at a cost of less than $1000 per hour
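The arithmetic behind those claims can be checked quickly. This is a rough sketch in Python that uses only the figures quoted in this post (the Linux and Windows hourly prices, 290 instances, 63.7 teraFLOPS). The variable and function names are made up for illustration.

```python
# Figures quoted in this post; names are illustrative only.
PROCESSORS = 2
CORES_PER_PROCESSOR = 8
THREADS_PER_CORE = 2          # Hyper-Threading: two instruction streams per core

INSTANCES = 290
LINUX_PRICE_PER_HOUR = 2.40   # USD per instance-hour
WINDOWS_PRICE_PER_HOUR = 3.00 # USD per instance-hour, as rounded in this post
CLUSTER_TERAFLOPS = 63.7


def hourly_cluster_cost(instances, price_per_hour):
    """Total hourly cost in USD of running `instances` at `price_per_hour` each."""
    return round(instances * price_per_hour, 2)


hardware_threads = PROCESSORS * CORES_PER_PROCESSOR * THREADS_PER_CORE
print(hardware_threads)                                            # 32

print(hourly_cluster_cost(INSTANCES, LINUX_PRICE_PER_HOUR))       # 696.0
print(hourly_cluster_cost(INSTANCES, WINDOWS_PRICE_PER_HOUR))     # 870.0

# Rough Linux cost per teraFLOPS-hour for the 290-instance cluster.
print(round(hourly_cluster_cost(INSTANCES, LINUX_PRICE_PER_HOUR) / CLUSTER_TERAFLOPS, 2))
```

Both totals come in under the quoted $1000 per hour, assuming on-demand pricing and no extra charges for storage or data transfer.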

http://aws.typepad.com/aws/2011/11/next-generation-cluster-computing-on-amazon-ec2-the-cc2-instance-type.html

and

http://aws.amazon.com/hpc-applications/

Cluster Compute Eight Extra Large specifications:
88 EC2 Compute Units (Eight-core 2 x Intel Xeon)
60.5 GB of memory
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc2.8xlarge
Price: Starting from $2.40 per hour

But some caveats:

  • The instances are available in a single Availability Zone in the US East (Northern Virginia) Region. We plan to add capacity in other EC2 Regions throughout 2012.
  • You can run 2 CC2 instances by default.
  • You cannot currently launch instances of this type within a Virtual Private Cloud (VPC).