New Free Online Book by Rob Hyndman on Forecasting using #Rstats

From the creator of some of the most widely used packages for time series in the R programming language comes a brand new book, and its online!

This time the book is free, will be updated and 7 chapters are ready (to read!)

. If you do forecasting professionally, now is the time to suggest your own use cases to be featured as the book gets ready by end- 2012. The book is intended as a replace­ment for Makri­dakis, Wheel­wright and Hyn­d­man (Wiley 1998).

http://otexts.com/fpp/

The book is writ­ten for three audi­ences:

(1) people find­ing them­selves doing fore­cast­ing in busi­ness when they may not have had any for­mal train­ing in the area;

(2) undergraduate stu­dents study­ing busi­ness;

(3) MBA stu­dents doing a fore­cast­ing elec­tive.

The book is dif­fer­ent from other fore­cast­ing text­books in sev­eral ways.

  • It is free and online, mak­ing it acces­si­ble to a wide audience.
  • It is con­tin­u­ously updated. You don’t have to wait until the next edi­tion for errors to be removed or new meth­ods to be dis­cussed. We will update the book frequently.
  • There are dozens of real data exam­ples taken from our own con­sult­ing prac­tice. We have worked with hun­dreds of busi­nesses and orga­ni­za­tions help­ing them with fore­cast­ing issues, and this expe­ri­ence has con­tributed directly to many of the exam­ples given here, as well as guid­ing our gen­eral phi­los­o­phy of forecasting.
  • We empha­sise graph­i­cal meth­ods more than most fore­cast­ers. We use graphs to explore the data, analyse the valid­ity of the mod­els fit­ted and present the fore­cast­ing results.

A print ver­sion and a down­load­able e-version of the book will be avail­able to pur­chase on Ama­zon, but not until a few more chap­ters are written.

Contents

(Ajay-Support the open textbook movement!)

If you’ve found this book helpful, please consider helping to fund free, open and online textbooks. (Donations via PayPal.)

Look for yourself at http://otexts.com/fpp/

 

Machine Learning to Translate Code from different programming languages

Google Translate has been a pioneer in using machine learning for translating various languages (and so is the awesome Google Transliterate)

I wonder if they can expand it to programming languages and not just human languages.

 

Issues in converting  translating programming language code

1) Paths referred for stored objects

2) Object Names should remain the same and not translated

3) Multiple Functions have multiple uses , sometimes function translate is not straightforward

I think all these issues are doable, solveable and more importantly profitable.

 

I look forward to the day a iOS developer can convert his code to Android app code by simple upload and download.

Talking on Big Data Analytics

I am going  being sponsored to a Government of India sponsored talk on Big Data Analytics at Bangalore on Friday the 13 th of July. If you are in Bangalore, India you may drop in for a dekko. Schedule and Abstracts (i am on page 7 out 9) .

Your tax payer money is hard at work- (hassi majak only if you are a desi. hassi to fassi.)

13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase using open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom packages in R for Big Data, tools like Revolution Analytics RevoScaleR package, as well as the newly launched SAP Hana used with R. We will also review Oracle R Enterprise. In addition we will show some case studies using BigML.com (using Clojure) , and approaches using PiCloud. In addition it will showcase some of Google APIs for Big Data Analysis.

Lastly we will talk on social media analysis ,national security use cases (i.e. cyber war) and privacy hazards of big data analytics.

Schedule

View more presentations from Ajay Ohri.
Abstracts

View more documents from Ajay Ohri.

 

Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats

Here is a brief interview with Alvaro Tejada Galindo aka Blag who is a developer working with SAP Hana and R at SAP Labs, Montreal. SAP Hana is SAP’s latest offering in BI , it’s also a database and a computing environment , and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.

Ajay- Describe how you got involved with databases and R language.
Blag-  I used to work as an ABAP Consultant for 11 years, but also been involved with programming since the last 13 years, so I was in touch with SQLServer, Oracle, MySQL and SQLite. When I joined SAP, I heard that SAP HANA was going to use an statistical programming language called “R”. The next day I started my “R” learning.

Ajay- What made the R language a fit for SAP HANA. Did you consider other languages? What is your view on Julia/Python/SPSS/SAS/Matlab languages

Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.

Ajay- What is your view on some of the limitations of R that can be overcome with using it with SAP HANA.

Blag-  I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.

Ajay-  Have you considered other vendors of R including working with RStudio, Revolution Analytics, and even Oracle R Enterprise.

Blag-  I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.

Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results.

Blag-   Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people has start working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps “ (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)

Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, or package development etc

Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.

 

About-

http://scn.sap.com/people/alvaro.tejadagalindo3

Name: Alvaro Tejada Galindo
Email: a.tejada.galindo@sap.com
Profession: Development
Company: SAP Canada Labs-Montreal
Town/City: Montreal
Country: Canada
Instant Messaging Type: Twitter
Instant Messaging ID: Blag
Personal URL: http://blagrants.blogspot.com
Professional Blog URL: http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910
My Relation to SAP: employee
Short Bio: Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997.

http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx

and from

http://en.wikipedia.org/wiki/SAP_HANA

SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as anappliance. It also includes the modeling tools from HANA Studio as well replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript

More blog posts on using SAP and R together

Dealing with R and HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana

HANA meets R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss

http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss

 

Using RODBC with SAP HANA DB-

SAP HANA: My experiences on using SAP HANA with R

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/02/21/sap-hana-my-experiences-on-using-sap-hana-with-r

and of course the blog that started it all-

Jitender Aswani’s http://allthingsr.blogspot.in/

 

 

Software Review- BigML.com – Machine Learning meets the Cloud

I had a chance to dekko the new startup BigML https://bigml.com/ and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intutively designed- You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like  Google Prediction API make design so intutive and easy to understand. Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood.

2) It includes some well known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats ( if you want to check it yourself, use the codes below the post, note they are one time only , so the first five get the invites.

BigML is still invite only but plan to get into open release soon.

3) Data Sources can only be by uploading files (csv) but they plan to change this hopefully to get data from buckets (s3? or Google?) and from URLs.

4) The one click operation to convert data source into a dataset shows a histogram (distribution) of individual variables.The back end is clojure , because the team explained it made the easiest sense and fit with Java. The good news (?) is you would never see the clojure code at the back end. You can read about it from http://clojure.org/

As cloud computing takes off (someday) I expect clojure popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp

 

5) As of now decision trees is the only distributed algol, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (as logit and linear) and k means clustering. The trees are created and pruned in real time which gives a slightly animated (and impressive effect). and yes model building is an one click operation.

The real time -live pruning is really impressive and I wonder why /how it can ever be replicated in other software based on desktop, because of the sheer interactive nature.

 

Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click and customization is quite intuitive. It is not quite PMML compliant yet so I hope some Zemanta like functionality can be added so huge amounts of models can be applied to predictions or score data in real time.

 

If you are a developer/data hacker, you should check out this section too- it is quite impressive that the designers of BigML have planned for API access so early.

https://bigml.com/developers

BigML.io gives you:

  • Secure programmatic access to all your BigML resources.
  • Fully white-box access to your datasets and models.
  • Asynchronous creation of datasets and models.
  • Near real-time predictions.

 

Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

REST API

BigML.io conforms to the design principles of Representational State Transfer (REST)BigML.io is enterely HTTP-based.

BigML.io gives you access to four basic resources: SourceDatasetModel and Prediction. You cancreatereadupdate, and delete resources using the respective standard HTTP methods: POSTGET,PUT and DELETE.

All communication with BigML.io is JSON formatted except for source creation. Source creation is handled with a HTTP PUT using the “multipart/form-data” content-type

HTTPS

All access to BigML.io must be performed over HTTPS

and https://bigml.com/developers/quick_start ( In think an R package which uses JSON ,RCurl  would further help in enhancing ease of usage).

 

Summary-

Overall a welcome addition to make software in the real of cloud computing and statistical computation/business analytics both easy to use and easy to deploy with fail safe mechanisms built in.

Check out https://bigml.com/ for yourself to see.

The invite codes are here -one time use only- first five get the invites- so click and try your luck, machine learning on the cloud.

If you dont get an invite (or it is already used, just leave your email there and wait a couple of days to get approval)

  1. https://bigml.com/accounts/register/?code=E1FE7
  2. https://bigml.com/accounts/register/?code=09991
  3. https://bigml.com/accounts/register/?code=5367D
  4. https://bigml.com/accounts/register/?code=76EEF
  5. https://bigml.com/accounts/register/?code=742FD

R for Predictive Modeling- PAW Toronto

A nice workshop on using R for Predictive Modeling by Max Kuhn Director, Nonclinical Statistics, Pfizer is on at PAW Toronto.

Workshop

Monday, April 23, 2012 in Toronto
Full-day: 9:00am – 4:30pm

R for Predictive Modeling:
A Hands-On Introduction

Intended Audience: Practitioners who wish to learn how to execute on predictive analytics by way of the R language; anyone who wants “to turn ideas into software, quickly and faithfully.”

Knowledge Level: Either hands-on experience with predictive modeling (without R) or hands-on familiarity with any programming language (other than R) is sufficient background and preparation to participate in this workshop.


What prior attendees have exclaimed


Workshop Description

This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.

The instructor, a leading R developer and the creator of CARET, a core R package that streamlines the process for creating predictive models, will guide attendees on hands-on execution with R, covering:

  • A working knowledge of the R system
  • The strengths and limitations of the R language
  • Preparing data with R, including splitting, resampling and variable creation
  • Developing predictive models with R, including decision trees, support vector machines and ensemble methods
  • Visualization: Exploratory Data Analysis (EDA), and tools that persuade
  • Evaluating predictive models, including viewing lift curves, variable importance and avoiding overfitting

Hardware: Bring Your Own Laptop
Each workshop participant is required to bring their own laptop running Windows or OS X. The software used during this training program, R, is free and readily available for download.

Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.


Schedule

  • Workshop starts at 9:00am
  • Morning Coffee Break at 10:30am – 11:00am
  • Lunch provided at 12:30 – 1:15pm
  • Afternoon Coffee Break at 2:30pm – 3:00pm
  • End of the Workshop: 4:30pm

Instructor

Max Kuhn, Director, Nonclinical Statistics, Pfizer

Max Kuhn is a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been applying models in the pharmaceutical industries for over 15 years.

He is a leading R developer and the author of several R packages including the CARET package that provides a simple and consistent interface to over 100 predictive models available in R.

Mr. Kuhn has taught courses on modeling within Pfizer and externally, including a class for the India Ministry of Information Technology.

Source-

http://www.predictiveanalyticsworld.com/toronto/2012/r_for_predictive_modeling.php

Oracle launches its version of R #rstats

From-

http://www.oracle.com/us/corporate/press/1515738

Integrates R Statistical Programming Language into Oracle Database 11g

News Facts

Oracle today announced the availability of Oracle Advanced Analytics, a new option for Oracle Database 11g that bundles Oracle R Enterprise together with Oracle Data Mining.
Oracle R Enterprise delivers enterprise class performance for users of the R statistical programming language, increasing the scale of data that can be analyzed by orders of magnitude using Oracle Database 11g.
R has attracted over two million users since its introduction in 1995, and Oracle R Enterprise dramatically advances capability for R users. Their existing R development skills, tools, and scripts can now also run transparently, and scale against data stored in Oracle Database 11g.
Customer testing of Oracle R Enterprise for Big Data analytics on Oracle Exadata has shown up to 100x increase in performance in comparison to their current environment.
Oracle Data Mining, now part of Oracle Advanced Analytics, helps enable customers to easily build and deploy predictive analytic applications that help deliver new insights into business performance.
Oracle Advanced Analytics, in conjunction with Oracle Big Data ApplianceOracle Exadata Database Machine and Oracle Exalytics In-Memory Machine, delivers the industry’s most integrated and comprehensive platform for Big Data analytics.

Comprehensive In-Database Platform for Advanced Analytics

Oracle Advanced Analytics brings analytic algorithms to data stored in Oracle Database 11g and Oracle Exadata as opposed to the traditional approach of extracting data to laptops or specialized servers.
With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.
By providing direct and controlled access to data stored in Oracle Database 11g, customers can accelerate data analyst productivity while maintaining data security throughout the enterprise.
Powered by decades of Oracle Database innovation, Oracle R Enterprise helps enable analysts to run a variety of sophisticated numerical techniques on billion row data sets in a matter of seconds making iterative, speed of thought, and high-quality numerical analysis on Big Data practical.
Oracle R Enterprise drastically reduces the time to deploy models by eliminating the need to translate the models to other languages before they can be deployed in production.
Oracle R Enterprise integrates the extensive set of Oracle Database data mining algorithms, analytics, and access to Oracle OLAP cubes into the R language for transparent use by R users.
Oracle Data Mining provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. These predictive models can be deployed in Oracle Database 11g and use Oracle Exadata Smart Scan to rapidly score huge volumes of data.
The tight integration between R, Oracle Database 11g, and Hadoop enables R users to write one R script that can run in three different environments: a laptop running open source R, Hadoop running with Oracle Big Data Connectors, and Oracle Database 11g.
Oracle provides single vendor support for the entire Big Data platform spanning the hardware stack, operating system, open source R, Oracle R Enterprise and Oracle Database 11g.
To enable easy enterprise-wide Big Data analysis, results from Oracle Advanced Analytics can be viewed from Oracle Business Intelligence Foundation Suite and Oracle Exalytics In-Memory Machine.

Supporting Quotes

“Oracle is committed to meeting the challenges of Big Data analytics. By building upon the analytical depth of Oracle SQL, Oracle Data Mining and the R environment, Oracle is delivering a scalable and secure Big Data platform to help our customers solve the toughest analytics problems,” said Andrew Mendelsohn, senior vice president, Oracle Server Technologies.
“We work with leading edge customers who rely on us to deliver better BI from their Oracle Databases. The new Oracle R Enterprise functionality allows us to perform deep analytics on Big Data stored in Oracle Databases. By leveraging R and its library of open source contributed CRAN packages combined with the power and scalability of Oracle Database 11g, we can now do that,” said Mark Rittman, co-founder, Rittman Mead.
Oracle Advanced Analytics — an option to Oracle Database 11g Enterprise Edition – extends the database into a comprehensive advanced analytics platform through two major components: Oracle R Enterprise and Oracle Data Mining. With Oracle Advanced Analytics, customers have a comprehensive platform for real-time analytic applications that deliver insight into key business subjects such as churn prediction, product recommendations, and fraud alerting.

Oracle R Enterprise tightly integrates the open source R programming language with the database to further extend the database with Rs library of statistical functionality, and pushes down computations to the database. Oracle R Enterprise dramatically advances the capability for R users, and allows them to use their existing R development skills and tools, and scripts can now also run transparently and scale against data stored in Oracle Database 11g.

Oracle Data Mining provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. It can be accessed through the SQL Developer extension Oracle Data Miner to build, evaluate, share and deploy predictive analytics methodologies. At the same time the high-performance Oracle-specific data mining algorithms are accessible from R.

BENEFITS

  • Scalability—Allows customers to easily scale analytics as data volume increases by bringing the algorithms to where the data resides – in the database
  • Performance—With analytical operations performed in the database, R users can take advantage of the extreme performance of Oracle Exadata
  • Security—Provides data analysts with direct but controlled access to data in Oracle Database 11g, accelerating data analyst productivity while maintaining data security
  • Save Time and Money—Lowers overall TCO for data analysis by eliminating data movement and shortening the time it takes to transform “raw data” into “actionable information”
Oracle R Hadoop Connector Gives R users high performance native access to Hadoop Distributed File System (HDFS) and MapReduce programming framework.
This is a  R package
From the datasheet at
%d bloggers like this: