When China overtook India: using Deducer

I was just reading about the new release of World Bank data at http://www.r-chart.com/2010/09/new-world-bank-data-available.html. I have worked with World Bank data in the past, and the RWDI package described there is a great package.

The whole dataset is available as a 29 MB zipped CSV file, which is terrific for macroeconomic analysis, so I downloaded it and loaded it directly instead.


I took a small subset of the data (GDP per capita for China, India and the USA):

# Read the full WDI/GDF file
WDI_GDF_Data <- read.table("C:/Documents and Settings/abc/My Documents/Downloads/WDI_GDF_Data.csv", header = TRUE, sep = ",", quote = "\"")
# Keep China, India and the USA
WDI_GDF_Data.sub <- subset(WDI_GDF_Data, Country.Code %in% c("CHN", "IND", "USA"))
# Keep only the GDP per capita series (constant US$)
WDI_GDF_Data.sub.sub <- subset(WDI_GDF_Data.sub, Series.Code == "NY.GDP.PCAP.KD")
write.csv(WDI_GDF_Data.sub.sub, "C:/Documents and Settings/abc/Desktop/gdp3.csv")
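The headline claim is a crossover year, so here is a minimal base-R sketch that finds it from the subset above instead of eyeballing a chart. It assumes the CSV keeps one column per year (read.table() turns a header such as 1985 into X1985) and one row per country once a single series has been selected; wdi_to_long() and crossover_year() are hypothetical helper names, not part of any package.

```r
# Reshape WDI's wide layout (one column per year) into long format.
# Assumption: year columns are named like "X1960" after read.table().
wdi_to_long <- function(wide) {
  year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)
  long <- reshape(wide[, c("Country.Code", year_cols)],
                  direction = "long",
                  varying = year_cols,
                  v.names = "value",
                  timevar = "year",
                  times = as.integer(sub("^X", "", year_cols)),
                  idvar = "Country.Code")
  rownames(long) <- NULL
  # Guard against values read in as text; missing years become NA and are dropped
  long$value <- suppressWarnings(as.numeric(as.character(long$value)))
  long[!is.na(long$value), ]
}

# First year in which country `a` is ahead of country `b`
# after having been behind (or level) the year before; NA if never.
crossover_year <- function(long, a, b) {
  wa <- long[long$Country.Code == a, c("year", "value")]
  wb <- long[long$Country.Code == b, c("year", "value")]
  both <- merge(wa, wb, by = "year", suffixes = c(".a", ".b"))
  both <- both[order(both$year), ]
  ahead <- both$value.a > both$value.b
  idx <- which(ahead & c(FALSE, !head(ahead, -1)))
  if (length(idx) == 0) NA_integer_ else both$year[idx[1]]
}
```

Usage: crossover_year(wdi_to_long(WDI_GDF_Data.sub.sub), "CHN", "IND") returns the first year in the file where China's value moves above India's.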

Note: WordPress.com now supports posting R source code; see http://en.support.wordpress.com/code/posting-source-code/

This is basic data manipulation, and I used Deducer for it.

The best thing about Deducer is the ability to use ggplot2 through a GUI.
I am now trying to create more complicated plots, for example with more than one Y variable, but it is still a work in progress. Overall, Deducer has made impressive improvements and, with the JGR GUI, looks very promising. The look and feel also combines familiar features, such as SPSS's variable view and data view.
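For readers who prefer code to the GUI, here is a sketch of the kind of ggplot2 call a plot builder like Deducer's produces. It assumes long-format data with columns year, value and Country.Code (one row per country and year); mapping Country.Code to colour draws one line per country, which is the usual ggplot2 way to put several series on one plot. The function name is hypothetical.

```r
library(ggplot2)

# One line per country: the colour mapping splits the data into series,
# so several "Y variables" share one set of axes.
plot_gdp_per_capita <- function(long) {
  ggplot(long, aes(x = year, y = value, colour = Country.Code)) +
    geom_line() +
    labs(x = "Year",
         y = "GDP per capita (NY.GDP.PCAP.KD, constant US$)",
         colour = "Country")
}
```

Calling print() on the result draws the chart.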

And yes, China overtook India in GDP per capita in 1985. Sigh.

ggplot2, though, has overtaken Excel graphics as well.

Here is a video, which is much better than my screenshots.

JMP 9 releasing on Oct 12

JMP 9 releases on Oct 12. It is a very good, reliable data visualization and analytical tool (and it is available on the Mac as well).

And it is advertising R graphics as well (lol, I can visualize the look on the faces of some, ahem, SAS fans in the R Project).

Updated pricing: I am not sure why they are charging US academics $495 when SAS On Demand is free for academics. Shouldn't JMP be free to students? Maybe John Sall and his people can do a trade-off analysis on this, given that JMP's graphics are better than those of Base SAS (which is under some pressure from WPS and R).


See also http://www.enterpriseinnovation.net/content/sas-delivers-free-data-management-and-analytics-solutions-academe

*Offer good in the U.S. only.

  • New Corporate Customer: save $300. No special requirements.
  • Corporate Upgrade: save $155. Requires a valid JMP® 8 serial number.
  • New Academic: save $100. Requires a campus street address and campus e-mail address.
  • Academic Upgrade: save $45. Requires a campus street address and campus e-mail address.

To order, call 1-877-594-6567.

From the mailer:

Be First in Line for JMP® 9
Save up to $300 when you pre-order a single-user license by Oct. 11


Make JMP your analytic hub for visual data discovery with this special offer, good through Oct. 11, 2010. Pre-order a single-user license of JMP 9 – for a discount of up to $300 – and get ready for a leap in data interactivity.

Order now and enjoy the compelling new features of JMP 9 when the software is released Oct. 12. New capabilities in JMP 9 let you:

  • Optimize and simulate using your Microsoft Excel spreadsheets.
  • Use maps to find patterns in your geographic data.
  • Enjoy the updated look and flexibility of JMP 9 on Microsoft Windows.
  • Create and share custom add-ins that extend JMP.
  • Leverage an expanded array of advanced statistical methodologies.
  • Display analytic results from R using interactive graphics.


What if I already have a JMP 8 single-user license?
Great news! You can upgrade to JMP 9 for less than half the regular price.

What if I’m an annual license customer?
Don’t worry, we’ve got you covered. Annual license customers enjoy priority access to all the latest JMP releases as soon as they become available. JMP 9 will be shipped to you automatically.

What if I work or study in the academic world?
Call 1-877-594-6567 to learn about significant discounts for students and professors through the JMP Academic Program.

Please feel free to forward this offer to interested colleagues.

Got two or more users?
A JMP® annual license is the way to go. Call for details.

Remember: Act by Oct. 11!

JMP runs on Macintosh and Windows

Web R: Elastic-R and RevoDeployR

I had a Skype video chat with Karim Chine, and he was kind enough to walk me through the new Elastic-R portal at http://www.elastic-r.org

Basically, it lets multiple users collaborate on Excel as well as R projects.

Here are some screenshots, in a short presentation I made from my notes during K. Chine's presentation.

Also, Revolution Analytics is coming out with a web services product for R:

RevoDeployR: Web Services for R

Both are very powerful uses of R for cloud computing, and it would be interesting to see whether Google, the original cloud computing champion, gets involved with the R Project.

Creating an Anonymous Bot

or Surfing the Net Anonymously and Having Some Fun.

Over the weekend, while browsing http://freelancer.com, I came across an intriguing offer:


Basically, projects asking for an increase in YouTube views.


So this is one way I thought it could be done:

1) Create an IP Address Anonymizer

That's pretty simple: I used the Tor Project at http://www.torproject.org/easy-download.html.en

Basically, it routes your connection to the internet through a network of volunteer-run relays, and you can reset the connection whenever you want, so it hides your IP address.

It is also useful for sending hate mail. Limitations: it works with the Firefox browser only, and your default web pages keep changing language as your IP address changes.


The Tor Project is a 501(c)(3) non-profit based in the United States. The official address of the organization is:

The Tor Project
969 Main Street, Suite 206
Walpole, MA 02081 USA

Check your IP address at http://www.whatismyip.com/

2) Create a bot, or automatic clicking code (without knowing how to code)

Go to https://addons.mozilla.org/en-US/firefox/addon/3863/ and install the iMacros extension.

Remember when you could create an Excel macro just by recording it (in Excel 2003)? This works the same way.

So while surfing, if you need to do something again and again (like going to the same YouTube video and clicking Like 5,000 times), you can press Record Macro:

  • Do the action you want repeated again and again.
  • Click Save Macro.
  • Now run the macro in a loop using the iMacros extension.

See the screenshot below.

Note that I have added two lines of code: WAIT SECONDS=6

This means that every time the code runs in the loop, it will wait for 6 seconds and then reload.

However, I recommend creating random wait times using Google Spreadsheets: use the function RANDBETWEEN(5,400) to limit each wait to between 5 and 400 seconds, and use CONCATENATE with click-and-drag to generate the random WAIT lines (instead of typing them, say, 500 times yourself).

See https://spreadsheets.google.com/ccc?key=tr18JVEE2TmAuH5V8fzJLRA#gid=0

That’s it – Your Anonymous Bot is ready.

See the analytics results for my personal favourite, a Streaming Poetry video: http://www.youtube.com/watch?v=a5yReaKRHOM

Easy, isn't it? Lines of code written: 0. Number of views: 335 (before I grew bored).

Note: using scripts or bots is officially against the YouTube Terms of Service (http://www.youtube.com/t/terms), so I did this for research purposes only. And http://Freelancer.com needs to look into the activities underway at http://www.freelancer.com/projects/by-job/YouTube.html, http://www.freelancer.com/projects/by-job/Facebook.html and http://www.freelancer.com/projects/by-job/Social-Networking.html

The final word on these activities belongs to http://xkcd.com

Event: Predictive analytics with R, PMML and ADAPA

From http://www.meetup.com/R-Users/calendar/14405407/

The September meeting is at the Oracle campus. (This is next door to the Oracle towers, so there is plenty of free parking.) The featured talk is from Alex Guazzelli (Vice President – Analytics, Zementis Inc.) who will talk about “Predictive analytics with R, PMML and ADAPA”.

* 6:15 – 7:00 Networking and Pizza (with thanks to Revolution Analytics)
* 7:00 – 8:00 Talk: Predictive analytics with R, PMML and ADAPA
* 8:00 – 8:30 General discussion

Talk overview:

The rule in the past was that whenever a model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Model Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Once exported as PMML files, models are readily available for deployment into an execution engine for scoring or classification. ADAPA is one example of such an engine. It takes in models expressed in PMML and transforms them into web services. Models can be executed either remotely by using web service calls, or via a web console. Users can also use an Excel add-in to score data from inside Excel using models built in R.
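To make the export step concrete, here is a minimal sketch that writes an R linear model out as PMML. It assumes the CRAN pmml package (the talk does not say which exporter is used); the model, the built-in mtcars data and the file name are illustrative only, and uploading the file to ADAPA is not shown.

```r
library(pmml)  # CRAN package 'pmml'; it builds on the XML package

# Convert a fitted model to a PMML node tree and write it to disk.
export_pmml <- function(model, path) {
  doc <- pmml(model)             # PMML representation of the model
  XML::saveXML(doc, file = path) # serialise the XML to a file
  invisible(path)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
export_pmml(fit, file.path(tempdir(), "mpg_lm.pmml"))
```

The resulting file is plain XML, so any PMML-consuming scoring engine can read it without R being installed.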

R models have been exported into PMML and uploaded in ADAPA for many different purposes. Use cases where clients have used the flexibility of R to develop and the PMML standard combined with ADAPA to deploy range from financial applications (e.g., risk, compliance, fraud) to energy applications for the smart grid. The ability to easily transition solutions developed in R to the operational IT production environment helps eliminate the traditional limitations of R, e.g. performance for high volume or real-time transactional systems and memory constraints associated with large data sets.

Speaker Bio:

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language, which is the de facto standard used to represent predictive models. The book, entitled PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, is available on Amazon.com. As the Vice President of Analytics at Zementis, Inc., Dr. Guazzelli is responsible for developing core technology and analytical solutions under ADAPA, a PMML-based predictive decisioning platform that combines predictive analytics and business rules. ADAPA is the first system of its kind to be offered as a service on the cloud.
Prior to joining Zementis, Dr. Guazzelli was involved in not only building but also deploying predictive solutions for large financial and telecommunication institutions around the globe. In academia, Dr. Guazzelli worked with data mining, neural networks, expert systems and brain theory. His work in brain theory and computational neuroscience has appeared in many peer reviewed publications. At Zementis, Dr. Guazzelli and his team have been involved in a myriad of modeling projects for financial, health-care, gaming, chemical, and manufacturing industries.

Dr. Guazzelli holds a Ph.D. in Computer Science from the University of Southern California and an M.S. and B.S. in Computer Science from the Federal University of Rio Grande do Sul, Brazil.

RExcel: Updated

It was really nice to see the latest version of RExcel at http://rcom.univie.ac.at/, bundled together in an aptly named package called RAndFriends.

The look and feel of the package, as well as the ease of installation, are really professional. I also liked the commercial equivalent at http://www.statconn.com/

However, some old guardians and die-hards of the command line feel that a GUI is like putting lipstick on a pig; we respectfully demur.

What does RExcel do? For one, it can put the R Commander interface inside your Excel spreadsheet. That makes R easy to use through a familiar interface, even if you are new to R (assuming you have used some Excel).

Download the latest version here


This package will automatically install and configure:

  • R 2.11.1
  • rscproxy 1.3-1
  • rcom 2.2-1

It will also download and install a suitable version of the statconnDCOM server and of RExcel during installation. Therefore you will need a working Internet connection during the installation process.
This version of RAndFriends was created 20100516.

Download RAndFriendsSetup2111V3.1-5-1

We also give you information on how to download all sources for R and the R packages included in RAndFriends.

Also read a paper on R and SAS interoperability (using the Hmisc package from Dr. Harrell) at Holland Numerics.


Predictive Forecasting in Commercial Applications

Most organizations have a sales plan or forecast for the next year. This is done for internal planning as well as to give guidance to the financial analysts covering the company, if it is listed.

However, a lot of organizations use simplistic linear models based on one of the following:

1) Growth based on previous history (last year's sales × a growth factor, e.g. 10% growth in sales): the TIME SERIES APPROACH

2) Growth based on macroeconomic causal factors (e.g. the economy is in recession, hence sales will grow by 3%): the REGRESSION-BASED APPROACH

3) A consensus of industry factors (e.g. we have 10% spare capacity, so we will likely cut prices and see sales growth of 2% but profit growth of -3%): the DELPHI-BASED APPROACH (this is also based on bottom-up market feedback and top-down sales pressure).

A better approach is to combine all of these approaches, either in one model or across several models.

This can help build a much more robust forecasting model for an organization using nothing more than a simple combination of Excel cells.
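One simple way to combine the three approaches is to average their forecasts. The sketch below is an illustration under stated assumptions, not the post's model: each approach is reduced to a growth rate applied to last year's sales, the weights are equal by default, and the growth figures are the ones from the three examples above.

```r
# Weighted average of several growth-based forecasts.
# Equal weights are an assumption; pass your own to favour one approach.
combine_forecasts <- function(last_year_sales, growth,
                              weights = rep(1, length(growth))) {
  stopifnot(length(growth) == length(weights), all(weights >= 0))
  forecasts <- last_year_sales * (1 + growth)
  sum(weights * forecasts) / sum(weights)
}

# Growth rates from the time series, regression and Delphi examples
combine_forecasts(100, c(time_series = 0.10, regression = 0.03, delphi = 0.02))
# [1] 105
```

Unequal weights, for example c(1, 2, 1), let you lean on whichever approach has been most accurate for your business.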

The following model uses only seven factors and aims to be a stable forecast model that is relatively easy to understand.

Forecasted Sales for this quarter =

Historic Sales for this quarter last year * A1

+ Historic Average Sales for this quarter over the past three to five years (based on industry cycle ups and downs) * A2

+ Historic Sales for this quarter / Actual Sales of last quarter (for seasonal factors) * A3

+ Causal Factor 1 (e.g. outsourcing is likely to grow by 15% this year) * A4

+ Causal Factor 2 (foreign exchange movement, e.g. the dollar is likely to depreciate by 10%) * A5

+ Causal Factor 3 (e.g. our bench strength is likely to grow by 3% this quarter) * A6

+ Percentage Error Factor * A7 (there will always be a ±5 to 15% error in forecasts; capturing this error also provides a feedback loop for planning).

Here A1 to A7 are constants.
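In compact notation (the symbols are shorthand introduced here, not from the original post): \(S_q^{(y-1)}\) is sales for this quarter last year, \(\bar{S}_q\) the average sales for this quarter over the past three to five years, \(S_{q-1}\) actual sales last quarter, \(C_1, C_2, C_3\) the three causal factors and \(E\) the percentage error factor.

$$
\hat{S}_q = A_1\,S_q^{(y-1)} + A_2\,\bar{S}_q + A_3\,\frac{S_q^{(y-1)}}{S_{q-1}} + A_4\,C_1 + A_5\,C_2 + A_6\,C_3 + A_7\,E
$$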

To get actual values for A1 to A7, run a regression (using the Analysis ToolPak add-in from the Tools menu in Excel) on actual quarterly data for the past three years, keeping the last six months separate.

Then apply the fitted equation to the last two quarters and check the actual error. If the error exceeds your comfort level (±3% for critical industries and ±15% for harder-to-predict industries), iterate the last two steps until you get a good equation.
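The fit-then-check loop above can be sketched in R with lm() in place of the Excel regression tool (a swap of tools, not the post's own method). The column names are hypothetical stand-ins for the seven factors, the model has no intercept so it stays a pure weighted sum like the formula, and the last two quarters (six months) are held out to measure the percentage error against a tolerance.

```r
# Hypothetical column names standing in for the seven factors in the post
factor_cols <- c("sales_same_q_last_year", "avg_sales_same_q_3to5y",
                 "seasonal_ratio", "causal_1", "causal_2", "causal_3",
                 "error_factor")

# Estimate A1..A7 on all but the last `holdout` quarters, then check the
# percentage error on the held-out quarters against `tolerance`.
fit_seven_factor <- function(df, holdout = 2, tolerance = 0.15) {
  n <- nrow(df)
  train <- df[seq_len(n - holdout), ]
  test <- df[(n - holdout + 1):n, ]
  # No intercept: the model is a weighted sum of the seven factors only
  model <- lm(reformulate(factor_cols, response = "sales", intercept = FALSE),
              data = train)
  pred <- predict(model, newdata = test)
  pct_error <- (pred - test$sales) / test$sales
  list(model = model,
       A = coef(model),
       pct_error = unname(pct_error),
       within_tolerance = all(abs(pct_error) <= tolerance))
}
```

With a hypothetical data frame quarterly_sales (one row per quarter, three years of history), fit_seven_factor(quarterly_sales, tolerance = 0.03) applies the tighter limit for critical industries; if within_tolerance is FALSE, revise the factors and refit.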

Then substitute the constants into the seven-factor predictive model to build a simple and robust sales plan for this quarter.

Happy forecasting!