Avengers Review

Avengers is the big-ticket blockbuster which heralds summer just like the groundhog denotes spring. An ensemble cast (of superheroes and okay actors), it stars Hulk (angry green man aka Dr Bruce Banner / Mark Ruffalo), Iron Man (genius billionaire philanthropist playboy aka Tony Stark / Robert Downey Jr), Thor (an Australian-looking Chris H), Loki (God of Mischief, played by German-looking Tom Hiddleston), Captain America, Scarlett Johansson, Jeremy “Hurt Locker” Renner and Samuel L Jackson. You know something's gotta give if the list of A-list stars(?) in the cast is going to be longer than the plot summary.

Well, Loki the bad guy strikes a deal with some other bad guys of a funnily named world called Assguard (parallel universe!) and tries to find a cube (which is all-powerful energy, like the Transformers 1 Cube), and in return gets an army from the dark side (who look just like Cybertrons and Lord of the Rings orcs combined).

The Avengers, after much dilly-dallying, trying to emote, creating bromances and building up tension, in the end decide to give you what you came looking for: a visual feast of credible-looking CGI to counter the bad guys. The scene stealer is the Hulk. He is kind of cute for a big green guy. If you don't know what I mean, see the movie!

This is American cinema at its most profoundly intellectual since the Die Hard series, and it's quite entertaining, especially if you are a geeky comic book fan-boy (like me).

Summer is here and so are the super-heroes!! Unleash the popcorn.


Software Review- Google Drive versus Dropbox

Here are some notes from reviewing Google Drive vs Dropbox

1) Google Drive gives more free space upfront than Dropbox: 5 GB versus 2 GB.

2) Dropbox has a referral system (500 MB per referral), while there is no referral system for Google Drive.

3) The sync facility with Google Docs makes Google Drive especially useful for prior users of Google Docs.

4) API access to Google Drive is only for Chrome apps which is intriguing!

Apps will not have any API access to files unless users have first installed the app in Chrome Web Store.

You can use the Dropbox API much more easily –

See the platforms at

Choose your platform: iOS, Android, Python, Ruby


(though I wonder, if you set the R working directory to the local shared folder for Google Drive, whether it would sync up as well, but of course be slower)

5) The Google Drive icon is ugly (seriously, dude!), but the features in the Windows app are just the same as in the Dropbox app. Too similar 😉


6) Upgrade space is much cheaper on Google Drive than on Dropbox (the 100 GB plan on Google Drive costs about a quarter of the Dropbox price, and the maximum storage is 16 times as much). This will affect power storage users. I expect to see some slowdown in new business for Dropbox unless Google Drive has an outage (like Gmail). Existing Dropbox users probably won't shift for the small dollar amount, though it is quite easy to do so.


Install Google Drive on your local workstation and cut and paste your Dropbox local folder to the Google Drive local folder!!

7) Dropbox deserves credit for being first (like Hotmail and AOL), but Google Drive is better in almost all respects!

Google Drive

5 GB of Drive (0% used)
10 GB of Gmail (48% used)
1 GB of Picasa (0% used)


25 GB
$2.49/month
+25 GB for Drive and Picasa
Bonus: Your Gmail storage will be upgraded to 25 GB.

100 GB
$4.99/month
+100 GB for Drive and Picasa
Bonus: Your Gmail storage will be upgraded to 25 GB.

Need more storage? Up to 16 TB available.


Dropbox

Current account type: Free
Up to 18 GB (2 GB + 500 MB per referral)

Other account types:

Pro 50 (50 GB)
+1 GB per referral, up to +32 GB
$9.99/month or $99.00/year

Pro 100 (100 GB)
+1 GB per referral, up to +32 GB
$19.99/month or $199.00/year

Dropbox for Teams (1 TB+)
Plans starting at 1 TB
Large shared quota, centralized admin and billing, and more!
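
To check the price comparison in point 6, here is a small sketch that works out the monthly cost per GB from the plan prices listed above. The dictionary and function names are my own, and only the tiers quoted in this post are included.

```python
# Monthly prices (USD) keyed by storage (GB), copied from the plan lists above.
google_drive = {25: 2.49, 100: 4.99}
dropbox = {50: 9.99, 100: 19.99}


def price_per_gb(plans):
    """Return {storage_gb: monthly dollars per GB} for each plan."""
    return {gb: price / gb for gb, price in plans.items()}


gd = price_per_gb(google_drive)
db = price_per_gb(dropbox)

# Compare the one tier both services offer: 100 GB.
ratio_100 = google_drive[100] / dropbox[100]

print(f"Google Drive 100 GB: ${gd[100]:.4f} per GB per month")
print(f"Dropbox 100 GB:      ${db[100]:.4f} per GB per month")
print(f"Google/Dropbox price ratio at 100 GB: {ratio_100:.2f}")
```

At the 100 GB tier the ratio rounds to 0.25, which is where the "a quarter of the price" figure comes from.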




Software Review- Machine Learning meets the Cloud

I had a chance to dekko the new startup BigML and was suitably impressed by the briefing and my own puttering around the site. Here is my review-

1) The website is very intuitively designed. You can create a dataset from an uploaded file in one click and you can create a decision tree model in one click as well. I wish other cloud computing websites like Google Prediction API made their design as intuitive and easy to understand. Also, unlike Google Prediction API, the models are not black-box models, but have a description which can be understood.

2) It includes some well-known data sources for people trying it out. They were kind enough to offer 5 invite codes for readers of Decisionstats (if you want to check it yourself, use the codes below the post; note they are one-time only, so the first five get the invites).

BigML is still invite-only but plans to get into open release soon.

3) Data sources can only be created by uploading files (CSV), but they plan to change this, hopefully to get data from buckets (S3? or Google?) and from URLs.

4) The one-click operation to convert a data source into a dataset shows a histogram (distribution) of individual variables. The back end is Clojure, because, as the team explained, it made the easiest sense and fit with Java. The good news (?) is you would never see the Clojure code at the back end. You can read about it from

As cloud computing takes off (someday) I expect Clojure's popularity to take off as well.

Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR, and JavaScript). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp


5) As of now decision trees are the only distributed algorithm, but they expect to roll out other machine learning stuff soon. Hopefully this includes regression (logit and linear) and k-means clustering. The trees are created and pruned in real time, which gives a slightly animated (and impressive) effect. And yes, model building is a one-click operation.

The real-time, live pruning is really impressive, and I wonder whether and how it could ever be replicated in desktop-based software, because of its sheer interactive nature.


Making the model is just half the work. Creating predictions and scoring the model is what is really the money-earner. It is one click and customization is quite intuitive. It is not quite PMML compliant yet so I hope some Zemanta like functionality can be added so huge amounts of models can be applied to predictions or score data in real time.


If you are a developer/data hacker, you should check out this section too. It is quite impressive that the designers of BigML have planned for API access so early. The API gives you:

  • Secure programmatic access to all your BigML resources.
  • Fully white-box access to your datasets and models.
  • Asynchronous creation of datasets and models.
  • Near real-time predictions.


Note: For your convenience, some of the snippets below include your real username and API key.

Please keep them secret.

The REST API conforms to the design principles of Representational State Transfer (REST) and is entirely HTTP-based. It gives you access to four basic resources: Source, Dataset, Model and Prediction. You can create, read, update, and delete resources using the respective standard HTTP methods: POST, GET, PUT and DELETE.

All communication with the API is JSON formatted except for source creation. Source creation is handled with an HTTP PUT using the “multipart/form-data” content-type.

All access to the API must be performed over HTTPS.
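
To make the mapping between operations and HTTP methods concrete, here is a minimal Python sketch that only builds requests and never sends them. The base URL, the URL layout, and the payload field are placeholders I made up for illustration, not the actual BigML endpoints, and authentication is left out because the post does not describe it. Source creation is also not covered, since it uses multipart/form-data rather than JSON.

```python
import json
import urllib.request

# Hypothetical placeholder; the real base URL and credentials are not given here.
BASE_URL = "https://api.example.invalid"

# Operation -> standard HTTP method, as described above.
HTTP_METHOD = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}

RESOURCES = {"source", "dataset", "model", "prediction"}


def build_request(operation, resource, resource_id=None, payload=None):
    """Build (but do not send) an HTTPS request for one of the four resources."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    url = f"{BASE_URL}/{resource}"
    if resource_id is not None:
        url += f"/{resource_id}"
    # JSON-encode the body only when there is one.
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        url, data=data, headers=headers, method=HTTP_METHOD[operation]
    )


# Hypothetical payload: create a dataset from an already uploaded source.
req = build_request("create", "dataset", payload={"source": "source/123"})
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it, but that needs a real endpoint and credentials.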

(I think an R package which uses JSON and RCurl would further help in enhancing ease of use.)



Overall, a welcome addition that makes software in the realm of cloud computing and statistical computation/business analytics both easy to use and easy to deploy, with fail-safe mechanisms built in.

Check it out for yourself.

The invite codes are here -one time use only- first five get the invites- so click and try your luck, machine learning on the cloud.

If you don't get an invite (or it is already used), just leave your email there and wait a couple of days to get approval.


Oracle R Updated!

Interesting message from the latest R blog



Oracle just released the latest update to Oracle R Enterprise, version 1.1. This release includes the Oracle R Distribution (based on open source R, version 2.13.2), an improved server installation, and much more.  The key new features include:

  • Extended Server Support: New support for Windows 32 and 64-bit server components, as well as continuing support for Linux 64-bit server components
  • Improved Installation: Linux 64-bit server installation now provides robust status updates and prerequisite checks
  • Performance Improvements: Improved performance for embedded R script execution calculations

In addition, the updated ROracle package, which is used with Oracle R Enterprise, now reads date data by conversion to character strings.

We encourage you to download Oracle software for evaluation from the Oracle Technology Network. See these links for R-related software: Oracle R Distribution, Oracle R Enterprise, ROracle, Oracle R Connector for Hadoop.  As always, we welcome comments and questions on the Oracle R Forum.



Oracle R Distribution 2-13.2 Update Available

Oracle has released an update to the Oracle R Distribution, an Oracle-supported distribution of open source R. Oracle R Distribution 2-13.2 now contains the ability to dynamically link the following libraries on both Windows and Linux:

  • The Intel Math Kernel Library (MKL) on Intel chips
  • The AMD Core Math Library (ACML) on AMD chips


To take advantage of the performance enhancements provided by Intel MKL or AMD ACML in Oracle R Distribution, simply add the MKL or ACML shared library directory to the LD_LIBRARY_PATH system environment variable. This automatically enables MKL or ACML to make use of all available processors, vastly speeding up linear algebra computations and eliminating the need to recompile R.  Even on a single core, the optimized algorithms in the Intel MKL libraries are faster than using R’s standard BLAS library.
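
As a rough illustration of that step, the sketch below prepends a library directory to LD_LIBRARY_PATH in a copy of the environment. The directory shown is a hypothetical install location, not something from the Oracle announcement, and the helper name is my own.

```python
import os


def with_library_dir(env, lib_dir):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is a Linux convention, so entries are colon-separated.
    new_env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return new_env


# Hypothetical path; use the directory holding the MKL or ACML shared
# libraries on your own machine.
MKL_DIR = "/opt/intel/mkl/lib/intel64"

env = with_library_dir(os.environ, MKL_DIR)
print(env["LD_LIBRARY_PATH"])

# To start R with this environment (not run here):
#   import subprocess
#   subprocess.run(["R", "--vanilla"], env=env)
```

The original environment is left untouched, so the change only applies to processes started with the returned copy.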

Open-source R is linked to NetLib’s BLAS libraries, but they are not multi-threaded and only use one core. While R’s internal BLAS are efficient for most computations, it’s possible to recompile R to link to a different, multi-threaded BLAS library to improve performance on eligible calculations. Compiling and linking to R yourself can be involved, but for many, the significantly improved calculation speed justifies the effort. Oracle R Distribution notably simplifies the process of using external math libraries by enabling R to auto-load MKL or ACML. For R commands that don’t link to BLAS code, taking advantage of database parallelism using embedded R execution in Oracle R Enterprise is the route to improved performance.

For more information about rebuilding R with different BLAS libraries, see the linear algebra section in the R Installation and Administration manual. As always, the Oracle R Distribution is available as a free download to anyone. Questions and comments are welcome on the Oracle R Forum.

New Economics Theories for the new Tech World

When I was doing my MBA (a decade ago), one of the principal theories on why corporations exist was 1) Shareholder Value creation (grow wealth for investors) and a notable second was 2) Stakeholder Value creation- creating jobs for societies, providing tax to countries, providing employees with stable employment and incentives,  and of course creating monetary value for shareholders.

There were two ways you could raise money: debt or equity. Debt had the advantage of interest payments being tax deductible, but debt payments had to be met regularly. Equity had the advantage (for the issuer) that equity holders were the last ones to be paid if the company closed down, which is also why the required rate of return on equity is generally higher than the cost of debt. Dividend payouts to stockholders could be deferred in a low-revenue year or for planning reasons.

Or in plain English, over the long term raising money from shareholders in exchange for stock was more expensive than selling bonds or borrowing from the banks.

Hybrid combinations of debt and equity were warrants and debentures that started off as one form of instrument and over a period of time gave much more flexibility and risk safety nets to both issuers and subscribers of capital. Another hybrid was stock options (now considered as a default option of rewarding employees in technology companies, but this was not always the case).

The use of call and put options in debentures, and the idea of a vesting period in stock options, was to promote long-term stability and minimize fluctuations in stock prices and employee attrition, besides of course minimizing the weighted average cost of capital. Venture capital was another class of capital, known for both huge rates of return and risk taking (?).
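
For readers who have not met it, the weighted average cost of capital mentioned above is usually written as WACC = (E/V) x cost of equity + (D/V) x cost of debt x (1 - tax rate), where V = E + D. Here is a small sketch of that textbook formula; the numbers are purely illustrative and not taken from any real company.

```python
def after_tax_cost_of_debt(cost_of_debt, tax_rate):
    """Interest is tax deductible, so the effective cost is reduced by the tax rate."""
    return cost_of_debt * (1.0 - tax_rate)


def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital for a firm funded by equity and debt."""
    total = equity + debt
    return (equity / total) * cost_of_equity + (debt / total) * after_tax_cost_of_debt(
        cost_of_debt, tax_rate
    )


# Illustrative inputs: 60% equity at 12%, 40% debt at 6%, 30% tax rate.
example = wacc(equity=60.0, debt=40.0, cost_of_equity=0.12, cost_of_debt=0.06, tax_rate=0.30)
print(f"WACC: {example:.4f}")
```

With these inputs the debt costs only 4.2% after tax against 12% for equity, which is the tax-shield argument in numbers.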

But in today’s world, where a Google has three classes of shares, companies trade shares before IPOs, and valuations of technology companies sink and rise by huge percentages over weeks (especially as they near IPO dates), I wonder if traditional theories in finance need a much stronger overhaul.

Or do markets need a regulatory overhaul that would enable stock exchanges to have once more the credibility they had as the primary sources of raising capital?


Who will guard the guardians? Their conscience- the regulators or the news media?

There are ways of raising money that are not evil.

But they are not perfectly fair either.

Easter Eggs in #Rstats



A virtual Easter egg is an intentional hidden message, in-joke, or feature in a work such as a computer program, web page, video game, movie, book, or crossword. The term was coined, according to Warren Robinett, by Atari after they were pointed to the secret message left by Robinett in the game Adventure. It draws a parallel with the custom of the Easter egg hunt observed in many Western nations as well as the last Russian imperial family’s tradition of giving elaborately jeweled egg-shaped creations by Carl Fabergé which contained hidden surprises.

In R.


I like this

just type


and these two

on 32 bit R type


and on any version try four question marks

Perhaps the prettiest eggs are the demos in animation package.

But there is magic in asking for help on internal functions in R

Just type-


and you get the sobering thought that you probably are an R Muggle

Call an Internal Function


.Internal performs a call to an internal code which is built in to the R interpreter.

Only true R wizards should even consider using this function, and only R developers can add to the list of internal functions.




call a call expression

See Also

.Primitive, .External (the nearest equivalent available to users).

I liked that I could see the actual internal functions in svn at

The opening of the internals document floored me.

It must have been a curious year in 2003-4 when the copyright of R was held (briefly it seems) by the R Foundation and also by the R Development Core Team. (which sounds better?)

*  R : A Computer Language for Statistical Data Analysis
 *  Copyright (C) 1995, 1996  Robert Gentleman and Ross Ihaka
 *  Copyright (C) 1997--2012  The R Development Core Team
 *  Copyright (C) 2003, 2004  The R Foundation

My contribution

R help discourages for loops

Try ??for or ?for

you go into a loop till you hit escape

If you want more-just write
 .Internal(inspect(ls())) at the end of your  R program.