Product Review – Revolution R 5.0

So I got the email from Revolution R. Version 5.0 is ready for download, and unlike half hearted attempts by many software companies they make it easy for the academics and researchers to get their free copy. Free as in speech and free as in beer.

Some thoughts-

1) R ‘s memory problem is now an issue of marketing and branding. Revolution Analytics has definitely bridged this gap technically  beautifully and I quote from their documentation-

The primary advantage 64-bit architectures bring to R is an increase in the amount of memory available to a given R process.
The first benefit of that increase is an increase in the size of data objects you can create. For example, on most 32-bit versions of R, the largest data object you can create is roughly 3GB; attempts to create 4GB objects result in errors with the message “cannot allocate vector of length xxxx.”
On 64-bit versions of R, you can generally create larger data objects, up to R’s current hard limit of 231 􀀀 1 elements in a vector (about 2 billion elements). The functions memory.size and memory.limit help you manage the memory used byWindows versions of R.
In 64-bit Revolution R Enterprise, R sets the memory limit by default to the amount of physical RAM minus half a gigabyte, so that, for example, on a machine with 8GB of RAM, the default memory limit is 7.5GB:

2) The User Interface is best shown as below or at  https://docs.google.com/presentation/pub?id=1V_G7r0aBR3I5SktSOenhnhuqkHThne6fMxly_-4i8Ag&start=false&loop=false&delayms=3000

-(but I am still hoping for the GUI ,Revolution Analytics promised us for Christmas)

3) The partnership with Microsoft HPC is quite awesome given Microsoft’s track record in enterprise software penetration

but I am also interested in knowing more about the Oracle version of R and what it will do there.

Funny Economics

Some wry observations from me  on the world on economics-

1) 150 years after humiliating their country in the Opium Wars, Chinese mandarins have somehow convinced their leaders and military to park 2 trillion assets in Anglo Saxon debt. If Greece geting a 50% discount on its loan is the new precedent, when will the USA force its lendors to the negotiation table.

2) Income inequality and protests are something the Arabs and Israelis have in common. Besides being the sons of Abraham of course. Note the Persians are not considered the same as Arabs.

3) Advance knowledge of geo-political events can and ensures Western financial dealers have an edge on the sovereign funds in the other hemisphere.  What used to be the playgrounds of Eton has now shifted to the pubs of Boston and So Cal.

4) After spending 1 trillion USD on arms in the past one decade (funded by guys in item 1), the United States military forces is in a much better more advanced position to wage simultaneous war.

5) Can a war in Korean peninsula affect war in the Persian sphere of influence. Just follow the money , baby.

6) Saudi Wahabis continue to fund terror despite losing a lot of money in the economic meltdown in past few years. For every 1 $ increase in Saudi oil revenue, western oil companies ,traders, financiers make more, much more.

7) Demographics is an important key to economics. An aging Japan, and stagnant West is one cause to shift from manpower intensive warfare to cyber warfare. Plus Cyber warfare is good business . Underpopulated Russia and Arabs continue to lack true economic potential.

8) There are new economic incentives to develop tools to disseminate as well as distort information flow in real time in a hyper connected digital world.

 

Kindle as a Tablet for 109$ and how to order it if you in India

I was just blown away by the price and functionality of the Kindle, including the browser and in built Wi-Fi- ( though the 40$ leather bag was a bit sneaky as an accessory, I mean seriously, dude)

And unlike some media technology companies (like Hulu,Spotify , even some Youtube channels)

who offer products to Asia only after a delayed lag, it is just as easy to order Kindle sitting from India.

Thank you Amazon!

 

and lastly some art to help prod those people who offer beta sites for limited countries  even in this age.

Credit- Paul Mutant

http://www.flickr.com/photos/paulmutant/4992725876/

Global Warfare on Google Plus

Global Warfare is one of the latest games on Google Plus. There are lots of similarities between this game and Evony at http://evony.com

Global Warfare is made by Kabam https://www.kabam.com/games/global-warfare which is making a total of 3 games for Google Plus (out of 18) and it has Google Ventures as a strategic investor as well (and a member on the board). Google is clearly wanting to bet on online gaming with its earlier strategic investment in Zynga as well. It also acquired http://www.labpixies.com/  (which makes the game Sudoko Puzzles and Flood It but it has more games in reserve as can be seen at https://market.android.com/search?q=labpixies, so clearly G+ is being selective on Games directory at https://plus.google.com/games/directory)

With these gaming companies and others like http://www.digitalchocolate.com/about/ and http://www.rovio.com/index.php?page=company and http://www.popcap.com/ – well they are all there on G+

is gaming the ace in hand in G+ plans for Facebook- time will tell.

Evony of course was a very good game, as it was also very similar (allegedly) to Civilization, and though its advertising campaign of semi clad characters draws flak, it got the worlds attention and recall. While Evony was situated in medieval world,  Global Warfare is a modern warfare equivalent.

Features in Global Warfare-

  • Alliances,
  • multiple player online gaming,
  • social sharing and rewards,
  • in game purchases,
  •  persistent world

Some drawbacks-

  • Slight clutter in gaming space (and lack of nice fonts!)
  • Lack of help forums (or easy availability)
  • Lack of in game search for searching or navigating alliances
Overall- a nice addition to the G+ family of games

 

Oracle adds R to Big Data Appliance -Use #Rstats

From the press release, Oracle gets on R and me too- NoSQL

http://www.oracle.com/us/corporate/press/512001

The Oracle Big Data Appliance is a new engineered system that includes an open source distribution of Apache™ Hadoop™, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and an open source distribution of R.

From

http://www.theregister.co.uk/2011/10/03/oracle_big_data_appliance/

the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.

This approach to R-Hadoop integration is different from that announced last week between Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as their competitor, which was no doubt no surprise to either.

In any event, the way they do it, the R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can do analysis on individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; they have no idea that they are using MapReduce on unstructured data.

Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers

From

http://www.oracle.com/us/corporate/features/feature-oracle-nosql-database-505146.html

A Horizontally Scaled, Key-Value Database for the Enterprise
Oracle NoSQL Database is a commercial grade, general-purpose NoSQL database using a key/value paradigm. It allows you to manage massive quantities of data, cope with changing data formats, and submit simple queries. Complex queries are supported using Hadoop or Oracle Database operating upon Oracle NoSQL Database data.

Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple programming model. It scales horizontally to hundreds of nodes with high availability and transparent load balancing. Customers might choose Oracle NoSQL Database to support Web applications, acquire sensor data, scale authentication services, or support online serves and social media.

and

from

http://siliconangle.com/blog/2011/09/30/oracle-adopting-open-source-r-to-connect-legacy-systems/

Oracle says it will integrate R with its Oracle Database. Other signs from Oracle show the deeper interest in using the statistical framework for integration with Hadoop to potentially speed statistical analysis. This has particular value with analyzing vast amounts of unstructured data, which has overwhelmed organizations, especially over the past year.

and

from

http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html

Oracle R Enterprise

Integrates the Open-Source Statistical Environment R with Oracle Database 11g
Oracle R Enterprise allows analysts and statisticians to run existing R applications and use the R client directly against data stored in Oracle Database 11g—vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready, deeply integrated environment for advanced analytics. Users can also use analytical sandboxes, where they can analyze data and develop R scripts for deployment while results stay managed inside Oracle Database.

Use R for Business- Competition worth $ 20,000 #rstats

All you contest junkies, R lovers and general change the world people, here’s a new contest to use R in a business application

http://www.revolutionanalytics.com/news-events/news-room/2011/revolution-analytics-launches-applications-of-r-in-business-contest.php

REVOLUTION ANALYTICS LAUNCHES “APPLICATIONS OF R IN BUSINESS” CONTEST

$20,000 in Prizes for Users Solving Business Problems with R

 

PALO ALTO, Calif. – September 1, 2011 – Revolution Analytics, the leading commercial provider of R software, services and support, today announced the launch of its “Applications of R in Business” contest to demonstrate real-world uses of applying R to business problems. The competition is open to all R users worldwide and submissions will be accepted through October 31. The Grand Prize winner for the best application using R or Revolution R will receive $10,000.

The bonus-prize winner for the best application using features unique to Revolution R Enterprise – such as itsbig-data analytics capabilities or its Web Services API for R – will receive $5,000. A panel of independent judges drawn from the R and business community will select the grand and bonus prize winners. Revolution Analytics will present five honorable mention prize winners each with $1,000.

“We’ve designed this contest to highlight the most interesting use cases of applying R and Revolution R to solving key business problems, such as Big Data,” said Jeff Erhardt, COO of Revolution Analytics. “The ability to process higher-volume datasets will continue to be a critical need and we encourage the submission of applications using large datasets. Our goal is to grow the collection of online materials describing how to use R for business applications so our customers can better leverage Big Analytics to meet their analytical and organizational needs.”

To enter Revolution Analytics’ “Applications of R in Business” competition Continue reading “Use R for Business- Competition worth $ 20,000 #rstats”

Interview Dan Steinberg Founder Salford Systems

Here is an interview with Dan Steinberg, Founder and President of Salford Systems (http://www.salford-systems.com/ )

Ajay- Describe your journey from academia to technology entrepreneurship. What are the key milestones or turning points that you remember.

 Dan- When I was in graduate school studying econometrics at Harvard,  a number of distinguished professors at Harvard (and MIT) were actively involved in substantial real world activities.  Professors that I interacted with, or studied with, or whose software I used became involved in the creation of such companies as Sun Microsystems, Data Resources, Inc. or were heavily involved in business consulting through their own companies or other influential consultants.  Some not involved in private sector consulting took on substantial roles in government such as membership on the President’s Council of Economic Advisors. The atmosphere was one that encouraged free movement between academia and the private sector so the idea of forming a consulting and software company was quite natural and did not seem in any way inconsistent with being devoted to the advancement of science.

 Ajay- What are the latest products by Salford Systems? Any future product plans or modification to work on Big Data analytics, mobile computing and cloud computing.

 Dan- Our central set of data mining technologies are CART, MARS, TreeNet, RandomForests, and PRIM, and we have always maintained feature rich logistic regression and linear regression modules. In our latest release scheduled for January 2012 we will be including a new data mining approach to linear and logistic regression allowing for the rapid processing of massive numbers of predictors (e.g., one million columns), with powerful predictor selection and coefficient shrinkage. The new methods allow not only classic techniques such as ridge and lasso regression, but also sub-lasso model sizes. Clear tradeoff diagrams between model complexity (number of predictors) and predictive accuracy allow the modeler to select an ideal balance suitable for their requirements.

The new version of our data mining suite, Salford Predictive Modeler (SPM), also includes two important extensions to the boosted tree technology at the heart of TreeNet.  The first, Importance Sampled learning Ensembles (ISLE), is used for the compression of TreeNet tree ensembles. Starting with, say, a 1,000 tree ensemble, the ISLE compression might well reduce this down to 200 reweighted trees. Such compression will be valuable when models need to be executed in real time. The compression rate is always under the modeler’s control, meaning that if a deployed model may only contain, say, 30 trees, then the compression will deliver an optimal 30-tree weighted ensemble. Needless to say, compression of tree ensembles should be expected to be lossy and how much accuracy is lost when extreme compression is desired will vary from case to case. Prior to ISLE, practitioners have simply truncated the ensemble to the maximum allowable size.  The new methodology will substantially outperform truncation.

The second major advance is RULEFIT, a rule extraction engine that starts with a TreeNet model and decomposes it into the most interesting and predictive rules. RULEFIT is also a tree ensemble post-processor and offers the possibility of improving on the original TreeNet predictive performance. One can think of the rule extraction as an alternative way to explain and interpret an otherwise complex multi-tree model. The rules extracted are similar conceptually to the terminal nodes of a CART tree but the various rules will not refer to mutually exclusive regions of the data.

 Ajay- You have led teams that have won multiple data mining competitions. What are some of your favorite techniques or approaches to a data mining problem.

 Dan- We only enter competitions involving problems for which our technology is suitable, generally, classification and regression. In these areas, we are  partial to TreeNet because it is such a capable and robust learning machine. However, we always find great value in analyzing many aspects of a data set with CART, especially when we require a compact and easy to understand story about the data. CART is exceptionally well suited to the discovery of errors in data, often revealing errors created by the competition organizers themselves. More than once, our reports of data problems have been responsible for the competition organizer’s decision to issue a corrected version of the data and we have been the only group to discover the problem.

In general, tackling a data mining competition is no different than tackling any analytical challenge. You must start with a solid conceptual grasp of the problem and the actual objectives, and the nature and limitations of the data. Following that comes feature extraction, the selection of a modeling strategy (or strategies), and then extensive experimentation to learn what works best.

 Ajay- I know you have created your own software. But are there other software that you use or liked to use?

 Dan- For analytics we frequently test open source software to make sure that our tools will in fact deliver the superior performance we advertise. In general, if a problem clearly requires technology other than that offered by Salford, we advise clients to seek other consultants expert in that other technology.

 Ajay- Your software is installed at 3500 sites including 400 universities as per http://www.salford-systems.com/company/aboutus/index.html What is the key to managing and keeping so many customers happy?

 Dan- First, we have taken great pains to make our software reliable and we make every effort  to avoid problems related to bugs.  Our testing procedures are extensive and we have experts dedicated to stress-testing software . Second, our interface is designed to be natural, intuitive, and easy to use, so the challenges to the new user are minimized. Also, clear documentation, help files, and training videos round out how we allow the user to look after themselves. Should a client need to contact us we try to achieve 24-hour turn around on tech support issues and monitor all tech support activity to ensure timeliness, accuracy, and helpfulness of our responses. WebEx/GotoMeeting and other internet based contact permit real time interaction.

 Ajay- What do you do to relax and unwind?

 Dan- I am in the gym almost every day combining weight and cardio training. No matter how tired I am before the workout I always come out energized so locating a good gym during my extensive travels is a must. I am also actively learning Portuguese so I look to watch a Brazilian TV show or Portuguese dubbed movie when I have time; I almost never watch any form of video unless it is available in Portuguese.

 Biography-

http://www.salford-systems.com/blog/dan-steinberg.html

Dan Steinberg, President and Founder of Salford Systems, is a well-respected member of the statistics and econometrics communities. In 1992, he developed the first PC-based implementation of the original CART procedure, working in concert with Leo Breiman, Richard Olshen, Charles Stone and Jerome Friedman. In addition, he has provided consulting services on a number of biomedical and market research projects, which have sparked further innovations in the CART program and methodology.

Dr. Steinberg received his Ph.D. in Economics from Harvard University, and has given full day presentations on data mining for the American Marketing Association, the Direct Marketing Association and the American Statistical Association. After earning a PhD in Econometrics at Harvard Steinberg began his professional career as a Member of the Technical Staff at Bell Labs, Murray Hill, and then as Assistant Professor of Economics at the University of California, San Diego. A book he co-authored on Classification and Regression Trees was awarded the 1999 Nikkei Quality Control Literature Prize in Japan for excellence in statistical literature promoting the improvement of industrial quality control and management.

His consulting experience at Salford Systems has included complex modeling projects for major banks worldwide, including Citibank, Chase, American Express, Credit Suisse, and has included projects in Europe, Australia, New Zealand, Malaysia, Korea, Japan and Brazil. Steinberg led the teams that won first place awards in the KDDCup 2000, and the 2002 Duke/TeraData Churn modeling competition, and the teams that won awards in the PAKDD competitions of 2006 and 2007. He has published papers in economics, econometrics, computer science journals, and contributes actively to the ongoing research and development at Salford.