Software Lawsuits :Ergo

The latest round of software lawsuits makes things more interesting especially for Google. There are two notable developments

1) Google’s pact with Verizon for Even more Open Internet -From

http://googlepublicpolicy.blogspot.com/2010/08/joint-policy-proposal-for-open-internet.html

A provider that offers a broadband Internet access service
complying with the above principles could offer any other additional or differentiated services. Such other services would have to be distinguishable in scope and purpose from broadband . Internet access service, but could make use of or access Internet content, applications or services
and could include traffic prioritization.

2) Oracle’s lawsuit against Google for Intellectual Property enforcement of Java for Android. ( read here http://news.cnet.com/8301-30685_3-20013549-264.html

I once joked about nothing remains cool forever not even Google (see https://decisionstats.wordpress.com/2008/08/05/11-ways-to-beat-up-google/ ) and I did not foresee the big G beating itself into knots on its own.

It is hard to sympathize with Google (or Oracle or Verizon) but this is a mess that is created when lawyers (with a briefcase) steal value rather than a thousand engineers can create value.

Interestingly Google owns the IP for Map Reduce – so could it itself sue the Hadoop community over terms of royalty someday-like Oracle did with Java- hmmmmm interesting revenue stream

All in all I would be happy to see zero tiers on an internet (wireless or wired) and even Java developers to make some money on writing code. Open source is not free source.

Big Data and R: New Product Release by Revolution Analytics

Press Release by the Guys in Revolution Analytics- this time claiming to enable terabyte level analytics with R. Interesting stuff but techie details are awaited.

Revolution Analytics Brings

Big Data Analysis to R

The world’s most powerful statistics language can now tackle terabyte-class data sets using

Revolution R Enterpriseat a fraction of the cost of legacy analytics products


JSM 2010 – VANCOUVER (August 3, 2010) — Revolution Analytics today introduced ‘Big Data’ analysis to its Revolution R Enterprise software, taking the popular R statistics language to unprecedented new levels of capacity and performance for analyzing very large data sets. For the first time, R users will be able to process, visualize and model terabyte-class data sets in a fraction of the time of legacy products—without employing expensive or specialized hardware.

The new version of Revolution R Enterprise introduces an add-on package called RevoScaleR that provides a new framework for fast and efficient multi-core processing of large data sets. It includes:

  • The XDF file format, a new binary ‘Big Data’ file format with an interface to the R language that provides high-speed access to arbitrary rows, blocks and columns of data.
  • A collection of widely-used statistical algorithms optimized for Big Data, including high-performance implementations of Summary Statistics, Linear Regression, Binomial Logistic Regressionand Crosstabs—with more to be added in the near future.
  • Data Reading & Transformation tools that allow users to interactively explore and prepare large data sets for analysis.
  • Extensibility, expert R users can develop and extend their own statistical algorithms to take advantage of Revolution R Enterprise’s new speed and scalability capabilities.

“The R language’s inherent power and extensibility has driven its explosive adoption as the modern system for predictive analytics,” said Norman H. Nie, president and CEO of Revolution Analytics. “We believe that this new Big Data scalability will help R transition from an amazing research and prototyping tool to a production-ready platform for enterprise applications such as quantitative finance and risk management, social media, bioinformatics and telecommunications data analysis.”

Sage Bionetworks is the nonprofit force behind the open-source collaborative effort, Sage Commons, a place where data and disease models can be shared by scientists to better understand disease biology. David Henderson, Director of Scientific Computing at Sage, commented: “At Sage Bionetworks, we need to analyze genomic databases hundreds of gigabytes in size with R. We’re looking forward to using the high-speed data-analysis features of RevoScaleR to dramatically reduce the times it takes us to process these data sets.”

Take Hadoop and Other Big Data Sources to the Next Level

Revolution R Enterprise fits well within the modern ‘Big Data’ architecture by leveraging popular sources such as Hadoop, NoSQL or key value databases, relational databases and data warehouses. These products can be used to store, regularize and do basic manipulation on very large datasets—while Revolution R Enterprise now provides advanced analytics at unparalleled speed and scale: producing speed on speed.

“Together, Hadoop and R can store and analyze massive, complex data,” said Saptarshi Guha, developer of the popular RHIPE R package that integrates the Hadoop framework with R in an automatically distributed computing environment. “Employing the new capabilities of Revolution R Enterprise, we will be able to go even further and compute Big Data regressions and more.”

Platforms and Availability

The new RevoScaleR package will be delivered as part of Revolution R Enterprise 4.0, which will be available for 32-and 64-bit Microsoft Windows in the next 30 days. Support for Red Hat Enterprise Linux (RHEL 5) is planned for later this year.

On its website (http://www.revolutionanalytics.com/bigdata), Revolution Analytics has published performance and scalability benchmarks for Revolution R Enterprise analyzing a 13.2 gigabyte data set of commercial airline information containing more than 123 million rows, and 29 columns.

Additionally, the company will showcase its new Big Data solution in a free webinar on August 25 at 9:00 a.m. Pacific.

Additional Resources

•      Big Data Benchmark whitepaper

•      The Revolution Analytics Roadmap whitepaper

•      Revolutions Blog

•      Download free academic copy of Revolution R Enterprise

•      Visit Inside-R.org for the most comprehensive set of information on R

•      Spread the word: Add a “Download R!” badge on your website

•      Follow @RevolutionR on Twitter

About Revolution Analytics

Revolution Analytics (http://www.revolutionanalytics.com) is the leading commercial provider of software and support for the popular open source R statistics language. Its Revolution R products help make predictive analytics accessible to every type of user and budget. The company is headquartered in Palo Alto, Calif. and backed by North Bridge Venture Partners and Intel Capital.

Media Contact

Chantal Yang
Page One PR, for Revolution Analytics
Tel: +1 415-875-7494

Email:  revolution@pageonepr.com

R Oracle Data Mining

Here is a new package called R ODM and it is an interface to do Data Mining via Oracle Tables through R. You can read more here http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html and here http://cran.fhcrc.org/web/packages/RODM/RODM.pdf . Also there is a contest for creative use of R and ODM.

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

  • Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
  • Scripting of “production” data mining methodologies
  • Customizing graphics of ODM data mining results (examples: classificationregressionanomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment’s Comprehensive R Archive Network ( CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

and

Present and win an Apple iPod Touch!
The BI, Warehousing and Analytics (BIWA) SIG is giving an Apple iPOD Touch to the best new presenter. Be part of the TechCast series and get a chance to win!

Consider highlighting a creative use of R and ODM.

BIWA invites all Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) to submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community in our Wednesday TechCast series. Note that the contest is limited to new presenters to encourage fresh participation by the BIWA community.

Also an interview with Oracle Data Mining head, Charlie Berger https://decisionstats.wordpress.com/2009/09/02/oracle/

Open Source and Software Strategy

Curt Monash at Monash Research pointed out some ongoing open source GPL issues for WordPress and the Thesis issue (Also see http://ma.tt/2009/04/oracle-and-open-source/ and  http://www.mattcutts.com/blog/switching-things-around/).

As a user of both going upwards of 2 years- I believe open source and GPL license enforcement are general parts of software strategy of most software companies nowadays. Some thoughts on  open source and software strategy-Thesis remains a very very popular theme and has earned upwards of 100,000 $ for its creator (estimate based on 20k plus installs and 60$ avg price)

  • Little guys like to give away code to get some satisfaction/ recognition, big guys give away free code only when its necessary or when they are not making money in that product segment anyway.
  • As Ethan Hunt said, ” Every Hero needs a Villian”. Every software (market share) war between players needs One Big Company Holding more market share and Open Source Strategy between other player who is not able to create in house code, so effectively out sources by creating open source project. But same open source propent rarely gives away the secret to its own money making project.
    • Examples- Google creates open source Android, but wont reveal its secret algorithm for search which drives its main profits,
    • Google again puts a paper for MapReduce but it’s Yahoo that champions Hadoop,
    • Apple creates open source projects (http://www.apple.com/opensource/) but wont give away its Operating Source codes (why?) which help people buys its more expensive hardware,
    • IBM who helped kickstart the whole proprietary code thing (remember MS DOS) is the new champion of open source (http://www.ibm.com/developerworks/opensource/) and
    • Microsoft continues to spark open source debate but read http://blogs.technet.com/b/microsoft_blog/archive/2010/07/02/a-perspective-on-openness.aspx and  also http://www.microsoft.com/opensource/
    • SAS gives away a lot of open source code (Read Jim Davis , CMO SAS here , but will stick to Base SAS code (even though it seems to be making more money by verticals focus and data mining).
    • SPSS was the first big analytics company that helps supports R (open source stats software) but will cling to its own code on its softwares.
    • WordPress.org gives away its software (and I like Akismet just as well as blogging) for open source, but hey as anyone who is on WordPress.com knows how locked in you can get by its (pricy) platform.
    • Vendor Lock-in (wink wink price escalation) is the elephant in the room for Big Software Proprietary Companies.
    • SLA Quality, Maintenance and IP safety is the uh-oh for going in for open source software mostly.
  • Lack of IP protection for revenue models for open source code is the big bottleneck  for a lot of companies- as very few software users know what to do with source code if you give it to them anyways.
    • If companies were confident that they would still be earning same revenue and there would be less leakage or theft, they would gladly give away the source code.
    • Derivative softwares or extensions help popularize the original softwares.
      • Half Way Steps like Facebook Applications  the original big company to create a platform for third party creators),
      • IPhone Apps and Android Applications show success of creating APIs to help protect IP and software control while still giving some freedom to developers or alternate
      • User Interfaces to R in both SAS/IML and JMP is a similar example
  • Basically open source is mostly done by under dog while top dog mostly rakes in money ( and envy)
  • There is yet to a big commercial success in open source software, though they are very good open source softwares. Just as Google’s success helped establish advertising as an alternate ( and now dominant) revenue source for online companies , Open Source needs a big example of a company that made billions while giving source code away and still retaining control and direction of software strategy.
  • Open source people love to hate proprietary packages, yet there are more shades of grey (than black and white) and hypocrisy (read lies) within  the open source software movement than the regulated world of big software. People will be still people. Software is just a piece of code.  😉

(Art citation-http://gapingvoid.com/about/ and http://gapingvoidgallery.com/

LinkedIn Apps : Blogging and Twitter

I like the following two LinkedIn Apps the most for sheer usefulness .Try them out the blog gets extra readers and Company Buzz helps you track company. See them right below the all new LinkedIn inbox.

image

This would take you to the Application Directory.

http://www.linkedin.com/static?key=application_directory

The WordPress blog helps me when I need to post some article to one of the 50 groups.

I then use the Add to Any button from www.addtoany.com and click share on LinkedIn.

LinkedIn indeed is quite convenient for sharing ( but be careful of overdoing it lest it goes spam)

image

I use the Company Buzz with a slight twist though.By using broad keywords I get a list of say people I want to follow in the world of data mining or business intelligence.

Next I follow NOT the people following the influential tweeters, but the people who they are using RT or messaging replies @.

image

Hope this helps your LinkedIn account. You can send me a LinkedIn Invite at www.linkedin.com/in/ajayohri .

And do take part in the Social Media School group on LinkedIn here http://www.linkedin.com/groups?gid=1854626