SAS Data Loader for Hadoop is now a 90 day free trial

From-

http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-3-x.html

 

SAS Data Loader for Hadoop eliminates the complexities of writing MapReduce code, with a simple, point-and-click interface that empowers business analysts to prepare, integrate and cleanse big data faster and easier than ever. In addition, data scientists and programmers can run SAS code on Hadoop in parallel for better performance and greater productivity.

 


Get Started

  1. Download and install Cloudera QuickStart VM for CDH 5.3x.
  2. Download and install either VMware Player 6.0 or later (for Windows) or VMware Fusion for OS X 6.0 (for Mac).
  3. Download and install your 90-day free trial of SAS Data Loader for Hadoop.

and

from

http://www.sas.com/en_us/software/data-management/data-loader-hadoop.html

 

Oracle adds R to Big Data Appliance -Use #Rstats

From the press release, Oracle gets on R and me too- NoSQL

http://www.oracle.com/us/corporate/press/512001

The Oracle Big Data Appliance is a new engineered system that includes an open source distribution of Apache™ Hadoop™, Oracle NoSQL Database, Oracle Data Integrator Application Adapter for Hadoop, Oracle Loader for Hadoop, and an open source distribution of R.

From

http://www.theregister.co.uk/2011/10/03/oracle_big_data_appliance/

the Big Data Appliance also includes the R programming language, a popular open source statistical-analysis tool. This R engine will integrate with 11g R2, so presumably if you want to do statistical analysis on unstructured data stored in and chewed by Hadoop, you will have to move it to Oracle after the chewing has subsided.

This approach to R-Hadoop integration is different from that announced last week between Revolution Analytics, the so-called Red Hat for stats that is extending and commercializing the R language and its engine, and Cloudera, which sells a commercial Hadoop setup called CDH3 and which was one of the early companies to offer support for Hadoop. Both Revolution Analytics and Cloudera now have Oracle as their competitor, which was no doubt no surprise to either.

In any event, the way they do it, the R engine is put on each node in the Hadoop cluster, and those R engines just see the Hadoop data as a native format that they can do analysis on individually. As statisticians do analyses on data sets, the summary data from all the nodes in the Hadoop cluster is sent back to their R workstations; they have no idea that they are using MapReduce on unstructured data.

Oracle did not supply configuration and pricing information for the Big Data Appliance, and also did not say when it would be for sale or shipping to customers

From

http://www.oracle.com/us/corporate/features/feature-oracle-nosql-database-505146.html

A Horizontally Scaled, Key-Value Database for the Enterprise
Oracle NoSQL Database is a commercial grade, general-purpose NoSQL database using a key/value paradigm. It allows you to manage massive quantities of data, cope with changing data formats, and submit simple queries. Complex queries are supported using Hadoop or Oracle Database operating upon Oracle NoSQL Database data.

Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple programming model. It scales horizontally to hundreds of nodes with high availability and transparent load balancing. Customers might choose Oracle NoSQL Database to support Web applications, acquire sensor data, scale authentication services, or support online serves and social media.

and

from

http://siliconangle.com/blog/2011/09/30/oracle-adopting-open-source-r-to-connect-legacy-systems/

Oracle says it will integrate R with its Oracle Database. Other signs from Oracle show the deeper interest in using the statistical framework for integration with Hadoop to potentially speed statistical analysis. This has particular value with analyzing vast amounts of unstructured data, which has overwhelmed organizations, especially over the past year.

and

from

http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html

Oracle R Enterprise

Integrates the Open-Source Statistical Environment R with Oracle Database 11g
Oracle R Enterprise allows analysts and statisticians to run existing R applications and use the R client directly against data stored in Oracle Database 11g—vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready, deeply integrated environment for advanced analytics. Users can also use analytical sandboxes, where they can analyze data and develop R scripts for deployment while results stay managed inside Oracle Database.

Why open source companies dont dance?

A social network diagram
Image via Wikipedia

I have been pondering on this seemingly logical paradox for some time now-

1) Why are open source solutions considered technically better but not customer friendly.

2) Why do startups and app creators in social media or mobile get much more press coverage than

profitable startups in enterprise software.

3) How does tech journalism differ in covering open source projects in enterprise versus retail software.

4) What are the hidden rules of the game of enterprise software.

Some observations-

1) Open source companies often focus much more on technical community management and crowd sourcing code. Traditional software companies focus much more on managing the marketing community of customers and influencers. Accordingly the balance of power is skewed in favor of techies and R and D in open source companies, and in favor of marketing and analyst relations in traditional software companies.

Traditional companies also spend much more on hiring top notch press release/public relationship agencies, while open source companies are both financially and sometimes ideologically opposed to older methods of marketing software. The reverse of this is you are much more likely to see Videos and Tutorials by an open source company than a traditional company. You can compare the websites of ClouderaDataStax, Hadapt ,Appistry and Mapr and contrast that with Teradata or Oracle (which has a much bigger and much more different marketing strategy.

Social media for marketing is also more efficiently utilized by smaller companies (open source) while bigger companies continue to pay influential analysts for expensive white papers that help present the brand.

Lack of budgets is a major factor that limits access to influential marketing for open source companies particularly in enterprise software.

2 and 3) Retail software is priced at 2-100$ and sells by volume. Accordingly technology coverage of these software is based on volume.

Enterprise software is much more expensively priced and has much more discreet volume or sales points. Accordingly the technology coverage of enterprise software is more discreet, in terms of a white paper coming every quarter, a webinar every month and a press release every week. Retail software is covered non stop , but these journalists typically do not charge for “briefings”.

Journalists covering retail software generally earn money by ads or hosting conferences. So they have an interest in covering new stuff or interesting disruptive stuff. Journalists or analysts covering enterprise software generally earn money by white papers, webinars, attending than hosting conferences, writing books. They thus have a much stronger economic incentive to cover existing landscape and technologies than smaller startups.

4) What are the hidden rules of the game of enterprise software.

  • It is mostly a white man’s world. this can be proved by statistical demographic analysis
  • There is incestuous intermingling between influencers, marketers, and PR people. This can be proved by simple social network analysis of who talks to who and how much. A simple time series between sponsorship and analysts coverage also will prove this (I am working on quantifying this ).
  • There are much larger switching costs to enterprise software than retail software. This leads to legacy shoddy software getting much chances than would have been allowed in an efficient marketplace.
  • Enterprise software is a less efficient marketplace than retail software in all definitions of the term “efficient markets”
  • Cloud computing, and SaaS and Open source threatens to disrupt the jobs and careers of a large number of people. In the long term, they will create many more jobs, but in the short term, people used to comfortable living of enterprise software (making,selling,or writing) will actively and passively resist these changes to the  paradigms in the current software status quo.
  • Open source companies dont dance and dont play ball. They prefer to hire 4 more college grads than commission 2 more white papers.

and the following with slight changes from a comment I made on a fellow blog-

  • While the paradigm on how to create new software has evolved from primarily silo-driven R and D departments to a broader collaborative effort, the biggest drawback is software marketing has not evolved.
  • If you want your own version of the open source community editions to be more popular, some standardization is necessary for the corporate decision makers, and we need better marketing paradigms.
  • While code creation is crowdsourced, solution implementation cannot be crowdsourced. Customers want solutions to a problem not code.
  • Just as open source as a production and licensing paradigm threatens to disrupt enterprise software, it will lead to newer ways to marketing software given the hostility of existing status quo.

R is Ready for Business™

A new 5 page brochure from Revolution Analytics. Not that slick and some marketing under-kill (which frankly is a surprise)- but I guess Revolution Analytics does not have a full time graphics designer to help with it’s collateral.

Take a look if you are curious how and why R is getting more and more ready for business.

AsterData partners with Tableau

This chart represents several constituent comp...
Image via Wikipedia

Tableau which has been making waves recntly with its great new data visualization tool announced a partner with my old friends at AsterData. Its really cool piece of data vis and very very fast on the desktop- so I can imagine what speed it can help with AsterData’s MPP Row and Column Zingbang AND Parallel Analytical Functions

Tableau and AsterData also share the common Stanfordian connection (but it seems software is divided quite equally between Stanford, Hardvard Dropouts and North Carolina )

It remains to be seen in this announcement how much each company  can leverage the partnership or whether it turns like the SAS Institute- AsterData partnership last year or whether it is just to announce connectors in their software to talk to each other.

See a Tableau vis at

http://public.tableausoftware.com/views/geographyofdiabetes/Dashboard2?:embed=yes&:toolbar=yes

AsterData remains the guys with the potential but I would be wrong to say MapReduceSQL is as hot in December 2010 as it was in June 2009- and the elephant in the room would be Hadoop. That and Google’s continued shyness from encashing its principal comptency of handling Big Data (but hush – I signed a NDA with the Google Prediction API– so things maaaay change very rapidly on ahem that cloud)

Disclaimer- AsterData was my internship sponsor during my winter training while at Univ of  Tenn.

 

Jim Goodnight on Open Source- and why he is right -sigh

Logo Open Source Initiative
Image via Wikipedia

Jim Goodnight – grand old man and Godfather of the Cosa Nostra of the BI/Database Analytics software industry said recently on open source in BI (btw R is generally termed in business analytics and NOT business intelligence software so these remarks were more apt to Pentaho and Jaspersoft )

Asked whether open source BI and data integration software from the likes of Jaspersoft, Pentaho and Talend is a growing threat, [Goodnight] said: “We haven’t noticed that a lot. Most of our companies need industrial strength software that has been tested, put through every possible scenario or failure to make sure everything works correctly.”

quotes from Jim Goodnight are courtesy Jason’s  story here:
http://www.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal

and the Pentaho follow-up reaction is here

http://bi.cbronline.com/news/pentaho-fires-back-across-sas-bows-over-limited-open-source-appeal

 

 

While you can rage and screech- here is the reality in terms of market share-

From Merv Adrian-‘s excellent article on market shares in BI

http://www.enterpriseirregulars.com/22444/decoding-bi-market-share-numbers-%E2%80%93-play-sudoku-with-analysts/

The first, labeled BI Platforms, is drawn fromGartner Market Share Analysis: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009, published May 2010 , and Gartner Dataquest Market Share: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009.

and

Advanced Analytics category.

and 

so whats the performance of Talend, Pentaho and Jaspersoft

From http://www.dbms2.com/category/products-and-vendors/talend/

It seems that Talend’s revenue was somewhat shy of $10 million in 2008.

and Talend itself says

http://www.talend.com/press/Talend-Announces-Record-2009-and-Continues-Growth-in-the-New-Year.php

Additional 2009 highlights include:

  • Achieved record revenue, more then doubling from 2008. The fourth quarter of 2009 was Talend’s tenth consecutive quarter of growth.
  • Grew customer base by 140% to over 1,000 customers, up from 420 at the end of 2008. Of these new customers, over 50% are Fortune 1000 companies.
  • Total downloads reached seven million, with over 300,000 users of the open source products.
  • Talend doubled its staff, increasing to 200 global employees. Continuing this trend, Talend has already hired 15 people in 2010 to support its rapid growth.

now for Jaspersoft numbers

http://www.dbms2.com/2008/09/14/jaspersoft-numbers/

Highlights include:

  • Revenue run rate in the double-digit millions.
  • 40% sequential growth most recent quarter. (I didn’t ask whether there was any reason to suspect seasonality.)
  • 130% annual revenue growth run rate.
  • “Not quite” profitable.
  • Several hundred commercial subscribers, at an average of $25K annually per, including >100 in Europe.
  • 9,000 paying customers of some kind.
  • 100,000+ total deployments, “very conservatively,” counting OEMs as one deployment each and not double-counting for OEMs’ customers. (Nick said Business Objects quotes 45,000 deployments by the same standards.)
  • 70% of revenue from the mid-market, defined as $100 million – $1 billion revenue. 30% from bigger enterprises. (Hmm. That begs a couple of questions, such as where OEM revenue comes in, and whether <$100 million enterprises were truly a negligible part of revenue.)

and for Pentaho numbers-

http://www.dbms2.com/2009/01/27/introduction-to-pentaho/

and http://www.monash.com/uploads/Pentaho-January-2009.pdf

suggests there are far far away from the top 5-6 vendors in BI

and a special mention  for postgreSQL– which is a non Profit but is seriously denting Oracle/MySQL

http://www.postgresql.org/about/

Limit Value
Maximum Database Size Unlimited
Maximum Table Size 32 TB
Maximum Row Size 1.6 TB
Maximum Field Size 1 GB
Maximum Rows per Table Unlimited
Maximum Columns per Table 250 – 1600 depending on column types
Maximum Indexes per Table Unlimited

and leading vendor is EnterpriseDB which is again IBM-partnering as well as IBM funded

http://www.sramanamitra.com/2009/05/18/enterprise-db/

and

http://www.enterprisedb.com/company/news_events/press_releases/2010_21.do

suggest it is still in early stages.

————————————————————–

So what do we conclude-

1) There is a complete lack of transparency in open source BI market shares as almost all these companies are privately held and do not disclose revenues.

2) What may be a pure play open source company may actually be a company funded by a big BI vendor (like Revolution Analytics is funded among others by Intel-Microsoft) and EnterpriseDB has IBM as an investor.MySQL and Sun of course are bought by Oracle

The degree of control by proprietary vendors on open source vendors is still not disclosed- whether they are holding a stake for strategic reasons or otherwise.

3) None of the Open Source Vendors are even close to a 1 Billion dollar revenue number.

Jim Goodnight is pointing out market reality when he says he has not seen much impact (in terms of market share). As for the rest of his remarks, well he’s got a job to do as CEO and thats talk up his company and trash the competition- which he as been doing for 3 decades and unlikely to change now unless there is severe market share impact. Unless you expect him to notice companies less than 5% of his size in revenue.

http://www.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal

http://bi.cbronline.com/news/pentaho-fires-back-across-sas-bows-over-limited-open-source-appeal

 

Cloudera and Aster Data partner up

Basically making it easier for data to move between the two systems Hadoop (Cloudera) and Aster’s Analytics (MapReduce/SQL)-

From the press release-http://goo.gl/vgsr

today announced an agreement that unites Cloudera Distribution for Hadoop (CDH) with Aster Data nCluster. The integration enables customers to leverage MPP platforms for large-scale data processing, management and analytics across structured and unstructured formats to analyze massive amounts of data for deeper business insight.

Cloudera is building a massively parallel, two-way connector for high-speed movement of data between CDH and Aster Data nCluster. The connector will be supported as part of Cloudera Enterprise.