I am surrounded by people
of dazzling brilliance, beauty and mind
Sometimes they are in the room in my face
Sometimes we interact digitally online
I would never be so cunning
So sharp, astute and yet so polite
I feel sometimes like a little cave man
who has stumbled upon the first artificial light
Or like a flattened sunflower
in a field of tall yellow poppy flower
I am bright but still a medium-ochre
In the middle of all that bright golden power
Maybe I will never be a genius
Die unrequited unsung like billions before
Hey I tried to live up to all that potential
But the pretending and defending was too much of a chore
so mediocre and such a medium ochre
my shining shall be twinkly winkly so-so
it was a blast and at least we tried
played, laughed, partied then died.
Jim Goodnight, grand old man and Godfather of the Cosa Nostra of the BI/database analytics software industry, recently commented on open source in BI. (By the way, R is generally classed as business analytics and NOT business intelligence software, so these remarks apply more to Pentaho and Jaspersoft.)
Asked whether open source BI and data integration software from the likes of Jaspersoft, Pentaho and Talend is a growing threat, [Goodnight] said: “We haven’t noticed that a lot. Most of our companies need industrial strength software that has been tested, put through every possible scenario or failure to make sure everything works correctly.”
The first, labeled BI Platforms, is drawn from Gartner Market Share Analysis: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009, published May 2010, and Gartner Dataquest Market Share: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009.
and
Advanced Analytics category.
and
So what's the performance of Talend, Pentaho and Jaspersoft?
Achieved record revenue, more than doubling from 2008. The fourth quarter of 2009 was Talend’s tenth consecutive quarter of growth.
Grew customer base by 140% to over 1,000 customers, up from 420 at the end of 2008. Of these new customers, over 50% are Fortune 1000 companies.
Total downloads reached seven million, with over 300,000 users of the open source products.
Talend doubled its staff, increasing to 200 global employees. Continuing this trend, Talend has already hired 15 people in 2010 to support its rapid growth.
40% sequential growth most recent quarter. (I didn’t ask whether there was any reason to suspect seasonality.)
130% annual revenue growth run rate.
“Not quite” profitable.
Several hundred commercial subscribers, at an average of $25K annually per, including >100 in Europe.
9,000 paying customers of some kind.
100,000+ total deployments, “very conservatively,” counting OEMs as one deployment each and not double-counting for OEMs’ customers. (Nick said Business Objects quotes 45,000 deployments by the same standards.)
70% of revenue from the mid-market, defined as $100 million – $1 billion revenue. 30% from bigger enterprises. (Hmm. That begs a couple of questions, such as where OEM revenue comes in, and whether <$100 million enterprises were truly a negligible part of revenue.)
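The growth figures quoted in the two lists above can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers as quoted; the variable names are my own, and it assumes (hypothetically) that a quarterly growth rate simply compounds over four quarters.

```r
# Talend: 420 customers growing by 140% should land just over 1,000.
customers_2008 <- 420
customers_2009 <- customers_2008 * (1 + 1.40)
print(round(customers_2009))            # 1008

# Second list: 40% sequential (quarterly) growth, if sustained for a year,
# compounds to far more than the quoted 130% annual run rate.
annualised_from_quarter <- (1 + 0.40)^4 - 1
print(round(annualised_from_quarter, 4))  # 2.8416, i.e. about 284%

# Conversely, the steady quarterly rate that would give 130% per year.
quarterly_from_annual <- (1 + 1.30)^(1 / 4) - 1
print(round(quarterly_from_annual, 2))    # roughly 0.23
```

Under that assumption, a 40% quarter sits well above the roughly 23% per quarter implied by the annual figure, which is why the question about seasonality is a fair one.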
1) There is a complete lack of transparency in open source BI market shares as almost all these companies are privately held and do not disclose revenues.
2) What appears to be a pure-play open source company may actually be funded by a big BI vendor: Revolution Analytics is funded by Intel-Microsoft, among others, and EnterpriseDB has IBM as an investor. MySQL and Sun, of course, were bought by Oracle.
The degree of control that proprietary vendors have over open source vendors is still not disclosed: whether they hold a stake for strategic reasons or otherwise.
3) None of the open source vendors is even close to a 1 billion dollar revenue number.
Jim Goodnight is pointing out market reality when he says he has not seen much impact (in terms of market share). As for the rest of his remarks, well, he has a job to do as CEO, and that is to talk up his company and trash the competition. He has been doing that for three decades and is unlikely to change now unless there is a severe market share impact. You can hardly expect him to notice companies less than 5% of his size in revenue.
If you use Windows for your stats computing and your data is in a database (probably true for almost all corporate business analysts), R 2.12 has introduced a procedural hitch for you: there are NO BINARIES for the packages used until now to read from these databases.
The Readme notes of the release say-
Packages related to many database system must be linked to the exact
version of the database system the user has installed, hence it does
not make sense to provide binaries for packages
RMySQL, ROracle, ROracleUI, RPostgreSQL
although it is possible to install such packages from sources by
install.packages('packagename', type='source')
after reading the manual 'R Installation and Administration'.
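Before attempting a source install, it can help to check which of the database packages named in the Readme are actually missing. The sketch below is my own illustration, not part of the release notes; `missing_packages` is a hypothetical helper name, and the install call is left commented out because building from source on Windows needs a compiler toolchain (Rtools) and the database client libraries.

```r
# Return the subset of 'pkgs' that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

db_pkgs <- c("RMySQL", "ROracle", "RPostgreSQL")
to_install <- missing_packages(db_pkgs)
print(to_install)

# Uncomment to build the missing packages from source, as the Readme suggests:
# install.packages(to_install, type = "source")
```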
So how do you connect to PostgreSQL and MySQL databases if the Windows binary is not available?
Fortunately the RpgSQL package is still available for PostgreSQL
Using the RpgSQL package
library(RpgSQL)
#creating a connection
con <- dbConnect(pgSQL(), user = "postgres", password = "XXXX",dbname="postgres")
#writing a table from an R dataset
dbWriteTable(con, "BOD", BOD)
# table names are lower cased unless double quoted. Here we write a Select SQL query
dbGetQuery(con, 'select * from "BOD"')
#disconnecting the connection
dbDisconnect(con)
You can also use RODBC package for connecting to your PostgreSQL database but you need to configure your ODBC connections in
Windows Start Panel-
Settings-Control Panel-
Administrative Tools-Data Sources (ODBC)
You should probably see something like this screenshot.
Coming back to R, note the name of the PostgreSQL DSN from the screenshot above. (If it is not there, click Add, scroll to the appropriate database driver, here PostgreSQL, and click Finish. Then fill in the default values for your database, or the values for a database you created yourself; see the screenshot for help with the other settings. Remember to click Test to check that the username and password work, the port is correct, and so on.)
Once the DSN is properly set up in ODBC (frightening terminology is part of databases), you can go to R and connect using the RODBC package.
#loading RODBC
library(RODBC)
#creating a Database connection
# for username,password,database name and DSN name
chan=odbcConnect("PostgreSQL35W","postgres;Password=X;Database=postgres")
#to list all table names
sqlTables(chan)
  TABLE_QUALIFIER TABLE_OWNER TABLE_NAME TABLE_TYPE REMARKS
1        postgres      public        bod      TABLE
2        postgres      public  database1      TABLE
3        postgres      public         tt      TABLE
For MySQL, we then configure a DSN in the same way as we did for PostgreSQL.
After that we use RODBC in pretty much the same way, changing only the username and password to the ones for MySQL and the DSN name to the one created in the previous step.
channel <- odbcConnect("mysql","jasperdb;Password=XXX;Database=Test")
test2=sqlQuery(channel,"select * from jiuser")
test2
id username tenantId fullname emailAddress password externallyDefined enabled previousPasswordChangeTime
1 1 jasperadmin 1 Jasper Administrator NA 349AFAADD5C5A2BD477309618DC NA 01
2 2 joe1ser 1 Joe User NA 4DD8128D07A NA 01
odbcClose(channel)
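The odbcConnect calls above pack the password and database name into the user name argument. As far as I can tell this works because RODBC pastes the pieces into one ODBC connection string. A more explicit alternative is to build that string yourself and hand it to odbcDriverConnect. The sketch below is only an illustration: `build_odbc_string` is a hypothetical helper of my own, and the DSN, user and database values are the placeholders used earlier in this post.

```r
# Assemble "KEY=value" pairs into a single ODBC connection string.
# Arguments left as NULL are simply dropped.
build_odbc_string <- function(dsn, uid = NULL, pwd = NULL, database = NULL) {
  parts <- c(DSN = dsn, UID = uid, PWD = pwd, Database = database)
  paste(paste0(names(parts), "=", parts), collapse = ";")
}

conn_string <- build_odbc_string("mysql", uid = "jasperdb",
                                 pwd = "XXX", database = "Test")
print(conn_string)

# With a configured DSN and RODBC installed, the string can be used like this:
# library(RODBC)
# channel <- odbcDriverConnect(conn_string)
# sqlTables(channel)
# odbcClose(channel)
```

Keeping the string in one place also makes it easier to read the password from an environment variable instead of typing it into a script.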
While using RODBC for all databases is a welcome step, perhaps the release notes for Windows users of R need to be more substantive than the ones given for R 2.12.2
To help new AWS customers get started in the cloud, AWS is introducing a new free usage tier. Beginning November 1, new AWS customers will be able to run a free Amazon EC2 Micro Instance for a year, while also leveraging a new free usage tier for Amazon S3, Amazon Elastic Block Store, Amazon Elastic Load Balancing, and AWS data transfer. AWS’s free usage tier can be used for anything you want to run in the cloud: launch new applications, test existing applications in the cloud, or simply gain hands-on experience with AWS.
Below are the highlights of AWS’s new free usage tiers. All are available for one year (except Amazon SimpleDB, SQS, and SNS which are free indefinitely):
AWS’s free usage tier starts November 1, 2010. A valid credit card is required to sign up.
See offer terms.
AWS Free Usage Tier (Per Month):
750 hours of Amazon EC2 Linux Micro Instance usage (613 MB of memory and 32-bit and 64-bit platform support) – enough hours to run continuously each month*
In addition to these services, the AWS Management Console is available at no charge to help you build and manage your application on AWS.
* These free tiers are only available to new AWS customers and are available for 12 months following your AWS sign-up date. When your free usage expires or if your application use exceeds the free usage tiers, you simply pay standard, pay-as-you-go service rates (see each service page for full pricing details). Restrictions apply; see offer terms for more details.
** These free tiers do not expire after 12 months and are available to both existing and new AWS customers indefinitely.
The new AWS free usage tier applies to participating services across all AWS regions: US – N. Virginia, US – N. California, EU – Ireland, and APAC – Singapore. Your free usage is calculated each month across all regions and automatically applied to your bill – free usage does not accumulate.
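The claim that 750 hours is enough to run one micro instance continuously each month is easy to verify. This is a small sketch of the arithmetic with variable names of my own choosing; the two-instance case assumes (my assumption, not stated in the announcement) that hours from several instances are added together.

```r
free_hours <- 750

# The longest months have 31 days of 24 hours each.
longest_month_hours <- 31 * 24
print(longest_month_hours)                  # 744

# One instance running non-stop fits inside the allowance, with 6 hours to spare.
print(free_hours >= longest_month_hours)    # TRUE
print(free_hours - longest_month_hours)     # 6

# Two instances running non-stop would need twice as many hours
# (assumption: usage is summed across instances).
print(2 * longest_month_hours > free_hours) # TRUE
```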
Of course, if I were the sales manager for SAS ETS, I would be worried given the increasing time series capabilities in R. But then again, there are some deficiencies in the R GUI for time series:
1) Layout is not very elegant
2) Not enough documented help (at least for the Epack GUI, and no integrated help ACROSS packages)
3) Graphical capabilities need more help documentation to interpret the output (especially in ACF and PACF plots)
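On the third point, a short base R sketch may help with reading ACF and PACF output. It simulates an AR(1) series (the coefficient 0.7, the length 500 and the seed are arbitrary choices of mine) and computes the approximate 95% bound that R draws as dashed lines on these plots. Values outside the bound are the ones usually treated as significant.

```r
set.seed(42)
n <- 500

# Simulate an AR(1) process with coefficient 0.7.
x <- arima.sim(model = list(ar = 0.7), n = n)

# Autocorrelations for lags 1..10 (lag 0 is always 1, so drop it).
acf_vals <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]

# Partial autocorrelations for lags 1..10.
pacf_vals <- as.numeric(pacf(x, lag.max = 10, plot = FALSE)$acf)

# Approximate 95% bound under the null hypothesis of white noise.
bound <- qnorm(0.975) / sqrt(n)

print(round(bound, 3))
print(which(abs(pacf_vals) > bound))  # lags whose PACF falls outside the bound
```

For an autoregressive series of order p, the ACF decays gradually while the PACF drops inside the bound after lag p; for a moving average series the roles are reversed. Here the lag 1 PACF should stand well clear of the bound.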