Here is an interview with James Dixon, the founder of Pentaho and its self-confessed Chief Geek and CTO. Pentaho has been growing very rapidly, and it makes open source Business Intelligence solutions, which are basically the biggest chunk of the enterprise software market currently.
Ajay- How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read: non open source)? Do the Oracle lawsuits over Java bother you from a business perspective?
James- Pentaho has a full suite of BI software:
* ETL: Pentaho Data Integration
* Reporting: Pentaho Reporting for desktop and web-based reporting
* OLAP: Mondrian ROLAP engine, and Analyzer or JPivot for web-based OLAP clients
* Dashboards: CDF and Dashboard Designer
* Predictive Analytics: Weka
* Server: Pentaho BI Server, which handles web access, security, scheduling, sharing, report bursting, etc.
We have all of the standard BI functionality.
The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java, a lot of resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.
Ajay- What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages?
Describe the latest Pentaho for Hadoop offering and the advantage of Hadoop/Hive over, say, MapReduce and SQL.
James- The coolest thing is that everything is pluggable:
* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.
* Reporting: New content types and report objects can be added. New data sources can be added.
* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.
This means it is very easy for Pentaho, partners, customers, and community members to extend our software to do new things.
In addition, every engine and component can be fully embedded into a desktop or web-based application. I made a YouTube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE
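(An illustration added for readers, not part of James's answer: the pluggable idea described above can be sketched as a small step registry. This is a generic Python sketch, not Pentaho's actual plugin API; the names `STEP_REGISTRY`, `register_step`, and `run_pipeline` are hypothetical.)

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterable[Row]]

# Hypothetical registry: maps a step name to the function that implements it.
STEP_REGISTRY: Dict[str, Step] = {}


def register_step(name: str) -> Callable[[Step], Step]:
    """Decorator that plugs a new transformation step into the registry."""
    def decorator(func: Step) -> Step:
        STEP_REGISTRY[name] = func
        return func
    return decorator


@register_step("drop_inactive")
def drop_inactive(rows: Iterable[Row]) -> Iterable[Row]:
    # Keep only rows whose "active" field is truthy.
    return (row for row in rows if row.get("active"))


@register_step("uppercase_name")
def uppercase_name(rows: Iterable[Row]) -> Iterable[Row]:
    # Return copies of each row with the "name" field upper-cased.
    for row in rows:
        yield {**row, "name": str(row["name"]).upper()}


def run_pipeline(step_names: List[str], rows: Iterable[Row]) -> List[Row]:
    """Look up each step by name and apply the steps in order."""
    for name in step_names:
        rows = STEP_REGISTRY[name](rows)
    return list(rows)
```

The engine only knows step names, so a partner or community member could add a new step by registering one more function, without touching `run_pipeline`.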
Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.
90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.
Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.
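(An illustration added for readers, not part of James's answer: to give a sense of what hand-coding a MapReduce-style job involves, here is a minimal in-memory sketch in plain Python. It uses no Hadoop API at all; the input format of "region, amount" lines and the function names are assumptions made for the example.)

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, float]]:
    # Map: parse each "region, amount" line into a (key, value) pair.
    for line in records:
        region, amount = line.split(",")
        yield region.strip(), float(amount)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    # Shuffle: group all values that share the same key.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    # Reduce: collapse each group of values into a single total.
    return {key: sum(values) for key, values in groups.items()}


def total_by_region(records: Iterable[str]) -> Dict[str, float]:
    return reduce_phase(shuffle(map_phase(records)))
```

Even this simple aggregation needs three hand-written phases, which is the kind of work a graphical ETL design environment aims to replace with configured steps.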
Ajay- Do you think the combination of recession, outsourcing, cost cutting, and unemployment is a suitable environment for companies to cut technology costs by going outside their usual vendor lists and trying open source for a change or for test projects?
James- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.
Ajay- How would you compare the user interface of reports using Pentaho versus other reporting software? Please feel free to be as specific as you like.
James- We have all of the everyday, standard reporting features covered.
Over the years the old tools, like Crystal Reports, have become bloated and complicated.
We don’t aim to have 100% of their features, because we’d end up just as complicated.
The 80:20 rule applies here. 80% of the time people only use 20% of their features.
We aim for 80% feature parity, which should cover 95-99% of typical use cases.
Ajay- Could you describe the Pentaho integration with R as well as your relationship with Weka? Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server).
Are there any R plans for Pentaho as well?
James- The feature sets of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main developer of Weka is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )
(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)
As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.