Open Source Business Intelligence: Pentaho and Jaspersoft

Here are two products that are widely used for Business Intelligence. Both are open source, and both offer a free preview.

Jaspersoft: for the Enterprise version, click on the screenshot; for the free community version, go to

http://jasperforge.org/projects/jasperserver

Interestingly (and not surprisingly) Revolution Analytics is teaming up with Jaspersoft to use R for reporting along with the Jaspersoft BI stack.

ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB

FREE WEBINAR WEDNESDAY, SEPTEMBER 22ND @9AM PACIFIC

DEPLOYING R: ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB

A JOINT WEBINAR FROM REVOLUTION ANALYTICS AND JASPERSOFT

Date: Wednesday, September 22, 2010
Time: 9:00am PDT (12:00pm EDT; 4:00pm GMT)
Presenters: David Smith, Vice President of Marketing, Revolution Analytics
Andrew Lampitt, Senior Director of Technology Alliances, Jaspersoft
Matthew Dahlman, Business Development Engineer, Jaspersoft
Registration: Click here to register now!

R is a popular and powerful system for creating custom data analysis, statistical models, and data visualizations. But how can you make the results of these R-based computations easily accessible to others? A PhD statistician could use R directly to run the forecasting model on the latest sales data, and email a report on request, but then the process is just going to have to be repeated again next month, even if the model hasn’t changed. Wouldn’t it be better to empower the Sales manager to run the model on demand from within the BI application she already uses—daily, even!—and free up the statistician to build newer, better models for others?

In this webinar, David Smith (VP of Marketing, Revolution Analytics) will introduce the new “RevoDeployR” Web Services framework for Revolution R Enterprise, which is designed to make it easy to integrate dynamic R-based computations into applications for business users. RevoDeployR empowers data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise. Application developers can then use the RevoDeployR Web Services API to securely and scalably integrate the results of these scripts into any application, without needing to learn the R language. With RevoDeployR, authorized users of hosted or cloud-based interactive Web applications, desktop applications such as Microsoft Excel, and BI applications like Jaspersoft can all benefit from on-demand analytics and visualizations developed by expert R users.

To demonstrate the power of deploying R-based computations to business users, Andrew Lampitt will introduce Jaspersoft commercial open source business intelligence, the world’s most widely used BI software. In a live demonstration, Matt Dahlman will show how to supercharge the BI process by combining Jaspersoft and Revolution R Enterprise, giving business users on-demand access to advanced forecasts and visualizations developed by expert analysts.

Speaker Biographies:

David Smith is the Vice President of Marketing at Revolution Analytics, the leading commercial provider of software and support for the open source “R” statistical computing language. David is the co-author (with Bill Venables) of the official R manual An Introduction to R. He is also the editor of Revolutions (http://blog.revolutionanalytics.com), the leading blog focused on “R” language, and one of the originating developers of ESS: Emacs Speaks Statistics. You can follow David on Twitter as @revodavid.

Andrew Lampitt is Senior Director of Technology Alliances at Jaspersoft. Andrew is responsible for strategic initiatives and partnerships including cloud business intelligence, advanced analytics, and analytic databases. Prior to Jaspersoft, Andrew held other business positions with Sunopsis (Oracle), Business Objects (SAP), and Sybase (SAP). Andrew earned a BS in engineering from the University of Illinois at Urbana-Champaign.

Matthew Dahlman is Jaspersoft’s Business Development Engineer, responsible for technical aspects of technology alliances and regional business development. Matt has held a wide range of technical positions including quality assurance, pre-sales, and technical evangelism with enterprise software companies including Sybase, Netonomy (Comverse), and Sunopsis (Oracle). Matt earned a BA in mathematics from Carleton College in Northfield, Minnesota.


The second widely used BI stack in open source is Pentaho.

You can download it here to evaluate it or click on screenshot to read more at

http://community.pentaho.com/

http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/

Big Data Management and Advanced Analytics

Here is a new list of the top 10 considerations for Big Data management and the use of advanced analytics, courtesy of AsterData.

Source-

http://www.asterdata.com/wp_10_considerations/index.php?ref=decisionstats

“There are ten strong reasons why competitive organizations are turning to new data management solutions to handle their growing data volumes and evolving analytic needs. This new platform – a ‘data-analytics server’ – merges data storage and data analytics into one single system to conquer the big data challenge.

Big data storage is handled by a massively parallel database architecture; big data analytics is handled by an integrated analytics engine, so that analytics run fully in-database yielding ultra high performance on large data sets. The analytics engine leverages the powerful analytics framework MapReduce. The results are cost-effective, scalable data storage, ultra high performance and richer data analysis.”

Major considerations include:
Cost-effective, scalable data management – what are the requirements?
Advanced analytic queries – what’s meant by advanced analytics & how easy is it?
Running rich, diverse workloads – key factors for high concurrency & performance
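
For readers who have not met MapReduce, the framework named in the quote above, here is a minimal single-process sketch of the pattern in Python. It is not Aster Data's implementation, which runs in-database across a parallel cluster; the sales records and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, amount). Invented for illustration.
records = [
    ("east", 120.0),
    ("west", 80.0),
    ("east", 30.0),
    ("north", 50.0),
    ("west", 20.0),
]

def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    region, amount = record
    yield region, amount

def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(data):
    mapped = (pair for record in data for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

totals = map_reduce(records)
print(totals)
```

In a real cluster the map and reduce steps run on many nodes at once and the shuffle moves data between them; the sketch only shows the shape of the computation.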

Data Mining 2010:SAS Conference in Vegas

An interesting conference, which I attended last year. This year one of the main guests is a former professor of mine at UTenn. I am India-bound this year, though, for family reasons.

http://www.sas.com/events/dmconf/over.html

Latest News

Early Bird Special
Register for M2010 before Sept. 17 and save $200 on conference fees!

Additional Data Mining Resources
Find additional data mining resources including links to whitepapers, webinars, audio seminars, videos, blogs and online communities.

Location
Caesars Palace
Las Vegas, NV

Conference: October 25-26
Pre-conference workshops: October 24
Post-conference training: October 27-29

The M2010 Data Mining Conference is an international educational conference and exhibition for data mining practitioners including analysts, statisticians, programmers, consultants and anyone involved with data management within their organization. Hosted by SAS, M2010 is now in its 13th year and has become the world’s largest data mining conference, attracting over 600 people from various industries including Financial Services, Retail, Insurance, Technology, Education, Healthcare, Pharmaceutical, Government and more.

This conference is the top choice for serious education and career networking. Conference highlights include:

  • 6 keynotes
  • 36 sessions
  • 6 session tracks
  • exhibit hall
  • poster session
  • SAS software training
  • educational workshops
  • special events
  • networking opportunities
  • predictive modeling certification testing event.

Session Topics

  • Business applications
  • Data augmentation
  • Perspectives from the financial services industry
  • Fraud detection
  • Perspectives from the healthcare industry
  • New and emerging technologies
  • Perspectives from the retail industry
  • Data mining in marketing
  • Retention and Life Cycle Analysis
  • Text mining
  • And more! (View session abstracts.)

Aster Data hires Quentin Gallivan as CEO

AsterData formally marked phase 2 of its rapid growth story by bringing in Quentin Gallivan as its new CEO (formerly of Postini, before it was sold to Google, and of PivotLink).

Founders (and Stanfordians) Mayank Bawa and Tasso Argyros stay on as Chief Customer Officer and CTO, respectively. It has a very deja vu feel, like Eric Schmidt coming in as CEO of Google in the glory days past. Indeed, the investment teams in Google and AsterData are quite similar, and so are the backgrounds of the founders.

AsterData of course creates the leading MapReduce (itself created by Google) solution for providing BI infrastructure for big data, and has been rapidly expanding into new frontiers for Big Data.

Aster Data Appoints New Chief Executive Officer

Quentin Gallivan Joins Aster Data as CEO to Lead Company to Next Level of Growth

San Carlos, CA – September 9, 2010 – Aster Data, a proven leader dedicated to providing the best data management and data processing platform for big data management and analytics, today announced the appointment of Quentin Gallivan as President and CEO. Gallivan brings more than 20 years of senior executive experience to the leading analytics and database company. With Aster Data achieving tremendous growth in the past year, Gallivan will take Aster Data to the next level, further accelerating its market leadership, sales, channel partnerships and international expansion. Founding CEO Mayank Bawa, who grew the company from its inception based on the founders’ research at Stanford University, and whose passion for helping customers uniquely unlock the value of their data, will take on the role of Chief Customer Officer. Bawa, in his new role, will lead the Company’s organization devoted to ensuring the success, longevity and innovation of its fast-growing customer base. Together, Gallivan and Bawa, along with co-founder and Chief Technology Officer, Tasso Argyros, will deliver on the Company’s mission to help customers discover more value from their data, achieve deep insights through rich analytics and do more with their massive data volumes than has ever been possible.

Gallivan joins Aster Data with over 20 years of leadership experience in the high-tech industry and has held a variety of CEO and senior executive positions with leading technology companies. Before joining Aster Data, Gallivan served as CEO at PivotLink, the leading provider of business intelligence (BI) solutions delivered via Software as a Service (SaaS), where he rapidly grew the company to over 15,000 business users, from mid-sized companies to Fortune 1000 companies, across key industries including financial services, retail, CPG manufacturing and high technology. Prior to Pivotlink, Gallivan served as CEO of Postini where he scaled the company to 35,000 customers and over 10 million users until its eventual acquisition by Google in 2007.  Gallivan also served as executive vice president of worldwide sales and services at VeriSign where he was instrumental in growing the business from $20 million to $1.2 billion and was responsible for the design and execution of the global distribution strategy for the company’s security and services business. Gallivan also held a number of key executive and leadership positions at Netscape Communications and GE Information Services.

“We are delighted to have someone of Quentin’s caliber, who is a veteran of both emerging and established technology companies, lead Aster Data through our next stage of growth,” said Mayank Bawa, Chief Customer Officer and co-founder, Aster Data. “His significant experience around growing organizations and driving operational excellence will be invaluable as he takes Aster Data forward. I’m excited to shift my focus to customers and their success; to bring our innovations to our customers worldwide to help them unlock deep value from their growing data volumes.”

“I am very excited to be joining Aster Data and taking on the challenge of augmenting its already impressive level of growth and success.  Aster Data is very well respected and established in the marketplace, has an enviable solution for big data management that uniquely addresses both big data storage and data processing, an impressive client list and a very talented team,” said Quentin Gallivan, President and CEO, Aster Data. “My task will be to leverage these assets, help shape a new market and provide operational guidance and strategic direction to drive even greater value for shareholders, customers and employees alike.”

Amazon announces Micro Instances for cloud computing

From Amazon http://aws.amazon.com/ec2

Micro instances provide 613 MB of memory and support 32-bit and 64-bit platforms on both Linux and Windows. Micro instance pricing for On-Demand instances starts at $0.02 per hour for Linux and $0.03 per hour for Windows.

Customers have asked us for a lower priced instance type that could satisfy the needs of their less demanding applications. Micro instances are optimized for applications that require lower throughput, but which still may consume significant compute cycles periodically. Micro instances provide a small amount of consistent CPU resources, and also allow you to burst CPU capacity when additional cycles are available.

Micro instances are available immediately in all regions, and we invite you to go and try one out for yourself today! Learn more about Amazon EC2’s new Micro instances at aws.amazon.com/ec2.

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

  • Micro Instance 613 MB of memory, up to 2 ECUs (for short periodic bursts), EBS storage only, 32-bit or 64-bit platform

So don’t buy that new CPU yet. Use existing hardware in tandem with these micro instances (and the internet) to compute, but only if your corporate IT administrator wasn’t trained in Windows-only certifications 😉
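
To put the quoted hourly prices in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes one Micro instance running around the clock for a 30-day month at the On-Demand rates quoted above; it leaves out EBS storage and data transfer charges, and the function name is just for illustration.

```python
HOURS_PER_DAY = 24

# On-Demand Micro instance rates quoted in the announcement (USD per hour).
RATES = {"linux": 0.02, "windows": 0.03}

def on_demand_cost(platform, days, hours_per_day=HOURS_PER_DAY):
    """Compute-only cost of one instance; excludes EBS and bandwidth."""
    return round(RATES[platform] * hours_per_day * days, 2)

# Assumption: a 30-day month with the instance never stopped.
linux_month = on_demand_cost("linux", 30)
windows_month = on_demand_cost("windows", 30)

print(f"Linux:   ${linux_month:.2f} per 30 days")
print(f"Windows: ${windows_month:.2f} per 30 days")
```

An instance that only runs for occasional jobs would cost proportionally less, since On-Demand billing is per hour.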

Event: Predictive analytics with R, PMML and ADAPA

From http://www.meetup.com/R-Users/calendar/14405407/

The September meeting is at the Oracle campus. (This is next door to the Oracle towers, so there is plenty of free parking.) The featured talk is from Alex Guazzelli (Vice President – Analytics, Zementis Inc.) who will talk about “Predictive analytics with R, PMML and ADAPA”.

Agenda:
* 6:15 – 7:00 Networking and Pizza (with thanks to Revolution Analytics)
* 7:00 – 8:00 Talk: Predictive analytics with R, PMML and ADAPA
* 8:00 – 8:30 General discussion

Talk overview:

The rule in the past was that whenever a model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Modeling Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Once exported as PMML files, models are readily available for deployment into an execution engine for scoring or classification. ADAPA is one example of such an engine. It takes in models expressed in PMML and transforms them into web-services. Models can be executed either remotely by using web-services calls, or via a web console. Users can also use an Excel add-in to score data from inside Excel using models built in R.
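
To make the idea of scoring from a PMML file concrete, here is a small Python sketch that reads a linear regression model from PMML and applies it to one record. It is not how ADAPA works internally: the model, field names and coefficients are invented for illustration, and a real engine supports many more model types, data transformations and validation steps.

```python
import xml.etree.ElementTree as ET

# A hypothetical PMML 4.0 regression model, hand-written for this example.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical linear model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="risk_model" functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="income" coefficient="0.25"/>
      <NumericPredictor name="age" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def local(tag):
    """Strip the XML namespace so the code works across PMML versions."""
    return tag.rsplit("}", 1)[-1]

def load_regression(pmml_text):
    """Return (intercept, {field: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = next(el for el in root.iter() if local(el.tag) == "RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        el.get("name"): float(el.get("coefficient"))
        for el in table
        if local(el.tag) == "NumericPredictor"
    }
    return intercept, coefficients

def score(model, row):
    """Apply the linear model to one record (predictor exponents assumed to be 1)."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())

model = load_regression(PMML_DOC)
prediction = score(model, {"income": 4.0, "age": 2.0})
print(prediction)
```

Because the model lives entirely in the XML, the scoring code does not need to know which tool produced it.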

R models have been exported into PMML and uploaded in ADAPA for many different purposes. Use cases where clients have used the flexibility of R to develop and the PMML standard combined with ADAPA to deploy range from financial applications (e.g., risk, compliance, fraud) to energy applications for the smart grid. The ability to easily transition solutions developed in R to the operational IT production environment helps eliminate the traditional limitations of R, e.g. performance for high volume or real-time transactional systems and memory constraints associated with large data sets.

Speaker Bio:

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language which is the de facto standard used to represent predictive models. The book, entitled PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, is available on Amazon.com. As the Vice President of Analytics at Zementis, Inc., Dr. Guazzelli is responsible for developing core technology and analytical solutions under ADAPA, a PMML-based predictive decisioning platform that combines predictive analytics and business rules. ADAPA is the first system of its kind to be offered as a service on the cloud.
Prior to joining Zementis, Dr. Guazzelli was involved in not only building but also deploying predictive solutions for large financial and telecommunication institutions around the globe. In academia, Dr. Guazzelli worked with data mining, neural networks, expert systems and brain theory. His work in brain theory and computational neuroscience has appeared in many peer reviewed publications. At Zementis, Dr. Guazzelli and his team have been involved in a myriad of modeling projects for financial, health-care, gaming, chemical, and manufacturing industries.

Dr. Guazzelli holds a Ph.D. in Computer Science from the University of Southern California and an M.S. and B.S. in Computer Science from the Federal University of Rio Grande do Sul, Brazil.