Here is an announcement from Predictive Analytics World, the world's largest vendor-neutral conference dedicated solely to predictive analytics. Decisionstats has been a blog partner of PAWCON since inception. This is cool stuff!
For some time now, I had been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. Rapid Miner has taken the lead in establishing a marketplace for extensions. Will there be paid extensions as well? I hope so!
This probably makes it the first “app” marketplace in open source software, and the second app marketplace in analytics after salesforce.com.
It is hard work to think of new algorithms, and some of them can be really useful.
Can we hope for an #rstats marketplace where people downloading, say, ggplot 3.0 at least get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist? http://www.amazon.com/gp/registry/1Y65N3VFA613B
Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software?
I don’t know, but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to Rapid Miner for innovating and leading the way.
http://rapid-i.com/component/option,com_myblog/show,Rapid-I-Marketplace-Launched.html/Itemid,172
Rapid-I Marketplace Launched, by Simon Fischer | 30 May 2011 | Tags: RapidMiner, Marketplace, Extensions
Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is finding them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand, are distributed by the update server, making them searchable and installable directly inside RapidMiner.
We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only that, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)
As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.
Looking forward to seeing many new extensions there soon!
Disclaimer: Decisionstats is a partner of Rapid Miner. I have liked the software for a long, long time, and recently agreed to partner with them, just as I did with KXEN some years back, with the Predictive Analytics Conference, and with Aster Data until last year.
I still think Rapid Miner is very, very good software, and perhaps the most notable globally created software after SAP.
Here is the actual marketplace
http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml
The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer as the update server URL. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an Extension developer. If you want to publish your extension here, please let us know via the contact form.
| 5/30/11 12:39 PM | User burgetrm has uploaded version 1.1.0 of Imageprocessing. |
| 5/30/11 12:34 PM | User burgetrm has uploaded version 1.0.0 of Imageprocessing. |
| 5/30/11 11:55 AM | User burgetrm has created the new product Imageprocessing. |
| 5/30/11 11:12 AM | User Rapid-I has uploaded version 5.0.7 of RapidMiner. |
| 5/30/11 11:12 AM | User Rapid-I has uploaded version 5.0.2 of RapidMiner. |
This is a short list of several known as well as lesser-known R (#rstats) language codes, packages and tricks to build a business intelligence application. It will be slightly messy (and not Messi), but I hope to refine it someday when the cows come home.
It assumes that BI basically consists of:
a database, a document database, report-creation/dashboard software, as well as unique R packages for business intelligence.
What is business intelligence?
Seamless dissemination of data in the organization. In short, let it flow: from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models. A business-intelligence-enabled organization allows information to flow easily AND captures insights and feedback for further action.
BI software has lately come to mean just reporting software, and Business Analytics has come to mean primarily predictive analytics. The terms are interchangeable in my opinion, as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure its effect in dashboards and summary reports.
Data Mining is a bit more than predictive analytics: it includes pattern recognition as well as black-box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science, predictive analytics (if at all) in business departments and statistics, and no one teaches metrics, dashboards, and reporting in mainstream academia, even though a large number of graduates will end up fiddling with spreadsheets or dashboards in their real careers.
Using R with
1) Databases-
I created a short list of database connectivity with R here at https://rforanalytics.wordpress.com/odbc-databases-for-r/ but R has released 3 new versions since then.
The RODBC package remains the package of choice for connecting to SQL Databases.
http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
Details on creating DSN and connecting to Databases are given at https://rforanalytics.wordpress.com/odbc-databases-for-r/
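A minimal sketch of that workflow, assuming an ODBC DSN named "sales_dsn" and a table called sales already exist on your machine (both names are placeholders):

```r
library(RODBC)

ch <- odbcConnect("sales_dsn", uid = "analyst", pwd = "secret")   # DSN and credentials are hypothetical
revenue_by_region <- sqlQuery(ch,
  "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region")
head(revenue_by_region)
odbcClose(ch)                                                     # release the ODBC channel
```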
For document databases like MongoDB and CouchDB
(What is the difference between a traditional RDBMS and NoSQL, if you ever need to explain it in a cocktail conversation? See http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms
Basically, dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in keeping transactional safety often gives you extreme increases in performance.
NoSQL is a kind of database that doesn’t have a fixed schema like a traditional RDBMS does. With NoSQL databases the schema is defined by the developer at run time. Developers don’t write normal SQL statements against the database, but instead use an API to get the data that they need.
Instead of relating data in one table to another, you store things as key-value pairs; there is no database schema, it is handled instead in code.)
I believe any corporation with data-driven decision making would need to have at least one RDBMS and one NoSQL database for unstructured data - Ajay. This is a sweeping generic statement 😉 , and is an opinion on future technologies.
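To make the RDBMS-versus-document distinction concrete, here is a small R-only illustration (the customer record and its fields are invented): a relational row has a fixed schema, while a document is just nested key-value data.

```r
# Fictional record, for illustration only
relational_row <- data.frame(id = 1, name = "Acme", city = "Pune")   # fixed columns, like a table row

document <- list(          # schema-less, the shape MongoDB/CouchDB would store as JSON
  id     = 1,
  name   = "Acme",
  city   = "Pune",
  orders = list(           # nested data with no predefined schema
    list(sku = "A100", qty = 2),
    list(sku = "B205", qty = 1)
  )
)
```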
From- http://tommy.chheng.com/2010/11/03/rmongo-accessing-mongodb-in-r/
http://plindenbaum.blogspot.com/2010/09/connecting-to-mongodb-database-from-r.html
Connecting to a MongoDB database from R using Java
http://nsaunders.wordpress.com/2010/09/24/connecting-to-a-mongodb-database-from-r-using-java/
Also see a nice basic analysis using R Mongo from
http://pseudofish.com/blog/2011/05/25/analysis-of-data-with-mongodb-and-r/
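Drawing on the RMongo package used in the posts above, a hedged sketch of pulling documents into a data frame (the database "analytics" and collection "orders" are hypothetical names):

```r
library(RMongo)

mongo  <- mongoDbConnect("analytics", host = "localhost", port = 27017)
orders <- dbGetQuery(mongo, "orders", "{}")   # an empty JSON query returns documents as a data frame
head(orders)
dbDisconnect(mongo)
```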
For CouchDB
please see https://github.com/wactbprot/R4CouchDB and
http://digitheadslabnotebook.blogspot.com/2010/10/couchdb-and-r.html
2) External Report Creating Software-
Jaspersoft: it has good integration with R and is a certified Revolution Analytics partner (who seem to be the only ones with a coherent #Rstats go-to-market strategy, which begs the question: why does the freest and finest stats software have only ONE such vendor? If it were so great, lots of companies would make exclusive products for it, and some do; see https://rforanalytics.wordpress.com/r-business-solutions/ and https://rforanalytics.wordpress.com/using-r-from-other-software/).
From
http://www.jaspersoft.com/sites/default/files/downloads/events/Analytics%20-Jaspersoft-SEP2010.pdf
we see
http://jasperforge.org/projects/rrevodeployrbyrevolutionanalytics
RevoConnectR for JasperReports Server
RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.
http://jasperforge.org/plugins/esp_frs/optional_download.php?group_id=409
R and BI – Integrating R with Open Source Business Intelligence Platforms Pentaho and Jaspersoft
David Reinke, Steve Miller
Keywords: business intelligence

Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and visualization in the business world. This trend will only escalate as more R analysts transition to business from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes relational databases, data integration (extract, transform, load – ETL), query and reporting, online analytical processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with R analytics revolves around the introduction of advanced statistical modeling and visualizations native to R. The challenge is to seamlessly integrate R capabilities within the existing BI space.

This presentation will explain and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms – Pentaho and Jaspersoft. Our efforts will be successful if we stimulate additional progress, transparency and innovation by combining the R and BI worlds. The demonstration will show how we integrated the OSBI platforms with R through use of RServe and its Java API. The BI platforms provide an end-user web application which includes application security, data provisioning and BI functionality. Our integration will demonstrate a process by which BI components can be created that prompt the user for parameters, acquire data from a relational database, pass it into RServe, invoke R commands for processing, and display the resulting R-generated statistics and/or graphs within the BI platform. Discussion will include concepts related to creating a reusable Java class library of commonly used processes to speed additional development.
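The R side of such an Rserve-based integration is small; a minimal sketch of starting the server that the BI platform's Java client would then talk to (port and arguments are the package defaults, not anything Pentaho- or Jaspersoft-specific):

```r
library(Rserve)
Rserve(args = "--vanilla")   # start the Rserve daemon; it listens on TCP port 6311 by default
# A Pentaho or Jaspersoft component can now connect through the Rserve Java API,
# send R commands, and read back result sets or generated graphics.
```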
If you know Java- try http://ramanareddyg.blog.com/2010/07/03/integrating-r-and-pentaho-data-integration/
and I like this list by two venerable powerhouses of the BI Open Source Movement
http://www.openbi.com/demosarticles.html
Open Source BI as disruptive technology
http://www.openbi.biz/articles/osbi_disruption_openbi.pdf
Open Source Punditry
| TITLE | AUTHOR | COMMENTS |
|---|---|---|
| Commercial Open Source BI Redux | Dave Reinke & Steve Miller | A review and update on the predictions made in our 2007 article, focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability. |
| Open Source BI as Disruptive Technology | Dave Reinke & Steve Miller | Reprint of May 2007 DM Review article explaining how and why Commercial Open Source BI (COSBI) will disrupt the traditional proprietary market. |
| TITLE | AUTHOR | COMMENTS |
|---|---|---|
| R You Ready for Open Source Statistics? | Steve Miller | R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevancy to mainstream BI. |
| R and BI (Part 1): Data Analysis with R | Steve Miller | An introduction to R and its myriad statistical graphing techniques. |
| R and BI (Part 2): A Statistical Look at Detail Data | Steve Miller | The usage of R’s graphical building blocks – dotplots, stripplots and xyplots – to create dashboards which require little ink yet tell a big story. |
| R and BI (Part 3): The Grooming of Box and Whiskers | Steve Miller | Boxplots and variants (e.g. Violin Plot) are explored as an essential graphical technique to summarize data distributions by categories and dimensions of other attributes. |
| R and BI (Part 4): Embellishing Graphs | Steve Miller | Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques. |
| R and BI (Part 5): Predictive Modelling | Steve Miller | An introduction to basic predictive modelling terminology and techniques with graphical examples created using R. |
| R and BI (Part 6): Re-expressing Data | Steve Miller | How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable. |
| The Stock Market, 2007 | Steve Miller | R-based dashboards are presented to demonstrate the return performance of various asset classes during 2007. |
| Bootstrapping for Portfolio Returns: The Practice of Statistical Analysis | Steve Miller | Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R. |
| Statistical Graphs for Portfolio Returns | Steve Miller | Steve uses the R open source stats package to analyze market returns by asset class with some very provocative embedded trellis charts. |
| Frank Harrell, Iowa State and useR!2007 | Steve Miller | In August, Steve attended the 2007 International R User conference (useR!2007). This article details his experiences, including his meeting with long-time R community expert, Frank Harrell. |
| An Open Source Statistical “Dashboard” for Investment Performance | Steve Miller | The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum. |
| Unsexy Graphics for Business Intelligence | Steve Miller | Utilizing Tufte’s philosophy of maximizing the data to ink ratio of graphics, Steve demonstrates the value in dot plot diagramming. The R open source statistical/analytics software is showcased. |
brew: Templating Framework for Report Generation
brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby’s erb module, Java Server Pages, and Python’s psp module. http://bit.ly/jINmaI
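A self-contained sketch of the brew syntax (the template text and toy revenue figures are invented for illustration):

```r
library(brew)

revenue  <- c(120, 340, 560)   # toy data referenced by the template
template <- "Report generated <%= format(Sys.Date()) %>: mean revenue is <%= round(mean(revenue), 2) %>."
brew(text = template)          # prints the filled-in text; pass output = "report.txt" to write a file instead
```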
Also see http://dirk.eddelbuettel.com/blog/2011/01/16/#overbought_oversold_plot
A message from Predictive Analytics World. If you are NY-based, you may want to drop in and listen.

Tom Davenport to Keynote at Predictive Analytics World New York
Take advantage of Super Early Bird Pricing by May 20th and recognize savings of $400. Additional savings when you bring the team*
Join your peers October 17-21, 2011 at the Hilton New York for Predictive Analytics World, the business event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

RAVE REVIEWS:

“I came to PAW because it provides case studies relevant to my industry. It has lived up to the expectation and I think it’s the best analytics conference I’ve ever attended!”
Shaohua Zhang, Senior Data Mining Analyst, Rogers Telecommunications

“Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge predictive techniques and I was able to turn around and apply some of those learnings to my work immediately. I’ve never been able to say that after any conference I’ve attended before!”
Jon Francis, Senior Statistician, T-Mobile

PAW NYC’s agenda covers black box trading, churn modeling, crowdsourcing, demand forecasting, ensemble models, fraud detection, healthcare, insurance applications, law enforcement, litigation, market mix modeling, mobile analytics, online marketing, risk management, social data, supply chain management, targeting direct marketing, uplift modeling (net lift), and other innovative applications that benefit organizations in new and creative ways.

Take advantage of Super Early Bird Pricing and realize savings of $400. Note: Each additional attendee from the same company registered at the same time receives an extra $200 off the Conference Pass.
Please use the following code to get a 15% discount on the 2 Day Conference Pass: AJAYNY11.
Predictive Analytics World Conference –New York City and London, UK
October 17-21, 2011 – New York City, NY (pawcon.com/nyc)
Nov 30 – Dec 1, 2011 – London, UK (pawcon.com/london)
Predictive Analytics World (pawcon.com) is the business-focused event for predictive analytics
professionals, managers and commercial practitioners, covering today’s commercial deployment of
predictive analytics, across industries and across software vendors. The conference delivers case
studies, expertise, and resources to achieve two objectives:
1) Bigger wins: Strengthen the business impact delivered by predictive analytics
2) Broader capabilities: Establish new opportunities with predictive analytics
Case Studies: How the Leading Enterprises Do It
Predictive Analytics World focuses on concrete examples of deployed predictive analytics. The leading
enterprises have signed up to tell their stories, so you can hear from the horse’s mouth precisely how
Fortune 500 analytics competitors and other top practitioners deploy predictive modeling, and what
kind of business impact it delivers.
PAW NEW YORK CITY 2011
PAW’s NYC program is the richest and most diverse yet, featuring over 40 sessions across three tracks
– including both X and Y tracks, and an “Expert/Practitioner” track — so you can witness how predictive
analytics is applied at major companies.
PAW NYC’s agenda covers hot topics and advanced methods such as ensemble models, social data,
search marketing, crowdsourcing, blackbox trading, fraud detection, risk management, survey analysis,
and other innovative applications that benefit organizations in new and creative ways.
WORKSHOPS: PAW NYC also features five full-day pre- and post-conference workshops that
complement the core conference program. Workshop agendas include advanced predictive modeling
methods, hands-on training, an intro to R (the open source analytics system), and enterprise decision
management.
For more see http://www.predictiveanalyticsworld.com/newyork/2011/
PAW LONDON 2011
PAW London’s agenda covers hot topics and advanced methods such as risk management, uplift
(incremental lift) modeling, open source analytics, and crowdsourcing data mining. Case study
presentations cover campaign targeting, churn modeling, next-best-offer, selecting marketing channels,
global analytics deployment, email marketing, HR candidate search, and other innovative applications
that benefit organizations in new and creative ways.
Join PAW and access the best keynotes, sessions, workshops, exposition, expert panel, live demos,
networking coffee breaks, reception, birds-of-a-feather lunches, brand-name enterprise leaders, and
industry heavyweights in the business.
For more see http://www.predictiveanalyticsworld.com/london
CROSS-INDUSTRY APPLICATIONS
Predictive Analytics World is the only conference of its kind, delivering vendor-neutral sessions across
verticals such as banking, financial services, e-commerce, education, government, healthcare, high
technology, insurance, non-profits, publishing, social gaming, retail and telecommunications.
And PAW covers the gamut of commercial applications of predictive analytics, including response
modeling, customer retention with churn modeling, product recommendations, fraud detection, online
marketing optimization, human resource decision-making, law enforcement, sales forecasting, and
credit scoring.
Why bring together such a wide range of endeavors? No matter how you use predictive analytics, the
story is the same: Predictively scoring customers optimizes business performance. Predictive analytics
initiatives across industries leverage the same core predictive modeling technology, share similar project
overhead and data requirements, and face common process challenges and analytical hurdles.
RAVE REVIEWS:
“Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge
predictive techniques and I was able to turn around and apply some of those learnings to my work
immediately. I’ve never been able to say that after any conference I’ve attended before!”
Jon Francis
Senior Statistician
T-Mobile
Read more: Articles and blog entries about PAW can be found at http://www.predictiveanalyticsworld.com/pressroom.php
VENDORS. Meet the vendors and learn about their solutions, software and service. Discover the best
predictive analytics vendors available to serve your needs – learn what they do and see how they
compare.
COLLEAGUES. Mingle, network and hang out with your best and brightest colleagues. Exchange
experiences over lunch, coffee breaks and the conference reception connecting with those professionals
who face the same challenges as you.
GET STARTED. If you’re new to predictive analytics, kicking off a new initiative, or exploring new ways
to position it at your organization, there’s no better place to get your bearings than Predictive Analytics
World. See what other companies are doing, witness vendor demos, participate in discussions with the
experts, network with your colleagues and weigh your options!
For more information:
http://www.predictiveanalyticsworld.com
View videos of PAW Washington DC, Oct 2010 — now available on-demand:
http://www.predictiveanalyticsworld.com/online-video.php
What is predictive analytics? See the Predictive Analytics Guide:
http://www.predictiveanalyticsworld.com/predictive_analytics.php
If you’d like our informative event updates, sign up at:
http://www.predictiveanalyticsworld.com/signup-us.php
To sign up for the PAW group on LinkedIn, see:
http://www.linkedin.com/e/gis/1005097
For inquiries e-mail regsupport@risingmedia.com or call (717) 798-3495.
I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.
Probably someone like Zementis (http://adapasupport.zementis.com/) can use this to further test, enhance or benchmark it on EC2. They did have a joint webinar with Revolution Analytics recently.
https://code.google.com/p/augustus/
See the Recent News page for more details.
Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.
There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.
Augustus is written in Python and is freely available under the GNU General Public License, version 2.
See the page Which version is right for me for more details regarding the different versions.
Predictive Model Markup Language (PMML) is an XML markup language for describing statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
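For context, producing a PMML file from R looks roughly like this, using the pmml and rpart packages rather than Augustus itself (a hedged sketch; the output file name is arbitrary):

```r
library(pmml)    # converts fitted R models into PMML documents
library(rpart)
library(XML)     # provides saveXML()

fit       <- rpart(Species ~ ., data = iris)   # a simple classification tree on built-in data
tree_pmml <- pmml(fit)                         # render the model as PMML
saveXML(tree_pmml, file = "iris_tree.xml")     # any PMML consumer (e.g. a scoring engine) can now read this file
```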
For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.
Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.
It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.
The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.
Typical Use
Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models, Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:
Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specifications of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output, but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data, but users may need to provide their own preprocessing to accommodate their particular data source.
In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.
Post Processing
Augustus can accommodate a post-processing step. While not necessary, it is often useful to
Good news for analytics speakers and listeners. Predictive Analytics Conference is accepting nominations.