Workflows and MyExperiment.org

Here is a great website for sharing workflows – it is called MyExperiment.org and it can also include Work flows from many software.

myExperiment currently has 4742 members270 groups1842 workflows423 files and 173 packs

Could it also include workflow from Red-R from #rstats or Enterprise Miner

Continue reading “Workflows and MyExperiment.org”

#Rstats for Business Intelligence

This is a short list of several known as well as lesser known R ( #rstats) language codes, packages and tricks to build a business intelligence application. It will be slightly Messy (and not Messi) but I hope to refine it someday when the cows come home.

It assumes that BI is basically-

a Database, a Document Database, a Report creation/Dashboard pulling software as well unique R packages for business intelligence.

What is business intelligence?

Seamless dissemination of data in the organization. In short let it flow- from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models- a business intelligence enabled organization allows information to flow easily AND capture insights and feedback for further action.

BI software has lately meant to be just reporting software- and Business Analytics has meant to be primarily predictive analytics. the terms are interchangeable in my opinion -as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure the effect in dashboards and summary reports.

Data Mining- is a bit more than predictive analytics- it includes pattern recognizability as well as black box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science, predictive analytics (if at all) in business departments and statistics, and no one teaches metrics , dashboards, reporting  in mainstream academia even though a large number of graduates will end up fiddling with spreadsheets or dashboards in real careers.

Using R with

1) Databases-

I created a short list of database connectivity with R here at https://rforanalytics.wordpress.com/odbc-databases-for-r/ but R has released 3 new versions since then.

The RODBC package remains the package of choice for connecting to SQL Databases.

http://cran.r-project.org/web/packages/RODBC/RODBC.pdf

Details on creating DSN and connecting to Databases are given at  https://rforanalytics.wordpress.com/odbc-databases-for-r/

For document databases like MongoDB and CouchDB

( what is the difference between traditional RDBMS and NoSQL if you ever need to explain it in a cocktail conversation http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms

Basically dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in keeping transactional safety, often gives you extreme increases in performance

NoSQL is a kind of database that doesn’t have a fixed schema like a traditional RDBMS does. With the NoSQL databases the schema is defined by the developer at run time. They don’t write normal SQL statements against the database, but instead use an API to get the data that they need.

instead relating data in one table to another you store things as key value pairs and there is no database schema, it is handled instead in code.)

I believe any corporation with data driven decision making would need to both have atleast one RDBMS and one NoSQL for unstructured data-Ajay. This is a sweeping generic statement 😉 , and is an opinion on future technologies.

  • Use RMongo

From- http://tommy.chheng.com/2010/11/03/rmongo-accessing-mongodb-in-r/

http://plindenbaum.blogspot.com/2010/09/connecting-to-mongodb-database-from-r.html

Connecting to a MongoDB database from R using Java

http://nsaunders.wordpress.com/2010/09/24/connecting-to-a-mongodb-database-from-r-using-java/

Also see a nice basic analysis using R Mongo from

http://pseudofish.com/blog/2011/05/25/analysis-of-data-with-mongodb-and-r/

For CouchDB

please see https://github.com/wactbprot/R4CouchDB and

http://digitheadslabnotebook.blogspot.com/2010/10/couchdb-and-r.html

  • First install RCurl and RJSONIO. You’ll have to download the tar.gz’s if you’re on a Mac. For the second part, we’ll need to installR4CouchDB,

2) External Report Creating Software-

Jaspersoft- It has good integration with R and is a certified Revolution Analytics partner (who seem to be the only ones with a coherent #Rstats go to market strategy- which begs the question – why is the freest and finest stats software having only ONE vendor- if it was so great lots of companies would make exclusive products for it – (and some do -see https://rforanalytics.wordpress.com/r-business-solutions/ and https://rforanalytics.wordpress.com/using-r-from-other-software/)

From

http://www.jaspersoft.com/sites/default/files/downloads/events/Analytics%20-Jaspersoft-SEP2010.pdf

we see

http://jasperforge.org/projects/rrevodeployrbyrevolutionanalytics

RevoConnectR for JasperReports Server

RevoConnectR for JasperReports Server RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.

http://jasperforge.org/plugins/esp_frs/optional_download.php?group_id=409

 

Using R and Pentaho
Extending Pentaho with R analytics”R” is a popular open source statistical and analytical language that academics and commercial organizations alike have used for years to get maximum insight out of information using advanced analytic techniques. In this twelve-minute video, David Reinke from Pentaho Certified Partner OpenBI provides an overview of R, as well as a demonstration of integration between R and Pentaho.
and from
R and BI – Integrating R with Open Source Business
Intelligence Platforms Pentaho and Jaspersoft
David Reinke, Steve Miller
Keywords: business intelligence
Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and
visualization in the business world. This trend will only escalate as more R analysts transition to business
from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist
with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes
relational databeses, data integration (extract, transform, load – ETL), query and reporting, online analytical
processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with
R analytics revolves on the introduction of advanced statistical modeling and visualizations native to R. The
challenge is to seamlessly integrate R capabilities within the existing BI space. This presentation will explain
and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms –
Pentaho and Jaspersoft. Our efforts will be successful if we stimulate additional progress, transparency and
innovation by combining the R and BI worlds.
The demonstration will show how we integrated the OSBI platforms with R through use of RServe and
its Java API. The BI platforms provide an end user web application which include application security,
data provisioning and BI functionality. Our integration will demonstrate a process by which BI components
can be created that prompt the user for parameters, acquire data from a relational database and pass into
RServer, invoke R commands for processing, and display the resulting R generated statistics and/or graphs
within the BI platform. Discussion will include concepts related to creating a reusable java class library of
commonly used processes to speed additional development.

If you know Java- try http://ramanareddyg.blog.com/2010/07/03/integrating-r-and-pentaho-data-integration/

 

and I like this list by two venerable powerhouses of the BI Open Source Movement

http://www.openbi.com/demosarticles.html

Open Source BI as disruptive technology

http://www.openbi.biz/articles/osbi_disruption_openbi.pdf

Open Source Punditry

TITLE AUTHOR COMMENTS
Commercial Open Source BI Redux Dave Reinke & Steve Miller An review and update on the predictions made in our 2007 article focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability.
Open Source BI as Disruptive Technology Dave Reinke & Steve Miller Reprint of May 2007 DM Review article explaining how and why Commercial Open Source BI (COSBI) will disrupt the traditional proprietary market.

Spotlight on R

TITLE AUTHOR COMMENTS
R You Ready for Open Source Statistics? Steve Miller R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevancy to mainstream BI.
R and BI (Part 1): Data Analysis with R Steve Miller An introduction to R and its myriad statistical graphing techniques.
R and BI (Part 2): A Statistical Look at Detail Data Steve Miller The usage of R’s graphical building blocks – dotplots, stripplots and xyplots – to create dashboards which require little ink yet tell a big story.
R and BI (Part 3): The Grooming of Box and Whiskers Steve Miller Boxplots and variants (e.g. Violin Plot) are explored as an essential graphical technique to summarize data distributions by categories and dimensions of other attributes.
R and BI (Part 4): Embellishing Graphs Steve Miller Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques.
R and BI (Part 5): Predictive Modelling Steve Miller An introduction to basic predictive modelling terminology and techniques with graphical examples created using R.
R and BI (Part 6) :
Re-expressing Data
Steve Miller How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable.
The Stock Market, 2007 Steve Miller R-based dashboards are presented to demonstrate the return performance of various asset classes during 2007.
Bootstrapping for Portfolio Returns: The Practice of Statistical Analysis Steve Miller Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R.
Statistical Graphs for Portfolio Returns Steve Miller Steve uses the R open source stats package to analyze market returns by asset class with some very provocative embedded trellis charts.
Frank Harrell, Iowa State and useR!2007 Steve Miller In August, Steve attended the 2007 Internation R User conference (useR!2007). This article details his experiences, including his meeting with long-time R community expert, Frank Harrell.
An Open Source Statistical “Dashboard” for Investment Performance Steve Miller The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum.
Unsexy Graphics for Business Intelligence Steve Miller Utilizing Tufte’s philosophy of maximizing the data to ink ratio of graphics, Steve demonstrates the value in dot plot diagramming. The R open source statistical/analytics software is showcased.
I think that the report generation package Brew would also qualify as a BI package, but large scale implementation remains to be seen in
a commercial business environment
  • brew: Creating Repetitive Reports
 brew: Templating Framework for Report Generation

brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. http://bit.ly/jINmaI
  • Yarr- creating reports in R
to be continued ( when I have more time and the temperature goes down from 110F in Delhi, India)

Worst Chart Ever- Confusing PIE chart as English Test

THE IELTS is used for testing non native speakers to test if they understand English properly.

Imagine many Indian and Chinese smart engineers answering this question.

From-

http://www.ielts-blog.com/recent-ielts-exams/ielts-test-in-the-uk-spain-march-2010-academic-module/

Writing test

Writing Task 1 (a report)
Three pie charts about young Australians secondary school leavers in years 1980, 1990 and 2000. Each pie showed the proportion of school leavers that continued studying, were employed or unemployed. Write a report to a university lecturer describing the pie charts below.
IELTS Academic Writing Task 1 Pie Charts

Broad Guidelines for Graphs

Here are some broad guidelines for Graphs from EIA.gov , so you can say these are the official graphical guidelines of USA Gov

They can be really useful for sites planning to get into the Tableau Software/NYT /Guardian Infographic mode- or even for communities of blogs that have recurrent needs to display graphical plots- particularly since communication, statistical and design specialists are different areas/expertise/people.

Energy Information Administration Standard

Broad Guidelines for Graphs-I am reproducing an example from EIA ‘s guidelines for graphs-
http://www.eia.gov/about/eia_standards.cfm#Standard25

Energy Information Administration Standard 2009-25

Title: Statistical Graphs
Superseded Version: Standard 2002-25
Purpose: To ensure the utility (usefulness to intended users) and objectivity (accuracy, clarity, completeness, and lack of bias) of energy information presented in statistical graphs.
Applicability: All EIA information products.
Required Actions:

  1. Graphs should be used to show and compare changes, trends and/or relationships, and to assist users in visualizing the conclusions drawn from the data represented.
  2. A graph should contain sufficient Continue reading “Broad Guidelines for Graphs”

Using Color Palettes in R

If you like me, are unable to decide whether blue or brown is a better color for graph- color palettes in R are a big help for aesthetically acceptable alternatives.

Using the same graphs, I choose the 5 main kinds of color palettes, using them is as easy as specifying the col= parameter in graphical display in Base Graphs. And I modified the n parameter for number of colors to be used- you can specify more or less depending how much you want the gradient or difference in colors to be.

> hist(VADeaths,col=heat.colors(7))

> hist(VADeaths,col=terrain.colors(7))

Continue reading “Using Color Palettes in R”

Top ten business analytics graphs Bar Charts (3/10)

Bar Charts and Histograms-Bar Charts are one of the most widely used types of Business Charts. Even the ever popular histograms are  special cases of bar charts (but showing frequencies). Histograms are the not the same as bar charts, they are simply bar charts of frequencies.

Basically a bar chart shows rectangular bars with length proportional to the quantities being described. It helps to see relative quantities between various category types.

The barplot() command is used for making Bar Plots, while hist() is used for histograms. You can also use the plot() command with type=h to create histograms-The official R manual also suggests that Dot plots using dotchart () are a reasonable substitute for bar plots.
A very simple easy to understand tutorial for basic bar plots is at http://msenux.redwoods.edu/math/R/barplot.php

The difference between the three main functions that can be used for these charts are shown below-

> VADeaths
Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

> plot(VADeaths,type=”h”)


> dotchart(VADeaths)

Top Ten Business Analytics Graphs-Line Charts (2/10)

A line chart is one of the most commonly used charts in business analytics and metrics reporting. It basically consists of two variables plotted along the axes with the adjacent points being joined by line segments. Most often used with time series on the x-axis, line charts are simple to understand and use.
Variations on the line graph can include fan charts in time series which include joining line chart of historic data with ranges of future projections. Another common variation is to plot the linear regression or trend line between the two variables  and superimpose it on the graph.
The slope of the line chart shows the rate of change at that particular point , and can also be used to highlight areas of discontinuity or irregular change between two variables.

The basic syntax of line graph is created by first using Plot() function to plot the points and then lines () function to plot the lines between the points.

> str(cars)
‘data.frame’:   50 obs. of  2 variables:
$ speed: num  4 4 7 7 8 9 10 10 10 11 …
$ dist : num  2 10 4 22 16 10 18 26 34 17 …
> plot(cars)
> lines(cars,type=”o”, pch=20, lty=2, col=”green”)
> title(main=”Example Automobiles”, col.main=”blue”, font.main=2)

An example of Time Series Forecasting graph  or fan chart is http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=51