Protected: Converting SAS language code to Java

Interview Michal Kosinski , Concerto Web Based App using #Rstats

Here is an interview with Michal Kosinski , leader of the team that has created Concerto – a web based application using R. What is Concerto? As per http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm

Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source

Ajay- Describe your career in science from high school to this point. What are the various stats platforms you have trained on- and what do you think about their comparative advantages and disadvantages?

Michal- I started with maths, but quickly realized that I prefer social sciences – thus after one year, I switched to a psychology major and obtained my MSc in Social Psychology with a specialization in Consumer Behaviour. At that time I was mostly using SPSS – as it was the only statistical package that was taught to students in my department. Also, it was not too bad for small samples and the rather basic analyses I was performing at that time.

My more recent research performed during my Mphil course in Psychometrics at Cambridge University followed by my current PhD project in social networks and research work at Microsoft Research, requires significantly more powerful tools. Initially, I tried to squeeze as much as possible from SPSS/PASW by mastering the syntax language. SPSS was all I knew, though I reached its limits pretty quickly and was forced to switch to R. It was a pretty dreary experience at the start, switching from an unwieldy but familiar environment into an unwelcoming command line interface, but I’ve quickly realized how empowering and convenient this tool was.

I believe that a course in R should be obligatory for all students that are likely to come close to any data analysis in their careers. It is really empowering – once you got the basics you have the potential to use virtually any method there is, and automate most tasks related to analysing and processing data. It is also free and open-source – so you can use it wherever you work. Finally, it enables you to quickly and seamlessly migrate to other powerful environments such as Matlab, C, or Python.

Ajay- What was the motivation behind building Concerto?

Michal- We deal with a lot of online projects at the Psychometrics Centre – one of them attracted more than 7 million unique participants. We needed a powerful tool that would allow researchers and practitioners to conveniently build and deliver online tests.

Also, our relationships with the website designers and software engineers that worked on developing our tests were rather difficult. We had trouble successfully explaining our needs, each little change was implemented with a delay and at significant cost. Not to mention the difficulties with embedding some more advanced methods (such as adaptive testing) in our tests.

So we created a tool allowing us, psychometricians, to easily develop psychometric tests from scratch an publish them online. And all this without having to hire software developers.

Ajay -Why did you choose R as the background for Concerto? What other languages and platforms did you consider. Apart from Concerto, how else do you utilize R in your center, department and University?

Michal- R was a natural choice as it is open-source, free, and nicely integrates with a server environment. Also, we believe that it is becoming a universal statistical and data processing language in science. We put increasing emphasis on teaching R to our students and we hope that it will replace SPSS/PASW as a default statistical tool for social scientists.

Ajay -What all can Concerto do besides a computer adaptive test?

Michal- We did not plan it initially, but Concerto turned out to be extremely flexible. In a nutshell, it is a web interface to R engine with a built-in MySQL database and easy-to-use developer panel. It can be installed on both Windows and Unix systems and used over the network or locally.

Effectively, it can be used to build any kind of web application that requires a powerful and quickly deployable statistical engine. For instance, I envision an easy to use website (that could look a bit like SPSS) allowing students to analyse their data using a web browser alone (learning the underlying R code simultaneously). Also, the authors of R libraries (or anyone else) could use Concerto to build user-friendly web interfaces to their methods.

Finally, Concerto can be conveniently used to build simple non-adaptive tests and questionnaires. It might seem to be slightly less intuitive at first than popular questionnaire services (such us my favourite Survey Monkey), but has virtually unlimited flexibility when it comes to item format, test flow, feedback options, etc. Also, it’s free.

Ajay- How do you see the cloud computing paradigm growing? Do you think browser based computation is here to stay?

Michal – I believe that cloud infrastructure is the future. Dynamically sharing computational and network resources between online service providers has a great competitive advantage over traditional strategies to deal with network infrastructure. I am sure the security concerns will be resolved soon, finishing the transformation of the network infrastructure as we know it. On the other hand, however, I do not see a reason why client-side (or browser) processing of the information should cease to exist – I rather think that the border between the cloud and personal or local computer will continually dissolve.

About

Michal Kosinski is Director of Operations for The Psychometrics Centre and Leader of the e-Psychometrics Unit. He is also a research advisor to the Online Services and Advertising group at the Microsoft Research Cambridge, and a visiting lecturer at the Department of Mathematics in the University of Namur, Belgium. You can read more about him at http://www.michalkosinski.com/

#rstats -Basic Data Manipulation using R

Continuing my series of basic data manipulation using R. For people knowing analytics and

new to R.

1 Keeping only some variables

Using subset we can keep only the variables we want-

Sitka89 <- subset(Sitka89, select=c(size,Time,treat))

Will keep only the variables we have selected (size,Time,treat).

2 Dropping some variables

Harman23.cor$cov.arm.span <- NULL

This deletes the variable named cov.arm.span in the dataset Harman23.cor

3 Keeping records based on character condition

Titanic.sub1<-subset(Titanic,Sex=="Male")

Note the double equal-to sign

4 Keeping records based on date/time condition

subset(DF, as.Date(Date) >= '2009-09-02' & as.Date(Date) <= '2009-09-04')

5 Converting Date Time Formats into other formats

if the variable dob is “01/04/1977) then following will convert into a date object

z=strptime(dob,”%d/%m/%Y”)

and if the same date is 01Apr1977

z=strptime(dob,"%d%b%Y")

6 Difference in Date Time Values and Using Current Time

The difftime function helps in creating differences in two date time variables.

difftime(time1, time2, units='secs')

or

difftime(time1, time2, tz = "", units = c("auto", "secs", "mins", "hours", "days", "weeks"))

For current system date time values you can use

Sys.time()

Sys.Date()

This value can be put in the difftime function shown above to calculate age or time elapsed.

7 Keeping records based on numerical condition

Titanic.sub1<-subset(Titanic,Freq >37)

For enhanced usage-

you can also use the R Commander GUI with the sub menu Data > Active Dataset

8 Sorting Data

Sorting A Data Frame in Ascending Order by a variable

AggregatedData<- sort(AggregatedData, by=~ Package)

Sorting a Data Frame in Descending Order by a variable

AggregatedData<- sort(AggregatedData, by=~ -Installed)

9 Transforming a Dataset Structure around a single variable

Using the Reshape2 Package we can use melt and acast functions

library("reshape2")

tDat.m<- melt(tDat)

tDatCast<- acast(tDat.m,Subject~Item)

If we choose not to use Reshape package, we can use the default reshape method in R. Please do note this takes longer processing time for bigger datasets.

df.wide <- reshape(df, idvar="Subject", timevar="Item", direction="wide")

10 Type in Data

Using scan() function we can type in data in a list

11 Using Diff for lags and Cum Sum function forCumulative Sums

We can use the diff function to calculate difference between two successive values of a variable.

Diff(Dataset$X)

Cumsum function helps to give cumulative sum

Cumsum(Dataset$X)

> x=rnorm(10,20) #This gives 10 Randomly distributed numbers with Mean 20

> x

[1] 20.76078 19.21374 18.28483 20.18920 21.65696 19.54178 18.90592 20.67585

[9] 20.02222 18.99311

> diff(x)

[1] -1.5470415 -0.9289122 1.9043664 1.4677589 -2.1151783 -0.6358585 1.7699296

[8] -0.6536232 -1.0291181 >

cumsum(x)

[1] 20.76078 39.97453 58.25936 78.44855 100.10551 119.64728 138.55320

[8] 159.22905 179.25128 198.24438

> diff(x,2) # The diff function can be used as diff(x, lag = 1, differences = 1, ...) where differences is the order of differencing

[1] -2.4759536 0.9754542 3.3721252 -0.6474195 -2.7510368 1.1340711 1.1163064

[8] -1.6827413

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

12 Merging Data

Deducer GUI makes it much simpler to merge datasets. The simplest syntax for a merge statement is

totalDataframeZ <- merge(dataframeX,dataframeY,by=c("AccountId","Region"))

13 Aggregating and group processing of a variable

We can use multiple methods for aggregating and by group processing of variables.

Two functions we explore here are aggregate and Tapply.

Refering to the R Online Manual at

[http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html]

## Compute the averages for the variables in 'state.x77', grouped

## according to the region (Northeast, South, North Central, West) that

## each state belongs to

aggregate(state.x77, list(Region = state.region), mean)

Using TApply

## tapply(Summary Variable, Group Variable, Function)

Reference

[http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#tapply]

We can also use specialized packages for data manipulation.

For additional By-group processing you can see the doBy package as well as Plyr package

 for data manipulation.Doby contains a variety of utilities including:

 1) Facilities for groupwise computations of summary statistics and other facilities for working with grouped data.

 2) General linear contrasts and LSMEANS (least-squares-means also known as population means),

 3) HTMLreport for autmatic generation of HTML file from R-script with a minimum of markup, 4) various other utilities and is available at[ http://cran.r-project.org/web/packages/doBy/index.html]

Also Available at [http://cran.r-project.org/web/packages/plyr/index.html],

Plyr is a set of tools that solves a common set of problems:

you need to break a big problem down into manageable pieces,

operate on each pieces and then put all the pieces back together.

 For example, you might want to fit a model to each spatial location or

 time point in your study, summarise data by panels or collapse high-dimensional arrays

 to simpler summary statistics.

Contribution to #Rstats by Revolution

I have been watching for Revolution Analytics product almost since the inception of the company. It has managed to sail over storms, naysayers and critics with simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.

However this is a listing of all technical contributions made by Revolution Analytics products to the #rstats project.

1) Useful Packages mostly in parallel processing or more efficient computing like

foreach (http://cran.r-project.org/web/packages/foreach/index.html) ,
nws (http://cran.r-project.org/web/packages/nws/).
iterators (http://cran.r-project.org/web/packages/iterators/index.html),
doSMP (http://cran.r-project.org/web/packages/doSMP/index.html).
doSNOW (http://cran.r-project.org/web/packages/doSNOW/index.html),
doMC (http://cran.r-project.org/web/packages/doMC/index.html),
revoIPC (http://cran.r-project.org/web/packages/revoIPC/)

2) RevoScaler package to beat R’s memory problem (this is probably the best in my opinion as it is yet to be replicated by the open source version and is a clear cut reason for going in for the paid version)

http://www.revolutionanalytics.com/products/enterprise-big-data.php

Efficient XDF File Format designed to efficiently handle huge data sets.

Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.

Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.

Visualize Large Data sets with line plots and histograms.

Built-in Statistical Algorithms for direct analysis of huge data sets:

Summary Statistics

Linear Regression

Logistic Regression

Crosstabulation

On-the-fly data transformations to include derived variables in models without writing new data files.

Extend Existing Analyses by writing user- defined R functions to “chunk” through huge data sets.

Direct import of fixed-format text data files and SAS data sets into .xdf format

3) RevoDeploy R for API based R solution – I somehow think this feature will get more important as time goes on but it seems a lower visibility offering right now.

http://www.revolutionanalytics.com/products/enterprise-deployment.php

Collection of Web services implemented as a RESTful API.

JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.

.NET Client library — includes a COM interoperability to call R from VBA

Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.

XML and JSON format for data exchange.

Built-in security model for authenticated or anonymous invocation of R Scripts.

Repository for storing R objects and R Script execution artifacts.

4) Revolutions IDE (or Productivity Environment) for a faster coding environment than command line. The GUI by Revolution Analytics is in the works. – Having used this- only the Code Snippets function is a clear differentiator from newer IDE and GUI. The code snippets is awesome though and even someone who doesnt know much R can get analysis set up quite fast and accurately.

http://www.revolutionanalytics.com/products/enterprise-productivity.php

Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.

Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.

Run Selection, Run to Line and Run to Cursor evaluation

R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.

Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.

Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.

Customizable Workspace with dockable, floating, and tabbed tool windows.

Version Control Plug-in available for the open source Subversion version control software.

Marketing contributions from Revolution Analytics-

1) Sponsoring R sessions and user meets

2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft , IBM and others at http://www.revolutionanalytics.com/partners/

3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition Revolution has been proactive in reaching out to the community

4) Helping pioneer blogging about R and Twitter Hash tag discussions , and contributing to Stack Overflow discussions. Within a short while, #rstats online community has overtaken a lot more established names- partly due to decentralized nature of its working.

Did I miss something out? yes , they share their code by GPL.

Let me know by feedback

Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”

so you think u like R, huh. Well it is time to pay it forward.

Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)

———————————————————————————————————-
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT

This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:

Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.

Dear R blogger, you can help The R Programming wikibook by doing the following:

Write to your readers about the project and invite them to join.
Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
First, make a clear indication on your blog that your content is licensed under cc-by-sa copyrights (*see what it means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states that this is the case (what a great opportunity to write to your readers about the project…).
Next, go and add a link, to where all of your R content is located on your site, to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and will also give a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (tal.galili@gmail.com).
If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project in that every time someone will use your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know about you and strengthen your site’s SEO ranking. Which reminds me, if you write about this, I always appreciate a link back to my blog

* Having a cc-by-sa copyrights means that you will agree that anyone may copy, distribute, display, and make derivative works based on your content, only if they give the author (you) the credits in the manner specified by you. And also that the user may distribute derivative works only under a license identical to the license that governs the original work.

———-

Three more points:

1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.

2) I should also mention that the R wiki exists and is open for contribution. And naturally, every thing that will help the R wikibook will help the R wiki as well.

3) Copyright notice: I hereby release all of the writing material content that is categoriesed in the R category page, under the cc-by-sa copyrights (date: 20.06.2011). Now it’s your turn!

———-

List of R bloggers who have joined: (This list will get updated as this “group writing” project will progress)

R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
……………………………………………………………………………….
3) Copyright notice: I hereby release all of the writing material content of this website, under the cc-by-sa copyrights (date: 21.06.2011). Now it’s your turn!

https://decisionstats.com/privacy-3/

Content Licensing-
This website has all content licensed under
http://creativecommons.org/licenses/by-sa/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work

RStudio 3- Making R as simple as possible but no simpler

From the nice shiny blog at http://blog.rstudio.org/, a shiny new upgraded software (and I used the Cobalt theme)–this is nice!

awesome coding!!!

http://www.rstudio.org/download/

Download RStudio v0.94

If you run R on your desktop:

OR

If you run R on a Linux server and want to enable users to remotely access RStudio using a web browser:

RStudio v0.94 — Release Notes

June 15th, 2011

New Features and Enhancements

Source Editor and Console

Run code:
- Run all lines in source file
- Run to current line
- Run from current line
- Redefine current function
- Re-run previous region
- Code is now run line-by-line in the console
Brace, paren, and quote matching
Improved cursor placement after newlines
Support for regex find and replace
Optional syntax highlighting for console input
Press F1 for help on current selection
Function navigation / jump to function
Column and line number display
Manually set/switch document type
New themes: Solarized and Solarized Dark

Plots

Improved image export:
- Formats: PNG, JPEG, TIFF, SVG, BMP, Metafile, and Postscript
- Dynamic resize with preview
- Option to maintain aspect ratio when resizing
- Copy to clipboard as bitmap or metafile
Improved PDF export:
- Specify custom sizes
- Preview before exporting
Remove individual plots from history
Resizable plot zoom window

History

History tab synced to loaded .Rhistory file
New commands:
- Load and save history
- Remove individual items from history
- Clear all history
New options:
- Load history from working directory or global history file
- Save history always or only when saving .RData
- Remove duplicate entries in history
Shortcut keys for inserting into console or source

Packages

Check for package updates
Filter displayed packages
Install multiple packages
Remove packages
New options:
- Install from repository or local archive file
- Target library
- Install dependencies

Miscellaneous

Find text within help topic
Sort file listing by name, type, size, or modified
Set working directory based on source file, files pane, or browsed for directory.
Console titlebar button to view current working directory in files pane
Source file menu command
Replace space and dash with dot (.) in import dataset generated variable names
Add decimal separator preference for import dataset
Added .tar.gz (Linux) and .zip (Windows) distributions for non-admin installs
Read /etc/paths.d on OS X to ensure RStudio has the same path as terminal sessions do
Added manifest to rsession.exe to prevent unwanted program files and registry virtualization

Server

Break PAM auth into its own binary for improved compatibility with 3rd party PAM authorization modules.
Ensure that AppArmor profile is enforced even after reboot
Ability to add custom LD library path for all sessions
Improved R discovery:
- Use which R then fallback to scanning for R script
- Run R discovery unconfined then switch into restricted profile
Default to uncompressed save.image output if the administrator or user hasn’t specified their own options (improved suspend/resume performance)
Ensure all running sessions are automatically updated during server version upgrade
Added verify-installation command to rstudio-server utility for easily capturing configuration and startup related errors

Bug Fixes

Source Editor

Undo to unedited state clears now dirty bit
Extract function now captures free variables used on lhs
Selected variable highlight now visible in all themes
Syncing to source file updates made outside of RStudio now happens immediately at startup and does not cause a scroll to the bottom of the document.
Fixed various issues related to copying and pasting into word processors
Fixed incorrect syntax highlighting issues in .Rd files
Make sure font size for printed source files matches current editor setting
Eliminate conflict with Ctrl+F shortcut key on OS X
Zoomed Google Chrome browser no longer causes cursor position to be off
Don’t prevent opening of unknown file types in the editor

Console

Fixed sporadic missing underscores (and other bottom clipping of text) in console
Make sure console history is never displayed offscreen
Page Up and Page Down now work properly in the console
Substantially improved console performance for both rapid output and large quantities of output

Miscellaneous

Install successfully on Windows with special characters in home directory name
make install more tolerant of configurations where it can’t write into /usr/share
Eliminate spurious stderr output in forked children of multicore package
Ensure that file modified times always update in the files pane after a save
Always default to installing packages into first writeable path of .libPaths()
Ensure that LaTeX log files are always preserved after compilePdf
Fix conflicts with zap function from epicalc package
Eliminate shortcut key conflicts with Ubuntu desktop workspace switching shortcuts
Always prompt when attempting to save files of the same name
Maximized main window now properly restored when reopening RStudio
PAM authorization works correctly even if account has password expiration warning
Correct display of manipulate panel when Plots pane is on the left

Previous Release Notes

RStudio v0.93 — April 11th, 2011

#Rstats for Business Intelligence

This is a short list of several known as well as lesser known R ( #rstats) language codes, packages and tricks to build a business intelligence application. It will be slightly Messy (and not Messi) but I hope to refine it someday when the cows come home.

It assumes that BI is basically-

a Database, a Document Database, a Report creation/Dashboard pulling software as well unique R packages for business intelligence.

What is business intelligence?

Seamless dissemination of data in the organization. In short let it flow- from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models- a business intelligence enabled organization allows information to flow easily AND capture insights and feedback for further action.

BI software has lately meant to be just reporting software- and Business Analytics has meant to be primarily predictive analytics. the terms are interchangeable in my opinion -as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure the effect in dashboards and summary reports.

Data Mining- is a bit more than predictive analytics- it includes pattern recognizability as well as black box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science, predictive analytics (if at all) in business departments and statistics, and no one teaches metrics , dashboards, reporting in mainstream academia even though a large number of graduates will end up fiddling with spreadsheets or dashboards in real careers.

Using R with

1) Databases-

I created a short list of database connectivity with R here at https://rforanalytics.wordpress.com/odbc-databases-for-r/ but R has released 3 new versions since then.

The RODBC package remains the package of choice for connecting to SQL Databases.

http://cran.r-project.org/web/packages/RODBC/RODBC.pdf

Details on creating DSN and connecting to Databases are given at https://rforanalytics.wordpress.com/odbc-databases-for-r/

For document databases like MongoDB and CouchDB

( what is the difference between traditional RDBMS and NoSQL if you ever need to explain it in a cocktail conversation http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms

Basically dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in keeping transactional safety, often gives you extreme increases in performance

NoSQL is a kind of database that doesn’t have a fixed schema like a traditional RDBMS does. With the NoSQL databases the schema is defined by the developer at run time. They don’t write normal SQL statements against the database, but instead use an API to get the data that they need.

instead relating data in one table to another you store things as key value pairs and there is no database schema, it is handled instead in code.)

I believe any corporation with data driven decision making would need to both have atleast one RDBMS and one NoSQL for unstructured data-Ajay. This is a sweeping generic statement 😉 , and is an opinion on future technologies.

Use RMongo

From- http://tommy.chheng.com/2010/11/03/rmongo-accessing-mongodb-in-r/

use the C API for MongoDB to fetch some MongoDB data from R.

http://plindenbaum.blogspot.com/2010/09/connecting-to-mongodb-database-from-r.html

Connecting to a MongoDB database from R using Java

http://nsaunders.wordpress.com/2010/09/24/connecting-to-a-mongodb-database-from-r-using-java/

Also see a nice basic analysis using R Mongo from

http://pseudofish.com/blog/2011/05/25/analysis-of-data-with-mongodb-and-r/

For CouchDB

please see https://github.com/wactbprot/R4CouchDB and

http://digitheadslabnotebook.blogspot.com/2010/10/couchdb-and-r.html

First install RCurl and RJSONIO. You’ll have to download the tar.gz’s if you’re on a Mac. For the second part, we’ll need to installR4CouchDB,

2) External Report Creating Software-

Jaspersoft- It has good integration with R and is a certified Revolution Analytics partner (who seem to be the only ones with a coherent #Rstats go to market strategy- which begs the question – why is the freest and finest stats software having only ONE vendor- if it was so great lots of companies would make exclusive products for it – (and some do -see https://rforanalytics.wordpress.com/r-business-solutions/ and https://rforanalytics.wordpress.com/using-r-from-other-software/)

From

http://www.jaspersoft.com/sites/default/files/downloads/events/Analytics%20-Jaspersoft-SEP2010.pdf

we see

http://jasperforge.org/projects/rrevodeployrbyrevolutionanalytics

RevoConnectR for JasperReports Server

RevoConnectR for JasperReports Server RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.

http://jasperforge.org/plugins/esp_frs/optional_download.php?group_id=409

Using R and Pentaho

http://www.pentaho.com/products/demos/showNtell.php or

http://www.pentaho.com/products/demos/showNtell.php?tab=demos&article=r-project-with-pentaho

Extending Pentaho with R analytics”R” is a popular open source statistical and analytical language that academics and commercial organizations alike have used for years to get maximum insight out of information using advanced analytic techniques. In this twelve-minute video, David Reinke from Pentaho Certified Partner OpenBI provides an overview of R, as well as a demonstration of integration between R and Pentaho.

and from

http://www.r-project.org/conferences/useR-2010/abstracts/Reinke+Miller.pdf

R and BI – Integrating R with Open Source Business
Intelligence Platforms Pentaho and Jaspersoft
David Reinke, Steve Miller
Keywords: business intelligence
Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and
visualization in the business world. This trend will only escalate as more R analysts transition to business
from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist
with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes
relational databeses, data integration (extract, transform, load – ETL), query and reporting, online analytical
processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with
R analytics revolves on the introduction of advanced statistical modeling and visualizations native to R. The
challenge is to seamlessly integrate R capabilities within the existing BI space. This presentation will explain
and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms –
Pentaho and Jaspersoft. Our eﬀorts will be successful if we stimulate additional progress, transparency and
innovation by combining the R and BI worlds.
The demonstration will show how we integrated the OSBI platforms with R through use of RServe and
its Java API. The BI platforms provide an end user web application which include application security,
data provisioning and BI functionality. Our integration will demonstrate a process by which BI components
can be created that prompt the user for parameters, acquire data from a relational database and pass into
RServer, invoke R commands for processing, and display the resulting R generated statistics and/or graphs
within the BI platform. Discussion will include concepts related to creating a reusable java class library of
commonly used processes to speed additional development.

If you know Java- try http://ramanareddyg.blog.com/2010/07/03/integrating-r-and-pentaho-data-integration/

and I like this list by two venerable powerhouses of the BI Open Source Movement

http://www.openbi.com/demosarticles.html

Open Source BI as disruptive technology

http://www.openbi.biz/articles/osbi_disruption_openbi.pdf

Open Source Punditry

TITLE	AUTHOR	COMMENTS
Commercial Open Source BI Redux	Dave Reinke & Steve Miller	An review and update on the predictions made in our 2007 article focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability.
Open Source BI as Disruptive Technology	Dave Reinke & Steve Miller	Reprint of May 2007 DM Review article explaining how and why Commercial Open Source BI (COSBI) will disrupt the traditional proprietary market.

Spotlight on R

TITLE	AUTHOR	COMMENTS
R You Ready for Open Source Statistics?	Steve Miller	R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevancy to mainstream BI.
R and BI (Part 1): Data Analysis with R	Steve Miller	An introduction to R and its myriad statistical graphing techniques.
R and BI (Part 2): A Statistical Look at Detail Data	Steve Miller	The usage of R’s graphical building blocks – dotplots, stripplots and xyplots – to create dashboards which require little ink yet tell a big story.
R and BI (Part 3): The Grooming of Box and Whiskers	Steve Miller	Boxplots and variants (e.g. Violin Plot) are explored as an essential graphical technique to summarize data distributions by categories and dimensions of other attributes.
R and BI (Part 4): Embellishing Graphs	Steve Miller	Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques.
R and BI (Part 5): Predictive Modelling	Steve Miller	An introduction to basic predictive modelling terminology and techniques with graphical examples created using R.
R and BI (Part 6) : Re-expressing Data	Steve Miller	How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable.
The Stock Market, 2007	Steve Miller	R-based dashboards are presented to demonstrate the return performance of various asset classes during 2007.
Bootstrapping for Portfolio Returns: The Practice of Statistical Analysis	Steve Miller	Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R.
Statistical Graphs for Portfolio Returns	Steve Miller	Steve uses the R open source stats package to analyze market returns by asset class with some very provocative embedded trellis charts.
Frank Harrell, Iowa State and useR!2007	Steve Miller	In August, Steve attended the 2007 Internation R User conference (useR!2007). This article details his experiences, including his meeting with long-time R community expert, Frank Harrell.
An Open Source Statistical “Dashboard” for Investment Performance	Steve Miller	The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum.
Unsexy Graphics for Business Intelligence	Steve Miller	Utilizing Tufte’s philosophy of maximizing the data to ink ratio of graphics, Steve demonstrates the value in dot plot diagramming. The R open source statistical/analytics software is showcased.

I think that the report generation package Brew would also qualify as a BI package, but large scale implementation remains to be seen in

a commercial business environment

brew: Creating Repetitive Reports

 brew: Templating Framework for Report Generation

brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. http://bit.ly/jINmaI

http://learnr.wordpress.com/2009/09/09/brew-creating-repetitive-reports/

Yarr- creating reports in R

http://biostatmatt.com/archives/1000

the formidable Dirk with awesome stock reports

http://dirk.eddelbuettel.com/blog/2011/01/16/#overbought_oversold_plot

http://dirk.eddelbuettel.com/blog/code/snippets/sp500obos.png

to be continued ( when I have more time and the temperature goes down from 110F in Delhi, India)

Jaspersoft 4.1 launched (decisionstats.com)
rstat.us – a rival to Twitter? (i-programmer.info)

Please share:

Please share:

Please share:

Please share:

Download RStudio v0.94

If you run R on your desktop:

OR

If you run R on a Linux server and want to enable users to remotely access RStudio using a web browser:

RStudio v0.94 — Release Notes

June 15th, 2011

New Features and Enhancements

Source Editor and Console

Plots

History

Packages

Miscellaneous

Server

Bug Fixes

Source Editor

Console

Miscellaneous

Previous Release Notes

Please share:

Spotlight on R

Related articles

Please share: