Interview Michal Kosinski , Concerto Web Based App using #Rstats
Here is an interview with Michal Kosinski , leader of the team that has created Concerto – a web based application using R. What is Concerto? As per http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm
Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source
Ajay- Describe your career in science from high school to this point. What are the various stats platforms you have trained on- and what do you think about their comparative advantages and disadvantages?
Michal- I started with maths, but quickly realized that I prefer social sciences – thus after one year, I switched to a psychology major and obtained my MSc in Social Psychology with a specialization in Consumer Behaviour. At that time I was mostly using SPSS – as it was the only statistical package that was taught to students in my department. Also, it was not too bad for small samples and the rather basic analyses I was performing at that time.
My more recent research performed during my Mphil course in Psychometrics at Cambridge University followed by my current PhD project in social networks and research work at Microsoft Research, requires significantly more powerful tools. Initially, I tried to squeeze as much as possible from SPSS/PASW by mastering the syntax language. SPSS was all I knew, though I reached its limits pretty quickly and was forced to switch to R. It was a pretty dreary experience at the start, switching from an unwieldy but familiar environment into an unwelcoming command line interface, but I’ve quickly realized how empowering and convenient this tool was.
I believe that a course in R should be obligatory for all students that are likely to come close to any data analysis in their careers. It is really empowering – once you got the basics you have the potential to use virtually any method there is, and automate most tasks related to analysing and processing data. It is also free and open-source – so you can use it wherever you work. Finally, it enables you to quickly and seamlessly migrate to other powerful environments such as Matlab, C, or Python.
Ajay- What was the motivation behind building Concerto?
Michal- We deal with a lot of online projects at the Psychometrics Centre – one of them attracted more than 7 million unique participants. We needed a powerful tool that would allow researchers and practitioners to conveniently build and deliver online tests.
Also, our relationships with the website designers and software engineers that worked on developing our tests were rather difficult. We had trouble successfully explaining our needs, each little change was implemented with a delay and at significant cost. Not to mention the difficulties with embedding some more advanced methods (such as adaptive testing) in our tests.
So we created a tool allowing us, psychometricians, to easily develop psychometric tests from scratch an publish them online. And all this without having to hire software developers.
Ajay -Why did you choose R as the background for Concerto? What other languages and platforms did you consider. Apart from Concerto, how else do you utilize R in your center, department and University?
Michal- R was a natural choice as it is open-source, free, and nicely integrates with a server environment. Also, we believe that it is becoming a universal statistical and data processing language in science. We put increasing emphasis on teaching R to our students and we hope that it will replace SPSS/PASW as a default statistical tool for social scientists.
Ajay -What all can Concerto do besides a computer adaptive test?
Michal- We did not plan it initially, but Concerto turned out to be extremely flexible. In a nutshell, it is a web interface to R engine with a built-in MySQL database and easy-to-use developer panel. It can be installed on both Windows and Unix systems and used over the network or locally.
Effectively, it can be used to build any kind of web application that requires a powerful and quickly deployable statistical engine. For instance, I envision an easy to use website (that could look a bit like SPSS) allowing students to analyse their data using a web browser alone (learning the underlying R code simultaneously). Also, the authors of R libraries (or anyone else) could use Concerto to build user-friendly web interfaces to their methods.
Finally, Concerto can be conveniently used to build simple non-adaptive tests and questionnaires. It might seem to be slightly less intuitive at first than popular questionnaire services (such us my favourite Survey Monkey), but has virtually unlimited flexibility when it comes to item format, test flow, feedback options, etc. Also, it’s free.
Ajay- How do you see the cloud computing paradigm growing? Do you think browser based computation is here to stay?
Michal – I believe that cloud infrastructure is the future. Dynamically sharing computational and network resources between online service providers has a great competitive advantage over traditional strategies to deal with network infrastructure. I am sure the security concerns will be resolved soon, finishing the transformation of the network infrastructure as we know it. On the other hand, however, I do not see a reason why client-side (or browser) processing of the information should cease to exist – I rather think that the border between the cloud and personal or local computer will continually dissolve.
Michal Kosinski is Director of Operations for The Psychometrics Centre and Leader of the e-Psychometrics Unit. He is also a research advisor to the Online Services and Advertising group at the Microsoft Research Cambridge, and a visiting lecturer at the Department of Mathematics in the University of Namur, Belgium. You can read more about him at http://www.michalkosinski.com/
You can read more about Concerto at http://code.google.com/p/concerto-platform/ and http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm
#rstats -Basic Data Manipulation using R
Continuing my series of basic data manipulation using R. For people knowing analytics and
new to R.
1 Keeping only some variables Using subset we can keep only the variables we want- Sitka89 <- subset(Sitka89, select=c(size,Time,treat)) Will keep only the variables we have selected (size,Time,treat). 2 Dropping some variables Harman23.cor$cov.arm.span <- NULL
This deletes the variable named cov.arm.span in the dataset Harman23.cor 3 Keeping records based on character condition Titanic.sub1<-subset(Titanic,Sex=="Male") Note the double equal-to sign
4 Keeping records based on date/time condition subset(DF, as.Date(Date) >= '2009-09-02' & as.Date(Date) <= '2009-09-04') 5 Converting Date Time Formats into other formats if the variable dob is “01/04/1977) then following will convert into a date object z=strptime(dob,”%d/%m/%Y”) and if the same date is 01Apr1977 z=strptime(dob,"%d%b%Y") 6 Difference in Date Time Values and Using Current Time The difftime function helps in creating differences in two date time variables. difftime(time1, time2, units='secs') or difftime(time1, time2, tz = "", units = c("auto", "secs", "mins", "hours", "days", "weeks")) For current system date time values you can use Sys.time() Sys.Date() This value can be put in the difftime function shown above to calculate age or time elapsed. 7 Keeping records based on numerical condition Titanic.sub1<-subset(Titanic,Freq >37) For enhanced usage-
you can also use the R Commander GUI with the sub menu Data > Active Dataset 8 Sorting Data Sorting A Data Frame in Ascending Order by a variable AggregatedData<- sort(AggregatedData, by=~ Package) Sorting a Data Frame in Descending Order by a variable AggregatedData<- sort(AggregatedData, by=~ -Installed) 9 Transforming a Dataset Structure around a single variable Using the Reshape2 Package we can use melt and acast functions library("reshape2") tDat.m<- melt(tDat) tDatCast<- acast(tDat.m,Subject~Item) If we choose not to use Reshape package, we can use the default reshape method in R. Please do note this takes longer processing time for bigger datasets. df.wide <- reshape(df, idvar="Subject", timevar="Item", direction="wide") 10 Type in Data Using scan() function we can type in data in a list 11 Using Diff for lags and Cum Sum function forCumulative Sums We can use the diff function to calculate difference between two successive values of a variable. Diff(Dataset$X) Cumsum function helps to give cumulative sum Cumsum(Dataset$X) > x=rnorm(10,20) #This gives 10 Randomly distributed numbers with Mean 20 > x  20.76078 19.21374 18.28483 20.18920 21.65696 19.54178 18.90592 20.67585  20.02222 18.99311 > diff(x)  -1.5470415 -0.9289122 1.9043664 1.4677589 -2.1151783 -0.6358585 1.7699296  -0.6536232 -1.0291181 > cumsum(x)  20.76078 39.97453 58.25936 78.44855 100.10551 119.64728 138.55320  159.22905 179.25128 198.24438 > diff(x,2) # The diff function can be used as diff(x, lag = 1, differences = 1, ...) where differences is the order of differencing  -2.4759536 0.9754542 3.3721252 -0.6474195 -2.7510368 1.1340711 1.1163064  -1.6827413 Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. 12 Merging Data Deducer GUI makes it much simpler to merge datasets. The simplest syntax for a merge statement is totalDataframeZ <- merge(dataframeX,dataframeY,by=c("AccountId","Region")) 13 Aggregating and group processing of a variable We can use multiple methods for aggregating and by group processing of variables.
Two functions we explore here are aggregate and Tapply. Refering to the R Online Manual at
[http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html] ## Compute the averages for the variables in 'state.x77', grouped ## according to the region (Northeast, South, North Central, West) that ## each state belongs to aggregate(state.x77, list(Region = state.region), mean) Using TApply ## tapply(Summary Variable, Group Variable, Function) Reference [http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm#tapply] We can also use specialized packages for data manipulation. For additional By-group processing you can see the doBy package as well as Plyr package
for data manipulation.Doby contains a variety of utilities including:
1) Facilities for groupwise computations of summary statistics and other facilities for working with grouped data.
2) General linear contrasts and LSMEANS (least-squares-means also known as population means),
3) HTMLreport for autmatic generation of HTML file from R-script with a minimum of markup, 4) various other utilities and is available at[ http://cran.r-project.org/web/packages/doBy/index.html]
Also Available at [http://cran.r-project.org/web/packages/plyr/index.html],
Plyr is a set of tools that solves a common set of problems:
you need to break a big problem down into manageable pieces,
operate on each pieces and then put all the pieces back together.
For example, you might want to fit a model to each spatial location or
time point in your study, summarise data by panels or collapse high-dimensional arrays
to simpler summary statistics.
Contribution to #Rstats by Revolution
I have been watching for Revolution Analytics product almost since the inception of the company. It has managed to sail over storms, naysayers and critics with simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.
However this is a listing of all technical contributions made by Revolution Analytics products to the #rstats project.
1) Useful Packages mostly in parallel processing or more efficient computing like
- foreach (http://cran.r-project.org/web/packages/foreach/index.html) ,
- nws (http://cran.r-project.org/web/packages/nws/).
- iterators (http://cran.r-project.org/web/packages/iterators/index.html),
- doSMP (http://cran.r-project.org/web/packages/doSMP/index.html).
- doSNOW (http://cran.r-project.org/web/packages/doSNOW/index.html),
- doMC (http://cran.r-project.org/web/packages/doMC/index.html),
- revoIPC (http://cran.r-project.org/web/packages/revoIPC/)
2) RevoScaler package to beat R’s memory problem (this is probably the best in my opinion as it is yet to be replicated by the open source version and is a clear cut reason for going in for the paid version)
- Efficient XDF File Format designed to efficiently handle huge data sets.
- Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
- Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
- Visualize Large Data sets with line plots and histograms.
- Built-in Statistical Algorithms for direct analysis of huge data sets:
- Summary Statistics
- Linear Regression
- Logistic Regression
- On-the-fly data transformations to include derived variables in models without writing new data files.
- Extend Existing Analyses by writing user- defined R functions to “chunk” through huge data sets.
- Direct import of fixed-format text data files and SAS data sets into .xdf format
3) RevoDeploy R for API based R solution – I somehow think this feature will get more important as time goes on but it seems a lower visibility offering right now.
- Collection of Web services implemented as a RESTful API.
- .NET Client library — includes a COM interoperability to call R from VBA
- Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.
- XML and JSON format for data exchange.
- Built-in security model for authenticated or anonymous invocation of R Scripts.
- Repository for storing R objects and R Script execution artifacts.
4) Revolutions IDE (or Productivity Environment) for a faster coding environment than command line. The GUI by Revolution Analytics is in the works. – Having used this- only the Code Snippets function is a clear differentiator from newer IDE and GUI. The code snippets is awesome though and even someone who doesnt know much R can get analysis set up quite fast and accurately.
- Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
- Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
- Run Selection, Run to Line and Run to Cursor evaluation
- R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
- Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
- Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
- Customizable Workspace with dockable, floating, and tabbed tool windows.
- Version Control Plug-in available for the open source Subversion version control software.
Marketing contributions from Revolution Analytics-
1) Sponsoring R sessions and user meets
2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft , IBM and others at http://www.revolutionanalytics.com/partners/
3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition Revolution has been proactive in reaching out to the community
4) Helping pioneer blogging about R and Twitter Hash tag discussions , and contributing to Stack Overflow discussions. Within a short while, #rstats online community has overtaken a lot more established names- partly due to decentralized nature of its working.
Did I miss something out? yes , they share their code by GPL.
Let me know by feedback
Calling #Rstats lovers and bloggers – to work together on “The R Programming wikibook”
so you think u like R, huh. Well it is time to pay it forward.
Message from a dear R blogger, Tal G from Tel Aviv (creator of R-bloggers.com and SAS-X.com)
Calling R lovers and bloggers – to work together on “The R Programming wikibook”
Posted: 20 Jun 2011 07:05 AM PDT
This post is a call for both R community members and R-bloggers, to come and help make The R Programming wikibook be amazing:
Dear R community member – please consider giving a visit to The R Programming wikibook. If you wish to contribute your knowledge and editing skills to the project, then you could learn how to write in wiki-markup here, and how to edit a wikibook here (you can even use R syntax highlighting in the wikibook). You could take information into the site from the (soon to be) growing list of available R resources for harvesting.
Dear R blogger, you can help The R Programming wikibook by doing the following:
Write to your readers about the project and invite them to join.
Add your blog’s R content as an available resource for other editors to use for the wikibook. Here is how to do that:
First, make a clear indication on your blog that your content is licensed under cc-by-sa copyrights (*see what it means at the end of the post). You can do this by adding it to the footer of your blog, or by writing a post that clearly states that this is the case (what a great opportunity to write to your readers about the project…).
Next, go and add a link, to where all of your R content is located on your site, to the resource page (also with a link to the license post, if you wrote one). For example, since I write about other things besides R, I would give a link to my R category page, and will also give a link to this post. If you do not know how to add it to the wiki, just e-mail me about it (email@example.com).
If you are an R blogger, besides living up to the spirit of the R community, you will benefit from joining this project in that every time someone will use your content on the wikibook, they will add your post as a resource. In the long run, this is likely to help visitors of the site get to know about you and strengthen your site’s SEO ranking. Which reminds me, if you write about this, I always appreciate a link back to my blog
* Having a cc-by-sa copyrights means that you will agree that anyone may copy, distribute, display, and make derivative works based on your content, only if they give the author (you) the credits in the manner specified by you. And also that the user may distribute derivative works only under a license identical to the license that governs the original work.
Three more points:
1) This post is a result of being contacted by Paul (a.k.a: PAC2), asking if I could help promote “The R Programming wikibook” among R-bloggers and their readers. Paul has made many contributions to the book so far. So thank you Paul for both reaching out and helping all of us with your work on this free open source project.
2) I should also mention that the R wiki exists and is open for contribution. And naturally, every thing that will help the R wikibook will help the R wiki as well.
3) Copyright notice: I hereby release all of the writing material content that is categoriesed in the R category page, under the cc-by-sa copyrights (date: 20.06.2011). Now it’s your turn!
List of R bloggers who have joined: (This list will get updated as this “group writing” project will progress)
R-statistics blog (that’s Tal…)
Decisionstats.com (That’s me)
3) Copyright notice: I hereby release all of the writing material content of this website, under the cc-by-sa copyrights (date: 21.06.2011). Now it’s your turn!
This website has all content licensed under
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
RStudio 3- Making R as simple as possible but no simpler
From the nice shiny blog at http://blog.rstudio.org/, a shiny new upgraded software (and I used the Cobalt theme)–this is nice!
Download RStudio v0.94
If you run R on a Linux server and want to enable users to remotely access RStudio using a web browser:
RStudio v0.94 — Release Notes
June 15th, 2011
New Features and Enhancements
Source Editor and Console
- Run code:
- Run all lines in source file
- Run to current line
- Run from current line
- Redefine current function
- Re-run previous region
- Code is now run line-by-line in the console
- Brace, paren, and quote matching
- Improved cursor placement after newlines
- Support for regex find and replace
- Optional syntax highlighting for console input
- Press F1 for help on current selection
- Function navigation / jump to function
- Column and line number display
- Manually set/switch document type
- New themes: Solarized and Solarized Dark
- Improved image export:
- Formats: PNG, JPEG, TIFF, SVG, BMP, Metafile, and Postscript
- Dynamic resize with preview
- Option to maintain aspect ratio when resizing
- Copy to clipboard as bitmap or metafile
- Improved PDF export:
- Specify custom sizes
- Preview before exporting
- Remove individual plots from history
- Resizable plot zoom window
- History tab synced to loaded .Rhistory file
- New commands:
- Load and save history
- Remove individual items from history
- Clear all history
- New options:
- Load history from working directory or global history file
- Save history always or only when saving .RData
- Remove duplicate entries in history
- Shortcut keys for inserting into console or source
- Check for package updates
- Filter displayed packages
- Install multiple packages
- Remove packages
- New options:
- Install from repository or local archive file
- Target library
- Install dependencies
- Find text within help topic
- Sort file listing by name, type, size, or modified
- Set working directory based on source file, files pane, or browsed for directory.
- Console titlebar button to view current working directory in files pane
- Source file menu command
- Replace space and dash with dot (.) in import dataset generated variable names
- Add decimal separator preference for import dataset
- Added .tar.gz (Linux) and .zip (Windows) distributions for non-admin installs
- Read /etc/paths.d on OS X to ensure RStudio has the same path as terminal sessions do
- Added manifest to rsession.exe to prevent unwanted program files and registry virtualization
- Break PAM auth into its own binary for improved compatibility with 3rd party PAM authorization modules.
- Ensure that AppArmor profile is enforced even after reboot
- Ability to add custom LD library path for all sessions
- Improved R discovery:
- Use which R then fallback to scanning for R script
- Run R discovery unconfined then switch into restricted profile
- Default to uncompressed save.image output if the administrator or user hasn’t specified their own options (improved suspend/resume performance)
- Ensure all running sessions are automatically updated during server version upgrade
- Added verify-installation command to rstudio-server utility for easily capturing configuration and startup related errors
- Undo to unedited state clears now dirty bit
- Extract function now captures free variables used on lhs
- Selected variable highlight now visible in all themes
- Syncing to source file updates made outside of RStudio now happens immediately at startup and does not cause a scroll to the bottom of the document.
- Fixed various issues related to copying and pasting into word processors
- Fixed incorrect syntax highlighting issues in .Rd files
- Make sure font size for printed source files matches current editor setting
- Eliminate conflict with Ctrl+F shortcut key on OS X
- Zoomed Google Chrome browser no longer causes cursor position to be off
- Don’t prevent opening of unknown file types in the editor
- Fixed sporadic missing underscores (and other bottom clipping of text) in console
- Make sure console history is never displayed offscreen
- Page Up and Page Down now work properly in the console
- Substantially improved console performance for both rapid output and large quantities of output
- Install successfully on Windows with special characters in home directory name
- make install more tolerant of configurations where it can’t write into /usr/share
- Eliminate spurious stderr output in forked children of multicore package
- Ensure that file modified times always update in the files pane after a save
- Always default to installing packages into first writeable path of .libPaths()
- Ensure that LaTeX log files are always preserved after compilePdf
- Fix conflicts with zap function from epicalc package
- Eliminate shortcut key conflicts with Ubuntu desktop workspace switching shortcuts
- Always prompt when attempting to save files of the same name
- Maximized main window now properly restored when reopening RStudio
- PAM authorization works correctly even if account has password expiration warning
- Correct display of manipulate panel when Plots pane is on the left
Previous Release Notes
#Rstats for Business Intelligence
This is a short list of several known as well as lesser known R ( #rstats) language codes, packages and tricks to build a business intelligence application. It will be slightly Messy (and not Messi) but I hope to refine it someday when the cows come home.
It assumes that BI is basically-
a Database, a Document Database, a Report creation/Dashboard pulling software as well unique R packages for business intelligence.
What is business intelligence?
Seamless dissemination of data in the organization. In short let it flow- from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models- a business intelligence enabled organization allows information to flow easily AND capture insights and feedback for further action.
BI software has lately meant to be just reporting software- and Business Analytics has meant to be primarily predictive analytics. the terms are interchangeable in my opinion -as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure the effect in dashboards and summary reports.
Data Mining- is a bit more than predictive analytics- it includes pattern recognizability as well as black box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science, predictive analytics (if at all) in business departments and statistics, and no one teaches metrics , dashboards, reporting in mainstream academia even though a large number of graduates will end up fiddling with spreadsheets or dashboards in real careers.
Using R with
I created a short list of database connectivity with R here at https://rforanalytics.wordpress.com/odbc-databases-for-r/ but R has released 3 new versions since then.
The RODBC package remains the package of choice for connecting to SQL Databases.
Details on creating DSN and connecting to Databases are given at https://rforanalytics.wordpress.com/odbc-databases-for-r/
For document databases like MongoDB and CouchDB
( what is the difference between traditional RDBMS and NoSQL if you ever need to explain it in a cocktail conversation http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms
Basically dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in keeping transactional safety, often gives you extreme increases in performance
NoSQL is a kind of database that doesn’t have a fixed schema like a traditional RDBMS does. With the NoSQL databases the schema is defined by the developer at run time. They don’t write normal SQL statements against the database, but instead use an API to get the data that they need.
instead relating data in one table to another you store things as key value pairs and there is no database schema, it is handled instead in code.)
I believe any corporation with data driven decision making would need to both have atleast one RDBMS and one NoSQL for unstructured data-Ajay. This is a sweeping generic statement 😉 , and is an opinion on future technologies.
- Use RMongo
- use the C API for MongoDB to fetch some MongoDB data from R.
Connecting to a MongoDB database from R using Java
Also see a nice basic analysis using R Mongo from
please see https://github.com/wactbprot/R4CouchDB and
2) External Report Creating Software-
Jaspersoft- It has good integration with R and is a certified Revolution Analytics partner (who seem to be the only ones with a coherent #Rstats go to market strategy- which begs the question – why is the freest and finest stats software having only ONE vendor- if it was so great lots of companies would make exclusive products for it – (and some do -see https://rforanalytics.wordpress.com/r-business-solutions/ and https://rforanalytics.wordpress.com/using-r-from-other-software/)
RevoConnectR for JasperReports Server
RevoConnectR for JasperReports Server RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.
R and BI – Integrating R with Open Source Business Intelligence Platforms Pentaho and Jaspersoft David Reinke, Steve Miller Keywords: business intelligence Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and visualization in the business world. This trend will only escalate as more R analysts transition to business from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes relational databeses, data integration (extract, transform, load – ETL), query and reporting, online analytical processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with R analytics revolves on the introduction of advanced statistical modeling and visualizations native to R. The challenge is to seamlessly integrate R capabilities within the existing BI space. This presentation will explain and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms – Pentaho and Jaspersoft. Our eﬀorts will be successful if we stimulate additional progress, transparency and innovation by combining the R and BI worlds. The demonstration will show how we integrated the OSBI platforms with R through use of RServe and its Java API. The BI platforms provide an end user web application which include application security, data provisioning and BI functionality. Our integration will demonstrate a process by which BI components can be created that prompt the user for parameters, acquire data from a relational database and pass into RServer, invoke R commands for processing, and display the resulting R generated statistics and/or graphs within the BI platform. Discussion will include concepts related to creating a reusable java class library of commonly used processes to speed additional development.
If you know Java- try http://ramanareddyg.blog.com/2010/07/03/integrating-r-and-pentaho-data-integration/
and I like this list by two venerable powerhouses of the BI Open Source Movement
Open Source BI as disruptive technology
Open Source Punditry
|Commercial Open Source BI Redux||Dave Reinke & Steve Miller||An review and update on the predictions made in our 2007 article focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability.|
|Open Source BI as Disruptive Technology||Dave Reinke & Steve Miller||Reprint of May 2007 DM Review article explaining how and why Commercial Open Source BI (COSBI) will disrupt the traditional proprietary market.|
Spotlight on R
|R You Ready for Open Source Statistics?||Steve Miller||R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevancy to mainstream BI.|
|R and BI (Part 1): Data Analysis with R||Steve Miller||An introduction to R and its myriad statistical graphing techniques.|
|R and BI (Part 2): A Statistical Look at Detail Data||Steve Miller||The usage of R’s graphical building blocks – dotplots, stripplots and xyplots – to create dashboards which require little ink yet tell a big story.|
|R and BI (Part 3): The Grooming of Box and Whiskers||Steve Miller||Boxplots and variants (e.g. Violin Plot) are explored as an essential graphical technique to summarize data distributions by categories and dimensions of other attributes.|
|R and BI (Part 4): Embellishing Graphs||Steve Miller||Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques.|
|R and BI (Part 5): Predictive Modelling||Steve Miller||An introduction to basic predictive modelling terminology and techniques with graphical examples created using R.|
|R and BI (Part 6) :
|Steve Miller||How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable.|
|The Stock Market, 2007||Steve Miller||R-based dashboards are presented to demonstrate the return performance of various asset classes during 2007.|
|Bootstrapping for Portfolio Returns: The Practice of Statistical Analysis||Steve Miller||Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R.|
|Statistical Graphs for Portfolio Returns||Steve Miller||Steve uses the R open source stats package to analyze market returns by asset class with some very provocative embedded trellis charts.|
|Frank Harrell, Iowa State and useR!2007||Steve Miller||In August, Steve attended the 2007 Internation R User conference (useR!2007). This article details his experiences, including his meeting with long-time R community expert, Frank Harrell.|
|An Open Source Statistical “Dashboard” for Investment Performance||Steve Miller||The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum.|
|Unsexy Graphics for Business Intelligence||Steve Miller||Utilizing Tufte’s philosophy of maximizing the data to ink ratio of graphics, Steve demonstrates the value in dot plot diagramming. The R open source statistical/analytics software is showcased.|
- brew: Creating Repetitive Reports
brew: Templating Framework for Report Generation brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. http://bit.ly/jINmaI
- Yarr- creating reports in R
- the formidable Dirk with awesome stock reports
- Jaspersoft 4.1 launched (decisionstats.com)
- rstat.us – a rival to Twitter? (i-programmer.info)