Interview: Ajay Ohri of Decisionstats.com with DMR

From:

http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research's questions. And all the best to his blog, Decision Stats!

Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

Ajay Ohri (AO): I am a business consultant and writer based out of Delhi, India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies, primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com which now has almost 10,000 views monthly.

All in all, I write about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education (a masters in business and a bachelors in mechanical engineering).

My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
In business, my research interests include software marketing strategies (open source, software as a service, advertising-supported versus traditional licensing) and the creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore in India).

DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

AO: As per my understanding, SAS stands for the SAS language, the SAS Institute, and the SAS software platform. The terms are used interchangeably by people in industry and academia, but there have been some branding issues around this.
I have not worked much with SAS Enterprise Miner, probably because I could not afford it as a business consultant, and the organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, SAS ETS, and JMP. I also worked with SAS BI, but as a user extracting information.
You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt in knowledge discovery and data mining, and pattern analysis. That said, some of my SAS experience is a bit dated, being almost a year old.

I really like specific parts of the SAS platform, such as the interface design of JMP (which is better than Enterprise Guide or Base SAS) and Proc Sort in Base SAS. I guess sequential processing of data makes SAS much faster, though with computing evolving from desktops and servers to even cheaper time-shared cloud computers, I am not sure how long Base SAS and SAS Stat can hold on to this unique selling proposition.

I dislike the clutter in SAS Stat output; it confuses me with too much information. I also dislike the shoddy graphics rendered by SAS's graphical engine. SAS/Graph is shoddy coding work, and if JMP can produce better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

I sometimes think the best part of SAS is actually the code written by Goodnight and Sall in the 1970s; the latest procs don't impress me much.

SAS as a company is something I admire, especially for the way it treats employees globally, but it is strange to see the rest of the tech industry not following it. I also don't like the over-aggression and the "SAS versus the rest of the analytics/data mining world" mentality that I sometimes pick up when I deal with industry thought leaders.

Putting SAS Enterprise Miner, JMP, and Base SAS into a completely new web interface, priced at per-hour rates, is on my wishlist, but I guess I am a bit sentimental here: most data miners I know from the early 2000s did start with SAS as their first bread-earning software. I also think SAS needs better pricing in business intelligence; it seems quite cheap in BI compared to Cognos/IBM, but expensive in analytical licensing.

If you are a new stats or business student, chances are you may know much more R than SAS today. The shift in education, at least, has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

I like a lot of things in R: the graphics, the better data mining packages, the modular design of the software, but above all the can-do, kick-ass spirit of the R community. Lots of young people collaborating with young-to-old professors, and the energy is infectious. Everybody is a CEO in R's world. The latest data mining algorithms will probably appear first in R, as they are published in journals.

Which is better for data mining, SAS or R? It depends on your data and your deadline. The golden rule of management and business is: it depends.

I have also worked a lot with KXEN, SQL, and SPSS.

DMR: Can you tell us more about Decision Stats? You had traffic of 120,000 for 2010. How did you reach such success?

AO: I don't think 120,000 is a success. It's not a failure. It just happened: the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back-linking, and I did some black-hat experimental stuff. Some of it worked, some didn't.

In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank, and honest about their views, while middle managers, public relations, and marketing folks can be defensive.

Social media helped a bit: Twitter, LinkedIn, and Facebook really helped my network of friends, who I suppose acted as informal ambassadors to spread the word.
Again, I was constrained more by necessity than by choice: my middle-class finances (I also had a baby son in 2007; my current laptop still has some broken keys :) ), my inability to afford traveling to conferences, and my location, since Delhi isn't really a tech hub.

The more questions I asked around the internet, the more people responded, and I wrote it all down.

I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.

I tried building other websites but didn't succeed, so I guess I really don't know. I am not a smart coder, nor very clever at writing, but I do try to be honest.

Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.

DMR: There is a rumor about an R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

AO: I just signed a contract with Springer for "R for Business Analytics". R is great software, and there are lots of R books for statistically trained people, but I felt like writing a book for MBAs and existing analytics users on how to easily transition to R for analytics.

Like any language, R has its tricks and tweaks, and with a focus on code editors, IDEs, GUIs, and web interfaces, R's famous learning curve can be bent a bit.

Making analytics beautiful, and simpler to use, is always a passion for me. With 3,000 packages, R can be used for a lot more things, and a lot more simply, than is commonly understood.
The target audience, however, is business analysts and people working in corporate environments.

Brief Bio-
Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on the NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards through cross-sell analytics. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his website www.decisionstats.com and is currently working on open source analytical tools like R, besides analytical software like SPSS and SAS.

How to balance your online advertising and your offline conscience


I recently found an interesting example of a website that both makes a lot of money and yet is much more efficient than any free or non-profit site. It is called Ecosia.

If you want to see a website that balances administrative costs and has a transparent way of making the world better, this is a great example.

  HOW IT WORKS (http://ecosia.org/how.php)

  • You search with Ecosia.
  • Perhaps you click on an interesting sponsored link.
  • The sponsoring company pays Bing or Yahoo for the click.
  • Bing or Yahoo gives the bigger chunk of that money to Ecosia.
  • Ecosia donates at least 80% of this income to support WWF’s work in the Amazon.
  • If you like what we’re doing, help us spread the word!

  Key facts about the park (the Amazon reserve the donations support):

    • World’s largest tropical forest reserve (38,867 square kilometers, or about the size of Switzerland)
    • Home to about 14% of all amphibian species and roughly 54% of all bird species in the Amazon – not to mention large populations of at least eight threatened species, including the jaguar
    • Includes part of the Guiana Shield, containing 25% of the world’s remaining tropical rainforests – 80 to 90% of which are still pristine
    • Holds the last major unpolluted water reserves in the Neotropics, containing approximately 20% of all of the Earth’s water
    • One of the last tropical regions on Earth vastly unaltered by humans
    • Significant contributor to climatic regulation via heat absorption and carbon storage


    http://ecosia.org/statistics.php

    They claim to have donated 141,529.42 EUR!

    http://static.ecosia.org/files/donations.pdf


    Well, suppose you are the web admin of a very popular website like Wikipedia.

    One way to meet server costs is to say openly: hey, I need to balance my costs, so I need some money.

    The other way is to use online advertising.

    I started mine with Google AdSense.

    Cost per mille (CPM) advertising gives you a very low conversion compared to contacting the ad sponsor directly.

    But it’s a great data experiment, as you can monitor:

    • which companies are likely to be advertised on your site (assume Google knows more about their algorithms than you will)
    • which formats (banner, text, or Flash) have what kind of conversion rates
    • what the expected payoff rates are from various keywords or companies (business intelligence software, predictive analytics software, and statistical computing software are similar but have different expected returns, if you remember your econ class)


    NOW: based on the above data, you know your minimum baseline to expect from a private advertiser versus a public, crowd-sourced search-engine one (like Google or Bing).

    Let’s say you have 100,000 views monthly, and assume one out of 1,000 page views leads to a click. Say the advertiser pays you $1 for every click (equivalently, $1 per 1,000 impressions at that click-through rate).

    Then your expected revenue is $100. But if your clicks are priced at $2.50 per click, and your click-through rate is now 3 out of 1,000 impressions (both very moderate increases that can be done by basic placement optimization of ad type, graphics, etc.), your new revenue is $750.
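
    To sanity-check that arithmetic, here is the same calculation as a small R sketch (the rates and prices are the illustrative numbers from above, not real ones):

    views <- 100000        # monthly page views
    ctr1  <- 1 / 1000      # baseline click-through rate (1 click per 1,000 views)
    cpc1  <- 1.00          # baseline price per click, in dollars
    ctr2  <- 3 / 1000      # click-through rate after placement optimization
    cpc2  <- 2.50          # renegotiated price per click, in dollars

    revenue1 <- views * ctr1 * cpc1   # 100 clicks x $1.00 = $100
    revenue2 <- views * ctr2 * cpc2   # 300 clicks x $2.50 = $750
    c(baseline = revenue1, optimized = revenue2)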

    Be a good Samaritan: you decide to share some of this with your audience, like 4 Amazon books per month (or 1 free Amazon book per week). That gives you a cost of $200, and leaves you with some $550.

    Wait! It doesn’t end there. Adam Smith’s invisible hand moves on.

    You say, hmm, let me put $100 toward an annual paper-writing contest of $1,000, donate $200 to One Laptop per Child (or to the Amazon rainforest, or to Haiti, etc.), pay $100 for your upgraded server hosting, and put the remaining $150 into online advertising, say $100 for search engines and $50 for Facebook.

    Woah!

    Month 1 should see more people visiting you for the first time. If you have a good return rate (returning visitors as a %) and a low bounce rate (visits of less than 5 seconds), your traffic should see at least a 20% jump in new arrivals and 5-10% in long-term arrivals. Ignoring bounces, within three months you will have one of the following:

    1) An interesting case study with statistics on online and social media advertising, tangible motivations for increasing community response, and some good data for study.

    2) Hopefully, better cost management of your server expenses.

    3) Very hopefully, a positive cash flow.


    You could even set a percentage and share the monthly (or better, annual) results with your readers and advertisers.

    Go ahead: change the world! The key paradigms here are:

    • sharing your traffic and revenue openly with everyone
    • donating to a suitable cause
    • helping increase awareness of that cause
    • basing the amounts on fixed percentages rather than absolute numbers, to ensure your site and the cause are sustained for years

    R is Ready for Business™

    A new 5-page brochure from Revolution Analytics. Not that slick, and some marketing under-kill (which frankly is a surprise), but I guess Revolution Analytics does not have a full-time graphics designer to help with its collateral.

    Take a look if you are curious how and why R is getting more and more ready for business.

    Using SAS/IML with R


    SAS just released updated documentation for the SAS/IML language, with a special chapter devoted to using R.

    Here is an example (the first argument is the SAS/IML matrix; the second names the R matrix or expression, as a string):

    CALL ExportMatrixToR( IMLMatrix, "RMatrix" );   /* copy a SAS/IML matrix into R */

    CALL ImportMatrixFromR( IMLMatrix, "RExpr" );   /* copy the value of an R expression back into SAS/IML */

    If you have existing SAS licenses, existing hardware, and lots of data, this may be the best of both worlds, without getting into the mess of technically learning MKL threads/BLAS/premium packages/cloud setups.

    Another thought: it’s a good, professional-looking help document, which is something more R packages could do (work on improving the ease of their help, and update vignettes).


    Link: http://support.sas.com/documentation/cdl/en/imlug/63541/HTML/default/viewer.htm#r_toc.htm (see the chapter “Calling Functions in the R Language”)


    Nice BI Tutorials


    Here is a set of very nice, screenshot-enabled tutorials for SAP BI. They are a bit outdated (3 years old), but most of the content is quite relevant, especially from a tutorial design perspective.

    Most people would rather see screenshot-based, step-by-step PowerPoints than cluttered or clever presentations, or even videos that force you to sit like a TV zombie. Unfortunately, most tutorial presentations I see, especially for BI, are either slides with one or two points that abruptly shift to “concepts”, or videos that are at least 10 minutes long. That works fine for scripting tutorials or hands-on workshops, but cannot be reproduced for later study.

    The medium for tutorials, especially for GUI software, can vary: Slideshare, Scribd, Google Presentations, or Microsoft PowerPoint. But a step-by-step, screenshot-by-screenshot tutorial is much better for understanding than command-line jargon, YouTube video presentations, or PowerPoints with bullet points.

    Have a look at these SAP BI 7 slideshares.

    Speaking of BI, the R package called brew is going to brew up something special, especially combined with rApache. However, I wish rApache, Rweb, or Rserve had step-by-step install screenshot tutorials to increase their usage in business intelligence.
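
    For the curious, here is a minimal sketch of what a brew template looks like (the file names are hypothetical; brew evaluates R code placed between <% %> delimiters inside an ordinary text file):

    library(brew)

    # report.brew -- a text template with embedded R, for example:
    #   <html><body>
    #   <h1>Traffic report generated <%= format(Sys.Date()) %></h1>
    #   <% for (m in month.name[1:3]) { %>
    #     <p>Views in <%= m %>: <%= round(runif(1, 1000, 2000)) %></p>
    #   <% } %>
    #   </body></html>

    brew("report.brew", "report.html")   # evaluate the template and write the HTML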

    I tried searching for JMP GUI tutorials too, but I believe putting all your content behind a registration wall is not so great. Do a Pareto analysis of your training material; surely you can share a couple more tutorials without registration. It would also help new would-be migrating users get a feel for the installation complexities as well as the final report GUI.


    Cloud Computing with R


    Here is a short list of resources and material I put together as starting points for R and cloud computing. It’s a bit messy, but overall it should serve quite comprehensively.

    Cloud computing is a commonly used expression for a generational change in computing, from desktops and servers to remote, massive, shared computing connections enabled by high bandwidth across the internet.

    As per the National Institute of Standards and Technology Definition,
    Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

    (Citation: The NIST Definition of Cloud Computing

    Authors: Peter Mell and Tim Grance
    Version 15, 10-7-09
    National Institute of Standards and Technology, Information Technology Laboratory
    http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

    R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

    From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

    R Web Interfaces

    Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
    The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

    Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

    Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTML author to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

    Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

    CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

    David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page at http://www.omegahat.org/CGIwithR/.

    Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
    Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

    Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

    OpenStatServer is being developed by a team led by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

    Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

    webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

    Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

    Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTML with embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package with Rapache with the advantage of cross-platform support and easy installation.

    Also see these additional R cloud computing use cases:
    http://wwwdev.ebi.ac.uk/Tools/rcloud/

    ArrayExpress R/Bioconductor Workbench

    Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

    Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

    • Most up-to-date R version built for multicore CPUs
    • Access to all Bioconductor packages
    • Access to our computing infrastructure
    • Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

    Using R with Google Docs
    http://www.omegahat.org/RGoogleDocs/run.pdf
    It uses the XML and RCurl packages and illustrates that it is relatively quick and easy to use their primitives to interact with web services.
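
    To give a flavor of those primitives (this is not the RGoogleDocs API itself; the URL below is a hypothetical stand-in for any web service endpoint), fetching and parsing a web resource from R looks roughly like this:

    library(RCurl)   # HTTP primitives
    library(XML)     # XML parsing

    raw <- getURL("http://www.example.com/data.xml")   # hypothetical endpoint
    doc <- xmlParse(raw, asText = TRUE)                # parse the response
    ids <- xpathSApply(doc, "//record/@id")            # extract values with an XPath query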

    Using R with Amazon
    Citation
    http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

    Amazon’s EC2 is a type of cloud that provides on-demand computing infrastructures called Amazon Machine Images, or AMIs. In general, these types of cloud provide several benefits:

    • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
    • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
    • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
    • Pay per use. The cost of 1 AMI for 100 hours and 100 AMIs for 1 hour is the same. With pay-per-use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.

    Connecting to R on Amazon EC2- Detailed tutorials
    Ubuntu Linux version
    https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
    and Windows R version
    https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

    Connecting R to Data on Google Storage and Computing on Google Prediction API
    https://github.com/onertipaday/predictionapirwrapper
    R wrapper for working with Google Prediction API

    This package consists of a bunch of functions allowing the user to test the Google Prediction API from R.
    It requires the user to have access to both Google Storage for Developers and the Google Prediction API; see http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

    Example usage:

    # This example requires that you have previously created a bucket named data_language on your Google Storage and uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
    library(predictionapirwrapper)

    and Elastic-R for cloud computing
    http://user2010.org/tutorials/Chine.html

    Abstract

    Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
    Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

    R for Salesforce.com

    At the time of writing, there seem to be zero R-based apps on Salesforce.com. This could be a big opportunity for developers, as Apex and R have similar structures: developers could write free code in R and charge for a translated version in Apex on Salesforce.com.

    Force.com and Salesforce have many (1,009) apps at http://sites.force.com/appexchange/home for cloud computing for businesses, but very few forecasting and statistical simulation apps.

    An example of a Monte Carlo based app is here: http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

    These are like iPhone apps, except meant for business purposes. (I am unaware of any university offering Salesforce.com integration, though Google Apps and Amazon related research does seem to be going on.)

    Force.com uses a language called Apex; see http://wiki.developerforce.com/index.php/App_Logic and http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas. Apex is similar to R in that it is object-oriented.
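
    For comparison, here is a minimal sketch of object orientation in R, using S4 classes (the Account class is purely illustrative):

    # define a class with typed slots
    setClass("Account", representation(owner = "character", balance = "numeric"))

    # a generic function plus a method dispatched on the class
    setGeneric("deposit", function(acct, amount) standardGeneric("deposit"))
    setMethod("deposit", "Account", function(acct, amount) {
      acct@balance <- acct@balance + amount
      acct
    })

    a <- new("Account", owner = "ajay", balance = 100)
    a <- deposit(a, 50)   # a@balance is now 150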

    SAS Institute has an existing product for taking in Salesforce.com data.

    A new SAS data surveyor is available to access data from the Customer Relationship Management (CRM) software vendor Salesforce.com: http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm

    Personal note: mentioning SAS in an email to an R list is a big no-no in terms of getting a response and love. The same goes for being careless about which R help list you email (R-devel versus R-packages versus R-help).

    For a Python-based cloud, see http://pi-cloud.com

    R Apache – The next frontier of R Computing

    I am currently playing with and trying out rApache, one more excellent R product from Vanderbilt’s Department of Biostatistics and its prodigious coder Jeff Horner.

    The big ninja himself

    I really liked the virtual machine idea: you can download a virtual image of rApache and play with it. A .vmx is easy to create and great to share.

    http://rapache.net/vm.html

    Basically, using rApache (with EC2 on the backend) can help you create customized dashboards, BI apps, and more, all using R’s graphical and statistical capabilities.

    What’s R Apache?

    As per http://biostat.mc.vanderbilt.edu/wiki/Main/RapacheWebServicesReport:

    Rapache embeds the R interpreter inside the Apache 2 web server. By doing this, Rapache realizes the full potential of R and its facilities over the web. R programmers configure Apache by mapping Uniform Resource Locators (URLs) to either R scripts or R functions. The R code relies on CGI variables to read a client request and on R’s input/output facilities to write the response.

    One advantage of Rapache’s architecture is robust multi-process management by Apache. In contrast to Rserve and RSOAP, Rapache is a pre-fork server using HTTP as the communications protocol. Another advantage is a clear separation, a loose coupling, of R code from client code. With Rserve and RSOAP, the client must send data and R commands to be executed on the server. With Rapache, the only client requirement is the ability to communicate via HTTP. Additionally, Rapache gains significant authentication, authorization, and encryption mechanisms by virtue of being embedded in Apache.
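
    To make the URL-to-R mapping concrete, here is a minimal handler sketch (the directives follow the rApache documentation as I understand it; the /hello location and file paths are hypothetical):

    # In the Apache config, map a URL to an R script (assumption, per rApache docs):
    #   <Location /hello>
    #      SetHandler r-handler
    #      RFileHandler /var/www/R/hello.R
    #   </Location>

    # /var/www/R/hello.R -- evaluated by rApache on each request
    setContentType("text/html")   # rApache helper, like setHeader and setCookie
    cat("<html><body><h1>Hello from R at ",
        format(Sys.time()), "</h1></body></html>", sep = "")
    DONE                          # rApache status constant ending the request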

    Existing demos of architecture based on rApache:

    1. http://rweb.stat.ucla.edu/ggplot2/ An interactive web dashboard for plotting graphics based on CSV or Google Spreadsheet data
    2. http://labs.dataspora.com/gameday/ A demo visualization of a web-based dashboard of baseball pitches, by pitcher and player


    3. http://data.vanderbilt.edu/rapache/bbplot For baseball results – a demo of a query-based web dashboard system with a very good BI feel.

    What’s coming next in rApache?

    You can download version 1.1.10 of rApache now. There are only two significant changes, and you don’t have to edit your Apache config or change any code (just recompile rApache and reinstall):

    1) Error reporting should be more informative, both when you accidentally introduce errors in the Apache config and when your code introduces warnings and errors from web requests.

    I’ve struggled with this one for a while, not really knowing what strategy would be best. Basically, rApache hooks into the R I/O layer at such a low level that it’s hard to capture all warnings and errors as they occur and present them to the user in a sane manner. In prior releases, when ROutputErrors was in effect (either the Apache directive or the R function), one would typically see a bunch of grey boxes with a red outline, titled RApache Warning/Error!!!. Unfortunately, those grey boxes could contain empty lines, one line of error, or a few lines relating to previously displayed boxes. Really a big uninformative mess.

    The new approach is to print just one warning box with the title “Oops!!! <b>rApache</b> has something to tell you. View source and read the HTML comments at the end.” and then, as the title implies, you can read the HTML comment located at the end of the file, after the closing html tag. That way, you’re actually reading the warnings and errors as R would present them to you if you had executed the code at the R command prompt. And if you don’t use ROutputErrors, the warning/error messages are printed in the Apache log file, just as they were before, but nicer 😉

    2) Code dispatching has changed, so please let me know if I’ve introduced any strange behavior.

    This was necessary to enhance error reporting. Prior to this release, rApache would use R’s C API exclusively to build up the call to your code that is then passed to R’s evaluation engine. The advantage of this approach is that it’s much more efficient, as there is no parsing involved; however, all information about parse errors, files which produced errors, etc. was lost. The new approach uses R’s built-in parse function to build up the call and then passes it off to R. A slight overhead, but it should be negligible. So, if you feel this approach is too slow, or that I’ve introduced bugs or strange behavior, please let me know.

    FUTURE PLANS

    I’m gaining more experience building Debian/Ubuntu packages each day, so hopefully by some time in 2011 you can rely on binary releases for these distributions and not have to install rApache from source! Fingers crossed!

    Development on the rApache 1.1 branch will be winding down (save bug-fix releases) as I transition to the 1.2 branch. This will involve taking out a small chunk of code that defines the rApache development environment (all the CGI variables and functions such as setHeader, setCookie, etc.) and placing it in its own R package, unnamed as of yet. This is to facilitate my development of the ralite R package, a small single-user cross-platform web server.

    The goal for ralite is to speed up development of R web applications and take a bit of friction out of the development process by not having to run the full rApache server. Plus, it would allow users to develop in the rApache environment while on Windows and later deploy on more capable server environments. The secondary goal for ralite is its use in other web server environments (nginx and IIS come to mind) as a persistent per-client process.

    And finally, wiki.rapache.net will be the new www.rapache.net once I translate the manual over… any day now.

    From: http://biostat.mc.vanderbilt.edu/wiki/Main/JeffreyHorner


    Not convinced? Try the demos above.