Ways to use both Windows and Linux together


Some ways to use both Windows and Linux on the same computer

1) Wubi

http://wubi.sourceforge.net/

Wubi installs Ubuntu from within Windows and only adds an extra option to boot into Ubuntu. It does not require you to modify the partitions of your PC or to use a different bootloader, and it does not install special drivers.

2) Wine

Wine lets you run Windows software on other operating systems. With Wine, you can install and run these applications just like you would in Windows. Read more at http://wiki.winehq.org/Debunking_Wine_Myths

http://www.winehq.org/about/

3) Cygwin

http://www.cygwin.com/

Cygwin is a Linux-like environment for Windows. It consists of two parts:

  • A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing substantial Linux API functionality.
  • A collection of tools which provide a Linux look and feel.

What Isn’t Cygwin?

  • Cygwin is not a way to run native Linux apps on Windows. You have to rebuild your application from source if you want it to run on Windows.
  • Cygwin is not a way to magically make native Windows apps aware of UNIX® functionality, like signals, ptys, etc. Again, you need to build your apps from source if you want to take advantage of Cygwin functionality.

4) VMware Player

https://www.vmware.com/products/player/

VMware Player is the easiest way to run multiple operating systems at the same time on your PC. With its user-friendly interface, VMware Player makes it effortless for anyone to try out Windows 7, Chrome OS or the latest Linux releases, or create isolated virtual machines to safely test new software and surf the Web.

    Choosing R for business – What to consider?


    Additional features in R over other analytical packages-

    1) Source code is provided, which allows complete customization and embedding for a particular application. Open source code has the advantage of being extensively peer-reviewed in journals and scientific literature. This means bugs will be found, shared and corrected transparently.

    2) Wide literature of training material in the form of books is available for the R analytical platform.

    3) Among the best data visualization tools in analytical software (apart from the latest version of Tableau Software). The data visualization available in R takes the form of a variety of customizable graphs, as well as animation. The principal reason third-party software initially started creating interfaces to R is that the graphical library of packages in R is more advanced and is rapidly gaining more features.

    4) Free in upfront license cost for academics and thus budget friendly for small and large analytical teams.

    5) Flexible programming for your data environment. This includes having packages that ensure compatibility with Java, Python and C++.

     

    6) Easy migration from other analytical platforms to R Platform. It is relatively easy for a non R platform user to migrate to R platform and there is no danger of vendor lock-in due to the GPL nature of source code and open community.

    Statistics are numbers that tell (descriptive), advise (prescriptive) or forecast (predictive). Analytics is a decision-making aid. Analysis on which no decision is to be made or is being considered can be classified as purely statistical and non-analytical. Thus the ease of making a correct decision separates a good analytical platform from a not so good one. The distinction is likely to be disputed by people of either background, but business analysis requires more emphasis on how practical or actionable the results are and less emphasis on the statistical metrics in a particular data analysis task. I believe one clear reason business analytics differs from statistical analysis is the cost of perfect information (data costs in the real world) and the opportunity cost of delayed and distorted decision-making.

    Specific to the following domains R has the following costs and benefits

    • Business Analytics
      • R is free per license and for download
      • It is one of the few analytical platforms that work on Mac OS
      • Its results are credibly established both in journals like the Journal of Statistical Software and in the work of the analytical teams at LinkedIn, Google and Facebook
      • It has open source code for customization as per the GPL
      • It also has flexible options from commercial vendors like Revolution Analytics (who support 64 bit Windows as well as bigger datasets)
      • It has interfaces from almost all other analytical software, including SAS, SPSS, JMP, Oracle Data Mining and RapidMiner. Existing license holders can thus invoke and use R from within these software packages
      • Huge library of packages for regression, time series, finance and modeling
      • High quality data visualization packages
    • Data Mining
      • R as a computing platform is better suited to the needs of data mining as it has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning and neural networks, as well as exotic specialized algorithms like those based on chaos models
      • Flexibility in tweaking a standard algorithm by seeing the source code
      • The Rattle GUI remains the standard GUI for data miners using R. It was created and developed in Australia
    • Business Dashboards and Reporting
      • Business dashboards and reporting are an essential piece of business intelligence and decision-making systems in organizations. R offers data visualization through ggplot2, and GUIs like Deducer and Red-R can help even non-R users create a metrics dashboard
      • For online dashboards, R has packages like Rweb, Rserve and rApache, which in combination with data visualization packages offer powerful dashboard capabilities
      • R can be combined with MS Excel using the RExcel package, which brings R capabilities into Excel. Thus an MS Excel user with no knowledge of R can use the GUI within the RExcel plug-in to use powerful graphical and statistical capabilities

    Additional factors to consider in your R installation-

    There are some more choices awaiting you now-
    1) Licensing Choices-Academic Version or Free Version or Enterprise Version of R

    2) Operating System Choices-Which operating system to choose? Unix, Windows or Mac OS.

    3) Operating system sub choice- 32 bit or 64 bit.

    4) Hardware choices-Cost-benefit trade-offs for additional hardware for R. Choices between local, cluster and cloud computing.

    5) Interface choices-Command Line versus GUI? Which GUI to choose as the default start-up option?

    6) Software component choice- Which packages to install? There are almost 3000 packages; some of them are complementary, some are dependent on each other, and almost all are free.

    7) Additional Software choices- Which additional software do you need to achieve maximum accuracy, robustness and speed of computing- and how to use existing legacy software and hardware for best complementary results with R.

    1) Licensing Choices-
    You can choose between two kinds of R installations. One is free and open source, from http://r-project.org. The other is commercial and is offered by several vendors, including Revolution Analytics.

    Commercial Vendors of R Language Products-
    1) Revolution Analytics http://www.revolutionanalytics.com/
    2) XL Solutions- http://www.experience-rplus.com/
    3) Information Builder – Webfocus RStat -Rattle GUI http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html
    4) Blue Reference- Inference for R http://inferenceforr.com/default.aspx

    2) Choosing Operating System
        1. Windows

     

    Windows remains the most widely used operating system on this planet. If you are experienced in Windows based computing and are active on analytical projects, it would not make sense for you to move to other operating systems. This is also because compatibility problems are minimal for Microsoft Windows and help is extensively documented. However, there may be some R packages that do not function well under Windows; if that happens, a multiple operating system setup is your next option.

          1. Enterprise R from Revolution Analytics- This has a complete R development environment for Windows, including code snippets to make programming faster. Revolution is also expected to make a GUI available by 2011. Revolution Analytics claims several enhancements for its version of R, including the use of optimized libraries for faster performance.
        2. MacOS

     

    The main reason for choosing MacOS remains its considerable appeal in aesthetically designed software, but MacOS is not a standard operating system for enterprise systems or for statistical computing. However, open source R claims to be quite optimized for it and can be used by existing Mac users. There seem to be no commercial versions of R available for this operating system as of now.

        3. Linux

     

          1. Ubuntu
          2. Red Hat Enterprise Linux
          3. Other versions of Linux

     

    Linux is considered a preferred operating system by R users because it has the same open source credentials, is a much better fit for all R packages, and is customizable for big data analytics.

    Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had a marketing agreement with Revolution Analytics for an earlier version of Ubuntu, and many R packages can be installed in a straightforward way because Ubuntu/Debian packages are available. Red Hat Enterprise Linux is officially supported by Revolution Analytics for its enterprise module. Another popular version of Linux is openSUSE.

        4. Multiple operating systems-
          1. Virtualization vs Dual Boot-

     

    You can also choose between running a virtual machine (for example with VMware Player) that is dedicated to R based computing, or choosing the operating system at startup (dual boot). A software program called Wubi helps with installing Linux alongside Windows.

    3) 64 bit vs 32 bit - Given a choice between 32 bit and 64 bit versions of the same operating system, like Ubuntu Linux, the 64 bit version can address far more memory (a 32 bit system is limited to about 4 GB of address space), which matters because R holds data in RAM. It may also speed up processing; the estimate used here is an approximate factor of 2, though the actual gain depends on the workload. However, you need to check whether your current hardware can support 64 bit operating systems, and if so you may want to ask your Information Technology manager to upgrade at least some operating systems in your analytics work environment to 64 bit.
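One quick way to check what you are currently running is a few lines of script. The sketch below uses Python only as a neutral scripting language; the function names are made up for illustration, and note that it reports the width of the running interpreter, which can be 32 bit even on a 64 bit operating system.

```python
import platform
import struct


def pointer_width_bits() -> int:
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


def describe_platform() -> dict:
    """Collect basic facts about the operating system and hardware."""
    return {
        "system": platform.system(),    # for example "Linux" or "Windows"
        "machine": platform.machine(),  # for example "x86_64" or "i686"
        "interpreter_bits": pointer_width_bits(),
        # A 32 bit address space tops out at 2**32 bytes, that is 4 GiB.
        "max_addressable_gib_32bit": 2 ** 32 / 2 ** 30,
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

If the machine field shows a 64 bit architecture but the interpreter is 32 bit, the hardware can probably take a 64 bit operating system, but that is worth confirming with your IT team.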

     

    4) Hardware choices- At the time of writing this book, the dominant computing paradigm is workstation computing, followed by server-client computing. However, with the introduction of cloud computing, netbooks and tablet PCs, hardware choices are much more flexible in 2011 than just a couple of years back.

    Hardware is a significant cost in an analytics environment and also depreciates remarkably over a short period of time. You may thus examine your legacy hardware and your future analytical computing needs, and accordingly decide between the various hardware options available for R.
    Other analytical software can charge by number of processors, price server versions higher than workstation versions, and price grid computing extremely high where it is available at all. R, by contrast, is well suited to all kinds of hardware environments with flexible costs. R is memory intensive (it limits the size of data analyzed to the RAM size of the machine unless special formats and/or chunking are used), so hardware needs depend on the size of the datasets used and the number of concurrent users analyzing them. Thus the defining issue is not R but the size of the data being analyzed.
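The RAM constraint can be turned into a rough sizing check before buying hardware. This is a minimal sketch in Python (used only as a neutral scripting language), assuming an all-numeric table stored as 8-byte doubles, which is how R stores numeric values. The function names and the working_copies multiplier of 3 are assumptions for illustration, not R constants.

```python
BYTES_PER_DOUBLE = 8  # R stores numeric (double) values in 8 bytes each


def estimated_size_gib(rows: int, numeric_columns: int) -> float:
    """Rough in-memory size of an all-numeric table, ignoring object overhead."""
    return rows * numeric_columns * BYTES_PER_DOUBLE / 2 ** 30


def fits_in_ram(rows: int, numeric_columns: int, ram_gib: float,
                working_copies: int = 3) -> bool:
    """Check whether the data plus a few working copies fit in RAM.

    working_copies is an assumed rule-of-thumb multiplier to leave room
    for the temporary copies an analysis tends to create.
    """
    needed = estimated_size_gib(rows, numeric_columns) * working_copies
    return needed <= ram_gib


if __name__ == "__main__":
    size = estimated_size_gib(10_000_000, 20)
    print(f"10 million rows x 20 columns is about {size:.2f} GiB")
    print("Fits in 8 GiB:", fits_in_ram(10_000_000, 20, ram_gib=8))
```

Character and factor columns behave differently, so treat the result as a starting estimate rather than a guarantee.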

      1. Local Computing- This denotes software installed locally. For big data, the data to be analyzed would be stored in databases.
        1. Server version- Revolution Analytics has differential pricing for server-client versions, but the open source version is free and the same for server or workstation use.
        2. Workstation
      2. Cloud Computing- Cloud computing is defined as the delivery of data, processing and systems via remote computers. It is similar to server-client computing, but the remote server (also called the cloud) has flexible computing in terms of number of processors, memory and data storage. Cloud computing in the form of a public cloud enables people to do analytical tasks on massive datasets without investing in permanent hardware or software, as most public clouds are priced on pay per usage. The biggest cloud computing provider is Amazon, and many other vendors provide services on top of it. Google is also entering cloud data storage (Google Storage) as well as machine learning offered as an API (Google Prediction API).
        1. Amazon
        2. Google
      3. Cluster-Grid Computing/Parallel processing- In order to build a cluster, you would need the Rmpi and snow packages, among other packages that help with parallel processing.
      4. How much resources
        1. RAM-Hard Disk-Processors- for workstation computing
        2. Instances or API calls for cloud computing
    5) Interface Choices
      1. Command Line
      2. GUI
      3. Web Interfaces
    6) Software Component Choices
      1. R dependencies
      2. Packages to install
      3. Recommended Packages
    7) Additional software choices
      1. Additional legacy software
      2. Optimizing your R based computing
      3. Code Editors
        1. Code Analyzers
        2. Libraries to speed up R

    Citation- R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

    (Note- this is a draft in progress)

    Cloud Computing with R


    Here is a short list of resources and material I put together as starting points for R and cloud computing. It’s a bit messy but overall should serve quite comprehensively.

    Cloud computing is a commonly used expression that implies a generational change in computing, from desktops and servers to remote, massive and shared computing resources, enabled by high bandwidth across the internet.

    As per the National Institute of Standards and Technology Definition,
    Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

    (Citation: The NIST Definition of Cloud Computing

    Authors: Peter Mell and Tim Grance
    Version 15, 10-7-09
    National Institute of Standards and Technology, Information Technology Laboratory
    http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

    R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

    From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

    R Web Interfaces

    Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
    The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

    Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

    Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTML author to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

    Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

    CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

    David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page at http://www.omegahat.org/CGIwithR/.

    Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
    Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

    Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

    OpenStatServer is being developed by a team lead by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

    Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

    webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

    Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

    Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTML with embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package with Rapache with the advantage of cross-platform support and easy installation.

    Also additional R Cloud Computing Use Cases
    http://wwwdev.ebi.ac.uk/Tools/rcloud/

    ArrayExpress R/Bioconductor Workbench

    Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

    Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

    • Most up-to-date R version built for multicore CPUs
    • Access to all Bioconductor packages
    • Access to our computing infrastructure
    • Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

    Using R with Google Docs
    http://www.omegahat.org/RGoogleDocs/run.pdf
    It uses the XML and RCurl packages and illustrates that it is relatively quick and easy
    to use their primitives to interact with Web services.

    Using R with Amazon
    Citation
    http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

    Amazon’s EC2 is a type of cloud that provides on-demand computing infrastructure, launched from machine images called Amazon Machine Images or AMIs. In general, these types of cloud provide several benefits:

    • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
    • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
    • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
    • Pay per use. The cost of 1 AMI for 100 hours and 100 AMI for 1 hour is the same. With pay per use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.
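The pay per use point is simple arithmetic: the bill depends only on total instance-hours. A small sketch in Python, where the hourly rate is a made-up figure for illustration and not an actual Amazon price:

```python
import math


def pay_per_use_cost(instances: int, hours: float,
                     rate_per_instance_hour: float) -> float:
    """Cost under utility pricing: instances x hours x hourly rate."""
    return instances * hours * rate_per_instance_hour


# Hypothetical rate, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_RATE = 0.10

one_for_hundred_hours = pay_per_use_cost(1, 100, HYPOTHETICAL_RATE)
hundred_for_one_hour = pay_per_use_cost(100, 1, HYPOTHETICAL_RATE)

# Both runs consume 100 instance-hours, so they cost the same.
assert math.isclose(one_for_hundred_hours, hundred_for_one_hour)
```

The practical consequence is that a long computation can be spread over many machines to finish sooner at no extra cost, as long as the work parallelizes well.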

    Connecting to R on Amazon EC2- Detailed tutorials
    Ubuntu Linux version
    https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
    and Windows R version
    https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

    Connecting R to Data on Google Storage and Computing on Google Prediction API
    https://github.com/onertipaday/predictionapirwrapper
    R wrapper for working with Google Prediction API

    This package consists in a bunch of functions allowing the user to test Google Prediction API from R.
    It requires the user to have access to both Google Storage for Developers and Google Prediction API:
    see
    http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

    Example usage:

    # This example requires that you have previously created a bucket named data_language on your Google Storage and uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
    library(predictionapirwrapper)

    Elastic-R for Cloud Computing
    http://user2010.org/tutorials/Chine.html

    Abstract

    Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
    Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

    R for Salesforce.com

    At the point of writing this, there seem to be zero R based apps on Salesforce.com. This could be a big opportunity for developers, as both Apex and R have similar structures. Developers could write free code in R and charge for their translated version in Apex on Salesforce.com.

    Force.com and Salesforce have many (1009) apps at
    http://sites.force.com/appexchange/home for cloud computing for
    businesses, but very few forecasting and statistical simulation apps.

    Example of Monte Carlo based app is here
    http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

    These are like iPhone apps except meant for business purposes (I am
    unaware if any university is offering salesforce.com integration
    though google apps and amazon related research seems to be on)

    Force.com uses a language called Apex; see
    http://wiki.developerforce.com/index.php/App_Logic and
    http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas
    Apex is similar to R in that it is object-oriented.

    SAS Institute has an existing product for taking in Salesforce.com data.

    A new SAS data surveyor is
    available to access data from the Customer Relationship Management
    (CRM) software vendor Salesforce.com, at
    http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm

    Personal Note- Mentioning SAS in an email to an R list is a big no-no in terms of getting a response and love. The same goes for being careless about which R help list to email (like R-devel, R-packages or R-help).

    For python based cloud see http://pi-cloud.com

    R Apache – The next frontier of R Computing

    I am currently trying out rApache, one more excellent R product from Vanderbilt’s excellent Dept of Biostatistics and its prodigious coder Jeff Horner.


    I really liked the virtual machine idea: you can download a virtual image of rApache and play with it. A .vmx is easy to create and great to share.

    http://rapache.net/vm.html

    Basically using R Apache (with an EC2 on backend) can help you create customized dashboards, BI apps, etc all using R’s graphical and statistical capabilities.

    What’s R Apache?

    As  per

    http://biostat.mc.vanderbilt.edu/wiki/Main/RapacheWebServicesReport

    Rapache embeds the R interpreter inside the Apache 2 web server. By doing this, Rapache realizes the full potential of R and its facilities over the web. R programmers configure Apache by mapping Uniform Resource Locators (URLs) to either R scripts or R functions. The R code relies on CGI variables to read a client request and R’s input/output facilities to write the response.

    One advantage to Rapache’s architecture is robust multi-process management by Apache. In contrast to Rserve and RSOAP, Rapache is a pre-fork server utilizing HTTP as the communications protocol. Another advantage is a clear separation, a loose coupling, of R code from client code. With Rserve and RSOAP, the client must send data and R commands to be executed on the server. With Rapache the only client requirements are the ability to communicate via HTTP. Additionally, Rapache gains significant authentication, authorization, and encryption mechanism by virtue of being embedded in Apache.
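Because the only client requirement is HTTP, a client can be written in almost any language. Here is a minimal sketch in Python using only the standard library. The endpoint URL and the parameter names are hypothetical, chosen for illustration; a real rApache deployment defines its own URL mappings and inputs.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url: str, params: dict) -> Request:
    """Build an HTTP GET request whose query string carries the inputs."""
    query = urlencode(params)
    return Request(f"{base_url}?{query}", method="GET")


def fetch(request: Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical endpoint and parameters, for illustration only.
request = build_request(
    "http://localhost/R/summary",
    {"dataset": "sales", "column": "revenue"},
)
print(request.full_url)
# Calling fetch(request) would send it; that needs a server listening at the URL.
```

Note that the client sends no R commands at all, only named inputs. This is the loose coupling described above: the R code stays on the server, mapped to the URL.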

    Existing Demos of Architecture based on R Apache-

    1. http://rweb.stat.ucla.edu/ggplot2/ An interactive web dashboard for plotting graphics based on CSV or Google Spreadsheet data
    2. http://labs.dataspora.com/gameday/ A demo visualization of a web based dashboard system of baseball pitches by pitcher by player
    3. http://data.vanderbilt.edu/rapache/bbplot For baseball results, a demo of a query based web dashboard system with a very good BI feel

    What’s coming next in R Apache?

    You can  download version 1.1.10 of rApache now. There
    are only two significant changes and you don’t have to edit your
    apache config or change any code (just recompile rApache and
    reinstall):

    1) Error reporting should be more informative, both when you
    accidentally introduce errors in the Apache config and when your code
    introduces warnings and errors from web requests.

    I’ve struggled with this one for awhile, not really knowing what
    strategy would be best. Basically, rApache hooks into the R I/O layer
    at such a low level that it’s hard to capture all warnings and errors
    as they occur and introduce them to the user in a sane manner. In
    prior releases, when ROutputErrors was in effect (either the apache
    directive or the R function) one would typically see a bunch of grey
    boxes with a red outline with a title of RApache Warning/Error!!!.
    Unfortunately those grey boxes could contain empty lines, one line of
    error, or a few that relate to the lines in previously displayed
    boxes. Really a big uninformative mess.

    The new approach is to print just one warning box with the title
    “Oops!!! <b>rApache</b> has something to tell you. View source and
    read the HTML comments at the end.” and then as the title implies you
    can read the HTML comment located at the end of the file… after the
    closing html. That way, you’re actually reading how R would present
    the warnings and errors to you as if you executed the code at the R
    command prompt. And if you don’t use ROutputErrors, the warning/error
    messages are printed in the Apache log file, just as they were before,
    but nicer 😉

    2) Code dispatching has changed so please let me know if I’ve
    introduced any strange behavior.

    This was necessary to enhance error reporting. Prior to this release,
    rApache would use R’s C API exclusively to build up the call to your
    code that is then passed to R’s evaluation engine. The advantage to
    this approach is that it’s much more efficient as there is no parsing
    involved, however all information about parse errors, files which
    produced errors, etc. were lost. The new approach uses R’s built-in
    parse function to build up the call and then passes it off to R. A
    slight overhead, but it should be negligible. So, if you feel that
    this approach is too slow OR I’ve introduced bugs or strange behavior,
    please let me know.

    FUTURE PLANS

    I’m gaining more experience building Debian/Ubuntu packages each day,
    so hopefully by some time in 2011 you can rely on binary releases for
    these distributions and not install rApache from source! Fingers
    crossed!

    Development on the rApache 1.1 branch will be winding down (save bug
    fix releases) as I transition to the 1.2 branch. This will involve
    taking out a small chunk of code that defines the rApache development
    environment (all the CGI variables and the functions such as
    setHeader, setCookie, etc) and placing it in its own R package…
    unnamed as of yet. This is to facilitate my development of the ralite
    R package, a small single user cross-platform web server.

    The goal for ralite is to speed up development of R web applications
    and take out a bit of friction in the development process by not having
    to run the full rApache server. Plus it would allow users to develop in
    the rApache environment while on Windows and later deploy on more
    capable server environments. The secondary goal for ralite is its use
    in other web server environments (nginx and IIS come to mind) as a
    persistent per-client process.

    And finally, wiki.rapache.net will be the new www.rapache.net once I
    translate the manual over… any day now.

    From – http://biostat.mc.vanderbilt.edu/wiki/Main/JeffreyHorner

    Not convinced? Try the demos above.

    Which software do we buy? -It depends

    I am often asked by clients, friends and industry colleagues about the suitability or unsuitability of particular software for analytical needs. My answer is mostly-

    It depends on-

    1) Cost of a Type 1 error versus a Type 2 error in the purchase decision. (Forgive me if I mix up Type 1 with Type 2 error- I do have some weird childhood learning disabilities which crop up now and then.)

    Here I define a Type 1 error as paying more for software when equivalent functionality was available at a lower price, or buying components you do not need, like SPSS Trends (when only SPSS Base is required) or SAS ETS (when only SAS/Stat would do).

    The first kind is of course due to the presence of free tools with GUIs like R, R Commander and Deducer (Rattle does have a $500 commercial version).

    The emergence of software vendors like WPS (for SAS language aficionados), which offers functionality similar to Base SAS, as well as the increasing convergence of business analytics (read predictive analytics) and business intelligence (read reporting), has led to some brand clutter, in which every software package promises to do everything at widely different prices, though each has specific strengths and weaknesses. To add to this, there are comparatively fewer independent business analytics analysts than, say, independent business intelligence analysts.

    2) Type 2 error- Here the cost is the opportunity cost of delayed projects and business models, or of lower accuracy: the consequences of buying lower-priced software that had less functionality than you required.

    To compound the magnitude of a Type 2 error, you are probably in some kind of vendor lock-in, your software budget is exhausted because you bought too much or inappropriate software and hardware, and you could still do with some added help in business analytics. The fear of making a business-critical error is a substantial reason why open source software has to work harder at proving itself competent. This is because writing great software is not enough: we need great marketing to sell it, and great customer support to sustain it.

    As business decisions are made within constraints of time, information and money, I will try to create a software purchase matrix based on my knowledge of known software (and unknown strengths and weaknesses), pricing (versus budgets), and ranges of data handling. I will basically add in an optimum approach based on known constraints, and add in flexibility for unknown operational constraints.

    I will restrict this matrix to analytics software, though you could certainly extend it to other classes of enterprise software including big data databases, infrastructure and computing.

    Noted Assumptions-

    1) I am vendor neutral and do not suffer from subjective bias or affection for particular software (based on conferences, books, relationships, consulting, etc.)

    2) All software has bugs, so all of it needs customer support.

    3) All software has particular advantages, strengths and weaknesses in terms of functionality.

    4) Cost includes total cost of ownership and the opportunity cost of business-analytics-enabled decisions.

    5) All software marketing people will praise their own software- sometimes over-selling and mis-selling product bundles.

    The software compared is SPSS, KXEN, R, SAS, WPS, Revolution R and SQL Server, along with the various flavors and sub-components within these. The optimized approach will include parallel programming, cloud computing, hardware costs, and dependent software costs.

    To be continued-

    Public Opinion Quarterly

    If you are interested in

    SURVEY METHODOLOGY FOR PUBLIC HEALTH RESEARCHERS

    There is a free virtual issue, Survey Methodology for Public Health Researchers: Selected Readings from 20 years of Public Opinion Quarterly. The virtual issue’s 18 articles illustrate the range of survey methods material that can be found in POQ and include conclusions that are still valid today. Specially chosen by guest editor Floyd J. Fowler, the articles will be of interest to those who work and research in public health and health services more broadly.

    R on Windows HPC Server

    From HPC Wire, the newsletter/site for all HPC news-

    Source- Link

    PALO ALTO, Calif., Sept. 20 — Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language, today announced it will deliver Revolution R Enterprise for Microsoft Windows HPC Server 2008 R2, released today, enabling users to analyze very large data sets in high-performance computing environments.

    R is a powerful open source statistics language and the modern system for predictive analytics. Revolution Analytics recently introduced RevoScaleR, new “Big Data” analysis capabilities, to its R distribution, Revolution R Enterprise. RevoScaleR solves the performance and capacity limitations of the R language with parallelized algorithms that stream data across multiple cores on a laptop, workstation or server. Users can now process, visualize and model terabyte-class data sets at top speeds, without the need for specialized hardware.

    “Revolution Analytics is pleased to support Microsoft’s Technical Computing initiative, whose efforts will benefit scientists, engineers and data analysts,” said David Champagne, CTO at Revolution. “We believe the engineering we have done for Revolution R Enterprise, in particular our work on big-data statistics and multicore computing, along with Microsoft’s HPC platform for technical computing, makes an ideal combination for high-performance large scale statistical computing.”

    “Processing and analyzing this ‘big data’ is essential to better prediction and decision making,” said Bill Hamilton, director of technical computing at Microsoft Corp. “Revolution R Enterprise for Windows HPC Server 2008 R2 gives customers an extremely powerful tool that handles analysis of very large data and high workloads.”

    To learn more about Revolution R Enterprise and its Big Data capabilities, download the white paper. Revolution Analytics also has an on-demand webcast, “High-performance analytics with Revolution R and Windows HPC Server,” available online.

    AND from Microsoft’s website

    http://www.microsoft.com/hpc/en/us/solutions/hpc-for-life-sciences.aspx

    REvolution R Enterprise

    REvolution Computing

    REvolution R Enterprise is designed for both novice and experienced R users looking for a production-grade R distribution to perform mission critical predictive analytics tasks right from the desktop and scale across multiprocessor environments. Featuring RPE™ REvolution’s R Productivity Environment for Windows.

    Of course, R Enterprise is available on Linux, but on Red Hat Enterprise Linux. It would be nice to see Amazon Machine Images and Ubuntu versions as well.

    An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2.[1]

    Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it.[2]

    The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks.
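The chunk-and-digest step described above can be sketched as follows. This is a minimal Python illustration, not Amazon's bundling tools: it skips compression, encryption, signing and the upload to S3, and it uses a plain dictionary where the real manifest is an XML file. The field names, the part naming scheme, the choice of SHA-256 and the reading of 10MB as 10 MiB are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, read here as 10 MiB (an assumption)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the image into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_manifest(name: str, chunks: list[bytes]) -> dict:
    """Record one digest per chunk so each piece can be checked later."""
    return {
        "name": name,
        "parts": [
            {
                "filename": f"{name}.part.{index:02d}",  # hypothetical naming
                "digest": hashlib.sha256(chunk).hexdigest(),
            }
            for index, chunk in enumerate(chunks)
        ],
    }


def verify_chunks(manifest: dict, chunks: list[bytes]) -> bool:
    """Check that every chunk still matches the digest in the manifest."""
    parts = manifest["parts"]
    if len(parts) != len(chunks):
        return False
    return all(
        hashlib.sha256(chunk).hexdigest() == part["digest"]
        for part, chunk in zip(parts, chunks)
    )
```

Storing a digest for every chunk means a single corrupted or missing piece can be detected without re-reading the whole image.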

    An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., RedHat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.[3]

    Types of images

    • Public: an AMI image that can be used by anyone.
    • Paid: a for-pay AMI image that is registered with Amazon DevPay and can be used by anyone who subscribes to it. DevPay allows developers to mark up Amazon’s usage fees and optionally add monthly subscription fees.