So I decided to test the next iteration of http://cloudnumbers.com, and I was pleasantly surprised to see how easy it is to start a Linux cluster and start doing #Rstats computing on the cloud using RStudio.
Here are some screenshots of my journey.
Here is an interview with Mike Boyarski, Director of Product Marketing at Jaspersoft,
the largest BI community, with over 14 million downloads and nearly 230,000 registered members, representing over 175,000 production deployments and 14,000 customers across 100 countries.
Revolution Analytics just launched a roadmap detailing their product plan for 2011.
In particular, I am excited about the upcoming GUI, the Hadoop packages, the new K-means and data sort/merge functions using RevoScaleR for bigger datasets, and the option to offer support for community packages like ggplot2, titled "More value in Community Version".
A bit early but the latest editions of both SAS and R were released last week.
SAS 9.3 is clearly a major release, with multiple enhancements to keep SAS relevant in enterprise software in the age of big data. There are also many more enhancements specific to R, to JMP, and to partners like Teradata.
http://support.sas.com/software/93/index.html
LICENCE:
• No parts of R are now licensed solely under GPL-2. The licences for packages rpart and survival have been changed, which means that the licence terms for R as distributed are GPL-2 | GPL-3.
This is a maintenance release to consolidate various minor fixes to 2.13.0.
CHANGES IN R VERSION 2.13.1:
NEW FEATURES:
• iconv() no longer translates NA strings as "NA".
• persp(box = TRUE) now warns if the surface extends outside the
box (since occlusion for the box and axes is computed assuming
the box is a bounding box). (PR#202.)
• RShowDoc() can now display the licences shipped with R, e.g.
RShowDoc("GPL-3").
• New wrapper function showNonASCIIfile() in package tools.
• nobs() now has a "mle" method in package stats4.
• trace() now deals correctly with S4 reference classes and
corresponding reference methods (e.g., $trace()) have been added.
• xz has been updated to 5.0.3 (very minor bugfix release).
• tools::compactPDF() gets more compression (usually a little,
sometimes a lot) by using the compressed object streams of PDF
1.5.
• cairo_ps(onefile = TRUE) generates encapsulated EPS on platforms
with cairo >= 1.6.
• Binary reads (e.g. by readChar() and readBin()) are now supported
on clipboard connections. (Wish of PR#14593.)
• as.POSIXlt.factor() now passes ... to the character method
(suggestion of Joshua Ulrich). [Intended for R 2.13.0 but
accidentally removed before release.]
• vector() and its wrappers such as integer() and double() now warn
if called with a length argument of more than one element. This
helps track down user errors such as calling double(x) instead of
as.double(x).
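
The last item is easy to try out. Here is a minimal sketch of the mix-up it describes, assuming nothing beyond base R; the variable names are only for illustration. Newer R versions may raise an error rather than a warning for a multi-element length, so the sketch catches both.

```r
x <- c("1.5", "2", "3.25")

# Intended call: convert the values of x to doubles.
converted <- as.double(x)

# double() takes a length, so this builds a zero-filled vector instead.
zeros <- double(length(x))

# Passing a multi-element length is the user error the note describes.
# Depending on the R version this is a warning or an error, so catch both.
outcome <- tryCatch(
  {
    double(c(2, 3))
    "silent"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)

print(converted)
print(zeros)
print(outcome)
```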
INSTALLATION:
• Building the vignette PDFs in packages grid and utils is now part
of running make from an SVN checkout on a Unix-alike: a separate
make vignettes step is no longer required.
These vignettes are now made with keep.source = TRUE and hence
will be laid out differently.
• make install-strip failed under some configuration options.
• Packages can customize non-standard installation of compiled code
via a src/install.libs.R script. This allows packages that have
architecture-specific binaries (beyond the package's shared
objects/DLLs) to be installed in a multi-architecture setting.
SWEAVE & VIGNETTES:
• Sweave() and Stangle() gain an encoding argument to specify the
encoding of the vignette sources if the latter do not contain a
\usepackage[]{inputenc} statement specifying a single input
encoding.
• There is a new Sweave option figs.only = TRUE to run each figure
chunk only for each selected graphics device, and not first using
the default graphics device. This will become the default in R
2.14.0.
• Sweave custom graphics devices can have a custom function
foo.off() to shut them down.
• Warnings are issued when non-portable filenames are found for
graphics files (and chunks if split = TRUE). Portable names are
regarded as alphanumeric plus hyphen, underscore, plus and hash
(periods cause problems with recognizing file extensions).
• The Rtangle() driver has a new option show.line.nos which is by
default false; if true it annotates code chunks with a comment
giving the line number of the first line in the sources (the
behaviour of R >= 2.12.0).
• Package installation tangles the vignette sources: this step now
converts the vignette sources from the vignette/package encoding
to the current encoding, and records the encoding (if not ASCII)
in a comment line at the top of the installed .R file.
DEPRECATED AND DEFUNCT:
• The internal functions .readRDS() and .saveRDS() are now
deprecated in favour of the public functions readRDS() and
saveRDS() introduced in R 2.13.0.
• Switching off lazy-loading of code _via_ the LazyLoad field of
the DESCRIPTION file is now deprecated. In future all packages
will be lazy-loaded.
• The off-line help() types "postscript" and "ps" are deprecated.
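
For anyone still calling the internal .readRDS() and .saveRDS(), the public replacements are drop-in for the common case. A minimal sketch, assuming a throwaway data frame and a temporary file (both made up for illustration):

```r
# A small object to serialize; the contents are illustrative only.
model_input <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))

# Write a single R object to disk with the public function.
path <- tempfile(fileext = ".rds")
saveRDS(model_input, path)

# Read it back; unlike load(), readRDS() returns the object,
# so it can be assigned to any name.
restored <- readRDS(path)

print(all.equal(model_input, restored))

# Clean up the temporary file.
unlink(path)
```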
UTILITIES:
• R CMD check on a multi-architecture installation now skips the
user's .Renviron file for the architecture-specific tests (which
do read the architecture-specific Renviron.site files). This is
consistent with single-architecture checks, which use
--no-environ.
• R CMD build now looks for DESCRIPTION fields BuildResaveData and
BuildKeepEmpty for per-package overrides. See ‘Writing R
Extensions’.
BUG FIXES:
• plot.lm(which = 5) was intended to order factor levels in
increasing order of mean standardized residual. It ordered the
factor labels correctly, but could plot the wrong group of
residuals against the label. (PR#14545)
• mosaicplot() could clip the factor labels, and could overlap them
with the cells if a non-default value of cex.axis was used.
(Related to PR#14550.)
• dataframe[[row,col]] now dispatches on [[ methods for the
selected column (spotted by Bill Dunlap).
• sort.int() would strip the class of an object, but leave its
object bit set. (Reported by Bill Dunlap.)
• pbirthday() and qbirthday() did not implement the algorithm
exactly as given in their reference and so were unnecessarily
inaccurate.
pbirthday() now solves the approximate formula analytically
rather than using uniroot() on a discontinuous function.
The description of the problem was inaccurate: the probability is
a tail probability (‘2 _or more_ people share a birthday’).
• Complex arithmetic sometimes warned incorrectly about producing
NAs when there were NaNs in the input.
• seek(origin = "current") incorrectly reported it was not
implemented for a gzfile() connection.
• c(), unlist(), cbind() and rbind() could silently overflow the
maximum vector length and cause a segfault. (PR#14571)
• The fonts argument to X11(type = "Xlib") was being ignored.
• Reading (e.g. with readBin()) from a raw connection was not
advancing the pointer, so successive reads would read the same
value. (Spotted by Bill Dunlap.)
• Parsed text containing embedded newlines was printed incorrectly
by as.character.srcref(). (Reported by Hadley Wickham.)
• decompose() used with a series of a non-integer number of periods
returned a seasonal component shorter than the original series.
(Reported by Rob Hyndman.)
• fields = list() failed for setRefClass(). (Reported by Michael
Lawrence.)
• Reference classes could not redefine an inherited field which had
class "ANY". (Reported by Janko Thyson.)
• Methods that override previously loaded versions will now be
installed and called. (Reported by Iago Mosqueira.)
• addmargins() called numeric(apos) rather than
numeric(length(apos)).
• The HTML help search sometimes produced bad links. (PR#14608)
• Command completion will no longer be broken if tail.default() is
redefined by the user. (Problem reported by Henrik Bengtsson.)
• LaTeX rendering of markup in titles of help pages has been
improved; in particular, \eqn{} may be used there.
• isClass() used its own namespace as the default of the where
argument inadvertently.
• Rd conversion to latex mis-handled multi-line titles (including
cases where there was a blank line in the \title section).
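
The pbirthday() fix above is about the tail probability that 2 or more people share a birthday. As a rough sketch, assuming 365 equally likely birthdays, the classic 23-person case can be checked against the textbook product formula. This is a sanity check only, not a reproduction of the function's internal algorithm.

```r
n <- 23

# Textbook calculation: 1 minus the probability that all n birthdays differ.
exact <- 1 - prod((365 - 0:(n - 1)) / 365)

# The stats function, with its defaults (365 classes, 2 coincident).
p <- pbirthday(n)

# Smallest group size giving at least a 50% chance of a shared birthday.
smallest <- qbirthday(0.5)

print(exact)
print(p)
print(smallest)
```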
Some of you know that I am due to finish “R for Business Analytics” for Springer by Dec 2011 and “R for Cloud Computing” by Dec 2012. Accordingly, while I am busy crunching out “R for Business Analytics”, which is a corporate business analyst's view on using #Rstats, I am gathering material for the cloud computing book too.
I have been waiting for someone like CloudNumbers.com for some time now, and I like their initial pricing structure. As scale picks up, this should only get better. As a business intelligence analyst, I wonder if they can also help set up a dedicated or private cloud for someone who wants a data mart solution. The thing I like best about this: they have a referral scheme, so if someone you know wants to test it out, it gives you some freebies too, in the form of an invitation code.
I name the session in case I want to start multiple sessions.
After waiting 15 minutes, my instance is up and I type R to get the following:
Note that I can also see the desktop, which is a great improvement over the EC2 interface for R cloud computing on Linux. It also shuts down on its own if I leave it running (as of now, after 180 minutes), so I click shut down session.
You can click this link to try it and get your own cloud in the sky for free. The first 10 hours are free for you:
https://my.cloudnumbers.com/register/65E97A
I came across Cloudnumbers.com. Awesome name; I didn't know groovy domain names existed anymore.
What is cloudnumbers.com? The website, which looks like the salesforce.com website in style and design,
says:
Things are still very raw here, but it's an awesome concept. With 68 GB of memory, I am sure R can blow everything out of the water.
The competition probably needs to launch that private cloud soon, before they lose momentum.
And you get 2 GB of storage, 2 GB of traffic and 10 hours of computation per month for free! I think this German startup has hit the nail on the head, and it will be interesting to see what the future holds.
Check out http://cloudnumbers.com/product yourself and/or see the video
https://www.youtube.com/v/0ZNEpR_ElV0?version=3&hl=en_US&hd=1
I have been watching Revolution Analytics' products almost since the inception of the company. It has managed to sail through storms, naysayers and critics with a simple and effective strategy of launching good software, making good partnerships and keeping up media visibility with white papers, joint webinars, blogs, conferences and events.
However, this is a listing of all technical contributions made by Revolution Analytics products to the #rstats project.
1) Useful Packages mostly in parallel processing or more efficient computing like
2) RevoScaleR package to beat R’s memory problem (this is probably the best in my opinion, as it is yet to be replicated by the open source version and is a clear-cut reason for going in for the paid version)
http://www.revolutionanalytics.com/products/enterprise-big-data.php
- Efficient XDF File Format designed to efficiently handle huge data sets.
- Data Step Functionality to quickly clean, transform, explore, and visualize huge data sets.
- Data selection functionality to store huge data sets out of memory, and select subsets of rows and columns for in-memory operation with all R functions.
- Visualize Large Data sets with line plots and histograms.
- Built-in Statistical Algorithms for direct analysis of huge data sets:
- Summary Statistics
- Linear Regression
- Logistic Regression
- Crosstabulation
- On-the-fly data transformations to include derived variables in models without writing new data files.
- Extend Existing Analyses by writing user-defined R functions to “chunk” through huge data sets.
- Direct import of fixed-format text data files and SAS data sets into .xdf format
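
To make the "chunking" idea in the list above concrete, here is a rough sketch in plain base R. It is not the RevoScaleR API and does not use the XDF format: the function name chunked_mean, the tiny CSV file and the chunk size are all made up for illustration. It only shows the general pattern of reading a file in pieces and keeping running totals, so that the full data set never has to sit in memory.

```r
# Write a small CSV to a temporary file to stand in for a large data set.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10), path, row.names = FALSE)

# Hypothetical helper: mean of one column, computed chunk by chunk.
chunked_mean <- function(path, column, chunk_rows = 3) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # First line holds the column names (quoted by write.csv).
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  header <- gsub('"', "", header)

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break  # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])
    n <- n + nrow(chunk)
  }
  total / n
}

result <- chunked_mean(path, "x")
print(result)
unlink(path)
```

Only sums and counts are carried between chunks, which is why this pattern suits statistics that can be updated incrementally.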
3) RevoDeploy R for an API-based R solution. I somehow think this feature will get more important as time goes on, but it seems a lower-visibility offering right now.
http://www.revolutionanalytics.com/products/enterprise-deployment.php
- Collection of Web services implemented as a RESTful API.
- JavaScript and Java client libraries, allowing users to easily build custom Web applications on top of R.
- .NET Client library — includes a COM interoperability to call R from VBA
- Management Console for securely administrating servers, scripts and users through HTTP and HTTPS.
- XML and JSON format for data exchange.
- Built-in security model for authenticated or anonymous invocation of R Scripts.
- Repository for storing R objects and R Script execution artifacts.
4) Revolution's IDE (or Productivity Environment) for a faster coding environment than the command line. The GUI by Revolution Analytics is in the works. Having used the IDE, I find only the Code Snippets function to be a clear differentiator from newer IDEs and GUIs. The code snippets feature is awesome though, and even someone who doesn't know much R can get an analysis set up quite fast and accurately.
http://www.revolutionanalytics.com/products/enterprise-productivity.php
- Full-featured Visual Debugger for debugging R scripts, with call stack window and step-in, step-over, and step-out capability.
- Enhanced Script Editor with hover-over help, word completion, find-across-files capability, automatic syntax checking, bookmarks, and navigation buttons.
- Run Selection, Run to Line and Run to Cursor evaluation
- R Code Snippets to automatically generate fill-in-the-blank sections of R code with tooltip help.
- Object Browser showing available data and function objects (including those in packages), with context menus for plotting and editing data.
- Solution Explorer for organizing, viewing, adding, removing, rearranging, and sourcing R scripts.
- Customizable Workspace with dockable, floating, and tabbed tool windows.
- Version Control Plug-in available for the open source Subversion version control software.
Marketing contributions from Revolution Analytics:
1) Sponsoring R sessions and user meets
2) Evangelizing R at conferences and partnering with corporate partners including JasperSoft, Microsoft, IBM and others at http://www.revolutionanalytics.com/partners/
3) Helping with online initiatives like http://www.inside-r.org/ (which is curiously dormant and now largely superseded by R-Bloggers.com) and the syntax highlighting tool at http://www.inside-r.org/pretty-r. In addition, Revolution has been proactive in reaching out to the community.
4) Helping pioneer blogging about R and Twitter hashtag discussions, and contributing to Stack Overflow discussions. Within a short while, the #rstats online community has overtaken a lot of more established names, partly due to the decentralized nature of its working.
Did I miss something out? Yes, they share their code under the GPL.
Let me know through your feedback.