Even more variety in Cloud Computing Instances from AWS

If you ever complain that R is slow because it stores data in RAM, well, here is a whole lot of RAM for you.

 

From-

http://aws.typepad.com/aws/2013/12/amazon-ec2-new-i2-instance-type-available-now.html

The Specs
Here are the instance sizes and the associated specs:

Instance Name  vCPU Count  RAM  Instance Storage (SSD)  Price/Hour
i2.xlarge 4 30.5 GiB 1 x 800 GB $0.85
i2.2xlarge 8 61 GiB 2 x 800 GB $1.71
i2.4xlarge 16 122 GiB 4 x 800 GB $3.41
i2.8xlarge 32 244 GiB 8 x 800 GB $6.82

 

This leaves these guys way behind

https://cloud.google.com/products/compute-engine/

High Memory

Machines for tasks that require more memory relative to virtual cores

Instance type  Virtual Cores  Memory  Price (US$)/Hour (US hosted)  Price (US$)/Hour (Europe hosted)
n1-highmem-2 2 13GB $0.244 $0.275
n1-highmem-4 4 26GB $0.488 $0.549
n1-highmem-8 8 52GB $0.975 $1.098
n1-highmem-16 16 104GB $1.951 $2.196

Google Compute Engine pricing is much more honest than AWS and Azure pricing

Google Compute is priced like a taxi: a 10-minute minimum, and then in blocks of 1 minute.

The web page is clear and simple and does not confuse the user.

Google Compute Engine pricing is more honest in its general availability (GA) release.

Microsoft dazzles but hides pricing behind layers of pages, while Amazon does it with dropdowns and contact-us baits. IBM is impossible to get an upfront, honest price from, and I am still trying to figure out Oracle Cloud.

Hopefully GCE uptime will be better than Gmail Uptime!!!

——————————————————————————————————————————-

From

https://cloud.google.com/products/compute-engine/

Pricing

All machine types are charged a minimum of 10 minutes. For example, if you run your instance for 2 minutes, you will be billed for 10 minutes of usage. After 10 minutes, instances are charged in 1 minute increments, rounded up to the nearest minute. For example, an instance that lives for 11.25 minutes will be charged for 12 minutes of usage.
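
To make the taxi analogy concrete, here is a minimal R sketch of the two rounding rules, using the n1-standard-1 rate from the table below and the m1.small on-demand rate quoted further down as illustrative inputs; the helper functions are my own, not any official pricing API.

# GCE billing: 10-minute minimum, then rounded up to the nearest minute
gce_cost <- function(minutes, price_per_hour) {
  billed_minutes <- max(10, ceiling(minutes))
  billed_minutes / 60 * price_per_hour
}

# Classic EC2 on-demand billing: each partial instance-hour billed as a full hour
aws_cost <- function(minutes, price_per_hour) {
  ceiling(minutes / 60) * price_per_hour
}

# An 11.25-minute job: n1-standard-1 at $0.104/hour vs m1.small at $0.060/hour
gce_cost(11.25, 0.104)  # 12 minutes billed, about $0.021
aws_cost(11.25, 0.060)  # a full hour billed, $0.06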

If you would like to discuss pricing for long-term commitments, please contact sales

Machine Type Pricing

Standard

Instance type  Virtual Cores  Memory  Price (US$)/Hour (US hosted)  Price (US$)/Hour (Europe hosted)
n1-standard-1 1 3.75GB * $0.104 $0.114
n1-standard-2 2 7.5GB $0.207 $0.228
n1-standard-4 4 15GB * $0.415 $0.456
n1-standard-8 8 30GB $0.829 $0.912
n1-standard-16 16 60GB $1.659 $1.825

High Memory

Machines for tasks that require more memory relative to virtual cores

Instance type  Virtual Cores  Memory  Price (US$)/Hour (US hosted)  Price (US$)/Hour (Europe hosted)
n1-highmem-2 2 13GB $0.244 $0.275
n1-highmem-4 4 26GB $0.488 $0.549
n1-highmem-8 8 52GB $0.975 $1.098
n1-highmem-16 16 104GB $1.951 $2.196

High CPU

Machines for tasks that require more virtual cores relative to memory

Instance type  Virtual Cores  Memory  Price (US$)/Hour (US hosted)  Price (US$)/Hour (Europe hosted)
n1-highcpu-2 2 1.80GB $0.131 $0.146
n1-highcpu-4 4 3.60GB $0.261 $0.292
n1-highcpu-8 8 7.20GB $0.522 $0.584
n1-highcpu-16 16 14.40GB $1.044 $1.167

Shared Core

Machines for tasks that don’t require a lot of resources but do have to remain online for long periods of time.

Instance type  Virtual Cores  Memory  Price (US$)/Hour (US hosted)  Price (US$)/Hour (Europe hosted)
f1-micro 1 0.60GB $0.019 $0.021
g1-small 1 1.70GB $0.054 $0.059

Network Pricing

Ingress Free
Egress to the same Zone. Free
Egress to a different Cloud service within the same Region. Free
Egress to a different Zone in the same Region (per GB) $0.01
Egress to a different Region within the US $0.01 ***
Inter-continental Egress At Internet Egress Rate
Internet Egress (Americas/EMEA destination) per GB
0-1 TB in a month $0.12
1-10 TB $0.11
10+ TB $0.08
Internet Egress (APAC destination) per GB
0-1 TB in a month $0.21
1-10 TB $0.18
10+ TB $0.15

Persistent Disk Pricing

Provisioned space $0.04 GB / month
Snapshot storage $0.125 GB / month
IO operations No additional charge

Image Storage

Image storage $0.085 GB / month

IP Address Pricing

Static IP address (assigned but unused) $0.01 / hour
Static IP address (assigned and in use) Free
Ephemeral IP address (attached to instance) Free

VERSUS

http://aws.amazon.com/ec2/pricing/

Pricing is per instance-hour consumed for each instance, from the time an instance is launched until it is terminated or stopped. Each partial instance-hour consumed will be billed as a full hour.

On-Demand Instance Prices

Region: US East (N. Virginia) / US West (Oregon) / US West (Northern California) / EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo) / Asia Pacific (Sydney) / South America (Sao Paulo)
Currency: US Dollar
Linux/UNIX Usage
General Purpose – Current Generation
m3.xlarge $0.450 per Hour
m3.2xlarge $0.900 per Hour
General Purpose – Previous Generation
m1.small $0.060 per Hour
m1.medium $0.120 per Hour
m1.large $0.240 per Hour
m1.xlarge $0.480 per Hour
Compute Optimized – Current Generation
c3.large $0.150 per Hour
c3.xlarge $0.300 per Hour
c3.2xlarge $0.600 per Hour
c3.4xlarge $1.200 per Hour
c3.8xlarge $2.400 per Hour
Compute Optimized – Previous Generation
c1.medium $0.145 per Hour
c1.xlarge $0.580 per Hour
cc2.8xlarge $2.400 per Hour
GPU Instances – Current Generation
g2.2xlarge $0.650 per Hour
GPU Instances – Previous Generation
cg1.4xlarge $2.100 per Hour
Memory Optimized – Current Generation
m2.xlarge $0.410 per Hour
m2.2xlarge $0.820 per Hour
m2.4xlarge $1.640 per Hour
cr1.8xlarge $3.500 per Hour
Storage Optimized – Current Generation
hi1.4xlarge $3.100 per Hour
hs1.8xlarge $4.600 per Hour
Micro Instances
t1.micro $0.020 per Hour


Reserved Instances

Reserved Instances give you the option to make a low, one-time payment for each instance you want to reserve and in turn receive a significant discount on the hourly charge for that instance. There are three Reserved Instance types (Light, Medium, and Heavy Utilization Reserved Instances) that enable you to balance the amount you pay upfront with your effective hourly price.

The following tables display the Reserved Instance Prices available directly from AWS. In addition to Reserved Instances for Linux/UNIX and Windows operating systems specified below, we also offer Reserved Instances for Amazon EC2 running SUSE Linux Enterprise Server, Amazon EC2 running Red Hat Enterprise Linux, and Amazon EC2 running Microsoft SQL Server. Dedicated Reserved Instances are also available.
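
To see what balancing the upfront payment against the effective hourly price looks like in numbers, here is a small R sketch using the m1.small figures quoted in this post (the $0.060/hour on-demand rate above and the $61 + $0.034/hour Light Utilization 1-year term below); it assumes round-the-clock usage and is only an illustration, not AWS's own calculator.

# Amortize a Reserved Instance's one-time fee over the hours actually used
ri_effective_hourly <- function(upfront, hourly, hours_used) {
  upfront / hours_used + hourly
}

hours_in_year <- 365 * 24  # 8760 hours in a 1-year term

# m1.small, Light Utilization, 1-year term (figures from the table below)
ri_effective_hourly(upfront = 61, hourly = 0.034, hours_used = hours_in_year)
# ~ $0.041/hour versus the $0.060/hour on-demand rate quoted above

# Break-even: hours of usage needed before the reservation beats on-demand
61 / (0.060 - 0.034)  # ~ 2346 hours, roughly 98 days of continuous use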

Light Utilization Reserved Instances

Region: US East (N. Virginia) / US West (Northern California) / US West (Oregon) / EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo) / Asia Pacific (Sydney) / South America (Sao Paulo)
Currency: US Dollar
Instance type  1 yr Term (Upfront, Hourly)  3 yr Term (Upfront, Hourly)
General Purpose – Current Generation
m3.xlarge $439 $0.254 per Hour $686 $0.201 per Hour
m3.2xlarge $879 $0.508 per Hour $1372 $0.401 per Hour
General Purpose – Previous Generation
m1.small $61 $0.034 per Hour $96 $0.027 per Hour
m1.medium $122 $0.068 per Hour $192 $0.054 per Hour
m1.large $243 $0.136 per Hour $384 $0.108 per Hour
m1.xlarge $486 $0.271 per Hour $768 $0.215 per Hour
Compute Optimized – Current Generation
c3.large $167 $0.093 per Hour $252 $0.082 per Hour
c3.xlarge $333 $0.186 per Hour $503 $0.164 per Hour
c3.2xlarge $667 $0.373 per Hour $1006 $0.327 per Hour
c3.4xlarge $1333 $0.745 per Hour $2012 $0.654 per Hour
c3.8xlarge $2666 $1.49 per Hour $4024 $1.308 per Hour
Compute Optimized – Previous Generation
c1.medium $161 $0.09 per Hour $243 $0.079 per Hour
c1.xlarge $644 $0.36 per Hour $972 $0.316 per Hour
cc2.8xlarge $1762 $0.904 per Hour $2710 $0.904 per Hour
GPU Instances – Current Generation
g2.2xlarge $772 $0.499 per Hour $1143 $0.392 per Hour
Memory Optimized – Current Generation
m2.xlarge $272 $0.169 per Hour $398 $0.136 per Hour
m2.2xlarge $544 $0.338 per Hour $796 $0.272 per Hour
m2.4xlarge $1088 $0.676 per Hour $1592 $0.544 per Hour
cr1.8xlarge $2474 $1.54 per Hour $3846 $1.225 per Hour
Storage Optimized – Current Generation
hs1.8xlarge $3968 $2.24 per Hour $5997 $1.81 per Hour
hi1.4xlarge $2576 $1.477 per Hour $3884 $1.15 per Hour
Micro Instances
t1.micro $23 $0.012 per Hour $35 $0.012 per Hour

Medium Utilization Reserved Instances

Region: US East (N. Virginia) / US West (Northern California) / US West (Oregon) / EU (Ireland) / Asia Pacific (Singapore) / Asia Pacific (Tokyo) / Asia Pacific (Sydney) / South America (Sao Paulo)
Currency: US Dollar
Instance type  1 yr Term (Upfront, Hourly)  3 yr Term (Upfront, Hourly)
General Purpose – Current Generation
m3.xlarge $1034 $0.156 per Hour $1631 $0.123 per Hour
m3.2xlarge $2069 $0.313 per Hour $3262 $0.247 per Hour
General Purpose – Previous Generation
m1.small $139 $0.021 per Hour $215 $0.017 per Hour
m1.medium $277 $0.042 per Hour $430 $0.033 per Hour
m1.large $554 $0.084 per Hour $860 $0.067 per Hour
m1.xlarge $1108 $0.168 per Hour $1720 $0.133 per Hour
Compute Optimized – Current Generation
c3.large $383 $0.056 per Hour $591 $0.049 per Hour
c3.xlarge $766 $0.112 per Hour $1182 $0.097 per Hour
c3.2xlarge $1532 $0.224 per Hour $2364 $0.195 per Hour
c3.4xlarge $3064 $0.447 per Hour $4728 $0.389 per Hour
c3.8xlarge $6127 $0.894 per Hour $9456 $0.778 per Hour
Compute Optimized – Previous Generation
c1.medium $370 $0.054 per Hour $571 $0.047 per Hour
c1.xlarge $1480 $0.216 per Hour $2284 $0.188 per Hour
cc2.8xlarge $4146 $0.54 per Hour $6378 $0.54 per Hour
GPU Instances – Current Generation
g2.2xlarge $1987 $0.34 per Hour $3311 $0.294 per Hour
Memory Optimized – Current Generation
m2.xlarge $651 $0.103 per Hour $992 $0.08 per Hour
m2.2xlarge $1302 $0.206 per Hour $1984 $0.16 per Hour
m2.4xlarge $2604 $0.412 per Hour $3968 $0.32 per Hour
cr1.8xlarge $5958 $0.93 per Hour $9006 $0.735 per Hour
Storage Optimized – Current Generation
hs1.8xlarge $9200 $1.38 per Hour $14103 $1.11 per Hour
hi1.4xlarge $5973 $0.909 per Hour $9133 $0.705 per Hour
Micro Instances
t1.micro $54 $0.007 per Hour $82 $0.007 per Hour

Heavy Utilization Reserved Instances


Reserved Instances can be purchased directly from AWS for 1 or 3 year terms. Using the Reserved Instance Marketplace, you have the flexibility to purchase Reserved Instances from AWS Reserved Instance Marketplace Sellers for terms ranging between 1 month to 36 months (depending on available selection). In either case, the one-time fee per instance is non-refundable. If your needs change, you can also request to move your Reserved Instance to another Availability Zone within the same region, change its Network Platform, or, for Linux/UNIX and Windows RIs, modify the instance type of your reservation to another type in the same instance family at no additional cost.

Light and Medium Utilization Reserved Instances also are billed by the instance-hour for the time that instances are in a running state; if you do not run the instance in an hour, there is zero usage charge. Partial instance-hours consumed are billed as full hours. Heavy Utilization Reserved Instances are billed for every hour during the entire Reserved Instance term (which means you’re charged the hourly fee regardless of whether any usage has occurred during an hour).
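
As a rough illustration of how expected utilization drives the choice between these types, here is an R sketch comparing the 1-year m1.small Light ($61 upfront, $0.034/hour) and Medium ($139 upfront, $0.021/hour) figures from the tables above, assuming you pay the hourly fee only for hours the instance actually runs; Heavy Utilization is left out because its table did not load on the source page.

# Total 1-year cost of an m1.small Reserved Instance as a function of
# how many hours it actually runs during the term (hourly fee only while running)
light_total  <- function(hours)  61 + 0.034 * hours   # Light Utilization, 1 yr
medium_total <- function(hours) 139 + 0.021 * hours   # Medium Utilization, 1 yr

# Crossover: 61 + 0.034*h = 139 + 0.021*h  =>  h = 78 / 0.013
78 / 0.013  # 6000 hours, i.e. roughly 68% of the 8760 hours in a year

hours <- c(2000, 4000, 6000, 8760)
data.frame(hours, light = light_total(hours), medium = medium_total(hours))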

If Microsoft or Red Hat chooses to increase the license fees that it charges for Windows or Red Hat Enterprise Linux, we may correspondingly increase the per-hour usage rate for previously purchased Reserved Instances with Windows or Red Hat Enterprise Linux. The initial one-time payment for a Reserved Instance will be unaffected in this situation. Any such changes for Windows would be made between Dec 1 – Jan 31, and with at least 30 days’ notice. Any such changes for Red Hat Enterprise Linux would be made with at least 30 days’ notice. If the per-hour usage rate does increase, you may continue to use your Reserved Instance with Windows or Red Hat Enterprise Linux with the new per-hour usage rate, convert your Reserved Instance with Windows or Red Hat Enterprise Linux to a Reserved Instance with Linux/UNIX, or request a pro rata refund of the upfront fee you paid for the Reserved Instance with Windows or Red Hat Enterprise Linux.

Reserved Instances are available for Linux/UNIX, Windows, Red Hat Enterprise Linux, and SUSE Linux Enterprise operating systems. You can also optionally reserve instances in Amazon VPC at the same prices as shown above. Click here to learn more about Reserved Instances.

Reserved Instance Volume Discounts

When you have purchased a sufficient number of Reserved Instances in an AWS Region, you will automatically receive discounts on your upfront fees and usage fees for future purchases of Reserved Instances in that AWS Region. Reserved Instance Tiers are determined based on the total list price (non-discounted price) of upfront fees for the active Reserved Instances you have per AWS Region. It is important to note that Reserved Instance Tiers do not apply to Reserved Instances purchased from the Reserved Instance Marketplace. A complete list of the Reserved Instance Tiers is shown below:

Reserved Instance Volume Discounts

Total Reserved Instances    Upfront Discount    Hourly Discount
Less than $250,000          0%                  0%
$250,000 to $2,000,000      10%                 10%
$2,000,000 to $5,000,000    20%                 20%
More than $5,000,000

http://www.windowsazure.com/EN-US/pricing/overview/

Up to 29.5% savings vs. the Pay-as-You-Go plan, starting at $500/month.


Purchase plans

* Comparisons based on the pay-as-you-go plan.

Discount * by purchase plan and monthly committed spend:

Monthly Committed Spend    6-Month Monthly Pay    12-Month Monthly Pay    6-Month Pre-Pay    12-Month Pre-Pay
$500 to $14,999            20%                    22.5%                   22.5%              25%
$15,000 to $39,999         23%                    25.5%                   25.5%              28%
$40,000 and above          27%                    29.5%                   29.5%              32%

http://www.windowsazure.com/en-us/offers/commitment-plans

Usage Quotas

The following monthly usage quotas are applied. If you need more than these limits, please contact customer service at any time so that we can understand your needs and adjust these limits appropriately.

Cloud Services and Virtual Machines

The standard quota is 20 concurrent Standard Small (A1) compute instances or an equivalent number of other types or sizes of compute instances as determined by using the compute instance quota conversion table below.

Instance size               Equivalent Standard Small (A1) instances
Extra Small (A0)            1
Small (A1)                  1
Medium (A2)                 2
Large (A3) or A6            4
Extra Large (A4) or A7      8
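
A quick R sketch of what the conversion table means for the default quota of 20 Standard Small (A1) equivalents; the weights are taken straight from the table above and the calculation is only illustrative.

# A1-equivalent weights from the conversion table above
a1_equivalents <- c("Extra Small (A0)"       = 1,
                    "Small (A1)"             = 1,
                    "Medium (A2)"            = 2,
                    "Large (A3) or A6"       = 4,
                    "Extra Large (A4) or A7" = 8)

quota <- 20  # default quota: 20 concurrent Standard Small (A1) equivalents

# How many concurrent instances of each size fit within the default quota?
floor(quota / a1_equivalents)
# e.g. 5 Large (A3) instances or 2 Extra Large (A4) instances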

Storage

  • 5 concurrent storage accounts

Active Directory

  • 150,000 objects

Also see

http://www.ibm.com/cloud-computing/social/us/en/planspricing/

https://cloud.oracle.com/mycloud/f?p=service:java_pricing:0:::::

 

Java S1: $249 / Month

  • 1 Oracle WebLogic Server
  • 1.5 GB RAM for Java Heap
  • 5 GB File Storage
  • 50 GB Data Transfer

Java S2: $499 / Month

  • 2 Oracle WebLogic Servers
  • 3 GB RAM for Java Heap
  • 10 GB File Storage
  • 250 GB Data Transfer

Java S4: $1,499 / Month

  • 4 Oracle WebLogic Servers
  • 6 GB RAM for Java Heap
  • 25 GB File Storage
  • 500 GB Data Transfer

Understanding the Google Cloud

Google has a lot of services, so I really like this simple explanation of them. Though I would want it clickable, with one more level of detail, to make it interactive (especially Google Cloud SQL vs Google BigQuery: love in tech documentation??)

[Infographic: overview of Google Cloud services]

Source-

https://cloud.google.com/resources/articles/storage-overview

I wish technical documentation had more examples of lucid, infographic-like explanations.

 

R in Oracle Java Cloud and Existing R – Java Integration #rstats

So I finally got my test plan accepted for a 1-month trial of the Oracle Public Cloud at https://cloud.oracle.com/.

I am testing this for my next book, R for Cloud Computing (I have already covered Windows Azure and Amazon AWS, and am in the middle of testing Google Compute).

Some initial thoughts: this Java cloud seems more suitable for web apps than for data science (but I have to spend much more time on this).

I really liked the help, documentation, and tutorials; Oracle has invested a lot to make them friendly to enterprise users.

Hopefully the Oracle R Enterprise (ORE) guys can talk to the Oracle Cloud department and get some common use-case projects going.


In the meantime, I did a roundup of all R-Java projects.

They include- Continue reading “R in Oracle Java Cloud and Existing R – Java Integration #rstats”

Amazon AWS Data Pipeline

Ok, I missed this one as it came out on Dec 20. I think the AWS Data Pipeline is a really important step forward for cloud-enabled analytics.

[Diagram: how AWS Data Pipeline works]

http://aws.amazon.com/about-aws/whats-new/2012/12/20/announcing-aws-data-pipeline/

What is AWS Data Pipeline?

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. Continue reading “Amazon AWS Data Pipeline”

Running R GUI on Google Compute

I wanted to run R GUIs (rattle, Rcmdr, Deducer) on my Google Compute instance, but couldn't figure out how to enable X11.

Initially I just tried to enable X11 forwarding in the local ssh config (Ubuntu) and the remote sshd config (GCE), but it still needed some more steps.

Note: I use gedit to edit files locally (since it is easier) and vi to edit files remotely (because I didn't have a graphical environment there yet). I used the vi help from the link here (basically, sudo vi filename opens the file in Linux; you scroll down and press Insert to write your changes, then hit Escape, then type :wq to save and quit, or :q! to quit without saving; your mouse is quite useless and the arrow keys don't help much in vi, I assure you).

[local]
/etc/ssh/ssh_config (system-wide) or ~/.ssh/config
ForwardX11 yes

restarted local ssh

[remote]
/etc/ssh/sshd_config
X11Forwarding yes

restarted remote sshd

Well, this is how it is done. The following is a copy-and-paste from the actual discussion:

There are two steps you have to do in order to run X Windows applications on your instance.

1) You have to install some X-windows applications on your instance.  I used the command
sudo apt-get install xterm
which works on Ubuntu.  On Centos, you would use the command
yum install xterm
but I didn’t test that.
2) You have to create an X-windows tunnel through SSH.  You do that with the -X switch to the gcutil ssh command:
 gcutil ssh --ssh_arg -X INSTANCE
When you login to the instance, verify that the tunnel is in place.
rman@test-pd:~$ echo $DISPLAY
localhost:10.0
rman@test-pd:~$
By way of contrast, this is what it looks like if the tunnel didn’t work:
rman@test-pd:~$ echo $DISPLAY
rman@test-pd:~$
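
Once the tunnel works (i.e. $DISPLAY is set as above), the R side is straightforward. A minimal sketch, assuming R itself is already installed on the instance and that the CRAN GUI packages mentioned at the start of this post are the ones you want:

# Run inside R on the GCE instance, over the SSH X11 tunnel
capabilities("X11")       # TRUE if this R build can talk to an X display
Sys.getenv("DISPLAY")     # should show something like localhost:10.0

# Install and launch a GUI, e.g. R Commander or rattle
# (assumes the required system libraries are present on the instance)
install.packages(c("Rcmdr", "rattle"))
library(Rcmdr)            # loading Rcmdr opens the R Commander window via X11
# library(rattle); rattle()   # alternatively, launch the rattle GUI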

Hat Tip- gce discussion group on google groups  https://groups.google.com/forum/#!forum/gce-discussion  and Jeff Silverman from the GCE team.

The making of an R startup Part 1 #rstats

Note: Decisionstats.com has done almost 105 interviews in the field of analytics, technology startups, and thought leaders (you can see them here: http://goo.gl/m3l31). We have covered some of the R authors (R for SAS and SPSS Users, Data Mining using R, Machine Learning for Hackers) and noted R package creators (ggplot2, RCommander, rattle GUI, forecast).

But what we truly enjoy is interviews with startups in the R ecosystem, including the founders of Revolution Analytics, Inference for R, RStudio, and Cloudnumbers.

The latest startup in the R ecosystem with a promising product is RApporter.net. It has actually been around for some time, but with the launch of their new product we asked them about the trials and tribulations of creating an open-source startup in the data science field.

This is part 1 of the interview with Gergely Daróczi, co-founder of the Rapporter project.


Ajay- Describe the journey of Rapporter till now, and your product plans for 2013.

Greg- The idea of Rapporter presented itself more than 3 years ago while giving statistics, SPSS and R courses at different Hungarian universities and, at the same time, creating custom statistical reports for a number of companies for a living.
Long story short, the three Hungarian co-founders faced similar problems in both sectors: students, just like business clients, admired the capabilities of R and the wide variety of tools found on CRAN, but were not at all eager to learn how to use them.
So we tried to make plans for how to let non-R users also build on the resources of R, and we came up with the idea of an intuitive web interface as an R front-end.

The real development of a helper R package (which later became “rapport”) was started in January 2011 by Aleksandar Blagotić and me, in our spare time and rather just for fun, as we had a dream about using “annotated statistical templates” in R after a few conversations on StackOverflow. We also worked on a front-end in the form of an Rserve-driven PHP engine with MySQL, to be dropped and completely rewritten later after some trying experiences and serious benchmarking.

We released the “rapport” package to the public at the end of 2011 on GitHub, and after a few weeks on CRAN too. Despite the fact that we did our best to create decent documentation and some live examples, we somehow forgot to spread the news of the new package to the R community, so “rapport” did not attract any serious attention.

Even so, our enthusiasm for annotated R “templates” did not wane as time passed, so we continued to work on “rapport” by adding new features, and Aleksandar also started to fortify his Ruby on Rails skills. We also dropped the Rserve-with-MySQL back-end and introduced Jeffrey Horner’s awesome RApache with some NoSQL databases.
To be honest, this change resulted in a one-year delay in releasing Rapporter and no end of headaches on our end, but in the long run it was a really smart move after all, as we own an easily scalable and highly available cluster of servers at the moment.

But back to 2012.

As “rapport” got too complex as time passed with newly added features, Aleksandar and I decided to split the package, a move which gave birth to “pander”. At that time “knitr” was getting more and more familiar among R users, so it was a brave move to release “another” similar package, but the roots of “pander” were more than one year old, we used some custom methods not available in “knitr” (like capturing the R object beside the printed output of chunks), we needed tweakable global options instead of chunk options, and we really wanted to build on the power of Pandoc, just like before.

So we had a package for converting R objects to Pandoc’s markdown with a general S3 method, and another package to automatically run that in a brew-like document, also capturing plots and images, with various output formats like pdf, docx, odt, etc.
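
For readers new to these packages, the “pander” workflow Gergely describes looks roughly like this (a small illustrative sketch with CRAN's pander package; the example objects are just placeholders):

library(pander)

# The generic pander() S3 method converts R objects to Pandoc markdown
pander(head(iris, 3))                          # a data frame becomes a markdown table
pander(summary(lm(mpg ~ wt, data = mtcars)))   # model summaries are formatted too

# Global options (rather than per-chunk options) control the output style
panderOptions("digits", 3)
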
In the summer, while Aleksandar dealt with the web interface, I worked on some new features in our packages:
• automatic and robust caching of chunks, with various options for performance reasons,
• automatically unifying “base”, “lattice” and “ggplot2” images to the same style with user options, like major/minor grid color, font family, color palette, margins, etc.,
• adding other global options to “pander”, to let our expected clients later personalize their custom report style with a few clicks.

At the same time, we were searching for different options to prevent running malicious code in the parallel R sessions, which might compromise all our users’ sensitive data. Unfortunately no full-blown solution existed at that time, and we really wanted to steer clear of running Java-based interpreters in our network.
So I started to create a parser for R commands, which was supposed to filter out malicious R commands before evaluation, and a nasty flu got me some spare time to implement “sandboxR” with an open and live “hack my R server” demo, which ended up being a great challenge on my side, but proved to really work after all.
I also had a few conversations with Jeroen Ooms (the author of the awesome OpenCPU), who faced similar problems on his servers and was eager to prevent the issues with the help of AppArmor. The great news of “RAppArmor” did not make “sandboxR” needless (as AppArmor just cannot regulate inner R calls), but we started to evaluate all user-specified R commands in a separate hat, which allowed me to make “sandboxR” more permissive with black-filtered functions.
In the middle of the summer, I realized that we had an almost-working web application, with any number of R workers able to serve tons of users based on the flexible NoSQL database back-ends, but we had no legal background to release such a service, nor did I have any solid financial background to found one; moreover, the Rapporter project had already taken a huge amount from my family budget.

As I was against letting venture capital dominate the project, and did not find any accelerator that would take on a project with a maturing, almost market-ready product, a few associates and I decided to found a UK company on our own, having confidence in the future and in God.

So we founded Easystats Ltd, the company running rapporter.net, in July, and decided to release the first beta and pretty stable version of the application to the public at the end of September. At that time users could:
• upload and use text or SPSS sav data sets,
• specify more than 20 global options to be applied to all generated reports (like plot themes, table width, date format, decimal mark and number of digits, separators and copula in vectors etc.),
• create reports with the help of predefined statistical “templates”,
• “fork” (clone) any of our templates and modify without restriction, or create new statistical templates from scratch,
• edit the body or remove any part of the reports, resize images with the mouse or even with finger on touch-devices,
• and export reports to pdf, odt or docx formats.

A number of new features were introduced since then:

• OpenBUGS integration with more permissive security profiles,
• custom styles for the exported documents (in LaTeX, docx and odt formats), so users can generate unique and possibly branded reports,
• sharing public or even private reports with anyone by a simple hyperlink, without the need to register on rapporter.net,
• and letting our users integrate their templates in any homepage, blog post or even HTML mail, so that anyone can use the power of R with a few clicks, building on the knowledge of template authors and our reliable back-end.
Although 2 years ago I was pretty sure that this job would be finished in a few months and that we would possibly have a successful project in a year or two, now I am certain that a bunch of new features will make Rapporter more and more user-friendly, intuitive and extensible in the next few years.
Currently, we are working hard on a redesigned GUI, at last with the help of a dedicated UX team (which was a really important structural change in the life of Rapporter, as we can really assign and split tasks now, just like we dreamed of when the project was a two-man show), which is to be finished no later than the first quarter of the year. Besides design issues, this change will also result in some new features, like ordering the templates, data sets and reports by popularity, rating or relevance for the currently active data set, and also letting users alter the style of the resulting reports in a more seamless way.

The next planned tasks for 2013 include:
• a “data transformation” front-end, which would let users rename and label variables in any uploaded data set, specify the level of measurement, recode/categorize variables, or create new variables with the help of existing ones and/or any R functions,
• editing tables in reports on the fly (change the decimal mark, highlight some elements, rename columns and split tables across multiple pages with a simple click),
• a more robust API to let third-party users temporarily upload data to be used in the analysis,
• an option to use multiple data sets in a template and to let users merge or connect data online,
• and some top-secret surprises.

Besides the above tasks, which were made up by us, our team is really interested in any feedback from users, which might change the above order or add new tasks with higher priority, so be sure to add your two cents on our support page.

And we will have to come up with some account plans with reasonable pricing in 2013 for the hosted service, to let us cover the server fees and development expenses. But of course Rapporter will remain free forever for users with basic needs (like analyzing data sets with only a few hundred cases) or anyone in the academic sector, and we also plan to provide an option to run Rapporter “off-site” in any Unix-like environment.

Ajay- What are some of the Big Data use cases I can do with Rapporter?

Greg- Although we have released Rapporter beta only a few months ago, we already heard some pretty promising use-cases from our (potential) clients.

But I must emphasize that at first we are not committed to dealing with Big Data in the sense of user-contributed data sets with billions of cases, but are rather concentrating on providing an intuitive and responsive way of analyzing traditional, survey-like data frames of up to about 100,000 cases.

Anyway, to be on topic: a really promising project by Optimum Dosing Strategies has been using Rapporter’s API for a number of weeks, even in 2012, to compute optimal doses for different kinds of antibiotics based on Monte Carlo simulation and Bayesian adaptive feedback, among other methods.
This collaboration lets the ID-ODS team develop a powerful calculator with full-blown reports ready to be attached to medical records, without any special technical knowledge on their side: we maintain the R engine and the integration part, and they code in R. This results in pleased clients all over the world, which makes us happy too.

We really look forward to shipping a number of educational templates to be used in real life at several (multilingual) universities from September 2013. These templates would let teachers show customizable and interactive reports to the students, with any number of comments and narrative paragraphs, and these introductory statistics modules would provide a free alternative to other desktop software used in education.

In the next few months, a part of our team will focus on spatial analysis templates, which would mean that our users could not just map, but really analyze any of their spatially related data with a few clicks and clear parameters.

Another feature request of a client seems to be a really exciting idea. Currently, Google Analytics and other tracking services provide basic options to view, filter and export the historical data of websites, blogs etc.
As creating an interface between Rapporter and the tracking services to fetch the most recent data is no longer beyond possibility with the help of existing API resources, our clients could generate annotated usage reports for any specified period of time, without restrictions. Just to emphasize some potential add-ons: using the time-series R packages in the analysis, or creating real-time “dashboards” with optional forecasts about live data.

Of course you could think of other kinds of live or historical data instead of Google Analytics, as creating a template for e.g. transaction data or the gas usage of a household could be addressed at any time, and please do not forget the use-cases referenced in the 3rd question (“[…] Rapporter can help: […]”).

But wait: the beauty of Rapporter is that you could implement all of the above ideas by yourself in our system, even without any help from us.

Ajay- What are some of things that can be easily done with Rapporter than with your plain vanilla R?

Greg- Rapporter is basically developed for creating reproducible, literate and annotated statistical modules (a.k.a. “templates”), which means that passing a data set and a list of variables with some optional arguments ends up in a full-blown written report with automatically styled tables and charts.

So using Rapporter is like writing “Sweave” or “knitr” documents, but you write the template only once, and then apply that to any number of data sets with a simple click on an intuitive user interface.

Beside this major objective: as Rapporter runs in the cloud, and sharing reports and templates (or even data sets) with collaborators or with anyone on the Internet is really easy, our users can post and share any R code for free and without restrictions, or release templates with a specified license and/or fees in a secure environment.

This means that Rapporter can help:

  1. scholars sharing scientific results or methods with reproducible and instantly available demo and/or dedicated implementation along with publications,
  2. teachers to create self-explanatory statistical templates which would help the students internalize the subject by practice,
  3. any R developer to share a live and interactive demo of the implemented features of the functions with a few clicks,
  4. businesses could use a statistical platform without restrictions for a reasonable monthly fee instead of expensive and non-portable statistical programs,
  5. governments and national statistical offices to publish census or other big data with a scientific and reliable analytic tool with annotated and clear reports, while ensuring the anonymity of the respondents by automatically applying custom methods (like data swapping, rounding, micro-aggregation, PRAM, adding noise etc.) to the tables and results, etc.

And of course, do not forget one of our main objectives: to open up the world of R to non-R users too, with an intuitive, guiding user interface.

(To be continued)-

About

Gergely Daróczi coordinates the development of Rapporter and maintains its R packages. Besides trying to be active in some open-source projects and on StackOverflow, he is a PhD candidate in sociology and also a lecturer at Corvinus University of Budapest and Pázmány Péter Catholic University in Hungary.

Rapporter is a web application helping you to create comprehensive, reliable statistical reports on any mobile device or PC, using an intuitive user interface.

The application builds on the power of R, among other technologies, and is intended to be used in any browser, with the heavy computations done on the server side. Some might consider Rapporter a customizable graphical user interface to R, running in the cloud.

Currently, Rapporter is under heavy development and only invited alpha testers can access the application. Please sign up for an invitation if you want an early-bird look at Rapporter.
