Some ways to test and use cloud computing for free yourself-
- Windows Azure
- Amazon Ec2
- Google Storage
The folks at Microsoft Azure announced a 90-day free trial.
Here is an interview with Mike Boyarski, Director of Product Marketing at Jaspersoft, the largest BI community with over 14 million downloads, nearly 230,000 registered members, over 175,000 production deployments, and 14,000 customers across 100 countries.
From http://www.sas.com/events/analytics/us/
The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.
October 24-25, 2011
Grande Lakes Resort
Orlando, FL
A message from Amazing Amazon's cloud team- this will also help #rstats users, given that Revolution Analytics offers its full versions on RHEL.
—————————————————-
You can now launch on-demand instances of Amazon EC2 running Red Hat Enterprise Linux (RHEL) for as little as $0.145 per instance hour. The offering combines the cost-effectiveness, scalability and flexibility of running in Amazon EC2 with the proven reliability of Red Hat Enterprise Linux.
Highlights of the offering are summarized below.
For more information, please visit the Amazon EC2 Red Hat Enterprise Linux page.
That page describes the offering as follows:
Amazon EC2 running Red Hat Enterprise Linux provides a dependable platform to deploy a broad range of applications. By running RHEL on EC2, you can leverage the cost effectiveness, scalability and flexibility of Amazon EC2, the proven reliability of Red Hat Enterprise Linux, and AWS premium support with back-line support from Red Hat. Red Hat Enterprise Linux on EC2 is available in versions 5.5, 5.6, 6.0, and 6.1, in both 32-bit and 64-bit architectures.
Amazon EC2 running Red Hat Enterprise Linux provides seamless integration with existing Amazon EC2 features including Amazon Elastic Block Store (EBS), Amazon CloudWatch, Elastic Load Balancing, and Elastic IPs. Red Hat Enterprise Linux instances are available in multiple Availability Zones in all Regions.
Pay only for what you use with no long-term commitments and no minimum fee.
On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.
| Standard Instances | Red Hat Enterprise Linux |
|---|---|
| Small (Default) | $0.145 per hour |
| Large | $0.40 per hour |
| Extra Large | $0.74 per hour |

| Micro Instances | Red Hat Enterprise Linux |
|---|---|
| Micro | $0.08 per hour |

| High-Memory Instances | Red Hat Enterprise Linux |
|---|---|
| Extra Large | $0.56 per hour |
| Double Extra Large | $1.06 per hour |
| Quadruple Extra Large | $2.10 per hour |

| High-CPU Instances | Red Hat Enterprise Linux |
|---|---|
| Medium | $0.23 per hour |
| Extra Large | $0.78 per hour |

| Cluster Compute Instances | Red Hat Enterprise Linux |
|---|---|
| Quadruple Extra Large | $1.70 per hour |

| Cluster GPU Instances | Red Hat Enterprise Linux |
|---|---|
| Quadruple Extra Large | $2.20 per hour |
Pricing is per instance-hour consumed for each instance type. Partial instance-hours consumed are billed as full hours.
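As a rough, illustrative sanity check on what these hourly rates add up to, here is a small R calculation using the rates quoted above (it assumes an instance left running for a full 30-day month; storage, bandwidth and support charges are extra):

```r
# Illustrative cost estimate from the on-demand RHEL rates quoted in the table above
rate  <- c(small = 0.145, m2.4xlarge = 2.10)  # $ per instance-hour
hours <- 24 * 30                              # a 30-day month of continuous uptime
round(rate * hours, 2)
#>      small m2.4xlarge
#>     104.40    1512.00
```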
The instance types are as follows:
Instances of this family are well suited for most applications.
Small Instance – default*
1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small
Large Instance
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
Extra Large Instance
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.
Micro Instance
613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro
Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.
High-Memory Extra Large Instance
17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge
High-Memory Double Extra Large Instance
34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge
High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge
Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.
High-CPU Medium Instance
1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium
High-CPU Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.
Cluster Compute Quadruple Extra Large Instance
23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.
Cluster GPU Quadruple Extra Large Instance
22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
To get started using Red Hat Enterprise Linux on Amazon EC2:
The AWS Management Console is an easy tool to start and manage your instances. If you are looking for more details on launching an instance, a quick video tutorial on how to use Amazon EC2 with the AWS Management Console is available on the AWS website.
A full list of Red Hat Enterprise Linux AMIs can be found in the AWS AMI Catalog.
All customers running Red Hat Enterprise Linux on EC2 will receive access to repository updates from Red Hat. Moreover, AWS Premium support customers can contact AWS to get access to a support structure from both Amazon and Red Hat.
Red Hat, the world’s leading open source solutions provider, is headquartered in Raleigh, NC with over 50 satellite offices spanning the globe. Red Hat provides high-quality, low-cost technology with its operating system platform, Red Hat Enterprise Linux, together with applications, management and Service Oriented Architecture (SOA) solutions, including the JBoss Enterprise Middleware Suite. Red Hat also offers support, training and consulting services to its customers worldwide.
Also from Revolution Analytics, in case you want to run #rstats in the cloud and thus kill all that talk of RAM dependency, of R being slower than other software (just increase the RAM in the instances above to keep it simple), or of Revolution not being open enough:
http://www.revolutionanalytics.com/downloads/gpl-sources.php
Revolution Analytics uses an Open-Core Licensing model. We provide open-source R bundled with proprietary modules from Revolution Analytics that provide additional functionality for our users. Open-source R is distributed under the GNU Public License (version 2), and we make our software available under a commercial license.
Revolution Analytics respects the importance of open source licenses and has contributed code to the open source R project and will continue to do so. We have carefully reviewed our compliance with GPLv2 and have worked with Mark Radcliffe of DLA Piper, the outside General Legal Counsel of the Open Source Initiative, to ensure that we fully comply with the obligations of the GPLv2.
For our Revolution R distribution, we may make some minor modifications to the R sources (the ChangeLog file lists all changes made). You can download these modified sources of open-source R under the terms of the GPLv2, using either the links below or those in the email sent to you when you download a specific version of Revolution R.
| Product | Version | Platform | Modified R Sources |
|---|---|---|---|
| Revolution R Community | 3.2 | Windows | R 2.10.1 |
| Revolution R Community | 3.2 | MacOS | R 2.10.1 |
| Revolution R Enterprise | 3.1.1 | RHEL | R 2.9.2 |
| Revolution R Enterprise | 4.0 | Windows | R 2.11.1 |
| Revolution R Enterprise | 4.0.1 | RHEL | R 2.11.1 |
| Revolution R Enterprise | 4.1.0 | Windows | R 2.11.1 |
| Revolution R Enterprise | 4.2 | Windows | R 2.11.1 |
| Revolution R Enterprise | 4.2 | RHEL | R 2.11.1 |
| Revolution R Enterprise | 4.3 | Windows & RHEL | R 2.12.2 |
This is a short list of several known as well as lesser-known R (#rstats) language codes, packages and tricks to build a business intelligence application. It will be slightly Messy (and not Messi) but I hope to refine it someday when the cows come home.
It assumes that BI is basically a database, a document database, report creation/dashboard software, as well as R packages unique to business intelligence.
What is business intelligence?
Seamless dissemination of data in the organization. In short, let it flow: from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models. A business intelligence enabled organization allows information to flow easily AND captures insights and feedback for further action.
BI software has lately come to mean just reporting software, and Business Analytics has come to mean primarily predictive analytics. The terms are interchangeable in my opinion, as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure the effect in dashboards and summary reports.
Data mining is a bit more than predictive analytics: it includes pattern recognition as well as black-box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science departments, predictive analytics (if at all) in business and statistics departments, and no one teaches metrics, dashboards, or reporting in mainstream academia, even though a large number of graduates will end up fiddling with spreadsheets or dashboards in their real careers.
Using R with
1) Databases-
I created a short list of database connectivity with R here at https://rforanalytics.wordpress.com/odbc-databases-for-r/ but R has released 3 new versions since then.
The RODBC package remains the package of choice for connecting to SQL Databases.
http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
Details on creating DSN and connecting to Databases are given at https://rforanalytics.wordpress.com/odbc-databases-for-r/
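A minimal sketch of what an RODBC session looks like once a DSN is set up (the DSN name, credentials and table below are hypothetical placeholders; any ODBC-accessible database will do):

```r
library(RODBC)

# "SalesDW", the credentials and the table name are made-up placeholders
ch <- odbcConnect("SalesDW", uid = "analyst", pwd = "secret")
sales <- sqlQuery(ch, "SELECT region, SUM(revenue) AS revenue
                       FROM fact_sales GROUP BY region")
head(sales)
odbcClose(ch)
```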
For document databases like MongoDB and CouchDB
(What is the difference between a traditional RDBMS and NoSQL, if you ever need to explain it in a cocktail conversation? http://dba.stackexchange.com/questions/5/what-are-the-differences-between-nosql-and-a-traditional-rdbms
Basically dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in keeping transactional safety, often gives you extreme increases in performance
NoSQL is a kind of database that doesn’t have a fixed schema like a traditional RDBMS does. With the NoSQL databases the schema is defined by the developer at run time. They don’t write normal SQL statements against the database, but instead use an API to get the data that they need.
Instead of relating data in one table to another, you store things as key-value pairs; there is no database schema, it is handled instead in code.)
I believe any corporation with data driven decision making would need to have at least one RDBMS and one NoSQL database for unstructured data - Ajay. This is a sweeping generic statement 😉, and is an opinion on future technologies.
From- http://tommy.chheng.com/2010/11/03/rmongo-accessing-mongodb-in-r/
http://plindenbaum.blogspot.com/2010/09/connecting-to-mongodb-database-from-r.html
Connecting to a MongoDB database from R using Java
http://nsaunders.wordpress.com/2010/09/24/connecting-to-a-mongodb-database-from-r-using-java/
Also see a nice basic analysis using R and MongoDB at
http://pseudofish.com/blog/2011/05/25/analysis-of-data-with-mongodb-and-r/
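A minimal sketch using the RMongo package from the first link above (the database name, collection and query are hypothetical; it assumes a MongoDB server running locally):

```r
library(RMongo)

# connect to a hypothetical "analytics" database on a local MongoDB server
mongo <- mongoDbConnect("analytics", "localhost", 27017)
# pull documents from a hypothetical "events" collection back as a data frame
events <- dbGetQuery(mongo, "events", "{}")
head(events)
dbDisconnect(mongo)
```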
For CouchDB
please see https://github.com/wactbprot/R4CouchDB and
http://digitheadslabnotebook.blogspot.com/2010/10/couchdb-and-r.html
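R4CouchDB wraps CouchDB's HTTP API; since that API is plain REST plus JSON, even a two-line sketch with RCurl and rjson can pull a document. The database name and document id below are made up, and a local CouchDB on its default port is assumed:

```r
library(RCurl)
library(rjson)

# fetch a hypothetical document "q3-summary" from a hypothetical "sales" database
doc <- fromJSON(getURL("http://localhost:5984/sales/q3-summary"))
str(doc)
```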
2) External Report Creating Software-
Jaspersoft- It has good integration with R and is a certified Revolution Analytics partner (who seem to be the only ones with a coherent #rstats go-to-market strategy), which begs the question: why does the freest and finest stats software have only ONE vendor? If it were so great, lots of companies would make exclusive products for it (and some do - see https://rforanalytics.wordpress.com/r-business-solutions/ and https://rforanalytics.wordpress.com/using-r-from-other-software/).
From
http://www.jaspersoft.com/sites/default/files/downloads/events/Analytics%20-Jaspersoft-SEP2010.pdf
we see
http://jasperforge.org/projects/rrevodeployrbyrevolutionanalytics
RevoConnectR for JasperReports Server
RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.
http://jasperforge.org/plugins/esp_frs/optional_download.php?group_id=409
R and BI – Integrating R with Open Source Business Intelligence Platforms Pentaho and Jaspersoft
David Reinke, Steve Miller
Keywords: business intelligence

Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and visualization in the business world. This trend will only escalate as more R analysts transition to business from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes relational databases, data integration (extract, transform, load – ETL), query and reporting, online analytical processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with R analytics revolves around the introduction of advanced statistical modeling and visualizations native to R. The challenge is to seamlessly integrate R capabilities within the existing BI space.

This presentation will explain and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms – Pentaho and Jaspersoft. Our efforts will be successful if we stimulate additional progress, transparency and innovation by combining the R and BI worlds. The demonstration will show how we integrated the OSBI platforms with R through use of Rserve and its Java API. The BI platforms provide an end user web application which includes application security, data provisioning and BI functionality. Our integration will demonstrate a process by which BI components can be created that prompt the user for parameters, acquire data from a relational database and pass it into Rserve, invoke R commands for processing, and display the resulting R-generated statistics and/or graphs within the BI platform. Discussion will include concepts related to creating a reusable Java class library of commonly used processes to speed additional development.
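The R side of that Rserve-based integration is tiny; here is a sketch of starting the daemon that the Java client then connects to (default port 6311; deployment details such as authentication and remote access are omitted):

```r
# install.packages("Rserve")   # one-time installation
library(Rserve)

# Start an Rserve daemon. A BI platform (Pentaho, Jaspersoft, etc.) can then connect
# via the Rserve Java client, send R commands, and pull back result sets or graphics.
Rserve(args = "--no-save")
```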
If you know Java- try http://ramanareddyg.blog.com/2010/07/03/integrating-r-and-pentaho-data-integration/
and I like this list by two venerable powerhouses of the BI Open Source Movement
http://www.openbi.com/demosarticles.html
Open Source BI as disruptive technology
http://www.openbi.biz/articles/osbi_disruption_openbi.pdf
Open Source Punditry
| TITLE | AUTHOR | COMMENTS |
|---|---|---|
| Commercial Open Source BI Redux | Dave Reinke & Steve Miller | A review and update of the predictions made in our 2007 article, focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability. |
| Open Source BI as Disruptive Technology | Dave Reinke & Steve Miller | Reprint of May 2007 DM Review article explaining how and why Commercial Open Source BI (COSBI) will disrupt the traditional proprietary market. |
| TITLE | AUTHOR | COMMENTS |
|---|---|---|
| R You Ready for Open Source Statistics? | Steve Miller | R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevancy to mainstream BI. |
| R and BI (Part 1): Data Analysis with R | Steve Miller | An introduction to R and its myriad statistical graphing techniques. |
| R and BI (Part 2): A Statistical Look at Detail Data | Steve Miller | The usage of R’s graphical building blocks – dotplots, stripplots and xyplots – to create dashboards which require little ink yet tell a big story. |
| R and BI (Part 3): The Grooming of Box and Whiskers | Steve Miller | Boxplots and variants (e.g. Violin Plot) are explored as an essential graphical technique to summarize data distributions by categories and dimensions of other attributes. |
| R and BI (Part 4): Embellishing Graphs | Steve Miller | Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques. |
| R and BI (Part 5): Predictive Modelling | Steve Miller | An introduction to basic predictive modelling terminology and techniques with graphical examples created using R. |
| R and BI (Part 6): Re-expressing Data | Steve Miller | How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable. |
| The Stock Market, 2007 | Steve Miller | R-based dashboards are presented to demonstrate the return performance of various asset classes during 2007. |
| Bootstrapping for Portfolio Returns: The Practice of Statistical Analysis | Steve Miller | Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R. |
| Statistical Graphs for Portfolio Returns | Steve Miller | Steve uses the R open source stats package to analyze market returns by asset class with some very provocative embedded trellis charts. |
| Frank Harrell, Iowa State and useR!2007 | Steve Miller | In August, Steve attended the 2007 International R User conference (useR!2007). This article details his experiences, including his meeting with long-time R community expert, Frank Harrell. |
| An Open Source Statistical “Dashboard” for Investment Performance | Steve Miller | The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum. |
| Unsexy Graphics for Business Intelligence | Steve Miller | Utilizing Tufte’s philosophy of maximizing the data to ink ratio of graphics, Steve demonstrates the value in dot plot diagramming. The R open source statistical/analytics software is showcased. |
brew: Templating Framework for Report Generation. brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. http://bit.ly/jINmaI
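A minimal, self-contained sketch of a brew template in action (the template text and data are made up for illustration):

```r
library(brew)

# write a tiny report template to a temporary file
tpl <- tempfile(fileext = ".brew")
writeLines(c("Sales report generated on <%= format(Sys.Date()) %>",
             "Total revenue: <%= sum(sales$revenue) %>"), tpl)

sales <- data.frame(revenue = c(100, 250, 175))  # toy data
brew(tpl)   # renders to the console; pass an output file argument to write a report
```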
http://dirk.eddelbuettel.com/blog/2011/01/16/#overbought_oversold_plot