Google Cloud is finally here

Amazon gets some competition, and customers should see some relief, unless Google withdraws its commitment to these products after a few years of trying (as it often does now!)

 

http://cloud.google.com/products/index.html

Machine Type Pricing
Configuration     Virtual Cores   Memory        GCEU *   Local disk    Price/Hour   $/GCEU/hour
n1-standard-1-d   1               3.75GB ***    2.75     420GB ***     $0.145       0.053
n1-standard-2-d   2               7.5GB         5.5      870GB         $0.29        0.053
n1-standard-4-d   4               15GB          11       1770GB        $0.58        0.053
n1-standard-8-d   8               30GB          22       2 x 1770GB    $1.16        0.053
Network Pricing
Ingress Free
Egress to the same Zone. Free
Egress to a different Cloud service within the same Region. Free
Egress to a different Zone in the same Region (per GB) $0.01
Egress to a different Region within the US $0.01 ****
Inter-continental Egress At Internet Egress Rate
Internet Egress (Americas/EMEA destination) per GB
0-1 TB in a month $0.12
1-10 TB $0.11
10+ TB $0.08
Internet Egress (APAC destination) per GB
0-1 TB in a month $0.21
1-10 TB $0.18
10+ TB $0.15
Persistent Disk Pricing
Provisioned space $0.10 GB/month
Snapshot storage** $0.125 GB/month
IO Operations $0.10 per million
IP Address Pricing
Static IP address (assigned but unused) $0.01 per hour
Ephemeral IP address (attached to instance) Free
* GCEU is Google Compute Engine Unit — a measure of computational power of our instances based on industry benchmarks; review the GCEU definition for more information
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
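
To get a feel for how the tiered Internet egress rates add up, here is a rough Python sketch using the Americas/EMEA prices from the table above. It assumes two things the pricing page does not spell out: that tiers are billed marginally (each GB is charged at the rate of the tier it falls into), and that 1 TB means 1024 GB, in line with the 2^30-byte footnote. The function and constant names are made up for illustration.

```python
# Internet egress tiers for Americas/EMEA destinations, taken from the table above.
# Each entry is (tier upper bound in GB, price per GB in USD); None means "no upper bound".
GB_PER_TB = 1024  # assumption: binary terabytes, matching the 2^30-byte GB footnote

AMERICAS_EMEA_TIERS = [
    (1 * GB_PER_TB, 0.12),
    (10 * GB_PER_TB, 0.11),
    (None, 0.08),
]


def egress_cost(gb, tiers=AMERICAS_EMEA_TIERS):
    """Return the monthly egress charge in USD, assuming marginal (per-tier) billing."""
    cost = 0.0
    lower = 0
    for upper, price in tiers:
        if upper is None or gb <= upper:
            # The remaining traffic fits entirely inside this tier.
            cost += (gb - lower) * price
            break
        # Fill this tier completely and carry on to the next one.
        cost += (upper - lower) * price
        lower = upper
    return round(cost, 2)


print(egress_cost(500))   # all traffic falls in the first tier
print(egress_cost(2048))  # 1 TB at $0.12 plus 1 TB at $0.11
```

If billing turns out to be flat (the whole volume charged at the rate of the tier it ends in), only the loop body would need to change.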

Google Prediction API

Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.

Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text

Build smart apps

Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.

Google Translation API

Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.

Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms

Build multilingual apps

Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.
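
As a rough illustration of the "familiar RESTful interface", the sketch below only builds a request URL for version 2 of the Translate API and does not send it. The endpoint and the key, q, source and target parameters are those of the v2 REST interface as I understand it, so check the current documentation before relying on them; the helper name and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Translate API v2 REST endpoint (verify against the current documentation).
TRANSLATE_ENDPOINT = "https://www.googleapis.com/language/translate/v2"


def build_translate_url(text, target, api_key, source=None):
    """Build a GET URL asking for `text` to be translated into `target`."""
    params = {"key": api_key, "q": text, "target": target}
    if source:
        # The source language is optional here; it is only sent when given.
        params["source"] = source
    return TRANSLATE_ENDPOINT + "?" + urlencode(params)


# YOUR_API_KEY is a placeholder, not a working credential.
url = build_translate_url("Hello world", "de", api_key="YOUR_API_KEY", source="en")
print(url)
```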

Google BigQuery

Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.

Reliable & Secure

Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.

Scale infinitely

You can store up to hundreds of terabytes, paying only for what you use.

Blazing fast

Run ad hoc SQL queries on multi-terabyte datasets in seconds.

Google App Engine

Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.

Focus on your apps

Let us worry about the underlying infrastructure and systems.

Scale infinitely

See your applications scale seamlessly from hundreds to millions of users.

Business ready

Premium paid support and 99.95% SLA for business users

Google Cloud SQL

Another xing bang API from the boyz in Mountain View (entry by invite only). But it is free, and you can test your stuff on a MySQL database of up to 10 GB.

Database as a service ? (Maybe)— while Amazon was building fires (and Fire)

—————————————————————–

https://code.google.com/apis/sql/index.html

What is Google Cloud SQL?

Google Cloud SQL is a web service that provides a highly available, fully-managed, hosted SQL storage solution for your App Engine applications.

What are the benefits of using Google Cloud SQL?

You can access a familiar, highly available SQL database from your App Engine applications, without having to worry about provisioning, management, and integration with other Google services.

How much does Google Cloud SQL cost?

We will not be billing for this service in 2011. We will give you at least 30 days’ advance notice before we begin billing in the future. Other services such as Google App Engine, Google Cloud Storage etc. that you use with Google Cloud SQL may have their own payment terms, and you need to pay for them. Please consult their documentation for details.

Currently you are limited to the three instance sizes. What if I need to store more data or need better performance?

In the Limited Preview period, we only have three sizes available. If you have specific needs, we would like to hear from you on our google-cloud-sql discussion board.

When will Google Cloud SQL be out of Limited Preview?

We are working hard to make the service generally available. We don’t have a firm date that we can announce right now.

Do you support all the features of MySQL?

In general, Google Cloud SQL supports all the features of MySQL. The following are lists of all the unsupported features and notable differences that Google Cloud SQL has from MySQL.

Unsupported Features:

  • User defined functions
  • MySQL replication

Unsupported MySQL statements:

  • LOAD DATA INFILE
  • SELECT ... INTO OUTFILE
  • SELECT ... INTO DUMPFILE
  • INSTALL PLUGIN ... SONAME ...
  • UNINSTALL PLUGIN
  • CREATE FUNCTION ... SONAME ...

Unsupported SQL Functions:

  • LOAD_FILE()
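
Before importing an existing script, it can help to scan it for the statements and functions listed above. The following Python sketch is a rough heuristic, not a SQL parser: it splits naively on semicolons (so semicolons inside string literals will confuse it) and the regular expressions are my own approximations of the list, not anything published by Google.

```python
import re

# Approximate patterns for the unsupported statements and functions listed above.
UNSUPPORTED_PATTERNS = {
    "LOAD DATA INFILE": r"\bLOAD\s+DATA\b.*?\bINFILE\b",
    "SELECT ... INTO OUTFILE": r"\bSELECT\b.*?\bINTO\s+OUTFILE\b",
    "SELECT ... INTO DUMPFILE": r"\bSELECT\b.*?\bINTO\s+DUMPFILE\b",
    "INSTALL PLUGIN": r"\bINSTALL\s+PLUGIN\b",
    "UNINSTALL PLUGIN": r"\bUNINSTALL\s+PLUGIN\b",
    "CREATE FUNCTION ... SONAME": r"\bCREATE\s+FUNCTION\b.*?\bSONAME\b",
    "LOAD_FILE()": r"\bLOAD_FILE\s*\(",
}


def find_unsupported(sql_text):
    """Return the names of unsupported constructs found, in order of appearance."""
    findings = []
    # Naive statement split: good enough for a first pass over simple scripts.
    for statement in sql_text.split(";"):
        for name, pattern in UNSUPPORTED_PATTERNS.items():
            if re.search(pattern, statement, re.IGNORECASE | re.DOTALL):
                findings.append(name)
    return findings


script = (
    "SELECT * FROM t; "
    "LOAD DATA INFILE 'x.csv' INTO TABLE t; "
    "SELECT LOAD_FILE('/tmp/a');"
)
print(find_unsupported(script))
```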

Notable Differences:

  • If you want to import databases with binary data into your Google Cloud SQL instance, you must use the --hex-blob option with mysqldump. Although this is not a required flag when you are using a local MySQL server instance and the MySQL command line, it is required if you want to import any databases with binary data into your Google Cloud SQL instance. For more information, see Importing Data.

How large a database can I use with Google Cloud SQL?

Currently, in this limited preview period, your database instance must be no larger than 10GB.

How can I be notified when there are any changes to Google Cloud SQL?

You can sign up for the sql-announcements forum where we post announcements and news about Google Cloud SQL.

How can I cancel my Google Cloud SQL account?

To remove all data from your Google Cloud SQL account and disable the service:

  1. Delete all your data. You can remove your tables, databases, and indexes using the drop command. For more information, see SQL DROP statement.
  2. Deactivate Google Cloud SQL by visiting the Services pane and clicking the On button next to Google Cloud SQL. The button changes from On to Off.

How do I report a bug, request a feature, or ask a question?

You can report bugs and request a feature on our project page. You can ask a question in our discussion forum.

Getting Started

Can I use languages other than Java or Python?
Only Java and Python are supported for Google Cloud SQL.
Can I use Google Cloud SQL outside of Google App Engine?
The Limited Preview is primarily focused on giving Google App Engine customers the ability to use a familiar relational database environment. Currently, you cannot access Google Cloud SQL from outside Google App Engine.
What database engine does Google Cloud SQL use?
MySQL version 5.1.59
Do I need to install a local version of MySQL to use the Development Server?
Yes.

Managing Your Instances

Do I need to use the Google APIs Console to use Google Cloud SQL?
Yes. For basic tasks like granting access control to applications, creating instances, and deleting instances, you need to use the Google APIs Console.
Can I import or export specific databases?
No, currently it is not possible to export specific databases. You can only export your entire instance.
Do I need a Google Cloud Storage account to import or export my instances?
Yes, you need to sign up for a Google Cloud Storage account or have access to a Google Cloud Storage account to import or export your instances. For more information, see Importing and Exporting Data.
If I delete my instance, can I reuse the instance name?
Yes, but not right away. The instance name is reserved for up to two months before it can be reused.

Tools & Resources

Can I use Django with Google Cloud SQL?
No, currently Google Cloud SQL is not compatible with Django.
What is the best tool to use for interacting with my instance?
There are a variety of tools available for Google Cloud SQL. For executing simple statements, you can use the SQL prompt. For executing more complicated tasks, you might want to use the command line tool. If you want to use a tool with a graphical interface, the SQuirreL SQL Client provides an interface you can use to interact with your instance.

Common Technical Questions

Should I use InnoDB for my tables?
Yes. InnoDB is the default storage engine in MySQL 5.5 and is also the recommended storage engine for Google Cloud SQL. If you do not need any features that require MyISAM, you should use InnoDB. You can convert your existing tables using the following SQL command, replacing tablename with the name of the table to convert:

ALTER TABLE tablename ENGINE = InnoDB;

If you have a mysqldump file where all your tables are in MyISAM format, you can convert them by piping the file through a sed script:

mysqldump --databases database_name [-u username -p] --hex-blob | sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' > database_file.sql

Warning: You should not do this if your mysqldump file contains the mysql schema. Those tables must remain in MyISAM.
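
If sed is not at hand, the same rewrite can be sketched in Python. This is only an illustration of the substitution described above, not an official tool. The check for a "USE `mysql`;" line is my own heuristic for spotting a dump that includes the mysql schema, and the function name is hypothetical.

```python
import re


def convert_dump_to_innodb(dump_text):
    """Rewrite ENGINE=MyISAM to ENGINE=InnoDB in the text of a mysqldump file."""
    # Heuristic guard: dumps made with --databases switch schema with a USE line.
    # Refuse to touch a dump that appears to contain the mysql system schema.
    if re.search(r"^USE `mysql`;", dump_text, re.MULTILINE):
        raise ValueError("dump appears to include the mysql schema; leave it as MyISAM")
    return dump_text.replace("ENGINE=MyISAM", "ENGINE=InnoDB")


sample = "CREATE TABLE `t` (\n  `id` int\n) ENGINE=MyISAM DEFAULT CHARSET=utf8;\n"
print(convert_dump_to_innodb(sample))
```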

Are there any size or QPS limits?
Yes, the following limits apply to Google Cloud SQL:

Resource Limits from External Requests Limits from Google App Engine
Queries Per Second (QPS) 5 QPS No limit
Maximum Request Size 16 MB
Maximum Response Size 16 MB
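
A client that is subject to the 5 QPS limit can stay under it by spacing out its requests. The Python sketch below shows one simple way to do that. It is a generic throttle, not part of any Google client library, and the class and parameter names are invented for this example. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Space out calls so that they never exceed max_per_second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block (via sleep) until the next call is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


throttle = Throttle(max_per_second=5)  # the external-request limit from the table above
for i in range(3):
    throttle.wait()
    print("request", i)  # a real client would issue its query here
```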

Google App Engine Limits

Google App Engine applications are also subject to additional Google App Engine quotas and limits. Requests from Google App Engine applications to Google Cloud SQL are subject to the following time limits:

  • All database requests must finish within the HTTP request timer, around 60 seconds.
  • Offline requests like cron tasks have a time limit of 10 minutes.
  • Backend requests to Google Cloud SQL have a time limit of 10 minutes.

App Engine-specific quotas and access limits are discussed on the Google App Engine Quotas page.

Should I use Google Cloud SQL with my non-High Replication App Engine application?
We recommend that you use Google Cloud SQL with High Replication App Engine applications. While you can use Google Cloud SQL with applications that do not use high replication, doing so might impact performance.
Source-
https://code.google.com/apis/sql/faq.html#supportmysqlfeatures

Windows Azure and Amazon Free offer

For high-performance computing folks: try out Azure for free-

http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=MS-AZR-0001P#compute

Windows Azure Platform
Introductory Special

This promotional offer enables you to try a limited amount of the Windows Azure platform at no charge. The subscription includes a base level of monthly compute hours, storage, data transfers, a SQL Azure database, Access Control transactions and Service Bus connections at no charge. Please note that any usage over this introductory base level will be charged at standard rates.

Included each month at no charge:

  • Windows Azure
    • 25 hours of a small compute instance
    • 500 MB of storage
    • 10,000 storage transactions
  • SQL Azure
    • 1GB Web Edition database (available for first 3 months only)
  • Windows Azure platform AppFabric
    • 100,000 Access Control transactions
    • 2 Service Bus connections
  • Data Transfers (per region)
    • 500 MB in
    • 500 MB out

Any monthly usage in excess of the above amounts will be charged at the standard rates. This introductory special will end on March 31, 2011 and all usage will then be charged at the standard rates.

Standard Rates:

Windows Azure

  • Compute*
    • Extra small instance**: $0.05 per hour
    • Small instance (default): $0.12 per hour
    • Medium instance: $0.24 per hour
    • Large instance: $0.48 per hour
    • Extra large instance: $0.96 per hour
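
To see how quickly the 25 free hours run out, here is a small Python sketch of the overage arithmetic for a small compute instance. It uses only the $0.12 hourly rate and the 25-hour allowance quoted above, and it assumes usage is counted in whole hours. The function name is hypothetical.

```python
SMALL_RATE_CENTS = 12   # $0.12 per hour for a small instance, from the standard rates above
FREE_SMALL_HOURS = 25   # small-instance hours included each month in the introductory special


def monthly_compute_cost(hours_used, rate_cents=SMALL_RATE_CENTS, free_hours=FREE_SMALL_HOURS):
    """Return the compute charge in USD after subtracting the free hours."""
    billable_hours = max(0, hours_used - free_hours)
    # Work in cents to avoid floating-point drift, then convert to dollars.
    return billable_hours * rate_cents / 100


print(monthly_compute_cost(20))       # stays within the free allowance
print(monthly_compute_cost(24 * 31))  # one small instance left running for a 31-day month
```

In other words, the offer is enough for a short trial, but an instance left running all month is billed for almost all of its hours.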

 

http://aws.amazon.com/ec2/pricing/

Free Tier*

As part of AWS’s Free Usage Tier, new AWS customers can get started with Amazon EC2 for free. Upon sign-up, new AWS customers receive the following EC2 services each month for one year:

  • 750 hours of EC2 running Linux/Unix Micro instance usage
  • 750 hours of Elastic Load Balancing plus 15 GB data processing
  • 10 GB of Amazon Elastic Block Storage (EBS) plus 1 million IOs, 1 GB snapshot storage, 10,000 snapshot Get Requests and 1,000 snapshot Put Requests
  • 15 GB of bandwidth in and 15 GB of bandwidth out aggregated across all AWS services

 

Paid Instances-

 

Standard On-Demand Instances      Linux/UNIX Usage    Windows Usage
Small (Default)                   $0.085 per hour     $0.12 per hour
Large                             $0.34 per hour      $0.48 per hour
Extra Large                       $0.68 per hour      $0.96 per hour
Micro On-Demand Instances
Micro                             $0.02 per hour      $0.03 per hour
High-Memory On-Demand Instances
Extra Large                       $0.50 per hour      $0.62 per hour
Double Extra Large                $1.00 per hour      $1.24 per hour
Quadruple Extra Large             $2.00 per hour      $2.48 per hour
High-CPU On-Demand Instances
Medium                            $0.17 per hour      $0.29 per hour
Extra Large                       $0.68 per hour      $1.16 per hour
Cluster Compute Instances
Quadruple Extra Large             $1.60 per hour      N/A*
Cluster GPU Instances
Quadruple Extra Large             $2.10 per hour      N/A*
* Windows is not currently available for Cluster Compute or Cluster GPU Instances.

 

NOTE- Amazon Instance definitions differ slightly from Azure definitions

http://aws.amazon.com/ec2/instance-types/

Available Instance Types

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
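
One way to compare these instance types is price per EC2 Compute Unit. The Python sketch below combines the Linux/UNIX on-demand prices with the ECU figures listed above. The Micro instance is left out because its ECU figure is a burst maximum rather than a steady rate. This is only back-of-the-envelope arithmetic: it ignores memory, storage and I/O, and the variable names are my own.

```python
# (Linux/UNIX on-demand price in USD per hour, EC2 Compute Units), from the tables above.
INSTANCES = {
    "m1.small": (0.085, 1),
    "m1.large": (0.34, 4),
    "m1.xlarge": (0.68, 8),
    "m2.xlarge": (0.50, 6.5),
    "m2.2xlarge": (1.00, 13),
    "m2.4xlarge": (2.00, 26),
    "c1.medium": (0.17, 5),
    "c1.xlarge": (0.68, 20),
    "cc1.4xlarge": (1.60, 33.5),
    "cg1.4xlarge": (2.10, 33.5),
}

# Price of one EC2 Compute Unit for one hour, per instance type.
price_per_ecu = {name: price / ecu for name, (price, ecu) in INSTANCES.items()}

# Print the instance types from cheapest to most expensive per ECU-hour.
for name, value in sorted(price_per_ecu.items(), key=lambda item: item[1]):
    print(f"{name:12s} ${value:.3f} per ECU-hour")
```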

versus-

Windows Azure compute instances come in five unique sizes to enable complex applications and workloads.

Compute Instance Size   CPU           Memory    Instance Storage   I/O Performance
Extra Small             1 GHz         768 MB    20 GB*             Low
Small                   1.6 GHz       1.75 GB   225 GB             Moderate
Medium                  2 x 1.6 GHz   3.5 GB    490 GB             High
Large                   4 x 1.6 GHz   7 GB      1,000 GB           High
Extra large             8 x 1.6 GHz   14 GB     2,040 GB           High

*There is a limitation on the Virtual Hard Drive (VHD) size if you are deploying a Virtual Machine role on an extra small instance. The VHD can only be up to 15 GB.

 

 

Sector/Sphere – Faster than Hadoop/MapReduce at TeraSort

Here is a preview of Sector and Sphere, a relatively young software stack that is claimed to beat Hadoop/MapReduce on the TeraSort benchmark, among others.

http://sector.sourceforge.net/tech.html

System Overview

The Sector/Sphere stack consists of the Sector distributed file system and the Sphere parallel data processing framework. The objective is to support highly effective and efficient large data storage and processing over commodity computer clusters.

Sector/Sphere Architecture

Sector consists of 4 parts, as shown in the above diagram. The security server maintains the system security configurations such as user accounts, data IO permissions, and IP access control lists. The master servers maintain file system metadata, schedule jobs, and respond to users’ requests. Sector supports multiple active masters that can join and leave at run time, and they all actively respond to users’ requests. The slave nodes are racks of computers that store and process data. The slave nodes can be located within a single data center or across multiple data centers with high-speed network connections. Finally, the client includes tools and programming APIs to access and process Sector data.

Sphere: Parallel Data Processing Framework

Sphere allows developers to write parallel data processing applications with a very simple set of APIs. It applies user-defined functions (UDFs) on all input data segments in parallel. In a Sphere application, both inputs and outputs are Sector files. Multiple Sphere processing steps can be combined to support more complicated applications, with inputs/outputs exchanged/shared via the Sector file system.

Data segments are processed at their storage locations whenever possible (data locality). Failed data segments may be restarted on other nodes to achieve fault tolerance.

The Sphere framework can be compared to MapReduce as they both enforce data locality and provide simplified programming interfaces. In fact, Sphere can simulate any MapReduce operation, but Sphere is more efficient and flexible. Sphere can provide better data locality for applications that process files or multiple files as minimum input units and for applications that involve iterative/combinative processing, which requires coordination of multiple UDFs to obtain the final result.

A Sphere application includes two parts: the client program that organizes inputs (including certain parameters), outputs, and UDFs; and the UDFs that process data segments. Data segmentation, load balancing, and fault tolerance are transparent to developers.
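
The idea of applying a UDF to every data segment in parallel can be sketched in a few lines of Python. To be clear, this is not the Sphere API: it is a single-machine analogy using a thread pool, with a made-up word-count UDF and in-memory strings standing in for Sector file segments. Data locality, load balancing and fault tolerance, which Sphere handles for you, are not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count_udf(segment):
    """Hypothetical UDF: count the words in one data segment."""
    return len(segment.split())


def process_segments(segments, udf, max_workers=4):
    """Apply udf to every segment in parallel and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(udf, segments))


# The "client program" part: organise the inputs, pick the UDF, collect the outputs.
segments = ["alpha beta", "gamma", "delta epsilon zeta"]
print(process_segments(segments, word_count_udf))  # [2, 1, 3]
```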

Space: Column-based Distributed Data Table

Space stores data tables in Sector and uses Sphere for parallel query processing. Space is similar to BigTable. A table is stored by columns and is segmented onto multiple slave nodes. Tables are independent, and no relationships between tables are supported. A reduced set of SQL operations is supported, including but not limited to table creation and modification, key-value update and lookup, and select operations based on UDF.

Supported by the Sector data placement mechanism and the Sphere parallel processing framework, Space can support efficient key-value lookup and certain SQL queries on very large data tables.

Space is currently still in development.

And just when you thought Hadoop was the only way to be on the cloud.

http://sector.sourceforge.net/benchmark.html

The Terasort Benchmark

The table below lists the performance (total processing time in seconds) of the Terasort benchmark of both Sphere and Hadoop. (Terasort benchmark: suppose there are N nodes in the system, the benchmark generates a 10GB file on each node and sorts the total N*10GB data. Data generation time is excluded.) Note that it is normal to see a longer processing time for more nodes because the total amount of data also increases proportionally.

The performance values listed on this page were achieved using the Open Cloud Testbed. Currently the testbed consists of 4 racks. Each rack has 32 nodes, including 1 NFS server, 1 head node, and 30 compute/slave nodes. The head node is a Dell 1950, dual dual-core Xeon 3.0GHz, 16GB RAM. The compute nodes are Dell 1435s, single dual-core AMD Opteron 2.0GHz, 4GB RAM, and 1TB single disk. The 4 racks are located in JHU (Baltimore), StarLight (Chicago), UIC (Chicago), and Calit2 (San Diego). The inter-rack bandwidth is 10GE, supported by CiscoWave deployed over National Lambda Rail.

Location                          Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                               1265     2889                  2252
UIC + StarLight                   1361     2896                  2617
UIC + StarLight + Calit2          1430     4341                  3069
UIC + StarLight + Calit2 + JHU    1526     6675                  3702

The benchmark uses the testfs/testdc examples of Sphere and randomwriter/sort examples of Hadoop. Hadoop parameters were tuned to reach good results.

Updated on Sep. 22, 2009: We have benchmarked the most recent versions of Sector/Sphere (1.24a) and Hadoop (0.20.1) on a new set of servers. Each server node costs $2,200 and consists of a single Intel Xeon E5410 2.4GHz CPU, 16GB RAM, 4*1TB RAID0 disk, and 1Gb/s NIC. The 120 nodes are hosted on 4 racks within the same data center and the inter-rack bandwidth is 20Gb/s.

The table below lists the performance of sorting 1TB data using Sector/Sphere version 1.24a and Hadoop 0.20.1. Related Hadoop parameters have been tuned for better performance (e.g., big block size), while Sector/Sphere does not require tuning. In addition, to achieve the highest performance, replication is disabled in both systems (note that replication does not affect the performance of Sphere but will significantly decrease the performance of Hadoop).

Number of Racks   Sphere    Hadoop
1                 28m 25s   85m 49s
2                 15m 20s   37m 0s
3                 10m 19s   25m 14s
4                 7m 56s    17m 45s
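
To put the 1TB sort timings side by side, the short Python sketch below converts the durations in the table above to seconds and works out how many times longer Hadoop took than Sphere at each rack count. The numbers are copied from the table; the helper and variable names are my own.

```python
import re


def to_seconds(duration):
    """Convert a string such as '28m 25s' into a number of seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m (\d+)s", duration).groups()
    return int(minutes) * 60 + int(seconds)


# Number of racks -> (Sphere time, Hadoop time), copied from the table above.
RESULTS = {
    1: ("28m 25s", "85m 49s"),
    2: ("15m 20s", "37m 0s"),
    3: ("10m 19s", "25m 14s"),
    4: ("7m 56s", "17m 45s"),
}

# Ratio of Hadoop's sort time to Sphere's for each rack count.
slowdown = {
    racks: to_seconds(hadoop) / to_seconds(sphere)
    for racks, (sphere, hadoop) in RESULTS.items()
}

for racks, ratio in slowdown.items():
    print(f"{racks} rack(s): Hadoop took {ratio:.2f}x as long as Sphere")
```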