Windows Azure vs Amazon EC2 (and Google Storage)

Here is a comparison of Windows Azure compute instances and Amazon EC2 compute instances.

Compute Instance Sizes:

Developers can choose the size of the VMs that run their application based on the application's resource requirements. Windows Azure compute instances come in four sizes to enable complex applications and workloads.

Compute Instance Size    CPU            Memory     Instance Storage    I/O Performance
Small                    1.6 GHz        1.75 GB    225 GB              Moderate
Medium                   2 x 1.6 GHz    3.5 GB     490 GB              High
Large                    4 x 1.6 GHz    7 GB       1,000 GB            High
Extra Large              8 x 1.6 GHz    14 GB      2,040 GB            High

Standard Rates:

Windows Azure

  • Compute
    • Small instance (default): $0.12 per hour
    • Medium instance: $0.24 per hour
    • Large instance: $0.48 per hour
    • Extra large instance: $0.96 per hour
  • Storage
    • $0.15 per GB stored per month
    • $0.01 per 10,000 storage transactions
  • Content Delivery Network (CDN)
    • $0.15 per GB for data transfers from European and North American locations*
    • $0.20 per GB for data transfers from other locations*
    • $0.01 per 10,000 transactions*

Source –

http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=MS-AZR-0001P

and

http://www.microsoft.com/windowsazure/windowsazure/

Amazon EC2 has more options, though:

http://aws.amazon.com/ec2/pricing/

Instance Type                      Linux/UNIX Usage    Windows Usage

Standard On-Demand Instances
Small (Default)                    $0.085 per hour     $0.12 per hour
Large                              $0.34 per hour      $0.48 per hour
Extra Large                        $0.68 per hour      $0.96 per hour

Micro On-Demand Instances
Micro                              $0.02 per hour      $0.03 per hour

High-Memory On-Demand Instances
Extra Large                        $0.50 per hour      $0.62 per hour
Double Extra Large                 $1.00 per hour      $1.24 per hour
Quadruple Extra Large              $2.00 per hour      $2.48 per hour

High-CPU On-Demand Instances
Medium                             $0.17 per hour      $0.29 per hour
Extra Large                        $0.68 per hour      $1.16 per hour

Cluster Compute Instances
Quadruple Extra Large              $1.60 per hour      N/A*

* Windows is not currently available for Cluster Compute Instances.
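
To put the headline rates on a common footing, here is a quick sketch converting the hourly list prices above into rough monthly figures (assuming a 730-hour month; the rates are hard-coded from the tables above and will drift as the vendors reprice):

```python
# Rough monthly cost from the hourly list prices quoted above.
# 730 hours is approximately one month of continuous operation.
HOURS_PER_MONTH = 730

rates = {
    "Azure Small":            0.12,
    "Azure Extra Large":      0.96,
    "EC2 Small (Linux/UNIX)": 0.085,
    "EC2 Small (Windows)":    0.12,
    "EC2 Extra Large (Win)":  0.96,
}

for name, hourly in sorted(rates.items()):
    print(f"{name:<24} ${hourly * HOURS_PER_MONTH:8.2f}/month")
```

At list price the Windows rates line up almost exactly; the gap is on the Linux side, where EC2's $0.085 Small undercuts Azure's $0.12.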

http://aws.amazon.com/ec2/instance-types/

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage (150 GB plus 10 GB root partition)
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage (4×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower-throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
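
For completeness, a minimal sketch of launching one of these instance types programmatically, using the boto library; the region is just an example, the AMI ID is a placeholder, and AWS credentials are assumed to be configured in the environment:

```python
# Minimal sketch: start an m1.small instance through the EC2 API
# using boto. The AMI ID is a placeholder; substitute a real image.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")  # credentials from env/config
reservation = conn.run_instances(
    "ami-12345678",            # hypothetical AMI ID
    instance_type="m1.small",  # API name from the list above
)
print(reservation.instances[0].id)
```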

SQL Azure (http://www.microsoft.com/en-us/sqlazure/default.aspx) also offers SQL databases as a service, with a free trial offer.

If you are heavily into .NET/SQL or otherwise dependent on Microsoft, Azure is a nice alternative to EC2: http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=COMPARE_PUBLIC

Update: I just got approved for Google Storage, so I am adding their information, though the service is still in preview (and it is free right now) 🙂

https://code.google.com/apis/storage/docs/overview.html

Functionality

Google Storage for Developers offers a rich set of features and capabilities:

Basic Operations

  • Store and access data from anywhere on the Internet.
  • Range-gets for large objects (see the sketch after this list).
  • Manage metadata.
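
As an illustration of the range-get feature, a hedged sketch that fetches only the first kilobyte of a large object over HTTP; the bucket and object names are hypothetical, and the object is assumed to be publicly readable (an authenticated request would add an Authorization header):

```python
# Range-get sketch: retrieve only the first 1 KB of a large object.
# Bucket and object names are hypothetical; auth is omitted.
import requests

url = "https://commondatastorage.googleapis.com/example-bucket/big-file.bin"
resp = requests.get(url, headers={"Range": "bytes=0-1023"})
print(resp.status_code)   # 206 Partial Content on success
print(len(resp.content))  # 1024 bytes
```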

Security and Sharing

  • User authentication using secret keys or Google account.
  • Authenticated downloads from a web browser for Google account holders.
  • Secure access using SSL.
  • Easy, powerful sharing and collaboration via ACLs for individuals and groups.

Performance and scalability

  • Up to 100 gigabytes per object and 1,000 buckets per account during the preview.
  • Strong data consistency—read-after-write consistency for all upload and delete operations.
  • Namespace for your domain—only you can create bucket URIs containing your domain name.
  • Data replicated in multiple data centers across the U.S. and within the same data center.

Tools

  • Web-based storage manager.
  • GSUtil, an open source command line tool.
  • Compatible with many existing cloud storage tools and libraries.

Read the Getting Started Guide to learn more about the service.

Note: Google Storage for Developers does not support Google Apps accounts that use your company domain name at this time.


Pricing

Google Storage for Developers pricing is based on usage.

  • Storage—$0.17/gigabyte/month
  • Network
    • Upload data to Google
      • $0.10/gigabyte
    • Download data from Google
      • $0.15/gigabyte for Americas and EMEA
      • $0.30/gigabyte for Asia-Pacific
  • Requests
    • PUT, POST, LIST—$0.01 per 1,000 requests
    • GET, HEAD—$0.01 per 10,000 requests
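
To see how those meters add up, a back-of-the-envelope sketch for one hypothetical month of usage (all workload figures below are made up):

```python
# Hypothetical month: 50 GB stored, 20 GB uploaded, 100 GB downloaded
# to the Americas, 200,000 GET requests. Rates from the list above.
storage  = 50 * 0.17                  # $/GB/month stored
upload   = 20 * 0.10                  # $/GB uploaded
download = 100 * 0.15                 # $/GB downloaded (Americas/EMEA)
gets     = (200_000 / 10_000) * 0.01  # GET/HEAD per 10,000 requests

print(f"Total: ${storage + upload + download + gets:.2f}/month")  # $25.70
```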

SAS / Blades / Servers / GPU Benchmarks

I just checked out the cool new series of GPU servers from NVIDIA.

Though SAS Institute's Jim Goodnight thinks HP blade servers are the cool thing, the GPU takes high-performance computing hardware to another level. It would be interesting to see GPU-based cloud computers as well, say for SAS OnDemand (free for academics and students), which has drawn some complaints of being slow.

See this for SAS and Blade Servers-

http://www.sas.com/success/ncsu_analytics.html

To give users hands-on experience, the program is underpinned by a virtual computing lab (VCL), a remote access service that allows users to reserve a computer configured with a desired set of applications and operating system and then access that computer over the Internet. The lab is powered by an IBM BladeCenter infrastructure, which includes more than 500 blade servers, distributed between two locations. The assignment of the blade servers can be changed to meet shifts in the balance of demand among the various groups of users. Laura Ladrie, MSA Classroom Coordinator and Technical Support Specialist, says, “The virtual computing lab chose IBM hardware because of its quality, reliability and performance. IBM hardware is also energy efficient and lends itself well to high performance/low overhead computing.”

That's interesting, since IBM now both competes with SAS Institute (as the owner of SPSS) and cooperates with it.

And

http://www.theaustralian.com.au/australian-it/the-world-according-to-jim-goodnight-blade-switch-slashes-job-times/story-e6frgakx-1225888236107

You’re effectively turbo-charging through deployment of many processors within the blade servers?

Yes. We’ve got machines with 192 blades on them. One of them has 202 or 203 blades. We’re using Hewlett-Packard blades with 12 CPU cores on each, so it’s a total of 2,300 CPU cores doing the computation.

Our idea was to give every one of those cores a little piece of work to do, and we came up with a solution. It involved a very small change to the algorithm we were using, and it’s just incredible how fast we can do things now.

I don’t think of it as a grid, I think of it as essentially one computer. Most people will take a blade and make a grid out of it, where everything’s a separate computer running separate jobs.

We just look at it as one big machine that has memory and processors all over the place, so it’s a totally different concept.

GPU servers can be faster than CPU servers, though, Professor G.

Source-

http://www.nvidia.com/object/preconfigured_clusters.html

TESLA GPU COMPUTING SOLUTIONS FOR DATA CENTERS
Supercharge your cluster with the Tesla family of GPU computing solutions. Deploy 1U systems from NVIDIA or hybrid CPU-GPU servers from OEMs that integrate NVIDIA® Tesla™ GPU computing processors.

When compared to the latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance at 1/20th the power consumption and 1/10th the cost. Each Tesla GPU features hundreds of parallel CUDA cores and is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture, with a rich set of developer tools (compilers, profilers, debuggers) for popular programming languages and APIs like C, C++, and Fortran, and driver APIs like OpenCL and DirectCompute.

NVIDIA’s partners provide turnkey easy-to-deploy Preconfigured Tesla GPU clusters that are customizable to your needs. For 3D cloud computing applications, our partners offer the Tesla RS clusters that are optimized for running RealityServer with iray.

Available Tesla Products for Data Centers:
– Tesla S2050
– Tesla M2050/M2070
– Tesla S1070
– Tesla M1060

I also liked the hybrid CPU-GPU servers.

And from a paper comparing GPU and CPU using benchmark tests on BLAS on a Debian system, from Dirk Eddelbuettel's excellent blog:

http://dirk.eddelbuettel.com/blog/

Usage of accelerated BLAS libraries seems to be shrouded in some mystery, judging from somewhat regularly recurring requests for help on lists such as r-sig-hpc (gmane version), the R list dedicated to High-Performance Computing. Yet it doesn't have to be; installation can be really simple (on appropriate systems).

Another issue that I felt needed addressing was a comparison between the different alternatives available, quite possibly including GPU computing. So a few weeks ago I sat down and wrote a small package to run, collect, analyse and visualize some benchmarks. That package, called gcbd (more about the name below), is now on CRAN as of this morning. The package both facilitates the data collection for the paper it also contains (in the vignette form common among R packages) and provides code to analyse the data, which is also included as a SQLite database. All this is done in the Debian and Ubuntu context by transparently installing and removing suitable packages providing BLAS implementations, so that we can fully automate data collection over several competing implementations via a single script (which is also included). Contributions of benchmark results are encouraged; that is the idea of the package.
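
For readers outside R, the data-collection idea is easy to sketch: time a BLAS-backed operation and append the result to a SQLite database, much as gcbd does. The schema and matrix sizes below are illustrative only, not gcbd's actual layout:

```python
# Benchmark-collection sketch in the spirit of gcbd: time a matrix
# multiply (executed by the linked BLAS) and log results to SQLite.
import sqlite3
import time
import numpy as np

conn = sqlite3.connect("benchmarks.db")
conn.execute("CREATE TABLE IF NOT EXISTS results (n INTEGER, seconds REAL)")

for n in (500, 1000, 2000):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    t0 = time.time()
    np.dot(a, b)              # dispatched to whatever BLAS numpy links
    conn.execute("INSERT INTO results VALUES (?, ?)", (n, time.time() - t0))

conn.commit()
for row in conn.execute("SELECT n, seconds FROM results"):
    print(row)
```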

And from his paper on the same-

Analysts are often eager to reap the maximum performance from their computing platforms.

A popular suggestion in recent years has been to consider optimised basic linear algebra subprograms (BLAS). Optimised BLAS libraries have been included with some (commercial) analysis platforms for a decade (Moler 2000), and have also been available for (at least some) Linux distributions for an equally long time (Maguire 1999). Setting BLAS up can be daunting: the R language and environment devotes a detailed discussion to the topic in its Installation and Administration manual (R Development Core Team 2010b, appendix A.3.1).

Among the available BLAS implementations, several popular choices have emerged. Atlas (an acronym for Automatically Tuned Linear Algebra System) is popular as it has shown very good performance due to its automated and CPU-specific tuning (Whaley and Dongarra 1999; Whaley and Petitet 2005). It is also licensed in such a way that it permits redistribution, leading to fairly wide availability of Atlas. We deploy Atlas in both a single-threaded and a multi-threaded configuration. Another popular BLAS implementation is Goto BLAS, which is named after its main developer, Kazushige Goto (Goto and Van De Geijn 2008). While 'free to use', its license does not permit redistribution, putting the onus of configuration, compilation and installation on the end-user. Lastly, the Intel Math Kernel Library (MKL), a commercial product, also includes an optimised BLAS library.

A recent addition to the tool chain of high-performance computing are graphical processing units (GPUs). Originally designed for optimised single-precision arithmetic to accelerate computing as performed by graphics cards, these devices are increasingly used in numerical analysis. Earlier criticism of insufficient floating-point precision or severe performance penalties for double-precision calculation are being addressed by the newest models. Dependence on particular vendors remains a concern, with NVidia's CUDA toolkit (NVidia 2010) currently still the preferred development choice, whereas the newer OpenCL standard (Khronos Group 2008) may become a more generic alternative that is independent of hardware vendors. Brodtkorb et al. (2010) provide an excellent recent survey.

But what has been lacking is a comparison of the effective performance of these alternatives. This paper works towards answering this question. By analysing performance across five different BLAS implementations, as well as a GPU-based solution, we are able to provide a reasonably broad comparison.

Performance is measured as an end-user would experience it: we record computing times from launching commands in the interactive R environment (R Development Core Team 2010a) to their completion.

And

Basic Linear Algebra Subprograms (BLAS) provide an Application Programming Interface (API) for linear algebra. For a given task such as, say, a multiplication of two conformant matrices, an interface is described via a function declaration, in this case sgemm for single precision and dgemm for double precision. The actual implementation becomes interchangeable thanks to the API definition and can be supplied by different approaches or algorithms. This is one of the fundamental code design features we are using here to benchmark the difference in performance from different implementations.

A second key aspect is the difference between static and shared linking. In static linking, object code is taken from the underlying library and copied into the resulting executable. This has several key implications. First, the executable becomes larger due to the copy of the binary code. Second, it makes it marginally faster as the library code is present and no additional look-up and subsequent redirection has to be performed. The actual amount of this performance penalty is the subject of near-endless debate. We should also note that this usually amounts to only a small load-time penalty combined with a function pointer redirection; the actual computation effort is unchanged as the actual object code is identical. Third, it makes the program more robust as fewer external dependencies are required. However, this last point also has a downside: no changes in the underlying library will be reflected in the binary unless a new build is executed. Shared library builds, on the other hand, result in smaller binaries that may run marginally slower, but which can make use of different libraries without a rebuild.

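The interchangeability described above is visible even from a scripting environment: scipy exposes the sgemm and dgemm entry points directly, and which implementation actually executes depends entirely on the BLAS the library was linked against. A small sketch:

```python
# sgemm/dgemm are the single/double precision BLAS matrix-multiply
# entry points; scipy exposes them directly. The implementation that
# runs is whichever BLAS scipy was built against.
import numpy as np
from scipy.linalg.blas import dgemm, sgemm

a = np.random.rand(500, 500)
b = np.random.rand(500, 500)

c_double = dgemm(1.0, a, b)                     # double precision
c_single = sgemm(1.0, a.astype(np.float32),
                 b.astype(np.float32))          # single precision
print(c_double.shape, c_single.dtype)
```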

And summing up,

The reference BLAS are found to be dominated in all cases. Single-threaded Atlas BLAS improves on the reference BLAS but loses to multi-threaded BLAS. For multi-threaded BLAS we find the Goto BLAS dominate the Intel MKL, with a single exception of the QR decomposition on the xeon-based system, which may reveal an error. The development version of Atlas, when compiled in multi-threaded mode, is competitive with both Goto BLAS and the MKL. GPU computing is found to be compelling only for very large matrix sizes. Our benchmarking framework in the gcbd package can be employed by others through the R packaging system, which could lead to a wider set of benchmark results. These results could be helpful for next-generation systems which may need to make heuristic choices about when to compute on the CPU and when to compute on the GPU.

Source – Dirk Eddelbuettel's paper and blog: http://dirk.eddelbuettel.com/papers/gcbd.pdf
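
The "compelling only for very large matrix sizes" conclusion suggests a simple way to probe the crossover on your own hardware: an n-by-n matrix multiply costs about 2n^3 floating-point operations, so timings convert directly into effective GFLOPS, which you can compare against a GPU's sustained throughput at the same sizes. A sketch:

```python
# Effective dgemm throughput as matrix size grows: a multiply costs
# ~2*n^3 flops, so GFLOPS = 2*n^3 / seconds / 1e9. A GPU only wins
# where its sustained rate at that size beats the CPU figure.
import time
import numpy as np

for n in (256, 1024, 4096):
    a = np.random.rand(n, n)
    t0 = time.time()
    np.dot(a, a)
    secs = time.time() - t0
    print(f"n={n}: {2 * n**3 / secs / 1e9:.1f} GFLOPS")
```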

Quite appropriately, hardware solutions need to be, or at least ought to be, a part of Revolution Analytics' thinking as well. SPSS does not have any choice anymore, though 😉

It would be interesting to see how the new SAS cloud computing / server farm / time-sharing facility benchmarks CPU versus GPU for SAS analytics performance. If this is being done already, it would be nice to see a SUGI paper on it at http://sascommunity.org.

Multi-threading needs to be handled automatically by statistical software to optimize current local computing (including for the new R).
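
Until software does this automatically, thread counts for multi-threaded BLAS builds are usually steered by environment variables, which must be set before the library loads; a hedged sketch (OpenMP-built BLAS honour OMP_NUM_THREADS, while MKL uses MKL_NUM_THREADS):

```python
# Re-launch a small timing script under different OMP_NUM_THREADS
# values, since the thread count must be fixed before BLAS loads.
import os
import subprocess
import sys

SCRIPT = ("import time, numpy as np; a = np.random.rand(2000, 2000); "
          "t = time.time(); np.dot(a, a); print(time.time() - t)")

for threads in ("1", "2", "4"):
    env = dict(os.environ, OMP_NUM_THREADS=threads)
    out = subprocess.run([sys.executable, "-c", SCRIPT],
                         env=env, capture_output=True, text=True)
    print(f"{threads} thread(s): {out.stdout.strip()}s")
```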

Acceptable benchmarks for testing hardware as well as software need to be reinforced and published across vendors, academics, and companies.

What do you think?


Cutting Office Overheads to Almost Zero

You need to work from home.

Here is what you need-

1) Bandwidth – broadband.

Bandwidth fallback, in case the broadband fails; this should ideally be a wireless modem.

The same goes for your PCs: back up data online or on a separate PC.

Have one UPS handy.

2) Communication – use Skype or GTalk.

3) Email addresses – if you do not have your own web server, Gmail offers the most flexibility because you can use Outlook with it, and GTalk comes bundled with it.

4) Remote access – use built-in Windows features like:

a) Remote Desktop Connection (Start > Programs > Accessories > Communications)

b) NetMeeting (Start > Run, type "conf" and press Enter)

5) Software costs – use remote applications to connect to PCs pre-installed with the required software, so that all software in your organization is utilized 24x7,

or use open source and freeware, like:

  • OpenOffice
  • Google Apps
  • AVG Anti-Virus
  • ZoneAlarm firewall
  • Download Accelerator
  • Ubuntu Linux
  • Mozilla Firefox

Try and test this on 10% of your staff for three months and note the difference in costs; then scale up or down depending on experience.
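
As a back-of-the-envelope way of "noting the difference in costs", a sketch with entirely hypothetical figures; substitute your own:

```python
# Hypothetical pilot arithmetic: per-head office overhead versus
# per-head home-office cost for the 10% pilot group. All figures
# are made up for illustration.
staff_total = 100
pilot       = int(staff_total * 0.10)  # 10% of staff
office_cost = 400   # $/person/month: rent, power, facilities...
home_cost   = 60    # $/person/month: broadband, UPS, backup modem

monthly_saving = pilot * (office_cost - home_cost)
print(f"Pilot of {pilot} staff saves ~${monthly_saving}/month, "
      f"${monthly_saving * 3} over the three-month trial")
```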