Windows Azure vs Amazon EC2 (and Google Storage)

Here is a comparison of Windows Azure instances vs Amazon compute instances:

Compute Instance Sizes:

Developers have the ability to choose the size of VMs to run their application based on the application's resource requirements. Windows Azure compute instances come in four unique sizes to enable complex applications and workloads.

Compute Instance Size | CPU         | Memory  | Instance Storage | I/O Performance
Small                 | 1.6 GHz     | 1.75 GB | 225 GB           | Moderate
Medium                | 2 x 1.6 GHz | 3.5 GB  | 490 GB           | High
Large                 | 4 x 1.6 GHz | 7 GB    | 1,000 GB         | High
Extra large           | 8 x 1.6 GHz | 14 GB   | 2,040 GB         | High
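Choosing a size "based on the application's resource requirements" amounts to picking the smallest row of the table that covers every requirement. A minimal sketch, assuming requirements are expressed as cores, GB of memory and GB of instance storage; the function name `smallest_fit` is hypothetical, and the figures are copied from the table above.

```python
from typing import NamedTuple, Optional


class AzureSize(NamedTuple):
    name: str
    cores: int          # number of 1.6 GHz cores
    memory_gb: float
    storage_gb: int


# Figures copied from the Windows Azure table above, ordered smallest to largest.
AZURE_SIZES = [
    AzureSize("Small", 1, 1.75, 225),
    AzureSize("Medium", 2, 3.5, 490),
    AzureSize("Large", 4, 7.0, 1000),
    AzureSize("Extra large", 8, 14.0, 2040),
]


def smallest_fit(cores: int, memory_gb: float, storage_gb: int) -> Optional[AzureSize]:
    """Return the first (smallest) size meeting all three requirements, or None."""
    for size in AZURE_SIZES:
        if (size.cores >= cores and size.memory_gb >= memory_gb
                and size.storage_gb >= storage_gb):
            return size
    return None  # nothing in the table is big enough


print(smallest_fit(cores=2, memory_gb=5, storage_gb=300).name)  # Large
```

Memory is the binding constraint in this example: Medium has the 2 cores but only 3.5 GB, so the choice jumps to Large.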

Standard Rates:

Windows Azure

  • Compute
    • Small instance (default): $0.12 per hour
    • Medium instance: $0.24 per hour
    • Large instance: $0.48 per hour
    • Extra large instance: $0.96 per hour
  • Storage
    • $0.15 per GB stored per month
    • $0.01 per 10,000 storage transactions
  • Content Delivery Network (CDN)
    • $0.15 per GB for data transfers from European and North American locations*
    • $0.20 per GB for data transfers from other locations*
    • $0.01 per 10,000 transactions*

Source –

http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=MS-AZR-0001P

and

http://www.microsoft.com/windowsazure/windowsazure/
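To see how the Azure rates above combine into a monthly bill, here is a rough estimator covering compute and storage only (CDN is ignored). It assumes a 720-hour month and, as a labeled assumption, that storage transactions are billed per started block of 10,000; `azure_monthly_cost` is a hypothetical helper, not a Microsoft API.

```python
from decimal import Decimal

# Standard Windows Azure rates quoted above (USD); they will have changed since.
COMPUTE_PER_HOUR = {
    "small": Decimal("0.12"),
    "medium": Decimal("0.24"),
    "large": Decimal("0.48"),
    "extra large": Decimal("0.96"),
}
STORAGE_PER_GB_MONTH = Decimal("0.15")
PER_10K_TRANSACTIONS = Decimal("0.01")


def azure_monthly_cost(size: str, instances: int, hours: int,
                       stored_gb: int, transactions: int) -> Decimal:
    """Compute plus storage for one month; CDN and bandwidth are not included."""
    compute = COMPUTE_PER_HOUR[size] * instances * hours
    storage = STORAGE_PER_GB_MONTH * stored_gb
    # Assumption: transactions are charged per started block of 10,000.
    blocks = -(-transactions // 10_000)  # ceiling division
    return compute + storage + PER_10K_TRANSACTIONS * blocks


# Two small instances for a 720-hour month, 50 GB stored, 1 million transactions.
print(azure_monthly_cost("small", 2, 720, 50, 1_000_000))  # 181.30
```

`Decimal` is used instead of `float` so that cent amounts add up exactly.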

Amazon EC2 has more options, though:

http://aws.amazon.com/ec2/pricing/

Standard On-Demand Instances    | Linux/UNIX Usage | Windows Usage
Small (Default)                 | $0.085 per hour  | $0.12 per hour
Large                           | $0.34 per hour   | $0.48 per hour
Extra Large                     | $0.68 per hour   | $0.96 per hour

Micro On-Demand Instances       | Linux/UNIX Usage | Windows Usage
Micro                           | $0.02 per hour   | $0.03 per hour

High-Memory On-Demand Instances | Linux/UNIX Usage | Windows Usage
Extra Large                     | $0.50 per hour   | $0.62 per hour
Double Extra Large              | $1.00 per hour   | $1.24 per hour
Quadruple Extra Large           | $2.00 per hour   | $2.48 per hour

High-CPU On-Demand Instances    | Linux/UNIX Usage | Windows Usage
Medium                          | $0.17 per hour   | $0.29 per hour
Extra Large                     | $0.68 per hour   | $1.16 per hour

Cluster Compute Instances       | Linux/UNIX Usage | Windows Usage
Quadruple Extra Large           | $1.60 per hour   | N/A*

* Windows is not currently available for Cluster Compute Instances.
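The two price columns above make the Windows licensing premium easy to quantify. This sketch keys the EC2 rates by the API names given in the instance-type list on this page and assumes a 720-hour month; `windows_premium` is a hypothetical helper.

```python
from decimal import Decimal

# On-demand hourly rates (USD) from the EC2 table above: (Linux/UNIX, Windows).
EC2_RATES = {
    "m1.small": (Decimal("0.085"), Decimal("0.12")),
    "m1.large": (Decimal("0.34"), Decimal("0.48")),
    "m1.xlarge": (Decimal("0.68"), Decimal("0.96")),
    "t1.micro": (Decimal("0.02"), Decimal("0.03")),
    "m2.xlarge": (Decimal("0.50"), Decimal("0.62")),
    "m2.2xlarge": (Decimal("1.00"), Decimal("1.24")),
    "m2.4xlarge": (Decimal("2.00"), Decimal("2.48")),
    "c1.medium": (Decimal("0.17"), Decimal("0.29")),
    "c1.xlarge": (Decimal("0.68"), Decimal("1.16")),
}
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours, always on


def windows_premium(api_name: str) -> tuple[Decimal, Decimal]:
    """Return (extra USD per month, extra percent) for Windows over Linux/UNIX."""
    linux, windows = EC2_RATES[api_name]
    extra_month = (windows - linux) * HOURS_PER_MONTH
    extra_pct = (windows - linux) / linux * 100
    return extra_month, extra_pct


for name in EC2_RATES:
    month, pct = windows_premium(name)
    print(f"{name:11} +${month:.2f}/month (+{pct:.0f}%)")
```

For example, a Windows m1.small costs about $25.20 more per month than the Linux one, and its $0.12 hourly rate is the same as Azure's small instance.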

http://aws.amazon.com/ec2/instance-types/

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage (150 GB plus 10 GB root partition)
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage (2×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage (4×420 GB plus 10 GB root partition)
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
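Combining the memory and EC2 Compute Unit figures above with the Linux on-demand prices gives a rough "cost per unit of resource" for each instance type. This is a sketch under simple assumptions: Linux pricing only, and t1.micro left out because its compute units are burst-only.

```python
# (memory GB, EC2 Compute Units, Linux on-demand USD per hour), copied from above.
# t1.micro is omitted because its "up to 2" compute units are only for bursts.
EC2_SPECS = {
    "m1.small": (1.7, 1.0, 0.085),
    "m1.large": (7.5, 4.0, 0.34),
    "m1.xlarge": (15.0, 8.0, 0.68),
    "m2.xlarge": (17.1, 6.5, 0.50),
    "m2.2xlarge": (34.2, 13.0, 1.00),
    "m2.4xlarge": (68.4, 26.0, 2.00),
    "c1.medium": (1.7, 5.0, 0.17),
    "c1.xlarge": (7.0, 20.0, 0.68),
    "cc1.4xlarge": (23.0, 33.5, 1.60),
}


def unit_costs(specs):
    """Return {api_name: (USD per GB-hour of memory, USD per ECU-hour)}."""
    return {name: (price / mem, price / ecu)
            for name, (mem, ecu, price) in specs.items()}


costs = unit_costs(EC2_SPECS)
# Print instance types from cheapest to most expensive per compute unit.
for name, (per_gb, per_ecu) in sorted(costs.items(), key=lambda kv: kv[1][1]):
    print(f"{name:12} ${per_gb:.4f}/GB-hour  ${per_ecu:.4f}/ECU-hour")
```

On these figures the High-CPU instances come out cheapest per compute unit (about $0.034 per ECU-hour) and the High-Memory instances cheapest per GB of RAM (about $0.029 per GB-hour), which matches how Amazon describes the two families.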

Also, SQL Azure (http://www.microsoft.com/en-us/sqlazure/default.aspx) offers SQL databases as a service, with a free trial offer.

If you rely heavily on .NET/SQL or are very dependent on Microsoft, Azure is a good alternative to EC2: http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-US&offer=COMPARE_PUBLIC

Update: I just got approved for Google Storage, so I am adding their information, though the service is still in preview (and it's free right now) 🙂

https://code.google.com/apis/storage/docs/overview.html

Functionality

Google Storage for Developers offers a rich set of features and capabilities:

Basic Operations

  • Store and access data from anywhere on the Internet.
  • Range-gets for large objects.
  • Manage metadata.
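"Range-gets for large objects" refers to the standard HTTP Range header, which lets a client fetch an object in byte chunks (useful with objects of up to 100 GB). A minimal sketch that only builds the requests without sending them; the URL is hypothetical and the authentication headers Google Storage requires are omitted.

```python
import urllib.request


def byte_ranges(object_size: int, chunk_size: int):
    """Yield inclusive (start, end) byte offsets covering object_size bytes."""
    for start in range(0, object_size, chunk_size):
        yield start, min(start + chunk_size, object_size) - 1


def ranged_requests(url: str, object_size: int, chunk_size: int):
    """Build (but do not send) one GET request per chunk using the Range header."""
    return [
        urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        for start, end in byte_ranges(object_size, chunk_size)
    ]


# Hypothetical object URL; real requests would also need authentication headers.
reqs = ranged_requests("https://example-bucket.example.com/big-file.bin", 10_000, 4_096)
for r in reqs:
    print(r.get_header("Range"))  # bytes=0-4095, bytes=4096-8191, bytes=8192-9999
```

Each request could then be passed to `urllib.request.urlopen` independently, so a failed chunk can be retried without downloading the whole object again.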

Security and Sharing

  • User authentication using secret keys or Google account.
  • Authenticated downloads from a web browser for Google account holders.
  • Secure access using SSL.
  • Easy, powerful sharing and collaboration via ACLs for individuals and groups.

Performance and scalability

  • Up to 100 gigabytes per object and 1,000 buckets per account during the preview.
  • Strong data consistency—read-after-write consistency for all upload and delete operations.
  • Namespace for your domain—only you can create bucket URIs containing your domain name.
  • Data replicated in multiple data centers across the U.S. and within the same data center.

Tools

  • Web-based storage manager.
  • GSUtil, an open source command line tool.
  • Compatible with many existing cloud storage tools and libraries.

Read the Getting Started Guide to learn more about the service.

Note: Google Storage for Developers does not support Google Apps accounts that use your company domain name at this time.


Pricing

Google Storage for Developers pricing is based on usage.

  • Storage—$0.17/gigabyte/month
  • Network
    • Upload data to Google
      • $0.10/gigabyte
    • Download data from Google
      • $0.15/gigabyte for Americas and EMEA
      • $0.30/gigabyte for Asia-Pacific
  • Requests
    • PUT, POST, LIST—$0.01 per 1,000 requests
    • GET, HEAD—$0.01 per 10,000 requests
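The Google Storage pricing above has five components. This rough estimator adds them up for one month; as a labeled assumption, request charges are pro-rated rather than rounded up to whole blocks, and `google_storage_bill` is a hypothetical helper.

```python
from decimal import Decimal

# Google Storage for Developers preview pricing quoted above (USD).
STORAGE_GB_MONTH = Decimal("0.17")
UPLOAD_GB = Decimal("0.10")
DOWNLOAD_GB = {"americas_emea": Decimal("0.15"), "apac": Decimal("0.30")}
PER_1K_WRITES = Decimal("0.01")   # PUT, POST, LIST
PER_10K_READS = Decimal("0.01")   # GET, HEAD


def google_storage_bill(stored_gb, upload_gb, download_gb_by_region, writes, reads):
    """Estimate one month's bill; request counts are pro-rated (an assumption)."""
    total = STORAGE_GB_MONTH * stored_gb + UPLOAD_GB * upload_gb
    for region, gb in download_gb_by_region.items():
        total += DOWNLOAD_GB[region] * gb
    total += PER_1K_WRITES * Decimal(writes) / 1000
    total += PER_10K_READS * Decimal(reads) / 10_000
    return total


bill = google_storage_bill(
    stored_gb=100, upload_gb=20,
    download_gb_by_region={"americas_emea": 50, "apac": 10},
    writes=50_000, reads=1_000_000,
)
print(bill)
```

This example comes to $31: $17 storage, $2 upload, $10.50 download and $1.50 in requests. Note that storage alone is $0.17 per GB-month here against $0.15 on Azure.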

Towards better analytical software

Here are some thoughts on using existing statistical software for better analytics and/or business intelligence (reporting):

1) User Interface Design Matters- Most stats software takes a legacy approach to user interface design. Graphical user interfaces need to be more business-friendly and user-friendly. For example, you can label a button T Test, or you can label it Compare > Means of Samples (with T Test highlighted). You can label a button Chi Square Test, or you can label it Compare > Counts Data. Also, excessive reliance on drop-down menus ignores the next generation of advances in operating systems, namely touchscreens instead of mouse point-and-click.

Given that the base statistical procedures are the same across software packages, a more thoughtfully designed (or revamped) user interface can give a package an edge over legacy designs.
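The relabeling idea above can be sketched as a menu built around the user's task, with the test name kept as a hint. The function names and menu labels below are hypothetical; the functions compute only the Welch t statistic and the chi-square statistic of independence (p-values would need the t and chi-square distributions, for example from SciPy).

```python
import math
from statistics import mean, variance


def compare_means_of_samples(a, b):
    """Task-oriented name for Welch's two-sample t test; returns the t statistic."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_counts_data(table):
    """Task-oriented name for the chi-square test of independence; returns the statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Menu labels describe what the user wants to do; the test name is the highlight.
MENU = {
    "Compare > Means of Samples (T Test)": compare_means_of_samples,
    "Compare > Counts Data (Chi Square Test)": compare_counts_data,
}

print(MENU["Compare > Counts Data (Chi Square Test)"]([[10, 20], [20, 10]]))
```

The 2x2 table in the usage line gives a chi-square statistic of about 6.67; only the label changes, not the procedure.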

2) Branding of Software Matters- One notable whine against SAS Institute products is the premium price. But that software is actually inexpensive when compared with other reporting software. What separates Cognos from Crystal Reports or SAS BI is often branding (and user interface design). Branding events play a role here, and social media is often the least expensive branding and marketing channel. The same holds for WPS and Revolution Analytics.

3) Alliances matter- The alliances of parent companies are reflected in the sales of bundled software. For a complete solution, you need a database plus reporting plus analytical software. If you do not make all three, you need to partner and cross-sell. Technically, this means that each piece of software (whether database, reporting or analytics) needs to talk to as many other kinds of software and formats as possible. This is why ODBC in R is important, and why alliances matter just as much for small companies like Revolution Analytics, WPS and Netezza as for bigger companies like IBM SPSS, SAS Institute or SAP. Tie-ins with Hadoop (as with R and the Netezza appliance), or between Teradata and SAS, also help create better usage.

4) Cloud Computing Interfaces could be the edge- Maybe cloud computing is all hot air, but prudent business planning demands that any software maker in analytics or business intelligence have an extremely easy-to-load interface, whether a dedicated on-demand website or an Amazon EC2 image. Easier interfaces win, and with the cloud still in its early stages, they can help create an early lead. For R software makers this is critical, since R holds data in memory and therefore handles larger datasets on a PC poorly compared with its counterparts. On the cloud, where you can rent machines with far more memory, that disadvantage vanishes. An easy-to-understand cloud interface framework is here (it's 2 years old but should still be okay): http://knol.google.com/k/data-mining-through-cloud-computing#
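To make the R-on-the-cloud point concrete, here is a back-of-the-envelope sizing sketch. It assumes an all-numeric data frame (R stores numeric values as 8-byte doubles) and a hypothetical rule of thumb of three working copies during modelling; the instance list is a subset of the m1/m2 memory sizes and Linux prices from the EC2 comparison earlier on this page.

```python
# (API name, memory GB, Linux on-demand USD per hour) from the EC2 lists above.
EC2_MEMORY = [
    ("m1.small", 1.7, 0.085),
    ("m1.large", 7.5, 0.34),
    ("m1.xlarge", 15.0, 0.68),
    ("m2.xlarge", 17.1, 0.50),
    ("m2.2xlarge", 34.2, 1.00),
    ("m2.4xlarge", 68.4, 2.00),
]

BYTES_PER_DOUBLE = 8   # R numeric vectors use 8-byte doubles
WORKING_COPIES = 3     # assumption: headroom for copies made while modelling


def r_memory_gb(rows: int, numeric_cols: int) -> float:
    """Rough RAM needed (GB) to work with an all-numeric data frame in R."""
    return rows * numeric_cols * BYTES_PER_DOUBLE * WORKING_COPIES / 1024**3


def cheapest_instance(rows: int, numeric_cols: int):
    """Cheapest listed instance whose memory covers the estimate, or None."""
    need = r_memory_gb(rows, numeric_cols)
    fits = [(price, name) for name, mem, price in EC2_MEMORY if mem >= need]
    return min(fits)[1] if fits else None


# 50 million rows x 20 numeric columns needs roughly 22 GB under these assumptions.
print(cheapest_instance(50_000_000, 20))  # m2.2xlarge
```

At about $1.00 per hour, renting that machine for an afternoon is a small cost next to buying a desktop with that much memory.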

5) Platforms matter- Software makers should either natively embrace all possible platforms or bundle in middleware themselves.

Here is a case study: SAS stopped supporting Apple's OS after Base SAS 7. Today Apple's OS is strong (3.47 million Macs shipped during the most recent quarter), and the only way to use SAS on a Mac is either to follow this:

http://goo.gl/QAs2

or to install Ubuntu on the Mac (https://help.ubuntu.com/community/MacBook) and then follow this:

http://ubuntuforums.org/showthread.php?t=1494027

Why does this matter? SAS is free to academics and students from this year, but the Mac is a preferred computer in academia. WPS, by contrast, runs straight away on the Mac (though curiously they have not been able to provide academic or discounted student copies 😉), as per

http://goo.gl/aVKu

Does this create a platform-based disadvantage? Yes. However, JMP continues to be supported on the Mac. This is also noteworthy given Google's upcoming Chromium OS and the Windows Azure platform for cloud computing.

Twitter Cloud and a note on Cloud Computing

That’s what I use Twitter for. If you have a Twitter account, you can follow me here:

http://twitter.com/decisionstats

A couple of weeks ago I accidentally deleted many followers using a Twitter app called Refollow. I was trying to clean up the list of people I follow and checked the wrong tick box. So if you feel I unfollowed you, please know it was a mistake. Seriously.

On cloud computing and Google: rumours (🙂) are emerging that Google’s push for cloud computing aims to turn desktop computing into IBM-like mainframe computing, except that there are too many players this time. Where are the Department of Justice and antitrust enforcement? Does Amazon currently qualify as too big in cloud computing?

Or the rumours could be spread by competitors such as Microsoft, Apple or Amazon. Geeks are like that sometimes.

Interview KXEN Bruno Delahaye

In my continuing coverage of KXEN, the plucky company that has managed to revolutionize analytics automation and social network analysis, here is an interview with KXEN’s Vice President Bruno Delahaye.


Ajay – What do you like best about KXEN, both as a company and as a product?

Bruno- Well, actually, what I like most about KXEN is the will to make a difference. This is true at different levels, of course: each individual within the company is trying to make things happen. For employees at KXEN this is not just a job: they want to change the game! The product side naturally cascades from this. We are not simply recoding existing algorithms like some of our competitors; instead, we are looking in every domain of predictive and descriptive analytics for places where we can deliver higher value to our customers. When customers, thanks to the automation we provide, come back to us saying that they have managed to increase their modeling productivity by a factor of 10 or even 50 compared with their previous modeling process, we really think that what we provide is changing the game. Also, the fact that we have well over 500 customers globally today proves that our customers recognize this as well!

Ajay: In what areas has KXEN been most suitable? What is your biggest success story so far?

Bruno- KXEN has been very successful with 2 types of customers. First, we have been very successful in companies with mature data mining practices, companies that have realized that they need to move from a fully hand-crafted approach to a more industrialized one in order to meet business requirements. As an example, lots of large companies run tens of marketing campaigns per month and actually use data mining for only 1 or 2 at best… Once organizations have understood the power of data mining, they certainly want to target each campaign. Only KXEN can provide the level of automation required for this. Second, new data mining users (either new companies or new departments in a company) are also very eager to use KXEN. The learning curve with KXEN is so quick that it enables them to use their existing team (the people who are aware of the business issues) and have them running successful churn management programs, or rebuilding their customer segmentation reliably, within a few days.

If you were expecting figures here, some Vodafone entities claim that they reduced churn in some customer segments by more than 10% by implementing KXEN. Unicredit in Austria mentioned that thanks to KXEN they gained an additional 50m per season… As you can guess, the success of our customers always brightens our days.

Ajay: In what areas would you rather not recommend KXEN? What other software would you recommend in those cases?

Bruno- Well, I would recommend using KXEN in every area, of course. Nevertheless, where we have been less successful so far is in companies where the time pressure to deliver analysis is lower. Basically, research departments tend to use software such as SAS EM or SPSS Clementine, which is oriented more toward methods/algorithms than toward results.

Ajay: What is the biggest challenge you have faced while introducing KXEN to a wider audience?

Bruno- The biggest challenge we have is building domain expertise. It is indeed very difficult to build our teams' knowledge simultaneously in Customer Lifecycle Analytics, HRM, SCM… That is why building a relationship of trust with the customer is so important. We have to prove to our prospects very early in the discussions that with KXEN they will take significant steps forward! This is also where our partners are so important to us. KXEN works with international as well as local partners with specific expertise to help our customers make the best possible use of the KXEN data mining software and ensure a high and fast ROI.

Ajay - Do you think the text mining and Data Fusion approaches can work for online web analytics, search engines or ad targeting?

Bruno- The data fusion approach is certainly one that makes sense for online web analytics. Analyzing the sequence of events, rather than just whether an event occurs, is actually a very powerful way to predict customer behavior, or in this case the next click or the next action that is going to be made. I am not claiming here that everything has to be real-time, as this could lead to weak or even unreliable/unstable models. Instead, what we recommend our customers do is separate the learning part, which can be done offline, from the deployment, which needs to be done in real time.
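The distinction between "whether an event occurs" and "the sequence of events" can be shown with two web sessions that contain the same pages in a different order. This sketch encodes order as counts of consecutive page pairs (transitions); that is one simple encoding chosen for illustration, not KXEN's Data Fusion implementation, and the session data is hypothetical.

```python
from collections import Counter


def presence_features(events):
    """Which events occurred, ignoring order (the 'did it happen' view)."""
    return frozenset(events)


def sequence_features(events):
    """Counts of consecutive event pairs (transitions), which keep order information."""
    return Counter(zip(events, events[1:]))


# Two hypothetical sessions with the same pages visited in a different order.
a = ["home", "search", "product", "cart"]
b = ["cart", "product", "search", "home"]

print(presence_features(a) == presence_features(b))  # True: indistinguishable
print(sequence_features(a) == sequence_features(b))  # False: order differs
```

A model trained on presence features cannot tell a shopper heading toward the cart from one leaving it; transition features can, which is the extra predictive signal Bruno describes.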

Ajay- Describe the relationships of KXEN with other members of the business intelligence community in terms of alliances.

Bruno- KXEN is a very good complement to BI vendors. We are actually partnering with several data warehouse vendors. For data warehouses, the equation is quite simple: they allow customers to structure and store the data, but to provide real ROI, solutions need to be plugged in on top of them. Setting up a data warehouse is just another cost if you do not use the stored data. What KXEN does is enable you to take advantage of the data asset to build customer segments that you will use to define your marketing mix, or simply to target your customers for cross-selling, up-selling or retention/loyalty purposes. The same is valid for credit scoring, fraud detection…

Case Study- Assume I have 50,000 leads daily on a car-buying website. How would KXEN help me score these leads (compared with other online scoring solutions)? Is it technically possible for me to install KXEN on Windows or other instances in remote computing, like Amazon EC2, rather than on a server sitting somewhere?

Bruno- The key difference, I believe, is that with KXEN you will indeed be able to do this even if you are not a data mining expert, even if you want to use the results of yesterday’s campaigns to rebuild a model, and even if you can only afford to spend 10 minutes on this task every day. At the end of the day, what we allow our users to do is answer their business questions within the time frame they have, rather than trying to convince them that they do not really need to do so many analyses for their business to run successfully.
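The daily routine discussed above (rebuild a model from yesterday's lead outcomes, then score today's leads) can be sketched as an offline training step plus a cheap online scoring step. This uses plain logistic regression fitted by batch gradient descent as a stand-in; it is not KXEN's method, and the features, outcomes and lead IDs are all hypothetical.

```python
import math


def predict(weights, x):
    """Online scoring step: a dot product plus a sigmoid, cheap enough per lead."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))  # last weight = intercept
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=5000, lr=0.5):
    """Offline step: batch gradient descent on the average logistic loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per lead: [visited_financing_page (0/1), cars_viewed / 10]
yesterday_rows = [[1, 0.8], [1, 0.5], [0, 0.1], [0, 0.2], [1, 0.9], [0, 0.3]]
yesterday_bought = [1, 1, 0, 0, 1, 0]  # hypothetical outcomes

weights = train_logistic(yesterday_rows, yesterday_bought)  # nightly, offline
today = {"lead-1": [1, 0.7], "lead-2": [0, 0.1]}            # hypothetical lead IDs
ranked = sorted(today, key=lambda k: predict(weights, today[k]), reverse=True)
print(ranked)  # ['lead-1', 'lead-2']
```

Because training runs once a night and scoring is only a dot product per lead, 50,000 leads a day is a light load for a single EC2 instance; the design mirrors the offline-learning, real-time-deployment split described earlier in the interview.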

Ajay- And that was Bruno, VP, EMEA, KXEN. His profile can be seen here:

http://www.linkedin.com/in/brunodelahaye

Bruno Delahaye manages KXEN’s operations for Continental Europe, the Middle East, Africa and South America. He is responsible for identifying and managing key partnership opportunities and developing the overall strategy for new partnerships.

For more on KXEN please go to http://www.kxen.com. You may need to register to download their proprietary white papers on Structural Risk Minimization or Text Mining.

Conflict of Interest Disclaimer- I am a social media consultant to KXEN. Chairman Roger Haddad was one of the first chairmen of a major corporation to agree to give an interview to this small blog.