Tag Archives: storage
Latest from the Amazon Cloud-
hi1.4xlarge instances come with eight virtual cores that can deliver 35 EC2 Compute Units (ECUs) of CPU performance, 60.5 GiB of RAM, and 2 TiB of storage capacity across two SSD-based storage volumes. Customers using hi1.4xlarge instances for their applications can expect over 120,000 4 KB random read IOPS, and as many as 85,000 4 KB random write IOPS (depending on active LBA span). These instances are available on a 10 Gbps network, with the ability to launch instances into cluster placement groups for low-latency, full-bisection bandwidth networking.
High I/O instances are currently available in three Availability Zones in the US East (N. Virginia) region and two Availability Zones in the EU West (Ireland) region. Other regions will be supported in the coming months. You can launch hi1.4xlarge instances as On Demand instances starting at $3.10/hour, and purchase them as Reserved Instances.
High I/O Instances
Instances of this family provide very high instance storage I/O performance and are ideally suited for many high performance database workloads. Example applications include NoSQL databases like Cassandra and MongoDB. High I/O instances are backed by Solid State Drives (SSD), and also provide high levels of CPU, memory and network performance.
High I/O Quadruple Extra Large Instance
60.5 GB of memory
35 EC2 Compute Units (8 virtual cores with 4.4 EC2 Compute Units each)
2 SSD-based volumes each with 1024 GB of instance storage
I/O Performance: Very High (10 Gigabit Ethernet)
Storage I/O Performance: Very High*
API name: hi1.4xlarge
*Using Linux paravirtual (PV) AMIs, High I/O Quadruple Extra Large instances can deliver more than 120,000 4 KB random read IOPS and between 10,000 and 85,000 4 KB random write IOPS (depending on active logical block addressing span) to applications. For hardware virtual machines (HVM) and Windows AMIs, performance is approximately 90,000 4 KB random read IOPS and between 9,000 and 75,000 4 KB random write IOPS. The maximum sequential throughput on all AMI types (Linux PV, Linux HVM, and Windows) per second is approximately 2 GB read and 1.1 GB write.
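To put the IOPS figures in perspective, here is a rough Python sketch that converts them into throughput. It assumes "4 KB" means 4,096 bytes, and the function name is my own, not anything from AWS.

```python
# Convert an IOPS figure at a fixed block size into approximate throughput.
# The IOPS values are the ones quoted above for hi1.4xlarge with Linux PV AMIs.

BLOCK_SIZE_BYTES = 4 * 1024  # assumption: "4 KB" is taken as 4096 bytes


def iops_to_mib_per_s(iops, block_size_bytes=BLOCK_SIZE_BYTES):
    """Return throughput in MiB/s for a given IOPS rate and block size."""
    return iops * block_size_bytes / (1024 * 1024)


quoted_iops = {
    "random read (PV)": 120_000,
    "random write, low end (PV)": 10_000,
    "random write, high end (PV)": 85_000,
}

for label, iops in quoted_iops.items():
    print(f"{label}: {iops_to_mib_per_s(iops):.1f} MiB/s")
```

Under that assumption, 120,000 random 4 KB reads per second work out to roughly 469 MiB/s, well below the approximately 2 GB per second quoted for sequential reads.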
1) Google Drive gives more free space upfront than Dropbox: 5 GB versus 2 GB.
2) Dropbox has a referral system (500 MB per referral), while there is no referral system for Google Drive.
3) The sync facility with Google Docs makes Google Drive especially useful for prior users of Google Docs.
4) API access to Google Drive is only for Chrome apps which is intriguing!
Apps will not have any API access to files unless users have first installed the app in Chrome Web Store.
You can use the Dropbox API much more easily -
See the platforms at
(Though I wonder: if you set the R working directory to the local Google Drive folder, it should sync up as well, though of course more slowly. See http://scrogster.wordpress.com/2011/01/29/using-dropbox-with-r-2/)
5) The Google Drive icon is ugly (seriously, dude!), but the features in the Windows app are just the same as in the Dropbox app. Too similar ;)
6) Upgrade space is much cheaper on Google Drive than on Dropbox (Google Drive prices are exactly a quarter of Dropbox prices, and the maximum storage is 16 times as much). This will affect heavy storage users. I expect to see some slowdown in new business for Dropbox unless Google Drive has an outage (like Gmail). Existing Dropbox users probably won't shift for the small dollar amount, though it is quite easy to do so.
Install Google Drive on your local workstation and cut and paste your Dropbox local folder to the Google Drive local folder!!
7) Dropbox deserves credit for being first (like Hotmail and AOL), but Google Drive is better in almost all respects!
Need more storage?
Current account type
Up to 18 GB (2 GB + 500 MB per referral)
Other account types
1 TB+: Plans starting at 1 TB. Large shared quota, centralized admin and billing, and more!
Part 1 in this series is available at http://www.decisionstats.com/analytics-for-cyber-conflict/
The next articles in this series will cover-
- the kinds of algorithms that are currently in use or being proposed for cyber conflict, as well as for detection
Cyber Conflict requires some basic elements of the following broad disciplines within Computer and Information Science (besides the obvious disciplines of heterogeneous database types for different kinds of data) -
1) Cryptography – particularly a cryptographic hash function that maximizes cost and time of the enemy trying to break it.
The ideal cryptographic hash function has four main or significant properties:
- it is easy (but not necessarily quick) to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
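A small Python sketch can illustrate the first and third properties. It uses SHA-256 from the standard `hashlib` module as one example of a cryptographic hash function; the choice of SHA-256 and the sample messages are my assumptions for illustration. The two "infeasible" properties cannot be demonstrated by running code, only the fact that a small edit changes the digest.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()


# Made-up messages that differ in a single character.
original = b"transfer 100 to account 42"
tampered = b"transfer 900 to account 42"

digest_original = sha256_hex(original)
digest_tampered = sha256_hex(tampered)

# The same input always gives the same digest, and it is cheap to compute.
assert sha256_hex(original) == digest_original

# Changing one character of the message changes the digest.
assert digest_original != digest_tampered

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_tampered))
print(digest_original)
print(digest_tampered)
print(f"{differing} of {len(digest_original)} hex characters differ")
```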
A commercial spin-off is to use this to anonymize all customer data stored in any database, so that no database (or data table) that is breached contains personally identifiable information. For example, anonymizing IP addresses and DNS records with a mashup (embedded by default within all browsers) of the Tor and MafiaaFire extensions can help create better information privacy on the internet.
This can also help create better encryption for communication between instant messengers.
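As a minimal sketch of the anonymization idea, the snippet below replaces identifiers with keyed hashes before they are stored. The keyed hash (HMAC-SHA256) is my assumption rather than something the approach above prescribes: a plain hash of an IPv4 address is easy to reverse by trying every address, so a secret key kept outside the database is added. The key, field names and customer rows are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; in practice it would be stored outside the database.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up customer rows, for illustration only.
customers = [
    {"email": "alice@example.com", "ip": "192.0.2.10", "spend": 120.0},
    {"email": "bob@example.com", "ip": "192.0.2.11", "spend": 75.5},
]

# Only the tokens and the non-identifying fields are stored.
stored_rows = [
    {
        "email_token": pseudonymize(row["email"]),
        "ip_token": pseudonymize(row["ip"]),
        "spend": row["spend"],
    }
    for row in customers
]

for row in stored_rows:
    print(row)
```

The same customer always maps to the same token, so joins and counts still work on the stored table, while a breach of that table alone does not expose the e-mail or IP address.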
2) Data Disaster Planning for Data Storage (but also simulations for breaches)- including using cloud computing, time sharing, or RAID for backing up data. This also means planning and running an annual (?) exercise for a simulated breach of confidential data, a cyber audit similar to an annual accounting audit.
3) Basic Data Reduction Algorithms for visualizing large amounts of information. This can include
- K Means Clustering, http://www.jstor.org/pss/2346830 , http://www.cs.ust.hk/~qyang/Teaching/537/Papers/huang98extensions.pdf , and http://stackoverflow.com/questions/6372397/k-means-with-really-large-matrix
- Topic Models (LDA) http://www.decisionstats.com/topic-models/,
- Social Network Analysis http://en.wikipedia.org/wiki/Social_network_analysis,
- Graph Analysis http://micans.org/mcl/ and http://www.ncbi.nlm.nih.gov/pubmed/19407357
- MapReduce and Parallelization algorithms for computational boosting http://www.slideshare.net/marin_dimitrov/large-scale-data-analysis-with-mapreduce-part-i
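To make the first item concrete, here is a minimal pure-Python sketch of K Means clustering (Lloyd's algorithm). It is an illustration only, not the method from any of the linked papers: the data points are made up, and random initialization from the data with a fixed seed is my assumption.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Made-up 2-D points forming two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(data, k=2)
print(centroids)
print(labels)
```

For really large matrices every point is compared with every centroid on each pass, which is why the parallelization approaches in the last item become relevant.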
In the next article we will examine
- the role of non state agents as well as state agents competing and cooperating,
- and what precautions can knowledge discovery in databases practitioners employ to avoid breaches of security, ethics, and regulation.
and an additional 750 hours/month of Linux-based computing. The Windows instance makes it really quite easy for users to start getting the hang of cloud computing, and it is quite useful for people to tinker around, given that Google’s retail cloud offerings are taking so long to hit the market.
But it is only for new users.
AWS Free Usage Tier now Includes Microsoft Windows on EC2
The AWS Free Usage Tier now allows you to run Microsoft Windows Server 2008 R2 on an EC2 t1.micro instance for up to 750 hours per month. This benefit is open to new AWS customers and to those who are already participating in the Free Usage Tier, and is available in all AWS Regions with the exception of GovCloud. This is an easy way for Windows users to start learning about and enjoying the benefits of cloud computing with AWS.
The micro instances provide a small amount of consistent processing power and the ability to burst to a higher level of usage from time to time. You can use this instance to learn about Amazon EC2, support a development and test environment, build an AWS application, or host a web site (or all of the above). We’ve fine-tuned the micro instances to make them even better at running Microsoft Windows Server.
You can launch your instance from the AWS Management Console:
We have lots of helpful resources to get you started:
- An updated (and even more helpful) Amazon EC2 Microsoft Windows Guide.
- Getting Started Guide: Web Application Hosting for Microsoft Windows.
- The Getting Started Guide includes a new section on Deploying a WordPress Blog.
- Our Windows and .NET Developer Center.
- A brand new AWS Microsite, with a focus on running Windows on Amazon EC2.
- Additional documentation on the AWS free usage tier, including eligibility information and some tips for making the most of it.
Along with 750 instance hours of Windows Server 2008 R2 per month, the Free Usage Tier also provides another 750 instance hours to run Linux (also on a t1.micro), Elastic Load Balancer time and bandwidth, Elastic Block Storage, Amazon S3 Storage, and SimpleDB storage, a bunch of Simple Queue Service and Simple Notification Service requests, and some CloudWatch metrics and alarms (see the AWS Free Usage Tier page for details). We’ve also boosted the amount of EBS storage space offered in the Free Usage Tier to 30GB, and we’ve doubled the I/O requests in the Free Usage Tier, to 2 million.
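A quick Python check shows why 750 hours is a convenient number: it is just enough to keep one t1.micro running around the clock for a whole month. The helper names below are my own, and the sketch assumes the allowance is counted separately for Windows and Linux, as the announcement describes.

```python
# Check whether one always-on t1.micro fits inside the 750 free hours per month.

FREE_HOURS_PER_MONTH = 750  # per operating system, as quoted above


def hours_in_month(days):
    """Instance-hours used by one instance running for a whole month."""
    return days * 24


def free_hours_left(days, instances=1, allowance=FREE_HOURS_PER_MONTH):
    """Free hours remaining for the month (negative means billable hours)."""
    return allowance - instances * hours_in_month(days)


for days in (28, 30, 31):
    print(days, "days:", free_hours_left(days), "free hours left for 1 instance")

# Two Windows instances running all month exceed the Windows allowance.
print("2 instances, 31 days:", free_hours_left(31, instances=2))
```

Even a 31-day month uses only 744 hours, so a single always-on instance per operating system stays within the allowance, while a second one does not.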
Business Metrics (a partial extract from my upcoming book “R for Business Analytics”)
Business Metrics are important variables that are collected on a periodic basis to assess the health and sustainability of a business. They should have the following properties-
1) What is a Business Metric- The absence of regular collection or updating of the business metric could cause business disruption through incorrect and incomplete decision making.
2) Cost of Business Metrics- The costs of collecting, storing and updating the business metric are less than the opportunity costs of wrong decision making caused by lack of information on that business metric.
3) Continuity in your Business Metrics- Business metrics should be comparable across time periods and business units; if necessary, the assumptions used for smoothing the comparisons should be listed in the business metric presentation itself.
4) Simplify your Business Metrics- Business metrics can also be derived from other business metrics. To avoid clutter, present only the most important business metrics, or mention only the metrics with the biggest deviation from past trends.
5) Normalize your Business Metrics- The scale of the business metric's units should be comparable to that of other business metrics, and significant enough to emphasize differences in the numbers.
6) Standardize your Business Metrics- The dimensions of business metrics should be increased to enhance comparison and contrast without adding complexity. This means adding an extra dimension for analysis, such as time, geography, employee or business owner, rather than relying on a 2 by 2 comparison.
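One way to read points 4 and 5 together is sketched below in Python: each metric's latest value is put on a common scale by measuring its deviation from its own past mean in standard deviations, and the metrics are then ranked so that the biggest movers come first. The z-score is my choice of normalization, not the only possible one, and the metric names and figures are made up.

```python
from statistics import mean, pstdev

# Made-up monthly history per metric; the last value is the current period.
history = {
    "revenue_usd": [100_000, 104_000, 98_000, 102_000, 125_000],
    "churn_rate_pct": [2.1, 2.0, 2.2, 2.1, 2.2],
    "support_tickets": [340, 355, 330, 350, 345],
}


def z_score_of_latest(values):
    """Deviation of the latest value from the past mean, in past std units."""
    past, latest = values[:-1], values[-1]
    spread = pstdev(past)
    if spread == 0:
        return 0.0  # no past variation to compare against
    return (latest - mean(past)) / spread


scores = {name: z_score_of_latest(vals) for name, vals in history.items()}

# Rank metrics by absolute deviation so the biggest movers are shown first.
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
for name, score in ranked:
    print(f"{name}: {score:+.2f} standard deviations from past mean")
```

Because every metric ends up in the same unit, revenue in dollars and churn in percent can sit side by side in one report, and a cut-off on the score decides which metrics are worth presenting at all.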
Finally, a powerful enough cloud computing instance from Amazon EC2 – called CC2, priced at $3 per hour (for Windows instances) and $2.40 per hour for Linux.
It would be interesting to see how SAS, IBM SPSS or R can leverage these.
Storage - On the storage front, the CC2 instance type is packed with 60.5 GB of RAM and 3.37 TB of instance storage.
Processing - The CC2 instance type includes 2 Intel Xeon processors, each with 8 hardware cores. We’ve enabled Hyper-Threading, allowing each core to process a pair of instruction streams in parallel. Net-net, there are 32 hardware execution threads and you can expect 88 EC2 Compute Units (ECUs) from this 64-bit instance type.
On a somewhat smaller scale, you can launch your own array of 290 CC2 instances and create a Top500 supercomputer (63.7 teraFLOPS) at a cost of less than $1000 per hour
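The numbers quoted above can be checked with a few lines of Python. This is back-of-the-envelope arithmetic only: it assumes the $2.40 per hour Linux price applies to all 290 instances and ignores storage, data transfer and any other charges.

```python
# Back-of-the-envelope figures for the 290-instance CC2 cluster quoted above.

INSTANCES = 290
LINUX_PRICE_PER_HOUR = 2.40  # USD, Linux price quoted above
CLUSTER_TERAFLOPS = 63.7     # Top500 result quoted above
SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 2, 8, 2  # with Hyper-Threading

threads_per_instance = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE
cluster_cost_per_hour = INSTANCES * LINUX_PRICE_PER_HOUR
gigaflops_per_instance = CLUSTER_TERAFLOPS * 1000 / INSTANCES

print(f"Hardware threads per instance: {threads_per_instance}")
print(f"Cluster cost: ${cluster_cost_per_hour:.2f} per hour")
print(f"Average throughput: {gigaflops_per_instance:.1f} GFLOPS per instance")
```

The cluster comes to about $696 per hour under these assumptions, which is consistent with the "less than $1000 per hour" claim, and each instance contributes roughly 220 GFLOPS on average.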
Cluster Compute Eight Extra Large specifications:
88 EC2 Compute Units (Eight-core 2 x Intel Xeon)
60.5 GB of memory
3370 GB of instance storage
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc2.8xlarge
Price: Starting from $2.40 per hour
But there are some caveats:
- The instances are available in a single Availability Zone in the US East (Northern Virginia) Region. We plan to add capacity in other EC2 Regions throughout 2012.
- You can run 2 CC2 instances by default.
- You cannot currently launch instances of this type within a Virtual Private Cloud (VPC).