OK, I missed this one because it came out on Dec 20. I think AWS Data Pipeline is a really important step forward for cloud-enabled analytics.
What is AWS Data Pipeline?
AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.
For example, you can use AWS Data Pipeline to archive your web server’s logs to Amazon Simple Storage Service (Amazon S3) each day and then run a weekly Amazon Elastic MapReduce (Amazon EMR) job flow over those logs to generate traffic reports.
In this example, AWS Data Pipeline would schedule the daily tasks to copy data and the weekly task to launch the Amazon EMR job flow. AWS Data Pipeline would also ensure that Amazon EMR waits for the final day’s data to be uploaded to Amazon S3 before it begins its analysis, even if there is an unforeseen delay in uploading the logs.
AWS Data Pipeline handles the ambiguities of real-world data management. You define the parameters of your data transformations and AWS Data Pipeline enforces the logic that you’ve set up.
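To make the dependency idea concrete, here is a rough sketch in plain Python of the "weekly job waits for all seven daily uploads" rule from the log example. It is not the AWS Data Pipeline API or its pipeline definition format; the function names (`week_days`, `missing_uploads`, `can_run_weekly_report`) and the dates are hypothetical and only model the logic the service is described as enforcing.

```python
from datetime import date, timedelta


def week_days(week_start):
    """Return the seven dates covered by one weekly report."""
    return [week_start + timedelta(days=i) for i in range(7)]


def missing_uploads(week_start, uploaded):
    """List the days of the week whose logs have not been uploaded yet."""
    return [d for d in week_days(week_start) if d not in uploaded]


def can_run_weekly_report(week_start, uploaded):
    """The weekly job may start only once every daily upload has completed."""
    return not missing_uploads(week_start, uploaded)


# Hypothetical week: six days uploaded, the final day's upload is delayed.
week_start = date(2012, 12, 17)
uploaded = set(week_days(week_start)[:6])
print(can_run_weekly_report(week_start, uploaded))  # False

# Once the last upload lands, the weekly job is allowed to start.
uploaded.add(week_start + timedelta(days=6))
print(can_run_weekly_report(week_start, uploaded))  # True
```

The point is that the weekly task is gated on the state of the data, not on the clock alone, which is what "data-driven workflow" means here.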
Some examples that are listed are-
AWS Data Pipeline is currently available in the US East region. You pay only for what you use, and there is no minimum fee.
As part of AWS’s Free Usage Tier, AWS Data Pipeline offers the following each month to new customers:
- 3 Low Frequency preconditions running on AWS at no charge
- 5 Low Frequency activities running on AWS at no charge
Low Frequency activities and preconditions are ones scheduled to run one time a day or less.
| |High Frequency|Low Frequency|
|---|---|---|
|Activities or preconditions running on AWS|$1.00 per month|$0.60 per month|
|Activities or preconditions running on-premise|$2.50 per month|$1.50 per month|
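As a quick worked example of the pricing, here is a small Python sketch that estimates a monthly bill from the rates and free tier quoted above. It assumes the listed prices apply per activity or precondition per month and that the account qualifies for the free tier; the function name `monthly_cost_cents` and the sample workload are made up for illustration.

```python
# Rates in cents, copied from the pricing table; assumed to be charged
# per activity or precondition per month.
RATES_CENTS = {
    ("aws", "high"): 100,
    ("aws", "low"): 60,
    ("on_premise", "high"): 250,
    ("on_premise", "low"): 150,
}

# Free tier as quoted: applies only to low-frequency items running on AWS.
FREE_LOW_FREQ_AWS = {"activity": 5, "precondition": 3}


def monthly_cost_cents(items, free_tier=True):
    """Estimate the monthly cost in cents.

    items is a list of (kind, location, frequency, count) tuples, where
    kind is "activity" or "precondition".
    """
    remaining = dict(FREE_LOW_FREQ_AWS) if free_tier else {}
    total = 0
    for kind, location, frequency, count in items:
        billable = count
        if location == "aws" and frequency == "low":
            # Use up the free allowance before charging anything.
            free = min(count, remaining.get(kind, 0))
            remaining[kind] = remaining.get(kind, 0) - free
            billable = count - free
        total += billable * RATES_CENTS[(location, frequency)]
    return total


# Hypothetical workload for one month.
workload = [
    ("activity", "aws", "low", 7),
    ("precondition", "aws", "low", 4),
    ("activity", "on_premise", "high", 2),
]
print(f"${monthly_cost_cents(workload) / 100:.2f}")                    # $6.80
print(f"${monthly_cost_cents(workload, free_tier=False) / 100:.2f}")  # $11.60
```

Working in whole cents avoids floating-point rounding surprises when the counts get larger.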