Up and Running with Big Data: 3 Day Deep-Dive

Over three days, explore the Big Data tools, technologies and techniques that allow organisations to gain insight and drive new business opportunities by finding signal in their data. Using Amazon Web Services, you’ll learn how to use the flexible map/reduce programming model to scale your analytics, use Hadoop with Elastic MapReduce, write queries with Hive, develop real-world data flows with Pig and understand the operational needs of a production data platform.

Day 1:

MapReduce concepts

Hadoop implementation: JobTracker, NameNode, TaskTracker, DataNode, Shuffle & Sort

Introduction to Amazon AWS and EMR with console and command-line tools

Implementing MapReduce with Java and Streaming

Day 2:

Hive Introduction

Hive Relational Operators

Hive Implementation to MapReduce

Hive Partitions

Hive UDFs, UDAFs, UDTFs

Day 3:

Pig Introduction

Pig Relational Operators

Pig Implementation to MapReduce

Pig UDFs

NoSQL discussion
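To make the Day 1 map/reduce model concrete, here is a minimal single-process sketch of the classic word count. A map step emits (word, 1) pairs. A shuffle & sort step orders the pairs and groups them by key. A reduce step sums each group. This only simulates on one machine what Hadoop spreads across TaskTrackers. The function names are hypothetical and are not part of any Hadoop or EMR API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical helper): emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_sort(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Shuffle & sort: order pairs by key, then collect all values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped: Iterable[Tuple[str, List[int]]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases the way a single MapReduce job would."""
    return reduce_phase(shuffle_sort(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

With Hadoop Streaming on Elastic MapReduce, the mapper and reducer would instead be separate executables. Each reads lines on standard input and writes tab-separated key/value lines to standard output, and Hadoop itself performs the shuffle & sort between them.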

What is Cassandra? Why is this relevant to analytics?

It is the next-generation database that you want your analytics software to be compatible with. It is also quite easy to learn. Did I mention that if you say “I know Hadoop/Big Data” on your resume, you just raised your market price by an extra $30K? I mean, there is a big demand for analysts and statisticians who can think about and slice data from a business perspective AND write that Hadoop/Big Data code.

How do I learn more?

http://www.datastax.com/events/cassandrasf2011

What’s in it for you?

Well, I shifted my poetry to https://poemsforkush.wordpress.com/

On Decisionstats.com, this is what I love to write about! I find it cool.

——————————————————–

Cassandra SF 2011: Monday, July 11

It’s been almost a year since the first Apache Cassandra Summit in San Francisco. Once again we’ve reserved the beautiful Mission Bay Conference Center. Because the Cassandra community has grown so much in the last year, we’re taking the entire venue. This year’s event will not only include Cassandra, but also Brisk, Apache Hadoop, and more.

What’s in store for this year’s conference?

We have two rooms set aside for presentations. This year we also have multiple rooms set aside for Birds of a Feather talks, committer meetups, and other small discussions.

We’ve sent out surveys to all the attendees of last year’s conference, as well as a few hundred other members of the community. Below are some of the topics people have requested so far.

If you have topics you’d like to see covered, or you would like to submit a presentation, send a note to lynnbender@datastax.com.

What else?

We’ll be providing lunch as well as continuous beverage service, so that you won’t have to take your mind outside the information wind tunnel.

We’ll also be hosting a post event party. Details coming shortly.

For more information…

Submissions and suggestions: If you wish to propose a talk or presentation, or have a suggestion on a topic you’d like to see covered, send a note to Lynn Bender at lynnbender@datastax.com.

Sponsorship opportunities: Contact Michael Weir at DataStax: mweir@datastax.com

Apache Cassandra, Cassandra, Apache Hadoop, Hadoop, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission as of 2011. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by DataStax.

https://cassandra.apache.org/

Welcome to Apache Cassandra

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
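Dynamo’s fully distributed design rests on consistent hashing. Each node owns a position (a token) on a hash ring. A row key is stored on the first node clockwise from the key’s hash, and the next distinct nodes along the ring hold the extra replicas. The toy sketch below illustrates that idea. It is loosely modelled on Cassandra’s SimpleStrategy and hashes with MD5 in the spirit of its RandomPartitioner. The `Ring` class, the node names and the replication factor of 3 are hypothetical, and this is not Cassandra’s actual code.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Map a string onto the ring as a 128-bit integer (MD5, for illustration only)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical toy ring: one token per node, SimpleStrategy-like replica placement."""

    def __init__(self, nodes: List[str], replication_factor: int = 3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.rf = replication_factor
        # Sorted (token, node) pairs form the ring, ordered clockwise.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str) -> List[str]:
        """Return the rf nodes for key: first node clockwise from its token, then successors."""
        start = bisect.bisect_left(self._tokens, token(key))
        # The modulo wraps past the highest token back to the start of the ring.
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(self.rf)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replication_factor=3)
    print(ring.replicas("user:42"))  # three distinct nodes, adjacent on the ring
```

Because ownership depends only on ring position, adding a node changes only the keys in the range it takes over. That is why such a cluster can grow without reshuffling every key.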

Cassandra was open sourced by Facebook in 2008, and is now developed by Apache committers and contributors from many companies.

Download

The latest release is 0.8.0.

apache-cassandra-0.8.0-bin.tar.gz

Overview

Proven: Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data on over 150 machines.

Fault Tolerant: Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.

You’re in Control: Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.

Rich Data Model: Allows efficient use for many applications beyond simple key/value.

Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Durable: Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.

Professionally Supported: Cassandra support contracts and services are available from third parties.
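The “You’re in Control” item refers to what Cassandra calls per-operation consistency levels. In Cassandra’s design, a write is sent to all N replicas, and the level decides how many of them must acknowledge before the client gets an answer. A standard way to reason about this: reads that wait for R replicas and writes that wait for W replicas always overlap on at least one replica when R + W > N. The sketch below only does that arithmetic. ONE, QUORUM and ALL are real Cassandra level names. The helper functions and the replication factor of 3 are hypothetical, and levels such as ANY or LOCAL_QUORUM are deliberately left out.

```python
from typing import Dict


def replicas_required(level: str, n: int) -> int:
    """Hypothetical helper: replicas that must acknowledge at a given consistency level."""
    levels: Dict[str, int] = {
        "ONE": 1,
        "QUORUM": n // 2 + 1,  # a strict majority of the n replicas
        "ALL": n,
    }
    return levels[level]


def quorums_overlap(read_level: str, write_level: str, n: int) -> bool:
    """True when every read set must share a replica with every write set (R + W > N)."""
    return replicas_required(read_level, n) + replicas_required(write_level, n) > n


if __name__ == "__main__":
    n = 3  # hypothetical replication factor
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, quorums_overlap(read, write, n))
    # prints:
    # ONE ONE False
    # QUORUM QUORUM True
    # ONE ALL True
```

Choosing ONE for both reads and writes favours latency and availability, and leaves Hinted Handoff and Read Repair to bring replicas back in line later. QUORUM on both sides trades some latency for reads that reflect the latest acknowledged write.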

DataStax Rewires Hadoop with Apache Cassandra (datacenterknowledge.com)
