How to help your government keep the world safe using statistics #rstats #python #sas

Big Data for Big Brother. Now playing. At a computer near you. How to help water the tree of liberty using statistics?

Use R

 

or

Use Python

 

or use SAS software

SAS/CIA, from the last paragraph of ET_CD_Mumbai_Jul12.pdf

 

Think Big Analytics

I came across this lovely analytics company, Think Big Analytics, and I really liked their explanation of the whole big data shebang. Hadoop isn't rocket science, and it can be made simpler to explain and deploy.

Check them out yourself at http://www.thinkbiganalytics.com/resources_reference

They also have an awesome series of lectures coming up. Check them out:

http://www.eventbrite.com/org/1740609570

Up and Running with Big Data: 3 Day Deep-Dive

Over three days, explore the Big Data tools, technologies and techniques which allow organisations to gain insight and drive new business opportunities by finding signal in their data. Using Amazon Web Services, you’ll learn how to use the flexible map/reduce programming model to scale your analytics, use Hadoop with Elastic MapReduce, write queries with Hive, develop real world data flows with Pig and understand the operational needs of a production data platform.

Day 1:

  • MapReduce concepts
  • Hadoop implementation:  Jobtracker, Namenode, Tasktracker, Datanode, Shuffle & Sort
  • Introduction to Amazon AWS and EMR with console and command-line tools
  • Implementing MapReduce with Java and Streaming

Day 2:

  • Hive Introduction
  • Hive Relational Operators
  • Hive Implementation to MapReduce
  • Hive Partitions
  • Hive UDFs, UDAFs, UDTFs

Day 3:

  • Pig Introduction
  • Pig Relational Operators
  • Pig Implementation to MapReduce
  • Pig UDFs
  • NoSQL discussion
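To make the map/reduce idea from the Day 1 topics a bit more concrete, here is a toy word count in plain Python. It is only a single-process sketch of the three phases (map, shuffle & sort, reduce), not Hadoop or Streaming code, and the function names (`map_phase`, `shuffle_sort`, `reduce_phase`, `word_count`) are made up for this illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle & sort: order the pairs by key and group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def reduce_phase(key, values):
    """Reduce: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_sort(pairs))


if __name__ == "__main__":
    print(word_count(["big data for big brother", "Big analytics"]))
```

In a real cluster the map and reduce calls would run on different machines and the framework would handle the shuffle, but the shape of the computation is the same.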

Here comes Cassandra!

What is Cassandra? Why is this relevant to analytics?

It is the next-generation database that you want your analytics software to be compatible with. It is also quite easy to learn. Did I mention that saying “I know Hadoop/Big Data” on your resume just raised your market price by an extra $30K? There is big demand for analysts and statisticians who can think about and slice data from a business perspective AND write that Hadoop/Big Data code.

How do I learn more?

http://www.datastax.com/events/cassandrasf2011

What's in it for you?

Well, I shifted my poetry to https://poemsforkush.wordpress.com/

On Decisionstats.com, this is what I love to write about! I find it cool.

——————————————————–

Cassandra SF 2011- Monday, July 11

Free Pass Datastax Cassandra SF

It’s been almost a year since the first Apache Cassandra Summit in San Francisco. Once again we’ve reserved the beautiful Mission Bay Conference Center. Because the Cassandra community has grown so much in the last year, we’re taking the entire venue. This year’s event will not only include Cassandra, but also Brisk, Apache Hadoop, and more.

What’s in store for this year’s conference?

We have two rooms set aside for presentations. This year we also have multiple rooms set aside for Birds of a Feather talks, committer meetups, and other small discussions.

We’ve sent out surveys to all the attendees of last year’s conference, as well as a few hundred other members of the community. Below are some of the topics people have requested so far.

If you have topics you’d like to see covered, or you would like to submit a presentation, send a note to lynnbender@datastax.com.

What else?

We’ll be providing lunch as well as continuous beverage service — so that you won’t have to take your mind outside the information windtunnel.

We’ll also be hosting a post event party. Details coming shortly.

For more information…

Submissions and suggestions: If you wish to propose a talk or presentation, or have a suggestion on a topic you’d like to see covered, send a note to Lynn Bender at lynnbender@datastax.com

Sponsorship opportunities: Contact Michael Weir at DataStax: mweir@datastax.com

Apache Cassandra, Cassandra, Apache Hadoop, Hadoop, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission as of 2011. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by DataStax.

https://cassandra.apache.org/

Welcome to Apache Cassandra

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
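For readers who have not met a ColumnFamily-based data model before, a rough mental picture is a map of row keys to named columns, where each column carries a value and a timestamp. The sketch below is a simplified in-memory illustration of that idea, not the Cassandra API; the `ColumnFamily` class and its methods are hypothetical names invented for this example, and it assumes the newest timestamp wins when the same column is written twice.

```python
class ColumnFamily:
    """Toy in-memory model: row key -> {column name: (value, timestamp)}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, column, value, timestamp):
        """Write one column; an older timestamp never overwrites a newer one."""
        columns = self.rows.setdefault(row_key, {})
        current = columns.get(column)
        if current is None or timestamp >= current[1]:
            columns[column] = (value, timestamp)

    def get(self, row_key, column):
        """Return the stored value, or None if the row or column is absent."""
        cell = self.rows.get(row_key, {}).get(column)
        return None if cell is None else cell[0]

    def columns(self, row_key):
        """Return the column names of one row in sorted order."""
        return sorted(self.rows.get(row_key, {}))


if __name__ == "__main__":
    users = ColumnFamily("users")
    users.insert("kush", "email", "old@example.com", timestamp=1)
    users.insert("kush", "email", "new@example.com", timestamp=2)
    users.insert("kush", "city", "Delhi", timestamp=1)
    print(users.get("kush", "email"), users.columns("kush"))
```

Note that rows do not need to share the same set of columns, which is what makes this model more flexible than a plain key/value store.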

Cassandra was open sourced by Facebook in 2008, and is now developed by Apache committers and contributors from many companies.

Download

Overview

  • Proven: Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines.
  • Fault Tolerant: Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Decentralized: Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
  • You’re in Control: Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
  • Rich Data Model: Allows efficient use for many applications beyond simple key/value.
  • Elastic: Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
  • Durable: Cassandra is suitable for applications that can’t afford to lose data, even when an entire data center goes down.
  • Professionally Supported: Cassandra support contracts and services are available from third parties.
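The replication and per-update control described in the overview come down to simple arithmetic in Dynamo-style systems: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The snippet below is a minimal sketch of that rule only; the helper names `quorum` and `reads_see_latest_write` are assumptions made for this example and are not part of any Cassandra client library.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority of N replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Majority writes plus majority reads always overlap on at least one replica.
    print(q, reads_see_latest_write(n, q, q))
    # Writing to one replica and reading from one replica may miss the update.
    print(reads_see_latest_write(n, 1, 1))
```

Waiting for fewer replicas buys lower latency and higher availability at the cost of possibly stale reads, which is the trade-off that background mechanisms such as hinted handoff and read repair help smooth over.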