SQL and Hadoop: What is this cloud thing

Here is a very good, in fact brilliant, post from Joe Hellerstein, a Professor of Computer Science at UC Berkeley: http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html

It explains the difference between the two approaches to databases.

The enterprise IT camp tends to favor relational databases and the SQL language, while the web upstarts have rallied around the MapReduce programming model popularized at Google, and cloned in open source as Apache Hadoop. Hadoop is in wide use at companies like Yahoo! and Facebook, and gets a lot of attention in tech blogs as the next big open source project. But if you mention Hadoop in a corporate IT shop you are often met with blank stares; SQL is ubiquitous in those environments.

Setting aside the trash talk, the usual cases made for the two technologies can be summarized as follows:

Relational Databases

multipurpose: useful for analysis and data update, batch and interactive tasks
high data integrity via ACID transactions
lots of compatible tools, e.g. for loading, management, reporting, data visualization and mining
support for SQL, the most widely-used language for data analysis
automatic SQL query optimization, which can radically improve performance
integration of SQL with familiar programming languages via connectivity protocols, mapping layers and user-defined functions
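
To make a few of these points concrete for readers on the Hadoop side, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are invented for illustration and are not from the article. It shows SQL called from a familiar programming language, a declarative GROUP BY query that leaves the execution plan to the database, and a failed transaction being rolled back as a unit.

```python
import sqlite3


def run_demo():
    """Return per-region totals after one committed load and one failed update."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 5), ("east", 7)],  # hypothetical sample data
    )
    conn.commit()

    # Atomicity: the connection used as a context manager commits on success
    # and rolls back if an exception escapes the block.
    try:
        with conn:
            conn.execute("INSERT INTO sales VALUES (?, ?)", ("west", 100))
            raise RuntimeError("simulated failure mid-transaction")
    except RuntimeError:
        pass  # the insert above has been rolled back

    # Declarative query: we state what we want, not how to compute it.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

The rolled-back insert never shows up in the totals, which is the kind of integrity guarantee the list above refers to.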

MapReduce (Hadoop)

designed for large clusters: 1000+ computers
very high availability, keeping long jobs running efficiently even when individual computers break or slow down
data is accessed in "native format" from a filesystem — no need to transform data into tables at load time
no special query language; programmers use familiar languages like Java, Python, and Perl
programmers retain control over performance, rather than counting on a query optimizer
the open-source Hadoop implementation is funded by corporate donors, and will mature over time as Linux and Apache did
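
For corporate SQL types, the MapReduce model itself can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle, and reduce steps, not the Hadoop API; the function names and the "region,amount" line format are assumptions made up for the example. Note that the data is read as raw text lines, with no table defined up front.

```python
from collections import defaultdict

# Hypothetical raw input, as it might sit in a file: "region,amount" per line.
RAW_LINES = ["east,10", "west,5", "east,7"]


def map_phase(line):
    """Parse one raw line and emit a (key, value) pair."""
    region, amount = line.strip().split(",")
    yield region, int(amount)


def shuffle(pairs):
    """Group all emitted values by key (a real framework does this across machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(map_reduce(RAW_LINES))
```

In a real cluster the map and reduce functions run in parallel on many machines, and the framework handles the shuffle and restarts failed tasks. The programmer, not a query optimizer, decides how the work is split.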

Hadoop is still relatively young, and by all reports much slower and more resource intensive than Google’s MapReduce implementation.

What I liked about the article was that it explains Hadoop in simple terms to corporate SQL types like me.

It would be interesting to see how Hadoop could be configured on the NVIDIA Tesla supercomputer (at 10,000 USD).

– Update – Mathematica is already being modified to run on GPUs rather than only on CPUs, and there was an interesting discussion on the R-help list today about this.

Mathematica is launching a version that works with NVIDIA GPUs. It is claimed that this would make it ~10-100x faster!
http://www.physorg.com/news146247669.html