Mapreduce Book

Here is a new book on learning MapReduce and it has a free downloadable version as well.

Data-Intensive Text Processing with MapReduce

Jimmy Lin and Chris Dyer

ABSTRACT

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader “think in MapReduce”, but also discusses limitations of the programming model as well.

You can download the book here

This book is part of the Morgan & Claypool Synthesis Lectures on Human Language Technologies. If you’re at a university, your institution may already subscribe to the series, in which case you can access the electronic version directly without cost (see this page for a list of institutional subscribers). Otherwise, to purchase:

Quite explicitly, this book focuses on MapReduce algorithm design, not Hadoop programming. Tom White’s Hadoop: The Definitive Guide is a great resource for learning Hadoop.

Want to be notified of updates? Interested in MapReduce algorithm design? Follow @lintool on Twitter here!

Open Source and Software Strategy

Curt Monash at Monash Research pointed out some ongoing open source GPL issues for WordPress and the Thesis issue (Also see http://ma.tt/2009/04/oracle-and-open-source/ and  http://www.mattcutts.com/blog/switching-things-around/).

As a user of both going upwards of 2 years- I believe open source and GPL license enforcement are general parts of software strategy of most software companies nowadays. Some thoughts on  open source and software strategy-Thesis remains a very very popular theme and has earned upwards of 100,000 $ for its creator (estimate based on 20k plus installs and 60$ avg price)

  • Little guys like to give away code to get some satisfaction/ recognition, big guys give away free code only when its necessary or when they are not making money in that product segment anyway.
  • As Ethan Hunt said, ” Every Hero needs a Villian”. Every software (market share) war between players needs One Big Company Holding more market share and Open Source Strategy between other player who is not able to create in house code, so effectively out sources by creating open source project. But same open source propent rarely gives away the secret to its own money making project.
    • Examples- Google creates open source Android, but wont reveal its secret algorithm for search which drives its main profits,
    • Google again puts a paper for MapReduce but it’s Yahoo that champions Hadoop,
    • Apple creates open source projects (http://www.apple.com/opensource/) but wont give away its Operating Source codes (why?) which help people buys its more expensive hardware,
    • IBM who helped kickstart the whole proprietary code thing (remember MS DOS) is the new champion of open source (http://www.ibm.com/developerworks/opensource/) and
    • Microsoft continues to spark open source debate but read http://blogs.technet.com/b/microsoft_blog/archive/2010/07/02/a-perspective-on-openness.aspx and  also http://www.microsoft.com/opensource/
    • SAS gives away a lot of open source code (Read Jim Davis , CMO SAS here , but will stick to Base SAS code (even though it seems to be making more money by verticals focus and data mining).
    • SPSS was the first big analytics company that helps supports R (open source stats software) but will cling to its own code on its softwares.
    • WordPress.org gives away its software (and I like Akismet just as well as blogging) for open source, but hey as anyone who is on WordPress.com knows how locked in you can get by its (pricy) platform.
    • Vendor Lock-in (wink wink price escalation) is the elephant in the room for Big Software Proprietary Companies.
    • SLA Quality, Maintenance and IP safety is the uh-oh for going in for open source software mostly.
  • Lack of IP protection for revenue models for open source code is the big bottleneck  for a lot of companies- as very few software users know what to do with source code if you give it to them anyways.
    • If companies were confident that they would still be earning same revenue and there would be less leakage or theft, they would gladly give away the source code.
    • Derivative softwares or extensions help popularize the original softwares.
      • Half Way Steps like Facebook Applications  the original big company to create a platform for third party creators),
      • IPhone Apps and Android Applications show success of creating APIs to help protect IP and software control while still giving some freedom to developers or alternate
      • User Interfaces to R in both SAS/IML and JMP is a similar example
  • Basically open source is mostly done by under dog while top dog mostly rakes in money ( and envy)
  • There is yet to a big commercial success in open source software, though they are very good open source softwares. Just as Google’s success helped establish advertising as an alternate ( and now dominant) revenue source for online companies , Open Source needs a big example of a company that made billions while giving source code away and still retaining control and direction of software strategy.
  • Open source people love to hate proprietary packages, yet there are more shades of grey (than black and white) and hypocrisy (read lies) within  the open source software movement than the regulated world of big software. People will be still people. Software is just a piece of code.  😉

(Art citation-http://gapingvoid.com/about/ and http://gapingvoidgallery.com/

Movie Review – BBudah Hoga Tera Baap (or Your Daddy is old)

Movie Review – BBudah Hoga Tera Baap (or Your Daddy is old)

This one is  a scene stealer by the Big Amitabh Bachchan, who plays a retired gangster back in Mumbai for one last gig.With his manners, his slapstick, his dance and still his panache at executing action scenes, Mr Bachchan proves who is the big daddy of Bollywood. The movie seems a celebration of his 4 decades career, including references to his left hand shooting revolver, his wit, his dance, his legendary songs.

Watch it to celebrate Bollywood and the Indian way of shooting film.

 

Movie Review Transformers

Transformers # is a long long movie. so long that you wonder when the robots would stop fighting and start singing songs of peace.

Megan Fox ‘s replacement is so bad, you wish they replaced the director. This is a robot overkill, to the point of a horror massacres. But if you saw the first two movies, and want to know what happened next go ahead.  The  early half of the movie with the Moon Landing was good, but somewhere down the line the director falls in love with himself rather than with his art. Shia LeBeouf is wasted once again in a big action franchise thriller-but he would rather be a well paid safe big movie star than be an actor plying his grease paint. Oh yes, some parts of the movie you cant really figure out who is an autobot, who is a decepticion and who is the idiot for paying extra for a 3 D movie.

Still , Enjoy .

Browser based Music Creation

Aviary- the online pioneer in browser based art editing introduced music creation and editing in it’s latest version. If you see the screenshots you will find creating music is now simply a matter of visualization and you can see some screenshots below to see how easy it is to create music.

So if you like music and ever felt the need for music to create for videos or websites , take 5 minutes and try it out at http://aviary.com/