New Blogging Schedule

Given existing demands on health, wealth and time, I am planning to blog every Friday at a minimum and also send a summary to the LinkedIn Group.

Here are some ways to better connect with my writing:

1. Join the LinkedIn Group: http://www.linkedin.com/groups?about=&gid=54257

2. Follow on Twitter: http://twitter.com/decisionstats

3. Interact on Facebook: http://www.facebook.com/Decisionstats

I will also be trying to find better ways to optimize my writing soon. Thanks for reading this!

MapReduce Patent Granted

After 5 years of third-party validation and almost 10 years of internal validation at Google, the patent on the fastest way to crunch data now belongs to the people who created it first, Google Inc.

From

http://www.google.com/patents/about?id=XLfIAAAAEBAJ

Citations

Patent Number | Title | Issue date
4876643 | Parallel searching system having a master processor for controlling plural slave processors for independently processing respective search requests | Oct 24, 1989
5345584 | System for managing data storage based on vector-summed size-frequency vectors for data sets, devices, and residual storage on devices | Sep 6, 1994
5414849 | Evaluating method of data division patterns and a program execution time for a distributed memory parallel computer system, and parallel program producing method using such an evaluating method | May 9, 1995
5414899 | Pivot structure from a lock handle | May 16, 1995
5471622 | Run-time system having nodes for identifying parallel tasks in a logic program and searching for available nodes to execute the parallel tasks | Nov 28, 1995
5590319 | Query processor for parallel processing in homogenous and heterogenous databases | Dec 31, 1996
5806059 | Database management system and method for query process for the same | Sep 8, 1998
5819251 | System and apparatus for storage retrieval and analysis of relational and non-relational data | Oct 6, 1998
5870743 | Method and apparatus for parallelizing operations that create a table | Feb 9, 1999
5884299 | Optimization of SQL queries involving aggregate expressions using a plurality of local and global aggregation operations | Mar 16, 1999
5884303 | Parallel searching technique | Mar 16, 1999
5920854 | Real-time document collection search engine with phrase indexing | Jul 6, 1999
5956704 | Method and apparatus for parallelizing operations that insert data into an existing data container | Sep 21, 1999
5963954 | Method for mapping an index of a database into an array of files | Oct 5, 1999
6006224 | Crucible query system | Dec 21, 1999
6026394 | System and method for implementing parallel operations in a database management system | Feb 15, 2000
6182061 | Method for executing aggregate queries, and computer system | Jan 30, 2001
6226635 | Layered query management | May 1, 2001
6256621 | Database management system and query operation therefor, including processing plural database operation requests based on key range of hash code | Jul 3, 2001
6301574 | System for providing business information | Oct 9, 2001
6366904 | Machine-implementable method and apparatus for iteratively extending the results obtained from an initial query in a database | Apr 2, 2002
6408292 | Method of and system for managing multi-dimensional databases using modular-arithmetic based address data mapping processes on integer-encoded business dimensions | Jun 18, 2002
6556988 | Database management apparatus and query operation therefor, including processing plural database operation requests based on key range of hash code | Apr 29, 2003
6567806 | System and method for implementing hash-based load-balancing query processing in a multiprocessor database system | May 20, 2003
6741992 | Flexible rule-based communication system and method for controlling the flow of and access to information between computer users | May 25, 2004
6910070 | Methods and systems for asynchronous notification of database events | Jun 21, 2005
6961723 | System and method for determining relevancy of query responses in a distributed network search mechanism | Nov 1, 2005
6983322 | System for discrete parallel processing of queries and updates | Jan 3, 2006
7099871 | System and method for distributed real-time search | Aug 29, 2006
7103590 | Method and system for pipelined database table functions | Sep 5, 2006
7146365 | Method, system, and program for optimizing database query execution | Dec 5, 2006
7430549 | Optimized SQL code generation | Sep 30, 2008
7433863 | SQL code generation for heterogeneous environment | Oct 7, 2008

Claims

What is claimed is:

1. A computer-implemented method of analyzing data records, comprising:

storing the data records in one or more data centers;
allocating groups of the stored data records to respective processes of a first plurality of processes executing in parallel;
after allocating the groups of the stored data records to the respective processes of the first plurality of processes executing in parallel, in each respective process of the first plurality of processes:
for each data record in at least a subset of the group of the stored data records allocated to the respective process:
creating a parsed representation of the data record;
applying a procedural language query to the parsed representation of the data record to extract one or more values, wherein the procedural language query is applied independently to each parsed representation; and
applying a respective emit operator to at least one of the extracted one or more values to add corresponding information to a respective intermediate data structure, wherein the respective emit operator implements one of a predefined set of application-independent statistical information processing functions;
in each process of a second plurality of processes, aggregating information from a subset of the intermediate data structures to produce aggregated data; and
combining the produced aggregated data to produce output data.

2. The method of claim 1, wherein the respective emit operator implements one of a predefined set of application-independent statistical information processing functions.

3. The method of claim 2, wherein the application-independent statistical information processing functions comprise one or more of the following: a function for counting occurrences of distinct values, a maximum value function, a minimum value function, a statistical sampling function, a function for identifying values that occur most frequently, and a function for estimating a total number of unique values.

4. The method of claim 1, wherein the applying the procedural language query to the parsed representation of the data record to extract the one or more values and the applying the respective emit operator to at least one of the one or more values to add the corresponding information to the respective intermediate data structure are performed independently for each data record.

5. The method of claim 1, wherein the parsed representation of the data record comprises a key-value pair.

6. The method of claim 1, wherein the intermediate data structure comprises a table having at least one index whose index values comprise unique values of the extracted one or more values.

7. The method of claim 6, wherein the aggregating information from the subset of the intermediate data structures to produce the aggregated data combines the extracted one or more values having the same index values.

8. The method of claim 1, wherein

when applying the procedural language query to the parsed representation produces a plurality of values, applying the respective emit operator to each of the produced plurality of values to add corresponding information to the respective intermediate data structure.

9. The method of claim 1, wherein the second plurality of processes are executing in parallel.

10. The method of claim 1, wherein the allocating the groups of the stored data records to the respective processes of the first plurality of processes executing in parallel is application independent, and the procedural language query is application dependent.

11. The method of claim 1, wherein the data records comprise one or more of the following types of data records: log files, transaction records, and documents.

12. The method of claim 1, wherein the intermediate data structure is a table having a plurality of indices, wherein each of the plurality of indices is dynamically generated in accordance with the extracted one or more values.

13. A computer-implemented method of analyzing data records, comprising:

storing the data records in one or more data centers;
allocating groups of the stored data records to respective processes of a first plurality of processes executing in parallel;
after allocating the groups of the stored data records to the respective processes of the first plurality of processes executing in parallel, in each respective process of the first plurality of processes:
for each data record in at least a subset of the group of stored data records allocated to the respective process:
creating a parsed representation of the data record;
applying a procedural language query to the parsed representation of the data record to extract one or more values; and
applying a respective operator to at least one of the extracted one or more values to add corresponding information to a respective intermediate data structure;
in each process of a second plurality of processes, aggregating information from a subset of the intermediate data structures to produce aggregated data; and
combining the produced aggregated data to produce output data.

14. A computer system with one or more processors and memory for analyzing data records, wherein the data records are stored in one or more data centers, the computer system comprising:

a first plurality of processes operating in parallel, each of which is allocated a group of stored data records to process;
each respective process of the first plurality of processes including instructions for:
creating a parsed representation of each data record in at least a subset of the group of stored data records allocated to the respective process after the group of stored data records is allocated to the respective process;
applying a procedural language query to the parsed representation of each stored data record in at least the subset of the group of stored data records allocated to the respective process to produce one or more values; and
applying one or more emit operators to each of the one or more produced values to add corresponding information to an intermediate data structure; and
at least one aggregating process for aggregating information from a plurality of the intermediate data structures to produce output data.

15. The system of claim 14, wherein the at least one aggregating process for aggregating information comprises a second plurality of processes operating in parallel, wherein each respective process of the second plurality of processes operating in parallel includes instructions for aggregating information from the plurality of the intermediate data structures to produce the output data.

16. The system of claim 14, wherein the intermediate data structure comprises a table.

17. The system of claim 15, wherein at least one process of the second plurality of processes operating in parallel includes instructions for combining the output data to produce aggregated output data.

18. The system of claim 14, wherein each of the one or more emit operators implements one of a predefined set of application-independent statistical information processing functions.

19. The system of claim 18, wherein the application-independent statistical information processing functions comprise one or more of the following: a function for counting occurrences of distinct values, a maximum value function, a minimum value function, a statistical sampling function, a function for identifying values that occur most frequently, and a function for estimating a total number of unique values.

20. The system of claim 14, wherein the instructions for applying the procedural language query to the parsed representation of each data record in at least the subset of the group of stored data records allocated to the respective process to produce the one or more values include instructions for applying the procedural language query independently to each data record.

21. The system of claim 14, wherein the instructions for applying the procedural language query to the parsed representation of each data record in at least the subset of the group of stored data records allocated to the respective process to produce the one or more values and instructions for applying the one or more emit operators to each of the one or more produced values to add the corresponding information to the intermediate data structure include instructions for applying the procedural language query and the one or more emit operators independently to each data record.

22. The system of claim 14, wherein the at least one aggregating process for aggregating information is configured to aggregate, in each respective process of a second plurality of processes, the information from the plurality of the intermediate data structures to produce the output data.

23. The system of claim 14, wherein each parsed representation of each data record comprises a key-value pair.

24. The system of claim 14, wherein the intermediate data structure comprises a table having at least one index whose index values comprise unique values of the produced values.

25. The system of claim 24, wherein the at least one aggregating process for aggregating the information from the plurality of intermediate data structures to produce the output data includes instructions for combining the one or more produced values having the same index values.

26. The system of claim 14, wherein the instructions for applying the procedural language query to the parsed representation of each stored data record include instructions for applying the one or more emit operators to each of a plurality of produced values to add corresponding information to the intermediate data structure.

27. The system of claim 14, wherein the at least one aggregating process for aggregating the information from the plurality of intermediate data structures to produce the output data comprises a second plurality of processes executing in parallel.

28. The system of claim 14, wherein the system is configured such that the allocation of stored data records to each respective process of the first plurality of processes is application independent, and wherein the procedural language query is application dependent.

29. The system of claim 14, wherein the data records comprise one or more of the following types of data records: log files, transaction records, and documents.

30. The system of claim 14, wherein the intermediate data structure is a table having a plurality of indices, wherein each of the plurality of the indices is dynamically generated in accordance with the one or more produced values.
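
To make the claim language a little more concrete, here is a minimal Python sketch of the flow described in claim 1: parse each record, apply a query to it independently, emit into an intermediate table, aggregate subsets of those tables, then combine. It is only an illustration under assumptions and not Google's implementation. The log-line format, the function names, the "count successful requests per path" query and the sample data are all made up for this example, and plain loops stand in for the processes that would run in parallel.

```python
from collections import Counter

def parse_record(line):
    """Create a parsed representation of one log line as a key-value pair."""
    _method, path, status = line.split()
    return path, int(status)

def query(parsed):
    """Hypothetical per-record query: extract the path of each successful request."""
    path, status = parsed
    return [path] if status == 200 else []

def map_group(records):
    """First-phase process: parse, query and emit into an intermediate table."""
    table = Counter()  # intermediate data structure, indexed by extracted value
    for line in records:
        for value in query(parse_record(line)):
            table[value] += 1  # emit with a count-occurrences operator
    return table

def aggregate(tables):
    """Merge several tables by adding the counts that share an index value."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total

def analyze(groups, n_aggregators=2):
    """Run both phases one after the other (a real system would run them in parallel)."""
    intermediate = [map_group(group) for group in groups]
    # Each second-phase process aggregates a subset of the intermediate tables.
    subsets = [intermediate[i::n_aggregators] for i in range(n_aggregators)]
    aggregated = [aggregate(subset) for subset in subsets]
    # Combine the aggregated data to produce the output data.
    return aggregate(aggregated)

# Assumed sample data: three groups of records, one per first-phase process.
GROUPS = [
    ["GET /home 200", "GET /about 200", "GET /home 404"],
    ["GET /home 200", "POST /login 200"],
    ["GET /about 200", "GET /home 200"],
]
result = analyze(GROUPS)
```

Counting distinct values is only one of the emit operators listed in claim 3. A maximum, a sample or a top-values operator would replace the `Counter` with a different intermediate structure while the rest of the flow stays the same.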

China bans Chinese Food for Googleplex

This is a direct result of Google’s stand on principles (see below). No Google for China means no Chinese food for Googlers. But seriously:

http://googleblog.blogspot.com/2010/01/new-approach-to-china.html

In mid-December, we detected a highly sophisticated and targeted attack on our corporate infrastructure originating from China that resulted in the theft of intellectual property from Google. However, it soon became clear that what at first appeared to be solely a security incident–albeit a significant one–was something quite different.

First, this attack was not just on Google. As part of our investigation we have discovered that at least twenty other large companies from a wide range of businesses–including the Internet, finance, technology, media and chemical sectors–have been similarly targeted. We are currently in the process of notifying those companies, and we are also working with the relevant U.S. authorities.

Second, we have evidence to suggest that a primary goal of the attackers was accessing the Gmail accounts of Chinese human rights activists. Based on our investigation to date we believe their attack did not achieve that objective. Only two Gmail accounts appear to have been accessed, and that activity was limited to account information (such as the date the account was created) and subject line, rather than the content of emails themselves.

Third, as part of this investigation but independent of the attack on Google, we have discovered that the accounts of dozens of U.S.-, China- and Europe-based Gmail users who are advocates of human rights in China appear to have been routinely accessed by third parties. These accounts have not been accessed through any security breach at Google, but most likely via phishing scams or malware placed on the users’ computers.

Algorithms and Ads: No Free Lunches and Hill Climbing

From http://www.no-free-lunch.org/

More formally, where
d = training set;
m = number of elements in training set;
f = ‘target’ input-output relationships;
h = hypothesis (the algorithm’s guess for f made in response to d); and
C = off-training-set ‘loss’ associated with f and h (‘generalization error’)
all algorithms are equivalent, on average, by any of the following measures of risk: E(C|d), E(C|m), E(C|f,d), or E(C|f,m).

How well you do is determined by how ‘aligned’ your learning algorithm P(h|d) is with the actual posterior, P(f|d).

Wolpert’s result, in essence, formalizes Hume, extends him and calls the whole of science into question.
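
A toy check of the quoted statement can be sketched in Python. Everything in it is an assumption made for illustration and not part of the source: a five-point input domain, binary targets, a fixed three-point training set, a uniform average over all 32 possible targets f, and three made-up learners. Under those assumptions, each learner ends up with the same average off-training-set error.

```python
from itertools import product

DOMAIN = [0, 1, 2, 3, 4]   # assumed tiny input space
TRAIN_X = [0, 1, 2]        # inputs seen in the training set d
TEST_X = [3, 4]            # off-training-set inputs

def always_zero(train):
    """Learner that ignores the data and always predicts 0."""
    return lambda x: 0

def majority(train):
    """Learner that predicts the most common training label everywhere."""
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones > len(train) else 0
    return lambda x: guess

def anti_majority(train):
    """Learner that predicts the least common training label everywhere."""
    ones = sum(label for _, label in train)
    guess = 0 if 2 * ones > len(train) else 1
    return lambda x: guess

def average_ots_error(learner):
    """Average off-training-set error over every possible target f."""
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    total = 0.0
    for f in targets:
        train = [(x, f[x]) for x in TRAIN_X]   # d is generated by f
        h = learner(train)                     # hypothesis made in response to d
        wrong = sum(h(x) != f[x] for x in TEST_X)
        total += wrong / len(TEST_X)
    return total / len(targets)

errors = {
    learner.__name__: average_ots_error(learner)
    for learner in (always_zero, majority, anti_majority)
}
```

The sketch says nothing about real problems, where targets are not uniformly distributed. That gap is what the remark about alignment between P(h|d) and P(f|d) points at.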

Bing Ad

Make Bing your decision engine

Google Ad

_null_

From http://en.wikipedia.org/wiki/Hill_climbing

Hill climbing is a mathematical optimization technique which belongs to the family of local search. It is relatively simple to implement, making it a popular first choice. Although more advanced algorithms may give better results, in some situations hill climbing works just as well.

Hill climbing can be used to solve problems that have many solutions, some of which are better than others. It starts with a random (potentially poor) solution, and iteratively makes small changes to the solution, each time improving it a little. When the algorithm cannot see any improvement anymore, it terminates. Ideally, at that point the current solution is close to optimal, but it is not guaranteed that hill climbing will ever come close to the optimal solution.

For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much better route is obtained.

Hill climbing is used widely in artificial intelligence, for reaching a goal state from a starting node. Choice of next node and starting node can be varied to give a list of related algorithms.
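
The traveling salesman example above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a tuned solver: the city coordinates and the starting route are made up, the start is a fixed poor route where the description uses a random one, and the only move tried is swapping the order of two cities. The loop stops when no swap shortens the route, which may well be a local optimum.

```python
import math

def tour_length(tour, cities):
    """Total length of the closed route that visits the cities in tour order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def hill_climb(cities, start_tour):
    """Keep swapping pairs of cities while a swap shortens the route."""
    tour = list(start_tour)
    best = tour_length(tour, cities)
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            for j in range(i + 1, len(tour)):
                tour[i], tour[j] = tour[j], tour[i]      # try a small change
                candidate = tour_length(tour, cities)
                if candidate < best - 1e-12:
                    best = candidate                     # keep the improvement
                    improved = True
                else:
                    tour[i], tour[j] = tour[j], tour[i]  # undo the swap
    return tour, best

# Assumed example: two clusters of cities, visited in a deliberately poor order.
CITIES = [(0, 0), (5, 5), (1, 0), (5, 4), (0, 1), (4, 5)]
START = [0, 1, 2, 3, 4, 5]
final_tour, final_length = hill_climb(CITIES, START)
```

Restarting from several random routes and keeping the best result is a common way to reduce the risk of stopping at a poor local optimum.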

Bing Ad for Hill Climbing:

Climbing at Amazon

Buy books at Amazon.com and save. Qualified orders over $25 ship free

Amazon.com/books

Google Ad for Hill Climbing Algorithm

_null_

A year after Google’s “Kill Bill” OS announcements and Ballmer’s “let’s buy our way outta here”, there still seems to be more sense in sticking to Google’s ad algorithms. Unless you want to climb Microsoft’s online hills only to find there is no free lunch in their ad rates and offers.

Like the free and virus-prone browser.

Dude, Where’s my Water!

A recent extract from the “independent” Times of India, which is privately owned and indeed the world’s largest newspaper in English:

http://timesofindia.indiatimes.com/india/West-uses-glacier-theory-to-flog-India-on-climate-change/articleshow/5482652.cms

NEW DELHI: IPCC’s admission of getting its facts on Himalayan glaciers completely wrong has again brought out concerns about the use of science, and pseudo-science, to put pressure on India to take stronger action on climate change or to put greater responsibility for the climate crisis on it.

The ‘2035 demise’ date drawn by IPCC in its fourth assessment report for Himalayan glaciers was used very often to demand that India should take greater action to reduce its emissions in order to protect people from catastrophes like glacial melts and floods. Similarly, a ‘premature’ release of information on the so-called Asian Brown Cloud was used by several western NGOs and governments to pin the blame on the melting of glaciers and other climate change impacts on pollution from burning firewood and cow dung in India.

I had pointed out the same on January 5, based on my proximity to Oak Ridge, TN and some data. See here:

https://decisionstats.wordpress.com/2010/01/05/climate-die-oxide/

1) What is the expected date of melting of the glaciers in the Himalayas, which would affect sacred rivers like the Ganges and also cause floods in densely populated Asia? How would nation states with shareable resources like water react to disputes over dams, hydroelectricity and floods?

2) How would you count per capita CO2 consumption? Assume a factory in China makes 3 tonnes of CO2 every year but exports all its products to the USA on an Indian cargo ship. Travel contributes another 1 tonne of CO2, including air travel, visits etc.

As of now this will be counted as 3 tonnes for China, 1 tonne for India and X tonnes for the USA. What is wrong with these assumptions?
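
The arithmetic in the second question can be written out as a small Python sketch. It is only an illustration: the record layout and the `attribute` helper are made up, and it assumes the USA is the final consumer of everything in the chain. X is left out because the question does not give it, so the sketch tracks only the 4 tonnes mentioned above.

```python
# Each record notes who emitted the CO2 and who consumes the resulting goods.
EMISSIONS = [
    {"activity": "manufacturing", "tonnes": 3.0, "emitter": "China", "consumer": "USA"},
    {"activity": "travel and shipping", "tonnes": 1.0, "emitter": "India", "consumer": "USA"},
]

def attribute(records, key):
    """Sum the tonnes per country, grouping by the chosen field."""
    totals = {}
    for record in records:
        country = record[key]
        totals[country] = totals.get(country, 0.0) + record["tonnes"]
    return totals

# Counting by where the CO2 is emitted, as the question says is done now.
production_based = attribute(EMISSIONS, "emitter")

# Counting by who consumes the goods (an assumed alternative).
consumption_based = attribute(EMISSIONS, "consumer")
```

The same 4 tonnes land on China and India under the first rule and on the USA under the second, so any per capita figure depends on which counting rule is chosen.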

Indeed, I gave a presentation to senior Times Group people on using data. It is available as a Google Docs presentation on my LinkedIn profile at

http://linkedin.com/in/ajayohri

Who is correct, the Indians or the Cowboys? See this NYT article:

http://www.nytimes.com/2010/01/05/science/earth/05satellite.html

The nation’s top scientists and spies are collaborating on an effort to use the federal government’s intelligence assets — including spy satellites and other classified sensors — to assess the hidden complexities of environmental change. They seek insights from natural phenomena like clouds and glaciers, deserts and tropical forests.

It is not a coincidence that this comes close on the heels of the national security function in India being totally revamped:

http://timesofindia.indiatimes.com/india/Narayanans-exit-gives-full-control-of-internal-security-to-Chidambaram/articleshow/5474408.cms

The exit of M K Narayanan as national security advisor has set the stage for a significant re-ordering of UPA-2’s power structure with home minister P Chidambaram set to gain fuller control of internal security reducing the role of the next NSA to foreign policy.

Debate and discussion between the freest and the largest democracies are welcome steps.

But who is right?

Are climate change negotiations also a proxy for negotiations on terror cooperation? As I have pointed out, the Sikhs and the Indians remain the only forces to have been in Kabul: the Sikhs in recent history (late 18th to 19th century; source: A Brief History of Sikhs) and the Indians in ancient history (8th century AD), while Churchill’s memoirs in Young Winston talk of the stellar role of the Indian Army in Afghanistan or NWFP. Remember, we have been here before: the Bush Administration negotiated and failed to get Indian troops in Iraq in 2004 over a lack of monetary negotiations, and the Indians turned out to be right on the true costs!

Are the Chinese or the Americans using India’s insecurities as a proxy?

PS, on movies: why was Paani, the documentary by Shekhar Kapur (the director of the Oscar-nominated Elizabeth), stopped due to funding issues?

How can ice melting at the North Pole lead to a lack of water? Do water projections account for the fact that rainwater harvesting has been low in India, and that ancient Indian religion is okay with the Saraswati as one river that disappeared? If the Ganges dries up, the people in India may riot, or may just blame it on sin and build smaller rainwater dams.

Dude, Where’s my water? When is it gonna go?

R for Stats: Updated

Here is the new website for statistical analysis using the free analytical software called R, which is enabled for cloud computing as well. See http://bit.ly/OhriCloud

or http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

for the R tutorial on running it on Amazon’s pay-on-demand EC2.

It is called R 4 stats or simply http://www.r4stats.com/

Hosted on Google’s updated Google Sites platform, it offers a preview of the update to Bob’s earlier runaway hit, R for SAS and SPSS Users, as well as his upcoming work, R for Stata Users.

In Bob’s own words:

I have substantially expanded the table that compares SAS and SPSS
add-on modules to somewhat equivalent R packages. This new version is
at:
http://r4stats.com/add-on-modules
and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to
http://RforSASandSPSSusers.com and includes the support files for both
“R for SAS and SPSS Users” and the new “R for Stata Users”, due out in
March from Springer.

Topic | SAS Product | SPSS Product | R Package
Advanced Models | SAS/STAT | IBM SPSS Advanced Statistics | R, MASS, many others
Association Analysis | Enterprise Miner | IBM SPSS Association | arules, arulesNBMiner, arulesSequences
Basics | Base SAS | IBM SPSS Statistics Base | R
Bootstrapping | SAS/STAT | IBM SPSS Bootstrapping | BootCL, BootPR, boot, bootRes, BootStepAIC, bootspecdens, bootstrap, FRB, gPdtest, meboot, multtest, pvclust, rqmcmb2, scaleboot, simpleboot
Classification Analysis | Enterprise Miner | IBM SPSS Classification | rattle, see the neural networks and trees entries in this table.
Conjoint Analysis | SAS/STAT: PROC TRANSREG | IBM SPSS Conjoint | homals, psychoR, bayesm
Correspondence Analysis | SAS/STAT: PROC CORRESP | IBM SPSS Categories | ade4, cocorresp, FactoMineR, homals, made4, MASS, psychoR, PTAk, vegan
Custom Tables | Base SAS, PROC REPORT, PROC SQL, PROC TABULATE, Enterprise Reporter | IBM SPSS Custom Tables | reshape
Data Access | SAS/ACCESS | SPSS Data Access Pack | DBI, foreign, Hmisc: sas.get, sasxport.get, RODBC
Data Collection | SAS/FSP | IBM SPSS Data Collection Family | RSQLite, and the other open source programs MySQL or PostgreSQL are popular among R users for this purpose.
Data Mining | Enterprise Miner | IBM SPSS Modeler (formerly Clementine) | arules, FactoMineR, rattle, various functions
Data Mining, In-database Processing | SAS In-Database Initiative with Teradata | IBM SPSS Modeler | PL/R
Data Preparation | Various procedures | IBM SPSS Data Preparation, various commands | dprep, plyr, reshape, sqldf, various functions
Developer Tools | SAS/AF, SAS/FSP, SAS Integration Technologies, SAS/TOOLKIT | IBM SPSS Statistics Developer, IBM SPSS Statistics Programmability Extension | StatET, R links to most popular compilers, scripting languages, and databases.
Direct Marketing | Nothing quite like it | IBM SPSS Direct Marketing | Nothing quite like it
Exact Tests | SAS/STAT various | IBM SPSS Exact Tests | coin, elrm, exactLoglinTest, exactmaxsel, and options in many others
Excel Integration | SAS Enterprise BI Server | IBM SPSS Advantage for Excel 2007 | RExcel
Forecasting | SAS/ETS | IBM SPSS Forecasting | Over 40 packages that do time series are described at the Task View link above under Time Series.
Forecasting, Automated | Forecast Server | IBM SPSS Forecasting | forecast
Genetics | JMP Genomics | None | http://www.bioconductor.org
Geographic Information Systems | SAS/GIS, SAS/GRAPH | None (Maps is defunct) | maps, mapdata, mapproj, GRASS via spgrass6, RColorBrewer, see Spatial in Task Views at link at top
Graphical user interfaces | Enterprise Guide, IML Studio, SAS/ASSIST, Analyst, Insight | IBM SPSS Statistics Base | Deducer, JGR, R Commander, pmg, rattle, many others at http://www.sciviews.org/_rgui/
Graphics, Interactive | SAS/IML Studio, SAS/INSIGHT, JMP | None | GGobi via rggobi, iPlots, latticist, playwith
Graphics, Static | SAS/GRAPH | SPSS Base, Graphics Production Language | ggplot2, gplots, graphics, grid, gridBase, hexbin, lattice, plotrix, scatterplot3d, vcd, vioplot, geneplotter, Rgraphics
Graphics, Template Builder | Doesn’t use Grammar of Graphics model that forms the core of IBM SPSS Viz Designer or R’s ggplot2 | IBM SPSS Viz Designer | Doesn’t use templates, but this GUI for ggplot2 http://www.stat.ucla.edu/~jeroen/ggplot2.html works similarly to IBM SPSS Viz Designer.
Guided Analytics | SAS/LAB | None | None
Matrix/linear Algebra | SAS/IML Studio | IBM SPSS Matrix | R, matlab, Matrix, sparseM
Missing Values Imputation | SAS/STAT: PROC MI | IBM SPSS Missing Values | amelia, Hmisc: aregImpute, EMV, rms (replaces Design): fit.mult.impute, mice, mitools, mvnmle, VIM
Neural Networks | Enterprise Miner | IBM SPSS Neural Networks | AMORE, grnnR, neuralnet, nnet, rattle
Operations Research | SAS/OR | None | glpk, linprog, LowRankQP, TSP
Power Analysis | SAS Power and Sample Size Application, SAS/STAT: PROC POWER, PROC GLMPOWER | SamplePower | asypow, powerpkg, pwr, MBESS
Quality Control | SAS/QC | IBM SPSS Statistics Base | qcc, spc
Regression Models | SAS/STAT | IBM SPSS Regression | R, Hmisc, lasso, VGAM, pda, rms (replaces Design)
Sampling, Complex | SAS/STAT: PROC SURVEY SELECT, SURVEYMEANS, etc. | IBM SPSS Complex Samples | pps, sampfling, sampling, spsurvey, survey
Segmentation Analysis | Enterprise Miner | IBM Modeler Segmentation | cluster, rattle, som, see CRAN Task Views under Cluster for over 70 packages
Server Version | SAS for your particular server | IBM SPSS Statistics Server, IBM SPSS Modeler Server | rapache, R(D)COM Server, Rserve, StatET
Structural Equation Modeling | SAS/STAT: PROC CALIS | Amos | OpenMX, sem
Text Analysis/Mining | Text Miner | IBM SPSS Text Analytics, IBM SPSS Text Analysis for Surveys | Rstem, las, tm
Trees, Decision, Classification or Regression | Enterprise Miner | IBM SPSS Decision Trees, IBM SPSS AnswerTree, IBM SPSS Modeler (formerly Clementine) | ada, adabag, BayesTree, boost, GAMboost, gbev, gbm, maptree, mboost, mvpart, party, pinktoe, quantregForest, rpart, rpart.permutation, randomForest, rattle, tree

All SAS and SPSS product names are registered trademarks of their respective companies.

Disclaimer: Bob Muenchen and I work for the same university. While we often have interesting conflicts, his interview was one of the earliest on this blog.

See: http://sites.google.com/site/r4statistics/interview

3 Idiots: Insight to Indian Engineer Campus Life

Ever wondered what makes Indian engineers so, ahem, hard-working? Or are you just in the mood to sample a Bollywood movie? Here is 2009’s best movie, an all-time top grosser from the Oscar-nominated Aamir Khan.

It’s called 3 Idiots and is loosely based on Chetan Bhagat’s book about 3 IIT friends. It follows the adventures of 3-5 engineering students as they face academic and peer pressure challenges. Awesome.

Here is a preview of the video:

(Note the students praying for good grades).