Laptops and Supercomputers

The One Laptop per Child project is having its annual Give One Get One promotion. Basically, you pay $400, you get one XO laptop, and another XO laptop is donated in your name in a developing country. For the technically minded, there is a great review of the XO laptop online.


On a slightly different scale are NVIDIA GPU-powered computers (as opposed to CPU-powered ones). They are available for purchase now.

These and the forthcoming series of NVIDIA-powered GPUs are going to deliver extremely high speeds at a price of around 10,000 USD. How high is the speed?

Well, here is a case study from the NYT:

"Techniscan Medical Systems of Salt Lake City has turned to Nvidia’s graphics processors to speed up a three-dimensional breast scanning device that could be used for cancer detection if the machine received regulatory approval. Techniscan must turn tens of gigabytes of raw data generated by transmitting pulses of energy through a breast submerged in water into medical image files that consume just 100 megabytes. This whole process used to take a couple of hours using Intel’s processors and now takes just 15 minutes with Nvidia’s hardware."

That is roughly an eightfold speedup: about 120 minutes down to 15.


And here, finally, is your desktop supercomputer: the Tesla from NVIDIA.

"Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores."

Now, data mining and analytics people love processing power, and with this much of it the work can be quite a lot of fun!

So if you deal with more than 1 GB of data at a time, or run more than 10 PCs or 2 servers, try the Tesla.
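R itself has no native CUDA hooks, so here is only a CPU-side analogy: a minimal sketch using R's bundled parallel package (assuming a Unix-like system; on Windows, mclapply requires mc.cores = 1). It shows the same fan-out/fan-in pattern the Tesla runs across hundreds of GPU cores.

library(parallel)

# Toy workload: bootstrap the mean of a million values, 200 times
x <- rnorm(1e6)
boot_mean <- function(i) mean(sample(x, length(x), replace = TRUE))

# Fan the replicates out across the available CPU cores
n_cores <- max(1, detectCores() - 1)
results <- mclapply(1:200, boot_mean, mc.cores = n_cores)
summary(unlist(results))

Swap the toy bootstrap for a scoring or simulation step and the appeal of 960 cores instead of 4 becomes obvious.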

Tesla Architecture

  • Massively parallel many-core architecture
  • 240 scalar processor cores per GPU
  • Integer, single-precision and double-precision floating point operations
  • Hardware Thread Execution Manager enables thousands of concurrent threads per GPU
  • Parallel shared memory enables processor cores to collaborate on shared information at local cache performance
  • Ultra-fast GPU memory access with 102 GB/s peak bandwidth per GPU
  • IEEE 754 single-precision and double-precision floating point
  • Each Tesla C1060 GPU delivers 933 GFLOPS single-precision and 78 GFLOPS double-precision performance

Software Development Tools

  • C language compiler, debugger, profiler, and emulation mode for debugging
  • Standard numerical libraries for FFT (Fast Fourier Transform), BLAS (Basic Linear Algebra Subroutines), and CuDPP (CUDA Data Parallel Primitives)

Product Details

  • 3 or 4 Tesla C1060 Computing Processors with 4GB of dedicated memory per GPU
  • 2.33 GHz or faster quad-core AMD Phenom or Opteron, or quad-core Intel Core 2 or Xeon
  • Minimum system memory: 12 GB for 3 Tesla C1060s and 16 GB for 4 Tesla C1060s (at least 4 GB per Tesla C1060)
  • 1200-1350 W power supply
  • Acoustics: < 45 dBA

Supported Platforms

  • Microsoft® Windows® XP 64-bit and 32-bit (64-bit recommended)
  • Linux® 64-bit and 32-bit (64-bit recommended)
    • Red Hat Enterprise Linux 4 and 5
    • SUSE 10.1, 10.2 and 10.3


Review – R for SAS and SPSS Users



Introduction: Even though R is a very powerful tool and is free, people with a SAS or SPSS background often have trouble adapting to the R language. That is because all data objects in SAS and SPSS sit in a fixed rectangular layout, and the programmer just strings together a series of pre-built procedures to get results. In R, the flexibility of functions and the sheer diversity of the language can confuse and confound a SAS or SPSS programmer wanting to learn it. Note that most SAS and SPSS programmers are corporate users: licenses are paid for simply by signing off on an approval email, and they have a paucity of time. A small illustration of the contrast follows.
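A minimal sketch of that contrast in base R (my own toy data, nothing book-specific): the data frame is R's rectangular dataset, but an R object need not be rectangular at all.

# The familiar rectangle, analogous to a SAS or SPSS dataset
df <- data.frame(id = 1:3, score = c(90, 85, 88))
str(df)

# ...whereas an R list can bundle unlike things, including a fitted model
flexible <- list(scores = df$score,
                 label  = "midterm",
                 model  = lm(score ~ id, data = df))
flexible$label        # a string
coef(flexible$model)  # coefficients pulled straight from the model object

That one list has no equivalent dataset in SAS or SPSS, and this is exactly where newcomers stumble.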






The technical review:

The book is lucid and exhaustive, and it lists the reasons for and against R in an objective, scientific manner. It goes into great detail, includes ready datasets, and offers the earlier reference sheet from its website. At $75 it is not expensive considering the cost of other textbooks in this domain. Having both SAS and SPSS can be a distraction, as many SPSS users actually use its point-and-click interface rather than write raw syntax; perhaps those screenshots should be included.

It thus gives you the side effect of teaching you twice the languages you wanted to learn, but that's a good thing. Of course, you can choose to ignore the second language if you don't want to learn it. Maybe this book could be split into an R for SAS Users and an R for SPSS Users separately (unless Muenchen is trying an agenda of unifying the whole analytics world, a common theme in November 2008 in the USA 😉).

The book is very easy to understand, providing a step-by-step way of learning R thoroughly. In addition, it uses screenshots extensively to make its points. However, this sometimes slows the book's pace, as it resorts to oversimplification for advanced SAS users. It does have a good reference guide for people who just need R for particular functions, like graphics, to be used in combination with WPS or Base SAS; you can simply pick and choose.

However, it would have been great if there were a CD version. Perhaps there will be in the next edition.

An additional point is that it tries to briefly explain advanced statistical functions (and their SAS/STAT equivalents). Some more depth in this section, especially on the very popular logistic and linear regression techniques, would have helped (i.e., a chapter on how to build, validate, and test a risk-scoring model using R).

However, you can use the existing chapters to get started on the modeling and arrive at solutions iteratively; a minimal taste of what that looks like appears below. Some more coverage of graphical user interfaces would also have helped, even though that might tempt many people to simply take the easy route of clicking for results in the R Commander and Rattle GUIs.
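As a bare-bones taste of what such a chapter might cover (a sketch on simulated data, not an example from the book): a logistic scoring model in base R.

# Simulated applicants: default flag, income, and debt
set.seed(42)
d <- data.frame(default = rbinom(200, 1, 0.3),
                income  = rnorm(200, mean = 50, sd = 10),
                debt    = rnorm(200, mean = 20, sd = 5))

# Build: logistic regression via glm()
fit <- glm(default ~ income + debt, data = d, family = binomial)
summary(fit)

# Score: predicted probability of default for each applicant
d$score <- predict(fit, type = "response")

Validation and out-of-time testing would follow the same pattern on held-out data.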


Overall: a great textbook. You would need to spend an hour a day for as much as three weeks to master it, and once you do, it will be worth it.

I highly recommend this book for both technical and business users, and for libraries as a reference guide. Both corporate and academic users will relate to the readability of the book, and Amazon is also launching a Kindle version soon.



The business case review:


In these troubled economic times, if the choice is between you cutting your software costs or your supervisor having to cut both software AND employee costs, would you take the time and $50 to learn two additional analytics platforms? Because that is what this book by Muenchen does: it can teach SAS to R programmers, and SPSS to both. It's almost a triple-pack combo.


Muenchen spent two years creating a textbook and off-the-shelf manual for all three languages. Given that the Base SAS language is also supported by WPS (a software package that costs only 600 dollars a year and reads/writes SAS code and datasets), this book effectively covers both the SAS and WPS languages.

Wait a minute! That's what all advertisers say… but this is an honest review. (Gandhian economics: if the price of everything were set by demand and supply, the price of honesty would be infinite, because the supply is so low and the demand so high.)


Buying this book for yourself or a friend could thus save your organization a lot of money, give you additional skills on your resume, and give you bragging rights at the water cooler for knowing three languages (SAS, SPSS, and R), all for the price of $75.




Cutting Employee Costs, Not Employees




  • Use more contractors. Use company alumni rather than out-of-town contractors, as this saves the time it takes them to get up to speed.


  • Use Skype rather than phones. Get over the headphone discomfort; it is better than the unemployment discomfort.


  • Use WebEx meetings rather than flying people around. The money you save for your company will help you retain others, and yourself. Most of the airlines' money goes to foreign oil providers anyway.


  • Shop for cheaper open source software rather than blindly approving annual license fees. This can save you more money than you think and is less complex than it sounds. Don't go for a 100% either-or solution; for example, replace 20% of your desktops with OpenOffice rather than renewing Microsoft Office on all of them.


  • Use more variable, performance-linked bonuses rather than flat salaries. This helps separate the grain from the chaff. And improve revenue and cut costs rather than employees.



Each unemployed person is a drag on the economy and will draw social security that YOU will pay for anyway. Employed people spend money on groceries and movies, and provide even more jobs and potential markets.


Mahatma Gandhi once said – Be Indian, Buy Indian. It helped India get rid of the British Empire.

Once in a while, think about paying a small premium for being American, buying American.

A Framework for Diners

I thought of opening my own diner, but how do I decide where to locate it?
Note that this is a generic model that can also be used for things like shopping malls, or even a service dealership, or any facility whose business depends on nearby population density.
Here is the approach, called the Ohri Dhaba Framework (in case you wish to use it: I have the copyright, thank you):
1) Take Google Earth or any GIS data for mapping specific zones of the city.
2) Use market research to get population density, income levels, ethnic preferences, and age range, as well as probable customers for that kind of food/service/mall.
3) Use a KML (that's the Google Earth format) parser to recover the longitude and latitude of each zone's centroid.
4) Add additional columns to each zone for other facilities of the same kind (other malls in the area).
5) Keep a shortlist of existing as well as proposed new sites.
6) Draw two circles, for 3 km and 10 km penetration of the service, around the existing and proposed sites.
7) Using the great-circle (haversine) formula with the Earth's radius, convert longitude and latitude into distances in km.
8) Express each site's catchment as a mix of zonal areas (as in area1_3km = zone1*A1 + zone2*A2 …).
9) Run a regression on known sites (whose customers are known) to get eventual customers, error terms, and coefficients for the additional columns from step 4.
10) Project the probable customers of the proposed sites.
11) Use a feedback loop over time for validation, adjusting via the percentage error term.
This is a framework for choosing sites using rules rather than gut instinct or pricing dynamics; a rough R sketch of steps 7 through 10 follows below. I am not sure how existing diners or auto dealerships choose locations, but that seems driven by perceived demand for new sites and load on existing facilities rather than anything scientific. This may or may not be patentable, as someone like X mart may have already built a facility-locating algorithm; to the best of my knowledge there are many tools that help you locate a site, but no framework as such.
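A rough R sketch of steps 7 through 10 (the zones, sites, and driver columns below are hypothetical names for illustration, not part of the framework itself):

# Step 7: haversine great-circle distance (km) between lat/long points
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Assume 'zones' holds centroids plus the step-2 drivers, and 'sites'
# holds known locations with observed customer counts:
#   zones: zone_id, lat, lon, pop_density, income, competitors
#   sites: site_id, lat, lon, customers

# Steps 6 and 8: aggregate the drivers of zones inside one site's 3 km ring
d <- haversine_km(sites$lat[1], sites$lon[1], zones$lat, zones$lon)
drivers <- colSums(zones[d <= 3, c("pop_density", "income", "competitors")])

# Step 9: regress known customer counts on drivers aggregated per site
fit <- lm(customers ~ pop_density + income + competitors, data = site_drivers)

# Step 10: project probable customers at a proposed site
predict(fit, newdata = proposed_site_drivers)

Here site_drivers and proposed_site_drivers are data frames built by repeating the ring aggregation for every known and proposed site.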

R for SAS and SPSS Users

Update: The R for SAS and SPSS Users book is one week away. It is an analytics textbook and can be used as a reference for R, SPSS, and SAS, and even as an introduction to an object-oriented programming language. Buying it can potentially strengthen your skills and resume in SAS, R, and SPSS alike.

Note: Robert Muenchen (pronounced Min'-chen) is the author of the famous R for SAS and SPSS Users, and his forthcoming book is an extensive tutorial for anyone wanting to learn SAS, SPSS, or R, or to migrate from one platform to another. In an exclusive interview, Bob agreed to answer some questions on the book and on students planning to enter science careers.

What made you write R for SAS and SPSS Users?

The book-

A few years ago, all my colleagues seemed to be suddenly talking about R. Had I tried it? What did I think? Wasn't it amazing? I searched around for a review and found an article by Patrick Burns, "R Relative to Statistics Packages," which is posted on the UCLA site. That article pointed out the many advantages of R, and in it Burns claimed that knowing a standard statistics package interfered with learning R. That article really got my interest up. Pat's article was a rejoinder to "Strategically using General Purpose Statistics Packages: A Look at Stata, SAS and SPSS" by Michael Mitchell, then the manager of statistical consulting at UCLA (it's at that same site). In it he said little about R, other than that he had "enormous difficulties" learning it and had found the documentation especially lacking.

I dove in and started learning R. It was incredibly hard work, most of which was caused by my expectations of how I thought it ought to work. I did have a lot to “unlearn” but once I figured a certain step out, I could see that explaining it to another SAS or SPSS user would be relatively easy. I started keeping notes on these differences for myself initially. I finally posted them on the Internet as the first version of R for SAS and SPSS Users. It was only 80 pages and much of its explanation was in the form of extensive R program comments. I provided 27 example programs, each done in SAS, SPSS and R. A person could see how they differed, topic by topic. When a person ran the sections of the R programs and read all the comments, he or she would learn how R worked.

A web page counter on that document showed it was getting about 10,000 hits a month. That translates into about 300 users, paging back and forth through the document. An editor from Springer emailed me to ask if I could make it a book. I said it might be 150 pages when I wrote out the prose to replace all the comments. It turned out to be 480 pages!

What are the salient points in this book?

The main point is that having R taught to you using terms you already know will make R much easier to learn. SAS and SPSS concepts are used in the body of the book as well as the table of contents, the index and even the glossary. For example, the table of contents has an entry for “Value Labels or Formats” even though R uses neither of those terms as SPSS and SAS do, respectively. The index alone took over 80 hours to compile because it is important for people to be able to look up things like “length” as both a SAS statement and as an R function. The glossary defines R terms using SAS/SPSS jargon and then again using proper R definitions.
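A minimal illustration of that one entry (my example, not the book's): what SPSS does with value labels and SAS with formats, R does with factors.

# Numeric codes with attached labels, the R way
gender <- c(1, 2, 2, 1)
gender_f <- factor(gender, levels = c(1, 2), labels = c("Male", "Female"))
table(gender_f)   # tabulates by label, not by code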

SAS and SPSS each have five main parts: 1) commands to read and manage data, 2) procedures for statistics & graphics, 3) output management systems that allow you to use output as input to other analyses, 4) a macro language to automate the above steps and finally 5) a matrix language to help you extend the packages. All five of these parts use different statements and rules that do not apply to the others. Due to the complexity of all this, many SAS and SPSS users never get past the first two parts.

R instead has all these functions unified into a common single structure. That makes it much more flexible and powerful. This claim may seem to be a matter of opinion, but the evidence to back it up comes from the companies themselves. The developers at SAS Institute and SPSS Inc. don’t write their procedures in their own languages, R developers do.
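A small example of that unification, using R's built-in mtcars data (my sketch, not Muenchen's): a fitted model is just another object, so its output feeds straight back in as input, with no separate output management system or macro language required.

# The "procedure": fit a regression
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Its output, used directly as input to further steps
r2 <- summary(fit)$r.squared
scored <- data.frame(mtcars, resid = residuals(fit))
head(scored[order(abs(scored$resid), decreasing = TRUE), ])  # biggest misses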

How do you think R will impact the statistical software vendors?

With more statistical procedures than any other package, and its free price, some people think R will put many of the proprietary vendors out of business. R is a tsunami coming at the vendors and how they respond will determine their future. Take SPSS Inc. for example. They have written an excellent interface to R that lets you transfer your data back and forth, letting you run R functions in the middle of your SPSS programs. I show how to use it in my book. Starting with SPSS 17, you can also add R functions to the SPSS menus. This is particularly important because most SPSS users prefer to use menus. The company itself is adding menus to R functions, letting them rapidly expand SPSS’ capabilities at very little expense. They saw the R tsunami coming and they hopped on a surfboard to make the most of it. I think this attitude will help them thrive in the future.

SAS Institute has so far been ignoring R. That means if you need to use an analytic method that is only available in R, you must learn much more R than an SPSS user would. Once you have done that, you might be much more likely to switch over completely to R. Colleagues inside SAS Institute tell me they are debating whether they should follow SPSS' lead and write a link to R. This has already been done by MineQuest, LLC with their amusingly named "A Bridge to R" product (playing off "A Bridge Too Far").

Statistica is officially supporting R; you can read about the details on their site. StataCorp has not supported R in Stata yet, although a user, Roger Newson, has written an R interface to it.

The company with the most to lose is the maker of S-PLUS. That was Insightful Corp. until they were recently bought out by Tibco. Since R is an implementation of the S language, S-PLUS could be hit pretty hard. On the other hand, they do have functions that handle "big data," so there is a chance that people will develop programs in R, run out of memory, and then end up porting them to S-PLUS. S-PLUS also has a more comprehensive graphical user interface than R does, giving them an advantage. However, XL-Solutions Corp. has their new R-PLUS version that adds a slick GUI to R. There could be a rocky road ahead for S-PLUS. IBM faced a similar dilemma when computing hardware started becoming a commodity; they prospered by making up the difference with service income. Perhaps Tibco can too.

Do you have special discounts for students?

My original version of R for SAS and SPSS Users is still available online, so students can get it there for free. The book version has a small market that is mostly students, so pricing was set with that in mind.

What made you choose a career in science, and what have been the reasons for your success in it?

I started out as an accounting major. I was lucky enough to have had two years of bookkeeping in high school, and I worked part-time in the accounting department of ServiceMaster Industries for several years. I got to fill in for whoever was on vacation, so I got a broad range of accounting experience. I also got my first experience with statistics by helping the auditors. We took a stratified sample of transactions: with transactions divided into segments by their value, we sampled a greater proportion as the value increased. For the most expensive transactions, we examined them all. My job was to be the "gofer" who collected all the invoices, checks, etc. to prove that the transactions were real. For a kid in high school, that was great fun!

By the time I was a freshman at Bradley University, I became excited by three new areas: mathematics, computing and psychology. I got to work in a lab at the Peoria Addictions Research Institute, studying addiction in rats and the parts of the brain that were involved. I wrote a simple stat package in FORTRAN to analyze data. After getting my B.A. in psychology, I worked on a PhD in Educational Psychology at Arizona State University. I loved that field and did well, but the job market for professors in that field was horrible at the time. So I transferred to a PhD program in Industrial/Organizational Psychology at The University of Tennessee. It turned out that I did not really care for that area at all, and I spent much of my time studying computing and calculus. My assistantship was with the Department of Statistics. By the time my first year was up, I transferred to statistics. At the time the department lacked a PhD program, so after four years of grad school I stopped with an M.S. in Statistics and got a job as a computing consultant helping people with their SAS, SPSS and STATGRAPHICS programs. Later I was able to expand that role, creating a full-fledged statistical consulting center in partnership with the Department of Statistics. Ongoing funding cuts have been chipping away at that concept though.

What made me a success? I love my job! I get to work with a lot of smart scientists and their grad students, expanding scientific knowledge. What could be better?

Science is boring and not a well-paying career compared to being a lawyer or working in sales. People think you are a nerd. Please comment based on your experiences.

Science is constantly making new discoveries. That’s not boring! An area that most people can relate to is medicine. When we finish a study that shows a new treatment is better than an old one, our efforts will help thousands of people. In one study we compared a new, very expensive anti-nausea drug to an old one that was quite cheap. The pharmaceutical company claimed the new drug was better of course, but our study showed that it was not. That ended up helping to control health care costs that we all see escalating rapidly.

Another study found, for the first time, a measure that could predict how well a hearing aid would help a person. Now, it's easy to measure a hearing aid and see that it is doing what it is supposed to do, but a huge proportion of people who buy them don't like them and stop wearing them after a brief period. Scientists tried for decades to predict which people would not be good candidates for hearing aids. A very sharp scientist at UT, Anna Nabelek, came up with the concept of Acceptable Noise Level. We measured how much background noise people were willing to tolerate before trying a hearing aid. That allowed us to develop a model that could predict well, for the first time, whether someone should bother spending up to $5,000 on hearing aids. For retired people on a fixed income, that was an important finding. An audiology journal devoted an entire issue to the work.

It's true that you can make more money in many other fields. But the excitement of discovery and the feeling that I'm helping to extend science are very satisfying and well worth the lower salary. Plus, having a job in science means you will never have a chance to get bored!

What is your view on Rice University's initiative to create open source textbooks?

I think this is a really good idea. One of my favorite statistics books is Statnotes: Topics in Multivariate Analysis, by G. David Garson. You can read it for free online.

Universities pay professors to spend their time doing research, which must be published to get credit. So why not pay professors to write textbooks too? There have probably been hundreds of introductory books in every imaginable field. They cannot all make it in the marketplace, so when they drop out of publication, why not make them available for free? I still have my old introductory statistics textbook from 30 years ago, and the material is still good. It may be missing a few modern things like boxplots, but it would not take much effort to bring it up to date.

I'm also a huge fan of Project Gutenberg, a collection of over 20,000 books, articles, etc. available for free download. My wife does volunteer project management and post-processing with Distributed Proofreaders, which supplies books for Gutenberg.

What are your views on students uploading scanned copies of books to torrent sharing web sites because books are so expensive?

The cost of textbooks has gotten out of hand. I think students should pressure universities and professors to consider cheaper alternatives. However, scanning books and putting them up on web sites isn't sharing, it's stealing. I put in most of my weekends and nights for 2½ years on my book, which will be lucky to sell a few thousand copies. That works out to pennies per hour. Seeing it scanned in would be quite depressing.

When is the book coming out? What is taking so long?

We ran into problems when the book was translated from Microsoft Word to LaTeX. The translator program did not anticipate that an index would already be in place. That resulted in 2-3 errors per page. We’re working through that and should finally get it printed in early October.


Robert A. Muenchen is a consulting statistician with 28 years of experience. He is currently the manager of the Statistical Consulting Center at the University of Tennessee. He holds a B.A. in Psychology and an M.S. in Statistics. Bob has conducted research for a variety of public and private organizations and has assisted on more than 1,000 graduate theses and dissertations. He has coauthored over 40 articles published in scientific journals and conference proceedings. Bob has served on the advisory boards of SPSS Inc., the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. His research interests include statistical computing, data graphics and visualization, text analysis, data mining, psychometrics and resampling.

Ajay: He is also a very modest and great human being.

SQL Server for the Clouds


Is Microsoft really going the cloud way?

Thanks to the post that brought this to my notice.

You load data from a local .csv file; I used the Microsoft demo dataset. Then you work through the top tabs in series. All screenshots belong to Microsoft Corp.




Want more? Visit Microsoft's site.


Good ole Microsoft. And God bless Bill and Jerry (from the ads!)

Online Analytics - June Dershewitz

June Dershewitz

One of the World's Leading and Best-Known Authorities on Web Analytics

1) What's the latest trend you see in online analytics over the next year, and the next three to five years?

I strongly believe that web analytics is on its way to becoming business analytics. In the early days we were solely focused on analyzing clickstream data, but in recent years we've relaxed our definition of web analytics to include things like voice of customer, offline outcome data, and multivariate testing. More and more I hear people talking about how online customer interaction fits in with the overall goals of the business, rather than as an isolated island of activity. In the future I think we'll see less of a distinction between traditional business intelligence and what we currently consider to be the separate field of web analytics.

2) Tell us how you came into this field of work, and what factors made you succeed.

I entered the field of web analytics in 1999. Like many people who got their start at that time, it happened totally by chance. I had applied for a job as a web developer, but the interviewer thought I’d be perfect for another open position – as a web analyst. I took it just to see what it was like. Here it is a decade later and I’m still in web analytics – so I guess you could say it worked out.

Why is web analytics a natural match for me? Well, I've always felt quite comfortable…