R for Business Analytics now in Chinese

Email from Springer to me:


Dear Dr. Ohri,

Springer SBM is pleased to inform you that we have concluded a contract for the Chinese translation of your book:
R for Business Analytics
Edition Number: 1 (2013)
Mr. A Ohri

We trust you are as enthusiastic about this opportunity to distribute your book as we are.

The publisher of the translation is: Xi’an Jiaotong University Press.

The financial conditions we agreed upon are:
A flat fee of EUR XXXX  for 3,000 copies, payable upon conclusion of the agreement but not later than 60 days thereafter. (Please note that this fee is subject to tax deductions of 15.77%, imposed by the licensee’s country.)

Upon receiving the payment, we will ask our accounts department to transfer your shares to you, according to your contract with Springer. The share will then be shown on the next royalty statement you’ll receive.

Upon publication you would receive 4 complimentary copies.

In case you have further questions, please do not hesitate to contact us.

With best wishes,


Rights and Permissions



The dichotomy in being a writer on open source with a non-open access publisher

  • The publisher adds credibility to your work


  • A self-fulfilling prophecy where researchers want to publish in exclusive journals and closed-access books for the sole reason that others did so before them, and thereby donate their knowledge and money to the publisher


The dichotomy in being a writer on open source with a non-open access publisher?

  • I write on open source R,
  • and I have been published (one book)
  • and am on contract to write two more (R for Cloud Computing) and (R for Web and Social Media Analytics)
  • My publisher does have open access journals.
  • But the book is at $50. Most of India lives on less than $2 per day. That’s 800 million people in my country alone.

But the publisher is the most reputed in this field. So what are my choices? How do I give more people the choice to read books?

Take open knowledge, curate it, and put it behind a $50 paywall. I am sorry, Aaron. People like me are the reason ……


Writing a technical book

This is a fairly concise collection of notes on how to write a technical book. It may seem arrogant for a one-book author like me to do so, but I get a lot of queries on this and there seems to be a fair amount of information asymmetry about the process. I have experience with getting rejected and accepted in both creative and technology domains, but I will make this post fairly tech specific.

Books I have Written

Poetry (Self Published)

  • In Case I Don't See You Again
  • Corporate Poetry
  • Poets & Hackers (e-book)

Technology (Published)

  • R for Business Analytics

(Currently Writing)

  • R for Cloud Computing (Springer) – Due 2013
  • R for Web Analytics and Social Media Analytics (Springer) – Due 2014

Top 5 Myths on Writing and Getting Published

  • Publishers don’t like unsolicited manuscripts.

Well, they don’t like unsolicited manuscripts from total unknowns. This is also very domain specific. Whether you are writing a novel, a poetry book, or a technical book, approval rates will depend on current interest in that domain.

Advice– If you are a first-time author-to-be, choose a niche domain that you are passionate about and that has been generating some buzz lately. It could be Python, D3, R, etc.

  • Publishers get all the money

No, they don’t make that much money compared to a Hollywood studio. Yes, books are expensive, but they are basically funding a whole supply chain that may or may not be efficient. Your book is subsidizing all the books that didn’t sell. Proofreading and editing are not very glamorous jobs, but they take a long time and are expensive. I have much more respect for editors now than I did, say, 3 years ago. The ultimate in supply chain efficiency would be if each and every hard copy was printed on demand, and each and every soft copy was priced efficiently given its price elasticity. Pricing analytics on dynamic book pricing (like on Amazon)... hmm.

  • Writers get all the money

You would be lucky to get more than 14% of the gross selling price of a hard copy, or more than 40% of that of an electronic book. If you want to make money, don’t write technical books; write white papers and make webinars.

  • Writers get no money

You don’t make money by writing a technical book, but your branding does go up significantly, and you can now charge for training, webinars, talks, conferences, white papers, articles. These alternatives can help you survive.

  • I got a great idea- but I keep getting rejected. That guy had a lousy idea, but he keeps writing.

THAT guy wrote a great proposal, spent time building his brand, and wrote interesting stuff. Publishers like to sell books, not ideas. Writer jealousy and insecurity are part of the game. You have a limited amount of energy in a day; spend it writing or spend it reading. Ideally do both.

Book Publication

The book publication process has three parts-

1) Proposal

2) Manuscript

3) Editing

1) Proposal- Write an awesome proposal. Take tips from the publisher's website. Work out which publisher is most interested in publishing the topic (hint- go to all the websites). If those publisher websites are confusing you, jump to the FAQ.

Some publishers I think relevant to technical books-




2) Manuscript- Write daily. 300 words. 300 times. That’s a manuscript. It is tough for people like us. Hemingway had it easy. I used a LaTeX GUI called LyX (http://www.lyx.org/) for writing. You may choose your own tool, style, time of day or night, cafe, or room to spur your creative juices.

3) Editing- You will edit, chop, re-edit and rewrite a book many times. It is ok. Make it readable is my advice. Try to think of a non-technical person and explain your book to them to clear your ideas.

Once your proposal is accepted, you sign a contract for royalty and copyright.

Once the contract is signed, you write the manuscript. This also involves a fair amount of research, citations, and folder management to keep your book figures and citations ready. I generally write the citation then and there within the book, and then organize them later chapter by chapter. Un-cited work leads to charges of plagiarism, which is the buzz kill for any author. Write, Cite, Rewrite.

You will also need to create an index (this can be done by software) so people can navigate the book better, and an appendix for hiding all the stuff you couldn’t leave behind.

Once you submit the manuscript, you choose the cover, discuss the rewrites with the editor, make the changes suggested, and resend the manuscript files. Then count about six months until publication. Send copies to people you like who can help spread the word on your book. Wait for reviews, engage positively with everyone, then wait for sales figures. Congrats- you are a writer now!




Interview Rob J Hyndman Forecasting Expert #rstats

Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.

Ajay- Describe your journey from being a student of science to a Professor. What were some key turning points along that journey?
Rob- I started a science honours degree at the University of Melbourne in 1985. By the end of 1985 I found myself simultaneously working as a statistical consultant (having completed all of one year of statistics courses!). For the next three years I studied mathematics, statistics and computer science at university, and tried to learn whatever I needed to in order to help my growing group of clients. Often we would cover things in classes that I’d already taught myself through my consulting work. That really set the trend for the rest of my career. I’ve always been an academic on the one hand, and a statistical consultant on the other. The consulting work has led me to learn a lot of things that I would not otherwise have come across, and has also encouraged me to focus on research problems that are of direct relevance to the clients I work with.
I never set out to be an academic. In fact, I thought that I would get a job in the business world as soon as I finished my degree. But once I completed the degree, I was offered a position as a statistical consultant within the University of Melbourne, helping researchers in various disciplines and doing some commercial work. After a year, I was getting bored doing only consulting, and I thought it would be interesting to do a PhD. I was lucky enough to be offered a generous scholarship which meant I was paid more to study than to continue working.
Again, I thought that I would probably go and get a job in the business world after I finished my PhD. But I finished it early and my scholarship was going to be cut off once I submitted my thesis. So instead, I offered to teach classes for free at the university and delayed submitting my thesis until the scholarship period ran out. That turned out to be a smart move because the university saw that I was a good teacher, and offered me a lecturing position starting immediately I submitted my thesis. So I sort of fell into an academic career.
I’ve kept up the consulting work part-time because it is interesting, and it gives me a little extra money. But I’ve also stayed an academic because I love the freedom to be able to work on anything that takes my fancy.
Ajay- Describe your upcoming book on Forecasting.
Rob- My first textbook on forecasting (with Makridakis and Wheelwright) was written a few years after I finished my PhD. It has been very popular, but it costs a lot of money (about $140 on Amazon). I estimate that I get about $1 for every book sold. The rest goes to the publisher (Wiley) and all they do is print, market and distribute it. I even typeset the whole thing myself and they print directly from the files I provided. It is now about 15 years since the book was written and it badly needs updating. I had a choice of writing a new edition with Wiley or doing something completely new. I decided to do a new one, largely because I didn’t want a publisher to make a lot of money out of students using my hard work.
It seems to me that students try to avoid buying textbooks and will search around looking for suitable online material instead. Often the online material is of very low quality and contains many errors.
As I wasn’t making much money on my textbook, and the facilities now exist to make online publishing very easy, I decided to try a publishing experiment. So my new textbook will be online and completely free. So far it is about 2/3 completed and is available at http://otexts.com/fpp/. I am hoping that my co-author (George Athanasopoulos) and I will finish it off before the end of 2012.
The book is intended to provide a comprehensive introduction to forecasting methods. We don’t attempt to discuss the theory much, but provide enough information for people to use the methods in practice. It is tied to the forecast package in R, and we provide code to show how to use the various forecasting methods.
The idea of online textbooks makes a lot of sense. They are continuously updated so if we find a mistake we fix it immediately. Also, we can add new sections, or update parts of the book, as required rather than waiting for a new edition to come out. We can also add richer content including video, dynamic graphics, etc.
For readers that want a print edition, we will be aiming to produce a print version of the book every year (available via Amazon).
I like the idea so much I’m trying to set up a new publishing platform (otexts.com) to enable other authors to do the same sort of thing. It is taking longer than I would like to make that happen, but probably next year we should have something ready for other authors to use.
Ajay- How can we make textbooks cheaper for students as well as compensate authors fairly?
Rob- Well free is definitely cheaper, and there are a few businesses trying to make free online textbooks a reality. Apart from my own efforts, http://www.flatworldknowledge.com/ is producing a lot of free textbooks. And textbookrevolution.org is another great resource.
With otexts.com, we will compensate authors in two ways. First, the print versions of a book will be sold (although at a vastly cheaper rate than other commercial publishers). The royalties on print sales will be split 50/50 with the authors. Second, we plan to have some features of each book available for subscription only (e.g., solutions to exercises, some multimedia content, etc.). Again, the subscription fees will be split 50/50 with the authors.
Ajay- Suppose a person who used to use forecasting software from another company decides to switch to R. How easy and lucid do you think the current documentation on the R website is for business analytics practitioners such as these in the corporate world?
Rob- The documentation on the R website is not very good for newcomers, but there are a lot of other R resources now available. One of the best introductions is Matloff’s “The Art of R Programming”. Provided someone has done some programming before (e.g., VBA, python or java), learning R is a breeze. The people who have trouble are those who have only ever used menu interfaces such as Excel. Then they are not only learning R, but learning to think about computing in a different way from what they are used to, and that can be tricky. However, it is well worth it. Once you know how to code, you can do so much more.  I wish some basic programming was part of every business and statistics degree.
If you are working in a particular area, then it is often best to find a book that uses R in that discipline. For example, if you want to do forecasting, you can use my book (otexts.com/fpp/). Or if you are using R for data visualization, get hold of Hadley Wickham’s ggplot2 book.
Ajay- In a long and storied career, what is the best forecast you ever made? And the worst?
Rob- Actually, my best work is not so much in making forecasts as in developing new forecasting methodology. I’m very proud of my forecasting models for electricity demand which are now used for all long-term planning of electricity capacity in Australia (see http://robjhyndman.com/papers/peak-electricity-demand/ for the details). Also, my methods for population forecasting (http://robjhyndman.com/papers/stochastic-population-forecasts/) are pretty good (in my opinion!). These methods are now used by some national governments (but not Australia!) for their official population forecasts.
Of course, I’ve made some bad forecasts, but usually when I’ve tried to do more than is reasonable given the available data. One of my earliest consulting jobs involved forecasting the sales for a large car manufacturer. They wanted forecasts for the next fifteen years using less than ten years of historical data. I should have refused as it is unreasonable to forecast that far ahead using so little data. But I was young and naive and wanted the work. So I did the forecasts, and they were clearly outside the company’s (reasonable) expectations, and they then refused to pay me. Lesson learned. It’s better to refuse work than do it poorly.

Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net). And now anyone can use the method with the ets() function in the forecast package for R.
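For readers who have not met exponential smoothing before, the simplest member of the family can be sketched in a few lines. This is only my illustration of simple exponential smoothing with a fixed weight, written in Python; it is not the ets() implementation from the forecast package, which also deals with trend, seasonality and parameter estimation. The function name, the alpha value and the numbers are all made up for the example.

```python
def simple_exp_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    Update rule: level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start the level at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical figures, for illustration only
history = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exp_smoothing(history, alpha=0.5)
print(forecast)
```

A larger alpha makes the forecast follow the most recent observations more closely; a smaller alpha smooths more heavily over the history.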
Rob J Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University and Director of the Monash University Business & Economic Forecasting Unit. He is also Editor-in-Chief of the International Journal of Forecasting and a Director of the International Institute of Forecasters. Rob is the author of over 100 research papers in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. For 25 years, Rob has maintained an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

Cloud Computing using Python

I liked the new features in PiCloud, which is a cloud computing way to use Python. Python is increasingly popular as a computational language, and the cloud is where hardware is headed, at least as of 2011-12.


The new features allow you to publish your own functions as URLs.

Why would you want to publish a function?

  • To call your Python functions from a programming language other than Python.
  • To use PiCloud from Google AppEngine, which does not support our native client library.
  • To easily setup a scalable RPC system.

Here’s a peek at the interface:

You publish a Python function

cloud.rest.publish(your_func, 'myfunction')

We give you a URL back


You make an HTTP request using your method of choice to the URL

curl -k -u 'key:secret_key' https://api.picloud.com/r/2/myfunction/
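If you would rather build that request from Python than from curl, here is a rough sketch using only the standard library. It assumes the endpoint accepts HTTP Basic credentials, which is what curl's -u flag sends; the build_request helper is my own name, and 'key' and 'secret_key' are the same placeholders as in the curl line. The sketch only constructs the request and does not send it.

```python
import base64
import urllib.request


def build_request(url, key, secret_key):
    """Build a request with HTTP Basic credentials, like `curl -u key:secret_key`."""
    credentials = f"{key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder URL and credentials copied from the curl example above
request = build_request("https://api.picloud.com/r/2/myfunction/", "key", "secret_key")
print(request.get_header("Authorization"))

# To actually send it, one would call:
#   urllib.request.urlopen(request).read()
```

Note that curl's -k switches off certificate verification, whereas this sketch leaves Python's default verification in place.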

It certainly is an interesting development and I am wondering how other languages can adopt this paradigm as well.
For R, as of now http://www.cloudnumbers.com/ seems to be the only player in the cloud.
It would be exciting to see more players in the cloud statistical analytical space.


The Top Statisticians in the World









John Tukey

From Wikipedia, the free encyclopedia

John Wilder Tukey
Born: June 16, 1915, New Bedford, Massachusetts, USA
Died: July 26, 2000 (aged 85), New Brunswick, New Jersey
Residence: United States
Nationality: American
Fields: Mathematician
Institutions: Bell Labs; Princeton University
Alma mater: Brown University; Princeton University
Doctoral advisor: Solomon Lefschetz
Doctoral students: Frederick Mosteller; Kai Lai Chung
Known for: FFT algorithm; Box plot; Coining the term ‘bit’
Notable awards:
  • Samuel S. Wilks Award (1965)
  • National Medal of Science (USA) in Mathematical, Statistical, and Computational Sciences (1973)
  • Shewhart Medal (1976)
  • IEEE Medal of Honor (1982)
  • Deming Medal (1982)
  • James Madison Medal (1984)
  • Foreign Member of the Royal Society (1991)

John Wilder Tukey ForMemRS[1] (June 16, 1915 – July 26, 2000) was an American statistician.




Tukey was born in New Bedford, Massachusetts in 1915, and obtained a B.A. in 1936 and M.Sc. in 1937, in chemistry, from Brown University, before moving to Princeton University where he received a Ph.D. in mathematics.[2]

During World War II, Tukey worked at the Fire Control Research Office and collaborated with Samuel Wilks and William Cochran. After the war, he returned to Princeton, dividing his time between the university and AT&T Bell Laboratories.

Among many contributions to civil society, Tukey served on a committee of the American Statistical Association that produced a report challenging the conclusions of the Kinsey Report, Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male.

He was awarded the IEEE Medal of Honor in 1982 “For his contributions to the spectral analysis of random processes and the fast Fourier transform (FFT) algorithm.”

Tukey retired in 1985. He died in New Brunswick, New Jersey on July 26, 2000.

Scientific contributions

His statistical interests were many and varied. He is particularly remembered for his development with James Cooley of the Cooley–Tukey FFT algorithm. In 1970, he contributed significantly to what is today known as the jackknife estimation, also termed Quenouille-Tukey jackknife. He introduced the box plot in his 1977 book, “Exploratory Data Analysis”.

Tukey’s range test, the Tukey lambda distribution, Tukey’s test of additivity and Tukey’s lemma all bear his name. He is also the creator of several little-known methods such as the trimean and median-median line, an easier alternative to linear regression.

In 1974, he developed, with Jerome H. Friedman, the concept of the projection pursuit.[3]
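The trimean mentioned above is easy to try for yourself. Here is a minimal sketch in Python, assuming the usual definition (Q1 + 2 × median + Q3) / 4 and the standard library's "inclusive" quartile rule, which is only one of several quartile conventions. The sample numbers are made up.

```python
import statistics


def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    # Assumption: quartiles computed with the "inclusive" method
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


# Made-up sample with one extreme value
sample = [1, 2, 3, 4, 100]
print(trimean(sample))          # 3.0
print(statistics.mean(sample))  # 22
```

The single extreme value drags the mean far away from the bulk of the data, while the trimean stays close to it.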


Sir Ronald Aylmer Fisher FRS (17 February 1890 – 29 July 1962) was an English statistician, evolutionary biologist, eugenicist and geneticist. Among other things, Fisher is well known for his contributions to statistics by creating Fisher’s exact test and Fisher’s equation. Anders Hald called him “a genius who almost single-handedly created the foundations for modern statistical science”[1] while Richard Dawkins named him “the greatest biologist since Darwin”.[2]




William Sealy Gosset (June 13, 1876–October 16, 1937) is famous as a statistician, best known by his pen name Student and for his work on Student’s t-distribution.

Born in Canterbury, England to Agnes Sealy Vidal and Colonel Frederic Gosset, Gosset attended Winchester College before reading chemistry and mathematics at New College, Oxford. On graduating in 1899, he joined the Dublin brewery of Arthur Guinness & Son.

Guinness was a progressive agro-chemical business and Gosset would apply his statistical knowledge both in the brewery and on the farm, to the selection of the best yielding varieties of barley. Gosset acquired that knowledge by study, trial and error and by spending two terms in 1906–7 in the biometric laboratory of Karl Pearson. Gosset and Pearson had a good relationship and Pearson helped Gosset with the mathematics of his papers. Pearson helped with the 1908 papers but he had little appreciation of their importance. The papers addressed the brewer’s concern with small samples, while the biometrician typically had hundreds of observations and saw no urgency in developing small-sample methods.

Another researcher at Guinness had previously published a paper containing trade secrets of the Guinness brewery. To prevent further disclosure of confidential information, Guinness prohibited its employees from publishing any papers regardless of the contained information. However, after pleading with the brewery and explaining that his mathematical and philosophical conclusions were of no possible practical use to competing brewers, he was allowed to publish them, but under a pseudonym (“Student”), to avoid difficulties with the rest of the staff.[1] Thus his most famous achievement is now referred to as Student’s t-distribution, which might otherwise have been Gosset’s t-distribution.

Google unleashes Fusion Tables

I just discovered Fusion Tables. There is life beyond the amazing Jeff’s Amazon EC2/S3 after all!

Check out http://www.google.com/fusiontables/public/tour/index.html

Gather, visualize and share data online


  • Visualize and publish your data as maps, timelines and charts
  • Host your data tables online
  • Combine data from multiple people


Google Fusion Tables is a modern data management and publishing web application that makes it easy to host, manage, collaborate on, visualize, and publish data tables online.

What can I do with Google Fusion Tables?

Import your own data
Upload data tables from spreadsheets or CSV files, even KML. Developers can use the Fusion Tables API to insert, update, delete and query data programmatically. You can export your data as CSV or KML too.

Visualize it instantly
See the data on a map or as a chart immediately. Use filters for more selective visualizations.

Publish your visualization on other web properties
Now that you’ve got that nice map or chart of your data, you can embed it in a web page or blog post. Or send a link by email or IM. It will always display the latest data values from your table and helps you communicate your story more easily.

Look at the Fusion Tables Example Gallery at https://sites.google.com/site/fusiontablestalks/stories

If you are worried about data.gov closing down, here’s a snapshot of Fusion Tables public datasets.