Writing for kdnuggets.com

I have been writing freelance for kdnuggets.com.

It's a great learning experience for me and is helping me become a better writer, especially in my discipline.

Here is a list of the articles (interviews are in bold). I will keep updating this list.

  1. Book Review: Data Just Right 2014/04/03
  2. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  3. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  4. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  5. etcML Promises to Make Text Classification Easy  2014/03/05
  6. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02

Writing on APIs for Programmable Web

I have been writing freelance on APIs for Programmable Web. Here is an updated list of the articles; many of these would be of interest to analytics users. Note: some of these are interviews, and they are in bold. Note to regular readers: I keep updating this list, and at each update I bring it to the front page, then let newer blog postings slide it down!

Scoreoid Aims to Gamify the World Using APIs January 27th, 2014

Plot.ly’s Plot to Visualize More Data January 22nd, 2014

LumenData’s Acquisition of Algorithms.io is a Win-Win January 8th, 2014

Yactraq API Sees Huge Growth in 2013  January 6th, 2014

Scrape.it Describes a Better Way to Extract Data December 20th, 2013

Exclusive Interview: App Store Analytics API December 4th, 2013

APIs Enter 3d Printing Industry November 29th, 2013

PW Interview: José Luis Martinez of Textalytics November 6th, 2013

PW Interview Simon Chan PredictionIO November 5th, 2013

PW Interview: Scott Gimpel Founder and CEO FantasyData.com October 23rd, 2013

PW Interview Brandon Levy, cofounder and CEO of Stitch Labs October 8th, 2013

PW Interview: Jolo Balbin Co-Founder Text Teaser  September 18th, 2013

PW Interview: Bob Bickel CoFounder Redline13 July 29th, 2013

PW Interview: Brandon Wirtz CTO Stremor.com July 4th, 2013

PW Interview: Andy Bartley, CEO Algorithms.io  June 4th, 2013

PW Interview: Francisco J Martin, CEO BigML.com 2013/05/30

PW Interview: Tal Rotbart Founder- CTO, SpringSense 2013/05/28

PW Interview: Jeh Daruwala CEO Yactraq API, Behavorial Targeting for videos 2013/05/13

PW Interview: Michael Schonfeld of Dwolla API on Innovation Meeting the Payment Web  2013/05/02

PW Interview: Stephen Balaban of Lamda Labs on the Face Recognition API  2013/04/29

PW Interview: Amber Feng, Stripe API, The Payment Web 2013/04/24

PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API   2013/04/22

Google Mirror API documentation is open for developers   2013/04/18

PW Interview: Ricky Robinett, Ordr.in API, Ordering Food meets API    2013/04/16

PW Interview: Jacob Perkins, Text Processing API, NLP meets API   2013/04/10

Amazon EC2 On Demand Windows Instances -Prices reduced by 20%  2013/04/08

Amazon S3 API Requests prices slashed by half  2013/04/02

PW Interview: Stuart Battersby, Chatterbox API, Machine Learning meets Social 2013/04/02

PW Interview: Karthik Ram, rOpenSci, Wrapping all science API 2013/03/20

Viralheat Human Intent API- To buy or not to buy 2013/03/13

Interview Tammer Kamel CEO and Founder Quandl 2013/03/07

YHatHQ API: Calling Hosted Statistical Models 2013/03/04

Quandl API: A Wikipedia for Numerical Data 2013/02/25

Amazon Redshift API is out of limited preview and available! 2013/02/18

Windows Azure Media Services REST API 2013/02/14

Data Science Toolkit Wraps Many Data Services in One API 2013/02/11

Diving into Codeacademy’s API Lessons 2013/01/31

Google APIs finetuning Cloud Storage JSON API 2013/01/29

Ergast API Puts Car Racing Fans in the Driver’s Seat 2012/12/05
Springer APIs- Fostering Innovation via API Contests 2012/11/20
Statistically programming the web – Shiny,HttR and RevoDeploy API 2012/11/19
Google Cloud SQL API- Bigger ,Faster and now Free 2012/11/12
A Look at the Web’s Most Popular API -Google Maps API 2012/10/09
Cloud Storage APIs for the next generation Enterprise 2012/09/26
Last.fm API: Sultan of Musical APIs 2012/09/12
Socrata Data API: Keeping Government Open 2012/08/29
BigML API Gets Bigger 2012/08/22
Bing APIs: the Empire Strikes Back 2012/08/15
Google Cloud SQL: Relational Database on the Cloud 2012/08/13
Google BigQuery API Makes Big Data Analytics Easy 2012/08/05
Your Store in The Cloud -Google Cloud Storage API 2012/08/01
Predict the future with Google Prediction API 2012/07/30
The Romney vs Obama API 2012/07/27

Some tips on creating a useful blog for beginners

1) The blog post title should be self-explanatory

2) Use categories and tags for better navigation

3) Use a theme which attracts, not distracts

4) Simple language in blog writing works best

5) Useful blogs get more traffic than autobiographical blogs, unless you are a celebrity.

6) People who enjoy writing blogs create better blogs

7) Writing a blog is like jogging. Do it every day, even when it's boring and painful, or do it as much as your schedule permits.


How to be a better writer


Background: I wrote this by accident while trolling on Quora. I was not confident of what I wrote. In fact, I wrote it anonymously, except people kept asking me why! It was pure serendipity: I wrote it in less than 4 minutes and submitted it without thinking. Then I edited it once based on feedback.

Someone clearly smarter than me made my tips for writing into a picture (http://amandaonwriting.tumblr.com/post/54265230509), and it became popular on Tumblr just like it did on Quora!

Apparently, if someone like Wil Wheaton likes your words, they can go viral! It has 41,799 notes (reblogs + hearts) on Tumblr as of now.


Words. Reposted by a cast member of Star Trek: The Next Generation. I can now die a happy Geek! The Internet is a funny thing!

Thank you everyone! Now if only Google learnt to include OCR for Images as part of text search!

  1. Write 50 words. That’s a paragraph.
  2. Write 400 words. That’s a page.
  3. Write 300 pages. That’s a manuscript.
  4. Write every day. That’s a habit.
  5. Edit and rewrite. That’s how you get better.
  6. Spread your writing for people to comment. That’s called feedback.
  7. Don’t worry about rejection or publication. That’s a writer.
  8. When not writing, read. Read from writers better than you. Read and perceive.

But overall, just write more to get better.

1887+ votes on Quora!! :) Probably my most viewed content ever!

61,036 people have viewed this answer!


Also it got a mention here-


Now I think I should take some of my own advice and get back to writing.

The dichotomy in being a writer on open source with a non-open access publisher

  • The publisher adds credibility to your work


  • A self-fulfilling prophecy where researchers want to publish in exclusive journals and closed-access books for the sole reason that others did so before them, and thereby donate their knowledge and money to the publisher


The dichotomy in being a writer on open source with a non-open access publisher?

  • I write on open source R,
  • and I have been published (one book),
  • and I am on contract to write two more (R for Cloud Computing) and (R for Web and Social Media Analytics).
  • My publisher does have open access journals.
  • But the book is priced at $50. Most of India lives on less than $2 per day. That's 800 million people in my country alone.

But the publisher is the most reputed in this field. So what are my choices? How do I give more people the choice to read books?

Take open knowledge, curate it, and put it behind a $50 paywall. I am sorry, Aaron. People like me are the reason ……


Writing a technical book

This is a fairly concise collection of notes on how to write a technical book. It may seem arrogant for a one-book author like me to do so, but I get a lot of queries on this, and there seems to be a fair amount of information asymmetry about the process. I have experience with getting rejected and accepted in both creative and technology domains, but I will make this post fairly tech specific.

Books I have written:


Poetry (Self Published)

  • In Case I Don't See You Again
  • Corporate Poetry
  • Poets & Hackers (e-book)

Technology (Published)

  • R for Business Analytics

Technology (Currently Writing)

  • R for Cloud Computing (Springer) – Due 2013
  • R for Web Analytics and Social Media Analytics (Springer) – Due 2014

Top 5 Myths on Writing and Getting Published

  • Publishers don’t like unsolicited manuscripts.

Well, they don’t like unsolicited manuscripts from total unknowns. This is also very domain specific. Whether you are writing a novel, a poetry book, or a technical book, approval rates will depend on current interest in that domain.

Advice: if you are a first-time author-to-be, choose a niche domain that you are passionate about and that has been generating some buzz lately. It could be Python, D3, R, etc.

  • Publishers get all the money

No, they don’t make that much money compared to a Hollywood studio. Yes, books are expensive, but the price funds a whole supply chain that may or may not be efficient. Your book is subsidizing all the books that didn’t sell. Proofreading and editing are not very glamorous jobs, but they take a long time and are expensive. I have much more respect for editors now than I did, say, 3 years ago. The ultimate in supply chain efficiency would be if every hard copy was printed on demand and every soft copy was priced efficiently given price elasticity. Pricing analytics on dynamic book pricing (like on Amazon)... hmm.

  • Writers get all the money

You would be lucky to get more than 14% of the gross selling price of a hard copy, or more than 40% of the price of an electronic book. If you want to make money, don’t write technical books; write white papers and run webinars.

  • Writers get no money

You don’t make money by writing a technical book, but your branding does go up significantly, and you can now charge for training, webinars, talks, conferences, white papers, and articles. These alternatives can help you survive.

  • I got a great idea- but I keep getting rejected. That guy had a lousy idea, but he keeps writing.

THAT guy wrote a great proposal, spent time building his brand, and wrote interesting stuff. Publishers like to sell books, not ideas. Writer jealousy and insecurity are part of the game. You have a limited amount of energy in a day: spend it writing or spend it reading. Ideally, do both.

Book Publication

The book publication process has three parts-

1) Proposal

2) Manuscript

3) Editing

1) Proposal- Write an awesome proposal. Take tips from the publisher's website. Work out which publisher is most interested in publishing your topic (hint: go to all the websites). If the publisher websites confuse you, jump to the FAQ.

Some publishers I think relevant to technical books-




2) Manuscript- Write daily. 300 words. 300 times. That's a manuscript. It is tough for people like us. Hemingway had it easy. I used a LaTeX GUI called LyX for writing (http://www.lyx.org/). You may choose your own tool, style, time of day or night, cafe, or room to spur your creative juices.

3) Editing- You will edit, chop, re-edit and rewrite a book many times. It is ok. My advice is to make it readable. Try to explain your book to an imagined non-technical person to clear up your ideas.

Once your proposal is accepted, you sign a contract for royalty and copyright.

Once the contract is signed, you write the manuscript. This also involves a fair amount of research, citation, and folder management to keep your book figures and citations ready. I generally write the citation then and there within the book, and then organize the citations later, chapter by chapter. Uncited work leads to charges of plagiarism, which is the buzzkill for any author. Write, Cite, Rewrite.

You will also need to create an index (this can be done by software) so people can navigate the book better, and an appendix for all the stuff you couldn’t leave behind.

Once you submit the manuscript, you choose the cover, discuss the rewrites with your editor, make the suggested changes, and resend the manuscript files. Then count up to six months for publication. Send copies to people you like who can help spread the word about your book. Wait for reviews, engage positively with everyone, then wait for sales figures. Congrats, you are a writer now!




Data Frame in Python

Exploring some Python packages and R packages to move between and work with both Python and R without melting your brain or exceeding your project deadline


If you like the data.frame structure in R, there are ways to work with a similar structure in Python, at a faster processing speed.

Here are three packages that enable you to do so:

(1) pydataframe http://code.google.com/p/pydataframe/

An implementation of an almost R-like DataFrame object. (Install via PyPI/pip: "pip install pydataframe")


        # The second and third arguments appear to be the column order and the row names.
        u = DataFrame( { "Field1": [1, 2, 3],
                        "Field2": ['abc', 'def', 'hgi']},
                         ['Field1', 'Field2'],
                         ["rowOne", "rowTwo", "thirdRow"])

A DataFrame is basically a table with rows and columns.

Columns are named, rows are numbered (but can be named), and both can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API; see help(DataFrame.get_item) (it also works for setting values). Please note that any slice gets you another DataFrame. To access individual entries, use get_row(), get_column(), get_value().
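
The storage idea described above (one array per named column, plus a dictionary from row name to position) can be sketched in a few lines. This is not pydataframe's code: the class TinyFrame is a hypothetical name made up for illustration, plain lists stand in for numpy arrays, and the three accessor methods only borrow the names mentioned above.

```python
class TinyFrame:
    """Toy table: one list per column, plus a dict for row-name lookup."""

    def __init__(self, columns, row_names=None):
        # columns: dict mapping column name -> list of values (all same length)
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {name: list(values) for name, values in columns.items()}
        self.num_rows = lengths.pop() if lengths else 0
        # Row names become a dictionary, so lookup by name avoids a linear scan.
        # Without explicit names, rows are simply numbered 0, 1, 2, ...
        names = row_names if row_names is not None else range(self.num_rows)
        self.row_index = {name: pos for pos, name in enumerate(names)}

    def get_column(self, name):
        return self.columns[name]

    def get_row(self, row_name):
        pos = self.row_index[row_name]
        return {name: values[pos] for name, values in self.columns.items()}

    def get_value(self, row_name, column_name):
        return self.columns[column_name][self.row_index[row_name]]


u = TinyFrame(
    {"Field1": [1, 2, 3], "Field2": ["abc", "def", "hgi"]},
    row_names=["rowOne", "rowTwo", "thirdRow"],
)
print(u.get_value("rowTwo", "Field2"))  # def
print(u.get_row("thirdRow"))            # {'Field1': 3, 'Field2': 'hgi'}
```

Keeping each column as its own sequence is what makes column-wise arithmetic cheap, while the row-name dictionary is only needed when rows are addressed by name.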

DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:

#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6

#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']

#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]

DataFrames talk directly to R via rpy2 (rpy2 is not a prerequisite for the library!)


(2) pandas http://pandas.pydata.org/

Library Highlights

  • A fast and efficient DataFrame object for data manipulation with integrated indexing;
  • Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
  • Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
  • Flexible reshaping and pivoting of data sets;
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
  • Columns can be inserted and deleted from data structures for size mutability;
  • Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
  • High performance merging and joining of data sets;
  • Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
  • Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
  • The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
  • Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
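
A few of the highlights above (label-based selection, split-apply-combine, and automatic alignment with missing data) can be tried in a short session. This is a minimal sketch, assuming pandas is installed; the table, its column names and the values are invented for illustration.

```python
import pandas as pd

# A small labelled table; the data is made up for this example.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 3, 7, 12],
        "price": [2.5, 4.0, 2.5, 4.0],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Label-based selection with a condition: multiply units below 5 by 6.
sales.loc[sales["units"] < 5, "units"] *= 6

# Column arithmetic creates a new column in place.
sales["revenue"] = sales["units"] * sales["price"]

# Split-apply-combine: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Alignment on labels: rows missing from `bonus` become NaN rather than
# being silently matched by position.
bonus = pd.Series({"r1": 1.0, "r3": 2.0})
aligned = sales["revenue"] + bonus

print(revenue_by_region)
print(aligned)
```

The first two operations mirror the pydataframe snippet shown earlier (conditional update, then one column computed from others), which makes the two libraries easy to compare side by side.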

Why not R?

First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:

  • R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
  • R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
  • Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
  • The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.

(3) datamatrix http://pypi.python.org/pypi/datamatrix/0.8

datamatrix 0.8

A Pythonic implementation of R’s data.frame structure.

Latest Version: 0.9

This module allows access to comma-separated or other delimiter-separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated: rows and columns can be added and removed, and the whole matrix can even be transposed.
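
I have not verified datamatrix's own API, so the sketch below does not use it. It only illustrates the general idea with the Python standard library: read delimited text into a dictionary of columns, add and remove columns, and transpose. The helper names read_table and transpose, and the sample data, are hypothetical.

```python
import csv
import io

# Sample semicolon-delimited text standing in for a file on disk.
raw = "name;score\nasha;90\nbala;75\n"


def read_table(text, delimiter=","):
    """Return a dict mapping column name -> list of string values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    table = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            table[name].append(row[name])
    return table


def transpose(table):
    """Turn a dict of columns into a list of row tuples."""
    return list(zip(*table.values()))


table = read_table(raw, delimiter=";")
table["passed"] = [int(s) >= 80 for s in table["score"]]  # add a column
del table["name"]                                         # remove a column

print(table["score"])    # ['90', '75']
print(transpose(table))  # [('90', True), ('75', False)]
```

Values stay as strings until converted, which is the usual behaviour when reading delimited files without a schema.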


Modeling in Python



