Things I wonder about

  1. Why do humans need one set of accommodation to live in, another to work in, and a third to relax in? It seems we are using three times the number of buildings we should be using.
  2. Why can’t analytics measure the cost to the environment (not just carbon output) of any product or service?
  3. What prevents a global effort to use analytics against corruption?
  4. Why does open source software underestimate the need for marketing, and why do proprietary software companies underestimate the need to open source at least a small part of their extensive portfolios?
  5. Why are education and training still so expensive in the era of MOOCs, the Internet and Skype?
  6. Why are expensive textbooks (and books and newspapers) still being printed on paper?
  7. Why does it take 15 minutes to set up the projector before any presentation despite the advances in technology?
  8. Why can’t I just 3D print most of my wardrobe and my gadgets?
  9. When will we have virtual reality movies?
  10. Why do software companies focus on creating more and more languages, rather than using machine learning to create a language 1 to language 2 translator? How about a Google/Bing Translate for computer languages?
  11. Why do they do a lot of checking before giving me a credit card but not so much before giving me a gun in the USA? Why do 2 billion Indians and Chinese put up with corruption? Why do Europeans work so few hours and Asians so many?
  12. Why do people who write open source packages make less money than people who write mobile apps?
  13. When will software startups focus on job search and dating search as the real problems humans care about, not just website search?
  14. Why is there a digital divide, and what could a donation of 1,000,000 phablets to kids in poor countries do for the future?
  15. When will we start consuming smarter, rather than just less or more, to heal climate change?

But mostly I am thinking of this: [image]

Happy New Year. Stay Awesome and Classy.

Writing for

I have been writing freelance for

It's a great learning experience for me and helps me become a better writer, especially on analytics and programming.

Here is a list of the articles. Interviews are in bold, and I will keep updating this list as new pieces are added.

  1. Interview: Ingo Mierswa, RapidMiner CEO on “Predaction” and Key Turning Points June 2014
  2. Guide to Data Science Cheat Sheets 2014/05/12
  3. Book Review: Data Just Right 2014/04/03
  4. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  5. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  6. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  7. etcML Promises to Make Text Classification Easy  2014/03/05
  8. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02

Using R for random number creation from time stamps #rstats

Suppose, let us just suppose, you want to create random numbers that are reproducible and derived from time stamps.

Here is the code in R:

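> a <- as.numeric(Sys.time())   # seconds since the epoch; note this value to reproduce the draw
> set.seed(a)                   # set.seed() interprets a as an integer seed
> rnorm(log(a))                 # log(a) is about 21, so about 21 values are drawn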

Note: you can use any custom function of the system time (I used the log) to decide how many random numbers to generate. This creates a list of pseudo-random numbers whose length also depends on the time stamp (pseudo-random, since nothing machine-driven is purely random in the strict philosophical sense of the word).


[1]  39621645  99451316 109889294 110275233 278994547   6554596  38654159  68748122   8920823  13293010
[11]  57664241  24533980 174529340 105304151 168006526  39173857  12810354 145341412 241341095  86568818
[21] 105672257
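
The draw is only reproducible if the time stamp that seeded it is kept. Here is a minimal sketch of that idea, assuming a hypothetical helper called draw_from_time (the name and the returned list are my own illustration, not part of base R):

```r
# Hypothetical helper: seed the generator from a time stamp and return the
# seed together with the values, so the same draw can be repeated later.
draw_from_time <- function(stamp = as.numeric(Sys.time())) {
  seed <- as.integer(floor(stamp))   # whole seconds since the epoch
  set.seed(seed)
  n <- floor(log(seed))              # same log-based count as in the code above
  list(seed = seed, values = rnorm(n))
}

first  <- draw_from_time()                    # seeded by the current time
replay <- draw_from_time(stamp = first$seed)  # seeded by the recorded time stamp

identical(first$values, replay$values)        # the replay repeats the first draw
```

Storing the seed next to the values (for example in a log line) is what turns a time-stamped draw into a reproducible one.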

Possible applications: things that need both random numbers (like encryption keys) and time stamps (like events, web or industrial logs, or the pseudo-random pass codes in Google two-factor authentication).

Note: I used the rnorm function, but you could also pick the sampling function itself at random (for example rnorm or rcauchy).
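
A rough sketch of picking the sampling function at random follows. The samplers list and the variable names are assumptions made for illustration; only rnorm, rcauchy, sample and set.seed come from base R:

```r
# Candidate samplers; this list is an illustrative assumption.
samplers <- list(rnorm = rnorm, rcauchy = rcauchy)

stamp <- as.integer(as.numeric(Sys.time()))  # whole seconds since the epoch
set.seed(stamp)

# The seeded generator first picks a sampler by name,
# then the chosen sampler draws the values.
choice <- sample(names(samplers), 1)
values <- samplers[[choice]](floor(log(stamp)))

choice
values
```

Because the choice is made after set.seed, replaying the same time stamp gives the same sampler and the same values.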

Again, I would rather trust my own randomness than one generated by an arm of the US Govt (see )

Update: Random numbers in R


The currently available RNG kinds are given below. kind is partially matched to this list. The default is "Mersenne-Twister".

"Wichmann-Hill": The seed, .Random.seed[-1] == r[1:3] is an integer vector of length 3, where each r[i] is in 1:(p[i] - 1), where p is the length 3 vector of primes, p = (30269, 30307, 30323). The Wichmann–Hill generator has a cycle length of 6.9536e12 (= prod(p-1)/4, see Applied Statistics (1984) 33, 123 which corrects the original article).

"Marsaglia-Multicarry": A multiply-with-carry RNG is used, as recommended by George Marsaglia in his post to the mailing list ‘sci.stat.math’. It has a period of more than 2^60 and has passed all tests (according to Marsaglia). The seed is two integers (all values allowed).

"Super-Duper": Marsaglia’s famous Super-Duper from the 70’s. This is the original version which does not pass the MTUPLE test of the Diehard battery. It has a period of about 4.6*10^18 for most initial seeds. The seed is two integers (all values allowed for the first seed: the second must be odd).

We use the implementation by Reeds et al. (1982–84).

The two seeds are the Tausworthe and congruence long integers, respectively. A one-to-one mapping to S’s .Random.seed[1:12] is possible but we will not publish one, not least as this generator is not exactly the same as that in recent versions of S-PLUS.

"Mersenne-Twister": From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

"Knuth-TAOCP-2002": A 32-bit integer GFSR using lagged Fibonacci sequences with subtraction. That is, the recurrence used is

X[j] = (X[j-100] - X[j-37]) mod 2^30

and the ‘seed’ is the set of the 100 last numbers (actually recorded as 101 numbers, the last being a cyclic shift of the buffer). The period is around 2^129.

"Knuth-TAOCP": An earlier version from Knuth (1997).

The 2002 version was not backwards compatible with the earlier version: the initialization of the GFSR from the seed was altered. R did not allow you to choose consecutive seeds, the reported ‘weakness’, and already scrambled the seeds.

Initialization of this generator is done in interpreted R code and so takes a short but noticeable time.

"L'Ecuyer-CMRG": A ‘combined multiple-recursive generator’ from L’Ecuyer (1999), each element of which is a feedback multiplicative generator with three integer elements: thus the seed is a (signed) integer vector of length 6. The period is around 2^191.

The 6 elements of the seed are internally regarded as 32-bit unsigned integers. Neither the first three nor the last three should be all zero, and they are limited to less than 4294967087 and 4294944443 respectively.

This is not particularly interesting of itself, but provides the basis for the multiple streams used in package parallel.

"user-supplied": Use a user-supplied generator.
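
The remark above about multiple streams in package parallel can be tried out in a few lines. This is only a sketch: the seed 2014 and the variable names are arbitrary choices of mine, while RNGkind, set.seed and parallel::nextRNGStream are existing R functions:

```r
library(parallel)

old_kind <- RNGkind("L'Ecuyer-CMRG")  # switch generators, remembering the old settings
set.seed(2014)                        # arbitrary seed, for illustration only

stream1 <- .Random.seed               # kind code followed by the 6 seed elements
stream2 <- nextRNGStream(stream1)     # seed that starts the next stream

length(stream1)
identical(stream1, stream2)           # the two streams start from different seeds

# Put back whatever generator settings were active before.
do.call(RNGkind, as.list(old_kind))
```

Each worker in a parallel job could be handed its own stream seed in this way, so that their random numbers do not overlap.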


Function RNGkind allows user-coded uniform and normal random number generators to be supplied.
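
RNGkind is also how you inspect or switch between the built-in kinds listed above. Here is a small sketch, assuming the seed 42 and the variable names purely for illustration:

```r
RNGkind()                        # kinds currently in use

old_kind <- RNGkind("Wich")      # partial match selects "Wichmann-Hill"
active <- RNGkind()[1]           # name of the uniform generator now active

set.seed(42); x <- runif(3)
set.seed(42); y <- runif(3)
identical(x, y)                  # same kind and same seed repeat the same draw

# Put back whatever generator settings were active before.
do.call(RNGkind, as.list(old_kind))
```

A seed only reproduces a draw under the same generator kind, so it is worth recording the output of RNGkind() along with the seed.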