Much ado about nothing

P-values have now become controversial. The P here does not stand for President Trump, but for this:

After 150 Years, the ASA Says No to p-values

https://matloff.wordpress.com/2016/03/07/after-150-years-the-asa-says-no-to-p-values/

Sadly, the concept of p-values and significance testing forms the very core of statistics. A number of us have been pointing out for decades that p-values are at best underinformative and often misleading. Almost all statisticians agree on this, yet they all continue to use it and, worse, teach it. I recall a few years ago, when Frank Harrell and I suggested that R place less emphasis on p-values in its output, there was solid pushback. One can’t blame the pusherbackers, though, as the use of p-values is so completely entrenched that R would not be serving its users well with such a radical move.

The American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” with six principles underlying the proper use and interpretation of the p-value [http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN]. The ASA releases this guidance on p-values to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research. The statement also notes that the increased quantification of scientific research and a proliferation of large, complex data sets has expanded the scope for statistics and the importance of appropriately chosen techniques, properly conducted analyses, and correct interpretation.

 

I personally think Big Data needs Bigger Thinking among statisticians about a new era of inference.
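
Here is a rough sketch of why Big Data strains the p-value. It is my own illustration, not something taken from the ASA statement or from the post quoted above. It assumes a simple two-sided z-test with a known standard deviation; the function name and the numbers are made up for the example. The same practically negligible effect goes from "not significant" to overwhelmingly "significant" just because n grows.

```python
import math


def two_sided_z_p_value(effect, sd, n):
    """Two-sided p-value of a z-test for an observed mean shift `effect`.

    Illustrative helper (hypothetical name): assumes a known standard
    deviation `sd` and a sample of size `n`.
    """
    z = effect / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: the same tiny shift (0.01 standard deviations)
# observed at three different sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_z_p_value(effect=0.01, sd=1.0, n=n))
```

The effect size never changes in this loop, only the sample size does, so the shrinking p-value says nothing about how much the effect actually matters.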

However, as always, these guys are the best.

Algorithm to deal with a broken heart

  • Abort (A): Terminate the operation/program and return to the system command prompt.[2] In hindsight this was not a good idea as the program would not do any cleanup (such as completing writing of other files). “Abort” was necessary because early DOS did not implement “Fail”. It may have remained necessary for poorly written software for which “Fail” would have caused a loop that would have repeatedly invoked the critical error handler with no other way to exit.
  • Retry (R): DOS would attempt the operation again.[2] “Retry” made sense if the user could rectify the problem. To continue the example above, if the user simply forgot to close the drive latch, they could close it, retry, and the system would continue where it left off.
  • Ignore (I) (older versions of DOS): Return success status to the calling program/routine, despite the failure of the operation.[2] For instance, a disk read error could be ignored and DOS would return whatever data was in the read buffer, which might contain some of the correct data from the disk. Attempting to use results after an “Ignore” was an undefined behavior.[2] “Ignore” did not appear in cases where it was impossible for the data to be used; for instance, a missing disk could not be ignored because that would require DOS to construct and return some kind of file descriptor that worked in further “read” calls. This is not available if DOS cannot read any sector from the first sector of a floppy disk or a partition of a hard disk to the last sector of the root directory.
  • Fail (F) (DOS 3.3 and later): Return failure status to the calling program/routine.[2] “Fail” returned an error code to the program, similar to other errors such as file not found. The program could then gracefully recover from the problem.

from https://en.wikipedia.org/wiki/Abort,_Retry,_Fail%3F

In the lines above, replace DOS with LOVER, and you have the algorithm.
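
For the literal-minded, the four choices described above can be sketched in Python. This is only a toy model: `Choice`, `OperationFailed` and `run_with_prompt` are names I made up, `OSError` stands in for the critical error, and nothing here is DOS's actual interface.

```python
from enum import Enum


class Choice(Enum):
    ABORT = "A"
    RETRY = "R"
    IGNORE = "I"
    FAIL = "F"


class OperationFailed(Exception):
    """Raised for FAIL: the caller receives an error it can recover from."""


def run_with_prompt(operation, choices):
    """Run `operation`; on each OSError, act on the next answer in `choices`.

    `choices` is an iterable of Choice values standing in for the user's
    key presses (a hypothetical stand-in for the interactive prompt).
    """
    answers = iter(choices)
    while True:
        try:
            return operation()
        except OSError as err:
            answer = next(answers)
            if answer is Choice.RETRY:
                continue  # attempt the same operation again
            if answer is Choice.IGNORE:
                return None  # report "success"; the result cannot be trusted
            if answer is Choice.FAIL:
                raise OperationFailed(str(err)) from err
            raise SystemExit("aborted")  # ABORT: terminate the whole program
```

Retry only makes sense if something changed between attempts, which is the point of the drive-latch example in the quoted text.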

Interview Questions for Budding Data Scientists for Edureka Blog

I created a list of questions and answers I have seen in data science interviews. Since everyone claims to be an expert in data science, let me assure you that I am still obediently learning new things, with bemused humility, after 12 years.

Interview Questions for Budding Data Scientists

Background: I have been working in analytics since February 2004. From 2004 to 2007 I worked only in the SAS language. From 2008 onwards I worked with both R and SAS, and from 2013 I started working with Python. Since around 2009 we have had the term Big Data, thanks to Hadoop, and since 2013 we have had the term data science. What used to be called just analytics is now called data science, with the variants Big Data Analytics and Business Analytics referring to the same thing. For model-building techniques, we have used the terms predictive analytics, data mining and machine learning to mean roughly the same thing (though they might actually be different). I have an MBA for business training and have written two books on R so far. I am currently writing my third book, this one on Python, for Wiley.
During this journey I have taught, trained and mentored hundreds of budding data scientists, interviewed a few of them, and given a few interviews myself. Based on this experience, here are a few questions to help you clear an entry-level interview for data science roles and become a data scientist. Since data science is itself an intersection of business perspective, statistics and coding, I have labeled the questions in sections accordingly. This will not guarantee that you clear an interview, but learning a few of these questions and answers will definitely increase your probability of clearing the interview process.

http://www.edureka.co/blog/top-data-science-interview-questions-for-budding-data-scientists

I helped create Edureka’s R course in 2013 (after I did it for Jigsaw in 2011 and before I did it for Collabera in 2015). You can see a video of the initial class here, which has gotten 100,000 views.

Edureka remains one of the true believers in customer-centered online education that does not fool young people out of too much money: a mass-market, mass-course approach, but with actual teacher-student interaction rather than the cold, robotic automation of MOOCs.

Related:

Jigsaw Completes training of 300 students on R

http://analyticstraining.com/author/ajay-ohri/

http://www.collaberatact.com/online-training-courses/analytics-with-r-certification/

PYTHON FOR R USERS: Come September

I am writing a new book, on a language that is new to me (Python), for a new publisher (Wiley).

This book is the first of its kind to provide a reference that enables students and practitioners to easily learn to code in Python if they are familiar with R and vice versa, even if they are beginners in the second language. It also provides a detailed introduction and overview of each language to the reader who might be unfamiliar with the other. While R has better statistical and graphical tools, Python has good machine learning tools and proves to be more useful software for the analysis of Big Data. A unique feature of this book is how it provides a command-by-command translation between R and Python for many mathematical, visualization and machine learning techniques. The intended audience is statistical practitioners and data scientists trying to learn one of R or Python or both, as well as students that are familiar with one of the languages.
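
To give a flavour of what a command-by-command translation can look like, here is a small sketch. It is not an excerpt from the book; the data and variable names are made up, and it uses only Python's standard library, with the R command I would reach for shown in each comment.

```python
import statistics

# R: x <- c(2, 4, 4, 4, 5, 5, 7, 9)
x = [2, 4, 4, 4, 5, 5, 7, 9]

# R: length(x)
n = len(x)

# R: mean(x)
mean_x = statistics.mean(x)

# R: median(x)
median_x = statistics.median(x)

# R: sd(x)  (sample standard deviation, dividing by n - 1)
sd_x = statistics.stdev(x)

# R: rev(x)
reversed_x = list(reversed(x))

# R: seq(1, 10, by = 2)
odds = list(range(1, 11, 2))

print(n, mean_x, median_x, sd_x, reversed_x, odds)
```

Note the off-by-one in the last line: R's `seq` includes its upper bound when the step lands on it, while Python's `range` always excludes the stop value.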

http://www.amazon.co.uk/Python-R-Users-Ajay-Ohri/dp/1119126762

Coming to Bay Area in April

Despite my visa blues (see more at https://todayilearnedinamerica.wordpress.com/2016/02/15/night-13-make-epic-shit/ ), I am still hanging on and traveling on in the United States of America. I might have posted some blog posts here by mistake, but I corrected that today, since I am traveling on an Amtrak pass and I don't have Wi-Fi there.

I am also going to TWO of the best conferences that I have never attended, despite being a blog partner for the past three years.

Predictive Analytics World San Francisco – April 3-7, 2016: http://www.predictiveanalyticsworld.com/sanfrancisco/2016/

Predictive Analytics World for Workforce – April 3-6, 2016: http://www.predictiveanalyticsworld.com/workforce/2016/

Do you want to go to San Francisco for this conference?

MAGIC CODE TO USE: You can get 15% off the price of registration for 2 Day and Combo passes: AJAYBP16