Why does the American government need better data science than it currently gets?

  1. Government spends a lot of money tackling the toughest, least profitable problems.
  2. Government has trouble recruiting the best hackers, computer scientists and statisticians (the data science community), as they generally earn far more in the private sector for far easier problems (which ad do I want them to click).
  3. Private companies in the USA can outsource or hire H-1B visa workers for analytical needs, while the US government has to rely on US citizen data scientists even for small non-sensitive work like calculating subsidies for factory farms at the Department of Agriculture.
  4. Meanwhile, the budget for IT digitization for electronic government and data science is quite small.
  5. Government has a lot more bureaucracy and lacks speed in getting things done, which is a big turn-off for companies trying to become new data science vendors, leaving the field to pricey players like AH BEE HUMM (http://www.youtube.com/watch?v=25QyCxVkXwQ).
  6. In election season, data scientists are in even shorter supply, as they work on analyzing and calculating the odds of winning states, or work in teams for candidates (political parties in the US pay everyone working for them by cheque).
  7. Information security is one more area where government lacks an adequate recruitment strategy.
  8. Hackers have a general aversion to working for any government (for less salary) unless they are given equity. Here, the success of companies like INQTEL (https://www.iqt.org/) can be replicated not just for intelligence but for other departments as well by startup funds in hacker
  9. Software interfaces need to be updated for better data visualization and analytical communication across departments.
  10. More money can be invested in training existing federal employees in analytics, an analytical way of thinking, or even the basics of data science.

One more note: the US government can repair its relationship with the hacker activist community even through small courtesies and track 2 diplomacy. That can help not just with business-as-usual data science (like where rain is going to fall in Florida and Louisiana, for the Department of Agriculture) but also with special areas of mutual concern (identifying hateful events through crowdsourced intelligence across public social media data).

Much ado about nothing

P Values have now become controversial. P here does not stand for President Trump but this.

After 150 Years, the ASA Says No to p-values

https://matloff.wordpress.com/2016/03/07/after-150-years-the-asa-says-no-to-p-values/

Sadly, the concept of p-values and significance testing forms the very core of statistics. A number of us have been pointing out for decades that p-values are at best underinformative and often misleading. Almost all statisticians agree on this, yet they all continue to use it and, worse, teach it. I recall a few years ago, when Frank Harrell and I suggested that R place less emphasis on p-values in its output, there was solid pushback. One can’t blame the pusherbackers, though, as the use of p-values is so completely entrenched that R would not be serving its users well with such a radical move.


The American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” with six principles underlying the proper use and interpretation of the p-value [http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN]. The ASA releases this guidance on p-values to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research. The statement also notes that the increased quantification of scientific research and a proliferation of large, complex data sets has expanded the scope for statistics and the importance of appropriately chosen techniques, properly conducted analyses, and correct interpretation.

 

I personally think Big Data needs Bigger Thinking among statisticians about a new era of inference.
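One reason I say this: the ASA statement's principles include the point that a p-value does not measure the size of an effect. Here is a rough sketch of what that means with big data. The coin-flip numbers and the function name `two_sided_p_value` are my own made-up illustration (not from the ASA statement), and it uses a simple normal-approximation z-test for a proportion.

```python
import math

def two_sided_p_value(successes, n, p0=0.5):
    """Two-sided p-value for a one-sample proportion z-test (normal approximation)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: the same tiny effect (50.1% heads vs 50%) at two sample sizes
small_sample_p = two_sided_p_value(501, 1_000)
big_data_p = two_sided_p_value(5_010_000, 10_000_000)

print(f"n = 1,000      p-value = {small_sample_p:.4f}")
print(f"n = 10,000,000 p-value = {big_data_p:.2e}")
```

The effect is identical in both cases, but with ten million rows it comes out "highly significant", which is roughly why sample size alone can make a p-value look impressive.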

However as always these guys are the best

Interview Questions for Budding Data Scientists for Edureka Blog

I created a list of questions and answers I have seen in data science interviews. Since everyone claims to be an expert in data science, let me assure you that I am obediently learning new things with bemused humility after 12 years.

Interview Questions for Budding Data Scientists

Background: I have been working in analytics since February 2004. From 2004 to 2007 I worked only in the SAS language. From 2008 onwards I started working with both R and SAS. From 2013 I started working with Python. Since around 2009 we have had a term called Big Data, thanks to Hadoop, and since 2013 we have had a term called data science. What used to be called just analytics is now called data science, with the added variants of Big Data Analytics and Business Analytics referring to the same thing. For techniques in building models, we have used the terms predictive analytics, data mining and machine learning to mean roughly the same thing (though actually they might be different). I have an MBA for business training and have written two books on R by now. I am currently writing my third book, this time on Python, for Wiley.
During this journey I have taught, trained and mentored hundreds of budding data scientists, and also interviewed a few of them, while giving a few interviews myself. Based on this experience, here are a few questions to help you clear an entry-level interview for data science roles and become a data scientist. Since data science itself is an intersection of business perspective, statistics and coding, I have labeled the questions accordingly in specific sections. This will not guarantee that you clear an interview, but learning a few of these questions and answers will definitely help increase the probability of you clearing the interview process.

http://www.edureka.co/blog/top-data-science-interview-questions-for-budding-data-scientists

I helped create Edureka’s R course in 2013 (after I did one for Jigsaw in 2011 and before I did one for Collabera in 2015). You can see a video of the initial class here, which has gotten 100,000 views.

Edureka remains one of the true believers in customer-centered online education that does not fool young people out of too much money, taking a mass-market, mass-course approach with actual teacher-student interaction rather than the cold robotic automation of MOOCs.


Related:

Jigsaw Completes training of 300 students on R

http://analyticstraining.com/author/ajay-ohri/

http://www.collaberatact.com/online-training-courses/analytics-with-r-certification/

 

 

 

 

 

PYTHON FOR R USERS: Come September

I am writing a new book on a new language for me (Python) for a new publisher (Wiley).

 

This book is the first of its kind to provide a reference that enables students and practitioners to easily learn to code in Python if they are familiar with R and vice versa, even if they are beginners in the second language. It also provides a detailed introduction and overview of each language to the reader who might be unfamiliar with the other. While R has better statistical and graphical tools, Python has good machine learning tools and proves to be more useful software for the analysis of Big Data. A unique feature of this book is how it provides a command-by-command translation between R and Python for many mathematical, visualization and machine learning techniques. The intended audience is statistical practitioners and data scientists trying to learn one of R or Python or both, as well as students that are familiar with one of the languages.

http://www.amazon.co.uk/Python-R-Users-Ajay-Ohri/dp/1119126762
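To give a flavour of what a command-by-command translation can look like, here is a small sketch of my own (it is not an excerpt from the book). Each R command is shown as a comment above a Python equivalent, using only the Python standard library; the sample vector is made up.

```python
import statistics

# R: x <- c(2, 4, 4, 5, 7)
x = [2, 4, 4, 5, 7]

# R: length(x)
n = len(x)

# R: mean(x)
mean_x = statistics.mean(x)

# R: median(x)
median_x = statistics.median(x)

# R: sd(x)   (sample standard deviation, n - 1 in the denominator)
sd_x = statistics.stdev(x)

# R: rev(x)
reversed_x = list(reversed(x))

# R: seq(1, 10, by = 2)
odd_numbers = list(range(1, 10, 2))

print(n, mean_x, median_x, round(sd_x, 4), reversed_x, odd_numbers)
```

In practice most Python users would reach for numpy or pandas for this kind of work; the standard library is used here only so the sketch runs without installing anything.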

 

Coming to Bay Area in April

Despite my visa blues (see more at https://todayilearnedinamerica.wordpress.com/2016/02/15/night-13-make-epic-shit/ ) I am still hanging on and traveling on in the United States of America. I might have posted some blog posts here by mistake, but I corrected that today; I am traveling on an Amtrak pass and I don't have wifi there.

I am also going to TWO of the best conferences I have never attended, despite being a blog partner for the past three years.

Predictive Analytics World San Francisco – April 3-7, 2016 http://www.predictiveanalyticsworld.com/sanfrancisco/2016/

Predictive Analytics World for Workforce – April 3-6, 2016 http://www.predictiveanalyticsworld.com/workforce/2016/

Do you want to go to San Francisco for this conference?

MAGIC CODE TO USE: You can get 15% off the price of registration for 2 Day and Combo passes: AJAYBP16
