
Month: March 2016
Why I would always love freedom more than criticize
Why does the American government need better data science than it currently has?
- Government spends a lot of money tackling the toughest, least profitable problems
- Govt has trouble recruiting the best hackers, computer scientists and statisticians (the data science community), as they generally earn much higher salaries in the private sector for far easier problems (which ad do I want them to click?)
- Private companies in the USA can also outsource or hire H1 visa workers for analytical needs, while the USA government has to rely on US citizen data scientists even for small non-sensitive departments, like calculating subsidies for factory farms for the Department of Agriculture.
- Meanwhile, the budget for IT digitization for electronic government and data science is quite small
- Govt has a lot more bureaucracy and lacks the speed to get things done, which is a big turn-off for companies trying to become new data science vendors, thus leaving the upper hand to pricey players like AH BEE HUMM
- http://www.youtube.com/watch?v=25QyCxVkXwQ
- In election season, data scientists are in even shorter supply, as they work on analyzing and calculating the odds of winning states, or even work in teams for candidates (political parties pay everyone working for them by cheque in the US)
- Information security is one more area where the government lacks an adequate recruitment strategy
- Hackers have a general aversion to working for any government (for less salary) unless they are endowed with equity (here the success of companies like INQTEL (https://www.iqt.org/) can be replicated not just for Intelligence but for other departments as well by startup funds in hacker
- Software interfaces need to be updated for better data visualization and analytical communication across departments
- More money can be invested in training existing Federal employees in analytics, analytical ways of thinking, or even the basics of data science
One more note: the US government can repair its relationship with the hacker activist community even through small courtesies and track 2 diplomacy. That can help not just with business-as-usual data science (like where rain is going to fall in Florida and Louisiana, for the Department of Agriculture) but also with special areas of mutual concern (identifying hateful events through crowdsourced intelligence across public social media data).
Because Today It is Friday
Much ado about nothing
P-values have now become controversial. P here does not stand for President Trump but for this.
https://matloff.wordpress.com/2016/03/07/after-150-years-the-asa-says-no-to-p-values/
Sadly, the concept of p-values and significance testing forms the very core of statistics. A number of us have been pointing out for decades that p-values are at best underinformative and often misleading. Almost all statisticians agree on this, yet they all continue to use it and, worse, teach it. I recall a few years ago, when Frank Harrell and I suggested that R place less emphasis on p-values in its output, there was solid pushback. One can’t blame the pusherbackers, though, as the use of p-values is so completely entrenched that R would not be serving its users well with such a radical move.
The American Statistical Association (ASA) has released a “Statement on Statistical Significance and P-Values” with six principles underlying the proper use and interpretation of the p-value [http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN]. The ASA releases this guidance on p-values to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research. The statement also notes that the increased quantification of scientific research and a proliferation of large, complex data sets has expanded the scope for statistics and the importance of appropriately chosen techniques, properly conducted analyses, and correct interpretation.
I personally think Big Data needs Bigger Thinking among statisticians about a new era of inference.
However as always these guys are the best


Algorithm to deal with a broken heart
- Abort (A): Terminate the operation/program and return to the system command prompt.[2] In hindsight this was not a good idea as the program would not do any cleanup (such as completing writing of other files). “Abort” was necessary because early DOS did not implement “Fail”. It may have remained necessary for poorly written software for which “Fail” would have caused a loop that would have repeatedly invoked the critical error handler with no other way to exit.
- Retry (R): DOS would attempt the operation again.[2] “Retry” made sense if the user could rectify the problem. To continue the example above, if the user simply forgot to close the drive latch, they could close it, retry, and the system would continue where it left off.
- Ignore (I) (older versions of DOS): Return success status to the calling program/routine, despite the failure of the operation.[2] For instance, a disk read error could be ignored and DOS would return whatever data was in the read buffer, which might contain some of the correct data from the disk. Attempting to use results after an “Ignore” was an undefined behavior.[2] “Ignore” did not appear in cases where it was impossible for the data to be used; for instance, a missing disk could not be ignored because that would require DOS to construct and return some kind of file descriptor that worked in further “read” calls. This is not available if DOS cannot read any sector from the first sector of a floppy disk or a partition of a hard disk to the last sector of the root directory.
- Fail (F) (DOS 3.3 and later): Return failure status to the calling program/routine.[2] “Fail” returned an error code to the program, similar to other errors such as file not found. The program could then gracefully recover from the problem.
from https://en.wikipedia.org/wiki/Abort,_Retry,_Fail%3F
In the lines above replace DOS with LOVER, and you have the algorithm
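For the curious, the four choices above can be sketched as a tiny handler in Python. This is only an illustration under my own assumptions, not how DOS actually implemented it: the names `critical_error_handler`, `choose`, `partial_buffer` and `Aborted` are all hypothetical, and a Python `OSError` stands in for a DOS critical error.

```python
from typing import Callable, Optional


class Aborted(Exception):
    """Hypothetical stand-in for 'Abort': the program stops, no cleanup."""


def critical_error_handler(
    operation: Callable[[], bytes],
    choose: Callable[[Exception], str],
    partial_buffer: bytes = b"",
) -> Optional[bytes]:
    """Run `operation`; on OSError ask `choose` for A, R, I or F."""
    while True:
        try:
            return operation()
        except OSError as err:
            # Keep asking until we get one of the four valid answers.
            while True:
                choice = choose(err).strip().upper()
                if choice in ("A", "R", "I", "F"):
                    break
            if choice == "A":
                # Abort: terminate without giving the caller a chance to recover.
                raise Aborted("terminated without cleanup") from err
            if choice == "R":
                # Retry: attempt the same operation again.
                continue
            if choice == "I":
                # Ignore: report success and hand back whatever is in the buffer.
                return partial_buffer
            # Fail: pass the error to the caller, who may recover gracefully.
            raise


if __name__ == "__main__":
    attempts = {"n": 0}

    def read_drive() -> bytes:
        # Fails once (drive latch open), then succeeds after a retry.
        attempts["n"] += 1
        if attempts["n"] < 2:
            raise OSError("Not ready reading drive A")
        return b"hello"

    print(critical_error_handler(read_drive, lambda err: "R"))
```

Passing `lambda err: input("Abort, Retry, Ignore, Fail? ")` as `choose` would give the interactive prompt; the callback is there only so the sketch can be run without a keyboard.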
Internship Experience of working in DecisionStats
A DecisionStats intern shares his experience. PS: he got a job as a data scientist after the internship 🙂