How to help your government keep the world safe using statistics #rstats #python #sas

Big Data for Big Brother. Now playing. At a computer near you. How to help water the tree of liberty using statistics?

Use R

 

or

Use Python

 


or use SAS software

SAS/CIA, from the last paragraph of ET_CD_Mumbai_Jul12.pdf


 

Thoughts on Guardian’s expose of US Govt Data Mining

I am writing this sitting in Canada, in a language given to me by my education and colonial history. Some of my best friends and mentors have been Americans, Europeans and Asians. I will try to state this as objectively as possible.

  1. If China’s government sees your data, or even the data of dissidents, it is considered bad, but if the US government does it, it is considered okay. Is that really okay? I would trust the SCOTUS, the IRS and the Congress, but I am not sure I should trust the Pentagon, the NSA and the White House with zero restraints.
  2. In reality, no human eye can see so much data. They have algorithms running for text mining, and automated programs that let human analysts zoom in if required. The US government is presumably not interested in your dating life, but the data is there.
  3. Economic espionage has been traditional tradecraft of Western policy, from borrowing gunpowder and silk from China to Operation Paperclip, which gave Nazis pardons in return for work on US space programs.
  4. The USA has a long tradition and policy of government and defense working with the private sector to give it economic advantages. The Internet itself grew out of DARPA-funded research, with later political backing from Al Gore.
  5. With the new challenges of climate change, economic rivalry and diminishing energy resources, should the US government be trusted with almost 80% of the data flowing through the English-speaking Internet?
  6. The collection of data from non-American citizens effectively makes this an undeclared cyber-war that the Obama government is waging against the world.
  7. If Albert Einstein could protest nuclear weapons 60 years ago, then as data scientists it is our community’s duty to make clear the rules of engagement for data collection and data mining, before we get into one more cyber cold war.


 

I am disappointed with Microsoft, Yahoo, Google, Facebook, PalTalk, AOL, Skype, YouTube, Apple for not even once protesting this move.

I would also like to know what this monitoring has cost over the past 7 years, and how many threats were neutralized (and why the Boston bombers and others could not be).

I would like to know whether data belonging to members of the US Congress was collected, or purged from the NSA’s records; whether there are any exclusion criteria for people; or whether data was collected on everybody.

Read more here- http://www.guardian.co.uk/world/2013/jun/06/us-tech-giants-nsa-data

To my intense horror, it seems Julian Assange was right about Eric Schmidt.

There are NO ways of making money that are NOT evil.

Tatvic bets on R

Tatvic, an up-and-coming startup founded by an ex-Trilogy colleague, has helped with the R for Google Analytics package. While Tatvic is into heavy-duty web analytics, they are betting big on R and using it for web analytics. David Smith, most excellent blogger-in-chief of the R universe, has blogged about them before here: http://blog.revolutionanalytics.com/2013/02/analyze-web-traffic-data-with-google-analytics-and-r.html

Here is an upcoming seminar on R in Web Analytics.


From this webinar, you will get to know:

  • What is R and why should you use this tool? How to extract your Web Analytics data into R?
  • How to build a predictive model using web analytics data with the help of R?
  • How predictive modelling can take your analysis to the next level?
  • How to carry out insightful analysis through visualization?

Who should attend: Every web analyst who wants to take his analysis to the next level.

P.S. Hat tip to Caroline A.

6 weeks Data Scientist Online Courses #rstats

I am hosting a six-weekend live online certification course on Business Analytics with R, starting June 1 at Edureka. Check www.edureka.in/r-for-analytics for more details. The course has been designed to offer more open data science than current expensive offerings, which are tech-oriented rather than business-oriented, but with more support and customization than a MOOC. This is because many business customers don’t care if it is lapply or ddply, or command line or GUI, as long as they get a good ROI on the time and money spent in shifting to R from other analytics software.


 

 

R for Business Analytics now in Chinese

Email from Springer to me-

—————————————————————————————————————————————————————

Dear Dr. Ohri,

Springer SBM is pleased to inform you that we have concluded a contract for the Chinese translation of your book:
R for Business Analytics
Edition Number: 1 (2013)
Mr. A Ohri
978-1-4614-4342-1

We trust you are as enthusiastic about this opportunity to distribute your book as we are.

The publisher of the translation is: Xi’an Jiaotong University Press.

The financial conditions we agreed upon are:
A flat fee of EUR XXXX  for 3,000 copies, payable upon conclusion of the agreement but not later than 60 days thereafter. (Please note that this fee is subject to tax deductions of 15.77%, imposed by the licensee’s country.)

Upon receiving the payment, we will ask our accounts department to transfer your shares to you, according to your contract with Springer. The share will then be shown on the next royalty statement you’ll receive.

Upon publication you would receive 4 complimentary copies.

In case you have further questions, please do not hesitate to contact us.

With best wishes,

XXXXX

Springer
Rights and Permissions

—————————————————-

For the book in English- see right margin!

So many R Packages Everywhere, which one do I use? #rstats

Some thoughts on R Packages

  • CRAN is no longer the sole repository for many useful R packages. Other sources include R-Forge, Google Code and, increasingly, GitHub.
  • CRAN lacks the flexibility and social aspect of GitHub.
  • CRAN Task Views is the only subject-wise listing of R packages. The categorization, however, is done more by method than by use case or business domain.
  • There are multiple R packages for the same thing. Which one do I use? Only Stack Overflow helps with that. There is no rating and no recommendation system.
  • The "suggested packages" feature of an R package needs better, automatic association analysis. Right now it is manual and depends on the package author and maintainer.
  • Quis custodiet ipsos custodes? Who guards the guardians of R packages? In an era of cyber security, we need better transparency on security measures within R packages, especially given the international nature of the project. I am very sure that I (or anyone) can create R code to communicate discreetly, especially on Windows.
  • I would rather not install anything on my local machine, and instead read the package directly from CRAN. CRAN was designed in an era of low bandwidth, and this needs to be upgraded.
  • Note that I am respectfully refraining from commenting on the atrocious aesthetics of the project’s home website. Many statisticians see no use in making R user-friendly. My professors at the University of Tennessee (from which I dropped out after two semesters) were horrified when I took courses in graphic design, as I wanted to know more about the A and the B that make up the A/B testing of statistical design. Now that I am getting older, I am horrified by the lack of HTML, CSS and jQuery skills among some of the brightest programmers in this project.
  • Please comment below.
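
To make the "automatic association analysis" idea above concrete, here is a minimal sketch in Python. It assumes we somehow had a record of which packages are used together in real projects. The `PROJECTS` data, the `pair_rules` and `suggest` helpers, and the `min_count` threshold are all hypothetical, made up for illustration and not taken from CRAN or GitHub. The ranking uses the standard lift and confidence measures for pairwise association rules, which is only one of several ways this could be done.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set is the packages one project loads together.
# These are invented for illustration, not measured from any repository.
PROJECTS = [
    {"ggplot2", "plyr", "reshape2"},
    {"ggplot2", "plyr"},
    {"ggplot2", "reshape2"},
    {"tm", "wordcloud"},
    {"tm", "wordcloud", "ggplot2"},
]


def pair_rules(projects, min_count=2):
    """Return (x, y, support, confidence, lift) for rules 'uses x -> uses y'."""
    n = len(projects)
    # How many projects use each package.
    single = Counter(pkg for proj in projects for pkg in proj)
    # How many projects use each unordered pair of packages.
    pairs = Counter(
        frozenset(combo)
        for proj in projects
        for combo in combinations(sorted(proj), 2)
    )
    rules = []
    for pair, count in pairs.items():
        if count < min_count:
            continue  # too rare to say anything about
        a, b = sorted(pair)
        for x, y in ((a, b), (b, a)):
            support = count / n
            confidence = count / single[x]        # P(y | x)
            lift = confidence / (single[y] / n)   # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    # Strongest association first; names break ties deterministically.
    return sorted(rules, key=lambda r: (-r[4], -r[3], r[0], r[1]))


def suggest(package, projects, top=3):
    """Suggest up to `top` packages commonly used alongside `package`."""
    return [y for x, y, *_ in pair_rules(projects) if x == package][:top]


if __name__ == "__main__":
    for pkg in ("tm", "ggplot2"):
        print(pkg, "->", suggest(pkg, PROJECTS))
```

With real usage data, the same counts could drive a "people who use this package also use" list instead of relying only on what each maintainer lists by hand.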