Self Driving Cars, Geo Coded Ads, End of Privacy

Imagine a world in which your car tracks everywhere you go. Over time, it builds up a database of your driving habits: how long you stay at particular kinds of dining places and entertainment places (ahem!), and the days and times you visit them. You can no longer go to massage parlours without your data being checked by your car software admin (read: your home admin).

And that data is mined using machine learning algorithms to give you better ads for pizzas, or a reminder for food every 3 hours, or an ad for beer every Thursday after 8 pm.

Welcome Brave New World!

How to learn to be a hacker easily

1) Are you sure? It is tough to be a hacker. And football players get all the attention.

2) Really? Read on

3) Read Hacker’s Code

http://muq.org/~cynbe/hackers-code.html

The Hacker’s Code

“A hacker of the Old Code.”

  • Hackers come and go, but a great hack is forever.
  • Public goods belong to the public.*
  • Software hoarding is evil.
    Software does the greatest good given to the greatest number.
  • Don’t be evil.
  • Sourceless software sucks.
  • People have rights.
    Organizations live on sufferance.
  • Governments are organizations.
  • If it is wrong when citizens do it,
    it is wrong when governments do it.
  • Information wants to be free.
    Information deserves to be free.
  • Being legal doesn’t make it right.
  • Being illegal doesn’t make it wrong.
  • Subverting tyranny is the highest duty.
  • Trust your technolust!

4) Read How To Become A Hacker by Eric Steven Raymond

http://www.catb.org/~esr/faqs/hacker-howto.html

or just get the Hacker Attitude

The Hacker Attitude

1. The world is full of fascinating problems waiting to be solved.
2. No problem should ever have to be solved twice.
3. Boredom and drudgery are evil.
4. Freedom is good.
5. Attitude is no substitute for competence.
5) If you are tired of reading English, maybe I should move on to technical stuff
6) Create your hacking space, a virtual disk on your machine.
You will need to learn a bit of Linux. If you are a Windows user, I recommend creating a VMware virtual machine running Ubuntu.
If you like Mac, I recommend the more aesthetic Linux Mint.
How to create your virtual disk:
Download VMware Player here
http://www.vmware.com/support/product-support/player/
Download the ISO image of the operating system here
http://ubuntu.com
Downloading is the longest step in this exercise.
Now just follow the instructions here
http://www.vmware.com/pdf/vmware_player40.pdf
Or, if you want to experiment with other ways to use Windows and Linux together, just read this
http://www.decisionstats.com/ways-to-use-both-windows-and-linux-together/
Moving data back and forth between your new virtual disk and your old real disk:
http://www.decisionstats.com/moving-data-between-windows-and-ubuntu-vmware-partition/
7) Get Tor to hide your IP address when on the internet
https://www.torproject.org/docs/tor-doc-windows.html.en
8a) Block ads using the Adblock Plus plugin when surfing the internet (like 14.95 million other users)
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/
8b) and use MafiaaFire to reach elusive websites
https://addons.mozilla.org/en-US/firefox/addon/mafiaafire-redirector/
9) Get a BitTorrent client at http://www.utorrent.com/
This will help you download stuff
10) Hacker Culture Alert-
This instruction is purely for sharing the culture but not the techie work of being a hacker
The website The Pirate Bay acts like a search engine for BitTorrent files
http://thepiratebay.se/
Visiting it is considered bad since you can get lots of music, videos, movies etc for free, without paying copyright fees.
The website 4chan is considered a place to meet other hackers. The site can be visually shocking.
http://boards.4chan.org/b/
You need to at least set up these systems, read the websites, and come back in N months' time for the second part in this series on how to learn to be a hacker. That will be the coding part.
END OF PART 1
Update: sorry, the next part has been a bit delayed. Will post soon.

Sanskrit for Human Resource Management

So I picked up more Sanskrit during my stay in Goa at the Tantra http://www.decisionstats.com/tantra-anjuna/

Things to do- or Aims of Human Life

Dharam – Planning, Duty and Responsibilities
Karam – Executing Actions
Artha – Monetary Gains through Planning and Executing
Kama – Desires and Pleasure Seeking
Moksha – Achieving Self Actualization

Things to Control-

http://en.wikipedia.org/wiki/Five_Evils

Instead of the 7 sins of Western thought, there are only 5 evils in Sanskrit. These evils are also correlated: if you control one too much, the other evils will rise.
Kam – Your Lusts or Desires
Krodha – Your Anger
Madh – Your Pride
Lobh – Your Greed for Monetary Satisfaction
Moh – Your affection and love and attachments

 

Also related-

Sanskrit for Motivation

http://www.decisionstats.com/strategic-tactics-in-sanskrit/

Indian Societal Hierarchy

http://www.decisionstats.com/economic-indian-caste-system-simplification/

 

 

Interview Kelci Miclaus, SAS Institute Using #rstats with JMP

Here is an interview with Kelci Miclaus, a researcher working with the JMP division of the SAS Institute, in which she demonstrates examples of how the R programming language is a great hit with JMP customers who like to be flexible.

 

Ajay- How has JMP been using integration with R? What has been the feedback from customers so far? Is there a single case study you can point out where the combination of JMP and R was better than any one of them alone?

Kelci- Feedback from customers has been very positive. Some customers are using JMP to foster collaboration between SAS and R modelers within their organizations. Many are using JMP’s interactive visualization to complement their use of R. Many SAS and JMP users are using JMP’s integration with R to experiment with more bleeding-edge methods not yet available in commercial software. It can be used simply to smooth the transition with regard to sending data between the two tools, or used to build complete custom applications that take advantage of both JMP and R.

One customer has been using JMP and R together for Bayesian analysis. He uses R to create MCMC chains and has found that JMP is a great tool for preparing the data for analysis, as well as displaying the results of the MCMC simulation. For example, the Control Chart platform and the Bubble Plot platform in JMP can be used to quickly verify convergence of the algorithm. The use of both tools together can increase productivity since the results of an analysis can be achieved faster than through scripting and static graphics alone.

I, along with a few other JMP developers, have written applications that use JMP scripting to call out to R packages and perform analyses like multidimensional scaling, bootstrapping, support vector machines, and modern variable selection methods. These really show the benefit of interactive visual analysis coupled with modern statistical algorithms. We’ve packaged these scripts as JMP add-ins and made them freely available on our JMP User Community file exchange. Customers can download them and now employ these methods as they would a regular JMP platform. We hope that our customers familiar with scripting will also begin to contribute their own add-ins so a wider audience can take advantage of these new tools.

(see http://www.decisionstats.com/jmp-and-r-rstats/)

Ajay- Are there plans to extend JMP integration with other languages like Python?

Kelci- We do have plans to integrate with other languages and are considering integrating with more based on customer requests. Python has certainly come up and we are looking into possibilities there.

Ajay- How is R a complementary fit to JMP’s technical capabilities?

Kelci- R has an incredible breadth of capabilities. JMP has extensive interactive, dynamic visualization intrinsic to its largely visual analysis paradigm, in addition to a strong core of statistical platforms. Since our brains are designed to visually process pictures and animated graphs more efficiently than numbers and text, this environment is all about supporting faster discovery. Of course, JMP also has a scripting language (JSL) that allows you to incorporate SAS code and R code, and to build analytical applications that let users who don’t code, or who don’t want to code, leverage SAS, R and other applications.

JSL is a powerful scripting language on its own. It can be used for dialog creation, automation of JMP statistical platforms, and custom graphic scripting. In other ways, JSL is very similar to the R language. It can also be used for data and matrix manipulation and to create new analysis functions. With the scripting capabilities of JMP, you can create custom applications that provide both a user interface and an interactive visual back-end to R functionality. Alternatively, you could create a dashboard using statistical and/or graphical platforms in JMP to explore the data and with the click of a button, send a portion of the data to R for further analysis.

Another JMP feature that complements R is the add-in architecture, which is similar to how R packages work. If you’ve written a cool script or analysis workflow, you can package it into a JMP add-in file and send it to your colleagues so they can easily use it.

Ajay- What is the official view on R from your organization? Do you think it is a threat, or a complementary product, or another statistical platform that coexists with your offerings?

Kelci- Most definitely, we view R as complementary. R contributors are providing a tremendous service to practitioners, allowing them to try a wide variety of methods in the pursuit of more insight and better results. The R community as a whole is providing a valued role to the greater analytical community by focusing attention on newer methods that hold the most promise in so many application areas. Data analysts should be encouraged to use the tools available to them in order to drive discovery and JMP can help with that by providing an analytic hub that supports both SAS and R integration.

Ajay- While you do use R, are there any plans to give back something to the R community in terms of your involvement and participation (say at useR events) or sponsoring contests?

 Kelci- We are certainly open to participating in useR groups. At Predictive Analytics World in NY last October, they didn’t have a local useR group, but they did have a Predictive Analytics Meet-up group comprised of many R users. We were happy to sponsor this. Some of us within the JMP division have joined local R user groups, myself included.  Given that some local R user groups have entertained topics like Excel and R, Python and R, databases and R, we would be happy to participate more fully here. I also hope to attend the useR! annual meeting later this year to gain more insight on how we can continue to provide tools to help both the JMP and R communities with their work.

We are also exploring options to sponsor contests and would invite participants to use their favorite tools, languages, etc. in pursuit of the best model. Statistics is about learning from data and this is how we make the world a better place.

About- Kelci Miclaus

Kelci is a research statistician developer for JMP Life Sciences at SAS Institute. She has a PhD in Statistics from North Carolina State University and has been using SAS products and R for several years. In addition to research interests in statistical genetics, clinical trials analysis, and multivariate analysis/visualization methods, Kelci works extensively with JMP, SAS, and R integration.


 

Facebook IPO- Do you feel lucky?

2 Jan 2011 dealbook.nytimes.com

Facebook has raised $500 million from Goldman Sachs and a Russian investor in a transaction that values the company at $50 billion.

29 Jan 2011 - www.bloomberg.com - $82.9 billion

14 Jun 2011 - CNBC - $100 billion

27 Jun 2011 - news.cnet.com - $70 billion

27 Sep 2011 - Venturebeat.com - $82.5 billion

$100 billion valuation divided by 1,000 million subscribers

= $100 net present value of ad profit per subscriber (note: an $80 billion valuation with 800 million subscribers gives the same figure)

= $250 net present value of ad revenues per subscriber (assuming 40% profitability)

= $2,500 net present value of online purchases per Facebook ad-clicking customer

(assuming advertisers dedicate 10% of revenue to advertising on Facebook)
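
The back-of-envelope chain above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are made up here, and the 40% profitability and 10% advertising share are the assumptions stated above, not reported figures.

```python
def per_subscriber_values(valuation_usd, subscribers,
                          profit_margin=0.40, ad_share_of_sales=0.10):
    """Walk back from valuation to implied ad profit, ad revenue and purchases.

    All three results are per subscriber. The default margin and ad share
    are the post's own assumptions, not measured data.
    """
    ad_profit = valuation_usd / subscribers      # valuation spread over users
    ad_revenue = ad_profit / profit_margin       # profit is 40% of revenue
    purchases = ad_revenue / ad_share_of_sales   # ads are 10% of sales
    return ad_profit, ad_revenue, purchases


profit, revenue, purchases = per_subscriber_values(100e9, 1000e6)
print(f"ad profit per subscriber:  ${profit:,.0f}")
print(f"ad revenue per subscriber: ${revenue:,.0f}")
print(f"purchases per subscriber:  ${purchases:,.0f}")

# An $80 billion valuation with 800 million subscribers gives the same chain.
same = per_subscriber_values(80e9, 800e6)
```

Changing either assumption moves the last number a lot, which is really the point of the exercise.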

And the lucky Russian investor who invested at a $50 billion valuation, only to see it double in six months: where else has he invested?

http://nymag.com/daily/intel/2011/01/facebooks_russian_investor_hel.html

Digital Sky Technologies co-founder Yuri Milner, who co-invested in the Goldman-Facebook deal, enviably poised in the middle. DST has been investing early and aggressively in some of the biggest names in the tech bubble boom like Facebook (DST first invested in May 2009), Zynga (the company that makes Farmville and Cityville for Facebook), and Groupon (the dudes that just turned down Google’s $6 billion).

NOTE: Both Groupon and Zynga IPO investors have lost money, as both stocks are now below their IPO price.

http://openchannel.msnbc.msn.com/_news/2011/01/05/5771129-russian-facebook-investors-have-sparked-us-concerns

More on Digital Sky Tech and Yuri Milner and the free internet in Putin’s Russia

Digital Sky got particular attention because of its broad control of the Russian Internet. DNI noted that the company is “a dominant force in the Runet,” owning the most popular Websites in the former Soviet Union, including Russia, Ukraine, Kazakhstan, Georgia, and Armenia as well as others in the Czech Republic and Poland. By some estimates it reported “over 70 percent of all page views in the Russian-language Internet are on its companies’ Websites.”

 

 

From Wall Street Journal-

May 1, 2011

http://www.zdnet.com/blog/facebook/wsj-facebook-growth-exceeds-expectations-100-billion-valuation-justifiable/1306

Last month, a private-market transaction of 100,000 shares of Facebook Class B Common Stock priced at $32.00 apiece gave the website a valuation of $80 billion. Two months ago, Facebook was valued at $65 billion, when investment firm General Atlantic reportedly bought 0.1 percent of Facebook by purchasing roughly 2.5 million Facebook shares from former Facebook employees. Three months ago, Kleiner Perkins Caufield & Byers (KPCB) invested $38 million in Facebook, which was only worth 0.073 percent of the social network, but still resulted in a valuation of $52 billion.
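
The valuations quoted above all come from the same simple division, which can be checked with a rough Python sketch. The helper names are hypothetical, and the calculation ignores share classes, dilution and anything else a real valuation would consider. Note that $38 million against $52 billion works out to about 0.073 percent of the company.

```python
def stake_fraction(amount_usd, valuation_usd):
    """Fraction of the company that a given investment buys."""
    return amount_usd / valuation_usd


def implied_share_count(valuation_usd, price_per_share_usd):
    """Total shares implied by a valuation and a per-share price."""
    return valuation_usd / price_per_share_usd


# KPCB: $38 million at a $52 billion valuation
kpcb = stake_fraction(38e6, 52e9)
print(f"KPCB stake: {kpcb:.3%}")

# General Atlantic: 0.1 percent of a $65 billion company
general_atlantic_usd = 0.001 * 65e9
print(f"General Atlantic purchase: ${general_atlantic_usd:,.0f}")

# Private-market sale: $32.00 a share at an $80 billion valuation
shares = implied_share_count(80e9, 32.00)
print(f"Implied shares outstanding: {shares:,.0f}")
```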

 

related-

http://techcrunch.com/2011/01/10/facebook-5/

 

Something's gotta give?

Go ahead, please. Buy Facebook stock!

Do you feel lucky?

 

 

 

 

Top 5 XKCD on Data Visualization

By request, an analysis of the top 5 XKCDs on data visualization. Statisticians and data scientists, take note:

1) DOT PLOT

 

2)  LINE PLOTS

3) FLOW CHARTS

4) PIE CHARTS and 5) BAR GRAPHS

I am not going into the really big graphs, of course, like the Star Wars plot data visualization at http://xkcd.com/657/ or the Money Chart at http://xkcd.com/980/, because I don't believe in data visualization to show off, but to keep it simple 🙂

Now I gotta find me a software that can write my blog for me 🙂

Analytics for Cyber Conflict -Part Deux

Part 1 in this series is available at http://www.decisionstats.com/analytics-for-cyber-conflict/

The next articles in this series will cover-

  1. the kinds of algorithms that are currently used or being proposed for cyber conflict, as well as for detection

Cyber Conflict requires some basic elements of the following broad disciplines within Computer and Information Science (besides the obvious disciplines of heterogeneous database types for different kinds of data) –

1) Cryptography – particularly a cryptographic hash function that maximizes the cost and time for an enemy trying to break it.

From http://en.wikipedia.org/wiki/Cryptographic_hash_function

The ideal cryptographic hash function has four main or significant properties:

  • it is easy (but not necessarily quick) to compute the hash value for any given message
  • it is infeasible to generate a message that has a given hash
  • it is infeasible to modify a message without changing the hash
  • it is infeasible to find two different messages with the same hash
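
A small Python sketch makes these properties concrete. It uses SHA-256 from the standard library's hashlib purely as an example of a widely used hash function; the helper name sha256_hex and the sample messages are made up for illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a text message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


digest_a = sha256_hex("attack at dawn")
digest_b = sha256_hex("attack at dusk")  # two characters changed

# Same message in, same hash out: the function is deterministic.
assert sha256_hex("attack at dawn") == digest_a

# A small change to the message gives a completely different-looking hash.
differing = sum(x != y for x, y in zip(digest_a, digest_b))
print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex digits differ")
```

Computing the hash is the easy direction. Going the other way, from a digest back to a message, or finding two messages with the same digest, is the part that is meant to be infeasible.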

A commercial spin-off is to use this to anonymize all customer data stored in any database, such that no database (or data table) that is breached contains personally identifiable information. For example, anonymizing IP addresses and DNS records with a mashup (embedded by default within all browsers) of the Tor and MafiaaFire extensions could help create better information privacy on the internet.

This can also help create better encryption for communication between instant messengers.
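
As a rough sketch of the database idea, a stored identifier such as an IP address can be replaced with a keyed hash (HMAC) before it is written to a table. The function name, the sample addresses and the key handling below are assumptions for illustration only. A key is used because the IPv4 address space is small enough that plain unkeyed hashes could be reversed by trying every address. Strictly speaking this is pseudonymization rather than full anonymization, since whoever holds the key can still link records.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC) of itself."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key; in practice it would live outside the database.
key = secrets.token_bytes(32)

token = pseudonymize("192.168.1.10", key)
same_token = pseudonymize("192.168.1.10", key)
other_token = pseudonymize("192.168.1.11", key)

# The same customer always maps to the same token, so joins and counts still
# work, but the stored value no longer contains the address itself.
print(token == same_token, token == other_token)
```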

2) Data Disaster Planning for Data Storage (but also simulations for breaches), including using cloud computing, time sharing, or RAID for backing up data. This means planning and running an annual (?) exercise for a simulated cyber breach of confidential data, like a cyber audit, similar to an annual accounting audit.

3) Basic Data Reduction Algorithms for visualizing large amounts of information. This can include

  1. K Means Clustering, http://www.jstor.org/pss/2346830 , http://www.cs.ust.hk/~qyang/Teaching/537/Papers/huang98extensions.pdf , and http://stackoverflow.com/questions/6372397/k-means-with-really-large-matrix
  2. Topic Models (LDA) http://www.decisionstats.com/topic-models/,
  3. Social Network Analysis http://en.wikipedia.org/wiki/Social_network_analysis,
  4. Graph Analysis http://micans.org/mcl/ and http://www.ncbi.nlm.nih.gov/pubmed/19407357
  5. MapReduce and Parallelization algorithms for computational boosting http://www.slideshare.net/marin_dimitrov/large-scale-data-analysis-with-mapreduce-part-i
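
To give a flavour of the first item, here is a minimal K Means sketch in plain Python. It is the textbook assign-and-update loop on a toy data set, not the large-matrix or mixed-data variants discussed in the links above; the function name, the sample points and the fixed random seed are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Basic K Means: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

For really large matrices the same loop is usually run on samples or in parallel, which is where item 5 comes in.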

In the next article we will examine

  1. the role of non-state agents as well as state agents competing and cooperating,
  2. and what precautions practitioners of knowledge discovery in databases can employ to avoid breaches of security, ethics, and regulation.