Analytics – Page 129 – DECISION STATS

What do you want to know in data analytics?

I will be posting video responses to the questions you ask at the following Google Moderator page:

http://www.google.com/moderator/#15/e=7217&t=7217.40

So ask away, and I will compile the best questions and reply to them.

All you want to know in data analytics: what do you want to know in data analytics?

[Screenshot of the questions asked so far]

How crowded is the neighborhood?

How crowded is India compared to the United States? Around 11 times as crowded, based on the number of persons per square km.

How crowded is India compared to China? Around 2.5 times as crowded.

– These figures are based on the following procedure:

Data source: http://bit.ly/densityUN. Using the pivotable tables there, I downloaded the data as a CSV file.
I created a new spreadsheet in Google Docs and copied and pasted the data from the CSV file into it.
Using Gadgets, I inserted the Motion Chart gadget, which is based on Hans Rosling’s famous Gapminder bubble chart.
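The "how crowded" ratios above come from dividing one country's population density by another's. Here is a minimal Python sketch of that arithmetic, assuming the downloaded CSV has one row per country with hypothetical column names `Country` and `Density` (persons per square km); the actual UN export may label its columns differently, so pass the real header names in.

```python
import csv
import io


def load_densities(csv_text, country_col="Country", density_col="Density"):
    """Return {country: persons per sq km} parsed from CSV text.

    The default column names are hypothetical; pass the header names
    actually used in the downloaded file.
    """
    densities = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        densities[row[country_col].strip()] = float(row[density_col])
    return densities


def crowding_ratio(densities, country, other):
    """How many times as many persons per sq km `country` has as `other`."""
    return densities[country] / densities[other]
```

After reading the downloaded file's contents into a string, calling `crowding_ratio(load_densities(csv_text), "India", "United States")` should give a figure close to the one above, provided it is the same data and the country names match the file's spelling.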

– Some Thoughts

It is not surprising that most immigration (legal and illegal) flows from high-density countries with stretched resources to lower-density countries with higher standards of living. Generally, smaller countries like Japan, Singapore and Macau (China) also have outlier densities.

– Also, Gapminder’s Adobe AIR desktop application is quite possibly the best application for this. Speaking of which, I hope other Linux application developers can learn from Adobe AIR’s approach to graphics and data visualization.

Graphs

Some graphs from the official graph gallery at sas.com:

http://support.sas.com/sassamples/graphgallery/PROC_G3D_Graph_Types_Plots_Scatter.html

Here is the same graph from the R Graph Gallery:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=10

Which one do you like? Sometimes graphics is about imagination and not just software.

Aster Data Webinar on Analytics/MapReduce

It covers the following usual suspects:

Time-series analysis – Applied to price optimization and fraud detection

Graph analysis – Applied to social networks and physical networks (IT/Telco/Cable)

Behavioral analysis – Applied to clickstream behavior and market basket analysis
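Market basket analysis is perhaps the easiest of these to picture in MapReduce terms. Below is a minimal, generic Python sketch that counts how often pairs of items are bought together, written as a map step and a reduce step. It is not Aster Data's own SQL-MapReduce interface, and the single-process chaining of map output into the reducer stands in for the shuffle a real cluster would perform.

```python
from collections import defaultdict
from itertools import combinations


def map_basket(basket):
    """Map step: emit ((item_a, item_b), 1) for each unordered pair of distinct items."""
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1


def reduce_counts(mapped):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in mapped:
        totals[key] += count
    return dict(totals)


def pair_counts(baskets):
    """Run map over every basket, then reduce; a cluster would shuffle by key in between."""
    mapped = (kv for basket in baskets for kv in map_basket(basket))
    return reduce_counts(mapped)
```

Dividing each pair count by the number of baskets gives that pair's support, the usual starting point for association rules.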


Software – Apps and Bugs

Some time ago (actually a year ago, at https://decisionstats.wordpress.com/2009/04/05/tweets-viruses-and-bubbles/) I wrote about a Twitter application bubble.

The automatic Twitter follow/unfollow (or at least the automated unfollow) was used by the Twitter app Refollow.com, which is quite old, so it was a surprise when Twitter blamed the recent "0 followers, 0 following" problem on a bug that allows automated following. The automated RSS reader is used by Twitterfeed.com, among others. In 2009 I accidentally created (or revealed) a bug with the hashtag #rstats, which Twitter's search engine uses as a search index: I combined a lot of R-related RSS feeds, tagged them with #rstats, and fed them to an alternative Twitter handle (Rarchive). I did the same with #sas on Sascommunity (a handle I later donated, on request, back to that community at sascommunity.org). This temporarily skewed search results for these terms for a day, until Twitter fixed it.

As Twitter evolves from a well-funded startup into a business, and tries to move from chaotic flux to more structure, such bugs will continue to emerge. Bugs, especially software bugs, are meant to be fixed (or squashed). This should by no means be seen as a reflection on the health of the software service (here, Twitter). Indeed, the biggest worry is mainstream software that has no flexibility for creative third-party applications and thinks it is bug-free. Perfect software exists only in a perfect world, and delusional perfection can be dangerous thinking, especially for software with clients (and even more so for statistical software).

Which statistical software are you using, and how confident are you that its bugs are being resolved openly?

The R Online WikiBook

I came across the R Programming Wikibook at http://en.wikibooks.org/wiki/R_Programming

It is surprisingly good: easy to read for a beginner, and a handy, concise reference for intermediate users. Some chapters, like clustering, could do with more support from the community; see http://en.wikibooks.org/wiki/R_Programming/Clustering

See packages class, amap and cluster

See the R bioinformatics page on clustering

References

“The Elements of Statistical Learning”

But I really liked the pages on Graphics, Modeling and Maths (including matrices).

See

http://en.wikibooks.org/wiki/R_Programming/Graphics

and http://en.wikibooks.org/wiki/R_Programming/Linear_Models

I really believe that consolidated, single-book online documentation for R can be achieved, but only if we follow a moderated, wiki-like structure. This could be of great use, since R's online help documents are currently neither concise nor professional-looking (due to the multiple formats and styles of the documentation), and they rarely compare multiple packages. All this has made R books the top-selling statistics books on Amazon, but a project like R deserves at least one comprehensive yet concise online book that can be used readily without wading through all the scattered documentation, a bit like an R Online Doc. This could help with the project's next stage: getting more users comfortable with R.

Any volunteers 🙂 ?
