Comparing BitTorrent Downloaders


I personally like µTorrent on Windows and KTorrent on Linux.

While I am no expert on this, I like anything that gets the data down faster while maximizing my pipe's efficiency.

I also prefer torrenting to the sudo apt-get method of downloading software, or the zip/unzip, tar/untar, install/make routine.

Torrenting is a simpler way of sharing applications, but sadly it is not used much by the stats computing community to share downloads.

Also, I think any dashboard or visualization should be sorted (not alphabetically, but numerically or categorically).

SORT THE DASHBOARD. KEEP IT SORTED.

So I am partially recreating, after sorting, the data viz from http://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clients

BitTorrent client | Magnet URI | Super-seeding | Embedded tracker | UPnP | NAT Port Mapping Protocol | NAT traversal | DHT | Peer exchange | Encryption | UDP tracker | LPD
µTorrent | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
BitSpirit | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No
BitTorrent 6 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
OneSwarm | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No
qBittorrent | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
SoMud | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Vuze (formerly Azureus) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No
BitComet | Yes | Yes | Separate download | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No
Tixati | Yes | Yes | No | Yes | No | No | Yes | Yes | Yes | Yes | Partial
Aria2 | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes
Tribler | Yes | No | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No
Bitflu | Yes | No | No | No | No | No | Yes | Yes | No | Yes | No
Deluge | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Flush | Yes | No | No | Yes | Yes | No | Yes | Yes | No | No | Yes
KTorrent | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Partial
Shareaza | Yes | No | No | Yes | Yes | No | Yes | Yes | No | No | No
Transmission | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes
LimeWire | Partial | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | No
BitTyrant | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No
BitTornado | No | Yes | Yes | Yes | No | No | No | No | Yes | No | No
Torrent Swapper | No | Yes | Yes | Yes | No | No | No | Yes | No | No | No
Localhost | No | Yes | Yes | Yes | No | Yes | Yes | No | No | No | No
Meerkat Bittorrent Client | No | Yes | No | Yes | Yes | Yes | Yes | No | Yes | No | No
rTorrent | No | Yes | No | No | No | No | Yes | Yes | Yes | Yes | No
TorrentFlux | No | Yes | No | Yes | No | No | No | No | Yes | No | No
TorrentVolve | No | Partial | No | Partial | Partial | Partial | Partial | Partial | Partial | Partial | No
Opera | No | No | Yes | No | No | No | No | Yes | No | No | No
BitTorrent 5 / Mainline | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No
ABC | No | No | Yes | Yes | No | No | No | No | No | No | No
Blog Torrent | No | No | Yes | No | No | No | No | No | No | No | No
MLDonkey | No | No | Yes | Yes | Yes | No | No | No | No | Yes | No
Tomato Torrent | No | No | Yes | No | No | No | Yes | No | No | No | No
Acquisition | No | No | No | No | Yes | No | No | No | No | No | No
Arctic Torrent | No | No | No | No | No | No | No | Yes | No | No | No
BitLet | No | No | No | Yes | No | No | No | No | No | No | No
BitLord | No | No | No | Yes | No | Yes | No | Yes | No | Yes | No
BitThief | No | No | No | No | No | No | No | No | No | No | No
Bits on Wheels | No | No | No | No | No | No | No | No | No | No | No
BTG | No | No | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No
BTPD | No | No | No | No | No | No | No | No | No | No | No
FlashGet | No | No | No | No | No | No | Yes | No | Yes | No | No
Folx | No | No | No | Yes | Yes | No | Yes | Yes | No | Yes | No
Free Download Manager | No | No | No | No | No | No | Yes | Yes | No | No | No
G3 Torrent | No | No | No | No | No | No | No | No | No | No | No
Gnome BitTorrent | No | No | No | No | No | No | No | No | No | No | No
Halite | No | No | No | Yes | Yes | No | Yes | No | Yes | No | No
QTorrent | No | No | No | No | No | No | No | No | No | No | No
Rufus | No | No | No | No | No | No | No | No | No | No | No
SymTorrent | No | No | No | N/A | N/A | N/A | No | No | No | No | No
Tonido Torrent | No | No | No | Yes | Yes | Yes | Yes | No | No | No | No
Torium | No | No | No | Yes | No | No | Yes | No | No | No | No
ZipTorrent | No | No | No | Yes | Yes | No | No | Yes | No | No | No
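
For anyone who wants to keep a feature table like this sorted, here is a minimal Python sketch of a categorical sort. It is only an illustration: the `RANK` ordering (Yes before Partial before No), the `sort_key` helper and the five-row, three-feature subset are my own assumptions, not necessarily how the table above was sorted.

```python
# Rank order for cell values: "Yes" sorts first, then "Partial", then "No".
# This ordering is an assumption made for the example.
RANK = {"Yes": 0, "Partial": 1, "No": 2}

# A hand-copied subset of the table:
# (client, Magnet URI, Super-seeding, Embedded tracker)
rows = [
    ("KTorrent", "Yes", "No", "No"),
    ("rTorrent", "No", "Yes", "No"),
    ("LimeWire", "Partial", "Yes", "Yes"),
    ("Aria2", "Yes", "No", "Yes"),
    ("qBittorrent", "Yes", "Yes", "Yes"),
]


def sort_key(row):
    """Sort by each feature column left to right, then by client name."""
    name, *features = row
    # Unknown values (for example "N/A") are ranked after "No".
    return ([RANK.get(value, len(RANK)) for value in features], name.lower())


sorted_rows = sorted(rows, key=sort_key)
for row in sorted_rows:
    print(" | ".join(row))
```

The clients with the most "Yes" values in the leftmost columns float to the top, which is the whole point of sorting categorically rather than alphabetically.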

 

 

 

 

Data Visualization: Central Banks


I am trying to compare the transparency of central banks via the data visualizations of two very different central banks.

One is the Reserve Bank of India and the other is the Federal Reserve Bank of New York.

Here are some points:

1) The New York Fed gives you a huge clutter of charts to choose from, and some of those charts are very difficult to understand.

See http://www.newyorkfed.org/research/global_economy/usecon_charts.html

and http://www.newyorkfed.org/research/directors_charts/us18chart.pdf

2) The Reserve Bank of India chose Business Objects and gives you proper drill-down graphs and tables. (That's a lot of heavy metal and iron ore China needs from India 😉 😉)

Foreign Trade – Export      Time-line: ALL

TIME LINE | COUNTRY | COMMODITY | AMOUNT (US $ MILLION) | EXPORT QUANTITY
2010:07 (JUL) – P | China | IRON ORE (Units: TON) | 205.06 | 1878456
2010:06 (JUN) – P | China | IRON ORE (Units: TON) | 427.68 | 6808528
2010:05 (MAY) – P | China | IRON ORE (Units: TON) | 550.67 | 5290450
2010:04 (APR) – P | China | IRON ORE (Units: TON) | 922.46 | 9931500
2010:03 (MAR) – P | China | IRON ORE (Units: TON) | 829.75 | 13177672
2010:02 (FEB) – P | China | IRON ORE (Units: TON) | 706.04 | 10141259
2010:01 (JAN) – P | China | IRON ORE (Units: TON) | 577.13 | 8498784
2009:12 (DEC) – P | China | IRON ORE (Units: TON) | 545.68 | 9264544
2009:11 (NOV) – P | China | IRON ORE (Units: TON) | 508.17 | 9509213
2009:10 (OCT) – P | China | IRON ORE (Units: TON) | 422.6 | 7691652
2009:09 (SEP) – P | China | IRON ORE (Units: TON) | 278.04 | 4577943
2009:08 (AUG) – P | China | IRON ORE (Units: TON) | 276.96 | 4371847
2009:07 (JUL) | China | IRON ORE (Units: TON) | 266.11 | 4642237
2009:06 (JUN) | China | IRON ORE (Units: TON) | 241.08 | 4584354

Source : DGCI & S, Ministry of Commerce & Industry, GoI

 

You can see the screenshots of the various visualization tools of the New York Fed and the Reserve Bank of India. If the US Fed is serious about cutting the debt, maybe it should start publishing better visuals.

The Year 2010


My annual traffic to this blog was almost 99,000 views. Add in additional views on networking sites plus the 400-plus RSS readers, and I can say traffic was about 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 seconds to read, but the summary has just been given.)

My intent is either to inform you, give you something useful, or at least something interesting.

See below:

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total
2010 | 6,311 | 4,701 | 4,922 | 5,463 | 6,493 | 4,271 | 5,041 | 5,403 | 17,913 | 16,430 | 11,723 | 10,096 | 98,767
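
As a quick sanity check on the monthly figures, here is a small Python sketch that adds them up. The variable names are mine and purely illustrative; the numbers are the ones in the table above.

```python
# Monthly page views for 2010, copied from the table above.
monthly_views = {
    "Jan": 6311, "Feb": 4701, "Mar": 4922, "Apr": 5463,
    "May": 6493, "Jun": 4271, "Jul": 5041, "Aug": 5403,
    "Sep": 17913, "Oct": 16430, "Nov": 11723, "Dec": 10096,
}

# The yearly total should match the "Total" column.
total = sum(monthly_views.values())

# Busiest month, and the views that arrived in the last four months.
busiest = max(monthly_views, key=monthly_views.get)
last_four = sum(monthly_views[m] for m in ("Sep", "Oct", "Nov", "Dec"))

print(total)                               # 98767
print(busiest)                             # Sep
print(round(100 * last_four / total, 1))   # share of the year, in percent
```

More than half of the year's views came in the last four months, which is easy to miss when the numbers sit in a flat row.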

 

 

Sandro Saitta from http://www.dataminingblog.com/ just named me for an award on his blog (but my surname is ohRi; Sandro left me without an R. What would I be without R :)) ).

Aw! I am touched. Google for “Data Mining Blog” and you will see Sandro is the best there is in data mining writing.

DMR People Award 2010
“There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected:

Ajay Ori

He has been very active in 2010, especially on his blog. Good work Ajay and continue sharing your experience with us!”

What did I write in 2010? Stuff.

What did you read on this blog? Well, that's the top posts list.

2009-12-31 to Today

Title | Views
Home page | 21,150
Top 10 Graphical User Interfaces in Statistical Software | 6,237
Wealth = function (numeracy, memory recall) | 2,014
Matlab-Mathematica-R and GPU Computing | 1,946
The Top Statistical Softwares (GUI) | 1,405
About DecisionStats | 1,352
Using Facebook Analytics (Updated) | 1,313
Test drive a Chrome notebook. | 1,170
Top ten RRReasons R is bad for you ? | 1,157
Libre Office | 1,151
Interview Hadley Wickham R Project Data Visualization Guru | 1,007
Using Red R- R with a Visual Interface | 854
SAS Institute files first lawsuit against WPS- Episode 1 | 790
Interview Professor John Fox Creator R Commander | 764
R Package Creating | 754
Windows Azure vs Amazon EC2 (and Google Storage) | 726
Norman Nie: R GUI and More | 716
Startups for Geeks | 682
Google Maps – Jet Ski across Pacific Ocean | 670
Not so AWkward after all: R GUI RKWard | 579
Red R 1.8- Pretty GUI | 570
Parallel Programming using R in Windows | 569
R is an epic fail or is it just overhyped | 559
Enterprise Linux rises rapidly:New Report | 537
Rapid Miner- R Extension | 518
Creating a Blog Aggregator for free | 504
So which software is the best analytical software? Sigh- It depends | 473
Revolution R for Linux | 465
John Sall sets JMP 9 free to tango with R | 460

So how do people come here?

Well, I guess I owe Tal G for almost 9,000 views. (Incidentally, I withdrew my blog from R-Bloggers and AnalyticBridge due to SEO keyword reasons and some spam I was getting; see below.)

http://r-bloggers.com is still the cat's whiskers and I read it a lot.

I still don't know who linked my blog to a free sex movie site with 400 views, but I have a few suspects.

2009-12-31 to Today

Referrer | Views
r-bloggers.com | 9,131
Reddit | 3,829
rattle.togaware.com | 1,500
Twitter | 1,254
Google Reader | 1,215
linkedin.com | 717
freesexmovie.irwanaf.com | 422
analyticbridge.com | 341
Google | 327
coolavenues.com | 322
Facebook | 317
kdnuggets.com | 298
dataminingblog.com | 278
en.wordpress.com | 185
google.co.in | 151
xianblog.wordpress.com | 130
inside-r.org | 124
decisionstats.com | 119
ifreestores.com | 117
bits.blogs.nytimes.com | 108

Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it's a recession).

Advertisers are treated on a First In, Last Out (FILO) basis.

I have been told I am obsessed with SEO, but I don't care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).

Apparently Hadley Wickham and Donald Farmer are big keywords for me, so I should be more respectful, I guess.

Search Terms for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

Search | Views
libre office | 925
facebook analytics | 798
test drive a chrome notebook | 467
test drive a chrome notebook. | 215
r gui | 203
data mining | 163
wps sas lawsuit | 158
wordle.net | 133
wps sas | 123
google maps jet ski | 123
test drive chrome notebook | 96
sas wps | 89
sas wps lawsuit | 85
chrome notebook test drive | 83
decision stats | 83
best statistics software | 74
hadley wickham | 72
google maps jetski | 72
libreoffice | 70
doug savage | 65
hive tutorial | 58
funny india | 56
spss certification | 52
donald farmer microsoft | 51
best statistical software | 49

What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even steven.

Clicks for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

URL | Clicks
rattle.togaware.com | 378
facebook.com/Decisionstats | 355
rapid-i.com/content/view/182/196 | 319
services.google.com/fb/forms/cr48basic | 313
red-r.org | 228
decisionstats.wordpress.com/2010/05/07/the-top-statistical-softwares-gui | 199
teamwpc.co.uk/products/wps | 162
r4stats.com/popularity | 148
r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects | 138
socserv.mcmaster.ca/jfox/Misc/Rcmdr | 138
spss.com/certification | 116
learnr.wordpress.com | 114
dudeofdata.com/decisionstats | 108
r-project.org | 107
documentfoundation.org/faq | 104
goo.gl/maps/UISY | 100
inside-r.org/download | 96
en.wikibooks.org/wiki/R_Programming | 92
nytimes.com/external/readwriteweb/2010/12/07/07readwriteweb-report-google-offering-chrome-notebook-test-11919.html | 92
sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page | 92
analyticdroid.togaware.com | 88
yeroon.net/ggplot2 | 87

So in 2010,

SAS remained top daddy in business analytics,

R made revolutionary strides in terms of new packages,

JMP launched a new version,

SPSS got integrated with Cognos,

Oracle sued Google and did build a great Data Mining GUI,

Libre Office gave you a non-Oracle Open Office (or an even more open office).

2011 looks like a fun year. Party safely.

Google Books Ngram Viewer

Here is a terrific data visualization from Google based on their digitized books collection. How does it work? Basically, you can test the frequency of various words across time periods from the 1700s to 2010.

Like the frequency and intensity of kung fu vs yoga, or pizza versus hot dog. The underlying datasets scan millions or billions of words.

Here is my yoga vs kung fu vs judo graph.

http://ngrams.googlelabs.com/info

What’s all this do?

When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books (e.g., “British English”, “English Fiction”, “French”) over the selected years. Let’s look at a sample graph:

This shows trends in three ngrams from 1950 to 2000: “nursery school” (a 2-gram or bigram), “kindergarten” (a 1-gram or unigram), and “child care” (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are “nursery school” or “child care”? Of all the unigrams, what percentage of them are “kindergarten”? Here, you can see that use of the phrase “child care” started to rise in the late 1960s, overtaking “nursery school” around 1970 and then “kindergarten” around 1973. It peaked shortly after 1990 and has been falling steadily since.

(Interestingly, the results are noticeably different when the corpus is switched to British English.)
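
To make the y-axis definition above concrete, here is a minimal Python sketch of the same calculation on a toy sentence. It is only an illustration of the idea: the `ngrams` and `ngram_share` helpers and the toy corpus are my own assumptions, whitespace splitting stands in for real tokenization, and the per-year breakdown that the Ngram Viewer does is left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every n-gram in a token list as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_share(text, phrase):
    """Percentage of all n-grams (n = words in phrase) that equal the phrase."""
    tokens = text.lower().split()
    n = len(phrase.split())
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[phrase.lower()] / len(grams)


# Toy stand-in for a corpus of books (hypothetical text).
toy_corpus = "the nursery school is near the kindergarten and the child care centre"

# Unigrams are compared against all unigrams, bigrams against all bigrams.
print(ngram_share(toy_corpus, "kindergarten"))
print(ngram_share(toy_corpus, "child care"))
```

Note that a unigram is measured against all unigrams and a bigram against all bigrams, so the two percentages have different denominators, exactly as the description above says.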

Corpora

Below are descriptions of the corpora that can be searched with the Google Books Ngram Viewer. All of these corpora were generated in July 2009; we will update these corpora as our book scanning continues, and the updated versions will have distinct persistent identifiers.

Informal corpus name | Persistent identifier | Description
American English | googlebooks-eng-us-all-20090715 | Same filtering as the English corpus but further restricted to books published in the United States.
British English | googlebooks-eng-gb-all-20090715 | Same filtering as the English corpus but further restricted to books published in Great Britain.

Choosing R for business – What to consider?


Additional features in R over other analytical packages:

1) Source code is provided, which allows complete custom solutions and embedding for a particular application. Open source code has the advantage that it is extensively peer-reviewed in journals and scientific literature. This means bugs will be found, shared and corrected transparently.

2) A wide literature of training material in the form of books is available for the R analytical platform.

3) Arguably the best data visualization tools in analytical software (apart from Tableau Software's latest version). The data visualization available in R takes the form of a variety of customizable graphs, as well as animation. The principal reason third-party software initially started creating interfaces to R is that the graphical library of packages in R is more advanced and is rapidly gaining features.

4) Free in upfront license cost, for academics and businesses alike, and thus budget friendly for small and large analytical teams.

5) Flexible programming for your data environment. This includes having packages that ensure compatibility with Java, Python and C++.

6) Easy migration from other analytical platforms to the R platform. It is relatively easy for a non R platform user to migrate to R, and there is no danger of vendor lock-in due to the GPL nature of the source code and the open community.

Statistics are numbers that tell (descriptive), advise (prescriptive) or forecast (predictive). Analytics is a decision-making help tool. Analytics on which no decision is to be made or is being considered can be classified as purely statistical and non analytical. Thus the ease of making a correct decision separates a good analytical platform from a not so good analytical platform. The distinction is likely to be disputed by people of either background, and business analysis requires more emphasis on how practical or actionable the results are and less emphasis on the statistical metrics in a particular data analysis task. I believe one clear reason business analytics differs from statistical analysis is the cost of perfect information (data costs in the real world) and the opportunity cost of delayed and distorted decision-making.

Specific to the following domains, R has the following costs and benefits:

  • Business Analytics
    • R is free per license and for download
    • It is one of the few analytical platforms that work on Mac OS
    • Its results are credibly established both in journals like the Journal of Statistical Software and in the work of the analytical teams at LinkedIn, Google and Facebook
    • It has open source code for customization as per the GPL
    • It also has flexible commercial options from vendors like Revolution Analytics (who support 64 bit Windows as well as bigger datasets)
    • It has interfaces from almost all other analytical software, including SAS, SPSS, JMP, Oracle Data Mining and Rapid Miner. Existing license holders can thus invoke and use R from within these software packages
    • Huge library of packages for regression, time series, finance and modeling
    • High quality data visualization packages
  • Data Mining
    • R as a computing platform is well suited to the needs of data mining, as it has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning and neural networks, as well as exotic specialized algorithms like those based on chaos models
    • Flexibility in tweaking a standard algorithm by seeing the source code
    • The RATTLE GUI remains the standard GUI for data miners using R. It was created and developed in Australia
  • Business Dashboards and Reporting
    • Business Dashboards and Reporting are an essential piece of Business Intelligence and decision making systems in organizations. R offers data visualization through ggplot2, and GUIs like Deducer and Red-R can help even non R users create a metrics dashboard
    • For online dashboards, R has packages like Rweb, Rserve and RApache, which in combination with data visualization packages offer powerful dashboard capabilities
    • R can be combined with MS Excel using the RExcel package, which brings R capabilities into Excel. Thus an MS Excel user with no knowledge of R can use the GUI within the RExcel plug-in to use powerful graphical and statistical capabilities

Additional factors to consider in your R installation-

There are some more choices awaiting you now-
1) Licensing Choices- Academic Version or Free Version or Enterprise Version of R

2) Operating System Choices- Which operating system to choose? Unix, Windows or Mac OS.

3) Operating system sub choice- 32 bit or 64 bit.

4) Hardware choices- Cost-benefit trade-offs for additional hardware for R. Choices between local, cluster and cloud computing.

5) Interface choices- Command line versus GUI? Which GUI to choose as the default start-up option?

6) Software component choice- Which packages to install? There are almost 3000 packages; some of them are complementary, some are dependent on each other, and almost all are free.

7) Additional Software choices- Which additional software do you need to achieve maximum accuracy, robustness and speed of computing, and how to use existing legacy software and hardware for best complementary results with R.

1) Licensing Choices-
You can choose between two kinds of R installations. One is free and open source, from http://r-project.org. The other is commercial and is offered by many vendors, including Revolution Analytics.

Commercial Vendors of R Language Products-
1) Revolution Analytics http://www.revolutionanalytics.com/
2) XL Solutions- http://www.experience-rplus.com/
3) Information Builders- WebFOCUS RStat (Rattle GUI) http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html
4) Blue Reference- Inference for R http://inferenceforr.com/default.aspx

  2. Choosing Operating System
      1. Windows

Windows remains the most widely used operating system on this planet. If you are experienced in Windows based computing and are active on analytical projects, it would not make sense for you to move to other operating systems. This is also because compatibility problems are minimal for Microsoft Windows and the help is extensively documented. However, there may be some R packages that do not function well under Windows; if that happens, a multiple operating system setup is your next option.

        1. Enterprise R from Revolution Analytics- Enterprise R from Revolution Analytics has a complete R development environment for Windows, including the use of code snippets to make programming faster. Revolution is also expected to make a GUI available by 2011. Revolution Analytics claims several enhancements for its version of R, including the use of optimized libraries for faster performance.
      2. MacOS

The reason for choosing MacOS remains its considerable appeal in aesthetically designed software, but MacOS is not a standard operating system for enterprise systems or for statistical computing. However, open source R claims to be quite optimized for it and can be used by existing Mac users. There seem to be no commercially available versions of R for this operating system as of now.

      3. Linux
        1. Ubuntu
        2. Red Hat Enterprise Linux
        3. Other versions of Linux

Linux is considered a preferred operating system by R users because it has the same open source credentials, is a much better fit for all R packages, and is customizable for big data analytics.

Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had a marketing agreement with Revolution Analytics for an earlier version of Ubuntu, and many R packages can be installed in a straightforward way because Ubuntu/Debian packages are available. Red Hat Enterprise Linux is officially supported by Revolution Analytics for its enterprise module. Other popular versions of Linux include openSUSE.

      4. Multiple operating systems-
        1. Virtualization vs Dual Boot-

You can also choose between running a virtual machine (for example with VMware Player) that is dedicated to R based computing, and choosing the operating system at the startup or booting of your computer. A software program called Wubi helps with the dual installation of Linux and Windows.

  3. 64 bit vs 32 bit – Given a choice between 32 bit and 64 bit versions of the same operating system, like Ubuntu Linux, the 64 bit version can address far more memory, which matters because R holds the data it analyzes in RAM. It may also speed up processing, by an approximate factor of 2 for some workloads. However, you need to check whether your current hardware can support 64 bit operating systems, and if so you may want to ask your Information Technology manager to upgrade at least some operating systems in your analytics work environment to 64 bit.

 

  4. Hardware choices- At the time of writing this book, the dominant computing paradigm is workstation computing, followed by server-client computing. However, with the introduction of cloud computing, netbooks and tablet PCs, hardware choices are much more flexible in 2011 than just a couple of years back.

Hardware is a significant cost in an analytics environment and also depreciates remarkably over a short period of time. You may thus examine your legacy hardware and your future analytical computing needs, and accordingly decide between the various hardware options available for R.
Other analytical software can charge by number of processors, price server versions higher than workstation versions, and price grid computing extremely high where it is available. R, by contrast, is well suited for all kinds of hardware environments with flexible costs. Because R is memory intensive (it limits the size of data analyzed to the RAM size of the machine unless special formats and/or chunking are used), the hardware needed depends on the size of the datasets used and the number of concurrent users analyzing them. Thus the defining issue is not R but the size of the data being analyzed.
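
The idea of chunking mentioned above can be sketched in a few lines. This is a minimal illustration written in Python rather than R, under my own assumptions: the `chunked_mean` helper, the `chunk_size` parameter and the small in-memory sample are all hypothetical stand-ins, and it says nothing about how any particular R package implements chunking.

```python
import csv
import io
from itertools import islice


def chunked_mean(handle, column, chunk_size=2):
    """Mean of one numeric CSV column, reading at most chunk_size rows at a time."""
    reader = csv.DictReader(handle)
    total, count = 0.0, 0
    while True:
        # Only this chunk of rows is held in memory at any moment.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Stand-in for a large file on disk (hypothetical data).
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")
print(chunked_mean(sample, "amount"))  # 30.0
```

The running total and count are all that survive between chunks, so the memory needed depends on the chunk size and not on the size of the whole dataset.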

    1. Local Computing- This denotes the case where the software is installed locally. For big data, the data to be analyzed would be stored in the form of databases.
      1. Server version- Revolution Analytics has differential pricing for server-client versions, but the open source version is free and the same for server or workstation use.
      2. Workstation
    2. Cloud Computing- Cloud computing is defined as the delivery of data, processing and systems via remote computers. It is similar to server-client computing, but the remote server (also called the cloud) has flexible computing in terms of number of processors, memory and data storage. Cloud computing in the form of a public cloud enables people to do analytical tasks on massive datasets without investing in permanent hardware or software, as most public clouds are priced on a pay per usage basis. The biggest cloud computing provider is Amazon, and many other vendors provide services on top of it. Google is also entering cloud data storage (Google Storage), as well as offering machine learning in the form of an API (Google Prediction API).
      1. Amazon
      2. Google
    3. Cluster-Grid Computing/Parallel processing- In order to build a cluster, you would need the Rmpi and snow packages, among other packages that help with parallel processing.
    4. How many resources
      1. RAM-Hard Disk-Processors- for workstation computing
      2. Instances or API calls for cloud computing
  5. Interface Choices
    1. Command Line
    2. GUI
    3. Web Interfaces
  6. Software Component Choices
    1. R dependencies
    2. Packages to install
    3. Recommended Packages
  7. Additional software choices
    1. Additional legacy software
    2. Optimizing your R based computing
    3. Code Editors
      1. Code Analyzers
      2. Libraries to speed up R

Citation- R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

(Note- this is a draft in progress)

An Introduction to Data Mining-online book

I was reading David Smith's blog http://blog.revolutionanalytics.com/

where he mentioned this interview of Norman Nie at TDWI

http://tdwi.org/Articles/2010/11/17/R-101.aspx?Page=2

where I saw this link (it's great if you want to study Data Mining, btw)

http://www.kdnuggets.com/education/usa-canada.html

and I c/liked the U Toronto link

http://chem-eng.utoronto.ca/~datamining/

Best of all, I really liked this online book created by Professor S. Sayad.

It's succinct and beautiful and describes all of the Data Mining you want to read in one map (actually 4 images painstakingly assembled with perfection).

The best thing is that in the original map even the sub-items are clickable for specifics. For example, Pie Chart and Stacked Column Chart are not in one simple drop-down like Charts, but are organized by the kind of variables that lead to these charts. To see that, you would need to go to the site itself (see http://chem-eng.utoronto.ca/~datamining/dmc/categorical_variables.htm

vs

http://chem-eng.utoronto.ca/~datamining/dmc/categorical_numerical.htm )

Again, there is no mention of the data visualization software used to create the images, but I think I can take a hint from the Software page, which lists the software used:

Software

See it for yourself: online book (c) Professor S. Sayad

Really good DIY tutorial

http://chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm

AsterData partners with Tableau


Tableau, which has been making waves recently with its great new data visualization tool, announced a partnership with my old friends at AsterData. It's a really cool piece of data vis and very, very fast on the desktop, so I can imagine what speed it can achieve with AsterData's MPP Row and Column Zingbang AND Parallel Analytical Functions.

Tableau and AsterData also share a common Stanford connection (but it seems software is divided quite equally between Stanford, Harvard dropouts and North Carolina).

It remains to be seen from this announcement how much each company can leverage the partnership, whether it turns out like the SAS Institute-AsterData partnership last year, or whether it is just to announce connectors that let their software talk to each other.

See a Tableau vis at

http://public.tableausoftware.com/views/geographyofdiabetes/Dashboard2?:embed=yes&:toolbar=yes

AsterData remains the guys with the potential, but I would be wrong to say MapReduceSQL is as hot in December 2010 as it was in June 2009, and the elephant in the room would be Hadoop. That and Google's continued shyness about cashing in on its principal competency of handling Big Data (but hush, I signed an NDA for the Google Prediction API, so things maaaay change very rapidly on, ahem, that cloud).

Disclaimer- AsterData was my internship sponsor during my winter training while at Univ of Tenn.