Data Mining Music

A classic paper by Donald E. Knuth (creator of TeX) on the information complexity of songs can help music listeners with an interest in analytics. The paper dates from 1977 but is pertinent even today.

 

Free Tibet

We should all ask China to free Tibet because of the following reasons-

10 Reasons to Free Tibet

1) Replace a system of governance which is giving 12% GDP growth with a 1000 year old belief that one old guy is really a reincarnation of GOD

2) Because it is a romantic idea

3) The average Tibetan is much better off economically than people in most other countries in Asia and Africa. Still, freedom is messy- Donald Rumsfeld.

4) So we can sell beer, Facebook ads and Internet pornography to Tibetans, who currently do not have the liberty to buy them

5) So we can explore that area for mining and minerals

6) Damn it. We need one more ally for the free world. So we can invade more non free countries.

7) Tibetan girls are hot.

8) The Dalai Lama is cool, and he does not charge by the hour, unlike other yoga gurus.

9) We need to encircle China just like we did in the 19th century during the Opium Wars

10) So artists like Ai Wei Wei can blog freely

1 Reason not to Free Tibet

1) Tibetans want to be free. If we give them democracy- they will be disappointed to know that the bullets just get replaced by the pepper spray. How silly is that? The desire to be free- when there is no such thing as free anymore.

(This article is sarcasm, meant as a literary piece and not a pseudo-intellectual political article. I have no training in politics. For details see http://en.wikipedia.org/wiki/Sarcasm)


Jim Goodnight for US Senate: Op Ed


This is NOT an April fool joke or a publicity stunt. It is also not meant to provoke discussion for the sake of provocation.

For some time I have studied what makes government, academia and business work or fail in both the US and India. A common thread is the quality of the people involved. Someone who is a wasteful businessman will be a wasteful politician. Someone who is a flamboyant businessman with more flair than substance will continue that way in public life.
Accordingly I have created a Facebook cause-

Jim Goodnight for the US Senate

http://www.causes.com/causes/600220-jim-goodnight-for-the-us-senate

If Donald Trump can run for President, Jim Goodnight can run for the Senate: I can think of no one who has done more for the American South. Unlike the tech-heavy, Stanford-dominated boom in California, the Midwest and South have been declining centers of influence. Cities like Austin, Texas or Raleigh, North Carolina are the exception rather than the norm there. A friend who went to Duke once told me that the worst thing in America is to be born a poor, rural white male. There are no groups lobbying for education or blazing-fast internet for you. Socially you are expected to walk and thrive alone.

The Southern Baptist Church has managed to infiltrate and influence young minds there- the average conservative American seemed better off and happier in his moderated social behaviour. But the Church exacts a 10% tithe, and it is efficient in stretching every dollar and every cent of church donations. Government works with the best intentions, but spending someone else’s money (your tax money, spent by a bureaucrat) is always more inefficient than the actual owner spending it alone. Taxes are higher than the 10% tithe and seem to accomplish much less social change. Would you rather go to work or go to war?

Accordingly I find that on the West Coast there are very few tech-savvy leaders with a track record of fiscal pragmatism, educational reform and job creation. Certainly the industry lobbyist is smarter at evading taxes than the average Joe, and campaign financing is still dependent on deep pockets despite the innovations of internet retail fund raising.

Would you like your Senator to be as considerate of creating jobs as entrepreneurs are? Jim Goodnight here is a metaphor for all entrepreneurs who don't believe in reckless hire-and-fire or outsourcing, and who take a long-term view of people.

Click here to spread this cause- perhaps it will make existing politicians more efficient just by the threat of new competition.

http://www.causes.com/causes/600220-jim-goodnight-for-the-us-senate?recruiter_id=8347178



The Year 2010


My annual traffic to this blog was almost 99,000 views. Adding the views on networking sites and the 400-plus RSS readers, I can say traffic was about 120,000 for 2010. Nice. Thanks for reading, and I hope it was worth your time. (This is a long post and will take almost 440 seconds to read, but the summary has just been given.)

My intent is either to inform you, give you something useful or at least something interesting.

see below-

Month Views (2010)
Jan 6,311
Feb 4,701
Mar 4,922
Apr 5,463
May 6,493
Jun 4,271
Jul 5,041
Aug 5,403
Sep 17,913
Oct 16,430
Nov 11,723
Dec 10,096
Total 98,767
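
For readers who like to check the arithmetic, here is a minimal Python sketch that re-adds the monthly figures above and picks out the busiest month. The dictionary and function names are my own illustrative choices, not something exported by WP-Stats.

```python
# Monthly page views for 2010, copied from the figures above.
# MONTHLY_VIEWS_2010 and summarize() are hypothetical names for illustration.
MONTHLY_VIEWS_2010 = {
    "Jan": 6311, "Feb": 4701, "Mar": 4922, "Apr": 5463,
    "May": 6493, "Jun": 4271, "Jul": 5041, "Aug": 5403,
    "Sep": 17913, "Oct": 16430, "Nov": 11723, "Dec": 10096,
}


def summarize(views):
    """Return (total views, busiest month, that month's share of the total)."""
    total = sum(views.values())
    peak_month = max(views, key=views.get)
    return total, peak_month, views[peak_month] / total


total, peak_month, peak_share = summarize(MONTHLY_VIEWS_2010)
print(f"Total views: {total:,}")
print(f"Busiest month: {peak_month} ({peak_share:.1%} of the year)")
```

The total agrees with the 98,767 reported above, and September comes out as the busiest month.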

 

 

Sandro Saitta from http://www.dataminingblog.com/ just named me for an award on his blog (but my surname is ohRi; Sandro left me without an R. What would I be without R :)) ).

Aw! I am touched. Google for “Data Mining Blog” and you will see that Sandro is the best there is in data mining writing.

“DMR People Award 2010
There are a lot of active people in the field of data mining. You can discuss with them on forums. You can read their blogs. You can also meet them in events such as PAW or KDD. Among the people I follow on a regular basis, I have elected:

Ajay Ori

He has been very active in 2010, especially on his blog. Good work Ajay and continue sharing your experience with us!”

What did I write in 2010? Stuff.

What did you read on this blog? Well, that's the top posts list.

2009-12-31 to Today

Title Views
Home page 21,150
Top 10 Graphical User Interfaces in Statistical Software 6,237
Wealth = function (numeracy, memory recall) 2,014
Matlab-Mathematica-R and GPU Computing 1,946
The Top Statistical Softwares (GUI) 1,405
About DecisionStats 1,352
Using Facebook Analytics (Updated) 1,313
Test drive a Chrome notebook. 1,170
Top ten RRReasons R is bad for you ? 1,157
Libre Office 1,151
Interview Hadley Wickham R Project Data Visualization Guru 1,007
Using Red R- R with a Visual Interface 854
SAS Institute files first lawsuit against WPS- Episode 1 790
Interview Professor John Fox Creator R Commander 764
R Package Creating 754
Windows Azure vs Amazon EC2 (and Google Storage) 726
Norman Nie: R GUI and More 716
Startups for Geeks 682
Google Maps – Jet Ski across Pacific Ocean 670
Not so AWkward after all: R GUI RKWard 579
Red R 1.8- Pretty GUI 570
Parallel Programming using R in Windows 569
R is an epic fail or is it just overhyped 559
Enterprise Linux rises rapidly:New Report 537
Rapid Miner- R Extension 518
Creating a Blog Aggregator for free 504
So which software is the best analytical software? Sigh- It depends 473
Revolution R for Linux 465
John Sall sets JMP 9 free to tango with R 460

So how do people come here –

Well, I guess I owe Tal G for almost 9,000 views. (Incidentally, I stopped posting my blog to R-Bloggers and AnalyticBridge for SEO keyword reasons and because of some spam I was getting; see below.)

http://r-bloggers.com is still the cat’s whiskers and I read it a lot.

I still don't know who linked my blog to a free sex movie site with 400 views, but I have a few suspects.

2009-12-31 to Today

Referrer Views
r-bloggers.com 9,131
Reddit 3,829
rattle.togaware.com 1,500
Twitter 1,254
Google Reader 1,215
linkedin.com 717
freesexmovie.irwanaf.com 422
analyticbridge.com 341
Google 327
coolavenues.com 322
Facebook 317
kdnuggets.com 298
dataminingblog.com 278
en.wordpress.com 185
google.co.in 151
xianblog.wordpress.com 130
inside-r.org 124
decisionstats.com 119
ifreestores.com 117
bits.blogs.nytimes.com 108

Still reading this post? Gosh, let me sell you some advertising. It is only $100 a month (yes, it's a recession).

Advertisers are treated on a First In, Last Out (FILO) basis.

I have been told I am obsessed with SEO, but I don't care much for search engines apart from Google, and yes, SEO is an interesting science (they should really rename it GEO, or Google Engine Optimization).

Apparently Hadley Wickham and Donald Farmer are big keywords for me so I should be more respectful I guess.

Search Terms for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

Search Views
libre office 925
facebook analytics 798
test drive a chrome notebook 467
test drive a chrome notebook. 215
r gui 203
data mining 163
wps sas lawsuit 158
wordle.net 133
wps sas 123
google maps jet ski 123
test drive chrome notebook 96
sas wps 89
sas wps lawsuit 85
chrome notebook test drive 83
decision stats 83
best statistics software 74
hadley wickham 72
google maps jetski 72
libreoffice 70
doug savage 65
hive tutorial 58
funny india 56
spss certification 52
donald farmer microsoft 51
best statistical software 49
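
Several of the search terms above are really the same query typed in a different order or with stray punctuation. As a rough sketch only, the Python below merges such variants by stripping punctuation, dropping a couple of filler words and sorting the remaining words. The names normalize, group_views and STOPWORDS are hypothetical, and the rows are a subset copied from the list above.

```python
import string
from collections import defaultdict

# A subset of the (search term, views) rows listed above.
SEARCH_VIEWS = [
    ("test drive a chrome notebook", 467),
    ("test drive a chrome notebook.", 215),
    ("test drive chrome notebook", 96),
    ("chrome notebook test drive", 83),
    ("wps sas lawsuit", 158),
    ("wps sas", 123),
    ("sas wps", 89),
    ("sas wps lawsuit", 85),
]

# Filler words ignored when comparing queries (an assumption of this sketch).
STOPWORDS = {"a", "the"}


def normalize(term):
    """Reduce a query to a key: no punctuation, no filler words, words sorted."""
    cleaned = term.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.lower().split() if w not in STOPWORDS]
    return " ".join(sorted(words))


def group_views(rows):
    """Sum the views of all queries that share the same normalized key."""
    totals = defaultdict(int)
    for term, views in rows:
        totals[normalize(term)] += views
    return dict(totals)


grouped = group_views(SEARCH_VIEWS)
for key, views in sorted(grouped.items(), key=lambda kv: -kv[1]):
    print(f"{key}: {views}")
```

This simple key will not merge spelling variants such as "jet ski" and "jetski", or "libre office" and "libreoffice"; those would need a fuzzier match.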

What about outgoing links? Apparently I need to find a way to ask Google to pay me for the free advertising I gave their Chrome notebook launch. But since their search engine and browser are free to me, I guess we are even steven.

Clicks for 365 days ending 2010-12-31 (Summarized)

2009-12-31 to Today

URL Clicks
rattle.togaware.com 378
facebook.com/Decisionstats 355
rapid-i.com/content/view/182/196 319
services.google.com/fb/forms/cr48basic 313
red-r.org 228
decisionstats.wordpress.com/2010/05/07/the-top-statistical-softwares-gui 199
teamwpc.co.uk/products/wps 162
r4stats.com/popularity 148
r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects 138
socserv.mcmaster.ca/jfox/Misc/Rcmdr 138
spss.com/certification 116
learnr.wordpress.com 114
dudeofdata.com/decisionstats 108
r-project.org 107
documentfoundation.org/faq 104
goo.gl/maps/UISY 100
inside-r.org/download 96
en.wikibooks.org/wiki/R_Programming 92
nytimes.com/external/readwriteweb/2010/12/07/07readwriteweb-report-google-offering-chrome-notebook-test-11919.html 92
sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page 92
analyticdroid.togaware.com 88
yeroon.net/ggplot2 87

so in 2010,

SAS remained top daddy in business analytics,

R made revolutionary strides in terms of new packages,

JMP  launched a new version,

SPSS got integrated with Cognos,

Oracle sued Google and did build a great Data Mining GUI,

Libre Office gave you a non-Oracle OpenOffice (or an even more open office).

2011 looks like a fun year. Party safely.

Who searches for this Blog?


Using WP- Stats I set about answering this question-

What search keywords lead here-

Clearly Michael Jackson is down this year

And R GUI, Data Mining is up.

How does that affect my writing, given that I get almost 250 visitors daily from search engines alone, even assuming I write nothing on this blog from now on?

It doesn't. I still write whatever code or poem comes to my mind. So it is hurtful when people misunderestimate the effort in writing and jump to conclusions (especially if I write about a company: I am not on that company's payroll, just as when I write a poem I am not a full-time poet).

Over to xkcd

All Time (for Decisionstats.Wordpress.com)

Search Views
libre office 818
facebook analytics 806
michael jackson history 240
wps sas lawsuit 180
r gui 168
wps sas 154
wordle.net 118
sas wps 116
decision stats 110
sas wps lawsuit 100
google maps jet ski 94
data mining 88
doug savage 72
hive tutorial 63
spss certification 63
hadley wickham 63
google maps jetski 62
sas sues wps 60
decisionstats 58
donald farmer microsoft 45
libreoffice 44
wps statistics 44
best statistics software 42
r gui ubuntu 41
rstat 37
tamilnadu advanced technical training institute tatti 37

YTD

2009-11-24 to Today

Search Views
libre office 818
facebook analytics 781
wps sas lawsuit 170
r gui 164
wps sas 125
wordle.net 118
sas wps 101
sas wps lawsuit 95
google maps jet ski 94
data mining 86
decision stats 82
doug savage 63
hadley wickham 63
google maps jetski 62
hive tutorial 56
donald farmer microsoft 45

Interview Donald Farmer Microsoft

Here is an interview with Donald Farmer of Microsoft, talking about his passion for the exciting business intelligence projects at MS.

Q) Describe your career from high school to your current job responsibilities at Microsoft. How can technology companies in America work together to grow the home pool of American science students (irrespective of market share battles)?

A) My background is relatively unusual for a technology professional, although at Microsoft one meets people with a very wide range of backgrounds. I had little interest in studying Computer Science formally. For me, software was always a means to an end: a way of solving what were, for me, “more interesting” problems. Of course, I cannot deny that computer science is a compelling subject in itself, just not for me. Yet, from my early teens in Scotland, I had computers to try (starting with the justly famous Sinclair range) and I used them to store, classify and analyze the data I needed for my other work. So, as I studied philosophy and languages, and as I worked in history, archaeology, forestry, fish-farming and so on (through many variations) before I became more completely involved in Business Intelligence, I used database techniques extensively.

I spent some years as a consultant, building all sorts of applications. My first predictive application enabled fish-farmers with private water supplies to balance the needs of fish production and hydro-electric generation based on past, present and predicted rainfall. I believe that application is still in use today, 15 years later!

Later, I joined an excellent group of developers and analysts at AppsMart, building a data mart rapid-development application. That brought me into the Microsoft sphere, as we built on the SQL Server platform and were actively involved in the SQL Server Data Warehouse ecosystem.

With the dot-com bust of 2000, I happily found an opportunity to work with Microsoft. There I started working on Analysis Services, later leading a team of program managers in Integration Services. In that time, we did some really interesting work along with Zhaohui Tang’s team, integrating Data Mining capabilities with our ETL tool, to enable predictive analytics in the flow of data. The implications of this technique are still only being realized: we have used it for imputing missing data, and have an interesting patent on how to use this technique for detecting outliers in streaming data. In addition, we included fuzzy matching techniques from Surajit Chaudhuri’s team, to give even more flexibility.

More recently I have been working in Data Mining, with a marvelous and energetic team under Jamie MacLennan, and then in the last couple of years I have been managing a super team of Program Managers building the client interfaces for our new PowerPivot application.

My current role is not focused on a single product, but rather I look across all the business intelligence products to see how we can engage our engineering knowledge ever more effectively with customers, partners, analysts and, of course, with other teams across Microsoft.

So, as you can see my background is very varied. In some ways, that means that I am not well placed to speak to how the USA can better grow a pool of science students, as I was never one myself. Yet, I do think there are some lessons I can share. Firstly, we should not make the mistake of focusing only on science and technology as an end in itself. We do need to encourage the use of information science techniques in all appropriate fields, including liberal arts, and also “power professions” such as medicine and law. The USA provides wonderful educational opportunities in these fields, but all too often young people have to choose between science and arts. Many of the best talents I have met in the world of analytics have backgrounds which are very diverse.

Q) Describe the current status of SQL Server and Microsoft Data Mining. What are the areas in Business Intelligence where we can see much more excitement and innovation in the coming few months from you guys?

A) Data Mining remains one of the most popular technologies in the SQL Server stack. I have presented recently in China, Germany, The Netherlands and the UK, and at every conference the data mining sessions were among the most popular and the most successful. This speaks volumes about the interest in this field. It also reflects how successfully Microsoft has broadened our user base by shipping the Excel Data Mining Add-ins.

Q) How is Microsoft’s cloud computing venture Azure going? How is Sharepoint doing? What do you personally feel about the remote sharing and computing model?

A) Azure and Sharepoint are, of course, very different beasts. Windows Azure, and especially SQL Azure which we launched at PDC in November, are proving to be very popular. In particular SQL Server Azure is really succeeding with its strong development and management story – you design and manage cloud databases with the same tools and techniques as you do for on-premise databases. There has been a fantastic response to this, especially from emerging economies where the idea of having Microsoft manage your data infrastructure at any scale is very attractive. At TechEd South Africa, for example, David Robinson from the SQL Azure team got a tremendous reception. However, there are difficulties in emerging economies because of poor bandwidth. Shortly after David and I were in South Africa, local businesses held a race: they tied a USB stick with files to the leg of a carrier pigeon and set it off home from Pietermaritzburg to Durban, simultaneously trying to download the same files between the same locations online. The pigeon won!

So, I do think the cloud offers tremendous opportunities for business to scale and manage their resources effectively, but it’s early days.

Q) And when can I start doing data mining from within my Excel workbook? I remember working on a SQL Server Analysis Plugin for a cloud Excel prototype last year.

A) You should be using Excel for data mining right now. Just go to http://www.sqlserverdatamining.com and look for the links, on the right hand side of the page. These are released products. You can also go to http://www.sqlserverdatamining.com/cloud to try an experimental cloud service – but it is only experimental and could be up or down at any time.

For more conventional, OLAP-like, analytics you should also try out PowerPivot in beta. See http://www.powerpivot.com . PowerPivot is an application that plugs into Excel and enables business users to build quite complex models, over basically unlimited data volumes, quickly and easily. It’s proving to be hugely popular already. I am sure it will dominate much of the BI news in 2010.

Q) What are the risks and challenges in creating new technology when working for an industry leader like Microsoft, where the spotlight is on every step you take and the competition is brutal?

A) I simply don’t think about brutal competition. Even in nature I see far more symbiosis than competition. I personally think competition is a very negative mindset although the term “competitor” is the common shorthand for another vendor in the space and I do use it that way myself – but more from habit than conviction.

In the database world, you might say Oracle are our competitors. Yet most of the Oracle customers I know (and I was an Oracle customer myself once) are also SQL Server customers. Often they use Reporting Services, or Analysis Services. Integration Services had to ship a fast-loading Oracle destination, because so many customers want to use SQL Server tools to load Oracle databases. I see far more cases like that, where the picture is complex and symbiotic, than I do of outright competition.

In the analytic space, almost every tool out there has one feature in common – one feature which everyone uses. Export to Excel.

I genuinely love working with our partners, and I am lucky to have good friends throughout the industry: at SAP, Oracle, IBM, SAS … you name it. We all benefit from empowering businesses with better tools. As the old saying goes, “the rising tide lifts all boats.”

Q) In terms of lines of code, Microsoft may have given away the maximum number of shared libraries and code, yet it sometimes suffers from a perception problem because of its vintage. Do you think all cool tech companies become not so cool after some years, even if they don't fundamentally change?

A) I think the idea of a company being “cool” is itself just a phase we’re going through as an industry as we’re growing up. As the tech industry matures, you’ll see more emphasis on value, and net contribution. In many ways, Microsoft, and IBM I think, are ahead of the curve, as companies which are valued for their stability, resources and our ability to continually provide compelling new solutions and services. I travel a lot, and I see classrooms in western China, and emerging businesses in Africa, and women starting to work in new careers in the Middle East, and I don’t see them prioritizing cool. But I do see them doing amazing things with Microsoft technology.

Q) Describe your blogging style and what best tips would you give to technology bloggers.

A) I don’t blog enough, sadly, although I do try.

I have two blogs. One, at http://blogs.technet.com/sqlserverexperts/ is a shared “SQL Server Experts” blog. It’s very focussed on Microsoft technologies, of course. I especially like to blog about trends that I am seeing in my work with customers. My other blog, at http://beyeblogs.com/donaldfarmer/ is more personal, and includes gleanings from my other interests. I especially like doing my first blog of April there – that’s always fun.

My advice to bloggers should probably be “do what I say, not what I do.” However, most important, I think, is to be authentic in your voice. My favorite business intelligence bloggers are Jill Dyche, Evan Levy, David Loshin, William McKnight and Neil Raden – all of them blog quite regularly and are always great to read. There are others out there who are just as interesting, but don’t quite have the same rhythm to their blogging. I admire, but sadly fail to emulate, those who blog regularly and effectively.

Q) What do you do when not at work?

A) My wife is an artist, and she keeps me busy helping out with events and projects. We live on a wild couple of acres in Washington and caring for that is a lot of fun too. Otherwise, I mostly read, cook and play the piano. I love cooking, although I’m not sure how good I am – my son is now a professional chef, so perhaps I had some influence. I play the piano badly, but I can lose myself in that. I read very well. I love to read poetry – and I struggle to read Chinese poetry in the original. It’s such a fascinating language, and the poetry is so complex and yet so simple. That will be a lifetime study.

Biography-

Donald Farmer is the Principal Program Manager, SQL Server Data Mining, at Microsoft Corp.