Interesting Data Visualization: Friendwheels

Here is an interesting Facebook application that I used to generate clusters among my 900 (or top 400) Facebook connections. What is interesting is the way it drew lines in a circle showing which friends I am most connected with, a bit like an analysis of my own social network. It could be interesting to apply this to business cases like organizational resource planning or client relationship management (or, more traditionally, credit card fraud or risk/marketing analysis).

That’s my network

and these are the main clusters I could draw (note that each number represents the number of common friends/connections)

The FB app was at http://apps.facebook.com/friendwheel/
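To make the "number of common friends" idea concrete, here is a minimal Python sketch. The names and friend lists are made up for illustration, and this is only my guess at the kind of counting involved, not how the Friend Wheel app actually works.

```python
from itertools import combinations

# Hypothetical friend lists: each person maps to the set of their friends.
friends = {
    "asha": {"bela", "chen", "dev"},
    "bela": {"asha", "chen", "dev"},
    "chen": {"asha", "bela"},
    "dev": {"asha", "bela"},
}


def common_friend_counts(friends):
    """Return {(a, b): number of friends that a and b share} for every pair."""
    counts = {}
    for a, b in combinations(sorted(friends), 2):
        # Set intersection gives the friends the two people have in common.
        counts[(a, b)] = len(friends[a] & friends[b])
    return counts


counts = common_friend_counts(friends)
for pair, shared in sorted(counts.items(), key=lambda item: -item[1]):
    print(pair, shared)
```

Pairs with the highest counts are the ones you would expect to see grouped together on the wheel.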

Wordle.net

Here is a cool visualization of the words of a poem I wrote.

Courtesy tools at Wordle.net

Wordle: The Extroverted Engineer

Here is a link to the underlying words of the Extroverted Engineer

https://decisionstats.wordpress.com/2009/10/24/poem-the-extroverted-engineer/
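A word cloud of this kind sizes each word by how often it appears in the text. Here is a minimal Python sketch of that counting step. The sample sentence and the stopword list are placeholders I made up (not the poem itself), and the layout and drawing that Wordle does are not covered.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "i"}


def word_frequencies(text, top=5):
    """Return the `top` most frequent non-stopwords as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)


# Placeholder text, not the actual poem.
sample = "The engineer builds and the engineer dreams of data, data, data."
top_words = word_frequencies(sample)
print(top_words)
```

The counts would then drive the font size of each word in the cloud.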

Interview Hadley Wickham R Project Data Visualization Guru

Here is an interview with the genius behind many of the R Project’s graphical packages, Dr Hadley Wickham.

Ajay– Describe the pivotal moments in your career in science, from high school science student up to your current role as a professor.

Hadley– After high school I went to medical school. After three years and a degree I realised that I really didn’t want to be a doctor so I went back to two topics that I had enjoyed in high school: programming and statistics. I really loved the practice of statistics, digging into data and figuring out what was going on, but didn’t find the theoretical study of computer science so interesting. That spurred me to get my MSc in Statistics and then to apply to graduate school in the US.

The next pivotal moment occurred when I accepted a PhD offer from Iowa State. I applied to ISU because I was interested in multivariate data and visualisation and heard that the department had a focus on those two topics, through the presence of Di Cook and Heike Hofmann. I couldn’t have made a better choice – Di and Heike were fantastic major professors and I loved the combination of data analysis, software development and teaching that they practiced. That in turn led to my decision to look for a job in academia.

Ajay– You have created almost ten R packages, as per your website http://had.co.nz/. Do you think there is potential for a commercial version of R-based data visualization software? What are your views on the current commercial R packages?

Hadley– I think there’s a lot of opportunity for the development of user-friendly data visualisation tools based on R. These would be great for novices and casual users, wrapping up the complexities of the command-line into an approachable GUI – see Jeroen Ooms’s http://yeroon.net/ggplot2 for an example.

Developing these tools is not something that is part of my research endeavors. I’m a strong believer in the power of computational thinking and the advantages that programming (instead of pointing and clicking) brings. Creating visualizations with code makes reproducibility, automation and communication much easier – all of which are important for good science.

Commercial packages fill a hole in the R ecosystem. They make R more palatable to enterprise customers with guaranteed support, and they can offer a way to funnel some of that money back into the R ecosystem. I am optimistic about the future of these endeavors.

Ajay– Clearly with your interest in graphics, you seem to favor visual solutions. Do you also feel that R Project could benefit from better R GUIs or GUIs for specific packages?

Hadley– See above – while GUIs are useful for novices and casual users, they are not a good fit for the demands of science. In my opinion, what R needs more are better tutorials and documentation so that people don’t need to use GUIs. I’m very excited about the new dynamic html help system – I think it has huge potential for making R easier to use.

Compared to other programming languages, R currently lacks good online (free) introductions for new users. I think this is because many R developers are academics and the incentives aren’t there to make freely available documentation. Personally, I would love to make (e.g.) the ggplot2 book openly available under a Creative Commons license, but I would receive no academic credit for doing so.

Ajay– Describe the top 3-5 principles which you have explained in your book, “ggplot2: Elegant Graphics for Data Analysis”. What are other important topics that you cover in the book?

Hadley– The ggplot2 book gives you the theory to understand the construction of almost any statistical graphic. With this theory in hand, you are much better equipped to create visualisations that are tailored to the exact problem you face, rather than having to rely on a canned set of pre-made graphics.

The book is divided into sections based on the components of this theory, called the layered grammar of graphics, which is based on Lee Wilkinson’s excellent “The Grammar of Graphics”. It’s quite possible to use ggplot2 without understanding these components, but the better you understand, the better your ability to critique and improve your graphics.

Ajay– What are the five best tutorials that you would recommend for students learning data visualization in R? As a data visualization person do you feel that R could do with more video tutorials?

Hadley– If you want to learn about ggplot2, I’d highly recommend the following two resources:

* The Learning R blog, http://learnr.wordpress.com/
* The ggplot2 mailing list, http://groups.google.com/group/ggplot2

For general data management and manipulation (often needed before you can visualise data) and visualisation using base graphics, Quick-R (http://www.statmethods.net/) is very useful.

Local useR groups can be an excellent resource if you live near one. Lately, the Bay Area (http://www.meetup.com/R-Users/) and the New York (http://www.meetup.com/nyhackr/) useR groups have had some excellent speakers on visualisation, and they often post slides and videos online.

Ajay– What are your personal hobbies? How important are work-life balance and serendipity for creative, scientific and academic people?

Hadley– When I’m not working, I enjoy reading and cooking. I find it’s important to take regular breaks from my research and software development work. When I come back I’m usually bursting with new ideas. Two resources that have helped shape my views on creativity and productivity are Elizabeth Gilbert’s TED talk on nurturing creativity (http://www.ted.com/index.php/talks/elizabeth_gilbert_on_genius.html) and “The Creative Habit: Learn It and Use It for Life”, by Twyla Tharp (http://amzn.com/0743235266). I highly recommend both of them.

Dr Wickham’s impressive biography can be best seen at http://had.co.nz/

Climate Die Oxide (Updated)


Here is some room for thought in climate control negotiations.

1) What is the expected date by which the Himalayan glaciers will melt, affecting sacred rivers like the Ganges and causing floods in densely populated Asia? How would nation states with shared resources like water react to disputes over dams, hydroelectricity and floods?

2) How would you count per capita CO2 emissions? Assume a factory in China emits 3 tonnes of CO2 every year but exports all its products to the USA on an Indian cargo ship. Travel contributes another 1 tonne of CO2, including air travel, visits etc.

As of now this will be counted as 3 tonnes for China, 1 tonne for India and X tonnes for the USA. What is wrong with these assumptions?

3) Some countries that used to be cold will get warmer. Will that lead to extra crops? Which countries will those be?

4) It took a world war to create fission. Will it take another world war over energy to create fusion? How much energy and how many resources are needed to create a dedicated Manhattan Project 2 whose results are shared with the world?

5) Most of the larger datasets of climate change observations are held in the Western Hemisphere by national labs, not under UN control or inspection. How safe is the data from fudging, or from infiltration by intelligence agencies of those countries hoping to influence bargaining chips at the climate change table?

6) Are there last-resort military ways to change the climate during wars, like melting glaciers with thermal bombs or causing earthquakes with seismically sensitive explosions? How high-tech are these solutions, and which countries have them?

7) If the planet is running out of resources, why don’t we go to Mars? 🙂
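The accounting puzzle in question 2 can be made concrete with a small Python sketch. It compares two common ways of attributing emissions: production-based (counted where the CO2 is emitted) and consumption-based (counted where the goods end up). The figures are the ones from the question, and attributing the 1 tonne of transport to India as the carrier is my assumption, so treat this as an illustration rather than an official accounting method.

```python
from collections import defaultdict

# Each flow: (tonnes of CO2, where it was emitted, who consumes the goods).
flows = [
    (3.0, "China", "USA"),  # factory emissions, all output exported
    (1.0, "India", "USA"),  # transport and travel, attributed to the carrier (assumption)
]


def totals(flows, by):
    """Sum tonnes per country, grouped by 'producer' or 'consumer'."""
    index = {"producer": 1, "consumer": 2}[by]
    result = defaultdict(float)
    for flow in flows:
        result[flow[index]] += flow[0]
    return dict(result)


production_based = totals(flows, "producer")
consumption_based = totals(flows, "consumer")
print("Production-based: ", production_based)
print("Consumption-based:", consumption_based)
```

The same 4 tonnes land on entirely different countries depending on the rule chosen, which is why the per capita numbers are so contentious.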

Source

http://manyeyes.alphaworks.ibm.com/manyeyes/files/thumbnails/bb09d328-d863-11de-a602-000255111976.wm.png

Note that this is from 2006 data, so assume 2009 CO2 emissions are higher than this.

Data Source-

TN guys at ORNL at http://cdiac.ornl.gov/trends/emis/glo.html

Data Visualization: MANY EYES IBM

http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/2006-co2-emissions-by-country

Data Visualization and Politics

Here is a Data Visualization graphic from the Office of the Joint Chiefs of Staff showing the clear way for Afghanistan.

Reminds me of the quote often mistakenly attributed to Shakespeare:

Oh what a tangled web ( ^+^) we weave
When first we practice to deceive.
– Sir Walter Scott (Marmion, 1808)

Disclaimer- As someone whose Hindu grandparents emigrated from Pakistan, I recommend reading “A Brief History of the Sikhs” for a military story of Afghanistan. The Sikhs were the first to conquer and occupy those deserted mountains after Alexander of Macedonia/Seleucus.

Graphic Citation
http://msnbcmedia.msn.com/i/MSNBC/Components/Photo/2009/December/091202/091203-engel-big-9a.jpg

Best Internet Site of 2009

Here is the best internet site of 2009.
It basically shows how many jobs have been created per dollar spent.
Funded by the debt of American Treasuries
sold to the Chinese.

Remember the Chinese Opium Wars.
Well, the Chinese are hooked on American Treasuries, and they probably need a warship with an admiral to open their markets and currency. Oui!

Well anyway the website is called http://Recovery.gov

Interview Dominic Pouzin Data Applied

Here is an interview with Dominic Pouzin, CEO of http://www.data-applied.com, a startup making waves in the field of data visualization.

Ajay- Describe your career in applied science. What made you decide to pursue a career in science? Some people think that careers in science are boring. How would you convince a high school student to choose a career in science?

Dominic- It’s important to realize that we are surrounded by products of science and engineering. By products of science, I mean bridges we cross on our way to work, video games we play for entertainment, or even the fabric of clothes we wear. Anyone who is curious should want to know how things really work. In that case, a scientific education makes sense, because it provides the tools necessary to understand and improve our world. I would also argue that scientific training can be a stepping stone towards high levels of achievement in other fields. For example, to become a financial wizard, a top patent attorney, or direct large clinical trials, a scientific education serves as a strong foundation. In addition, it’s probably easier to switch from science to another field than the other way round. Who wants to learn about matrix calculus in their forties? In my case, I graduated with a Master’s degree in Computer Science, and spent 10 years at Microsoft leading software development teams for the Windows server, Exchange server, and Dynamics CRM product lines. I wish that, along the way, I had found time for a PhD in data mining, but years of practical software engineering experience also have their advantages.

Ajay- What advice would you give to someone who just got laid off, and is pondering whether he should / should not start a business?

Dominic- Working for a large company used to mean trading some autonomy for more stability and access to a wide array of resources. However, in this economy, the terms of the equation have changed. Many workers who lost their jobs found that this stability had disappeared. Others found that resources have become scarcer due to shrinking budgets. With this shift in the balance, entrepreneurship starts becoming more appealing.

Creating your own business might sound daunting, but for example creating a US Washington State LLC takes about 15 minutes, costs 200 dollars, and only requires an Internet connection. Managing payroll may sound like a big headache, but again specialized companies can handle all payroll matters on your behalf for only a few dollars a month. So while this part is relatively easy, you also need two things which are more difficult to come by:

a/ an unshakable belief in what you are trying to achieve, and

b/ a willingness to handle anything that comes your way.

You need to think like a commando soldier who just landed on a beach: you’ve got great skills, but you’re alone, and can’t afford to fail. Practically, you may find yourself working for weeks or months with little or no income, and friends and family thinking that you are wasting your time. So, if necessary, try finding a co-founder to boost your confidence and motivate one another. Also, unless you want to spend most of your time chasing people for money, personal savings are a must.

Ajay- So describe your company. How does data visualization work? What differentiates your company from so many data visualization companies?

Dominic- We’re trying to stir things up a bit in terms of making it easier for regular business users to benefit from data mining. For example, we enable new “BI in the cloud” scenarios by allowing users to simply point a browser to access analysis results, or by allowing applications to submit and analyze data using an XML-based API. Built-in collaboration features, and more interactive visualizations, are also definitely part of our story.

Finally, while we focus on data mining (ex: time series forecasting, association rule mining, decision trees, etc.), we also make available other things such as pivot charts or tree maps. No data mining algorithm there, but why should business users care as long as the insight is there?

dataapplied_overview-500x326

To answer your question about visualization, most packages offer basic features such as the ability to pick colors, or to change labels, etc. For differences to emerge, you have to ask the right questions.

* Access: does visualization require an application to be installed on each computer? Our visualizations work directly from a web page, so there is nothing to install (and upgrades are automatic).
* Search: can visualization results be searched, so as to enable drill-down scenarios? In the age of Google, we enable search everywhere, so that views can be constrained to what the user is looking for.
* Collaboration: can visualization results be tagged using comments, or shared with other users while securely controlling access, etc.? Visualization is only a starting point – chances are that you will need to talk to someone before analysis is complete – so we offer plenty of collaboration features.
* Export: how easy is it for a business user to present analysis results to management in a way that is understandable? We make it easy to export visualization content to a shared gallery, and as presentation-ready images.

There are a couple of other things we do as well in terms of interaction (ex: zoom, select, focus, smart graph layout), and a couple we don’t have yet (ex: geo-mapping, export to PDF).

But in conclusion, I would say that useful data visualization is as much about the way you present data (and that must be compelling!), as it is about how one accesses, searches, secures, shares, or exports visualizations.

Ajay- The technology sector was hit the hardest by the immigration of skilled workers. As a technology worker, what do you have to say about immigration? What do you have to say about outsourcing? Do you have any plans for selling your products outside the United States?

Dominic- I am a US permanent resident, half French, half British, and my wife is Indian. So you won’t find it surprising to hear that I am in favor of immigration. In 1996, as an engineering student in France, I made the unusual choice to study one year at the Indian Institute of Technology (Delhi).

In fact, I was the only one in my engineering college (France’s largest) to select India as a destination (my friends all went to the US, UK, Australia, Germany, etc.). Now that India has become a recognized player in the IT field, several dozen students from the same engineering college chose India as a destination. So I guess the immigration is starting to flow both ways!

Also, among the people I used to work with at Microsoft and who left to start a company, a good proportion are immigrants. So it’s important to recognize that immigrants not only help fill high-tech positions, but also create jobs.

Finally, as an entrepreneur trying to keep costs low, outsourcing is a tool you can’t afford to ignore. For example, websites such as http://www.elance.com provide easy access to the global marketplace. For those worried about quality, it’s possible to review customer ratings and portfolios. We keep track of visitors coming to our website, and the majority of the visitors to date have been from outside the US.

Ajay-  What is the basic science used by your company’s product?

Dominic – We use a client / server model. On the server, at the lowest level, we use SQL databases (accessed using ODBC), acting as data and configuration repositories.

Immediately above that sits a computing layer, which offers scalable, distributed data mining algorithms. We implement algorithms which scale well with the number of rows and attributes, but also properly handle a mix of discrete / numeric / missing values.

For example, just for clustering, the literature has some incredibly powerful algorithms (ex: WaveCluster, an algorithm based on wavelet transforms), but which also fail as soon as you enter real-world situations (ex: some fields are discrete).

On top of the computing layer sits a rich, secure web-based XML API, which allows users to manipulate analysis and collaboration objects, while enforcing security.

For the client, we built a web-based visualization application using Microsoft Silverlight. To ensure client / server communications are as efficient as possible, we use a fair amount of data compression and caching.
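Dominic’s point about algorithms that must handle a mix of discrete / numeric / missing values can be illustrated with a small Python sketch of a mixed-type dissimilarity, in the spirit of Gower’s coefficient. This is not Data Applied’s algorithm; the field names, ranges and records below are made up for illustration.

```python
def mixed_distance(row_a, row_b, ranges):
    """Average per-field dissimilarity between two records.

    Numeric fields (those listed in `ranges`) contribute |a - b| / range.
    All other fields are treated as discrete: 0 if equal, 1 if different.
    Fields with a missing value (None) on either side are skipped.
    Returns None if no field could be compared.
    """
    total, used = 0.0, 0
    for field, a in row_a.items():
        b = row_b.get(field)
        if a is None or b is None:
            continue  # missing value: ignore this field
        if field in ranges:
            total += abs(a - b) / ranges[field]
        else:
            total += 0.0 if a == b else 1.0
        used += 1
    return total / used if used else None


# Hypothetical customer records and numeric ranges (max minus min).
ranges = {"age": 50, "income": 100000}
alice = {"age": 30, "income": 50000, "segment": "retail"}
bob = {"age": 40, "income": None, "segment": "retail"}

d = mixed_distance(alice, bob, ranges)
print(d)
```

A distance like this can be plugged into clustering methods that only need pairwise dissimilarities, which is one way to cope with real-world tables where some fields are discrete and some values are missing.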

Ajay-  Who are your existing clients and what is the product launch plan for next year?

Dominic- We’re only in alpha mode right now, so our next customers are in fact beta testers. We’re still busy adding new features. It’s good to be small and nimble, it allows us to move quickly. Sorry, I can’t confirm any launch date yet!

Ajay-  What does the CEO of a startup company do, when he has free time (assuming he has any)?

Dominic- When you spend most of your time working on analytics, it’s sometimes hard to leave your analytical brain at work.

For example, I am sure that readers who come to your website and visit a casino can’t help themselves and immediately start calculating the exact odds of winning (instead of just having fun).

Among other things, I enjoy challenging friends to programming puzzles (actually, they’re recycled Microsoft interview questions). My current bedtime reading is a book about data compression. I think you got the picture!

******************************************************************

Dominic is currently making promising data visualization products at http://data-applied.com/. To read more about him, please visit his profile page http://www.analyticbridge.com/profile/DominicPouzin