Here is an interview with the genius behind many of the R Project’s graphical packages, Dr Hadley Wickham.
Ajay– Describe the pivotal moments in your career in science, from high school science student to where you are now as a professor.
Hadley– After high school I went to medical school. After three years and a degree I realised that I really didn’t want to be a doctor so I went back to two topics that I had enjoyed in high school: programming and statistics. I really loved the practice of statistics, digging into data and figuring out what was going on, but didn’t find the theoretical study of computer science so interesting. That spurred me to get my MSc in Statistics and then to apply to graduate school in the US.
The next pivotal moment occurred when I accepted a PhD offer from Iowa State. I applied to ISU because I was interested in multivariate data and visualisation and heard that the department had a focus on those two topics, through the presence of Di Cook and Heike Hofmann. I couldn’t have made a better choice – Di and Heike were fantastic major professors and I loved the combination of data analysis, software development and teaching that they practiced. That in turn led to my decision to look for a job in academia.
Ajay– You have created almost ten R Packages as per your website http://had.co.nz/. Do you think there is a potential for a commercial version for a data visualization R software? What are your views on the current commercial R packages?
Hadley– I think there’s a lot of opportunity for the development of user-friendly data visualisation tools based on R. These would be great for novices and casual users, wrapping up the complexities of the command-line into an approachable GUI – see Jeroen Ooms’s http://yeroon.net/ggplot2 for an example.
Developing these tools is not something that is part of my research endeavors. I’m a strong believer in the power of computational thinking and the advantages that programming (instead of pointing and clicking) brings. Creating visualizations with code makes reproducibility, automation and communication much easier – all of which are important for good science.
Commercial packages fill a hole in the R ecosystem. They make R more palatable to enterprise customers with guaranteed support, and they can offer a way to funnel some of that money back into the R ecosystem. I am optimistic about the future of these endeavors.
Ajay– Clearly with your interest in graphics, you seem to favor visual solutions. Do you also feel that the R Project could benefit from better R GUIs or GUIs for specific packages?
Hadley– See above – while GUIs are useful for novices and casual users, they are not a good fit for the demands of science. In my opinion, what R needs more are better tutorials and documentation so that people don’t need to use GUIs. I’m very excited about the new dynamic HTML help system – I think it has huge potential for making R easier to use.
Compared to other programming languages, R currently lacks good online (free) introductions for new users. I think this is because many R developers are academics and the incentives aren’t there to make freely available documentation. Personally, I would love to make (e.g.) the ggplot2 book openly available under a Creative Commons license, but I would receive no academic credit for doing so.
Ajay– Describe the top 3-5 principles which you have explained in your book, ggplot2: Elegant Graphics for Data Analysis. What are other important topics that you cover in the book?
Hadley– The ggplot2 book gives you the theory to understand the construction of almost any statistical graphic. With this theory in hand, you are much better equipped to create visualisations that are tailored to the exact problem you face, rather than having to rely on a canned set of pre-made graphics.
The book is divided into sections based on the components of this theory, called the layered grammar of graphics, which is based on Lee Wilkinson’s excellent “The Grammar of Graphics”. It’s quite possible to use ggplot2 without understanding these components, but the better you understand, the better your ability to critique and improve your graphics.
Ajay– What are the five best tutorials that you would recommend for students learning data visualization in R? As a data visualization person do you feel that R could do with more video tutorials?
Hadley– If you want to learn about ggplot2, I’d highly recommend the following two resources:
For general data management and manipulation (often needed before you can visualise data) and visualisation using base graphics, Quick-R (http://www.statmethods.net/) is very useful.
Local useR groups can be an excellent resource if you live nearby. Lately, the Bay Area (http://www.meetup.com/R-Users/) and the New York (http://www.meetup.com/nyhackr/) useR groups have had some excellent speakers on visualisation, and they often post slides and videos online.
Ajay– What are your personal hobbies? How important are work-life balance and serendipity for creative, scientific and academic people?
Hadley– When I’m not working, I enjoy reading and cooking. I find it’s important to take regular breaks from my research and software development work. When I come back I’m usually bursting with new ideas. Two resources that have helped shape my views on creativity and productivity are Elizabeth Gilbert’s TED talk on nurturing creativity (http://www.ted.com/index.php/talks/elizabeth_gilbert_on_genius.html) and “The Creative Habit: Learn It and Use It for Life”, by Twyla Tharp (http://amzn.com/0743235266). I highly recommend both of them.
Dr Wickham’s impressive biography can be best seen at http://had.co.nz/