Here is an interview of the Dr Damien Farrell creator of an interesting Python GUI with some data science flavors called DataExplore. Of course R has many Data Analysis GUI like R Commander, Deducer, Rattle which we have all featured on this site before. Hopefully there can be cross pollination of ideas on GUI design for Data Science in Python/ pydata community.
A- What solution does DataExplore provide to data scientists?
D- It’s not really meant for data scientists specifically. It is targeted towards scientists and students who want to do some analysis but cannot yet code. R-studio is the closest comparison. That’s a very good tool and much more comprehensive but it still does require you know the R language. So there is a bit of a learning curve. I was looking to make something that allows you to manipulate data usefully but with minimal coding knowledge. You could see this as an intermediate between a spreadsheet and using something like R-studio or R commander. Ultimately there is no replacement for being able to write your own code but this could serve as a kind of gateway to introduced the concepts involved. It is also a good way to quickly explore and plot your data and could be seen as complimentary to other tools.
A- What were your motivations for making pandastable/DataExplore?
D- Non-computational scientists are sometimes very daunted by the prospect of data analysis. People who work as wet lab scientists in particular often do not see themselves capable of substantial analysis even though they are well able to do it. Nowadays they are presented with a lot of sometimes heterogeneous data and it is intimidating if you cannot code. Obviously advanced analysis requires programming skills that take time to learn but there is no reason that some comprehensive analysis can’t be done using the right tools. Data ‘munging’ is one skill that is not easily accessible to the non programmer and that must be frustrating. Traditionally the focus is on either using a spreadsheet which can be very limited or plotting with commercial tools like prism. More difficult tasks are passed on to the specialists. So my motivation is to provide something that bridges the data manipulation and plotting steps and allows data to be handled more confidently by a ‘non-data analyst’.
A- What got you into data science and python development. Describe your career journey so far
D- I currently work as a postdoctoral researcher in bovine and pathogen genomics though I am not a biologist. I came from outside the field from a computer science and physics background. When I got the chance to do a PhD in a research group doing structural biology I took the opportunity and stayed in biology. I only started using Python about 7 years ago and use it for nearly everything. I suppose I do what is now called bioinformatics but the term doesn’t tell you very much in my opinion. In any case I find myself doing a lot of general data analysis.
Early on I developed end user tools in Python but they weren’t that successful since it’s so hard to create a user base in a niche area. I thought I would try something more general this time. I started using Pandas a few years ago and find it pretty indispensable now. Since the pydata stack is quite mature and has a large user community I thought using these libraries as a front-end to a desktop application would be an interesting project.
A-What is your roadmap or plans in future for pandastable?
D- pandastable is the name of the library because it’s a widget for Tkinter that provides a graphical view for a pandas dataframe. DataExplore is then the desktop application based around that. This is a work in progress and really a side project. Hopefully there will be some uptake and then it’s up to users to decide what they want out of it. You can only go so far in guessing what people might find useful or even easy to use. There is a plugin system which makes it easy to add arbitrary functionality if you know Python, so that could be one avenue of development. I implemented this tool in the rather old Tkinter GUI toolkit and whilst quite functional it has certain limitations. So updating to use Qt5 might be an option. Although the fashion is for web applications I think there is still plenty of scope for desktop tools.
A- How can we teach data science to more people in easier way to reduce the demand-supply gap for data scientists?
D- A can’t speak about business, but in science teaching has certainly lagged behind the technology. I don’t know about other fields, but in molecular biology we are now producing huge amounts of data because something like sequencing has developed so rapidly. This is hard to avoid in research. Probably the concepts need to be introduced early on in undergraduate level so that PhD students don’t come to data analysis cold. In biological sciences I think postgraduate programs are slowly adapting to allow training in wet and dry lab disciplines.
Dr. Damien Farrell is Postdoctoral fellow of School of Veterinary Medicine at University College Dublin Ireland. The download page for the dataexplore app is : http://dmnfarrell.github.io/pandastable/