To cap off a wonderful year, we have decided to interview Karl Rexer , founder of http://www.rexeranalytics.com/ and of the data mining survey that is considered the Industry benchmark for the state of the industry in analytics.
Ajay: Describe the history behind doing the survey , how you came up with the idea and what all players do you think survey the data mining and statistical software market apart from you
Karl: Since the early 2000s I’ve been involved on the organizing and review committees for several data mining conferences and workshops. Early in the 2000s, in the hallways at these conferences I heard many analytic practitioners discussing and comparing their algorithms, data sources, challenges, tools, etc. Since we were already conducting online surveys for several of our clients, and my network of data miners is pretty large, I realized that we could easily do a survey of data miners, and share the results with the data mining community. I saw that the gap was there (and the interest), and we could help fill it. It was a way to give back to the data mining community, and also to raise awareness in the marketplace for my company, Rexer Analytics. So in 2007 we launched the first Data Miner Survey. In the first year, 314 data miners participated, and it’s just grown from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been astounding – we get thousands of requests for the summary reports each year. Overall, this just confirms what we originally thought: both inside the industry and beyond, people are hungry for information about data mining.
Are there other surveys and reviews of analytic professionals and the analytic marketplace? Sure. And there’s room for a variety of methodologies and perspectives. Forester and Gartner produce several reports that cover the analytic marketplace – they largely focus on software evaluations and IT trends. There are also surveys of CIOs and IT professionals that sometimes cover analytic topics. James Taylor (Decision Management Solutions) conducted an interesting study this year of Predictive Analytics in the Cloud. And of course, there are also the KDnuggets single-question polls that provide a pulse on people’s views of topical issues.
Ajay: Over the years- what broad trends have you seen in the survey in terms of paradigms- name your top 5 insights over these years
Karl: Well, I can’t think of a fifth one, but I’ve got four key findings and trends we’ve seen over the years we’ve been doing the Data Miner Surveys:
- The dramatic rise of open-source data mining tools, especially R. Since 2010, R has been the most-used data mining tool. And in 2013, 70% of data miners report using R. R is frequently used along with other tools, but we also see an increasing number of data miners selecting R as their primary tool.
- Data miners consistently report that regression, decision trees, and cluster analysis are the key algorithms they turn to. Each of the surveys, from 2007 through 2013, has shown this same core triad of algorithms.
- The challenges data miners face are also consistent: Across multiple years, the #1 challenge data miners report has been “dirty data.”. The other top challenges are “explaining data mining to others” and “difficult access to data”. In response to the 2010 survey, data miners described their best practices in overcoming these three key challenges. A summary of their ideas is available on our website here: http://www.rexeranalytics.com/Overcoming_Challenges.html. And three linked “challenge” pages contain almost 200 verbatim best practice ideas collected from survey respondents.
- We also see that there is excitement among analytic professionals, high job satisfaction, and room for more and better analytics. People report that the number of analytic projects is increasing, and the size of analytic teams is increasing too. But still there’s room for much wider and more sophisticated use of analytics – only a minority of data miners consider their companies to be analytically sophisticated.
Ajay: What percentage of people are now doing analytics on the cloud, on mobile, tablet , versus desktop
Karl: In the past few years we’ve seen a doubling in the percent of people who report doing some of their analytics using cloud environments. It’s still the minority of data miners, but it’s grown from 7% in 2010 to 10% in 2011, and 19% reporting using cloud environments in 2013.
Ajay:Your survey is free. How does it help your consulting practice?
Karl: Our main motivation for doing the Data Miner Survey is to contribute to the data mining community. We don’t want to charge a fee for the summary reports, because we want to get the information into as many people’s hands as possible. And we want people to feel free to send the report on to their friends and colleagues.
However, the Data Miner Survey does also help Rexer Analytics. It helps to raise the visibility of our company. It increases the traffic and links to our website, and therefore helps our Google rankings. And it is a great conversation starter.
Ajay: Name some statistics on how popular your survey has become over time- in terms of people filling the survey , and people reading the survey
Karl: In 2007 when we launched the first Data Miner Survey, 314 data miners participated, and it’s grown nicely from there. In each of the last two surveys, over 1200 people participated. The interest we’ve seen in our research summary reports has also been growing at a dramatic rate – recently we’ve been getting thousands of requests for the summary reports each year. Additionally, we have been unveiling the highlights of the surveys with a presentation at the Fall Predictive Analytics World conferences, and it is always a popular talk.
But the most gratifying aspects about the expanded interest in our Data Miner Survey are two things:
- The great conversations that the Data Miner Survey has initiated. I have wonderful conversations with people by phone, email and at conferences and at colleges about the findings, the trends, and about all the great ideas people have for new and exciting ways that they want to apply analytics in their domains – everything from human resource planning to cancer research, and customer retention to fraud detection. And many people have contributed ideas for new questions or topics that we have incorporated into the survey.
- Seeing that people in the data mining community find the survey results useful. Many students and young people entering the field have told us the summary reports provide a great overview of the field and emerging trends. And many software vendors have told us that the survey helps them better understand the needs and preferences of hands-on data mining practitioners. I’m often surprised to see new people and places that are reading and appreciating our survey. We get emails from all corners of the globe, asking questions about the survey, or asking to share it with others. Sometime last year after receiving a question from an academic researcher in Asia, I decided to check Google Scholar to see who is citing the Data Miner Survey in their books and published papers. The list was long. And the list of online news stories, blogs and other mentions of the Data Miner Survey was even longer. I started a list of citations, with links back to the places that are citing the Data Miner Survey – you can look at the list here: http://www.rexeranalytics.com/Data_Miner_Survey_Citations.html – there are over 100 places citing our research, and the list includes 15 languages. But even more surprising was finding that someone had created a Wikipedia entry about the Data Miner Surveys. I made a couple small edits, but then I stopped. The accepted rule in the Wikipedia community is to not edit things that one has a personal interest in. However, I want to encourage any Wikipedia authors out there to go and help update https://en.wikipedia.org/wiki/Rexer%27s_Annual_Data_Miner_Survey.
Ajay -What do you think are the top 3 insightful charts from your 2013 Report
Karl- OK, it’s tough for me to pick only 3. I think that you should pick the three that you think are the most insightful, and then blog about them and the reasons you think they’re important.
But if you want me to pick 3, then here are three good ones:
— R Usage graph on page 16
— Algorithm graph on page 36
— The pair of graphs on page 19 that show that there’s still a lot of room for improvement
Happy new year!
(Ajay- You can see the wonderful report at http://www.rexeranalytics.com/ especially the collection of links in the top right corner of the home page that cite this survey)