Despite the plethora of data generated in sports, there is not much open data for the Olympics. One wonders why, if openly sharing best practices and data on what works and what does not could reduce the number of Russian athletes being banned in a cyclical, Cold War-era game.
Some links I found useful:
http://www.kdnuggets.com/2014/01/data-mining-predict-sochi-winter-olympics-medal-counts.html
Could data mining techniques accurately predict the medal counts at the Olympics? A predictive model could give us an estimate of the number of medals each nation might win; but how close could we get to the actual outcomes? It was a tantalizing project …
By Dan Graettinger with Tim Graettinger
• Which nation will bring home the most medals at the upcoming Winter Olympics in Sochi, Russia?
• Will any nation from Africa, South America, or the Middle East finally break through and win a medal?
• Why do some nations win a bundle of medals while others win only a few?
• Can data mining give us the answers to these questions?
What did the Graettinger brothers do? They used a seemingly standard methodology: learn from the past to predict the future. More precisely, they used past Olympics results to build a predictive model. Each country is represented by a feature vector, i.e. a set of quantities drawn from several categories:
- Economic
- Population
- Human Development
- Geography
- Religion
- Politics and Freedom
Then they used a standard technique known as linear regression to find which set of features was best for predicting medal count. I was reading their blog post with great interest until I saw what the most meaningful features found by the linear regression algorithm were:
- Geographic area
- GDP per capita
- Value of Exports
- Latitude of Nation’s Capital
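To make the idea concrete, here is a minimal sketch of fitting a linear regression on per-country feature vectors. It is not the Graettingers' actual model or data: the feature names are simply borrowed from the list above, the numbers are synthetic values generated inside the script, and the function names `fit_linear_regression` and `predict` are made up for illustration.

```python
import numpy as np

# Hypothetical feature names borrowed from the list above.
# All values below are synthetic; they are NOT real Olympic or country data.
FEATURES = ["geographic_area", "gdp_per_capita", "value_of_exports", "capital_latitude"]


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term.

    X: (n_countries, n_features) array of feature vectors.
    y: (n_countries,) array of medal counts.
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Predicted medal count for each row (country) of X."""
    return intercept + np.asarray(X, dtype=float) @ coefs


# Synthetic "past Olympics" table: 50 made-up countries with standardized features.
rng = np.random.default_rng(0)
X_past = rng.normal(size=(50, len(FEATURES)))

# Assumed "true" relationship, chosen only to check that the fit recovers it.
true_intercept = 3.0
true_coefs = np.array([1.0, 2.0, 0.5, 1.5])
y_past = true_intercept + X_past @ true_coefs  # noise-free on purpose

intercept, coefs = fit_linear_regression(X_past, y_past)

# Larger absolute coefficients mean more influence, given standardized features.
for name, weight in zip(FEATURES, coefs):
    print(f"{name}: {weight:.2f}")
```

With real data you would replace the synthetic table with one row per country and compare the fitted weights only after putting the features on a common scale, since raw GDP and latitude are measured in very different units.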
and
I was able to find data in many categories:
- Economic
- Population
- Human Development
- Geography
- Religion
- Politics and Freedom
Thankfully, there were some good sources out there[f3], and I collected enough data that I felt I had a good chance to predict some meaningful outcomes. But would it be enough? There is more than one way to go about predicting the medal count at the Olympics, and the route before me was the “30,000 feet” approach.
So any takers?
Hackers for Hacking the Olympics 🙂