Data Science for the Olympics and the Lack of Reproducible Research

Despite the plethora of data generated in sports, there is not much open data for the Olympics. One wonders why, and whether openly sharing data and best practices on what works and what does not could reduce the number of Russian athletes being banned in what looks like a cyclical, Cold War-era game.

Some links I found useful

http://www.kdnuggets.com/2014/01/data-mining-predict-sochi-winter-olympics-medal-counts.html

Could data mining techniques accurately predict the medal counts at the Olympics? A predictive model could give us an estimate of the number of medals each nation might win; but how close could we get to the actual outcomes? It was a tantalizing project …

By Dan Graettinger with Tim Graettinger

• Which nation will bring home the most medals at the upcoming Winter Olympics in Sochi, Russia?

• Will any nation from Africa, South America, or the Middle East finally break through and win a medal?

• Why do some nations win a bundle of medals while others win only a few?

• Can data mining give us the answers to these questions?

and
https://www.ibm.com/developerworks/community/blogs/jfp/entry/data_science_is_hard?lang=en

What did the Graettinger brothers do? They used a seemingly standard methodology: learn from the past to predict the future. More precisely, they used past Olympics results to build a predictive model. Each country is represented by a feature vector, i.e. a set of quantities drawn from several categories:

  • Economic
  • Population
  • Human Development
  • Geography
  • Religion
  • Politics and Freedom
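
To make the idea of a feature vector concrete, here is a minimal Python sketch. The field names and the values for the made-up country are assumptions for illustration only; the quoted post does not list the exact variables used in each category.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class CountryFeatures:
    """One illustrative quantity per category; all field names are hypothetical."""
    gdp_per_capita: float            # Economic
    population_millions: float       # Population
    human_dev_index: float           # Human Development
    capital_latitude: float          # Geography
    majority_religion_share: float   # Religion
    freedom_score: float             # Politics and Freedom

    def to_vector(self) -> list[float]:
        # Flatten the record into the ordered list of numbers a model consumes.
        return list(astuple(self))


# Placeholder values for an imaginary country, not real data.
example = CountryFeatures(30000.0, 10.0, 0.85, 50.0, 0.6, 80.0)
vector = example.to_vector()
print(vector)
```

Keeping the fields named, and flattening only at the last step, makes it easier to trace a model coefficient back to the quantity it refers to.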

Then they used a standard technique known as linear regression to find which set of features was best for predicting medal count. I was reading their blog post with great interest until I saw which features the linear regression algorithm found most meaningful:

  • Geographic area
  • GDP per capita
  • Value of Exports
  • Latitude of Nation’s Capital

and

http://www.discoverycorpsinc.com/winter-olympic-medal-predict_1/?cm_mc_uid=41778113867514699665668&cm_mc_sid_50200000=1469966566

I was able to find data in many categories:

  • Economic
  • Population
  • Human Development
  • Geography
  • Religion
  • Politics and Freedom

Thankfully, there were some good sources out there[f3], and I collected enough data that I felt I had a good chance to predict some meaningful outcomes.  But would it be enough?  There is more than one way to go about predicting the medal count at the Olympics, and the route before me was the “30,000 feet” approach.

So any takers?

Hackers for Hacking the Olympics 🙂

 

 

Author: Ajay Ohri

http://about.me/ajayohri
