Cricinfo StatsGuru Database for Statistical and Graphical Analysis

Data from the ESPN Cricinfo website is available from the STATSGURU website.

The url is of the form-

http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=6;template=results;type=batting

http://stats.espncricinfo.com/ci/engine/stats/index.html?

class=1;team=6;template=results;type=batting

If you break down this URL to get more statistics on cricket, you can choose the following parameters.
class
1=Test
2=ODI
3=T20I
11=Test+ODI+T20I
team
1=England
2=Australia
3=South America
4-West Indies
5=New Zealand
6=India ,7=Pakistan and 8=Sri Lanka

type
batting
bowling
fielding
allround
fow
official
team
aggregate

 

ESPN Terms of Use are here-you may need to  check this before trying any web scraping.

http://www.espncricinfo.com/ci/content/site/company/terms_use.html

 

However ESPN has unleashed the API (including both free and premium)for Developers at http://developer.espn.com/docs.

and especially these sports http://developer.espn.com/docs/headlines#parameters

/sports News across all sports/sections
/sports/baseball/mlb Major League Baseball (MLB)
/sports/basketball/mens-college-basketball NCAA Men’s College Basketball
/sports/basketball/nba National Basketball Association (NBA)
/sports/basketball/wnba Women’s National Basketball Association (WNBA)
/sports/basketball/womens-college-basketball NCAA Women’s College Basketball
/sports/boxing Boxing
/sports/football/college-football NCAA College Football
/sports/football/nfl National Football League (NFL)
/sports/golf Golf
/sports/hockey/nhl National Hockey League (NHL)
/sports/horse-racing Horse Racing
/sports/mma Mixed Martial Arts
/sports/racing Auto Racing
/sports/racing/nascar NASCAR Racing
/sports/soccer Professional soccer (US focus)
/sports/tennis Tennis

 

I wonder when this can be enabled for Cricket as well (including APIs  free,academic,premium,partner ).

(Note you can use R packages XML , RCurl , rjson, to get data from the web among others).

Plotting is best done using ggplot2 http://had.co.nz/ggplot2/ or d3.js at http://mbostock.github.com/d3/, and the current status of cricket graphics can surely look a change- they are mostly a single radial plot of shots played /runs scored or a combined barplot/line graph.

Google introduces Google Play

Some nice new features from the big G men from Mountain view. Google Play- for movies, games, apps, music and books. Nice to see entertainment is back on Google’s priority.

 

See this to read more

https://play.google.com/about/

When will I get Google Play?

About Google Play

Q: What is Google Play?
A: Google Play is a new digital content experience from Google where you can find your favorite music, movies, books, and Android apps and games. It’s your entertainment hub: you can access it from the web or from your Android device or even TV, and all your content is instantly available across all of these devices.

Q: What is your strategy with Google Play?
A: Our goal with Google Play is to bring together all your favorite content in one place that you can access across your devices. Specifically, digital content is fundamental to the mobile experience, so bringing all of this content together in one place for users makes the Android platform even more compelling. We’re also simplifying digital content for Google users – you can go to the Google Play website on your desktop and purchase and experience the latest movies, music and books. With Google Play, we’re giving you a simpler way to get your digital content.

Q: What will the experience be for users? What will happen to my existing account?
A: All content and apps in your existing account will remain in your account, but will transition to Google Play. On your device, the Android Market app icon will become the Google Play store icon. You’ll see “Play Store.” For the movies, books and music apps, you’ll begin to see Play versions of these as well, such as “Play Music,” and “Play Movies.”

Q: When will I get Google Play? What markets is this available in?
A: We’ll be rolling out Google Play globally starting today. On the web, Google Play will be live today. On devices, it will take a few days for the Android Market app to update to the Google Play Store app. The music, books and movies apps will also receive an update today.
Around the globe, Google Play will include Android apps and games. In countries where we have already launched music, books or movies, you will see those categories available in Google Play, too.

Q: I live outside the US. When will I get the books, music or movies verticals? I only see Android apps and games?
A: We want to bring different content categories to as many countries as possible. We’ve already launched movies and books in several countries outside the U.S. and will continue to do so overtime, but we don’t have a specific timeline to share.

Q: What types of content are available in my country?

  • Paid Apps: Available in these countries
  • Movies: Available in US, UK, Canada, and Japan
  • eBooks: Available in US, UK, Canada, and Australia
  • Music: Available in US

 

Q: Does this mean Google Music and the Google eBookstore will cease to exist? What about my account?
A: Both Google Music and the Google eBookstore are now part of Google Play. Your music and your books, including anything you bought, are still there, available to you in Google Play and accessible through your Google account.

Q: Where did my Google eBooks books go? Will I still have access to them?
A: Your books are now part of Google Play. Your books are still there, available to you in your Google Play library and accessible through your Google account.

Q: I don’t use an Android phone, can I still use Google Play?
A: Yes. Google Play is available on any computer with a modern browser at play.google.com. On the web, you can browse and buy books, movies and music. You can read books on the Google Play web reader, listen to music on your computer or watch movies online. Your digital content is all stored in the cloud, so you can access from anywhere using your Google Account.
We’ve also created ways to experience your music and books on other platforms such as the Google Books iOS app.

Q: Why do I not see Google Play yet on my device?
A: Please see our help center article on this here.

Q: How can I contact Google Play consumer support?
A: You can call or email our team here.

Using Google Analytics API with R:dimensions and metrics

I modified the query I wrote earlier  at http://www.decisionstats.com/using-google-analytics-with-r/to get multiple dimensions and metrics from the Google Analytics API, like hour of day,day of week to get cyclical parameters.We are adding the dimensions, and metrics to bring more depth in our analysis.Basically we are trying to do a time series analysis for forecasting web analytics data( which is basically time -stamped and rich in details ).

Basically I am modifying the dimensions and metrics parameters of the query code using the list at

http://code.google.com/apis/analytics/docs/gdata/dimsmets/dimsmets.html

 

query <- QueryBuilder()
query$Init(start.date = "2011-08-20",
                   end.date = "2012-08-25",
                   dimensions = c("ga:date","ga:hour","ga:dayOfWeek"),
                   metrics = c("ga:visitors","ga:visits","ga:pageviews","ga:timeOnSite"),
                   sort = c("ga:date","ga:hour","ga:dayOfWeek"),
                   table.id = paste(profiles$profile[3,3]))

#5. Make a request to get the data from the API 

ga.data <- ga$GetReportData(query)

#6. Look at the returned data 

str(ga.data)

head(ga.data$data) 

and we need the lubridate package to create a ymd:hour (time stamp)    since GA gives data aggregated at a hourly level at most. Also we need to smoothen the effect of weekend on web analytics data.

#Using package lubridate to convert character dates into time

library(lubridate)
ga.data$data[,1]=ymd(ga.data$data[,1])
ls()
dataset1=ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)

To be continued-

Web Analytics using R , Google Analytics and TS Forecasting

This is a continuation of the previous post on using Google Analytics .

Now that we have downloaded and plotted the data- we try and fit time series to the website data to forecast future traffic.

Some observations-

1) Google Analytics has 0 predictive analytics, it is just descriptive analytics and data visualization models (including the recent social analytics). However you can very well add in basic TS function using R to the GA API.

Why do people look at Website Analytics? To know today’s traffic and derive insights for the Future

2) Web Data clearly follows a 7 day peak and trough for weekly effects (weekdays and weekends), this is also true for hourly data …and this can be used for smoothing historic web data for future forecast.

3) On an advanced level, any hugely popular viral posts can be called a level shift (not drift) and accoringly dampened.

Test and Control!

Similarly using ARIMAX, we can factor in quantity and tag of posts as X regressor variables.

and now the code-( dont laugh at the simplicity please, I am just tinkering and playing with data here!)

You need to copy and paste the code at the bottom of   this post  http://www.decisionstats.com/using-google-analytics-with-r/ if you want to download your GA data down first.

Note I am using lubridate ,forecast and timeSeries packages in this section.

#Plotting the Traffic  plot(ga.data$data[,2],type="l") 

library(timeSeries)
library(forecast)

#Using package lubridate to convert character dates into time
library(lubridate)
ga.data$data[,1]=ymd(ga.data$data[,1])
ls()
dataset1=ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)
dataset2 <- ts(dataset1$ga.visitors,start=0,frequency = frequency(dataset1$ga.visitors), names=dataset1$ga.date)
str(dataset2)
head(dataset2)
ts.test=dataset2[1:200]
ts.control=dataset2[201:275]

 #Note I am splitting the data into test and control here

fitets=ets(ts.test)
plot(fitets)
testets=ets(ts.control,model=fitets)
accuracy(testets)
plot(testets)
spectrum(ts.test,method='ar')
decompose(ts.test)

library("TTR")
bb=SMA(dataset2,n=7)#We are doing a simple moving average for every 7 days. Note this can be 24 hrs for hourly data, or 30 days for daily data for month # 

to month comparison or 12 months for annual
#We notice that Web Analytics needs sommethening for every 7 thday as there is some relation to traffic on weekedays /weekends /same time last week
head(dataset2,40)
head(bb,40)

par(mfrow=c(2,1))
plot(bb,type="l",main="Using Seven Day Moving Average for Web Visitors")
plot(dataset2,main="Original Data")

Created by Pretty R at inside-R.org

Though I still wonder why the R query, gA R code /package could not be on the cloud (why it  needs to be downloaded)– cloud computing Gs?

Also how about adding some MORE predictive analytics to Google Analytics, chaps!

To be continued-

auto.arima() and forecasts!!!

cross validations!!!

and adapting the idiosyncratic periods and cycles  of web analytics to time series !!

Interview: Hjálmar Gíslason, CEO of DataMarket.com

Here is an interview with Hjálmar Gíslason, CEO of Datamarket.com  . DataMarket is an active marketplace for structured data and statistics. Through powerful search and visual data exploration, DataMarket connects data seekers with data providers.

 

Ajay-  Describe your journey as an entrepreneur and techie in Iceland. What are the 10 things that surprised you most as a tech entrepreneur.

HG- DataMarket is my fourth tech start-up since at age 20 in 1996. The previous ones have been in gaming, mobile and web search. I come from a technical background but have been moving more and more to the business side over the years. I can still prototype, but I hope there isn’t a single line of my code in production!

Funny you should ask about the 10 things that have surprised me the most on this journey, as I gave a presentation – literally yesterday – titled: “9 things nobody told me about the start-up business”

They are:
* Do NOT generalize – especially not to begin with
* Prioritize – and find a work-flow that works for you
* Meet people – face to face
* You are a sales person – whether you like it or not
* Technology is not a product – it’s the entire experience
* Sell the current version – no matter how amazing the next one is
* Learn from mistakes – preferably others’
* Pick the right people – good people is not enough
* Tell a good story – but don’t make them up

I obviously elaborate on each of these points in the talk, but the points illustrate roughly some of the things I believe I’ve learned … so far 😉

9 things nobody told me about the start-up business

Ajay-

Both Amazon  and Google  have entered the public datasets space. Infochimps  has 14,000+ public datasets. The US has http://www.data.gov/

So clearly the space is both competitive and yet the demand for public data repositories is clearly under served still. 

How does DataMarket intend to address this market in a unique way to differentiate itself from others.

HG- DataMarket is about delivering business data to decision makers. We help data seekers find the data they need for planning and informed decision making, and data publishers reaching this audience. DataMarket.com is the meeting point, where data seekers can come to find the best available data, and data publishers can make their data available whether for free or for a fee. We’ve populated the site with a wealth of data from public sources such as the UN, Eurostat, World Bank, IMF and others, but there is also premium data that is only available to those that subscribe to and pay for the access. For example we resell the entire data offering from the EIU (Economist Intelligence Unit) (link: http://datamarket.com/data/list/?q=provider:eiu)

DataMarket.com allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

We see many of these efforts not as competition, but as valuable potential sources of data for our offering, while others may be competing with parts of our proposition, such as easy access to the public data sets.

 

Ajay- What are your views on data confidentiality and access to data owned by Governments funded by tax payer money.

HG- My views are very simple: Any data that is gathered or created for taxpayers’ money should be open and free of charge unless higher priorities such as privacy or national security indicate otherwise.

Reflecting that, any data that is originally open and free of charge is still open and free of charge on DataMarket.com, just easier to find and work with.

Ajay-  How is the technology entrepreneurship and venture capital scene in Iceland. What things work and what things can be improved?

HG- The scene is quite vibrant, given the small community. Good teams with promising concepts have been able to get the funding they need to get started and test their footing internationally. When the rapid growth phase is reached outside funding may still be needed.

There are positive and negative things about any location. Among the good things about Iceland from the stand point of a technology start-up are highly skilled tech people and a relatively simple corporate environment. Among the bad things are a tiny local market, lack of skills in international sales and marketing and capital controls that were put in place after the crash of the Icelandic economy in 2008.

I’ve jokingly said that if a company is hot in the eyes of VCs it would get funding even if it was located in the jungles of Congo, while if they’re only lukewarm towards you, they will be looking for any excuse not to invest. Location can certainly be one of them, and in that case being close to the investor communities – physically – can be very important.

We’re opening up our sales and marketing offices in Boston as we speak. Not to be close to investors though, but to be close to our market and current customers.

Ajay- Describe your hobbies when you are not founding amazing tech startups.

HG- Most of my time is spent working – which happens to by my number one hobby.

It is still important to step away from it all every now and then to see things in perspective and come back with a clear mind.

I *love* traveling to exotic places. Me and my wife have done quite a lot of traveling in Africa and S-America: safari, scuba diving, skiing, enjoying nature. When at home I try to do some sports activities 3-4 times a week at least, and – recently – play with my now 8 month old son as much as I can.

About-

http://datamarket.com/p/about/team/

Management

Hjalmar GislasonHjálmar Gíslason, Founder and CEO: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise. DataMarket is based largely on his vision of the need for a global exchange for structured data.

hjalmar.gislason@datamarket.com

To know more, have a quick  look at  http://datamarket.com/

Cloud Computing – can be evil

Cloud Computing can be evil because-

1) Most browsers are owned by for profit corporations . Corporations can be evil, sometimes

And corporations can go bankrupt. You can back up data locally, but try backing up a corporation.

2) The content on your web page can be changed using translator extensions . This has interesting ramifications as in George Orwell. You may not be even aware of subtle changes introduced in your browser in the way it renders the html or some words using keywords from a browser extension app.

Imagine a new form of language called Politically Correct Truthspeak, and that can be in English but using machine learning learn to substitute politically sensitive words with Govt sanctioned words.

3) Your DNS and IP settings can be redirected using extensions. This means if a Govt passes a law- you can be denied the websites using just the browser not even the ISP.

Thats an extreme scenario for a authoritative govt creating its own version of Mafiaafire Redirector.

So how to keep the cloud computer honest?Move some stuff to the desktop

How to keep desktop computing efficient?Use some more cloud computing

It is not an OR but an AND function in which some computing can be local, some shared and some in the cloud.

Si?

How to use Bit Torrents

I really liked the software Qbittorent available from http://www.qbittorrent.org/ I think bit torrents should be the default way of sharing huge content especially software downloads. For protecting intellectual property there should be much better codes and software keys than presently available.

The qBittorrent project aims to provide a Free Software alternative to µtorrent. Additionally, qBittorrent runs and provides the same features on all major platforms (Linux, Mac OS X, Windows, OS/2, FreeBSD).

qBittorrent is based on Qt4 toolkit and libtorrent-rasterbar.

qBittorrent v2 Features

  • Polished µTorrent-like User Interface
  • Well-integrated and extensible Search Engine
    • Simultaneous search in most famous BitTorrent search sites
    • Per-category-specific search requests (e.g. Books, Music, Movies)
  • All Bittorrent extensions
    • DHT, Peer Exchange, Full encryption, Magnet/BitComet URIs, …
  • Remote control through a Web user interface
    • Nearly identical to the regular UI, all in Ajax
  • Advanced control over trackers, peers and torrents
    • Torrents queueing and prioritizing
    • Torrent content selection and prioritizing
  • UPnP / NAT-PMP port forwarding support
  • Available in ~25 languages (Unicode support)
  • Torrent creation tool
  • Advanced RSS support with download filters (inc. regex)
  • Bandwidth scheduler
  • IP Filtering (eMule and PeerGuardian compatible)
  • IPv6 compliant
  • Sequential downloading (aka “Download in order”)
  • Available on most platforms: Linux, Mac OS X, Windows, OS/2, FreeBSD
So if you are new to Bit Torrents- here is a brief tutorial
Some terminology from

Tracker

tracker is a server that keeps track of which seeds and peers are in the swarm.

Seed

Seed is used to refer to a peer who has 100% of the data. When a leech obtains 100% of the data, that peer automatically becomes a Seed.

Peer

peer is one instance of a BitTorrent client running on a computer on the Internet to which other clients connect and transfer data.

Leech

leech is a term with two meanings. Primarily leech (or leeches) refer to a peer (or peers) who has a negative effect on the swarm by having a very poor share ratio (downloading much more than they upload, creating a ratio less than 1.0)
1) Download and install the software from  http://www.qbittorrent.org/
2) If you want to search for new files, you can use the nice search features in here
3) If you want to CREATE new bit torrents- go to Tools -Torrent Creator
4) For sharing content- just seed the torrent you just created. What is seeding – hey did you read the terminology in the beginning?
5) Additionally –
From

Trackers: Below are some popular public trackers. They are servers which help peers to communicate.

Here are some good trackers you can use:

 

http://open.tracker.thepiratebay.org/announce
http://www.torrent-downloads.to:2710/announce
http://denis.stalker.h3q.com:6969/announce
udp://denis.stalker.h3q.com:6969/announce
http://www.sumotracker.com/announce

and

Super-seeding

When a file is new, much time can be wasted because the seeding client might send the same file piece to many different peers, while other pieces have not yet been downloaded at all. Some clients, like ABCVuzeBitTornado, TorrentStorm, and µTorrent have a “super-seed” mode, where they try to only send out pieces that have never been sent out before, theoretically making the initial propagation of the file much faster. However the super-seeding becomes less effective and may even reduce performance compared to the normal “rarest first” model in cases where some peers have poor or limited connectivity. This mode is generally used only for a new torrent, or one which must be re-seeded because no other seeds are available.
Note- you use this tutorial and any or all steps at your own risk. I am not legally responsible for any mishaps you get into. Please be responsible while being an efficient bit tor renter. That means respecting individual property rights.