Tag Archives: google analytics
Due to changes in the Google APIs, my earlier post on using Google Analytics in R is deprecated. Unfortunately it still ranks in the top 10 Google results for "using Google Analytics with R".
That post is here: http://decisionstats.com/2012/03/20/using-google-analytics-with-r/
A more up-to-date R package for Google Analytics is here: https://github.com/skardhamar/rga
A newer, easy-to-follow tutorial on using Google Analytics with R via the OAuth 2.0 Playground is here:
- Set the Google Analytics query parameters to prepare the request URI
- Get the access token from the OAuth 2.0 Playground
- Retrieve and select the profile
- Retrieve the GA data
Note it is also an excellent way to learn the rjson approach. You can see the details on the Tatvic blog linked above.
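A rough sketch of those four steps is below. The table id, dates, and access token are placeholders, not real values: use your own ga:XXXX profile id and a token copied from the OAuth 2.0 Playground.

```r
# Sketch: build the GA request URI by hand, then fetch it with RCurl
# and parse the JSON reply with rjson.
library(RCurl)
library(rjson)

# 1. Set the query parameters (placeholder profile id and dates)
base.uri <- "https://www.googleapis.com/analytics/v3/data/ga"
query <- paste0(base.uri,
                "?ids=ga:12345678",       # placeholder: your table id
                "&start-date=2012-01-09",
                "&end-date=2012-03-20",
                "&dimensions=ga:date",
                "&metrics=ga:visitors")

# 2. Paste in the access token from the OAuth 2.0 Playground
token   <- "ya29.XXXXXXXX"               # placeholder token
request <- paste0(query, "&access_token=", token)

# 3-4. Retrieve the response and turn the JSON into an R list
# (uncomment once you have a real token and profile id)
# ga.json <- getURL(request, ssl.verifypeer = FALSE)
# ga.list <- fromJSON(ga.json)
# ga.list$rows                           # the actual data
```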
Hat tip- Vignesh Prajapati
- (not provided): Using R and the Google Analytics API (r-bloggers.com)
The Analytics (or stats) dashboard at WordPress.com continues to disappoint, and it is a major reason people move away from WordPress.com hosting, since they need better analytics (like Google Analytics, which can't be enabled on the default hosted mode).
It's not really beautiful, unlike the rest of the WordPress universe!
It could be made better if people tried harder! Analytics matters.
Here are some points:
1) Bar charts and histograms are not really the best way to visualize trends across time.
4) I can't even export my traffic stats (and forget an API!), so I am stuck with the bad data viz here.
I modified the query I wrote earlier at http://www.decisionstats.com/using-google-analytics-with-r/ to get multiple dimensions and metrics from the Google Analytics API, like hour of day and day of week, to capture cyclical parameters. We add the dimensions and metrics to bring more depth to the analysis. Basically, we are trying to do a time series analysis for forecasting web analytics data (which is time-stamped and rich in detail).
I am modifying the dimensions and metrics parameters of the query code using the list at
query <- QueryBuilder()
query$Init(start.date = "2011-08-20",
           end.date = "2012-08-25",
           dimensions = c("ga:date","ga:hour","ga:dayOfWeek"),
           metrics = c("ga:visitors","ga:visits","ga:pageviews","ga:timeOnSite"),
           sort = c("ga:date","ga:hour","ga:dayOfWeek"),
           table.id = paste(profiles$profile[3,3]))
#5. Make a request to get the data from the API
ga.data <- ga$GetReportData(query)
#6. Look at the returned data
str(ga.data)
head(ga.data$data)
We need the lubridate package to create a ymd:hour timestamp, since GA gives data aggregated at an hourly level at most. We also need to smooth out the effect of weekends on web analytics data.
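For example, pasting the date and hour columns together and parsing them with lubridate's ymd_h() gives a proper hourly timestamp. The sample values below are made up, not real GA output:

```r
# Sketch: build an hourly timestamp from GA's separate date and hour columns.
library(lubridate)

ga.date <- c("20120109", "20120109", "20120110")  # sample values
ga.hour <- c("05", "06", "05")

stamp <- ymd_h(paste(ga.date, ga.hour))           # POSIXct timestamps
wday(stamp, label = TRUE)                         # weekday labels, handy for
                                                  # spotting the weekend effect
```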
To be continued-
This is a continuation of the previous post on using Google Analytics.
Now that we have downloaded and plotted the data, we try to fit time series models to the website data to forecast future traffic.
1) Google Analytics has no predictive analytics; it is just descriptive analytics and data visualization (including the recent social analytics). However, you can easily add basic time series functions to the GA API output using R.
Why do people look at website analytics? To know today's traffic and derive insights for the future.
2) Web data clearly follows a seven-day peak-and-trough pattern (weekday and weekend effects); this is also true for hourly data, and it can be used to smooth historic web data for future forecasts.
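One way to exploit that seven-day cycle is to declare a frequency of 7 on the daily series and decompose it. The visitor counts below are simulated so the snippet needs no GA connection; swap in your downloaded data in practice:

```r
# Sketch: treat daily visitors as a weekly-seasonal series and split out
# the 7-day pattern. Data here are simulated, not real GA output.
set.seed(42)
weekly    <- rep(c(400, 410, 390, 380, 350, 60, 90), 10)  # weekday/weekend shape
visits    <- weekly + rnorm(70, sd = 20)
visits.ts <- ts(visits, frequency = 7)                    # 7 observations per week

d <- decompose(visits.ts)
# d$seasonal holds the recurring weekday/weekend effect;
# visits.ts - d$seasonal is the smoothed series for forecasting
```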
3) At a more advanced level, any hugely popular viral post can be treated as a level shift (not a drift) and accordingly dampened.
Test and Control!
Similarly, using ARIMAX, we can factor in the quantity and tags of posts as X regressor variables.
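A sketch of that ARIMAX idea with the forecast package. The posts-per-day regressor is simulated here; in practice you would use your real posting schedule:

```r
# Sketch: fit an ARIMA model with an external regressor (posts per day)
# using forecast::Arima. All data here are simulated.
library(forecast)

set.seed(1)
posts  <- rpois(100, lambda = 1)   # posts published each day (simulated)
visits <- 300 + 80 * posts + arima.sim(list(ar = 0.5), n = 100, sd = 30)

fit <- Arima(visits, order = c(1, 0, 0), xreg = posts)
# Forecast 14 days ahead, assuming one post per day
fc  <- forecast(fit, xreg = rep(1, 14))
```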
And now the code (don't laugh at the simplicity please, I am just tinkering and playing with data here!).
You need to copy and paste the code at the bottom of this post http://www.decisionstats.com/using-google-analytics-with-r/ if you want to download your GA data first.
Note I am using the lubridate, forecast, timeSeries, and TTR packages in this section.
#Plotting the Traffic
plot(ga.data$data[,2],type="l")
#Using package lubridate to convert character dates into time
library(lubridate)
library(forecast) #for ets() and accuracy()
ga.data$data[,1]=ymd(ga.data$data[,1])
ls()
dataset1=ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)
dataset2 <- ts(dataset1$ga.visitors,start=0,frequency = frequency(dataset1$ga.visitors), names=dataset1$ga.date)
str(dataset2)
head(dataset2)
#Note I am splitting the data into test and control here
#frequency=7 gives the series a weekly period, which decompose() needs
ts.test=ts(dataset2[1:200],frequency=7)
ts.control=ts(dataset2[201:275],frequency=7)
fitets=ets(ts.test)
plot(fitets)
testets=ets(ts.control,model=fitets)
accuracy(testets)
plot(testets)
spectrum(ts.test,method='ar')
decompose(ts.test)
library("TTR")
bb=SMA(dataset2,n=7) #Simple moving average over every 7 days. This can be 24 hours for hourly data, 30 days of daily data for month-to-month comparison, or 12 months for annual
#We notice that web analytics data needs smoothing at every 7th day, as traffic is related to weekdays/weekends/the same time last week
head(dataset2,40)
head(bb,40)
par(mfrow=c(2,1))
plot(bb,type="l",main="Using Seven Day Moving Average for Web Visitors")
plot(dataset2,main="Original Data")
Though I still wonder why the R query and the GA R code/package could not run in the cloud (why does it need to be downloaded?). Cloud computing, Gs?
Also, how about adding some MORE predictive analytics to Google Analytics, chaps!
To be continued-
auto.arima() and forecasts!!!
and adapting the idiosyncratic periods and cycles of web analytics to time series !!
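The idea for that next installment, roughly, again on simulated traffic with a weekly cycle (swap in dataset2 from the code above for real GA data):

```r
# Sketch: let auto.arima() pick the model order, then forecast future
# traffic. The series here is simulated, not real GA data.
library(forecast)

set.seed(7)
traffic <- ts(200 + 50 * sin(2 * pi * (1:140) / 7) + rnorm(140, sd = 15),
              frequency = 7)               # weekly cycle baked in

fit <- auto.arima(traffic)                 # automatic order selection
fc  <- forecast(fit, h = 28)               # four weeks ahead
plot(fc, main = "auto.arima forecast of simulated web traffic")
```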
Some code to read in data from Google Analytics. The modifications include adding the SSL authentication code and changing the table.id parameter to choose the correct website from a GA profile with many websites.
The Google Analytics Package files can be downloaded from http://code.google.com/p/r-google-analytics/downloads/list
It provides access to Google Analytics data natively from the R statistical computing language. You can use this library to retrieve an R data.frame of Google Analytics data, then perform advanced statistical analysis such as time series analysis and regression.
- Access to v2 of the Google Analytics Data Export API Data Feed
- A QueryBuilder class to simplify creating API queries
- API response is converted directly into R as a data.frame
- The library returns the aggregates and confidence intervals of the metrics, dynamically, if they exist
- Auto-pagination to return more than 10,000 rows of information by combining multiple data requests. (Upper Limit 1M rows)
- Authorization through the ClientLogin routine
- Access to all the profiles ids for the authorized user
- Full documentation and unit tests
Loading required package: bitops
> #Change path name in the following to the folder you downloaded the Google Analytics Package
> # download the file needed for authentication
> download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
trying URL 'http://curl.haxx.se/ca/cacert.pem'
Content type 'text/plain' length 215993 bytes (210 Kb)
opened URL
downloaded 210 Kb
> # set the curl options
> curl <- getCurlHandle()
> options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
+                                                  package = "RCurl"),
+                             ssl.verifypeer = FALSE))
> curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)
An object of class “CURLHandle” Slot “ref”: <pointer: 0000000006AA2B70>
> # 1. Create a new Google Analytics API object
> ga <- RGoogleAnalytics()
> # 2. Authorize the object with your Google Analytics Account Credentials
> ga$SetCredentials("USERNAME", "PASSWORD")
> # 3. Get the list of different profiles, to help build the query
> profiles <- ga$GetProfileData()
> profiles #Error Check to See if we get the right website
$profile
        AccountName       ProfileName     TableId
1    dudeofdata.com    dudeofdata.com ga:44926237
2   knol.google.com   knol.google.com ga:45564890
3 decisionstats.com decisionstats.com ga:46751946
> # 4. Build the Data Export API query
> #Modify the start.date and end.date parameters based on data requirements
> #Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the X th website in your profile
> # 4. Build the Data Export API query
> query <- QueryBuilder()
> query$Init(start.date = "2012-01-09",
+            end.date = "2012-03-20",
+            dimensions = "ga:date",
+            metrics = "ga:visitors",
+            sort = "ga:date",
+            table.id = paste(profiles$profile[3,3]))
> #5. Make a request to get the data from the API
> ga.data <- ga$GetReportData(query)
> #6. Look at the returned data
> str(ga.data)
List of 3
 $ data         :'data.frame': 72 obs. of 2 variables:
  ..$ ga:date    : chr [1:72] "20120109" "20120110" "20120111" "20120112" ...
  ..$ ga:visitors: num [1:72] 394 405 381 390 323 47 169 67 94 89 ...
 $ aggr.totals  :'data.frame': 1 obs. of 1 variable:
  ..$ aggregate.totals: num 28348
 $ total.results: num 72
> head(ga.data$data)
   ga:date ga:visitors
1 20120109         394
2 20120110         405
3 20120111         381
4 20120112         390
5 20120113         323
6 20120114          47
> #Plotting the Traffic
> plot(ga.data$data[,2],type="l")
Update: some of the errors above come from pasting LaTeX directly into WordPress. Here is the code, formatted with Pretty R, in case you want to play with the GA API.
library(XML)
library(RCurl)
#Change path name in the following to the folder you downloaded the Google Analytics Package
source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/RGoogleAnalytics.R")
source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/QueryBuilder.R")
# download the file needed for authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
                            ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)
# 1. Create a new Google Analytics API object
ga <- RGoogleAnalytics()
# 2. Authorize the object with your Google Analytics Account Credentials
ga$SetCredentials("email@example.com", "XXXXXXX")
# 3. Get the list of different profiles, to help build the query
profiles <- ga$GetProfileData()
profiles #Error check to see if we get the right website
# 4. Build the Data Export API query
#Modify the start.date and end.date parameters based on data requirements
#Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the Xth website in your profile
query <- QueryBuilder()
query$Init(start.date = "2012-01-09",
           end.date = "2012-03-20",
           dimensions = "ga:date",
           metrics = "ga:visitors",
           sort = "ga:date",
           table.id = paste(profiles$profile[3,3]))
#5. Make a request to get the data from the API
ga.data <- ga$GetReportData(query)
#6. Look at the returned data
str(ga.data)
head(ga.data$data)
#Plotting the Traffic
plot(ga.data$data[,2],type="l")