Using Google Analytics API with R:dimensions and metrics

I modified the query I wrote earlier  at http://www.decisionstats.com/using-google-analytics-with-r/to get multiple dimensions and metrics from the Google Analytics API, like hour of day,day of week to get cyclical parameters.We are adding the dimensions, and metrics to bring more depth in our analysis.Basically we are trying to do a time series analysis for forecasting web analytics data( which is basically time -stamped and rich in details ).

Basically I am modifying the dimensions and metrics parameters of the query code using the list at

http://code.google.com/apis/analytics/docs/gdata/dimsmets/dimsmets.html

 

query <- QueryBuilder()
query$Init(start.date = "2011-08-20",
                   end.date = "2012-08-25",
                   dimensions = c("ga:date","ga:hour","ga:dayOfWeek"),
                   metrics = c("ga:visitors","ga:visits","ga:pageviews","ga:timeOnSite"),
                   sort = c("ga:date","ga:hour","ga:dayOfWeek"),
                   table.id = paste(profiles$profile[3,3]))

#5. Make a request to get the data from the API 

ga.data <- ga$GetReportData(query)

#6. Look at the returned data 

str(ga.data)

head(ga.data$data) 

and we need the lubridate package to create a ymd:hour (time stamp)    since GA gives data aggregated at a hourly level at most. Also we need to smoothen the effect of weekend on web analytics data.

#Using package lubridate to convert character dates into time

library(lubridate)
ga.data$data[,1]=ymd(ga.data$data[,1])
ls()
dataset1=ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)

To be continued-

Funny Photos- it happens only in India 2

Raj Weds Deepika (with spelling error) and no, its not photoshopped

Translation- No Girlfriend No Tension (back of Truck).Note the amazing spelling in the picture above

Polite reminder- Please do not spit

Ok maybe this maybe an international thing- but it is funny in India (which is hot enough, thank you and our curries are hot enough!)

 environmental sign

—–

Based on the award winning series of pictures at http://www.decisionstats.com/funny-photo-it-happens-only-in-india/

Web Analytics using R , Google Analytics and TS Forecasting

This is a continuation of the previous post on using Google Analytics .

Now that we have downloaded and plotted the data- we try and fit time series to the website data to forecast future traffic.

Some observations-

1) Google Analytics has 0 predictive analytics, it is just descriptive analytics and data visualization models (including the recent social analytics). However you can very well add in basic TS function using R to the GA API.

Why do people look at Website Analytics? To know today’s traffic and derive insights for the Future

2) Web Data clearly follows a 7 day peak and trough for weekly effects (weekdays and weekends), this is also true for hourly data …and this can be used for smoothing historic web data for future forecast.

3) On an advanced level, any hugely popular viral posts can be called a level shift (not drift) and accoringly dampened.

Test and Control!

Similarly using ARIMAX, we can factor in quantity and tag of posts as X regressor variables.

and now the code-( dont laugh at the simplicity please, I am just tinkering and playing with data here!)

You need to copy and paste the code at the bottom of   this post  http://www.decisionstats.com/using-google-analytics-with-r/ if you want to download your GA data down first.

Note I am using lubridate ,forecast and timeSeries packages in this section.

#Plotting the Traffic  plot(ga.data$data[,2],type="l") 

library(timeSeries)
library(forecast)

#Using package lubridate to convert character dates into time
library(lubridate)
ga.data$data[,1]=ymd(ga.data$data[,1])
ls()
dataset1=ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)
dataset2 <- ts(dataset1$ga.visitors,start=0,frequency = frequency(dataset1$ga.visitors), names=dataset1$ga.date)
str(dataset2)
head(dataset2)
ts.test=dataset2[1:200]
ts.control=dataset2[201:275]

 #Note I am splitting the data into test and control here

fitets=ets(ts.test)
plot(fitets)
testets=ets(ts.control,model=fitets)
accuracy(testets)
plot(testets)
spectrum(ts.test,method='ar')
decompose(ts.test)

library("TTR")
bb=SMA(dataset2,n=7)#We are doing a simple moving average for every 7 days. Note this can be 24 hrs for hourly data, or 30 days for daily data for month # 

to month comparison or 12 months for annual
#We notice that Web Analytics needs sommethening for every 7 thday as there is some relation to traffic on weekedays /weekends /same time last week
head(dataset2,40)
head(bb,40)

par(mfrow=c(2,1))
plot(bb,type="l",main="Using Seven Day Moving Average for Web Visitors")
plot(dataset2,main="Original Data")

Created by Pretty R at inside-R.org

Though I still wonder why the R query, gA R code /package could not be on the cloud (why it  needs to be downloaded)– cloud computing Gs?

Also how about adding some MORE predictive analytics to Google Analytics, chaps!

To be continued-

auto.arima() and forecasts!!!

cross validations!!!

and adapting the idiosyncratic periods and cycles  of web analytics to time series !!

iPhones Siri to replace call centers . Is it doable

I was wondering why the planet  spends  so much money in the $150-billion business process outsourcing industry, especially in voice calls to call centers.

If your iPhone Siri phone can be configured to answer any query, Why can’t it be configured to be  a virtual assistant, customer support, marketing outbound or even a super charged call center interactive voice response .

Can we  do and run some tests on this?

Using Google Analytics with R

Some code to read in data from Google Analytics data. Some modifications include adding the SSL authentication code and modifying (in bold) the table.id parameter to choose correct website from a GA profile with many websites

The Google Analytics Package files can be downloaded from http://code.google.com/p/r-google-analytics/downloads/list

It provides access to Google Analytics data natively from the R Statistical Computing programming language. You can use this library to retrieve an R data.frame with Google Analytics data. Then perform advanced statistical analysis, like time series analysis and regressions.

Supported Features

  • Access to v2 of the Google Analytics Data Export API Data Feed
  • A QueryBuilder class to simplify creating API queries
  • API response is converted directly into R as a data.frame
  • Library returns the aggregates, and confidence intervals of the metrics, dynamically if they exist
  • Auto-pagination to return more than 10,000 rows of information by combining multiple data requests. (Upper Limit 1M rows)
  • Authorization through the ClientLogin routine
  • Access to all the profiles ids for the authorized user
  • Full documentation and unit tests
Code-

> library(XML)

>

> library(RCurl)

Loading required package: bitops

>

> #Change path name in the following to the folder you downloaded the Google Analytics Package

>

> source(“C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/RGoogleAnalytics.R”)

>

> source(“C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/QueryBuilder.R”)

> # download the file needed for authentication

> download.file(url=”http://curl.haxx.se/ca/cacert.pem&#8221;, destfile=”cacert.pem”)

trying URL ‘http://curl.haxx.se/ca/cacert.pem&#8217; Content type ‘text/plain’ length 215993 bytes (210 Kb) opened

URL downloaded 210 Kb

>

> # set the curl options

> curl <- getCurlHandle()

> options(RCurlOptions = list(capath = system.file(“CurlSSL”, “cacert.pem”,

+ package = “RCurl”),

+ ssl.verifypeer = FALSE))

> curlSetOpt(.opts = list(proxy = ‘proxyserver:port’), curl = curl)

An object of class “CURLHandle” Slot “ref”: <pointer: 0000000006AA2B70>

>

> # 1. Create a new Google Analytics API object

>

> ga <- RGoogleAnalytics()

>

> # 2. Authorize the object with your Google Analytics Account Credentials

>

> ga$SetCredentials(“USERNAME”, “PASSWORD”)

>

> # 3. Get the list of different profiles, to help build the query

>

> profiles <- ga$GetProfileData()

>

> profiles #Error Check to See if we get the right website

$profile AccountName ProfileName TableId

1 dudeofdata.com dudeofdata.com ga:44926237

2 knol.google.com knol.google.com ga:45564890

3 decisionstats.com decisionstats.com ga:46751946

$total.results

total.results

1 3

>

> # 4. Build the Data Export API query

>

> #Modify the start.date and end.date parameters based on data requirements

>

> #Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the X th website in your profile

> # 4. Build the Data Export API query

> query <- QueryBuilder() > query$Init(start.date = “2012-01-09”, + end.date = “2012-03-20”, + dimensions = “ga:date”,

+ metrics = “ga:visitors”,

+ sort = “ga:date”,

+ table.id = paste(profiles$profile[3,3]))

>

>

> #5. Make a request to get the data from the API

>

> ga.data <- ga$GetReportData(query)

[1] “Executing query: https://www.google.com/analytics/feeds/data?start-date=2012%2D01%2D09&end-date=2012%2D03%2D20&dimensions=ga%3Adate&metrics=ga%3Avisitors&sort=ga%3Adate&ids=ga%3A46751946&#8221;

>

> #6. Look at the returned data

>

> str(ga.data)

List of 3

$ data :’data.frame’: 72 obs. of 2 variables: ..

$ ga:date : chr [1:72] “20120109” “20120110” “20120111” “20120112” … ..

$ ga:visitors: num [1:72] 394 405 381 390 323 47 169 67 94 89 …

$ aggr.totals :’data.frame’: 1 obs. of 1 variable: ..

$ aggregate.totals: num 28348

$ total.results: num 72

>

> head(ga.data$data)

ga:date ga:visitors

1 20120109 394

2 20120110 405

3 20120111 381

4 20120112 390

5 20120113 323

6 20120114 47 >

> #Plotting the Traffic >

> plot(ga.data$data[,2],type=”l”)

Update- Some errors come from pasting Latex directly to WordPress. Here is some code , made pretty-r in case you want to play with the GA api

library(XML)

library(RCurl)

#Change path name in the following to the folder you downloaded the Google Analytics Package 

source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/RGoogleAnalytics.R")

source("C:/Users/KUs/Desktop/CANADA/R/RGoogleAnalytics/R/QueryBuilder.R")
# download the file needed for authentication
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

# set the curl options
curl <- getCurlHandle()
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)

# 1. Create a new Google Analytics API object 

ga <- RGoogleAnalytics()

# 2. Authorize the object with your Google Analytics Account Credentials 

ga$SetCredentials("ohri2007@gmail.com", "XXXXXXX")

# 3. Get the list of different profiles, to help build the query

profiles <- ga$GetProfileData()

profiles #Error Check to See if we get the right website

# 4. Build the Data Export API query 

#Modify the start.date and end.date parameters based on data requirements 

#Modify the table.id at table.id = paste(profiles$profile[X,3]) to get the X th website in your profile 
# 4. Build the Data Export API query
query <- QueryBuilder()
query$Init(start.date = "2012-01-09",
                   end.date = "2012-03-20",
                   dimensions = "ga:date",
                   metrics = "ga:visitors",
                   sort = "ga:date",
                   table.id = paste(profiles$profile[3,3]))

#5. Make a request to get the data from the API 

ga.data <- ga$GetReportData(query)

#6. Look at the returned data 

str(ga.data)

head(ga.data$data)

#Plotting the Traffic 

plot(ga.data$data[,2],type="l")

Created by Pretty R at inside-R.org

Reading Google Docs using R Curl

 

and finally I can download my Google spreadsheet file using-

require(RCurl)

download.file(url=”http://curl.haxx.se/ca/cacert.pem&#8221;, destfile=”cacert.pem”)

url=”https://docs.google.com/spreadsheet/pub?key=0AtYMMvghK2ytcldUcWNNZTltcXdIZUZ2MWU0R1NfeWc&output=csv&#8221;

b <- getURL(url,cainfo=”cacert.pem”)

write.table(b,quote = FALSE, sep = “,”,file=”test.csv”)

 

Previously (for past 3 5 7 hours)-

The codes at http://blog.revolutionanalytics.com/2011/09/using-google-spreadsheets-with-r-an-update.html dont work thanks to the SSL authentication issue. and the packages at [http://www.omegahat.org/RGoogleDocs/] and [https://r-forge.r-project.org/projects/rgoogledata/] are missing in action. 

So I mixed the codes at http://blog.revolutionanalytics.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html and http://www.brocktibert.com/blog/2012/01/19/358/

and I get this error while using http://thebiobucket.blogspot.in/2012/03/r-function-to-read-data-from-google.html#more

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names

 

JMP 10 released

JMP , the visual data exploration, statistical quality control software from SAS Institute launched version 10 of its software today.

Source-http://jmp.com/about/events/webcasts/jmp_webcast.shtml?name=jmp10

JMP 10 includes:

Numerous enhancements to the drag-and-drop Graph Builder, including a new iPad application.

A cutting-edge Control Chart Builder to create process control charts with drag-and-drop ease.

New reliability capabilities, including growth and forecast models.

Additions and improvements for sorting and filtering data, design of experiments, statistical modeling, scripting, add-in and application development, script debugging and more.

From JohnSall’s blog post at http://blogs.sas.com/content/jmp/2012/03/20/discover-more-with-jmp-10/

Much of the development centered on four focus areas:

1. Graph Builder everywhere. The Graph Builder platform itself has new features like Heatmap and Treemap, an elements palette and properties panel, making the choices more visible. But Graph Builder also has some descendents now, including the new Control Chart Builder, which makes creating control charts an interactive process. In addition, some of the drag-and-drop features that are used to change columns in Graph Builder are also available in Distribution, Fit Y by X, and a few other places. Finally, Graph Builder has been ported to the iPad. For the first time, you can use JMP for exploration and presentation on a mobile device for free. So just think of Graph Builder as gradually taking over in lots of places.

2. Expert-driven design.reliability, measurement systems, and partial least squares analyses.

3. Performance.  this release has the most new multithreading so far

4. Application development

You can read more here –http://jmp.com/about/events/webcasts/jmpwebcast_detail.shtml?reglink=70130000001r9IP