This is a continuation of the previous post on using Google Analytics .
Now that we have downloaded and plotted the data- we try and fit time series to the website data to forecast future traffic.
1) Google Analytics has 0 predictive analytics, it is just descriptive analytics and data visualization models (including the recent social analytics). However you can very well add in basic TS function using R to the GA API.
Why do people look at Website Analytics? To know today’s traffic and derive insights for the Future
2) Web Data clearly follows a 7 day peak and trough for weekly effects (weekdays and weekends), this is also true for hourly data …and this can be used for smoothing historic web data for future forecast.
3) On an advanced level, any hugely popular viral posts can be called a level shift (not drift) and accoringly dampened.
Test and Control!
Similarly using ARIMAX, we can factor in quantity and tag of posts as X regressor variables.
and now the code-( dont laugh at the simplicity please, I am just tinkering and playing with data here!)
You need to copy and paste the code at the bottom of this post http://www.decisionstats.com/using-google-analytics-with-r/ if you want to download your GA data down first.
Note I am using lubridate ,forecast and timeSeries packages in this section.
#Plotting the Traffic plot(ga.data$data[,2],type="l")
#Using package lubridate to convert character dates into time library(lubridate) ga.data$data[,1]=ymd(ga.data$data[,1]) ls() dataset1=ga.data$data names(dataset1) <- make.names(names(dataset1)) str(dataset1) head(dataset1) dataset2 <- ts(dataset1$ga.visitors,start=0,frequency = frequency(dataset1$ga.visitors), names=dataset1$ga.date) str(dataset2) head(dataset2) ts.test=dataset2[1:200] ts.control=dataset2[201:275] #Note I am splitting the data into test and control here fitets=ets(ts.test) plot(fitets) testets=ets(ts.control,model=fitets) accuracy(testets) plot(testets) spectrum(ts.test,method='ar') decompose(ts.test) library("TTR") bb=SMA(dataset2,n=7)#We are doing a simple moving average for every 7 days. Note this can be 24 hrs for hourly data, or 30 days for daily data for month # to month comparison or 12 months for annual #We notice that Web Analytics needs sommethening for every 7 thday as there is some relation to traffic on weekedays /weekends /same time last week head(dataset2,40) head(bb,40) par(mfrow=c(2,1)) plot(bb,type="l",main="Using Seven Day Moving Average for Web Visitors") plot(dataset2,main="Original Data")
Though I still wonder why the R query, gA R code /package could not be on the cloud (why it needs to be downloaded)– cloud computing Gs?
Also how about adding some MORE predictive analytics to Google Analytics, chaps!
To be continued-
auto.arima() and forecasts!!!
and adapting the idiosyncratic periods and cycles of web analytics to time series !!