Category Archives: google
Web Analytics using R, Google Analytics and TS Forecasting
This is a continuation of the previous post on using Google Analytics.
Now that we have downloaded and plotted the data, we try to fit a time series model to the website data to forecast future traffic.
Some observations-
1) Google Analytics has zero predictive analytics; it is just descriptive analytics and data visualization (including the recent social analytics). However, you can very well add basic TS functions using R on top of the GA API.
Why do people look at website analytics? To know today's traffic and derive insights for the future.
2) Web data clearly follows a 7-day peak and trough for weekly effects (weekdays and weekends). A similar cycle shows up in hourly data, and it can be used for smoothing historic web data for future forecasts.
3) On an advanced level, any hugely popular viral post can be treated as a level shift (not drift) and dampened accordingly.
Test and Control!
Similarly, using ARIMAX, we can factor in the quantity and tags of posts as X regressor variables.
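To make the ARIMAX idea a bit more concrete, here is a minimal sketch in base R. It runs on simulated data, not real GA numbers: the variable names (weekend, posts, viral), the effect sizes and the AR(1) order are all assumptions made up for illustration. The viral dummy stands in for the level shift from observation 3.

```r
# Sketch only: simulated daily visits, not real GA data.
set.seed(42)
n <- 140                                   # 20 weeks of daily data
day_of_week <- rep(1:7, length.out = n)
weekend <- as.numeric(day_of_week %in% c(6, 7))
posts <- rpois(n, lambda = 1)              # hypothetical posts published per day
viral <- as.numeric(seq_len(n) >= 100)     # hypothetical level shift from day 100 on

# Assumed data-generating process, chosen only for illustration
visits <- 200 - 60 * weekend + 25 * posts + 150 * viral + rnorm(n, sd = 10)

# ARIMAX: AR(1) errors plus external regressors passed through xreg
xreg <- cbind(weekend = weekend, posts = posts, viral = viral)
fit <- arima(visits, order = c(1, 0, 0), xreg = xreg)
print(round(coef(fit), 1))

# Forecast the next week, assuming one post a day and the shift persisting
future <- cbind(weekend = c(0, 0, 0, 0, 0, 1, 1),
                posts = rep(1, 7),
                viral = rep(1, 7))
fc <- predict(fit, n.ahead = 7, newxreg = future)
print(round(fc$pred))
```

Because the dummy absorbs the jump, the weekend and posts coefficients are estimated on what is left, which is one way of dampening a viral spike.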
And now the code (don't laugh at the simplicity please, I am just tinkering and playing with data here!)
If you want to download your GA data first, you need to copy and paste the code at the bottom of this post:
http://www.decisionstats.com/using-google-analytics-with-r/
Note I am using the lubridate, forecast and timeSeries packages in this section (plus TTR for the moving average).
# Plotting the traffic
plot(ga.data$data[,2], type="l")
library(timeSeries)
library(forecast)
# Using package lubridate to convert character dates into dates
library(lubridate)
ga.data$data[,1] = ymd(ga.data$data[,1])
ls()
dataset1 = ga.data$data
names(dataset1) <- make.names(names(dataset1))
str(dataset1)
head(dataset1)
# frequency = 7 encodes the weekly cycle of daily data
# (frequency() of a plain vector is 1, which makes decompose() fail below)
dataset2 <- ts(dataset1$ga.visitors, start=0, frequency=7)
str(dataset2)
head(dataset2)
# Note I am splitting the data into test and control here
# Wrapping in ts() keeps the weekly frequency, which plain indexing drops
ts.test = ts(dataset2[1:200], frequency=7)
ts.control = ts(dataset2[201:275], frequency=7)
fitets = ets(ts.test)
plot(fitets)
# Apply the model fitted on the test period to the control period
testets = ets(ts.control, model=fitets)
accuracy(testets)
plot(testets)
spectrum(ts.test, method='ar')
decompose(ts.test)
library("TTR")
# We are doing a simple moving average for every 7 days.
# Note this can be 24 hrs for hourly data, or 30 days for daily data
# for month to month comparison, or 12 months for annual
bb = SMA(dataset2, n=7)
# We notice that Web Analytics needs smoothing for every 7th day, as there is
# some relation to traffic on weekdays / weekends / same time last week
head(dataset2, 40)
head(bb, 40)
par(mfrow=c(2,1))
plot(bb, type="l", main="Using Seven Day Moving Average for Web Visitors")
plot(dataset2, main="Original Data")
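As a sanity check on any test/control split, it can help to compare against a very dumb benchmark first. The sketch below uses base R and simulated visits (the weekly pattern and noise level are made-up assumptions, not my GA data). It forecasts the control period by repeating the last observed week and compares the mean absolute error against just using the overall average.

```r
# Sketch only: simulated daily visits with a weekly pattern (assumed values).
set.seed(1)
n <- 275
weekly_pattern <- c(100, 110, 105, 108, 95, 60, 55)
visits <- rep(weekly_pattern, length.out = n) + rnorm(n, sd = 5)

train <- visits[1:200]      # "test" period in the post's wording
holdout <- visits[201:275]  # "control" period

# Seasonal naive benchmark: repeat the last observed 7 days
last_week <- tail(train, 7)
pred_snaive <- rep(last_week, length.out = length(holdout))

# Flat benchmark: the training mean for every day
pred_mean <- rep(mean(train), length(holdout))

mae <- function(actual, predicted) mean(abs(actual - predicted))
mae_snaive <- mae(holdout, pred_snaive)
mae_mean <- mae(holdout, pred_mean)
print(c(seasonal_naive = mae_snaive, flat_mean = mae_mean))
```

If a fancier model cannot beat the repeat-last-week forecast on the held-out days, the 7-day cycle is probably doing most of the work.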
Though I still wonder why the GA R query code/package could not live on the cloud (why does it need to be downloaded?) – cloud computing, Gs?
Also, how about adding some MORE predictive analytics to Google Analytics, chaps!
To be continued-
auto.arima() and forecasts!!!
cross validations!!!
and adapting the idiosyncratic periods and cycles of web analytics to time series !!
Using Google Adwords to target Vic Gundotra and Matt Cutts stochastically
Over the Christmas break, I created a Google Adwords campaign using the $100 credit generously given by Google. I did it using my alumni id, even though I have a perfectly normal gmail id. I guess if Google allows me to use the credit on any account, well, I will take it. And so a free experiment was born.
But whom to target with Google, but Google itself? It seemed logical.
So I created a campaign for the names of prominent Googlers (from a list on Google+ at
https://plus.google.com/103399926392582289066/posts/LX4g7577DqD
) and limited the ad location to Mountain View, California.
NULL HYPOTHESIS- People who are googled a lot from within the office are either popular or just checking themselves.
Since Google’s privacy policy is great, and has now been shown billions of times, well, I guess what’s a little ad targeting between brother geeks. Right?
My ad was-
Hire Ajay Ohri
He is
Awesome
linkedin.com/in/ajayohri
or see screenshot below.
Here are the results: 88 clicks and 43,000 impressions (and $83 of Google’s own money).
Clearly Vic Gundotra is googled a lot within Mountain View, California. Does He Google himself?
So is Matt Cutts. Does HE Google himself, or does he get elves to help him?
To my disappointment, not many people clicked my LI offer, so I am still blogging.
And there were few clicks on Marissa Mayer. Why Google her when she is right down the corridor?
The null hypothesis is thus rejected. Also, most clicks were from display and not from search.
I need to find something better to do with the Christmas break this year. I still have a credit of $16 left.
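For the record, the headline campaign numbers above (88 clicks, 43,000 impressions, $83 spent) translate into the usual ad metrics like this. The variable names are mine; the formulas are just the standard definitions of click-through rate, cost per click and cost per thousand impressions.

```r
# Campaign figures quoted in the post
clicks <- 88
impressions <- 43000
spend_usd <- 83

ctr <- clicks / impressions             # click-through rate
cpc <- spend_usd / clicks               # cost per click, in dollars
cpm <- spend_usd / impressions * 1000   # cost per thousand impressions, in dollars

cat(sprintf("CTR: %.2f%%  CPC: $%.2f  CPM: $%.2f\n", 100 * ctr, cpc, cpm))
```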
Self Driving Cars, Geo Coded Ads, End of Privacy
Imagine a world in which your car tracks everywhere you go. Over a period of time, it builds up a database of your driving habits, how long you stay at particular kinds of dining places and entertainment places (ahem!), and the days and times you do it. You can no longer go to massage parlours without your data being checked by your car software admin (read – your home admin).
And that data is mined using machine learning algorithms to give you better ads for pizzas, or a reminder for food every 3 hours, or an ad for beer every Thursday after 8 pm.
Welcome Brave New World!
Note on Internet Privacy (Updated) and a note on DNSCrypt
I noticed the brouhaha on Google’s privacy policy. I am afraid that social networks capture much more private information than search engines do. Even if they integrate my browser history, my social network, my emails and my search engine keywords, I am still okay. All they are going to do is sell me better ads (rather than just flood me with ads hoping to get a click). Of course Microsoft should take it one step further and capture data from my desktop as well for better ads; that would really complete the curve. In any case, with the Patriot Act, most information is available to the Government anyway.
But it does make sense to have an easier to understand privacy policy, and one of my disappointments is the complete lack of visual appeal in such notices. Make things as simple as possible, but no simpler, as Al-E said.
Privacy activists forget that ads run on models built on AGGREGATED data, and most models are scored automatically. Unless you do something really weird or fake, chances are the data pertaining to you gets automatically collected, algorithmically aggregated, then modeled and scored, and an ad corresponding to your score or segment is shown to you. Probably no human eyes see raw data (but big G can clarify that).
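Here is a toy version of that collect, aggregate, score and serve pipeline in base R. Everything in it is hypothetical (the users, the categories, the ad texts, and the "largest count wins" scoring rule); it is only meant to show that the ad is chosen from an aggregate per segment, not by someone reading raw events.

```r
# Toy sketch: all users, categories and ads below are made up.
events <- data.frame(
  user     = c("u1", "u1", "u2", "u3", "u3", "u3"),
  category = c("pizza", "pizza", "beer", "pizza", "beer", "beer"),
  stringsAsFactors = FALSE
)

# Aggregate: count events per user and category
counts <- table(events$user, events$category)

# Score: assign each user to the category with the largest count
segment <- colnames(counts)[apply(counts, 1, which.max)]
names(segment) <- rownames(counts)

# Serve: look up the ad for the segment, never the raw events
ads <- c(beer = "Beer ad", pizza = "Pizza ad")
shown <- ads[segment]
names(shown) <- names(segment)
print(shown)
```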
(I also noticed Google gets a lot of free advice from bloggers. Hey, if you were really good at giving advice to Google, they WILL hire you!)
On to another approach to privacy, tool based rather than legalese based.
I noticed tools like DNSCrypt increase internet security, so that all my integrated data goes straight to people I am okay with having it (ad sellers, not governments!)
Unfortunately it is Mac only, and I will wait for Windows or X based tools for a better review. I noticed some lag in updating these tools, so I can only guess that the boys of Baltimore have been there, so it is best used by home users alone.
Maybe they can find a Chrome extension for DNS dummies.
http://www.opendns.com/technology/dnscrypt/
Why DNSCrypt is so significant
In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks. It doesn’t require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers. We know that claims alone don’t work in the security world, however, so we’ve opened up the source to our DNSCrypt code base and it’s available on GitHub.
DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user’s online security and privacy.
and
http://dnscurve.org/crypto.html
The DNSCurve project adds link-level public-key protection to DNS packets. This page discusses the cryptographic tools used in DNSCurve.
Elliptic-curve cryptography
DNSCurve uses elliptic-curve cryptography, not RSA.
RSA is somewhat older than elliptic-curve cryptography: RSA was introduced in 1977, while elliptic-curve cryptography was introduced in 1985. However, RSA has shown many more weaknesses than elliptic-curve cryptography. RSA’s effective security level was dramatically reduced by the linear sieve in the late 1970s, by the quadratic sieve and ECM in the 1980s, and by the number-field sieve in the 1990s. For comparison, a few attacks have been developed against some rare elliptic curves having special algebraic structures, and the amount of computer power available to attackers has predictably increased, but typical elliptic curves require just as much computer power to break today as they required twenty years ago.
IEEE P1363 standardized elliptic-curve cryptography in the late 1990s, including a stringent list of security criteria for elliptic curves. NIST used the IEEE P1363 criteria to select fifteen specific elliptic curves at five different security levels. In 2005, NSA issued a new “Suite B” standard, recommending the NIST elliptic curves (at two specific security levels) for all public-key cryptography and withdrawing previous recommendations of RSA.
Some specific types of elliptic-curve cryptography are patented, but DNSCurve does not use any of those types of elliptic-curve cryptography.
Adding / to robots.txt again
So I tried to do without search engines and rely only on social sharing, but for a small blog like mine, almost 75% of traffic comes via search engines.
Maybe the ratio of traffic from search to social will change in the future,
but I now have enough data to conclude that search is the ONLY statistically significant driver of traffic (for a small blog).
If you are a blogger, you should definitely give the tools at Google Webmaster a go,
e.g.
https://www.google.com/webmasters/tools/googlebot-fetch
URL                         Googlebot type   Fetch Status                                         Fetch date
http://decisionstats.com/   Web              Denied by robots.txt                                 1/19/12 8:25 PM
http://decisionstats.com/   Web              Success; URL and linked pages submitted to index     12/27/11 9:55 PM
Also, from Google Analytics, I see that denying search traffic does not increase direct/referral traffic in any meaningful way.
So my hypothesis that some direct traffic was mis-counted as search traffic due to Chrome and toolbar search, well, that hypothesis was wrong.
Also, Google seems to drop URLs quite quickly (within 18 hours), and I will test the rebound in SERPs in a few hours. I was using meta tags, blocking via robots.txt, and removal via webmasters (a combination of the three may have helped).
To my surprise, search traffic declined to 5-10, but it did not become 0. I wonder why that happens (I even got a few Google queries per day) even though I was blocking the “/” in robots.txt.
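For anyone wondering what blocking "/" means in practice, here is a small base R sketch. The helper is_blocked() is hypothetical and deliberately simplified: it treats every Disallow line as applying to all crawlers and does plain prefix matching, ignoring User-agent groups, Allow lines and wildcards, so it is not how Googlebot actually evaluates the file.

```r
# Simplified, hypothetical checker: prefix match against every Disallow rule.
is_blocked <- function(path, robots_lines) {
  rules <- grep("^Disallow:", robots_lines, value = TRUE, ignore.case = TRUE)
  prefixes <- trimws(sub("^Disallow:", "", rules, ignore.case = TRUE))
  prefixes <- prefixes[nzchar(prefixes)]   # an empty Disallow blocks nothing
  any(startsWith(path, prefixes))
}

block_all <- c("User-agent: *", "Disallow: /")   # blocks the whole site
allow_all <- c("User-agent: *", "Disallow:")     # blocks nothing

print(is_blocked("/using-google-analytics-with-r/", block_all))
print(is_blocked("/using-google-analytics-with-r/", allow_all))
```

Since every path on a site starts with "/", a single "Disallow: /" line covers all of it, which is why removing or adding that one character flips the whole blog in or out of the crawl.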
Net net: the numbers below show that, as of now, in a non-SOPA, non-social world, search engines remain the webmaster's only true friend (till they come up with another Panda or whatever update).
Google Apps Terms of Use- Termination
TERMINATION
You may discontinue your use of Google services at any time. You agree that Google may at any time and for any reason, including a period of account inactivity, terminate your access to Google services, terminate the Terms, or suspend or terminate your account. In the event of termination, your account will be disabled and you may not be granted access to Google services, your account or any files or other content contained in your account. Sections 10 (Termination), 13 (Indemnity), 14 (Disclaimer of Warranties), 15 (Limitations of Liability), 16 (Exclusions and Limitations) and 19 (including choice of law, severability and statute of limitations), of the Terms, shall survive expiration or termination.
Source-
http://www.google.com/apps/intl/en/terms/user_terms.html
Related-
https://www.youtube-nocookie.com/v/BcxRfg96dTQ?version=3&hl=en_US&rel=0






