Tag Archives: search engine
So okay, I was a bit harsh here. I apologize.
Anonymous India is holding a protest (on a Saturday, which is mostly a day off in the tech industry in India).
The Indian government, judiciary, and private ISPs are blocking parts of the Internet, particularly Pirate Bay (and other sites).
You can access these sites by going through Amazon AWS.
You can read how to configure an Amazon AWS Windows Instance for free here. Free as in free speech, but not as in free beer.
You can attach your Amazon Windows Instance to your local drive here. See screenshot
Basically you edit the RDP file (Remote Desktop File) as below.
You can then download the torrent from Amazon to your local drive like this.
and you can now click on torrent to start your downloads like this.
Note: Pirate Bay is just a search engine for torrents; the underlying data is peer to peer.
You should change your default Windows password, which you got here.
The noted Diamonds dataset in the ggplot2 package of R is actually culled from the website http://www.diamondse.info/diamond-prices.asp
However, it has ~55000 diamonds, while the whole Diamonds search engine has almost ten times that number. Using iMacros, a Google Chrome plugin, we can scrape that data (or almost any data). The iMacros Chrome plugin is available at https://chrome.google.com/webstore/detail/cplklnmnlbnpmjogncfgfijoopmnlemp while notes on coding are at http://wiki.imacros.net
iMacros makes coding as easy as recording a macro: the code is automatically generated for whatever actions you perform. You can set parameters to extract only specific parts of the website, and the code can be run in a loop (of up to 9999 iterations!)
Here is the iMacros code. Note that you need to navigate to the website http://www.diamondse.info/diamond-prices.asp before running it.
VERSION BUILD=5100505 RECORDER=CR
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
TAG POS=6 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:paginate_enabled_next
SAVEAS TYPE=EXTRACT FOLDER=* FILE=test+3
And voila, all the diamonds you need to analyze!
The resulting data can be read in SAS or R using standard delimited-file data munging.
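As a rough illustration of that munging step, here is a minimal sketch in Python (used here in place of SAS or R). It assumes the saved extract is comma-delimited text with quoted fields; the actual delimiter and columns of the file iMacros writes may differ, and the sample rows and the helper name `read_extract` are made up purely for illustration.

```python
import csv
import io


def read_extract(text, delimiter=","):
    """Parse delimited text into a list of rows, each a list of stripped strings."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip completely empty lines and trim stray whitespace around each cell.
    return [[cell.strip() for cell in row] for row in reader if row]


# Hypothetical sample, NOT real scraped data: it only shows the expected shape.
sample = '"carat","cut","price"\n"0.30","Ideal","500"\n'

rows = read_extract(sample)
header, records = rows[0], rows[1:]
```

If the extract turns out to use another separator, passing a different `delimiter` should be the only change needed.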
More on iMacros from
Automate your web browser. Record and replay repetitious work
If you encounter any problems with iMacros for Chrome, please let us know in our Chrome user forum at http://forum.iopus.com/viewforum.php?f=21 Our forum is also the best place for new feature suggestions :-) ---- iMacros was designed to automate the most repetitious tasks on the web. If there’s an activity you have to do repeatedly, just record it in iMacros. The next time you need to do it, the entire macro will run at the click of a button! With iMacros, you can quickly and easily fill out web forms, remember passwords, create a webmail notifier, and more. You can keep the macros on your computer for your own use, use them within bookmark sync / Xmarks or share them with others by embedding them on your homepage, blog, company Intranet or any social bookmarking service as bookmarklet. The uses are limited only by your imagination! Popular uses are as web macro recorder, form filler on steroids and highly-secure password manager (256-bit AES encryption).
So I tried to go without search engines and rely only on social sharing, but for a small blog like mine almost 75% of traffic comes via search engines.
Maybe the ratio of traffic from search to social will change in the future.
I now have enough data to conclude that search is the ONLY statistically significant driver of traffic (for a small blog).
If you are a blogger, you should definitely give the tools at Google Webmaster a go.
URL | Googlebot type | Fetch Status | Fetch date
http://decisionstats.com/ | Web | Denied by robots.txt | 1/19/12 8:25 PM
http://decisionstats.com/ | Web | Success (URL and linked pages submitted to index) | 12/27/11 9:55 PM
Also, from Google Analytics, I see that denying search traffic does not increase direct or referral traffic in any meaningful way.
So my hypothesis that some direct traffic was miscounted as search traffic, due to Chrome and toolbar search, was wrong.
Also, Google seems to drop URLs quite quickly (within 18 hours), and I will test the rebound in SERPs in a few hours. I was using meta tags, blocking via robots.txt, and removal via Webmaster Tools (a combination of the three may have helped).
To my surprise, search traffic declined to 5-10, but it did not become 0. I wonder why that happens (I even got a few Google queries per day), given that I was blocking “/” in robots.txt.
Net net: the numbers below show that, as of now, in a non-SOPA, non-social world, search engines remain the webmaster's only true friend (till they come up with another Panda or whatever update).
I just used the really handy tools at Google Webmaster Tools, clicked Remove URL, and submitted http://www.decisionstats.com
and I also modified my robots.txt file to
Just to make sure, I added the meta tag to the right margin of my blog:
<meta name="robots" content="noindex">
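To sanity-check a robots.txt change like this before waiting on crawlers, something along these lines may help. It is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents below are an assumption (a blanket block of the root path for all crawlers), since the exact file is not reproduced here, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: block every crawler from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Under the assumed rules, Googlebot should be denied the home page.
blocked = not is_allowed(ROBOTS_TXT, "Googlebot", "http://www.decisionstats.com/")
```

Note that robots.txt only asks crawlers not to fetch pages; the noindex meta tag and the URL removal request are separate mechanisms, which is why all three were used together.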
Now, for the last six months of 2011, as per Analytics, search engines were really generous to me, giving almost 170K page views:
Source Visits Pages/Visit
1. google 58,788 2.14
2. (direct) 10,832 2.24
3. linkedin.com 2,038 2.50
4. google.com 1,823 2.15
5. bing 1,007 2.04
6. reddit.com 749 1.93
7. yahoo 740 2.25
8. google.co.in 576 2.13
9. search 572 2.07
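To put a number on the search share in this table, here is a small Python sketch. Which rows count as "search" is my assumption: google, bing, yahoo and search are treated as organic search, while google.com and google.co.in are treated as referrals. The total covers only the nine sources listed, not all traffic.

```python
# Visits per source, copied from the table above.
visits = {
    "google": 58788,
    "(direct)": 10832,
    "linkedin.com": 2038,
    "google.com": 1823,
    "bing": 1007,
    "reddit.com": 749,
    "yahoo": 740,
    "google.co.in": 576,
    "search": 572,
}

# Assumption: these four rows are organic search; everything else is not.
SEARCH_SOURCES = {"google", "bing", "yahoo", "search"}

total = sum(visits.values())
search_visits = sum(n for source, n in visits.items() if source in SEARCH_SOURCES)
search_share = search_visits / total

print(f"Search share of listed visits: {search_share:.1%}")
```

Under that classification, search accounts for roughly 79% of the visits from these nine sources; counting google.com and google.co.in as search would push the figure higher.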
I do like to experiment though, and I wonder if search engines just:
1) Make people too lazy to bookmark or type the whole website name in Chrome/Opera toolbars
2) Help disguise sources of traffic through encrypted search terms
3) Help disguise corporate traffic watchers and aggregators
So I am giving all spiders leave for Q1 2012. I am interested in seeing the impact of this on my traffic, and I suspect that the curves will not be as linear as I think.
Is search engine optimization overrated? Let the data decide….
I am also interested in seeing how social sharing can impact traffic in the absence of search engine interaction effects, and whether it is possible to retain a bigger chunk of traffic by reducing SEO efforts and increasing social efforts!