
Using Google Docs for Web Scraping



While trying to scrape some data from a website, I chanced upon the importXML function in Google Docs spreadsheets. It is pretty neat: it lets you import the XML or HTML of a webpage and then pull out the data you want with an XPath query.


Here is an example:


Using the importXML function, I parsed all the links in the Google search results for “analytics consultant in India”.

The importXML function takes two arguments (from the support page):



  • URL – the URL of the XML or HTML file
  • query – the XPath query to run on the data given at the URL. For example, "//a/@href" returns a list of the href attributes of all <a> tags in the document (i.e. all of the URLs the document links to). For more information about XPath, please visit http://www.w3schools.com/xpath/
  • Example: =importXml("www.google.com", "//a/@href"). This returns all of the href attributes (the link URLs) in all the <a> tags on the www.google.com home page
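
To make the "//a/@href" query more concrete, here is a rough Python sketch of the same idea: collect the href attribute of every <a> tag in a document. It is not what the spreadsheet function does internally, just an illustration. The sample markup and the extract_hrefs name are made up for this example, and it parses an inline, well-formed snippet instead of fetching a live page. Python's built-in ElementTree only supports a subset of XPath, so the sketch selects the <a> elements that have an href and then reads the attribute in code.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample document, used instead of fetching a live URL.
# It must be well-formed XML for ElementTree to parse it.
SAMPLE_XHTML = """<html>
  <body>
    <a href="http://example.com/one">One</a>
    <p>Some text <a href="http://example.com/two">Two</a></p>
    <a name="anchor-without-href">No link</a>
  </body>
</html>"""


def extract_hrefs(markup: str) -> List[str]:
    """Return the href of every <a> element that has one, in document order."""
    root = ET.fromstring(markup)
    # ".//a[@href]" matches <a> elements at any depth that carry an href
    # attribute; ElementTree cannot select the attribute itself with "/@href".
    return [a.get("href") for a in root.findall(".//a[@href]")]


if __name__ == "__main__":
    for href in extract_hrefs(SAMPLE_XHTML):
        print(href)
```

Real web pages are often not well-formed XML, so a live page would usually need a more forgiving HTML parser than the one used here.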


You can see it here:


or using the embed function:

