Using Google Docs for Web Scraping

While trying to scrape some data from a website, I chanced upon the importXML function, which is pretty neat: it lets you import the XML or HTML of a webpage into a spreadsheet and then pull out the data you want with an XPath query.


Here is an example:


Using the importXML function, I extracted all the links from the Google search results for “analytics consultant in India”.

The importXML function works as follows (from the support page):



  • URL – the URL of the XML or HTML file
  • query – the XPath query to run on the data given at the URL. For example, "//a/@href" returns a list of the href attributes of all <a> tags in the document (i.e. all of the URLs the document links to). For more information about XPath, please visit
  • Example: =importXml("", "//a/@href"). This returns all of the href attributes (the link URLs) in all the <a> tags on home page
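
To make the XPath idea concrete outside a spreadsheet, here is a minimal Python sketch of what a query like "//a/@href" does. It is not how importXML is implemented. The sample markup, the example.com URLs, and the extract_links name are all made up for illustration. It also assumes well-formed markup, since Python's built-in ElementTree parser rejects the sloppy HTML that many real pages contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a fetched page.
SAMPLE_PAGE = """<html>
  <body>
    <a href="http://example.com/one">One</a>
    <p>Some text <a href="http://example.com/two">Two</a></p>
    <a name="no-link">Anchor without href</a>
  </body>
</html>"""


def extract_links(markup):
    """Return the href value of every <a> element that has one."""
    root = ET.fromstring(markup)
    # ElementTree supports only a subset of XPath and cannot select
    # attributes directly, so ".//a[@href]" finds the elements and
    # .get("href") reads the attribute, mirroring "//a/@href".
    return [a.get("href") for a in root.findall(".//a[@href]")]


if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE):
        print(link)
```

The third anchor in the sample has no href, so it is skipped, just as "//a/@href" would return nothing for it.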


You can see it here:

or using the Embed function:


