New Zealand just made it to their first ever World Cup final (yes, it is cricket), and they made it with a thrilling six (like a home run) in the final over. Congrats to New Zealand. Of course, R was created in New Zealand too, and Hadley Wickham is from New Zealand.
I recently installed the rvest package from https://github.com/hadley/rvest, and it's now on CRAN as well.
rvest helps you scrape information from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and it is inspired by libraries like Beautiful Soup.
    library(rvest)
    lego_movie <- html("http://www.imdb.com/title/tt1490017/")

    rating <- lego_movie %>%
      html_nodes("strong span") %>%
      html_text() %>%
      as.numeric()
    rating
    #> [1] 7.9

    cast <- lego_movie %>%
      html_nodes("#titleCast .itemprop span") %>%
      html_text()
    cast
    #>  [1] "Will Arnett"     "Elizabeth Banks" "Craig Berry"
    #>  [4] "Alison Brie"     "David Burrows"   "Anthony Daniels"
    #>  [7] "Charlie Day"     "Amanda Farinos"  "Keith Ferguson"
    #> [10] "Will Ferrell"    "Will Forte"      "Dave Franco"
    #> [13] "Morgan Freeman"  "Todd Hansen"     "Jonah Hill"

    poster <- lego_movie %>%
      html_nodes("#img_primary img") %>%
      html_attr("src")
    poster
    #> [1] "http://ia.media-imdb.com/images/M/MV5BMTg4MDk1ODExN15BMl5BanBnXkFtZTgwNzIyNjg3MDE@._V1_SX214_AL_.jpg"
The most important functions in rvest are:
- Create an html document from a url, a file on disk or a string containing html with html().
- Select parts of a document using css selectors: html_nodes(doc, "table td") (or if you're a glutton for punishment, use xpath selectors with html_nodes(doc, xpath = "//table//td")). If you haven't heard of selectorgadget, make sure to read vignette("selectorgadget") to learn about it.
- Extract components with html_tag() (the name of the tag), html_text() (all text inside the tag), html_attr() (contents of a single attribute) and html_attrs() (all attributes).
- (You can also use rvest with XML files: parse with xml(), then extract components using xml_node(), xml_attr(), xml_attrs(), xml_text() and xml_tag().)
- Parse tables into data frames with html_table().
- Extract, modify and submit forms with html_form(), set_values() and submit_form().
- Detect and repair encoding problems with guess_encoding() and repair_encoding().
- Navigate around a website as if you're in a browser with html_session(), jump_to(), follow_link(), back(), forward(), submit_form() and so on. (This is still a work in progress, so I'd love your feedback.)
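As a rough illustration of the select-then-extract pattern, here is a minimal sketch that runs without a network connection. It assumes a later rvest release, where read_html() is used in place of html(); the HTML snippet, the "cast" id and the "actor" class are made up for this example and are not taken from any real page.

```r
library(rvest)

# Hypothetical page, supplied as a string so nothing is fetched from the web.
# Assumption: read_html() is available (it replaced html() in later rvest releases).
page <- read_html('<html><body>
  <div id="cast">
    <a class="actor" href="/name/a1">Will Arnett</a>
    <a class="actor" href="/name/a2">Elizabeth Banks</a>
  </div>
</body></html>')

# CSS selector: every <a class="actor"> inside the element with id "cast".
actors <- page %>% html_nodes("#cast a.actor")

# The same nodes selected with an XPath expression instead.
actors_xpath <- page %>%
  html_nodes(xpath = "//div[@id='cast']/a[@class='actor']")

# Extract components from the selected nodes.
actor_names <- actors %>% html_text()        # text inside each tag
actor_links <- actors %>% html_attr("href")  # value of a single attribute
```

Swapping the string for a url should give the same pipeline against a live page, although the selectors would of course need to match that page's markup.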
While Hadley Wickham seems busy with reading Excel files (see https://github.com/hadley/readxl), maybe rvest can help with more sports analysis now!
Meanwhile, I am searching for an equivalent of readHTMLTable.
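One candidate may be rvest's own html_table(), which parses an HTML table into a data frame. The sketch below is only an illustration under assumptions: it uses read_html() from a later rvest release rather than html(), and the table, its "scores" id and its values are placeholders invented for the example.

```r
library(rvest)

# Hypothetical table with placeholder values, parsed from a string.
# Assumption: read_html() is available (later rvest releases).
page <- read_html('<table id="scores">
  <tr><th>team</th><th>runs</th></tr>
  <tr><td>Team A</td><td>10</td></tr>
  <tr><td>Team B</td><td>20</td></tr>
</table>')

# Pick the single table node by id, then convert it to a data frame.
scores <- page %>%
  html_node("#scores") %>%
  html_table()
```

The header row becomes the column names, so scores should have the columns team and runs with one row per team.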