New Zealand just made it to their first-ever World Cup final (yes, it is cricket), and they made it with a thrilling six (like a home run) off the last ball. Congrats to New Zealand. Of course, R was created in New Zealand too, and Hadley Wickham is from New Zealand.
I recently installed the rvest package from https://github.com/hadley/rvest, and it's now on CRAN as well.
library(rvest)
lego_movie <- html("http://www.imdb.com/title/tt1490017/")

rating <- lego_movie %>%
  html_nodes("strong span") %>%
  html_text() %>%
  as.numeric()
rating
#>  7.9

cast <- lego_movie %>%
  html_nodes("#titleCast .itemprop span") %>%
  html_text()
cast
#>  "Will Arnett" "Elizabeth Banks" "Craig Berry"
#>  "Alison Brie" "David Burrows" "Anthony Daniels"
#>  "Charlie Day" "Amanda Farinos" "Keith Ferguson"
#>  "Will Ferrell" "Will Forte" "Dave Franco"
#>  "Morgan Freeman" "Todd Hansen" "Jonah Hill"

poster <- lego_movie %>%
  html_nodes("#img_primary img") %>%
  html_attr("src")
poster
#>  "http://ia.media-imdb.com/images/M/MV5BMTg4MDk1ODExN15BMl5BanBnXkFtZTgwNzIyNjg3MDE@._V1_SX214_AL_.jpg"
The most important functions in rvest are:
- Create an html document from a url, a file on disk or a string containing html with
- Select parts of a document using css selectors:
html_nodes(doc, "table td") (or if you’re a glutton for punishment, use XPath selectors with
html_nodes(doc, xpath = "//table//td")). If you haven’t heard of selectorgadget, make sure to read
vignette("selectorgadget") to learn about it.
- Extract components with
html_tag() (the name of the tag),
html_text() (all text inside the tag),
html_attr() (contents of a single attribute) and
- (You can also use rvest with XML files: parse with
xml(), then extract components using
- Parse tables into data frames with
- Extract, modify and submit forms with
- Detect and repair encoding problems with
- Navigate around a website as if you’re in a browser with
submit_form() and so on. (This is still a work in progress, so I’d love your feedback.)
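To see the select-and-extract steps from the list above without hitting a live site, here is a small sketch on an inline HTML string. It assumes a recent rvest, where read_html() takes the place of the html() call used earlier in this post. The snippet, the team names and the links are all made up for illustration.

```r
library(rvest)

# Hypothetical inline HTML, so the example needs no network access.
page <- read_html('
<html><body>
  <a class="team" href="/team-a">Team A</a>
  <a class="team" href="/team-b">Team B</a>
  <table>
    <tr><td>x</td><td>y</td></tr>
  </table>
</body></html>')

# Select nodes with a CSS selector ...
teams <- html_nodes(page, "a.team")

# ... then extract their text and a single attribute.
team_names <- html_text(teams)
team_links <- html_attr(teams, "href")

# The same kind of selection with an XPath expression instead of CSS.
cells <- html_nodes(page, xpath = "//table//td")
cell_text <- html_text(cells)
```

The CSS and XPath forms go through the same html_nodes() call, so you can mix them in one script and pick whichever is easier to read for a given page.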
While Hadley Wickham seems busy with reading Excel files (see https://github.com/hadley/readxl), maybe rvest can help with more sports analysis now!
Meanwhile, I am searching for an equivalent of readHTMLTable.
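One candidate seems to be rvest's own html_table(), which turns table nodes into data frames. Below is a minimal sketch, again assuming a recent rvest with read_html(); the two tables, the player names and the numbers are invented, and I have not compared the result against readHTMLTable on real pages.

```r
library(rvest)

# Hypothetical page fragment with two small tables (made-up data).
doc <- read_html('
<table id="batting">
  <tr><th>Player</th><th>Runs</th></tr>
  <tr><td>Player A</td><td>10</td></tr>
  <tr><td>Player B</td><td>20</td></tr>
</table>
<table id="bowling">
  <tr><th>Player</th><th>Wickets</th></tr>
  <tr><td>Player C</td><td>3</td></tr>
</table>')

# Passing every <table> node gives a list with one data frame per table.
tables <- html_table(html_nodes(doc, "table"))

# A single table can be picked out of the list, or selected directly by id.
batting <- tables[[1]]
bowling <- html_table(html_node(doc, "#bowling"))
```

Selecting by id (or any other CSS selector) before calling html_table() is handy when a page has many tables and only one is needed.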