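The page I want to scrape:
http://stoptb.org/countries/tbteam/searchExperts.asp
requires parameters to be submitted on this page:
http://stoptb.org/countries/tbteam/experts.asp
in order to get the data. Since the parameters are not embedded in the URL, I don't know how to pass them from R. Is there a way to do this in R?
(By the way, I know next to nothing about ASP, so maybe that is the piece I'm missing.)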
Solution
You can use the RHTMLForms package.
You may need to install it first:
# install.packages("RHTMLForms",repos = "http://www.omegahat.org/R")
or, on Windows, you may need to install it from source:
# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")

require(RHTMLForms)
require(RCurl)
require(XML)
forms = getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
fun = createFunction(forms$sExperts)
# find experts with expertise in "Infection control: Engineering Consultant"
results <- fun(Expertise = "Infection control: Engineering Consultant")
tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
readHTMLTable(tableData[[1]])
#                             V1                   V2                     V3
#1                                               <NA>                   <NA>
#2                Name of Expert Country of Residence                  Email
#3               Girmay,Desalegn             Ethiopia    deskebede@yahoo.com
#4            IVANCHENKO,VARVARA              Estonia v.ivanchenko81@mail.ru
#5                   JAUCOT,Alex              Belgium  alex.jaucot@gmail.com
#6 Mulder,Hans Johannes Henricus              Namibia        hmulder@iway.na
#7                    Walls,Neil            Australia        neil@nwalls.com
#8                 Zuccotti,Thea                Italy     thea_zuc@yahoo.com
#                  V4
#1               <NA>
#2 Number of Missions
#3                  0
#4                  3
#5                  0
#6                  0
#7                  0
#8                  1
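In the output above, the real column headers arrive as data (row 2) and row 1 is an empty spacer. A minimal sketch of one way to tidy such a result in base R follows. The helper name `tidyExperts` and the placeholder rows in `raw` are assumptions made for illustration; they are not part of RHTMLForms or XML, and `raw` merely imitates the shape of the data frame printed above.

```r
# Hypothetical stand-in for the data frame returned by readHTMLTable():
# row 1 is an empty spacer row, row 2 holds the real column headers.
raw <- data.frame(
  V1 = c("", "Name of Expert", "Expert,One", "Expert,Two"),
  V2 = c(NA, "Country of Residence", "Country A", "Country B"),
  V3 = c(NA, "Email", "one@example.org", "two@example.org"),
  V4 = c(NA, "Number of Missions", "0", "3"),
  stringsAsFactors = FALSE
)

# tidyExperts is a hypothetical helper, not a function from any package.
tidyExperts <- function(raw) {
  # Work on character columns even if the table was read as factors.
  raw[] <- lapply(raw, as.character)
  # Locate the header row by its first cell instead of hard-coding its position.
  headerRow <- which(raw[[1]] == "Name of Expert")[1]
  stopifnot(!is.na(headerRow))
  # Keep only the rows below the header and promote the header to column names.
  tidy <- raw[-seq_len(headerRow), , drop = FALSE]
  names(tidy) <- make.names(unname(unlist(raw[headerRow, ])))
  rownames(tidy) <- NULL
  # Mission counts arrive as text; convert them to integers.
  tidy$Number.of.Missions <- as.integer(tidy$Number.of.Missions)
  tidy
}

experts <- tidyExperts(raw)
str(experts)
```

With the real result, the call would presumably be `tidyExperts(readHTMLTable(tableData[[1]]))`, provided the page still uses "Name of Expert" as its first header cell.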
Or create a reader so that the generated function returns a table directly:
returnTable <- function(results) {
  tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
  readHTMLTable(tableData[[1]])
}
fun = createFunction(forms$sExperts, reader = returnTable)
fun(CBased = "Bhutan")  # find experts based in Bhutan
#                V1                   V2                      V3
#1                                  <NA>                    <NA>
#2   Name of Expert Country of Residence                   Email
#3 Wangchuk,Lungten               Bhutan drlungten@health.gov.bt
#                  V4
#1               <NA>
#2 Number of Missions
#3                  2
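One thing to watch with a reader like this: if a search returns a page without a matching table, `tableData[[1]]` fails with a subscript error. The sketch below shows a more defensive variant that can be tried offline. The name `safeReturnTable` and both inline HTML strings are assumptions invented for illustration; they only imitate a results page with a `table` of class `data` and are not taken from the actual site.

```r
library(XML)

# safeReturnTable is a hypothetical variant of returnTable that returns NULL
# instead of failing when the page contains no table with class "data".
safeReturnTable <- function(results) {
  doc <- htmlParse(results, asText = TRUE)
  tableData <- getNodeSet(doc, "//table[@class = 'data']")
  if (length(tableData) == 0) {
    return(NULL)
  }
  readHTMLTable(tableData[[1]], stringsAsFactors = FALSE)
}

# Made-up pages standing in for the server's response.
withTable <- paste0(
  "<html><body><table class='data'>",
  "<tr><td>Name of Expert</td><td>Number of Missions</td></tr>",
  "<tr><td>Expert,One</td><td>2</td></tr>",
  "</table></body></html>"
)
withoutTable <- "<html><body><p>No experts found.</p></body></html>"

hit <- safeReturnTable(withTable)      # a data frame
miss <- safeReturnTable(withoutTable)  # NULL, no error
```

Such a function could then be passed as `reader = safeReturnTable` to `createFunction`, in the same way as `returnTable` above.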