Perl has the LWP module:
The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW clients. The library also contain modules that are of more general use and even classes that help you implement simple HTTP servers.
Is there a similar module (gem) for Ruby?
Update
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::TreeBuilder 3;
    use HTML::TokeParser;

    sub get_gallery_urls {
        my $url = shift;

        my $ua = LWP::UserAgent->new;
        $ua->agent("$0/0.1 " . $ua->agent);
        $ua->agent("Mozilla/8.0");    # overrides the agent string set above

        my $req = HTTP::Request->new(GET => $url);
        $req->header('Accept' => 'text/html');

        # send request
        my $response_u = $ua->request($req);
        die "Error: ", $response_u->status_line unless $response_u->is_success;

        # parse the page and locate the element(s) with id="thumbnails"
        my $root = HTML::TreeBuilder->new;
        $root->parse($response_u->content);
        $root->eof;
        my @gu = $root->find_by_attribute("id", "thumbnails");

        # collect every root-relative link; hash keys keep the URLs unique
        my %urls = ();
        foreach my $g (@gu) {
            my @as = $g->find_by_tag_name('a');
            foreach my $link (@as) {
                my $u = $link->attr("href");
                if (defined $u && $u =~ /^\//) {
                    $urls{"http://example.com" . $u} = 1;
                }
            }
        }

        return %urls;
    }
Solution
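The closest match is probably httpclient, which aims to be the equivalent of LWP. Depending on what you plan to do, though, there may be better options. If you intend to follow links, fill in forms and so on in order to scrape web content, you can use Mechanize, which is similar to the Perl module of the same name. There are also more Ruby-specific gems, such as the excellent Rest-client and HTTParty (my personal favorite). For a larger list, see the HTTP Clients category of Ruby Toolbox.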
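For comparison, Ruby's standard library also ships net/http, which is lower-level than the gems above but needs no installation. The following is only a minimal sketch of the request half of the Perl code in the question, under a few assumptions: the helper names `build_request` and `fetch` are made up for illustration, and the header values are copied from the question.

```ruby
require 'net/http'
require 'uri'

# Build a GET request carrying the same headers as the Perl code in the
# question ('Mozilla/8.0' and 'text/html' are taken from that code).
# build_request is a hypothetical helper name, not part of any library.
def build_request(url)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = 'Mozilla/8.0'
  request['Accept'] = 'text/html'
  request
end

# Send the request and return the response body. Raises on any non-2xx
# response, roughly like `die ... unless $response->is_success` in LWP.
# fetch is likewise a hypothetical helper name.
def fetch(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(url))
  end
  unless response.is_a?(Net::HTTPSuccess)
    raise "Error: #{response.code} #{response.message}"
  end
  response.body
end
```

Calling `fetch('http://example.com/')` would return the page's HTML as a string. Note that this sketch does not follow redirects, so a 3xx response raises; that is one reason a higher-level gem is usually the more comfortable choice.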
Update: here's an example of how to find all the links on a page with Mechanize (this is Ruby, but it would be similar in Perl):
    require 'rubygems'
    require 'mechanize'

    agent = Mechanize.new
    page = agent.get('http://example.com/')

    page.links.each do |link|
      puts link.text
    end
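To get closer to what the Perl function in the question returns, the hrefs still need to be filtered and made absolute. Here is a rough sketch of that step using only the standard library. The helper name `absolute_gallery_urls` is hypothetical, and the input is a plain array of strings; with Mechanize, such an array could come from something like `page.links.map(&:href)`. Unlike the Perl code, this does not limit itself to links inside the element with id "thumbnails".

```ruby
require 'uri'

# Keep only root-relative hrefs (those starting with "/"), resolve them
# against base_url, and drop duplicates. This mirrors the %urls hash in
# the Perl code, whose keys are unique absolute URLs.
# absolute_gallery_urls is a hypothetical helper name.
def absolute_gallery_urls(hrefs, base_url)
  hrefs.compact                                     # links without an href yield nil
       .select { |href| href.start_with?('/') }
       .map { |href| URI.join(base_url, href).to_s }
       .uniq
end

# Sample input, made up for illustration.
hrefs = ['/gallery/1', 'http://other.example/x', nil, '/gallery/1', '/gallery/2']
puts absolute_gallery_urls(hrefs, 'http://example.com')
```

Returning an array instead of a hash is a design choice here: `uniq` gives the same de-duplication that the hash keys provide in Perl.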
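P.S. As a former Perler myself, I used to worry about abandoning the excellent CPAN: would I paint myself into a corner with Ruby? Would I be unable to find an equivalent to a module I rely on? It has turned out not to be a problem at all. In fact, lately it has been quite the opposite: Ruby (along with Python) tends to be the first to get client support for new platforms, web services and so on.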