Ruby Study Notes 1: Installing Ruby and Writing a Web Crawler in Ruby
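I finally have some free time, so I downloaded Ruby to play with. Installing Ruby is simple: just download the one-click installer from the official website. For Linux, a quick Google search turns up plenty of installation tutorials. As for an IDE, people online say NetBeans supports Ruby very well, but since I personally prefer Eclipse, I recommend EasyEclipse for Ruby and Rails. Of course, you can also install just the RoR plugin into your existing Eclipse instead of setting up a whole new one.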
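After installing, a quick way to confirm that Ruby runs and that the two standard libraries used by the crawler below (net/http and open-uri) load is a tiny script like the one here. This is a minimal sketch: the file name check_install.rb is hypothetical, and the URI.open check assumes Ruby 2.5 or later.

```ruby
# check_install.rb (hypothetical name): run with `ruby check_install.rb`
require 'net/http'
require 'open-uri'

puts "Ruby #{RUBY_VERSION} (#{RUBY_PLATFORM})"
# Both libraries ship with the standard Ruby distribution.
puts "Net::HTTP available: #{defined?(Net::HTTP) ? 'yes' : 'no'}"
puts "URI.open available:  #{URI.respond_to?(:open) ? 'yes' : 'no'}"
```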
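I used to write crawlers in Java to grab images. Wrapping HttpClient and handling the regular expressions was exhausting, and even after I had written utility classes for it, I would sometimes forget where I had put them. Ruby's libraries handle this far better: a few dozen lines of code take care of everything.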
Methods for fetching a page and downloading a file.
util.rb:
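```ruby
require 'net/http'
require 'open-uri'   # provides URI.open
require 'fileutils'

# Fetch a page and return its body as a String.
def query_url(url)
  Net::HTTP.get(URI.parse(url))
end

# Download url to dir/filename. When filename is nil or empty,
# use everything after the last '/' in the URL.
def save_url(url, dir, filename)
  filename = url[(url.rindex('/') + 1)..-1] if filename.nil? || filename.empty?
  path = filename
  if dir && !dir.empty?
    FileUtils.mkdir_p(dir)              # create the target directory if needed
    path = File.join(dir, filename)
  end
  # Ruby 3.0+ no longer lets plain open(url) fetch URLs; use URI.open.
  URI.open(url) do |fin|
    File.open(path, 'wb') do |fout|
      # Copy in 1 KB chunks; read returns nil at end of input.
      while (buf = fin.read(1024))
        fout.write(buf)
      end
    end
  end
end
```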
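The download loop in save_url relies on read(1024) returning nil once the input is exhausted. The sketch below exercises the same chunked copy and file-name derivation offline, with a StringIO standing in for the downloaded image; copy_in_chunks and filename_from_url are hypothetical helper names, not part of util.rb.

```ruby
require 'stringio'

# Hypothetical helper: copy src to dst in fixed-size chunks, as save_url does.
# read(n) returns nil at end of input, which ends the while loop.
def copy_in_chunks(src, dst, chunk_size = 1024)
  copied = 0
  while (buf = src.read(chunk_size))
    dst.write(buf)
    copied += buf.bytesize
  end
  copied
end

# Hypothetical helper: same rule as save_url, everything after the last '/'.
def filename_from_url(url)
  url[(url.rindex('/') + 1)..-1]
end

src = StringIO.new("x" * 2500)   # stands in for the downloaded image
dst = StringIO.new
puts copy_in_chunks(src, dst)    # 2500 bytes copied in three reads
puts filename_from_url('http://img01.example.com/a/b/photo.jpg')
```

With a 1024-byte chunk, 2500 bytes arrive as reads of 1024, 1024 and 452 bytes, so memory use stays bounded no matter how large the image is.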
The image-scraping application itself:
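```ruby
require_relative 'util'   # Ruby 1.9.2+ no longer puts the current directory on the load path

start_url = 'http://list.mall.taobao.com/1424/g-d-----40-0--1424.htm'
while start_url && !start_url.empty?
  puts "Downloading #{start_url}"
  # Net::HTTP returns the body as binary; this assumes a UTF-8 page
  # (for a GBK page, use .force_encoding('GBK').encode('UTF-8') instead).
  content = query_url(start_url).force_encoding('UTF-8').scrub
  # The link text 下一页 means "next page".
  next_page = content.scan(/ <a href="(.*?)" class="next-page"><span>下一页<\/span><\/a>/)
  next_url = next_page.dig(0, 0)   # nil when there is no next-page link
  # scan returns [[url], [url], ...]; flatten to a plain list of URLs.
  imgs = content.scan(/<img src="(http:\/\/img[\d].*?)" \/>/).flatten
  imgs.each { |url| save_url(url, 'd:/mall', nil) }   # forward slashes also work on Windows
  start_url = next_url
  # break   # uncomment to stop after the first page
end
```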
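To see what String#scan hands back, and how the loop follows next-page links until none remain, the sketch below replays the same two regular expressions against a hypothetical in-memory "site" (the PAGES hash, collect_images, and the example.com URLs are all made up), collecting image URLs instead of downloading them.

```ruby
# Hypothetical in-memory "site": URL => HTML, standing in for query_url
# so the paging logic can be tried without network access.
PAGES = {
  'http://example.com/p1' =>
    '<img src="http://img1.example.com/a.jpg" /> ' \
    ' <a href="http://example.com/p2" class="next-page"><span>下一页</span></a>',
  'http://example.com/p2' =>
    '<img src="http://img2.example.com/b.jpg" /><img src="http://img3.example.com/c.jpg" />'
}

NEXT_RE = / <a href="(.*?)" class="next-page"><span>下一页<\/span><\/a>/
IMG_RE  = /<img src="(http:\/\/img[\d].*?)" \/>/

# Hypothetical helper: walk next-page links from start_url, collecting image URLs.
def collect_images(start_url)
  found = []
  url = start_url
  while url && !url.empty?
    content = PAGES.fetch(url, '')
    # scan with one capture group returns [["match1"], ["match2"], ...]
    found.concat(content.scan(IMG_RE).map(&:first))
    url = content.scan(NEXT_RE).dig(0, 0)   # nil ends the loop
  end
  found
end

p collect_images('http://example.com/p1')
```

Because scan with a single capture group yields one-element arrays, each match has to be unwrapped to get the URL, and the loop ends on the first page that has no next-page link.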
After using Ruby for a day, I find its syntax natural and easy to understand, so it is quick to pick up, and its libraries are well designed. It really is a good fit for tinkering with small programs.
posted on 2008-10-15 10:11 by Anemone, category: Ruby Study