Dust Of Dream

Is knowledge really a circle?

Ruby Study Notes 1: Installing Ruby and Writing a Web Crawler in Ruby

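I finally have some free time, so I downloaded Ruby to play with. Installing it is easy: just download the one-click installer from the official site. For installing on Linux, a quick Google search turns up plenty of tutorials. As for an IDE, people online say NetBeans has excellent Ruby support, but since I prefer Eclipse, I recommend EasyEclipse for Ruby and Rails. Of course, you can also install just the RoR plugin into your existing Eclipse instead of setting up a whole new one.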
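I used to write crawler tools in Java to scrape images, and wrapping HttpClient and wrangling regular expressions was exhausting. Even after I had built utility classes, I would sometimes forget where I had put them. Ruby's libraries wrap all of this very well, and a few dozen lines of code are enough to do everything:
Methods for fetching a page and downloading a file: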

util.rb:

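require 'net/http'
require 'open-uri'
require 'uri'
require 'fileutils'

# Fetch a page and return its body as a String.
def query_url(url)
  Net::HTTP.get(URI.parse(url))
end

# Download url into dir. If filename is nil or empty, use the last segment of
# the URL path (any query string is dropped).
def save_url(url, dir, filename)
  filename = File.basename(URI.parse(url).path) if filename.nil? || filename.empty?
  if dir.nil? || dir.empty?
    path = filename
  else
    FileUtils.mkdir_p(dir)   # create the directory (and any parents) if missing
    path = File.join(dir, filename)
  end
  # Ruby 3.0 removed URL support from plain Kernel#open, so use URI.open from open-uri.
  URI.open(url) do |fin|
    File.open(path, 'wb') do |fout|
      # Copy in 1 KB chunks rather than building one large String.
      while (buf = fin.read(1024))
        fout.write(buf)
      end
    end
  end
end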
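`Net::HTTP.get`, which `query_url` relies on, returns the body of whatever response it receives and does not follow redirects, so a page that answers with a 301 or 302 comes back as an empty or stub body. Below is a minimal sketch of a redirect-following variant that uses only standard `net/http` calls. The method name `query_url_following_redirects` and the limit of 5 hops are illustrative assumptions, not part of the original util.rb:

```ruby
require 'net/http'
require 'uri'

# Hypothetical variant of query_url (not in the original util.rb): follows up to
# `limit` redirects instead of returning the body of a 301/302 response.
def query_url_following_redirects(url, limit = 5)
  raise ArgumentError, "too many redirects (last URL: #{url})" if limit <= 0

  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(uri.to_s, response['location']).to_s
    query_url_following_redirects(next_url, limit - 1)
  else
    response.error!  # raise the matching Net::HTTP exception for 4xx/5xx responses
  end
end
```

It is a drop-in replacement for `query_url` in the crawler, for example `content = query_url_following_redirects(start_url)`.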

Using these helpers to scrape images:

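# encoding: utf-8
require_relative 'util'  # Ruby 1.9.2+: '.' is no longer on the load path, so require "util" fails
require 'uri'

start_url = 'http://list.mall.taobao.com/1424/g-d-----40-0--1424.htm'
while start_url && !start_url.empty?
  puts "Start downloading #{start_url}"
  content = query_url(start_url)
  # If the page is not UTF-8 (e.g. GBK), convert it first, for example
  # content = content.force_encoding('GBK').encode('UTF-8'); otherwise the
  # UTF-8 regexp below can raise Encoding::CompatibilityError.

  # Find the "next page" link; resolve it against the current URL in case it is relative.
  m = content.match(/<a href="(.*?)" class="next-page"><span>下一頁<\/span><\/a>/)
  next_url = m ? URI.join(start_url, m[1].gsub('&amp;', '&')).to_s : nil

  # Save every image served from the img* hosts into d:\mall\
  content.scan(/<img src="(http:\/\/img\d.*?)" \/>/) do |(img_url)|
    save_url(img_url, 'd:\\mall\\', nil)
  end

  start_url = next_url
end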
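The loop above stops only when a page has no "next page" link, so if the site ever links back to a page it has already visited, the crawl never terminates. A minimal sketch that guards against this with a set of visited URLs and a page cap follows. The names `crawl_images`, `max_pages`, and the injected `fetch` callable, along with the loosened regexps (which match on `class="next-page"` alone rather than on the link text), are illustrative assumptions rather than the original script. Passing the fetcher in lets the logic run offline against canned HTML; the example.com pages below are made up:

```ruby
require 'set'
require 'uri'

# Loosened versions of the original patterns (assumption: the link text is not needed).
NEXT_PAGE_RE = /<a href="([^"]*)" class="next-page">/
IMAGE_RE     = %r{<img src="(http://img\d[^"]*)" />}

# Hypothetical sketch: follow "next page" links from start_url and collect image
# URLs, never fetching the same page twice and stopping after max_pages pages.
# `fetch` is any callable mapping a URL to its HTML, e.g. method(:query_url).
def crawl_images(start_url, fetch, max_pages: 50)
  visited = Set.new
  images  = []
  url = start_url
  while url && !visited.include?(url) && visited.size < max_pages
    visited << url
    html = fetch.call(url)
    images.concat(html.scan(IMAGE_RE).flatten)
    m = html.match(NEXT_PAGE_RE)
    # Resolve relative links and undo the HTML escaping of '&' in href values.
    url = m && URI.join(url, m[1].gsub('&amp;', '&')).to_s
  end
  images
end

# Usage with a Hash standing in for the network; p2 links back to p1.
pages = {
  'http://example.com/p1' =>
    '<img src="http://img1.example.com/a.jpg" /><a href="/p2" class="next-page">2</a>',
  'http://example.com/p2' =>
    '<img src="http://img2.example.com/b.jpg" /><a href="/p1" class="next-page">1</a>'
}
p crawl_images('http://example.com/p1', pages.method(:fetch))
# → ["http://img1.example.com/a.jpg", "http://img2.example.com/b.jpg"]
```

In the real crawler you would pass `method(:query_url)` as the fetcher and hand each returned URL to `save_url`.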

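After a day with Ruby, I find its syntax natural and easy to understand, the language quick to pick up, and its libraries well packaged. It is a good fit for small programs written for fun.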

Posted on 2008-10-15 10:11 by Anemone. Category: Ruby study notes.

