
Parsing RSS in Rails

ruby 1.8.7 + rails 2.1.0

Open http://www.google.cn/finance?q=600001 and you will see a news panel on the right-hand side of Google Finance. The headlines in that panel are fetched from other sites.
[Screenshot: the Google Finance news panel]

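Let's build something similar. There are two ways to parse RSS in Rails: use a ready-made library (the RSS library that ships with Ruby, or third-party ones such as feedtools or rubyrss), or parse the XML yourself.
Using the library: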
    require 'rss/2.0'
    require 'open-uri'
    url = "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=b&output=rss"
    @feed = RSS::Parser.parse(open(url).read, false)
    @feed.items.each do |item|
      puts item.title
      puts item.link
      puts item.description
    end

Parsing the XML yourself:
Create an RssParser class under lib so that it can be called from anywhere.
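require 'net/http'
require 'uri'
require 'rexml/document'

class RssParser
  # Fetch the feed at url and return a hash holding the channel title,
  # the channel link, the feed URL and an array of item hashes.
  def self.run(url)
    xml = REXML::Document.new(Net::HTTP.get(URI.parse(url)))
    data = {
      :title    => xml.root.elements['channel/title'].text,
      :home_url => xml.root.elements['channel/link'].text,
      :rss_url  => url,
      :items    => []
    }
    xml.elements.each('//item') do |item|
      new_item = {}
      item.elements.each do |e|
        # drop a leading "dc:" prefix; '\1' must be single-quoted to work as a backreference
        new_item[e.name.gsub(/^dc:(\w)/, '\1').to_sym] = e.text
      end
      data[:items] << new_item
    end
    data
  end
end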
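
To see what kind of hash this approach produces without touching the network, here is a minimal sketch that applies the same REXML calls to an inline feed. The sample XML and the `parse_feed` helper name are made up for illustration, and it assumes the `rexml` library is installed (on recent Ruby versions it ships as a bundled gem).

```ruby
require 'rexml/document'

# Hypothetical sample feed, used in place of a live URL.
sample_rss = <<-XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.org/</link>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>
XML

# Illustrative helper (not from the original post): the same idea as
# RssParser.run, except that it receives the XML text directly.
def parse_feed(xml_text)
  xml = REXML::Document.new(xml_text)
  data = {
    :title    => xml.root.elements['channel/title'].text,
    :home_url => xml.root.elements['channel/link'].text,
    :items    => []
  }
  # Turn every child element of each <item> into a symbol-keyed entry.
  xml.elements.each('//item') do |item|
    entry = {}
    item.elements.each { |e| entry[e.name.to_sym] = e.text }
    data[:items] << entry
  end
  data
end

feed = parse_feed(sample_rss)
puts feed[:title]
feed[:items].each { |i| puts "#{i[:title]} -> #{i[:link]}" }
```

Each item ends up as a plain hash keyed by element name, which is why the action code can read `item[:title]` and `item[:pubDate]` directly.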

Using it in an action:
def test
  feed = RssParser.run("http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=b&output=rss")
  # take the first three items of the feed
  @feeds = feed[:items][0, 3]
  # parse the pubDate strings into DateTime objects
  @feeds.each {|x| x[:pubDate] = DateTime.parse(x[:pubDate].to_s)}
  # sort by pubDate, oldest first
  @feeds.sort! {|a,b| a[:pubDate] <=> b[:pubDate]}
  # reverse the array so that the newest item comes first
  @feeds.reverse!
  @feeds.each do |item|
    puts item[:title]
    puts item[:link]
    puts item[:pubDate]
  end
end
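
The date handling in this action can be tried on its own. The sketch below uses hand-written item hashes (the titles and dates are invented) and swaps the `sort!` plus `reverse!` pair for `sort_by` followed by `reverse`, which gives the same newest-first order.

```ruby
require 'date'

# Made-up items shaped like the hashes RssParser.run returns.
items = [
  { :title => 'Older',  :pubDate => 'Sat, 13 Dec 2003 18:30:02 GMT' },
  { :title => 'Newest', :pubDate => 'Mon, 15 Dec 2003 09:00:00 GMT' },
  { :title => 'Middle', :pubDate => 'Sun, 14 Dec 2003 12:00:00 GMT' }
]

# Parse each pubDate string into a DateTime, then order newest first.
sorted = items.map { |i| i.merge(:pubDate => DateTime.parse(i[:pubDate])) }
sorted = sorted.sort_by { |i| i[:pubDate] }.reverse

sorted.each { |i| puts "#{i[:pubDate].strftime('%Y-%m-%d %H:%M')}  #{i[:title]}" }
```

Because `merge` returns new hashes, the original `items` array is left untouched, which is handy if the raw strings are still needed elsewhere.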

So where do title, link and description come from? They are elements of the RSS 2.0 XML structure, which typically looks like this:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <description>Insert witty or insightful remark here</description>
    <link>http://example.org/</link>
    <lastBuildDate>Sat, 13 Dec 2003 18:30:02 GMT</lastBuildDate>
    <managingEditor>johndoe@example.com (John Doe)</managingEditor>
    <item>
      <title>Atom-Powered Robots Run Amok</title>
      <link>http://example.org/2003/12/13/atom03</link>
      <guid isPermaLink="false">urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</guid>
      <pubDate>Sat, 13 Dec 2003 18:30:02 GMT</pubDate>
      <description>Some text.</description>
    </item>
  </channel>
</rss>



Alternatively, view the page source of the RSS feed, or puts the result of @feed = RSS::Parser.parse(open(url).read, false); either way you will see an XML document with the structure shown above.
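
As a quick check of how those elements map onto the parsed objects, the sketch below feeds a trimmed copy of the sample document to `RSS::Parser.parse` and reads the fields back. It assumes the `rss` library can be loaded with `require 'rss'`; on newer Ruby versions it is a separate gem that may need to be installed first.

```ruby
require 'rss'

# Trimmed copy of the sample RSS 2.0 document.
xml = <<-XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <description>Insert witty or insightful remark here</description>
    <link>http://example.org/</link>
    <item>
      <title>Atom-Powered Robots Run Amok</title>
      <link>http://example.org/2003/12/13/atom03</link>
      <pubDate>Sat, 13 Dec 2003 18:30:02 GMT</pubDate>
      <description>Some text.</description>
    </item>
  </channel>
</rss>
XML

feed = RSS::Parser.parse(xml, false)  # false: skip validation
item = feed.items.first

puts feed.channel.title   # <channel><title>
puts item.title           # <item><title>
puts item.link            # <item><link>
puts item.description     # <item><description>
puts item.pubDate.class   # pubDate is parsed into a time object, not kept as a string
```

Each child element of `<item>` becomes a reader method of the same name on the item object, which is where `item.title`, `item.link` and `item.description` come from.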



Now let's implement the news panel from the screenshot above.

We can put this part in a partial, so all we need is a helper and the partial files.

helper:

require 'rss/2.0'
require 'open-uri'

# Fetch and parse one of the news.google.cn feeds.
# param is one of 'hot', 'finance', 'fund' or 'financing'; anything else returns nil.
def feed_collection(param)
  # from news.google.cn
  urlhot = "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=b&output=rss"
  urlfinance = "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=ecn&output=rss"
  urlfund = "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=stc&output=rss"
  urlfinancing = "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=pf&output=rss"
  case param
  when 'hot'
    RSS::Parser.parse(open(urlhot).read, false)
  when 'finance'
    RSS::Parser.parse(open(urlfinance).read, false)
  when 'fund'
    RSS::Parser.parse(open(urlfund).read, false)
  when 'financing'
    RSS::Parser.parse(open(urlfinancing).read, false)
  end
end
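
The four URLs in this helper differ only in their topic parameter, so the selection step could also be written as a lookup table. This is just a sketch of that idea; the `feed_url` name is hypothetical, and the topic codes are copied from the URLs in the helper.

```ruby
# Hypothetical helper: returns the feed URL for a named section, or nil if the name is unknown.
def feed_url(name)
  # section name => value of the "topic" query parameter
  topics = {
    'hot'       => 'b',
    'finance'   => 'ecn',
    'fund'      => 'stc',
    'financing' => 'pf'
  }
  topic = topics[name]
  return nil unless topic
  "http://news.google.cn/news?pz=1&ned=ccn&hl=zh-CN&topic=#{topic}&output=rss"
end

puts feed_url('fund')
puts feed_url('unknown').inspect
```

With the URL built this way, the fetch-and-parse call only has to appear once, and adding a section means adding one line to the table.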



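require 'cgi'

# Extract the real article URL from a Google News redirect link.
def feed_link(param)
  return unless param
  # grab the percent-encoded target URL (up to the next '&') and turn
  # sequences such as http%3A%2F%2F back into plain characters
  encoded = param[/http%[^&]*/]
  encoded ? CGI.unescape(encoded) : param
end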
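
Here is a small standalone sketch of the decoding step, assuming the feed link carries the article URL in percent-encoded form as one of its query parameters. The sample link and the regular expression are illustrative only and may not match the exact shape of every feed link.

```ruby
require 'cgi'

# Made-up redirect link with the target URL percent-encoded in a query parameter.
link = 'http://news.google.cn/news/url?fd=R&url=http%3A%2F%2Fexample.org%2Fstory%3Fid%3D42&cid=1'

encoded = link[/http%[^&]*/]      # the encoded target, up to the next '&'
target  = CGI.unescape(encoded)   # %3A -> ':', %2F -> '/', and so on

puts encoded
puts target
```

Stopping the match at the next `&` keeps later query parameters of the redirect link out of the decoded result.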



# Keep the headline: everything before the last " - " separator.
def feed_title(param)
  return unless param
  idx = param.rindex(' - ')
  idx ? param[0...idx] : param
end



# Keep the source name: everything after the last " - " separator.
def feed_from(param)
  return unless param
  idx = param.rindex(' - ')
  idx ? param[(idx + 3)..-1] : nil
end
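
Both title helpers rely on item titles having the shape "headline - source". A minimal sketch of that split, using an invented title and `String#rpartition` (available from Ruby 1.8.7 on):

```ruby
# Made-up title in the "headline - source" shape the helpers expect.
title = 'Markets rally on year-end buying - Example Daily'

# rpartition splits on the LAST " - ", so hyphens inside the headline survive.
headline, _sep, source = title.rpartition(' - ')

puts headline
puts source
```

If the separator is missing, `rpartition` returns empty strings for the first two parts and the whole title as the third, so a caller should check for an empty headline before using the result.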





partial: _feednews.html.erb

<div class="slides">
  <div><%= render :partial => 'shared/feednews_item', :collection => feed_collection('hot').items %></div>
  <div><%= render :partial => 'shared/feednews_item', :collection => feed_collection('finance').items %></div>
  <div><%= render :partial => 'shared/feednews_item', :collection => feed_collection('fund').items %></div>
  <div><%= render :partial => 'shared/feednews_item', :collection => feed_collection('financing').items %></div>
</div>



Note: this layout borrows from the jQuery loopedSlider plugin (a slideshow), so only the first div is displayed when the page loads. See:

http://github.com/nathansearles/loopedSlider/tree/master



partial: _feednews_item.html.erb

<ul>
  <% unless feednews_item.nil? %>
    <li class="news">
      <a href="<%= feed_link(feednews_item.link) %>" target="_blank"><%= feed_title(feednews_item.title) %></a>
      <span class="grey small"><span> <%= feed_from(feednews_item.title) %></span>&nbsp;&mdash;&nbsp;<span><%= feednews_item.pubDate.to_date %></span></span>
    </li>
  <% end %>
</ul>

OK, that works.
[Screenshot: the resulting news panel]

ref:
http://www.rubycentral.com/book/ref_c_string.html
http://www.javaeye.com/topic/60620
http://www.troubleshooters.com/codecorn/ruby/basictutorial.htm#_Regular_Expressions
http://paranimage.com/15-jquery-slideshow-plugins/#respond
http://hi.baidu.com/todayz/blog/item/83c1b219d966fd4142a9ad5f.html
http://dennis-zane.javaeye.com/blog/57538
http://sporkmonger.com/projects/feedtools/
http://rubyrss.com/
http://www.superwick.com/archives/2007/06/09/rss-feed-parsing-in-ruby-on-rails/
http://www.ruby-forum.com/topic/144447

written by feng

posted on 2009-08-18 10:45 by fl1429 | Category: Rails
