pengpenglin


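[Original] RSS Tool Development Notes (12): Informa's parsers package

We now know that FeedParser is only a mediator class and that the real parsing work is done by the parser for each protocol version. So let's start with the parsers for the RSS 0.9.x / 2.0 family.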

★RSS_0_91_Parser

/**
 * Private constructor suppresses generation of a (public) default
 * constructor.
 */
private RSS_0_91_Parser() {
}

/**
 * Holder of the RSS_0_91_Parser instance.
 */
private static class RSS_0_91_ParserHolder {
    private static RSS_0_91_Parser instance = new RSS_0_91_Parser();
}

/**
 * Get the RSS_0_91_Parser instance.
 */
public static RSS_0_91_Parser getInstance() {
    return RSS_0_91_ParserHolder.instance;
}

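As you can see, every parser is a singleton. What is unusual here is that a static nested class holds the singleton instance. At first this looked a little redundant to me, but it is the initialization-on-demand holder idiom: the JVM initializes RSS_0_91_ParserHolder only when getInstance() first references it, so the instance is created lazily and thread-safely without explicit synchronization. The main properties of a static nested class are:
A. An instance of a static nested class can be created without first creating an instance of the outer class.
B. A static nested class can access the outer class's static fields and methods directly; it can reach instance members only through an explicit reference to an outer-class instance.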

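Nested classes are typically used to hide implementation details. Such a class is usually meaningful to only one other class and useless to everything else, so defining it as a standalone class in the package would look out of place. Defining it inside the class that needs it keeps the detail hidden.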

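A static nested class may declare both static and instance members. Its static members can be referenced in the form OuterClass.NestedClass.staticMember, which is exactly how getInstance() reads RSS_0_91_ParserHolder.instance.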
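To make the lazy-loading behaviour of the holder pattern concrete, here is a minimal sketch. DemoParser, Holder, EVENTS and eventCount() are hypothetical names invented for this illustration; they are not part of Informa. The sketch only records when the constructor runs.

```java
import java.util.ArrayList;
import java.util.List;

class HolderDemo {
    public static void main(String[] args) {
        // Touching a static method initializes DemoParser, but not its Holder.
        System.out.println("constructed before getInstance(): " + DemoParser.eventCount());
        DemoParser first = DemoParser.getInstance();
        DemoParser second = DemoParser.getInstance();
        System.out.println("constructed after getInstance(): " + DemoParser.eventCount());
        System.out.println("same instance: " + (first == second));
    }
}

// Hypothetical stand-in for RSS_0_91_Parser; only the singleton plumbing is shown.
final class DemoParser {
    // Records constructor calls so the initialization order can be observed.
    private static final List<String> EVENTS = new ArrayList<>();

    private DemoParser() {
        EVENTS.add("DemoParser constructed");
    }

    // The JVM initializes Holder only when getInstance() first reads INSTANCE.
    private static class Holder {
        private static final DemoParser INSTANCE = new DemoParser();
    }

    static DemoParser getInstance() {
        return Holder.INSTANCE;
    }

    static int eventCount() {
        return EVENTS.size();
    }
}
```

The constructor runs exactly once, and only on the first getInstance() call, which is the property that makes the holder class more than decoration.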

The heart of this class is the parse method. As we already know, parsing starts from the channel node.

Element channel = root.getChild("channel");
ChannelIF chnl = cBuilder.createChannel(channel,
        channel.getChildTextTrim("title"));

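It first fetches the channel child node under the root of the XML Document, then uses the JDOM method for reading a node's trimmed text to get the channel's title. Next it calls createChannel on the ChannelBuilder. This is Java polymorphism at work: each builder implementation constructs the channel in its own way. With the Hibernate implementation, the process looks like this: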
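As a rough illustration of the "root, then channel, then title" navigation, the following sketch does the same lookup with the JDK's built-in DOM API instead of JDOM, so that it runs without extra dependencies. The class name, helper names and sample feed are assumptions made up for this example; Informa itself uses JDOM's getChild and getChildTextTrim as shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChannelTitleDemo {
    public static void main(String[] args) {
        String xml = "<rss version=\"0.91\"><channel><title>  Example Feed </title>"
                + "<link>http://example.org/</link></channel></rss>";
        System.out.println(channelTitle(xml));
    }

    // Returns the trimmed channel title, or null if channel or title is missing.
    static String channelTitle(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Element root = doc.getDocumentElement();
        Element channel = firstChild(root, "channel");
        if (channel == null) {
            return null;
        }
        Element title = firstChild(channel, "title");
        return title == null ? null : title.getTextContent().trim();
    }

    // Rough DOM counterpart of JDOM's Element.getChild(name): direct children only.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```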

public ChannelIF createChannel(Element channelElement, String title,
        String location) {
    ChannelIF obj = null;
    if (location != null) {
        Query query = session
                .createQuery("from Channel as channel where channel.locationString = ? ");
        query.setString(0, location);
        obj = (ChannelIF) query.uniqueResult();
    }
    if (obj == null) {
        obj = new Channel(channelElement, title, location);
        session.save(obj);
    } else {
        logger.info("Found already existing channel instance with location "
                + location);
    }
    return obj;
}

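It first tries to load the channel from the database by its location; if nothing is found, it creates a new channel and persists it.

The code below then reads the channel's child nodes and its items.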
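The load-or-create logic can be sketched without a database. In the example below a HashMap keyed by location stands in for the Hibernate session, and Channel and InMemoryChannelBuilder are simplified, hypothetical classes written for this illustration, not Informa's own. One deliberate difference: a channel with a null location is created but not stored here, since there is no key to store it under.

```java
import java.util.HashMap;
import java.util.Map;

class LoadOrCreateDemo {
    public static void main(String[] args) {
        InMemoryChannelBuilder builder = new InMemoryChannelBuilder();
        Channel first = builder.createChannel("First", "http://example.org/rss");
        Channel again = builder.createChannel("Second", "http://example.org/rss");
        System.out.println(first == again);   // the existing channel is reused
        System.out.println(again.title);      // so the original title is kept
    }

    // Hypothetical, heavily simplified stand-in for a channel entity.
    static final class Channel {
        final String title;
        final String location;

        Channel(String title, String location) {
            this.title = title;
            this.location = location;
        }
    }

    // Hypothetical builder: the map plays the role of the persistent store.
    static final class InMemoryChannelBuilder {
        private final Map<String, Channel> byLocation = new HashMap<>();

        Channel createChannel(String title, String location) {
            // Step 1: try to load an existing channel by its location.
            Channel existing = (location != null) ? byLocation.get(location) : null;
            if (existing != null) {
                return existing;
            }
            // Step 2: nothing found, so create it and "persist" it.
            Channel created = new Channel(title, location);
            if (location != null) {
                byLocation.put(location, created);
            }
            return created;
        }
    }
}
```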

// 1..n item elements
List items = channel.getChildren("item");
Iterator i = items.iterator();
while (i.hasNext()) {
    Element item = (Element) i.next();
    ParserUtils.matchCaseOfChildren(item, new String[] { "title",
            "link", "description", "source", "enclosure" });

    // get title element
    Element elTitle = item.getChild("title");
    String strTitle = "<No Title>";
    if (elTitle != null) {
        strTitle = elTitle.getTextTrim();
    }
    if (logger.isDebugEnabled()) {
        logger.debug("Item element found (" + strTitle + ").");
    }

    // get link element
    Element elLink = item.getChild("link");
    String strLink = "";
    if (elLink != null) {
        strLink = elLink.getTextTrim();
    }

    // get description element
    Element elDesc = item.getChild("description");
    String strDesc = "";
    if (elDesc != null) {
        strDesc = elDesc.getTextTrim();
    }

    // generate new RSS item (link to article)
    ItemIF rssItem = cBuilder.createItem(item, chnl, strTitle, strDesc,
            ParserUtils.getURL(strLink));
    rssItem.setFound(dateParsed);

    // get source element (an RSS 0.92 element)
    Element source = item.getChild("source");
    if (source != null) {
        String sourceName = source.getTextTrim();
        Attribute sourceAttribute = source.getAttribute("url");
        if (sourceAttribute != null) {
            String location = sourceAttribute.getValue().trim();
            ItemSourceIF itemSource = cBuilder.createItemSource(
                    rssItem, sourceName, location, null);
            rssItem.setSource(itemSource);
        }
    }

    // get enclosure element (an RSS 0.92 element)
    Element enclosure = item.getChild("enclosure");
    if (enclosure != null) {
        URL location = null;
        String type = null;
        int length = -1;
        Attribute urlAttribute = enclosure.getAttribute("url");
        if (urlAttribute != null) {
            location = ParserUtils.getURL(urlAttribute.getValue().trim());
        }
        Attribute typeAttribute = enclosure.getAttribute("type");
        if (typeAttribute != null) {
            type = typeAttribute.getValue().trim();
        }
        Attribute lengthAttribute = enclosure.getAttribute("length");
        if (lengthAttribute != null) {
            try {
                length = Integer.parseInt(lengthAttribute.getValue().trim());
            } catch (NumberFormatException e) {
                // a malformed length is logged and left at -1
                logger.warn(e);
            }
        }
        ItemEnclosureIF itemEnclosure = cBuilder.createItemEnclosure(
                rssItem, location, type, length);
        rssItem.setEnclosure(itemEnclosure);
    }
}

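As you can see, the parsing process generally follows these steps:
A. Get a child element of the channel.
B. If that child element has child elements or attributes of its own, descend into them recursively.
C. Call the builder's matching createXxx method to load or create the object for that child element.
D. Call the matching setXxx method to attach the new object to its parent (the channel or the item).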
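Steps A to D, together with the defaults used in the item loop ("<No Title>", an empty link, and -1 for an unparsable enclosure length), can be sketched in a compact, dependency-free form. This is only an approximation: it uses the JDK DOM API rather than JDOM, Item is a hypothetical value class far smaller than Informa's ItemIF, and getElementsByTagName matches all descendants, whereas JDOM's getChild looks at direct children only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemParseDemo {
    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title> A </title><link>http://example.org/a</link>"
                + "<enclosure url=\"http://example.org/a.mp3\" length=\"123\"/></item>"
                + "<item><link>http://example.org/b</link></item>"
                + "</channel></rss>";
        for (Item item : parseItems(xml)) {
            System.out.println(item.title + " | " + item.link + " | " + item.enclosureLength);
        }
    }

    // Hypothetical value object holding just three of the parsed fields.
    static final class Item {
        final String title;
        final String link;
        final int enclosureLength;

        Item(String title, String link, int enclosureLength) {
            this.title = title;
            this.link = link;
            this.enclosureLength = enclosureLength;
        }
    }

    static List<Item> parseItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<Item> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");           // step A
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = childText(item, "title", "<No Title>");
            String link = childText(item, "link", "");
            int length = -1;                                         // default when absent or malformed
            NodeList enclosures = item.getElementsByTagName("enclosure");
            if (enclosures.getLength() > 0) {                        // step B: descend into attributes
                String raw = ((Element) enclosures.item(0)).getAttribute("length").trim();
                try {
                    length = Integer.parseInt(raw);
                } catch (NumberFormatException e) {
                    // keep -1, as the parser above does after logging a warning
                }
            }
            result.add(new Item(title, link, length));               // steps C and D
        }
        return result;
    }

    // Trimmed text of the first matching descendant, or the fallback if there is none.
    static String childText(Element parent, String name, String fallback) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? fallback : nodes.item(0).getTextContent().trim();
    }
}
```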

The RSS 0.91 parser processes the elements in the following order:

================== Root element ==================

1. channel

================== Required elements ==================

2. title
3. description
4. link

================== Optional elements ==================

5. language
6. item
7. image
8. textinput
9. copyright
10. rating
11. pubDate
12. lastBuildDate
13. docs
14. managingEditor
15. webMaster
16. cloud

★RSS_2_0_Parser

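Comparing the 0.91 and 2.0 parsers, the parsing process turns out to be almost identical. The two biggest differences are:

A. Starting with RSS 2.0, support for namespaces was added.
B. Parsing was added for several elements that are new in 2.0.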

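In the RSS_2_0_Parser class, every element is looked up by both name and namespace; the default namespace is "". The RSS 2.0 parser also adds parsing for the subject, category, author, creator, comments and guid elements, none of which exist in the 0.91 protocol. (Of these, subject and creator come from the Dublin Core namespace rather than from the RSS 2.0 core.)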
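To show what looking up an element by name and namespace means in practice, here is a small sketch using the JDK's namespace-aware DOM parser. The class, the itemField and childText helpers, and the sample feed are assumptions made for this illustration; Informa does the equivalent with JDOM's getChild(name, namespace). As in JDOM, the empty string is treated as "no namespace".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Namespace URI of the Dublin Core element set.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\" xmlns:dc=\"" + DC_NS + "\"><channel><title>Feed</title>"
                + "<item><title>Post</title><author>a@example.org</author>"
                + "<dc:creator>Alice</dc:creator></item></channel></rss>";
        System.out.println(itemField(xml, "", "author"));
        System.out.println(itemField(xml, DC_NS, "creator"));
    }

    // Text of the named child of the first item, or null if it is absent.
    static String itemField(String xml, String namespaceUri, String localName) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // required for getNamespaceURI() and getLocalName()
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        NodeList items = doc.getElementsByTagName("item");
        if (items.getLength() == 0) {
            return null;
        }
        return childText((Element) items.item(0), namespaceUri, localName);
    }

    // Matches direct children on both local name and namespace URI.
    static String childText(Element parent, String namespaceUri, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            String ns = (n.getNamespaceURI() == null) ? "" : n.getNamespaceURI();
            if (ns.equals(namespaceUri) && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Looking up creator with the empty namespace finds nothing in this feed, which is why the parser has to pass the right namespace along with the name.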


Posted on 2009-12-30 10:45 by Paul Lin. Category: J2SE
