Dedian
-- Focusing on search engine development
1. Develop a search engine solely for weblogs (the main work will be on the WebCrawler; the Indexer and Searcher parts have already been done for XML-based information retrieval)

Motivation:
    a. Weblogs have become more and more popular recently
    b. Although there are some weblog search engines, such as Technorati and Blogdigger, there still seems to be a lot of work left to do
    c. The weblog feed formats (RSS 2.0 & Atom) are XML-based and more standardized, which is very close to my current work on XML-based information retrieval
    d. Easily extensible to crawling XML-based information websites besides weblogs

HOWTO:
    a. Utilize GData for feeding XML-based information, or
    b. Use some open source crawlers + Lucene (similar idea in this article), or
    c. Develop my own simple crawler package and merge it into my Shemy project, which is a cluster-structured search engine design based on Lucene
    Likely: c > a > b (because most open source crawlers are meant to deal with much more complex web pages/links, while a weblog feed is simpler, so the crawler for it should be lighter)

Requirement/Functionality Analysis: (in progress)

Schedule: (in progress)

2. Exploration of performance tuning on searching issues to improve the Shemy kernel
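
For option c under item 1, the "lighter crawler" idea mostly comes down to the fact that a feed can be read with a plain XML parser, with no HTML link extraction needed. Below is a minimal sketch of that parsing step only, written in Python for brevity rather than as part of Shemy. The function name `parse_feed`, the returned field names, and the sample feeds are all made up for illustration; fetching, scheduling, and handing the entries to a Lucene indexer are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds (RSS 2.0 elements have no namespace).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Return a list of {'title', 'link', 'summary'} dicts.

    Hypothetical helper: accepts the text of an RSS 2.0 or Atom 1.0 feed
    and raises ValueError for any other root element.
    """
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
        for item in root.iter("item"):
            entries.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "summary": (item.findtext("description") or "").strip(),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom layout: <feed><entry>...</entry></feed>, link is an attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link_el = entry.find(ATOM_NS + "link")
            entries.append({
                "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
                "link": link_el.get("href", "") if link_el is not None else "",
                "summary": (entry.findtext(ATOM_NS + "summary") or "").strip(),
            })
    else:
        raise ValueError("unsupported feed format: " + root.tag)
    return entries


# Made-up sample feeds, for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>http://example.com/1</link>
<description>Hello feeds</description></item>
</channel></rss>"""

SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example blog</title>
<entry><title>First post</title><link href="http://example.com/1"/>
<summary>Hello feeds</summary></entry></feed>"""

if __name__ == "__main__":
    for sample in (SAMPLE_RSS, SAMPLE_ATOM):
        for post in parse_feed(sample):
            print(post["title"], post["link"])
```

Both formats end up as the same flat record, so whatever indexes the entries does not need to know which feed type they came from.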
Copyright © Dedian | Powered by: 博客園 | Template by: 滬江博客