Dedian  
          -- Focused on search engine development

          1. Develop a search engine dedicated to weblogs. (The main work will be on the web crawler; the indexer and searcher parts have already been done for XML-based information retrieval.)

          Motivation:
              a. Weblogs have become increasingly popular.
              b. Although weblog search engines such as Technorati and Blogdigger already exist, a lot of work still seems to be needed.
              c. Weblog feed formats (RSS 2.0 and Atom) are XML-based and more standardized, which is very close to my current work on XML-based information retrieval.
              d. The design is easily extensible to crawling other XML-based information websites besides weblogs.

          HOWTO:
                 a. Use GData to feed XML-based information,
              or b. use an open source crawler + Lucene (a similar idea appears in this article),
              or c. develop my own simple crawler package and merge it into my Shemy project, a search engine design with a clustered structure, based on Lucene.

          Likely preference: c > a > b (most open source crawlers are built to handle much more complex web pages and links; since a weblog feed is simpler, a crawler for it should be lighter).
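
To illustrate why a feed-only crawler can stay light (option c), here is a minimal sketch of the parsing step in Python, using only the standard library. It is not Shemy code: the function name `parse_feed`, the output field names, and the two sample feeds are hypothetical, and it assumes Atom 1.0 (namespace `http://www.w3.org/2005/Atom`) plus un-namespaced RSS 2.0 elements. Fetching, scheduling, and handing entries to Lucene are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace; older Atom 0.3 feeds use a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Return a list of {'title', 'link', 'summary'} dicts from RSS 2.0 or Atom text.

    Unknown root elements yield an empty list.
    """
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item>...</item></channel></rss>
        for item in root.iter("item"):
            entries.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "summary": (item.findtext("description") or "").strip(),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom: <feed><entry>...</entry></feed>; the link is an href attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link_el = entry.find(ATOM_NS + "link")
            entries.append({
                "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
                "link": link_el.get("href", "") if link_el is not None else "",
                "summary": (entry.findtext(ATOM_NS + "summary") or "").strip(),
            })
    return entries


# Hypothetical sample feeds, for illustration only.
RSS_SAMPLE = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<item><title>First post</title><link>http://example.com/1</link>"
    "<description>Hello</description></item></channel></rss>"
)
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>'
    '<entry><title>Second post</title><link href="http://example.com/2"/>'
    "<summary>World</summary></entry></feed>"
)

if __name__ == "__main__":
    for sample in (RSS_SAMPLE, ATOM_SAMPLE):
        for entry in parse_feed(sample):
            print(entry["title"], entry["link"])
```

Both formats are reduced to the same three fields, so the indexer would only need one document shape regardless of which feed type was crawled.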

          Requirement/Functionality Analysis: (in progress)

          Schedule: (in progress)

          2. Explore performance tuning of search to improve the Shemy kernel
          posted on 2006-05-17 06:36 Dedian  Views (247)  Comments (0)

          Copyright © Dedian