Dedian  
          -- Focused on search engine development

          1. Develop a search engine just for weblogs (the main work will be on the WebCrawler; the Indexer and Searcher parts have already been done for XML-based information retrieval)

          Motivation:
              a. Weblogs have become more and more popular recently.
              b. Although there are already some weblog search engines, such as Technorati and Blogdigger, there still seems to be a lot of work left to do.
              c. Weblog feed formats (RSS 2.0 and Atom) are XML-based and fairly standardized, which is very close to my current work on XML-based information retrieval.
              d. The approach is easily extensible to crawling other XML-based information websites besides weblogs.

          HOWTO:
                 a. use GData to feed XML-based information
              or b. use some open source crawlers + Lucene (a similar idea is in this article)
              or c. develop my own simple crawler package and merge it into my Shemy project, a search engine design with a clustered structure, based on Lucene

          Likely preference: c > a > b (because most open source crawlers are meant to handle much more complex web pages and links, while a weblog feed is simpler, so the crawler for it should be lighter)
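
To illustrate why a feed-only crawler can stay light, here is a minimal sketch in Python of the parsing step for option c. It is not code from Shemy: the function name `parse_feed` and the output fields (`title`, `link`, `summary`) are my own assumptions, and it only handles RSS 2.0 and Atom 1.0 (namespace `http://www.w3.org/2005/Atom`). Fetching, scheduling and the hand-off to the Lucene indexer are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds (older Atom 0.3 feeds use a different one).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text):
    """Parse RSS 2.0 or Atom 1.0 text into a list of entry dicts.

    Hypothetical helper for illustration; field names are assumptions.
    """
    root = ET.fromstring(xml_text)
    entries = []

    if root.tag == "rss":
        # RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
        for item in root.iter("item"):
            entries.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "summary": (item.findtext("description") or "").strip(),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom layout: <feed><entry>...</entry></feed>
        for entry in root.iter(ATOM_NS + "entry"):
            link = ""
            for el in entry.findall(ATOM_NS + "link"):
                # A link without a rel attribute counts as rel="alternate".
                if el.get("rel", "alternate") == "alternate":
                    link = el.get("href", "")
                    break
            summary = (entry.findtext(ATOM_NS + "summary")
                       or entry.findtext(ATOM_NS + "content")
                       or "")
            entries.append({
                "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
                "link": link.strip(),
                "summary": summary.strip(),
            })
    else:
        raise ValueError("unsupported feed format: " + root.tag)

    return entries
```

Each returned dict maps naturally onto one document for the indexer, so the crawler itself only needs to fetch feed URLs and call something like this, with no general HTML parsing or link extraction.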

          Requirement/Functionality Analysis : (in progress)

          Schedule: (in progress)

          2. Explore performance tuning of searching to improve the Shemy kernel
          posted on 2006-05-17 06:36 Dedian

          Copyright © Dedian