-- Focused on search engine development

Good or Bad, Check your OO Design

A PhD student at the University of Auckland has proposed a way to check the OO design of a Java project. The key idea is to model the dependencies between all Java classes as a directed graph: the more classes that are involved in dependency cycles, the worse the design.

Several open source Java projects were examined in his research report...
Although it is not the only metric for checking an OO design, I think it is an interesting idea.
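
To make the idea concrete, here is a minimal sketch of my own (not the method from the research report): it treats class dependencies as a directed graph and reports every class that can reach itself again by following dependencies. The class name DependencyCycles and the sample class names A to D are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DependencyCycles {

    // Returns the classes that take part in at least one dependency cycle,
    // i.e. the classes that can reach themselves by following dependencies.
    static Set<String> classesInCycles(Map<String, List<String>> deps) {
        Set<String> result = new TreeSet<String>();
        for (String start : deps.keySet()) {
            if (reaches(deps, start, start)) {
                result.add(start);
            }
        }
        return result;
    }

    // Depth-first search: is 'target' reachable from 'from' via one or more edges?
    private static boolean reaches(Map<String, List<String>> deps, String from, String target) {
        Set<String> visited = new HashSet<String>();
        Deque<String> stack = new ArrayDeque<String>();
        stack.push(from);
        while (!stack.isEmpty()) {
            List<String> next = deps.get(stack.pop());
            if (next == null) {
                continue; // class with no recorded dependencies
            }
            for (String n : next) {
                if (n.equals(target)) {
                    return true;
                }
                if (visited.add(n)) {
                    stack.push(n);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<String, List<String>>();
        deps.put("A", Arrays.asList("B"));
        deps.put("B", Arrays.asList("C"));
        deps.put("C", Arrays.asList("A")); // closes the cycle A -> B -> C -> A
        deps.put("D", Arrays.asList("A")); // depends on the cycle but is not part of it
        System.out.println(classesInCycles(deps)); // prints [A, B, C]
    }
}
```

Counting the size of the returned set gives one rough number for "how tangled" a code base is; a real tool would also have to extract the dependency map from the class files.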

posted @ 2006-06-08 03:05 Dedian | Views (988) | Comments (0)

Retrieve values in Hashtable or HashMap

Unlike collection types such as Vector or List, a Map (Hashtable or HashMap) accesses a value by its key. If we want to retrieve all the values that have been put into a Map, one simple way is to use a Collection, optionally together with an Iterator. Here is some sample code (it retrieves only the values and skips the keys), assuming there is a variable: HashMap<String, ComplexDataType> links

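// values() (not value()) returns a Collection view of all values in the map
Collection<ComplexDataType> c = links.values();
Vector<ComplexDataType> v = new Vector<ComplexDataType>(c);
for (int i = 0; i < v.size(); i++)
{
    // the typed Vector makes the cast unnecessary
    ComplexDataType tempData = v.get(i);
    dosomethingwith(tempData);
}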

P.S. A Map provides three views of its contents: keySet(), entrySet() and values(). We can use any of them.
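
As a small sketch of the views mentioned in the P.S., the values() and entrySet() views can also be iterated directly, without copying into a Vector first. The class name MapViews is hypothetical, and String values stand in for ComplexDataType here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapViews {

    // Joins all values using the values() view.
    static String joinValues(Map<String, String> links) {
        StringBuilder sb = new StringBuilder();
        for (String value : links.values()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(value);
        }
        return sb.toString();
    }

    // Joins key=value pairs using the entrySet() view.
    static String joinEntries(Map<String, String> links) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : links.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(e.getKey()).append("=").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, String> links = new LinkedHashMap<String, String>();
        links.put("a", "1");
        links.put("b", "2");
        System.out.println(joinValues(links));  // prints 1,2
        System.out.println(joinEntries(links)); // prints a=1,b=2
    }
}
```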

posted @ 2006-06-02 07:16 Dedian | Views (346) | Comments (0)

Java Interview Questions

These questions are very useful for Java newcomers and for anyone preparing for interviews for Java programming positions.

reference:
http://www.allapplabs.com/interview_questions/java_interview_questions.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_2.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_3.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_4.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_5.htm
http://www.allapplabs.com/interview_questions/java_interview_questions_6.htm

posted @ 2006-06-02 06:14 Dedian | Views (391) | Comments (0)

Java Reading & Writing file

1. Reading text from Standard Input

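try
{
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String str;
    System.out.print("> some prompt ");
    // readLine() returns null at end of input, so test before using the value
    while ((str = in.readLine()) != null)
    {
        dosomethingwith(str);
        System.out.print("> some prompt ");
    }
}
catch (IOException e)
{
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}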

2. Reading text from a file

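try
{
    BufferedReader in = new BufferedReader(new FileReader("filename"));
    String str;
    // readLine() returns null at end of file
    while ((str = in.readLine()) != null)
    {
        dosomethingwith(str);
    }
    in.close();
}
catch (IOException e)
{
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}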

3. Reading a file into a byte array

    // Returns the contents of the file in a byte array.
    public static byte[] getBytesFromFile(File file) throws IOException
    {
        // Get the size of the file
        long length = file.length();

        // You cannot create an array using a long type.
        // It needs to be an int type.
        // Before converting to an int type, check
        // to ensure that file is not larger than Integer.MAX_VALUE.
        if (length > Integer.MAX_VALUE)
        {
            throw new IOException("File is too large: " + file.getName());
        }

        // Create the byte array to hold the data
        byte[] bytes = new byte[(int)length];

        InputStream is = new FileInputStream(file);
        try
        {
            // Read in the bytes
            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length
                   && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0)
            {
                offset += numRead;
            }

            // Ensure all the bytes have been read in
            if (offset < bytes.length)
            {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        finally
        {
            // Close the input stream even if reading fails
            is.close();
        }
        return bytes;
    }

4. Writing to a file

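try
{
    // FileWriter(String) overwrites any existing content
    BufferedWriter out = new BufferedWriter(new FileWriter("filename"));
    out.write("some string");
    out.close();
}
catch (IOException e)
{
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}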

Note: If the file does not already exist, it is automatically created.

5. Appending to a file

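try
{
    // The second argument 'true' opens the file in append mode
    BufferedWriter out = new BufferedWriter(new FileWriter("filename", true));
    out.write("appending String");
    out.close();
}
catch (IOException e)
{
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}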

6. Using a Random Access File

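try
{
    File f = new File("filename");
    RandomAccessFile raf = new RandomAccessFile(f, "rw");

    // Read a character (two bytes); throws EOFException if the file is too short
    char ch = raf.readChar();

    // Seek to end of file
    raf.seek(f.length());

    // Append to the end (each char is written as two bytes)
    raf.writeChars("aString");
    raf.close();
}
catch (IOException e)
{
    // Report the failure instead of silently ignoring it
    e.printStackTrace();
}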
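
On Java 7 or later, the snippets in sections 2, 4 and 5 can be written with try-with-resources, which closes the reader or writer even when an exception is thrown. The following is only a sketch under that assumption; the class name FileRoundTrip and the temporary file are made up for the illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class FileRoundTrip {

    // Writes one line; 'append' selects between overwriting and appending.
    static void writeLine(File file, String line, boolean append) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file, append))) {
            out.write(line);
            out.newLine();
        }
    }

    // Reads all lines of the file into a list.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String str;
            while ((str = in.readLine()) != null) {
                lines.add(str);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("roundtrip", ".txt");
        file.deleteOnExit();
        writeLine(file, "some string", false);     // overwrite
        writeLine(file, "appending String", true); // append
        System.out.println(readLines(file)); // prints [some string, appending String]
    }
}
```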

reference:
http://javaalmanac.com/egs/java.io/pkg.html

posted @ 2006-05-31 08:12 Dedian | Views (567) | Comments (1)

Java Glossary -- Volatile

volatile

The volatile keyword is used on variables that may be modified simultaneously by other threads. This warns the compiler to fetch them fresh each time, rather than caching them in registers. This also inhibits certain optimisations that assume no other thread will change the values unexpectedly. Since other threads cannot see local variables, there is never any need to mark local variables volatile.

quote from:

http://mindprod.com/jgloss/volatile.html
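
A typical use of volatile is a stop flag shared between two threads. The sketch below is my own illustration, not part of the quoted glossary; the class name VolatileFlag and the 50 ms and 2 s timings are arbitrary assumptions.

```java
class VolatileFlag {

    // Written by one thread and read by another, so it is marked volatile.
    private static volatile boolean running = true;

    // Starts a worker that spins until 'running' becomes false, then stops it.
    // Returns true if the worker noticed the change and terminated.
    static boolean stopWorker() throws InterruptedException {
        running = true;
        Thread worker = new Thread(new Runnable() {
            public void run() {
                while (running) {
                    // busy loop; every check re-reads the volatile field
                }
            }
        });
        worker.start();
        Thread.sleep(50);   // let the worker run for a moment
        running = false;    // the write becomes visible to the worker
        worker.join(2000);  // wait up to two seconds for the worker to exit
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopWorker());
    }
}
```

Without the volatile modifier the JIT compiler would be allowed to hoist the read of running out of the loop, in which case the worker might never stop.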

posted @ 2006-05-25 04:45 Dedian | Views (310) | Comments (1)

Lucene 2.0 release most likely this Friday

Although it is still under voting, the release was originally proposed by Doug Cutting and has received only positive votes so far. So it is very likely that we will get a 2.0 release this Friday. Some bugs have been fixed and deprecated code has been removed in this upcoming version.

posted @ 2006-05-24 09:00 Dedian | Views (227) | Comments (0)

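Musings on the years

Twenty years ago

I was receiving all kinds of praise from teachers and parents, wearing all kinds of little red flowers and holding all kinds of competition certificates.

My current boss was perhaps catching fish in a pond, catching cicadas in the trees, and pestering his parents for lollipops.

Ten years ago

I started dating, started walking in the moonlight along paths nobody else walked, and started hesitantly learning to write poems.

My current boss was perhaps gloomily cramming high school textbooks, and perhaps also starting to pass little notes to the girl at the next desk.

Today, ten years later

My sweetheart has finally become my wife, and I am huffing and puffing away, writing baffling code in the little corner of the world provided by my current boss.

The girl at the next desk has finally become a memory, and my current boss sits in a bright, clean, spacious room less than 10 meters from me, watching me and some 100 people who look much the same as me in his eyes slave away writing code for him, while he relaxes and nods his head to music that may or may not be rock.

Tomorrow, ten years from now

?

Ending 1:

My wife is still my wife and I am still huffing and puffing away at code, but beside me there is now a child who looks a bit like me, tugging at my arm and clamoring to play games on my computer.

Countless pretty girls pass through the building, and my current boss is more than 100 meters away from me, in what may or may not be a room, holding a big meeting with a few fat-headed, big-eared shareholders to discuss the survival of me and some 1000 similar humans.

Ending 2:

My wife is still my wife. After scrimping and saving, she and I have finally started the first company of our own, and I sit in my own bright, clean office watching some 100 young fellows outside, as young as I was 20 years ago, working away at the revolution with great enthusiasm.

The pretty girls still pass through, and my current boss sits in an even taller and bigger building with a few fat-headed, big-eared shareholders, discussing the important matter of how to acquire the company of his former subordinate, me, who has now become a small boss.

Ending 3:

My wife is still my wife, but I have a company of my own, and its office is filled with a crowd of my former colleagues, with my current boss mixed in among them, who give me advice in air-conditioned rooms or huff and puff away writing code that is different from the code of 10 years ago.

One pretty girl has finally become a pretty young wife. My current boss, because of poor management, has sold his company to me, who once huffed and puffed over code under him, and I have given him a decent position so that he can support a family, marry, and have children.

P.S. The function Likely(Ending n) (1 <= n <= 3) is strictly monotonically decreasing, with an upper bound of 0.0001.

P.S.

The musings above are pure fantasy. My boss is not Chinese and did not have the boyhood and youth I imagined for him. Since he does not understand Chinese, there is no danger that fantasizing here in Chinese will give him anything to hold against me. The purpose of writing this is to express my admiration for him, young as he is (I hope he can understand this Chinese sentence), and the bit of ambition that my happy life has not yet extinguished.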

posted @ 2006-05-20 13:28 Dedian | Views (279) | Comments (0)

Oops! My laptop is not working...

Oops! My laptop, a Compaq Presario R3230, is not working now (it was still working yesterday evening): blue screen, and it hangs at disk checking... When I reboot in safe mode, it still hangs at multi(0)disk(0)rdisk(0)partition(1)\windows\system32\drivers\atisgkaf.sys. I guess there is something wrong with my video driver, but how can I fix that problem without wiping out the documents on my hard drive?

I tried googling it, and it seems some other people have run into this problem too. These steps are suggested:

1. Insert the QuickRestore CD into the CD drive and restart the system.
2. When the red Compaq logo appears, press and hold the Caps Lock key. The next screen will be a blinking QuickRestore screen.
3. When the QuickRestore text stops blinking, press and hold the Num Lock key.

But where can I get a QuickRestore CD? The included CD does not seem to be in my room any more... Does anybody have any ideas?

posted @ 2006-05-20 04:32 Dedian | Views (188) | Comments (0)

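Some recent thoughts -- about search engines

Because of my work, I have recently become interested in search engines. Here are some thoughts:

1. There is actually a very simple way to make your blog's hit count soar: write the simplest possible web crawler program that keeps visiting your own home page (sending HTTP requests). Many counters work on exactly this principle and do not verify IP addresses. If you don't believe it, just press F5 to refresh your own page and see. Carry on like this and surpassing Lao Xu's hit count would certainly be no problem. However, Sina plays tricks with hit counts anyway, since they can modify the counters themselves, so playing this game with them is pointless.

2. A high hit count does not mean your page ranks high (PageRank). PageRank is a fairly technical term. Google's two young founders, Larry Page (quite a coincidence that his surname is Page; with that name he could hardly avoid becoming the boss of pages) and Sergey Brin, made their fortune from their PageRank research at Stanford, and at a young age they can now challenge MS. Of course, the exact way Google computes and uses PageRank today is a trade secret. But there is no shortage of experts on the web: based on some of Google's search behavior and using mathematics such as probabilistic modeling, someone managed to work out an explanation of PageRank that became very popular online. That paper can be found by googling "PageRank Uncovered" (by Chris Ridings and Mike Shishigin). It is said that some people used the mechanisms described in it to make a fool of Google's search engine. This can no longer be verified, though, because Google keeps improving itself.

3. Simply put, PageRank is a key indicator of how important your site or page is. The core of the concept is how many pages link to your page, and especially how many important pages link to your page. In other words, if Lao Xu's blog, through its hit count or its influence in the nation's blogosphere, reaches a PageRank of 10 and thus counts as a very important page, and you are lucky enough to win her favor and be added to her blogroll, so that her important page links to yours, then your PageRank is bound to rise. Of course, this is only a very simple example. The actual formula is not that simple, and anyone interested can look it up online. Even so, PageRank is only one factor. But this makes it easy to understand why so many people scramble to post the first comment on celebrity blogs, or even deliberately make outrageous remarks to attract attention. It also explains why advertising has made its way onto blogs.

4. Actually, the idea behind PageRank comes from everyday life. Say I want to buy a computer and I want someone who knows computers to tell me which one to buy. I know Xiao Wang knows a fair bit, so I ask him. Xiao Wang says: "Well, dedian-brand computers are good, buy one of those." I say: "All right, I'll get one. But how do you know? Where is it reviewed? What are its strengths?" Xiao Wang says: "Er... I'm not too sure either. I heard it from that guy Xiao Li, go ask him." At that moment, even though I do not know Xiao Li, his standing in my mind suddenly grows a lot: even Xiao Wang listens to him...

5. So, to make your page or site influential, you must do everything you can to get others to link to you and cite you. There is another way, of course, which is to keep citing other people's articles. Citing here does not mean embedding other people's links in your own page, but getting your own page embedded in other people's pages. How? That is what the Trackback feature of many blogs does. If you look closely, you will find that whenever you trackback someone's blog, your blog's address is left on their blog page (the same goes for comments). However, most blogs now have settings to disallow trackbacks or comments. Sina also seems to have started tampering: celebrity blogs apparently can no longer be trackbacked. But Sina's blogs are unfriendly to many search engines anyway, so forget about them. MSN Spaces, on the other hand, seems workable: one could write some code that automatically visits pages, fetches the permalink of each blog, and then executes a piece of JavaScript provided by MSN itself to do the trackback. This is just something I thought of recently, though, and I have not written any code to implement it. If it works, it should work on many other blogs too. The idea came from constantly seeing all sorts of junk sites showing up in my trackbacks lately.

6. However, more and more services on the web now guard against this kind of unfriendly attack. For example, if you hate people spamming trackbacks and comments on your blog, you can sign up for a kind of hosted service: another site collects those messages or comments first, filters them, and then puts them on your blog. In short, the web is a big forest with all kinds of birds in it.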
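
The intuition in point 3 can be sketched with the simplified PageRank formula from Brin and Page's published work: PR(p) = (1 - d)/N + d * (sum of PR(q)/outlinks(q) over all pages q that link to p). This is only a toy illustration, not Google's production ranking; the class name SimplePageRank, the damping factor 0.85, the 50 iterations and the five-page link graph are all assumptions made for the example.

```java
import java.util.Arrays;

class SimplePageRank {

    // links[i] lists the pages that page i links to.
    static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int page = 0; page < n; page++) {
                if (links[page].length == 0) {
                    // dangling page: spread its rank evenly over all pages
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * pr[page] / n;
                    }
                } else {
                    // each outgoing link passes on an equal share of the rank
                    for (int target : links[page]) {
                        next[target] += damping * pr[page] / links[page].length;
                    }
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // Page 0: a celebrity blog, page 1: my blog, pages 2-4: ordinary blogs.
        // Everyone links to the celebrity; the celebrity links only to my blog.
        int[][] links = { {1}, {0}, {0}, {0}, {0} };
        double[] pr = rank(links, 0.85, 50);
        System.out.println("celebrity=" + pr[0] + " mine=" + pr[1] + " ordinary=" + pr[2]);
    }
}
```

In this toy graph my blog ends up ranked far above the ordinary blogs purely because one important page links to it, which is the effect described in point 3.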

posted @ 2006-05-19 16:15 Dedian | Views (1535) | Comments (3)

Notes for exploration of Search Engine (keep updating...)

+ Webcrawler

    -- study open source code
        purpose: analyze code structure and basic components
        focus on: Nutch (http://lucene.apache.org/nutch/)
                  & HTMLParser (http://htmlparser.sourceforge.net/)
                  & GData (http://code.google.com/apis/gdata/overview.html)

    -- understand the PageRank idea
        related articles:
        http://en.wikipedia.org/wiki/PageRank
        http://www.thesitewizard.com/archive/google.shtml
        paper: "PageRank Uncovered" by Chris Ridings and Mike Shishigin
        http://www.rankforsales.com/n-aa/095-seo-may-31-03.html (about Chris Ridings & SEO)
        http://en.wikipedia.org/wiki/Web_crawler (basic idea about crawlers)

    -- become familiar with the RSS & Atom protocols

    -- sample coding:
        Interface: Scheduler for fetching web links
        Interface: Web page Parser/Analyzer --> to deal with XML-based websites (weblogs or news sites, RSS & Atom) --> Parser classes based on a SAX parser
        Interface: Retractor/Fetcher --> to get links from a page
        Interface: Collector --> check whether a URL is duplicated and save it in the URL database with a certain data structure
        Interface: InformationProcessor --> PageRank should be one important factor --> (under thinking)
        Interface: Policies (Filter) --> will serve the Collector and InformationProcessor --> (under thinking)

+ Indexer/Searcher (almost done, based on Lucene)
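
As a rough sketch of two of the pieces listed under "sample coding" (a SAX-based parser for an RSS 2.0 feed and a Collector that rejects duplicate URLs), something like the following could serve as a starting point. All class and method names here (FeedLinkSketch, RssLinkHandler, Collector, extractLinks) are hypothetical and not taken from any existing project; only the standard javax.xml.parsers and org.xml.sax APIs are used.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FeedLinkSketch {

    // Parser: collects the text of every <link> found inside an <item>.
    static class RssLinkHandler extends DefaultHandler {
        final List<String> links = new ArrayList<String>();
        private boolean inItem = false;
        private StringBuilder text = null;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("item".equals(qName)) {
                inItem = true;
            }
            if (inItem && "link".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("link".equals(qName) && text != null) {
                links.add(text.toString().trim());
                text = null;
            }
            if ("item".equals(qName)) {
                inItem = false;
            }
        }
    }

    // Collector: remembers URLs already seen and rejects duplicates.
    static class Collector {
        private final Set<String> seen = new LinkedHashSet<String>();

        boolean collect(String url) {
            return seen.add(url); // false if the URL was already collected
        }

        Set<String> urls() {
            return seen;
        }
    }

    // Runs the SAX parser over an RSS document held in a string.
    static List<String> extractLinks(String rssXml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RssLinkHandler handler = new RssLinkHandler();
        parser.parse(new ByteArrayInputStream(rssXml.getBytes("UTF-8")), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><link>http://example.org/</link>"
                + "<item><link>http://example.org/a</link></item>"
                + "<item><link>http://example.org/b</link></item>"
                + "<item><link>http://example.org/a</link></item>"
                + "</channel></rss>";
        Collector collector = new Collector();
        for (String link : extractLinks(rss)) {
            collector.collect(link);
        }
        System.out.println(collector.urls()); // prints [http://example.org/a, http://example.org/b]
    }
}
```

A real URL database would need persistence and URL normalization; the in-memory set only shows where the duplicate check sits.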

posted @ 2006-05-19 09:40 Dedian | Views (300) | Comments (1)

my favorite way to load a Java project

Motivation:

If you want to check or analyze source code, or contribute to open source communities, you will often download the source code of a project and load (or import) it into your own IDE (if you don't want to use CVS or SVN).

The following is my favorite way to do that in Eclipse:

1. Create a new blank Java project:

File -> New -> Project... -> Java Project -> Next -> enter the project name (project layout: Create separate source and output folders) -> click Finish

2. Right-click the source folder "src" -> Import... -> select File system -> click the top "Browse..." button and choose the folder where you put the downloaded source code (this should be the root source folder, so that the folder structure is kept as the package structure) -> Finish

3. If you import the wrong source code folder, you can delete the whole project and redo it. (Merely deleting the wrongly imported packages does not help.)

Note:

If an Ant build file (something like build.xml) is included in the source code package, it is even easier: just use File -> New -> Project... -> Java Project from Existing Ant Buildfile.

posted @ 2006-05-19 02:58 Dedian | Views (254) | Comments (0)

Crawling policies

The behavior of a web crawler is the outcome of a combination of policies:

A selection policy that states which pages to download.
A re-visit policy that states when to check for changes to the pages.
A politeness policy that states how to avoid overloading websites.
A parallelization policy that states how to coordinate distributed web crawlers.

cited from:

http://en.wikipedia.org/wiki/Web_crawler
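
Of the four policies, the politeness policy is the easiest to sketch in code. The following is one possible minimal form, assuming a fixed minimum delay per host; the class name PolitenessPolicy, the method mayFetch and the one-second delay are my own assumptions, not something taken from the cited article.

```java
import java.util.HashMap;
import java.util.Map;

class PolitenessPolicy {

    private final long minDelayMillis;
    private final Map<String, Long> lastFetch = new HashMap<String, Long>();

    PolitenessPolicy(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns true and records the fetch time if 'host' may be contacted
    // at time 'nowMillis'; returns false if the last fetch was too recent.
    boolean mayFetch(String host, long nowMillis) {
        Long last = lastFetch.get(host);
        if (last != null && nowMillis - last < minDelayMillis) {
            return false;
        }
        lastFetch.put(host, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        PolitenessPolicy policy = new PolitenessPolicy(1000);
        System.out.println(policy.mayFetch("example.org", 0));    // prints true
        System.out.println(policy.mayFetch("example.org", 500));  // prints false
        System.out.println(policy.mayFetch("example.net", 500));  // prints true
        System.out.println(policy.mayFetch("example.org", 1000)); // prints true
    }
}
```

Passing the current time in as a parameter keeps the class easy to test; a crawler would call it with System.currentTimeMillis().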

posted @ 2006-05-18 06:34 Dedian | Views (186) | Comments (0)

Compiler problem in Eclipse

Problem Description:

I want to build the GData source code in Eclipse, which contains code that creates type-specific maps. The Eclipse IDE complains with something like: Syntax error, parameterized types are only available if source level is 5.0

Reason:

The new language feature for creating a type-specific map (generics) is only supported at source level 5.0.

Solution:

Change the IDE compiler configuration:
Window > Preferences > Java > Compiler > Compiler compliance level => 5.0

Note:
1. Type-specific map: a map that holds only objects of a certain type
    example:

    Map<Integer, String> map = new HashMap<Integer, String>();

    map.put(1, "first");
    map.put(2, "second");

2. If source level 5.0 is applied, the compiler reports type-safety warnings for raw collection types such as Vector, List, Stack or Map.
That means code that you could write under level 1.4 like this:

private Vector MyList = new Vector();
...
MyList.add(str);

is better changed to something like this under level 5.0:

private Vector<String> MyList = new Vector<String>();
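
The practical benefit of the parameterized form is that elements come back with the right type and no cast is needed. A small sketch (the class name TypedCollections and its two helper methods are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

class TypedCollections {

    // No cast is needed: the compiler knows the value type is String.
    // The int key is autoboxed to Integer for the lookup.
    static String describe(Map<Integer, String> map, int key) {
        String value = map.get(key);
        return key + " -> " + value;
    }

    // The enhanced for loop yields String elements directly.
    static int totalLength(Vector<String> list) {
        int total = 0;
        for (String s : list) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<Integer, String>();
        map.put(1, "first");
        map.put(2, "second");
        System.out.println(describe(map, 1)); // prints 1 -> first

        Vector<String> list = new Vector<String>(map.values());
        System.out.println(totalLength(list)); // prints 11
    }
}
```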

posted @ 2006-05-17 09:41 Dedian | Views (403) | Comments (0)

Planning for next job

1. Develop a search engine just for weblogs (the main work will be on the web crawler; the Indexer and Searcher parts have already been done for XML-based information retrieval)

Motivation:
    a. Weblogs have become more and more popular recently
    b. Although there are some weblog search engines such as Technorati and Blogdigger, there still seems to be a lot of work to do
    c. The weblog feed formats (RSS 2.0 & Atom) are XML-based and fairly standard, which is very close to my current work on XML-based information retrieval
    d. Easily extensible to crawling XML-based information websites besides weblogs

HOWTO:
    a. Utilize GData for feeding XML-based information
or  b. Use some open source crawlers + Lucene (similar idea in this article)
or  c. Develop my own simple crawler package and merge it into my Shemy project, a search engine design with a clustering structure based on Lucene

    likely: c > a > b (because most open source crawlers are meant to deal with much more complex web pages/links, while weblog feeds are simpler, so the crawler for them should be lighter)

Requirement/Functionality Analysis: (in progress)

Schedule: (in progress)

2. Explore performance tuning of searching to improve the Shemy kernel

posted @ 2006-05-17 06:36 Dedian | Views (246) | Comments (0)

Java Glossary -- Nested Class

Definition:

A class within another class

Example:

class EnclosingClass 
{
    ...
    class ANestedClass 
    {
        ...
    }
}

Purpose:

Reflect and enforce the relationship between two classes (especially when the nested class makes sense only in the context of its enclosing class, or when it relies on the enclosing class for its function).

Interesting features of an inner class (a non-static nested class such as ANestedClass above):

1. An instance of the inner class can exist only within an instance of EnclosingClass.
2. An inner class instance has direct access to the instance variables and methods of its enclosing instance.
3. There are two special kinds of inner classes: local classes and anonymous classes.

reference:
http://java.sun.com/docs/books/tutorial/java/javaOO/nested.html
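
A short sketch of the first two features: the inner instance is created through an enclosing instance and can touch its private state. The names HitCounter and Incrementer are made up for the illustration.

```java
class HitCounter {

    private int count = 0;

    // Inner (non-static nested) class: each instance is tied to one HitCounter.
    class Incrementer {
        void increment() {
            count++; // direct access to the enclosing instance's private field
        }
    }

    int getCount() {
        return count;
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter();
        // An inner instance can only be created through an enclosing instance.
        HitCounter.Incrementer inc = counter.new Incrementer();
        inc.increment();
        inc.increment();
        System.out.println(counter.getCount()); // prints 2
    }
}
```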

posted @ 2006-05-16 08:22 Dedian | Views (328) | Comments (0)


Copyright © Dedian. Powered by: 博客園. Template provided by: 滬江博客
