Easy Net (Lucene && SOA)

Unless otherwise stated, all articles are original. If you quote them, please cite the source.
Email:liangtianyu@gmail.com
MSN:terry.liangtianyu@hotmail.com


Lucene 2.1 Research: Character Classification

posted on 2007-07-02 08:14 Terry Liang Views (1602) Comments (5) Category: Lucene 2.1 Research

FeedBack:

# re: Lucene 2.1 Research: Character Classification 2007-07-02 14:02 xmlspy

I couldn't work out how this is supposed to be used. Below is my test code.

It returns false no matter what.

import org.apache.oro.text.regex.MalformedPatternException;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternCompiler;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;

// Regular expressions
public class RegxLan {

    // Matches a Unicode letter:
    private static final String UNICODE_LETTER_PATTERN = "[(\u0041-\u005a)|"
            + "(\u0061-\u007a)|(\u00c0-\u00d6)|(\u00d8-\u00f6)|(\u00f8-\u00ff)|"
            + "(\u0100-\u1fff)]";

    // Matches an Asian-language character (Chinese, Japanese, Korean):
    private static final String UNICODE_CJP_PATTERN = "[(\u3040-\u318f)|(\u3300-\u337f)|"
            + "(\u3400-\u3d2d)|(\u4e00-\u9fff)|(\uf900-\ufaff)|(\uac00-\ud7af)]";

    // Matches a Unicode digit:
    private static final String UNICODE_DIGIT_PATTERN = "[(\u0030-\u0039)|"
            + "(\u0660-\u0669)|(\u06f0-\u06f9)|(\u0966-\u096f)|(\u09e6-\u09ef)|"
            + "(\u0a66-\u0a6f)|(\u0ae6-\u0aef)|(\u0b66-\u0b6f)|(\u0be7-\u0bef)|"
            + "(\0c66-\u0c6f)|(\u0ce6-\u0cef)|(\u0d66-\u0d6f)|(\u0e50-\u0e59)|"
            + "(\u0ed0-\u0ed9)|(\u1040-\u1049)]";

    /**
     * Tests whether the input is a Unicode letter.
     */
    public static final boolean isUnicodeLetter(String str) {
        return testString(str, UNICODE_LETTER_PATTERN);
    }

    /**
     * Tests whether the input is a Unicode digit.
     */
    public static final boolean isUnicodeDigit(String str) {
        return testString(str, UNICODE_DIGIT_PATTERN);
    }

    /**
     * Tests whether the input is a Unicode Asian-language character.
     */
    public static final boolean isUnicodeCPJ(String str) {
        return testString(str, UNICODE_CJP_PATTERN);
    }

    public static void main(String[] args) {
        String x = "123";
        boolean is = isUnicodeLetter(x);
        System.out.println(is);
        is = isUnicodeDigit(x);
        System.out.println(is);
        is = isUnicodeCPJ(x);
        System.out.println(is);
    }

    // Compiles the pattern and checks whether the whole input matches it.
    private static final boolean testString(String str, String pattern) {
        PatternCompiler cpl = new Perl5Compiler();
        Pattern p = null;
        try {
            p = cpl.compile(pattern);
        } catch (MalformedPatternException e) {
            e.printStackTrace();
        }
        PatternMatcher matcher = new Perl5Matcher();
        return matcher.matches(str, p);
    }
}


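# re: Lucene 2.1 Research: Character Classification 2007-07-02 14:16 Terry Liang

@xmlspy
What I defined are regular expression patterns. They pass my tests in C#, and I did state that they are meant for testing a single character. If you pass in a whole string, of course the result will be false.
For example, matching "我" against UnicodeCJPattern returns true.
Sorry, I did not write an example of applying the regular expressions in Java.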
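
The point about single characters can be illustrated with a minimal sketch. It uses the JDK's java.util.regex rather than the Jakarta ORO classes in the test code above, and the class name, helper names, and the shortened ranges are assumptions made for illustration, not code from the original post. A character class matches exactly one character, so a whole-input match against a multi-character string fails; testing each character separately is one way around that.

```java
import java.util.regex.Pattern;

final class SingleCharDemo {
    // Assumption: shortened classes that reuse a few ranges from the thread.
    static final Pattern CJK = Pattern.compile("[\u3040-\u318f\u4e00-\u9fff]");
    static final Pattern ASCII_DIGIT = Pattern.compile("[\u0030-\u0039]");

    // matches() succeeds only if the entire input matches the pattern,
    // so a one-character class accepts one-character inputs only.
    static boolean isSingle(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // Applies the single-character test to every character of the input.
    static boolean allMatch(Pattern p, String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!p.matcher(s.substring(i, i + 1)).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "\u6211" is the single character used as the example in the reply.
        System.out.println(isSingle(CJK, "\u6211"));
        System.out.println(isSingle(ASCII_DIGIT, "123"));
        System.out.println(allMatch(ASCII_DIGIT, "123"));
    }
}
```

The loop works on UTF-16 code units, which is enough for the ranges discussed here because none of them lie outside the Basic Multilingual Plane.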

# re: Lucene 2.1 Research: Character Classification 2007-07-02 21:37 xmlspy

Thanks :)

Feel free to adapt my code; it can serve as the example :)

# re: Lucene 2.1 Research: Character Classification 2007-07-02 22:24 xmlspy

I ran some tests and there are still a few problems; the patterns are not rigorous.

Please take a look :)

import org.apache.oro.text.regex.MalformedPatternException;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternCompiler;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;

// Regular expressions
// JDK version: jdk1.5.0_09
// Library: jakarta-oro-2.0.8.jar
// Operating system: win2003 standard
public class RegxLan {

    // Matches a Unicode letter:
    private static final String UNICODE_LETTER_PATTERN = "[(\u0041-\u005a)|"
            + "(\u0061-\u007a)|(\u00c0-\u00d6)|(\u00d8-\u00f6)|(\u00f8-\u00ff)|"
            + "(\u0100-\u1fff)]";

    // Matches an Asian-language character (Chinese, Japanese, Korean):
    private static final String UNICODE_CJP_PATTERN = "[(\u3040-\u318f)|(\u3300-\u337f)|"
            + "(\u3400-\u3d2d)|(\u4e00-\u9fff)|(\uf900-\ufaff)|(\uac00-\ud7af)]";

    // Matches a Unicode digit:
    private static final String UNICODE_DIGIT_PATTERN = "[(\u0030-\u0039)|"
            + "(\u0660-\u0669)|(\u06f0-\u06f9)|(\u0966-\u096f)|(\u09e6-\u09ef)|"
            + "(\u0a66-\u0a6f)|(\u0ae6-\u0aef)|(\u0b66-\u0b6f)|(\u0be7-\u0bef)|"
            + "(\0c66-\u0c6f)|(\u0ce6-\u0cef)|(\u0d66-\u0d6f)|(\u0e50-\u0e59)|"
            + "(\u0ed0-\u0ed9)|(\u1040-\u1049)]";

    /**
     * Tests whether the input is a Unicode letter.
     */
    public static final boolean isUnicodeLetter(String str) {
        return testString(str, UNICODE_LETTER_PATTERN);
    }

    /**
     * Tests whether the input is a Unicode digit.
     */
    public static final boolean isUnicodeDigit(String str) {
        return testString(str, UNICODE_DIGIT_PATTERN);
    }

    /**
     * Tests whether the input is a Unicode Asian-language character.
     */
    public static final boolean isUnicodeCPJ(String str) {
        return testString(str, UNICODE_CJP_PATTERN);
    }

    // Testing shows there are still problems: symbols in particular are classified incorrectly.
    // Also, English letters are treated as digits.
    // The full-width characters ， and ． return false for all three checks,
    // while the full-width character × returns false, true, false.
    public static void main(String[] args) {
        // The last three characters are full-width.
        char[] test = "`~!@#$%^&*()_-+=|\\,.<>/?;:'\"[]{}w2這×，．".toCharArray();

        for (char t : test) {
            String x = String.valueOf(t);
            System.out.println("========== Result for character: " + t + " ==========");

            boolean is = isUnicodeLetter(x);
            System.out.println("isUnicodeLetter == " + is);
            is = isUnicodeDigit(x);
            System.out.println("isUnicodeDigit == " + is);
            is = isUnicodeCPJ(x);
            System.out.println("isUnicodeCPJ == " + is);
        }
    }

    // Compiles the pattern and checks whether the whole input matches it.
    private static final boolean testString(String str, String pattern) {
        PatternCompiler cpl = new Perl5Compiler();
        Pattern p = null;
        try {
            p = cpl.compile(pattern);
        } catch (MalformedPatternException e) {
            e.printStackTrace();
        }
        PatternMatcher matcher = new Perl5Matcher();
        return matcher.matches(str, p);
    }
}
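
Two details of the patterns as written in this listing probably account for the results noted in its comments. Inside a character class, `(`, `)` and `|` are ordinary members rather than grouping or alternation, so those symbols match. And in Java source, `\0c66` is an octal escape for NUL followed by the text `c66`, not `\u0c66`, so the digit class ends up containing the range from `6` to U+0C6F, which covers the ASCII letters and `×`. The sketch below reproduces both effects with java.util.regex; that ORO's Perl5 classes treat character classes the same way is an assumption, and the class name and abridged patterns are hypothetical.

```java
import java.util.regex.Pattern;

final class CharClassPitfalls {
    // Abridged from the listing: parentheses, '|' and the "\0c66" typo kept as posted.
    static final Pattern AS_POSTED =
            Pattern.compile("[(\u0030-\u0039)|(\0c66-\u0c6f)]");

    // Same two ranges with the literal symbols removed and the escape corrected.
    static final Pattern CORRECTED =
            Pattern.compile("[\u0030-\u0039\u0c66-\u0c6f]");

    // Tests a single character against a one-character class.
    static boolean matches(Pattern p, char c) {
        return p.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // "\0c66" is four characters: NUL, 'c', '6', '6'.
        System.out.println("length of \"\\0c66\": " + "\0c66".length());

        char[] samples = {'(', '|', 'w', '\u00d7', '2'};
        for (char c : samples) {
            System.out.println(String.format("U+%04X as posted=%b corrected=%b",
                    (int) c, matches(AS_POSTED, c), matches(CORRECTED, c)));
        }
    }
}
```

With the corrected form, only characters inside the two listed digit ranges match; the same cleanup would apply to the letter and CJK patterns.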


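# re: Lucene 2.1 Research: Character Classification 2007-07-18 12:43 Terry Liang

@xmlspy
I am not familiar with how Java and .NET differ in their handling of regular expressions.
I have only tested the patterns above in .NET.
@xmlspy, could you tell me specifically which parts are not rigorous?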
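
For comparison on the Java side, one way to avoid hand-written ranges is to ask the JDK's own Unicode tables. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, `Character.UnicodeScript` requires Java 7 or later (newer than the JDK 1.5 used in the test above), and these categories are broader than the ranges in the patterns, so results will differ at the edges.

```java
final class CharKinds {
    // Treats Han, Hiragana, Katakana and Hangul scripts as "CJK".
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // CJK is checked first because Character.isLetter is also true for ideographs.
    static String classify(int codePoint) {
        if (isCjk(codePoint)) {
            return "CJK";
        }
        if (Character.isDigit(codePoint)) {
            return "DIGIT";
        }
        if (Character.isLetter(codePoint)) {
            return "LETTER";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        int[] samples = {'w', '2', '(', 0x00D7, 0x6211, 0xFF0C, 0x0663};
        for (int cp : samples) {
            System.out.println(String.format("U+%04X %s", cp, classify(cp)));
        }
    }
}
```

Working on code points rather than `char` values also covers characters outside the Basic Multilingual Plane, which the `\uXXXX` ranges cannot express directly.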