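There are many ways to compute the similarity of two strings. Here I summarize my own take on a similarity measure based on edit distance.
A brief introduction to Levenshtein Distance (LD): LD measures how similar two strings are. The distance is the minimum number of insertions, deletions, and substitutions needed to turn one string into the other.
For example: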
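- If str1="test" and str2="test", then LD(str1,str2) = 0. No transformation is needed.
- If str1="test" and str2="tent", then LD(str1,str2) = 1. The "s" in str1 is replaced by "n"; one character changes, so the distance is 1.
The larger the distance, the more the two strings differ.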
Levenshtein distance was first devised by the Russian scientist Vladimir Levenshtein in 1965 and is named after him. If you find the name hard to pronounce, you can call it edit distance.
Levenshtein distance can be used for:
- Spell checking
- Speech recognition
- DNA analysis
- Plagiarism detection
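LD stores the distance values in an (n+1)*(m+1) matrix, where n is the length of str1 and m is the length of str2. The algorithm roughly works as follows:
- If the length of str1 or str2 is 0, return the length of the other string.
- Initialize an (n+1)*(m+1) matrix d, and let the values in its first row and first column count up from 0.
- Scan both strings (an n*m pass). For each i in 1..n and j in 1..m, if the i-th character of str1 equals the j-th character of str2, set temp to 0; otherwise set temp to 1. Then assign d[i][j] the minimum of d[i-1][j]+1, d[i][j-1]+1, and d[i-1][j-1]+temp.
- When the scan is finished, return the last value in the matrix, d[n][m].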
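To make the steps above concrete, here is a minimal sketch that fills the matrix and prints it row by row. The class name `LdMatrixDemo` and the method `buildMatrix` are hypothetical names chosen for illustration; they are not part of the source code in this post.

```java
import java.util.Arrays;

public class LdMatrixDemo {
    // Hypothetical helper: builds the full (n+1) x (m+1) matrix from the steps above.
    static int[][] buildMatrix(String str1, String str2) {
        int n = str1.length();
        int m = str2.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // first column counts up from 0
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // first row counts up from 0
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                // temp is 0 when the characters match, 1 otherwise
                int temp = (str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1);
                d[i][j] = Math.min(best, d[i - 1][j - 1] + temp);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[][] d = buildMatrix("test", "tent");
        for (int[] row : d) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For "test" and "tent" the bottom-right cell d[4][4] is 1, which matches the example given earlier.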
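What is finally returned is the distance between the two strings. How do we get a similarity from this distance? The largest possible distance is the length of the longer of the two strings, so dividing by that length gives a value that is not very sensitive to the particular strings. I therefore define the similarity as 1 - distance / (length of the longer string).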
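Written as an equation, the similarity formula looks like this. The symbols s_1 and s_2 (standing for str1 and str2) and |s| for string length are notation introduced here for illustration only.

```latex
\[
\mathrm{sim}(s_1, s_2) = 1 - \frac{\mathrm{LD}(s_1, s_2)}{\max(|s_1|, |s_2|)}
\]
% Worked example with the strings used earlier: LD = 1, longer length = 4
\[
\mathrm{sim}(\texttt{test}, \texttt{tent}) = 1 - \frac{1}{4} = 0.75
\]
```

Identical strings give a distance of 0 and therefore a similarity of 1. The formula is undefined when both strings are empty, because the denominator is then 0.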
Source code:
package com.chenlb.algorithm;

/**
 * Similarity of two strings based on edit distance.
 *
 * @author chenlb 2008-6-24 06:41:55 PM
 */
public class Similarity {

    private int min(int one, int two, int three) {
        int min = one;
        if (two < min) {
            min = two;
        }
        if (three < min) {
            min = three;
        }
        return min;
    }

    public int ld(String str1, String str2) {
        int[][] d; // distance matrix
        int n = str1.length();
        int m = str2.length();
        int i; // index into str1
        int j; // index into str2
        char ch1; // current character of str1
        char ch2; // current character of str2
        int temp; // cost added on the diagonal: 0 if the characters match, otherwise 1
        if (n == 0) {
            return m;
        }
        if (m == 0) {
            return n;
        }
        d = new int[n + 1][m + 1];
        for (i = 0; i <= n; i++) { // initialize the first column
            d[i][0] = i;
        }
        for (j = 0; j <= m; j++) { // initialize the first row
            d[0][j] = j;
        }
        for (i = 1; i <= n; i++) { // walk through str1
            ch1 = str1.charAt(i - 1);
            // compare against every character of str2
            for (j = 1; j <= m; j++) {
                ch2 = str2.charAt(j - 1);
                if (ch1 == ch2) {
                    temp = 0;
                } else {
                    temp = 1;
                }
                // minimum of top + 1, left + 1, and top-left + temp
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + temp);
            }
        }
        return d[n][m];
    }

    public double sim(String str1, String str2) {
        int maxLen = Math.max(str1.length(), str2.length());
        if (maxLen == 0) {
            return 1.0; // both strings are empty: treat as identical and avoid 0/0 (NaN)
        }
        int ld = ld(str1, str2);
        return 1 - (double) ld / maxLen;
    }

    public static void main(String[] args) {
        Similarity s = new Similarity();
        String str1 = "chenlb.blogjava.net";
        String str2 = "chenlb.javaeye.com";
        System.out.println("ld=" + s.ld(str1, str2));
        System.out.println("sim=" + s.sim(str1, str2));
    }
}
I am not sure whether the formula in the sim method is reasonable; personally I think it is just about acceptable. ^_^
Reference: http://www.merriampark.com/ld.htm
