前兩天看了一個文本比較的算法,算法的思路我就不多說了,主要說下我的實現。算法參考:
文本比較算法剖析(1)-如何確定最大匹配率
文本比較算法剖析(2)-如何確定最優匹配路徑
我的實現步驟是:
1、計算出所有可行的路徑
如下圖中,N(l,r)所在的位置如果該位置匹配,則肯定要走A區。在掃描A區所有的可行路徑,可行路徑的標準是下一個可行并且匹配的點,在這一步先不考慮不普配的點。

關鍵代碼 :
public List<CharPoint> getNextPoints(boolean[][] comps, CharPoint p) {
if (comps == null || p == null)
throw new IllegalArgumentException("參數不能為空。");
int maxY = 0;
List<CharPoint> cps = new ArrayList<CharPoint>();
for (int i = p.getX() + 1; i < comps.length; i++) {
if (maxY == 0 && p.getX() + 1 == i) {
maxY = comps[i].length - 1;
}
int maxJ = maxY;
boolean bo = false;
for (int j = p.getY() + 1; j <= maxJ; j++) {
if (comps[i][j]) {
if (!bo) {
bo = true;
maxY = j;
}
CharPoint cp = new CharPoint(i, j);
cp.setFromPoint(p);
cps.add(cp);
}
}
}
return cps;
}
if (comps == null || p == null)
throw new IllegalArgumentException("參數不能為空。");
int maxY = 0;
List<CharPoint> cps = new ArrayList<CharPoint>();
for (int i = p.getX() + 1; i < comps.length; i++) {
if (maxY == 0 && p.getX() + 1 == i) {
maxY = comps[i].length - 1;
}
int maxJ = maxY;
boolean bo = false;
for (int j = p.getY() + 1; j <= maxJ; j++) {
if (comps[i][j]) {
if (!bo) {
bo = true;
maxY = j;
}
CharPoint cp = new CharPoint(i, j);
cp.setFromPoint(p);
cps.add(cp);
}
}
}
return cps;
}
2、計算出最大匹配的路徑
獲得所有可行的路徑后再對所有路徑的節點數進行判斷,節點最多的則是最大匹配路徑。但有可能有幾個最大匹配路徑。
關鍵代碼:
for (CharPoint cp : map.keySet()) {
// System.out.println(map.get(cp));
if (map.get(cp) == max) {
cps.add(cp);
}
}
for (CharPoint cp : cps) {
cp = TextComparerUtil.copyCharPoint(cp);
while (cp.getFromPoint() != null) {
cp.getFromPoint().setNext(cp);
cp = cp.getFromPoint();
}
cps2.add(cp);
}
在計算完最大匹配路徑后在對路徑進行補充。使路徑完整// System.out.println(map.get(cp));
if (map.get(cp) == max) {
cps.add(cp);
}
}
for (CharPoint cp : cps) {
cp = TextComparerUtil.copyCharPoint(cp);
while (cp.getFromPoint() != null) {
cp.getFromPoint().setNext(cp);
cp = cp.getFromPoint();
}
cps2.add(cp);
}
關鍵代碼:
public static List<CharPoint> applySetpBySetpPath(boolean[][] comps,
List<CharPoint> cps) {
for (CharPoint cp : cps) {
CharPoint next = cp.getNext();
applySetpPath(comps, cp, next);
}
return cps;
}
private static void applySetpPath(boolean[][] comps, CharPoint cp,
CharPoint next) {
while (next != null) {
if (cp.getX() >= 0 && cp.getY() >= 0
&& cp.getX() + 1 < comps.length
&& cp.getY() + 1 < comps[cp.getX()].length
&& comps[cp.getX()][cp.getY()]) {
cp.setNext(new CharPoint(cp.getX() + 1, cp.getY() + 1));
cp = cp.getNext();
}
for (int i = cp.getX() + 1; i <= next.getX(); i++) {
cp.setNext(new CharPoint(i, cp.getY()));
cp = cp.getNext();
}
for (int i = cp.getY() + 1; i <= next.getY(); i++) {
cp.setNext(new CharPoint(cp.getX(), i));
cp = cp.getNext();
}
next = next.getNext();
}
}
public static CharPoint applyToEnd(boolean[][] comps, CharPoint cp) {
while (cp.getNext() != null) {
cp = cp.getNext();
}
CharPoint next = new CharPoint(comps.length - 1, comps[0].length - 1);
applySetpPath(comps, cp, next);
return cp;
}
List<CharPoint> cps) {
for (CharPoint cp : cps) {
CharPoint next = cp.getNext();
applySetpPath(comps, cp, next);
}
return cps;
}
private static void applySetpPath(boolean[][] comps, CharPoint cp,
CharPoint next) {
while (next != null) {
if (cp.getX() >= 0 && cp.getY() >= 0
&& cp.getX() + 1 < comps.length
&& cp.getY() + 1 < comps[cp.getX()].length
&& comps[cp.getX()][cp.getY()]) {
cp.setNext(new CharPoint(cp.getX() + 1, cp.getY() + 1));
cp = cp.getNext();
}
for (int i = cp.getX() + 1; i <= next.getX(); i++) {
cp.setNext(new CharPoint(i, cp.getY()));
cp = cp.getNext();
}
for (int i = cp.getY() + 1; i <= next.getY(); i++) {
cp.setNext(new CharPoint(cp.getX(), i));
cp = cp.getNext();
}
next = next.getNext();
}
}
public static CharPoint applyToEnd(boolean[][] comps, CharPoint cp) {
while (cp.getNext() != null) {
cp = cp.getNext();
}
CharPoint next = new CharPoint(comps.length - 1, comps[0].length - 1);
applySetpPath(comps, cp, next);
return cp;
}
3、計算最優路徑
計算最優路徑的方法很簡單,按照作者的算法的介紹,計算最優路徑的方法就是最后匹配的節點離root(0,0)點最近的那條路徑
4、計算差異
計算差異也很簡單,就不介紹了,直接上代碼
TextDiff diffdel = null;
TextDiff diffins = null;
while (root != null) {
if (root.getNext() != null) {
CharPoint next = root.getNext();
if (root.getX() == next.getX() && root.getY() != next.getY()) {
if (diffdel == null) {
int start = root.getY();
if (comps[root.getX()][root.getY()]) {
start = next.getY();
}
diffdel = new TextDiff(TextDiff.TYPE_DELETED, start, 0);
}
diffdel.setDiffLength(diffdel.getDiffLength() + 1);
}
if (root.getY() == next.getY() && root.getX() != next.getX()) {
if (diffins == null) {
int start = root.getX();
if (comps[root.getX()][root.getY()]) {
start = next.getX();
}
diffins = new TextDiff(TextDiff.TYPE_INSERTED, start, 0);
}
diffins.setDiffLength(diffins.getDiffLength() + 1);
}
if (root.getX() < comps.length
&& root.getY() < comps[root.getX()].length
&& comps[next.getX()][next.getY()]) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
diffdel = null;
diffins = null;
}
}
root = root.getNext();
if (root != null && root.getNext() == null) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
}
}
TextDiff diffins = null;
while (root != null) {
if (root.getNext() != null) {
CharPoint next = root.getNext();
if (root.getX() == next.getX() && root.getY() != next.getY()) {
if (diffdel == null) {
int start = root.getY();
if (comps[root.getX()][root.getY()]) {
start = next.getY();
}
diffdel = new TextDiff(TextDiff.TYPE_DELETED, start, 0);
}
diffdel.setDiffLength(diffdel.getDiffLength() + 1);
}
if (root.getY() == next.getY() && root.getX() != next.getX()) {
if (diffins == null) {
int start = root.getX();
if (comps[root.getX()][root.getY()]) {
start = next.getX();
}
diffins = new TextDiff(TextDiff.TYPE_INSERTED, start, 0);
}
diffins.setDiffLength(diffins.getDiffLength() + 1);
}
if (root.getX() < comps.length
&& root.getY() < comps[root.getX()].length
&& comps[next.getX()][next.getY()]) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
diffdel = null;
diffins = null;
}
}
root = root.getNext();
if (root != null && root.getNext() == null) {
if (diffdel != null && diffdel.getDiffLength() != 0) {
diffs.add(diffdel);
}
if (diffins != null && diffins.getDiffLength() != 0) {
diffs.add(diffins);
}
}
}
(代碼有BUG。。。。暫停下載)
順便提前祝各位新春快樂,新年吉祥,和家安康