平時工作中可能會遇到當試圖對庫表中的某一列或幾列創建唯一索引時,系統提示 ORA-01452 :不能創建唯一索引,發現重復記錄。

下面總結一下幾種查找和刪除重復記錄的方法(以表CZ為例):

表CZ的結構如下:

SQL> desc cz
            Name Null? Type
            -----------------------------------------
            C1 NUMBER(10)
            C10 NUMBER(5)
            C20 VARCHAR2(3)

刪除重復記錄的方法原理:

(1).在Oracle中,每一條記錄都有一個rowid,rowid在整個數據庫中是唯一的,rowid確定了每條記錄是在Oracle中的哪一個數據文件、塊、行上。

(2).在重復的記錄中,可能所有列的內容都相同,但rowid不會相同,所以只要確定出重復記錄中那些具有最大rowid的就可以了,其余全部刪除。

重復記錄判斷的標準是:

C1,C10和C20這三列的值都相同才算是重復記錄。

經查看表CZ總共有16條記錄:

SQL>set pagesize 100
            SQL>select * from cz;
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            1 2 dsf
            1 2 dsf
            1 2 dsf
            2 3 che
            1 2 dsf
            1 2 dsf
            1 2 dsf
            1 2 dsf
            2 3 che
            2 3 che
            2 3 che
            2 3 che
            3 4 dff
            3 4 dff
            3 4 dff
            4 5 err
            5 3 dar
            6 1 wee
            7 2 zxc
            20 rows selected.

1.查找重復記錄的幾種方法:

(1).SQL>select * from cz group by c1,c10,c20 having count(*) >1;
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            2 3 che
            3 4 dff
            (2).SQL>select distinct * from cz;
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            2 3 che
            3 4 dff
            (3).SQL>select * from cz a where rowid=(select max(rowid)
            from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            2 3 che
            3 4 dff

2.刪除重復記錄的幾種方法:

(1).適用于有大量重復記錄的情況(在C1,C10和C20列上建有索引的時候,用以下語句效率會很高):

SQL>delete cz where (c1,c10,c20) in (select c1,c10,
            c20 from cz group by c1,c10,c20 having count(*)>1) and rowid not in
            (select min(rowid) from cz group by c1,c10,c20 having count(*)>1);
            SQL>delete cz where rowid not in
            (select min(rowid) from cz group by c1,c10,c20);

(2).適用于有少量重復記錄的情況(注意,對于有大量重復記錄的情況,用以下語句效率會很低):

SQL>delete from cz a where a.rowid!=(select max(rowid)
            from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);
            SQL>delete from cz a where a.rowid<(select max(rowid)
            from cz b where a.c1=b.c1 and a.c10=b.c10 and a.c20=b.c20);
            SQL>delete from cz a where rowid <(select max(rowid)
            from cz where c1=a.c1 and c10=a.c10 and c20=a.c20);

(3).適用于有少量重復記錄的情況(臨時表法):

SQL>create table test as select distinct * from cz;
            (建一個臨時表test用來存放重復的記錄)
            SQL>truncate table cz; (清空cz表的數據,但保留cz表的結構)
            SQL>insert into cz select * from test;
            (再將臨時表test里的內容反插回來)

(4).適用于有大量重復記錄的情況(Exception into 子句法):

采用alter table 命令中的 Exception into 子句也可以確定出庫表中重復的記錄。這種方法稍微麻煩一些,為了使用“excepeion into ”子句,必須首先創建 EXCEPTIONS 表。創建該表的 SQL 腳本文件為 utlexcpt.sql 。對于win2000系統和 UNIX 系統, Oracle 存放該文件的位置稍有不同,在win2000系統下,該腳本文件存放在$ORACLE_HOMEOra90rdbmsadmin 目錄下;而對于 UNIX 系統,該腳本文件存放在$ORACLE_HOME/rdbms/admin 目錄下。

具體步驟如下:

SQL>@?/rdbms/admin/utlexcpt.sql
            Table created.
            SQL>desc exceptions
            Name Null? Type
            -----------------------------------------
            ROW_ID ROWID
            OWNER VARCHAR2(30)
            TABLE_NAME VARCHAR2(30)
            CONSTRAINT VARCHAR2(30)
            SQL>alter table cz add constraint
            cz_unique unique(c1,c10,c20) exceptions into exceptions;
            *
            ERROR at line 1:
            ORA-02299: cannot validate (TEST.CZ_UNIQUE) - duplicate keys found
            SQL>create table dups as select
            * from cz where rowid in (select row_id from exceptions);
            Table created.
            SQL>select * from dups;
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            1 2 dsf
            1 2 dsf
            1 2 dsf
            2 3 che
            1 2 dsf
            1 2 dsf
            1 2 dsf
            1 2 dsf
            2 3 che
            2 3 che
            2 3 che
            2 3 che
            3 4 dff
            3 4 dff
            3 4 dff
            16 rows selected.
            SQL>select row_id from exceptions;
            ROW_ID
            ------------------
            AAAHD/AAIAAAADSAAA
            AAAHD/AAIAAAADSAAB
            AAAHD/AAIAAAADSAAC
            AAAHD/AAIAAAADSAAF
            AAAHD/AAIAAAADSAAH
            AAAHD/AAIAAAADSAAI
            AAAHD/AAIAAAADSAAG
            AAAHD/AAIAAAADSAAD
            AAAHD/AAIAAAADSAAE
            AAAHD/AAIAAAADSAAJ
            AAAHD/AAIAAAADSAAK
            AAAHD/AAIAAAADSAAL
            AAAHD/AAIAAAADSAAM
            AAAHD/AAIAAAADSAAN
            AAAHD/AAIAAAADSAAO
            AAAHD/AAIAAAADSAAP
            16 rows selected.
            SQL>delete from cz where rowid in
            ( select row_id from exceptions);
            16 rows deleted.
            SQL>insert into cz select distinct * from dups;
            3 rows created.
            SQL>select *from cz;
            C1 C10 C20
            ---------- ---------- ---
            1 2 dsf
            2 3 che
            3 4 dff
            4 5 err
            5 3 dar
            6 1 wee
            7 2 zxc
            7 rows selected.

從結果里可以看到重復記錄已經刪除。