Decode360's Blog

          業精于勤而荒于嬉 QQ:150355677 MSN:decode360@hotmail.com

            BlogJava :: 首頁 :: 新隨筆 :: 聯系 ::  :: 管理 ::
            302 隨筆 :: 26 文章 :: 82 評論 :: 0 Trackbacks
          取重復記錄中的任一條的分析
          ?
          ?
          ??? 最近一直考慮一個問題,找出表中同一字段重復記錄中的任意一條,模擬的環境如下:
          ?
          ??? create table t1_t(a int,b varchar2(50));
          ??? insert into t1_t(a,b) (select mod(rownum,50) rn,object_name from user_objects);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? commit;
          ?
          ??? 至此t1_t表大致上有300萬條記錄左右,其中a字段的取值都是0-49之間的integer,現在的目標就是取到50條記錄,a分別等于0~49,具體哪一條都可以,但需要同時取出b字段。
          ?
          ?
          一、方法大致有兩種:
          ?
          1、使用row_number() over函數找到首條記錄選出
          ?
          ??? select * from?
          ??? (select a,b,row_number() over(partition by a order by 1) k from t1_t)
          ??? where k=1;
          ?
          ??? 實際執行時間:在軟解析的情況下需要3s左右。執行計劃如下:
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (SORT PUSHED RANK) (Cost=32068 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3255 Card=3632040 Bytes=101697120)
          ?
          Statistics
          ----------------------------------------------------------
          ??????? 165? recursive calls
          ???????? 22? db block gets
          ????? 17303? consistent gets
          ?????? 1937? physical reads
          ????? 75904? redo size
          ?????? 2682? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 4? sorts (memory)
          ????????? 1? sorts (disk)
          ???????? 50? rows processed
          ?
          2、兩層關聯,用rowid別名進行匹配:
          ?
          ??? select t1.* from t1_t t1,(select min(rowid) rid from t1_t group by a) t2
          ??? where t1.rowid = t2.rid;
          ?
          ??? 實際執行時間:軟解析情況下1.5s左右。執行計劃如下,比上面的方法要好很多,主要是由于不需要進行排序的操作:
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3919 Card=50 Bytes=2000)
          ?? 1??? 0?? NESTED LOOPS (Cost=3919 Card=50 Bytes=2000)
          ?? 2??? 1???? VIEW (Cost=3869 Card=50 Bytes=600)
          ?? 3??? 2?????? HASH (GROUP BY) (Cost=3869 Card=50 Bytes=750)
          ?? 4??? 3???????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3243 Card=3632040 Bytes=54480600)
          ?? 5??? 1???? TABLE ACCESS (BY USER ROWID) OF 'T1_T' (TABLE) (Cost=1 Card=1 Bytes=28)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ????? 16278? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2638? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ?
          二、考慮使用索引
          ?
          ?
          ??? 首先增加a、b字段的索引:
          ?

          ??? create index t1_t_a on t1_t(a);

          ??? create index t1_t_b on t1_t(b);

          ???
          ??? 要使用索引,需要將字段設置為not null,或在SQL中使用not null選項才可以:
          ?

          ??? alter table t1_t modify a not null ;

          ??? alter table t1_t modify b not null ;

          ?
          ?
          1、對于第1種方法,執行計劃完全沒變,執行效率略有提高
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (SORT PUSHED RANK) (Cost=32068 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3255 Card=3632040 Bytes=101697120)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 2? recursive calls
          ???????? 22? db block gets
          ????? 16228? consistent gets
          ???????? 21? physical reads
          ????????? 0? redo size
          ?????? 2682? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 1? sorts (disk)
          ???????? 50? rows processed
          ?
          2、對于第2中方法,用INDEX FAST FULL SCAN 代替了 TABLE ACCESS FULL SCAN,執行效率也略有提高
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2114 Card=50 Bytes=2000)
          ?? 1??? 0?? NESTED LOOPS (Cost=2114 Card=50 Bytes=2000)
          ?? 2??? 1???? VIEW (Cost=2064 Card=50 Bytes=600)
          ?? 3??? 2?????? HASH (GROUP BY) (Cost=2064 Card=50 Bytes=750)
          ?? 4??? 3???????? INDEX (FAST FULL SCAN) OF 'T1_T_A' (INDEX) (Cost=1439 Card=3632040 Bytes=54480600)
          ?? 5??? 1???? TABLE ACCESS (BY USER ROWID) OF 'T1_T' (TABLE) (Cost=1 Card=1 Bytes=28)

          Statistics
          ----------------------------------------------------------
          ??????? 210? recursive calls
          ????????? 0? db block gets
          ?????? 7145? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2638? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 5? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ?
          三、強制使用索引
          ?
          1、對于方法1,使用索引t1_t_a
          ?

          ??? select * from

          ??? ( select /*+INDEX(t1_t t1_t_a)*/ a,b,row_number() over( partition by a order by 1 ) k from t1_t)

          ??? where k= 1 ;


          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=844081 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=844081 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=844081 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (BY INDEX ROWID) OF 'T1_T' (TABLE) (Cost=815268 Card=3632040Bytes=101697120)
          ?
          ?? 4??? 3???????? INDEX (FULL SCAN) OF 'T1_T_A' (INDEX) (Cost=7154 Card=3632040)

          Statistics
          ----------------------------------------------------------
          ????????? 0? recursive calls
          ????????? 0? db block gets
          ???? 814027? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2817? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ??? 相比沒有使用索引時的效率還要低的多,因為是全表掃描,所以使用全索引掃描造成了額外的開銷。
          ?
          2、對于方法2:
          ?
          ??? 將INDEX FAST FULL SCAN 改成INDEX FULL SCAN 明顯沒有什么意義。
          ?
          ?
          四、a,b的聯合索引:
          ?
          ??? 首先創建a,b的聯合索引:
          ??? createindex t1_t_ab on t1_t(a,b);
          ?
          1、對方法1,使用索引t1_t_ab
          ?
          ?

          ??? select * from

          ??? ( select /*+INDEX(t1_t t1_t_ab)*/ a,b,row_number() over( partition by a order by 1 ) k from t1_t)

          ??? where k= 1 ;

          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=48327 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=48327 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=48327 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? INDEX (FULL SCAN) OF 'T1_T_AB' (INDEX) (Cost=19514 Card=3632040 Bytes=1016
          ????????? 97120)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ????? 19368? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2783? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ??? 相比沒有使用索引的時候,性能有所降低,但是比使用t1_t_a索引要好的多了,但是如果對于a字段進行限制時,row_number() over函數會自動找到t1_t_ab索引,效果會很好
          ?

          ??? select * from

          ??? ( select a,b,row_number() over( partition by a order by 1 ) k from t1_twhere a='1' )

          ??? where k= 1 ;

          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1119 Card=84004 Bytes=4452212)
          ?? 1??? 0?? VIEW (Cost=1119 Card=84004 Bytes=4452212)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=1119 Card=84004 Bytes=2352112)
          ?? 3??? 2?????? INDEX (RANGE SCAN) OF 'T1_T_AB' (INDEX) (Cost=453 Card=84004 Bytes=2352112)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ??????? 410? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ??????? 537? bytes sent via SQL*Net to client
          ??????? 504? bytes received via SQL*Net from client
          ????????? 2? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ????????? 1? rows processed
          ?
          ?
          五、總結
          ?
          ??? 基本上如果是要在全表中找到所有a的單條記錄,那么加不加索引的區別都不大,因為全表掃描是沒有必要走索引的,但是用方法2會比方法1要快速很多,資源占用也較少,原因是不需要排序的環節。
          ?
          ??? 如果只是想要取出單個a值的任意記錄,則可以選用方法1,并建立a,b字段的聯合索引,會大大提高效率。
          ?
          ?




          -The End-

          posted on 2009-03-04 21:57 decode360-3 閱讀(390) 評論(0)  編輯  收藏 所屬分類: SQL Dev
          主站蜘蛛池模板: 正镶白旗| 长阳| 金堂县| 陕西省| 嘉义市| 南昌县| 榆中县| 湟源县| 冀州市| 酒泉市| 韶山市| 扶沟县| 墨江| 上杭县| 三台县| 左权县| 从化市| 斗六市| 开原市| 三江| 江口县| 商南县| 安岳县| 图木舒克市| 云阳县| 柏乡县| 太谷县| 巴彦淖尔市| 克东县| 扬州市| 象山县| 仪陇县| 沾化县| 木兰县| 青阳县| 于田县| 泉州市| 南川市| 玉树县| 康平县| 五大连池市|