Decode360's Blog

          業精于勤而荒于嬉 QQ:150355677 MSN:decode360@hotmail.com

            BlogJava :: 首頁 :: 新隨筆 :: 聯系 ::  :: 管理 ::
            397 隨筆 :: 33 文章 :: 29 評論 :: 0 Trackbacks
          取重復記錄中的任一條的分析
          ?
          ??? 最近一直考慮一個問題,找出表中同一字段重復記錄中的任意一條,照理來說這樣的一個需求,在Oracle內部進行實現是很方便的,而且不需要對表進行2遍的掃描。但是事實上我想了很久也向不出來有什么函數可以直接實現這一功能,基本上所有可以這樣做的方法都需要進行嵌套才能完成。不知道為什么Oracle沒有提供這個功能,也許是有什么邏輯矛盾我沒有想到。現在總結一下,模擬的環境如下:
          ?
          ??? create table t1_t(a int,b varchar2(50));
          ??? insert into t1_t(a,b) (select mod(rownum,50) rn,object_name from user_objects);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? insert into t1_t(a,b) (select * from t1_t);
          ??? commit;
          ?
          ??? 至此t1_t表大致上有300萬條記錄左右,其中a字段的取值都是0-49之間的integer,現在的目標就是取到50條記錄,a分別等于0~49,具體哪一條都可以,但需要同時取出b字段。
          ?
          ?
          一、方法大致有兩種:
          ?
          1、使用row_number() over函數找到首條記錄選出
          ?
          ??? select * from
          ??? (select a,b,row_number() over(partition by a order by 1) k from t1_t)
          ??? where k=1;
          ?
          ??? 實際執行時間:在軟解析的情況下需要3s左右。執行計劃如下:
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (SORT PUSHED RANK) (Cost=32068 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3255 Card=3632040 Bytes=101697120)
          ?
          Statistics
          ----------------------------------------------------------
          ??????? 165? recursive calls
          ???????? 22? db block gets
          ????? 17303? consistent gets
          ?????? 1937? physical reads
          ????? 75904? redo size
          ?????? 2682? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 4? sorts (memory)
          ????????? 1? sorts (disk)
          ???????? 50? rows processed
          ?
          2、兩層關聯,用rowid別名進行匹配:
          ?
          ??? select t1.* from t1_t t1,(select min(rowid) rid from t1_t group by a) t2
          ??? where t1.rowid = t2.rid;
          ?
          ??? 實際執行時間:軟解析情況下1.5s左右。執行計劃如下,比上面的方法要好很多,主要是由于不需要進行排序的操作:
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3919 Card=50 Bytes=2000)
          ?? 1??? 0?? NESTED LOOPS (Cost=3919 Card=50 Bytes=2000)
          ?? 2??? 1???? VIEW (Cost=3869 Card=50 Bytes=600)
          ?? 3??? 2?????? HASH (GROUP BY) (Cost=3869 Card=50 Bytes=750)
          ?? 4??? 3???????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3243 Card=3632040 Bytes=54480600)
          ?? 5??? 1???? TABLE ACCESS (BY USER ROWID) OF 'T1_T' (TABLE) (Cost=1 Card=1 Bytes=28)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ????? 16278? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2638? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ?
          二、考慮使用索引
          ?
          ?
          ??? 首先增加a、b字段的索引:
          ?

          ??? create index t1_t_a on t1_t(a);

          ??? create index t1_t_b on t1_t(b);

          ???
          ??? 要使用索引,需要將字段設置為not null,或在SQL中使用not null選項才可以:
          ?

          ??? alter table t1_t modify a not null ;

          ??? alter table t1_t modify b not null ;

          ?
          ?
          1、對于第1種方法,執行計劃完全沒變,執行效率略有提高
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=32068 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (SORT PUSHED RANK) (Cost=32068 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (FULL) OF 'T1_T' (TABLE) (Cost=3255 Card=3632040 Bytes=101697120)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 2? recursive calls
          ???????? 22? db block gets
          ????? 16228? consistent gets
          ???????? 21? physical reads
          ????????? 0? redo size
          ?????? 2682? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 1? sorts (disk)
          ???????? 50? rows processed
          ?
          2、對于第2中方法,用INDEX FAST FULL SCAN 代替了 TABLE ACCESS FULL SCAN,執行效率也略有提高
          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2114 Card=50 Bytes=2000)
          ?? 1??? 0?? NESTED LOOPS (Cost=2114 Card=50 Bytes=2000)
          ?? 2??? 1???? VIEW (Cost=2064 Card=50 Bytes=600)
          ?? 3??? 2?????? HASH (GROUP BY) (Cost=2064 Card=50 Bytes=750)
          ?? 4??? 3???????? INDEX (FAST FULL SCAN) OF 'T1_T_A' (INDEX) (Cost=1439 Card=3632040 Bytes=54480600)
          ?? 5??? 1???? TABLE ACCESS (BY USER ROWID) OF 'T1_T' (TABLE) (Cost=1 Card=1 Bytes=28)

          Statistics
          ----------------------------------------------------------
          ??????? 210? recursive calls
          ????????? 0? db block gets
          ?????? 7145? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2638? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 5? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ?
          三、強制使用索引
          ?
          1、對于方法1,使用索引t1_t_a
          ?

          ??? select * from

          ??? ( select /*+INDEX(t1_t t1_t_a)*/ a,b,row_number() over( partition by a order by 1 ) k from t1_t)

          ??? where k= 1 ;


          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=844081 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=844081 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=844081 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? TABLE ACCESS (BY INDEX ROWID) OF 'T1_T' (TABLE) (Cost=815268 Card=3632040Bytes=101697120)
          ?
          ?? 4??? 3???????? INDEX (FULL SCAN) OF 'T1_T_A' (INDEX) (Cost=7154 Card=3632040)

          Statistics
          ----------------------------------------------------------
          ????????? 0? recursive calls
          ????????? 0? db block gets
          ???? 814027? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2817? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ??? 相比沒有使用索引時的效率還要低的多,因為是全表掃描,所以使用全索引掃描造成了額外的開銷。
          ?
          2、對于方法2:
          ?
          ??? 將INDEX FAST FULL SCAN 改成INDEX FULL SCAN 明顯沒有什么意義。
          ?
          ?
          四、a,b的聯合索引:
          ?
          ??? 首先創建a,b的聯合索引:
          ??? createindex t1_t_ab on t1_t(a,b);
          ?
          1、對方法1,使用索引t1_t_ab
          ?
          ?

          ??? select * from

          ??? ( select /*+INDEX(t1_t t1_t_ab)*/ a,b,row_number() over( partition by a order by 1 ) k from t1_t)

          ??? where k= 1 ;

          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=48327 Card=3632040 Bytes=192498120)
          ?? 1??? 0?? VIEW (Cost=48327 Card=3632040 Bytes=192498120)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=48327 Card=3632040 Bytes=101697120)
          ?? 3??? 2?????? INDEX (FULL SCAN) OF 'T1_T_AB' (INDEX) (Cost=19514 Card=3632040 Bytes=1016
          ????????? 97120)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ????? 19368? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ?????? 2783? bytes sent via SQL*Net to client
          ??????? 537? bytes received via SQL*Net from client
          ????????? 5? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ???????? 50? rows processed
          ?
          ??? 相比沒有使用索引的時候,性能有所降低,但是比使用t1_t_a索引要好的多了,但是如果對于a字段進行限制時,row_number() over函數會自動找到t1_t_ab索引,效果會很好
          ?

          ??? select * from

          ??? ( select a,b,row_number() over( partition by a order by 1 ) k from t1_twhere a='1' )

          ??? where k= 1 ;

          ?
          Execution Plan
          ----------------------------------------------------------
          ?? 0????? SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1119 Card=84004 Bytes=4452212)
          ?? 1??? 0?? VIEW (Cost=1119 Card=84004 Bytes=4452212)
          ?? 2??? 1???? WINDOW (NOSORT) (Cost=1119 Card=84004 Bytes=2352112)
          ?? 3??? 2?????? INDEX (RANGE SCAN) OF 'T1_T_AB' (INDEX) (Cost=453 Card=84004 Bytes=2352112)
          ?
          Statistics
          ----------------------------------------------------------
          ????????? 1? recursive calls
          ????????? 0? db block gets
          ??????? 410? consistent gets
          ????????? 0? physical reads
          ????????? 0? redo size
          ??????? 537? bytes sent via SQL*Net to client
          ??????? 504? bytes received via SQL*Net from client
          ????????? 2? SQL*Net roundtrips to/from client
          ????????? 0? sorts (memory)
          ????????? 0? sorts (disk)
          ????????? 1? rows processed
          ?
          ?
          五、總結
          ?
          ??? 基本上如果是要在全表中找到所有a的單條記錄,那么加不加索引的區別都不大,因為全表掃描是沒有必要走索引的,但是用方法2會比方法1要快速很多,資源占用也較少,原因是不需要排序的環節。
          ?
          ??? 如果只是想要取出單個a值的任意記錄,則可以選用方法1,并建立a,b字段的聯合索引,會大大提高效率。
          ?
          ?
          ?
          posted on 2009-03-04 21:57 decode360 閱讀(252) 評論(0)  編輯  收藏 所屬分類: 05.SQL
          主站蜘蛛池模板: 连州市| 北海市| 丽江市| 光泽县| 马鞍山市| 曲周县| 日喀则市| 射洪县| 民和| 建德市| 天门市| 张家川| 嵊泗县| 色达县| 隆子县| 澎湖县| 桦南县| 锡林郭勒盟| 邛崃市| 肥城市| 临武县| 谢通门县| 金川县| 景德镇市| 客服| 萨迦县| 郧西县| 神池县| 达尔| 牙克石市| 平顺县| 乐平市| 石河子市| 万宁市| 密云县| 西乌珠穆沁旗| 麻江县| 论坛| 横峰县| 荆门市| 新化县|