凉凉钢琴简谱数字右手:Oracle 如何在亿级记录表中创建索引

来源:百度文库 编辑:偶看新闻 时间:2024/05/09 13:27:15
Oracle 如何在亿级记录表中创建索引
  1. 查看表的具体情况

  查看是不是分区表,有多少个分区、分区字段:

  SQL> col table_name for a20

  SQL> col column_name for a20

  SQL> select a.table_name,a.partitioned,b.partition_count,c.column_name

   2 from user_tables a, user_part_tables b, user_part_key_columns c

   3 where a.table_name="STAT_SUBMIT_CENTER"

   4 and b.table_name="STAT_SUBMIT_CENTER"

   5 and c.name="STAT_SUBMIT_CENTER";

  TABLE_NAME PAR PARTITION_COUNT COLUMN_NAME

  -------------------- --- --------------- --------------------

  STAT_SUBMIT_CENTER YES 50 MSGDATE

  查看已使用的每个分区的大小:

  SQL> select segment_name,partition_name,round(bytes/1024/1024) from user_segments

  where segment_name ="STAT_SUBMIT_CENTER" and bytes/1024/1024>0.25 order by 3 desc;

  SEGMENT_NAME PARTITION_NAME

  SEGMENT_NAME PARTITION_NAME ROUND(BYTES/1024/1024)

  -------------------------- ------------------------------ ----------------------

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051101 1722

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051021 1488

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051111 1440

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051121 1355

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051221 1335

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20050Array11 130Array

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051211 1253

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051201 1247

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20050Array21 11Array8

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20060101 1151

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20060111 1068

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051001 1018

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20051011 865

  STAT_SUBMIT_CENTER STAT_SUBMIT_CENTER_20060121 7Array6

  14 rows selected.

  查看整个表的大小:

  SQL> select segment_name,sum(bytes/1024/1024) from user_segments

  where segment_name ="STAT_SUBMIT_CENTER" group by segment_name;

  SEGMENT_NAME

  SEGMENT_NAME SUM(BYTES/1024/1024)

  -------------------------------- --------------------

  STAT_SUBMIT_CENTER 17234

  查看表的记录数:

  SQL> set timing on

  SQL> select count(*) from STAT_SUBMIT_CENTER;

  COUNT(*)

  ----------

  170341007

  Elapsed: 00:14:18.60

  查看这个表上的索引情况如下:

  table STAT_SUBMIT_CENTER 17234 M

  index IDX_SUBCEN_ADDRUSER 5155 M ADDRUSER

   PK_STAT_SUBMIT_CENTER 10653 M MSGDATE,ADDRUSER,MSGID

  然后,查看一些数据库参数情况:

  SQL> show parameter work

  NAME TYPE VALUE

  NAME TYPE VALUE

  ------------------------------------ ----------- ------------------------------

  workarea_size_policy string AUTO

  SQL> show parameter pga

  NAME TYPE VALUE

  ------------------------------------ ----------- ------------------------------

  pga_aggregate_target big integer 20Array715200

  SQL> select * from dba_temp_files;

  FILE_NAME

  ------------------------------------------------------------------------------------------------------------------------

  FILE_ID TABLESPACE_NAME BYTES BLOCKS STATUS RELATIVE_FNO AUT MAXBYTES MAXBLOCKS

  ---------- ------------------------------ ---------- ---------- --------- ------------ --- ---------- ----------

  INCREMENT_BY USER_BYTES USER_BLOCKS

  ------------ ---------- -----------

  /bgdata/oracle/temp01.dbf

  1 TEMP 3563061248 434Array44 AVAILABLE 1 YES 42Array4Array672Array6 5242886400 3562012672 434816

  2. 需要考虑的几个方面

  1)创建的索引需要几个G的磁盘空间。

  2)创建索引需要排序,使用pga_aggregate_target,要把这个值从200M加大到2G。

  3)如果内存不够,需要temp表空间,则要把temp表空间加大到8G――itpub上有一个帖子说过,15亿条记录用了34G空间。

  4)在线创建,时间会比较长。讨论后,停止这个表的操作,非online创建。

  3. 实际操作过程

  1)数据文件够,不扩展;temp数据文件扩展:

  alter database tempfile "/bgdata/oracle/temp01.dbf" resize 81Array2m;

  2)在workarea_size_policy=AUTO的情况下,改pga_aggregate_target=2048m。对于串行操作,一个session能使用的pga=MIN(5%PGA_AGGREGATE_TARGET,100MB),这样可以使得pga用到最大的值:

  alter system set pga_aggregate_target=2048m;

  3)因为这是一个比较长的过程,所以写脚本让后台运行:

  nohup time createind.sh &

  vi createind.sh

  #!/bin/sh

  sqlplus user/password <<>

  create index IDX_SUBMIT_RECORDTIME on STAT_SUBMIT_CENTER(RECORDTIME) local;

  exit

  EOF

  4)创建过程中可以观察v$sort_segment,v$sort_usage看排序情况:

  nohup time createind.sh &

  vi createind.sh

  #!/bin/sh

  sqlplus user/password <<>

  create index IDX_SUBMIT_RECORDTIME on STAT_SUBMIT_CENTER(RECORDTIME) local;

  exit

  EOF

  5)创建完成后,把tempfile和pga_aggregate_target改回原值:

  alter database tempfile "/bgdata/oracle/temp01.dbf" resize 40Array6m;

  alter system set pga_aggregate_target=500m;

  4. 实际创建过程中观察到的情况

  1)开始之前:

  SQL> select tablespace_name,current_users,total_blocks,used_blocks,free_blocks from v$sort_segment;

  TABLESPACE_NAME CURRENT_USERS TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS

  ------------------------------- ------------- ------------ ----------- -----------

  TEMP 0 431360 0 431360

  SQL> select * from v$sort_usage;

  no rows selected

  2)创建之初,抓到这么一条sql:

  insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtime,st

  ime,status,remoteowner,linkname,subname,dataobj#,flags,oid$,spar

  e1,spare2)values(:1,:2,:3,:4,:5,:6,:7,:8,:Array,:10,:11,:12,:13,:14,

  :15,:16, :17)

  3)然后v$sort_segment.USED_BLOCKS变大,v$sort_usage.BLOCKS变大,一直增长到:

  SQL> select tablespace_name,current_users,total_blocks,used_blocks,free_blocks from v$sort_segment;

  TABLESPACE_NAME CURRENT_USERS TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS

  ------------------------------- ------------- ------------ ----------- -----------

  TEMP 1 431360 46720 384640

  SQL> select * from v$sort_usage;

  USERNAME USER SESSION_ADDR SESSION_NUM SQLADDR SQLHASH

  ------------------------------ ------------------------------ ---------------- ----------- ---------------- ----------

  TABLESPACE CONTENTS SEGTYPE SEGFILE# SEGBLK# EXTENTS BLOCKS SEGRFNO#

  ------------------------------- --------- --------- ---------- ---------- ---------- ---------- ----------

  DPC DPC 00000003Array74CFFB0 6134 00000003ArrayArrayCAB288 1254Array50678

  TEMP TEMPORARY SORT 201 431113 365 46720 1

  这个过程中抓到的sql为:

  select file# from file$ where ts#=:1

  4)v$sort_segment.USED_BLOCKS变为0,v$sort_usage.BLOCKS变为0。

  5)重复3,4两步,估计这个是创建一个分区的索引。

  需要解释一下的是,上面的sql只是我随机抓到的运行时间比较长的,整个create index过程会复杂很多,具体怎么样可以用sqltrace跟踪。这里主要看的是temp表空间的使用情况。

  同时,在创建的过程中:

  SQL> select segment_name,partition_name from user_segments where segment_name="IDX_SUBMIT_RECORDTIME";

  no rows selected

  SQL> select index_name,partition_name from user_ind_partitions where INDEX_NAME="IDX_SUBMIT_RECORDTIME";

  no rows selected

  当时忘了查user_segments中其实是有一个segment_name为一串数字的记录,那个才是正在创建的索引;如果这个事务失败了,将回滚。

  最后耗时ArrayArray分钟完成。

  5. 创建完成后分析索引

  但是接下来还有一件事。创建完成后要分析索引,否则就是走了索引,查询速度也会很慢。

  SQL> explain plan for select count(*) from stat_submit_center where recordtime>trunc(sysdate);

  Explained.

  SQL> @?/rdbms/admin/utlxplp.sql

  PLAN_TABLE_OUTPUT

  ------------------------------------------------------------------------------------------------------------------------

  -------------------------------------------------------------------------------------------------

  | Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |

  -------------------------------------------------------------------------------------------------

  | 0 | SELECT STATEMENT | | 1 | Array | 4 | | |

  | 1 | SORT AGGREGATE | | 1 | Array | | | |

  | 2 | PARTITION RANGE ALL | | | | | 1 | 50 |

  |* 3 | INDEX FAST FULL SCAN| IDX_SUBMIT_RECORDTIME | 8878K| 76M| 4 | 1 | 50 |

  -------------------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):

  ---------------------------------------------------

  3 - filter("STAT_SUBMIT_CENTER"."RECORDTIME">TRUNC(SYSDATE@!))

  Note: cpu costing is off

  16 rows selected.

  SQL> set autotrace on explain

  SQL> set timing on

  SQL> select count(*) from stat_submit_center where recordtime>trunc(sysdate);

  aa^Cselect count(*) from stat_submit_center where recordtime>trunc(sysdate)

  *

  ERROR at line 1:

  ORA-01013: user requested cancel of current operation

  Elapsed: 00:11:4Array.85

  SQL>

  SQL> set autotrace off

  上面可以看到,因为没有分析索引,虽然它走的是新建的IDX_SUBMIT_RECORDTIME索引,但是查询速度很慢,10分钟后也没有结果。下面我们分析一下:

  SQL> Analyze index IDX_SUBMIT_RECORDTIME estimate statistics;

  Index analyzed.

  Elapsed: 00:00:06.84

  SQL> set autotrace on explain

  SQL> select count(*) from stat_submit_center where recordtime>trunc(sysdate);

  COUNT(*)

  ----------

  Array26736

  Elapsed: 00:00:05.37

  Execution Plan

  ----------------------------------------------------------

   0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4360 Card=1 Bytes=Array)

   1 0 SORT (AGGREGATE)

   2 1 PARTITION RANGE (ALL)

   3 2 INDEX (RANGE SCAN) OF "IDX_SUBMIT_RECORDTIME" (NON-UNI

   QUE) (Cost=4360 Card=8878740 Bytes=7ArrayArray08660)

  SQL> set autotrace off

  索引分析之后,查询时间为5分钟左右,效率大大提高。

  至此,完成全部操作。

  作者简介:柔嘉维则;作者Email地址为baobaoc@hotmail.com;作者Blog为http://spaces.msn.com/roujiaweize/(e12Array)