Oracle Startup Failure Case Study: ORA-600 [4193] Error

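Operating system: Oracle Linux 5

Database: Oracle 11gR2 (11.2.0.3.0)

I. Symptoms:

1. A test of recovering from a damaged current redo log group had just been performed.

2. When the database was started afterwards, it failed with ORA-600 [4193].

Check the alert log: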

[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
Block recovery completed at rba 5.111.16,scn 0.1430641
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc (incident=36027):
ORA-00600: internal error code,arguments: [4193],[],[]
Incident details in: /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_36027/enmoedu_pmon_10635_i36027.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Dec 13 12:54:04 2016
Dumping diagnostic data in directory=[cdmp_20161213125404],requested by (instance=1,osid=10635 (PMON)),summary=[incident=36027].
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc:
ORA-00600: internal error code,[]
PMON (ospid: 10635): terminating the instance due to error 472
System state dump requested by (instance=1,summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_diag_10653.trc
Dumping diagnostic data in directory=[cdmp_20161213125405],summary=[abnormal instance termination].
Instance terminated by PMON,pid = 10635
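
When the alert log is long, it can help to pull out only the ORA-00600 lines and their first argument, since that argument (4193 here) identifies the failing internal check. The following is a minimal shell sketch, not an Oracle tool: the function name `ora600_first_arg` is made up for illustration, and it assumes the `arguments: [N],...` layout seen in the log above.

```shell
# Print the first argument of every ORA-00600 line read from stdin.
# Hypothetical helper; lines without an "arguments:" list are skipped.
ora600_first_arg() {
    sed -n 's/^ORA-00600:.*arguments: *\[\([^]]*\)].*/\1/p'
}

# Example: count ORA-00600 occurrences per first argument.
# ora600_first_arg < alert_enmoedu.log | sort | uniq -c
```

Lines such as `ORA-00600: internal error code,[]`, where the argument list is missing, produce no output.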

Check the trace file:
[oracle@ocm1 ~]$ more /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_38544/enmoedu_mmon_3181_i38544.trc
......
----- START DDE Action: 'dumpKernelDiagState' (Sync) -----
------- Kernel Diag Dump -------
dbkcBSExt: 0
dbkedDefDump info:
Internal err count: 4
Error Flags: 0x0
Exception: FALSE
Bootstrapping info:
Flags: 0x17
Options: 0x806
Diag Dest: /u01/app/oracle
DB Unique name: enmoedu
Instance Name: enmoedu
------- END Kernel Diag Dump -------
----- END DDE Action: 'dumpKernelDiagState' (SUCCESS,0 csec) -----
----- START DDE Action: 'xdb_dump_buckets' (Sync) -----
----- END DDE Action: 'xdb_dump_buckets' (FAILURE,0 csec) -----
----- START DDE Action: 'dumpKGERing' (Sync) -----
----- END DDE Action: 'dumpKGERing' (SUCCESS,0 csec) -----
----- START DDE Action: 'dumpKGEState' (Sync) -----
kgepgtfr 0x7fffb09068d0
kgepgtba 0x7fffb09107a8
kgepgter 5
kgepgpar kgepgbpa 0xbaf48c5
kgepgepa 0xbaf5064
kgepgtfd 21
kgepgdmc 0
kgepgflg 0x8
kgepg_stkgfr (nil)
kgepgkgsmp 0xbaf3fa0
kgepgspm 4
kgepg_ba_set_in_eh 0x7fffb09114b0
kgepg_kgecatch_set_in_eh_ba (nil)
kge_ba_set_in_eh_funcloc 0x9b975bc
kge_ba_set_in_eh_fileloc 0x9b97890
------------------- start error stack dump with barriers
<error barrier> at 0x7fffb09107a8
ORA-00603: ORACLE server session terminated by fatal error
ORA-24557: error 600 encountered while handling error 600; exiting server process
ORA-00600: internal error code,[]
<error barrier> at 0x7fffb09114b0
ORA-00600: internal error code,[]
ORA-00600: internal error code,[]
<error barrier> at 0x7fffb0915ed8
------------------- end error stack dump with barriers
----- END DDE Action: 'dumpKGEState' (SUCCESS,0 csec) -----
----- START DDE Action: 'kpuActionDefault' (Sync) -----
Begin OCI Current State Dump
End OCI Current State Dump
Begin OCI Call Context Dump
End OCI Call Context Dump
Begin Process state dump.
ttcdrvdmplocation: msg-0 ln-0 reporting 0
HST is NULL or no two task connection
End Process state dump.
----- END DDE Action: 'kpuActionDefault' (SUCCESS,0 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
End of Incident Dump

According to MOS, this error is usually related to an undo segment.

II. Solution:

1. Generate a pfile from the spfile
01:55:31 SYS@ enmoedu>create pfile from spfile;
File created.

2. Edit the pfile
[oracle@ocm1 dbs]$ vi initenmoedu.ora
#*.undo_tablespace='UNDOTBS1'
undo_management = 'MANUAL'
rollback_segments = 'SYSTEM'
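
The same pfile edit can be scripted instead of done by hand in vi. The sketch below is only an illustration under stated assumptions: the function name `make_manual_undo_pfile` is hypothetical, and it assumes the `*.name=value` layout that `create pfile from spfile` produces. It writes to a separate output file so the generated pfile stays untouched.

```shell
# Write a copy of pfile "$1" to "$2" that switches the instance to manual undo.
# Hypothetical helper; it only handles lines of the form "*.name=value".
make_manual_undo_pfile() {
    src=$1
    dst=$2
    # Comment out any existing automatic-undo settings.
    sed -e 's/^\*\.undo_tablespace=/#&/' \
        -e 's/^\*\.undo_management=/#&/' "$src" > "$dst" || return 1
    # Append the two settings used in the manual edit above.
    printf '%s\n' "undo_management = 'MANUAL'" "rollback_segments = 'SYSTEM'" >> "$dst"
}

# Example (file names are placeholders):
# make_manual_undo_pfile initenmoedu.ora initenmoedu_manual.ora
```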

3. Start the instance with the pfile
01:58:48 SYS@ enmoedu>startup mount pfile='$ORACLE_HOME/dbs/initenmoedu.ora';
ORACLE instance started.
Total System Global Area 521936896 bytes
Fixed Size 2229944 bytes
Variable Size 360712520 bytes
Database Buffers 155189248 bytes
Redo Buffers 3805184 bytes
Database mounted.
Elapsed: 00:00:00.00

02:00:07 SYS@ enmoedu>show parameter undo
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string MANUAL
undo_retention integer 900
undo_tablespace string

4. Open the database
02:00:16 SYS@ enmoedu>alter database open;
Database altered.
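The database now opens normally. The ORA-01552 at the end of the log is expected while undo management is MANUAL, because only the SYSTEM rollback segment is available: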
[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
alter database open
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
read 43 KB redo,36 data blocks need recovery
Started redo application at
Thread 1: logseq 7,block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/enmoedu/redo01.log
Completed redo application of 0.03MB
Completed crash recovery at
Thread 1: logseq 7,block 90,scn 1491526
36 data blocks read,36 data blocks written,43 redo k-bytes read
Wed Dec 14 02:00:26 2016
Thread 1 advanced to log sequence 8 (thread open)
Thread 1 opened at log sequence 8
Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Dec 14 02:00:26 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:479574 end:479584 diff:10 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:00:27 2016
QMNC started with pid=20,OS id=3522
Completed: alter database open
Wed Dec 14 02:00:28 2016
Starting background process CJQ0
Wed Dec 14 02:00:28 2016
CJQ0 started with pid=26,OS id=3554
Wed Dec 14 02:00:30 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files,and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed Dec 14 02:00:51 2016
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_j001_3562.trc:
ORA-01552: cannot use system rollback segment for non-system tablespace 'TEMP'

5. Drop the original undo tablespace and create a new one
02:00:27 SYS@ enmoedu>drop tablespace undotbs1 including contents and datafiles;
Tablespace dropped.

02:03:32 SYS@ enmoedu>create undo tablespace undotbs1
02:03:39 2 datafile '/u01/app/oracle/oradata/enmoedu/undotbs01.dbf' size 100m
02:03:50 3 autoextend on;
Tablespace created.
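
Around this step it can be useful to look at the rollback and undo segments and their status, both before the drop and after the new tablespace is created. The sketch below just emits a small SQL*Plus script that queries `dba_rollback_segs`. The function name is hypothetical, and running it through `sqlplus` with OS authentication is an assumption about the environment, not something shown in the original session.

```shell
# Emit a SQL*Plus script that lists rollback/undo segments and their status.
# Hypothetical helper; pipe the output into a SYSDBA session to run it.
undo_segment_report_sql() {
    cat <<'EOF'
set linesize 120 pagesize 100
select segment_name, tablespace_name, status
  from dba_rollback_segs
 order by tablespace_name, segment_name;
EOF
}

# Example (assumes OS authentication is available):
# undo_segment_report_sql | sqlplus -s "/ as sysdba"
```

A segment reported as NEEDS RECOVERY would typically point at the damaged undo; after the rebuild, the segments in the new tablespace should show as ONLINE once automatic undo is back in use.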

6. Shut down the database and restart it with the spfile
02:04:02 SYS@ enmoedu>shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

02:05:45 SYS@ enmoedu>startup
ORACLE instance started.
Total System Global Area 521936896 bytes
Fixed Size 2229944 bytes
Variable Size 360712520 bytes
Database Buffers 155189248 bytes
Redo Buffers 3805184 bytes
Database mounted.
Database opened.

The alert log shows that the database started normally; the problem is solved.
[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
ALTER DATABASE OPEN
Thread 1 opened at log sequence 8
Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[3870] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:808234 end:808274 diff:40 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:05:55 2016
QMNC started with pid=20,OS id=3874
Completed: ALTER DATABASE OPEN
Wed Dec 14 02:05:56 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files,and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Starting background process CJQ0
Wed Dec 14 02:05:56 2016
CJQ0 started with pid=22,OS id=3902

Appendix (reproduced from MOS documents)

ORA-600 [4193] is also an undo-related error, and MOS has several notes about it.

I. MOS notes

1.1 ORA-600 [4193] When Trying To Open The Database [ID 763566.1]

Symptoms

Copying database from one server to another server and getting an ORA-600 [4193] error when trying to open the database on the destination server.

-- After a database is copied from one server to another, this error is raised when trying to open it on the destination.

Cause

The online redo logs were copied when the source database was open; online redo logs should never be copied when the database is open.

-- The cause is that the online redo logs were copied along with everything else while the database was open. Online redo logs must not be copied from an open database.

Solution

In this instance the datafiles were being copied properly after the tablespaces were put into backup mode; however, online redo logs should only be copied if the source database is shut down first. The source database needed to remain open, so the datafiles were copied again (with the tablespaces in backup mode) and then a number of archive logs were transferred over to the new server. After the last archivelog was applied, the database could be opened with resetlogs, and new online redo logs were created on the destination server.

-- Datafiles can be copied once the tablespaces are in backup mode, but online redo logs may only be copied after the database has been shut down. If the database must stay open, copy only the datafiles, transfer the archived logs, and finally open the database with OPEN RESETLOGS; the online redo logs are recreated automatically at open.

1.2 ORA-600 [4193] When Opening Or Shutting Down A Database [ID 452662.1]

1.2.1 Symptoms

Errors in alert.log:

Tue Jul 17 13:38:13 2007
Errors in file /home/Oracle/oracle/product/10.2.0/yms/rdbms/log/yms_smon_8337.trc:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [3552], [3554], []

yms_smon_8337.trc:

SO: 0xdfaec728, type: 24, owner: 0xdf266580, flag: INIT/-/-/0x00

(buffer) PR: 0xdf1f1338 FLG: 0x1000
class bit: 0x80000
kcbbfbp: [BH: 0xded4bf40, LINK: 0xdfaec768]
kcbbfbx[0]: [BH: 0xdece41d8, LINK: 0xdfaec788]
where: ktuwh01: ktugus, why: 0
buffer tsn: 2 rdba: 0x00c00002 (3/2)
scn: 0x0000.03c95628 seq: 0x01 flg: 0x00 tail: 0x56280e01
frmt: 0x02 chkval: 0x0000 type: 0x0e=KTU UNDO HEADER W/UNLIMITED EXTENTS
BH (0xdece41d8) file#: 3 rdba: 0x00c003b6 (3/950) class: 20 ba: 0x11d6ba000
set: 6 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 0 obj: -1 objn: 0 tsn: 2 afn: 3
hash: [df870f70,df870f70] lru: [dece4488,dece4028]
obj-flags: object_ckpt_list
ckptq: [dedac4a0,ded47cb8] fileq: [dedac500,ded47cc8] objq: [ded47d78,db7bfd78]
use: [dfaec788,dfaec788] wait: [NULL]
st: XCURRENT md: EXCL tch: 0
flags: mod_started gotten_in_current_mode block_written_once
change state: ACTIVE
change count: 1
LRBA: [0xac3.4de07.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
Using State Objects
----------------------------------------
SO: 0xdfaec728, flag: INIT/-/-/0x00
(buffer) PR: 0xdf1f1338 FLG: 0x1000
class bit: 0x80000
kcbbfbp: [BH: 0xded4bf40, why: 0
buffer tsn: 2 rdba: 0x00c003b6 (3/950)
scn: 0x0000.03be3c7d seq: 0x5a flg: 0x04 tail: 0x3c7d025a
frmt: 0x02 chkval: 0x0868 type: 0x02=KTU UNDO BLOCK
----------------------------------------
Error 607 in redo application callback
TYP:0 CLS:20 AFN:3 DBA:0x00c003b6 OBJ:4294967295 SCN:0x0000.03be3c7d SEQ: 90 OP:5.1
ktudb redo: siz: 132 spc: 4462 flg: 0x0012 seq: 0x0de2 rec: 0x09

UNDO BLK:
xid: 0x0002.045.00006c61 seq: 0xde0 cnt: 0x60 irb: 0x60 icl: 0x0 flg: 0x0000

1.2.2 Cause

When we try to apply redo to an undo block (forward changes are made by the application of redo to a block) we check that the seq# in the undo record matches the seq# in the redo record.

-- At startup the database rolls forward, applying redo to undo blocks; while doing so it compares the seq# in the undo record with the seq# in the redo record.

These seq# should be the same because when we apply a redo record we must apply it to the correct version of the block.

-- Normally the two seq# values are identical.

We can only apply a redo record to a block that contains the same seq# as in the redo record.

-- The redo record is applied to the undo block only when the two match.

If the seq# do not match then ORA-600 [4193] [a] [b] is raised.

Arg [a] Undo record seq number --> seq: 0xde0 = 3552
Arg [b] Redo record seq number --> seq: 0x0de2 = 3554

-- If they differ, ORA-600 [4193] [a] [b] is raised, where a is the seq# recorded in the undo block and b is the seq# in the redo record. Both values are hexadecimal in the trace and can be converted with to_number():

SYS@anqing1(rac1)> Select to_number('de0','xxxx') from dual;

TO_NUMBER('DE0','XXXX')
-----------------------
                   3552
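
The same conversion can be done from the shell without a database session, which is handy when only the trace file is at hand. This is a small sketch; the helper name `hex2dec` is made up for illustration.

```shell
# Convert a hexadecimal string (with or without a 0x prefix) to decimal.
# Hypothetical helper name.
hex2dec() {
    printf '%d\n' "0x${1#0x}"
}

hex2dec de0      # undo record seq, prints 3552
hex2dec 0x0de2   # redo record seq, prints 3554
```

The two results match the second and third arguments of the ORA-00600 line in the alert log, [3552] and [3554].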

This implies some kind of block corruption in either the redo or the undo block.

-- In other words, ORA-600 [4193] is raised whenever the redo record and the undo record disagree.

1.3 ORA-600 [4193] caused by a bug

MOS:

ORA-600 [4193] "seq# mismatch while adding undo record" [ID 39282.1]

Bug 8240762 - Undo corruptions with ORA-600[4193]/ORA-600 [4194] or ORA-600 [4137] [ID 8240762.8]

Undo corruption may be caused after a shrink and the same undo block may be used for two different transactions causing several internal errors like:

ORA-600 [4193] / ORA-600 [4194] for new transactions

ORA-600 [4137] for a transaction rollback

Undo segment shrink is internally done by Oracle.

-- Undo corruption caused by an undo segment shrink.

Workaround

Drop the undo segment.

Affects:

*Product (Component)*	Oracle Server (Rdbms)
Range of versions believed to be affected	Versions >= 10.2 but BELOW 11.2
Versions confirmed as being affected	10.2.0.4 10.2.0.3
Platforms affected	Generic (all / most platforms affected)

Fixed:

This issue is fixed in

Databases from Oracle 10.2 up to, but not including, 11.2 can suffer undo corruption from Bug 8240762. The bug is fixed in 10.2.0.5. If the problem occurs, dropping the affected undo segment is enough.

Original article: https://www.f2er.com/oracle/211288.html