Tuesday, July 29, 2008

Problem with CRS startup after disk migration

Our unix and storage group has a project for migration from old storage subsystem to a brand new one. Action plan was simple;

1- shutdown the database with
srvctl stop database -d 

2- shutdown the crs with
crsctl stop crs

3- move the data from old disks to new ones
4- rename the new disks with the original names
5- open the crs with
crsctl start crs

6- startup the database with
srvctl start database -d 


But right after step 4, crs was unable to open and I am called at night. When i tried to start crs with crsctl start crs, i had errors indicates crs was unable to reach the OCR disk. I tried to check the OCR with ocrcheck and trace the log file of the OCR in $CRS_HOME/log//client/

[root@raquality00 client]$ pwd
/oracle/product/10.2.0/crs/log/raquality00/client
[root@raquality00 client]# cat ocrcheck_13416.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2008-06-18 02:13:39.487: [OCRCHECK][2538462912]ocrcheck starts...
2008-06-18 02:13:39.488: [ OCROSD][2538462912]utstoragetype: /oradata1/orcfile.ora is on FS type 1952539503. Not supported.
2008-06-18 02:13:39.488: [ OCROSD][2538462912]utopen:6'': OCR location /oradata1/orcfile.ora configured is not valid storage type. Return code [37].
2008-06-18 02:13:39.488: [ OCRRAW][2538462912]proprinit: Could not open raw device
2008-06-18 02:13:39.488: [ default][2538462912]a_init:7!: Backend init unsuccessful : [37]
2008-06-18 02:13:39.488: [OCRCHECK][2538462912]Failed to initialize OCR context: [PROC-37: Oracle Cluster Registry does not support the storage type configured]
2008-06-18 02:13:39.488: [OCRCHECK][2538462912]Failed to initialize ocrchek2
2008-06-18 02:13:39.488: [OCRCHECK][2538462912]Exiting [status=failed]...

Log file says there is a problem with the ocrfile.ora. But I am suspicious if the filename is true. I should be sure with the filename so I examine the install logs in $CRS_HOME/install/ directory.

[oracle@raquality00 install]$ pwd
/oracle/product/10.2.0/crs/install
[oracle@raquality00 install]$ cat paramfile.crs
ORA_CRS_HOME=/oracle/product/10.2.0/crs
CRS_ORACLE_OWNER=oracle
CRS_DBA_GROUP=oinstall
CRS_VNDR_CLUSTER=false
CRS_OCR_LOCATIONS=/oradata1/orcfile.ora
CRS_CLUSTER_NAME=raqcrs
CRS_HOST_NAME_LIST=raquality00,1,raquality01,2
CRS_NODE_NAME_LIST=raquality00,1,raquality01,2
CRS_PRIVATE_NAME_LIST=raquality00-priv,1,raquality01-priv,2
CRS_LANGUAGE_ID='AMERICAN_AMERICA.WE8ISO8859P1'
CRS_VOTING_DISKS=/oradata1/votingfile.ora
CRS_NODELIST=raquality00,raquality01
CRS_NODEVIPS='raquality00/raquality00-vip/255.255.255.0/eth0,raquality01/raquality01-vip/255.255.255.0/eth0'

Lets see if the location of the ocr file is changed by someone after installation by inspecting the /etc/oracle/ocr.loc.

[oracle@raquality00 oracle]$ pwd
/etc/oracle
[oracle@raquality00 oracle]$ cat ocr.loc
ocrconfig_loc=/oradata1/orcfile.ora
local_only=FALSE

OCR location and the filename seems correct. Now we now it hasnt been changed after the installation. It seems there may be corruption with the OCR file. Checking the backups of the OCR is handy. Maybe I will need to restore from the backup. The command I runned is ocrconfig -showbackup. The output shows the case goes bad.

[root@raquality00 client]# cat ocrconfig_9347.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2008-06-18 02:04:20.350: [ OCRCONF][2538462912]ocrconfig starts...
2008-06-18 02:04:20.353: [ OCROSD][2538462912]utstoragetype: /oradata1/orcfile.ora is on FS type 1952539503. Not supported.
2008-06-18 02:04:20.353: [ OCROSD][2538462912]utopen:6'': OCR location /oradata1/orcfile.ora configured is not valid storage type. Return code [37].
2008-06-18 02:04:20.353: [ OCRRAW][2538462912]proprcow: problem reading the bootblock
2008-06-18 02:04:20.353: [ OCRCONF][2538462912]Failure in overwriting OCR configuration on disk
2008-06-18 02:04:20.353: [ OCRCONF][2538462912]Exiting [status=failed]...

I am now getting doubt. Maybe there is a problem about the filesystems. I decided to check the filesystem types from fstab. But they seem OK.

[root@raquality00 raqcrs]# cat /etc/fstab
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/1 / ext3 defaults 1 1
LABEL=/boot1 /boot ext3 defaults 1 2
#LABEL=/oradata /oradata1 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata1_new /oradata1 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata2 /oradata2 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata2_new /oradata2 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata3 /oradata3 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata4 /oradata4 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata5 /oradata5 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata6 /oradata6 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata7 /oradata7 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata8 /oradata8 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata9 /oradata9 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata9_new /oradata9 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata10_new /oradata10 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata11 /oradata11 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata11_new /oradata11 ocfs2 _netdev,datavolume,nointr 0 0
#LABEL=/oradata12 /oradata12 ocfs2 _netdev,datavolume,nointr 0 0
LABEL=/oradata12_new /oradata12 ocfs2 _netdev,datavolume,nointr 0 0
#/dev/mapper/vgra-lvra01 /ra01 ext3 defaults 1 2
/dev/vgra00/lvra01 /ra01_new ext3 defaults 1 2
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
LABEL=SWAP-sda2 swap swap defaults 0 0
/dev/sdb1 swap swap defaults 0 0
#/dev/vgora/lvora /oracle ext3 defaults 1 2
/dev/vgra00/lvoracle /oracle ext3 defaults 1 2
/dev/hda /media/cdrom auto pamconsole,exec,noauto,managed 0 0

When i recheck the error logs of ocrcheck command i realised that there are some other error notifications after invalid storage type error line. OCR complains about reading bootblock and overwriting the file. There maybe a process holding the file.

[oracle@raquality00 install]$ ps -ef | grep crs
root 10239 7791 0 Jun18 ? 00:37:59 /oracle/product/10.2.0/crs/bin/crsd.bin reboot

Now it is more clear, maybe our admins forgot to disable crs service by executing
/etc/init.d/init.crs stop
after stopping crs with
crsctl stop crs
. And data migration is made with the running crs service. I recommended a reboot but stopping the the service may also be OK. After the reboot crs is opened successfully and databases are now up and running.

[oracle@raquality00 install]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[oracle@raquality00 install]$ srvctl status database -d RADB
Instance RADB1 is running on node raquality00
Instance RADB2 is running on node raquality01

Saturday, July 5, 2008

Alter the table to change the structure

What will happen if one of the source files structure changes, and all data loading procedures and "sql loader" scripts are ready after long and busy working days. Oracle has options to change the table structure.

Altering a column

create table EPEKER.ep_test (col1 varchar(10));

insert into EPEKER.ep_test values ('some text here');
--ORA-12899: value too large for column "EPEKER"."EP_TEST"."COL1" (actual: 14, maximum: 10)

alter table EPEKER.ep_test modify (col1 varchar2(20));
insert into EPEKER.ep_test values ('some text here');
--1 row inserted
commit;

select * from EPEKER.ep_test;


Adding/Dropping a column

alter table EPEKER.ep_test add (col2 date);
alter table EPEKER.ep_test add (col2 date DEFAULT sysdate);

alter table EPEKER.ep_test drop (col2);


Renaming Columns

alter table EPEKER.ep_test rename column col3 to col4;


Changing Datatypes

But changing datatype of a column is not that easy. As the following example shows to change a datatype of a column, the column to be modified must be empty.

insert into EPEKER.ep_test (col3) values ('123')
commit;
alter table EPEKER.ep_test modify (col3 number)
--ORA-01439: column to be modified must be empty to change datatype

update EPEKER.ep_test set col3=NULL;
commit;
alter table EPEKER.ep_test modify (col3 number)

desc EPEKER.ep_test


Compressed Tables

You can modify and add columns to compressed tables. But dropping a column is still not supported (v10.2). To drop a column one should first MOVE NOCOMPRESS, drop the column and again MOVE COMPRESS a table.

alter table EPEKER.ep_test move compress;
alter table EPEKER.ep_test add (col3 varchar2(10));
alter table EPEKER.ep_test modify (col3 varchar2(20));

alter table EPEKER.ep_test drop (col3);
--ORA-39726: unsupported add/drop column operation on compressed tables