Thursday, December 30, 2010

Error in checking condition of instance on node

After I rebooted both RAC nodes, srvctl started to complain about the condition of the instance on the second node of my cluster.


[oracle@EPRHEL6 admin]$ srvctl status database -d orcl
Instance ORCL1 is running on node eprhel5
PRKO-2015 : Error in checking condition of instance on node: eprhel6

[oracle@EPRHEL6 admin]$ sqlplus system/password@ORCL2

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Dec 27 00:03:11 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor


Enter user-name: 



srvctl also complained when I tried to start the instance on the second node, so I decided to start it manually with SQL*Plus.

[oracle@EPRHEL6 admin]$ sqlplus "/ as sysdba"


SQL*Plus: Release 10.2.0.1.0 - Production on Mon Dec 27 00:03:24 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> startup;
Oracle instance started.

Total System Global Area 599785472 bytes
Fixed Size     2022600 bytes
Variable Size   188744504 bytes
Database Buffers  402653184 bytes
Redo Buffers     6365184 bytes
Database mounted.
Database opened.
SQL> alter system register;

System altered.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

[oracle@EPRHEL6 admin]$ sqlplus system/password@ORCL2

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Dec 27 00:04:18 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options


The instance itself seems fine: SQL*Plus now connects to ORCL2 without any trouble. The problem must lie in the way srvctl communicates with the instance.

[oracle@EPRHEL6 admin]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    ONLINE    ONLINE    eprhel5     
ora....L2.inst application    ONLINE    UNKNOWN   eprhel6     
ora.ORCL.db    application    ONLINE    ONLINE    eprhel5     
ora....SM1.asm application    ONLINE    ONLINE    eprhel5     
ora....L5.lsnr application    ONLINE    ONLINE    eprhel5     
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....SM2.asm application    ONLINE    ONLINE    eprhel6     
ora....L5.lsnr application    OFFLINE   OFFLINE               
ora....L6.lsnr application    ONLINE    ONLINE    eprhel6     
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel6     

[oracle@EPRHEL6 admin]$ srvctl start listener -n EPRHEL6

[oracle@EPRHEL6 admin]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    ONLINE    ONLINE    eprhel5     
ora....L2.inst application    ONLINE    UNKNOWN   eprhel6     
ora.ORCL.db    application    ONLINE    ONLINE    eprhel5     
ora....SM1.asm application    ONLINE    ONLINE    eprhel5     
ora....L5.lsnr application    ONLINE    ONLINE    eprhel5     
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....SM2.asm application    ONLINE    ONLINE    eprhel6     
ora....L5.lsnr application    OFFLINE   OFFLINE               
ora....L6.lsnr application    ONLINE    ONLINE    eprhel6     
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel6  

[oracle@EPRHEL6 admin]$ sqlplus system/password@ORCL1

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Dec 27 00:04:35 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> show parameter listener;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
remote_listener                      string      LISTENERS_ORCL
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options


I suspect a problem with the listener configuration or with the listener resource itself, yet everything looks fine except that OFFLINE resource. After a little googling, I found a solution that points to the listener configuration, so I decided to recreate the listeners with netca: first delete the listener named LISTENER from both the ASM and DB homes using netca, then recreate it from the DB home only. Maybe this will resolve the problem.
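Spotting a resource like that OFFLINE listener by eye is easy to miss in a long crs_stat -t listing. Below is a minimal bash sketch, assuming the column layout shown above (a header line, a dashed separator, then Name, Type, Target, State and an optional Host column). The function name crs_mismatches is hypothetical. It prints every row whose State is not ONLINE or differs from its Target.

```shell
#!/usr/bin/env bash
# Read "crs_stat -t" output on stdin and print the rows that need attention:
# any resource whose State is not ONLINE, or whose State differs from its Target.
# Assumes the layout shown above: a header line, a dashed separator line, then
# whitespace-separated columns Name Type Target State [Host].
crs_mismatches() {
  awk 'NR > 2 && NF >= 4 && ($4 != "ONLINE" || $3 != $4) {
         # Offline resources have no Host column; print "-" in its place
         host = (NF >= 5) ? $5 : "-"
         print $1, $3, $4, host
       }'
}
```

Usage: crs_stat -t | crs_mismatches. On the first listing above it would report the UNKNOWN ORCL2 instance and the OFFLINE L5 listener.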

My action plan: first, stop all ASM and database instances; second, manually remove that confusing OFFLINE listener resource; finally, remove all listener configuration from the cluster with netca and recreate it from the DB home. Here we go.
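The first two steps of this plan can be scripted. What follows is a hedged bash sketch and not the exact procedure I ran. It uses the database name, the node names and the stale resource name from this cluster; the resource name comes from the crs_getperm output further down. It also adds a hypothetical DRY_RUN switch that only prints the commands. The netca step is interactive, so it is left out.

```shell
#!/usr/bin/env bash
# Sketch of the first two steps of the plan: stop the database and both ASM
# instances, then unregister the stale listener resource from CRS.
# Names below are the ones from this cluster; adjust them for yours.
# Set DRY_RUN=1 (hypothetical switch) to print the commands instead of running them.
DB=orcl
NODES="EPRHEL5 EPRHEL6"
STALE_LSNR=ora.eprhel6.LISTENER_EPRHEL5.lsnr

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

action_plan() {
  # Stop on the first failure so CRS is not modified while instances still run
  run srvctl stop database -d "$DB" || return 1
  for node in $NODES; do
    run srvctl stop asm -n "$node" || return 1
  done
  run crs_unregister "$STALE_LSNR"
}
```

Run DRY_RUN=1 action_plan first to review the commands, then action_plan as the oracle user to execute them.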

[oracle@EPRHEL6 db]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 27-DEC-2010 00:22:16

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_EPRHEL6
Version                   TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date                27-DEC-2010 00:02:31
Uptime                    0 days 0 hr. 19 min. 44 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/product/asm/network/admin/listener.ora
Listener Log File         /oracle/product/asm/network/log/listener_eprhel6.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.28.4.226)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.28.4.246)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "ORCL" has 2 instance(s).
Instance "ORCL1", status READY, has 1 handler(s) for this service...
Instance "ORCL2", status READY, has 2 handler(s) for this service...
Service "ORCLXDB" has 2 instance(s).
Instance "ORCL1", status READY, has 1 handler(s) for this service...
Instance "ORCL2", status READY, has 1 handler(s) for this service...
Service "ORCL_XPT" has 2 instance(s).
Instance "ORCL1", status READY, has 1 handler(s) for this service...
Instance "ORCL2", status READY, has 2 handler(s) for this service...
The command completed successfully

[oracle@EPRHEL6 db]$ srvctl stop database -d orcl
[oracle@EPRHEL6 db]$ srvctl stop asm -n EPRHEL5
[oracle@EPRHEL6 db]$ srvctl stop asm -n EPRHEL6
[oracle@EPRHEL6 db]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    OFFLINE   OFFLINE               
ora....L2.inst application    OFFLINE   OFFLINE               
ora.ORCL.db    application    OFFLINE   OFFLINE               
ora....SM1.asm application    OFFLINE   OFFLINE               
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....L5.lsnr application    OFFLINE   OFFLINE               
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel6     

[oracle@EPRHEL6 db]$ crs_getperm ora.eprhel6.LISTENER_EPRHEL5.lsnr
Name: ora.eprhel6.LISTENER_EPRHEL5.lsnr
owner:oracle:rwx,pgrp:dba:rwx,other::r--,
[oracle@EPRHEL6 db]$ crs_unregister ora.eprhel6.LISTENER_EPRHEL5.lsnr
[oracle@EPRHEL6 db]$ crs_profile -delete ora.eprhel6.LISTENER_EPRHEL5.lsnr
CRS-0170: The resource 'ora.eprhel6.LISTENER_EPRHEL5.lsnr' doesn't exist.

[oracle@EPRHEL6 db]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    ONLINE    ONLINE    eprhel5     
ora....L2.inst application    ONLINE    ONLINE    eprhel6     
ora.ORCL.db    application    ONLINE    ONLINE    eprhel5     
ora....SM1.asm application    ONLINE    ONLINE    eprhel5     
ora....L5.lsnr application    ONLINE    ONLINE    eprhel5     
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....SM2.asm application    ONLINE    ONLINE    eprhel6     
ora....L6.lsnr application    ONLINE    ONLINE    eprhel6     
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel6     
[oracle@EPRHEL6 db]$ srvctl status database -d orcl
Instance ORCL1 is running on node eprhel5
Instance ORCL2 is running on node eprhel6
[oracle@EPRHEL6 db]$ 


It seems the problem is solved.
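To confirm this without reading the output line by line, the srvctl status output can be checked mechanically. The following is a small bash sketch, assuming the message format shown above ("Instance X is running on node Y"). all_instances_running is a hypothetical helper. It succeeds only if every non-empty line reports a running instance, so a PRKO-2015 line like the one at the start of this post makes it fail.

```shell
#!/usr/bin/env bash
# Succeed only if "srvctl status database" output (on stdin) reports at least one
# instance and every non-empty line has the form "Instance X is running on node Y".
# Any other line (e.g. a PRKO-2015 error or "is not running") makes it fail.
all_instances_running() {
  awk '
    /^Instance [^ ]+ is running on node [^ ]+/ { ok++; next }
    NF > 0 { bad++ }
    END { if (ok > 0 && bad == 0) exit 0; exit 1 }
  '
}
```

Usage: srvctl status database -d orcl | all_instances_running && echo "all instances up"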

Tuesday, December 28, 2010

Relocating CRS Resource

I installed a one-node 10gR2 RAC on RHEL 5.5 for test purposes (my 10gR2 RAC on RHEL 5.5 VMware installation notes). After successfully adding the second node to the cluster, I realized that the new node's VIP resource was running on the first node. I had seen this problem before on a Solaris system but never had time to write about it.


[root@EPRHEL6]# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@EPRHEL6]# crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    ONLINE    ONLINE    eprhel5     
ora.ORCL.db    application    ONLINE    ONLINE    eprhel5     
ora....SM1.asm application    ONLINE    ONLINE    eprhel5     
ora....L5.lsnr application    ONLINE    ONLINE    eprhel5     
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel5     

[root@EPRHEL6]# ping eprhel6-vip
PING eprhel6-vip (172.28.4.226) 56(84) bytes of data.
64 bytes from eprhel6-vip (172.28.4.226): icmp_seq=1 ttl=64 time=2.28 ms
64 bytes from eprhel6-vip (172.28.4.226): icmp_seq=2 ttl=64 time=1.03 ms
64 bytes from eprhel6-vip (172.28.4.226): icmp_seq=3 ttl=64 time=0.131 ms

[root@EPRHEL6]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:DE:D8:FD  
inet addr:172.28.4.246  Bcast:172.28.4.255  Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fede:d8fd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:36798 errors:0 dropped:0 overruns:0 frame:0
TX packets:13478 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:21057458 (20.0 MiB)  TX bytes:10660215 (10.1 MiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:DE:D8:07  
BROADCAST MULTICAST  MTU:1500  Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@EPRHEL5]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:B7:92:45  
inet addr:172.28.4.245  Bcast:172.28.4.255  Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feb7:9245/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:9074707 errors:0 dropped:0 overruns:0 frame:0
TX packets:1212938 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:1173926429 (1.0 GiB)  TX bytes:1041963477 (993.6 MiB)

[root@EPRHEL5]# ifconfig eth0:1
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:B7:92:45  
inet addr:172.28.4.225  Bcast:172.28.4.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

[root@EPRHEL5]# ifconfig eth0:2
eth0:2    Link encap:Ethernet  HWaddr 00:0C:29:B7:92:45  
inet addr:172.28.4.226  Bcast:172.28.4.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1


I suppose this is caused by the network settings of the newly added node: somehow CRS could not assign the VIP address to that node's NIC, so the VIP ended up on eprhel5, where it shows up as eth0:2 (172.28.4.226). crs_relocate may fix this.
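Before relocating, it can help to list every VIP that is running away from its home node. Below is a minimal bash sketch. It assumes that the long (non -t) crs_stat output prints blocks of NAME=, TYPE=, TARGET= and "STATE=ONLINE on host" lines, and that VIP resources are named ora.node.vip as in this cluster. Both the format and the naming are assumptions, so check them against your own crs_stat output. misplaced_vips is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read long-format "crs_stat" output on stdin (assumed NAME=/STATE= lines) and
# report VIP resources named ora.<node>.vip that are running on another host.
misplaced_vips() {
  awk -F'=' '
    $1 == "NAME" { name = $2; next }
    $1 == "STATE" && name ~ /^ora\..+\.vip$/ {
      # Derive the home node from the resource name: ora.<node>.vip
      node = name
      sub(/^ora\./, "", node)
      sub(/\.vip$/, "", node)
      # STATE value looks like "ONLINE on eprhel5": f[3] is the current host
      n = split($2, f, " ")
      if (n >= 3 && tolower(f[3]) != tolower(node))
        print name " is on " f[3] ", expected " node
    }
  '
}
```

Usage: crs_stat | misplaced_vips. Any resource it reports is a candidate for crs_relocate.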


[root@EPRHEL5]# crs_relocate ora.eprhel6.vip
Attempting to stop `ora.eprhel6.vip` on member `eprhel5`
Stop of `ora.eprhel6.vip` on member `eprhel5` succeeded.
Attempting to start `ora.eprhel6.vip` on member `eprhel6`
Start of `ora.eprhel6.vip` on member `eprhel6` succeeded.
[root@EPRHEL5]# crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....L1.inst application    ONLINE    ONLINE    eprhel5     
ora.ORCL.db    application    ONLINE    ONLINE    eprhel5     
ora....SM1.asm application    ONLINE    ONLINE    eprhel5     
ora....L5.lsnr application    ONLINE    ONLINE    eprhel5     
ora....el5.gsd application    ONLINE    ONLINE    eprhel5     
ora....el5.ons application    ONLINE    ONLINE    eprhel5     
ora....el5.vip application    ONLINE    ONLINE    eprhel5     
ora....el6.gsd application    ONLINE    ONLINE    eprhel6     
ora....el6.ons application    ONLINE    ONLINE    eprhel6     
ora....el6.vip application    ONLINE    ONLINE    eprhel6  

Now ifconfig on the new node should show the VIP address.
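The following is a small bash sketch that checks which interface carries a given address. It assumes RHEL 5 style ifconfig output, where each interface block starts with a line containing "Link encap" and addresses appear as inet addr:ip. Newer net-tools print a different layout and are not handled. vip_interface is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the interface (e.g. eth0:1) that carries the given IPv4 address, reading
# "ifconfig -a" output on stdin. Returns non-zero if the address is not found.
# Assumes RHEL 5 style output: interface blocks start with a "Link encap" line
# and addresses appear as "inet addr:<ip>".
vip_interface() {
  local ip=$1
  awk -v ip="$ip" '
    /Link encap/ { iface = $1 }
    $1 == "inet" && $2 == ("addr:" ip) { print iface; found = 1 }
    END { if (!found) exit 1 }
  '
}
```

Usage: once the relocation has worked, ifconfig -a | vip_interface 172.28.4.226 should print eth0:1 on eprhel6.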

[root@EPRHEL6 network-scripts]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:0C:29:DE:D8:FD  
inet addr:172.28.4.246  Bcast:172.28.4.255  Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fede:d8fd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
RX packets:36798 errors:0 dropped:0 overruns:0 frame:0
TX packets:13478 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:21057458 (20.0 MiB)  TX bytes:10660215 (10.1 MiB)

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:DE:D8:FD  
inet addr:172.28.4.226  Bcast:172.28.4.255  Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1