Saturday, January 5, 2008

Endless Status Pending in OMS Console

Problem: When I logged in to the OMS console, I realized that one of the cluster instances had a problem with one of its nodes. Node_1 was up, but when I examined Node_2, its status was shown as "Status Pending" in the OMS console.

When I checked the agent on the target host, there seemed to be no problem: the agent was running and its last upload had succeeded.


[oracle@be02 bin]$ ./emctl status agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
Agent Version : 10.2.0.3.0
OMS Version : 10.2.0.3.0
Protocol Version : 10.2.0.2.0
Agent Home : /oracle/product/10.2.0/agent10g/ACTV_ACTV2
Agent binaries : /oracle/product/10.2.0/agent10g
Agent Process ID : 28158
Parent Process ID : 28049
Agent URL : https://be02:3872/emd/main
Repository URL : https://oragrid:1159/em/upload
Started at : 2008-01-04 10:47:17
Started by user : oracle
Last Reload : 2008-01-04 10:47:17
Last successful upload : 2008-01-04 10:49:05
Total Megabytes of XML files uploaded so far : 22.89
Number of XML files pending upload : 741
Size of XML files pending upload(MB) : 35.85
Available disk space on upload filesystem : 70.56%
Collection Status : Disabled by Upload Manager
Last successful heartbeat to OMS : 2008-01-04 10:48:30
---------------------------------------------------------------
Agent is Running and Ready
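
Reading `emctl status agent` by eye is fine for one host, but the fields that matter here (last successful upload, number of XML files pending upload, collection status) are easy to extract with a script. The function below is a minimal sketch, not part of the Oracle tooling: `agent_field` is a hypothetical helper name, and it assumes the "Label : Value" layout shown above, which may differ in other agent releases.

```shell
# Hypothetical helper: print the value of one "Label : Value" line read from stdin.
# Assumes the emctl status layout shown above; other releases may differ.
# Usage: ./emctl status agent | agent_field "Collection Status"
agent_field() {
  awk -v key="$1" '
    {
      pos = index($0, " : ")      # first " : " separates label from value (URLs contain ":" too)
      if (pos == 0) next          # skip banner, dashes and blank lines
      label = substr($0, 1, pos - 1)
      sub(/[ \t]+$/, "", label)   # tolerate padded labels
      if (label != key) next
      val = substr($0, pos + 3)
      sub(/^[ \t]+/, "", val)     # trim surrounding whitespace and CR
      sub(/[ \t\r]+$/, "", val)
      print val
      exit
    }'
}
```

For the output above, `./emctl status agent | agent_field "Number of XML files pending upload"` would print 741. Splitting on the first " : " rather than on every colon keeps values such as the Agent URL intact.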


While I was digging through the OMS console for a clue, I noticed that the target host had been restarted a week earlier and that the last upload from the target agent was dated later than the current date: the OMS console showed the last upload date four days in the future. How could that be? I decided to check the date and the TZ environment variable on both the target and OMS hosts.


-- target op sys date
[oracle@be02 bin]$ date
Fri Jan 4 14:53:38 EET 2008

-- OMS op sys date
[oracle@oragrid ~]$ date
Fri Jan 4 14:55:40 EET 2008

-- TZ Parameter of target
[oracle@be02 bin]$ echo $TZ

[oracle@be02 bin]$
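
Comparing two `date` outputs taken a couple of minutes apart only shows that the clocks are roughly in line, and what `date` prints depends on each shell's time zone (with TZ empty, as on be02, the system default applies, which is why it still printed EET). Epoch seconds from `date +%s` do not depend on TZ, so they give a cleaner comparison. A minimal sketch: `skew_seconds` and `remote_skew` are hypothetical helper names, and `remote_skew` assumes passwordless ssh from the OMS host to the target host, which may not be set up in your environment.

```shell
# Hypothetical helper: absolute difference in seconds between two epoch timestamps.
skew_seconds() {
  local d
  d=$(( $1 - $2 ))
  echo $(( d < 0 ? -d : d ))
}

# Hypothetical helper: clock skew between this host and HOST.
# Assumes passwordless ssh; the ssh round trip itself adds about a second.
# Usage (on the OMS host): remote_skew be02
remote_skew() {
  local host="$1" remote_now local_now
  remote_now=$(ssh "$host" date +%s) || return 1
  local_now=$(date +%s)
  skew_seconds "$remote_now" "$local_now"
}
```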


There seemed to be no problem with the date settings of the host operating systems: both hosts showed the same date and the same time zone (EET). Let's have a look at the agent and OMS timezone regions.


-- Agent's timezone value
[oracle@be02 bin]$ ./emctl config agent getTZ
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Turkey

-- emd.properties timezone parameter value
[oracle@be02 config]$ tail -10 emd.properties
# at startup and reload. Currently this applies only to "Critical" marked dynamic
# properties. The following two values are applicable per target and not per
# dynamic property
#
# dynamicPropReComputeInterval --> time difference between a failed dynamic property
# computation and the next try to compute the property in seconds. The default value is 120 seconds.
#
# dynamicPropReComputeMaxTries --> maximum number of reties for calculating failed
# dynamic properties. The default value is 4 retires.
agentTZRegion=Turkey
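
The two checks above can also be compared automatically. This is a minimal sketch, assuming `emd.properties` uses plain `key=value` lines as shown and that `emctl config agent getTZ` prints the region on its last line (true for the output above, but an assumption for other releases). `emd_prop` and `tz_consistent` are hypothetical helper names.

```shell
# Hypothetical helper: print the value of KEY from a key=value properties file.
# Commented lines are ignored; if the key appears twice, the last one wins.
emd_prop() {
  local file="$1" key="$2"
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Hypothetical helper: succeed if agentTZRegion in the file matches REPORTED.
# Usage (adjust the path to your agent home):
#   tz_consistent sysman/config/emd.properties "$(./emctl config agent getTZ | tail -n 1)"
tz_consistent() {
  local props="$1" reported="$2" stored
  stored=$(emd_prop "$props" agentTZRegion)
  [ -n "$stored" ] && [ "$stored" = "$reported" ]
}
```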


The timezone values were also the same, so there should not have been any problem. What, then, explained the "Status Pending" state and the future last upload date of the target in OMS? I decided to reset the timezone settings of the agent and the OMS.


-- Resetting the agent's timezone
[oracle@be02 bin]$ ./emctl resetTZ agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Agent is running. Stop the agent and rerun the command.

[oracle@be02 bin]$ ./emctl stop agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Stopping agent ... stopped.

[oracle@be02 bin]$ ./emctl resetTZ agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Updating /oracle/product/agent10g/agent10g/sysman/config/emd.properties...
Successfully updated /oracle/product/agent10g/agent10g/sysman/config/emd.properties.

Login as the em repository user and run the script:
exec mgmt_target.set_agent_tzrgn('be02:3872','Turkey')
This can be done for example by logging into sqlplus and doing
SQL> exec mgmt_target.set_agent_tzrgn('be02:3872','Turkey')


[oracle@be02 bin]$ ./emctl start agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Starting agent ..... started.

[oracle@be02 bin]$ ./emctl upload agent
Oracle Enterprise Manager 10g Release 3 Grid Control 10.2.0.3.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD upload completed successfully

-- Executing OMS Script
[oracle@oragrid ~]$ sqlplus sysman/

SQL*Plus: Release 10.2.0.3.0 - Production on Fri Jan 4 14:19:19 2008

Copyright (c) 1982, 2006, Oracle. All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL> exec mgmt_target.set_agent_tzrgn('be02:3872','Turkey');

PL/SQL procedure successfully completed.
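
The whole reset comes down to four agent-side `emctl` calls followed by one call in the repository. The sketch below strings the agent-side steps together and builds the `set_agent_tzrgn` statement from the Agent URL so it can be pasted into SQL*Plus as the repository user. It is hypothetical glue, not an Oracle-supplied script: `reset_agent_tz` and `tzrgn_sql` are made-up names, it assumes `emctl` exits with a non-zero status when a step fails, and it assumes the agent's target name is the host:port part of the Agent URL, which matches what `resetTZ` printed here ('be02:3872').

```shell
# Hypothetical wrapper around the agent-side steps shown above.
# $1 is the path to the agent's emctl; assumes emctl exits non-zero on failure.
reset_agent_tz() {
  local emctl="$1"
  "$emctl" stop agent    || return 1   # resetTZ refuses to run while the agent is up
  "$emctl" resetTZ agent || return 1   # updates emd.properties
  "$emctl" start agent   || return 1
  "$emctl" upload agent                # push pending XML files to the OMS
}

# Hypothetical helper: build the repository call that resetTZ asks for.
# Assumption: the agent target name is host:port taken from the Agent URL.
tzrgn_sql() {
  local agent_url="$1" region="$2" hostport
  hostport="${agent_url#*://}"   # drop the scheme, e.g. "https://"
  hostport="${hostport%%/*}"     # drop the path, e.g. "/emd/main"
  printf "exec mgmt_target.set_agent_tzrgn('%s','%s');\n" "$hostport" "$region"
}
```

For the agent above, `tzrgn_sql https://be02:3872/emd/main Turkey` prints the same statement that was executed in SQL*Plus: `exec mgmt_target.set_agent_tzrgn('be02:3872','Turkey');`.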



Let's check whether everything is OK. When I checked the OMS console, Node_2 of my cluster instance had recovered from "Status Pending", and it also showed as up on the cluster instance home page.

Everything went fine, with one thing worth noting. When I clicked through to the Node_2 home page in the OMS console, I noticed something odd: "Latest Data Collected" had moved even further ahead than before, and now showed a last upload date 11 days in the future. Everything works, but I still have not found an explanation for this strange behavior. Perhaps the target host's operating system date was changed temporarily by the Unix administrators or developers, either by accident or for testing, and then set back to the current date.
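
One way to test the theory that the target's clock was moved forward for a while: files written during that window keep their future modification times until real time catches up. This is a minimal sketch assuming GNU `date` and `find` (as on Linux). `is_future_ts` and `future_files` are hypothetical helper names, and which agent directory to scan (for example its upload or state directory) is left as an assumption for your installation.

```shell
# Hypothetical helper: succeed if the timestamp (any format GNU date -d accepts)
# is later than the current time; returns 2 if the timestamp cannot be parsed.
is_future_ts() {
  local ts now
  ts=$(date -d "$1" +%s) || return 2
  now=$(date +%s)
  [ "$ts" -gt "$now" ]
}

# Hypothetical helper: list regular files under DIR modified more than one minute
# in the future (the margin tolerates small clock jitter). Requires GNU find.
future_files() {
  local dir="$1" ref
  ref=$(date -d '+1 minute' '+%Y-%m-%d %H:%M:%S')
  find "$dir" -type f -newermt "$ref" -print
}
```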


You can check the following Oracle document for details of the commands used here:
Oracle® Enterprise Manager Advanced Configuration
10g Release 2 (10.2)
Part Number B16242-02
Chapter 10 - Reconfiguring the Management Agent and Management Service
