Wednesday, November 11, 2009

creating user defined metric for monitoring job durations

In this production system there are some number of database jobs (not Scheduler Jobs) running on different time intervals. Some of these jobs are running on every five minutes while the other running on every hour. There are also some batch jobs scheduled to be run on every night.

There was a custom developed Oracle9i script to monitor every job duration and warn the DBA if there is a broken job or a job with more than three hours of duration. As this database is upgraded to 10g now we are able to use "Notification Rules" and "User Defined Metrics" to monitor the database jobs. This approach will supply us the historical data collection and also the standart notification method with all other EM Console messages. We also will not need to compile this old package on all of the production databases instead the User Defined Metric can be applied to all of the databases with including the metric in the monitoring templates.

"User Defined Metrics" page can be reached from a link at the bottom of the main page of the Enterprise Manager DB Console just under the Related Links section shown below.



When you click to the "User-Defined Metrics" link on the main page of the EM DB Console, User Defined Metrics are listed.



You can manage the your own custom metrics by using the right top buttons. You can simply select on of the User Defined Metric and click on the "View Data" Button.



When you click on the "View Data" button the Metric Value History of the specified Metric can be seen.



Again from the main page of the "User-Defined Metrics" you can click on the "Create" Button to create a brand new User Defined Metric. There is a sample new metric creation page values which i used for the metric UDM_LONG_RUNNING_JOBS. This metric calculates the job durations and notified the DBA if there are jobs which have more than 180 Minutes of duration time.



This metric has a string value and collection SQL of the metric is as follows. If the return value contains "DURATION" string then EM Console pretends as it exceeds a Critical Treshold and throws a Critical Alert. Metric collection runs every 5 minutes. At least 1 occurance is sufficient to send a Critical Alert.


select RETURN from
(
select 'DURATION: ' || round((sysdate-this_date)*24*60,2) || 'minutes JOB: ' || job || ' -> ' || substr(WHAT,1,50) as RETURN from
(
select
CASE when round((sysdate-this_date)*24*60,2) >= 5 then 'TRUE' else 'FALSE' end as return,
round((sysdate-this_date)*24*60,2) as minutes,
this_date,
sysdate,
job,
schema_user,
last_date,
next_date,
total_time,
interval,
what
from
dba_jobs
where
broken='N' and
LAST_DATE is not NULL
order by
next_date asc
)
where return='TRUE'
group by round((sysdate-this_date)*24*60,2),job,substr(WHAT,1,50)
UNION
select 'OK' as RETURN from dual
)
where rownum=1


The last thing is, if you want an email notification when a Critical Alert triggered by this User Defined Metric you should add this metric to the notification rules which is in the preferences page of the EM Grid Control.






No comments:

Post a Comment