RE: A Challenge - My Answer
Thanks for your answers.
Some of you are using software for your solution. That is nice when it is available, but it takes time to install, configure, and deploy it to multiple servers/environments (for me, anyway). Others have written scripts in the past but have since moved to other tools, and others have scripts they are currently using.
My script is below. I started from my load-monitoring script, decided it was pretty ugly, and thought I should instead write a script that simply monitors numbers, which would be more useful for other things as well.
Some ways I could use the script below (the load average example is in the script's usage text):
# If >= 200 oracle processes for > 30 minutes, then alert and sleep 60 minutes.
watchnum.ksh -t 30 -s 60 smon $(ps -ef | grep oracle | wc -l) 200 epost1_at_yahoo.com
# Alert if a process has accumulated more than N CPU time and does not go away after N time.
Should be easy; sorry, no example.
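A minimal sketch of that CPU-time case, under stated assumptions: cputime_to_minutes is a hypothetical helper (not part of watchnum.ksh), and it assumes ps accepts -o time= -p PID and prints TIME as [[dd-]hh:]mm:ss, for example 01:02:03 or 1-02:03:04. When the process goes away, ps prints nothing, the helper returns 0, and watchnum.ksh resets its record because the value is below the threshold.

```shell
#!/bin/sh
# Hypothetical helper: convert a ps TIME value ([[dd-]hh:]mm:ss) into
# whole minutes of CPU time. An empty value (process gone) yields 0.
cputime_to_minutes() {
    echo "$1" | awk -F'[-:]' '{
        # Fields counted from the right: seconds, minutes, hours, days.
        s = $NF
        m = (NF >= 2) ? $(NF-1) : 0
        h = (NF >= 3) ? $(NF-2) : 0
        d = (NF >= 4) ? $(NF-3) : 0
        total = int((d * 24 + h) * 60 + m + s / 60)
        print total
    }'
}

# Hypothetical usage: PID 12345 is made up. Alert once the process has
# used >= 60 CPU minutes and is still there 30 minutes later.
#   watchnum.ksh -t 30 -s 60 cpu12345 \
#     $(cputime_to_minutes "$(ps -o time= -p 12345)") 60 epost1_at_yahoo.com
```

The -t option supplies the "does not go away after N time" part, so the helper only has to turn the CPU time into a plain number.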
You get the idea. I am also going to port the script below to PL/SQL and make a few enhancements.
#!/usr/bin/ksh
typeset -i SLEEP_UNTIL_TIME OVER_THRESHOLD_TIME MIN_SLEEP_TIME CURRENT_TIME BEGIN_TIME
typeset -u WATCH_ID
MAX_ALLOWED_TIME=0
MIN_SLEEP_TIME=0
SEND_OK_ON_RESET=N
LOG_FILE=/tmp/watchnum.log
TMP_DIRECTORY=/tmp
HEADLINE="watchnum.ksh"
function uhoh {
  if (( ${1} )); then
    echo "uhoh: ${2}"
    exit 1
  fi
}
function current_minutes {
  # This very UGLY function calculates the # of minutes since the year 2000.
  # Every year is counted as 525600 minutes (365 days), so leap days are ignored.
  MIN_YEAR=$( date +"%Y" )
  MIN_YEAR=$( expr ${MIN_YEAR} - 2000 )
  MIN_YEAR=$( expr ${MIN_YEAR} \* 525600 )
  MIN_DAYS=$( date +"%j" )
  MIN_DAYS=$( expr "${MIN_DAYS}" - 1 )
  MIN_DAYS=$( expr "${MIN_DAYS}" \* 1440 )
  MIN_HOURS=$( date +"%H" )
  MIN_HOURS=$( expr "${MIN_HOURS}" \* 60 )
  MIN_MINS=$( date +"%M" )
  MIN_TOTAL=$(( ${MIN_YEAR} + ${MIN_DAYS} + ${MIN_HOURS} + ${MIN_MINS} ))
  echo ${MIN_TOTAL}
}
CURRENT_TIME=$(current_minutes)
while getopts :t:s:a:l:oh: options
do
  case $options in
    t) MAX_ALLOWED_TIME=${OPTARG} ;;
    s) MIN_SLEEP_TIME=${OPTARG} ;;
    l) LOG_FILE="${OPTARG}" ;;
    o) SEND_OK_ON_RESET=Y ;;
    h) HEADLINE="${OPTARG}" ;;
    \?) print "-${OPTARG} is not a valid argument." ;;
  esac
done
shift $(( OPTIND - 1 ))
usage() {
  # The quoted delimiter stops the shell from expanding $1, $(uptime ...)
  # and $(NF-2) while printing the help text.
  cat <<'USAGE'
Script:
  watchnum.ksh

Options:
  -o  Sends an "everything is OK" message when the monitored value
      falls below the defined threshold.
  -t  Sets MAX_ALLOWED_TIME: the number of minutes the monitored value
      is allowed to exceed the threshold before an alert is triggered.
  -s  Sets MIN_SLEEP_TIME: the number of minutes to ignore alerts for
      after an alert has been triggered. This cuts down the number of
      emails and pages when you already know there is a problem.
  -h  Sets HEADLINE: the string that appears in the subject of the
      email or page.
  -l  Sets LOG_FILE. Defaults to /tmp/watchnum.log unless specified.

Parameters (1-3 are required):
  $1 WATCH_ID      - User-specified ID for this alert; no spaces or special characters.
  $2 CURRENT_VALUE - Current value of the number related to this alert.
  $3 THRESHOLD     - The threshold that will trigger the alert.
  $4 EMAILS/PAGERS - List of email addresses separated by commas.

Examples:
  # If the server load average is over 8 for 2 hours, send email.
  watchnum.ksh -o -t 120 -s 180 -h "Server Load Warning" \
    -l /home/oracle/log/watchnum.log loadavg $(uptime | awk '{ print substr($(NF-2),1,4) }') \
    8 epost1_at_yahoo.com
USAGE
  exit 1
}
if (( $# == 0 )); then
  usage
fi
# Exit if these parameters are not supplied.
[[ -z "${1}" || -z "${2}" || -z "${3}" ]] && usage
WATCH_ID=${1}
CURRENT_NUMBER=${2}
THRESHOLD=${3}
EMAILS="${4}"
ALERT_OR_OK=
if [[ -n ${LOG_FILE} ]]; then
  touch ${LOG_FILE} || uhoh $? "Cannot create ${LOG_FILE}."
fi
TINY="${TMP_DIRECTORY}/watchval_${WATCH_ID}.dat"
[[ -f "${TINY}" ]] || echo "${WATCH_ID}:0:0" > ${TINY} || uhoh $? "Could not create ${TINY}."
SLEEP_UNTIL_TIME=$(awk -F":" '{ print $2 }' ${TINY})
BEGIN_TIME=$(awk -F":" '{ print $3 }' ${TINY})
if (( ${CURRENT_NUMBER} >= ${THRESHOLD} )); then
  # When over the threshold and BEGIN_TIME is still zero, this is the first
  # time over the threshold, so set BEGIN_TIME to the current time.
  if (( ${BEGIN_TIME} == 0 )); then
    BEGIN_TIME=${CURRENT_TIME}
    echo "${WATCH_ID}:${SLEEP_UNTIL_TIME}:${BEGIN_TIME}" > ${TINY}
  fi
  # Only go further if we are not currently in a sleep cycle.
  if (( ${CURRENT_TIME} >= ${SLEEP_UNTIL_TIME} )); then
    # Get the # of minutes we have been over the threshold.
    OVER_THRESHOLD_TIME=$(( CURRENT_TIME - BEGIN_TIME ))
    # If the # of minutes is more than allowed, trigger an alert.
    if (( ${OVER_THRESHOLD_TIME} >= ${MAX_ALLOWED_TIME} )); then
      # Sleep until the stated time; this requires an update to the record.
      SLEEP_UNTIL_TIME=$(( CURRENT_TIME + MIN_SLEEP_TIME ))
      echo "${WATCH_ID}:${SLEEP_UNTIL_TIME}:${BEGIN_TIME}" > ${TINY}
      ALERT_OR_OK="ALERT"
    fi
  fi
else
  # If we fall under the threshold, reset the entire record.
  echo "${WATCH_ID}:0:0" > ${TINY}
  if (( ${BEGIN_TIME} > 0 )); then
    [[ "${SEND_OK_ON_RESET}" = "Y" ]] && ALERT_OR_OK="OK"
  fi
fi
echo "$(hostname)|${WATCH_ID}|$(date +"%m/%d/%Y %H:%M")|${CURRENT_NUMBER}|${ALERT_OR_OK}" >> ${LOG_FILE}
if [[ -n "${ALERT_OR_OK}" ]]; then
  # EMAILS is comma-separated, so turn the commas into spaces before looping.
  for EMAIL_ADDRESS in $(echo "${EMAILS}" | tr ',' ' '); do
    echo "${ALERT_OR_OK} ${WATCH_ID}=${CURRENT_NUMBER}, host=$(hostname)" | mailx -s "${HEADLINE}" "${EMAIL_ADDRESS}"
  done
fi
exit 0
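Two possible enhancements, sketched as standalone helpers rather than as the author's code; minutes_since_2000 and num_ge are hypothetical names. The original current_minutes calls date four times (so its fields can straddle a minute boundary) and counts every year as 365 days, so the value from December 31 of a leap year repeats on January 1. The sketch below makes one date call and counts leap days (ignoring the 2100 exception). It assumes UTC via date -u, so its values will not line up with ones stored by the original local-time function; clear the watchval_*.dat files after switching. num_ge compares two values with awk, assuming your ksh is one whose arithmetic is integer-only (ksh88, for example), where a load average such as 0.52 in (( )) would be rejected.

```shell
#!/bin/sh
# Hypothetical replacement for current_minutes: minutes since
# 2000-01-01 00:00, counting leap days (the 2100 exception is ignored).
# Arguments: year (YYYY), day of year (001-366), hour (00-23), minute (00-59).
minutes_since_2000() {
    awk -v y="$1" -v j="$2" -v h="$3" -v m="$4" 'BEGIN {
        yrs = y - 2000
        leap = int((yrs + 3) / 4)       # leap years in 2000 .. y-1
        days = yrs * 365 + leap + (j - 1)
        total = (days * 24 + h) * 60 + m
        print total
    }'
}

current_minutes() {
    # One date call, so the fields cannot straddle a minute boundary;
    # UTC avoids the repeated hour when daylight saving time ends.
    set -- $(date -u +"%Y %j %H %M")
    minutes_since_2000 "$1" "$2" "$3" "$4"
}

# Hypothetical helper: succeeds when $1 >= $2, compared as decimals.
num_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

# In watchnum.ksh the threshold test could then become:
#   if num_ge "${CURRENT_NUMBER}" "${THRESHOLD}"; then
```

Doing the arithmetic in awk also sidesteps the leading zeros that %j, %H and %M produce, since awk reads "08" as decimal 8.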
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 14 2005 - 16:34:02 CST