T.R | Title | User | Personal Name | Date | Lines |
---|
995.1 | It works this way... | AZUR::HUREZ | Connectivity & Computing Services @VBE. DTN 828-5159 | Fri Feb 07 1997 05:50 | 21 |
| Hi Dan,
The list of disks available on the system is initially fetched into Agent
memory (and regularly refreshed) using the $DEVICE_SCAN system service,
with the DVS$_DEVCLASS item code set to the DC$_DISK constant (from the
$DCDEF macro).
Then, at each Agent poll, we call $GETDVI on each individual disk to fetch
its error count (DVI$_ERRCNT) and status (DVI$_STS, DVI$_DEVCHAR and
DVI$_DEVCHAR2).
The old and new error counts are stored, along with the disk specification,
in a linked list within the Agent, and the two values are compared at each
poll to determine whether an event should be generated.
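The per-poll comparison described above can be sketched in C. This is a minimal sketch, not the Agent's actual code: the names disk_entry, find_or_add, poll_disk and free_list are hypothetical, the errcnt argument stands in for the value $GETDVI would return for DVI$_ERRCNT, and a disk seen for the first time is assumed to record its current count as a baseline without raising an event.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-disk record: disk specification plus the error
   count from the previous poll and from the latest poll. */
typedef struct disk_entry {
    char spec[64];            /* e.g. "_$1$DKC600:" */
    uint32_t old_errcnt;      /* count seen at the previous poll */
    uint32_t new_errcnt;      /* count seen at the latest poll */
    struct disk_entry *next;
} disk_entry;

/* Return the entry for spec, appending one if the device scan found a
   disk not yet in the list. Assumption: a new disk records its current
   count as a baseline (old == new), so it raises no event by itself. */
static disk_entry *find_or_add(disk_entry **head, const char *spec,
                               uint32_t errcnt)
{
    disk_entry **link = head;
    for (; *link != NULL; link = &(*link)->next)
        if (strcmp((*link)->spec, spec) == 0)
            return *link;
    disk_entry *e = calloc(1, sizeof *e);  /* zeroed, so spec stays terminated */
    if (e == NULL)
        return NULL;
    strncpy(e->spec, spec, sizeof e->spec - 1);
    e->old_errcnt = e->new_errcnt = errcnt;
    *link = e;
    return e;
}

/* One poll of one disk. errcnt stands in for the DVI$_ERRCNT value
   $GETDVI would return. Returns the number of new errors since the
   previous poll; 0 means no event. A count that goes down is treated
   as "no new errors" (an assumption, not stated in this thread). */
static uint32_t poll_disk(disk_entry *e, uint32_t errcnt)
{
    assert(e != NULL);
    e->old_errcnt = e->new_errcnt;
    e->new_errcnt = errcnt;
    return e->new_errcnt > e->old_errcnt ? e->new_errcnt - e->old_errcnt : 0;
}

/* Release every entry in the list. */
static void free_list(disk_entry *head)
{
    while (head != NULL) {
        disk_entry *next = head->next;
        free(head);
        head = next;
    }
}
```

In this sketch an event would be generated whenever poll_disk returns a non-zero value; any filtering (such as the behaviour controlled by SNS$DSK_FILTER_OFF) is deliberately left out.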
I hope this helps...
Regards,
-- Olivier.
|
995.2 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Fri Feb 07 1997 19:28 | 6 |
| Thanks Olivier. Since you use $DEVICE_SCAN, it should find all
disks regardless of the controller or storage architecture. Looks like
I'll have to try to set up a local SCSI cluster.
Regs,
Dan
|
995.3 | | COMICS::JOLLEYD | | Wed Feb 19 1997 09:42 | 37 |
| Hello all,
Sorry to piggyback on this entry, but it matches an error I have
seen.
The customer is running VMS V6.1 with Watchdog 2.2 ECO 2 and gets the
following error when starting an agent:
%ADA-F-EXCCOPX, Exception was copied at a raise or accept statement
-SYSTEM-F-RANGEERR, range error, PC=00044C7C , PS=0000001B
After setting the logical sns$watchdog to FULL, I see the following
output, which I guess comes from the device_scan routine:
WD_HW: Disk :_$1$DKC300:
WD_HW: -> New. 0
WD_HW: Disk :_$1$DKC400:
WD_HW: -> New. 0
WD_HW: Disk :_$1$DKC500:
WD_HW: -> New. 0
The program then gives the Ada error above. The next device is $1$DKC600:
Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
$1$DKC600:     (HUMV12) Online           20779
Note the large error count.
Could this be what is breaking the code?
My customer cannot reload for a week, so I cannot confirm that this is
the case.
Regards
Darren (OpenVMS Support. UK CSC)
|
995.4 | There's something wrong indeed... | AZUR::HUREZ | Connectivity & Computing Services @VBE. DTN 828-5159 | Wed Feb 19 1997 12:17 | 22 |
| Well, this is strange, since:
. We're using $GETDVI with the DVI$_ERRCNT item code,
which returns the error count as a 32-bit integer.
. We cast this down to a 16-bit unsigned integer, which looks
odd and is probably the source of the problem you're experiencing...
. ... although a 16-bit unsigned integer holds values from 0 up to 65535,
which is enough for the 20779 error count you got (unless
VMS doesn't display it correctly either, or another disk has an
even greater count...).
Anyway, I'll address that in the source code for ECO04 (currently in
field test) and let you know when a new kit is ready, so you can verify
the fix on the customer site if you wish...
Best Regards,
-- Olivier Hurez.
|
995.5 | | CSC32::BUTTERWORTH | Gun Control is a steady hand. | Tue Mar 25 1997 21:31 | 24 |
| The plot thickens. I have reproduced this problem on a small SCSI
cluster consisting of an AlphaServer 1000 and an AlphaStation 400.
There is at least one RZ28B sitting right on the common SCSI
bus.
I can bump the error counts with DELTA, and no event is reported.
SNS$DSK_FILTER_OFF is set to TRUE, so every increment should
be reported. As a control, I tried the DELTA trick on a CI-based
VAXcluster, and all errors were reported faithfully.
I set SNS$WATCHDOG_TRACE to FULL, and the agent does see the
increment, *but* it never adds an entry to the message list, so the
consolidator never sees it. The agents in question are T2.2-08, so
I want to install at least ECO 3 and retest. If that fails, we will
go ahead and IPMT it. By the way, the consolidator has ECO 3 installed
and is a VAXstation 4000-60.
Any thoughts, Ollie?
Regs,
Dan.
|