Title: | NetWorker |
Notice: | kits - 12-14, problem reporting - 41.*, basics 1-100 |
Moderator: | DECWET::RANDALL .com::lenox |
|
Created: | Thu Oct 10 1996 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 750 |
Total number of notes: | 3361 |
420.0. "nsrmmd in U state - V4.2B on 4.0aDUUNIX- 8400" by CSC32::TRENTA () Tue Feb 18 1997 11:10
Hi,
I hope that someone here can help me - I am stumped.
Here it goes... thank you in advance for any assistance.
I have a customer running NSR on an 8400 under Digital UNIX V4.0a.
NSR is currently at V4.2B (it was V3.2, but NSR support in Atlanta
said that going to V4.2B might resolve the problem).
The problem: nsrmmd goes into a U state and NSR hangs.
Background:
1) The problem appears to arise when full dumps are done;
incrementals work without a hitch in most cases.
2) This same system has also had some problems with nfsd
going into a U state. UNIX Engineering is having the customer
run with a debug vfs.mod module to try and find out what is
going on here. This problem is intermittent. However, there
is SOME correlation: we got a forced crash when nsrmmd was
in a U state (when running V3.2 NSR), and engineering found
that this crash showed conditions similar to the nfsd
problems, even though nfsd WAS NOT in a U state in this forced
crash:
a) nfs_udp_wait threads are in a pg_wait
b) the pg_hold bit is set in the vm_page struct
c) the vnode is on a local f/s not a nfs f/s
3) In the last occurrence of nsrmmd in a U state (last Sunday
morning), the customer noticed some errors in dia on rz80 and
rz81. Further investigation determined that these drives belong
to an LSM volume (vol01) that was mounted as /sdpag6. This is the
last filesystem NSR appeared to be able to back up. Below is the
dia output the customer emailed to me yesterday.
4) I still have the forced crash mentioned above, but I have
not been able to determine exactly what is going on, probably
in part because of my limited knowledge of NSR. I would
appreciate any pointers.
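For anyone cross-checking item 3 against the DECevent output further down: the rz80 and rz81 names appear to line up with the Bus Number and Target fields in the log. The sketch below assumes the usual Digital UNIX convention that the rz number is bus * 8 + target, and it assumes the Unit Number field packs bus, target and LUN as (bus << 6) | (target << 3) | lun. That layout is inferred only from the two values in this log (x0280 and x0288). The function names are made up for illustration.

```python
def split_unit_number(unit):
    """Split a CAM unit number into (bus, target, lun).

    Assumed layout, inferred from the log values only:
    bits 6 and up = bus, bits 3-5 = target, bits 0-2 = LUN.
    """
    return unit >> 6, (unit >> 3) & 0x7, unit & 0x7


def rz_name(bus, target):
    """Return the rz device name, assuming rz number = bus * 8 + target."""
    return "rz%d" % (bus * 8 + target)


for unit in (0x0280, 0x0288):
    bus, target, lun = split_unit_number(unit)
    print(hex(unit), "bus", bus, "target", target, "lun", lun,
          "->", rz_name(bus, target))
```

Under those assumptions, x0280 maps to rz80 and x0288 maps to rz81, which matches the drives the customer reported.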
Well, that's about it. I have the messages and daemon.log of the
most recent occurrence of the problem. If someone would like to see
them, just let me know.
I appreciate any help you can provide.
Best Regards,
Debbie Trenta
Lucent/AT&T UNIX Premium Services Support
Colorado Springs Support Center
DECevent V2.3
******************************** ENTRY 1 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 31.
Timestamp of occurrence 16-FEB-1997 05:17:45
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x00000009
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0280 Target = 0.
LUN = 0.
------- CAM Data -------
Class x00 Disk
Subsystem x00 Disk
Number of Packets 7.
------ Packet Type ------ 258. Module Name String
Routine Name cdisk_complete
------ Packet Type ------ 256. Generic String
Cmd Timeout - retrying
------ Packet Type ------ 261. Soft Error String
Error Type Soft Error Detected (recovered)
------ Packet Type ------ 257. Device Name String
Device Name DEC RZ28 (C) DEC
------ Packet Type ------ 256. Generic String
Active CCB at time of error
------ Packet Type ------ 256. Generic String
Command timed out
------ Packet Type ------ 1. SCSI I/O Request CCB(CCB_SCSIIO)
Packet Revision 76.
CCB Address xFFFFFC0050E5E800
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
Cam Status x0B Command Timeout
Path ID 10.
Target ID 0.
Target LUN 0.
Cam Flags x00000442 SIM Queue Actions are Enabled
Data Direction (01: DATA IN)
Disable the SIM Queue Frozen State
*pdrv_ptr xFFFFFC0050E5E4A8
*next_ccb x0000000000000000
*req_map xFFFFFC001140FA40
void (*cam_cbfcnp)() xFFFFFC00004AAF70
*data_ptr xFFFFFFFE7E910000
Data Transfer Length 8192.
*sense_ptr xFFFFFC0050E5E4D0
Auotsense Byte Length 64.
CDB Length 6.
Scatter/Gather Entry Cnt 0.
SCSI Status x00 Good Condition
Autosense Residue Length x00
Transfer Residue Length x00002000
(CDB) Command & Data Buf
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 00000010 E0090408 * ............*
Timeout Value x0000003C
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x4000
Tag Queue Actions x20 Tag for Simple Queue
******************************** ENTRY 2 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 30.
Timestamp of occurrence 16-FEB-1997 05:17:45
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x0000000B
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0280 Target = 0.
LUN = 0.
------- CAM Data -------
Class x22 DEC SIM - SCSI Interface Module
Subsystem x22 DEC SIM - SCSI Interface Module
Number of Packets 2.
------ Packet Type ------ 258. Module Name String
Routine Name ss_abort_done
------ Packet Type ------ 256. Generic String
SCSI abort tag has been performed
******************************** ENTRY 3 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 29.
Timestamp of occurrence 16-FEB-1997 05:17:45
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x00000008
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0280 Target = 0.
LUN = 0.
------- CAM Data -------
Class x22 DEC SIM - SCSI Interface Module
Subsystem x22 DEC SIM - SCSI Interface Module
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name ss_perform_timeout
------ Packet Type ------ 256. Generic String
timeout on disconnected request
------ Packet Type ------ 1038. SIM Working Set(SIM_WS)
Packet Revision 2.
*flink xFFFFFC0073D38000
*blink xFFFFFC0073D38000
Controller # for HBA 10.
Target ID 0.
LUN 0.
Cam Status x00 CCB Request In Progress
TAG x00000043
Sequence Number 57187.
Time Stamp x00000000
*nexus xFFFFFC0073D38000
*it_nexus xFFFFFC007396CE40
*sim_sc xFFFFFC0073968000
*ccb xFFFFFC0050E5E800
Phase Bits x00000000
Misc Flags x00080040 This request is tagged
Timeout
Cam Flags x00000442 SIM Queue Actions are Enabled
Data Direction (01: DATA IN)
Disable the SIM Queue Frozen State
Error Recovery x00000080 SIM_WS in process of being timed out
Recovery Status x00000000
(*as_callback)() x0000000000000000
*as_ccb x0000000000000000
(*tmo_fn)() xFFFFFC00004E39D0
*tmo_arg xFFFFFC0050E5E5D8
Rest of SIM_WS ** Not Printed **
******************************** ENTRY 4 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 28.
Timestamp of occurrence 16-FEB-1997 05:01:27
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x0000000C
Event validity 1. O/S claims event is valid
Event severity 5. Low Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0288 Target = 1.
LUN = 0.
------- CAM Data -------
Class x00 Disk
Subsystem x00 Disk
Number of Packets 7.
------ Packet Type ------ 258. Module Name String
Routine Name cdisk_complete
------ Packet Type ------ 256. Generic String
Cmd Timeout - retrying
------ Packet Type ------ 261. Soft Error String
Error Type Soft Error Detected (recovered)
------ Packet Type ------ 257. Device Name String
Device Name DEC RZ28M (C) DEC
------ Packet Type ------ 256. Generic String
Active CCB at time of error
------ Packet Type ------ 256. Generic String
Command timed out
------ Packet Type ------ 1. SCSI I/O Request CCB(CCB_SCSIIO)
Packet Revision 76.
CCB Address xFFFFFC0040799A00
CCB Length x00C0
XPT Function Code x01 Execute requested SCSI I/O
Cam Status x0B Command Timeout
Path ID 10.
Target ID 1.
Target LUN 0.
Cam Flags x00000442 SIM Queue Actions are Enabled
Data Direction (01: DATA IN)
Disable the SIM Queue Frozen State
*pdrv_ptr xFFFFFC00407996A8
*next_ccb x0000000000000000
*req_map xFFFFFC005D431E00
void (*cam_cbfcnp)() xFFFFFC00004AAF70
*data_ptr xFFFFFC00790EE000
Data Transfer Length 8192.
*sense_ptr xFFFFFC00407996D0
Auotsense Byte Length 64.
CDB Length 10.
Scatter/Gather Entry Cnt 0.
SCSI Status x00 Good Condition
Autosense Residue Length x00
Transfer Residue Length x00002000
(CDB) Command & Data Buf
15--<-12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 0000A0A7 3E000028 * (..>........*
Timeout Value x0000003C
*msg_ptr x0000000000000000
Message Length 0.
Vendor Unique Flags x0000
Tag Queue Actions x20 Tag for Simple Queue
******************************** ENTRY 5 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 27.
Timestamp of occurrence 16-FEB-1997 05:01:27
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x00000008
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0288 Target = 1.
LUN = 0.
------- CAM Data -------
Class x22 DEC SIM - SCSI Interface Module
Subsystem x22 DEC SIM - SCSI Interface Module
Number of Packets 2.
------ Packet Type ------ 258. Module Name String
Routine Name ss_abort_done
------ Packet Type ------ 256. Generic String
SCSI abort tag has been performed
******************************** ENTRY 6 ********************************
Logging OS 2. Digital UNIX
System Architecture 2. Alpha
Event sequence number 26.
Timestamp of occurrence 16-FEB-1997 05:01:27
Host name sd9803
System type register x0000000C AlphaServer 8x00
Number of CPUs (mpnum) x00000006
CPU logging event (mperr) x00000008
Event validity 1. O/S claims event is valid
Event severity 1. Severe Priority
Entry type 199. CAM SCSI Event Type
------- Unit Info -------
Bus Number 10.
Unit Number x0288 Target = 1.
LUN = 0.
------- CAM Data -------
Class x22 DEC SIM - SCSI Interface Module
Subsystem x22 DEC SIM - SCSI Interface Module
Number of Packets 3.
------ Packet Type ------ 258. Module Name String
Routine Name ss_perform_timeout
------ Packet Type ------ 256. Generic String
timeout on disconnected request
------ Packet Type ------ 1038. SIM Working Set(SIM_WS)
Packet Revision 2.
*flink xFFFFFC0073D38080
*blink xFFFFFC0073D38080
Controller # for HBA 10.
Target ID 1.
LUN 0.
Cam Status x00 CCB Request In Progress
TAG x0000007A
Sequence Number 37621.
Time Stamp x00000000
*nexus xFFFFFC0073D38080
*it_nexus xFFFFFC007396D100
*sim_sc xFFFFFC0073968000
*ccb xFFFFFC0040799A00
Phase Bits x00000000
Misc Flags x00080040 This request is tagged
Timeout
Cam Flags x00000442 SIM Queue Actions are Enabled
Data Direction (01: DATA IN)
Disable the SIM Queue Frozen State
Error Recovery x00000080 SIM_WS in process of being timed out
Recovery Status x00000000
(*as_callback)() x0000000000000000
*as_ccb x0000000000000000
(*tmo_fn)() xFFFFFC00004E39D0
*tmo_arg xFFFFFC00407997D8
Rest of SIM_WS ** Not Printed **
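The CDB dumps in entries 1 and 4 can be decoded to see what the timed-out commands were. The sketch below is one possible reading, not a DECevent feature: it assumes the dump prints the highest-numbered bytes on the left (as the ":Byte Order" header line suggests) and decodes only the standard SCSI READ(6) and READ(10) layouts. The helper names are hypothetical.

```python
def cdb_bytes(dump_words, cdb_length):
    """Turn the hex words of a DECevent CDB dump line into CDB bytes.

    The dump prints the highest-numbered bytes on the left, so the
    rightmost word holds bytes 3..0 and each word is read right to left.
    """
    out = bytearray()
    for word in reversed(dump_words.split()):
        out += bytes.fromhex(word)[::-1]
    return bytes(out[:cdb_length])


def decode_read(cdb):
    """Decode a SCSI READ(6) or READ(10) CDB into (name, lba, blocks)."""
    if cdb[0] == 0x08 and len(cdb) == 6:
        lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
        blocks = cdb[4] or 256  # in READ(6) a length of 0 means 256 blocks
        return "READ(6)", lba, blocks
    if cdb[0] == 0x28 and len(cdb) == 10:
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
        return "READ(10)", lba, blocks
    raise ValueError("not a READ(6)/READ(10) CDB: " + cdb.hex())


# Hex words copied from the CDB lines of entry 1 and entry 4.
entry1 = cdb_bytes("00000000 00000010 E0090408", 6)
entry4 = cdb_bytes("00000000 0000A0A7 3E000028", 10)
print(entry1.hex(), decode_read(entry1))
print(entry4.hex(), decode_read(entry4))
```

Read this way, entry 1 is a READ(6) of 16 blocks, which at 512-byte blocks (an assumption) equals the 8192 bytes shown in Data Transfer Length. Entry 4 is a READ(10); its transfer-length field reads as zero in the dump as printed, so that value should be treated with caution.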
% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
% Received: from mail2.digital.com by us3rmc.pa.dec.com (5.65/rmc-22feb94) id AA22483; Mon, 17 Feb 97 15:40:54 -0800
% Received: from ihgw2.lucent.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV) id AA07405; Mon, 17 Feb 1997 15:32:22 -0800
% Received: from clipper.cb.lucent.com by ihig2.firewall.lucent.com (SMI-8.6/EMS-L sol2) id RAA29601; Mon, 17 Feb 1997 17:19:59 -0600
% Received: by clipper.cb.lucent.com (SMI-8.6/EMS-L sol2) id SAA29454; Mon, 17 Feb 1997 18:22:42 -0500
% From: [email protected] (BL0312500-David Jorgensen(CUN5OETJS2)NONE)
% Cc: [email protected], [email protected]
% Received: from sundance.cbctc.cb by clipper.cb.lucent.com (SMI-8.6/EMS-L sol2) id SAA29438; Mon, 17 Feb 1997 18:22:38 -0500
% Received: by sundance.cbctc.cb (SMI-8.6/SMI-SVR4) id SAA15894; Mon, 17 Feb 1997 18:25:37 -0500
% Date: Mon, 17 Feb 1997 18:25:37 -0500
% Original-From: [email protected] (BL0312500-David Jorgensen(CUN5OETJS2)NONE)
% Message-Id: <[email protected]>
% To: csc32::trenta
% Subject: NSR in u-state again! :(
% Original-Cc: [email protected], [email protected]
% Content-Type: X-sun-attachment