Conference decwet::networker

Title:NetWorker
Notice:kits - 12-14, problem reporting - 41.*, basics 1-100
Moderator:DECWET::RANDALL.com::lenox
Created:Thu Oct 10 1996
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:750
Total number of notes:3361

420.0. "nsrmmd in U state - V4.2B on 4.0aDUUNIX- 8400" by CSC32::TRENTA () Tue Feb 18 1997 11:10

Hi,

I hope that someone here can help me - I am stumped.
Here it goes... thank you in advance for any assistance.

I have a customer who is running NSR on an 8400 under Digital
UNIX V4.0a.  NSR is currently at V4.2B (it was V3.2, but we were
told by NSR support in Atlanta that going to V4.2B might resolve
the problem).
The problem:  nsrmmd goes into a U state and NSR hangs.

Background:  

1) The problem appears to arise when full dumps are done;
incrementals work without a hitch in most cases.

2) This same system has also had some problems with nfsd
going into a U state.  UNIX Engineering is having the customer
run with a debug vfs.mod module to try to find out what is
going on.  That problem is intermittent, but there is SOME
correlation with this one.  We got a forced crash while nsrmmd
was in a U state (when running V3.2 NSR), and engineering found
that the crash showed conditions similar to the nfsd problem,
even though nfsd WAS NOT in a U state at the time of the forced
crash:
	a) nfs_udp_wait threads are in a pg_wait
	b) the pg_hold bit is set in the vm_page struct
	c) the vnode is on a local f/s, not an NFS f/s

3) In the last occurrence of nsrmmd in U state (last Sunday
morning), the customer noticed some errors in dia on rz80 and
rz81.  Further investigation determined that these drives belong
to an LSM volume (vol01) that was mounted as /sdpag6.  This is the
last filesystem NSR appeared to be able to back up.  Below is the
dia output the customer emailed to me yesterday.

4) I still have the forced crash that was mentioned above, but
I have not been able to determine exactly what is going on,
probably in part because of my limited knowledge of NSR.  I would
appreciate any pointers.
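
To relate the rz80 and rz81 names in item 3 to the dia entries further
down: Digital UNIX names a SCSI disk at LUN 0 as rz<N> with
N = bus * 8 + target, so bus 10 target 0 is rz80 and bus 10 target 1
is rz81.  The "Unit Number" field in the entries (x0280, x0288) appears
to pack bus, target and LUN as bus*64 + target*8 + LUN; that layout is
an assumption inferred from these two values, not taken from
documentation.  A minimal Python sketch, with hypothetical helper names:

```python
def rz_name(bus, target):
    """Digital UNIX SCSI disk name for LUN 0: unit = bus * 8 + target."""
    return "rz%d" % (bus * 8 + target)


def split_unit_number(unit):
    """Split a dia 'Unit Number' into (bus, target, lun).

    Assumed layout, inferred from the two values in this note:
    bits 6 and up = bus, bits 3-5 = target, bits 0-2 = LUN.
    """
    return unit >> 6, (unit >> 3) & 0x7, unit & 0x7


if __name__ == "__main__":
    for unit in (0x0280, 0x0288):
        bus, target, lun = split_unit_number(unit)
        print("x%04X -> bus %d target %d lun %d -> %s"
              % (unit, bus, target, lun, rz_name(bus, target)))
```

Both unit numbers decode to bus 10, which agrees with the "Bus Number"
and "Target" fields printed next to them in the entries.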

Well, that's about it.  I have the messages and daemon.log from the
most recent occurrence of the problem.  If someone would like to see
them, just let me know.

I appreciate any help you can provide.

Best Regards,

Debbie Trenta
Lucent/AT&T UNIX Premium Services Support
Colorado Springs Support Center

 
DECevent V2.3


******************************** ENTRY    1 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            31. 
Timestamp of occurrence              16-FEB-1997 05:17:45   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x00000009 

Event validity                    1. O/S claims event is valid 
Event severity                    5. Low Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0280  Target =   0. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x00  Disk 
Subsystem                       x00  Disk 
Number of Packets                 7. 

------ Packet Type ------       258. Module Name String 

Routine Name                         cdisk_complete 

------ Packet Type ------       256. Generic String 

                                     Cmd Timeout - retrying 

------ Packet Type ------       261. Soft Error String 

Error Type                           Soft Error Detected (recovered) 

------ Packet Type ------       257. Device Name String 

Device Name                          DEC     RZ28     (C) DEC 

------ Packet Type ------       256. Generic String 

                                     Active CCB at time of error 

------ Packet Type ------       256. Generic String 

                                     Command timed out 

------ Packet Type ------         1. SCSI I/O Request CCB(CCB_SCSIIO) 
Packet Revision                  76. 

CCB Address               xFFFFFC0050E5E800 
CCB Length                    x00C0 
XPT Function Code               x01  Execute requested SCSI I/O 
Cam Status                      x0B  Command Timeout 
Path ID                          10. 
Target ID                         0. 
Target LUN                        0. 
Cam Flags                 x00000442  SIM Queue Actions are Enabled 
                                     Data Direction (01: DATA IN) 
                                     Disable the SIM Queue Frozen State 
*pdrv_ptr                 xFFFFFC0050E5E4A8 
*next_ccb                 x0000000000000000 
*req_map                  xFFFFFC001140FA40 
void (*cam_cbfcnp)()      xFFFFFC00004AAF70 
*data_ptr                 xFFFFFFFE7E910000 
Data Transfer Length           8192. 
*sense_ptr                xFFFFFC0050E5E4D0 
Auotsense Byte Length            64. 
CDB Length                        6. 
Scatter/Gather Entry Cnt          0. 
SCSI Status                     x00  Good Condition 
Autosense Residue Length        x00 
Transfer Residue Length   x00002000 
(CDB) Command & Data Buf 

          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order 
 0000:              00000000  00000010  E0090408   *    ............* 

Timeout Value             x0000003C 
*msg_ptr                  x0000000000000000 
Message Length                    0. 
Vendor Unique Flags           x4000 
Tag Queue Actions               x20  Tag for Simple Queue 
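
The CDB dump in ENTRY 1 can be read by hand.  The row prints 32-bit
words from the highest offset to the lowest, so the rightmost word
(E0090408) holds CDB bytes 0-3 with byte 0 in the low-order position.
That gives 08 04 09 E0 10 00, which is a SCSI READ(6).  The sketch
below decodes it; the function names are hypothetical, and the
512-byte block size is an assumption, not something the log states.

```python
def cdb_bytes(longwords, length):
    """Rebuild CDB bytes from one dia dump row.

    'longwords' are the hex words in the order printed (highest
    offset first).  Each word is stored little-endian, so its
    low-order byte is the lowest-numbered CDB byte.
    """
    out = bytearray()
    for word in reversed(longwords):
        out += int(word, 16).to_bytes(4, "little")
    return bytes(out[:length])


def decode_read6(cdb):
    """Return (lba, blocks) for a 6-byte READ command (opcode 0x08)."""
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # 21-bit logical block address: low 5 bits of byte 1, then bytes 2-3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 means 256 blocks in READ(6).
    blocks = cdb[4] or 256
    return lba, blocks


if __name__ == "__main__":
    row = ["00000000", "00000010", "E0090408"]   # from ENTRY 1
    cdb = cdb_bytes(row, 6)                       # CDB Length is 6
    lba, blocks = decode_read6(cdb)
    print("CDB:", cdb.hex())
    print("LBA:", lba, "blocks:", blocks)
    print("bytes at 512 per block (assumed):", blocks * 512)
```

With 512-byte blocks the 16-block read is 8192 bytes, which matches the
"Data Transfer Length" in the same entry, and the "Transfer Residue
Length" of x00002000 (8192) suggests none of it was transferred before
the timeout.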


******************************** ENTRY    2 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            30. 
Timestamp of occurrence              16-FEB-1997 05:17:45   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x0000000B 

Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0280  Target =   0. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x22  DEC SIM - SCSI Interface Module 
Subsystem                       x22  DEC SIM - SCSI Interface Module 
Number of Packets                 2. 

------ Packet Type ------       258. Module Name String 

Routine Name                         ss_abort_done 

------ Packet Type ------       256. Generic String 

                                     SCSI abort tag has been performed 


******************************** ENTRY    3 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            29. 
Timestamp of occurrence              16-FEB-1997 05:17:45   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x00000008 

Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0280  Target =   0. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x22  DEC SIM - SCSI Interface Module 
Subsystem                       x22  DEC SIM - SCSI Interface Module 
Number of Packets                 3. 

------ Packet Type ------       258. Module Name String 

Routine Name                         ss_perform_timeout 

------ Packet Type ------       256. Generic String 

                                     timeout on disconnected request 

------ Packet Type ------      1038. SIM Working Set(SIM_WS) 
Packet Revision                   2. 

*flink                    xFFFFFC0073D38000 
*blink                    xFFFFFC0073D38000 
Controller # for HBA             10. 
Target ID                         0. 
LUN                               0. 
Cam Status                      x00  CCB Request In Progress 
TAG                       x00000043 
Sequence Number               57187. 
Time Stamp                x00000000 
*nexus                    xFFFFFC0073D38000 
*it_nexus                 xFFFFFC007396CE40 
*sim_sc                   xFFFFFC0073968000 
*ccb                      xFFFFFC0050E5E800 
Phase Bits                x00000000 
Misc Flags                x00080040  This request is tagged 
                                     Timeout 
Cam Flags                 x00000442  SIM Queue Actions are Enabled 
                                     Data Direction (01: DATA IN) 
                                     Disable the SIM Queue Frozen State 
Error Recovery            x00000080  SIM_WS in process of being timed out 
Recovery Status           x00000000 
(*as_callback)()          x0000000000000000 
*as_ccb                   x0000000000000000 
(*tmo_fn)()               xFFFFFC00004E39D0 
*tmo_arg                  xFFFFFC0050E5E5D8 
Rest of SIM_WS                       ** Not Printed ** 


******************************** ENTRY    4 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            28. 
Timestamp of occurrence              16-FEB-1997 05:01:27   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x0000000C 

Event validity                    1. O/S claims event is valid 
Event severity                    5. Low Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0288  Target =   1. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x00  Disk 
Subsystem                       x00  Disk 
Number of Packets                 7. 

------ Packet Type ------       258. Module Name String 

Routine Name                         cdisk_complete 

------ Packet Type ------       256. Generic String 

                                     Cmd Timeout - retrying 

------ Packet Type ------       261. Soft Error String 

Error Type                           Soft Error Detected (recovered) 

------ Packet Type ------       257. Device Name String 

Device Name                          DEC     RZ28M    (C) DEC 

------ Packet Type ------       256. Generic String 

                                     Active CCB at time of error 

------ Packet Type ------       256. Generic String 

                                     Command timed out 

------ Packet Type ------         1. SCSI I/O Request CCB(CCB_SCSIIO) 
Packet Revision                  76. 

CCB Address               xFFFFFC0040799A00 
CCB Length                    x00C0 
XPT Function Code               x01  Execute requested SCSI I/O 
Cam Status                      x0B  Command Timeout 
Path ID                          10. 
Target ID                         1. 
Target LUN                        0. 
Cam Flags                 x00000442  SIM Queue Actions are Enabled 
                                     Data Direction (01: DATA IN) 
                                     Disable the SIM Queue Frozen State 
*pdrv_ptr                 xFFFFFC00407996A8 
*next_ccb                 x0000000000000000 
*req_map                  xFFFFFC005D431E00 
void (*cam_cbfcnp)()      xFFFFFC00004AAF70 
*data_ptr                 xFFFFFC00790EE000 
Data Transfer Length           8192. 
*sense_ptr                xFFFFFC00407996D0 
Auotsense Byte Length            64. 
CDB Length                       10. 
Scatter/Gather Entry Cnt          0. 
SCSI Status                     x00  Good Condition 
Autosense Residue Length        x00 
Transfer Residue Length   x00002000 
(CDB) Command & Data Buf 

          15--<-12  11--<-08  07--<-04  03--<-00   :Byte Order 
 0000:              00000000  0000A0A7  3E000028   *    (..>........* 

Timeout Value             x0000003C 
*msg_ptr                  x0000000000000000 
Message Length                    0. 
Vendor Unique Flags           x0000 
Tag Queue Actions               x20  Tag for Simple Queue 


******************************** ENTRY    5 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            27. 
Timestamp of occurrence              16-FEB-1997 05:01:27   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x00000008 

Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0288  Target =   1. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x22  DEC SIM - SCSI Interface Module 
Subsystem                       x22  DEC SIM - SCSI Interface Module 
Number of Packets                 2. 

------ Packet Type ------       258. Module Name String 

Routine Name                         ss_abort_done 

------ Packet Type ------       256. Generic String 

                                     SCSI abort tag has been performed 


******************************** ENTRY    6 ******************************** 


Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            26. 
Timestamp of occurrence              16-FEB-1997 05:01:27   
Host name                            sd9803 

System type register      x0000000C  AlphaServer 8x00 
Number of CPUs (mpnum)    x00000006 
CPU logging event (mperr) x00000008 

Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      199. CAM SCSI Event Type 


------- Unit Info -------              
Bus Number                       10. 
Unit Number                   x0288  Target =   1. 
                                     LUN =   0. 
------- CAM Data -------               
Class                           x22  DEC SIM - SCSI Interface Module 
Subsystem                       x22  DEC SIM - SCSI Interface Module 
Number of Packets                 3. 

------ Packet Type ------       258. Module Name String 

Routine Name                         ss_perform_timeout 

------ Packet Type ------       256. Generic String 

                                     timeout on disconnected request 

------ Packet Type ------      1038. SIM Working Set(SIM_WS) 
Packet Revision                   2. 

*flink                    xFFFFFC0073D38080 
*blink                    xFFFFFC0073D38080 
Controller # for HBA             10. 
Target ID                         1. 
LUN                               0. 
Cam Status                      x00  CCB Request In Progress 
TAG                       x0000007A 
Sequence Number               37621. 
Time Stamp                x00000000 
*nexus                    xFFFFFC0073D38080 
*it_nexus                 xFFFFFC007396D100 
*sim_sc                   xFFFFFC0073968000 
*ccb                      xFFFFFC0040799A00 
Phase Bits                x00000000 
Misc Flags                x00080040  This request is tagged 
                                     Timeout 
Cam Flags                 x00000442  SIM Queue Actions are Enabled 
                                     Data Direction (01: DATA IN) 
                                     Disable the SIM Queue Frozen State 
Error Recovery            x00000080  SIM_WS in process of being timed out 
Recovery Status           x00000000 
(*as_callback)()          x0000000000000000 
*as_ccb                   x0000000000000000 
(*tmo_fn)()               xFFFFFC00004E39D0 
*tmo_arg                  xFFFFFC00407997D8 
Rest of SIM_WS                       ** Not Printed ** 
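
The six entries above are listed newest first.  Read in sequence
order, each disk shows the same three steps: ss_perform_timeout
("timeout on disconnected request"), ss_abort_done ("SCSI abort tag
has been performed"), then cdisk_complete ("Cmd Timeout - retrying",
logged as a recovered soft error).  Target 1 went through this at
05:01:27 and target 0 at 05:17:45.  A rough Python sketch for pulling
that timeline out of dia text follows; the field labels are copied
from the output above, the sample is trimmed to the lines the parser
uses, and the function name is hypothetical.

```python
import re

# One pattern per field of interest; labels are as printed by dia above.
FIELDS = {
    "seq": re.compile(r"Event sequence number\s+(\d+)\."),
    "time": re.compile(r"Timestamp of occurrence\s+(\S+ \S+)"),
    "severity": re.compile(r"Event severity\s+\d+\.\s+(.+?)\s*$"),
    "target": re.compile(r"Target =\s+(\d+)\."),
    "routine": re.compile(r"Routine Name\s+(\S+)"),
}

SAMPLE = """\
Event sequence number            31.
Timestamp of occurrence              16-FEB-1997 05:17:45
Event severity                    5. Low Priority
Unit Number                   x0280  Target =   0.
Routine Name                         cdisk_complete
Event sequence number            30.
Timestamp of occurrence              16-FEB-1997 05:17:45
Event severity                    1. Severe Priority
Unit Number                   x0280  Target =   0.
Routine Name                         ss_abort_done
Event sequence number            29.
Timestamp of occurrence              16-FEB-1997 05:17:45
Event severity                    1. Severe Priority
Unit Number                   x0280  Target =   0.
Routine Name                         ss_perform_timeout
Event sequence number            28.
Timestamp of occurrence              16-FEB-1997 05:01:27
Event severity                    5. Low Priority
Unit Number                   x0288  Target =   1.
Routine Name                         cdisk_complete
Event sequence number            27.
Timestamp of occurrence              16-FEB-1997 05:01:27
Event severity                    1. Severe Priority
Unit Number                   x0288  Target =   1.
Routine Name                         ss_abort_done
Event sequence number            26.
Timestamp of occurrence              16-FEB-1997 05:01:27
Event severity                    1. Severe Priority
Unit Number                   x0288  Target =   1.
Routine Name                         ss_perform_timeout
"""


def summarize(text):
    """Return one dict per entry, sorted oldest first by sequence number."""
    events = []
    for line in text.splitlines():
        for name, pattern in FIELDS.items():
            match = pattern.search(line)
            if not match:
                continue
            if name == "seq":
                events.append({})      # a sequence number starts a new entry
            if events:
                events[-1][name] = match.group(1)
    return sorted(events, key=lambda e: int(e["seq"]))


if __name__ == "__main__":
    for e in summarize(SAMPLE):
        print(e["seq"], e["time"], "target", e["target"],
              e["routine"], "(" + e["severity"] + ")")
```

The same function should work on the full dia output, since lines that
match none of the patterns are ignored.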

% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
% Received: from mail2.digital.com by us3rmc.pa.dec.com (5.65/rmc-22feb94) id AA22483; Mon, 17 Feb 97 15:40:54 -0800
% Received: from ihgw2.lucent.com by mail2.digital.com (5.65 EXP 4/12/95 for V3.2/1.0/WV) id AA07405; Mon, 17 Feb 1997 15:32:22 -0800
% Received: from clipper.cb.lucent.com by ihig2.firewall.lucent.com (SMI-8.6/EMS-L sol2) id RAA29601; Mon, 17 Feb 1997 17:19:59 -0600
% Received: by clipper.cb.lucent.com (SMI-8.6/EMS-L sol2) id SAA29454; Mon, 17 Feb 1997 18:22:42 -0500
% From: [email protected] (BL0312500-David Jorgensen(CUN5OETJS2)NONE)
% Cc: [email protected], [email protected]
% Received: from sundance.cbctc.cb by clipper.cb.lucent.com (SMI-8.6/EMS-L sol2) id SAA29438; Mon, 17 Feb 1997 18:22:38 -0500
% Received: by sundance.cbctc.cb (SMI-8.6/SMI-SVR4) id SAA15894; Mon, 17 Feb 1997 18:25:37 -0500
% Date: Mon, 17 Feb 1997 18:25:37 -0500
% Original-From: [email protected] (BL0312500-David Jorgensen(CUN5OETJS2)NONE)
% Message-Id: <[email protected]>
% To: csc32::trenta
% Subject: NSR in u-state again! :(
% Original-Cc: [email protected], [email protected]
% Content-Type: X-sun-attachment