[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference smurf::ase

Title:	ase

Moderator:	SMURF::GROSSO

Created:	Thu Jul 29 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2114
Total number of notes:	7347

1977.0. "Memory Channel Problems?" by ZPOVC::JUSTIN () Wed Apr 02 1997 01:28

Hello,

	We are experiencing some strange behaviour with our UNIX systems 
and would like to find out why? 

	Our config (ALL NEW) consists of 3 A8400,10CPUs,5/440,4GB Mem, 
kftia with FDDI, PCI bus with 1 KZPSA and Memory channel card. All 
three systems are connected via the KFTIA based FDDI controller to a 
GIGaswitch/FDDI. The 3 systems are also connected via memory channel 
to a MC-HUB. The KZPSAs are also connected between the systems with 
a pair of HSZ50s (dual redundant)with write-back cache (2X128). UNIX 
ver 4.0B. Latest dupatch kit applied. KZPSA ids are 7, 6, 5, for the 
3 A8400 and 0,1,2,3 for the HSZ50s. KZPSAs are on PCI slot 7, MCs on 
slot 0, The MC ID are 1 (master) and 4,6 for the clients. All systems 
the FWD ISP controllers for the local wide disks, SCSI-2 for the RRD45
and TLZ09s. The master node uses another ISP bus for a TZ887 and TSZ07.
The only shared bus is the KZPSA.

	As the systems will be serving many third party applications
(which are hardware linked licenses). ASE is not being used. One of
the systems acts as the NIS master which serves the two clients. The 
node also mounts the file systems via the HSZ50s and NFS serves them 
to the client via the MC. Users access the systems via the FDDI.

	When installed using FDDI all worked as expected. However after
installing the MC drivers (TCR140) MCA-UA. We mounted the the NFS 
files systems via the MC path. Till this point all worked fine. Now 
if we reboot any of the clients the system hangs at the NFS mount with 
rpc time outs. If we reboot the master it hangs at the memory channel 
startup waiting for node 1 etc. The only way to bring up the system 
would be to shutdown all three systems, power off the MC-HUB, boot the 
master node, power on the MC-HUB followed by the two client nodes. 
Once up everything works fine.

	Also noted intermittently is that during boot up there are
many CAM errors, with the isp controllers similar to those reported
in note 5707 in digital_unix conf.

	So are we doing anything wrong here? Is the memory channel
behaviour normal? Would appreciate any comments/ideas to resolve this
problem.

Regards

Justin

T.R	Title	User	Personal Name	Date	Lines
1977.1	Shared scsi not supported without ASE	NETRIX::"[email protected]"	Dave Cherkus	`Wed Apr 02 1997 14:14`	16
	I think the root cause is when one node leaves your 'cluster' it issues a SCSI bus reset, and because the ASE code is not installed on the other nodes they don't know what to do about it. The NFS hangs are probably due to IOs that will never complete because of this. Shared SCSI is not supported without the ASE product installed. Shutting down everything of course clears the problem. Why are the two client nodes on the shared scsi if they aren't serving the data? Getting them off the bus will prove or disprove my theory. [Posted by WWW Notes gateway]
1977.2	No-shared bus, same behaviour	ZPOVC::JUSTIN		`Thu Apr 03 1997 03:50`	14
	Hello, I have tried what you suggested. However the symptoms remain. We also had shared SCSI when using FDDI for interconnect, it did not have these problems. However I do see that having shared bus without ASE (which I understand prevents dual mounting?) is potentially unsafe. I am in the process of getting a full TCR-UA license to test if it will still behave the same. Is there any other options that I can try? Justin
1977.3	more info	ZPOVC::JUSTIN		`Thu Apr 03 1997 06:02`	102
	Hello, Here is some additional info . When the Master system is booted (others shutdown) this is what we get >>> boot . . . Dual TLEP at node 4 Dual TLEP at node 3 Dual TLEP at node 2 Dual TLEP at node 1 Dual TLEP at node 0 monitorBoot: doing it... Cluster Memory Channel primary adaptor is online. Rev 14 adaptor is the primary channel (pci bus 1, slot 0) connected to virtual hub (VH1) as node 1. dli: configured clubase: configured skipping test/delay for VH0/VH1 system drd: configured. dlmsl: configured cnxagent: configured dlm: configured. memory channel thread init checking for existing memory channel nodes unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 cam_logger: CAM_ERROR packet cam_logger: bus 0 target 1 lun 0 ss_perform_timeout timeout on disconnected request cam_logger: CAM_ERROR packet cam_logger: bus 0 target 1 lun 0 isp_termio_abort_bdr Failed to abort specified IO - scheduling chip reinit cam_logger: CAM_ERROR packet cam_logger: bus 0 isp_reinit Begining Adaptor/Chip reinitialization cam_logger: CAM_ERROR packet cam_logger: bus 0 isp_cam_bus_reset_tmo SCSI Bus Reset performed unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 unresponsive mc nodes - waiting for node mask 1 crashing unresponsive node 0 It then hangs here forever. If the memory channel hub is turned off then this is the boot up sequence. . . . Dual TLEP at node 4 Dual TLEP at node 3 Dual TLEP at node 2 Dual TLEP at node 1 Dual TLEP at node 0 monitorBoot: doing it... Cluster Memory Channel primary adaptor is online. Rev 14 adaptor is the primary channel (pci bus 1, slot 0) connected to virtual hub (VH1) as node 1. dli: configured clubase: configured skipping test/delay for VH0/VH1 system drd: configured. dlmsl: configured cnxagent: configured dlm: configured. memory channel thread init checking for existing memory channel nodes booting as primary memory channel node on mc0 memory channel software inited - node 1 on mc0 ccomsub: configured mcnet: configured Starting secondary cpu 1 Starting secondary cpu 2 Starting secondary cpu 3 Starting secondary cpu 4 Starting secondary cpu 5 Starting secondary cpu 6 Starting secondary cpu 7 Starting secondary cpu 8 Starting secondary cpu 9 . . .
1977.4	Bad MC jumper settings	NETRIX::"[email protected]"	Dave Cherkus	`Thu Apr 03 1997 07:57`	6
	Ah! You are using a real hub, yet your MC board is jumpered for virtual hub. The MC board should have come with a manual explaining how to change this. If not, let me know and I'll vector you to a web page that explains it. [Posted by WWW Notes gateway]
1977.5	pin 1-2 jumpered?	ZPOVC::JUSTIN		`Thu Apr 03 1997 08:04`	6
	Hi, We've double checked it with the manual, the jumper is across pin 1 and 2, of the 3 pins. Is was also the factory default. This is the line card on the PCI bus that we are talking about right? Justin
1977.6	Bad board?	NETRIX::"[email protected]"	Dave Cherkus	`Thu Apr 03 1997 14:36`	7
	According to my info, you are correct, so if Digital UNIX is still reporting the MC board is in virtual hub mode I would suspect a defective board. It will never work till UNIX reports a STD (real hub) setting instead of VH0 or VH1. Dave [Posted by WWW Notes gateway]
1977.7	BTW...	NETRIX::"[email protected]"	Dave Cherkus	`Thu Apr 03 1997 14:39`	5
	...your printout says VH1, which is the 'no jumper installed' setting. I really suspect a defective board or jumper, or a misinstalled jumper. Dave [Posted by WWW Notes gateway]
1977.8	bad jumper setting	ZPOVC::JUSTIN		`Thu Apr 03 1997 23:59`	10
	Hello, Yes the jumper on the line card on the master node was not inserted properly, hence virtual hub. Once it is properly inserted everything works fine. Including the NFS/NIS. We will disconnect the clients from the shared FWD-SCSI bus for safety reasons. Thanks Dave for your help. Justin
1977.9	You're welcome.	NETRIX::"[email protected]"	Dave Cherkus	`Tue Apr 08 1997 08:53`	6
	> Thanks Dave for your help. No problem. Glad things are working fine now. Dave [Posted by WWW Notes gateway]