Title: | *OLD* ALL-IN-1 (tm) Support Conference |
Notice: | Closed - See Note 4331.1 to move to IOSG::ALL-IN-1 |
Moderator: | IOSG::PYE |
Created: | Thu Jan 30 1992 |
Last Modified: | Tue Jan 23 1996 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 4343 |
Total number of notes: | 18308 |
Hi,

A customer has two nodes in a cluster. They recently enabled cluster alias, and the system then crashed. I logged on, examined the crash dump and got the following information (<node name>$SRV73 was active):

    VAX/VMS System dump analyzer

    Dump taken on 22-FEB-1993 12:49:46.22
    INVEXCEPTN, Exception while above ASTDEL or on interrupt stack

    SDA> sh crash
    System crash information
    ------------------------
    Time of system crash: 22-FEB-1993 12:49:46.22
    Version of system: VAX/VMS VERSION V5.5-1
    System Version Major ID/Minor ID: 1/0
    VAXcluster node: PER1, a VAX 6000-610
    Crash CPU ID/Primary CPU ID: 01/01
    Bitmask of CPUs active/available: 00000002/00000002
    CPU bugcheck codes:
        CPU 01 -- INVEXCEPTN, Exception while above ASTDEL or on interrupt stack

    Press RETURN for more.
    SDA>

    CPU 01 Processor crash information
    ----------------------------------
    CPU 01 reason for Bugcheck: INVEXCEPTN, Exception while above ASTDEL or on interrupt stack
    Process currently executing on this CPU: PER1$SRV73
    Current IPL: 8 (decimal)
    CPU database address: 87FA2000
    MPB address: 00000000

    CPU 01 Processor crash information
    ----------------------------------
    General registers:
    R0  = 00000008  R1  = 04080000  R2  = 00000003  R3  = 20545349
    R4  = 00000001  R5  = 85F35580  R6  = 2053554C  R7  = 85F355AC
    R8  = 85F35570  R9  = 80004FB0  R10 = 00000005  R11 = 85F35580
    AP  = 7FFE96A8  FP  = 7FFE966C  SP  = 87FA3D88  PC  = 80DA8842
    PSL = 04080009

    CPU 01 Processor crash information
    ----------------------------------
    Processor registers:
    P0BR = 8C88FE00  SBR  = 0F8C2800  ASTLVL = 00000001
    P0LR = 00008ABD  SLR  = 001CB280  SISR   = 00000104
    P1BR = 8C297C00  PCBB = 0BEE1620  ICCS   = 00000041
    P1LR = 001FF78A  SCBB = 0F8B4800  SID    = 13000202
    XDEV = 00048087  XBE  = 00000040  XBEER  = 00000000  XFADR  = 61880008
    NCSR = 00000800  TODR = 2B0DEF74  TBSTS  = 800001D0  PCSTS  = FFFFF800
    BCETSTS= 00000140  NESTS = 00000000  CEFSTS = 00019200  BCEDSTS= 00000400
    ICR  = FFFFD906  ICCS = 00000041
    IPORT = 000000C1  OPORT0 = 0000000D  OPORT1 = 000000C0
    RXCS = 00000040  TXCS = 00000080

    Press RETURN for more.
    SDA>

    CPU 01 Processor crash information
    ----------------------------------
    ISP = 87FA3D88  KSP = 7FFE7800  ESP = 7FFE966C  SSP = 7FFED800  USP = 002FD954

    No spinlocks currently owned by CPU 01
    SDA>

The customer started the server on the nodes after cluster alias was enabled, and the crash occurred two hours after aliasing was enabled. Is there anything the customer may have missed that could have led to the crash? I have looked at topics 1228 and 2102, but they do not match this problem. The customer has just bought the VAX and will be getting a third one soon, so I would be grateful for any pointers.

Thanks,
Sunil
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
2301.1 | No real guess | CHRLIE::HUSTON | | Mon Feb 22 1993 13:30 | 16 |
I can't think of anything specific. One thought, though: have the customer reboot the cluster. DECnet plays some funny internal games with aliasing. I suspect this because the server itself never raises its priority level, nor does it do much AST work of its own. That work is, however, done by DASL, which the server uses; DASL also talks to DECnet and does load balancing when cluster aliasing is in use. Perhaps something is not quite right as a result. A somewhat less painful alternative to a reboot is to stop and restart DECnet. Note that this will also stop the server, so the ALL-IN-1 manager will have to restart it. --Bob | |||||
2301.2 | Will reboot after cluster alias enabled | BUSHIE::SETHI | Man from Downunder | Tue Feb 23 1993 00:13 | 15 |
Hi Bob, I should have followed my instincts on this; I had a funny feeling that they might not have rebooted. I say this because I read something about this somewhere, and I just didn't want to jump in at the deep end without some confirmation. Were these words of wisdom spoken in this conference? I have made the suggestion to the customer, and the systems are due for a reboot tonight. I will post a reply here to confirm whether or not it has been a success. Thanks and regards, Sunil | |||||
2301.3 | Reboot cleared the problem | GIDDAY::SETHI | Man from Downunder | Mon Mar 15 1993 23:24 | 1 |