Bill Arvidson will be getting you a patch for this known problem.
Looks like it was fixed in the V4.0A and B stream.
BC
From: LANDO::WASTED::freiss "Rich Freiss USG 07-Mar-1997 1320" 7-MAR-1997 13:20:15.75
To: arvidson@DEC:.zko.wasted
CC: lando::cummins, freiss@DEC:.zko.wasted, ROMTSS::ZAGARIA,
dutile@DEC:.zko.wasted
Subj: Patch NEEDED ASAP for a Rawhide Customer for V3.2G and TCR.
Bill,
I've attached a known problem that was reported in the Rawhide
notes file. We sent you a change for this problem for V3.2G, but I've
checked the support pool and the code has not been submitted.
The code only needs to be put in the V3.2G support pool; V4.0A already
has the fix.
Can you please get a patch to the customer or the CSE ASAP?
I find it very interesting that this problem has not been reported
before on other large-memory TCR V3.2G installations.
Regards,
Rich Freiss
------- Forwarded Message
Return-Path: dutile
Delivery-Date: Fri, 07 Mar 97 11:42:01 -0500
Received: from localhost by wasted.zk3.dec.com; (5.65v3.2/1.1.8.2/18Feb95-1123AM)
id AA22090; Fri, 7 Mar 1997 11:41:59 -0500
Message-Id: <[email protected]>
To: freiss
Subject: The v3.2g patch for 4100+TCR installed
Date: Fri, 07 Mar 97 11:41:59 -0500
From: "Don Dutile, UNIX Systems Hardware Support" <dutile>
X-Mts: smtp
------- Forwarded Message
To: arvidson
Cc: dutile
Subject: Rawhide patch-fix for >1GB and TCR installed
Date: Wed, 28 Aug 96 17:05:42 -0400
From: "Don Dutile, UNIX Systems Hardware Support" <dutile>
Bill,
There is a bug in the v3.2g support for RAWHIDE: when more
than 1GB of main memory is on the system and TCR is installed,
the kernel will not boot (it installs and builds, but won't boot).
The fix for the bug can be found at:
wasted:~dutile/rawhide/mcpcia.c.tcr_1gigplus
This file is ready to be bci'd into a sandbox backed by
v32g (supportos), so all one has to do is:
workon -sb your_sb_to_fix
cd kernel/io/dec/pci
bco mcpcia.c
cp wasted:~dutile/rawhide/mcpcia.c.tcr_1gigplus mcpcia.c
bci mcpcia.c
Then you can srequest, bsubmit, build, etc., and
mkpatch it into a fix kit.
We don't have to back-port the fix to v3.2f, since TCR isn't
supported on V3.2f. If you want to, you could do the above for
v3.2f as well and make two patch kits, one for v3.2f and one for
v3.2g, since the file has not been modified in v32fsupportos, i.e.,
mcpcia.c is the same in v3.2f, v3.2fsupportos, and v3.2g.
One would do the v3.2f patch to cover the bases, in case
someone were to set the /etc/sysconfigtab variable basic-dma-window-size
to less than 1GB on a RAWHIDE/AS4100 with 1GB or more of memory.
The above fix has been applied and tested on the v4.0a kit, and was
verified on a v3.2g system (QARs were reported on this problem for
both types of AS4100).
If you need or want more info, ping me.... Don
------- End of Forwarded Message
Return-Path: [email protected]
Delivery-Date: Fri, 07 Mar 97 09:11:38 -0500
Received: from DECnet-Mail11.wasted.zk3.dec.com by wasted.zk3.dec.com; (5.65v3.2/1.1.8.2/18Feb95-1123AM)
id AA04578; Fri, 7 Mar 1997 09:11:18 -0500
Date: Fri, 7 Mar 1997 09:11:18 -0500
Message-Id: <[email protected]>
Mime-Version: 1.0
From: [email protected] (Bill Cummins, PKO3-2/Q21, 223-4641)
To: [email protected]
Cc: CUMMINS
Subject: Anything jump out at you with respect to this customer log? Thanks, BC
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
<<< MVBLAB::SYS$SYSDEVICE:[NOTES$LIBRARY]ALPHASERVER_4100.NOTE;1 >>>
-< AlphaServer 4100 >-
================================================================================
Note 522.0 Memory upgrade 2 replies
ROMTSS::ZAGARIA 348 lines 6-MAR-1997 04:49
--------------------------------------------------------------------------------
Hi, all
I have two AlphaServer 4100s with D.U. 3.2g that are running in
an existing TruCluster configuration.
We have upgraded both systems with a B3001-CA CPU board and
2 x 512MB EDO memory boards (B3030-FA).
So now we have 2 B3001-CA CPUs and 2GB of memory.
At boot time we get the crash listed below. The crash
occurs with both vmunix and genvmunix.
If we downgrade only the memory to 1 GB the systems work fine.
The memory boards are 4 x 512MB EDO, located on the
backplane at 0L-0H and 1L-1H; the memory_test environment variable is set to "full".
Another hint: we reinstalled Digital UNIX 3.2g with 2 CPUs and 2GB of
memory on a new spare disk, and the boot worked fine
(both vmunix and genvmunix).
At this point my questions are:
Why doesn't the old genvmunix boot correctly like the new one?
How can I integrate the new memory boards with the original software
installation?
Regards Antonio
**************************************************************************
SROM V1.1 on cpu0
SROM V1.1 on cpu1
XSROM V3.0 on cpu1
XSROM V3.0 on cpu0
mem_pair0 - 1024 MB
mem_pair1 - 1024 MB
20..20..21..21..23..
please wait 48 seconds for T24 to complete
24..24..
Memory testing complete on cpu0
Memory testing complete on cpu1
starting console on CPU 0
sizing memory
0 1024 MB EDO
1 1024 MB EDO
starting console on CPU 1
probing IOD1 hose 1
bus 0 slot 1 - NCR 53C810
bus 0 slot 2 - DEC PCI FDDI
bus 0 slot 3 - DECchip 21040-AA
bus 0 slot 4 - DEC KZPSA
bus 0 slot 5 - NCR 53C810
probing IOD0 hose 0
bus 0 slot 1 - PCEB
probing EISA Bridge, bus 1
bus 0 slot 2 - DEC PCI MC
bus 0 slot 3 - S3 Trio64/Trio32
bus 0 slot 4 - DEC KZPSA
bus 0 slot 5 - DEC KZPSA
configuring I/O adapters...
ncr0, hose 1, bus 0, slot 1
pfi0, hose 1, bus 0, slot 2
tulip0, hose 1, bus 0, slot 3
kzpsa0, hose 1, bus 0, slot 4
ncr1, hose 1, bus 0, slot 5
floppy0, hose 0, bus 1, slot 0
kzpsa1, hose 0, bus 0, slot 4
kzpsa2, hose 0, bus 0, slot 5
mc0, hose 0, bus 0, slot 2
System temperature is 26 degrees C
AlphaServer 4100 Console V3.0-10, 19-NOV-1996 13:57:07
P00>>>sho mem*
memory_test full
P00>>>sho fru
Digital Equipment Corporation
AlphaServer 4100
Console V3.0-10 OpenVMS PALcode V1.19-2, Digital UNIX PALcode V1.21-14
Module Part # Type Rev Name Serial #
System Motherboard 23803-01 0 0000 mthrbrd0 AY63807378
Memory 1024 MB EDO N/A 0 0000 mem0 N/A
Memory 1024 MB EDO N/A 0 0000 mem1 N/A
CPU (Uncached) B3001-CA 0 0001 cpu0 KA624PNHYD
CPU (Uncached) B3001-CA 0 0001 cpu1 KA621PEHVA
Bridge (IOD0/IOD1) B3040-AA 600 0032 iod0/iod1 AY64201654
PCI Motherboard B3050-AA 8 0002 saddle0 KA626PVDBB
Bus 0 iod0 (PCI0)
Slot Option Name Type Rev Name
1 PCEB 4828086 0005 pceb0
2 DEC PCI MC 181011 000B mc0
3 S3 Trio64/Trio32 88115333 0000 vga0
4 DEC KZPSA 81011 0000 kzpsa1
5 DEC KZPSA 81011 0000 kzpsa2
Bus 1 pceb0 (EISA Bridge connected to iod0, slot 1)
Slot Option Name Type Rev Name
Bus 0 iod1 (PCI1)
Slot Option Name Type Rev Name
1 NCR 53C810 11000 0002 ncr0
2 DEC PCI FDDI f1011 0000 pfi0
3 DECchip 21040-AA 21011 0024 tulip0
4 DEC KZPSA 81011 0000 kzpsa0
5 NCR 53C810 11000 0002 ncr1
P00>>>sho device
polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1 DKa500 RRD45 0436
polling kzpsa0 (DEC KZPSA) slot 4, bus 0 PCI, hose 1 TPwr 1 Fast 1 Bus ID 7
kzpsa0.7.0.4.1 dkb TPwr 1 Fast 1 Bus ID 7 N01 A10
jkb0.0.0.4.1 JKb0 TL810 1.20
mkb200.2.0.4.1 MKb200 TZ88 CC34
mkb300.3.0.4.1 MKb300 TZ88 CC34
mkb400.4.0.4.1 MKb400 TZ88 CC34
mkb500.5.0.4.1 MKb500 TZ88 CC34
polling ncr1 (NCR 53C810) slot 5, bus 0 PCI, hose 1 SCSI Bus ID 7
mkc0.0.0.5.1 MKc0 EXABYTE EXB-85058SQANXR1 07J0
mkc300.3.0.5.1 MKc300 TLZ09 0165
dkc400.4.0.5.1 DKc400 RZ28D 0010
dkc500.5.0.5.1 DKc500 RZ29B 0016
dkc600.6.0.5.1 DKc600 RZ29B 0016
polling floppy0 (FLOPPY) PCEB - XBUS hose 0
dva0.0.0.1000.0 DVA0 RX23
polling kzpsa1 (DEC KZPSA) slot 4, bus 0 PCI, hose 0 TPwr 1 Fast 1 Bus ID 7
kzpsa1.7.0.4.0 dkd TPwr 1 Fast 1 Bus ID 7 P01 A10
dkd100.1.0.4.0 DKd100 HSZ50 V50Z
dkd101.1.0.4.0 DKd101 HSZ50 V50Z
dkd200.2.0.4.0 DKd200 HSZ50 V50Z
dkd300.3.0.4.0 DKd300 HSZ50 V50Z
dkd301.3.0.4.0 DKd301 HSZ50 V50Z
dkd302.3.0.4.0 DKd302 HSZ50 V50Z
dkd400.4.0.4.0 DKd400 HSZ50 V50Z
jkd607.6.0.4.0 JKd607 DIGITAL ffff
polling kzpsa2 (DEC KZPSA) slot 5, bus 0 PCI, hose 0 TPwr 1 Fast 1 Bus ID 7
kzpsa2.7.0.5.0 dke TPwr 1 Fast 1 Bus ID 7 P01 A10
dke100.1.0.5.0 DKe100 HSZ50 V50Z
dke101.1.0.5.0 DKe101 HSZ50 V50Z
dke200.2.0.5.0 DKe200 HSZ50 V50Z
dke201.2.0.5.0 DKe201 HSZ50 V50Z
dke300.3.0.5.0 DKe300 HSZ50 V50Z
dke400.4.0.5.0 DKe400 HSZ50 V50Z
jke607.6.0.5.0 JKe607 DIGITAL ffff
polling pfi0 (DEC PCI FDDI) slot 2, bus 0 PCI, hose 1
fwa0.0.0.2.1: 00-00-F8-40-0A-09
polling tulip0 (DECchip 21040-AA) slot 3, bus 0 PCI, hose 1
ewa0.0.0.3.1 00-00-F8-22-82-90
P00>>>boot -fl i
(boot dkc600.6.0.5.1 -flags i)
Building FRU table
block 0 of dkc600.6.0.5.1 is a valid boot block
reading 32 blocks from dkc600.6.0.5.1
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 4000
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
OSF boot - Mon Jul 24 21:56:39 EDT 1995
Enter [kernel_name] [option_1 ... option_n]: genvmunix
Loading genvmunix ...
Current PAL Revision <0x4000200010113>
Switching to OSF PALcode Succeeded
New PAL Revision <0x4000e00020115>
Loading into KSEG Address Space
Sizes:
text = 5293968
data = 1350768
bss = 2798416
Starting at 0xfffffc00002394b0
Alpha boot: available memory from 0x2c08000 to 0x7fff6000
Digital UNIX V3.2G (Rev. 62); Thu Jul 18 22:21:57 EDT 1996
physical memory = 2048.00 megabytes.
available memory = 2003.92 megabytes.
using 7856 buffers containing 61.37 megabytes of memory
Master cpu at slot 0.
Firmware revision: 3.0
PALcode: Digital-UNIX/OSF version 1.21
AlphaServer 4100 5/300 0MB
pci1 at mcbus0 slot 5
psiop0 at pci1 slot 1
Loading SIOP: script c0001900, reg 4121300, data c000d838
scsi0 at psiop0 slot 0
rz5 at scsi0 bus 0 target 5 lun 0 (DEC RRD45 (C) DEC 0436)
fta0 DEC DEFPA FDDI Module, Hardware Revision 0
fta0 at pci1 slot 2
fta0: DMA Available.
fta0: DEC DEFPA (PDQ) FDDI Interface, Hardware address: 00-00-F8-40-0A-09
fta0: Firmware rev: 2.46
tu0: DECchip 21040-AA: Revision: 2.4
tu0 at pci1 slot 3
tu0: DEC TULIP Ethernet Interface, hardware address: 00-00-F8-22-82-90
tu0: console mode: selecting 10BaseT (UTP) port: half duplex: no link
pza0 at pci1 slot 4
asr = 0x200
cam_logger: CAM_ERROR packet
cam_logger: bus 1
spo_adap_reinit
Adapter State couldn't be set
ccfg_simattach(): SIM Attach Failed, sim_init() reported FAILURE - Retry.
pza0 not probed
pza in slot 4 not configured.
psiop1 at pci1 slot 5
Loading SIOP: script c0013900, reg 4121000, data c0027c38
cam_logger: CAM_ERROR packet
cam_logger: bus 2
psiop_pci_script_cntl
SCRIPTS did not start
ccfg_simattach(): SIM Attach Failed, sim_init() reported FAILURE - Retry.
psiop1 not probed
psiop in slot 5 not configured.
gpc0 at eisa0
pci0 at mcbus0 slot 4
eisa0 at pci0
ace0 at eisa0
ace1 at eisa0
lp0 at eisa0
fdi0 at eisa0
fd0 at fdi0 unit 0
vga0 at pci0 slot 3
PCXAL keyboard, language English (American)
1024x768 (S3TRIO )
pza0 at pci0 slot 4
asr = 0x200
cam_logger: CAM_ERROR packet
cam_logger: bus 1
spo_adap_reinittrap: invalid memory read access from kernel mode
faulting virtual address: 0x00000000000000b0
pc of faulting instruction: 0xfffffc0000529db0
ra contents at time of fault: 0xfffffc0000529d34
sp contents at time of fault: 0xffffffffb66f2da8
panic (cpu 0): kernel memory fault
DUMP: No primary swap, no explicit dumpdev.
Nowhere to put header, giving up.halted CPU 0
halt code = 5
HALT instruction executed
PC = fffffc0000454ad0
P00>>>
P00>>>
AlphaServer 4100 Console V3.0-10, 19-NOV-1996 13:57:07
P00>>>sho *
arc_enable OFF
auto_action HALT
boot_dev dkc600.6.0.5.1 dkc500.5.0.5.1
boot_file
boot_osflags a
boot_reset OFF
bootdef_dev dkc600.6.0.5.1 dkc500.5.0.5.1
booted_dev
booted_file
booted_osflags
cda0 dka500.5.0.1.1
char_set 0
com1_baud 9600
com1_flow SOFTWARE
com1_modem OFF
com2_baud 9600
com2_flow SOFTWARE
com2_modem OFF
console serial
cpu_enabled f
d_group field
d_omit
d_passes 1
d_runtime 0
d_verbose 0
dump_dev
enable_audit ON
ewa0_loop_count 3e8
ewa0_loop_inc a
ewa0_loop_patt ffffffff
ewa0_loop_size 2e
ewa0_lp_msg_node 1
ewa0_mode Twisted-Pair
exdep_data 5555555555555555
exdep_location 0
exdep_size 0
exdep_space pmem
exdep_type 3
fru_table ON
full_powerup_diags ON
fwa0_loop_count 3e8
fwa0_loop_inc a
fwa0_loop_patt ffffffff
fwa0_loop_size 2e
fwa0_lp_msg_node 1
graphics_background 4
graphics_foreground 7
graphics_page 0
graphics_switch -1
graphics_sync 0
graphics_type VIDEO
kbd_hardware_type LK411
kzpsa0_fast 1
kzpsa0_host_id 7
kzpsa0_termpwr 1
kzpsa1_fast 1
kzpsa1_host_id 7
kzpsa1_termpwr 1
kzpsa2_fast 1
kzpsa2_host_id 7
kzpsa2_termpwr 1
language 36
language_name English (American)
license MU
memory_test full
ocp_text BITTP1
os_type UNIX
pal OpenVMS PALcode V1.19-2, Digital UNIX PALcode V1.21-14
pci_arbmode Round-Robin
pci_parity ON
pci_req64
pka0_disconnect 1
pka0_fast 1
pka0_host_id 7
pkc0_disconnect 1
pkc0_fast 1
pkc0_host_id 7
prompt >>>
rcm_answer
rcm_dialout
rcm_init
reset_boot_arg0 0
reset_boot_arg1
reset_boot_arg2
sys_model_num 4100
sys_serial_num AY64912663
sys_type RACKMOUNT
tga_sync_green 0
tt_allow_login 1
tta0_page 0
tta0_type VIDEO
tty_dev 0
version V3.0-10, 19-NOV-1996 13:57:07
P00>>>
================================================================================
Note 522.1 Memory upgrade 1 of 2
PERFOM::HENNING 15 lines 6-MAR-1997 07:13
-< Interesting... >-
--------------------------------------------------------------------------------
Very interesting! A 2-CPU 2GB cacheless system may be somewhat rare.
(One would imagine that most people with the $$ to buy 2GB would also
buy cached CPUs.) So you might be building a configuration that has
had less testing than others.
The assertion that genvmunix from V3.2G works on your new disk but not
on the old disk is interesting. Could you do an ls -l on both of them
just to make sure that there aren't any surprises?
If they really are the same, you may need the help of the Unix hardware
support team. You could file a QAR addressed to their attention (set
host gorge, username QAR_INTERNAL), or you could visit their home page
and look for clues:
http://www.zk3.dec.com/uhs/
================================================================================
Note 522.2 Memory upgrade 2 of 2
HARMNY::CUMMINS 9 lines 6-MAR-1997 13:54
-< No SCSI adapters moved, removed, or added? >-
--------------------------------------------------------------------------------
I have forwarded your note off to the UNIX support group. A question
for you in the meantime, however. Are you sure you haven't moved your
KZPSA or other SCSI adapters at any time during your boot experiments?
In other words, only the memory config has changed?
The UNIX boot audit trail shows CAM errors on the KZPSA and NCR810 SCSI
adapters on PCI hose 1 (slots 4, 5) and ends up not configuring them.
BC
------- End of Forwarded Message