T.R | Title | User | Personal Name | Date | Lines |
---|
1115.1 | boot_reset = ON to avoid problem | AFW4::MAZUR | | Mon Feb 24 1997 20:36 | 38 |
| This has been seen by several customers and brought to the attention of
console engineering. It was registered as an IPMT, and there is a
fix in the upcoming release of the console. If you are really inconvenienced,
set the boot_reset environment variable to ON.
Here is the problem. The console code has a memory leak and a fragmentation
problem. The console has a limited amount of heap space, around 1MB to do its
work. Memory leaks are chunks of memory that get allocated, but then get
lost from the console's bookkeeping. Gradually the size of the console
heap gets smaller and smaller. Fragmentation is when the usage of the
memory is spotty and peppered about the heap. The longer a console lives
without being re-inited, the more fragmented the heap gets, and the
largest contiguous free chunk gets smaller and smaller.
The console pulls in overlays from the flash rom. These overlays are
compressed. The decompression algorithm requires 140KB of contiguous
memory to get an overlay into memory. That is why a shrunken heap problem
manifests itself mostly as a decompression error.
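To illustrate why a small leak can still cause a decompression error, here is a minimal Python sketch of the bookkeeping. It is not console code. The heap size (about 1MB) and the 140KB requirement come from the description above; the leaked block size, the block positions, and the function names are hypothetical.

```python
HEAP_SIZE = 1024 * 1024        # about 1MB of console heap, per the description above
DECOMPRESS_NEED = 140 * 1024   # contiguous space the decompressor needs


def free_runs(used, heap_size=HEAP_SIZE):
    """Return the sizes of the free gaps around allocated (start, size) blocks."""
    runs = []
    cursor = 0
    for start, size in sorted(used):
        if start > cursor:
            runs.append(start - cursor)
        cursor = max(cursor, start + size)
    if cursor < heap_size:
        runs.append(heap_size - cursor)
    return runs


def can_decompress(used, need=DECOMPRESS_NEED, heap_size=HEAP_SIZE):
    """True if any single free gap is large enough for the decompressor."""
    return any(run >= need for run in free_runs(used, heap_size))


# Hypothetical leaked blocks: 4KB each, one every 128KB through the heap.
leaked = [(offset, 4 * 1024) for offset in range(0, HEAP_SIZE, 128 * 1024)]

total_free = sum(free_runs(leaked))    # most of the heap is still free
largest_free = max(free_runs(leaked))  # but no single gap reaches 140KB
```

In this made-up layout only 32KB is lost in total, yet every free gap is 124KB, so a 140KB contiguous request fails even though nearly the whole heap is unused.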
One of the leaks is known to be in the console's KZPSA driver. After
many boots and show dev's, the console has been known to see this problem.
Customers with boot_reset ON are less likely to see a problem because
the console memory is totally reinitialized on every boot, so the
lost memory is recaptured and there are large contiguous chunks again.
Without the resets, the losses accrue as the heap ages.
Leaks are difficult to track down, but the work is ongoing to resolve
them. The console that is to be released soon watches
a low-on-heap threshold. If the problem is detected, the console will
reinitialize itself at boot time (equivalent to having boot_reset=ON),
resulting in a clean, nicely re-formed heap, avoiding the potential for
a decompression error, and allowing the boot to proceed correctly.
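The decision described above could be sketched roughly as follows in Python. This is only an illustration: what the console actually measures and the real threshold value are not stated here, so the sketch assumes it compares the largest contiguous free chunk against the 140KB the decompressor needs. The function and parameter names are hypothetical.

```python
DECOMPRESS_NEED = 140 * 1024  # contiguous bytes the decompressor needs (from the note)


def must_reset_before_boot(largest_free_chunk, boot_reset_on,
                           threshold=DECOMPRESS_NEED):
    """Decide whether the console should reinitialize itself before booting.

    Assumption: "low on heap" means the largest contiguous free chunk has
    dropped below the threshold. The real check may differ.
    """
    if boot_reset_on:
        # boot_reset=ON always reinitializes, regardless of heap state.
        return True
    return largest_free_chunk < threshold
```

With boot_reset OFF, a console whose largest free chunk is 100KB would reset first, while one with 200KB free would boot directly.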
|
1115.2 | | CSC32::BLAYLOCK | If at first you doubt,doubt again. | Fri Feb 28 1997 13:13 | 43 |
|
With the new V4.8-6 console code, should we be getting multiple
messages of this type? At least it does not hang anymore...
P00>>>b
(boot dkg100.1.0.4.3 -flags a)
SRM boot identifier: scsi 3 4 0 1 100 ef00 81011
boot adapter: kzpsa6 rev 0 in bus slot 4 off of kftha0 in TLSB slot 8
block 0 of dkg100.1.0.4.3 is a valid boot block
reading 16 blocks from dkg100.1.0.4.3
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 2000
initializing HWRPB at 2000
initializing page table at 1f2000
initializing machine state
setting affinity to the primary CPU
Configuring I/O adapters...
tulip0, slot 0, bus 0, hose2
Unable to malloc memory for Decompression for NET
kzpsa0, slot 2, bus 0, hose2
kzpsa1, slot 4, bus 0, hose2
kzpsa2, slot 8, bus 0, hose2
kzpsa3, slot 9, bus 0, hose2
kzpsa4, slot 10, bus 0, hose2
kzpsa5, slot 11, bus 0, hose2
tulip1, slot 12, bus 0, hose3
Unable to malloc memory for Decompression for NET
floppy0, slot 0, bus 1, hose3
kzpsa6, slot 4, bus 0, hose3
kzpsa7, slot 8, bus 0, hose3
kzpsa8, slot 9, bus 0, hose3
kzpsa9, slot 10, bus 0, hose3
pfi0, slot 11, bus 0, hose3
jumping to bootstrap code
Digital UNIX boot - Mon Aug 19 21:02:26 EDT 1996
Loading vmunix ...
Loading at fffffc0000230000
Current PAL Revision <0x10000400010113>
Switching to OSF PALcode Succeeded
New PAL Revision <0x10000400020115>
|
1115.3 | | AFW3::MAZUR | | Sun Mar 02 1997 21:42 | 2 |
| This is still not good, but at least you are in business.
|
1115.4 | | CSC64::BLAYLOCK | If at first you doubt,doubt again. | Mon Mar 03 1997 11:14 | 6 |
|
So is this expected or a new side effect?
I realize that we are still able to boot (a good thing ;-)
but with the error messages continuing, my customer will
ask about the viability of the fix.
|
1115.5 | Work is ongoing | MAY30::AMATO | Bob Amato | Tue Mar 04 1997 10:01 | 12 |
|
Hello,
As stated in .1, work on this problem is ongoing.
Does the problem in .2 occur after an init? Or after
several reboot/shutdown cycles? The output from
a "show config" on this system would be helpful.
Thanks,
Bob
|
1115.6 | | CSC64::BLAYLOCK | If at first you doubt,doubt again. | Wed Mar 12 1997 16:47 | 69 |
| Sorry about taking so long to get back to you.
The problem occurs after a number of shutdown/reboot cycles;
shutdown -r brings it out more than anything else.
Here is the show config on the 8200 system that we have.
The unknowns include some FORE Systems ATM cards (2) and
some PT334 (SS7) cards.
P08>>>show config
Name Type Rev Mnemonic
TLSB
4++ KN7CC-AB 8014 0000 kn7cc-ab0
5++ KN7CC-AB 8014 0000 kn7cc-ab1
7+ MS7CC 5000 0000 ms7cc0
8+ KFTHA 2000 0D02 kftha0
C0 PCI connected to kftha0 pci0
0+ DECchip 21041-AA 141011 0011 tulip0
1+ KZPSA 81011 0000 kzpsa0
2+ KZPSA 81011 0000 kzpsa1
4+ KZPSA 81011 0000 kzpsa2
5+ ????? 3341214 0010 unknown0
6+ ????? 3001127 0000 unknown1
7+ KZPSA 81011 0000 kzpsa3
8+ KZPSA 81011 0000 kzpsa4
9+ KZPSA 81011 0000 kzpsa5
A+ KZPSA 81011 0000 kzpsa6
C1 PCI connected to kftha0 pci1
0+ SIO 4828086 0005 sio0
4+ KZPSA 81011 0000 kzpsa7
5+ ????? 3341214 0010 unknown2
6+ ????? 3001127 0000 unknown3
B+ DEC PCI FDDI F1011 0000 pfi0
Controllers on SIO sio0
0+ DECchip 21040-AA 21011 0024 tulip1
1+ FLOPPY 2 0000 floppy0
2+ KBD 3 0000 kbd0
3+ MOUSE 4 0000 mouse0
EISA connected to pci1 through sio0 eisa0
P08>>>
kzpsa1, slot 2, bus 0
kzpsa2, slot 4, bus 0
kzpsa3, slot 7, bus 0
kzpsa4, slot 8, bus 0
kzpsa5, slot 9, bus 0
kzpsa6, slot 10, bus 0
tulip1, slot 12, bus 0
floppy0, slot 0, bus 1
kzpsa7, slot 4, bus 0
pfi0, slot 11, bus 0
jumping to bootstrap code
Digital UNIX boot - Mon Aug 19 21:02:26 EDT 1996
Loading vmunix ...
Loading at fffffc0000230000
Current PAL Revision <0x10000500010112>
Switching to OSF PALcode Succeeded
New PAL Revision <0x10000300020115>
Sizes:
text = 3185632
data = 456080
|
1115.7 | | AFW3::MAZUR | | Thu Mar 13 1997 07:56 | 5 |
| We have identified the problem and are working on the solution.
The problem is multiplied by the number of KZPSAs. The best
workaround right now is to have boot_reset ON.
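As a rough illustration of "multiplied by the number of KZPSAs", the Python sketch below counts the KZPSA mnemonics in a few lines of the show config output from .6 and scales a per-adapter leak by that count. The leak size per adapter per boot is not known from this note, so it is left as a parameter, and the function names are hypothetical.

```python
import re

# A few lines excerpted from the "show config" output posted in .6.
SHOW_CONFIG = """\
0+ DECchip 21041-AA 141011 0011 tulip0
1+ KZPSA 81011 0000 kzpsa0
2+ KZPSA 81011 0000 kzpsa1
4+ KZPSA 81011 0000 kzpsa2
5+ ????? 3341214 0010 unknown0
7+ KZPSA 81011 0000 kzpsa3
"""


def count_kzpsa(config_text):
    """Count distinct kzpsaN mnemonics in show config output."""
    return len(set(re.findall(r"\bkzpsa\d+\b", config_text)))


def leaked_after(boots, adapters, leak_per_adapter):
    """Total bytes lost if each adapter leaks the same amount on every boot.

    Assumption: the leak is constant per adapter per boot. The real
    driver behavior is not described in this note.
    """
    return boots * adapters * leak_per_adapter
```

For a fixed per-adapter leak, doubling the number of KZPSAs halves the number of boots it takes to lose the same amount of heap.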
|
1115.8 | Another problem | NNTPD::"[email protected]" | Carlos Leitao | Fri Apr 18 1997 07:52 | 43 |
| Hi
Because we had the problem mentioned in this note, we installed the new
version as soon as we received it.
At this moment the new message happens after several reboots and it is:
CPU 0 booting
Innsufficient Heap for overlay decompression.
System will be reset prior to boot.
halted CPU 1
CPU 2 is not halted
CPU 3 is not halted
CPU 4 is not halted
CPU 5 is not halted
CPU 6 is not halted
CPU 7 is not halted
halt code = 1
operator initiated halt
PC = fffffc000039e53c
..........................
The same for all CPU's
..........................
halted CPU 5
operator initiated halt
PC = fffffc0000264560
insufficient dynamic memory for a request of 40960 bytes
PID bytes name
-------- ---------- ----
00000000 35424 ????
00000001 40672 idle
00000002 800 dead_eater
00000003 800 poll
00000004 800 timer
Thanks for your help
Carlos
[Posted by WWW Notes gateway]
|
1115.9 | | AFW3::MAZUR | | Fri Apr 18 1997 10:45 | 13 |
|
>Hi
>Because we had the problem mentioned in this note, we installed the new
>version as soon as we received it.
>At this moment the new message happens after several reboots and it is:
>
>CPU 0 booting
>
>Innsufficient Heap for overlay decompression.
>System will be reset prior to boot.
Was boot_reset ON?
|
1115.10 | reply | NNTPD::"[email protected]" | carlos leitao | Wed Apr 23 1997 10:47 | 7 |
| Hi
Yes, the console variable boot_reset is ON.
CL
[Posted by WWW Notes gateway]
|
1115.11 | | MAY30::MAZUR | | Thu Apr 24 1997 09:41 | 19 |
| The cause of this problem has been fixed. All the leaks in the KZPSA
driver have been plugged.
Until the next release, you can avoid this problem by rebooting your
system using a two-step process.
Instead of:
$ REBOOT
use
$ SHUTDOWN
>>> b
The command line boot will recognize that boot_reset is ON, reinitialize
the system, and clean up the previous leaks.
|
1115.12 | Any date ? | NNTPD::"[email protected]" | Carlos Leitao | Thu Apr 24 1997 13:31 | 10 |
| Hi
Can you tell me when the next release is going to be available?
The two-step method doesn't work on that particular configuration.
They need to do a reboot.
Thanks for your help
Best Regards
Carlos Leitao
[Posted by WWW Notes gateway]
|
1115.13 | | HARMNY::MAZUR | | Thu Apr 24 1997 17:17 | 2 |
| The new V4.0 Firmware CD ship date is 7/2/97.
|