Title: | DECnet/OSI for OpenVMS |
Moderator: | TUXEDO::FONSECA |
Created: | Thu Feb 21 1991 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 3990 |
Total number of notes: | 19027 |
I've discovered a bug in DECnet/OSI, or something it calls, if MAXBUF is set to 65535 (and maybe other values).

Dave

<<< VAXAXP::NOTES$:[NOTES$LIBRARY]ALPHANOTES.NOTE;1 >>>
-< Alpha Support Conference - Digital Internal Use Only >-
================================================================================
Note 51.0  V6.1, nonpaged pool fills up with 65536 byte blocks  2 replies
SMAUG::GARROD "IBM Interconnect Engineering"  49 lines  25-FEB-1997 19:54
--------------------------------------------------------------------------------
We're running OpenVMS V6.1 on an AlphaServer 2000 node. Just lately the system has been having problems. Here are the symptoms.

Soon after boot the system consumes all of nonpaged pool, expanding up to NPAGEVIR. NPAGEVIR is 53 megabytes at present. Once it has done that, SET HOSTing into the node won't work, and quite understandably the whole system often hangs after that.

Doing a SHOW POOL/SUMMARY shows that most of the pool is consumed by "UNKNOWN" memory blocks, and that most of these are 65536 bytes long.

I've searched COMET looking for answers but have come up empty. There are plenty of customer reports of similar problems, but there never seems to be a concrete answer. I suspect something to do with DECnet/OSI, because if we don't run DECnet then the pool expansion doesn't seem to happen. The COMET searches also seem to implicate certain versions of PATHWORKS, but I've turned off PATHWORKS and the problem still occurs.

I've turned on the system parameter SYSTEM_CHECK to enable pool checking/poisoning, but have not managed to isolate the problem. A SHOW POOL/RING never seems to catch one of the 65536-byte memory allocations, so I suspect that somehow they're not actually being allocated, but that the free list is being corrupted.

I spent most of the afternoon working with the DECnet/OSI project leader on this problem because we suspected DNS and/or DECnet. I'm now up to the latest ECO of DECnet (V6.3 ECO6), but that didn't fix the problem.
We also installed the newest DECdns because we believed there were some memory problems with that in earlier kits.

Anybody got any ideas? It's made our node useless. It basically gets to the point where no pool is left, but limps along because there is enough pool for most allocation requests. But I guess REMACP needs a bigger block than is left.

What I really need to know is how I can find what's causing the 65536-byte blocks to appear in pool. By the way, looking at them, the first byte is always "01". The rest seems to be the pool poisoning pattern "a". And they seem to have a bunch of zeros at the end.

Any help would be most appreciated.

Dave

================================================================================
Note 51.2  V6.1, nonpaged pool fills up with 65536 byte blocks  2 of 2
EDSCLU::GARROD "IBM Interconnect Engineering"  23 lines  26-FEB-1997 09:24
-< Problem caused by MAXBUF being set to 65535 >-
--------------------------------------------------------------------------------
To answer my own question. I hope this is useful to others.

We've finally identified the problem. It was caused by a user changing the SYSGEN parameter MAXBUF from the default of 8192 to 65535. A WRITE CURRENT was done, and at the next reboot the problem occurred.

What was happening was that for every DECnet logical link, either DECnet or something it uses (maybe DECdns) was allocating a number of chunks of nonpaged pool of size 65536 and never freeing them. Thus each DECnet logical link was causing about half a megabyte of nonpaged pool to be depleted. It doesn't take long until you kill the system like this. I verified that an inbound MAIL link and an inbound SET HOST link both caused this pool usage. I'm not sure whether other sorts of DECnet links were also doing the same.

As I said in .0, we are running DECnet/OSI on the OpenVMS Alpha system. Now that we've fixed MAXBUF everything is working fine again.

Dave
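For anyone hitting the same symptoms, the diagnosis and the fix described above can be sketched as a condensed SDA/SYSGEN session. The command names are standard OpenVMS utilities, but the exact qualifiers and output vary by version, so treat this as an outline rather than a verified transcript:

```
$ ANALYZE/SYSTEM                ! examine the running system with SDA
SDA> SHOW POOL/SUMMARY          ! leaked 65536-byte blocks show up as UNKNOWN
SDA> SHOW POOL/RING             ! recent allocation history (needs SYSTEM_CHECK)
SDA> EXIT
$ MCR SYSGEN
SYSGEN> USE CURRENT             ! work on the parameter set used at boot
SYSGEN> SHOW MAXBUF             ! confirm it was raised to 65535
SYSGEN> SET MAXBUF 8192         ! restore the default cited in 51.2
SYSGEN> WRITE CURRENT           ! takes effect at the next reboot
SYSGEN> EXIT
```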
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
3890.1 | Have you determined if this happens with VMS V6.2 and DNV V6.3? | STEVMS::PETTENGILL | mulp | Thu Feb 27 1997 00:25 | 10 |
V6 included some new pool allocation logic in VMS, and it took a while to tune it so that it didn't consume all of pool. I think that there were some consumers of pool that made bad assumptions about how pool was allocated, or had latent bugs that also caused problems. Since I have a V6.2/V6.3 system handy, I'm setting it to the maximum value. I note that for V6.2, the maximum value for MAXBUF is 64000. Perhaps that indicates that by the time the pool allocation header and rounding are added on to 65535, the resulting value cannot be handled by the code that uses MAXBUF. |
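The header-plus-rounding reasoning in the last sentence can be made concrete with a quick DCL calculation. The 64-byte figure for the combined pool header and rounding overhead is a guess for illustration, not a documented value:

```
$ overhead = 64                         ! assumed pool header + rounding overhead
$ WRITE SYS$OUTPUT 65535 + overhead     ! 65599: no longer fits in a 16-bit byte count
$ WRITE SYS$OUTPUT 64000 + overhead     ! 64064: still safely below 65536
```

This would explain why the V6.2 limit of 64000 leaves headroom that a MAXBUF of 65535 does not.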