T.R | Title | User | Personal Name | Date | Lines |
---|
657.1 | Escalate Formally, If Urgent... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Wed May 28 1997 13:06 | 131 |
|
A 1 GB AlphaServer 2100 5/250, running OpenVMS Alpha V6.2-1H3...
What is the baseline for this system when running "normally"?
The average CPU, memory, I/O, etc., loading... Does this
average loading differ from the loading when the hangs arise?
:We have a very urgent problem on a customer's dual Alphaserver cluster.
If urgent, then I'd suggest formal escalation via an IPMT...
:Yesterday morning we noticed one of the nodes had run out of pagefile, and
:processes (largely RDB server processes) were going into RWMPB state.
:Shortly afterwards, the system hung up. After about two hours, it suddenly
:came back, and everything was OK.
I'd guess this system was memory-wedged...
:While it was "hung", we could do an occasional SHOW SYS and MON CLUS from
:the other node, and nothing was happening on the system. There was hardly
:any CPU usage or disk IO, in fact nothing of note, just a whole bag of
:processes in Resource Wait.
:This morning, exactly the same thing happened, but on the other cluster
:member. Again, the system hung for around two hours, then just as we
:decided to hit the reset button, it came back. This hadn't happened before
:this week.
:What I'm really after here is some idea of where to look for the problem
:- what are the most likely causes of this kind of thing.
:Here's a few system stats. This is serious - the customer's going to get
:nasty if we don't sort it out soon.
I'd suggest an IPMT, then.
:$sh cpu/ful
:
:TSLV12, a AlphaServer 2100 5/250
...
When the problem next occurs, encourage these folks to force a
crashdump, and get a copy of the dump file for analysis here in
OpenVMS engineering.
System Memory Resources on 27-MAY-1997 09:57:47.61
:Physical Memory Usage (pages): Total Free In Use Modified
: Main Memory (1024.00Mb) 131072 1153 92109 37810
This system is very constrained by available memory. And this system
appears to have an unusually large modified list in proportion to the
free memory. I'd strongly suggest more memory, or smaller working
sets, or fewer inswapped processes, or some combination of these
factors...
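To put rough numbers on that, here is a minimal Python sketch using the page counts from the SHOW MEMORY display quoted above. The 8192-byte page size is an assumption inferred from 131072 pages making up 1024 MB, and the variable names are purely illustrative.

```python
# Page counts copied from the SHOW MEMORY output quoted above.
TOTAL_PAGES = 131072
FREE_PAGES = 1153
IN_USE_PAGES = 92109
MODIFIED_PAGES = 37810

# Assumed page size: 1024 MB / 131072 pages = 8192 bytes per page.
PAGE_BYTES = 8192


def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


free_pct = percent(FREE_PAGES, TOTAL_PAGES)
modified_pct = percent(MODIFIED_PAGES, TOTAL_PAGES)
modified_to_free = round(MODIFIED_PAGES / FREE_PAGES, 1)
free_mb = FREE_PAGES * PAGE_BYTES / (1024 * 1024)

print(f"free list:     {free_pct}% of memory ({free_mb:.1f} MB)")
print(f"modified list: {modified_pct}% of memory")
print(f"modified list is {modified_to_free}x the size of the free list")
```

Under that assumption, less than 1% of physical memory is on the free list, while the modified list holds more than a quarter of it.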
...
:Slot Usage (slots): Total Free Resident Swapped
: Process Entry Slots 1143 1009 134 0
: Balance Set Slots 500 368 132 0
These values look rather unusual -- I'd assume there is some tuning
"cruft" in MODPARAMS.DAT, something that caused a large number of
processes to be requested.
Dynamic Memory Usage (bytes): Total Free In Use Largest
Nonpaged Dynamic Memory 47218688 16411584 30807104 1018560
Paged Dynamic Memory 48087040 12892144 35194896 12872656
:Paging File Usage (blocks): Free Reservable Total
: DISK$DRS_PAGE_2:[SYSEXE]SWAPFILE_TSLV13.SYS;1
: 499968 499968 499968
: DISK$DRS_PAGE_2:[SYSEXE]PAGEFILE_TSLV13.SYS;1
: 0 -1931152 2999936
:Of the physical pages in use, 11598 pages are permanently allocated to OpenVMS.
Your pagefile is badly overcommitted -- you need more pagefile
configured, or more physical memory added, or fewer processes.
(You're certainly not forcing processes out -- it's possible
you might want to encourage some swapping of idle processes, to
try to free up some memory. This is a short-term fix, pending
system and workload adjustments.) I'd look at configuring the
additional pagefile on another disk spindle, under the assumption
that spreading the paging I/O loading is preferred...
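As a rough illustration of how far the pagefile is overcommitted, the following Python sketch works from the paging file figures quoted above. It assumes that the "Reservable" column is the file size minus the space already promised to processes, so that a negative value means more has been promised than the file can hold. That reading is an assumption, as are the variable names.

```python
# Figures copied from the SHOW MEMORY paging file display quoted above (blocks).
PAGEFILE_TOTAL = 2999936
PAGEFILE_FREE = 0
PAGEFILE_RESERVABLE = -1931152

BLOCK_BYTES = 512  # size of one disk block

# Assumption: Reservable = Total - (space reserved by processes).
reserved = PAGEFILE_TOTAL - PAGEFILE_RESERVABLE
overcommit_ratio = round(reserved / PAGEFILE_TOTAL, 2)

# Extra pagefile space needed just to bring Reservable back up to zero.
shortfall_mb = round(-PAGEFILE_RESERVABLE * BLOCK_BYTES / (1024 * 1024))

print(f"reserved:   {reserved} blocks")
print(f"overcommit: {overcommit_ratio}x the pagefile size")
print(f"shortfall:  about {shortfall_mb} MB")
```

On these figures, simply breaking even on reservations would take roughly another 1931152 blocks of pagefile, before allowing any headroom for growth.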
:NatDRP> sh sys
:
:OpenVMS V6.2-1H3 on node TSLV13 27-MAY-1997 09:58:51.49 Uptime 8 01:57:49
: Pid Process Name State Pri I/O CPU Page flts Pages
:20600801 SWAPPER HIB 16 0 0 00:00:53.50 0 0
:20600805 CONFIGURE HIB 10 18 0 00:01:09.71 199 14
:...
20607043 RDBSERVER_2170 LEF 6 1172 0 00:00:00.96 1592 171 N
2060704A RDBSERVER_2186 LEF 6 65857 0 00:00:40.19 26808 508 N
2060606E RDBSERVER_33440 LEF 6 68636 0 00:00:40.84 25591 82 N
20607071 RDBSERVER_2210 LEF 6 2818 0 00:00:02.26 2478 331 N
..
2060687A RDBSERVER_2217 LEF 6 960 0 00:00:00.70 1082 22 N
2060707C RDBSERVER_26794 LEF 6 842 0 00:00:00.63 1521 203 N
2060707D RDBSERVER_33704 LEF 6 764 0 00:00:00.87 1093 363 N
..
2060709A RDBSERVER_2234 LEF 6 1295 0 00:00:01.21 1591 126 N
2060709B RDBSERVER_2235 LEF 6 212 0 00:00:00.26 611 26 N
2060709D RDBSERVER_2236 LEF 6 2290 0 00:00:01.71 1521 24 N
2060709F RDBSERVER_33721 LEF 6 70148 0 00:00:53.33 38766 441 N
206060AA RDBSERVER_33476 LEF 6 217 0 00:00:00.29 642 18 N
206070AB RDBSERVER_33477 LEF 4 2970 0 00:00:02.16 2136 194 N
206068AC RDBSERVER_33479 LEF 6 239 0 00:00:00.29 695 20 N
...
See if there are any images that can be installed /SHARE -- to try to
reduce the memory requirements. This AlphaServer 2100 looks pretty
heavily loaded... And in particular, see if any of those Rdb server
processes have component images that can be installed /SHARE...
:Sorry it's such a long note, but this really is a serious one!
Again, I'd suggest an IPMT.
As for SYSGEN, find out what is in MODPARAMS.DAT, and find out
when the last pass of AUTOGEN -- with FEEDBACK -- was run. If
this tool is not used regularly, then clean the "cruft" out of
MODPARAMS.DAT, re-run AUTOGEN with FEEDBACK, and reboot.
I'd also encourage the use of DECamds, as this tool can often
be very useful for identifying and managing these sorts of
problems...
And I'd recommend more memory... A faster processor... Etc...
|
657.2 | Thanks | ROBSON::drspc8.reo.dec.com::Warne | Systems from Heaven | Wed May 28 1997 14:08 | 296 |
| Thanks for the reply.
The system is always pretty heavily loaded, but it has got heavier over the past few weeks as more sites have connected
to it.
Autogen hasn't been used to set the parameters directly, but parameters have been changed on the basis of Autogen
reports.
I ran an Autogen (testfiles) yesterday, and one value hardcoded in MODPARAMS which worries me is
VIRTUALPAGECNT = 550000. This seems rather low - could it cause problems?
I've tagged the Agen report file at the end of this note.
*
Could you confirm to me how to force a crash on the Alpha? I guess if it's hung, you need to hit the reset button, then
write something invalid into the memory to make it crash, but I'm not sure of the exact procedure.
*
Again, many thanks.
Chris
Old values below are the parameter values at the time of collection.
The feedback data is based on 219 hours of up time.
Feedback information will be used in the subsequent calculations
Parameter information follows:
------------------------------
MAXPROCESSCNT parameter information:
Feedback information.
Old value was 1143, New value is 914
Maximum Observed Processes: 188
VIRTUALPAGECNT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 2105344. The value 550000
will be used in accordance with the following requirements:
VIRTUALPAGECNT has been specified by a hard-coded value of 550000.
Information on OpenVMS executable image Processing:
Processing SYS$MANAGER:VMS$IMAGES_MASTER.DAT
Total global pagelets counted = 41393
Total global sections counted = 121
Total resident code pages counted = 320
Total resident data pages counted = 0
GBLPAGFIL parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 128. The value 32768
will be used in accordance with the following requirements:
GBLPAGFIL has been increased by 768.
GBLPAGFIL minimum value is 32768.
GBLPAGES parameter information:
Feedback information.
Old value was 681600, New value is 681600
Current used GBLPAGES: 131056
Global buffer requirements: 32768
GBLSECTIONS parameter information:
Feedback information.
Old value was 700, New value is 700
Current used GBLSECTIONS: 477
- AUTOGEN parameter calculation has been overridden.
The calculated value was 570. The value 700
will be used in accordance with the following requirements:
GBLSECTIONS has been increased by 180.
GBLSECTIONS minimum value is 700.
LOCKIDTBL parameter information:
Feedback information.
Old value was 73728, New value is 96256
Current number of locks: 102390
Peak number of locks: 120832
LOCKIDTBL_MAX parameter information:
Feedback information.
Old value was 262144, New value is 262144
- AUTOGEN parameter calculation has been overridden.
The calculated value was 157081. The value 262144
will be used in accordance with the following requirements:
LOCKIDTBL_MAX minimum value is 262144.
RESHASHTBL parameter information:
Feedback information.
Old value was 65535, New value is 65536
Current number of resources: 59632
TMSCP_LOAD parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 0. The value 1
will be used in accordance with the following requirements:
TMSCP_LOAD has been specified by a hard-coded value of 1.
MSCP_BUFFER parameter information:
Feedback information.
Old value was 2048, New value is 2048
MSCP server I/O rate: 0 I/Os per 10 sec.
I/Os that waited for buffer space: 0
I/Os that fragmented into multiple transfers: 0
SCSCONNCNT parameter information:
Feedback information.
Old value was 40, New value is 40
Peak number of nodes: 2
Number of CDT allocation failures: 0
SCSRESPCNT parameter information:
Feedback information.
Old value was 300, New value is 300
RDT stall count: 0
SCSBUFFCNT parameter information:
Feedback information.
Old value was 4096, New value is 4096
CIBDT stall count: 0
SHADOW_MAX_COPY parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 4. The value 18
will be used in accordance with the following requirements:
SHADOW_MAX_COPY has been specified by a hard-coded value of 18.
NPAGEDYN parameter information:
Feedback information.
Old value was 45744128, New value is 48791552
Maximum observed non-paged pool size: 49807360 bytes.
Non-paged pool request rate: 61 requests per 10 sec.
LNMSHASHTBL parameter information:
Feedback information.
Old value was 2048, New value is 1280
Current number of shareable logical names: 1640
- AUTOGEN parameter calculation has been overridden.
The calculated value was 1024. The value 1280
will be used in accordance with the following requirements:
LNMSHASHTBL minimum value is 1280.
BALSETCNT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 912. The value 500
will be used in accordance with the following requirements:
BALSETCNT has been specified by a hard-coded value of 500.
ACP_DIRCACHE parameter information:
Feedback information.
Old value was 24576, New value is 24576
Hit percentage: 99%
Attempt rate: 1117 attempts per 10 sec.
- AUTOGEN parameter calculation has been overridden.
The calculated value was 3000. The value 24576
will be used in accordance with the following requirements:
ACP_DIRCACHE has been specified by a hard-coded value of 24576.
ACP_DINDXCACHE parameter information:
Feedback information.
Old value was 8192, New value is 8192
Hit percentage: 100%
Attempt rate: 456 attempts per 10 sec.
- AUTOGEN parameter calculation has been overridden.
The calculated value was 750. The value 8192
will be used in accordance with the following requirements:
ACP_DINDXCACHE has been specified by a hard-coded value of 8192.
ACP_HDRCACHE parameter information:
Feedback information.
Old value was 24576, New value is 24576
Hit percentage: 97%
Attempt rate: 776 attempts per 10 sec.
- AUTOGEN parameter calculation has been overridden.
The calculated value was 3000. The value 24576
will be used in accordance with the following requirements:
ACP_HDRCACHE has been specified by a hard-coded value of 24576.
ACP_MAPCACHE parameter information:
Feedback information.
Old value was 1024, New value is 1024
Hit percentage: 80%
Attempt rate: 0 attempts per 10 sec.
PAGEDYN parameter information:
Feedback information.
Old value was 48087040, New value is 47955968
Current paged pool usage: 35128128 bytes.
Paged pool request rate: 118 requests per 10 sec.
PFRATH parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 8. The value 4
will be used in accordance with the following requirements:
PFRATH has been specified by a hard-coded value of 4.
WSDEC parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 4000. The value 37
will be used in accordance with the following requirements:
WSDEC has been specified by a hard-coded value of 37.
DUMPSTYLE parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 1. The value 0
will be used in accordance with the following requirements:
DUMPSTYLE has been specified by a hard-coded value of 0.
FREEGOAL parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 2000. The value 520
will be used in accordance with the following requirements:
FREEGOAL has been specified by a hard-coded value of 64.
MPW_LOLIMIT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 1500. The value 2048
will be used in accordance with the following requirements:
MPW_LOLIMIT minimum value is 2048.
PROCSECTCNT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 32. The value 128
will be used in accordance with the following requirements:
PROCSECTCNT has been specified by a hard-coded value of 128.
VAXCLUSTER parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 1. The value 2
will be used in accordance with the following requirements:
VAXCLUSTER has been specified by a hard-coded value of 2.
EXPECTED_VOTES parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 5. The value 3
will be used in accordance with the following requirements:
EXPECTED_VOTES has been specified by a hard-coded value of 3.
VOTES parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 1. The value 2
will be used in accordance with the following requirements:
VOTES has been specified by a hard-coded value of 2.
RMS_DFNBC parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 8. The value 64
will be used in accordance with the following requirements:
RMS_DFNBC has been specified by a hard-coded value of 64.
GH_EXEC_CODE parameter information:
Feedback information.
Old value was 512, New value is 512
GH_EXEC_DATA parameter information:
Feedback information.
Old value was 128, New value is 112
GH_RES_CODE parameter information:
Feedback information.
Old value was 512, New value is 512
GH_RES_DATA parameter information:
Feedback information.
Old value was 0, New value is 0
Page, Swap, and Dump file calculations
--------------------------------------
Page and Swap file calculations:
--------------------------------
PAGEFILE information:
Feedback information.
Old value was 3000000, New value is 3000000
Maximum observed usage: 2159280
Override Information - parameter calculation has been overridden.
The calculated value was 3238900. The new value is 3000000.
PAGEFILE calculation has been set to current size by user.
PAGEFILE will not be modified. The file size is within 10%.
SWAPFILE information:
Feedback information.
Old value was 500000, New value is 500000
Maximum observed usage: 5120
Override Information - parameter calculation has been overridden.
The calculated value was 234000. The new value is 500000.
SWAPFILE calculation has been set to current size by user.
SWAPFILE will not be modified. The file size is within 10%.
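A report of this length is easier to review if the overrides are pulled out mechanically. The following is a minimal Python sketch, not an existing tool, that scans report text for the "calculated value was ... The value ..." lines and lists the parameters that are being held below what AUTOGEN calculated. The function and variable names are hypothetical, and the sample text is a handful of stanzas copied from the report above.

```python
import re

# A few stanzas copied from the AUTOGEN report above.
REPORT = """\
VIRTUALPAGECNT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 2105344. The value 550000
will be used in accordance with the following requirements:
VIRTUALPAGECNT has been specified by a hard-coded value of 550000.
SHADOW_MAX_COPY parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 4. The value 18
will be used in accordance with the following requirements:
SHADOW_MAX_COPY has been specified by a hard-coded value of 18.
BALSETCNT parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 912. The value 500
will be used in accordance with the following requirements:
BALSETCNT has been specified by a hard-coded value of 500.
WSDEC parameter information:
- AUTOGEN parameter calculation has been overridden.
The calculated value was 4000. The value 37
will be used in accordance with the following requirements:
WSDEC has been specified by a hard-coded value of 37.
"""

HEADER = re.compile(r"^(\w+) parameter information:")
OVERRIDE = re.compile(r"The calculated value was (\d+)\. The value (\d+)")


def find_overrides(report):
    """Return a list of (parameter, calculated, used) tuples."""
    overrides = []
    current = None
    for line in report.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = header.group(1)
            continue
        hit = OVERRIDE.search(line)
        if hit and current:
            overrides.append((current, int(hit.group(1)), int(hit.group(2))))
    return overrides


overrides = find_overrides(REPORT)
# Parameters whose value in use is lower than AUTOGEN's own calculation.
held_below = [entry for entry in overrides if entry[2] < entry[1]]

for name, calculated, used in held_below:
    print(f"{name}: using {used}, AUTOGEN calculated {calculated}")
```

Whether any particular override is harmful is a separate question. The sketch only narrows down which entries deserve a second look.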
|
657.3 | Clean MODPARAMS, ReAUTOGEN, Reboot, Buy Memory | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Wed May 28 1997 14:49 | 111 |
|
:Thanks for the reply.
(Text wrapped for width...)
:The system is always pretty heavily loaded, but it has got heavier over the
:past few weeks as more sites have connected to it.
It looks like you have hit a knee in the curve, then... :-)
:Autogen hasn't been used to set the parameters directly, but parameters have
:been changed on the basis of Autogen reports.
OK -- I'm looking to avoid cases where folks have not used AUTOGEN
to make changes, to avoid cases where folks have made direct SYSGEN
changes, and to avoid cases where folks have constrained AUTOGEN
through site-specific tuning entries ("cruft") in MODPARAMS.DAT.
:I ran an Autogen (testfiles) yesterday, and one value hardcoded in
:MODPARAMS which worries me is VIRTUALPAGECNT = 550000. This seems rather
:low - could it cause problems?
It could, but the problems caused by insufficient virtual page
count settings do not usually cause the sorts of problems reported
here... (Most applications usually slam into the lesser of the
VIRTUALPAGECNT and the PGFLQUOTA settings, and merely "fall over".
It's certainly possible Rdb is smarter about this, but you'll want
to check the Oracle Rdb documentation for suggested settings for
these and other parameters...)
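For a sense of scale, here is a small Python sketch of the "lesser of the two" rule and of what the two VIRTUALPAGECNT values from the AUTOGEN report amount to. It assumes both settings are expressed in 512-byte pagelets, which should be checked against the documentation for this version. The function names and the sample PGFLQUOTA value are hypothetical.

```python
PAGELET_BYTES = 512  # assumed size of one pagelet


def effective_limit_pagelets(virtualpagecnt, pgflquota):
    """A process hits whichever of the two limits is lower."""
    return min(virtualpagecnt, pgflquota)


def pagelets_to_mb(pagelets):
    """Convert a pagelet count to megabytes, rounded to one decimal."""
    return round(pagelets * PAGELET_BYTES / (1024 * 1024), 1)


HARD_CODED = 550000    # VIRTUALPAGECNT from MODPARAMS.DAT
CALCULATED = 2105344   # value AUTOGEN would have chosen

print(f"hard-coded: {pagelets_to_mb(HARD_CODED)} MB per process")
print(f"calculated: {pagelets_to_mb(CALCULATED)} MB per process")

# Hypothetical process with a page file quota of 1,000,000 pagelets:
# the hard-coded VIRTUALPAGECNT is the limit it reaches first.
print(effective_limit_pagelets(HARD_CODED, 1000000))
```

If the pagelet assumption holds, the hard-coded value allows each process roughly a quarter of the virtual address space that AUTOGEN would have allowed.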
:I've tagged the Agen report file at the end of this note.
You might also want to post the contents of MODPARAMS.DAT.
I'd also determine who suggested the PFRATH and WSDEC settings
shown in the autogen report, and who suggested the FREEGOAL
parameter settings, as I would tend to leave these to the default
settings calculated by AUTOGEN. (If one is not careful with
these particular parameters, one can encounter thrashing...)
I'd also look at the lock tables, as Rdb can definitely consume
a large number of locks, and I'd expect it has specific requirements
around these and other parameters. (The values for SYSGEN and
for SYSUAF settings are called out in the Rdb documentation for
the particular version of Rdb in use.)
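As a starting point for that check, the lock figures from the AUTOGEN report can be compared directly. This Python sketch assumes that LOCKIDTBL sets the initial size of the lock ID table and that LOCKIDTBL_MAX caps its growth. The variable names are illustrative only.

```python
# Figures copied from the AUTOGEN feedback report posted earlier.
LOCKIDTBL_OLD = 73728
LOCKIDTBL_NEW = 96256
LOCKIDTBL_MAX = 262144
CURRENT_LOCKS = 102390
PEAK_LOCKS = 120832

# Does the observed peak exceed the initial table size AUTOGEN proposes?
peak_exceeds_initial = PEAK_LOCKS > LOCKIDTBL_NEW

# How much of the assumed ceiling has the peak consumed?
peak_pct_of_max = round(100.0 * PEAK_LOCKS / LOCKIDTBL_MAX, 1)

print(f"peak exceeds proposed LOCKIDTBL: {peak_exceeds_initial}")
print(f"peak is {peak_pct_of_max}% of LOCKIDTBL_MAX")
```

On these numbers the table has to grow past its initial size in normal running, but the peak is still less than half of the configured maximum.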
And I'd also tend to replace any sorts of PAGEFILE=3000000 and
SWAPFILE=500000 "absolute assignments" in MODPARAMS with either
MIN_PAGEFILE and MIN_SWAPFILE settings, or with PAGEFILE=0 and
SWAPFILE=0 settings. I tend to prefer the latter, as it tells
AUTOGEN to avoid these files, and I can adjust these settings
(usually upward) to match local requirements manually...
:Could you confirm to me how to force a crash on the Alpha? I guess if it's
:hung, you need to hit the reset button, then write something invalid into
:the memory to make it crash, but I'm not sure of the exact procedure.
Given the DUMPSTYLE setting, make sure you have bootstrapped
with a sufficiently large dump file -- the current setting of
DUMPSTYLE saves all of physical memory...
See below for the generic directions on how to crash an Alpha,
and write a dump file...
From the OpenVMS installation and upgrade manual (available to DIGITAL
internal users via the URL http://axiom.zko.dec.com:8000/docset/):
A.3.2.3 Emergency Shutdown with Crash Commands
Use crash commands only if the system is "hung" (stops responding to any commands) and you cannot log in to
the SYSTEM account to use the SHUTDOWN.COM procedure or the OPCCRASH.EXE program.
Note: The method described here works on all Alpha computers. However, on certain systems, you can force
your processor to fail (crash) by entering a specific console command. See the hardware manuals that came with
your computer for that information.
To force your processor to fail, do the following:
1. Halt the system by entering Ctrl/P or by pressing the Halt button. (See Section A.3.1 for more
information about how to halt your Alpha computer.)
2. To examine processor registers, enter the following commands and press the Return key:
>>> E -N F R0
>>> E PS
The system displays the contents of the registers. Write down these values if you want to save
information about the state of the system.
3. Enter the following commands and press the Return key:
>>> D PC FFFFFFFF00000000
>>> D PS 1F00
By depositing these values, you cause the system to write a memory dump to the system dump file on the
disk.
4. Enter the following command and press the Return key:
>>> CONTINUE
This causes the system to perform a bugcheck.
5. After the system reboots, log in to the SYSTEM account.
6. To examine the dump file, enter the following commands and press the Return key after each one:
$ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
SDA> SHOW CRASH
For more information about the System Dump Analyzer (SDA) utility, see the OpenVMS Alpha System
Dump Analyzer Utility Manual.
|
657.4 | MODPARAMS | ROBSON::drspc8.reo.dec.com::Warne | Systems from Heaven | Wed May 28 1997 15:04 | 348 |
| Here's the MODPARAMS.DAT file (it's a bit of a mess!)
I've sent a request to the guy I think set these parameters up, to
see if he can explain them.
Many thanks again!
Chris
!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM=" "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
!++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 16-MAY-1996 09:39:33.53
!
! This is a new file created by the OpenVMS upgrade procedure. This file
! was built by using the data found in the following file(s) previously
! used by this system:
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
! This/These old file(s) have been renamed to:
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR_OLD
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT_OLD
!
! A new
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!
! has been built for you in order to ensure compatiblity with this release.
! Previous parameters found to be larger than the new defaults were retained.
! Certain other previous parameters were also retained.
!
! Please check the following sections of this file to see what files were used
! in what sequence to create the new APLHAVMSSYS.PAR file.
! Please review and edit this file for possible duplications, additions
! and deletions you wish to make.
!
!----------------------------------------------------------------------------
!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM=" "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.1 1-AUG-1994 15:34:57.78
!
! This is a new file created by the OpenVMS upgrade procedure. This file
! was built by using the data found in the following file(s) previously
! used by this system:
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
! This/These old file(s) have been renamed to:
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR_OLD
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT_OLD
!
! A new
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!
! has been built for you in order to ensure compatiblity with this release.
! Previous parameters found to be larger than the new defaults were retained.
! Certain other previous parameters were also retained.
!
! Please check the following sections of this file to see what files were used
! in what sequence to create the new APLHAVMSSYS.PAR file.
! Please review and edit this file for possible duplications, additions
! and deletions you wish to make.
!
!----------------------------------------------------------------------------
!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM=" "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during installation of OpenVMS AXP V6.1 27-JUL-1994 11:55:18.95
!
SCSNODE="PLANET"
SCSSYSTEMID="1124"
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during installation of OpenVMS AXP V6.1 27-JUL-1994 11:55:18.95
! CLUSTER_CONFIG appending for ADD operation on 27-JUL-1994 12:02:56.43
VOTES=1
DISK_QUORUM=""
AGEN$INCLUDE_PARAMS SYS$MANAGER:AGEN$NEW_NODE_DEFAULTS.DAT
SCSNODE="PLANET"
SCSSYSTEMID=1124
NISCS_LOAD_PEA0=1
VAXCLUSTER=2
MSCP_LOAD=1
MSCP_SERVE_ALL=2
ALLOCLASS=100
INTERCONNECT="NI"
BOOTNODE="NO"
! CLUSTER_CONFIG end
! for RDB
min_pql_denqlm = 1000
min_gblpages = 115000
pagefile=0
swapfile=0
dumfile=0
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.1 1-AUG-1994 15:34:57.78
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 16-MAY-1996 09:39:33.53
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 20-MAY-1996 09:47:52.62
!--------------------------------------------------------------
! These are recommended values for a DRS Alpha V3.1 system
! which should not be changed without reference to the
! DRS Alpha V3.1 installation guide
!--------------------------------------------------------------
SCSNODE="TSLV12"
SCSSYSTEMID=58146
VAXCLUSTER=2
!
SHADOWING=2
!!SHADOW_MAX_COPY=13
SHADOW_SYS_DISK=1
SHADOW_SYS_UNIT=999
!
DR_UNIT_BASE=10
!
EXPECTED_VOTES=3
VOTES=2
DISK_QUORUM="DKB102"
QDSKVOTES=1
RECNXINTERVAL=20
QDSKINTERVAL=10
ALLOCLASS=100
INTERCONNECT="FDDI"
LOCKDIRWT=1
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=1
BOOTNODE="YES"
MIN_PAGEDYN=2053000
MIN_GBLPAGES=130000
MIN_GBLPAGFIL=32768
MIN_GBLSECTIONS=700
MIN_SYSMWCNT=6144
VIRTUALPAGECNT=550000
MIN_WSMAX=20480
MIN_LNMSHASHTBL=1280
MIN_RESHASHTBL=2048
MIN_LOCKIDTBL=10240
MIN_LOCKIDTBL_MAX=262144
MIN_MAXPROCESSCNT=512
BALSETCNT=500
MIN_SWPOUTPGCNT=1024
DUMPBUG=0
DUMPSTYLE=0
SAVEDUMP=1
CHANNELCNT=8191
CLISYMTBL=250
CTLPAGES=1500
MAXBUF=8192
PROCSECTCNT=128
DEADLOCK_WAIT=10
MSCP_BUFFER=2048
MSCP_CREDITS=32
SCSBUFFCNT=4096
MIN_SCSCONNCNT=40
ACP_DIRCACHE=24576
ACP_DINDXCACHE=8192
ACP_HDRCACHE=24576
ACP_FIDCACHE=8192
ACP_MAPCACHE=1024
ACP_MAXREAD=64
ACP_WINDOW=16
RMS_DFNBC=64
WSINC=2400
PFRATH=4
WSDEC=37
MIN_PFCDEFAULT=64
MIN_SPTREQ=2500
MIN_PQL_MPRCLM=8
MIN_PQL_MFILLM=100
MIN_PQL_MASTLM=600
MIN_PQL_MBIOLM=100
MIN_PQL_MBYTLM=40000
MIN_PQL_MDIOLM=100
MIN_PQL_MENQLM=12000
MIN_PQL_MWSDEFAULT=512
MIN_PQL_MWSQUOTA=1024
MIN_PQL_MWSEXTENT=2048
MIN_PQL_DFILLM=128
MIN_PQL_DASTLM=4096
MIN_PQL_DBIOLM=128
MIN_PQL_DBYTLM=65536
MIN_PQL_DDIOLM=4096
MIN_PQL_DENQLM=12000
MIN_PQL_DJTQUOTA=8192
MIN_PQL_DTQELM=20
MIN_PQL_DWSDEFAULT=1024
MIN_PQL_DWSEXTENT=1024
MIN_PQL_DFILLM=300
MIN_PQL_DENQLM=12000
MIN_PQL_DPGFLQUOTA=32768
FREEGOAL=64
MMG_CTLFLAGS = 3
MIN_MPW_HILIMIT=6368
MIN_MPW_LOLIMIT=2048
MIN_MPW_THRESH=2048
MIN_MPW_WAITLIMIT=12288
MIN_MPW_LOWAITLIMIT=6128
WINDOW_SYSTEM=1
WS_OPA0=1
LGI_BRK_TERM=0
LGI_BRK_DISUSER=0
LGI_PWD_TMO=30
LGI_RETRY_LIM=3
LGI_RETRY_TMO=20
LGI_BRK_LIM=2
LGI_BRK_TMO=900
LGI_HID_TIM=216000
DUMPFILE=0
SWAPFILE=0
PAGEFILE=0
MIN_NPAGEDYN=33554432 ! 32MB for National DRP
!-----------------------------------------------
!End of parameter definitions for DRS Alpha V3.1
!-----------------------------------------------
!-----------------------------------------------
! Added by Chris Turner 15-Feb-1997 for increased
! Map Data disk implementation (12 Member BVS)
!-----------------------------------------------
TAPE_ALLOCLASS=100
TMSCP_LOAD=1
TMSCP_SERVE_ALL=1
SHADOW_MAX_COPY=18
|
657.5 | That File Just Screams "Clean Me"... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Wed May 28 1997 15:55 | 29 |
|
Well, that MODPARAMS.DAT clearly meets the definition of "cruft".
I would review the documentation for the various layered products,
and I would make a serious effort to reduce the number of parameter
entries in this MODPARAMS.DAT file -- there are several blocks of
duplicate entries...
I would definitely comment-out the WSINC=2400, PFRATH=4, and the
WSDEC=37 entries, and allow SYSGEN and AUTOGEN to determine values.
And I'd also comment-out the *_MPW_* settings...
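One way to make that edit repeatable is sketched below in Python. It prefixes the matching assignments with "!", the comment character already used in the posted MODPARAMS.DAT, and leaves everything else alone. The function name and the pattern list are hypothetical, and it would be run against a saved copy of the file rather than the live one.

```python
import fnmatch

COMMENT_PREFIX = "!"  # comment character used in the posted MODPARAMS.DAT

# Entries suggested for commenting out; "*_MPW_*" is a wildcard pattern.
PATTERNS = ["WSINC", "PFRATH", "WSDEC", "*_MPW_*"]


def comment_out(lines, patterns=PATTERNS):
    """Return a copy of lines with matching assignments commented out."""
    result = []
    for line in lines:
        stripped = line.strip()
        is_assignment = "=" in stripped and not stripped.startswith(COMMENT_PREFIX)
        name = stripped.split("=", 1)[0].strip().upper()
        if is_assignment and any(fnmatch.fnmatchcase(name, p) for p in patterns):
            result.append(COMMENT_PREFIX + line)
        else:
            result.append(line)
    return result


# A few lines copied from the posted file.
sample = [
    "WSINC=2400",
    "PFRATH=4",
    "WSDEC=37",
    "MIN_PFCDEFAULT=64",
    "FREEGOAL=64",
    "MIN_MPW_HILIMIT=6368",
    "MIN_MPW_LOLIMIT=2048",
]

cleaned = comment_out(sample)
for line in cleaned:
    print(line)
```

Commenting out rather than deleting keeps a record of the old values, which helps if a setting later turns out to be required by one of the layered products.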
And as mentioned before, I'd look seriously at reducing the overall
system load and/or at increasing the available CPU and memory...
(As one test, save a copy of MODPARAMS.DAT and the *.PAR parameter
file to the side, clean out everything in MODPARAMS.DAT that isn't
required by one of the application packages, reAUTOGEN, and reboot.
If problems ensue, one can reload the *.PAR file via a SYSGEN> USE
command, and reboot, or reAUTOGEN and reboot. Or one can resolve
and then adjust the setting(s) of the parameter(s), and reboot...)
--
ps: "dumfile=0" won't work, and should engender a message or two.
The customer also has several different settings for key parameters,
such as VOTES, DISK_QUORUM, etc. (These parameters are not likely
related to the specific problem you are seeing, however.)
|
657.6 | Oh no, it's getting worse! | ROBSON::drspc8.reo.dec.com::Warne | Systems from Heaven | Thu May 29 1997 13:51 | 25 |
| I'll definitely be reviewing the MODPARAMS settings as you advise.
It got worse this morning! One of the nodes crashed instantly, with no
warning and no logs at all. It rebooted, but when I did an ANAL/CRASH
all I got was :
%SDA-W-INCOMPL, system space memory not completely written in dump file
%SDA-W-NOTSAVED, global pages not saved in the dump file
%SDA-W-NOTSAVED, processes not saved in the dump file
%SDA-E-NOREAD, unable to access location 8AC91988
There's limited system disk space, so the dump file was only 54820
blocks (DUMPSTYLE is set to 1, despite what it says in MODPARAMS).
Is this SDA error because the file is just too small to write any
meaningful information to it, or could it be anything to do with having
a shadowed system disk?
Of course, when it rebooted, all the disk sets went into ShadowMergeMbr
state, and users couldn't do much work on the system as it was running
with 90% Interrupt State! SHADOW_MAX_COPY was set to 18, so I've advised
it's dropped down to 6 to hopefully ease this problem next time.
Without a dump, I can't say why the system crashed, but it looks like
we've got two problems here - one a tuning issue, and the other a
mystery!
|
657.7 | Clean Out MODPARAMS | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Thu May 29 1997 14:33 | 40 |
|
paragraphs reordered...
:It got worse this morning! One of the nodes crashed instantly, with no
:warning and no logs at all. It rebooted, but when I did an ANAL/CRASH
:all I got was :
The only interesting clue here -- given the lack of a dump
file -- is what got displayed on the console during the
crash.
:There's limited system disk space, so the dump file was only 54820
:blocks (DUMPSTYLE is set to 1, despite what it says in MODPARAMS).
That would tend to point to hand-tuning in SYSGEN, and that's
something that tends to lead to these sorts of weird problems
when a hand-made tweak isn't reflected in another parameter,
and to weird problems after AUTOGEN is run and some hand-made
tweak is lost.
:%SDA-W-INCOMPL, system space memory not completely written in dump file
:%SDA-W-NOTSAVED, global pages not saved in the dump file
:%SDA-W-NOTSAVED, processes not saved in the dump file
:%SDA-E-NOREAD, unable to access location 8AC91988
:Is this SDA error because the file is just too small to write any
:meaningful information to it, or could it be anything to do with having
:a shadowed system disk?
The error from SDA -- there should be an equivalent set of
warnings during the crash -- just indicates the dump is not
complete.
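For a sense of scale, here is a rough size comparison in Python. It
uses only the figures quoted in this thread (a 54820-block dump file,
1024 MB of physical memory) and assumes the standard 512-byte disk
block. How much space a selective (DUMPSTYLE=1) dump actually needs
depends on the system, so treat this as an illustration and not as a
sizing rule.

```python
# Rough comparison of the dump file size with physical memory.
# Figures are the ones quoted in this thread; 512-byte disk blocks assumed.
BLOCK_BYTES = 512
dump_blocks = 54820          # size of the existing dump file
memory_mb = 1024             # physical memory on the node

dump_mb = dump_blocks * BLOCK_BYTES / (1024 * 1024)
memory_blocks = memory_mb * 1024 * 1024 // BLOCK_BYTES
percent_of_memory = 100 * dump_blocks / memory_blocks

print(f"dump file: {dump_mb:.1f} MB")
print(f"physical memory: {memory_blocks} blocks")
print(f"dump file covers about {percent_of_memory:.1f}% of physical memory")
```

With a file that is only a few percent of memory, it is not surprising
that SDA reports system space, global pages and processes as missing.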
--
Note that you can have a SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP,
and add aliases into each root for SYS$COMMON:[SYSEXE]SYSDUMP.DMP.
This saves some space, but risks overwriting the first dump if any
other system crashes before the dump can be saved (via SDA> COPY
or similar approaches) somewhere off the system disk...
|
657.8 | | TWICK::PETTENGILL | mulp | Fri May 30 1997 18:19 | 7 |
| >> Well, that MODPARAMS.DAT clearly meets the definition of "cruft".
VMS upgrades put an annoying amount of "cruft" in modparams.dat.
What is needed is a utility to cleanup and verify modparams.dat, but since
I've thought about it many times and never had time, I'm certainly not
volunteering to do it.
|
657.9 | Why no mini-merges? | VMSSPT::JENKINS | Kevin M Jenkins VMS Support Engineering | Mon Jun 02 1997 09:55 | 30 |
|
<Of course, when it rebooted, all the disk sets went into ShadowMergeMbr
<state, and users couldn't do much work on the system as it was running
<with 90% Interrupt State! SHADOW_MAX_COPY was set to 18, so I've advised
<it's dropped down to 6 to hopefully ease this problem next time.
<Without a dump, I can't say why the system crashed, but it looks like
<we've got two problems here - one a tuning issue, and the other a
<mystery!
Lowering SHADOW_MAX_COPY is not the way to address this. The problem
is most likely that the system disk is in merge. The only way to
deal with that is to remove one member of the shadowset and put
it back as a copy target. If you are going to try this you need to be
careful to identify which of the shadowset members had the crash dump
written to it; this is output at the console during the dump. Make
sure you don't remove that one.
Second, as long as the other disks are in merge then user IO is going
to suffer. If this system is already running close to IO capacity then
the extra load caused by the merge state can cause significant
performance problems. The actual IO being done by the SHADOW_SERVER
process is usually not a problem so lowering SHADOW_MAX_COPY can
actually make the situation worse.
I don't recall the config... but why didn't the shadowsets do
mini-merges? What kind of controllers and disks are being used to
create the shadowsets? Do you have write logging disabled via
SHADOW_SYS_DISK set to 801 hex?
Kevin
|
657.10 | Standalone system will only do merge | VMSSPT::JENKINS | Kevin M Jenkins VMS Support Engineering | Mon Jun 02 1997 09:58 | 4 |
| Went back and looked at .0, if this is a standalone system
then you will always have to do full merges after any system crash.
|
657.11 | my guess | EVMS::KUEHNEL | Andy Kühnel | Mon Jun 02 1997 11:33 | 47 |
| The initial problem was: system "locks up" temporarily with many
processes in RWMPB.
We get into this situation when a huge amount of pages get thrown on
an already large modified page list: the total number of modified pages
is > MPW_WAITLIMIT, therefore any process replacing a page in its working
set that needs to go on the modified page list is thrown into RWMPB.
Processes are released when the modified page list shrinks below
MPW_LOWAITLIMIT.
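The two limits form a simple hysteresis gate. The Python sketch below
is a toy model of that rule only, not OpenVMS code: the function name,
the sample list sizes and the limit values are all made up for
illustration.

```python
# Toy model of the modified-page-list wait gate described above.
# Not OpenVMS code; names and numbers are hypothetical.
def rwmpb_gate(list_sizes, waitlimit, lowaitlimit):
    """For each sampled modified-list size, report whether a process
    that must put a page on the list would be held in RWMPB."""
    blocked = False
    states = []
    for size in list_sizes:
        if size > waitlimit:
            blocked = True       # list too large: start holding processes
        elif size < lowaitlimit:
            blocked = False      # list has drained: release them
        # between the two limits the previous state is kept
        states.append(blocked)
    return states

samples = [30000, 38000, 36000, 30000, 10000, 500, 30000]
states = rwmpb_gate(samples, waitlimit=37000, lowaitlimit=1000)
print(states)
# [False, True, True, True, True, False, False]
```

Note that once the gate closes, it stays closed until the list has
drained all the way below the lower limit, which is why a wide gap
between the two limits can translate into a long wait.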
This whole mechanism was put into place in order to prevent a single
thrashing process from essentially getting all system memory because
it could potentially create modified pages much faster than they can be
written back. Unfortunately, innocent bystanders suffer if we need to
throw just a single page out of their working set and onto the modified
page list.
On the system at hand, there are a couple of large server processes,
which apparently can grow to > 25,000 pages. Assuming that many of
these are global pages that are not also mapped by another process, a
simple image exit/process deletion will throw these pages onto the
modified page list, and it will take a while to write this stuff back.
This is what I believe happened and caused the "temporary hang".
If this guess is right, what can you do to prevent this?
You have 42,600 global pages (681,600 pagelets). This would be the
maximum of what can be thrown onto the modified page list by this
mechanism. A MPW_WAITLIMIT of about 42,000 pages and a MPW_LOWAITLIMIT
of 40,000 pages would probably do the job. I would recommend against
turning MPW_HILIMIT to this level because this would delay the writing
of modified pages.
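The arithmetic behind these numbers, as a small Python check. It
assumes the 8 KB Alpha page and 512-byte pagelet (16 pagelets per
page) and takes the 131072-page physical memory total from the memory
display quoted earlier in the topic; the variable names are only for
illustration.

```python
# Sanity check of the page/pagelet figures and the suggested limits.
PAGE_BYTES = 8192        # Alpha page size (assumed 8 KB)
PAGELET_BYTES = 512      # pagelet size

global_pages = 42600
pagelets = global_pages * (PAGE_BYTES // PAGELET_BYTES)
global_mb = global_pages * PAGE_BYTES / (1024 * 1024)

mpw_waitlimit = 42000    # suggested value, in pages
mpw_lowaitlimit = 40000  # suggested value, in pages
physical_pages = 131072  # 1024 MB of main memory
share = 100 * mpw_waitlimit / physical_pages

print(pagelets)                      # 681600
print(f"{global_mb:.1f} MB of global pages")
print(f"MPW_WAITLIMIT would be about {share:.0f}% of physical memory")
```

So the suggested MPW_WAITLIMIT lets roughly a third of physical memory
sit on the modified page list before anybody is put into RWMPB.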
You may also help the situation somewhat by reducing MPW_WRTCLUSTER to
maybe 16 and increasing MPW_IOLIMIT to 64. This could be even more
effective if multiple pagefiles on different disks were used or if the
pagefile were on a stripe set.
However:
The crash you saw may or may not be related to this problem. Without a
dump, it's impossible to tell. If the crash was caused by some kind of
memory starvation "at a bad time", by increasing MPW_WAITLIMIT you
would risk running into the situation more easily!
|