
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

657.0. "Very urgent problem on large customer site" by ROBSON::drspc8.reo.dec.com::Warne (Systems from Heaven) Wed May 28 1997 12:13

We have a very urgent problem on a customer's dual Alphaserver cluster.

Yesterday morning we noticed one of the nodes had run out of pagefile, and processes (largely RDB server processes)
were going into RWMPB state. Shortly afterwards, the system hung up. After about two hours, it suddenly came back, and
everything was OK.
While it was "hung", we could do an occasional SHOW SYS and MON CLUS from the other node, and nothing was happening on
the system. There was hardly any CPU usage or disk IO, in fact nothing of note, just a whole bag of processes in
Resource Wait.
This morning, exactly the same thing happened, but on the other cluster member. Again, the system hung for around two
hours, then just as we decided to hit the reset button, it came back. This hadn't happened before this week.

What I'm really after here is some idea of where to look for the problem - what are the most likely causes of this kind
of thing?

Here's a few system stats. This is serious - the customer's going to get nasty if we don't sort it out soon.


$sh cpu/ful

TSLV12, a AlphaServer 2100 5/250
Multiprocessing is DISABLED. Uniprocessing synchronization image loaded.
Minimum multiprocessing revision levels: CPU = 1
System Page Size = 8192
System Revision Code =
System Serial Number =
Default CPU Capabilities:
        QUORUM RUN
Default Process Capabilities:
        QUORUM RUN
PRIMARY CPU = 00
CPU 00 is in RUN state
Current Process: _RTA9:          PID = 2021444D
Serial Number:
Revision:
VAX floating point operations supported.
IEEE floating point operations and data types supported.
Processor is Primary Eligible.
PALCODE: Revision Code = 1.18
         PALcode Compatibility = 1
         Maximum Shared Processors = 4
         Memory Space:  Physical address = 00000000 00000000
                        Length = 0
         Scratch Space: Physical address = 00000000 00000000
                        Length = 0
Capabilities of this CPU:
        PRIMARY QUORUM RUN
Processes which can only execute on this CPU:
        CONFIGURE        PID = 20200805  Reason: PRIMARY Capability


$sh mem   (about ten minutes after the problem was first noticed yesterday)


              System Memory Resources on 27-MAY-1997 09:57:47.61
Physical Memory Usage (pages):     Total        Free      In Use    Modified
  Main Memory (1024.00Mb)         131072        1153       92109       37810
Virtual I/O Cache (Kbytes):        Total        Free      In Use
  Cache Memory                      3200           0        3200
Granularity Hint Regions (pages):  Total        Free      In Use    Released
  Execlet code region                512           0         508           4
  Execlet data region                128           1         111          16
  VMS exec data region              6103           0        6103           0
  Resident image code region         512           0         322         190
Slot Usage (slots):                Total        Free    Resident     Swapped
  Process Entry Slots               1143        1009         134           0
  Balance Set Slots                  500         368         132           0
Dynamic Memory Usage (bytes):      Total        Free      In Use     Largest
  Nonpaged Dynamic Memory       47218688    16411584    30807104     1018560
  Paged Dynamic Memory          48087040    12892144    35194896    12872656
Paging File Usage (blocks):                     Free  Reservable       Total
  DISK$DRS_PAGE_2:[SYSEXE]SWAPFILE_TSLV13.SYS;1
                                              499968      499968      499968
  DISK$DRS_PAGE_2:[SYSEXE]PAGEFILE_TSLV13.SYS;1
                                                   0    -1931152     2999936
Of the physical pages in use, 11598 pages are permanently allocated to OpenVMS.


NatDRP> sh sys

OpenVMS V6.2-1H3  on node TSLV13  27-MAY-1997 09:58:51.49  Uptime  8 01:57:49
  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
20600801 SWAPPER         HIB     16        0   0 00:00:53.50         0      0
20600805 CONFIGURE       HIB     10       18   0 00:01:09.71       199     14
20600807 IPCACP          HIB     10     1109   0 00:00:01.07        54     17
20600808 ERRFMT          HIB      8     6122   0 00:00:06.07        59     31
20600809 CACHE_SERVER    HIB     16     5217   0 00:00:15.40        24     33
2060080A CLUSTER_SERVER  HIB      8      111   0 00:00:07.56        64     24
2060080B OPCOM           HIB      7   161011   0 00:00:42.26       529     31
2060080C AUDIT_SERVER    HIB      9    28613   0 00:00:52.56       535     69
2060080D JOB_CONTROL     HIB      8    14346   0 00:00:04.20       222     23
2060080E SHADOW_SERVER   HIB      4  7260630   0 00:19:58.13        39    167
20600810 SMISERVER       HIB      7      344   0 00:00:00.38       200     22
20600811 TP_SERVER       HIB     10    11092   0 00:00:05.88       310     35
20600812 NatDRP Monitor  LEF      6    26316   0 00:00:22.86      1324    108
20600813 NETACP          HIB     10   257998   0 00:04:12.43      1712    101
20600815 LATACP          HIB     14       13   0 00:00:55.20       566     37
20607017 RDBSERVER_2147  LEF      6      352   0 00:00:00.50       918     19  N
2060701A RDBSERVER_2146  LEF      6      355   0 00:00:00.32      1066    367  N
20607033 RDBSERVER_2161  LEF      6      212   0 00:00:00.25       678     20  N
20607043 RDBSERVER_2170  LEF      6     1172   0 00:00:00.96      1592    171  N
2060704A RDBSERVER_2186  LEF      6    65857   0 00:00:40.19     26808    508  N
2060606E RDBSERVER_33440 LEF      6    68636   0 00:00:40.84     25591     82  N
20607071 RDBSERVER_2210  LEF      6     2818   0 00:00:02.26      2478    331  N
20607072 _RTA6:          LEF      6     7372   0 00:00:06.20      6847    538
20605079 RDBSERVER_2215  LEF      6     1290   0 00:00:01.00      1493    243  N
2060687A RDBSERVER_2217  LEF      6      960   0 00:00:00.70      1082     22  N
2060707C RDBSERVER_26794 LEF      6      842   0 00:00:00.63      1521    203  N
2060707D RDBSERVER_33704 LEF      6      764   0 00:00:00.87      1093    363  N
20600889 SECURITY_SERVER HIB     10     3514   0 00:00:52.26      1946    113
2060088B REMACP          HIB      9     1500   0 00:00:00.32        70     18
2060709A RDBSERVER_2234  LEF      6     1295   0 00:00:01.21      1591    126  N
2060709B RDBSERVER_2235  LEF      6      212   0 00:00:00.26       611     26  N
2060709D RDBSERVER_2236  LEF      6     2290   0 00:00:01.71      1521     24  N
2060709F RDBSERVER_33721 LEF      6    70148   0 00:00:53.33     38766    441  N
206060AA RDBSERVER_33476 LEF      6      217   0 00:00:00.29       642     18  N
206070AB RDBSERVER_33477 LEF      4     2970   0 00:00:02.16      2136    194  N
206068AC RDBSERVER_33479 LEF      6      239   0 00:00:00.29       695     20  N
206070B0 _RTA7:          LEF      6   221893   0 00:02:47.01      7190     63
206070B2 RDBSERVER_33741 LEF      6     2790   0 00:00:02.21      2129    555  N
206070C7 DECW$TE_70C7    LEF      5     6987   0 00:00:04.23      2139     79
206070C8 _FTA83:         HIB      5     2681   0 00:00:01.28      2122     89
206070D6 RDBSERVER_2312  LEF      6     1615   0 00:00:01.41      1233    480  N
206068E1 RDBSERVER_2325  LEF      6     1398   0 00:00:01.13      1264     93  N
206070EB RDBSERVER_2331  LEF      6     2413   0 00:00:01.86      2755    325  N
206070F3 RDBSERVER_33775 LEF      6     1351   0 00:00:01.07      1810     63  N
206068FF RDBSERVER_33282 LEF      6      673   0 00:00:00.69       971     64  N
20606904 RDBSERVER_33539 LEF      6      427   0 00:00:00.55      1001    276  N
20607107 RDBSERVER_33286 LEF      6      364   0 00:00:00.55       773     27  N
20607109 RDBSERVER_33544 LEF      6      366   0 00:00:00.49       840     27  N
2060590B RDBSERVER_33547 LEF      6      390   0 00:00:00.48       819     30  N
2060091F RDMS_MONITOR    LEF     15    43702   0 00:00:20.13     21075     44
20606920 RDBSERVER_2397  LEF      6    76320   0 00:01:03.20    107204   4415  N
20607121 RDBSERVER_33565 LEF      6    65805   0 00:00:57.71    139048   5478  N
20600922 SW$TSLV13$100A  LEF      0    12862   0 00:00:00.85        79     48
20600924 SNS$WATCHDOG    HIB      6   334198   0 00:05:41.86      2048     84
20606926 RDBSERVER_33313 LEF      6     1450   0 00:00:01.22      1662    324  N
20607127 RDBSERVER_33316 LEF      6    65470   0 00:00:55.73     96327   9405  N
20600928 DECW$SERVER_0   HIB      8     3255   0 00:00:31.10      1742     33
20606929 RDBSERVER_33573 LEF      6     1444   0 00:00:01.36      1733    448  N
2060712B RDBSERVER_33319 LEF      6    67782   0 00:01:00.12    123629   4647  N
20607135 RDBSERVER_33367 LEF      6      920   0 00:00:01.06      1095     49  N
2060693D RDBSERVER_8207  LEF      6      814   0 00:00:00.66      1183     23  N
2060513F RDBSERVER_8210  LEF      6     1237   0 00:00:01.29      1312    333  N
20606140 RDBSERVER_33381 LEF      6     4065   0 00:00:03.04      2118    352  N
20606941 RDBSERVER_8211  LEF      5      220   0 00:00:00.33       718     20  N
2060694B RDBSERVER_8217  LEF      6     2859   0 00:00:02.13      1341    399  N
20606980 DECW$LOGINOUT   LEF      4      139   0 00:00:00.19       520     54
20607185 RDBSERVER_8230  LEF      6     1775   0 00:00:01.48      1511    357  N
20607187 RDBSERVER_33662 LEF      6     2182   0 00:00:01.84      1453    436  N
20606995 RDBSERVER_33415 LEF      6      213   0 00:00:00.29       640     18  N
20606998 RDBSERVER_24626 LEF      6     2610   0 00:00:02.15      1787    414  N
20606199 RDBSERVER_33421 LEF      6      219   0 00:00:00.32       626     32  N
206061B4 RDBSERVER_33449 LEF      6     1900   0 00:00:01.69      1279    239  N
206071C4 RDBSERVER_8285  LEF      6     2308   0 00:00:01.77      1547    132  N
206039D1 RDBSERVER_33720 LEF      6     1083   0 00:00:01.00      1080    465  N
206059D4 RDBSERVER_8299  LEF      6      221   0 00:00:00.31       611    233  N
206071D5 GARY_MM         LEF      6    41794   0 00:01:03.36      4571     38
206071DB RDBSERVER_33730 LEF      6     1691   0 00:00:01.48      1374    367  N
206069ED RDBSERVER_33750 LEF      6    72561   0 00:00:57.79     44499  25405  N
20607200 RDBSERVER_33504 LEF      6      719   0 00:00:00.74       906    402  N
20602209 RDBSERVER_8348  LEF      6     1273   0 00:00:01.04       821    440  N
20607210 _RTA5:          LEF      4     1698   0 00:00:01.34      1391     59
20607215 RDBSERVER_33527 LEF      6      857   0 00:00:00.76       763    264  N
20607217 RDBSERVER_163   LEF      6      221   0 00:00:00.24       670    190  N
20602225 RDBSERVER_8362  LEF      6     2416   0 00:00:01.70      1162    386  N
20607227 RDBSERVER_33548 LEF      6      215   0 00:00:00.23       596    413  N
2060722F RDBSERVER_8371  LEF      6      220   0 00:00:00.28       621    166  N
20606A39 RDBSERVER_8385  LEF      6      222   0 00:00:00.27       617    219  N
20606A3A RDBSERVER_8386  LEF      6      789   0 00:00:01.10       665    427  N
20606246 SUTHERLAND_K    LEF      5    27654   0 00:00:36.99      2665    174
2060724A RDBSERVER_8398  LEF      6     1108   0 00:00:00.98       996    333  N
20606A50 RDBSERVER_8409  LEF      6     1880   0 00:00:01.56      1138    307  N
20605259 RDBSERVER_8418  LEF      6      514   0 00:00:00.51       824    174  N
2060725D RDBSERVER_33589 LEF      4    23924   0 00:00:15.44       726    488  N
20601A61 RDBSERVER_33597 LEF      6      355   0 00:00:00.37       773    144  N
20606263 RDBSERVER_33593 LEF      6      201   0 00:00:00.34       616    166  N
20601A69 PRIV_1          LEF      5     4474   0 00:00:04.09      1522    318  S
2060526D RDBSERVER_33606 LEF      6      213   0 00:00:00.26       600    238  N
20607272 RDBSERVER_33611 RWMPB    6    11253   0 00:00:15.57     28792   7828  N
20606A78 RDBSERVER_33620 LEF      6    11292   0 00:00:11.69       732    495  N
20607279 _RTA9:          CUR      4      624   0 00:00:00.58      1746    101
2060727E FAL_33645       LEF      5      544   0 00:00:00.41       225     67  N
20607280 RDBSERVER_33633 LEF      4      750   0 00:00:00.78      1202    686  N
20607281 RDBSERVER_8468  LEF      4      446   0 00:00:00.63       986    331  N
20607282 RDBSERVER_8459  LEF      4      570   0 00:00:00.61      1064    612  N
20607283 RDBSERVER_33642 RWMPB    6      367   0 00:00:00.41       685    128  N
20606286 RDBSERVER_33632 RWMPB    6       94   0 00:00:00.20       621    278  N
20606A88 NET_273         RWMPB    6      136   0 00:00:00.26       674    471  N
2060628A FAL_33384       RWMPB    6      129   0 00:00:00.15       352     86  N
20606A8B RDBSERVER_33643 RWMPB    6       63   0 00:00:00.09       307    128  N
2060728C RDBSERVER_33388 RWMPB    6       60   0 00:00:00.11       301    128  N
2060728D RDBSERVER_277   RWMPB    4       56   0 00:00:00.09       192     84  N
2060728E _RTA10:         RWMPB    6       94   0 00:00:00.10       106     51
20605B36 LCKMGR          LEF      8      383   0 00:00:00.25       410    110
20605B42 RDBSERVER_33578 LEF      6      344   0 00:00:00.38       852     34  N
20606B45 RDBSERVER_26662 LEF      6      303   0 00:00:00.33       751     19  N
20606B50 RDBSERVER_33600 LEF      5      512   0 00:00:00.59      1212     27  N
20606B5D RDBSERVER_26669 LEF      6      202   0 00:00:00.24       582     17  N
20606B5F RDBSERVER_33352 LEF      6      522   0 00:00:00.68      1225    369  N
20605B68 RDBSERVER_33354 LEF      6      363   0 00:00:00.46      1797     35  N
20606B77 RDBSERVER_33375 LEF      6      198   0 00:00:00.34       583     19  N
2060637B RDBSERVER_33379 LEF      6      364   0 00:00:00.44      1075    157  N
20605B7C RDBSERVER_33380 LEF      6      393   0 00:00:00.39       965    253  N
20606C18 RDBSERVER_33293 LEF      6    63257   0 00:01:06.20      3862    387  N
20606C1F RDBSERVER_33302 LEF      6      218   0 00:00:00.29       656     21  N
20606C24 RDBSERVER_33564 LEF      6     3255   0 00:00:02.25      1663    272  N
20606C29 RDBSERVER_33571 LEF      6     1079   0 00:00:00.96      1366     51  N
20606C2F RDBSERVER_33577 LEF      6      220   0 00:00:00.31       608     21  N
20605C34 RDBSERVER_33583 LEF      6     2647   0 00:00:01.77      2074    363  N
20606C47 PRIV            HIB      9   156056   0 00:01:11.10      6580     76
20606C4A RDBSERVER_33343 LEF      5      216   0 00:00:00.30       646     18  N
20606C5A RDBSERVER_33618 LEF      6     2200   0 00:00:01.57      1627    194  N
20606468 _RTA1:          LEF      6     9234   0 00:00:08.49      3343    111
20605C6B RDBSERVER_33641 LEF      6    36019   0 00:00:25.68      1988    189  N
20606C71 TRACEYC_MM      LEF      6      825   0 00:00:02.14      1960     19
20606C73 RDBSERVER_33649 LEF      6    44214   0 00:00:36.27     25461    340  N



After this, it got worse with more and more processes going into RWMPB, until the system virtually locked up 
completely.
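
The drift into RWMPB can be followed by tallying the State column of successive SHOW SYSTEM captures. The following is a minimal Python sketch, assuming the fixed-column layout seen in the listing above (8-digit hex PID, a 15-character process name field, then the state). The function name `tally_states` and the small sample are illustrative only and are not part of any VMS tool:

```python
import re
from collections import Counter

# Assumed layout, inferred from the listing: PID (8 hex digits), one space,
# a 15-character name field (names may contain spaces), one space, the state.
LINE = re.compile(r"^([0-9A-F]{8}) (.{15}) (\S+)")

def tally_states(show_system_text):
    """Count processes per scheduling state in captured SHOW SYSTEM output."""
    counts = Counter()
    for line in show_system_text.splitlines():
        match = LINE.match(line)
        if match:  # banner, header and blank lines do not match and are skipped
            counts[match.group(3)] += 1
    return counts

# A few lines copied from the listing above (not the whole capture).
SAMPLE = """\
  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
20600801 SWAPPER         HIB     16        0   0 00:00:53.50         0      0
20600812 NatDRP Monitor  LEF      6    26316   0 00:00:22.86      1324    108
20607272 RDBSERVER_33611 RWMPB    6    11253   0 00:00:15.57     28792   7828  N
20607279 _RTA9:          CUR      4      624   0 00:00:00.58      1746    101
2060728E _RTA10:         RWMPB    6       94   0 00:00:00.10       106     51
"""

if __name__ == "__main__":
    for state, count in tally_states(SAMPLE).most_common():
        print(f"{state:6} {count}")
```

Running the same tally over captures taken a few minutes apart would show whether the RWMPB count is growing while LEF and HIB shrink.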

Here's some relevant SYSGEN parameters (they're the same on both systems)

Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
--------------            -------    -------    -------  -------   ----  -------
PFCDEFAULT                     64         64         0      2032 Pagelets   D
 internal value                 4          4         0       127 Pages      D
GBLSECTIONS                   700        250        80      3276 Sections
GBLPAGES                   681600      30720     10240        -1 Pagelets
 internal value             42600       1920       640        -1 Pages
GBLPAGFIL                   32768        128        32        -1 Pages
MAXPROCESSCNT                1143         32        12      8192 Processes
VECTOR_PROC                     1          1         0         3 Coded-valu
PROCSECTCNT                   128         32         5      1024 Sections
MINWSCNT                       20         20        10        -1 Pure-numbe
PAGFILCNT                       4          4         4        63 Files
SWPFILCNT                       2          2         0        63 Files
SYSMWCNT                    53946       2048       512     65536 Pagelets
 internal value              3372        128        32      4096 Pages
KSTACKPAGES                     1          1         1       768 Pages
BALSETCNT                     500         30         8      8192 Slots
WSMAX                      524288       4096      1024   1048576 Pagelets
 internal value             32768        256        64     65536 Pages
NPAGEDYN                 45744128    1048576    163840        -1 Bytes
NPAGEVIR                167772160    8388608    163840        -1 Bytes
PAGEDYN                  48087040     212992     65536        -1 Bytes
VIRTUALPAGECNT             550000      65536      2048   4194304 Pagelets
 internal value             34375       4096       128   -262144 Pages
QUANTUM                        20         20         2     32767 10Ms       D
MPW_WRTCLUSTER                 64         64        16       512 Pages
MPW_HILIMIT                 32768        512        64     65535 Pages
MPW_LOLIMIT                  2048         16         0     65535 Pages
MPW_IOLIMIT                     4          4         1       127 I/O
MPW_THRESH                   4096         16         0     65536 Pages      D
MPW_WAITLIMIT               33024        576        64     65535 Pages      D
MPW_LOWAITLIMIT             32512        448        56     65535 Pages      D
FILE_CACHE                     10         10         0       100 Percent
PFRATL                          0          0         0        -1 Flts/10Sec D
PFRATH                          4          8         0        -1 Flts/10Sec D
WSINC                        2400       2400         0        -1 Pagelets   D
 internal value               150        150         0        -1 Pages      D
WSDEC                          37       4000         0        -1 Pagelets   D
 internal value                 3        250         0        -1 Pages      D
AWSMIN                        512        512         0        -1 Pagelets   D
 internal value                32         32         0        -1 Pages      D
AWSTIME                        20         20         1        -1 10Ms       D
SWPOUTPGCNT                  1024        512         0        -1 Pagelets   D
 internal value                64         32         0        -1 Pages      D
LONGWAIT                       30         30         0     65535 Seconds    D
DORMANTWAIT                     2          2         0     65535 Seconds    D
ERRORLOGBUFFERS                 4          4         2        64 Buffers
DUMPSTYLE                       1          0         0        -1 Bitmask
EXTRACPU                     1000       1000         0        -1 10Ms       D
MAXSYSGROUP                     8          8         1     32768 UIC Group  D
MVTIMEOUT                    3600       3600         1     64000 Seconds    D
TAPE_MVTIMEOUT                600        600         1     64000 Seconds    D
MAXBUF                       8192       8192      4096     64000 Bytes      D
DEFMBXBUFQUO                 1056       1056       256     64000 Bytes      D
DEFMBXMXMSG                   256        256        64     64000 Bytes      D
FREELIM                       520         32        16        -1 Pages
FREEGOAL                      520        200        16        -1 Pages      D
GROWLIM                       520         63         0        -1 Pages      D
BORROWLIM                     520        300         0        -1 Pages      D
XFMAXRATE                     236        236         0       255 Special    D
LAMAPREGS                       0          0         0       255 Mapregs
CLISYMTBL                     250        512        48      1024 Pagelets   D
LOCKIDTBL                   73728       1792      1792   4194304 Entries
LOCKIDTBL_MAX              262144      65535      1792   4194304 Entries
RESHASHTBL                  65535         64         1     65535 Entries
DEADLOCK_WAIT                  10         10         0        -1 Seconds    D
TIMEPROMPTWAIT              65535         -1         0        -1 uFortnight
LNMSHASHTBL                  2048        512       128      8192 Entries
LNMPHASHTBL                   512        512       128      8192 Entries


Finally, these are the quotas for the account through which most of the RDBSERVER processes are run:

Maxjobs:         0  Fillm:       300  Bytlm:       200000
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:           0
Maxdetach:       0  BIOlm:       200  JTquota:       3000
Prclm:          15  DIOlm:       200  WSdef:         2048
Prio:            4  ASTlm:       210  WSquo:         8192
Queprio:         4  TQElm:        20  WSextent:     65536
CPU:        (none)  Enqlm:     12000  Pgflquo:     600000


Sorry it's such a long note, but this really is a serious one!

657.1. "Escalate Formally, If Urgent..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 28 1997 13:06

   A 1 GB AlphaServer 2100 5/250, running OpenVMS Alpha V6.2-1H3...

   What is the baseline for this system when running "normally"?
   The average CPU, memory, I/O, etc., loading...  Does this
   average loading differ from the loading when the hangs arise?

:We have a very urgent problem on a customer's dual Alphaserver cluster.

   If urgent, then I'd suggest formal escalation via an IPMT...

:Yesterday morning we noticed one of the nodes had run out of pagefile, and
:processes (largely RDB server processes) were going into RWMPB state. 
:Shortly afterwards, the system hung up. After about two hours, it suddenly
:came back, and everything was OK.

   I'd guess this system was memory-wedged...

:While it was "hung", we could do an occasional SHOW SYS and MON CLUS from
:the other node, and nothing was happening on the system. There was hardly
:any CPU usage or disk IO, in fact nothing of note, just a whole bag of
:processes in Resource Wait.

:This morning, exactly the same thing happened, but on the other cluster
:member. Again, the system hung for around two hours, then just as we
:decided to hit the reset button, it came back. This hadn't happened before
:this week.

:What I'm really after here is some idea of where to look for the problem
:- what are the most likely causes of this kind of thing?

:Here's a few system stats. This is serious - the customer's going to get
:nasty if we don't sort it out soon.

   I'd suggest an IPMT, then.

:$sh cpu/ful
:
:TSLV12, a AlphaServer 2100 5/250
...


   When the problem next occurs, encourage these folks to force a
   crashdump, and get a copy of the dump file for analysis here in
   OpenVMS engineering.

:              System Memory Resources on 27-MAY-1997 09:57:47.61
:Physical Memory Usage (pages):     Total        Free      In Use    Modified
:  Main Memory (1024.00Mb)         131072        1153       92109       37810

   This system is very constrained by available memory, and it
   appears to have an unusually large modified list in proportion to the
   free memory.  I'd strongly suggest more memory, or smaller working
   sets, or fewer inswapped processes, or some combination of these
   factors...
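
   To put numbers on that proportion, the Python sketch below redoes the
   arithmetic from the SHOW MEMORY figures quoted above.  The 8192-byte
   page size comes from the SHOW CPU output in .0.  The comparison with
   MPW_WAITLIMIT and MPW_LOWAITLIMIT (values from the SYSGEN listing in
   .0) assumes the usual documented meaning of those parameters: a
   process that generates modified pages is made to wait (RWMPB) once
   the modified list exceeds MPW_WAITLIMIT, and is released when the
   list drops to MPW_LOWAITLIMIT.  It is an illustration, not a diagnosis.

```python
PAGE_BYTES = 8192  # "System Page Size" in the SHOW CPU/FULL output

# Page counts from the SHOW MEMORY display (27-MAY-1997 09:57:47).
TOTAL, FREE, IN_USE, MODIFIED = 131072, 1153, 92109, 37810

# SYSGEN values from the listing in .0, in pages.
MPW_WAITLIMIT = 33024
MPW_LOWAITLIMIT = 32512

def pages_to_mb(pages):
    """Convert Alpha pages to megabytes (2**20 bytes)."""
    return pages * PAGE_BYTES / 2**20

def summarize():
    """Return the headline ratios for the free and modified lists."""
    return {
        "free_mb": round(pages_to_mb(FREE), 1),
        "modified_mb": round(pages_to_mb(MODIFIED), 1),
        "free_pct": round(100 * FREE / TOTAL, 1),
        "modified_pct": round(100 * MODIFIED / TOTAL, 1),
        "modified_per_free": round(MODIFIED / FREE, 1),
        # Assumed semantics: above MPW_WAITLIMIT, page-modifying processes wait.
        "over_waitlimit": MODIFIED > MPW_WAITLIMIT,
        # Pages the modified page writer must flush to reach MPW_LOWAITLIMIT.
        "pages_to_shed": MODIFIED - MPW_LOWAITLIMIT,
    }

if __name__ == "__main__":
    assert FREE + IN_USE + MODIFIED == TOTAL  # the three columns add up
    for name, value in summarize().items():
        print(f"{name:18} {value}")
```

   On these figures the modified list (37810 pages) is above
   MPW_WAITLIMIT (33024), which would be consistent with the RWMPB
   waits if the modified page writer has nowhere left to write.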

...
:Slot Usage (slots):                Total        Free    Resident     Swapped
:  Process Entry Slots               1143        1009         134           0
:  Balance Set Slots                  500         368         132           0

   These values look rather unusual -- I'd assume there is some tuning
   "cruft" in MODPARAMS.DAT, something that caused a large number of
   processes to be requested.

:Dynamic Memory Usage (bytes):      Total        Free      In Use     Largest
:  Nonpaged Dynamic Memory       47218688    16411584    30807104     1018560
:  Paged Dynamic Memory          48087040    12892144    35194896    12872656

   
:Paging File Usage (blocks):                     Free  Reservable       Total
:  DISK$DRS_PAGE_2:[SYSEXE]SWAPFILE_TSLV13.SYS;1
:                                              499968      499968      499968
:  DISK$DRS_PAGE_2:[SYSEXE]PAGEFILE_TSLV13.SYS;1
:                                                   0    -1931152     2999936
:Of the physical pages in use, 11598 pages are permanently allocated to OpenVMS.

   Your pagefile is badly overcommitted -- you need more pagefile
   configured, or more physical memory added, or fewer processes.
   (You're certainly not forcing processes out -- it's possible
   you might want to encourage some swapping of idle processes, to
   try to free up some memory.  This is a short-term fix, pending
   system and workload adjustments.)  I'd look at configuring the
   additional pagefile on another disk spindle, under the assumption
   that spreading the paging I/O load is preferred...
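
   The degree of overcommitment can be worked out from the figures
   quoted above.  This is a rough Python sketch under one assumption:
   that "Reservable" is the file size minus the space already reserved
   by processes, so a negative value means more has been promised than
   the file can hold.  Blocks are 512-byte disk blocks; the variable
   names are illustrative.

```python
BLOCK_BYTES = 512  # one OpenVMS disk block

# PAGEFILE_TSLV13.SYS figures from the SHOW MEMORY display, in blocks.
TOTAL_BLOCKS = 2999936
FREE_BLOCKS = 0
RESERVABLE_BLOCKS = -1931152

def blocks_to_mb(blocks):
    """Convert disk blocks to megabytes (2**20 bytes)."""
    return blocks * BLOCK_BYTES / 2**20

# Assumption: reservable = total - reserved, hence reserved = total - reservable.
reserved_blocks = TOTAL_BLOCKS - RESERVABLE_BLOCKS
shortfall_blocks = max(0, -RESERVABLE_BLOCKS)
commit_ratio = reserved_blocks / TOTAL_BLOCKS

if __name__ == "__main__":
    print(f"page file size : {blocks_to_mb(TOTAL_BLOCKS):8.1f} MB")
    print(f"free           : {blocks_to_mb(FREE_BLOCKS):8.1f} MB")
    print(f"reserved       : {blocks_to_mb(reserved_blocks):8.1f} MB")
    print(f"shortfall      : {blocks_to_mb(shortfall_blocks):8.1f} MB")
    print(f"commit ratio   : {commit_ratio:8.2f}")
```

   On that assumption about 1.6 times the page file has been reserved,
   and an additional page file of at least the shortfall (roughly
   943 MB) would be needed just to bring Reservable back to zero.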

:NatDRP> sh sys
:
:OpenVMS V6.2-1H3  on node TSLV13  27-MAY-1997 09:58:51.49  Uptime  8 01:57:49
:  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
:20600801 SWAPPER         HIB     16        0   0 00:00:53.50         0      0
:20600805 CONFIGURE       HIB     10       18   0 00:01:09.71       199     14
:...
:20607043 RDBSERVER_2170  LEF      6     1172   0 00:00:00.96      1592    171  N
:2060704A RDBSERVER_2186  LEF      6    65857   0 00:00:40.19     26808    508  N
:2060606E RDBSERVER_33440 LEF      6    68636   0 00:00:40.84     25591     82  N
:20607071 RDBSERVER_2210  LEF      6     2818   0 00:00:02.26      2478    331  N
..
:2060687A RDBSERVER_2217  LEF      6      960   0 00:00:00.70      1082     22  N
:2060707C RDBSERVER_26794 LEF      6      842   0 00:00:00.63      1521    203  N
:2060707D RDBSERVER_33704 LEF      6      764   0 00:00:00.87      1093    363  N
..
:2060709A RDBSERVER_2234  LEF      6     1295   0 00:00:01.21      1591    126  N
:2060709B RDBSERVER_2235  LEF      6      212   0 00:00:00.26       611     26  N
:2060709D RDBSERVER_2236  LEF      6     2290   0 00:00:01.71      1521     24  N
:2060709F RDBSERVER_33721 LEF      6    70148   0 00:00:53.33     38766    441  N
:206060AA RDBSERVER_33476 LEF      6      217   0 00:00:00.29       642     18  N
:206070AB RDBSERVER_33477 LEF      4     2970   0 00:00:02.16      2136    194  N
:206068AC RDBSERVER_33479 LEF      6      239   0 00:00:00.29       695     20  N
...

   See if there are any images that can be installed /SHARE -- to try to
   reduce the memory requirements.  This AlphaServer 2100 looks pretty
   heavily loaded...  And in particular, see if any of those Rdb server
   processes have component images that can be installed /SHARE...

:Sorry it's such a long note, but this really is a serious one!

   Again, I'd suggest an IPMT.

   As for SYSGEN, find out what is in MODPARAMS.DAT, and find out
   when the last pass of AUTOGEN -- with FEEDBACK -- was run.  If
   this tool is not used regularly, then clean the "cruft" out of
   MODPARAMS.DAT, re-run AUTOGEN with FEEDBACK, and reboot.

   I'd also encourage the use of DECamds, as this tool can often
   be very useful for identifying and managing these sorts of
   problems...

   And I'd recommend more memory...  A faster processor...   Etc...

657.2. "Thanks" by ROBSON::drspc8.reo.dec.com::Warne (Systems from Heaven) Wed May 28 1997 14:08

Thanks for the reply. 

The system is always pretty heavily loaded, but it has got heavier over the past few weeks as more sites have connected 
to it.

Autogen hasn't been used to set the parameters directly, but parameters have been changed on the basis of Autogen 
reports. 

I ran AUTOGEN (testfiles) yesterday, and one value hard-coded in MODPARAMS that worries me is
VIRTUALPAGECNT = 550000. This seems rather low - could it cause problems?
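
For scale, the two VIRTUALPAGECNT figures (the hard-coded 550000 and the 2105344 that AUTOGEN calculated, per the report at the end of this note) can be converted from pagelets. A small Python sketch, assuming the standard 512-byte pagelet and this system's 8192-byte page; the constant and function names are illustrative:

```python
PAGELET_BYTES = 512   # a pagelet is 512 bytes on OpenVMS Alpha
PAGE_BYTES = 8192     # page size reported by SHOW CPU on this system

HARD_CODED = 550000   # VIRTUALPAGECNT hard-coded in MODPARAMS (pagelets)
CALCULATED = 2105344  # value AUTOGEN calculated, from its report (pagelets)

def pagelets_to_pages(pagelets):
    """Whole pages covered by the given number of pagelets."""
    return pagelets * PAGELET_BYTES // PAGE_BYTES

def pagelets_to_mb(pagelets):
    """Convert pagelets to megabytes (2**20 bytes)."""
    return pagelets * PAGELET_BYTES / 2**20

if __name__ == "__main__":
    for label, value in (("hard-coded", HARD_CODED), ("calculated", CALCULATED)):
        print(f"{label:10} {value:8} pagelets = {pagelets_to_pages(value):6} pages"
              f" = {pagelets_to_mb(value):7.1f} MB")
```

The 34375 pages for the hard-coded value agree with the "internal value" line in the SYSGEN listing. In these units the hard-coded limit is about 269 MB of virtual address space per process, against roughly 1028 MB for the value AUTOGEN calculated.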

I've appended the AUTOGEN report file to the end of this note.

*
Could you confirm to me how to force a crash on the Alpha? I guess if it's hung, you need to hit the reset button, then 
write something invalid into the memory to make it crash, but I'm not sure of the exact procedure.
*

Again, many thanks.


Chris






  Old values below are the parameter values at the time of collection.
  The feedback data is based on 219 hours of up time.
  Feedback information will be used in the subsequent calculations


Parameter information follows:
------------------------------

MAXPROCESSCNT parameter information:
        Feedback information.
           Old value was 1143, New value is 914
           Maximum Observed Processes: 188

VIRTUALPAGECNT parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 2105344.  The value 550000
           will be used in accordance with the following requirements:
           VIRTUALPAGECNT has been specified by a hard-coded value of 550000.

Information on OpenVMS executable image Processing:

        Processing SYS$MANAGER:VMS$IMAGES_MASTER.DAT
           Total global pagelets counted = 41393
           Total global sections counted = 121
           Total resident code pages counted = 320
           Total resident data pages counted = 0


GBLPAGFIL parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 128.  The value 32768
           will be used in accordance with the following requirements:
           GBLPAGFIL has been increased by 768.
           GBLPAGFIL minimum value is 32768.

GBLPAGES parameter information:
        Feedback information.
           Old value was 681600, New value is 681600
           Current used GBLPAGES: 131056
           Global buffer requirements: 32768

GBLSECTIONS parameter information:
        Feedback information.
           Old value was 700, New value is 700
           Current used GBLSECTIONS: 477
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 570.  The value 700
           will be used in accordance with the following requirements:
           GBLSECTIONS has been increased by 180.
           GBLSECTIONS minimum value is 700.

LOCKIDTBL parameter information:
        Feedback information.
           Old value was 73728, New value is 96256
           Current number of locks: 102390
           Peak number of locks: 120832

LOCKIDTBL_MAX parameter information:
        Feedback information.
           Old value was 262144, New value is 262144
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 157081.  The value 262144
           will be used in accordance with the following requirements:
           LOCKIDTBL_MAX minimum value is 262144.

RESHASHTBL parameter information:
        Feedback information.
           Old value was 65535, New value is 65536
           Current number of resources: 59632

TMSCP_LOAD parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 0.  The value 1
           will be used in accordance with the following requirements:
           TMSCP_LOAD has been specified by a hard-coded value of 1.

MSCP_BUFFER parameter information:
        Feedback information.
           Old value was 2048, New value is 2048
           MSCP server I/O rate: 0 I/Os per 10 sec.
           I/Os that waited for buffer space: 0
           I/Os that fragmented into multiple transfers: 0

SCSCONNCNT parameter information:
        Feedback information.
           Old value was 40, New value is 40
           Peak number of nodes: 2
           Number of CDT allocation failures: 0

SCSRESPCNT parameter information:
        Feedback information.
           Old value was 300, New value is 300
           RDT stall count: 0

SCSBUFFCNT parameter information:
        Feedback information.
           Old value was 4096, New value is 4096
           CIBDT stall count: 0

SHADOW_MAX_COPY parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 4.  The value 18
           will be used in accordance with the following requirements:
           SHADOW_MAX_COPY has been specified by a hard-coded value of 18.

NPAGEDYN parameter information:
        Feedback information.
           Old value was 45744128, New value is 48791552
           Maximum observed non-paged pool size: 49807360 bytes.
           Non-paged pool request rate: 61 requests per 10 sec.

LNMSHASHTBL parameter information:
        Feedback information.
           Old value was 2048, New value is 1280
           Current number of shareable logical names: 1640
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1024.  The value 1280
           will be used in accordance with the following requirements:
           LNMSHASHTBL minimum value is 1280.

BALSETCNT parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 912.  The value 500
           will be used in accordance with the following requirements:
           BALSETCNT has been specified by a hard-coded value of 500.

ACP_DIRCACHE parameter information:
        Feedback information.
           Old value was 24576, New value is 24576
           Hit percentage: 99%
           Attempt rate: 1117 attempts per 10 sec.
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 3000.  The value 24576
           will be used in accordance with the following requirements:
           ACP_DIRCACHE has been specified by a hard-coded value of 24576.

ACP_DINDXCACHE parameter information:
        Feedback information.
           Old value was 8192, New value is 8192
           Hit percentage: 100%
           Attempt rate: 456 attempts per 10 sec.
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 750.  The value 8192
           will be used in accordance with the following requirements:
           ACP_DINDXCACHE has been specified by a hard-coded value of 8192.

ACP_HDRCACHE parameter information:
        Feedback information.
           Old value was 24576, New value is 24576
           Hit percentage: 97%
           Attempt rate: 776 attempts per 10 sec.
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 3000.  The value 24576
           will be used in accordance with the following requirements:
           ACP_HDRCACHE has been specified by a hard-coded value of 24576.

ACP_MAPCACHE parameter information:
        Feedback information.
           Old value was 1024, New value is 1024
           Hit percentage: 80%
           Attempt rate: 0 attempts per 10 sec.

PAGEDYN parameter information:
        Feedback information.
           Old value was 48087040, New value is 47955968
           Current paged pool usage: 35128128 bytes.
           Paged pool request rate: 118 requests per 10 sec.

PFRATH parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 8.  The value 4
           will be used in accordance with the following requirements:
           PFRATH has been specified by a hard-coded value of 4.

WSDEC parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 4000.  The value 37
           will be used in accordance with the following requirements:
           WSDEC has been specified by a hard-coded value of 37.

DUMPSTYLE parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1.  The value 0
           will be used in accordance with the following requirements:
           DUMPSTYLE has been specified by a hard-coded value of 0.

FREEGOAL parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 2000.  The value 520
           will be used in accordance with the following requirements:
           FREEGOAL has been specified by a hard-coded value of 64.

MPW_LOLIMIT parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1500.  The value 2048
           will be used in accordance with the following requirements:
           MPW_LOLIMIT minimum value is 2048.

PROCSECTCNT parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 32.  The value 128
           will be used in accordance with the following requirements:
           PROCSECTCNT has been specified by a hard-coded value of 128.

VAXCLUSTER parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1.  The value 2
           will be used in accordance with the following requirements:
           VAXCLUSTER has been specified by a hard-coded value of 2.

EXPECTED_VOTES parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 5.  The value 3
           will be used in accordance with the following requirements:
           EXPECTED_VOTES has been specified by a hard-coded value of 3.

VOTES parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1.  The value 2
           will be used in accordance with the following requirements:
           VOTES has been specified by a hard-coded value of 2.

RMS_DFNBC parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 8.  The value 64
           will be used in accordance with the following requirements:
           RMS_DFNBC has been specified by a hard-coded value of 64.

GH_EXEC_CODE parameter information:
        Feedback information.
           Old value was 512, New value is 512

GH_EXEC_DATA parameter information:
        Feedback information.
           Old value was 128, New value is 112

GH_RES_CODE parameter information:
        Feedback information.
           Old value was 512, New value is 512

GH_RES_DATA parameter information:
        Feedback information.
           Old value was 0, New value is 0

Page, Swap, and Dump file calculations
--------------------------------------

Page and Swap file calculations:
--------------------------------

PAGEFILE information:
        Feedback information.
           Old value was 3000000, New value is 3000000
           Maximum observed usage: 2159280
        Override Information - parameter calculation has been overridden.
           The calculated value was 3238900.  The new value is 3000000.
           PAGEFILE calculation has been set to current size by user.
           PAGEFILE will not be modified. The file size is within 10%.

SWAPFILE information:
        Feedback information.
           Old value was 500000, New value is 500000
           Maximum observed usage: 5120
        Override Information - parameter calculation has been overridden.
           The calculated value was 234000.  The new value is 500000.
           SWAPFILE calculation has been set to current size by user.
           SWAPFILE will not be modified. The file size is within 10%.
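
For anyone wading through a report like the one above, the overridden parameters can be pulled out mechanically. What follows is only a rough Python sketch, assuming the report keeps exactly the wording shown in this note; `find_overrides` and the `SAMPLE` text are hypothetical helpers written for illustration and are not part of AUTOGEN.

```python
import re

# Three entries copied from the report above, used as sample input.
SAMPLE = """\
PFRATH parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 8.  The value 4
           will be used in accordance with the following requirements:
           PFRATH has been specified by a hard-coded value of 4.

WSDEC parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 4000.  The value 37
           will be used in accordance with the following requirements:
           WSDEC has been specified by a hard-coded value of 37.

MPW_LOLIMIT parameter information:
        - AUTOGEN parameter calculation has been overridden.
           The calculated value was 1500.  The value 2048
           will be used in accordance with the following requirements:
           MPW_LOLIMIT minimum value is 2048.
"""

HEADER = re.compile(r"^(\w+) parameter information:", re.MULTILINE)
OVERRIDE = re.compile(r"The calculated value was (\d+)\.\s+The value (\d+)")


def find_overrides(report_text):
    """Return (name, calculated, used, hard_coded) for each overridden entry.

    Assumes entries are separated by blank lines, as in the pasted report.
    """
    overrides = []
    for block in report_text.split("\n\n"):
        header = HEADER.search(block)
        match = OVERRIDE.search(block)
        if header and match:
            calculated, used = int(match.group(1)), int(match.group(2))
            overrides.append(
                (header.group(1), calculated, used, "hard-coded" in block)
            )
    return overrides


if __name__ == "__main__":
    for name, calculated, used, hard_coded in find_overrides(SAMPLE):
        kind = "hard-coded" if hard_coded else "minimum"
        print(f"{name}: calculated {calculated}, using {used} ({kind})")
```

The hard-coded entries that sit below the calculated value (PFRATH and WSDEC in the sample) are probably the ones most worth asking about first.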

657.3. "Clean MODPARAMS, ReAUTOGEN, Reboot, Buy Memory" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 28 1997 14:49
:Thanks for the reply. 

   (Text wrapped for width...)

:The system is always pretty heavily loaded, but it has got heavier over the
:past few weeks as more sites have connected to it.

   It looks like you have hit a knee in the curve, then...  :-)

:Autogen hasn't been used to set the parameters directly, but parameters have
:been changed on the basis of Autogen reports. 

   OK -- I'm looking to avoid cases where folks have not used AUTOGEN
   to make changes, to avoid cases where folks have made direct SYSGEN
   changes, and to avoid cases where folks have constrained AUTOGEN
   through site-specific tuning entries ("cruft") in MODPARAMS.DAT.

:I ran an Autogen (testfiles) yesterday, and one value hardcoded in
:MODPARAMS which worries me is VIRTUALPAGECNT = 550000. This seems rather
:low - could it cause problems?

   It could, but the problems caused by insufficient virtual page
   count settings do not usually cause the sorts of problems reported
   here...  (Most applications usually slam into the lesser of the
   VIRTUALPAGECNT and the PGFLQUOTA settings, and merely "fall over".
   It's certainly possible Rdb is smarter about this, but you'll want
   to check the Oracle Rdb documentation for suggested settings for
   these and other parameters...)

:I've tagged the Agen report file at the end of this note.

   You might also want to post the contents of MODPARAMS.DAT.

   I'd also determine who suggested the PFRATH and WSDEC settings
   shown in the autogen report, and who suggested the FREEGOAL
   parameter settings, as I would tend to leave these to the default
   settings calculated by AUTOGEN.  (If one is not careful with
   these particular parameters, one can encounter thrashing...)

   I'd also look at the lock tables, as Rdb can definitely consume
   a large number of locks, and I'd expect it has specific requirements
   around these and other parameters.  (The values for SYSGEN and
   for SYSUAF settings are called out in the Rdb documentation for
   the particular version of Rdb in use.)

   And I'd also tend to replace any sorts of PAGEFILE=3000000 and
   SWAPFILE=500000 "absolute assignments" in MODPARAMS with either
   MIN_PAGEFILE and MIN_SWAPFILE settings, or with PAGEFILE=0 and
   SWAPFILE=0 settings.  I tend to prefer the latter, as it tells
   AUTOGEN to avoid these files, and I can adjust these settings
   (usually upward) to match local requirements manually...

:Could you confirm to me how to force a crash on the Alpha? I guess if it's 
:hung, you need to hit the reset button, then write something invalid into
:the memory to make it crash, but I'm not sure of the exact procedure.

   Given the DUMPSTYLE setting, make sure you have bootstrapped
   with a sufficiently large dump file -- the current setting of
   DUMPSTYLE saves all of physical memory...

   See below for the generic directions on how to crash an Alpha,
   and write a dump file...

   From the OpenVMS installation and upgrade manual (available to DIGITAL
   internal users via the URL http://axiom.zko.dec.com:8000/docset/):

A.3.2.3 Emergency Shutdown with Crash Commands

Use crash commands only if the system is "hung" (stops responding to any commands) and you cannot log in to
the SYSTEM account to use the SHUTDOWN.COM procedure or the OPCCRASH.EXE program. 

Note: The method described here works on all Alpha computers. However, on certain systems, you can force
your processor to fail (crash) by entering a specific console command. See the hardware manuals that came with
your computer for that information. 

To force your processor to fail, do the following: 

   1. Halt the system by entering Ctrl/P or by pressing the Halt button. (See Section A.3.1 for more
      information about how to halt your Alpha computer.)
   2. To examine processor registers, enter the following commands and press the Return key:

      >>> E -N F R0
      >>> E PS

      The system displays the contents of the registers. Write down these values if you want to save
      information about the state of the system.
   3. Enter the following commands and press the Return key:

      >>> D PC FFFFFFFF00000000
      >>> D PS 1F00

      By depositing these values, you cause the system to write a memory dump to the system dump file on the
      disk.
   4. Enter the following command and press the Return key:

      >>> CONTINUE

      This causes the system to perform a bugcheck.
   5. After the system reboots, log in to the SYSTEM account.
   6. To examine the dump file, enter the following commands and press the Return key after each one:

     $ ANALYZE/CRASH SYS$SYSTEM:SYSDUMP.DMP
     SDA> SHOW CRASH


     For more information about the System Dump Analyzer (SDA) utility, see the OpenVMS Alpha System
     Dump Analyzer Utility Manual. 
657.4. "MODPARAMS" by ROBSON::drspc8.reo.dec.com::Warne (Systems from Heaven) Wed May 28 1997 15:04
Here's the MODPARAMS.DAT file (it's a bit of a mess!)

I've sent a request to the guy I think set these parameters up, to 
see if he can explain them.


Many thanks again!

Chris

!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET  "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM="                "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
!++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 16-MAY-1996 09:39:33.53
!
! This is a new file created by the OpenVMS upgrade procedure.  This file
! was built by using the data found in the following file(s) previously
! used by this system:
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!      SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
! This/These old file(s) have been renamed to:
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR_OLD
!      SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT_OLD
!
! A new
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!
! has been built for you in order to ensure compatiblity with this release.
! Previous parameters found to be larger than the new defaults were retained.
! Certain other previous parameters were also retained.
!
! Please check the following sections of this file to see what files were used
! in what sequence to create the new APLHAVMSSYS.PAR file.
! Please review and edit this file for possible duplications, additions
! and deletions you wish to make.
!
!----------------------------------------------------------------------------
!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET  "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM="                "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.1  1-AUG-1994 15:34:57.78
!
! This is a new file created by the OpenVMS upgrade procedure.  This file
! was built by using the data found in the following file(s) previously
! used by this system:
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!      SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
! This/These old file(s) have been renamed to:
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR_OLD
!      SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT_OLD
!
! A new
!
!      SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
!
! has been built for you in order to ensure compatiblity with this release.
! Previous parameters found to be larger than the new defaults were retained.
! Certain other previous parameters were also retained.
!
! Please check the following sections of this file to see what files were used
! in what sequence to create the new APLHAVMSSYS.PAR file.
! Please review and edit this file for possible duplications, additions
! and deletions you wish to make.
!
!----------------------------------------------------------------------------
!****************************************************************************
! This section contains System Parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]ALPHAVMSSYS.PAR
! with values that must be preserved when AUTOGEN is run.
!
SCSSYSTEMID=1124
SCSNODE="PLANET  "
VAXCLUSTER=2
EXPECTED_VOTES=1
VOTES=1
RECNXINTERVAL=20
DISK_QUORUM="                "
QDSKVOTES=1
QDSKINTERVAL=10
ALLOCLASS=100
LOCKDIRWT=0
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=2
!****************************************************************************
! This section contains any parameters found in
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
!
!vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
!
! SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during installation of OpenVMS AXP V6.1 27-JUL-1994 11:55:18.95
!
SCSNODE="PLANET"
SCSSYSTEMID="1124"
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during installation of OpenVMS AXP V6.1 27-JUL-1994 11:55:18.95
! CLUSTER_CONFIG appending for ADD operation on 27-JUL-1994 12:02:56.43
VOTES=1
DISK_QUORUM=""
AGEN$INCLUDE_PARAMS SYS$MANAGER:AGEN$NEW_NODE_DEFAULTS.DAT
SCSNODE="PLANET"
SCSSYSTEMID=1124
NISCS_LOAD_PEA0=1
VAXCLUSTER=2
MSCP_LOAD=1
MSCP_SERVE_ALL=2
ALLOCLASS=100
INTERCONNECT="NI"
BOOTNODE="NO"
! CLUSTER_CONFIG end

! for RDB
min_pql_denqlm = 1000
min_gblpages = 115000
pagefile=0
swapfile=0
dumfile=0
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.1  1-AUG-1994 15:34:57.78
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 16-MAY-1996 09:39:33.53
!
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
!
! End of SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT
! Created during upgrade to OpenVMS AXP V6.2 20-MAY-1996 09:47:52.62
!--------------------------------------------------------------
! These are recommended values for a DRS Alpha V3.1 system
! which should not be changed without reference to the
! DRS Alpha V3.1 installation guide
!--------------------------------------------------------------
SCSNODE="TSLV12"
SCSSYSTEMID=58146
VAXCLUSTER=2
!
SHADOWING=2
!!SHADOW_MAX_COPY=13
SHADOW_SYS_DISK=1
SHADOW_SYS_UNIT=999
!
DR_UNIT_BASE=10
!
EXPECTED_VOTES=3
VOTES=2
DISK_QUORUM="DKB102"
QDSKVOTES=1
RECNXINTERVAL=20
QDSKINTERVAL=10
ALLOCLASS=100
INTERCONNECT="FDDI"
LOCKDIRWT=1
NISCS_CONV_BOOT=0
NISCS_LOAD_PEA0=1
NISCS_PORT_SERV=0
MSCP_LOAD=1
MSCP_SERVE_ALL=1
BOOTNODE="YES"
MIN_PAGEDYN=2053000

MIN_GBLPAGES=130000
MIN_GBLPAGFIL=32768
MIN_GBLSECTIONS=700

MIN_SYSMWCNT=6144

VIRTUALPAGECNT=550000
MIN_WSMAX=20480

MIN_LNMSHASHTBL=1280
MIN_RESHASHTBL=2048
MIN_LOCKIDTBL=10240
MIN_LOCKIDTBL_MAX=262144

MIN_MAXPROCESSCNT=512
BALSETCNT=500

MIN_SWPOUTPGCNT=1024

DUMPBUG=0
DUMPSTYLE=0
SAVEDUMP=1

CHANNELCNT=8191

CLISYMTBL=250
CTLPAGES=1500

MAXBUF=8192
PROCSECTCNT=128
DEADLOCK_WAIT=10

MSCP_BUFFER=2048
MSCP_CREDITS=32
SCSBUFFCNT=4096
MIN_SCSCONNCNT=40

ACP_DIRCACHE=24576
ACP_DINDXCACHE=8192
ACP_HDRCACHE=24576
ACP_FIDCACHE=8192
ACP_MAPCACHE=1024
ACP_MAXREAD=64
ACP_WINDOW=16
RMS_DFNBC=64

WSINC=2400
PFRATH=4
WSDEC=37
MIN_PFCDEFAULT=64

MIN_SPTREQ=2500

MIN_PQL_MPRCLM=8
MIN_PQL_MFILLM=100
MIN_PQL_MASTLM=600
MIN_PQL_MBIOLM=100
MIN_PQL_MBYTLM=40000
MIN_PQL_MDIOLM=100
MIN_PQL_MENQLM=12000

MIN_PQL_MWSDEFAULT=512
MIN_PQL_MWSQUOTA=1024
MIN_PQL_MWSEXTENT=2048

MIN_PQL_DFILLM=128
MIN_PQL_DASTLM=4096
MIN_PQL_DBIOLM=128
MIN_PQL_DBYTLM=65536
MIN_PQL_DDIOLM=4096
MIN_PQL_DENQLM=12000
MIN_PQL_DJTQUOTA=8192
MIN_PQL_DTQELM=20

MIN_PQL_DWSDEFAULT=1024
MIN_PQL_DWSEXTENT=1024

MIN_PQL_DFILLM=300
MIN_PQL_DENQLM=12000

MIN_PQL_DPGFLQUOTA=32768

FREEGOAL=64
MMG_CTLFLAGS = 3

MIN_MPW_HILIMIT=6368
MIN_MPW_LOLIMIT=2048
MIN_MPW_THRESH=2048
MIN_MPW_WAITLIMIT=12288
MIN_MPW_LOWAITLIMIT=6128

WINDOW_SYSTEM=1

WS_OPA0=1

LGI_BRK_TERM=0
LGI_BRK_DISUSER=0
LGI_PWD_TMO=30
LGI_RETRY_LIM=3
LGI_RETRY_TMO=20
LGI_BRK_LIM=2
LGI_BRK_TMO=900
LGI_HID_TIM=216000

DUMPFILE=0
SWAPFILE=0
PAGEFILE=0
MIN_NPAGEDYN=33554432   ! 32MB for National DRP

!-----------------------------------------------
!End of parameter definitions for DRS Alpha V3.1
!-----------------------------------------------

!-----------------------------------------------
! Added by Chris Turner 15-Feb-1997 for increased
! Map Data disk implementation (12 Member BVS)
!-----------------------------------------------
TAPE_ALLOCLASS=100
TMSCP_LOAD=1
TMSCP_SERVE_ALL=1
SHADOW_MAX_COPY=18
657.5. "That File Just Screams "Clean Me"..." by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Wed May 28 1997 15:55
   Well, that MODPARAMS.DAT clearly meets the definition of "cruft".

   I would review the documentation for the various layered products,
   and I would make a serious effort to reduce the number of parameter
   entries in this MODPARAMS.DAT file -- there are several blocks of
   duplicate entries...

   I would definitely comment-out the WSINC=2400, PFRATH=4, and the
   WSDEC=37 entries, and allow SYSGEN and AUTOGEN to determine values.
   And I'd also comment-out the *_MPW_* settings...

   And as mentioned before, I'd look seriously at reducing the overall
   system load and/or at increasing the available CPU and memory...

   (As one test, save a copy of MODPARAMS.DAT and the *.PAR parameter
   file to the side, clean out everything in MODPARAMS.DAT that isn't
   required by one of the application packages, reAUTOGEN, and reboot.
   If problems ensue, one can reload the *.PAR file via a SYSGEN> USE
   command, and reboot, or reAUTOGEN and reboot.  Or one can resolve
   and then adjust the setting(s) of the parameter(s), and reboot...)

	--

   ps: "dumfile=0" won't work, and should engender a message or two.
   The customer also has several different settings for key parameters,
   such as VOTES, DISK_QUORUM, etc.  (These parameters are not likely
   related to the specific problem you are seeing, however.)
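
As a starting point for that cleanup, the repeated and conflicting assignments can be listed mechanically. The following is just a Python sketch under some loud assumptions: that everything after a "!" is a comment, that names compare case-insensitively, and that lines without "=" (such as AGEN$INCLUDE_PARAMS) can be skipped. The helper names are made up for illustration, the sample lines are taken from the posted file, and nothing here tells you which assignment AUTOGEN ends up honoring.

```python
from collections import defaultdict

# A few lines lifted from the posted MODPARAMS.DAT, in the order they appear.
SAMPLE = """\
! CLUSTER_CONFIG appending for ADD operation
VOTES=1
DISK_QUORUM=""
AGEN$INCLUDE_PARAMS SYS$MANAGER:AGEN$NEW_NODE_DEFAULTS.DAT
SCSNODE="PLANET"
pagefile=0
dumfile=0
SCSNODE="TSLV12"
VOTES=2
DISK_QUORUM="DKB102"
PAGEFILE=0
MIN_NPAGEDYN=33554432   ! 32MB for National DRP
"""


def parse_assignments(text):
    """Yield (line_number, NAME, value) for each NAME=value line."""
    for number, raw in enumerate(text.splitlines(), start=1):
        # Assumption: "!" always starts a comment (quoted "!" is not handled).
        line = raw.split("!", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, comment, or a directive without "="
        name, value = line.split("=", 1)
        yield number, name.strip().upper(), value.strip()


def summarize(text):
    """Return (repeated, conflicting) dicts of NAME -> [(line, value), ...]."""
    seen = defaultdict(list)
    for number, name, value in parse_assignments(text):
        seen[name].append((number, value))
    repeated = {n: v for n, v in seen.items() if len(v) > 1}
    conflicting = {
        n: v for n, v in repeated.items() if len({val for _, val in v}) > 1
    }
    return repeated, conflicting


if __name__ == "__main__":
    repeated, conflicting = summarize(SAMPLE)
    for name in sorted(repeated):
        tag = "CONFLICT" if name in conflicting else "duplicate"
        print(f"{name}: {tag} {repeated[name]}")
```

A check for misspelled names such as "dumfile" would need a list of valid parameter names, which this sketch deliberately leaves out.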

657.6. "Oh no, it's getting worse!" by ROBSON::drspc8.reo.dec.com::Warne (Systems from Heaven) Thu May 29 1997 13:51
I'll definitely be reviewing the MODPARAMS settings as you advise.

It got worse this morning! One of the nodes crashed instantly, with no 
warning and no logs at all. It rebooted, but when I did an ANAL/CRASH 
all I got was :

%SDA-W-INCOMPL, system space memory not completely written in dump file
%SDA-W-NOTSAVED, global pages not saved in the dump file
%SDA-W-NOTSAVED, processes not saved in the dump file
%SDA-E-NOREAD, unable to access location 8AC91988

There's limited system disk space, so the dump file was only 54820 
blocks (DUMPSTYLE is set to 1, despite what it says in MODPARAMS).

Is this SDA error because the file is just too small to write any 
meaningful information to it, or could it be anything to do with having 
a shadowed system disk?

Of course, when it rebooted, all the disk sets went into ShadowMergeMbr 
state, and users couldn't do much work on the system as it was running 
with 90% Interrupt State! SHADOW_MAX_COPY was set to 18, so I've advised 
it's dropped down to 6 to hopefully ease this problem next time. 
Without a dump, I can't say why the system crashed, but it looks like 
we've got two problems here - one a tuning issue, and the other a 
mystery!
657.7. "Clean Out MODPARAMS" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Thu May 29 1997 14:33
   paragraphs reordered...

:It got worse this morning! One of the nodes crashed instantly, with no 
:warning and no logs at all. It rebooted, but when I did an ANAL/CRASH 
:all I got was :

   The only interesting clue here -- given the lack of a dump
   file -- is what got displayed on the console during the
   crash.

:There's limited system disk space, so the dump file was only 54820 
:blocks (DUMPSTYLE is set to 1, despite what it says in MODPARAMS).

   That would tend to point to hand-tuning in SYSGEN, and that's
   something that tends to lead to these sorts of weird problems
   when a hand-made tweak isn't reflected in another parameter,
   and to weird problems after AUTOGEN is run and some hand-made
   tweak is lost.

:%SDA-W-INCOMPL, system space memory not completely written in dump file
:%SDA-W-NOTSAVED, global pages not saved in the dump file
:%SDA-W-NOTSAVED, processes not saved in the dump file
:%SDA-E-NOREAD, unable to access location 8AC91988

:Is this SDA error because the file is just too small to write any 
:meaningful information to it, or could it be anything to do with having 
:a shadowed system disk?

   The error from SDA -- there should be an equivalent set of
   warnings during the crash -- just indicates the dump is not
   complete.
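
   To put rough numbers on "not complete", here is a small Python sketch using only figures already posted in this topic: the 54820-block dump file and the NPAGEDYN value from the AUTOGEN report. It assumes 512-byte disk blocks and assumes that the reported NPAGEDYN value is close to what was in effect at the time of the crash; the names are just illustrative.

```python
BLOCK_BYTES = 512           # size of one disk block in bytes
DUMP_FILE_BLOCKS = 54820    # dump file size quoted in this topic
NPAGEDYN_BYTES = 48791552   # "New value" for NPAGEDYN in the AUTOGEN report


def blocks_to_bytes(blocks):
    """Convert a block count to bytes."""
    return blocks * BLOCK_BYTES


def bytes_to_blocks(size):
    """Convert bytes to whole blocks, rounding up."""
    return -(-size // BLOCK_BYTES)


dump_bytes = blocks_to_bytes(DUMP_FILE_BLOCKS)
pool_blocks = bytes_to_blocks(NPAGEDYN_BYTES)

if __name__ == "__main__":
    print(f"dump file holds {dump_bytes} bytes")         # 28067840
    print(f"non-paged pool needs {pool_blocks} blocks")  # 95296
    print("pool alone fits:", pool_blocks <= DUMP_FILE_BLOCKS)
```

   On these figures the non-paged pool by itself is larger than the whole dump file, which is at least consistent with the INCOMPL warning.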

   --

   Note that you can have a SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP,
   and add aliases into each root for SYS$COMMON:[SYSEXE]SYSDUMP.DMP.
   This saves some space, but risks overwriting the first dump if any
   other system crashes before the dump can be saved (via SDA> COPY
   or similar approaches) somewhere off the system disk...
657.8. "" by TWICK::PETTENGILL (mulp) Fri May 30 1997 18:19
>>   Well, that MODPARAMS.DAT clearly meets the definition of "cruft".

VMS upgrades put an annoying amount of "cruft" in modparams.dat.

What is needed is a utility to cleanup and verify modparams.dat, but since
I've thought about it many times and never had time, I'm certainly not
volunteering to do it.
657.9. "Why no mini-merges?" by VMSSPT::JENKINS (Kevin M Jenkins VMS Support Engineering) Mon Jun 02 1997 09:55
<Of course, when it rebooted, all the disk sets went into ShadowMergeMbr 
<state, and users couldn't do much work on the system as it was running 
<with 90% Interrupt State! SHADOW_MAX_COPY was set to 18, so I've advised 
<it's dropped down to 6 to hopefully ease this problem next time. 
<Without a dump, I can't say why the system crashed, but it looks like 
<we've got two problems here - one a tuning issue, and the other a 
<mystery!

    Lowering SHADOW_MAX_COPY is not the way to address this. The problem
    is most likely that the system disk is in merge. The only way to
    deal with that is to remove one member of the shadowset and put
    it back as a copy target. If you are going to try this you need to be
    careful to identify which member of the shadowset had the crash dump
    written to it; this is output at the console during the dump. Make
    sure you don't remove that one.
    
    Second, as long as the other disks are in merge then user IO is going
    to suffer. If this system is already running close to IO capacity then
    the extra load caused by the merge state can cause significant
    performance problems. The actual IO being done by the SHADOW_SERVER
    process is usually not a problem so lowering SHADOW_MAX_COPY can
    actually make the situation worse. 
    
    I don't recall the config... but why didn't the shadowsets do
    mini-merges? What kind of controllers and disks are being used to
    create the shadowsets? Do you have write logging disabled via
    SHADOW_SYS_DISK set to 801 hex?
    Kevin
    
657.10. "Standalone system will only do merge" by VMSSPT::JENKINS (Kevin M Jenkins VMS Support Engineering) Mon Jun 02 1997 09:58
    Went back and looked at .0; if this is a standalone system
    then you will always have to do full merges after any system crash.
    
    
657.11. "my guess" by EVMS::KUEHNEL (Andy Kühnel) Mon Jun 02 1997 11:33
    The initial problem was: system "locks up" temporarily with many
    processes in RWMPB.
    
    We get into this situation when a huge amount of pages get thrown on
    an already large modified page list: the total number of modified pages
    is > MPW_WAITLIMIT, therefore any process replacing a page in its working
    set that needs to go on the modified page list is thrown into RWMPB.
    
    Processes are released when the modified page list shrinks below
    MPW_LOWAITLIMIT.
    
    This whole mechanism was put into place in order to prevent a single
    thrashing process from essentially getting all system memory because
    it could potentially create modified pages much faster than they can be
    written back.  Unfortunately, innocent bystanders suffer if we need to
    throw just a single page out of their working set and onto the modified
    page list.
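    
    The two-threshold behaviour described above can be pictured with a toy
    model. This Python sketch is only an illustration of the hysteresis, not
    of the real memory management code: `wait_states` is a made-up name, the
    modified page list sizes are invented, and the limits are the MIN_ values
    from the posted MODPARAMS.DAT (the values actually in effect may be
    higher).

```python
MPW_WAITLIMIT = 12288     # MIN_MPW_WAITLIMIT from the posted MODPARAMS.DAT
MPW_LOWAITLIMIT = 6128    # MIN_MPW_LOWAITLIMIT from the posted MODPARAMS.DAT


def wait_states(modified_list_sizes, waitlimit, lowaitlimit):
    """For each sample, report whether page-replacing processes are held.

    Waiting starts once the list exceeds waitlimit and ends only when the
    list has drained below lowaitlimit, so the two limits form a hysteresis.
    """
    waiting = False
    states = []
    for size in modified_list_sizes:
        if not waiting and size > waitlimit:
            waiting = True       # processes start landing in RWMPB
        elif waiting and size < lowaitlimit:
            waiting = False      # list has drained, processes are released
        states.append(waiting)
    return states


if __name__ == "__main__":
    # Invented samples: a burst of modified pages, then a slow write-back.
    samples = [5000, 12000, 13000, 10000, 7000, 6000, 5000]
    for size, held in zip(samples, wait_states(samples, MPW_WAITLIMIT,
                                               MPW_LOWAITLIMIT)):
        print(size, "RWMPB" if held else "running")
```

    Note how the 10000 and 7000 samples are below MPW_WAITLIMIT and yet
    processes stay held, because the list has not yet dropped under
    MPW_LOWAITLIMIT.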
    
    On the system at hand, there are a couple of large server processes,
    which apparently can grow to > 25,000 pages.  Assuming that many of
    these are global pages that are not also mapped by another process, a
    simple image exit/process deletion will throw these pages onto the
    modified page list, and it will take a while to write this stuff back.
    This is what I believe happened and caused the "temporary hang".
    
    If this guess is right, what can you do to prevent this?
    
    You have 42,600 global pages (681,600 pagelets).  This would be the
    maximum of what can be thrown onto the modified page list by this
    mechanism.  A MPW_WAITLIMIT of about 42,000 pages and a MPW_LOWAITLIMIT
    of 40,000 pages would probably do the job.  I would recommend against
    turning MPW_HILIMIT to this level because this would delay the writing
    of modified pages.
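    
    The page and pagelet figures above can be checked with a few lines of
    Python. The sketch assumes the 8192-byte page size shown in the SHOW CPU
    output in .0 and 512-byte pagelets; the function names are only
    illustrative.

```python
PAGE_BYTES = 8192      # "System Page Size = 8192" from SHOW CPU in .0
PAGELET_BYTES = 512    # one pagelet


def pages_to_pagelets(pages):
    """Convert Alpha pages to pagelets."""
    return pages * (PAGE_BYTES // PAGELET_BYTES)


def pages_to_mib(pages):
    """Convert Alpha pages to mebibytes."""
    return pages * PAGE_BYTES / (1024 * 1024)


if __name__ == "__main__":
    print(pages_to_pagelets(42600))   # 681600
    print(pages_to_mib(42000))        # 328.125
    print(pages_to_mib(12288))        # 96.0
```

    So a modified page list allowed to reach the suggested 42,000 pages could
    hold roughly 328 MB awaiting write-back, compared with 96 MB at the
    MIN_MPW_WAITLIMIT of 12288 in the posted MODPARAMS.DAT.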
    
    You may also help the situation somewhat by reducing MPW_WRTCLUSTER to
    maybe 16 and increasing MPW_IOLIMIT to 64.  This could be even more
    effective if multiple pagefiles on different disks were used or if the
    pagefile were on a stripe set.
    
    
    However:
    
    The crash you saw may or may not be related to this problem.  Without a
    dump, it's impossible to tell.  If the crash was caused by some kind of
    memory starvation "at a bad time", by increasing MPW_WAITLIMIT you
    would risk running into the situation more easily!