Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Wed Jan 22 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

259.0. "Can OpenVMS handle this much auditing?" by GIDDAY::GILLINGS (a crucible of informative mistakes) Thu Feb 27 1997 21:49

  I have a customer who wants to run with full file access auditing. That's
  SUCCESS, FAIL, SYSPRV, GRPPRV, READALL and BYPASS for all access types.

  They have specifically chosen systems which should have sufficient resources
  to do this. Indeed, they've been running for more than 1 year with auditing
  enabled as above. However, they have also been suffering crashes. At first
  these were occasional, but recently they have increased to 1 or more per day.
  It seems the frequency is related to the gradual increase in workload over
  time. The node is pretty much exclusively a Teamlinks server.

  The crash footprint is always very similar: the process is always OAFC$SERVER
  with READ ACCVIOs in EXE$CHKPRO at varying offsets. Those that I've checked
  are in the "item list scanner" at the targets of the large CASE statement
  leading to ITEM_xxx: labels. Possibly also of interest, the AUDIT_SERVER is
  always current on the other CPU. This doesn't match anything in CANASTA that
  I can access.

  The customer is running OpenVMS/Alpha V6.1 and Pathworks V4.2 in a
  5-node cluster of two dual-processor 2100s + 3 VAXes. They won't upgrade
  OpenVMS or Pathworks for various reasons. We also can't get them to install
  all the patches we want installed. It's a secure site in another city so
  detailed analysis is very difficult.

  The good news is that I was able to convince them to disable SUCCESS and
  SYSPRV audits. Their audit load for this node went from anything up to
  500,000 blocks of journal per day to 20 events (yes, that's TWENTY events :-).
  Now, touch wood with all fingers and toes crossed, we've had 3 crash free
  days, so maybe the crashes were caused by stressing the security audit
  subsystem, or perhaps a small synchronisation window with AUDIT_SERVER.
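
For a sense of scale, the peak journal figure quoted above can be turned into bytes and an average rate. This is only a rough sketch: it assumes the standard 512-byte OpenVMS disk block and, unrealistically, a load spread evenly over 24 hours. The function name `journal_rate` is made up for illustration.

```python
# Rough scale of the audit journal volume quoted in the note.
BLOCK_BYTES = 512             # OpenVMS disk block size
BLOCKS_PER_DAY = 500_000      # peak figure quoted in the note
SECONDS_PER_DAY = 24 * 60 * 60


def journal_rate(blocks_per_day: int = BLOCKS_PER_DAY) -> dict:
    """Return daily volume and average rate (assumes an evenly spread load)."""
    bytes_per_day = blocks_per_day * BLOCK_BYTES
    return {
        "bytes_per_day": bytes_per_day,
        "mib_per_day": bytes_per_day / 2**20,
        "blocks_per_second": blocks_per_day / SECONDS_PER_DAY,
        "bytes_per_second": bytes_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in journal_rate().items():
        print(f"{name}: {value:,.1f}")
```

That works out to 256,000,000 bytes (about 244 MiB) of journal per day, or just under 6 blocks per second on average. Real bursts would be well above that average.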

  So, the question is, is OpenVMS qualified to run with that level of
  auditing, assuming sufficient disk, CPU and I/O bandwidth? We can point
  out the absurdity of it all, but it *does* sell hardware, so I'm very 
  loath to try and dissuade them from their wishes over the long term.
  Is there any specific tuning or system configuration that we should
  suggest? Perhaps there are some upgrades or patches we can convince them
  to install if/when the spooks insist that they need all that audit data.

						John Gillings, Sydney CSC 
259.1. by UTRTSC::thecow.uto.dec.com::JurVanDerBurg (Change mode to Panic!) Fri Feb 28 1997 05:49 (9 lines)

VMS should be able to deal with this, period. No crashes whatsoever should 
result from this. It looks like a synchronization problem somewhere where
the CHKPRO system service and the audit server play some role.

I suggest escalating this (or having a thorough look at the crash and looking
for the problem instead of fighting symptoms; that's what I would do).

Jur.

259.2. by ALPHAZ::HARNEY (John A Harney) Fri Feb 28 1997 09:25 (12 lines)

re: .0

Well, VMS should run with all audits enabled, it will just be
very sluggish.

There were a few remedial fixes for CHKPRO; I don't know exactly when
or what, but I know they exist.  Is the customer averse to even these
patches, or just random ones?

Being back on V6.1 Alpha increases the chances that a fix will be available to help.

\john

259.3. by BSS::JILSON (WFH in the Chemung River Valley) Fri Feb 28 1997 09:31 (7 lines)

IMHO there appear to be a number of scenarios where auditing can cause 
system hangs, crashes, etc.  I IPMT'd a deadly embrace for mutexes when 
auditing failed logical name table access.  The auditing mechanisms, mainly 
around logging, are way too resource intensive for my taste.  I believe this 
will get worse as systems get faster and IO struggles to keep up.

Jilly

259.4. by ALPHAZ::HARNEY (John A Harney) Fri Feb 28 1997 09:48 (9 lines)

re: .3

Well, I wouldn't say "a number of."  The mutex/logical-name-table/audit
problem is in fact the only (real) outstanding audit server problem I know
of, and it will likely never be solved.  (It involves two types of
incompatible synchronization, and brings on a deadlock.)

If you know of others, please QAR or IPMT them!
\john