Conference vaxaxp::vmsnotes

Title: VAX and Alpha VMS
Notice: This is a new VMSnotes, please read note 2.1
Moderator: VAXAXP::BERNARDO
Created: Wed Jan 22 1997
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 703
Total number of notes: 3722

632.0. " BADDIRENT and BAD_DIRFIDSEQ... again" by HTSC19::KENNETH () Fri May 23 1997 05:05

Hi,

I have seen several notes in this conference describing a similar problem,
but I can't find a solution.

A customer reports that some of the files in a directory are getting lost:
a "$ DIR" lists them as "no such file".

$ DIR <DIRECTORY>

FILENAME        blk  date
FILENAME no such file
FILENAME no such file
FILENAME no such file

Running "$ ANA/DISK/REPAIR DSA7:" reports:

%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
	[PORD.JOB.PTR]JV_125242.LST;1
-ANALDISK-I-BAD_DIRFIDSEQ, invalid file sequence number in directory file ID


The customer is running OpenVMS VAX V6.2 in a cluster environment with a
shadowed disk.  The disk is an RZ28.  Disk Keeper V7.1 is also running on
the system.

Can anyone give me some idea of what causes this error?

Thanks for your help in advance.

Kenneth Leung
632.1. "Shut Off Disk Keeper" by XDELTA::HOFFMAN (Steve, OpenVMS Engineering) Fri May 23 1997 10:18, 14 lines
   The "no such file" error you are seeing is normal for "lost" files,
   but the question is why the file structure is getting corrupted in
   this fashion. 

   An obvious potential cause of this would be a buggy on-line disk
   defragmentation tool -- if this tool is in use at this site, then
   turn it off and see if the problem recurs.  I assume this is what
   the "Disk Keeper" package is doing.

   Also, make sure the current set of shadowing patches is in use on
   the system.  If not, get the current patches from the patch
   area (http://www.service.digital.com/), and apply them.

632.2. "VAXF11X01_071" by GIDDAY::GILLINGS (a crucible of informative mistakes) Sun May 25 1997 19:18, 14 lines
    
>   Also, make sure the current set of shadowing patches is in use on
>   the system.  If not, the get the current patches from the patch
>   area (http://www.service.digital.com/), and apply them.
    
    
    Even more important - make sure you have the latest F11X ECO installed.
    See VAXF11X01_071 - it corrects a number of problems, some of which
    could result in the symptoms you're describing if a defragger was
    used on the disk.
    
    This ECO is MANDATORY if you're using a defragger.
    
    						John Gillings, Sydney CSC
632.3. "This may be an operational problem rather than a corruption." by MOVIES::MCLAREN (Oh no - Not ANOTHER amusing one-liner) Mon May 26 1997 08:37, 9 lines
    
    	Note that you will also see this error if alias directory
    	entries are created using SET FILE/ENTER, the target file
    	is then deleted, and its file header is reused. This
    	leaves dangling directory entries, which can produce the
    	observed behaviour.
    
    regards
    Duncan McLaren.
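    The dangling-entry scenario above can be pictured with a toy model.
    This is a hedged Python sketch, not the XQP or ANALYZE/DISK logic:
    it assumes a file ID carries a file number plus a sequence number
    that changes when a header slot is reused, and that an entry whose
    stored sequence number no longer matches the header's is the case
    the BAD_DIRFIDSEQ message ("invalid file sequence number in
    directory file ID") describes. All class and method names, and the
    alias name, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileId:
    """Simplified file ID: file number, sequence number, relative volume."""
    num: int      # which header slot in the index file
    seq: int      # assumed to be bumped when the header slot is reused
    rvn: int = 0  # relative volume number; unused in this toy model


@dataclass
class Header:
    fid: FileId
    in_use: bool = True


@dataclass
class ToyVolume:
    """Hypothetical, highly simplified index file plus one directory."""
    headers: dict = field(default_factory=dict)    # file number -> Header
    directory: dict = field(default_factory=dict)  # entry name -> FileId in entry

    def create(self, name: str, num: int) -> FileId:
        old = self.headers.get(num)
        # Reusing a header slot gives the new file a higher sequence number.
        seq = old.fid.seq + 1 if old is not None else 1
        fid = FileId(num, seq)
        self.headers[num] = Header(fid)
        self.directory[name] = fid
        return fid

    def enter_alias(self, alias: str, name: str) -> None:
        # Analogue of SET FILE/ENTER: a second entry pointing at the same FID.
        self.directory[alias] = self.directory[name]

    def delete(self, name: str) -> None:
        # Removes only this entry and frees the header; aliases are untouched.
        fid = self.directory.pop(name)
        self.headers[fid.num].in_use = False

    def check_entry(self, name: str) -> str:
        fid = self.directory[name]
        hdr = self.headers.get(fid.num)
        if hdr is None or not hdr.in_use:
            return "stale: header free"
        if hdr.fid.seq != fid.seq:
            return "stale: sequence mismatch"
        return "ok"


vol = ToyVolume()
vol.create("JV_125242.LST", num=42)
vol.enter_alias("ALIAS.LST", "JV_125242.LST")
vol.delete("JV_125242.LST")           # target deleted, alias left dangling
vol.create("NEW.DAT", num=42)         # header slot 42 reused for a new file
print(vol.check_entry("ALIAS.LST"))   # stale: sequence mismatch
```

    Until the header is reused the alias only points at a free header;
    after reuse it points at an unrelated file with a different sequence
    number, which is the mismatch the directory check catches.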
632.4. "customer experienced serious disk corruption" by CUJO::SAMPSON () Tue Jun 03 1997 02:29, 28 lines
	Re: .2, FYI, we have a customer who experienced serious disk
corruption on all disks being defragmented, in the following unsupported
configuration:

	The defragmenter was X2.1A-3, running on OpenVMS Alpha V6.2,
on two AlphaServer 1000's.  The 38 disks being defragmented were on
OpenVMS VAX V6.2, OpenVMS Alpha V6.2, and OpenVMS Alpha V7.1, all in
the same cluster.  Note that the V6.2 systems did *not* have the
required CLUSIO ECO.

	The customer has temporarily stopped using the defragmenter,
is applying the ECOs recommended by the CSC, and will upgrade the
defragmenter to V2.2.  The customer finds it strange that there had
not been any apparent problems with the unsupported configuration
until May 20th, the day after the 1970+10K date.  We do not yet have
enough information to escalate an IPMT on this.
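	The "1970+10K date" is a day count. Assuming it means 10,000 days
after 1 January 1970, a quick Python check (a sketch, not part of any
DEC tool) confirms it fell on 19 May 1997, so May 20th is indeed the
day after:

```python
from datetime import date, timedelta

# Assumption: "1970+10K" means 10,000 days after 1 January 1970.
EPOCH = date(1970, 1, 1)
ten_k_day = EPOCH + timedelta(days=10_000)

print(ten_k_day)                      # 1997-05-19
print(ten_k_day + timedelta(days=1))  # 1997-05-20, when problems first appeared
```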

	Applying the ALPF11X01_071 ECO to the OpenVMS Alpha V7.1 systems
*only* has *not* been effective in preventing XQPERR crashes on those
systems.  It appears that the entire cluster must receive the appropriate
ECOs in order to fully benefit from them.
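	Since a partial rollout leaves gaps, one way to check a cluster is to
audit every node against the full set of required kits. A minimal
Python sketch, assuming the per-node inventory has already been
collected by whatever means the site uses (not shown); the node names
and installed sets are hypothetical, and the required components are
the CLUSIO, DRIV02 and F11X ECOs named elsewhere in this topic:

```python
# Required ECO components (names as used in this topic).
REQUIRED = {"CLUSIO", "DRIV02", "F11X"}

# Hypothetical inventory: node name -> ECO components installed.
inventory = {
    "NODEA": {"F11X"},                     # e.g. only the F11X ECO applied
    "NODEB": {"CLUSIO", "DRIV02", "F11X"},
    "NODEC": set(),
}


def missing_ecos(inventory, required):
    """Return {node: sorted list of missing components} for incomplete nodes."""
    gaps = {}
    for node, installed in sorted(inventory.items()):
        missing = required - installed
        if missing:
            gaps[node] = sorted(missing)
    return gaps


gaps = missing_ecos(inventory, REQUIRED)
for node, missing in gaps.items():
    print(f"{node}: missing {', '.join(missing)}")
if not gaps:
    print("all nodes carry the required ECOs")
```

Any node listed is a cluster member that could still undermine the
ECOs applied on the others.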

	The CXO CSC was apparently unaware of the possible existence of any
such disk corruption problem.  Where did you find out that VAXF11X01_071 is
*mandatory* on systems using a defragmenter?

	Thanks,
	Bob Sampson
632.5. "hopefully disk corruption is finally ended" by CUJO::SAMPSON () Tue Jun 03 1997 22:29, 6 lines
	The good news: customer finally applied the ECOs (CLUSIO, DRIV02,
and F11X) to all systems in the cluster today.  The bad news: corruption
of disks may have continued, even after use of the defragmenter was
discontinued.  One disk is missing its master file directory [000000],
and another is just missing some of its files, some of which are "lost"
from their directories, and others of which don't seem to exist anymore.
632.6. "make sure all previous mess is cleaned up!" by GIDDAY::GILLINGS (a crucible of informative mistakes) Wed Jun 04 1997 03:33, 9 lines
> The bad news: corruption
>of disks may have continued, even after use of the defragmenter was
>discontinued. 

    Applying the ECO won't fix existing damage, which can lie "dormant" on
  the disk. I'd recommend an image backup and restore of any suspect disk,
  then look for signs of corruption that occur *after* the restore.

						John Gillings, Sydney CSC
632.8. "" by VMSSG::FRIEDRICHS (Ask me about Young Eagles) Thu Jun 05 1997 11:00, 11 lines
   
   I anxiously await hearing from Al Meier. 
   
   It was my understanding from the grapevine this morning that the
   customer was running neither COMPAT nor CLUSIO. Even without either of
   them, I know of no software incompatibility issue with V7.1.
   
   Cheers,
   jeff friedrichs
   Project Leader - COMPAT and CLUSIO kits
   
632.9. "The conclusions in .7 are PREMATURE" by STAR::BOAEN (LANclusters/VMScluster Tech. Office) Thu Jun 05 1997 12:31, 18 lines
re: .7
 The conclusions in .7 are PREMATURE.  Al called me late yesterday, and
one of several things I suggested he check was whether an old image/TIMA
kit had been loaded on top of the compatibility kit or some other newer
image.  This was just a hunch, not based on any specific knowledge.

We've never seen widespread corruption like this before, except when
someone has booted a CI node with VAXCLUSTER = 0 into a running cluster,
creating a partitioned cluster.  The problem is still being researched
here.

PLEASE DON'T SUBMIT NOTES BASED ON IN-PROGRESS PROBLEM SOLVING.  WE DON'T
KNOW WHETHER OR NOT THE SCENARIO YOU DESCRIBE IS THE PROBLEM.  WE MADE A
STRONG EFFORT TO ENSURE COMPATIBILITY BETWEEN KITS, SO WE SUSPECT YOU MAY BE
WRONG.
'Gards,

Verell
632.10. "never mind then" by CUJO::SAMPSON () Thu Jun 05 1997 17:09, 6 lines
    Okay, I've deleted my note .7.  I didn't state the conjecture as a
    conclusion, but as a conjecture.  Since you don't want a progress
    report entered here, I will refrain from adding anything.
    
    Sorry,
    Bob Sampson