
Conference noted::hackers_v1

Title: -={ H A C K E R S }=-
Notice: Write locked - see NOTED::HACKERS
Moderator: DIEHRD::MORRIS
Created: Thu Feb 20 1986
Last Modified: Mon Aug 03 1992
Last Successful Update: Fri Jun 06 1997
Number of topics: 680
Total number of notes: 5456

399.0. "DIFF of VMS not efficient" by LEROUF::GENTILI (franco GENTILI AEG Valbonne ) Wed Feb 04 1987 05:57

 Hi, 

 I am looking for something, a program or an idea, that would allow
 a perfect comparison between two files, like the VMS DIFF command.
 The VMS DIFFERENCES command is not effective enough, for example in
 the following case:

 You have A.TXT, which is  aaaaa
                           bbbbb
                           ddddd

 and      B.TXT, which is  aaaa
                           bbbbb
                           ccccc

As you can see, the differences between the two files are that the first line
of B.TXT has one "a" fewer than in A.TXT, and the last line of B.TXT is ccccc
instead of ddddd.

If you do a DIFF a.txt b.txt you can't identify what the differences were,

even if you try some switches like /CHANGE etc...

If you have some ideas, it would be great.

Franc.
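
For anyone who wants to try the same comparison outside VMS, here is a
minimal sketch in Python using the standard difflib module. It is an
illustration only: difflib is not the algorithm used by DIFFERENCES or CMS,
and the function name changed_lines is made up for this example.

```python
import difflib

def changed_lines(old, new):
    """Return (first_line_number, old_lines, new_lines) for each differing region."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # i1 is 0-based; report 1-based line numbers, as the listings in this thread do.
            regions.append((i1 + 1, old[i1:i2], new[j1:j2]))
    return regions

# The two files from the note, one list element per line.
a_txt = ["aaaaa", "bbbbb", "ddddd"]
b_txt = ["aaaa", "bbbbb", "ccccc"]

for line_number, old, new in changed_lines(a_txt, b_txt):
    print(f"line {line_number}: {old} -> {new}")
```

This reports two separate regions, one at line 1 and one at line 3, which
is the result the note is asking for.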
    

T.R  Title  User  Personal Name  Date  Lines

399.1. "Use CMS DIFFERENCES instead" by ULTRA::CRANE (Olorin I was in the West that is forgotten...) Wed Feb 04 1987 08:27 (37 lines)

If you are always "diffing" text files, use CMS differences. It seems to
work much better than the DCL command. If I use the command

   $ CMS DIFFERENCES /IGNORE=(FORM_FEEDS, LEADING, TRAILING, SPACING) -
        X. Y. /OUTPUT=Z.

on the two text-strings in your note, I get this differences-listing:

DEC/CMS File Comparison Utility
Files Compared By CRANE On  4-FEB-1987 08:21:16
   (1)  KD2:[CRANE]X.;1
   (2)  KD2:[CRANE]Y.;1




+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
File KD2:[CRANE]X.;1 Line 1    
      1)aaaaa

File KD2:[CRANE]Y.;1 Line 1    
      2)aaaa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 


+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
File KD2:[CRANE]X.;1 Line 3    
      1)ddddd

File KD2:[CRANE]Y.;1 Line 3    
      2)ccccc
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 



**** End of Differences ****

399.2. "DIFF/MATCH=1" by SWAMI::LAMIA (Cheap, fast, good -- pick two) Wed Feb 04 1987 10:11 (29 lines)

    You can use the /MATCH qualifier in DIFF to get a similar effect.
    I'm not sure about the efficiency (==performance?) though.  CMS
    is probably much better, but it is a layered product, too.
    
    $ dif a.,b./mat=1
************
File $DISK1:[LAMIA]A.;1
    1   aaaaa
    2   bbbbb
******
File $DISK1:[LAMIA]B.;1
    1   aaaa 
    2   bbbbb
************
************
File $DISK1:[LAMIA]A.;1
    3   ddddd
******
File $DISK1:[LAMIA]B.;1
    3   ccccc
************

Number of difference sections found: 2
Number of difference records found: 2

DIFFERENCES /IGNORE=()/MATCH=1/MERGED=1/OUTPUT=$DISK1:[LAMIA]C.;1-
    $DISK1:[LAMIA]A.;1-
    $DISK1:[LAMIA]B.;1
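
Why /MATCH=1 helps here can be sketched in a few lines of Python. The idea
is that a difference section only ends once enough consecutive lines match
again. The sketch below assumes a default of 3 matching lines, which this
thread does not state, and it uses Python's difflib to find the matching
runs, which is not how DIFFERENCES itself works. The function name
difference_sections is hypothetical.

```python
import difflib

def difference_sections(old, new, match=3):
    """Group differences into sections.

    A run of fewer than `match` equal lines does not close a section;
    it is absorbed into the section that surrounds it.
    Returns (old_start, old_end, new_start, new_end) tuples, 0-based, end-exclusive.
    """
    sections = []
    current = None
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Only a long enough run of matching lines closes the open section.
            if current is not None and i2 - i1 >= match:
                sections.append(tuple(current))
                current = None
            continue
        if current is None:
            current = [i1, i2, j1, j2]
        else:
            # Extend the open section across the short matching run.
            current[1], current[3] = i2, j2
    if current is not None:
        sections.append(tuple(current))
    return sections

a_txt = ["aaaaa", "bbbbb", "ddddd"]
b_txt = ["aaaa", "bbbbb", "ccccc"]

print(len(difference_sections(a_txt, b_txt, match=3)))  # one section covering all three lines
print(len(difference_sections(a_txt, b_txt, match=1)))  # two sections, lines 1 and 3
```

With match=3 the single matching line "bbbbb" is too short to end the
section, so the whole file shows up as one difference. With match=1 it
splits the output into the two sections shown in the listing above.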
    
399.3. "try this, too" by IOSG::HORSFIELD (jakc - the well-known typo) Thu Feb 05 1987 04:05 (23 lines)

	i have a program written by Anker Berg-Sonne, of sedt fame,
	which might help. he wrote it so that he could copy a large
	file to his rainbow over a slow line, edit it, and then
	send back just the changes to the vax and apply them to
	the original file. (there's another program to do the
	merging).

	anyway, i sometimes find it useful for differences, mainly
	because i find its output is more readable than that from
	vax DIFFERENCES.

	here's the output:

d1:aaaaa 
i1:aaaa 


	here's the filespec:

	iosg::user3:[horsfield.odds]dif.exe
	
	
	jack :^) 

399.4. "$ difference/parallel" by ISWSW::DOOLITTAN (This brain intentionally left blank) Fri Feb 06 1987 19:47 (18 lines)

-------------------------------------------------------------------------------
File DUA2:[DOOLITTAN]X.X;2             |  File DUA2:[DOOLITTAN]Y.Y;1           
------------------- 1 ------------------------------------- 1 -----------------
aaaaaa                                 |  aaaaa                                
------------------- 3 ------------------------------------- 3 -----------------
dddddd                                 |  cccccc                               
-------------------------------------------------------------------------------

Number of difference sections found: 2
Number of difference records found: 2

DIFFERENCES /IGNORE=()/WIDTH=80/MATCH=1/OUTPUT=DUA2:[DOOLITTAN]TEST.LIS;1-
    /PARALLEL-
    DUA2:[DOOLITTAN]X.X;2-
    DUA2:[DOOLITTAN]Y.Y;1
    

    andy

399.5. by VIDEO::LEICHTERJ (Jerry Leichter) Sat Mar 14 1987 11:10 (31 lines)

There are several different file comparison algorithms in widespread use.
No one of them is uniformly "better" than the others; it's easy to construct
examples on which each does well while the others do poorly.

DIFFERENCES uses an algorithm that's fast, potentially uses a lot of memory,
but works well when changes are mainly insertions and deletions and there
are few duplicated lines.  It doesn't do well when blocks of text are moved
around; as far as DIFFERENCES is concerned, this is a deletion at one position
and an insertion elsewhere.
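
The point about moved blocks can be seen with a minimal Python sketch.
It uses the standard difflib module as a stand-in (an assumption for
illustration; it is not the DIFFERENCES algorithm), and the helper name
edits is made up. A line moved from the top of a file to the bottom comes
out as one deletion plus one insertion, not as a move.

```python
import difflib

def edits(old, new):
    """Return (tag, removed_lines, added_lines) for every non-matching region."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

old = ["one", "two", "three", "four"]
new = ["two", "three", "four", "one"]   # "one" moved from the top to the bottom

for tag, removed, added in edits(old, new):
    print(tag, removed, added)
# delete ['one'] []
# insert [] ['one']
```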

The CMS algorithm is, I believe, an elaboration of an algorithm used in the
Unix diff command.  It requires more access to the files (each file is scanned
twice), a fair amount of memory independent of the number of differences
between the files (DIFFERENCES uses essentially no memory when the files are
identical), and can potentially track movement of lines, though I don't know
if it does.  There are some bad examples for this algorithm, but I can't
reconstruct them right now.

For an example of a different approach, pick up my DOCCOM tool (in the
Toolshed).  DOCCOM's algorithm uses a constant, moderate amount of memory
independent of the input files, reads each file exactly once, and ignores
moved blocks of text.  On the down side, it does not detect deletions, only
insertions (hence changes), though you can always reverse the roles of "new"
and "old" files; and it's statistical - there is a (presumably quite small)
probability that it will completely miss some changes.  As its name implies,
I developed DOCCOM for comparing documents; for typical English text, it does
an excellent job quickly.  Its low memory usage was a major factor on 11's,
where I first developed it.  (The whole program needs some re-munging for
VAXes, but I've never gotten around to doing it - so it works, though not
nearly as well as it might.)
							-- Jerry
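
The note above does not say how DOCCOM works internally, so the following
is only a guess at one way to get the same properties, not DOCCOM's actual
algorithm. The sketch hashes every line of the old file into a fixed-size
bit table, then reads the new file once and flags lines whose bit is not
set. Memory is constant, each file is read once, moved lines are ignored,
deletions go unnoticed, and a hash collision can hide a change. All names
here (TABLE_BITS, build_table, new_or_changed) are hypothetical.

```python
import zlib

TABLE_BITS = 1 << 16   # fixed table size, independent of the file lengths

def _slot(line):
    """Map a line to a bit position in the table."""
    return zlib.crc32(line.encode("utf-8")) % TABLE_BITS

def build_table(old_lines):
    """Single pass over the old file: set one bit per line."""
    table = bytearray(TABLE_BITS // 8)
    for line in old_lines:
        slot = _slot(line)
        table[slot >> 3] |= 1 << (slot & 7)
    return table

def new_or_changed(old_lines, new_lines):
    """Single pass over the new file: flag lines not seen in the old file.

    Statistical: a new line that collides with an old line's bit is missed.
    """
    table = build_table(old_lines)
    flagged = []
    for number, line in enumerate(new_lines, start=1):
        slot = _slot(line)
        if not table[slot >> 3] & (1 << (slot & 7)):
            flagged.append((number, line))
    return flagged

old = ["aaaaa", "bbbbb", "ddddd"]
new = ["aaaa", "bbbbb", "ccccc"]
print(new_or_changed(old, new))             # inserted or changed lines in `new`
print(new_or_changed(old, old[::-1]))       # reordered lines are not flagged: []
```

Swapping the two arguments finds the lines that were deleted, which is the
"reverse the roles" trick mentioned above.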

399.6. "CMS not UNIX derived" by SOFCAD::KNIGHT (Dave Knight) Mon Mar 16 1987 10:21 (3 lines)

    The CMS DIFF algorithm is NOT based on the UNIX algorithm; especially
    since I designed the algorithm and since at the time I designed
    it, I hadn't the slightest idea how UNIX did it.

399.7. by VIDEO::LEICHTERJ (Jerry Leichter) Tue Mar 24 1987 22:55 (3 lines)

re: .6
So, what's the CMS algorithm like?
							-- Jerry