
Conference noted::hackers_v1

Title:	-={ H A C K E R S }=-
Notice:	Write locked - see NOTED::HACKERS
Moderator:	DIEHRD::MORRIS

Created:	Thu Feb 20 1986
Last Modified:	Mon Aug 03 1992
Last Successful Update:	Fri Jun 06 1997
Number of topics:	680
Total number of notes:	5456

399.0. "DIFF of VMS not efficient" by LEROUF::GENTILI (franco GENTILI AEG Valbonne ) Wed Feb 04 1987 05:57

 Hi, 

 I am looking for a program, or just an idea, that would allow an exact
 comparison between two files, like VMS DIFFERENCES. The problem is that
 VMS DIFFERENCES is not efficient enough, for example in the following
 case:

 You have A.TXT, which contains:   aaaaa
                                   bbbbb
                                   ddddd

 and B.TXT, which contains:        aaaa
                                   bbbbb
                                   ccccc

As you can see, the two files differ in two places: the first line of B.TXT
has one "a" fewer than A.TXT, and the last line of B.TXT is ccccc instead
of ddddd.
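One standard way to pin down exactly which lines changed is a
longest-common-subsequence (LCS) comparison: lines in the LCS are unchanged,
and everything else is either a deletion from A.TXT or an insertion into
B.TXT. The Python sketch below is a plain dynamic-programming version; the
name lcs_diff is made up for illustration, and this is not what VMS
DIFFERENCES does internally. On the two files above it reports both changed
lines and keeps bbbbb as common.

```python
def lcs_diff(old, new):
    """Line-level diff built on a longest-common-subsequence table.

    Returns (tag, line) pairs: ' ' = in both files,
    '-' = only in old, '+' = only in new.
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner, emitting one edit per step.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            out.append((" ", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(("-", old[i]))
            i += 1
        else:
            out.append(("+", new[j]))
            j += 1
    # Whatever is left in either file is a pure deletion or insertion.
    out.extend(("-", line) for line in old[i:])
    out.extend(("+", line) for line in new[j:])
    return out


a_txt = ["aaaaa", "bbbbb", "ddddd"]
b_txt = ["aaaa", "bbbbb", "ccccc"]
for tag, line in lcs_diff(a_txt, b_txt):
    print(tag, line)
# - aaaaa
# + aaaa
#   bbbbb
# - ddddd
# + ccccc
```

The table needs memory proportional to the product of the two file lengths,
which is one reason real comparison tools use cleverer variants.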

If you do a DIFF A.TXT B.TXT, you can't tell what the differences were,

even if you try qualifiers such as /CHANGE, etc.

If you have any ideas, that would be great.

Franc.

T.R	Title	User	Personal Name	Date	Lines
399.1	Use CMS DIFFERENCES instead	ULTRA::CRANE	Olorin I was in the West that is forgotten...	Wed Feb 04 1987 08:27	37
	If you are always "diffing" text files, use CMS DIFFERENCES. It seems to work much better than the DCL command. If I use the command

	    $ CMS DIFFERENCES /IGNORE=(FORM_FEEDS, LEADING, TRAILING, SPACING) -
	         X. Y. /OUTPUT=Z.

	on the two text strings in your note, I get this differences listing:

	    DEC/CMS File Comparison Utility
	    Files Compared By CRANE On 4-FEB-1987 08:21:16
	      (1) KD2:[CRANE]X.;1
	      (2) KD2:[CRANE]Y.;1
	    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
	    File KD2:[CRANE]X.;1 Line 1
	    1)aaaaa
	    File KD2:[CRANE]Y.;1 Line 1
	    2)aaaa
	    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
	    File KD2:[CRANE]X.;1 Line 3
	    1)ddddd
	    File KD2:[CRANE]Y.;1 Line 3
	    2)ccccc
	    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	    ** End of Differences **
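The /IGNORE keywords in that command make lines compare equal when they
differ only in layout. The Python sketch below encodes one reading of the
four keywords (assumed semantics, not taken from the CMS manual): drop form
feeds, strip leading and trailing blanks and tabs, and collapse runs of
blanks and tabs into a single space before comparing. normalize and
lines_equal are hypothetical helper names.

```python
import re


def normalize(line, ignore):
    """Canonicalize a line according to a set of /IGNORE-style keywords.

    The meaning of each keyword here is an assumption for illustration.
    """
    if "FORM_FEEDS" in ignore:
        line = line.replace("\f", "")           # drop form-feed characters
    if "LEADING" in ignore:
        line = line.lstrip(" \t")               # ignore leading blanks/tabs
    if "TRAILING" in ignore:
        line = line.rstrip(" \t")               # ignore trailing blanks/tabs
    if "SPACING" in ignore:
        line = re.sub(r"[ \t]+", " ", line)     # runs of blanks/tabs -> one space
    return line


def lines_equal(x, y, ignore=frozenset()):
    """Compare two lines after normalizing both the same way."""
    return normalize(x, ignore) == normalize(y, ignore)


opts = {"FORM_FEEDS", "LEADING", "TRAILING", "SPACING"}
print(lines_equal("  bbbbb   ", "bbbbb", opts))  # True
print(lines_equal("aaaaa", "aaaa", opts))        # False
```

Normalization only decides which lines count as equal; it does not change
how the comparison algorithm lines the two files up.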
399.2	DIFF/MATCH=1	SWAMI::LAMIA	Cheap, fast, good -- pick two	Wed Feb 04 1987 10:11	29
	You can use the /MATCH qualifier in DIFF to get a similar effect. I'm not sure about the efficiency (==performance?) though. CMS is probably much better, but it is a layered product, too.

	    $ dif a.,b./mat=1
	    **********
	    File $DISK1:[LAMIA]A.;1
	        1   aaaaa
	        2   bbbbb
	    **
	    File $DISK1:[LAMIA]B.;1
	        1   aaaa
	        2   bbbbb
	    ********
	    ********
	    File $DISK1:[LAMIA]A.;1
	        3   ddddd
	    **
	    File $DISK1:[LAMIA]B.;1
	        3   ccccc
	    **********

	    Number of difference sections found: 2
	    Number of difference records found: 2

	    DIFFERENCES /IGNORE=()/MATCH=1/MERGED=1/OUTPUT=$DISK1:[LAMIA]C.;1-
	        $DISK1:[LAMIA]A.;1-
	        $DISK1:[LAMIA]B.;1
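Why /MATCH=1 helps: /MATCH sets how many consecutive matching lines
DIFFERENCES needs before it considers the two files back in step (I believe
the default is 3). With only one common line (bbbbb) in a three-line file,
three matching lines can never be found, so everything after the first
mismatch collapses into one section; /MATCH=1 lets bbbbb resynchronize the
files. The Python sketch below is a simplified greedy resynchronizer in that
spirit, assuming this reading of /MATCH; it is not DEC's code, and
resync_diff and find_resync are hypothetical names.

```python
def find_resync(a, b, i, j, match):
    """Smallest skip (di, dj) after which `match` consecutive lines agree."""
    remaining = (len(a) - i) + (len(b) - j)
    for k in range(1, remaining + 1):        # total lines skipped, smallest first
        for di in range(k + 1):
            dj = k - di
            if i + di + match > len(a) or j + dj + match > len(b):
                continue                     # not enough lines left to match
            if a[i + di:i + di + match] == b[j + dj:j + dj + match]:
                return di, dj
    return None


def resync_diff(a, b, match=3):
    """Return difference sections as (a_start, a_lines, b_start, b_lines)."""
    sections = []
    i = j = 0
    while i < len(a) or j < len(b):
        if i < len(a) and j < len(b) and a[i] == b[j]:
            i += 1
            j += 1
            continue
        skip = find_resync(a, b, i, j, match)
        if skip is None:                     # never back in step: rest is one section
            sections.append((i, a[i:], j, b[j:]))
            break
        di, dj = skip
        sections.append((i, a[i:i + di], j, b[j:j + dj]))
        i += di
        j += dj
    return sections


a_txt = ["aaaaa", "bbbbb", "ddddd"]
b_txt = ["aaaa", "bbbbb", "ccccc"]
print(len(resync_diff(a_txt, b_txt, match=3)))  # 1: whole file in one section
print(len(resync_diff(a_txt, b_txt, match=1)))  # 2: same count as the listing above
```

A larger match value protects against false resynchronization on common
lines such as blank lines, at the price of coarser sections on short files.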
399.3	try this, too	IOSG::HORSFIELD	jakc - the well-known typo	Thu Feb 05 1987 04:05	23
	I have a program written by Anker Berg-Sonne, of SEDT fame, which might help. He wrote it so that he could copy a large file to his Rainbow over a slow line, edit it, then send just the changes back to the VAX and apply them to the original file (there's another program to do the merging). Anyway, I sometimes find it useful for differences, mainly because its output is more readable than that of VAX DIFFERENCES. Here's the output:

	    d1:aaaaa
	    i1:aaaa

	Here's the filespec: iosg::user3:[horsfield.odds]dif.exe

	jack :^)
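The merging step mentioned above can be sketched as follows. The note does
not document the real command format, so this Python sketch assumes that
dN:text deletes line N of the original file (checking that it still reads
text) and iN:text inserts text just before original line N, with all
numbers referring to the original file. apply_script is a hypothetical name.

```python
def apply_script(original, script):
    """Apply assumed 'dN:text' / 'iN:text' commands to a list of lines.

    N is a 1-based line number in the ORIGINAL file; N = len(original) + 1
    in an insert command means "append at the end".
    """
    deletes = set()
    inserts = {}
    for cmd in script:
        op, rest = cmd[0], cmd[1:]
        num, _, text = rest.partition(":")   # text may itself contain ':'
        n = int(num)
        if op == "d":
            if original[n - 1] != text:      # refuse to delete the wrong line
                raise ValueError(f"line {n} is {original[n - 1]!r}, expected {text!r}")
            deletes.add(n)
        elif op == "i":
            inserts.setdefault(n, []).append(text)
        else:
            raise ValueError(f"unknown command {cmd!r}")

    result = []
    for n in range(1, len(original) + 2):
        result.extend(inserts.get(n, []))    # new lines go before original line n
        if n <= len(original) and n not in deletes:
            result.append(original[n - 1])
    return result


a_txt = ["aaaaa", "bbbbb", "ddddd"]
print(apply_script(a_txt, ["d1:aaaaa", "i1:aaaa"]))
```

Under these assumptions the two quoted commands turn A.TXT into aaaa, bbbbb,
ddddd; a script that also carried d3:ddddd and i3:ccccc would reproduce
B.TXT.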
399.4	$ difference/parallel	ISWSW::DOOLITTAN	This brain intentionally left blank	Fri Feb 06 1987 19:47	18
	    -------------------------------------------------------------------------------
	    File DUA2:[DOOLITTAN]X.X;2             | File DUA2:[DOOLITTAN]Y.Y;1
	    ------------------- 1 ------------------------------------- 1 -----------------
	    aaaaaa                                 | aaaaa
	    ------------------- 3 ------------------------------------- 3 -----------------
	    dddddd                                 | cccccc
	    -------------------------------------------------------------------------------

	    Number of difference sections found: 2
	    Number of difference records found: 2

	    DIFFERENCES /IGNORE=()/WIDTH=80/MATCH=1/OUTPUT=DUA2:[DOOLITTAN]TEST.LIS;1-
	        /PARALLEL-
	        DUA2:[DOOLITTAN]X.X;2-
	        DUA2:[DOOLITTAN]Y.Y;1

	andy
399.5		VIDEO::LEICHTERJ	Jerry Leichter	Sat Mar 14 1987 11:10	31
	There are several different file comparison algorithms in widespread use. None is uniformly "better" than the others; it's easy to construct examples on which any one of them does well while the others do poorly.

	DIFFERENCES uses an algorithm that's fast and potentially uses a lot of memory, but works well when changes are mainly insertions and deletions and there are few duplicated lines. It doesn't do well when blocks of text are moved around; as far as DIFFERENCES is concerned, a move is a deletion at one position and an insertion elsewhere.

	The CMS algorithm is, I believe, an elaboration of an algorithm used in the Unix diff command. It requires more access to the files (each file is scanned twice) and a fair amount of memory independent of the number of differences between the files (DIFFERENCES uses essentially no memory when the files are identical), and it can potentially track movement of lines, though I don't know if it does. There are some bad examples for this algorithm, but I can't reconstruct them right now.

	For an example of a different approach, pick up my DOCCOM tool (in the Toolshed). DOCCOM's algorithm uses a constant, moderate amount of memory independent of the input files, reads each file exactly once, and ignores moved blocks of text. On the down side, it does not detect deletions, only insertions (hence changes), though you can always reverse the roles of the "new" and "old" files; and it's statistical: there is a (presumably quite small) probability that it will completely miss some changes.

	As its name implies, I developed DOCCOM for comparing documents; for typical English text, it does an excellent job quickly. Its low memory usage was a major factor on the PDP-11s, where I first developed it. (The whole program needs some re-munging for VAXes, but I've never gotten around to doing it, so it works, though not nearly as well as it might.)

	-- Jerry
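The note doesn't say how DOCCOM actually works, so the following Python
sketch only illustrates how an algorithm could have the properties listed
above: it hashes every line of the old file into a fixed-size Bloom filter
(constant memory, one pass), then flags each line of the new file that the
filter has definitely never seen. Moved lines are still "seen", deleted
lines are never reported, and a filter false positive can silently hide a
real change, which is one way a statistical miss can happen. LineFilter,
find_insertions, and the default sizes are all assumptions, not DOCCOM's
design.

```python
import hashlib


class LineFilter:
    """Fixed-size Bloom filter over line hashes (memory independent of input)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        if not 1 <= num_hashes <= 8:          # one SHA-256 digest yields 8 slices
            raise ValueError("num_hashes must be between 1 and 8")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, line):
        digest = hashlib.sha256(line.encode("utf-8")).digest()
        for k in range(self.num_hashes):      # 4-byte slices act as k hash functions
            chunk = digest[4 * k:4 * k + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, line):
        for p in self._positions(line):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, line):
        # False means "definitely absent"; True can be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(line))


def find_insertions(old_lines, new_lines, **filter_args):
    """Read each input once; report (line_no, line) for lines new to `new_lines`."""
    seen = LineFilter(**filter_args)
    for line in old_lines:
        seen.add(line)
    return [(n, line) for n, line in enumerate(new_lines, 1)
            if not seen.might_contain(line)]


a_txt = ["aaaaa", "bbbbb", "ddddd"]
b_txt = ["aaaa", "bbbbb", "ccccc"]
print(find_insertions(a_txt, b_txt))
```

Swapping the arguments reports the lines of A.TXT that are missing from
B.TXT, which is the "reverse the roles" trick mentioned above.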
399.6	CMS not UNIX derived	SOFCAD::KNIGHT	Dave Knight	Mon Mar 16 1987 10:21	3
	The CMS DIFF algorithm is NOT based on the UNIX algorithm. I designed the algorithm, and at the time I designed it I hadn't the slightest idea how UNIX did it.
399.7		VIDEO::LEICHTERJ	Jerry Leichter	Tue Mar 24 1987 22:55	3
	re: .6

	So, what's the CMS algorithm like?

	-- Jerry