T.R | Title | User | Personal Name | Date | Lines |
---|
399.1 | Use CMS DIFFERENCES instead | ULTRA::CRANE | Olorin I was in the West that is forgotten... | Wed Feb 04 1987 08:27 | 37 |
| If you are always "diffing" text files, use CMS differences. It seems to
work much better than the DCL command. If I use the command
$ CMS DIFFERENCES /IGNORE=(FORM_FEEDS, LEADING, TRAILING, SPACING) -
X. Y. /OUTPUT=Z.
on the two text-strings in your note, I get this differences-listing:
DEC/CMS File Comparison Utility
Files Compared By CRANE On 4-FEB-1987 08:21:16
(1) KD2:[CRANE]X.;1
(2) KD2:[CRANE]Y.;1
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
File KD2:[CRANE]X.;1 Line 1
1)aaaaa
File KD2:[CRANE]Y.;1 Line 1
2)aaaa
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
File KD2:[CRANE]X.;1 Line 3
1)ddddd
File KD2:[CRANE]Y.;1 Line 3
2)ccccc
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
**** End of Differences ****
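For anyone without CMS handy, the effect of those /IGNORE keywords can be approximated by normalizing each line before comparing. This is only a rough Python sketch: the function name `normalize` is made up, and the exact rules CMS applies for each keyword are an assumption here, not taken from the CMS documentation.

```python
import re

def normalize(line):
    """Approximate /IGNORE=(FORM_FEEDS, LEADING, TRAILING, SPACING).

    Assumed semantics, not verified against CMS itself.
    """
    line = line.replace("\f", "")        # FORM_FEEDS: drop form-feed characters
    line = re.sub(r"[ \t]+", " ", line)  # SPACING: collapse runs of blanks and tabs
    return line.strip(" ")               # LEADING, TRAILING: trim blanks at both ends

def same_line(a, b):
    """Compare two lines after normalization."""
    return normalize(a) == normalize(b)
```

With this, lines that differ only in whitespace compare equal, while "aaaaa" and "aaaa" still differ.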
|
399.2 | DIFF/MATCH=1 | SWAMI::LAMIA | Cheap, fast, good -- pick two | Wed Feb 04 1987 10:11 | 29 |
| You can use the /MATCH qualifier in DIFF to get a similar effect.
I'm not sure about the efficiency (==performance?) though. CMS
is probably much better, but it is a layered product, too.
$ dif a.,b./mat=1
************
File $DISK1:[LAMIA]A.;1
1 aaaaa
2 bbbbb
******
File $DISK1:[LAMIA]B.;1
1 aaaa
2 bbbbb
************
************
File $DISK1:[LAMIA]A.;1
3 ddddd
******
File $DISK1:[LAMIA]B.;1
3 ccccc
************
Number of difference sections found: 2
Number of difference records found: 2
DIFFERENCES /IGNORE=()/MATCH=1/MERGED=1/OUTPUT=$DISK1:[LAMIA]C.;1-
$DISK1:[LAMIA]A.;1-
$DISK1:[LAMIA]B.;1
|
399.3 | try this, too | IOSG::HORSFIELD | jakc - the well-known typo | Thu Feb 05 1987 04:05 | 23 |
| i have a program written by Anker Berg-Sonne, of sedt fame,
which might help. he wrote it so that he could copy a large
file to his rainbow over a slow line, edit it, and then
send back just the changes to the vax and apply them to
the original file. (there's another program to do the
merging).
anyway, i sometimes find it useful for differences, mainly
because i find its output is more readable than that from
vax DIFFERENCES.
here's the output:
d1:aaaaa
i1:aaaa
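The send-back-just-the-changes idea can be sketched in a few lines of Python. This is not Anker's program: the record layout (operation letter, line number in the original file, text) is only guessed from the two output lines above, and the function names and the use of Python's `difflib` are assumptions made for illustration.

```python
import difflib

def make_delta(old, new):
    """Return (op, line_no, text) records; line_no refers to the OLD file.

    'd' deletes old line line_no; 'i' inserts text before old line line_no.
    """
    delta = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            delta += [("d", k + 1, old[k]) for k in range(i1, i2)]
        if tag in ("insert", "replace"):
            delta += [("i", i1 + 1, new[k]) for k in range(j1, j2)]
    return delta

def apply_delta(old, delta):
    """Rebuild the new file from the old file plus the delta."""
    deleted = {n for op, n, _ in delta if op == "d"}
    inserted = {}
    for op, n, text in delta:
        if op == "i":
            inserted.setdefault(n, []).append(text)
    result = []
    # Walk one position past the end so trailing insertions are emitted too.
    for n in range(1, len(old) + 2):
        result.extend(inserted.get(n, []))
        if n <= len(old) and n not in deleted:
            result.append(old[n - 1])
    return result

def format_delta(delta):
    """Render records in the guessed 'd1:text' / 'i1:text' style."""
    return [f"{op}{n}:{text}" for op, n, text in delta]
```

Only the delta has to cross the slow line; `apply_delta` plays the part of the separate merging program.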
here's the filespec:
iosg::user3:[horsfield.odds]dif.exe
jack :^)
|
399.4 | $ difference/parallel | ISWSW::DOOLITTAN | This brain intentionally left blank | Fri Feb 06 1987 19:47 | 18 |
| -------------------------------------------------------------------------------
File DUA2:[DOOLITTAN]X.X;2 | File DUA2:[DOOLITTAN]Y.Y;1
------------------- 1 ------------------------------------- 1 -----------------
aaaaaa | aaaaa
------------------- 3 ------------------------------------- 3 -----------------
dddddd | cccccc
-------------------------------------------------------------------------------
Number of difference sections found: 2
Number of difference records found: 2
DIFFERENCES /IGNORE=()/WIDTH=80/MATCH=1/OUTPUT=DUA2:[DOOLITTAN]TEST.LIS;1-
/PARALLEL-
DUA2:[DOOLITTAN]X.X;2-
DUA2:[DOOLITTAN]Y.Y;1
andy
|
399.5 | | VIDEO::LEICHTERJ | Jerry Leichter | Sat Mar 14 1987 11:10 | 31 |
| There are several different file comparison algorithms in widespread use.
None is uniformly "better" than the others; it's easy to construct examples
on which any one of them does well while the others do poorly.
DIFFERENCES uses an algorithm that's fast, potentially uses a lot of memory,
but works well when changes are mainly insertions and deletions and there
are few duplicated lines. It doesn't do well when blocks of text are moved
around; as far as DIFFERENCES is concerned, this is a deletion at one position
and an insertion elsewhere.
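The moved-block behavior is easy to see with any comparison that only reports insertions and deletions. The Python sketch below uses the standard `difflib` module as a stand-in; it is not the algorithm DIFFERENCES uses, and the helper name `describe` is made up for the illustration.

```python
import difflib

def describe(old, new):
    """List (operation, old_lines, new_lines) for each compared region."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]

old = ["moved", "x", "y", "z"]
new = ["x", "y", "z", "moved"]
for op in describe(old, new):
    print(op)
```

The line "moved" comes out as a delete at the top and an insert at the bottom; nothing in the report says the two are the same line.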
The CMS algorithm is, I believe, an elaboration of an algorithm used in the
Unix diff command. It requires more access to the files (each file is scanned
twice), a fair amount of memory independent of the number of differences
between the files (DIFFERENCES uses essentially no memory when the files are
identical), and can potentially track movement of lines, though I don't know
if it does. There are some bad examples for this algorithm, but I can't
reconstruct them right now.
For an example of a different approach, pick up my DOCCOM tool (in the
Toolshed). DOCCOM's algorithm uses a constant, moderate amount of memory
independent of the input files, reads each file exactly once, and ignores
moved blocks of text. On the down side, it does not detect deletions, only
insertions (hence changes), though you can always reverse the roles of "new"
and "old" files; and it's statistical - there is a (presumably quite small)
probability that it will completely miss some changes. As its name implies,
I developed DOCCOM for comparing documents; for typical English text, it does
an excellent job quickly. Its low memory usage was a major factor on 11's,
where I first developed it. (The whole program needs some re-munging for
VAXes, but I've never gotten around to doing it - so it works, though not
nearly as well as it might.)
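One way to get the same set of properties (fixed memory, one pass per file, moved text ignored, insertions only, a small chance of a miss) is a fixed-size table of line hashes. The Python sketch below is an assumption about how such a scheme could look and is not DOCCOM's actual algorithm; `TABLE_BITS`, `_slot`, and `inserted_lines` are made-up names.

```python
import hashlib

TABLE_BITS = 1 << 20  # fixed table size (128 KB), independent of the input files

def _slot(line):
    """Map a line to one bit position in the table."""
    digest = hashlib.sha1(line.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_BITS

def inserted_lines(old_lines, new_lines):
    """Yield (line_number, text) for new-file lines not seen in the old file.

    Each input is iterated exactly once. Two different lines can hash to the
    same bit, in which case a real change is silently missed.
    """
    table = bytearray(TABLE_BITS // 8)
    for line in old_lines:
        s = _slot(line)
        table[s >> 3] |= 1 << (s & 7)      # remember that this line occurred
    for number, line in enumerate(new_lines, start=1):
        s = _slot(line)
        if not table[s >> 3] & (1 << (s & 7)):
            yield number, line             # bit not set: line is new
```

Because the table records only whether a line occurred, not where, a block that moved is never reported, and a deleted line can only be found by swapping the two arguments.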
-- Jerry
|
399.6 | CMS not UNIX derived | SOFCAD::KNIGHT | Dave Knight | Mon Mar 16 1987 10:21 | 3 |
| The CMS DIFF algorithm is NOT based on the UNIX algorithm. I designed
the algorithm, and at the time I designed it, I hadn't the slightest
idea how UNIX did it.
|
399.7 | | VIDEO::LEICHTERJ | Jerry Leichter | Tue Mar 24 1987 22:55 | 3 |
| re: .6
So, what's the CMS algorithm like?
-- Jerry
|