T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
9965.1 | Decimal point is a decimal point? | MARVIN::GOUGH | Raoul Gough | Wed May 28 1997 10:26 | 10 |
|
Looks to me like sort doesn't work with "." as a field separator (maybe
because it's a decimal point?). Anyway, you could work around it like
this:
sed 's/\./:/g' y.y | sort -t: -k 1n -k 2n -k 3n -k 4n | sed 's/:/./g'
(Couldn't resist looking at this one, I mustn't be busy enough)
Ray.
|
9965.2 | yep - it is confused | SMURF::WENDY | | Wed May 28 1997 11:44 | 12 |
| I was curious too, and I agree that the problem is that sort is getting
confused trying to do a numeric sort, which relies on the decimal/radix
symbol, while you are using '.' as the separator. If, for example, you
set LANG=es_ES.ISO8859-1, the Spanish locale that has a ',' as its radix
symbol, and use your sort command as is, you get the expected results.
I think you will need to process the file to change the separator or
use a different locale, but make sure you know what the decimal point
is. You can tell by running the command "locale -k LC_NUMERIC".
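For anyone who wants to check this from a script, here is a minimal sketch. It assumes the POSIX form "locale decimal_point" is available on your system; the variable names are only for illustration.

```shell
# Radix character that sort -n would use in the current locale.
current_radix=$(locale decimal_point)

# Radix character in the C locale, for comparison.
c_radix=$(LC_ALL=C locale decimal_point)

printf 'current locale radix: %s\n' "$current_radix"
printf 'C locale radix:       %s\n' "$c_radix"
```

If the two differ, a numeric sort on the same file can give different orderings depending on which locale is in effect.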
wendy
UEG/I18N
|
9965.3 | It's not the decimal point (exactly)! | TEACH::SMITTY | Daylight come an' me wan' go home! | Wed May 28 1997 14:05 | 59 |
| The problem is not with the decimal point -- at least not as described
in the previous replies. The command you are attempting is actually a
victim of two problems. The first is the periods in the numbers. A
numeric value is not allowed to have more than one period in it. From
the man pages for sort:
"-n [XPG4-UNIX] Sorts any initial numeric strings (including regular
expressions consisting of optional spaces, optional dashes, and
zero (0) or more digits with optional radix character and thousands
separator, as defined by the current locale) by arithmetic value.
An empty digit string is treated as zero; leading zeros and signs
on zeros do not affect ordering. Only one period (.) can be used
in numeric strings. All subsequent periods (.) and any character
to the right of the period (.) will be ignored."
The second problem is the way the keys are specified. The notation
"-k1n" describes a key that starts with the first word on the line and
ends with the last character. Again from the sort man pages:
"[XPG4-UNIX] The format of a key field definition is as follows:
field_start[type][,field_end[type]]
where the field_start and field_end arguments define a key field
that is restricted to a portion of the line, and type is a modifier
specified by b, d, f, i, n, or r...
...A missing field_end argument means the last character of the
line."
The unfortunate combination of these two things is the cause of the
strange behavior that we are all seeing. I'm guessing here, but I
suspect that sort processes the first key numerically, assumes all the
periods it sees are part of the number, ignores everything from the
second period to the end of the line, orders all the records as it
found them (since they have all been evaluated as identical 168.132
values), and somehow never correctly identifies the existence of the
other keys.
By isolating each field, however, using the XPG4-UNIX syntax
"-k<start>,<end>", the problems go away. Here's an example from my
system:
ahem- cat numbers.txt
168.132.11.1
168.132.11.25
168.132.11.3
168.132.128.28
168.132.13.10
168.132.13.2
ahem- sort -t . -k1,1n -k2,2n -k3,3n -k4,4n numbers.txt
168.132.11.1
168.132.11.3
168.132.11.25
168.132.13.2
168.132.13.10
168.132.128.28
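If the number of fields varies from file to file, the same idea can be wrapped in a small function. This is only a sketch: the name sort_dotted is made up, it assumes a POSIX shell, and it forces the C locale so that the radix character is predictable.

```shell
# sort_dotted N [file...]
# Numeric sort on the first N dot-separated fields, one bounded key per field.
sort_dotted() {
  n=$1
  shift
  keys=""
  i=1
  while [ "$i" -le "$n" ]; do
    # Each key starts and ends in field i, so -n never sees a '.'.
    keys="$keys -k $i,${i}n"
    i=$((i + 1))
  done
  # $keys is left unquoted on purpose so it splits into separate options.
  LC_ALL=C sort -t . $keys "$@"
}

# Example use on the data shown above (numbers.txt is the file from this reply):
#   sort_dotted 4 numbers.txt
```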
Regards,
Bill
|
9965.4 | It's doing what the spec says | WIBBIN::NOYCE | Pulling weeds, pickin' stones | Wed May 28 1997 19:29 | 13 |
| > and somehow never correctly identifies the existence of the
> other keys.
No, you don't have to assume that it forgets the other keys.
Instead, it sorts the file as if it looked like this (one column per key):
168.132 132.11 11.1 1
168.132 132.11 11.25 25
168.132 132.11 11.3 3
168.132 132.128 128.28 28
168.132 132.13 13.10 10
168.132 132.13 13.2 2
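The columns above can be reproduced with a short awk sketch. It assumes, as the man page excerpt quoted earlier says, that -n reads at most one period per key, so an open-ended key starting at field N is effectively field N plus field N+1. The function name effective_keys is invented for illustration; it is not part of sort.

```shell
# effective_keys [file...]
# For each line, print what a numeric open-ended key (-kNn) would read
# for N = 1, 2, ...: field N, plus ".field N+1" when there is a next field.
effective_keys() {
  awk -F'[.]' '{
    out = ""
    for (i = 1; i <= NF; i++) {
      key = $i
      if (i < NF)
        key = key "." $(i + 1)
      out = (i == 1) ? key : out " " key
    }
    print out
  }' "$@"
}

printf '%s\n' 168.132.11.1 168.132.128.28 | effective_keys
```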
|
9965.5 | | BIGUN::nessus.cao.dec.com::Mayne | Meanwhile, back on Earth... | Wed May 28 1997 20:21 | 4 |
| So what does the spec say about using "." as a separator? (Hey, it's not *my*
use of the "." that's stuffing things up. 8-)
PJDM
|
9965.6 | | BIGUN::nessus.cao.dec.com::Mayne | Meanwhile, back on Earth... | Thu May 29 1997 19:53 | 18 |
| If sort works as in .4, and "all subsequent periods (.) and any character to the
right of the period (.) will be ignored" (according to the man page), how is the
following explained?
# cat a.a
1.2
1.22
1.3
# sort -t . -k 2n a.a
1.2
1.3
1.22
# sort -t . -k 1n -k 2n a.a
1.2
1.22
1.3
PJDM
|
9965.7 | | IOSG::MARSHALL | | Fri May 30 1997 09:43 | 28 |
| >> how is the following explained?
The order of keys is significant. Sorting on a later key won't undo the
ordering obtained by earlier sorts. The sorting done by the later key only
affects consecutive records where the previous keys had identical values.
From the sort(1) man page:
When there are multiple key fields, later keys are compared only after
all earlier keys compare as equal.
This is generally desirable behaviour, enabling you to sort, for example,
forenames within surnames, or in this case subnets within a network.
As the -k1n had sorted the whole of each record numerically, and all were
distinct, the -k2n effectively did nothing. As a previous reply observed, to
obtain the sorting required by the basenoter you have to tell the sort command
where each key value ends wrt the defined separator, eg -k1,1n.
So -k1n   means: the key starts with the first field and extends to the end of
                 the line; a '.' within this range will be treated as a decimal
                 point, not as a field separator.
   -k1,1n means: the key field starts and ends with the first field, enabling
                 the use of '.' as field separator to override the use of '.'
                 as a decimal point within a numeric field.
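A small side-by-side sketch of the two forms, using the same three values as a.a in the earlier reply. The temporary file and variable names are just for illustration, and the C locale is forced so that '.' is the radix character.

```shell
# Sample data: same values as a.a in the earlier reply.
tmp=$(mktemp)
printf '%s\n' 1.2 1.22 1.3 > "$tmp"

# Open-ended keys: the first key runs from field 1 to the end of the line,
# so -n reads each whole record as a decimal number.
open_ended=$(LC_ALL=C sort -t . -k 1n -k 2n "$tmp")

# Bounded keys: each key is exactly one field, so the second field is
# compared as an integer (2, 3, 22).
bounded=$(LC_ALL=C sort -t . -k 1,1n -k 2,2n "$tmp")

printf 'open-ended keys:\n%s\n' "$open_ended"
printf 'bounded keys:\n%s\n' "$bounded"
rm -f "$tmp"
```

With the bounded form the second key finally gets a say, because the first key compares equal on every record.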
Scott
|