T.R | Title | User | Personal Name | Date | Lines |
---|
1041.1 | | VMSMKT::KENAH | There are no mistakes in Love... | Mon Apr 19 1993 09:40 | 6 |
| Do any of the current standards deal with "special" letters
(That is, those letters beyond the simple 26 used by British
and American English)? For example, �, �, ll, accented vowels,
ligatures, etc. Where do they fit into ordering sequences?
andrew
|
1041.2 | "1812" par l'duc de la terHorst... | TLE::JBISHOP | | Mon Apr 19 1993 09:52 | 14 |
| It's worse than you think.
I saw a library's ordering rules (in a C.S. article).
There were rules about sorting books with titles containing
numbers (spell it out), and books in foreign languages containing
numbers (spell it out in the foreign language), and titles
consisting only of numbers, and books in non-Latin scripts,
and names with prefixes separated by a space (de, von), and
names with prefixes not separated by a space (e.g. terHorst)
and on and on.
If I can remember the article, I'll post a reference.
-John Bishop
|
1041.3 | Was there any life before computers? | KETJE::HAENTJENS | Beware of Counterfeit | Tue Apr 20 1993 05:58 | 18 |
| To .1: Yes, many existing (country) standards on ordering have rules
about symbols beyond the Latin letters A-Z. Just as an example, the
Belgian standard on Alphabetical Ordering (1959!) has rules about the
Dutch ij, the German �, about �, �, �, �, � (with my excuses to people
who are not using Latin-1 for displaying this note), about the symbol
&, about numbers and many other things. Don't forget that alphabetical
ordering is older than computers!
To .1 and .2: Many existing standards have been designed for explaining
to people how they should put things in alphabetical order; very few
take into account what computers can do with some level of efficiency.
The standardization efforts in ISO and CEN concentrate on "lexical" or
"mechanical" ordering. Not only because it is easier for computers, but
also because rules based on knowledge or on linguistics, when used in a
multicultural context, make it more difficult for people to find
entries in a list, instead of making it easier as originally meant.
René.
|
1041.4 | I'm a rational sort | RAGMOP::T_PARMENTER | Human. All too human. | Tue Apr 20 1993 07:32 | 21 |
| Isn't this discussion kind of culturally biased? The CH in Spanish
isn't "treated" as a separate letter in the alphabet; it *is* a
separate letter in the Spanish alphabet, as are LL, RR, and also Ñ.
Spanish children's blocks come in sets of 30, not 26.
Also, in Norwegian the �, �, �, � are not "special" characters, "A with
a ring", "O with a slash"; they are full fledged members of the
alphabet.
Norwegian and Spanish, and Finnish and Hungarian, etc., are languages
with alphabets. So is English.
I once tried to explain the meaning of the word "tilde" to a Spanish
friend, but he just couldn't see the Ñ as an "N with tilde"; it was
just an Ñ to him.
Incidentally, the Spanish sort their alphabet "rationally", with the CH
following the C and the LL following the L, etc. On the other hand,
the Norwegians sort their alphabet "rationally", with the A-Z in the
first 26 places and the others following in "order" at the end.
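Both "rational" orders can be expressed as one sort key over different tailored alphabets. A minimal Python sketch, assuming input made only of the listed letters (case is folded); `make_key` and the alphabet lists are illustrative names, not any standard's API. The Spanish list is the traditional order with CH and LL as single collating elements and Ñ after N; RR is not given its own position here.

```python
def make_key(alphabet):
    """Build a sort key from an ordered list of collating elements.

    Elements may be longer than one character (e.g. "ch"); each word is
    split greedily, trying the longest element first.
    """
    rank = {elem: i for i, elem in enumerate(alphabet)}
    longest = max(len(elem) for elem in alphabet)

    def key(word):
        word = word.lower()
        ranks, i = [], 0
        while i < len(word):
            for n in range(longest, 0, -1):
                piece = word[i:i + n]
                if piece in rank:
                    ranks.append(rank[piece])
                    i += len(piece)
                    break
            else:
                # No element matched: this sketch does not handle other characters.
                raise ValueError(f"no collating element for {word[i]!r}")
        return ranks

    return key


# Traditional Spanish: CH after C, LL after L, Ñ after N.
SPANISH_TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
                       + list("mnñopqrstuvwxyz"))
# Norwegian: A-Z in the first 26 places, then Æ, Ø, Å at the end.
NORWEGIAN = list("abcdefghijklmnopqrstuvwxyz") + ["æ", "ø", "å"]

spanish_key = make_key(SPANISH_TRADITIONAL)
norwegian_key = make_key(NORWEGIAN)

print(sorted(["llama", "chico", "luz", "cuna", "dado"], key=spanish_key))
# ['cuna', 'chico', 'dado', 'luz', 'llama']
print(sorted(["øl", "ål", "zebra", "ærlig", "apa"], key=norwegian_key))
# ['apa', 'zebra', 'ærlig', 'øl', 'ål']
```

The key returns a list of ranks, so Python's ordinary list comparison does the rest; only the alphabet changes between the two languages.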
|
1041.5 | | SMURF::BINDER | Deus tuus tibi sed deus meus mihi | Tue Apr 20 1993 10:26 | 29 |
| Some varieties of internationalization support in UNIX� software such
as DEC OSF/1� use a localization environment variable (a logical for you
VMS types, but it's not quite the same) called LC_COLLATE that controls
how sorting is to be done. I quote from the DEC OSF/1 Guide to
Programming Support Tools:
A character range can include a multicharacter collating element
enclosed within bracket-period delimiters ([. and .]). These
"collating symbols" are necessary for languages that treat some
strings as individual collating elements. For example, in Spanish,
the strings ch and ll each are collating symbols (that is, the
Spanish primary sort order is a, b, c, ch, d,..., k, l, ll, m,
...). The bracket-period delimiters in the RE syntax distinguish
multicharacter collating elements from a list of the individual
characters that make up the element. When using Spanish collation
rules, [[.ch.]] is treated as an RE matching the sequence ch, while
[ch] is treated as an RE matching c or h. In addition, [a-[.ch.]]
matches a, b, c, and ch.
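Python's `re` module has no support for `[. .]` collating symbols, so the following is only a hand-rolled sketch of the semantics the quoted guide describes, applied element by element to a single word. The helper names `elements` and `in_range` are hypothetical, and the alphabet is assumed to be the traditional Spanish primary order quoted above (ch after c, ll after l).

```python
# Traditional Spanish primary order, with "ch" and "ll" as single elements.
SPANISH = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
           + list("mnñopqrstuvwxyz"))
RANK = {e: i for i, e in enumerate(SPANISH)}
MULTI = {e for e in SPANISH if len(e) > 1}


def elements(word):
    """Split a word into collating elements, keeping "ch"/"ll" together."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in MULTI:
            out.append(word[i:i + 2])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def in_range(lo, hi, elem):
    """Emulate a range such as [a-[.ch.]]: compare collation ranks, not bytes."""
    return elem in RANK and RANK[lo] <= RANK[elem] <= RANK[hi]


word = "chico"
# [a-[.ch.]]: matches the elements a, b, c and ch.
print([e for e in elements(word) if in_range("a", "ch", e)])  # ['ch', 'c']
# [[.ch.]]: matches only the two-character element ch.
print([e for e in elements(word) if e == "ch"])               # ['ch']
# [ch]: a plain list of the single characters c and h.
print([c for c in word if c in "ch"])                         # ['c', 'h', 'c']
```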
So there is some sanity in the computer world.
-dick
----
� UNIX is a registered trademark of UNIX Systems Laboratories, Inc.
� Open Software Foundation, OSF, OSF/1, OSF/Motif, and Motif are
trademarks of the Open Software Foundation, Inc.
|
1041.6 | | VMSMKT::KENAH | There are no mistakes in Love... | Tue Apr 20 1993 11:20 | 10 |
| >I once tried to explain the meaning of the word "tilde" to a Spanish
>friend, but he just couldn't see the Ñ as an "N with tilde"; it was
>just an Ñ to him.
Makes sense to me -- in English, it would be like trying to explain
Q as "O with a squiggly thing on the bottom." Or "B" as "P with
an extra bump on the side." Nope, they're just "B & Q."
andrew
|
1041.7 | | CALS::DESELMS | Opera r�lz | Tue Apr 20 1993 12:32 | 5 |
| RE: -1
Great example...
- Jim
|
1041.8 | Me too | AUSSIE::WHORLOW | Bushies do it for FREE! | Tue Apr 20 1993 16:32 | 10 |
| G'daym,
Minor rathole ..
There is 'Sans Souci' in Australia... It's a suburb of Sydney...
derek
PS where would that fit in the Sanssouci / SANSSOUCI /Sans-Souci...
scheme?
|
1041.9 | | JIT081::DIAMOND | Pardon me? Or must I be a criminal? | Tue Apr 20 1993 18:56 | 9 |
| Re .5
>>A character range can include a multicharacter collating element
>>enclosed within bracket-period delimiters ([. and .]).
[...]
>>When using Spanish collation rules, [[.ch.]] is treated as an RE
>>matching the sequence ch, while [ch] is treated as an RE matching
>>c or h. In addition, [a-[.ch.]] matches a, b, c, and ch.
How do they do it in a character set that doesn't have [ and ] ?
|
1041.10 | Difference between tilde and squiggly thing | KETJE::HAENTJENS | Beware of Counterfeit | Wed Apr 21 1993 05:10 | 21 |
| Re .4
Of course I'm culturally biased: I have grown up in some specific
culture, how could I be unbiased! But my words "CH is treated as a
letter" were not meant to convey anything negative or a "looking down"
attitude.
Re .4 .6
The difference between ñ and q is that the first one is treated as n
with tilde outside Spain, whereas no alphabet considers q as o with
squiggly thing, as far as I know.
I mean: 'doña' is in between 'don' and 'donate' in an English
dictionary, not between 'donsie' and 'doodle', and similarly for other
European language dictionaries. You can also look up 'cañon', 'señor' and
'señorita'. For those languages that use the Q, it is always a separate
letter. You can also look at it from a historical perspective. The Q
derives straight from the 3000 year old Semitic alphabet, whereas the
tilde is only a few hundred years old and it was at some point in time
added to the N to make a new letter.
René.
|
1041.11 | My rathole or yours? | FORTY2::KNOWLES | DECspell snot awl ewe kneed | Wed Apr 21 1993 06:50 | 20 |
>You can also look at it from a historical perspective. The Q
>derives straight from the 3000 year old Semitic alphabet, whereas the
>tilde is only a few hundred years old and it was at some point in time
>added to the N to make a new letter.
Indeed. There is a jolly enticing rathole opportunity here: the ñ was
a mediæval transcription shortcut where there were two NNs in the source
word - cannon => cañón. I wonder if this introduced the ñ as a
free-standing letter which was then used where there was no manuscript
involved and the root had an -NI- (as in señor and many other cases).
I have some early Spanish texts at home, and will check whether ñ
co-existed with -ni- for a time. Stop me if I'm boring you...
But whatever the history, the fact now is that for someone from Spain
n and ñ are wholly discrete. Similarly (not a similar phenomenon, but
a similar lack of historical awareness) a modern Italian will pronounce
PREZZO with a -ts- and MEZZO with a -dz- because that's the right way,
rather than because of Latin PRETIUM and MEDIUM.
b
|
1041.12 | A�other r�thol� | KETJE::HAENTJENS | Beware of Counterfeit | Wed Apr 21 1993 08:52 | 10 |
| ... and to make the issue even more complicated: sometimes letters are
considered to be different, but nevertheless ordered together, at least
in the first ordering level. For example, many French speaking people
will argue that � and � are different letters, but all French
dictionaries consider them equivalent for the first ordering level.
Similarly, u and ü are not quite the same in Germany and there are two
ordering methods, one of which considers u and ü equivalent for the
first level (the other method orders ü as if it were u+e).
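The two German methods can be sketched as two different sort keys. A minimal Python illustration under stated assumptions: `dictionary_key` and `phonebook_key` are hypothetical names, and the second level in both is a crude code-point comparison of the lowercased word (which happens to put u before ü), not the weighting any standard specifies.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (ü -> u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def dictionary_key(word):
    """First level: u and ü are equal. Second level: compare the lowercased
    words by code point, which puts 'u' before 'ü'."""
    w = word.lower()
    return (strip_diacritics(w), w)


UMLAUT_EXPANSION = {"ä": "ae", "ö": "oe", "ü": "ue"}


def phonebook_key(word):
    """First level: ü is ordered as if it were spelled u+e (likewise ä, ö)."""
    w = word.lower()
    return ("".join(UMLAUT_EXPANSION.get(c, c) for c in w), w)


names = ["Müller", "Mumm", "Muller", "Mueller"]
print(sorted(names, key=dictionary_key))  # ['Mueller', 'Muller', 'Müller', 'Mumm']
print(sorted(names, key=phonebook_key))   # ['Mueller', 'Müller', 'Muller', 'Mumm']
```

In both keys the first tuple element is the first ordering level; the second element only decides between words that tie at that level.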
René.
|
1041.13 | | VMSMKT::KENAH | blah blah blah GINGER | Wed Apr 21 1993 11:43 | 15 |
| Is this an accurate synopsis?
1. Different European languages have developed ordering rules that are
internally consistent.
2. You are trying to develop more general ordering rules, rules that
incorporate different languages' rules while maintaining internal
consistency as well as consistency with each individual language.
In addition, it sounds like you're trying to distinguish between
similar but distinct words and word groupings.
3. Finally, the ordering scheme you develop must be implemented on a
computer, since computers are valuable tools for tasks like ordering.
Do any of the existing standards (ISO, XPG) deal with this topic?
|
1041.14 | | VMSMKT::KENAH | blah blah blah GINGER | Wed Apr 21 1993 14:11 | 4 |
I re-read .0 and see that it states POSIX-compliant systems support
multilevel ordering -- which POSIX standard is it a part of?
andrew
|
1041.15 | 9945-2.2 | KETJE::HAENTJENS | Beware of Counterfeit | Thu Apr 22 1993 03:34 | 9 |
Andrew, your summary in .13 is very good! The only thing I will
not achieve is consistency with each individual language. This will only
be partial consistency with individual languages.
POSIX is, I believe, ISO/IEC 9945-2.2 Shell and Utilities. The XPG
counterpart can be found in 'X/Open CAE Specification, System Interface
Definitions, Issue 4' ISBN:1-872630-46-4 or X/Open Doc. No. C204.
René.
|
1041.16 | | VMSMKT::KENAH | blah blah blah GINGER | Thu Apr 22 1993 07:21 | 4 |
Thanks for the POSIX and XPG references -- I think I'll check 'em
out (I believe one of my colleagues has a copy of XPG4).
andrew
|
1041.17 | | NOVA::FISHER | DEC Rdb/Dinosaur | Thu Apr 22 1993 07:31 | 33 |
| Q: Different European languages have developed ordering rules that are
internally consistent.
It is my understanding that there are some internal differences. I
think I was told that there are 3 ways of sorting German, one was
called a telephone book sort, another was a dictionary sort; I forget
the third.
Did .4 say that RR was a different letter in Spanish?
While the ordering of
Sans Souci
SANS SOUCI
Sanssouci
SANSSOUCI
Sans-souci
SANS-SOUCI
relative to each other is important, it must also be noted
whether SANSCRIT and sanserif are allowed to interrupt the sequence.
Yet another aside occurs to me: When ordering words with letters
containing diacriticals, most current algorithms -- and therefore
those of VMS SORT and Rdb -- go left to right, for example:
with ordering being (I think) e � � � �, one would order a
doublet as ee e� e� �e but we received an inquiry from a
salesman in Canada concerning doing it from right to left
as in: ee �e e� e�. [these actual examples may never occur
but they are the same as, say, bete b�te bet�.]
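The difference between the two directions lies only in how the second (accent) level is scanned. The note's own examples were damaged in transmission, so this sketch uses the French words cote, côte, coté and côté instead. The accent weights in `ACCENT_WEIGHT` are an assumed ordering chosen for illustration, not taken from any standard, and the function names are hypothetical.

```python
import unicodedata

# Assumed secondary weights (illustrative only, not from any standard):
# no accent < acute < grave < circumflex < diaeresis.
ACCENT_WEIGHT = {"\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4}


def split_levels(word):
    """Return (base letters, one accent weight per base letter)."""
    base, accents = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and accents:
            # Attach the mark to the preceding letter (one mark per letter;
            # unknown marks get a weight above all listed ones).
            accents[-1] = ACCENT_WEIGHT.get(ch, len(ACCENT_WEIGHT) + 1)
        else:
            base.append(ch)
            accents.append(0)
    return "".join(base), accents


def collation_key(word, backwards=False):
    base, accents = split_levels(word)
    # Level 1 compares base letters; level 2 compares accents, scanned from
    # the end of the word when backwards=True (the right-to-left rule).
    return (base, accents[::-1] if backwards else accents)


words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=collation_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: collation_key(w, backwards=True)))
# ['cote', 'côte', 'coté', 'côté']
```

All four words tie at the first level, so the whole difference comes from reversing the accent list before comparison.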
Well, enough meandering...
ed
|
1041.18 | Ordering with Sanscrit | KETJE::HAENTJENS | Beware of Counterfeit | Thu Apr 22 1993 09:11 | 17 |
| Re .17
These examples do occur. See my report (filespec in .0). The backwards
check is now part of a Canadian Standard, that's why you got the
inquiry. It cannot be implemented with VMS NCS, but it can be
implemented with POSIX LC_COLLATE. (See earlier reply.)
I had heard about CH, LL and Ñ in Spanish, but not about RR...
The order, in my opinion, should be: SANSCRIT, sanserif, {all forms of
Sans Souci}, santon, SAP.
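The order proposed above falls out of a key whose first level ignores case, spaces and hyphens. A minimal Python sketch; `primary_key` is a hypothetical name, and the tie-break among the six Sans Souci forms (plain code-point order of the original string) is an arbitrary choice for determinism, since the thread does not specify one.

```python
import re


def primary_key(entry):
    """First level: ignore case, spaces and hyphens. Ties are broken by the
    raw string so the result is deterministic (an arbitrary choice here)."""
    return (re.sub(r"[\s-]", "", entry).lower(), entry)


entries = ["santon", "Sans-souci", "SAP", "SANS SOUCI", "sanserif",
           "Sanssouci", "SANSCRIT", "SANS-SOUCI", "Sans Souci", "SANSSOUCI"]
print(sorted(entries, key=primary_key))
# ['SANSCRIT', 'sanserif', 'SANS SOUCI', 'SANS-SOUCI', 'SANSSOUCI',
#  'Sans Souci', 'Sans-souci', 'Sanssouci', 'santon', 'SAP']
```

Because every Sans Souci form reduces to "sanssouci" at the first level, SANSCRIT and sanserif cannot interrupt the run of variants.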
Re .16
I just read in the NOTED::WORLDWIDE notesfile that there is a document
about XPG4 in I18N::ISE$PUBLIC:[INFO]XPG4_FINAL.PS.
René.
|
1041.19 | %^} | VMSMKT::KENAH | blah blah blah GINGER | Thu Apr 22 1993 14:47 | 4 |
| SANSCRIT would probably wind up somewhere else in American English -
that's because the usual transliteration is SANSKRIT.
andrew
|
1041.20 | let those R's rip | RAGMOP::T_PARMENTER | Human. All too human. | Tue Apr 27 1993 07:22 | 14 |
| RR is a separate letter in Spanish, but, unlike all the other "letters
not in the English alphabet", it never appears in the initial position,
and therefore has no separate heading in the dictionary.
Someone more knowledgeable will have to help me out here so far as what
this means, but the R in an initial position is normally pronounced
like the RR in an interior position, with a trill, while the R in an
interior position gets one "tap", similar to the "dd" in English
"ladder".
Letter names:
C = ce CH = che L = ele LL = elle
N = ene Ñ = eñe R = ere RR = erre
|
1041.21 | ARR, Matey! | CALS::DESELMS | Opera r�lz | Tue Apr 27 1993 07:56 | 6 |
A "flipped R" is just like a trilled R, except that instead of the tongue
tapping the roof of your mouth a bunch of times, it only taps the roof of
the mouth once. It is indeed exactly the same as "dd" in "ladder".
Pronounce Spanish with an American ARR and they'll laugh in your face.
- Jim
|
1041.22 | | NOVA::FISHER | DEC Rdb/Dinosaur | Thu Apr 29 1993 08:11 | 6 |
| But rr in Spanish also has no special collation rule [that I have
seen].
Is rr collated after rz?
ed
|
1041.23 | | NOTIME::SACKS | Gerald Sacks ZKO2-3/N30 DTN:381-2085 | Thu Apr 29 1993 14:46 | 3 |
| re .20:
It's an alveolar flap.
|
1041.24 | Knuth, of course | TLE::JBISHOP | | Fri Aug 06 1993 12:58 | 7 |
| re .2
See Knuth's _Sorting_and_Searching_ (his volume 3), pp 7..9
for some library sorting rules, e.g. "Ignore initial articles,
unless not in nominative case...".
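The simplest of those library rules, ignoring an initial article, can be sketched as a sort key. This assumes English articles only; `title_key` and `ARTICLES` are hypothetical names, and the case-dependent exception in the quoted rule is not handled.

```python
# English articles only; each includes the trailing space so that
# "Another" is not mistaken for "An" + "other".
ARTICLES = ("the ", "a ", "an ")


def title_key(title):
    """Drop one leading article, then compare case-insensitively."""
    t = title.lower()
    for article in ARTICLES:
        if t.startswith(article):
            return t[len(article):]
    return t


titles = ["The Zoo", "An Apple", "Banana Split", "A Tale"]
print(sorted(titles, key=title_key))
# ['An Apple', 'Banana Split', 'A Tale', 'The Zoo']
```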
-John Bishop
|
1041.25 | | VMSMKT::KENAH | I���-I {���} {��^} {^�^} {���} {��} | Fri Aug 06 1993 13:29 | 5 |
| A question came up in another conference -- does Digital support
Cyrillic alphabets?
I'm embarrassed to ask this, because I don't know whether ISO Latin-1
includes Cyrillic alphabets. (We *do* support ISO Latin-1, don't we?)
|
1041.26 | Nope. | SMURF::BINDER | Sapientia Nulla Sine Pecunia | Fri Aug 06 1993 13:41 | 17 |
| Re .25
> I'm embarrassed to ask this, because I don't know whether ISO Latin-1
> includes Cyrillic alphabets.
It doesn't.
Producing International Products -- Software handbook
(Identification Number A-MN-ELEN467-00-0 Rev B)
...says this:
The ISO Latin Alphabet No. 1 has been developed by the International
Organization for Standards (ISO) as the standard character set for the
Western European languages. It will eventually supersede the DEC
Multinational Character Set. Further ISO character sets are being
developed to cover European languages not based on the Latin Alphabet.
|
1041.27 | | VMSMKT::KENAH | I���-I {���} {��^} {^�^} {���} {��} | Fri Aug 06 1993 14:10 | 7 |
| Thanks.
So: does Digital support Cyrillic alphabets?
Also: Does Digital support ISO Latin-1?
andrew
|
1041.28 | | REGENT::BROOMHEAD | Don't panic -- yet. | Fri Aug 06 1993 14:21 | 8 |
| ISO Latin-1 is Digital's default character set -- so, yes, we support
it.
ISO Latin-Cyrillic (ISO 8859-5 (which is not ISO Latin-5)) is provided
on a few of our printers (dot matrix ones) and can be added via a
cartridge on our ANSI laser printers. So, yes, we support it.
Ann B.
|
1041.29 | | VMSMKT::KENAH | I���-) (���) {��^} {^�^} {���} /��\ | Fri Aug 06 1993 15:27 | 9 |
| Thank you, Ann. I didn't realize ISO Latin-1 was our default,
although (based on Dick's description) it's obvious.
How about Cyrillic support at the user-interface level?
andrew
P.S. I'm tracking this question through another path within Digital;
should I get an expanded answer, I'll post it here.
|
1041.30 | | NRSTA2::KALIKOW | Supplely Chained | Fri Aug 06 1993 15:40 | 5 |
| Hey andrew -- Keep us posted on whether you get the answer thru
"official" or "other" channels faster than this employee-interest
notesfile... It'd be great if we could get you out of the BOX
faster... :-)
|
1041.31 | | ISTWI1::KINACI | Walk thru this world | Mon Aug 09 1993 06:47 | 16 |
I think Cyrillic is ISO-Latin 2, is it not?
I know there is some Cyrillic support out there and there is more to
come once the Fonts acquired from Monotype go into distribution.
I've been informed that we will have a wide scale test for the various
fonts. I will be working on testing ISO-Latin 5 for Turkey, for example.
I know that there is a Cyrillic version of DECterm. Hold on, I am not
sure if we are talking full UI localization or if there is just character
set support. But the latter definitely exists. I know there was work
being done to get EPROMs which support Cyrillic for VT420 type terminals.
I believe this has been completed. I also know that the Cyrillic version
of ALL-IN-1 V3.0 should be shipping soon.
Suz
|
1041.32 | | VMSMKT::KENAH | I���-) (���) {��^} {^�^} {���} /��\ | Mon Aug 09 1993 07:03 | 6 |
So far, the clear winner is through Employee-Interest conferences.
Of course the informal channels have given me pointers to more
formal channels, so the lines are getting blurred.
Of course without the informal channels, I never would have found
the formal channels...
|
1041.33 | Who can answer Andrew's question? | REGENT::BROOMHEAD | Don't panic -- yet. | Mon Aug 09 1993 10:49 | 14 |
| Suz,
Nope, it's ISO Latin-Cyrillic, with no number in sight.
Andrew,
"How about Cyrillic support at the user-interface level?"
I can't answer that. All I can tell you is I have the Cyrillic fonts
from Monotype that Suzan mentioned, but I don't know who is to pay
to make them into cartridges or soft fonts, or even which fonts
(typefaces) I should concentrate on.
Ann B.
|
1041.34 | | ISTWI1::KINACI | Walk thru this world | Mon Aug 09 1993 12:24 | 20 |
| Hi Ann!
Nice to run into you here.
RE the fonts. You probably know that Israel is going to be running a
Fonts Q.A. Project in early September, where we will all get to test
our own fonts. I suspect that will be when we will get a broader picture
of what is out there.
As for who pays... well.. I am told by very reliable sources that
corporate will pay for the internationalization of products deemed
necessary by the involved subsidiaries, starting in FY '94. We've
submitted a prioritized list of what we need, and as far as I know
the funding discussions should be well under way at this time. Past
experience indicates that it will be the beginning of Calendar year
1994 before we see much of anything.
I hear all this will change come FY '95... Keep your fingers crossed!
Suz
|
1041.35 | | 4GL::LASHER | Working... | Tue Aug 10 1993 06:50 | 4 |
| While y'all are looking into this, could you also check to see whether
DECwindows supports Orthodox icons?
Lew Lasher
|
1041.36 | Spanish Alphabetical Order Simplified | REGENT::BROOMHEAD | Don't panic -- yet. | Mon May 02 1994 11:26 | 45 |
| <<< NOTED::DISK$NOTES7:[NOTES$LIBRARY_7OF4]WORLDWIDE.NOTE;2 >>>
-< Worldwide -- International Product Issues >-
================================================================================
Note 525.0 Change in Spanish collating rules No replies
R2ME2::HINXMAN "It's waiting for it that's so tryin" 39 lines 2-MAY-1994 07:58
--------------------------------------------------------------------------------
Days in dictionary numbered for two in Spanish alphabet
=======================================================
Associated Press (Boston Globe 1994-05-01)
MADRID - The world's more than 300 million Spanish speakers now have
two fewer letters in their alphabet to worry about, a mostly bookkeeping move
that won almost unanimous support but disturbed some traditionalists.
The Association of Spanish Language Academies, meeting in Madrid for
its 10th annual congress, voted last week to eliminate the "Ch" and "Ll" from
the Spanish alphabet.
The two letters, which historically have had their own separate
headings in dictionaries, now will be listed under other letters. Words
beginning with "Ch", like "chico", will fall under the letter "C", and words
beginning with "Ll", like "llama", will fall under the letter "L".
The move does not change pronunciation, usage or spelling. It was
made mainly to simplify dictionaries and make Spanish more computer-
compatible with English.
Pushing for the change was Spain, a member of the 12-nation European
Union. The EU has urged its members to implement measures that aid
translation and computer standardization.
Cuban delegate Luisa Campuzano said she favored the change "because it
means that dictionaries will be easier to use. But arguments related to the
European Union shouldn't be brought up. Our talks are along scientific lines
and nothing more."
The vote Wednesday was 17 in favor, one opposed and three abstaining.
Ecuador voted "no" and Panama, Nicaragua and Ecuador abstained.
"It's not that the letters are disappearing, they're just being put
in a different place in the dictionary," said a Madrid artist, Maria Gato.
"I don't think most people are upset."
Guatemala supported the change, but one Guatemalan delegate, Mario
Alberto Carrera, referred to the simplification as "killing" part of the
language.
"The two letters have succumbed to the dictates of the market and the
Anglo-Saxon world," Carrera said.
Some dictionaries, including the highly respected Maria Moliner, had
already made the change.
The Spanish alphabet now has 27 letters - the 26 contained in the
alphabet plus a stylized "n".
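After the change only Ñ still needs tailoring; CH and LL simply sort as C+H and L+L. A sketch comparing the old and new orders, assuming input made only of the listed letters (accented vowels such as á are not handled); `make_key`, `TRADITIONAL` and `MODERN` are illustrative names.

```python
def make_key(alphabet):
    """Sort key over an ordered list of collating elements; two-letter
    elements (if any) are matched before single letters."""
    rank = {e: i for i, e in enumerate(alphabet)}
    multi = {e for e in alphabet if len(e) > 1}

    def key(word):
        word, ranks, i = word.lower(), [], 0
        while i < len(word):
            step = 2 if word[i:i + 2] in multi else 1
            ranks.append(rank[word[i:i + step]])  # KeyError for unlisted letters
            i += step
        return ranks

    return key


TRADITIONAL = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
               + list("mnñopqrstuvwxyz"))         # before the 1994 vote
MODERN = list("abcdefghijklmnñopqrstuvwxyz")     # 27 letters afterwards

words = ["cuna", "chico", "cero", "nube", "ñame", "oso", "luz", "llama"]
print(sorted(words, key=make_key(TRADITIONAL)))
# ['cero', 'cuna', 'chico', 'luz', 'llama', 'nube', 'ñame', 'oso']
print(sorted(words, key=make_key(MODERN)))
# ['cero', 'chico', 'cuna', 'llama', 'luz', 'nube', 'ñame', 'oso']
```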
|
1041.37 | | NOVA::FISHER | Tay-unned, rey-usted, rey-ady | Thu May 05 1994 07:46 | 9 |
| aye, the contrariness of it all....
One of the fun parts of "internationalizing Rdb" was to ensure that
"c*" did not MATCH "chxyz" when Spanish was the collating sequence
in use.
Drat!
ed
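The requirement amounts to matching element by element instead of character by character. This is not Rdb's implementation, only a sketch: `elements` and `matches` are hypothetical helpers, `*` is taken to mean "any run of zero or more elements" as in the `c*` example, and the recursion is exponential in the worst case, which is acceptable only for a short illustration.

```python
def elements(text, multi):
    """Split text into collating elements; pairs listed in `multi` stay together."""
    out, i = [], 0
    while i < len(text):
        step = 2 if text[i:i + 2] in multi else 1
        out.append(text[i:i + step])
        i += step
    return out


def matches(pattern, text, multi):
    """'*' matches any run of zero or more elements; any other pattern
    element must equal the corresponding text element exactly."""
    p, t = elements(pattern, multi), elements(text, multi)

    def go(i, j):
        if i == len(p):
            return j == len(t)
        if p[i] == "*":
            return any(go(i + 1, k) for k in range(j, len(t) + 1))
        return j < len(t) and p[i] == t[j] and go(i + 1, j + 1)

    return go(0, 0)


TRADITIONAL = {"ch", "ll"}   # Spanish two-letter elements before 1994
MODERN = set()               # after the 1994 change: none

print(matches("c*", "chxyz", TRADITIONAL))   # False: 'ch' is not 'c'
print(matches("c*", "chxyz", MODERN))        # True
print(matches("ch*", "chxyz", TRADITIONAL))  # True
```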
|
1041.38 | | JIT081::DIAMOND | $ SET MIDNIGHT | Mon May 16 1994 02:47 | 10 |
| Re .36
> "The two letters have succumbed to the dictates of the market and the
>Anglo-Saxon world," Carrera said.
Cute opinion. Has the Library of Congress changed their lexicography
to consider Mc as Mc instead of as Mac? If they did or will, they're
succumbing to the dictates of the market and the Spanish world.
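Filing Mc as if it were Mac is itself a local tailoring of the alphabet. A minimal sketch of that convention as a sort key, assuming a simple prefix rewrite; `filing_key` is a hypothetical name and real library filing rules are more detailed.

```python
def filing_key(name):
    """Hypothetical rule: file a leading 'Mc' as if it were spelled 'Mac'."""
    n = name.lower()
    return "mac" + n[2:] if n.startswith("mc") else n


names = ["Madison", "McAdams", "Mabry", "MacDonald"]
print(sorted(names, key=str.lower))   # ['Mabry', 'MacDonald', 'Madison', 'McAdams']
print(sorted(names, key=filing_key))  # ['Mabry', 'McAdams', 'MacDonald', 'Madison']
```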
-- Norman Diamond
|