Conference turris::decc_bugs

Title: DEC C Problem Reporting Forum
Notice: Report DEC C++ problems in TURRIS::C_PLUS_PLUS
Moderator: CXXC::REPETETCHEON
Created: Fri Nov 13 1992
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1299
Total number of notes: 6249

1291.0. "When isdigit() not a digit? When it is É." by CARDHU::HILL (Mike Hill, Zuerich Switzerland) Wed Apr 30 1997 15:17

I have a problem with DECC on OpenVMS V6.2.  The isdigit() macro does not
do what I expect in certain cases.  The ctype.h definition of isdigit()
along with all other is*() routines does not mask off the high order bits
in certain cases (depends on various #defines).

One definition of is*() in ctype.h does this, the other does not.

The DEC C V5.5-003 compiler on OpenVMS Alpha V6.2-1H3 has this problem.

The program follows:

	#include stdio
	#include ctype
	main()
	{
	printf("%d\n",isdigit('É'));
	}

The decimal value of 'É' is 201.  Since this is a char, the C compiler uses
the value -55 (signed bytes).
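
The arithmetic behind that statement can be sketched as follows.  The
function names are hypothetical, and the -55 result assumes an 8-bit
two's-complement signed char (the conversion is implementation-defined
in C):

```c
/* Value of a character code after being stored in a signed 8-bit char.
   For code 201 (0xC9) a two's-complement machine yields 201 - 256 = -55. */
int as_signed_byte(int code)
{
    return (int)(signed char)code;
}

/* The same byte viewed through unsigned char is always in 0..UCHAR_MAX,
   which is the range the is*() routines are defined for. */
int as_unsigned_byte(int code)
{
    return (int)(unsigned char)code;
}
```

Here as_signed_byte(201) gives the negative index that reaches the table
lookup, and as_unsigned_byte(-55) recovers 201.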

The .I file shows:

	main()
	{
	printf("%d\n",((decc$$gl___ctypea)?(*decc$$ga___ctypet)
	              [(int)('É')]&0x4:isdigit('É')));
	}                ^    ^
	                 |    |
	   +-------------+    |
	   |                  |
	   |     To work, we need (('É')&0xff) here
	   |
	   +---------- or (unsigned char) here

Looking at ctype.h (taken from SYS$SHARE:DECC$RTLDEF.TLB) it looks like the
following definition is not taking signed chars into account:

	#   if ( ( defined(__DECC) || defined(__DECCXX) ) && !defined(__VAXC) )
	#      define __ISFUNCTION(c,p)  \
	              (__ctypea?__ctypet[(int)(c)]&__##p:__IS##p(c))
	#   else
	#      define __ISFUNCTION(c,p)  \
	              (__ctypea?__ctypet[(int)(c)]&__/**/p:__IS/**/p(c))
	#   endif
	#   define isdigit(c)  __ISFUNCTION(c, DIGIT)

Instead of (c) in both lines above, I would like to see ((c)&0xff) or instead
of (int) I would hope for (unsigned char).
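
A minimal sketch of the two suggested changes, side by side.  The table
and macro names (demo_table, DEMO_ISDIGIT_MASK, DEMO_ISDIGIT_CAST) are
made up for illustration and are not the DEC C RTL's identifiers; only
the 0x4 digit bit is taken from the .I output above:

```c
#define DEMO_DIGIT 0x4   /* digit bit, as seen in the .I file */

/* Hypothetical 256-entry classification table; only '0'..'9' are marked. */
static const unsigned char demo_table[256] = {
    ['0'] = DEMO_DIGIT, ['1'] = DEMO_DIGIT, ['2'] = DEMO_DIGIT,
    ['3'] = DEMO_DIGIT, ['4'] = DEMO_DIGIT, ['5'] = DEMO_DIGIT,
    ['6'] = DEMO_DIGIT, ['7'] = DEMO_DIGIT, ['8'] = DEMO_DIGIT,
    ['9'] = DEMO_DIGIT
};

/* Variant 1: mask the argument into 0..255 before indexing. */
#define DEMO_ISDIGIT_MASK(c) (demo_table[(c) & 0xff] & DEMO_DIGIT)

/* Variant 2: convert through unsigned char before indexing. */
#define DEMO_ISDIGIT_CAST(c) (demo_table[(unsigned char)(c)] & DEMO_DIGIT)
```

With either variant an argument of -55 indexes entry 201 rather than
reading in front of the table.  Note that both also map EOF (-1) to
entry 255, so a real header would have to treat EOF separately.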

Looking again at ctype.h, the other definition of isdigit() looks like:

	#   define isdigit(c)  (__ctype [(c) & 0xFF] & _D)
	                                        ^
	                                        |
	                             Note masking to fix -ve chars

On machines where bytes are signed, with this ctype.h, the program won't work
as I would like.  On machines with unsigned bytes it will (or if the ctype.h
#defines pick the other definition).

I have read DECC_BUGS note 1125, and think that answer .2 implies that this is
expected, and the &0xff will be done for all cases.  Maybe it was missed in
this specific case, or maybe it is intentional.

Another 'workaround' would be to have 128 dummy table entries before the
__ctypet[] table where all are set to zero.  That would at least not require
the ctype.h to change.
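
A sketch of that table layout, under the assumption of 8-bit chars.  All
names here (pad_storage, pad_table, PAD_ISDIGIT) are hypothetical; the
macro deliberately keeps the unmasked (int) cast quoted from ctype.h:

```c
#define PAD_ENTRIES 128
#define PAD_DIGIT   0x4

/* 128 zero entries for indices -128..-1, followed by the 256 real
   entries for indices 0..255.  Only '0'..'9' are marked as digits. */
static const unsigned char pad_storage[PAD_ENTRIES + 256] = {
    [PAD_ENTRIES + '0'] = PAD_DIGIT, [PAD_ENTRIES + '1'] = PAD_DIGIT,
    [PAD_ENTRIES + '2'] = PAD_DIGIT, [PAD_ENTRIES + '3'] = PAD_DIGIT,
    [PAD_ENTRIES + '4'] = PAD_DIGIT, [PAD_ENTRIES + '5'] = PAD_DIGIT,
    [PAD_ENTRIES + '6'] = PAD_DIGIT, [PAD_ENTRIES + '7'] = PAD_DIGIT,
    [PAD_ENTRIES + '8'] = PAD_DIGIT, [PAD_ENTRIES + '9'] = PAD_DIGIT
};

/* Points at logical index 0, so pad_table[-128] is still inside storage. */
static const unsigned char *const pad_table = pad_storage + PAD_ENTRIES;

/* Same shape as the existing macro: no mask, plain (int) cast. */
#define PAD_ISDIGIT(c) (pad_table[(int)(c)] & PAD_DIGIT)
```

A negative argument such as -55 then lands on a zero entry and is
reported as "not a digit" instead of reading unrelated memory.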

Any input would be appreciated.

[Mike.Hill]
T.R  Title  User  Personal Name  Date  Lines
1291.1  SPECXN::DERAMO  Dan D'Eramo  Wed Apr 30 1997 15:51  33
>The isdigit() macro does not do what I expect in certain cases.
        
        If you want your code to be portable then you need to lower
        your expectations. :-)  As topic 1125 said, it is up to the
        programmer to make sure the argument to isdigit is either EOF
        or one of { 0, 1, 2, ..., UCHAR_MAX }.  
        
>1125.1        	If the argument has any other value, the behavior is
>        	undefined.
        
        That means you must not use isdigit('É') in your program,
        although you can use isdigit((unsigned char)'É').
        
        One probably runs into this problem more often after
        
        	char *s;
        	int i;
        
        by trying to use isdigit(*s) or isdigit(s[i]) despite the
        possibility that *s or s[i] may not be in the domain of
        isdigit().
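
        A short sketch of the portable form for that case.  The
        function name count_digits is hypothetical; the point is only
        the (unsigned char) conversion at the call site:

```c
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a NUL-terminated string.  Each char is
   converted to unsigned char first, so the argument passed to isdigit()
   is always in 0..UCHAR_MAX, even where plain char is signed. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

        The same conversion applies to isdigit(s[i]) and to the other
        is*() routines.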
        
        The DEC C team can and from time to time does change <ctype.h>
        or the functions declared in it to cover for this mistake in
        the code.  This papers over a real problem in the code that
        could be uncovered again on the next system the code is ported
        to.  That other vendor might not be as responsive to customer
        requests as Digital is.
        
        You might try to compile with /UNSIGNED_CHAR until the program
        can be changed or a new header file is released.
        
        Dan
1291.2  TLE::D_SMITH  Duane Smith -- DEC C RTL  Wed Apr 30 1997 16:56  2
    One could also argue that signed characters are only useful to
    represent 7-bit ASCII characters, which É is not.
1291.3  "Undefined" can also return "expected results"  CARDHU::HILL  Mike Hill, Zuerich Switzerland  Fri May 02 1997 03:36  20
    I fully agree with .1, but want to point out that programs which worked
    fine on previous versions of VMS will now fail in unexpected ways
    because different versions of isdigit() are being used.
    
    This has caused customer dissatisfaction at an important customer, and
    there is no need for this.  Since it is undefined what will happen if
    a character outside the allowed range is given - why not just return
    the correct value?  This is what all previous versions of VAXC and DECC
    have done.
    
    This could be done either by changing the isdigit() macro or by copying
    the 128 entries covering values 128-255 in front of the table.
    
    It is difficult to sell this as a coding error (although that is what
    it is) because compilers on other hardware the customer has don't have
    this problem.
    
    If it will help, I'll open an IPMT offering this as a suggestion.
    
    [Mike.Hill]
1291.4  WIBBIN::NOYCE  Pulling weeds, pickin' stones  Fri May 02 1997 09:31  7
>    It is difficult to sell this as a coding error (although that is what
>    it is) because compilers on other hardware the customer has don't have
>    this problem.

Most likely that's because the other hardware defaults to unsigned char.
Your customer might prefer to compile with /UNSIGNED_CHAR on VMS, to improve
compatibility with this other hardware.
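
Whether plain char is signed under a given compiler and set of
qualifiers can be checked from <limits.h>.  A minimal sketch, with a
hypothetical function name:

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type in this compilation,
   0 if it is unsigned.  CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Building the same source on each of the customer's platforms and
comparing the result shows whether the default signedness differs.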