
Conference hydra::axp-developer

Title:Alpha Developer Support
Notice:[email protected], 800-332-4786
Moderator:HYDRA::SYSTEM
Created:Mon Jun 06 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:3722
Total number of notes:11359

3363.0. "Multipath Corporation" by HYDRA::BRYANT () Wed Mar 19 1997 16:36

    Company Name :  Multipath Corporation
    Contact Name :  Ron Young
    Phone        :  (702) 831-4400 
    Fax          :  (702) 831-4401
    Email        :  [email protected]
    Date/Time in :  19-MAR-1997 16:36:07
    Entered by   :  Pat Bryant
    SPE center   :  MRO

    Category     :  UNIX
    OS Version   :  
    System H/W   :  


    Brief Description of Problem:
    -----------------------------

Please contact Bill Desimone (508) 467-2394 of DEC about our account.

Company Name: Multipath Corporation
Customer Code Number: 992344
>From:	US6RMC::"[email protected]" "Ron Young"   18-MAR-1997 19:22:21.57
>To:	<hdlite::axpdeveloper>
>CC:	"Dwight Manley" <nicctr::manley>, "Randy Doering" <wbc::doering>
>Subj:	Porting to DEC UNIX 4.0b
>
>We are currently testing the port of our application on DEC UNIX 4.0B.
>This is an application that has worked properly under version 3.2.
>
>The application performs parallel processing using calls to the pthread
>library.  We have changed these calls where appropriate to match the man
>pages provided with 4.0B.  In fact these calls now match the pthread
>routines we use for IBM and SUN.
>
>When we run the new version under 4.0B, we have observed some "strange"
>behavior.  Although we do not know exactly what the problem is, we believe
>it to be one of the following:
>
>1) Using wrong compiler switches
>
>2) Linking against the wrong libraries
>
>3) Not using the __MB properly to update memory. (Most likely)
>
>1) Compiler switches:
>The compiler switches we are using are:
>
>f77 -r8 -i8 -automatic -tune ev5 -fast -pthread
>
>For C routines not containing the __MB call:
>cc  -tune ev5 -pthread
>
>For C routines containing the __MB call:
>cc  -tune ev5 -pthread -migrate -O4 -assume noaccuracy_sensitive
>
>2) Linking:
>We have options for building the application with and without a shared
>library.  The following lists the scripts used for both cases:
>
>As a shared library:
>   ld           \
>   -shared      \
>   -o fmslib.so \
>   -all         \
>      fmslib.a  \
>   -none        \
>   fmsint.a     \
>   blas.a       \
>   -lpthread -lmach -lexc -lfor -lUfor -lots -lm -lc_r -lc \
>   -set_version fmslib.51
>#
>#  Now link the DEMO_share application using fmslib.so:
>   f77 -call_shared -o DEMO_share   \
>       demo.o                  \
>       fmsnoshr.a fmslib.so    \
>       -lpthread -lmach -lexc -lc
>
>
>Using archives directly:
>   f77 -o DEMO_noshare         \
>       demo.o                  \
>       fmsnoshr.a fmslib.a fmsint.a fmslib.a \
>       blas.a                  \
>       -lpthread -lmach -lexc -lc
>
>3) Using __MB to update memory.
>Our application is written so that whenever a thread changes shared memory
>under the protection of a mutually exclusive lock, it issues the statement
>
>__MB();
>
>before releasing the lock (code is in C).  The function __MB is typed as
>
>void __MB(void);
>
>The C routine containing this instruction is compiled using the -migrate
>switch.  It was our understanding and experience that this was necessary to
>flush and update the cache when a thread changed a value in shared memory
>that might be read later by another thread.  Under 3.2 this seems to work
>properly.  Under 4.0B, using the same procedure, we have observed threads
>reading old values AFTER the __MB instruction has been issued.
>
>Has the procedure for updating cache changed under 4.0B?
>
>
>
>In addition to __MB not working, we observe the following:
>1) When DEMO_share or DEMO_noshare are run, and BEFORE they call
>pthread_create to start threads, a ps -m shows the following:
>
>decunix> ps -m
>  PID TTY      S           TIME CMD
>  614 ttyp1    I        0:02.61 -csh (csh)
> 1040 ttyp1    I  +     0:02.72 ./DEMO_share
>               I        0:00.22
>               I        0:00.00
>               I        0:00.05
>               I        0:02.45
>  651 ttyp2    I  +     0:00.69 -csh (csh)
> 1030 ttyp4    S        0:00.25 -csh (csh)
>decunix>
>
>It would appear that there are 3 additional threads besides DEMO_share.
>These were not present under version 3.2.  What are they?
>
>
>
>2) When we run the non shared version on a workstation, DEMO_noshare, we
>get an error
>
>forrtl: severe (41): insufficient virtual memory
>
>as soon as the application starts.  However if we run it on a larger
>server, it gets past this point.
>
>What system resources do we need to adjust on the workstation?
>
>
>Thanks, Ron Young  
>
>+---------------------------------+----------------------------------+
>| Ron Young                       | Phone:  (702) 831-4400           |
>| Multipath Corporation           | FAX:    (702) 831-4401           |
>| P.O. Box 8210                   | E-mail: [email protected]           |
>| Incline Village, NV 89452-8210  | See Multipath's home page at     |
>| U.S.A.                          | http://www.fmslib.com            |
>+---------------------------------+----------------------------------+
>
>

3363.1. "Response sent..." by HYDRA::KENYON (The Foundation of Science...Fiction) Fri Mar 21 1997 15:37
See answers to his questions after the "**" below.

-jeff

From:	HYDRA::AXPDEVELOPER "[email protected]" 19-MAR-1997 16:53:05.75
To:	[email protected]
CC:	KENYON,AXPDEVELOPER
Subj:	RE: Problems with V4.0B and threads.

>----------
>From: 	Ron Young[SMTP:[email protected]]
>Sent: 	Wednesday, March 19, 1997 3:17 PM
>To: 	Bill Desimone
>Subject: 	Porting to DEC UNIX 4.0b
>
>>Date: Tue, 18 Mar 1997 16:10:16 -0800
>>To: DEC/AXP-HELP
>>From: Ron Young <[email protected]>
>>Subject: Porting to DEC UNIX 4.0b
>>Cc: DEC/Manley,DEC/Bench/Doering
>>
>>We are currently testing the port of our application on DEC UNIX 4.0B.
>>This is an application that has worked properly under version 3.2.
>>
>>The application performs parallel processing using calls to the pthread
>>library.  We have changed these calls where appropriate to match the man
>>pages provided with 4.0B.  In fact these calls now match the pthread
>>routines we use for IBM and SUN.
>>
>>When we run the new version under 4.0B, we have observed some "strange"
>>behavior.  Although we do not know exactly what the problem is, we believe
>>it to be one of the following:
>>
>>1) Using wrong compiler switches
>>
>>2) Linking against the wrong libraries
>>
>>3) Not using the __MB properly to update memory. (Most likely)
>>
>>1) Compiler switches:
>>The compiler switches we are using are:
>>
>>f77 -r8 -i8 -automatic -tune ev5 -fast -pthread
>>
>>For C routines not containing the __MB call:
>>cc  -tune ev5 -pthread
**
** Looks OK (see below); note that you are using "DEC C" (that is, really the
** -migrate compiler).  Why not use -O4 and -assume noaccuracy_sensitive (or
** just -fast) here as well?  We don't always recommend these, but you seem to
** be using them elsewhere.  Your mileage may vary, but it is worth using -fast
** if you can.
**
>>
>>For C routines containing the __MB call:
>>cc  -tune ev5 -pthread -migrate -O4 -assume noaccuracy_sensitive
>>

**
** On V4.0 you get the "-migrate" compiler with or without the switch (it is
** the default), so we recommend dropping -migrate.
**

>>2) Linking:
>>We have options for building the application with and without a shared
>>library.  The following lists the scripts used for both cases:
>>
>>As a shared library:
>>   ld           \
>>   -shared      \
>>   -o fmslib.so \
>>   -all         \
>>      fmslib.a  \
>>   -none        \
>>   fmsint.a     \
>>   blas.a       \
>>   -lpthread -lmach -lexc -lfor -lUfor -lots -lm -lc_r -lc \
>>   -set_version fmslib.51

**
** What you have looks generally correct, but we would like to see you use the
** same library order that the compiler uses.  Please link an empty module with
** "f77 -pthread -v foo.f -o /dev/null" and note the library order.  For
** example, we see the FORTRAN libraries before the math libraries before OTS,
** etc., and then the threads libraries.  Please do the above and follow that
** order for the libraries you need.  One area of particular interest may be
** the reentrant (_r) versions of the libraries; we are not sure which of them
** simply point to the standard library (such as libc_r pointing at libc).
** Hopefully you get the idea.
**
>>#
>>#  Now link the DEMO_share application using fmslib.so:
>>   f77 -call_shared -o DEMO_share   \
>>       demo.o                  \
>>       fmsnoshr.a fmslib.so    \
>>       -lpthread -lmach -lexc -lc
>>
>>
>>Using archives directly:
>>   f77 -o DEMO_noshare         \
>>       demo.o                  \
>>       fmsnoshr.a fmslib.a fmsint.a fmslib.a \
>>       blas.a                  \
>>       -lpthread -lmach -lexc -lc
>>

**
** The above program builds should be linked with -pthread, not -lpthread etc.
** When you use the compiler to link, let it pick the library order; if you
** use ld directly, you must specify the order yourself.
**

>>3) Using __MB to update memory.
>>Our application is written so that whenever a thread changes shared memory
>>under the protection of a mutually exclusive lock, it issues the statement
>>
>>__MB();
>>
>>before releasing the lock (code is in C).  The function __MB is typed as
>>
>>void __MB(void);
>>
>>The C routine containing this instruction is compiled using the -migrate
>>switch.  It was our understanding and experience that this was necessary to
>>flush and update the cache when a thread changed a value in shared memory
>>that might be read later by another thread.  Under 3.2 this seems to work
>>properly.  Under 4.0B, using the same procedure, we have observed threads
>>reading old values AFTER the __MB instruction has been issued.
>>

**
** It sounds like you are doing:
**
**	grab a mutex, modify some data, issue __MB(), then release the mutex
**
** How do the other threads know that an __MB() has been issued?  They do not
** have this information.  That is, __MB does not explicitly flush the cache;
** it only guarantees the order in which items go to memory.  Perhaps on 3.2
** cache flushing was the behaviour you observed, however.
**
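
A minimal C sketch of the mutex-only pattern, assuming a POSIX threads implementation (not specifically DEC UNIX 4.0B): under POSIX, pthread_mutex_lock and pthread_mutex_unlock themselves synchronize memory, so data written while the mutex is held becomes visible to the next thread that locks the same mutex, with no explicit __MB(). The names worker, run_workers, total_lock, NTHREADS and NITERS are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total = 0;          /* protected by total_lock */

/* Each worker updates shared data only while holding the mutex.
 * No __MB() is issued: the unlock publishes the write. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&total_lock);
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Starts NTHREADS workers, waits for them, and returns the final total
 * (read under the same mutex), or -1 if a thread could not be created. */
long run_workers(void)
{
    pthread_t tid[NTHREADS];
    int created = 0;

    shared_total = 0;
    for (; created < NTHREADS; created++)
        if (pthread_create(&tid[created], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&total_lock);
    long total = shared_total;
    pthread_mutex_unlock(&total_lock);
    return created == NTHREADS ? total : -1;
}
```

With every update made under the lock, run_workers() should return NTHREADS * NITERS; an explicit barrier adds nothing here.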

>>Has the procedure for updating cache changed under 4.0B?
>>

**
** We are not sure that this changed, but as noted, __MB() does not guarantee
** that the cache is flushed; it does guarantee the ordering of writes to
** memory.  Assume thread A does the following:
**
**	v1=1
**	v2=2
**	__MB()
**	v3=3
**
** After thread A has completed ALL of the above instructions, it is possible
** that thread B will see any of the following:
**
**	case1	case2	case3	case4	case5
**
**	oldv1	oldv1	newv1	newv1	newv1
**	oldv2	newv2	oldv2	newv2	newv2
**	oldv3	oldv3	oldv3	oldv3	newv3
**
** As you can see, only if v3 is "new" can v1 and v2 be guaranteed to be "new".
**
** If you could tell us how you are using the data you are locking with a mutex, 
** in conjunction with other data, that might help.
**
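
The v1/v2/v3 example can be sketched with C11 atomics, used here as a modern, portable stand-in for the Alpha __MB() intrinsic (an assumption: the compilers discussed in this thread did not offer <stdatomic.h>). The writer's release fence plays the role of __MB(). The reader waits until it sees the new v3 and then issues its own acquire fence, after which v1 and v2 are guaranteed to be new (case 5 in the table). The names writer, reader and publish_and_read are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int v1, v2;          /* ordinary shared data */
static atomic_int v3;       /* written last; acts as the "ready" flag */

static void *writer(void *arg)
{
    (void)arg;
    v1 = 1;
    v2 = 2;
    /* Stand-in for __MB(): v1 and v2 are ordered before the store to v3. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&v3, 3, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    int *sum = arg;
    /* Wait until v3 is "new" (case 5 in the table). */
    while (atomic_load_explicit(&v3, memory_order_relaxed) != 3) {
        /* spin */
    }
    /* The reader needs its own barrier before it relies on v1 and v2. */
    atomic_thread_fence(memory_order_acquire);
    *sum = v1 + v2;
    return NULL;
}

/* Runs one writer and one reader; returns v1 + v2 as seen by the reader,
 * or -1 if a thread could not be created. */
int publish_and_read(void)
{
    pthread_t w, r;
    int sum = 0;

    v1 = v2 = 0;
    atomic_store(&v3, 0);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &sum) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return sum;
}
```

The point of the design is that ordering only helps if the reader keys off the value written after the barrier (v3) and orders its own reads; a barrier on the writer side alone does not make other threads see fresh values.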

>>
>>
>>In addition to __MB not working, we observe the following:
>>1) When DEMO_share or DEMO_noshare are run, and BEFORE they call
>>pthread_create to start threads, a ps -m shows the following:
>>
>>decunix> ps -m
>>  PID TTY      S           TIME CMD
>>  614 ttyp1    I        0:02.61 -csh (csh)
>> 1040 ttyp1    I  +     0:02.72 ./DEMO_share
>>               I        0:00.22
>>               I        0:00.00
>>               I        0:00.05
>>               I        0:02.45
>>  651 ttyp2    I  +     0:00.69 -csh (csh)
>> 1030 ttyp4    S        0:00.25 -csh (csh)
>>decunix>
>>
>>It would appear that there are 3 additional threads besides DEMO_share.
>>These were not present under version 3.2.  What are they?
>>
**
** Are you just curious, or are you seeing more threads using more resources 
** than you expect?  On V4.0 user threads are scheduled on kernel threads.  This 
** may be the effect.  Note that user threads can migrate to different kernel 
** threads.
**
>>
>>
>>2) When we run the non shared version on a workstation, DEMO_noshare, we
>>get an error
>>
>>forrtl: severe (41): insufficient virtual memory
>>
>>as soon as the application starts.  However if we run it on a larger
>>server, it gets past this point.
>>

**
** I expect that the values in /etc/sysconfigtab are set differently on the two
** systems (although by default they should NOT differ between a server and a
** workstation):
** 		per-proc-data-size = 134217728
**		max-per-proc-data-size = 1073741824
**
** You should be able to get around this with an "unlimit" in the C shell.
** The default maximum data size is 1GB.  If you need to go higher, you must
** set the above to the desired size, along with 'max-per-proc-address-space'
** and 'vm-maxvas'.  Randy Doering knows how to do this.
**
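
For a program that would rather adjust its own limit at startup, the C-shell "unlimit" step can be sketched with the POSIX getrlimit/setrlimit calls. This assumes (not confirmed in the thread) that per-proc-data-size and max-per-proc-data-size correspond to the soft and hard RLIMIT_DATA limits. Raising the soft limit up to the hard limit needs no special privilege; going beyond the hard limit still requires the kernel changes described above. raise_data_limit is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raises this process's soft data-segment limit to its hard limit,
 * roughly what csh's "unlimit datasize" does for the shell's children.
 * Returns 0 on success, -1 on failure. */
int raise_data_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0) {
        perror("getrlimit(RLIMIT_DATA)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;   /* soft -> hard; no privilege needed */
    if (setrlimit(RLIMIT_DATA, &rl) != 0) {
        perror("setrlimit(RLIMIT_DATA)");
        return -1;
    }
    return 0;
}
```

The call only affects the calling process and anything it starts afterwards, so it would have to run before the large allocations are made.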

 
>>What system resources do we need to adjust on the workstation?
**
** See above
**
>>
>>
>>Thanks, Ron Young  
>>

3363.2. "followup..." by HYDRA::KENYON (The Foundation of Science...Fiction) Fri Mar 21 1997 15:41
From:	SMTP%"[email protected]" 21-MAR-1997 12:01:41.98
To:	<[email protected]>
CC:	
Subj:	3363 Port to 4.0B


Thank you for your response to our earlier questions.  We have incorporated
your recommendations.  Part of the problem was that we were testing the
return code of the pthread calls against -1 to detect an error, instead of
treating any nonzero value as an error.  As a result, errors that occurred
during lock initialization and lock usage were not detected, and the threads
were not synchronized properly.
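
The corrected check can be sketched as follows. POSIX thread functions return 0 on success and an error number (such as ENOMEM or EINVAL) on failure rather than returning -1, so the test has to be "nonzero", not "== -1". pthread_check and init_lock_pair are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* POSIX thread calls return 0 on success or an error number on failure;
 * they do not return -1.  Report any nonzero code and pass it back. */
static int pthread_check(int rc, const char *what)
{
    if (rc != 0)                  /* not: if (rc == -1) */
        fprintf(stderr, "%s failed: %s (%d)\n", what, strerror(rc), rc);
    return rc;
}

/* Initializes a mutex and condition variable pair; on any failure,
 * undoes the partial setup and returns the error number. */
int init_lock_pair(pthread_mutex_t *m, pthread_cond_t *c)
{
    int rc = pthread_check(pthread_mutex_init(m, NULL), "pthread_mutex_init");
    if (rc != 0)
        return rc;
    rc = pthread_check(pthread_cond_init(c, NULL), "pthread_cond_init");
    if (rc != 0)
        pthread_mutex_destroy(m);
    return rc;
}
```

Passing ENOMEM (12) to pthread_check prints a message and returns 12; a check written as rc == -1 would have let that value through silently.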

Now to find out why the errors occurred in the first place.

For reasons unknown to me, the 24th call to pthread_cond_init returns error
code 12, ENOMEM (not enough space).  This is after 26 successful calls to
pthread_mutex_init and 23 successful calls to pthread_cond_init.
If I run the same code which was built under 3.2 on this 4.0B system, I do
not get this error.
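
To separate a library or resource problem from an application problem, a small standalone probe can repeat the failing call in isolation and report which call fails. This is a sketch with a hypothetical name (probe_cond_init) and an arbitrary cap of 64 condition variables; building it the same two ways as DEMO_share and DEMO_noshare might show whether the static link alone reproduces the ENOMEM.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CONDS 64            /* arbitrary cap for this sketch */

/* Calls pthread_cond_init up to n times (capped at MAX_CONDS), stops at
 * the first failure and reports which call failed.  Returns how many
 * calls succeeded; those condition variables are destroyed again. */
int probe_cond_init(int n)
{
    pthread_cond_t conds[MAX_CONDS];
    int ok = 0;

    if (n > MAX_CONDS)
        n = MAX_CONDS;
    for (; ok < n; ok++) {
        int rc = pthread_cond_init(&conds[ok], NULL);
        if (rc != 0) {
            fprintf(stderr, "call %d to pthread_cond_init failed: %s (%d)\n",
                    ok + 1, strerror(rc), rc);
            break;
        }
    }
    for (int i = 0; i < ok; i++)
        pthread_cond_destroy(&conds[i]);
    return ok;
}
```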

Where is the space allocated?

How do I provide more?

Are there any system parameters I can provide which will help understand this?

I don't know if this is a disk space problem or not, but here is the output
from df:

decunix> df
Filesystem   512-blocks        Used   Available Capacity  Mounted on
/dev/rz2a        126334      108218        5482    96%    /
/proc                 0           0           0   100%    /proc
/dev/rz2g       1732204     1154418      404564    75%    /usr
/dev/rz3g       1602502     1294320      147930    90%    /rz3g
decunix>

The other problem we are having is that the ar command complains about not
enough space.  We have defined the environment variable TMPDIR to point to
/usr/tmp, and this has fixed the problem for now.

Where does ar do its work when TMPDIR is not defined?

Is /dev/rz2a getting too full?  We used the default partitions on an RZ28
disk and did a default installation of UNIX 4.0 (then upgraded to 4.0B).

If rz2a is too full, how do we increase it?

3363.3. "and my response..." by HYDRA::KENYON (The Foundation of Science...Fiction) Fri Mar 21 1997 15:41
From:	HYDRA::AXPDEVELOPER "[email protected]" 21-MAR-1997 15:23:53.42
To:	SMTP%"[email protected]"
CC:	AXPDEVELOPER
Subj:	RE: 3363 Port to 4.0B

Ron,

Generally one would not leave /tmp directly on the / partition.  You can
either mount an empty partition as /tmp, or make a link from /tmp to
a directory with more space (such as /usr/tmp).  I am not 100% sure, but 
the behaviour suggests that ar is doing its work in /tmp.  Either of the
above will fix this.
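
The TMPDIR behaviour Ron observed matches a common Unix convention: use $TMPDIR when it is set and non-empty, otherwise fall back to /tmp. The sketch below shows only that convention; scratch_dir is a hypothetical name, and nothing here is taken from the actual ar source.

```c
#define _POSIX_C_SOURCE 200112L   /* request POSIX declarations such as setenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the directory a tool following the common TMPDIR convention
 * would use for scratch files: $TMPDIR if set and non-empty, else /tmp. */
const char *scratch_dir(void)
{
    const char *dir = getenv("TMPDIR");
    return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
}
```

For example, after setenv("TMPDIR", "/usr/tmp", 1) it returns "/usr/tmp"; with TMPDIR unset it returns "/tmp", which on this system lives on the small root partition.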

Please send the output from 'swapon -s' and 'disklabel -r /dev/rrzXa'.  Do
the disklabel for drives rz2c and rz3c (note the use of /dev/rr, and not
/dev/r).

I would not think that the ENOMEM is due to any of this in any case; I am
just interested in helping you get the system set up a bit more properly
in general with regard to /tmp, swap space, etc.  I will ask someone else
here about the real issue of where the pthread calls allocate their
space.  If it is on the stack, I can help increase that, but I am not sure.

Jeff Kenyon

3363.4. "some more" by HYDRA::BRYANT () Mon Mar 24 1997 11:26
>ENOMEM Problem

What is your kernel parameter vm-vpagemax set to?  If it is set to the
default of 16384, then this is probably not large enough.  This virtual
memory parameter defines the largest contiguous memory region that a
threaded program can use.  Your 3.2B memory requirements may have been at
the very edge.  Please double this value and rerun your application as
instructed below:

(as root)
        cd /etc
        cat >vpagemax.stanza
        vm:
                vm-vpagemax=32768
        ^D
        sysconfigdb -a -f vpagemax.stanza
        reboot

3363.5. by HYDRA::AXPDEVELOPER (Alpha Developer support) Mon Mar 24 1997 11:38
From:	SMTP%"[email protected]" 24-MAR-1997 10:20:21.27
To:	[email protected] ([email protected])
CC:	
Subj:	Re: 3363 Port to 4.0B


Jeff -

Thanks for helping to set up our system.

Our system is an Alpha 3000 workstation, model 300.  It has 64 MB of memory.
There are 2 disks:

DKA200   2.10Gb   RZ28B   Unix 4.0B
DKA300   1.05Gb   RZ26    Unix 3.2

We like to keep at least 2 versions of the operating system (current and
previous) to support customers.  We prefer to have these on different disks
so we can easily switch with a reboot.  As you can see, the default
installation of 4.0B on the RZ28B left the h partition (874 MB) unused.

Below is the output you requested:

Swap partition /dev/rz2b (default swap):
    Allocated space:        25088 pages (196MB)
    In-use space:               1 pages (  0%)
    Free space:             25087 pages ( 99%)


Total swap allocation:
    Allocated space:        25088 pages (196MB)
    Reserved space:          3677 pages ( 14%)
    In-use space:               1 pages (  0%)
    Available space:        21411 pages ( 85%)

# /dev/rrz2c:
type: SCSI
disk: RZ28B
label: 
flags:
bytes/sector: 512
sectors/track: 99
tracks/cylinder: 16
sectors/cylinder: 1376
cylinders: 2595
sectors/unit: 4110480
rpm: 5411
interleave: 1
trackskew: 13
cylinderskew: 22
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 95*)
  b:   401408   131072      swap                      	# (Cyl.   95*- 386*)
  c:  4110480        0    unused        0     0       	# (Cyl.    0 - 2987*)
  d:  1191936   532480    unused        0     0       	# (Cyl.  386*- 1253*)
  e:  1191936  1724416    unused        0     0       	# (Cyl. 1253*- 2119*)
  f:  1194128  2916352    unused        0     0       	# (Cyl. 2119*- 2987*)
  g:  1787904   532480    4.2BSD     1024  8192    16 	# (Cyl.  386*- 1686*)
  h:  1790096  2320384      swap                      	# (Cyl. 1686*- 2987*)
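
The partition and swap sizes above can be cross-checked with simple arithmetic, assuming 512-byte sectors (as the label reports) and 8 KB pages for the swapon figures (the Alpha page size). sectors_to_mib and pages_to_mib are hypothetical helpers.

```c
#include <assert.h>

#define SECTOR_BYTES     512UL       /* "bytes/sector" in the disklabel */
#define ALPHA_PAGE_BYTES 8192UL      /* assumed 8 KB page for swapon -s */
#define MIB              (1024UL * 1024UL)

/* Partition size in whole MiB from a disklabel sector count. */
unsigned long sectors_to_mib(unsigned long sectors)
{
    return sectors * SECTOR_BYTES / MIB;
}

/* Swap size in whole MiB from a swapon -s page count. */
unsigned long pages_to_mib(unsigned long pages)
{
    return pages * ALPHA_PAGE_BYTES / MIB;
}
```

Under these assumptions, partition b (401408 sectors) and the 25088 swap pages both come to 196 MB, so swapon is reporting the b partition alone; h comes to about 874 MB, as stated, and the root partition a is only 64 MB.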

# /dev/rrz3c:
type: SCSI
disk: rz26
label: 
flags:
bytes/sector: 512
sectors/track: 57
tracks/cylinder: 14
sectors/cylinder: 798
cylinders: 2570
sectors/unit: 2050860
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 164*)
  b:   263168   131072    unused     1024  8192       	# (Cyl.  164*- 494*)
  c:  2050860        0    unused     1024  8192       	# (Cyl.    0 - 2569)
  d:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  e:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  f:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  g:  1656620   394240    4.2BSD     1024  8192    16 	# (Cyl.  494*- 2569)
  h:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)






3363.6. by HYDRA::BRYANT () Tue Mar 25 1997 16:40
Jeff Kenyon & Pat Bryant -

PROBLEMS:
=========
1) call to pthread_cond_init fails with return code 12, not enough space
2) ar fails with message
   /: write failed, file system is full
   ar: error writing archive member contents: Error 0

STATUS:
=======
During the last couple of days you have requested additional information and
offered suggestions.  I have provided the information and tried the
suggestions.  However, I am still having the problems.

MACHINE DESCRIPTION:
====================
1) Model: Alpha 3000, model 300

2) Physical Memory: 64 MB

3) Virtual memory:
(Output from /sbin/sysconfig -q vm)
vm:
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
ubc-nfsloopback = 0
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 32768 <- NOTE: This was increased according to your suggestions
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 1
vm-segment-cache-max = 50
vm-page-lock-count = 0
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1


4) Disks:
   DKA200   2.10Gb   RZ28B   Unix 4.0B <- Default installation
   DKA300   1.05Gb   RZ26    Unix 3.2  <- Works fine

(Output from disklabel -r /dev/rrz2c)
# /dev/rrz2c:
type: SCSI
disk: RZ28B
label: 
flags:
bytes/sector: 512
sectors/track: 99
tracks/cylinder: 16
sectors/cylinder: 1376
cylinders: 2595
sectors/unit: 4110480
rpm: 5411
interleave: 1
trackskew: 13
cylinderskew: 22
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 95*)
  b:   401408   131072      swap                      	# (Cyl.   95*- 386*)
  c:  4110480        0    unused        0     0       	# (Cyl.    0 - 2987*)
  d:  1191936   532480    unused        0     0       	# (Cyl.  386*- 1253*)
  e:  1191936  1724416    unused        0     0       	# (Cyl. 1253*- 2119*)
  f:  1194128  2916352    unused        0     0       	# (Cyl. 2119*- 2987*)
  g:  1787904   532480    4.2BSD     1024  8192    16 	# (Cyl.  386*- 1686*)
  h:  1790096  2320384      swap                      	# (Cyl. 1686*- 2987*)

(Output from disklabel -r /dev/rrz3c)
# /dev/rrz3c:
type: SCSI
disk: rz26
label: 
flags:
bytes/sector: 512
sectors/track: 57
tracks/cylinder: 14
sectors/cylinder: 798
cylinders: 2570
sectors/unit: 2050860
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16 	# (Cyl.    0 - 164*)
  b:   263168   131072    unused     1024  8192       	# (Cyl.  164*- 494*)
  c:  2050860        0    unused     1024  8192       	# (Cyl.    0 - 2569)
  d:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  e:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  f:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)
  g:  1656620   394240    4.2BSD     1024  8192    16 	# (Cyl.  494*- 2569)
  h:   262144    99415    unused     1024  8192       	# (Cyl.  124*- 453*)


5) Swap space:
(Output from swapon -s)
Swap partition /dev/rz2b (default swap):
    Allocated space:        25088 pages (196MB)
    In-use space:               1 pages (  0%)
    Free space:             25087 pages ( 99%)


Total swap allocation:
    Allocated space:        25088 pages (196MB)
    Reserved space:          3677 pages ( 14%)
    In-use space:               1 pages (  0%)
    Available space:        21411 pages ( 85%)

QUESTIONS:
==========
1) What resources are allocated by pthread_cond_init?

2) How do I provide more?

3) How do I move the /tmp directory used by ar to a different location?  I
can temporarily do this by setting the environment variable TMPDIR to
/usr/tmp.  However, I would like a permanent solution.

NOTES:
======
1) When the application is first built with a shared object library, the
error messages from pthread_cond_init do not occur and the application runs
properly.

2) When I increased vm-vpagemax from 16384 to 32768 per your suggestions,
it did not make any difference.

We have a 12-processor system waiting to run tests with this software.
Please let me know as soon as possible when you have any additional
suggestions.

Thanks, Ron Young

3363.7. "Sent Ron the following:" by HYDRA::BRYANT () Wed Mar 26 1997 12:18
According to your disklabel, you don't have another free partition to create a 
/tmp directory.  I don't understand why you don't want ar to use /usr/tmp.  Your 
/usr partition appears to be large enough.  Setting TMPDIR to /usr/tmp works 
with ar, correct?

As far as the ENOMEM error goes, is it possible for you to run sysconfig -q
proc and sysconfig -q vm on both OSes to determine whether there is a gross
difference between the values set on 3.2 and those set on 4.0?  You can dual
boot, correct?  I suspect your problem has to do with one of these settings
not being high enough.  Feel free to send me the 4.0 settings.

Thanks.
Pat Bryant
Alpha Developer Support

3363.8. by HYDRA::BRYANT () Wed Mar 26 1997 15:10
At 12:18 PM 3/26/97 -0500, you wrote:
>According to your disklabel, you don't have another free partition to create a
>/tmp directory.  I don't understand why you don't want ar to use /usr/tmp.  Your
>/usr partition appears to be large enough.  Setting TMPDIR to /usr/tmp works
>with ar, correct?

This does work.  Is there any way to make this permanent instead of having
to set TMPDIR on each login?

Is there any way to glue the g and h partitions together, or is it too late?

Also do I really need to use the h partition for scratch or is the b
partition large enough?  The 3.2 system worked with a swap space the size
of b.

>
>As far as the ENOMEM error goes, is it possible for you to do a sysconfig -q
>proc and sysconfig -q vm on both OSes to determine whether there is a gross
>difference between values set on 3.2 and those set on 4.0?  You can dual boot,
>correct?  I suspect your problem has to do with one of these settings not being
>high enough.  Feel free to send me the 4.0 settings.

Below is the output you requested for both systems:

Unix 4.0B sysconfig -q proc
===========================
proc:
max-proc-per-user = 64
max-threads-per-user = 256
per-proc-stack-size = 2097152
max-per-proc-stack-size = 33554432
per-proc-data-size = 134217728
max-per-proc-data-size = 1073741824
max-per-proc-address-space = 1073741824
per-proc-address-space = 1073741824
autonice = 0
autonice-time = 600
autonice-penalty = 4
open-max-soft = 4096
open-max-hard = 4096
ncallout_alloc_size = 8192
round-robin-switch-rate = 0
round_robin_switch_rate = 0
sched-min-idle = 0
sched_min_idle = 0
give-boost = 1
give_boost = 1
maxusers = 32
task-max = 277
thread-max = 552
num-wait-queues = 64

Unix 3.2B sysconfig -q proc
===========================
max-proc-per-user = 64
max-threads-per-user = 256
per-proc-stack-size = 2097152
max-per-proc-stack-size = 33554432
per-proc-data-size = 134217728
max-per-proc-data-size = 1073741824
max-per-proc-address-space = 1073741824
per-proc-address-space = 1073741824
autonice = 0
open-max-soft = 4096
open-max-hard = 4096
ncallout = 284
ncallout_alloc_size = 8192
round-robin-switch-rate = 0
round_robin_switch_rate = 0
sched-min-idle = 0
sched_min_idle = 0
give-boost = 1
give_boost = 1

Unix 4.0B sysconfig -q vm
=========================
vm:
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
ubc-nfsloopback = 0
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 32768
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 1
vm-segment-cache-max = 50
vm-page-lock-count = 0
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1

Unix 3.2B sysconfig -q vm
=========================
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 16384
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 0
vm-segment-cache-max = 50
vm-nowait-memalloc = 0

I don't know what all these do.  Under proc, 4.0B has the following values
not listed for 3.2

autonice-time = 600
autonice-penalty = 4
maxusers = 32
task-max = 277
thread-max = 552
num-wait-queues = 64

and 3.2 has the following value not listed under 4.0B:

ncallout = 284

The other values under proc seem to be the same.

Under vm, 4.0B has the following values not listed for 3.2

ubc-nfsloopback = 0
vm-page-lock-count = 0
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1

and 3.2 has the following value not listed under 4.0B:

vm-nowait-memalloc = 0

In addition, new-wire-method is 1 under 4.0B and 0 under 3.2, and the value
of vm-vpagemax was set to 32768 under 4.0B and is at the default value of
16384 for 3.2.  Other than that, the values seem to be the same.
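Comparing two sysconfig listings by eye is error-prone; for instance, ubc-nfsloopback appears only in the 4.0B vm listing. A minimal C sketch of a key-by-key comparison, assuming each attribute line has the form "name = value" as in the output above (parse_attr, find_attr and diff_attrs are hypothetical helper names, not Digital UNIX utilities):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split a "name = value" line into name and value; returns 1 on success.
   Header lines such as "vm:" do not match and are skipped by callers. */
static int parse_attr(const char *line, char *name, char *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    return sscanf(line, "%63s = %63s", name, value) == 2;
}

/* Look up `name` in a list of "name = value" lines.  Copies its value
   (at most 63 characters) and returns 1 if present, 0 otherwise. */
static int find_attr(const char *const *lines, size_t n, const char *name,
                     char *value)
{
    char k[64], v[64];
    for (size_t i = 0; i < n; i++) {
        if (parse_attr(lines[i], k, v) && strcmp(k, name) == 0) {
            strcpy(value, v);
            return 1;
        }
    }
    return 0;
}

/* Print attributes that exist in only one listing or whose values differ.
   Returns the number of differences printed. */
static int diff_attrs(const char *label_a, const char *const *a, size_t na,
                      const char *label_b, const char *const *b, size_t nb)
{
    char k[64], va[64], vb[64];
    int diffs = 0;

    for (size_t i = 0; i < na; i++) {
        if (!parse_attr(a[i], k, va))
            continue;
        if (!find_attr(b, nb, k, vb)) {
            printf("only in %s: %s = %s\n", label_a, k, va);
            diffs++;
        } else if (strcmp(va, vb) != 0) {
            printf("%s: %s (%s) vs %s (%s)\n", k, va, label_a, vb, label_b);
            diffs++;
        }
    }
    for (size_t i = 0; i < nb; i++) {
        if (parse_attr(b[i], k, vb) && !find_attr(a, na, k, va)) {
            printf("only in %s: %s = %s\n", label_b, k, vb);
            diffs++;
        }
    }
    return diffs;
}
```

Fed the full vm listings above as arrays of lines, this would flag ubc-nfsloopback, vm-page-lock-count and the gh-* attributes as 4.0B-only, vm-nowait-memalloc as 3.2-only, and differing values for vm-vpagemax and new-wire-method.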

Hope this helps,

Ron Young

+---------------------------------+----------------------------------+
| Ron Young                       | Phone:  (702) 831-4400           |
| Multipath Corporation           | FAX:    (702) 831-4401           |
| P.O. Box 8210                   | E-mail: [email protected]           |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+

3363.9. "followup..." by HYDRA::KENYON (The Foundation of Science...Fiction) Fri Mar 28 1997 09:12

From:	SMTP%"[email protected]" 27-MAR-1997 19:07:07.82
To:	[email protected] ([email protected])
CC:	
Subj:	Re: Have you tried setting limits to unlimited?

Pat -

Jeff Kenyon helped me set up the system so there is enough swap space and a
/tmp file system large enough to hold ar's work files.  Therefore those
issues are resolved.

The remaining issue is getting the threads working properly under 4.0B.
There is still something strange happening which I cannot figure out.  This
application uses FORTRAN 77 for the math and C for all system-related
services, including all pthread calls.  I believe that the libraries being
collected during loading may be in the wrong order.

When I first link the libraries into a shared object as follows, everything
works fine, including the threads.

#!/bin/ksh
# This script produces a shareable object library for FMS:
#
if   [ "$TARGET" = "ev4" ]
then
   BLAS="/usr/opt/XMDLOA331/dxml/libdxml_ev4.a "
elif [ "$TARGET" = "ev5" ]
then
   BLAS="/usr/opt/XMDLOA331/dxml/libdxml_ev5.a "
fi
LIBS1="-lUfor -lfor -lm "
LIBS3="-lpthread -lmach -lexc -lc"
LIBS="$LIBS1$LIBS3"
ld             \
-shared        \
-o fmslib.so   \
-all           \
   fmslib.a    \
-none          \
fmsint.a       \
$BLAS$LIBS     \
-set_version fmslib.51

Then I link the DEMO application using fmslib.so as follows:

f77 -call_shared -o DEMO demo.o \
fmsnoshr.a fmslib.so

However, when I try to link using the object libraries directly, it fails.
I get different behavior depending on whether I use -lpthread -lmach -lexc
-lc or -pthread.  I also get different behavior if I print out a line from
the FORTRAN application before the first C routine is called.  The
following 4 cases illustrate the problems I am having.

CASE1: (Linking with -lpthread -lmach -lexc -lc and no print output)
======
decunix> cat linkdbg
f77 -v -o DEMO demo.o \
fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
-lpthread -lmach -lexc -lc
decunix> ./linkdbg
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/lib/cmplrs/fort/for_main.o demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.73u 1.83s 0:08 41% 0+106k 0+419io 0pf+0w 106stk+17464mem
decunix> ./DEMO
forrtl: severe (41): insufficient virtual memory
decunix>

This case terminates with insufficient virtual memory.

CASE2: (Linking with -pthread and no print output.  NOTE: This seems to
move the order of the pthread libraries in the ld phase to the end.)
======
decunix> cat linkdbg1
f77 -v -o DEMO demo.o \
fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
-pthread
decunix> ./linkdbg1
/usr/bin/cc -v -o DEMO -pthread /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/lib/cmplrs/fort/for_main.o demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -qlUfor_r -lUfor -qlfor_r -lfor -qlFutil_r -lFutil -qlm_4sqrt_r -lm_4sqrt -qlm_r -lm -qlots_r -lots -lpthread -lmach -lexc -lc
/usr/lib/cmplrs/cc/ld:
1.85u 1.79s 0:08 42% 0+100k 0+419io 0pf+0w 100stk+15928mem
decunix> ./DEMO

 ----------------------------------------
 FMS VERSION 5.2-01   Built on  3/26/1997
 FMS is a licensed product of:
 Multipath Corporation
 (702) 831-4400
 http://www.fmslib.com
 ----------------------------------------
 FMS52-d-450-012 IS LICENSED TO:
 MULTIPATH DEMONSTRATION
 ----------------------------------------
 Date = 27-MAR-1997       Time = 14:51:18
 FMS License will expire on.............=30-JUN-1997
 Default MAXMD  parameter.....(R8 words)=   2097152
 Default MAXCPU parameter...............=         1
 Memory for FMS...............(R8 words)=   2097152
fms$_semini:  12 = pthread_cond_init  (68992,0)


 **********************
 * FATAL ERROR IN FMS *
 *  FMS$ERR_SYSTEM    *
 **********************


 IOSTAT PARAMETER =        12

System Error Condition = 12, Not enough space

This gets further, then dies when trying to allocate a condition variable.
(NOTE: many pthread calls made before the one that failed worked OK.)

CASE3: (Linking with printed output and -lpthread -lmach -lexc -lc)
======
decunix> cat linkdbg2
f77 -v -o DEMO demo.o \
inicom.dbg \
fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
-lpthread -lmach -lexc -lc
decunix> ./linkdbg2
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/lib/cmplrs/fort/for_main.o demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.79u 1.95s 0:08 42% 0+105k 0+428io 0pf+0w 105stk+17648mem
decunix> ./DEMO
decunix>

This one just dies without any message.  Depending on factors I do not
understand, I have seen this one run and then die when the first thread was
created.

CASE4: (Printed output and -pthread)
======
decunix> cat linkdbg3
f77 -v -o DEMO demo.o \
inicom.dbg \
fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
-pthread
decunix> ./linkdbg3
/usr/bin/cc -v -o DEMO -pthread /usr/lib/cmplrs/fort/for_main.o -O4 demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/lib/cmplrs/fort/for_main.o demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -qlUfor_r -lUfor -qlfor_r -lfor -qlFutil_r -lFutil -qlm_4sqrt_r -lm_4sqrt -qlm_r -lm -qlots_r -lots -lpthread -lmach -lexc -lc
/usr/lib/cmplrs/cc/ld:
1.91u 1.90s 0:08 42% 0+99k 0+429io 0pf+0w 99stk+16096mem
decunix> ./DEMO
 FMS$_INICOM: Returned from FMS$_COMSHR
 FMS$_INICOM: End of initializing FMSTST

FMS$_INICOM: Output from getrlimit:
 rlim_cur =2147483647
 rlim_max =2147483647


FMS$_INICOM: Output from getrlimit:
 rlim_cur =  60440576
 rlim_max =  60440576

 FMS$_INICOM: End of initializing FMSCOM
 FMS$_INICOM: End of initializing FMSR8
 FMS$_INICOM: End of initializing FMSCHR
 INICOM: IENTER =                    79
 INICOM: NENTER =                     1
 FMS$_INICOM: End of initializing FMSDAT
 INICOM: IENTER =                    79
 INICOM: NENTER =                     1
 FMS$_INICOM: End of initializing FMSDCH
 FMS$_INICOM: Start of initializing FMSFIL
 INICOM: IENTER =                    79
 INICOM: NENTER =                     1
 FMS$_INICOM: End of initializing FMSFIL

 ----------------------------------------
 FMS VERSION 5.2-01   Built on  3/26/1997
 FMS is a licensed product of:
 Multipath Corporation
 (702) 831-4400
 http://www.fmslib.com
 ----------------------------------------


 **********************
 * FATAL ERROR IN FMS *
 *  FMS$ERR_LICENSE   *
 **********************


 SOFTWARE LICENSE VIOLATION.
 ERROR OPENING AUTHORIZATION FILE.
 You must have a file named FMSLIC.52.  This file can be in
 your default directory or in a directory pointed to by the
 environment variable FMS_LICENSE
decunix>

This one shows the printed output, then dies on a FORTRAN OPEN statement
when it tries to open a file that I know exists but that it cannot find.

I don't know what to try next.  I know the problem is not in the code
because the shared object version works.  ANY suggestions you may have will
be appreciated.

Thanks, Ron


3363.10. "Looking for the culprit" by HYDRA::BRYANT () Wed Apr 02 1997 12:54

Ron,

The 'insufficient virtual memory' error you are experiencing comes from the
run-time library (RTL) getting a null pointer back from malloc.  I wonder
whether you are doing an UNFORMATTED OPEN and then performing reads on an
ASCII file.  Files created with FORTRAN UNFORMATTED I/O have a byte count as
the first part of each record, so when you read an ASCII file with
UNFORMATTED I/O, that "byte count" is really text and will be garbage,
possibly a very large number.  Could this be happening to you, and it just
wasn't detected on 3.2?
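To illustrate this hypothesis, the C sketch below reads one length-prefixed record the way an unformatted reader might, but rejects an implausible length before calling malloc. It assumes a 4-byte little-endian byte count at the start of each record (the actual DEC Fortran record layout may carry more control information, such as a trailing count), and MAX_SANE_RECORD is an arbitrary limit chosen for illustration. Read as such a count, the ASCII bytes "Hell" decode to 1819043144, a request of roughly 1.7 GB, which is larger than the 1073741824-byte max-per-proc-address-space in the proc listings above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Arbitrary illustrative ceiling on a believable record length (64 MB). */
#define MAX_SANE_RECORD (64u * 1024u * 1024u)

/* Decode a 4-byte little-endian byte count (Alpha is little-endian). */
static uint32_t record_length(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read one length-prefixed record.  Returns a malloc'd buffer (caller frees)
   and stores its length in *len_out, or returns NULL on EOF, a short read,
   or an implausible byte count such as one decoded from ASCII text. */
static unsigned char *read_record(FILE *fp, uint32_t *len_out)
{
    unsigned char hdr[4];
    unsigned char *buf;
    uint32_t len;

    assert(fp != NULL && len_out != NULL);
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return NULL;
    len = record_length(hdr);
    if (len > MAX_SANE_RECORD) {
        fprintf(stderr, "implausible record length %lu\n", (unsigned long)len);
        return NULL;
    }
    buf = malloc(len > 0 ? len : 1);
    if (buf == NULL || fread(buf, 1, len, fp) != len) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

A reader without the sanity check would pass the decoded value straight to malloc, which is one way a text file could surface as "insufficient virtual memory".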

In any case, do a 

% setenv f77_dump_flag Y

and rerun DEMO.  This will cause all fatal FORRTL errors to invoke abort(),
which produces a core dump.  Then use dbx to determine which routine is
calling malloc.

E-mail this information back to me and let me know the results.  I'm also
interested in whether you were able to increase any of the parameters using
the limit command.

Thanks.
Pat Bryant
3363.11. "Response back" by HYDRA::BRYANT () Wed Apr 02 1997 17:44

Pat -

I set f77_dump_flag to Y as you requested.  The code is failing as soon as it
is trying to perform FORTRAN I/O, which I have reduced down to a print
statement.  This is the first I/O performed in the code.  Below is the output:

decunix> dbx -r ./DEMO

forrtl: severe (41): insufficient virtual memory
thread 0xa signal IOT/Abort trap at >*[nxm_thread_kill, 0x3ff8053eab0]  ret r31, (r26), 1


(dbx) where
>  0 nxm_thread_kill(0x4, 0x140150860, 0x3ff80193d3c, 0x980, 0x14015c018) [0x3ff8053eab0]
   1 pthread_kill(0x3ffc0082590, 0x20, 0x0, 0x0, 0x11fffffb5) [0x3ff8056ed4c]
   2 (unknown)() [0x3ff805756ec]
   3 __tis_raise(0x11fffffb5, 0x3ffc0080310, 0x3ff8010fb04, 0x3ffc0080c50, 0x3ff80159f44) [0x3ff8010fb00]
   4 raise(0x3ff8010fb04, 0x3ffc0080c50, 0x3ff80159f44, 0x3ff80575618, 0x3ff80170a6c) [0x3ff80159f40]
   5 abort(0x3ffc0560c30, 0x3ffc05655d0, 0x3ff80d13180, 0x0, 0x600000000) [0x3ff80170a68]
   6 for__issue_diagnostic(0x29, 0x2, 0x6, 0x11ffff830, 0x0) [0x3ff80d0b614]
   7 for__io_return(0x0, 0x0, 0x0, 0x0, 0x0) [0x3ff80d0baec]
   8 for_write_seq_lis(0x3ffc00802a0, 0x140142a00, 0x11ffffca0, 0x120009fd0, 0x14002f760) [0x3ff80d4b0bc]
   9 fms$_fmsaut(NOWDAT = [1]   2
[2]     4
[3]     1997
, NOWTIM = [1]  10
[2]     47
[3]     0
, SERIAL = 0.0) ["d5/fmsaut.f":4, 0x1200165bc]
  10 fms$_fmsini(0x0, 0x474e414c, 0x400000002, 0xa000007cd, 0x2f) ["d5/fmsini2.f":1951, 0x12001a9c4]
  11 fmsini(0x120016900, 0x120016940, 0x120016980, 0x1200169c0, 0x120016a10) ["d5/fmsini.f":1716, 0x1200166a4]
  12 demo(0x120016980, 0x1200169c0, 0x120016a10, 0x8008460d, 0x1200164e8) ["d5/demo.f":2, 0x12001653c]
  13 main() ["for_main.c":203, 0x1200164e4]
(dbx) quit


decunix> cat fmsaut.f
        SUBROUTINE FMS$_FMSAUT (NOWDAT, NOWTIM, SERIAL)
        INTEGER*4    NOWDAT(3), NOWTIM(3)
        REAL*8       SERIAL
        print *,'Hello'  <--- This is where it fails
        return
        end
decunix>

This is what I am using to link DEMO:

decunix> cat linkdbg
f77 -v -o DEMO demo.o \
fmsaut.dbg \
fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
-lpthread -lmach -lexc -lc

decunix> ./linkdbg
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsaut.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/lib/cmplrs/fort/for_main.o demo.o fmsaut.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.80u 1.90s 0:09 38% 0+107k 0+429io 0pf+0w 107stk+17704mem
decunix>

I also tried setting limits as you requested.  While some of the values
increased, the results were the same:

decunix> limit
cputime         unlimited
filesize        unlimited
datasize        131072 kbytes
stacksize       2048 kbytes
coredumpsize    unlimited
memoryuse       59024 kbytes
descriptors     4096 files
addressspace    1048576 kbytes

decunix> limit datasize unlimited
decunix> limit stacksize unlimited
decunix> limit memoryuse unlimited
decunix> limit descriptors unlimited
decunix> limit addressspace unlimited

decunix> limit
cputime         unlimited
filesize        unlimited
datasize        1048576 kbytes
stacksize       32768 kbytes
coredumpsize    unlimited
memoryuse       58944 kbytes
descriptors     4096 files
addressspace    1048576 kbytes
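The limit commands above can raise each soft limit only as far as the corresponding hard limit: datasize stopped at 1048576 kbytes and stacksize at 32768 kbytes, which match max-per-proc-data-size (1073741824 bytes) and max-per-proc-stack-size (33554432 bytes) in the proc listing earlier in this note, so those values appear to act as the hard limits. A program can do the same for itself with getrlimit/setrlimit (RLIMIT_DATA corresponds to datasize, RLIMIT_STACK to stacksize). A minimal C sketch; raise_soft_limit is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft limit for `resource` (e.g. RLIMIT_DATA or RLIMIT_STACK) to
   its hard limit; an unprivileged process cannot go beyond rlim_max.
   Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit). */
static int raise_soft_limit(int resource, const char *name)
{
    struct rlimit rl;

    assert(name != NULL);
    if (getrlimit(resource, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(resource, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    if (rl.rlim_max == RLIM_INFINITY)
        printf("%-12s unlimited\n", name);
    else
        printf("%-12s %lu kbytes\n", name, (unsigned long)(rl.rlim_max / 1024));
    return 0;
}
```

Raising the soft limit this way changes nothing if the hard limit itself is what the program runs into; in that case the kernel parameters shown by sysconfig are the place to look.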

decunix> dbx -r ./DEMO

forrtl: severe (41): insufficient virtual memory
thread 0xa signal IOT/Abort trap at >*[nxm_thread_kill, 0x3ff8053eab0]  ret r31, (r26), 1
(dbx)

It still seems that it is doing something wrong with FORTRAN I/O, maybe due
to the mixture of FORTRAN and C and the order of the libraries searched
during the ld phase.

Any more suggestions?

Thanks, Ron
3363.12. "Provided input to Ron via CMA note #1520" by HYDRA::BRYANT () Tue Apr 22 1997 08:54

To:[email protected]
cc:
Subject:Problem Status
--------
Ron,

I was able to reproduce the out-of-memory error you are getting.  I queried
Threads Engineering to get some help on this.  Their response was that since
there are no static libraries for the threads code, mixing shared libraries
with static libraries may not work (i.e., it's not supported).  It's not clear
at this point whether this is the source of your problem, but it most likely
is.

At some point in the future Digital will be releasing static libraries for the
threads code.  In the meantime, are you able to work around this by building
with shared libraries?

Also, Engineering was asking about the second argument in the following:

> fms$_fork: 12 = pthread_create(0,4831836840,fms$_io,0)

> What, exactly, are you passing for the second argument here? That big integer
> is, I hope, the address of an attributes object, but your display certainly
> doesn't make that obvious.
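Printed in hexadecimal, 4831836840 is 0x11FFFFAA8, which looks like a user stack address (compare 0x11ffff830 in the dbx trace in note 3363.11), so it is plausibly the address of a local attributes object. A minimal C sketch of creating a thread with an explicitly initialized attributes object and tracing the arguments with %p, which makes pointers unambiguous in a debug printout; io_worker is a hypothetical stand-in for a start routine such as fms$_io:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a start routine such as fms$_io. */
static void *io_worker(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Create and join one thread using an explicitly initialized attributes
   object.  The second argument to pthread_create is a pointer to that
   object (or NULL for defaults), so the trace prints it with %p. */
static int start_io_thread(int *value)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    assert(value != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0) {
        fprintf(stderr, "pthread_create(%p, %p, io_worker, %p)\n",
                (void *)&tid, (void *)&attr, (void *)value);
        rc = pthread_create(&tid, &attr, io_worker, value);
    }
    pthread_attr_destroy(&attr);  /* does not affect an already-created thread */
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```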

I'll wait to hear back from you on this.
Thanks.
Pat
3363.13. by HYDRA::BRYANT () Wed Apr 23 1997 10:59

Pat -

With the help of Bob Morgan, I was able to get this going.  The "fix"
involved the following:
1) On the FORTRAN routines, especially the main routine that executes
first, include the -reentrancy threaded compiler option.  This is
necessary to set the "mode" of the compiler.

2) On the FORTRAN and C routines, especially during the final link step,
use the -pthread option.  This will automatically bring in the correct
libraries.

3) FMS contained code from the fork days to make certain common blocks
shared.  This involved calling mmap to make a couple of page-aligned regions
shared.  For unknown reasons, this seems to confuse a stack or heap so that
when a pthread routine is called, it thinks it is out of shared memory.
Removing these calls seems to have fixed the problem.

4) As a word of caution, under the earlier release (3.2) pthread routines
returned -1 if the call failed, and the error code was obtained from errno.
Under the current release (4.0), a pthread routine returns the error number
itself if there is an error.  Successful calls return 0.
FMS was coded to detect -1 as a failed call, so when the current pthread
calls failed, FMS continued.  This caused lots of false symptoms until it
was corrected.  I would expect that other programmers may have this problem
also.  It might be worth a special porting note as an alert.
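Item 4 is the key porting change: under the 1003.1c interface a pthread routine's return value is the error number itself, so a test such as rc == -1 never fires. The 12 reported for pthread_cond_init in CASE2 above ("Not enough space", i.e. ENOMEM) is an example of such a direct return. A minimal C sketch of a check helper (pthread_failed is a hypothetical name), plus a deterministic failure to exercise it: POSIX requires that relocking an error-checking mutex from the thread that owns it return EDEADLK.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* 1003.1c style: rc is 0 on success or an error number such as EBUSY or
   ENOMEM.  errno is not set, so the old Draft 4 test "rc == -1" misses it.
   Prints a diagnostic and returns 1 if rc reports a failure, 0 otherwise. */
static int pthread_failed(int rc, const char *what)
{
    assert(what != NULL);
    if (rc != 0) {
        fprintf(stderr, "%s: %d = %s\n", what, rc, strerror(rc));
        return 1;
    }
    return 0;
}

/* Lock an error-checking mutex twice from the same thread and return the
   result of the second lock (EDEADLK), or -1 if setup fails. */
static int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_failed(pthread_mutexattr_init(&attr), "pthread_mutexattr_init"))
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (pthread_failed(rc, "mutex setup"))
        return -1;

    rc = pthread_mutex_lock(&m);
    if (!pthread_failed(rc, "pthread_mutex_lock")) {
        rc = pthread_mutex_lock(&m);           /* relock by the owner */
        pthread_failed(rc, "pthread_mutex_lock (relock)");
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return rc;
}
```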

At this point both the shared and non-shared versions of the library seem
to work.
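Item 3 reflects the move from fork() to threads: separate processes need something like mmap to share memory, but all threads in one process already share static storage, so the fork-era sharing code has nothing left to do. A minimal C sketch, with a static array standing in for a Fortran COMMON block (common_block, worker and run_workers are hypothetical names); it does not explain why the mmap calls caused the failure, which remains unknown:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4

/* Stands in for a Fortran COMMON block: ordinary static storage that every
   thread in the process sees, with no mmap(MAP_SHARED) needed. */
static double common_block[NTHREADS];

static void *worker(void *arg)
{
    intptr_t id = (intptr_t)arg;
    common_block[id] = (double)(id * id);   /* each thread fills its own slot */
    return NULL;
}

/* Start NTHREADS workers, then join every thread that was created.
   Returns 0, or the error number from the first failing pthread_create. */
static int run_workers(void)
{
    pthread_t tid[NTHREADS];
    intptr_t created;
    int rc = 0;

    for (created = 0; created < NTHREADS; created++) {
        rc = pthread_create(&tid[created], NULL, worker, (void *)created);
        if (rc != 0)
            break;
    }
    for (intptr_t i = 0; i < created; i++) {
        int jrc = pthread_join(tid[i], NULL);
        assert(jrc == 0);
        (void)jrc;
    }
    return rc;
}
```

After run_workers returns, the main thread reads every slot written by the workers; pthread_join is what guarantees those writes are visible.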

In answer to your most recent question about our debug printout for the
pthread_create call, what is printed for the second argument is an address.

Thanks for your help.

Hope some of this is useful to others porting to 4.0B.

Ron
3363.14. by HYDRA::BRYANT () Wed Apr 23 1997 11:00

Sent him mail asking whether he is linking the same way after making all the
code changes.  Also sent him mail regarding item 4).

Ron,

The threads package on Digital UNIX 3.2 follows a very early draft (Draft 4)
of what has become POSIX threads.  On 4.0, the threads package supports the
final POSIX 1003.1c standard.  This is why you are seeing differences.  In
your documentation set, which resides on the 4.0 CD, Appendix D of the Guide
to DECthreads explains these differences.  For example, regarding error
returns, the document states:

G.1.1 Error Status and Function Returns 

The new DECthreads POSIX 1003.1c interface does not use errno. (Note that
DECthreads still provides a thread-specific errno cell for use by libraries
and application code, but the 1003.1c interface does not write to this
cell.) If an error condition occurs, a pthread routine returns an integer
value indicating the type of error. For example, a call to the Draft 4
implementation of pthread_cond_destroy that returned a -1 and set errno to
EBUSY, now returns EBUSY as the routine return value in the current
implementation. On successful completion, most pthread routines return a
zero.
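For code like FMS that already tests every pthread call for -1, one hypothetical porting aid is a small shim that maps the 1003.1c return value back onto the Draft 4 convention (return -1 and set errno), so the existing checks fire again. This is a sketch, not a DECthreads facility; fixing the checks themselves, as was eventually done in FMS, is the cleaner long-term fix. The example call uses an invalid detach state, for which POSIX specifies EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical shim: convert a 1003.1c return value (0 or an error number)
   into the Draft 4 convention of returning -1 with errno set. */
static int d4_status(int rc)
{
    assert(rc >= 0);          /* 1003.1c routines never return negatives */
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    return 0;
}

/* Legacy-style check around a 1003.1c call that is known to fail. */
static int set_bad_detachstate(void)
{
    pthread_attr_t attr;
    int status, saved_errno;

    if (d4_status(pthread_attr_init(&attr)) == -1)
        return -1;
    /* -1 is not a valid detach state, so this call fails with EINVAL. */
    status = d4_status(pthread_attr_setdetachstate(&attr, -1));
    saved_errno = errno;
    pthread_attr_destroy(&attr);
    errno = saved_errno;      /* keep the errno set by d4_status */
    return status;
}
```

Usage mirrors the old code: if (d4_status(pthread_cond_destroy(&cv)) == -1) then errno holds EBUSY or whatever error the routine returned.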

It may be worth taking a look at the rest of the differences.

Thanks.
Pat Bryant