
Conference hydra::axp-developer

Title:	Alpha Developer Support
Notice:	[email protected], 800-332-4786
Moderator:	HYDRA::SYSTEM

Created:	Mon Jun 06 1994
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	3722
Total number of notes:	11359

3363.0. "Multipath Corporation" by HYDRA::BRYANT () Wed Mar 19 1997 16:36

    Company Name :  Multipath Corporation
    Contact Name :  Ron Young
    Phone        :  (702) 831-4400 
    Fax          :  (702) 831-4401
    Email        :  [email protected]
    Date/Time in :  19-MAR-1997 16:36:07
    Entered by   :  Pat Bryant
    SPE center   :  MRO

    Category     :  UNIX
    OS Version   :  
    System H/W   :  


    Brief Description of Problem:
    -----------------------------

Please contact Bill Desimone (508) 467-2394 of DEC about our account.

Company Name: Multipath Corporation
Customer Code Number: 992344
>From:	US6RMC::"[email protected]" "Ron Young"   18-MAR-1997 19:22:21.57
>To:	<hdlite::axpdeveloper>
>CC:	"Dwight Manley" <nicctr::manley>, "Randy Doering" <wbc::doering>
>Subj:	Porting to DEC UNIX 4.0b
>
>We are currently testing the port of our application on DEC UNIX 4.0B.
>This is an application that has worked properly under version 3.2.
>
>The application performs parallel processing using calls to the pthread
>library.  We have changed these calls where appropriate to match the man
>pages provided with 4.0B.  In fact these calls now match the pthread
>routines we use for IBM and SUN.
>
>When we run the new version under 4.0B, we have observed some "strange"
>behavior.  Although we do not know exactly what the problem is, we believe
>it to be one of the following:
>
>1) Using wrong compiler switches
>
>2) Linking against the wrong libraries
>
>3) Not using the __MB properly to update memory. (Most likely)
>
>1) Compiler switches:
>The compiler switches we are using are:
>
>f77 -r8 -i8 -automatic -tune ev5 -fast -pthread
>
>For C routines not containing the __MB call:
>cc  -tune ev5 -pthread
>
>For C routines containing the __MB call:
>cc  -tune ev5 -pthread -migrate -O4 -assume noaccuracy_sensitive
>
>2) Linking:
>We have options for building the application with and without a shared
>library.  The following lists the scripts used for both cases:
>
>As a shared library:
>   ld           \
>   -shared      \
>   -o fmslib.so \
>   -all         \
>      fmslib.a  \
>   -none        \
>   fmsint.a     \
>   blas.a       \
>   -lpthread -lmach -lexc -lfor -lUfor -lots -lm -lc_r -lc \
>   -set_version fmslib.51
>#
>#  Now link the DEMO_share application using fmslib.so:
>   f77 -call_shared -o DEMO_share   \
>       demo.o                  \
>       fmsnoshr.a fmslib.so    \
>       -lpthread -lmach -lexc -lc
>
>
>Using archives directly:
>   f77 -o DEMO_noshare         \
>       demo.o                  \
>       fmsnoshr.a fmslib.a fmsint.a fmslib.a \
>       blas.a                  \
>       -lpthread -lmach -lexc -lc
>
>3) Using __MB to update memory.
>Our application is written so that whenever a thread changes shared memory
>under the protection of a mutually exclusive lock, it issues the statement
>
>__MB();
>
>before releasing the lock (code is in c).  The function __MB is typed as
>
>void __MB(void);
>
>The c routine containing this instruction is compiled using the -migrate
>switch.  It was our understanding and experience that this was necessary to
>flush and update the cache when a thread changed a value in shared memory
>that might be read later by another thread.  Under 3.2 this seems to work
>properly.  Under 4.0B, using the same procedure, we have observed threads
>reading old values AFTER the __MB instruction has been issued.
>
>Has the procedure for updating cache changed under 4.0B?
>
>
>
>In addition to __MB not working, we observe the following:
>1) When DEMO_share or DEMO_noshare are run, and BEFORE they call
>pthread_create to start threads, a ps -m shows the following:
>
>decunix> ps -m
>  PID TTY      S           TIME CMD
>  614 ttyp1    I        0:02.61 -csh (csh)
> 1040 ttyp1    I  +     0:02.72 ./DEMO_share
>               I        0:00.22
>               I        0:00.00
>               I        0:00.05
>               I        0:02.45
>  651 ttyp2    I  +     0:00.69 -csh (csh)
> 1030 ttyp4    S        0:00.25 -csh (csh)
>decunix>
>
>It would appear that there are 3 additional threads besides DEMO_share.
>These were not present under version 3.2.  What are they?
>
>
>
>2) When we run the non shared version on a workstation, DEMO_noshare, we
>get an error
>
>forrtl: severe (41): insufficient virtual memory
>
>as soon as the application starts.  However if we run it on a larger
>server, it gets past this point.
>
>What system resources do we need to adjust on the workstation?
>
>
>Thanks, Ron Young  
>
>+---------------------------------+----------------------------------+
>| Ron Young                       | Phone:  (702) 831-4400           |
>| Multipath Corporation           | FAX:    (702) 831-4401           |
>| P.O. Box 8210                   | E-mail: [email protected]           |
>| Incline Village, NV 89452-8210  | See Multipath's home page at     |
>| U.S.A.                          | http://www.fmslib.com            |
>+---------------------------------+----------------------------------+
>
>

T.R	Title	User	Personal Name	Date	Lines
3363.1	Response sent...	HYDRA::KENYON	The Foundation of Science...Fiction	`Fri Mar 21 1997 15:37`	237
See answers to his questions after the "**" below.  -jeff

From: HYDRA::AXPDEVELOPER "[email protected]" 19-MAR-1997 16:53:05.75
To:   [email protected]
CC:   KENYON,AXPDEVELOPER
Subj: RE: Problems with V4.0B and threads.

>----------
>From:    Ron Young[SMTP:[email protected]]
>Sent:    Wednesday, March 19, 1997 3:17 PM
>To:      Bill Desimone
>Subject: Porting to DEC UNIX 4.0b
>
>>Date: Tue, 18 Mar 1997 16:10:16 -0800
>>To: DEC/AXP-HELP
>>From: Ron Young <[email protected]>
>>Subject: Porting to DEC UNIX 4.0b
>>Cc: DEC/Manley,DEC/Bench/Doering
>>
>>We are currently testing the port of our application on DEC UNIX 4.0B.
>>This is an application that has worked properly under version 3.2.
>>
>>The application performs parallel processing using calls to the pthread
>>library.  We have changed these calls where appropriate to match the man
>>pages provided with 4.0B.  In fact these calls now match the pthread
>>routines we use for IBM and SUN.
>>
>>When we run the new version under 4.0B, we have observed some "strange"
>>behavior.  Although we do not know exactly what the problem is, we believe
>>it to be one of the following:
>>
>>1) Using wrong compiler switches
>>
>>2) Linking against the wrong libraries
>>
>>3) Not using the __MB properly to update memory. (Most likely)
>>
>>1) Compiler switches:
>>The compiler switches we are using are:
>>
>>f77 -r8 -i8 -automatic -tune ev5 -fast -pthread
>>
>>For C routines not containing the __MB call:
>>cc  -tune ev5 -pthread

** Looks OK, see below, and note you are using "DEC C" (or -migrate
** really).  Why not use -O4 and -assume noaccuracy_sensitive (or just
** -fast) here as well?  We don't always recommend these, but you seem to
** be using them elsewhere.  Your mileage may vary, and it is worth using
** -fast if you can.

>>
>>For C routines containing the __MB call:
>>cc  -tune ev5 -pthread -migrate -O4 -assume noaccuracy_sensitive
>>

** On V4.0, you get the "-migrate" compiler with or without the switch
** (it is the default).  We recommend taking off the -migrate.

>>2) Linking:
>>We have options for building the application with and without a shared
>>library.  The following lists the scripts used for both cases:
>>
>>As a shared library:
>>   ld           \
>>   -shared      \
>>   -o fmslib.so \
>>   -all         \
>>      fmslib.a  \
>>   -none        \
>>   fmsint.a     \
>>   blas.a       \
>>   -lpthread -lmach -lexc -lfor -lUfor -lots -lm -lc_r -lc \
>>   -set_version fmslib.51

** What you have looks generally correct, but we would like to see you use
** the same library order as the compiler does.  Please link an empty
** module with "f77 -pthread -v foo.f -o /dev/null", and note the lib
** order.  For example, we see the FORTRAN stuff before the math stuff
** before OTS etc., and then the threads stuff.  Please do the above, and
** follow it for the libs you need.  One part of big interest may be the
** reentrant (_r) versions of the libraries.  We are not sure which of
** these simply point to the standard library (such as libc_r pointing at
** libc).  Hopefully you get the idea.

>>#
>>#  Now link the DEMO_share application using fmslib.so:
>>   f77 -call_shared -o DEMO_share   \
>>       demo.o                  \
>>       fmsnoshr.a fmslib.so    \
>>       -lpthread -lmach -lexc -lc
>>
>>
>>Using archives directly:
>>   f77 -o DEMO_noshare         \
>>       demo.o                  \
>>       fmsnoshr.a fmslib.a fmsint.a fmslib.a \
>>       blas.a                  \
>>       -lpthread -lmach -lexc -lc
>>

** The above program builds should be linked with -pthread, and not
** -lpthread, etc.  Let the compiler pick the lib order when using the
** compiler to link.  If you use ld, you must specify the order directly.

>>3) Using __MB to update memory.
>>Our application is written so that whenever a thread changes shared memory
>>under the protection of a mutually exclusive lock, it issues the statement
>>
>>__MB();
>>
>>before releasing the lock (code is in c).  The function __MB is typed as
>>
>>void __MB(void);
>>
>>The c routine containing this instruction is compiled using the -migrate
>>switch.  It was our understanding and experience that this was necessary to
>>flush and update the cache when a thread changed a value in shared memory
>>that might be read later by another thread.  Under 3.2 this seems to work
>>properly.  Under 4.0B, using the same procedure, we have observed threads
>>reading old values AFTER the __MB instruction has been issued.
>>

** It sounds like you are doing:
**
**     grab a mutex, modify some data, issue __MB(), then release the mutex
**
** How do the other threads know that you have issued an __MB()?  Threads
** do not have this information; i.e., __MB does not explicitly flush the
** cache.  It only guarantees the order in which items go to memory.
** It may be that on 3.2 it happened to behave this way, however.

>>Has the procedure for updating cache changed under 4.0B?
>>

** We are not sure that this changed.  As noted, __MB() does not guarantee
** that the cache is flushed, but it does guarantee the ordering of writes
** to memory.  Assume thread A does the following:
**
**     v1=1
**     v2=2
**     __MB()
**     v3=3
**
** After thread A has completed ALL of the above instructions, it is
** possible that thread B will see any of the following:
**
**     case1   case2   case3   case4   case5
**     oldv1   oldv1   newv1   newv1   newv1
**     oldv2   newv2   oldv2   newv2   newv2
**     oldv3   oldv3   oldv3   oldv3   newv3
**
** As you can see, only if v3 is "new" can v1 and v2 be guaranteed to be
** "new".  If you could tell us how you are using the data you are locking
** with a mutex, in conjunction with other data, that might help.

>>
>>
>>In addition to __MB not working, we observe the following:
>>1) When DEMO_share or DEMO_noshare are run, and BEFORE they call
>>pthread_create to start threads, a ps -m shows the following:
>>
>>decunix> ps -m
>>  PID TTY      S           TIME CMD
>>  614 ttyp1    I        0:02.61 -csh (csh)
>> 1040 ttyp1    I  +     0:02.72 ./DEMO_share
>>               I        0:00.22
>>               I        0:00.00
>>               I        0:00.05
>>               I        0:02.45
>>  651 ttyp2    I  +     0:00.69 -csh (csh)
>> 1030 ttyp4    S        0:00.25 -csh (csh)
>>decunix>
>>
>>It would appear that there are 3 additional threads besides DEMO_share.
>>These were not present under version 3.2.  What are they?
>>

** Are you just curious, or are you seeing more threads using more
** resources than you expect?  On V4.0 user threads are scheduled on
** kernel threads.  This may be the effect.  Note that user threads can
** migrate to different kernel threads.

>>
>>
>>2) When we run the non shared version on a workstation, DEMO_noshare, we
>>get an error
>>
>>forrtl: severe (41): insufficient virtual memory
>>
>>as soon as the application starts.  However if we run it on a larger
>>server, it gets past this point.
>>

** I expect that the values in /etc/sysconfigtab are set differently
** (although by default these should NOT be different between server and
** workstation).
**
**     per-proc-data-size = 134217728
**     max-per-proc-data-size = 1073741824
**
** You should be able to get around this by doing an "unlimit" in the C
** shell.  The default maximum is 1GB.  If you need to go higher, then you
** must set the above to the desired size, as well as
** 'max-per-proc-address-space' and 'vm-maxvas'.  Randy Doering knows how
** to do this.

>>What system resources do we need to adjust on the workstation?

** See above.

>>
>>
>>Thanks, Ron Young
>>
>+---------------------------------+----------------------------------+
>| Ron Young                       | Phone:  (702) 831-4400           |
>| Multipath Corporation           | FAX:    (702) 831-4401           |
>| P.O. Box 8210                   | E-mail: [email protected]           |
>| Incline Village, NV 89452-8210  | See Multipath's home page at     |
>| U.S.A.                          | http://www.fmslib.com            |
>+---------------------------------+----------------------------------+
3363.2	followup...	HYDRA::KENYON	The Foundation of Science...Fiction	`Fri Mar 21 1997 15:41`	56
From: SMTP%"[email protected]" 21-MAR-1997 12:01:41.98
To:   <[email protected]>
CC:
Subj: 3363 Port to 4.0B

Thank you for your response to our earlier questions.  We have
incorporated your recommendations.

Part of the problem was that we were testing the return code for pthread
calls against -1 to detect an error instead of any value other than zero.
As a result, errors which happened during lock initialization and lock
usage were not detected, and the threads were not synchronized properly.

Now for trying to find out why the errors occurred in the first place.

For reasons unknown to me, the 24th call to pthread_cond_init returns a
condition code of 12, ENOMEM, not enough space.  This is after 26
successful calls to pthread_mutex_init and 23 calls to pthread_cond_init.
If I run the same code which was built under 3.2 on this 4.0B system, I
do not get this error.

Where is the space allocated?
How do I provide more?
Are there any system parameters I can provide which will help understand
this?

I don't know if this is a disk space problem or not, but here is the
output from df:

decunix> df
Filesystem   512-blocks        Used   Available Capacity  Mounted on
/dev/rz2a        126334      108218        5482    96%    /
/proc                 0           0           0   100%    /proc
/dev/rz2g       1732204     1154418      404564    75%    /usr
/dev/rz3g       1602502     1294320      147930    90%    /rz3g
decunix>

The other problem we are having is that the ar command complains about
not enough space.  We have defined the environment variable TMPDIR to
point to /usr/tmp, and this has temporarily fixed the problem.  Where does
ar do its work when TMPDIR is not defined?  Is /dev/rz2a getting too full?
We used the default partitions on an rz28 disk and did a default
installation of UNIX 4.0 (then upgraded to 4.0B).  If rz2a is too full,
how do we increase it?

+---------------------------------+----------------------------------+
| Ron Young                       | Phone:  (702) 831-4400           |
| Multipath Corporation           | FAX:    (702) 831-4401           |
| P.O. Box 8210                   | E-mail: [email protected]           |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+
3363.3	and my response...	HYDRA::KENYON	The Foundation of Science...Fiction	`Fri Mar 21 1997 15:41`	25
From: HYDRA::AXPDEVELOPER "[email protected]" 21-MAR-1997 15:23:53.42
To:   SMTP%"[email protected]"
CC:   AXPDEVELOPER
Subj: RE: 3363 Port to 4.0B

Ron,

Generally one would not leave /tmp directly on the / partition.  You can
either mount an empty partition as /tmp, or make a link from /tmp to
a directory with more space (such as /usr/tmp).  I am not 100% sure, but
the behaviour suggests that ar is doing its work in /tmp.  Either of the
above will fix this.

Please send the output from 'swapon -s' and 'disklabel -r /dev/rrzXa'.  Do
the disklabel for drives rz2c and rz3c (note the use of /dev/rr, and not
/dev/r).

I would not think that the ENOMEM is due to any of this in any case; I am
just interested in helping you get the system set up a bit more properly
in general with regard to /tmp, swap space, etc.  I will ask another
person here about the real issue around where the pthread calls are
allocating their space.  If it is on the stack, I can help increase that,
but I am not sure.

Jeff Kenyon
3363.4	some more	HYDRA::BRYANT		`Mon Mar 24 1997 11:26`	17
>ENOMEM Problem

What is your kernel parameter vm-vpagemax set to?  If it is set to the
default, which is 16384, then this is probably not large enough.  This
virtual memory parameter defines the largest contiguous memory region
that a threaded program can use.  Your 3.2b memory requirements may have
been at the very edge.  Please double this value and rerun your
application as instructed below:

(as root)
    cd /etc
    cat >vpagemax.stanza
    vm:
            vm-vpagemax=32768
    ^D
    sysconfigdb -a -f vpagemax.stanza
    reboot
3363.5		HYDRA::AXPDEVELOPER	Alpha Developer support	`Mon Mar 24 1997 11:38`	147
From: SMTP%"[email protected]" 24-MAR-1997 10:20:21.27
To:   [email protected] ([email protected])
CC:
Subj: Re: 3363 Port to 4.0B

Return-Path: [email protected]
Received: by vaxsim.mro.dec.com (UCX V4.1-12, OpenVMS V6.2 VAX);
	Mon, 24 Mar 1997 10:20:16 -0500
Received: from genoa.tol.net by mail13.digital.com (8.7.5/UNX 1.5/1.0/WV)
	id JAA30870; Mon, 24 Mar 1997 09:48:47 -0500 (EST)
Received: from xl5100dp (toyabe-d66.sierra.net) by genoa.tol.net with SMTP id AA21458
	(5.67b8/IDA-1.5 for <[email protected]>); Mon, 24 Mar 1997 06:49:37 -0800
Message-Id: <[email protected]>
X-Sender: [email protected] (Unverified)
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Mon, 24 Mar 1997 06:48:00 -0800
To: [email protected] ([email protected])
From: Ron Young <[email protected]>
Subject: Re: 3363 Port to 4.0B
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Jeff -

Thanks for helping to set up our system.

Our system is an Alpha 3000 workstation, model 300.  It has 64Mb of memory.
There are 2 disks:

    DKA200   2.10Gb  RZ28B   Unix 4.0B
    DKA300   1.05Gb  RZ36    Unix 3.2

We like to keep at least 2 versions of the operating system (current and
previous) to support customers.  We prefer to have these on different
disks so we can easily switch with a reboot.

As you can see, the default installation of 4.0B on RZ28B left the h
partition (874 Mb) unused.

Below is the output you requested:

Swap partition /dev/rz2b (default swap):
    Allocated space:        25088 pages (196MB)
    In-use space:               1 pages (  0%)
    Free space:             25087 pages ( 99%)

Total swap allocation:
    Allocated space:        25088 pages (196MB)
    Reserved space:          3677 pages ( 14%)
    In-use space:               1 pages (  0%)
    Available space:        21411 pages ( 85%)

# /dev/rrz2c:
type: SCSI
disk: RZ28B
label:
flags:
bytes/sector: 512
sectors/track: 99
tracks/cylinder: 16
sectors/cylinder: 1376
cylinders: 2595
sectors/unit: 4110480
rpm: 5411
interleave: 1
trackskew: 13
cylinderskew: 22
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 95)
  b:   401408   131072      swap                        # (Cyl.   95- 386)
  c:  4110480        0    unused        0     0         # (Cyl.    0 - 2987)
  d:  1191936   532480    unused        0     0         # (Cyl.  386- 1253)
  e:  1191936  1724416    unused        0     0         # (Cyl. 1253- 2119)
  f:  1194128  2916352    unused        0     0         # (Cyl. 2119- 2987)
  g:  1787904   532480    4.2BSD     1024  8192    16   # (Cyl.  386- 1686)
  h:  1790096  2320384      swap                        # (Cyl. 1686- 2987)

# /dev/rrz3c:
type: SCSI
disk: rz26
label:
flags:
bytes/sector: 512
sectors/track: 57
tracks/cylinder: 14
sectors/cylinder: 798
cylinders: 2570
sectors/unit: 2050860
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 164)
  b:   263168   131072    unused     1024  8192         # (Cyl.  164- 494)
  c:  2050860        0    unused     1024  8192         # (Cyl.    0 - 2569)
  d:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  e:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  f:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  g:  1656620   394240    4.2BSD     1024  8192    16   # (Cyl.  494- 2569)
  h:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)

At 03:09 PM 3/21/97 -0500, you wrote:
>Ron,
>
>Generally one would not leave /tmp directly on the / partition.  You can
>either mount an empty partition as /tmp, or make a link from /tmp to
>a directory with more space (such as /usr/tmp).  I am not 100% sure, but
>the behaviour suggests that ar is doing its work in /tmp.  Either of the
>above will fix this.
>
>Please send the output from 'swapon -s', 'disklabel -r /dev/rrzXa'.  Do
>the disklabel for drives rz2c and rz3c (note the use of /dev/rr, and not
>/dev/r.
>
>I would not think that the ENOMEM is due to any of this in anycase, I am
>just interested in helping you get the system set up a bit more properly
>in general with regard to /tmp, swap space, etc.  I will ask another
>person here about the real issue around where the pthread calls are
>allocating their space.  If it is on the stack, I can help increase that,
>but I am not sure.
>
>Jeff Kenyon
>
>
+---------------------------------+----------------------------------+
| Ron Young                       | Phone:  (702) 831-4400           |
| Multipath Corporation           | FAX:    (702) 831-4401           |
| P.O. Box 8210                   | E-mail: [email protected]           |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+
3363.6		HYDRA::BRYANT		`Tue Mar 25 1997 16:40`	184
Jeff Kenyon & Pat Bryant -

PROBLEMS:
=========
1) call to pthread_cond_init fails with return code 12, not enough space

2) ar fails with message
     /: write failed, file system is full
     ar: error writing archive member contents: Error 0

STATUS:
=======
During the last couple of days you have requested additional information
and offered suggestions.  I have provided the information and tried the
suggestions.  However I am still having the problems.

MACHINE DESCRIPTION:
====================
1) Model:  Alpha 3000, model 300

2) Physical Memory:  64Mb

3) Virtual memory:
   (Output from /sbin/sysconfig -q vm)
vm:
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
ubc-nfsloopback = 0
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 32768  <- NOTE: This was increased according to your suggestions
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 1
vm-segment-cache-max = 50
vm-page-lock-count = 0
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1

4) Disks:
    DKA200   2.10Gb  RZ28B   Unix 4.0B  <- Default installation
    DKA300   1.05Gb  RZ36    Unix 3.2   <- Works fine

   (Output from disklabel -r /dev/rrz2c)
# /dev/rrz2c:
type: SCSI
disk: RZ28B
label:
flags:
bytes/sector: 512
sectors/track: 99
tracks/cylinder: 16
sectors/cylinder: 1376
cylinders: 2595
sectors/unit: 4110480
rpm: 5411
interleave: 1
trackskew: 13
cylinderskew: 22
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 95)
  b:   401408   131072      swap                        # (Cyl.   95- 386)
  c:  4110480        0    unused        0     0         # (Cyl.    0 - 2987)
  d:  1191936   532480    unused        0     0         # (Cyl.  386- 1253)
  e:  1191936  1724416    unused        0     0         # (Cyl. 1253- 2119)
  f:  1194128  2916352    unused        0     0         # (Cyl. 2119- 2987)
  g:  1787904   532480    4.2BSD     1024  8192    16   # (Cyl.  386- 1686)
  h:  1790096  2320384      swap                        # (Cyl. 1686- 2987)

   (Output from disklabel -r /dev/rrz3c)
# /dev/rrz3c:
type: SCSI
disk: rz26
label:
flags:
bytes/sector: 512
sectors/track: 57
tracks/cylinder: 14
sectors/cylinder: 798
cylinders: 2570
sectors/unit: 2050860
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize   cpg]
  a:   131072        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 164)
  b:   263168   131072    unused     1024  8192         # (Cyl.  164- 494)
  c:  2050860        0    unused     1024  8192         # (Cyl.    0 - 2569)
  d:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  e:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  f:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)
  g:  1656620   394240    4.2BSD     1024  8192    16   # (Cyl.  494- 2569)
  h:   262144    99415    unused     1024  8192         # (Cyl.  124- 453)

5) Swap space:
   (Output from swapon -s)
Swap partition /dev/rz2b (default swap):
    Allocated space:        25088 pages (196MB)
    In-use space:               1 pages (  0%)
    Free space:             25087 pages ( 99%)

Total swap allocation:
    Allocated space:        25088 pages (196MB)
    Reserved space:          3677 pages ( 14%)
    In-use space:               1 pages (  0%)
    Available space:        21411 pages ( 85%)

QUESTIONS:
==========
1) What resources are allocated by pthread_cond_init?

2) How do I provide more?

3) How do I move the /tmp directory used by ar to a different location?
   I can temporarily do this by defining the environment variable TMPDIR
   to /usr/tmp.  However I would like a permanent solution.

NOTES:
======
1) When the application is first built with a shared object library, the
   error messages from pthread_cond_init do not occur and the application
   runs properly.

2) When I increased vm-vpagemax from 16384 to 32768 per your suggestions,
   it did not make any difference.

We have a 12 processor system waiting to run tests with this software.
Please let me know as soon as possible when you have any additional
suggestions.

Thanks, Ron Young

+---------------------------------+----------------------------------+
| Ron Young                       | Phone:  (702) 831-4400           |
| Multipath Corporation           | FAX:    (702) 831-4401           |
| P.O. Box 8210                   | E-mail: [email protected]           |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+
3363.7	Sent Ron the following:	HYDRA::BRYANT		`Wed Mar 26 1997 12:18`	14
According to your disklabel, you don't have another free partition on which
to create a /tmp directory.  I don't understand why you don't want ar to
use /usr/tmp.  Your /usr partition appears to be large enough.  Setting
TMPDIR to /usr/tmp works with ar, correct?

As far as the ENOMEM error goes, is it possible for you to do a
sysconfig -q proc and a sysconfig -q vm on both OSes to determine if there
is a gross difference between the values set on 3.2 and those set on 4.0?
You can dual boot, correct?  I suspect your problem has to do with one of
these settings not being high enough.  Feel free to send me the 4.0
settings.

Thanks.

Pat Bryant
Alpha Developer Support
3363.8		HYDRA::BRYANT		`Wed Mar 26 1997 15:10`	212
At 12:18 PM 3/26/97 -0500, you wrote:
>According to your disklabel, you don't have another free partition to create a
>/tmp directory.  I don't understand why you don't want ar to use /usr/tmp.  Your
>/usr partition appears to be large enough.  Setting TMPDIR to /usr/tmp works
>with ar, correct?

This does work.  Is there any way to make this permanent instead of having
to set TMPDIR on each login?

Is there any way to glue the g and h partitions together, or is it too
late?  Also, do I really need to use the h partition for scratch, or is the
b partition large enough?  The 3.2 system worked with a swap space the size
of b.

>
>As far as the ENOMEM error goes, is it possible for you to do a sysconfig -q
>proc and sysconfig -q vm on both OS'es to determine is there is a gross
>difference between values set on 3.2 and those set on 4.0.  You can dual boot,
>correct?  I suspect your problem has to do with one of these settings not being
>high enough.  Feel free to send me the 4.0 settings.

Below is the output you requested for both systems:

Unix 4.0B sysconfig -q proc
===========================
proc:
max-proc-per-user = 64
max-threads-per-user = 256
per-proc-stack-size = 2097152
max-per-proc-stack-size = 33554432
per-proc-data-size = 134217728
max-per-proc-data-size = 1073741824
max-per-proc-address-space = 1073741824
per-proc-address-space = 1073741824
autonice = 0
autonice-time = 600
autonice-penalty = 4
open-max-soft = 4096
open-max-hard = 4096
ncallout_alloc_size = 8192
round-robin-switch-rate = 0
round_robin_switch_rate = 0
sched-min-idle = 0
sched_min_idle = 0
give-boost = 1
give_boost = 1
maxusers = 32
task-max = 277
thread-max = 552
num-wait-queues = 64

Unix 3.2B sysconfig -q proc
===========================
max-proc-per-user = 64
max-threads-per-user = 256
per-proc-stack-size = 2097152
max-per-proc-stack-size = 33554432
per-proc-data-size = 134217728
max-per-proc-data-size = 1073741824
max-per-proc-address-space = 1073741824
per-proc-address-space = 1073741824
autonice = 0
open-max-soft = 4096
open-max-hard = 4096
ncallout = 284
ncallout_alloc_size = 8192
round-robin-switch-rate = 0
round_robin_switch_rate = 0
sched-min-idle = 0
sched_min_idle = 0
give-boost = 1
give_boost = 1

Unix 4.0B sysconfig -q vm
=========================
vm:
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
ubc-nfsloopback = 0
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 32768
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 1
vm-segment-cache-max = 50
vm-page-lock-count = 0
gh-chunks = 0
gh-min-seg-size = 8388608
gh-fail-if-no-mem = 1

Unix 3.2B sysconfig -q vm
=========================
ubc-minpercent = 10
ubc-maxpercent = 100
ubc-borrowpercent = 20
ubc-maxdirtywrites = 5
vm-max-wrpgio-kluster = 32768
vm-max-rdpgio-kluster = 16384
vm-cowfaults = 4
vm-mapentries = 200
vm-maxvas = 1073741824
vm-maxwire = 16777216
vm-heappercent = 7
vm-vpagemax = 16384
vm-segmentation = 1
vm-ubcpagesteal = 24
vm-ubcdirtypercent = 10
vm-ubcseqstartpercent = 50
vm-ubcseqpercent = 10
vm-csubmapsize = 1048576
vm-ubcbuffers = 256
vm-syncswapbuffers = 128
vm-asyncswapbuffers = 4
vm-clustermap = 1048576
vm-clustersize = 65536
vm-zone_size = 0
vm-kentry_zone_size = 16777216
vm-syswiredpercent = 80
vm-inswappedmin = 1
vm-page-free-target = 128
vm-page-free-min = 20
vm-page-free-reserved = 10
vm-page-free-optimal = 74
vm-page-prewrite-target = 256
dump-user-pte-pages = 0
kernel-stack-guard-pages = 1
vm-min-kernel-address = 18446744071562067968
contig-malloc-percent = 20
vm-aggressive-swap = 0
new-wire-method = 0
vm-segment-cache-max = 50
vm-nowait-memalloc = 0

I don't know what all these do.

Under proc, 4.0B has the following values not listed for 3.2:

    autonice-time = 600
    autonice-penalty = 4
    maxusers = 32
    task-max = 277
    thread-max = 552
    num-wait-queues = 64

and 3.2 has the following value not listed under 4.0B:

    ncallout = 284

The other values under proc seem to be the same.

Under vm, 4.0B has the following values not listed for 3.2:

    ubc-nfsloopback = 0
    vm-page-lock-count = 0
    gh-chunks = 0
    gh-min-seg-size = 8388608
    gh-fail-if-no-mem = 1

and 3.2 has the following value not listed under 4.0B:

    vm-nowait-memalloc = 0

In addition, new-wire-method is 1 under 4.0B and 0 under 3.2, and the
value of vm-vpagemax was set to 32768 under 4.0B and is at the default
value of 16384 for 3.2.  Other than that, the values seem to be the same.

Hope this helps,
Ron Young

+---------------------------------+----------------------------------+
| Ron Young                       | Phone:  (702) 831-4400           |
| Multipath Corporation           | FAX:    (702) 831-4401           |
| P.O. Box 8210                   | E-mail: [email protected]           |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+
3363.9	followup...	HYDRA::KENYON	The Foundation of Science...Fiction	`Fri Mar 28 1997 09:12`	251
	From: SMTP%"[email protected]" 27-MAR-1997 19:07:07.82
To: [email protected] ([email protected])
CC:
Subj: Re: Have you tried setting limits to unlimited?

Pat -

Jeff Kenyon helped me set up the system so there is enough swap space and
a /tmp file large enough to hold the work of ar. Therefore those issues are
resolved.

The remaining issue is trying to get the threads working properly under 4.0B.
There is still something strange happening which I cannot figure out.

This application uses FORTRAN 77 for the math and C for all system related
services, including all pthread calls. I believe that the libraries being
collected during loading may be in the wrong order.

When I link first using a shared object as follows, it works fine, including
the threads.

#!/bin/ksh
# This script produces a shareable object library for FMS:
#
if [ "$TARGET" = "ev4" ]
then
   BLAS="/usr/opt/XMDLOA331/dxml/libdxml_ev4.a "
elif [ "$TARGET" = "ev5" ]
then
   BLAS="/usr/opt/XMDLOA331/dxml/libdxml_ev5.a "
fi
LIBS1="-lUfor -lfor -lm "
LIBS3="-lpthread -lmach -lexc -lc"
LIBS="$LIBS1$LIBS3"
ld \
   -shared \
   -o fmslib.so \
   -all \
   fmslib.a \
   -none \
   fmsint.a \
   $BLAS$LIBS \
   -set_version fmslib.51

Then I link the DEMO application using fmslib.so as follows:

f77 -call_shared -o DEMO demo.o \
   fmsnoshr.a fmslib.so

However, when I try to link using the object libraries directly, it fails.
I get different behavior depending on whether I use
(-lpthread -lmach -lexc -lc) or -pthread. I also get different behavior if I
print out a line from the FORTRAN application before the first C routine is
called.

The following 4 cases illustrate the problems I am having.

CASE1: (Linking with -lpthread -lmach -lexc -lc and no print output)
======

decunix> cat linkdbg
f77 -v -o DEMO demo.o \
   fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
   -lpthread -lmach -lexc -lc
decunix> ./linkdbg
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsnoshr.a fms
lib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -
lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /us
r/lib/cmplrs/fort/for_main.o demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a
 -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.73u 1.83s 0:08 41% 0+106k 0+419io 0pf+0w 106stk+17464mem
decunix> ./DEMO
forrtl: severe (41): insufficient virtual memory
decunix>

This case terminates with insufficient virtual memory.

CASE2: (Linking with -pthread and no print output. NOTE: This seems to move
====== the order of the pthread libraries in the ld phase to the end.)

decunix> cat linkdbg1
f77 -v -o DEMO demo.o \
   fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
   -pthread
decunix> ./linkdbg1
/usr/bin/cc -v -o DEMO -pthread /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsno
shr.a fmslib.a fmsint.a fmslib.a blas.a -lUfor -lfor -lFutil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /us
r/lib/cmplrs/fort/for_main.o demo.o fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a
 -qlUfor_r -lUfor -qlfor_r -lfor -qlFutil_r -lFutil -qlm_4sqrt_r -lm_4sqrt -qlm_
r -lm -qlots_r -lots -lpthread -lmach -lexc -lc
/usr/lib/cmplrs/cc/ld:
1.85u 1.79s 0:08 42% 0+100k 0+419io 0pf+0w 100stk+15928mem
decunix> ./DEMO
----------------------------------------
FMS VERSION 5.2-01
Built on 3/26/1997
FMS is a licensed product of:
Multipath Corporation
(702) 831-4400
http://www.fmslib.com
----------------------------------------
FMS52-d-450-012 IS LICENSED TO:
MULTIPATH DEMONSTRATION
----------------------------------------
Date = 27-MAR-1997 Time = 14:51:18
FMS License will expire on.............=30-JUN-1997
Default MAXMD parameter.....(R8 words)= 2097152
Default MAXCPU parameter...............= 1
Memory for FMS...............(R8 words)= 2097152
fms$_semini: 12 = pthread_cond_init (68992,0)
**********************
* FATAL ERROR IN FMS *
* FMS$ERR_SYSTEM     *
**********************
IOSTAT PARAMETER = 12
System Error Condition = 12, Not enough space

This gets further, then dies when trying to allocate a condition variable.
(NOTE: a lot of pthread calls made before this failing one worked OK.)

CASE3: (Linking with printed output and -lpthread -lmach -lexc -lc)
======

decunix> cat linkdbg2
f77 -v -o DEMO demo.o \
   inicom.dbg \
   fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
   -lpthread -lmach -lexc -lc
decunix> ./linkdbg2
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o inicom.dbg fmsno
shr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -l
Futil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/
lib/cmplrs/fort/for_main.o demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a
blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.79u 1.95s 0:08 42% 0+105k 0+428io 0pf+0w 105stk+17648mem
decunix> ./DEMO
decunix>

This one just dies without any message. Depending on factors I do not
understand, I have seen this one run and then die when the first thread
was created.

CASE4: (Printed output and -pthread)
======

decunix> cat linkdbg3
f77 -v -o DEMO demo.o \
   inicom.dbg \
   fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
   -pthread
decunix> ./linkdbg3
/usr/bin/cc -v -o DEMO -pthread /usr/lib/cmplrs/fort/for_main.o -O4 demo.o inicom.
dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a -lUfor -lfor -lFutil -lm_4sqrt -l
m -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/
lib/cmplrs/fort/for_main.o demo.o inicom.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a
blas.a -qlUfor_r -lUfor -qlfor_r -lfor -qlFutil_r -lFutil -qlm_4sqrt_r -lm_4sqrt
-qlm_r -lm -qlots_r -lots -lpthread -lmach -lexc -lc
/usr/lib/cmplrs/cc/ld:
1.91u 1.90s 0:08 42% 0+99k 0+429io 0pf+0w 99stk+16096mem
decunix> ./DEMO
FMS$_INICOM: Returned from FMS$_COMSHR
FMS$_INICOM: End of initializing FMSTST
FMS$_INICOM: Output from getrlimit:
rlim_cur =2147483647
rlim_max =2147483647
FMS$_INICOM: Output from getrlimit:
rlim_cur = 60440576
rlim_max = 60440576
FMS$_INICOM: End of initializing FMSCOM
FMS$_INICOM: End of initializing FMSR8
FMS$_INICOM: End of initializing FMSCHR
INICOM: IENTER = 79
INICOM: NENTER = 1
FMS$_INICOM: End of initializing FMSDAT
INICOM: IENTER = 79
INICOM: NENTER = 1
FMS$_INICOM: End of initializing FMSDCH
FMS$_INICOM: Start of initializing FMSFIL
INICOM: IENTER = 79
INICOM: NENTER = 1
FMS$_INICOM: End of initializing FMSFIL
----------------------------------------
FMS VERSION 5.2-01
Built on 3/26/1997
FMS is a licensed product of:
Multipath Corporation
(702) 831-4400
http://www.fmslib.com
----------------------------------------
**********************
* FATAL ERROR IN FMS *
* FMS$ERR_LICENSE    *
**********************
SOFTWARE LICENSE VIOLATION.
ERROR OPENING AUTHORIZATION FILE.
You must have a file named FMSLIC.52.
This file can be in your default directory or in a directory
pointed to by the environment variable FMS_LICENSE
decunix>

This one shows the printed output, then dies on a FORTRAN OPEN statement when
it tries to open a file (which I know exists but it can't find).

I don't know what to try next. I know the problem is not in the code because
the shared object version works.

ANY suggestions you may have will be appreciated.

Thanks,
Ron

+---------------------------------+----------------------------------+
| Ron Young                       | Phone: (702) 831-4400            |
| Multipath Corporation           | FAX: (702) 831-4401              |
| P.O. Box 8210                   | E-mail: [email protected]        |
| Incline Village, NV 89452-8210  | See Multipath's home page at     |
| U.S.A.                          | http://www.fmslib.com            |
+---------------------------------+----------------------------------+
3363.10	Looking for the culprit	HYDRA::BRYANT		`Wed Apr 02 1997 11:54`	23
	Ron,

The 'insufficient virtual memory' error you are experiencing is coming from
RTL malloc returning a zero. I wonder if you are doing an UNFORMATTED OPEN
and performing reads on an ASCII file. Files created with FORTRAN UNFORMATTED
I/O have a byte count as the first part of each record, so when you try to
read an ASCII file with UNFORMATTED I/O, the byte count will be garbage or
possibly a very large number. Could this be happening to you and it just
wasn't detected on 3.0?

In any case, do a

%setenv f77_dump_flag Y

and rerun DEMO. This will cause all fatal FORRTL errors to invoke abort(),
which produces a core dump. Then use dbx to determine which routine is
calling malloc. E-mail this information back and let me know the results.

I'm also interested in whether you were able to increase any of the
parameters using the limit command.

Thanks.

Pat Bryant
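
To illustrate the byte-count point above, here is a minimal C sketch of what happens when the first bytes of an ASCII file are interpreted as a record length. It assumes a 4-byte little-endian length prefix, which is an illustrative assumption; the note only says that each record starts with "a byte count". The function name `leading_record_length` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the first four bytes of a buffer as a
 * little-endian record length. The 4-byte width is an assumption made
 * for illustration; the note above only mentions "a byte count". */
uint32_t leading_record_length(const unsigned char *buf, size_t len)
{
    if (len < 4) {
        return 0;  /* too short to contain a length prefix */
    }
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Fed the text "Hello", the bytes 'H', 'e', 'l', 'l' decode to 0x6C6C6548, a request for roughly 1.8 GB. A buffer allocation of that size would plausibly fail and surface as an out-of-memory error.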
3363.11	Response back	HYDRA::BRYANT		`Wed Apr 02 1997 16:44`	132
	Pat -

I set f77_dump_flag Y as you requested. The code is failing as soon as it is
trying to perform FORTRAN I/O, which I have reduced down to a print statement.
This is the first I/O performed in the code. Below is the output:

decunix> dbx -r ./DEMO
forrtl: severe (41): insufficient virtual memory
thread 0xa signal IOT/Abort trap at >*[nxm_thread_kill, 0x3ff8053eab0] ret r3
1, (r26), 1
(dbx) where
>  0 nxm_thread_kill(0x4, 0x140150860, 0x3ff80193d3c, 0x980, 0x14015c018) [0x3ff80
53eab0]
   1 pthread_kill(0x3ffc0082590, 0x20, 0x0, 0x0, 0x11fffffb5) [0x3ff8056ed4c]
   2 (unknown)() [0x3ff805756ec]
   3 __tis_raise(0x11fffffb5, 0x3ffc0080310, 0x3ff8010fb04, 0x3ffc0080c50, 0x3ff80
159f44) [0x3ff8010fb00]
   4 raise(0x3ff8010fb04, 0x3ffc0080c50, 0x3ff80159f44, 0x3ff80575618, 0x3ff80170a
6c) [0x3ff80159f40]
   5 abort(0x3ffc0560c30, 0x3ffc05655d0, 0x3ff80d13180, 0x0, 0x600000000) [0x3ff80
170a68]
   6 for__issue_diagnostic(0x29, 0x2, 0x6, 0x11ffff830, 0x0) [0x3ff80d0b614]
   7 for__io_return(0x0, 0x0, 0x0, 0x0, 0x0) [0x3ff80d0baec]
   8 for_write_seq_lis(0x3ffc00802a0, 0x140142a00, 0x11ffffca0, 0x120009fd0, 0x140
02f760) [0x3ff80d4b0bc]
   9 fms$_fmsaut(NOWDAT = [1] 2 [2] 4 [3] 1997 , NOWTIM = [1] 10 [2] 47 [3] 0 , SERIAL = 0.0) ["d5/fmsaut.f":4, 0x1200165bc]
  10 fms$_fmsini(0x0, 0x474e414c, 0x400000002, 0xa000007cd, 0x2f) ["d5/fmsini2.f":
1951, 0x12001a9c4]
  11 fmsini(0x120016900, 0x120016940, 0x120016980, 0x1200169c0, 0x120016a10) ["d5/
fmsini.f":1716, 0x1200166a4]
  12 demo(0x120016980, 0x1200169c0, 0x120016a10, 0x8008460d, 0x1200164e8) ["d5/dem
o.f":2, 0x12001653c]
  13 main() ["for_main.c":203, 0x1200164e4]
(dbx) quit
decunix> cat fmsaut.f
      SUBROUTINE FMS$_FMSAUT (NOWDAT, NOWTIM, SERIAL)
      INTEGER*4 NOWDAT(3), NOWTIM(3)
      REAL*8 SERIAL
      print *,'Hello'        <--- This is where it fails
      return
      end
decunix>

This is what I am using to link DEMO:

decunix> cat linkdbg
f77 -v -o DEMO demo.o \
   fmsaut.dbg \
   fmsnoshr.a fmslib.a fmsint.a fmslib.a blas.a \
   -lpthread -lmach -lexc -lc
decunix> ./linkdbg
/usr/bin/cc -v -o DEMO /usr/lib/cmplrs/fort/for_main.o -O4 demo.o fmsaut.dbg fmsno
shr.a fmslib.a fmsint.a fmslib.a blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -l
Futil -lm_4sqrt -lm -lots
/usr/lib/cmplrs/cc/ld -o DEMO -g0 -O4 -call_shared /usr/lib/cmplrs/cc/crt0.o /usr/
lib/cmplrs/fort/for_main.o demo.o fmsaut.dbg fmsnoshr.a fmslib.a fmsint.a fmslib.a
blas.a -lpthread -lmach -lexc -lc -lUfor -lfor -lFutil -lm_4sqrt -lm -lots -lc
/usr/lib/cmplrs/cc/ld:
1.80u 1.90s 0:09 38% 0+107k 0+429io 0pf+0w 107stk+17704mem
decunix>

I also tried setting limits as you requested. While some of the values
increased, the results were the same:

decunix> limit
cputime         unlimited
filesize        unlimited
datasize        131072 kbytes
stacksize       2048 kbytes
coredumpsize    unlimited
memoryuse       59024 kbytes
descriptors     4096 files
addressspace    1048576 kbytes
decunix> limit datasize unlimited
decunix> limit stacksize unlimited
decunix> limit memoryuse unlimited
decunix> limit descriptors unlimited
decunix> limit addressspace unlimited
decunix> limit
cputime         unlimited
filesize        unlimited
datasize        1048576 kbytes
stacksize       32768 kbytes
coredumpsize    unlimited
memoryuse       58944 kbytes
descriptors     4096 files
addressspace    1048576 kbytes
decunix> dbx -r ./DEMO
forrtl: severe (41): insufficient virtual memory
thread 0xa signal IOT/Abort trap at >*[nxm_thread_kill, 0x3ff8053eab0] ret r3
1, (r26), 1
(dbx)

It still seems that it is doing something wrong with FORTRAN I/O, maybe due
to the mixture of FORTRAN and C and the order of the libraries searched
during the ld phase.

Any more suggestions?

Thanks,
Ron
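
The limit commands in the transcript above raise each soft limit up to a ceiling. The same thing can be done from inside a program with getrlimit and setrlimit, which the FMS output in an earlier note shows the application already queries. A minimal sketch follows; the helper name `raise_soft_limit` is hypothetical and not part of FMS.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft limit for `resource` to its hard
 * limit. Returns 0 on success, -1 on failure (errno is set by
 * getrlimit or setrlimit). */
int raise_soft_limit(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* soft limit may go up to the hard limit */
    return setrlimit(resource, &rl);
}
```

Calling `raise_soft_limit(RLIMIT_DATA)` early in main would correspond to the `limit datasize unlimited` step. As the transcript shows, raising limits alone did not change the failure here.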
3363.12	Provided input to Ron via CMA note #1520	HYDRA::BRYANT		`Tue Apr 22 1997 07:54`	27
	To: [email protected]
cc:
Subject: Problem Status
--------
Ron,

I was able to reproduce the out of memory error you are getting. I queried
Threads Engineering to get some help on this. Their response was that since
there are no shared libraries for the threads code, mixing shared libraries
with static libraries may not work (i.e. it's not supported). It's not clear
whether this is the source of your problem at this point, but most likely it
is. At some point in the future Digital will be releasing static libraries
for the threads code. In the meantime, are you able to work around this by
building via shared?

Also, Engineering was asking about the second argument in the following:

> fms$_fork: 12 = pthread_create(0,4831836840,fms$_io,0)
> What, exactly, are you passing for the second argument here? That big integer
> is, I hope, the address of an attributes object, but your display certainly
> doesn't make that obvious.

I'll wait to hear back from you on this.

Thanks.

Pat
3363.13		HYDRA::BRYANT		`Wed Apr 23 1997 09:59`	39
	Pat -

With the help of Bob Morgan, I was able to get this going. The "fix" involved
the following:

1) On the FORTRAN routines, especially the main routine that executes first,
   include the -reentrancy threaded compiler directive. This is necessary to
   set the "mode" of the compiler.

2) On the FORTRAN and C routines, especially during the final link step, use
   the -pthread directive. This will automatically bring in the correct
   libraries.

3) FMS contained code from the fork days to make certain common blocks
   shared. This involved calling mmap to make a couple page aligned regions
   shared. For unknown reasons, this seems to confuse a stack or heap so that
   when a pthread routine is called, it thinks it is out of shared memory.
   Removing these calls seems to have fixed the problem.

4) As a word of caution, earlier (3.2) calls to pthread routines returned a
   -1 if the call failed. The error message was obtained from errno. Under
   the current release (4.0), pthread returns the errno value if there is an
   error. Successful calls return a 0. FMS was coded to detect -1 as a
   failed code. When the current pthread calls failed, FMS continued. This
   caused lots of false symptoms until this was corrected. I would expect
   that other programmers may have this problem also. It might be worth a
   special porting note as an alert.

At this point both the shared and non-shared versions of the library seem to
work.

In answer to your most recent question about our debug printout for the
pthread_create call, what is printed is an address for the second argument.

Thanks for your help. Hope some of this is useful to others porting to 4.0B.

Ron
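
Item 4 above can be sketched in C as follows. This is not FMS code: `check_pthread`, `worker`, and `run_worker` are hypothetical names used only for illustration. The sketch treats any nonzero return from a pthread routine as a failure, rather than testing for -1 and reading errno as the older draft interface required.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report a failed pthread call. Under the 1003.1c
 * interface the error number is the return value itself, not errno. */
static int check_pthread(const char *what, int rc)
{
    if (rc != 0) {
        fprintf(stderr, "%s failed: %d (%s)\n", what, rc, strerror(rc));
    }
    return rc;
}

/* Trivial thread body used only for this illustration. */
static void *worker(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}

/* Create and join one thread, checking both calls. Returns 0 on success
 * or the error number returned by the failing pthread routine. */
int run_worker(int *out)
{
    pthread_t tid;
    int rc;

    rc = check_pthread("pthread_create",
                       pthread_create(&tid, NULL, worker, out));
    if (rc != 0) {
        return rc;  /* a test for rc == -1 would have missed this */
    }
    return check_pthread("pthread_join", pthread_join(tid, NULL));
}
```

Routing every pthread call through one checking helper makes it harder for a failed call to go unnoticed, which is the situation item 4 describes.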
3363.14		HYDRA::BRYANT		`Wed Apr 23 1997 10:00`	29
	Sent him mail asking whether he is linking the same way after making all the
code changes. Also sent him mail regarding 4).

Ron,

The threads package on Digital UNIX 3.2 follows a very early draft of what
has become POSIX threads. On 4.0, the threads package supports the latest
POSIX draft 1003.1c. This is why you are seeing differences. In your
documentation set which resides on the 4.0 CD, Appendix D of the Guide to
DECThreads explains these differences. For example, regarding error returns,
the document states:

G.1.1 Error Status and Function Returns

The new DECthreads POSIX 1003.1c interface does not use errno. (Note that
DECthreads still provides a thread-specific errno cell for use by libraries
and application code, but the 1003.1c interface does not write to this cell.)
If an error condition occurs, a pthread routine returns an integer value
indicating the type of error.

For example, a call to the Draft 4 implementation of pthread_cond_destroy
that returned a -1 and set errno to EBUSY, now returns EBUSY as the routine
return value in the current implementation. On successful completion, most
pthread routines return a zero.

It may be worth taking a look at the rest of the differences.

Thanks.

Pat Bryant
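
The return convention quoted above can be seen with a small C sketch. It uses pthread_mutex_trylock rather than the pthread_cond_destroy case from the guide, because trylock on a mutex that is already locked is a well-defined way to obtain EBUSY. The function name `trylock_when_locked` is hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical demo: lock a default mutex, then try to lock it again
 * without blocking. The second attempt reports EBUSY through its return
 * value, which this function passes back to the caller. */
int trylock_when_locked(void)
{
    pthread_mutex_t m;
    int rc;

    if (pthread_mutex_init(&m, NULL) != 0) {
        return -1;  /* setup failure, not part of the demonstration */
    }
    pthread_mutex_lock(&m);
    rc = pthread_mutex_trylock(&m);  /* mutex is held: expect EBUSY */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Code written for the Draft 4 convention would compare the result with -1 and never see this EBUSY.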