Title: | DECC |
Notice: | General DEC C discussions |
Moderator: | TLE::D_SMITH N TE |
Created: | Fri Nov 13 1992 |
Last Modified: | Fri Jun 06 1997 |
Last Successful Update: | Fri Jun 06 1997 |
Number of topics: | 2212 |
Total number of notes: | 11045 |
I've spent about 6 hours trying to prove to myself that I've done something wrong, but I can't find the source of this problem. Compiling with and without optimization has no effect.

It's a very interesting problem: two programs that, given the same initial conditions, evaluate a conditional differently, even though the only difference between the two programs is one line of code that occurs after the conditional.

The program is designed to run as multiple simultaneous instances. A mailbox is used to synchronize execution between a master and any number of slaves. Master/slave mode and synchronization are controlled in the procedures iohammer_master.com and iohammer_slave.com by setting the value of sync to the number of slaves + 1 in iohammer_master.com and to the negative of that sum in iohammer_slave.com. The copies of these two procedures that I've made available are set up to run 1 master and only 1 slave. The behavior shown below still occurs with the tests set up for 2 slaves as well.

I checked the relevant variables in the program and all appear to be initialized, so what is getting screwed up is beyond me. The odd thing is that if this were a timing issue, one might intuitively assume that the slower version, which accesses the resource for a longer period, would be the version that fails. It doesn't: it works, while the other, "faster" version is the one that fails.

WARNING!!! Running the program in the debugger eliminates the problem, but will crash your system if you do not quit out of the debugger before starting the actual testing, which occurs around line 36504 with a sys$cmkrnl(execute_... I set my break at line 36359 of a nonoptimized debug version ;2 of the program and the problem vanished. rtn_status returned a normal value in both the master and the slave. Typing go to see what would happen beyond this point was a big mistake: the system crash dumped. If you quit at or near this point, you should have no problem.
I've summarized the differences in the code below and made sources, .lis, procedures and a test file available for your perusal. Good luck, should you choose to accept this assignment.

- Mark 381-1556

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Select code of interest
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Version 1:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
/* Open the test file with RMS and capture important stuff */
/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    fab = &outfab;                          /* set fab pointer */
    rtn_status = sys$open ( fab );          /* open the file */
    if (!(rtn_status & 1))                  /* error if low bit clear */
    {
        printf("Failure to open test file: %s\n to get file stats. rtn_status = %lu",
               test_file_name,rtn_status);
        sys$exit(rtn_status);               /* exit if so */
    }
    else
        printf("Success opening test file: %s\n to get file stats. rtn_status = %lu",
               test_file_name,rtn_status);
/*  dir_id = outnam.nam$w_did;                 capture the directory id */
    file_size = outfab.fab$l_alq;           /* get highest vbn in file */
    last_record = file_size;
    rtn_status = sys$close ( fab );         /* close the file */

/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
/* The following code will be executed if testing Fast IO. Otherwise fall through */
/* and execute normal QIO. Get an aligned and mapped buffer object to be used for data */
/* transfer. Then align an IOSA object on a quadword boundary.
 */
/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    if (do_fast_io)
    {
        rtn_status = get_fastIO_buffer();   /* allocate and map an aligned buffer */
        if (rtn_status != SS$_NORMAL)       /* exit on error condition */
        {
            printf("Failure in get_fastIO_buffer");
            sys$exit(rtn_status);           /* exit if so */
        }
                                            /* Now we need to align the IOSA */
        temp = ((long)fast_IO_buffer) + MAX_XFER_SIZE;  /* find the end of the IO buffer */
        temp += QUADW - ((temp + QUADW)%QUADW);         /* ensure quadword alignment */
        fast_IO_iosa = (char *)temp;                    /* store the iosa pointer */
    }

/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 * Get an IO Channel number to the device
 * %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    devnam_desc.dsc$a_pointer = (char *)test_file_name;  /* init name string pointer */
    devnam_desc.dsc$w_length = test_file_name_length;    /* and the string length */
    access_mode = USERMODE;                              /* specify user mode */
    rtn_status = sys$assign(&devnam_desc,   /* get an IO channel number */
                            &channel_num,
                            access_mode,
                            0,0);
    if (rtn_status != SS$_NORMAL)           /* get one okay?
                                            */
    {
        printf("Failure of sys$assign to obtain IO channel for test file: %s status = %lu\n",
               test_file_name, rtn_status);
        sys$exit(rtn_status);               /* exit if not */
    }

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Version 2:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
/* Open the test file with RMS and capture important stuff */
/* %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% */
    fab = &outfab;                          /* set fab pointer */
    rtn_status = sys$open ( fab );          /* open the file */
    if (!(rtn_status & 1))                  /* error if low bit clear */
    {
        printf("Failure to open test file: %s\n to get file stats. rtn_status = %lu",
               test_file_name,rtn_status);
        sys$exit(rtn_status);               /* exit if so */
    }
/*  dir_id = outnam.nam$w_did;                 capture the directory id */
    file_size = outfab.fab$l_alq;           /* get highest vbn in file */
    last_record = file_size;
    rtn_status = sys$close ( fab );         /* close the file */
    .
    .
    .
> Code same as in version 1:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Version 1's execution:

Window 1> @ioh_master
starting!
Sequential Variable Sz IOs Q: 7 Writes: 50% VC: NO Reps: 200 SP: 0 Seed: 0 RS: 0 SSR: 2
Success opening test file: DKC100:[000000.T]IOHAMMER.TST;1
 to get file stats. rtn_status = 65537[ IO/s ] [kbps] [CPU s] [RT us] [ST us] [Lat us] [-- IOs] [-- RIOs -- usec] [-- WIOs -- usec] [ET sec] [Sampls]
471.5 1679 0.05 14893 2121 12772 206 103 5079 103 24707 0.4 0

Window 2> @ioh_slave
slave sync = 2
starting!
Sequential Variable Sz IOs Q: 7 Writes: 50% VC: NO Reps: 200 SP: 0 Seed: 0 RS: 0 SSR: 2
[ IO/s ] [kbps] [CPU s] [RT us] [ST us] [Lat us] [-- IOs] [-- RIOs -- usec] [-- WIOs -- usec] [ET sec] [Sampls]
474.8 1691 0.05 14647 2105 12542 206 103 6114 103 23180 0.4 0

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Version 2's execution:

Window 1> @ioh_master
starting!
Sequential Variable Sz IOs Q: 7 Writes: 50% VC: NO Reps: 200 SP: 0 Seed: 0 RS: 0 SSR: 2
Failure to open test file: DKC100:[000000.T]IOHAMMER.TST;1
 to get file stats. rtn_status = 98954
%RMS-E-FLK, file currently locked by another user

Window 2> @ioh_slave
slave sync = 2
starting!
Sequential Variable Sz IOs Q: 7 Writes: 50% VC: NO Reps: 200 SP: 0 Seed: 0 RS: 0 SSR: 2
[ IO/s ] [kbps] [CPU s] [RT us] [ST us] [Lat us] [-- IOs] [-- RIOs -- usec] [-- WIOs -- usec] [ET sec] [Sampls]
474.8 1691 0.05 14647 2105 12542 206 103 6114 103 23180 0.4 0

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

I've made the following files and procedures available in BULOVA::DISK$TRANSFER:[PUBLIC]

IOHAMMER.C;2             601  29-MAY-1997 08:53:17.00
IOHAMMER.C;1             601  29-MAY-1997 08:43:39.00
IOHAMMER.CLD;27            2  22-MAY-1997 08:22:00.00
IOHAMMER.EXE;2           229  29-MAY-1997 10:55:40.00
IOHAMMER.EXE;1           229  29-MAY-1997 10:55:45.00
IOHAMMER.LIS;2          2789  29-MAY-1997 08:56:34.00
IOHAMMER.LIS;1          2792  29-MAY-1997 08:54:47.00
IOHAMMER.TST;1           507  29-MAY-1997 10:13:58.00  <<< test file read from and written to by iohammer
IOHAMMER_BUILD.COM;1       1  29-MAY-1997 11:55:37.00
IOHAMMER_ENV.COM;1         1  29-MAY-1997 09:26:18.00
IOHAMMER_MASTER.COM;1      1  29-MAY-1997 10:17:35.00
IOHAMMER_SLAVE.COM;1       1  29-MAY-1997 10:17:55.00
...
You'll also see

IOHAMMER.EXE;461         225  22-MAY-1997 08:14:46.00

which is present for other folks; you don't want it.

Modifications required to run on your system....

A good portion of this program runs in kernel mode. The problem is occurring before the program goes into kernel mode. They take seconds to run and complete without hanging. You'll need at least CMKRNL NETMBX PRMMBX PHY_IO privs, and maybe something I've missed, but since my account grants me all-powerful-being status, I have all privs on and can't tell if I missed anything here.

You'll have to make a few quick edits, and then you'll be ready to run IOHAMMER_ENV.COM;1 followed by IOHAMMER_MASTER.COM in one window and IOHAMMER_ENV.COM;1 followed by IOHAMMER_SLAVE.COM in another.

Change the path names to point to the desired location of your test and output files in

IOH_MASTER.COM;1

$ set noverify
$! Varbl / Samples /WP 0,25 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
$ ioh DKC100:[000000.T]507blocks.TST;1 DKC100:[000000.T]data.txt/q=7/reps=200/wp=50/seed=0/echo=1/ssr=2/sync=2
      !--------- test file ----------! !----- output file -----!

IOH_slave.COM;1

$ set noverify
$! Varbl / Samples /WP 0,25 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
$ ioh DKC100:[000000.T]507blocks.TST;1 DKC100:[000000.T]data.txt/q=7/reps=200/wp=50/seed=0/echo=1/ssr=2/sync=-2
      !--------- test file ----------! !----- output file -----!

In iohammer.cld change the image path to point to your copy of the image

.............
!
define verb ioh
    image "sys$sysdevice:[deyoung.io]iohammer"
          !------ image path -------!
T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
2206.1 | Suggestions, and a Guess... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Fri May 30 1997 12:09 | 38 |
Mailboxes and mailbox polling are inappropriate for synchronization of multi-process execution, and the use of permanent mailboxes is unnecessary -- one can (and should) use locks for synchronization, and lock value blocks for messaging, and one should use temporary mailboxes when possible.

The use of a single hardcoded event flag is poor practice -- the lib$get_ef call should be used to acquire an event flag, and one should use unique event flags for each call that may be outstanding at any given time -- I see a large number of asynchronous sys$qio calls sharing this event flag, and sharing the same IOSB, and I see no use of sys$synch or similar mechanisms to ensure the asynchronous operation(s) have completed.

I also see a large number of common or shared variables in isolation, something I prefer to avoid in favor of fewer structures and fewer external variables, as large numbers of isolated variables can make modular coding rather more difficult, and can be more difficult to debug.

If possible, I'd move to a signal handler or other centralized error handling, as I suspect this would tend to make the code somewhat easier to follow -- it would tend to move the paths that handle errors out of the main code flow and into separate routines.

You will also want to add a "\n" at the end of the two printf() calls.

If you can get rid of the code that might crash the system -- since it would appear this kernel code is executed after the test that is failing and appears unrelated -- I might be more willing to run-test this code.

If I were guessing -- and I am -- at a cause, I would look for one or more variables getting clobbered. I'd also resolve the event flag (over)use issues.
2206.2 | working | DECC::VOGEL | Fri May 30 1997 12:36 | 6 | |
This is also being worked off-line.

Ed
2206.3 | CSC64::BLAYLOCK | If at first you doubt,doubt again. | Fri May 30 1997 14:55 | 6 | |
    outfab.fab$b_shr = FAB$V_SHRGET|FAB$V_SHRPUT;

Make those V's M's, presuming that the FLK is the problem you are describing. Otherwise, what problem are you trying to solve again?
2206.4 | EPS::VANDENHEUVEL | Hein | Fri May 30 1997 15:50 | 13 | |
> file_size = outfab.fab$l_alq; /* get highest vbn in file */

Just a quick word of warning... This is the ALLOCATED size only. It is likely to be appropriate for a file copy, but to find the last valid byte in a sequential file you'd need to hook up an XABFHC and look for EBK * 512 + FFB.

>/* The following code will be executed if testing Fast IO. Otherwise fall through */

Be sure to check topic 622 in the VMSnotes_V12 archive.
2206.5 | Yes... FAB$V_GET|FAB$V_PUT do need to be changed. | STAR::DEYOUNG | Fri May 30 1997 16:22 | 6 | |
Yes!!! I was misinterpreting the RMS Reference manual's description of the fab. M's should solve the cause of the FLK errors, but not why they are occurring inconsistently between the two different versions of the code. I'll implement this change asap. Thanks for catching that one.

- Mark
2206.6 | Still mystified... | STAR::DEYOUNG | Mon Jun 02 1997 11:47 | 37 | |
I changed the line

    outfab.fab$b_shr = FAB$V_SHRGET|FAB$V_SHRPUT;

to...

    outfab.fab$b_shr = FAB$M_SHRGET|FAB$M_SHRPUT;

but the FLK problem still occurs.

I've copied two massively trimmed-down versions of the code into Bulova::disk$transfer:[public]

iohammer.c;101 (FLK Error)
iohammer.lis;101
iohammer.exe;101 (debug)
iohammer.exe;111 (non-debug)

and

iohammer.c;102 (No FLK, but a subsequent %SYSTEM-W-ACCONFLICT, file access conflict, which is of less interest at this point due to the inconsistent manifestation of the FLK error, which is affected by the presence or absence of the aforementioned printf.)
iohammer.lis;102
iohammer.exe;102 (debug)
iohammer.exe;112 (non-debug)

These programs will not crash your systems, but require at least NETMBX PRMMBX PHY_IO privs.

NOTE: although I've included debug files, I could not get the FLK error to occur with the debug image built from the source whose non-debug image consistently produces the FLK error.

Mark "Dazed and Confused"
2206.7 | EPS::VANDENHEUVEL | Hein | Mon Jun 02 1997 13:15 | 21 | |
I browsed over the program and noticed that the RMS sharing specified is hardly of consequence, as it is only used to figure out the file size. The real sharing is defined by:

    outfib.fib$l_acctl = FIB$M_WRITE|
                         FIB$M_NOREAD|
                         FIB$M_NOWRITE|
                         FIB$M_WRITETHRU;

This explicitly tells others they cannot read, which means that if there is a current reader, you will have an access conflict. This could even be your own prior access if that happened to have the ASY bit set (a not uncommon side effect of erroneous C $M/$V bit manipulations, but it doesn't look like a problem here).

I think you should either use SYS$PARSE / SYS$SEARCH to find the FID/DID and then access with QIO only (specifying sharing!) or skip the QIO ACCESS and tell RMS to open the file with a User Mode Channel. For that you specify FOP=UFO and retrieve the channel in the low word from FAB$L_STV.

hth, Hein.
2206.8 | Example of UFO and FAB$L_STV Channel Number Access... | XDELTA::HOFFMAN | Steve, OpenVMS Engineering | Mon Jun 02 1997 14:53 | 120 |
: ... For that you specify FOP=UFO and retrieve the channel in
: the low word from FAB$L_STV.

And here is an example of UFO and STV:

#include <descrip.h>
#include <lib$routines.h>
#include <psldef.h>
#include <rms.h>
#include <secdef.h>
#include <ssdef.h>
#include <starlet.h>
#include <stdio.h>
#include <string.h>
#include <stsdef.h>
#include <unixlib.h>

#define MAXACCLEN 16
struct ItemList3
    {
    short int ItemLength;
    short int ItemCode;
    void *ItemBuffer;
    void *ItemRetLen;
    };
struct RmsFileContext
    {
    struct FAB fab;
    struct NAM nam;
    char rss[NAM$C_MAXRSS];
    short max_rec_siz;
    char *data_buffer;
    };
#define P0SPACE ((void*)0x0200)
#define BOGUSMAX 10

RmsFileOpen( struct FAB *fab, char *FileName, char *DefFileName )
    {
    int RetStat;

    *fab = cc$rms_fab;
    fab->fab$l_alq = 10;
    fab->fab$b_fac = 0;
    fab->fab$l_fop = FAB$M_UFO | FAB$M_CIF;
    fab->fab$b_shr = FAB$M_UPI | FAB$M_SHRPUT | FAB$M_SHRGET | FAB$M_SHRUPD;
    fab->fab$l_fna = FileName;
    fab->fab$b_fns = strlen( FileName );
    fab->fab$l_dna = DefFileName;
    fab->fab$b_dns = strlen( DefFileName );

    /*
    // Attempt to open the file...
    */
    RetStat = sys$create( fab, 0, 0 );
    if ( !$VMS_STATUS_SUCCESS( RetStat ) )
        return RetStat;

    return RetStat;
    }

main()
    {
    int RetStat;
    struct ItemList3 ItmLst[10];
    $DESCRIPTOR( SecDsc, "FACNAM_GLOBAL_SECTION_NAME" );
    int i;
    void *InAdr1[2] = {P0SPACE,P0SPACE};
    void *RetAdr1[2] = {NULL,NULL};
    void *InAdr2[2] = {P0SPACE,P0SPACE};
    void *RetAdr2[2] = {NULL,NULL};
    struct FAB Fab1, Fab2;
    struct insec
        {
        int Bogus[BOGUSMAX];
        } *Sec1, *Sec2;

    /*
    // Create and open, and map the global section...
    */
    RetStat = RmsFileOpen( &Fab1, "BOGUS", "SYS$SCRATCH:.TMP" );
    if (!$VMS_STATUS_SUCCESS( RetStat ))
        lib$signal( RetStat );

    RetStat = sys$crmpsc( InAdr1, RetAdr1, PSL$C_USER,
        SEC$M_EXPREG | SEC$M_WRT | SEC$M_DZRO | SEC$M_GBL,
        &SecDsc, 0, 0, Fab1.fab$l_stv, 1, 0, 0, 0 );
    if (!$VMS_STATUS_SUCCESS( RetStat ))
        lib$signal( RetStat );

    /*
    // Create and open, and map the global section again...
    */
    RetStat = RmsFileOpen( &Fab2, "BOGUS", "SYS$SCRATCH:.TMP" );
    if (!$VMS_STATUS_SUCCESS( RetStat ))
        lib$signal( RetStat );

    RetStat = sys$crmpsc( InAdr2, RetAdr2, PSL$C_USER,
        SEC$M_EXPREG | SEC$M_WRT | SEC$M_GBL,
        &SecDsc, 0, 0, Fab2.fab$l_stv, 1, 0, 0, 0 );
    if (!$VMS_STATUS_SUCCESS( RetStat ))
        lib$signal( RetStat );

    /*
    // Write the information to one "window"...
    // ... read the data back from the other.
    */
    Sec1 = RetAdr1[0];
    Sec2 = RetAdr2[0];

    for ( i = 0; i < BOGUSMAX; i++)
        Sec1->Bogus[i] = i;
    for ( i = 0; i < BOGUSMAX; i++)
        printf( "Bogus[%d] = %d\n", i, Sec2->Bogus[i] );

    return SS$_NORMAL;
    }
2206.9 | Problem solved: thank you for the assistance. | STAR::DEYOUNG | Tue Jun 03 1997 16:02 | 5 | |
Thank you all for taking the time to help out. The problem was solved with the missing parameters

    fab->fab$l_fop = FAB$M_UFO | FAB$M_CIF;
    fab->fab$b_shr = FAB$M_UPI |

- Mark