Conference turris::decladebug

Title: Digital Ladebug debugger
Moderator: TLE::LUCIA
Created: Fri Feb 28 1992
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 969
Total number of notes: 3959

893.0. "Stale NFS handle => can't quit" by PADC::PDONAHUE (Paul Donahue) Thu Mar 20 1997 16:26

If I'm running ladebug on the foo executable and I recompile foo while ladebug
(4.0-23) is still running, it (obviously) can't read the original program that I
was debugging.  It complains about a stale NFS handle.  This is understandable
if I want to step or examine variables, etc.

However, I can't even quit ladebug.  Why does it try to read the symbol table
when you give it the quit command:

(ladebug) quit
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug)

I end up having to ctl-Z and kill the job (and there are a couple more steps
since it doesn't die gracefully).  This isn't very clean.

Is there a particular reason that quit needs to read the symbol table?  Could
this error be eliminated in a future ladebug version?


Thanks,

-Paul
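
The behavior asked for above (a quit that still succeeds when the executable can no longer be read) can be sketched in C. This is only an illustration under stated assumptions: it says nothing about how ladebug is actually implemented, and the names read_status, classify_read_error, read_symbols and quit_may_proceed are hypothetical. The idea is to report a stale handle (ESTALE) as a distinct result, so that a shutdown path can warn and carry on instead of returning to the prompt.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes; not taken from ladebug. */
enum read_status { READ_OK, READ_STALE, READ_FAILED };

/* Classify the errno value left behind by a failed read(). */
enum read_status classify_read_error(int err)
{
    return (err == ESTALE) ? READ_STALE : READ_FAILED;
}

/* Read up to len bytes from fd, retrying if interrupted by a signal.
 * A stale NFS handle is reported separately from other failures. */
enum read_status read_symbols(int fd, void *buf, size_t len, ssize_t *nread)
{
    assert(buf != NULL && nread != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) {
            *nread = n;
            return READ_OK;
        }
        if (errno == EINTR)
            continue;               /* interrupted: try again */
        *nread = -1;
        return classify_read_error(errno);
    }
}

/* Hypothetical quit policy: warn about an unreadable symbol file,
 * but never let that block the exit. Returns 1 if quit may proceed. */
int quit_may_proceed(enum read_status st)
{
    if (st != READ_OK)
        fprintf(stderr, "warning: symbol file unreadable (%s); quitting anyway\n",
                st == READ_STALE ? strerror(ESTALE) : "read error");
    return 1;
}
```

With a policy like this, a failed read during quit produces a warning rather than an unhandled error that returns control to the prompt.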
893.1. "Can you give us more detail?" by TLE::LUCIA (http://asaab.zko.dec.com/~lucia/biography.html) Mon Mar 24 1997 13:41 (46 lines)

A simple attempt to reproduce this shows that it works.  The directory "scratch"
is mounted on an NFS disk.

scratch> cat > foo.c
int main()
{
printf("I race, therefore I am\n");
return 1;
}
scratch> cc -g foo.c
scratch> $DELIVERIES/4.0-23.for.v3.2/input/usr/bin/decladebug
Welcome to the Ladebug Debugger Version 4.0-23
(ladebug) load a.out
Reading symbolic information ...done
(ladebug) stop in main
[#1: stop in int main(void) ]
(ladebug) run
[1] stopped at [main:3 0x1200011a0]
      3 printf("I race, therefore I am\n");
(ladebug) sh cc -g foo.c
(ladebug) quit
scratch>

Note also that if you rerun, the binary is reread because it has changed.

scratch> $DELIVERIES/4.0-23.for.v3.2/input/usr/bin/decladebug a.out
Welcome to the Ladebug Debugger Version 4.0-23
------------------ 
object file name: a.out 
Reading symbolic information ...done
(ladebug) stop in main
[#1: stop in int main(void) ]
(ladebug) run
[1] stopped at [main:3 0x1200011a0]
      3 printf("I race, therefore I am\n");
(ladebug) sh cc -g foo.c
(ladebug) c
I race, therefore I am
Process has exited with status 1
(ladebug) rerun
Reading symbolic information ...done
Warning: the breakpoints may not be valid anymore
[1] stopped at [main:3 0x1200011a0]
      3 printf("I race, therefore I am\n");
(ladebug) quit
scratch> 
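
Rereading the binary on rerun implies some check that the file on disk has changed. How ladebug performs that check is not stated in this thread; the following is a minimal sketch of one common approach, comparing stat() results taken at load time and at rerun time. The names file_stamp, take_stamp, stamp_changed and warn_if_changed are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical snapshot of a file's identity at load time. */
struct file_stamp {
    dev_t  dev;    /* device holding the file */
    ino_t  ino;    /* inode number (a recompile often creates a new one) */
    off_t  size;   /* size in bytes */
    time_t mtime;  /* last modification time */
};

/* Fill *out from stat(path). Returns 0 on success, -1 on failure. */
int take_stamp(const char *path, struct file_stamp *out)
{
    struct stat sb;
    assert(path != NULL && out != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    out->dev   = sb.st_dev;
    out->ino   = sb.st_ino;
    out->size  = sb.st_size;
    out->mtime = sb.st_mtime;
    return 0;
}

/* Returns 1 if any recorded attribute differs, else 0. */
int stamp_changed(const struct file_stamp *a, const struct file_stamp *b)
{
    return a->dev != b->dev || a->ino != b->ino ||
           a->size != b->size || a->mtime != b->mtime;
}

/* Print a note when the file changed; returns the same flag. */
int warn_if_changed(const char *path, const struct file_stamp *old_stamp,
                    const struct file_stamp *new_stamp)
{
    int changed = stamp_changed(old_stamp, new_stamp);
    if (changed)
        fprintf(stderr, "note: %s changed on disk; symbols should be reread\n", path);
    return changed;
}
```

A tool would call take_stamp() when it loads the executable and again before a rerun, and reread the symbols when stamp_changed() reports a difference.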

893.2. by DECCXL::OUELLETTE (crunch) Mon Mar 24 1997 14:30 (1 line)

Did you make your NFS disk go away before you tried to quit?

893.3. "Hmmm. Works now." by PADC::PDONAHUE (Paul Donahue) Mon Mar 24 1997 15:05 (13 lines)

Now it seems to work.  I don't remember exactly what I was doing at the time to
exhibit the behavior.  Now it complains, but it does properly quit under all
conditions that I can conjure up.

But it really was a problem.  Seriously.  I'm not crazy :-)

It has happened before.  If it happens again, I'll get the specific details and
post them.


Thanks,

-Paul

893.4. by TLE::LUCIA (http://asaab.zko.dec.com/~lucia/biography.html) Mon Mar 24 1997 16:52 (8 lines)

How does one "make your NFS disk go away before you tried to quit"?  As long as
the binary is open, automount keeps the disk mounted.  Do you forcefully
dismount the file system while in use?  

We'd be happy to make ladebug more robust, but we need to know how to reproduce
this case first.

Tim

893.5. "simulate the hardware failure..." by DECCXL::OUELLETTE (crunch) Mon Mar 24 1997 16:56 (6 lines)

> How does one "make your NFS disk go away...

Power off the system serving the disk.
Or alternatively, disconnect the network adapter from the network.

R.

893.6. "Problem is back - more details" by PADC::PDONAHUE (Paul Donahue) Tue Apr 15 1997 19:10 (108 lines)

OK.  I'm seeing the problem again.  Here's what I did:

I ran my program on my workstation from a disk that is served by our main NFS
server.  Also, ladebug is served from there (/usr/local/ is not local).  I was
running along and simultaneously I recompiled on the NFS server (not on my
workstation).  My program continued to run on my workstation with no problem. 
Then my program finished its number-crunching and returned to the program's
prompt (not the ladebug prompt).  I entered a couple more quick commands, and
then it apparently requeried the NFS server, or the local inode cache was
flushed, or whatever.  I eventually got:

abm00>
abm11>
abm11>
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug) quit
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug) q
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug) quit
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...

Note that "abmXX>" is my program's prompt - I simply hit return several times at
the abm prompt (which issues a default command) and it got the error on the
third or fourth time.  Then I couldn't quit ladebug.  I did a Ctl-Z and tried to
kill it.  Then I got:

smchip.pa.dec.com> kill %1
smchip.pa.dec.com>
Ladebug Debugger Version 4.0-23 caught signal "Terminated" (15).
This is an unexpected condition and may indicate the presence of a defect.
If you wish to report this, please include the stack trace that follows.
Diagnostic stack trace ...
0x24343a50
0x1232c298
0x1221db90
0x12403c70
0x1228f5a4
0x1228df30
0x242c52f0
0x1228f0a4
0x1228df30
0x242c52f0
0x1228b1f8
0x1228df30
0x242c492c
0x242c4568
0x1228d3f8
0x1228df30
0x242c492c
0x242c4568
0x1228ecbc
0x122c6a80
0x122c1ab0
0x1225452c
0x122284fc
0x12213754
0x1220238c
0x121fee1c
end of diagnostic stack trace.
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug) q
[1]  + Suspended (tty output) decladebug ~/bigfoot/work/biu/sa1500/abm
smchip.pa.dec.com> fg
decladebug ~/bigfoot/work/biu/sa1500/abm

File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug) quit
File read failed: : Stale NFS file handle
Unhandled Ladebug ErrorReadingSymbolTable exception. Recovering...
(ladebug)
Suspended
smchip.pa.dec.com> kill %1

Ladebug Debugger Version 4.0-23 caught signal "Terminated" (15).
This is an unexpected condition and may indicate the presence of a defect.

[etc...]

I can't even kill the stupid thing.  I ended up doing a kill -9 after getting
five ladebug stack dumps.

Anyway, the NFS server is still up and is still serving the disk.  One point
that might be of interest is that we have tons of symlinks around here.  I was
running ~/bigfoot/work/biu/sa1500/abm and we have:
~ -> /udir/pdonahue -> /padc_u2/pdonahue -> /r/padc_u2/pdonahue ->
/tmp_mnt/r/padc_u2/pdonahue
~/bigfoot/work -> /padc_p3/bigfoot/beh/osf_work ->
/r/padc_p3/bigfoot/beh/osf_work -> /tmp_mnt/r/padc_p3/bigfoot/beh/osf_work

Everything is mounted from the same server.  All the disks that are necessary to
resolve the links and serve the file were online throughout the whole thing.

I don't necessarily mind that it barfs when I recompile underneath it, but not
being able to quit or even kill it is the problem.

Any ideas?


Thanks,

-Paul
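
The transcript above shows the debugger catching signal 15, printing a stack trace and then returning to its prompt. A minimal sketch of an alternative is given below, assuming a simple command loop: the SIGTERM handler only sets a flag, and the loop exits as soon as it sees the flag. This is not ladebug's code; install_sigterm_handler, should_exit and command_step are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t terminate_requested = 0;

/* Handler does the minimum: record that termination was requested. */
static void on_sigterm(int signo)
{
    (void)signo;
    terminate_requested = 1;
}

/* Install the SIGTERM handler. Returns 0 on success, -1 on failure. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Returns 1 once SIGTERM has been received. */
int should_exit(void)
{
    return terminate_requested != 0;
}

/* Hypothetical command-loop step: returns 1 when the loop should stop,
 * either because SIGTERM arrived or because the user asked to quit.
 * The termination flag is checked before any other work is attempted. */
int command_step(const char *cmd)
{
    assert(cmd != NULL);
    if (should_exit())
        return 1;
    return strcmp(cmd, "quit") == 0 || strcmp(cmd, "q") == 0;
}
```

Because the flag is tested before anything else, a pending termination request cannot be held up by a later failure such as a symbol-table read error.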

893.7. by TLE::BRETT Thu May 29 1997 16:15 (1 line)

All (?) ^C problems are fixed in BL38