|
Since the errors started to occur when you went from Phase IV
to DECnet/OSI, I'd start with the DECnet/OSI folks. ACMS is
not aware of which 'phase' of DECnet is running; in fact, we
do not support DECnet/OSI except with Phase IV routing. The
text of the error, FILNOTACC, does not make sense to me, but
it might to a DECnet/OSI person.
Bill
|
|
Hi!
Thanks for answering.
I am a DECnet/OSI person, so FILNOTACC does make sense to me.
As I intend to escalate this problem through IPMT channels,
I suppose DECnet/OSI engineering will need some
detailed info from swlup: in particular, the link ID
associated with the reported read failure. If we don't
have any information about how ACMS interfaces with DECnet,
and if we're unable to associate NSP traces with SWLUP logs,
I'm afraid we won't be able to go any further!
Although the FILNOTACC status does not make sense to you,
you may already have come across such problems:
a colleague of mine here in TSC, who is an ACMS specialist,
once had the same swlup logs on his system, and got rid
of them just with "good" ACMS tuning. Unfortunately,
it didn't work for this customer ...
Now, I do agree that the DECnet "phase" is not ACMS's problem,
but in any case the ACMS server code is responsible for detecting,
reporting and, where possible, surviving network errors.
So I think FILNOTACC should have (and certainly does have) a
meaning for ACMS. Please help!
A few words about FILNOTACC (on DECnet network operations):
It normally means that an application is attempting an I/O
operation (usually a read or write) on a network link which is
not yet established.
Some versions of DECnet/OSI incorrectly return this status to
DECnet applications, and this may confuse them.
In this erroneous case, DECnet/OSI reports FILNOTACC
when a link has been established and then disconnected.
Thanks again, and best regards, Michele.
|
|
As you can see, the errors are handled by ACMS. The error returned
from the service was a -F- but we reported it in SWL as a -I-.
When a link breaks, MSS (our message-based interconnect) reports
the error and then tries to re-establish the link. In this case, the
customer is not reporting any user problems, so the links are
re-established and the operation continues successfully. One thing
about MSS is that it multiplexes the operations on a link. The CP
process is the user interface, serving probably around 20 users.
If each user is talking to the same application on the same remote
node, there are only 2 links, one for executing the task, the other
for performing the necessary presentation services. All 20 users
use the same 2 links.
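
To make the multiplexing point concrete, here is a rough Python sketch.
It is an illustration only, not MSS code: the LinkPool class, the node
and application names, and the "task"/"presentation" labels are all
made up for the example. It just shows that links are keyed by remote
node, application and purpose rather than by user, so one broken link
touches every user riding on it.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two links described above.
LINK_KINDS = ("task", "presentation")


@dataclass
class LinkPool:
    # (remote_node, application, kind) -> users multiplexed on that link
    links: dict = field(default_factory=dict)

    def attach(self, user, node, application):
        """Attach a user to the shared links for (node, application)."""
        for kind in LINK_KINDS:
            self.links.setdefault((node, application, kind), []).append(user)

    def link_count(self):
        """Number of distinct links currently in the pool."""
        return len(self.links)

    def users_on_link(self, node, application, kind):
        """Users whose operations ride on the given link."""
        return list(self.links.get((node, application, kind), []))


# 20 users, same application, same remote node (names are invented).
pool = LinkPool()
for n in range(20):
    pool.attach(f"user{n:02d}", "REMOTE", "APPL")

print(pool.link_count())                                   # 2
print(len(pool.users_on_link("REMOTE", "APPL", "task")))   # 20
```

A second application or a second remote node would add two more links
in this model, but extra users on the same pair would not.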
DECnet/OSI is just now starting to cause problems for us and we'd
appreciate some assistance in understanding how it works differently
from DECnet Phase IV.
Bill
|
|
Hi!
Thanks for that info: this is good news ...
The NSP traces (CTF) I took while such logs were being reported
by swlup show that:
1) there are lots of links established
2) there are lots of links disconnected (many more than reported
by swlup)
3) these links are always disconnected with "reason = abort",
most of the time (I saw 1 exception in 30 links) by the
ACMS client.
4) on these links, the same scenario occurs each time:
the link is established (connect request sent by the ACMS client),
then exactly 5 NSP data messages are exchanged between the
server and the client; the data messages sent by the client
are "short"; the fourth data message is always sent by
the server, and is "long" (over 1400 bytes, thus segmented
into 2 parts); the fifth message is always sent by the
client, and is 21 bytes long.
After the 5th message has been sent by the client, the client
sends a disconnect request (DISI, reason = abort), and the
server confirms the disconnection (DISC).
I must say that there's so much traffic that in 2 minutes
I get 2000 binary blocks and some trace records are lost; so
I'm unable to "follow" each link from its establishment
up to its disconnection, and I'm unable to decide whether
any of the link disconnections that are kept in the trace
file are related to swlup FILNOTACC records.
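
If the trace could be cut down to one record list per link, the
scenario in point 4 could be checked mechanically. Here is a minimal
Python sketch of that idea. The record layout (Rec), the kind labels
and the function name are assumptions for illustration, not the CTF
output format, and the list is assumed to be already filtered to the
connect request, the data messages and the two disconnect messages.
The lengths 40 and 60 in the sample are invented; only "over 1400" and
"21" come from the traces.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    sender: str        # "client" or "server"
    kind: str          # "CONNECT", "DATA", "DISI" or "DISC" (assumed labels)
    length: int = 0    # message length in bytes (DATA only)
    reason: str = ""   # disconnect reason (DISI only)


def matches_scenario(recs):
    """True if one link's records follow the pattern seen in the traces."""
    if len(recs) != 8:   # connect + 5 data + DISI + DISC
        return False
    connect, data, disi, disc = recs[0], recs[1:6], recs[6], recs[7]
    if (connect.sender, connect.kind) != ("client", "CONNECT"):
        return False
    if any(r.kind != "DATA" for r in data):
        return False
    fourth, fifth = data[3], data[4]
    if fourth.sender != "server" or fourth.length <= 1400:
        return False
    if fifth.sender != "client" or fifth.length != 21:
        return False
    return ((disi.sender, disi.kind, disi.reason) == ("client", "DISI", "abort")
            and (disc.sender, disc.kind) == ("server", "DISC"))


link = [
    Rec("client", "CONNECT"),
    Rec("client", "DATA", 40),      # invented length
    Rec("server", "DATA", 60),      # invented length
    Rec("client", "DATA", 40),      # invented length
    Rec("server", "DATA", 1450),    # "long": over 1400 bytes
    Rec("client", "DATA", 21),
    Rec("client", "DISI", reason="abort"),
    Rec("server", "DISC"),
]

print(matches_scenario(link))        # True
print(matches_scenario(link[:-2]))   # False: disconnect records missing
```

With trace records being lost, a link that fails the check proves
nothing by itself; the sketch is only useful for picking out the links
that were captured completely.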
Now, about the difference in DECnet/OSI behaviour (from DECnet
Phase IV), please see below some extracts from CFS #38060:
------------------------------------------------------------------------
DECnet/OSI returns wrong status codes on link disconnect
which may be 'breaking' BACKUP and DFS
DECnet/OSI is returning the wrong status on a number of
occasions when links are disconnected. This appears to involve
NSP, OSI Transport, and Session Control, ...
As stated above, there are actually a number of problems, but
the most critical at this instant in time is that when links are
disconnected DECnet/OSI returns "%SYSTEM-F-FILNOTACC, file not
accessed on channel" as the error status in the IOSB, and this
appears to be confusing both BACKUP and DFS to the point where
they 'hang'.
I can easily reproduce this problem with NSP Transport, but not
with OSI Transport. I am not sure why, except that OSI Transport
status returns seem to be so badly broken that the confusing
status may not be passed up the stack ---
This is what I expect from "%SYSTEM-F-FILNOTACC, file not
accessed on channel" error ... I have always thought that this
error status would be returned in R0 (not the IOSB) if I
attempted to do a file operation (eg read/write virtual to a
file device) when I did not have a file open. In network terms I
would expect this status in R0 if I did a read/write virtual to
a NET device when I have not done an IO$_ACCESS to set up a
logical link. Once I have got past the R0 check I do not expect
to get this error -- ie I have never expected this error in the
IOSB. If something goes severely wrong after the IO has been
issued but before it completes then I expect a SS$_ABORT or
other type of error indicating why it failed.
What appears to happen with DECnet/OSI is that once a link is
established you can issue read/write etc, but if the link is
disconnected some of the outstanding IO at that time are
completed with FILNOTACC in the IOSB. I have seen IO$_WRITEVBLK
and IO$_DEACCESS with this error, but not IO$_READVBLK (which
may or may not be significant). I believe that the IO$_WRITEVBLK
should fail with PATHLOST, UNREACHABLE, or THIRDPARTY (depending
upon the type of disconnect), and this would be compatible with
PhaseIV behaviour. The IO$_DEACCESS should return success. When
an application is notified of a disconnect the application must
do an IO$_DEACCESS on the channel to free its end of the link.
If the application does not do a IO$_DEACCESS then the Session
Control port is not deleted and it hangs around, using up
resources. The IO$_DEACCESS should clean up and return success.
..................................
Event Type: SOLUTION
Date & Time: 1-Oct-1996
Actor: BADGE\96563
Mike Dyer has concluded his changes for $QIO. These fixes have
been tested and will be included in the next ECO kit for DNVOSI V6.3
ECO6.
Fix for Filnotaccess,Unkresult and Remoteshutdown.
Check the QIO UCB for a local disconnect and add
the mappings for I-DISCDATATRUN and REMOTESHUTDOWN
Modify; QIO_EXECUTE.B32, QIO_COMPLETION.B32,
QIO_MAPERR.B32 and QIO_STRUCTURES.SDL
Directory HELP""::ABBYRD$DKA100:[KANSAS.KITS.VAX]
NET$DRIVER.EXE 52 30-SEP-1996 21:21:52.00
NET$DRIVER.STB 6 30-SEP-1996 21:21:59.00
NET$OSDRIVER.EXE 74 30-SEP-1996 21:22:03.00
NET$OSDRIVER.STB 7 30-SEP-1996 21:22:10.00
copy to sys$common:[sys$ldr]
------------------------------------------------------------
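
To summarise the extract, here is a small Python sketch of how an
application could interpret these statuses defensively. It is only an
illustration under stated assumptions: statuses are plain strings
rather than real SS$_ values, the function names are invented, and it
does not describe what ACMS or MSS actually do. The one idea it
encodes is from the extract: FILNOTACC in R0 means no link has been
set up yet, while FILNOTACC in the IOSB (on the affected DECnet/OSI
versions) should be read as "the link was disconnected".

```python
# Statuses the extract expects for a write on a disconnected link.
LINK_LOST = {"PATHLOST", "UNREACHABLE", "THIRDPARTY"}


def classify(status, where):
    """Classify a status name; `where` is "R0" or "IOSB"."""
    if status == "NORMAL":
        return "ok"
    if status == "FILNOTACC":
        # R0: no IO$_ACCESS has been done on the channel.
        # IOSB: affected DECnet/OSI versions report a disconnect this way.
        return "no_link_yet" if where == "R0" else "link_lost"
    if where == "IOSB" and status in LINK_LOST:
        return "link_lost"
    return "other_error"


def next_steps(outcome):
    """What the application does next (hypothetical recovery policy)."""
    if outcome == "link_lost":
        # The extract requires IO$_DEACCESS to free this end of the link.
        return ["IO$_DEACCESS", "reconnect"]
    if outcome == "no_link_yet":
        return ["IO$_ACCESS"]
    return []


print(classify("FILNOTACC", "IOSB"))               # link_lost
print(classify("FILNOTACC", "R0"))                 # no_link_yet
print(next_steps(classify("PATHLOST", "IOSB")))    # ['IO$_DEACCESS', 'reconnect']
```

Treating the IOSB case like PATHLOST keeps the recovery path the same
whether or not the system has the corrected drivers.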
Personal addendum: the DECnet/OSI ECO 6 release notes do not clearly
describe these changes (no details for FILNOTACC).
The net$driver and net$osdriver that come with
ECO 6 are dated 15-NOV-1996, which makes me think
other problems have been fixed as well.
|