T.R | Title | User | Personal Name | Date | Lines |
---|
2429.6 | Working . . . | IOSG::SHOVE | Dave Shove -- REO2-G/M6 | Mon Mar 22 1993 17:38 | 9 |
| There's no special .EXE or anything that XF uses. (In some cases it'll
use the file cab server, but you reckon there's no problem with that.)
It's all done with the (somewhat involved!) Named Data on form
EM$INDEX$OPTIONS. Basically it just loops through your selections doing
MAIL FORWARDs.
I'm copying your .log files now . . .
D.
|
2429.10 | No official support line available, desperate... | LEMAN::PUNKIE::REGINA | Carrie's in the carrot land | Wed Mar 31 1993 11:42 | 21 |
| Ok,
we tried. But: official channels say no (I guess they don't have the
resources) because we are not a customer. How I loved the hotline in Valbonne
some years ago!
The problem appeared after an upgrade to VMS 5.5-2. The problem appears on 2
nodes which are 6000-430. The problem does not appear on the third node 8800.
Has anyone anywhere a 6000-430 with 5.5-2 and ALL-IN-1 V3.0 unpatched?
As a side info, the cluster *HAS* been rebooted after the VMS upgrade and the
uptime of 370 + days is a feature somewhere (dixit the system manager). In
the meantime I also recompiled and reinstalled OA$MAIN, as the *MTHRTL.EXE
has a new version in VMS 5.5-2 (lots of shots in the dark ...). To no avail,
you guessed it.
Sorry to be a pain, but this notesfile seems to be the only available
resource for our problem.
/rhr
|
2429.11 | works for 1 only (unpatched 3.0) | FORTY2::ASH | Grahame Ash @REO | Wed Mar 31 1993 12:22 | 10 |
| and I can't be any help either. Our cluster has 3 different 6000s,
unfortunately not one the same as yours. XF on OUTBOX 'works' in exactly the
same way on all 3 systems - it processes the first message successfully, and
then stops, as if I'd only selected F, not XF. No error messages.
I remember during 2.3 development we debated long and hard about what XF would
do (and if anyone actually wanted it!). I can't remember what we decided, but
I'm sure it wasn't 'just process one message'.
grahame
|
2429.12 | OK on our 6000-340 with 5.5-2 | IOSG::SHOVE | Dave Shove -- REO2-G/M6 | Wed Mar 31 1993 17:51 | 19 |
| Well, one of our production systems is a 6000-340 running 5.5-2
(TRON::); it is however running a "prototype of a possible future
release" (nudge nudge!)
On that system, XF works fine, as it does on the 8800 in the same
cluster.
So it appears that whatever it was, we've fixed it.
Unfortunately, there's no record in our change history of our having
deliberately fixed it, so it must have been as a side-effect of another
change.
It might be worth seeing if it's reproducible on 3.0-1, or getting
someone to do so. Then at least you'd know if you'll be able to fix it
by installing the patch.
Sorry again,
D.
|
2429.14 | I don't think it's VMS or VAX | FORTY2::ASH | Grahame Ash @REO | Thu Apr 08 1993 11:08 | 16 |
| Hi Regina,
Yes, we're all on 5.5-2.
In practice it's very rare for ALL-IN-1 to react as differently as this after
a VMS or CPU upgrade - I can't remember ever seeing a problem that was that
specific to the operating environment.
I'd say that it's more likely that there's something wrong with the ALL-IN-1
setup or startup on the failing machines. Are all the relevant images and
sections installed (check using $INSTALL)? Has A1V3START.COM completed
successfully? (And there must be other similar things!)
Good luck,
grahame
|
2429.15 | INSTALL and SYSGEN has been checked for the obvious | LEMAN::PUNKIE::REGINA | Carrie's in the carrot land | Tue Apr 13 1993 11:36 | 15 |
| Hmmm..
I *DID* check with INSTALL, went one by one through all ALL-IN-1 images with
their privileges as well as all other installed images, and the only probably
relevant differences I came up with were MTHRTL and VMTHRTL, which normally
differ between a 6000 series and an 8000 series machine.
Remember, only XF refuses to work.
One last question: what is the microcode revision of your 6000-430? Do an
ANA/ERR, I am interested in CPU and console revision.
Thanks and regards
Regina
|
2429.16 | Here's what our (TRON's) ANAL/ERR says | IOSG::SHOVE | Dave Shove -- REO2-G/M6 | Tue Apr 13 1993 12:37 | 5 |
| TIME STAMP KA62B CPU FW REV# 6. CONSOLE FW REV# 8.0
Hope it helps - means nothing to me.
D.
|
2429.17 | My system wants to play too - unfortunately! | SUBURB::A1_CRAN | Caroline (A1_Cran) Croft | Wed Jun 09 1993 13:28 | 26 |
|
Hi,
I suddenly seem to be getting this problem too on our node SUBURB.
The cluster has 4 nodes, 2 6520's & 2 6340's. If I link on the 6340
then the XF will work on the 2 6340's but NOT the 6520's. If I link on
the 6520's it won't work on any of the nodes.
I am almost positive this happened after I installed the a1_eco01030
patch, although I also installed the a1_fix009030 at the same time.
Has anyone got any ideas, because I am at a full stop now!!!
Regards
Caroline.
|
2429.18 | XF- Your current document cannot be established | KERNEL::SIMPSONR | fred | Thu Jun 10 1993 14:43 | 85 |
|
Hello,
Here is some more information on Caroline's problem (2429.17).
It is only reproducible on the one cluster; we have performed tests on two
other machines, one of which has exactly the same hardware setup. The only
apparent difference is that the cluster that does not work was originally
V2.4 and then became an Official V3.0 Field Test site. All the ALL-IN-1
systems are patched to V3.0-1.
The system has four machines, of which the problem only occurs on two. To
reproduce the problem, log on to one of the nodes that does not work, create
three files in WP, then go to EM, bring up an index and select them, then do
an XF and <RETURN>. This will bring up the following message box:
+---------------------------------------------------------+
|Your xf operation failed on document entitled XF Test2 in|
|folder XF TEST. Your current document cannot be |
|established. |
| |
| |
|Press RETURN to continue or press EXIT SCREEN to abandon.|
+---------------------------------------------------------+
On pressing RETURN and 'Y' to send, a mail is sent, but contains only the last
document to be forwarded and the message text. The ALL-IN-1 system is on the
network so that if somebody from engineering can find time to log in and see
the problem at first hand please get in touch with Caroline Croft (774 6254)
for details.
I also compared ALL-IN-1 traces and there appear to be no differences except
for the error messages. Below are some snippets from the A1TRACE.LOG from the
node that has the problem:
![IO] Getting field DELETE from DOCDB, Value: Y
![IO] Getting field MODIFY from DOCDB, Value: Y
![SYMBOL] Symbol: OA$CURDOC, Value: XF TEST 274897
![IO] Getting record from CAB$PDAF, Key: [.DOC7]ZUSIMOBLI.WPL, Key-of-ref: D
! AF_KEY/0
![IO] Getting field FORWARDABLE from CAB$, Value:
![SYMBOL] Symbol: CAB$.FORWARDABLE[OA$CURDOC], Value:
![A1LOG] Entry: %OA-I-LOGERROR, %OA-W-CAB_NEED_CURDOC, Your current document ca
! nnot be established
![IO] Getting record from DOCDB, Key: CREATED 725097,
! Key-of-ref: DOCUMENT/0
![A1LOG] Entry: %OA-I-LOGFUN, Function: IFNOTSTATUS
![FUNC] Function: XOP, Cmd line: "~~TEST_FOR_X_ERROR~~"
![A1LOG] Entry: %OA-I-LOGFUN, Function: XOP "~~TEST_FOR_X_ERROR~~"
![SYMBOL] Symbol: "~~TEST_FOR_X_ERROR~~", Value: ~~TEST_FOR_X_ERROR~~
![SYMBOL] Symbol: EM$INDEX$OPTIONS, Value:
![FUNC] Function: GET, Cmd line: #X_TEXT = OA$MSG_TEXT
![A1LOG] Entry: %OA-I-LOGFUN, Function: GET #X_TEXT = OA$MSG_TEXT
![SYMBOL] Symbol: #X_TEXT = OA$MSG_TEXT, Value: Your current document cannot be
! established
![FUNC] Function: OA$MSG_PURGE, Cmd line:
![A1LOG] Entry: %OA-I-LOGFUN, Function: OA$MSG_PURGE
![SYMBOL] Symbol: #X_KEY, Value: XF TEST 274897
![IO] Getting record from DOCDB, Key: XF TEST 725103,
! Key-of-ref: DOCUMENT/0
![IO] Getting field TYPE from DOCDB, Value: DOCUMENT
![IO] Getting field MAIL_STATUS from DOCDB, Value:
![IO] Getting field DELETE from DOCDB, Value: Y
![IO] Getting field MODIFY from DOCDB, Value: Y
![FUNC] Function: SCRIPT, Cmd line: GENERAL_CURDOC_ERROR
![A1LOG] Entry: %OA-I-LOGFUN, Function: SCRIPT GENERAL_CURDOC_ERROR
!
![SCRIPT] Opening script GENERAL_CURDOC_ERROR (SCRIPT)
![IVP] Script GENERAL_CURDOC_ERROR opened
![SCRIPT] Function nesting level: 0. Script context follows:
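
When comparing traces like the one above, it can help to pull out only the
warning and error status codes. A minimal sketch in Python, assuming the trace
lines carry VMS-style status codes of the form %FACILITY-S-IDENT as seen in the
snippet; problem_statuses is a hypothetical helper, not part of ALL-IN-1:

```python
import re

# Matches VMS-style status codes such as %OA-W-CAB_NEED_CURDOC.
STATUS = re.compile(r"%([A-Z0-9$_]+)-([A-Z])-([A-Z0-9$_]+)")

def problem_statuses(trace_lines, severities=("W", "E", "F")):
    """Return the status codes whose severity letter is in `severities`."""
    found = []
    for line in trace_lines:
        for facility, severity, ident in STATUS.findall(line):
            if severity in severities:
                found.append(f"%{facility}-{severity}-{ident}")
    return found

# Three lines copied from the trace snippet in this note.
sample = [
    "![IO] Getting field FORWARDABLE from CAB$, Value:",
    "![A1LOG] Entry: %OA-I-LOGERROR, %OA-W-CAB_NEED_CURDOC, Your current document ca",
    "![A1LOG] Entry: %OA-I-LOGFUN, Function: IFNOTSTATUS",
]
print(problem_statuses(sample))
```

Informational (-I-) entries are skipped, so only %OA-W-CAB_NEED_CURDOC is
reported for the sample lines.
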
Finally, if there is anybody who has a similar hardware and software setup and
was an Official field test site for V3.0, would it be possible for you to try
and reproduce the problem?
Thanks in Advance,
Richard Simpson.
|
2429.20 | Don't you just hate this when this happens...!! | SUBURB::CROFTC | Caroline Croft | Mon Jun 14 1993 11:28 | 23 |
|
Hi,
In answer to .18 Yes - I have - However in absolute desperation on
Friday I compared sysgen params across the 2 types of Nodes, and
against the ALL-IN-1 installation guide, and the following parameters
were set (via autogen) to be too low.
MAXBUF
PIOPAGES
PQL_MENQLM
TTY_TYPAHDSZ
I reset these to the recommended minimum levels, rebooted, and
hey-presto - it worked.
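
The check described above (each node's SYSGEN values compared against the
installation guide's minimums) can be sketched roughly as follows. This is only
an illustration in Python: below_minimum is a hypothetical helper, and the
numbers are invented placeholders, not the values from the ALL-IN-1
installation guide or from SUBURB.

```python
from typing import Dict, Tuple

def below_minimum(current: Dict[str, int],
                  recommended: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {parameter: (current, recommended)} for values set too low."""
    return {
        name: (current[name], minimum)
        for name, minimum in recommended.items()
        if name in current and current[name] < minimum
    }

# Placeholder numbers for illustration only (NOT real recommendations).
recommended = {"MAXBUF": 2000, "PIOPAGES": 500,
               "PQL_MENQLM": 300, "TTY_TYPAHDSZ": 200}
node_values = {"MAXBUF": 1500, "PIOPAGES": 500,
               "PQL_MENQLM": 100, "TTY_TYPAHDSZ": 200}

for name, (have, want) in sorted(below_minimum(node_values, recommended).items()):
    print(f"{name}: {have} is below the assumed minimum {want}")
```

Running the same comparison for each node type makes it easy to see which
parameters differ between the working and failing machines.
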
SO I guess you can quite rightly blame us for this one!!
Regards
Caroline.
|
2429.21 | Nasty things these quotas! | IOSG::CHINNICK | gone walkabout | Mon Jun 14 1993 13:09 | 29 |
|
Hmmm... very interesting...
I've been looking at this problem and I couldn't figure out what could
be causing it. I even had to answer the SPR on it as 'Not Reproducible'.
It looks like it's caused by ENQLM from the info you've just given. Of
course, it could just be intermittent and have gone away?
[This is by elimination:
TTY_TYPAHDSZ - more chance of winning the lottery than this
PIOPAGES - possible but unlikely - I'd expect RMS$_DME if so
MAXBUF - not doing any buffered I/O here, so v. unlikely
]
Could you look at the account(s) where you were having the problem and
tell us what the ENQLM was set to??. These days, I suggest ENQLM=1200
for ALL-IN-1 because there are so many data-sets and other products
linked to it that it is very easy to blow this limit.
If it is ENQLM, then the failure is most likely to be when the lock is
taken out on the document. This should cause an OA$_CAB_DOCLOCKED error
which should be seen when GOLD\W was used. Can anyone confirm this was
the observed symptom??
Congratulations on finding it!
Paul.
|
2429.22 | | SUBURB::A1_CRAN | Caroline (A1_Cran) Croft | Mon Jun 14 1993 16:41 | 19 |
|
Hi,
ENQLM is 350 on most of our ALL-IN-1 accounts - sounds like we need to
up it somewhat.
We couldn't do a GOLD W on the error because it was trapped and then
the general curdoc error routine was called, which draws the box and
displays the 'current document could not be established' error. A trace,
however, generated the OA-W-CAB_NEED_CURDOC error.
The funny thing is, I have set the sysgen params on my VAXstation 3100
and still cannot replicate the problem, so could it be a 'funny' with
the 6520's perhaps...
Regards
Caroline
|
2429.23 | The answer is blowing in the wind... | IOSG::CHINNICK | gone walkabout | Mon Jun 14 1993 17:09 | 40 |
|
Well, 1200 is just my recommendation for ENQLM... RMS can get quite
lock hungry with the bucket, record, global buffer and file locks. Add
to this the explicit locks which ALL-IN-1 uses and you could get up to
quite a high number. Normally 350 might seem a lot, but I'd say it's a
bit too low. I think I've blown 300 before now.
OA$_CAB_NEED_CURDOC is normally an error in response to an attempt to
validate the current document. The code tries to check if the required
document is current for various CAB functions and if not tries to get
the record. If this get fails, you get the error. It can also happen in
other circumstances where the current document has not been properly
set and I was thinking that it was along these lines where a problem
was occurring. Errors such as a failure to obtain the document lock
cause OA$_CAB_DOCLOCKED to be signalled. In either case, you'll get
this problem of 'no current document'.
In any case, I think quotas are a good candidate. Of course, RMS
structure errors, disk problems and the normal myriad of I/O failures
could all give this message. So ENQLM is not the only possibility!
The nature of this error is likely to be very sensitive to your
configuration. ALL-IN-1 may be locking a lot of records to process an
XF operation and depending on the nature of these documents you could
be locking 10's to 100's of SDAF records alone. Add to this any other
data-sets which are active or held open and you soon see that it will
depend on the documents you are forwarding, the file tuning options,
the size of the system, the values for the quotas and parameters, the
direction of the wind, which side you got out of bed, etc etc...
And, the SYSGEN parameters - even if dynamic - would only apply at
login time. You could try the test of using SDA to check the remaining
ENQLM before an XF operation and then during and after to see what sort
of numbers are being used. PQL_MENQLM is the minimum value - it only
overrides the UAF quota value if this value is less than the minimum -
maybe your account on the 3100 has a higher value?
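
As a rough sketch of the two points above (the PQL_MENQLM floor and the
before/during/after SDA readings), assuming the rule works exactly as stated in
this note. The function names and the sample readings are made up for
illustration; they are not VMS or SDA interfaces.

```python
def effective_enqlm(uaf_enqlm: int, pql_menqlm: int) -> int:
    """Quota the process gets: the UAF value, unless it is below the minimum."""
    return max(uaf_enqlm, pql_menqlm)

def locks_in_use(quota: int, remaining: int) -> int:
    """Locks currently charged against the quota, given the remaining count."""
    return quota - remaining

# 350 is the UAF value mentioned in .22; the PQL_MENQLM value is assumed.
quota = effective_enqlm(uaf_enqlm=350, pql_menqlm=300)

# Hypothetical "remaining ENQLM" readings taken around an XF operation.
readings = {"before": 310, "during": 12, "after": 305}
for stage, remaining in readings.items():
    print(f"{stage}: {locks_in_use(quota, remaining)} locks in use of {quota}")
```

If the "during" figure gets close to the quota, ENQLM is a likely suspect.
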
Paul.
|