T.R | Title | User | Personal Name | Date | Lines |
---|
3038.1 | What was the server error | SYSTEM::HELLIAR | http://samedi.reo.dec.com/ | Wed Mar 05 1997 08:11 | 19 |
| 3. The customer is running OBB V2.5A. At regular intervals the customer
issues commands like
trade track appl/type=trans/database=live/dir=in/partner=tst
trade track appl/type=doc/database=live/dir=in/partner=tst/current=avail
trade track appl/type=doc/database=live/dir=out/part=tst/current=avail
When there's a lot of activity on the system or there are a lot of documents
in the databases then some of the above commands will fail with the error
message "server error". The next run sometimes doesn't show the error,
so it looks like the error message is load related.
>> 'Server Error' should indicate that the server has detected an
>> error, and reported it in the server error log (@DECEDI$LOOK, or
>> decedi_look). Is there anything in the error log indicating the
>> underlying cause?
Graham
|
3038.2 | I thought it was empty | UTRTSC::SMEETS | Workgroup support | Wed Mar 05 1997 10:44 | 23 |
| Hi Graham,
Thanks for your reply.
>> 'Server Error' should indicate that the server has detected an
>> error, and reported it in the server error log (@DECEDI$LOOK, or
>> decedi_look). Is there anything in the error log indicating the
>> underlying cause?
I was onsite when this happened and if I'm correct there was nothing in
DECEDI$LOOK with respect to the server error.
I'll ask the customer to perform the Trade track commands again and closely
watch the DECEDI error log when the server error occurs.
How about enabling OBB tracing (define OBB_TRACE_FLAGS "RT")? Shouldn't there
also be a decedi.err file?
Any ideas about the other problems/observations?
Thanks,
Martin
|
3038.3 | 1,2,&3 | SYSTEM::HELLIAR | http://samedi.reo.dec.com/ | Wed Mar 05 1997 14:27 | 26 |
| Martin:
1. LIST TRANS/ARCH works fine on my T3.2 system. However PARTNER is
only stored on outbound transmissions. So in my case /PARTNER= only
matched the outbound half. /SINCE and /DIRECTION also worked OK.
2. The first time you enter INTERCHANGE and do LIST DOCUMENT you will
establish a new connection to the database server. The database server
will then take your query and process it. As it's the first time
you've requested it, you don't benefit from the database caching, so it may
take some time. Subsequent calls within the SAME interchange session
should be quicker.
3. If it's reporting server error but not generating something in the
server error log then something is HIGHLY suspect. One case that I have
experienced on OSF is the request causing the database to generate
a work file in which it does such things as sorting. This then exhausts
the available disk space and thus causes the call to fail. If the error
log is on the same disk then there is no room to actually log the
error message. The other occasion I have seen server error but nothing
in the server error log was when the user was pointing at the wrong
server and hence looking in the wrong server's error log. Using the
OBB_TRACE_FLAGS will confirm you're talking to the right server but will
not catch the RDB error code, which is what I think you're suffering from.
Graham
|
3038.4 | | FORTY2::DALLAS | Paul Dallas, DEC/EDI @REO2-F/E2 | Thu Mar 06 1997 09:33 | 12 |
| On OpenVMS check the definition of sys$scratch. This needs to point to
a disk that:
a) DEC/EDI and Rdb can access
b) has enough space to store the query needed.
Rdb uses memory to process its sort lists, but if the list is long
(lots in the database) it will use the sys$scratch device. If this
device is protected such that Rdb can't write to it, the sort list
will not be written. Similarly if there is insufficient disk space.
I haven't seen "Server Error" with this, but I have seen commands
fail to find data that I knew was in the database.
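As a rough illustration of conditions (a) and (b), here is a minimal Python sketch that checks whether a scratch directory can be written to and has enough free space. It is not DEC/EDI or Rdb code: the function name check_scratch and the required_bytes figure are hypothetical, and on OpenVMS the path would be whatever sys$scratch translates to.

```python
import os
import shutil
import tempfile


def check_scratch(path, required_bytes):
    """Return a list of problems found with a scratch directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]

    problems = []

    # (a) Can this process actually create a file in the scratch area?
    try:
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")

    # (b) Is there enough free space for a work file of the assumed size?
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        problems.append(f"only {free} bytes free, {required_bytes} required")

    return problems
```

An empty list means neither condition was violated at the moment of the check, under the account that ran it. Running it from the account the server uses matters, since protection is per user.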
|
3038.5 | Don't think that's the cause of the problem | UTRTSC::SMEETS | Workgroup support | Thu Mar 06 1997 15:36 | 17 |
| Hi Paul,
>> On OpenVMS check the definition of sys$scratch.
This should point to the sys$login directory of the user.
>> This needs to point to a disk that:
>> a) DEC/EDI and Rdb can access
>> b) has enough space to store the query needed.
I don't think that a) or b) applies to this problem, given the intermittent
appearance of the problem.
But anyway I'll check with the customer next week.
Martin
|
3038.6 | | FORTY2::DALLAS | Paul Dallas, DEC/EDI @REO2-F/E2 | Fri Mar 07 1997 09:27 | 10 |
| Martin,
This is an intermittent problem. It only appears when the sort list
reaches the critical size that makes Rdb choose to use disk storage
to sort the list rather than in-memory storage. Certainly if either
(a) or (b) apply then you will see the problem every time the sort
list becomes too large to handle in memory, but you won't see it if
the sort list is small.
P.
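To make the intermittency concrete, here is a toy Python model of a sort that stays in memory below a threshold and spills to a scratch area above it. The threshold, the row counts and the function run_query are invented for illustration; the size at which Rdb actually switches to disk is not stated in this thread.

```python
def run_query(row_count, memory_limit_rows, scratch_ok):
    """Model a sort that runs in memory when small and spills to scratch when large."""
    if row_count <= memory_limit_rows:
        return "ok (sorted in memory)"
    if scratch_ok:
        return "ok (sorted on disk)"
    return "failed (could not write sort work file)"


# Same misconfigured scratch area every time; only the amount of data varies.
for rows in (100, 5_000, 50_000):
    print(rows, run_query(rows, memory_limit_rows=10_000, scratch_ok=False))
```

With a fixed fault in the scratch area, only the runs that exceed the in-memory limit fail, which from the outside looks like a load-related, intermittent error.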
|
3038.7 | Fix ? | UTRTSC::SMEETS | Workgroup support | Fri Mar 07 1997 10:03 | 14 |
| Hi Paul,
>> This is an intermittent problem. It only appears when the sort list
>> reaches the critical size that makes Rdb choose to use disk storage
>> to sort the list rather than in-memory storage. Certainly if either
>> (a) or (b) apply then you will see the problem every time the sort
>> list becomes too large to handle in memory, but you won't see it if
>> the sort list is small.
Can this problem be fixed? If yes, do you need an IPMT?
Thanks,
Martin
|
3038.8 | | METSYS::THOMPSON | | Fri Mar 07 1997 10:42 | 75 |
| Martin,
To help solve problems like these we need all the information that can
be gathered.
>1. Are there known bugs with respect to the list trans/arch command?
> If there are some 200 TF's (using the same connection, and the same partner)
> in the Archive database and I issue the command list trans/arch then I'll
> see all 200 TF's.
>
> If however I issue the command list trans/arch/partner=tst then there are
> no TF's returned, however the partner = TST........
> The same applies to combinations with /since or /direction
As far as we know there are no bugs whatever in list trans.
However anything from insufficient process quotas to file protection
violations can conspire to prevent it working. So gather as much information
as possible. A "well worn" phrase we often hear around here is "there was
nothing interesting in the error log" - when in practice there nearly
always is. Even literally nothing can indicate file protection problems.
So:
a/ make sure "DECEDI$LOG_SEVERITY" = "INFORMATIONAL", at least while this is
being debugged. Extract and post whatever is there for some minutes
before and after the test.
b/ If there are protection problems - let VMS tell you all about them:
$ reply/enable ! from an Oper Account
$ set audit/alarm/enable= ...
$ ! for ... substitute protection, quota alarms, etc..
c/ Note the time of the test and look in sys$system:RDMMON51.LOG (for 51, substitute
whatever version of RDB is being used). Were there any significant events
in there?
d/ Were there any RDB crash dumps?
e/ Do the "list trans" with the help of an RDB expert that knows how to
specify and interpret the RDMS Trace flags. This was documented in
Appendix C of the "RDB Guide to Database Performance and Tuning" - I have
old manuals though.
I have seen this type of problem before. What had happened was that one
of the indexes had become corrupted. When you use "list tr/arch" that
would use the working index, whereas list tr/partn=xxx would use the
broken index.
You could try rmu/unload to dump the contents of the Archive Database. Then
re-create it to the same size and rmu/load it. Perhaps even dropping the index
and re-creating it may help.
Check that you have the very latest ECO for your Oracle RDB version.
>
>3. The customer is running OBB V2.5A. At regular intervals the customer
> issues commands like
>
>
> trade track appl/type=trans/database=live/dir=in/partner=tst
> trade track appl/type=doc/database=live/dir=in/partner=tst/current=avail
> trade track appl/type=doc/database=live/dir=out/part=tst/current=avail
> When there's a lot of activity on the system or there are a lot of documents
> in the databases then some of the above commands will fail with the error
> message "server error". The next run sometimes doesn't show the error.
All of the above comments apply. In addition - look in rmu/show statistics to
see if you are getting stalls and deadlocks.
Mark
|
3038.9 | | METSYS::THOMPSON | | Fri Mar 07 1997 10:46 | 7 |
|
Another tip, to help resolve protection problems: do the operation from
a "normal" account and then from one with all privileges enabled. Does
it make a difference? Often you can "blast thru" protection restrictions
with a big enough sledge hammer.
|
3038.10 | | FORTY2::DALLAS | Paul Dallas, DEC/EDI @REO2-F/E2 | Fri Mar 07 1997 14:34 | 12 |
| It's not clear that this is a problem with sys$scratch. That is just
one of the possibilities to check (and possibly to eliminate). As Mark
pointed out there are several possible reasons why the list trans could
go wrong, not all due to DEC/EDI.
If this *is* a problem with sys$scratch, then the solution is either to
set up sys$scratch correctly or raise an IPMT against Rdb. It is
possible that DEC/EDI could return a (more) meaningful error, but the
sys$scratch issue cannot be fixed within DEC/EDI, as it is created by Rdb.
However, we're still at the stage of identifying the problem.
|
3038.11 | Monday we'll start testing | UTRTSC::SMEETS | Workgroup support | Fri Mar 07 1997 15:06 | 12 |
| Hi Mark and Paul,
Thank you both for your replies.
Monday my customer will be back in the office and I'll ask him to perform some
tests.
I'll keep you informed.
Have a nice weekend,
Martin
|