
Conference orarep::nomahs::rdb_60

Title:Oracle Rdb - Still a strategic database for DEC on Alpha AXP!
Notice:RDB_60 is archived, please use RDB_70..
Moderator:NOVA::SMITHISON
Created:Fri Mar 18 1994
Last Modified:Fri May 30 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:5118
Total number of notes:28246

5030.0. "Any other place to see why distributed transactions aborted?" by BROKE::BASTINE () Fri Feb 14 1997 13:17

I have a customer who has a remote "Headquarters" system that they load data
into on a nightly basis.  The load is issuing a remote connection to do its
work.

Once in a while the load will fail with:

"Distributed transaction was aborted, NOSECERR, unable to determine secondary
error"

This is expected and documented; we know that.  However, the only place to
look for why it failed is the NETSERVER.LOG.  The customer knows this, but
has a problem: if the error occurs at 3:00am, the NETSERVER.LOG for the
process that failed has already been purged.  Is there ANY other place to
look for why it failed?  Is there some logical or process they could put in
place to force such errors to be logged in a log file that won't get purged?
According to the customer (I don't know much about the NETSERVER stuff),
the logs are purged as a function of the network and how it works.  Is there
any way to stop this purging, or a way in the RDBSERVER.COM file to capture
remote errors somewhere else so that they will have a record of what happened?

Are there any DECdtm log files that might capture why it aborted the
transaction?

Thanks,
Renee

5030.1. "" by DUCATI::LASTOVICA (Is it possible to be totally partial?) Fri Feb 14 1997 13:24

you could, I suppose, edit SYS$SYSTEM:NETSERVER.COM to remove
or alter how the purging is done.  Off the top of my head,
something along the lines of:

change:
	$ IF F$SEARCH("SYS$LOGIN:NETSERVER.LOG;-10") .NES. "" THEN -
 	       PURGE /KEEP:3 SYS$LOGIN:NETSERVER.LOG

to:
	$ ! Once a version 100 generations old exists, save all versions, then purge
	$ IF F$SEARCH("SYS$LOGIN:NETSERVER.LOG;-100") .NES. "" 
	$ THEN	TYPE/OUTPUT=SYS$LOGIN:OLDLOGS.TXT SYS$LOGIN:NETSERVER.LOG;*
	$	PURGE SYS$LOGIN:NETSERVER.LOG/KEEP=3
	$ ENDIF

this may result in some duplicates in OLDLOGS.TXT;*, but it might help
avoid missing stuff.  Or, just turn off purging altogether and just do it
manually once a day or so.
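
A possible variant of the edit above, as an untested sketch: append only the
previous log version to a cumulative file each time a netserver starts, which
should cut down on the duplicates.  OLDLOGS.TXT is the name used above; the
placement in NETSERVER.COM, and the assumption that version ;-1 is already
closed when this runs, are guesses rather than documented behavior.

	$ ! Sketch only, untested: carry the previous log forward before purging.
	$ IF F$SEARCH("SYS$LOGIN:NETSERVER.LOG;-1") .NES. ""
	$ THEN
	$	! /NEW_VERSION creates OLDLOGS.TXT if it does not exist yet
	$	APPEND/NEW_VERSION SYS$LOGIN:NETSERVER.LOG;-1 SYS$LOGIN:OLDLOGS.TXT
	$	PURGE/KEEP=3 SYS$LOGIN:NETSERVER.LOG
	$ ENDIF

With several netservers starting at once, a version could still be appended
twice or missed, so this is a convenience and not a guarantee.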

5030.2. "What if they have the log files and there is nothing in them" by BROKE::BASTINE () Fri Feb 14 1997 20:29

Thanks for the idea... it is worth a try.


On a similar note, the customer had the same situation occur today,
during the day, so the netserver.logs were still around, and he couldn't
find any errors in any of the netserver.log files to indicate why the
distributed transaction aborted.  He is really getting worried that there
is a problem with their code that makes RDB fail and makes the distributed
transaction fail, but leaves no trace of what the problem is so that they
can fix it!

Has anyone ever seen a situation where these errors occurred and there was
nothing logged in the netserver.log?

Renee

5030.3. "" by NOVA::SMITHI (Don't understate or underestimate Rdb!) Fri Feb 14 1997 22:20

Another thing to do is something like

	SEARCH NETSERVER.LOG;-1 "RDB"/WINDOW=(1,5)

if every netserver did this then the log would carry forward the information
(if it was logged at one time).  This might be easier than just incorporating
the older logs in the latest.

Ian
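
The SEARCH above could be guarded so that it is skipped when no earlier log
exists.  A minimal sketch, assuming the line is added to NETSERVER.COM and
that the logs live in SYS$LOGIN as in .1:

	$ ! Sketch only: echo Rdb-related lines from the previous log into this one.
	$ IF F$SEARCH("SYS$LOGIN:NETSERVER.LOG;-1") .NES. "" THEN -
		SEARCH SYS$LOGIN:NETSERVER.LOG;-1 "RDB"/WINDOW=(1,5)

Because each new log would then contain the matching lines from its
predecessor, anything logged once should survive the purge of older versions.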

5030.4. "remote monitor log?" by NOVA::BRYDEN () Sun Feb 16 1997 14:17

        Can you see the remote attach appearing in the monitor log? Are
        there any errors reported there?

5030.5. "We did check the monitor logs, nothing abnormal" by BROKE::BASTINE () Mon Feb 17 1997 08:30

We checked the monitor logs on the remote node.  All attaches look normal
and there were no unexpected or abnormal terminations.  According to the 
monitor log, all was fine.

Renee

5030.6. "" by NOVA::BRYDEN () Mon Feb 17 1997 15:49

        Renee,
        
        	Can you actually see the request coming in and being logged
        in the remote node's monitor log? I assume from your base note the
        customer is doing 2 attaches: one local, one remote. If you get the
        timestamp from the local attach, you should see one in the remote
        monitor log around the same time...  does this attach look
        correct?
        
        Dave

5030.7. "Will check again..." by BROKE::BASTINE () Tue Feb 18 1997 09:51

I didn't actually see the monitor log, but the customer looked at it.  He
knew the time of the failure and saw nothing abnormal in any of the attaches
right before it.  They all looked like normal, successful attaches and
disconnects.

However, the customer did say that they have a great many processes attaching
in this manner each day, so it would be difficult to tell which one
specifically was the failed process.  I can ask him to check again to see
if the process registered an attach at all.

Thanks,
Renee

5030.8. "More information" by BROKE::BASTINE () Tue Feb 18 1997 13:12

I spoke with the customer again.  He had only done a blanket check of the
monitor log because he didn't know the pid that failed, but he said he could
probably find it and trace the pid to see if it did a successful
attach/disconnect from Rdb.

They did find some corruption in the database over the weekend, and
think that may have been the reason for the aborted distributed transactions,
but he won't be sure until the aborts stop appearing.  If this is
the case, neither the netserver.log nor any other log we looked at recorded
any errors, which was rather disconcerting to the customer.

Thanks
Renee
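
For tracing one process through the monitor log, something like the following
might narrow the search.  This is only a sketch: the log file name
SYS$SYSTEM:RDMMON.LOG is assumed (the monitor log may be elsewhere on a given
system), and the PID shown is made up.

	$ ! Sketch only: log location and PID are assumptions for illustration.
	$ pid = "2040A1B3"	! hypothetical PID of the process that failed
	$ SEARCH SYS$SYSTEM:RDMMON.LOG 'pid' /WINDOW=(2,10)

The /WINDOW qualifier shows 2 lines before and 10 lines after each match, so
the attach and disconnect entries for that PID can be read in context.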