Title: USG buildhelp questions/answers
Moderator: SMURF::FILTER
Created: Mon Apr 26 1993
Last Modified: Mon Jan 20 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2763
Total number of notes: 5802
2570.0. "Here's the file that srequest cannot parse -- please help ASAP" by AOSG::FILTER (Automatic Posting Software - mail to flume::puck) Mon Sep 16 1996 18:02
Date Of Receipt: 16-SEP-1996 16:39:01.60
From: KAMLIA::kucherov "sergei kucherov 16-Sep-1996 1637"
To: odehelp@DEC:.zko.kamlia
CC: kucherov@DEC:.zko.kamlia
Subj: Here's the file that srequest cannot parse -- please help ASAP
Note that I've tried 3 times with srequest -nofill -form file_below
and it's failed the same way each time.
-----------------------------------------------------------------------------
Submit Request Form
Digital Internal Use Only
USEG Support Pool Submit Request Form
(Form version 2.7)
================Section 1. Patch Identification=================
1a) Patch Announcement Summary
This patch fixes some hangs that can occur during the "syncing disks..."
portion of panic processing, improves the reliability of getting a dump
after a system panic, and also makes it more likely that AdvFS buffers will
be synced to disk after a system panic.
1b) CLD/SPR/QAR information
CLD/QAR/SPR number(s) Priority Component(s)
--------------------- -------- ------------
HPXQ43C4D 2 FILE SYSTEMS
TKTR52185 2 KERNEL
QAR 46016 S ADVFS
1c) Release Note Information
======================================================================
REQUIRED PATCHES (other patches that are MANDATORY to install WITH this patch):
FILES TO BE DISTRIBUTED:
/usr/sys/BINARY/machdep.o RCS:
/usr/sys/BINARY/pmap.o RCS:
/usr/sys/BINARY/kern_clock.o RCS:
/usr/sys/BINARY/kern_synch.o RCS:
/usr/sys/BINARY/xpt.o RCS:
/usr/sys/BINARY/xcr_port.o RCS:
/usr/sys/BINARY/pr.o RCS:
/usr/sys/BINARY/msfs_io.o RCS:
/usr/sys/BINARY/msfs_vfsops.o RCS:
/usr/sys/BINARY/vfs_bio.o RCS:
/usr/sys/vfs/vfs_conf.o RCS:
INSTALLATION INSTRUCTIONS:
A kernel rebuild is required.
PROBLEM: ( HPXQ43C4D, TKTR52185, QAR 46016 ) (Patch ID: <Ignore this> )
This patch fixes some hangs that may occur after the message
"syncing disks..." is printed when the system panics. When these hangs
occur, the completion of the "syncing disks..." message (the word "failed"
or "done") does not get printed, and the system does not take a dump.
In addition to fixing these known hangs, a timeout mechanism is added to the
"syncing disks" logic that will improve the reliability of getting a dump by
using the system clock to break out of the "syncing disks" path and take
a dump if no progress is being made on reducing the number of buffers to be
flushed. The numbers printed periodically between the "syncing disks..."
and "done" messages are the number of buffers left to flush.
This patch also makes it more likely that AdvFS buffers will be flushed to
disk during the "syncing disks..." processing after a system panic. There
is still no guarantee that writes in progress at the time of a panic will be
completed.
1d) Internal description
In the boot() routine, which is called on a panic, the current delay loop
logic that waits for all the buffer flushes started by mntflushbuf() calls
for local filesystems can hang due to device problems, resource shortages in
drivers, or giving up control to a looping thread at the thread_preempt()
call. The change to this routine, along with the change to hardclock(),
forces a timeout of the loop if, within one second, no progress is made in
reducing the number of busy buffers returned by mntbusybuf(). If the
timeout occurs after a panic, hardclock() issues a second panic which goes
directly to take a dump. This is the logic to prevent new hangs on system
panics after the "syncing disks..." message has been printed.
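The progress-based timeout described above can be sketched as a small
user-space simulation. This is only an illustration under stated
assumptions, not the kernel code: HZ, sync_with_timeout() and the
busy_at_tick[] array are hypothetical stand-ins for hz, the boot() loop and
the values mntbusybuf() would return, and the countdown that hardclock()
performs in the kernel is folded into the loop itself.

```c
#include <assert.h>
#include <stddef.h>

#define HZ 100   /* hypothetical clock rate in ticks per second */

/*
 * Simulate the sync loop. busy_at_tick[t] is the busy-buffer count seen at
 * clock tick t. Returns the count still busy when the loop ends:
 * 0 corresponds to "done", nonzero to a timeout ("failed").
 */
int sync_with_timeout(const int *busy_at_tick, size_t nticks)
{
    int ticks_left = 5 * HZ;   /* initial allowance, as in the patch */
    int prev_nbusy = 0;
    int nbusy = 0;
    size_t t;

    assert(busy_at_tick != NULL);

    for (t = 0; t < nticks; t++) {
        nbusy = busy_at_tick[t];
        if (nbusy == 0)
            break;                    /* every buffer has been flushed */
        if (nbusy != prev_nbusy) {    /* progress: allow one more second */
            prev_nbusy = nbusy;
            ticks_left = HZ;
        }
        if (--ticks_left <= 0)        /* stand-in for the hardclock() countdown */
            break;                    /* no progress for a full second */
    }
    return nbusy;
}
```

Calling sync_with_timeout() with a count that keeps shrinking to zero
returns 0, while a count that stays fixed makes the loop give up after HZ
ticks instead of spinning forever.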
The change to pmap_unload() is to prevent it from calling pmap_tbsync()
on a panic. The other cpus have been stopped so it isn't necessary, and
pmap_update_send() uses event_timeout() which will do a secondary panic if
panicstr is set. A second panic goes directly to take a dump stopping the
"syncing disks". This change is made because advfs, presto, and possibly
other device drivers use pmap_unload() in code executed for "syncing disks".
The change to mpsleep() is to return EINTR on an interruptible sleep and to
preempt if the sleep is uninterruptible. The EINTR return is a backport of
a platinum fix by Jim Woodward. These changes prevent a hang if control is
given to another thread via thread_block() while syncing disks. If the panic
thread gave up control due to a temporary shortage (such as out of driver
ccb's) that will be corrected on an interrupt, returning EINTR or preempting
in mpsleep() will keep the next thread running from taking over the cpu and
preventing the panic thread from running again.
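As a rough model of the panic-time decision in mpsleep(), the two outcomes
can be written as one small function. The name panic_sleep_result() and the
preempted flag are hypothetical and exist only for this sketch; the real
change sets error = EINTR or calls thread_preempt() inside mpsleep().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Model of what a sleeper sees once the system is panicking.
 * Returns the error handed back to the caller (EINTR or 0) and records in
 * *preempted whether the thread only yields through a preempt.
 */
int panic_sleep_result(int interruptible, int *preempted)
{
    assert(preempted != NULL);
    *preempted = 0;
    if (interruptible)
        return EINTR;   /* caller backs out instead of sleeping */
    *preempted = 1;     /* uninterruptible: yield briefly, then continue */
    return 0;
}
```

Either way the thread never blocks indefinitely, which is the property the
panic thread depends on to regain the cpu.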
The change to xpt_callback() causes it to bypass xpt_callback_thread when
syncing disks after panic because the xpt_callback_thread may not get to run.
If the xpt_callback_thread was next to be dispatched on another stopped cpu,
or was running on a stopped cpu, it will not run to process interrupts while
syncing disks. The changes in the scheduler or boot() to make sure the
xpt_callback_thread could run after a panic were determined to be too much of
a change for a patch.
The change to xcr_cmd_cmplt() fixes a hang that can occur in the re driver
if enough i/o was in progress at the time of a panic to use the pending queue.
The fix was provided by Bill Dallas.
The change to PRstrategy() prevents writes from getting out of order when
syncing disks after a panic and flushing presto buffers on reboot. It removes
a direct call to the device driver on panic which bypassed the presto cache.
The direct call to the driver sent writes directly to disk while syncing
disks after a panic, without invalidating data from the same blocks in nvram.
On reboot any dirty nvram blocks flushed could overwrite the last data
written on the panic getting writes out of order (possible corruption).
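The ordering problem can be reproduced with a toy model of a disk fronted
by an nvram cache. Everything here (struct store, cached_write(),
direct_write(), flush_on_reboot()) is a hypothetical simplification for
illustration and is not the presto driver interface.

```c
#include <assert.h>

#define NBLOCKS 4

struct store {
    int disk[NBLOCKS];    /* block contents on disk */
    int nvram[NBLOCKS];   /* block contents held in the cache */
    int dirty[NBLOCKS];   /* 1 if nvram[] holds data not yet on disk */
};

/* Normal path: the write lands in the cache and is marked dirty. */
void cached_write(struct store *s, int blk, int value)
{
    assert(blk >= 0 && blk < NBLOCKS);
    s->nvram[blk] = value;
    s->dirty[blk] = 1;
}

/* Old panic path: write straight to disk, leaving the cache entry alone. */
void direct_write(struct store *s, int blk, int value)
{
    assert(blk >= 0 && blk < NBLOCKS);
    s->disk[blk] = value;
}

/* Reboot: every dirty cache entry is flushed to disk. */
void flush_on_reboot(struct store *s)
{
    int i;
    for (i = 0; i < NBLOCKS; i++) {
        if (s->dirty[i]) {
            s->disk[i] = s->nvram[i];
            s->dirty[i] = 0;
        }
    }
}
```

If block 1 is first written through the cache and then rewritten with
direct_write(), flush_on_reboot() puts the older cached value back on disk.
Sending both writes through cached_write() keeps the newest value.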
The change to call_disk() just keeps it from preempting if the system has
panicked. The preempt was added to call_disk() to give threads of a higher
priority a chance to run during a long i/o path. This isn't necessary once
the system has panicked, and takes control from the panic thread possibly
allowing another thread to run that will do a second panic and thereby stop
syncing disks.
The new routine msfs_mntflushbuf() is called by mntflushbuf() if the local
filesystem is AdvFS, to flush AdvFS buffers for the domain. (To sync disks,
boot() calls mntflushbuf() for each locally mounted filesystem. Because
AdvFS is not tightly integrated with VFS and the UBC, AdvFS buffers don't get
flushed by mntflushbuf().)
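A minimal sketch of the dispatch, assuming the generic flush still runs and
the AdvFS hook is an additional call for AdvFS mounts. The types and the
*_model names are hypothetical; only mntflushbuf() and msfs_mntflushbuf()
come from the description above.

```c
#include <assert.h>
#include <stddef.h>

enum fs_kind { FS_OTHER, FS_ADVFS };   /* hypothetical type tags */

struct mount_model {
    enum fs_kind kind;
    int vfs_dirty;     /* buffers the generic flush can see */
    int advfs_dirty;   /* AdvFS buffers the generic flush does not see */
};

/* Stand-in for msfs_mntflushbuf(): flush the domain's AdvFS buffers. */
static void msfs_flush_model(struct mount_model *mp)
{
    mp->advfs_dirty = 0;
}

/* Stand-in for mntflushbuf(): generic flush plus the AdvFS hook. */
void mntflushbuf_model(struct mount_model *mp)
{
    assert(mp != NULL);
    mp->vfs_dirty = 0;
    if (mp->kind == FS_ADVFS)
        msfs_flush_model(mp);
}

/* Buffers still dirty after a flush pass. */
int dirty_total(const struct mount_model *mp)
{
    return mp->vfs_dirty + mp->advfs_dirty;
}
```

Without the FS_ADVFS branch, an AdvFS mount would still report its
advfs_dirty buffers after the flush pass, which is the gap the new routine
closes.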
================Section 2. Pool integration=================
2a) Which support pool(s) do you plan to submit to?
v40asupportos [s] v40asupportx11 [ ] PTA support pools
v40asupportcde [ ] v40asupportdx [ ]
v40supportos [s] v40supportx11 [ ] Platinum support pools
v40supportcde [ ] v40supportdx [ ]
v32gsupportos [s] v32gsupportx [ ] MP2 support pools
v32fsupportos [s] v32fsupportx [ ] HW6 support pools
v32de2supportos [s] v32de2supportx [ ] HW5 and V3.2E-2 support pools
v32de1supportos [*] v32de1supportx [ ] MP1 and V3.2E-1 support pools
v32csupportos [s] v32csupportx [ ] Platinum Lite support pools
v32bsupportos [ ] v32bsupportx [ ] Hardware release for Gold Minor
v32supportos [ ] v32supportx [ ] Gold Minor support pools
v30bsupportos [ ] v30bsupportx [ ] Hardware release for Gold
v30supportos [ ] v30supportx [ ] Gold support pools
v20bsupportos [ ] v20bsupportx [ ] Hardware release for Sterling
v20supportos [ ] v20supportx [ ] Sterling support pools
tcr14supportos [ ] tcr14supportdx [ ] based on OS V4.0A
tcr1supportos [s] tcr1supportdx [ ] based on OS V3.2DE-1
ase13supportos [ ] ase13supportdx [ ] based on OS V3.2DE-1
ase12asupportos [ ] based on OS V3.2C
ase12supportos [ ] based on OS V3.2
ase11supportos [ ] based on OS V3.0
2b) FYI - Does this patch need to be submitted to the development pool(s)?
PTB [ ]
PTC [ ]
Steel [X] QAR 48140 has been opened to address handling
thread problems differently in steel
2c) Ported from:
v32de1supportos-63-amilicia
2d) Baselevel:
Baselevel do you wish to submit to?
BL 0 of project v32de2supportos
2e) Integration log:
cat ../link/Logs/Version.log:
Start build: Wed Sep 11 17:30:09 EDT 1996
automatic nightly build
project-baselevel: V32DE2SUPPORTOS-BL0
version.build: 124
version.type: P
version.variant: DE-2
Done build: Wed Sep 11 21:09:15 EDT 1996
Start install: Wed Sep 11 21:09:56 EDT 1996
Done install: Wed Sep 11 22:51:29 EDT 1996
================Section 3. Testing=================
3a) Code and Patch Readme Reviewers:
Anne Milicia
3b) Functional Testing - Prior to srequest:
I tested shutdown, panic, and reboot, with and without heavy i/o in progress
on:
- a 3 cpu GAMMA with and without presto enabled to rz28's
- a 4 cpu SABLE with i/o to a raid disk
- a 1 cpu AVANTI
Built all kernels including AVANTI.LITE which does not have MSFS option.
3c) Regression Testing:
Delete lines that do not apply to your change.
Tested GENERIC YES
Tested SAS N/A
Tested REALTIME N/A
Tested INSTALL N/A
3d) Test Instruments:
================Section 4. Customer Impacts=================
4a) For shared libraries only:
ORIGINAL:
CURRENT:
NEW:
4b) Compatibility impacts:
NO:
4c) Standards Compliance:
NO:
================Section 5. Inventory Content Changes=================
5a) Changed inventories:
NEW inventory files:
CHANGED inventory files:
DEFUNCT inventory files:
5b) List Source files:
bstat -all for the list of files and the revs:
[ ./kernel/arch/alpha/machdep.c ]
version 1.2.243.2 selected setname Sergei_Kucherov_x32de2
[ ./kernel/bsd/kern_synch.c ]
version 4.3.45.2 selected setname Sergei_Kucherov_x32de2
[ ./kernel/io/dec/eisa/xcr_port.c ]
version 1.1.58.2 selected setname Sergei_Kucherov_x32de2
================Section 6. Code differences=================
6) Code Diffs:
bdiff -r$NEW -all -c >& bdiff.log
[ ./kernel/arch/alpha/machdep.c ]
===================================================================
RCS file: ./kernel/arch/alpha/machdep.c,v
retrieving revision 1.2.166.5
diff -c -r1.2.166.5 OdeSrvrTmpSergei_Kucherov026820/machdep.c
*** 1.2.166.5 1995/10/26 21:43:19
--- OdeSrvrTmpSergei_Kucherov026820/machdep.c 1996/09/16 20:21:51
***************
*** 4,13 ****
/*
* HISTORY
* $Log: machdep.c,v $
* Revision 1.2.166.5 1995/10/26 21:43:19 Mark_Bozen
* merge of MP1 BL1 changes
* [1995/10/26 21:35:31 Mark_Bozen]
! *
* Revision 1.2.166.4 1995/08/04 16:35:45 Mark_Bozen
* Minor changes to warning messages in mces_sce_handle and
* mces_pce_handle.
--- 4,19 ----
/*
* HISTORY
* $Log: machdep.c,v $
+ * Revision 1.2.243.2 1996/09/16 20:01:05 Sergei_Kucherov
+ * Change syncing disks in boot() to continue as long as progress is
+ * being made. Use hardclock() to guarantee a timeout within 1 second
+ * of no progress. QAR 46016.
+ * [1996/08/29 15:41:26 Ann_Milicia]
+ *
* Revision 1.2.166.5 1995/10/26 21:43:19 Mark_Bozen
* merge of MP1 BL1 changes
* [1995/10/26 21:35:31 Mark_Bozen]
! *
* Revision 1.2.166.4 1995/08/04 16:35:45 Mark_Bozen
* Minor changes to warning messages in mces_sce_handle and
* mces_pce_handle.
***************
*** 960,966 ****
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: machdep.c,v $ $Revision: 1.2.166.5 $ (DEC) $Date:
1995/10/26 21:43:19 $"
#include <sys/types.h>
#include <machine/reg.h>
--- 966,972 ----
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: machdep.c,v $ $Revision: 1.2.243.2 $ (DEC) $Date:
1996/09/16 20:01:05 $"
#include <sys/types.h>
#include <machine/reg.h>
***************
*** 1579,1584 ****
--- 1585,1592 ----
int waittime = -1;
int shutting_down = 0;
+ int bootsync = 0;
+ int bootsync_ticks = 0;
boot(reason, howto)
int reason, howto;
***************
*** 1594,1599 ****
--- 1602,1609 ----
extern int cpu;
extern long hwrpb_addr;
+ extern long lbolt;
+ extern int hz;
/*
* "shutting_down" is used by device drivers to determine
***************
*** 1657,1666 ****
VFS_SYNC(rootfs, MNT_NOWAIT, error);
}
(void) splnet(); /* block software interrupts */
printf("syncing disks... ");
{
! int ind, nbusy;
/*
* Write out locally mounted file system blocks
--- 1667,1686 ----
VFS_SYNC(rootfs, MNT_NOWAIT, error);
}
+ /*
+ * Setup to timeout if syncing disks hangs.
+ */
+ (void) splclock();
+ bootsync_ticks = 5 * hz;
+ bootsync = 1;
+
(void) splnet(); /* block software interrupts */
printf("syncing disks... ");
{
! int nbusy;
! int prev_nbusy = 0;
!
! long prev_lbolt;
/*
* Write out locally mounted file system blocks
***************
*** 1676,1687 ****
mp = mp->m_next;
} while (mp != rootfs);
/*
* Wait for all writes to all locally
* mounted file systems.
*/
! for (ind = 0; ind < 30; ind++) {
! int prev_nbusy;
/*
* Give up cpu to higher priority
* threads, so the CAM callback
--- 1696,1707 ----
mp = mp->m_next;
} while (mp != rootfs);
+ prev_lbolt = lbolt;
/*
* Wait for all writes to all locally
* mounted file systems.
*/
! do {
/*
* Give up cpu to higher priority
* threads, so the CAM callback
***************
*** 1700,1726 ****
mp = mp->m_next;
} while (mp != rootfs);
! /* break out if
! * 1. All blocks are out, or
! * 2. we looped 12 times and we
! * didn't make progress this time.
! * 3. In case of panic, we looped 6 times
! * and didn't finish "sync".
! * (could be data corruption).
! */
! if (nbusy == 0
! || (ind > 12 && prev_nbusy == nbusy)
! || (reason == RB_PANIC
! && ind > 6 &&
! prev_nbusy == nbusy))
! break;
! printf("%d ", nbusy);
! prev_nbusy = nbusy;
/*
! * DELAY at least 40 millisecond for each loop
*/
! DELAY(10000*(4+ind));
! }
if (nbusy)
printf("%d failed\n", nbusy);
else
--- 1720,1743 ----
mp = mp->m_next;
} while (mp != rootfs);
! if ((lbolt - prev_lbolt) >= hz/2) {
! printf("%d ", nbusy);
! prev_lbolt = lbolt;
! }
!
/*
! * If we're making progress allow another
! * second of syncing.
*/
! if (nbusy != prev_nbusy) {
! prev_nbusy = nbusy;
! bootsync_ticks = hz;
! }
!
! } while (nbusy && bootsync_ticks > 0);
!
! bootsync = 0;
!
if (nbusy)
printf("%d failed\n", nbusy);
else
***************
*** 4588,4591 ****
--- 4605,4609 ----
percpu = (struct rpb_percpu *)((u_long)rpb + rpb->rpb_percpu_off );
return ( percpu->rpb_proctype );
}
+
[ ./kernel/bsd/kern_synch.c ]
===================================================================
RCS file: ./kernel/bsd/kern_synch.c,v
*** 4.3.32.2 1995/10/20 14:21:46
--- OdeSrvrTmpSergei_Kucherov026194/kern_synch.c 1996/09/16 20:21:59
***************
*** 4,14 ****
/*
* HISTORY
* $Log: kern_synch.c,v $
* Revision 4.3.32.2 1995/10/20 14:21:46 James_Woodward
* change load_context to pass current lwc_pending state to the next
* thread. This keeps us from losing lwc interrupts.
* [1995/10/10 11:31:04 James_Woodward]
! *
* Revision 4.3.27.2 1995/04/05 18:24:11 Paula_Long
* Converting ptlite files linked to ptos.bl3 (modified in ptos) to real
files.
* [1995/04/05 02:45:25 Paula_Long]
--- 4,20 ----
/*
* HISTORY
* $Log: kern_synch.c,v $
+ * Revision 4.3.45.2 1996/09/16 20:05:56 Sergei_Kucherov
+ * In mpsleep(), return EINTR for interruptible sleeps when system is
+ * panicing. This fixes hangs in sigsuspend() loop when thread_preempt()
+ * is issued in panic sync path. QAR 46016.
+ * [1996/08/06 16:08:40 Ann_Milicia]
+ *
* Revision 4.3.32.2 1995/10/20 14:21:46 James_Woodward
* change load_context to pass current lwc_pending state to the next
* thread. This keeps us from losing lwc interrupts.
* [1995/10/10 11:31:04 James_Woodward]
! *
* Revision 4.3.27.2 1995/04/05 18:24:11 Paula_Long
* Converting ptlite files linked to ptos.bl3 (modified in ptos) to real
files.
* [1995/04/05 02:45:25 Paula_Long]
***************
*** 174,180 ****
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: kern_synch.c,v $ $Revision: 4.3.32.2 $ (DEC)
$Date: 1995/10/20 14:21:46 $"
/*
*/
/*
--- 180,186 ----
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: kern_synch.c,v $ $Revision: 4.3.45.2 $ (DEC)
$Date: 1996/09/16 20:05:56 $"
/*
*/
/*
***************
*** 407,412 ****
--- 413,422 ----
s = splnet();
/* Must clear any assert_wait, here or in caller */
clear_wait(th, THREAD_AWAKENED, FALSE);
+ if (catch)
+ error = EINTR;
+ else
+ thread_preempt(th, 0);
splx(s);
goto out;
}
retrieving revision 4.3.32.2
diff -c -r4.3.32.2 OdeSrvrTmpSergei_Kucherov026194/kern_synch.c
[ ./kernel/io/dec/eisa/xcr_port.c ]
===================================================================
RCS file: ./kernel/io/dec/eisa/xcr_port.c,v
retrieving revision 1.1.47.2
diff -c -r1.1.47.2 OdeSrvrTmpSergei_Kucherov009769/xcr_port.c
*** 1.1.47.2 1996/03/25 20:09:51
--- OdeSrvrTmpSergei_Kucherov009769/xcr_port.c 1996/09/16 20:22:05
***************
*** 4,13 ****
/*
* HISTORY
* $Log: xcr_port.c,v $
* Revision 1.1.47.2 1996/03/25 20:09:51 Sergei_Kucherov
* merge of V32C Rev 74 or V32DE1 Rev 14 Support changes
* [1996/03/21 21:29:33 Sergei_Kucherov]
! *
* Revision 1.1.33.3 1996/03/05 14:41:09 Joseph_Melvin
* Corrected logic bug discovered while working fix for QAR#41490.
* Parenthesis in wrong spot for condition check during while
--- 4,19 ----
/*
* HISTORY
* $Log: xcr_port.c,v $
+ * Revision 1.1.58.2 1996/09/16 20:08:57 Sergei_Kucherov
+ * When system has panic'ed (shutting_down) make sure pending queue is
+ * processed by calling xcr_handle_que() in xcr_cmd_cmplt(). Fix from
+ * Bill Dallas. QAR 46016.
+ * [1996/07/29 16:26:55 Ann_Milicia]
+ *
* Revision 1.1.47.2 1996/03/25 20:09:51 Sergei_Kucherov
* merge of V32C Rev 74 or V32DE1 Rev 14 Support changes
* [1996/03/21 21:29:33 Sergei_Kucherov]
! *
* Revision 1.1.33.3 1996/03/05 14:41:09 Joseph_Melvin
* Corrected logic bug discovered while working fix for QAR#41490.
* Parenthesis in wrong spot for condition check during while
***************
*** 106,112 ****
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: xcr_port.c,v $ $Revision: 1.1.47.2 $ (DEC) $Date:
1996/03/25 20:09:51 $"
#define XCRERRLOG
--- 112,118 ----
*
* $EndLog$
*/
! #pragma ident "@(#)$RCSfile: xcr_port.c,v $ $Revision: 1.1.58.2 $ (DEC) $Date:
1996/09/16 20:08:57 $"
#define XCRERRLOG
***************
*** 1705,1710 ****
--- 1711,1723 ----
if(cnt_ws == (CNTRL_WS *)NULL){
panic("xcr_cmd_cmplt CNTRL_WS == NULL");
}
+
+ /*
+ * Before We call back the driver.. see
+ * if we can get another cmd going
+ *
+ */
+ xcr_handle_que( cnt_ws->cntrl_softc);
/*
* Call back the driver
==================================================================
Digital Internal Use Only