| Title: | DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1) |
| Notice: | Welcome to the Digital UNIX Conference |
| Moderator: | SMURF::DENHAM |
| Created: | Thu Mar 16 1995 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 10068 |
| Total number of notes: | 35879 |
We are benchmarking Sybase 11.0.2.2 with Open Client 10.0.0.3 on 3
machines.
Machine 1: 8400 with 4 5/300MHz CPU's, 4GB Memory, KFTHA, 2 PCI buses,
2 HSZ50 dual pairs with 128 MB cache, 24 RZ29B's in 4 RAID 0 + 1 arrays,
Digital UNIX V3.2G with full DUV32GAS00001-19970501 patches applied.
Machine 2: 8400 with 4 5/6/440MHz CPU's, 4GB Memory, KFTHA, 2 PCI buses,
2 HSZ50 dual pairs with 128 MB cache, 24 RZ29B's in 4 RAID 0 + 1 arrays,
Digital UNIX V3.2G with full DUV32GAS00001-19970501 patches applied.
Machine 3: 4100 with 4 5/400MHz CPU's, 1 GB Memory, 2 HSZ50 dual pairs
with 128 MB cache, 24 RZ29B's in 4 RAID 0 + 1 arrays, Digital UNIX
V3.2G with full DUV32GAS00001-19970501 patches applied.
The customer has an in-house application, which is multi-threaded
(badly, their words not mine) running 15 threads, which sorts through the
database and outputs the relevant data. Their current system is an 8400
with 4 300MHz CPU's, 2GB Memory running Digital UNIX V3.2D and Sybase
10.
When the application was first run on the 440MHz system it caused a
panic with either of the following errors:
panic (cpu 0): pciaerror
or
panic (cpu 3): xpt_callback: callback_on freed CCB
Installing the latest patch kit and recompiling and linking the
application to the 3.2G libraries stopped the system crashes. The
application then crashed when run but the core was always corrupted.
It always failed at the same point in the code.
We then ran it using ladebug which indicated that the code was crashing
during a free() function call. I then tried setting old_obreak=0 in the
kernel but it made no difference. The call to free() was then removed from
the code but the application then failed at the next free() call.
The application behaved the same on the 300MHz system as the 440MHz
system.
But when we ran the original code (compiled and linked against 3.2D)
on the 4100 system it ran to completion.
Has anyone any idea what is going on here?
Currently we are running the 3.2G compiled and linked code on the 4100
and trying a run on the 300MHz 8400 with the memory reduced to 1GB. It
takes a while to run the benchmark so they are set to run over night.
The kernel parameters which have been modified are:
4100
ipc:
shm-max=2118123520
sem-mni=32
num-of-sems=120
proc:
max-proc-per-user=2048
max-threads-per-user=2048
per-proc-data-size=4294967296
max-per-proc-data-size=4294967296
per-proc-address-space=4294967296
max-per-proc-address-space=4294967296
per-proc-stack-size=1073741824
max-per-proc-stack-size=1073741824
rt:
aio-max-num=1024
vm:
vm-maxvas=4294967296
ubc-maxpercent=30
vm-ubcseqstartpercent=20
vm-vpagemax=131072
8400's
ipc:
shm-max=2147483647
sem-mni=1024
num-of-sems=120
#
rt:
aio-max-num=1024
aio-task-max-num=1024
vm:
vm-maxvas=4294967296
ubc-maxpercent=30
vm-ubcseqstartpercent=20
vm-vpagemax=4294967296
vm-maxwire=2147483648
vm-kentry_zone_size=33554432
contig-malloc-percent=2
proc:
max-proc-per-user=5000
max-threads-per-user=5000
per-proc-data-size=1073741824
max-per-proc-data-size=4294967296
per-proc-address-space=4294967296
max-per-proc-address-space=4294967296
per-proc-stack-size=134217728
max-per-proc-stack-size=1073741824
sched-min-idle=10
Billy.
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 9986.1 | Un-official simport patches seems to address this panic problem.. | NNTPD::"[email protected]" | Sri | Thu May 29 1997 16:31 | 9 |
Hi,
There are simport patches that address some of these panics.
The patches can be obtained from
decatl.alf.dec.com:/pub/patches/misc/simport_patches
Regards
Sri
[Posted by WWW Notes gateway]
| 9986.2 | Nothing new here... | WTFN::SCALES | Despair is appropriate and inevitable. | Fri May 30 1997 18:35 | 23 |
I'm afraid that there's not much information here to go on.

.0> We then ran it using ladebug which indicated that the code was crashing
.0> during a free() function call.

I presume you were seeing a SEGV inside free(). Can you tell if the memory
management data structures have been corrupted? Check the customer's code
for instances of using memory after it's been freed, freeing memory twice,
writing beyond the end of an array in dynamically allocated memory, or use
of uninitialized local pointer variables. Any of these sorts of things in
the customer's application could result in these symptoms.

.0> But when we ran the original code (compiled and linked against 3.2D)
.0> on the 4100 system it ran to completion.

It could easily be the case that in the 3.2D image the corruption either
doesn't happen (i.e., because the timing of the threads' execution is
different) or it corrupts an otherwise benign location (because of
different timing, or different bits in the uninitialized pointer value).

Webb
| 9986.3 | Latest Update | NESBIT::BGIRVAN | Mon Jun 02 1997 04:54 | 29 | |
I applied the simport patch suggested by Sri. We then ran the
executable and it ran to completion. So we ran it again to be sure, and
this time it fell over exactly as it had before.
We also managed to get it to run successfully on an 8000 without the
patch installed, but with the memory down to 1GB. After the run,
when I examined shared memory with 'ipcs', I got a load of rubbish
returned to the screen, as if shared memory had been corrupted. I've
also noticed this after a failed run.
One other strange parameter I've noticed: the base address of the
kernel's virtual address space is different between the 8400's and the
4100, i.e.,
4100
vm-min-kernel-address = 18446744071562067968
8000
vm-min-kernel-address = 18446744065119617024
The 4100 address is the default according to the tuning guide, so why
is the 8400 different and is it significant?
Billy
| 9986.4 | ladebug trace | NESBIT::BGIRVAN | Mon Jun 02 1997 05:40 | 398 | |
Here's the trace from ladebug on the problem.
asset_type1 : Starting segval netting asset_max is [79]
[asset_type1] : Retrieving segvals [exec p_rptdb_asset1_net A0019]
Executing procedure exec p_rptdb_asset1_net A0019 and retrieving data
Completed Executing procedure exec p_rptdb_asset1_net A0019 and
retrieving data rows 79
[asset_type1] : Completed Retrieval of segvals [exec p_rptdb_asset1_net
A0019]
[asset_type1] : Assign Category for [exec p_rptdb_asset1_net A0019]
[asset_type1] : Completed Assign Category for [exec p_rptdb_asset1_net
A0019]
[asset_type1] : Assigning customer_type and category for [exec
p_rptdb_asset1_net A0019] vehicle_max = [877], customer_max=[49051]
[asset_type1] : Completed Assigning customer_type and category for
[exec p_rptdb_asset1_net A0019]
[asset_type1] : Assigning Trans Type Group for [exec p_rptdb_asset1_net
A0019]
[asset_type1] : completed Assigning Trans Type Group for [exec
p_rptdb_asset1_net A0019]
[asset_type1] : completed cleaning trans_type btree for [exec
p_rptdb_asset1_net A0019]
[asset_type1] : completed cleaning customer btree for [exec
p_rptdb_asset1_net A0019]
[asset_type1] : completed cleaning vehicle btree for [exec
p_rptdb_asset1_net A0019]
[asset_type1] : Completed and release segval_netting mutex
asset_type1 : Walking through seg_net array
asset_type1 : array_number [0], accnode [31080], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [1], accnode [31081], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [2], accnode [32447], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [3], accnode [32448], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [4], accnode [32449], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [5], accnode [32450], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [6], accnode [32451], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [7], accnode [32452], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [8], accnode [32453], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [9], accnode [32454], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [10], accnode [32455], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [11], accnode [32458], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [12], accnode [32459], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [13], accnode [32460], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [14], accnode [32463], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [15], accnode [32464], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [16], accnode [32465], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [17], accnode [32466], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [18], accnode [32467], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [19], accnode [32468], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [20], accnode [32469], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [21], accnode [32471], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [22], accnode [32472], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [23], accnode [32473], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [24], accnode [32474], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [25], accnode [32475], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [26], accnode [32476], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [27], accnode [32477], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [28], accnode [32478], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [29], accnode [32479], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [30], accnode [32480], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [31], accnode [32481], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [32], accnode [32482], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [33], accnode [32483], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [34], accnode [32484], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [35], accnode [32487], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [36], accnode [32488], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [37], accnode [32489], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [38], accnode [32490], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [39], accnode [32491], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [40], accnode [32492], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [41], accnode [32493], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [42], accnode [32494], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [43], accnode [32495], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [44], accnode [32496], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [45], accnode [32497], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [46], accnode [32498], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [47], accnode [32499], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [48], accnode [32500], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [49], accnode [32501], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [50], accnode [36206], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [51], accnode [36210], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [52], accnode [60908], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [53], accnode [60915], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [54], accnode [60931], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [55], accnode [60936], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [56], accnode [81128], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [57], accnode [81130], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [58], accnode [81479], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [59], accnode [84737], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [60], accnode [84739], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [61], accnode [84742], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [62], accnode [84744], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [63], accnode [138068], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [64], accnode [138070], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [65], accnode [150283], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [66], accnode [150286], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [67], accnode [150405], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [68], accnode [164141], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [69], accnode [164143], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [70], accnode [177090], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [71], accnode [203592], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [72], accnode [205942], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [73], accnode [205944], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [74], accnode [208001], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [75], accnode [208002], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [76], accnode [212447], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [77], accnode [212449], s1 [ABS], cust_type
[], active [1]
asset_type1 : array_number [78], accnode [212526], s1 [ABS], cust_type
[], active [1]
[1] stopped at [asset_type1:2080 0x12003d390]
2080 printf("asset_type1 : deallocating seg_net. pointer
value is [%x]\n", seg_net);
(ladebug) next
asset_type1 : deallocating seg_net. pointer value is [b37a6008]
stopped at [asset_type1:2081 0x12003d3ac]
2081 fflush(stdout);
(ladebug) next
stopped at [asset_type1:2082 0x12003d3c4]
2082 root_seg=*rs;
(ladebug) step
stopped at [asset_type1:2083 0x12003d3d0]
2083 segval_dealloc(seg_net); seg_net=NULL;
(ladebug) step
stopped at [segval_dealloc:210 0x12002ed94]
(ladebug) step
stopped at [segval_dealloc:210 0x12002ed94]
210 if (s != NULL)
(ladebug) step
stopped at [segval_dealloc:212 0x12002ed9c]
212 printf("Deallocating segval pointer value
[%x]\n", s);
(ladebug) step
Deallocating segval pointer value [b37a6008]
stopped at [segval_dealloc:213 0x12002edb8]
213 fflush(stdout);
(ladebug) step
stopped at [segval_dealloc:214 0x12002edd0]
214 free(s);
(ladebug) step
Thread received signal SEGV
stopped at [free: ??? 0x3ff810323e4]
(ladebug) listobj
ObjectName                           Start Addr      Size (bytes)  Symbols Loaded
----------------------------------------------------------------------------
rptbal_create                        0x120000000     303104        Yes
/usr/shlib/libm.so                   0x3ff80800000   991232        Yes
/usr/local/sybase11/lib/libsybdb.so  0x3ffbff40000   786432        Yes
/usr/shlib/libpthreads.so            0x3ff81000000   311296        Yes
/usr/shlib/libmach.so                0x3ff81800000   65536         Yes
/usr/shlib/libc_r.so                 0x3ff82000000   589824        Yes
/usr/shlib/libc.so                   0x3ff82800000   925696        Yes
(ladebug) where
>0 0x3ff810323e4 in free(0x3ffc28002b8, 0xffffffffb37a6008, 0x40, 0x0,
0x0, 0x8) DebugInformationStrippedFromFile92:???
#1 0x12002edd8 in segval_dealloc(s=-1283825656)
/usr/project/cord/rptdb/c/m3_alloc.c:214
#2 0x12003d3d8 in asset_type1(arg=0x0)
/usr/project/cord/rptdb/c/netting.c:2083
#3 0x3ff8104285c in /usr/shlib/libpthreads.so
| 9986.5 | SYBASE parameters | NESBIT::BGIRVAN | Mon Jun 02 1997 05:41 | 276 | |
And here are the SYBASE initialization parameters.
bggibx0015 # pwd
/usr/local/sybase
bggibx0015 # ls *.cfg
RMTGLSPRD01.cfg
bggibx0015 # cat RMTGLSPRD01.log
cat: cannot open RMTGLSPRD01.log
bggibx0015 # cat RMTGLSPRD01.cfg
##############################################################################
#
# Configuration File for the Sybase SQL Server
#
# Please read the System Administration Guide (SAG)
# before changing any of the values in this file.
#
##############################################################################
[Configuration Options]
[General Information]
[Backup/Recovery]
recovery interval in minutes = DEFAULT
print recovery information = DEFAULT
tape retention in days = DEFAULT
[Cache Manager]
number of oam trips = DEFAULT
number of index trips = DEFAULT
procedure cache percent = DEFAULT
memory alignment boundary = DEFAULT
[Named Cache:default data cache]
cache size = DEFAULT
cache status = default data cache
[4K I/O Buffer Pool]
pool size = 20.0000M
wash size = DEFAULT
[16K I/O Buffer Pool]
pool size = 20.0000M
wash size = DEFAULT
[Disk I/O]
allow sql server async i/o = DEFAULT
disk i/o structures = DEFAULT
page utilization percent = DEFAULT
number of devices = 25
disable character set conversions = DEFAULT
[Network Communication]
default network packet size = DEFAULT
max network packet size = DEFAULT
remote server pre-read packets = DEFAULT
number of remote connections = DEFAULT
allow remote access = DEFAULT
number of remote logins = DEFAULT
number of remote sites = DEFAULT
max number network listeners = DEFAULT
tcp no delay = DEFAULT
allow sendmsg = DEFAULT
syb_sendmsg port number = DEFAULT
[O/S Resources]
max async i/os per engine = 1024
max async i/os per server = 1024
[Physical Resources]
[Physical Memory]
total memory = 256000
additional network memory = DEFAULT
lock shared memory = DEFAULT
shared memory starting address = DEFAULT
[Processors]
max online engines = DEFAULT
min online engines = DEFAULT
[SQL Server Administration]
number of open objects = 800
number of open databases = 30
audit queue size = DEFAULT
default database size = DEFAULT
identity burning set factor = DEFAULT
allow nested triggers = DEFAULT
allow updates to system tables = DEFAULT
print deadlock information = DEFAULT
default fill factor percent = DEFAULT
number of mailboxes = DEFAULT
number of messages = DEFAULT
number of alarms = DEFAULT
number of pre-allocated extents = DEFAULT
event buffers per engine = DEFAULT
cpu accounting flush interval = DEFAULT
i/o accounting flush interval = DEFAULT
sql server clock tick length = DEFAULT
runnable process search count = DEFAULT
i/o polling process count = DEFAULT
time slice = DEFAULT
deadlock retries = DEFAULT
cpu grace time = 200
number of sort buffers = DEFAULT
sort page count = DEFAULT
number of extent i/o buffers = DEFAULT
size of auto identity column = DEFAULT
identity grab size = DEFAULT
lock promotion HWM = DEFAULT
lock promotion LWM = DEFAULT
lock promotion PCT = DEFAULT
housekeeper free write percent = DEFAULT
partition groups = DEFAULT
partition spinlock ratio = DEFAULT
[User Environment]
number of user connections = 200
stack size = DEFAULT
stack guard size = DEFAULT
systemwide password expiration = DEFAULT
permission cache entries = DEFAULT
user log cache size = DEFAULT
user log cache spinlock ratio = DEFAULT
[Lock Manager]
number of locks = DEFAULT
deadlock checking period = DEFAULT
freelock transfer block size = DEFAULT
max engine freelocks = DEFAULT
address lock spinlock ratio = DEFAULT
page lock spinlock ratio = DEFAULT
table lock spinlock ratio = DEFAULT
bggibx0015 #
bggibx0015 #