RE: -1
Also cross-posted at C_PLUS_PLUS #3564, NT-DEVELOPER #3238, Windows-NT #5991
Brief description of the test programs, RAMDBSER.exe and RAMDBABU.exe:
The job of the RAMDB server (ramdbser.exe) is to look up entries in a
giant hash table, and it is supposed to run very fast. It is a
multithreaded application written in VC++ 4.0. There are three threads
spawned by ramdbser.exe. One is a mostly idle UI thread, which gets
very little time during execution. Another is the DB sweep thread,
which is designed to be very lightweight: it just checks whether a
record has aged out of the database and marks it dead.
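To make the described data path concrete (a hash-table lookup plus a sweep that marks aged records dead), here is a minimal C sketch. Everything in it is an assumption for illustration only: the record layout, the hypothetical names `table_put`, `table_get` and `table_sweep`, linear probing, and integer timestamps. The actual RAMDB source is not part of this thread, and the sketch omits the locking a real sweep thread running alongside lookups would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real RAMDB structures are not shown in this thread. */
#define TABLE_SIZE 1024u   /* power of two, so (TABLE_SIZE - 1) works as a mask */

typedef struct {
    uint32_t key;
    uint32_t value;
    uint32_t stamp;    /* time the record was inserted or refreshed */
    int      in_use;   /* slot holds a record, alive or dead */
    int      alive;    /* cleared by the sweep when the record ages out */
} record;

typedef struct {
    record slots[TABLE_SIZE];
} table;

static size_t slot_of(uint32_t key) {
    /* Multiplicative hash; any reasonable hash would do for this sketch. */
    return (size_t)((key * 2654435761u) & (TABLE_SIZE - 1));
}

/* Insert or refresh a record with linear probing. Returns 0 if the table is full. */
int table_put(table *t, uint32_t key, uint32_t value, uint32_t now) {
    size_t i = slot_of(key);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        record *r = &t->slots[i];
        if (!r->in_use || r->key == key) {
            r->key = key;
            r->value = value;
            r->stamp = now;
            r->in_use = 1;
            r->alive = 1;
            return 1;
        }
    }
    return 0;
}

/* Look up a live record. Returns 1 and stores the value on a hit. */
int table_get(const table *t, uint32_t key, uint32_t *value) {
    size_t i = slot_of(key);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        const record *r = &t->slots[i];
        if (!r->in_use) return 0;          /* empty slot ends the probe chain */
        if (r->key == key) {
            if (!r->alive) return 0;       /* record has aged out */
            *value = r->value;
            return 1;
        }
    }
    return 0;
}

/* The sweep: mark every live record older than max_age as dead. Returns the count marked. */
size_t table_sweep(table *t, uint32_t now, uint32_t max_age) {
    size_t marked = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        record *r = &t->slots[i];
        if (r->in_use && r->alive && now - r->stamp > max_age) {
            r->alive = 0;
            marked++;
        }
    }
    return marked;
}
```

Dead records keep their slot (a tombstone) so the probe chains of other keys stay intact; calling `table_put` again for the same key revives it.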
The customer, a SUN shop, is trying to understand why his application
does not scale on Windows NT SMP systems (Intel and Alpha). The customer
strongly suspects a named pipe problem in NT. This application is
designed to test the serial resolution efficiency of the named pipe
mechanism.
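In a serial resolution test the client sends one request and blocks until its reply arrives before sending the next, so every lookup pays a full round trip through the pipe (a write and a read on each side, plus thread wakeups). The C sketch below shows that pattern portably. It is not the customer's code: it swaps NT named pipes for POSIX `pipe()` with a pthreads server thread, and the "lookup" (`key * 2 + 1`) and the names `serial_resolutions` and `xfer_all` are hypothetical placeholders.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Two one-way POSIX pipes stand in for one duplex NT named pipe (an assumption). */
typedef struct {
    int req[2];   /* client -> server: [0] read end, [1] write end */
    int rep[2];   /* server -> client */
} channel;

/* Move exactly len bytes, since a pipe read or write may be partial. */
static int xfer_all(int fd, void *buf, size_t len, int writing) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = writing ? write(fd, p, len) : read(fd, p, len);
        if (n <= 0) return -1;            /* error, or EOF on read */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Server thread: answer requests until the client closes its write end. */
static void *server(void *arg) {
    channel *c = arg;
    uint32_t key;
    while (xfer_all(c->req[0], &key, sizeof key, 0) == 0) {
        uint32_t answer = key * 2u + 1u;  /* hypothetical stand-in for the hash lookup */
        if (xfer_all(c->rep[1], &answer, sizeof answer, 1) != 0) break;
    }
    return NULL;
}

/* Client: n strictly serial request/reply round trips.
   Returns the number of correct replies, or -1 on setup failure. */
long serial_resolutions(long n) {
    channel c;
    pthread_t tid;
    long ok = 0;
    if (pipe(c.req) != 0) return -1;
    if (pipe(c.rep) != 0) {
        close(c.req[0]); close(c.req[1]);
        return -1;
    }
    if (pthread_create(&tid, NULL, server, &c) != 0) {
        close(c.req[0]); close(c.req[1]); close(c.rep[0]); close(c.rep[1]);
        return -1;
    }
    for (long i = 0; i < n; i++) {
        uint32_t key = (uint32_t)i, answer;
        /* Wait for the reply before sending the next request: one request in flight. */
        if (xfer_all(c.req[1], &key, sizeof key, 1) != 0) break;
        if (xfer_all(c.rep[0], &answer, sizeof answer, 0) != 0) break;
        if (answer == key * 2u + 1u) ok++;
    }
    close(c.req[1]);                      /* EOF on req[0] makes the server loop exit */
    pthread_join(tid, NULL);
    close(c.req[0]); close(c.rep[0]); close(c.rep[1]);
    return ok;
}
```

Because only one request is ever in flight, extra CPUs cannot overlap lookups in this pattern; throughput is bounded by round-trip latency. That fits, but does not by itself prove, the flat or declining res/sec figures in the reports below.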
For those who would like to take a look, the programs can be downloaded
by FTP from fluid:/pub/RAMDB/Alpha/ramdb1.zip and ../Intel/ramdb2.zip.
Two of my reports to the customer are enclosed below, with names removed
to 'protect' the innocent.
Again, a sale is pending on the SMP server performance.
Any help would be greatly appreciated.
Thanks,
Miller
-------------------------------------------------------------------
Report #2 5/8/97 to the customer
I have completed the Pentium Pro 200 (2 CPU) test, and the
result is 1500 res/sec, which is slower than the 1700 res/sec for a
single-CPU Pentium Pro 200. So the application/program
(RAMDBABU.exe and RAMDBser.exe) did not scale well (it actually
lost some performance) on both the Intel/NT and Alpha/NT
platforms. We have to look at the following closely:
The Digital Personal Workstation 200i2
(dual-CPU Pentium Pro 200) result:
---------------------------------
Sys configuration: 128MB memory, 512KB cache, NT server 4.0 SP2,
pagefile size 128MB,
2CPU active, Display: Matrox 4MB 1024x768x256
Disk: 2GB
DPW 200i2 2CPU -- 6670ms, 1500 res/sec (tested in MRO lab)
Comparisons with other systems:
4100 5/300 (hal.dll 99,520KB, 12/14/97 multiprocessor)
1CPU 5550ms, 1801 res/sec
2CPU 6352ms, 1574 res/sec
3CPU 6195ms, 1614 res/sec
4CPU 6788ms, 1473 res/sec
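The reported rates are consistent with each run performing about 10,000 resolutions (for example, 10,000 / 5.550 s is about 1801 res/sec, and 10,000 / 6.670 s is about 1499, reported as 1500). That count is inferred from the numbers, not stated in the reports. A small C helper under that assumption (the names `RESOLUTIONS_PER_RUN`, `res_per_sec` and `print_4100_rates` are hypothetical) reproduces the 4100 5/300 rows:

```c
#include <stdio.h>

/* Assumption: every run performs the same fixed number of resolutions.
   10,000 is inferred from the reported ms and res/sec, not documented. */
#define RESOLUTIONS_PER_RUN 10000L

/* Throughput in resolutions per second for one timed run. */
double res_per_sec(long resolutions, long elapsed_ms) {
    return (double)resolutions * 1000.0 / (double)elapsed_ms;
}

/* Print the 4100 5/300 rows, truncating to whole res/sec as those rows appear to do. */
void print_4100_rates(void) {
    static const long cpus[] = {1, 2, 3, 4};
    static const long ms[]   = {5550, 6352, 6195, 6788};
    for (int i = 0; i < 4; i++)
        printf("%ldCPU %ldms -> %ld res/sec\n", cpus[i], ms[i],
               (long)res_per_sec(RESOLUTIONS_PER_RUN, ms[i]));
}
```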
The data ONLIFE reported are:
-------------------------
1700 res/sec for Pentium Pro 200, single cpu
2300 res/sec for AlphaServer 4100, 1CPU, 466MHz 2GB RAM
1900 res/sec for AlphaServer 4100, 2CPU, 466MHz, 4GB RAM
The remaining statistics from PERFMON, PVIEW, and NT Task Manager on
the Pentium Pro look very similar to the AlphaServer's (i.e., high
system calls/sec, context switches/sec, % processor time, per-thread
context switches, etc.). The working set on the Pentium Pro is smaller,
though (2120KB for Ramserver on Intel vs. 2808KB on Alpha).
----------------------------------------------------------------------
Report #1 5/7/97 to the customer.
I have completed the first round of tests; the test results and
analysis are attached. I also need a few things from
you to proceed further:
1. I tried to rebuild ramdbabu.exe with VC++ 5.0, but ramdbabuse.h
was missing. Can you forward me a copy?
I think ramdbabu.h is for ramdbabuseDlg.h.
2. I have a Pentium 200i dual system here. Genne told me
that the result you got on a single Pentium was 1700 res/sec.
Can you send me the Intel version of ramdb so I can see
how an Intel SMP server behaves?
3. What exactly is the ramdb workload trying to do?
The AlphaServer 4100 test results
---------------------------------
System configuration:
AlphaServer 4100 5/300 (EV5 - 21164 chip), 300MHz
Note: This system is not exactly the same as yours. I believe
your AlphaServer 4100 466MHz is an EV56 (21164A) system.
Also, your 466MHz single-CPU result (2300 res/sec) is higher
than mine, possibly due to the clock speed.
Memory: 524MB, Cache 2MB, Disk: 4GB NTFS.
NT: V4.0 SP1
4100 5/300 1CPU 5550ms, 1801 res/sec
2CPU 6352ms, 1574 res/sec
3CPU 6195ms, 1614 res/sec
4CPU 6788ms, 1473 res/sec
The data you reported are:
-------------------------
1700 res/sec for Pentium Pro 200, single cpu
2300 res/sec for AlphaServer 4100, 1CPU, 466MHz 2GB RAM
1900 res/sec for AlphaServer 4100, 2CPU, 466MHz, 4GB RAM
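One way to put a number on "did not scale" is speedup (n-CPU rate divided by 1-CPU rate) and parallel efficiency (speedup divided by n), where 1.0 means perfect linear scaling. A minimal C sketch follows; the function names are illustrative, not from any existing tool.

```c
#include <stdio.h>

/* Speedup of an n-CPU run over the 1-CPU run, computed from throughputs (res/sec). */
double speedup(double rate_n, double rate_1) {
    return rate_n / rate_1;
}

/* Parallel efficiency: speedup divided by CPU count; 1.0 means linear scaling. */
double efficiency(double rate_n, double rate_1, int n_cpus) {
    return speedup(rate_n, rate_1) / (double)n_cpus;
}

/* Throughput an ideally scaling workload would reach on n CPUs. */
double ideal_rate(double rate_1, int n_cpus) {
    return rate_1 * (double)n_cpus;
}

/* rates[0] is the 1-CPU result; rates[i] is the (i+1)-CPU result. */
void print_scaling(const double *rates, int count) {
    for (int i = 0; i < count; i++)
        printf("%dCPU: %.0f res/sec, ideal %.0f, speedup %.2f, efficiency %.2f\n",
               i + 1, rates[i], ideal_rate(rates[0], i + 1),
               speedup(rates[i], rates[0]), efficiency(rates[i], rates[0], i + 1));
}
```

With the 4100 5/300 rates {1801, 1574, 1614, 1473}, the speedups for 2, 3 and 4 CPUs come out at roughly 0.87, 0.90 and 0.82 (efficiency near 0.44, 0.30 and 0.20), and the customer's 2-CPU 466MHz figure gives 1900 / 2300, about 0.83. Every speedup below 1.0 means adding CPUs made the run slower.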
Analysis
--------
- No alignment fixups
- No floating-point emulations
  Note: A high number of fixups or emulations causes a big performance
  degradation on Alpha.
- System calls/sec: 16,094 max
- 100% CPU usage
- Working set: 2808KB
- Multithreading:
  RAMABU has 1 thread (81% privileged mode, 19% user mode,
  466,201 context switches/sec)
  RAMSERVER has 3 threads:
    Thread 0: 96% privileged, 4% user, context switches 1,908;
              dynamic priority 14
    Thread 1: 89% privileged, 11% user, context switches 239,571
              (2 CPUs) and 429,937 (3 CPUs); dynamic priority 15
    Thread 2: 0% privileged, 0% user; dynamic priority 1
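Background on the alignment-fixup note in the analysis above (none were observed in this test): as we understand it, an Alpha load or store of a 4- or 8-byte value at an address that is not a multiple of its size traps, and NT fixes it up in software, which PERFMON counts under Alignment Fixups/sec. The usual portable way to touch possibly misaligned data is to copy it through `memcpy`, which lets the compiler emit a safe instruction sequence. The helper names `load_u32` and `store_u32` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address.
   Avoid the direct form *(const uint32_t *)p, which is a misaligned
   load (and undefined behavior in C) when p is not 4-byte aligned. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to a possibly misaligned address. */
void store_u32(void *p, uint32_t v) {
    memcpy(p, &v, sizeof v);
}
```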
Observations:
1. Looking at the results, there seems to be a scalability issue on
the AlphaServer. However, we have to run the same tests on an Intel
server with NT 4.0. The issue could be related to NT, the OS.
2. There is a large number of system calls/sec and context switches,
especially for server thread #1. Context switches also increase
as CPUs are added.
What are server threads #0 and #1 doing?
Why is server thread #2 doing nothing?
I am interested to see how an Intel SMP server handles the test.
Next steps:
I have disassembled ramdbabu.exe and ramdbser.exe to look at the
machine code. The code was compiled with no debug options and no
/Zh switch, either of which could otherwise affect performance.
---------------------------------------------------------------------
Disassembly of ramdbabu.exe and ramdbser.exe
Here is a portion of the dump files (the full files are also on the fluid FTP site). -Miller
Dump of file ramdbabu.exe
File Type: EXECUTABLE IMAGE
00402000: 23DEFFF0 lda sp,0xFFF0(sp)
00402004: A2010010 ldl a0,0x10(t0)
00402008: B75E0000 stq ra,0(sp)
0040200C: D34003EC bsr ra,00402FC0
00402010: A75E0000 ldq ra,0(sp)
00402014: 23DE0010 lda sp,0x10(sp)
00402018: 6BFA8001 ret
0040201C: 47FF041F nop
00402020: 23DEFFE0 lda sp,0xFFE0(sp)
00402024: 47EC9411 mov 0x64,a1
00402028: B75E0000 stq ra,0(sp)
0040202C: 47FF0412 clr a2
00402030: B7FE0008 stq zero,8(sp)
00402034: 63FF0000 trapb
00402038: B21E0010 stl a0,0x10(sp)
0040203C: 43F00010 sextl a0,a0
00402040: B3DE000C stl sp,0xC(sp)
00402044: D34003E2 bsr ra,00402FD0
00402048: A01E0010 ldl v0,0x10(sp)
0040204C: 247F0040 ldah t2,0x40
00402050: 63FF0000 trapb
00402054: 47E03401 mov 1,t0
00402058: 206341A0 lda t2,0x41A0(t2)
0040205C: B03E0008 stl t0,8(sp)
00402060: B0600000 stl t2,0(v0)
00402064: 20DFFFFF mov 0xFFFF,t5
00402068: A75E0000 ldq ra,0(sp)
0040206C: A01E0010 ldl v0,0x10(sp)
00402070: B0DE0008 stl t5,8(sp)
00402074: 63FF0000 trapb
00402078: 23DE0020 lda sp,0x20(sp)
0040207C: 6BFA8001 ret
00402080: 6BFA8001 ret
00402084: 00000000 call_pal halt
00402088: 00000000 call_pal halt
0040208C: 00000000 call_pal halt
00402090: 243F0041 ldah t0,0x41
00402094: A001A2C0 ldl v0,0xA2C0(t0)
00402098: 6BFA8001 ret
0040209C: 00000000 call_pal halt
004020A0: 241F0040 ldah v0,0x40
004020A4: 20004000 lda v0,0x4000(v0)
004020A8: 6BFA8001 ret
004020AC: 00000000 call_pal halt
004020B0: A0210018 ldl t0,0x18(t0)
004020B4: 23DEFFF0 lda sp,0xFFF0(sp)
004020B8: B75E0000 stq ra,0(sp)
004020BC: 22010080 lda a0,0x80(t0)
.
.
.
Dump of file ramdbser.exe
File Type: EXECUTABLE IMAGE
00402000: 243F0041 ldah t0,0x41
00402004: A001C48C ldl v0,0xC48C(t0)
00402008: 6BFA8001 ret
0040200C: 00000000 call_pal halt
00402010: 241F0040 ldah v0,0x40
00402014: 20006000 lda v0,0x6000(v0)
00402018: 6BFA8001 ret
0040201C: 00000000 call_pal halt
00402020: A2010010 ldl a0,0x10(t0)
00402024: 23DEFFF0 lda sp,0xFFF0(sp)
00402028: B75E0000 stq ra,0(sp)
0040202C: D3400774 bsr ra,00403E00
00402030: A75E0000 ldq ra,0(sp)
00402034: 23DE0010 lda sp,0x10(sp)
00402038: 6BFA8001 ret
0040203C: 47FF041F nop
00402040: 23DEFFE0 lda sp,0xFFE0(sp)
00402044: 47FF0411 clr a1
00402048: B75E0000 stq ra,0(sp)
0040204C: B7FE0008 stq zero,8(sp)
00402050: 63FF0000 trapb
00402054: B21E0010 stl a0,0x10(sp)
00402058: 43F00010 sextl a0,a0
0040205C: B3DE000C stl sp,0xC(sp)
00402060: D340076B bsr ra,00403E10
00402064: A01E0010 ldl v0,0x10(sp)
00402068: 63FF0000 trapb
0040206C: 247F0040 ldah t2,0x40
00402070: 47E03401 mov 1,t0
00402074: 20636088 lda t2,0x6088(t2)
00402078: B03E0008 stl t0,8(sp)
0040207C: B0600000 stl t2,0(v0)
00402080: 20DFFFFF mov 0xFFFF,t5
00402084: A75E0000 ldq ra,0(sp)
00402088: A01E0010 ldl v0,0x10(sp)
0040208C: B0DE0008 stl t5,8(sp)
00402090: 63FF0000 trapb
00402094: 23DE0020 lda sp,0x20(sp)
00402098: 6BFA8001 ret
0040209C: 00000000 call_pal halt
004020A0: 23DEFFF0 lda sp,0xFFF0(sp)
004020A4: 22010018 lda a0,0x18(t0)
004020A8: B75E0000 stq ra,0(sp)
004020AC: D3400038 bsr ra,00402190
004020B0: A75E0000 ldq ra,0(sp)
004020B4: 23DE0010 lda sp,0x10(sp)
004020B8: 6BFA8001 ret
004020BC: 47FF041F nop
004020C0: 23DEFE70 lda sp,0xFE70(sp)
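When reading the dumps, note that Alpha builds addresses from `ldah`/`lda` pairs and that every 16-bit displacement is sign-extended. So `ldah t0,0x41` followed by `ldl v0,0xA2C0(t0)` reads from 0x0040A2C0, not 0x0041A2C0; `lda sp,0xFFF0(sp)` allocates 16 bytes of stack; and the `mov 0xFFFF,t5` at 00402064 is encoded (20DFFFFF) as `lda t5,-1(zero)`, which sets t5 to -1. The C helpers below are a small sketch of that arithmetic for checking other addresses in the dump; the function names are illustrative.

```c
#include <stdint.h>

/* Sign-extend a 16-bit Alpha instruction displacement. */
static int64_t sext16(uint16_t disp) {
    int64_t d = disp;
    return (d & 0x8000) ? d - 0x10000 : d;
}

/* LDA, and the effective address of memory-format loads/stores:
   Rb + SEXT(disp). */
int64_t alpha_lda(int64_t rb, uint16_t disp) {
    return rb + sext16(disp);
}

/* LDAH: Rb + SEXT(disp) * 65536. */
int64_t alpha_ldah(int64_t rb, uint16_t disp) {
    return rb + sext16(disp) * 65536;
}
```

For example, `alpha_lda(alpha_ldah(0, 0x41), 0xC48C)` gives 0x40C48C, the address read by the first function in the ramdbser.exe dump.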