
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1996.0. "Distribuited Raw Disk performance" by ROMOIS::CIARAMELLA () Fri Apr 11 1997 05:29

    Hello,
    
    I have a customer using TruCluster and Oracle on distributed raw
    disks. Benchmarking DRD, he found that the speed of a distributed raw
    disk on the node on which the disk resides is 50% faster than when it
    is accessed from the other cluster member.
    Does anybody have experience with DRD performance?
    What overhead should be expected from the cluster software?
    
    Thanks,
    					enzo
1996.1. "DRD perf issues" by MIPSBX::"[email protected]" (pelle) Mon Apr 14 1997 12:24
There are a number of things that influence remote DRD performance:

1. The HW revision of the MEMORY CHANNEL board.
2. Which machine it is plugged into: which model, how many CPUs, etc.
3. How much I/O is actually done simultaneously.
4. How big each I/O request is (bigger is better, up to 64 KB).
etc.

We have measured the raw throughput over MC in a cluster using DRD
to be between 20 and 60 MB/s, depending on the MC board revision, the number
and types of nodes used, etc.
If you are using a lot of SCSI buses in parallel (each one being able to do
~15 MB/s, a realistic value for Fast Wide SCSI), one can imagine that DRD
performance will drop compared to local I/O, since the MC bandwidth will be a
bottleneck. Also, the CPU load on a DRD server may be significant depending on
the configuration used.
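
To put rough numbers on the bandwidth point (purely illustrative, not measured
on this configuration): four shared Fast Wide SCSI buses at ~15 MB/s each add
up to ~60 MB/s of aggregate disk bandwidth, which already matches or exceeds
the 20-60 MB/s available over MC, so remote DRD I/O can saturate the
interconnect well before the disks themselves become the limit.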

To answer your question, I would like to say the following:
When setting up OPS using DRD, the database should be configured to use local
access if possible, especially if it is a big database using a lot of disks
and SCSI buses. If DRD is used to access a local disk, there is very little
overhead compared to local I/O without DRD. With DRD one buys a high
availability solution: in case of different kinds of failures the disks can
still be accessed. However, for certain types of failures (a failed access
path), performance may go down if DRD is used remotely instead of locally.
The performance hit is very application dependent. For OPS it can be
substantial, something to think about when configuring OPS.

[Posted by WWW Notes gateway]
1996.2. "Good answer" by ROMOIS::CIARAMELLA () Wed Apr 16 1997 12:51
    Thanks for your answer, but the customer's main concern is the relative
    DRD server/client performance, which in some cases (blocksize >
    32 KB, tested using diskx) is effectively one half on the client of
    what it is on the server. Your point that "DRD is mainly for
    availability" could be a good answer.
    
    The customer configuration is a pair of:
    
    8400 with 6 Alpha EV5 CPUs
    6 GB of memory
    2 KZPSAs
    
    The configuration also includes an SW800 cabinet with 4 HSZ40s and
    several gigabytes of disks.
    
    					enzo  
1996.3. "some comments" by ALFAM7::GOSEJACOB () Thu Apr 17 1997 05:43
    re .2
    Sorry, I have no hard data on drd local/remote access, but I thought it
    wouldn't hurt to throw in a comment or two.
    
    When the customer tested drd I/O performance, were the machines idle
    otherwise? The reason I'm asking is that all I/O requests from the drd
    client will actually be translated into 'real' I/O requests on the drd
    server machine. Now if that machine is busy with other I/O activity, the
    drd access will definitely be slower than local disk access from the
    drd client. I just wanted to point out that you have to make sure that
    nothing else is happening on either machine while the test is running.
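    
    (A minimal sketch of what I mean -- the node names nodeA/nodeB are made
    up, and this assumes rsh works between the cluster members:)
    
	#!/bin/sh
	# Take three 5-second samples of disk and CPU activity on each
	# cluster member before starting a timing run.  Anything but
	# near-idle numbers means the test is measuring more than just
	# the drd access you are interested in.
	for node in nodeA nodeB
	do
		echo "=== $node ==="
		rsh $node iostat 5 3
		rsh $node vmstat 5 3
	done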
    
    Did I get this right: they used diskx on the drd devices? Could
    you double-check whether diskx can actually be used with drds; I thought
    I read somewhere in notes that this is not supported.
    
    Looking at your configuration data I see 2 pretty powerful machines with
    only 2 KZPSAs each to access 4 HSZ40s. Does that mean you are using 2
    HSZs on each shared bus? If so, you need to keep in mind that
    even if you access all devices configured on one HSZ from one machine
    and all devices configured on the other HSZ from the second machine, in
    sum you still only get the performance of one SCSI bus.
    
    Was there any reason for not using one shared SCSI bus per HSZ? Or are
    the HSZ's configured as dual redundant?
    
    And another one: if you have only 2 KZPSAs, make sure that you do not
    plug them into the PCI bus right next to each other. Well, I'm not a PCI
    expert, but I think I remember that the PCI bus is somehow split into 2
    halves. So for performance reasons you should plug one KZPSA into each
    half of the PCI bus (by leaving a gap of 6 PCI slots).
    
    	Martin
    
1996.4. by UTRUST::PILMEYER (Questions raise the doubt) Fri Apr 18 1997 05:10
>>    And another one: if you have only 2 KZPSAs, make sure that you do not
>>    plug them into the PCI bus right next to each other. Well, I'm not a PCI
>>    expert, but I think I remember that the PCI bus is somehow split into 2
>>    halves. So for performance reasons you should plug one KZPSA into each
>>    half of the PCI bus (by leaving a gap of 6 PCI slots).
    
    Actually it is 3 times 4 slots. But that shouldn't cause a problem for
    just two KZPSAs, unless perhaps there are also MC, FDDI, or Fast
    Ethernet controllers in the same segment.
    
    -Han
1996.5. "Tunable parameters" by ROMOIS::CIARAMELLA () Fri Apr 18 1997 11:19
    ------------------------------------------------------------------------
    re .3, .4
    ------------------------------------------------------------------------
    
    The use of diskx to exercise DRD disks is described in the TruCluster
    Management Guide.
    I verified the performance of the DRD disk using a dd command too, with
    the same results.
    
    Sorry, I have not been very clear about the KZPSAs and HSZ40s. There are
    two HSZ40s configured as dual redundant on each shared SCSI bus.
    
    
    The KZPSAs already take advantage of the two PCI bridges.
    
    -------------------------------------------------------------------
    
    Let me submit something that I have in mind and for which at the moment
    I do not have an answer:
    
    
    - Any idea what the relation is between the drd segment size that can be
    found in the drd subsystem and performance? The default value is 1 MB.
    - Are the other parameters worth tuning, or is that not useful? (A way
    to at least see what is there is sketched below.)
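    
    (A sketch only -- I am not sure of the exact attribute names, which
    depend on the TruCluster version, so treat this as a place to look
    rather than an answer:)
    
	# list the current drd subsystem attribute values
	sysconfig -q drd
    
	# show which attributes exist and which can be changed at run time
	sysconfig -Q drd
    
	# permanent changes would go into /etc/sysconfigtab (via sysconfigdb)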
    
    						enzo
    
1996.6. "drd's are 300% slower than local device ??" by GUIDUK::SOMER (Turgan Somer - SEO06 - DTN 548-6439) Fri Apr 25 1997 03:04
Here are some findings from tests we performed with DRD's:

1. DRD device reads from shared (SCSI) devices take about 300% longer than 
   comparable/identical ADVFS file system reads from the same shared device
	
2. Raw (rrzxx) device reads from a local (non-shared) device take essentially 
   the same time as an ADVFS file system read on comparable/identical files
	
3. Subsequent repeat ADVFS file system reads are 1500% faster than the 
   initial read (i.e., the entire read was cached)
	
4. Raw device reads do not take advantage of any hardware or UNIX OS 
   caching irrespective of whether they are a local (rrzxx) or a shared device 
   (DRD). None of the reads were cached.
	
5. Reads of a DRD device allocated to Node1, issued from Node2, 
   took about 10% longer than if the read was done on Node1
	
6. Mirroring and/or striping of data had no significant impact on raw 
   device reads and only inconsequential or marginal improvement on ADVFS 
   file system reads

Test Environment:

2 8200's, dual processor, with 512 MB memory (carewise1 and carewise2
                                               are their node names)
2 KZPSAs in each 8200, connecting to two shared SCSI buses with dual
   redundant HSZ50s on each shared SCSI bus
10 RZ29s behind each dual-redundant HSZ50 pair
Dual Memory Channel
DU 4.0A and TruCluster 1.4
Oracle Parallel Server (OPS)

Test Parameters:

The tests yielding the above summary findings were performed on Alpha
8200's which were being used solely by the tester.  All tests were run
one at a time, and each test was allowed to complete before the next one
was started.

All timings involved the use of a 100 MB (1024*1024*100 = 104,857,600 bytes)
file or section from the same raw device.  The test timings used in
developing the above findings were "read only", from a single
device, so as to minimize potential I/O and CPU bandwidth bottlenecks.  Where
parameters were changed (e.g. blocksize), their noted effect was
negligible.


Test Results

The following test results are edited script output of the actual tests 
performed:

Script started on Wed Apr 23 10:43:30 1997
# csh
--->		Create Test Files by reading from distributed raw device.

root@carewise1 /u01/oradata/test 41# /usr/bin/time dd if=/dev/rdrd/drd1 of=t1 
                                                      bs=32768 count=3200
3200+0 records in
3200+0 records out

real   65.0
user   0.0
sys    3.3

root@carewise1 /u01/oradata/test 43# /usr/bin/time dd if=/dev/rdrd/drd1 of=t2 
                                                      bs=32768 count=3200
3200+0 records in
3200+0 records out

real   65.3
user   0.0
sys    3.5

root@carewise1 /u01/oradata/test 45# ls -l

total 307216
drwxr-xr-x   2 root     dba           8192 Apr 23 10:53 .
drwxr-xr-x   8 oracle   dba           8192 Apr 23 10:42 ..
-rw-r--r--   1 root     dba      104857600 Apr 23 10:45 t1
-rw-r--r--   1 root     dba      104857600 Apr 23 10:53 t2


--->		Copy file system object to null device (Read of file on 
                                                    shared AdvFS device)

root@carewise1 /u01/oradata/test 48 /usr/bin/time dd if=t2 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   19.7
user   0.0
sys    1.8

root@carewise1 /u01/oradata/test 49# /usr/bin/time dd if=t1 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   19.7
user   0.0
sys    1.8

--->		Repeat reads show effects of hardware/UNIX OS cacheing

root@carewise1 /u01/oradata/test 50# /usr/bin/time dd if=t1 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   1.3
user   0.0
sys    1.3

--->		Change in blocksize used during reads

root@carewise1 /u01/oradata/test 51# /usr/bin/time dd if=t1 of=/dev/null bs=4096

25600+0 records in
25600+0 records out

real   1.7
user   0.1
sys    1.6

root@carewise1 /u01/oradata/test 52# /usr/bin/time dd if=t2 of=/dev/null bs=4096

25600+0 records in
25600+0 records out

real   19.9
user   0.1
sys    2.1

root@carewise1 /u01/oradata/test 53# /usr/bin/time dd if=t2 of=/dev/null bs=8192

12800+0 records in
12800+0 records out

real   1.4
user   0.0
sys    1.4

root@carewise1 /u01/oradata/test 54# /usr/bin/time dd if=t1 of=/dev/null bs=8192

12800+0 records in
12800+0 records out

real   19.9
user   0.0
sys    1.8

--->		Perform raw read from DRD to null device

root@carewise1 /u01/oradata/test 55# /usr/bin/time dd if=/dev/rdrd/drd1 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out

real   46.1
user   0.0
sys    0.7

--->		Repeat reads show no effects from hardware/UNIX OS caching

root@carewise1 /u01/oradata/test 56# /usr/bin/time dd if=/dev/rdrd/drd1 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out

real   46.1
user   0.0
sys    0.7

--->		Perform raw read from Local (non-shared) Device to null device

root@carewise1 /u01/oradata/test 57# /usr/bin/time dd if=/dev/rrz33c 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out

real   15.4
user   0.0
sys    0.5

--->		Repeat read show no effect from hardware/UNIX OS caching

root@carewise1 /u01/oradata/test 58# /usr/bin/time dd if=/dev/rrz33c 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out

real   15.4
user   0.0
sys    0.5

--->		Another local disk performs as expected

root@carewise1 /u01/oradata/test 59# /usr/bin/time dd if=/dev/rrz35c 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out

real   15.4
user   0.0
sys    0.6

root@carewise1 /u01/oradata/test 61# # exit

--->		Tests on second node show slight performance hit (approx. 10%)

root@carewise2 /u02/oradata/test 60# /usr/bin/time dd if=/dev/rdrd/drd1 of=t2 
                                                      bs=32768 count=3200
3200+0 records in
3200+0 records out

real   71.2
user   0.0
sys    3.9

root@carewise2 /u02/oradata/test 61# ls -l

total 204816
drwxr-xr-x   2 root     dba           8192 Apr 23 10:49 .
drwxr-xr-x   8 oracle   dba           8192 Apr 23 10:45 ..
-rw-r--r--   1 root     dba      104857600 Apr 23 10:48 t1
-rw-r--r--   1 root     dba      104857600 Apr 23 10:51 t2


---> 		Read of file on shared AdvFS device

root@carewise2 /u02/oradata/test 63# /usr/bin/time dd if=t1 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   19.8
user   0.0
sys    1.9

---> 		Cached read

root@carewise2 /u02/oradata/test 64# /usr/bin/time dd if=t1 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   1.4
user   0.0
sys    1.4

root@carewise2 /u02/oradata/test 65# /usr/bin/time dd if=t2 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   17.4
user   0.0
sys    1.8

root@carewise2 /u02/oradata/test 66# /usr/bin/time dd if=t2 of=/dev/null 
                                                            bs=32768
3200+0 records in
3200+0 records out

real   1.4
user   0.0
sys    1.4

--->		Blocksize change

root@carewise2 /u02/oradata/test 67# /usr/bin/time dd if=t1 of=/dev/null 
                                                            bs=65536
1600+0 records in
1600+0 records out

real   20.0
user   0.0
sys    1.8

root@carewise2 /u02/oradata/test 68# /usr/bin/time dd if=t2 of=/dev/null 
                                                            bs=8192
12800+0 records in
12800+0 records out

real   17.5
user   0.0
sys    1.9


---->	Reads of raw device over memory channel showing 10% performance hit

root@carewise2 /u02/oradata/test 70# /usr/bin/time dd if=/dev/rdrd/drd1 
                                         of=/dev/null bs=32768 count=3200

3200+0 records in
3200+0 records out
real   52.0
user   0.0
sys    1.1

root@carewise2 /u02/oradata/test 71# /usr/bin/time dd if=/dev/rdrd/drd1 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out
real   51.6
user   0.0
sys    1.2

root@carewise2 /u02/oradata/test 72# /usr/bin/time dd if=/dev/rdrd/drd1 
                                         of=/dev/null bs=32768 count=3200
3200+0 records in
3200+0 records out
real   51.0
user   0.0
sys    1.0


Conclusions

These tests, in my opinion, isolate the poor Distributed Raw Device I/O 
performance from other influencing and obscuring factors (e.g., 
Oracle, disk striping, mirroring, etc.).  


DRD's are required to support OPS. In my opinion, a 300% I/O performance 
hit is an unacceptable compromise for this high-availability feature.  The 
distributed disk infrastructure has adequate system support for distributed 
AdvFS.  The distributed I/O performance for DRD's should be as good as or 
better than that for distributed AdvFS.  The hardware cache on the HSZ 
controllers and UNIX OS I/O caching should work for both DRD and 
distributed AdvFS I/O.

One of the motivating factors in using raw devices with Oracle in general 
is to improve I/O performance.  In an OPS/TruCluster environment the opposite 
seems to hold true. 

Any comments on our test methodology and findings?


X-Posted in SMURF::ASE and EPS::ORACLE

    
1996.7. "??? HSZ cache ???" by BACHUS::DEVOS (Manu Devos NSIS Brussels 856-7539) Fri Apr 25 1997 06:39
    Hi Turgan,
    
    "The drdr device reads are 300% slower than local raw disks"
    
    I don't agree with your test methodology. You first read the rdrd1 disk
    and recorded 46.1 seconds. Then you immediately read the /dev/rrz33c
    disk (the same disk as rdrd1, right?).
    
    	What about the HSZ read cache?
    
    In my humble opinion, you should power off each component before each
    test to be sure that you are in the same conditions for each
    comparative test.
    
    And concerning your general conclusions about the AdvFS/DRD performance
    comparison with Oracle, you completely forget that Oracle uses shared
    memory (the SGA - System Global Area) to cache as much data from the
    database as possible. And what about the impact of WRITEs on the
    database?
    
    My 5 pence,
    
    Regards, Manu.
1996.8. ".6 findings re-checked here" by ALFAM7::GOSEJACOB () Fri Apr 25 1997 12:10
    re .6
I'll leave aside another discussion about using 'dd' for I/O performance
tests. I personally quit using it.

A couple of comments about your conclusions.

>1. DRD device reads from shared (SCSI) devices take about 300% longer than 
>   comparable/identical ADVFS file system reads from the same shared device
>
>2. Raw (rrzxx) device reads from local (non-shared device) take essentially 
>   the same time as a ADVFS file system read on comparable/identical files

When I look at your performance data, the one thing that seems to cause
all your trouble is the I/O performance of the HSZ device you
configured under the drd device drd1. Have you tried dd'ing directly
from that device?

Have another look at the elapsed times you see reading from drd1 and
rrz33c; the factor is the exact 300% you are wondering about. I could
not believe my eyes when I saw your numbers, so I gave it a quick try.
Find the data attached below.


>3. Subsequent repeat ADVFS file system reads are 1500% faster than the 
>   initial read (i.e., the entire read was cached)

Sure, and you would see the exact same thing happening when Oracle has
the data cached in its SGA.


>4. Raw device reads do not take advantage of any hardware or UNIX OS 
>   caching irrespective of whether they are a local (rrzxx) or a shared device 
>   (DRD). None of the reads were cached.

You should be able to see faster reads from the HSZ disk when repeating the
reads. Are you sure that read caching is enabled for the HSZ units you
configured and that the maximum_cached_transfersize is set to something
larger than 32k?
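
(A quick way to check -- sketched with a made-up HSZ unit device name,
/dev/rrzb17c; substitute one of your HSZ units:)

	#!/bin/sh
	# Read the same 32 MB from an HSZ unit twice in a row.  If read
	# caching is enabled on the unit (and the cached transfer size
	# covers 32 KB requests), the second pass should come back
	# noticeably faster than the first.
	for pass in 1 2
	do
		echo "pass $pass"
		/usr/bin/time dd if=/dev/rrzb17c of=/dev/null bs=32768 count=1024
	done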


>6. Mirroring and or striping of data had no significant impact on raw 
>   device reads and inconsequential or marginal improvement on ADVFS file 
>   system reads

Well you should not expect much of a performance gain from striping if
you are reading with one single-threaded process (dd). The picture will
be completely different if you are running Oracle with parallel query
on top of these devices (e.g. full table scans with a parallel degree
of 8 or 16).

Just my 2 Pfennige


	Martin

--- cut here for test results ---

2 node 2100 cluster
UNIX V4.0B
TCR 1.4
1 KZPSA each
RZ29's directly attached to shared bus


local disk
----------
# /bin/time dd if=/dev/rrz16c of=/dev/null bs=32k count=3200
3200+0 records in
3200+0 records out

real   15.3
user   0.0
sys    0.8


AdvFs
-----
/bin/time dd if=fred of=/dev/null bs=32k
3200+0 records in
3200+0 records out

real   15.4
user   0.1
sys    4.0


local drd
---------
# /bin/time dd if=/dev/rdrd/drd17 of=/dev/null bs=32k count=3200
3200+0 records in
3200+0 records out

real   15.2
user   0.1
sys    0.8


remote drd
----------
# /bin/time dd if=/dev/rdrd/drd17 of=/dev/null bs=32k count=3200
3200+0 records in
3200+0 records out

real   16.4
user   0.1
sys    1.4


Now that all looks much more like it!
1996.9. "Slight clarification" by GUIDUK::SOMER (Turgan Somer - SEO06 - DTN 548-6439) Fri Apr 25 1997 17:15
>    
>   I don't agree with your test methodology. You read first the rdrd1 disk
>    and recorded 46.1 seconds. Then you immediately read the /dev/rrz33c
>    disk (the same as rdrd1, right?). 
>    
    
    Sorry, I was not clear enough: /dev/rrz33c is a non-shared local disk
    attached to a non-shared KZPSA. It's not the disk underlying the
    rdrd1 device. Hope this clarifies it.
1996.10. "apple != pear" by BACHUS::DEVOS (Manu Devos NSIS Brussels 856-7539) Fri Apr 25 1997 17:35
 Hi Turgan,
    
 >       Sorry I was not clear enough: /dev/rrz33c is non-shared local disk
 >       attached to a non-shared KZPSA. It's not the disk underlying the
 >       rdrd1 device. Hope this clarifies it.
    
    So, you are comparing the speed of two different disks connected to two
    different buses, one behind an HSZ and the other not...
    
    Also, if your drd1 device is on a shared bus, you should take into
    account that the bus bandwidth (10 or 20 MB/sec) is shared between the
    two systems. So, you should arrange to quiesce the bus on the other
    system to avoid limiting the bandwidth available to your test system.
    
    By the way, I also use "dd" for my tests.
    
    Anyway, thanks for making the results of your tests public !!!
    
    Manu.
    
    
1996.11. "apple != cherry ?" by ALFAM7::GOSEJACOB () Mon Apr 28 1997 06:10
    re .last couple of entries
    
    Turgan's findings basically boil down to:
    
    1) dd from the locally attached raw device is 3 times faster than dd
       from a drd located behind an HSZ on the shared bus.
    
       Even if this were comparing completely different SCSI setups, a
       factor of 3 sounds a bit high. And I would love to see the timing
       for a dd from the HSZ raw device the drd is based on.
    
    2) dd from the drd device is 3 times slower than dd from AdvFs located
       on the same HSZ device.
    
       Well, I very much doubt this result (see .6 for test results). No
       offence intended, but maybe this is also comparing an apple to a
       cherry or something.
    
    If you have the chance, Turgan, could you please re-run the tests and
    also provide more detailed information about the drd and AdvFs
    configuration.
    
    	Martin
      
1996.12. "whoops !" by ALFAM7::GOSEJACOB () Mon Apr 28 1997 06:35
    re .11
    > ... (see .6 for results)
    
    What I meant to say was: see .8 for different results.
    
    BTW, my major reason for not using dd for I/O tests is the fact that it
    doesn't allow you to parallelize I/O. On the other hand, a bunch of
    parallel processes having a go at the I/O subsystem is the typical
    thing that happens when you run an Oracle (or Informix, Sybase, ...)
    database scanning large portions of data.
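    
    (The kind of thing I mean -- a rough sketch, the device names are made
    up, substitute your own:)
    
	#!/bin/sh
	# Start one dd reader per device in the background, then wait for
	# all of them to finish.  Time the script as a whole, e.g.
	#	/usr/bin/time ./pario.sh
	# and compare the elapsed time with the single-stream dd numbers
	# to see how well the I/O subsystem copes with concurrent readers.
	for dev in /dev/rrza49c /dev/rrza50c /dev/rrzb49c /dev/rrzb50c
	do
		dd if=$dev of=/dev/null bs=32768 count=3200 &
	done
	wait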
    
    So dd is fine for a first glance at your I/O setup but before I'm
    satisfied with an I/O configuration I have it tested more thoroughly.
    
    	Martin 
    
1996.13. "Test scripts and test results for further comments" by GUIDUK::SOMER (Turgan Somer - SEO06 - DTN 548-6439) Mon Apr 28 1997 22:06
    
Here are the disk timings for the HSZ drives, using the rrzxNNx mnemonics 
    
Specifically, drd1 is a striped pair (rrza50c, rrza49c) and mirrored on 
                                  rrza42c and rrza41c.  
    
    Here are the scripts we are using to test the disks on the 8200's:
    
    ------ Cut here -----------
    
#!/bin/sh 
################################################################################
#
#		Name	
#			DiskTimer
#
#		Syntax
#			DiskTimer [num_of_MB_to_scan]
#
#		Description
#		==============================================================
#		
#
#		Command Line Parameters
#		==============================================================
#		
#
#		Operational Environment
#		==============================================================
#
#
#		Additional Notes
#		==============================================================
#
#
################################################################################
#@
################################################################################
#@@@
ProgName=`/usr/bin/basename $0`
LocDir=${LOCAL-/usr/local}
if [ $# -gt 0 ]
then
	case $1 in
		-h)
			$LocDir/bin/ManPage $0
			exit
		;;
		-e)
			echo "Displaying Bourne shell (/bin/sh) environment.\ 
                                                                        \n"\
			> /tmp/$ProgName.env
			printenv >> /tmp/$ProgName.env
			echo "\nDisplaying c-shell (/bin/csh) environment.\
                                                                           \n"\
			>> /tmp/$ProgName.env
			set >> /tmp/$ProgName.env
			cat /tmp/$ProgName.env| more -c
			rm /tmp/$ProgName.env
			exit
		;;
		-x)
			Debug=true
			set -x
			shift
		;;
	esac
fi
disks=`/sbin/voldisk list | grep online	\
| awk '/rootdg/ {print substr($1,1,length($1)-1)}
			 /sliced/ {print $1}'	\
| uniq`
for disk in $disks
do
	echo "\nScanning first $1 MB of \"${disk}\" . . .\n" 2>&1
	count=`expr $1 \* 32`
	/usr/bin/time dd if=/dev/r${disk}c of=/dev/null count=$count bs=32768 2>&1
done
    
    
    
    -------- Cut here ------------
    
#!/bin/sh 
################################################################################
#
#		Name	
#			AvgDiskThruPut
#
#		Syntax
#			AvgDiskThruPut [num_of_MB_to_scan]
#
#		Description
#		===============================================================
#		AvgDiskThruPut finds each local drive and each HSZ drive allocated
#		to the local node and performs a raw data read of the specified
#		number of MB's (defaults to 100 MB - 1024 x 1024 x 100 =
#		104,857,600 bytes).  It creates a report file in the local log
#		directory "AvgDiskThruPut.mmdd.pid".
#
#		Command Line Parameters
#		===============================================================
#		1 - num_of_MB_to_scan
#
#		Operational Environment
#		==============================================================
#		Calls a subprogram "DiskTimer" which creates a temporary file.
#
#
#		Additional Notes
#		==============================================================
#
#
################################################################################
#@
################################################################################
#@@
#
################################################################################
#@@@
ProgName=`/usr/bin/basename $0`
LocDir=${LOCAL-/usr/local}
if [ $# -gt 0 ]
then
	case $1 in
		-h)
			$LocDir/bin/ManPage $0
			exit
		;;
		-e)
			echo "Displaying Bourne shell (/bin/sh) environment.\ 
                                                                           .\n"\
			> /tmp/$ProgName.env
			printenv >> /tmp/$ProgName.env
			echo "\nDisplaying c-shell (/bin/csh) environment.\
                                                                           \n"\
			>> /tmp/$ProgName.env
			set >> /tmp/$ProgName.env
			cat /tmp/$ProgName.env| more -c
			rm /tmp/$ProgName.env
			exit
		;;
		-r)
			Debug=true
			shift
		;;
		-x)
			Debug=true
			set -x
			shift
		;;
	esac
fi
Debug=${Debug-false}
pid=$$
subprogram=DiskTimer
MBytes=${1-100}
dts=`date +"%m%d"`
[ $Debug != true ] && $LocDir/bin/$subprogram $MBytes > /tmp/$subprogram.$dts.$pid
echo `date` > $LocDir/log/$ProgName.$dts.$pid
awk -v MBytes=$MBytes '
BEGIN {
	printf "\nSample Size - %-d MBytes\n", MBytes;
	print "\nDevice     Through Put\n======     ============";
}
	/Scanning/  { device = $6 }
			/real/	{ time = $2;
		AvgThruPut = MBytes/time;
		printf "%-8s   %4.2f MB/sec\n", device, AvgThruPut;
							}
' /tmp/$subprogram.$dts.$pid | tee -a $LocDir/log/$ProgName.$dts.$pid
    
    
    
    -------- Cut here -----------
    
    Here are the results:
    

Wed Apr 23 16:37:39 PDT 1997

Sample Size - 100 MBytes

Device     Through Put
======     ============
"rz33"     6.29 MB/sec
"rz34"     6.58 MB/sec
"rz35"     6.29 MB/sec
"rz36"     6.33 MB/sec
"rza41"    1.88 MB/sec
"rza42"    2.66 MB/sec
"rza43"    2.72 MB/sec
"rza44"    2.67 MB/sec
"rza49"    2.66 MB/sec
"rza50"    2.66 MB/sec
"rza51"    2.67 MB/sec
"rza52"    2.64 MB/sec
"rzb42"    2.65 MB/sec
"rzb44"    2.65 MB/sec
"rzb50"    2.65 MB/sec
"rzb52"    2.65 MB/sec
"rzc41"    2.72 MB/sec
"rzc49"    2.67 MB/sec



Thu Apr 24 16:08:15 PDT 1997

Sample Size - 100 MBytes

Device     Through Put
======     ============
"rz33"     6.62 MB/sec
"rz34"     6.58 MB/sec
"rz35"     6.58 MB/sec
"rz36"     6.58 MB/sec
"rza41"    1.89 MB/sec
"rza42"    2.67 MB/sec
"rza43"    2.75 MB/sec
"rza44"    2.67 MB/sec
"rza49"    2.66 MB/sec
"rza50"    2.96 MB/sec
"rza51"    2.70 MB/sec
"rza52"    2.65 MB/sec
"rzb42"    2.65 MB/sec
"rzb44"    2.66 MB/sec
"rzb50"    2.65 MB/sec
"rzb52"    2.65 MB/sec
"rzc41"    2.82 MB/sec
"rzc49"    2.76 MB/sec
    
    
1996.14. "LSM is used to mirror striped drd1" by GUIDUK::SOMER (Turgan Somer - SEO06 - DTN 548-6439) Mon Apr 28 1997 22:52
    
    Some clarification:
    
    *) "drd1" striped pair (rrza50c & rrza49c) is mirrored across SCSI buses
       using LSM mirroring to (rrza42c & rrz41c)
    
    *) The first Script above should be named "DiskTimer", since it's
       called from within the second Script "AvgDiskThruPut"
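    
    (For completeness -- one way to double-check the LSM layout; a sketch
    only, the exact volprint options may differ on your version:)
    
	# show the full LSM volume/plex/subdisk hierarchy
	volprint -ht
    
	# or restrict it to the volume underlying drd1, whatever it is named
	volprint -ht <volume-name>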
    
    Again, thanks for all the comments so far.
    
1996.15. "it starts making sense" by ALFAM7::GOSEJACOB () Tue Apr 29 1997 10:51
    re .13 .14
    Ah, now you are talking:
    >	...
    >"rz35"     6.29 MB/sec
    >"rz36"     6.33 MB/sec
    >"rza41"    1.88 MB/sec
    >"rza42"    2.66 MB/sec
    >"rza43"    2.72 MB/sec
    >	...
    >"drd1" striped pair (rrza50c & rrza49c) is mirrored across SCSI buses
    >using LSM mirroring to (rrza42c & rrz41c)
    
    From reading between the lines I assume that you are using LSM to
    stripe over the pairs rrza50c/rrza49c and rrza42c/rrza41c too. Your
    performance numbers show that you are talking to disks behind the HSZ
    which are about 2.5 times slower than the directly connected disks. BTW,
    a factor of 2.5 still seems a bit high. Are you sure the disks behind
    the HSZ's are the same type as the local ones (which from the read
    performance look like RZ29's)?
    
    Now add a couple of layers of software (LSM, drd) on top of that, and
    there you are with a simple explanation of why reads from your drd
    device are 300% slower than reads from a local disk.
    
    But if you can blame anything for that performance hit, then it's the
    I/O subsystem (HSZ and stuff), NOT the drd implementation. What I still
    don't understand are your AdvFs numbers.
    
    	Martin
    
1996.16. "still perplexed" by GUIDUK::SOMER (Turgan Somer - SEO06 - DTN 548-6439) Tue Apr 29 1997 14:07
    
    Martin,
    
    All of our disks are RZ29's.
    
    rz33,rz34,rz35,rz36 are non-shared local disks attached to a KZPSA in
    each 8200.
    
    The rest, rz41...rz50, are behind the HSZ50's. As I pointed out above,
    rz41-42 is a striped pair mirrored through LSM to the rz49-50 striped
    pair on the other HSZ.
    
    What perplexes me is why rz33-36 show 6.29 MB/sec, whereas
    the ones behind the HSZ's show ~2 MB/sec.
    
    When you go directly to the /dev/rrzxxx device, don't you bypass all of
    the LSM and DRD layers? That is, if I "dd" from, say, /dev/rrza41, I go
    
    KZPSA-->HSZ-->disk, and none of the LSM and DRD layers are involved?
    Why do we see performance drop to roughly a third of what the local
    disks deliver?