| Hi
I assume these flat files are written to the file system?
Are the files read/written sequentially?
If the above answers are yes, then the system should be
able to do a good job with it via AdvFS, LSM striping, and
HSZxx controllers (where xx should be 50 or 70).
You should look at the I/O tuning talk from the
symposium in http://www-unix.zk3.dec.com/symp_s97/unix.htm
The machine choice and specific I/O tests would be your next
step....you may want to discuss this with experts like
Doug Williams from the server group.
Eric
|
| The short answer is host-based striping, across many controllers.
If you want redundancy you may want to use array controllers
to create mirrors or RAID-5s, but RAID-5 may not do well on
the write side of things.
re: max I/O size.
It depends on the I/O subsystem and higher level components
used. SCSI has a maximum I/O size of 16,777,215 bytes (16 MB
- 1). The HSZ40/50 only allow a maximum of 64 KB. Out of
the box, I think UFS, AdvFS and LSM also have a 64 KB maximum,
though it may be possible to raise it.
re: multi-block transfers
The answer is so trivially obvious that the question confuses
me: of course. You can hand the raw disk interface or the file
system a very large I/O and it will handle breaking that down
into however many I/Os are needed.
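Just to illustrate, the loop the lower layers do for you is
basically this (only a sketch; the 64 KB limit and the function
name are assumptions, not anything from the real code):

    #include <sys/types.h>
    #include <unistd.h>

    #define MAX_XFER (64 * 1024)   /* assumed per-I/O limit, e.g. HSZ 64 KB */

    /* Hand the whole buffer in; it gets broken into pieces no bigger
     * than MAX_XFER, the way the raw interface or file system would. */
    ssize_t big_write(int fd, const char *buf, size_t len)
    {
        size_t done = 0;
        size_t chunk;
        ssize_t n;

        while (done < len) {
            chunk = len - done;
            if (chunk > MAX_XFER)
                chunk = MAX_XFER;
            n = write(fd, buf + done, chunk);
            if (n < 0)
                return -1;         /* caller can look at errno */
            done += (size_t)n;
        }
        return (ssize_t)done;
    }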
re: Caching.
When reading, caches usually serve two purposes:
o Holding frequently read data.
o Holding the data that is about to be read (read-ahead).
You'll be hard pressed to find a cache big enough to hold
these large files (assuming they're sequentially read). On
the other hand, UFS and I think AdvFS will do read-ahead
once they decide you're doing sequential reads. This
can help performance considerably when using the file system.
Most modern disks also do read-ahead with their caches. The
HSZ family doesn't do any read-ahead that I know of (a notable
flaw in my opinion).
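To give a feel for the read-ahead part, here is a sketch of the
kind of check a file system makes on each read before it decides
to prefetch (made-up structure and function, not the real UFS or
AdvFS code):

    #include <sys/types.h>

    /* Hypothetical per-file state; not the real UFS or AdvFS structures. */
    struct seq_state {
        off_t next_expected;    /* offset we'd see if reads are sequential */
    };

    /* Called on every read.  Returns non-zero when the access pattern
     * looks sequential, i.e. when it is worth prefetching the data
     * that is about to be read. */
    int want_readahead(struct seq_state *ss, off_t offset, size_t len)
    {
        int sequential = (offset == ss->next_expected);

        ss->next_expected = offset + (off_t)len;
        return sequential;
    }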
Write caching is a little different. Even when the cache
has filled and you can't write any faster than the cache can
flush, there may be opportunities for the I/O subsystem to
optimize the writes. It may collect smaller writes done to
the cache into larger writes, it may write data out of order
if it knows the head may be under the right sector, etc.
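The coalescing part looks roughly like this (the extent structure
is made up, not any real cache's structure):

    #include <sys/types.h>

    /* Hypothetical cached-write extent, [start, end) in bytes. */
    struct extent {
        off_t start;
        off_t end;
    };

    /* If two cached writes touch or overlap, fold b into a, so one
     * larger write goes to the disk instead of two smaller ones. */
    int try_merge(struct extent *a, const struct extent *b)
    {
        if (b->start > a->end || a->start > b->end)
            return 0;           /* disjoint: leave them as two writes */
        if (b->start < a->start)
            a->start = b->start;
        if (b->end > a->end)
            a->end = b->end;
        return 1;
    }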
re: effectiveness of disk caches.
Probably very. You can use scu(8) to turn off the read-ahead
cache on a disk. Do a benchmark. I suspect the difference
will be significant.
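The benchmark doesn't need to be anything more than this (the
default path, block size and count are placeholders; run it once
with read-ahead on, turn read-ahead off with scu, run it again):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define BLKSZ (64 * 1024)   /* assumed transfer size */
    #define COUNT 1024          /* 64 MB total; adjust for your disks */

    int main(int argc, char **argv)
    {
        /* pass a big file, or a raw device (raw I/O may want the
         * buffer and size aligned to the sector size) */
        const char *path = (argc > 1) ? argv[1] : "/bigfile";
        char *buf = malloc(BLKSZ);
        struct timeval t0, t1;
        double secs;
        int fd, i;

        fd = open(path, O_RDONLY);
        if (fd < 0 || buf == NULL) {
            perror(path);
            return 1;
        }
        gettimeofday(&t0, NULL);
        for (i = 0; i < COUNT; i++)
            if (read(fd, buf, BLKSZ) != BLKSZ) {
                perror("read");
                return 1;
            }
        gettimeofday(&t1, NULL);
        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%.1f KB/s\n", COUNT * (BLKSZ / 1024) / secs);
        return 0;
    }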
re: Controller questions.
I need to think about these. You may want to ask in the
ASK_SSAG conference on SSAG. Be sure to specify the controller
of interest.
re: Advantage of raw I/O.
It doesn't tie up any memory for buffer cache. It saves a
data copy from the cache to the user buffer. The disadvantage
is that it doesn't tie up any memory for the buffer cache,
meaning you have none of the features offered by the buffer
cache: no read-ahead unless you do it yourself. When in doubt,
benchmark.
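If you go raw and still want read-ahead, you can roll your own
with double buffering. A sketch using POSIX aio (device name and
chunk size are placeholders; error handling is skimpy):

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define CHUNK (128 * 1024)  /* assumed transfer size */

    int main(int argc, char **argv)
    {
        static char buf[2][CHUNK];
        struct aiocb cb[2];
        const struct aiocb *list[1];
        off_t off = 0;
        ssize_t got;
        int cur = 0, next, fd;

        /* the device name is only an example; pass your own */
        fd = open((argc > 1) ? argv[1] : "/dev/rrz8c", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        memset(cb, 0, sizeof cb);

        /* prime the pump: start the first read */
        cb[cur].aio_fildes = fd;
        cb[cur].aio_buf    = buf[cur];
        cb[cur].aio_nbytes = CHUNK;
        cb[cur].aio_offset = off;
        aio_read(&cb[cur]);

        for (;;) {
            /* start the read-ahead for the next chunk... */
            next = 1 - cur;
            cb[next].aio_fildes = fd;
            cb[next].aio_buf    = buf[next];
            cb[next].aio_nbytes = CHUNK;
            cb[next].aio_offset = off + CHUNK;
            aio_read(&cb[next]);

            /* ...then wait for the current one and process it */
            list[0] = &cb[cur];
            aio_suspend(list, 1, NULL);
            got = aio_return(&cb[cur]);
            if (got <= 0)
                break;          /* EOF or error */
            /* consume buf[cur][0 .. got) here */

            off += CHUNK;
            cur = next;
        }
        close(fd);
        return 0;
    }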
re: Performance problems of RAID-5.
In the worst case a simple RAID-5 implementation has to do four
I/Os to write one sector: read the old data, read the old parity,
write the new data, write the new parity. Fancy implementations
that fix the write hole may do more. Write-back caches may reduce
this some by absorbing writes to the same places. They may also
allow the opportunity to take separate writes to the same general
area and combine them.
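To spell those four I/Os out, here is a toy sketch of the
small-write path against in-memory "disks" (not any real
controller's code):

    #include <stdio.h>
    #include <string.h>

    #define NSECT 8
    #define SSZ   16    /* tiny "sector" size, just for the demonstration */

    /* toy member disks: two data disks plus one parity disk */
    static unsigned char disk[3][NSECT][SSZ];

    /* Small-write (read-modify-write) update of one sector on one
     * data disk of a RAID-5 set: 2 reads + 2 writes = 4 member I/Os. */
    static void raid5_small_write(int data_disk, int parity_disk,
                                  int sector, const unsigned char *newdata)
    {
        unsigned char olddata[SSZ], parity[SSZ];
        int i;

        memcpy(olddata, disk[data_disk][sector], SSZ);   /* 1: read old data   */
        memcpy(parity, disk[parity_disk][sector], SSZ);  /* 2: read old parity */

        /* new parity = old parity XOR old data XOR new data */
        for (i = 0; i < SSZ; i++)
            parity[i] ^= olddata[i] ^ newdata[i];

        memcpy(disk[data_disk][sector], newdata, SSZ);   /* 3: write new data   */
        memcpy(disk[parity_disk][sector], parity, SSZ);  /* 4: write new parity */
    }

    int main(void)
    {
        unsigned char block[SSZ];
        int i, ok = 1;

        memset(block, 0xAB, SSZ);
        raid5_small_write(0, 2, 3, block);
        memset(block, 0x5C, SSZ);
        raid5_small_write(1, 2, 3, block);

        /* the parity should equal the XOR of the two data disks */
        for (i = 0; i < SSZ; i++)
            ok &= (disk[2][3][i] == (disk[0][3][i] ^ disk[1][3][i]));
        printf("parity %s\n", ok ? "consistent" : "broken");
        return 0;
    }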
A smart RAID-5 implementation can take very large writes and
make them appear to be RAID-3 writes, which only require one
additional I/O for the parity (and any housekeeping I/Os).
In the normal state for reads, RAID-5 should look like striping
with one extra disk. When not in the normal state, RAID-5 will still
be reading where anything else would be getting I/O errors.
If your application can take data fast enough, you don't want
RAID-5 anyway. A controller based array is only as fast as
the connection to the host. Host based striping lets you
spread out that I/O over more controllers and generally get
faster I/O.
re: Performance problems of RAID-1.
Without a safe write-back cache, writes don't complete until
the last one does. This hurts the write performance. With
a safe write cache (such as when using controller RAID), the
controller can complete the write when it gets the data and
get the data to the disk at its leisure. The usual problems
of the cache filling apply... And there are the usual
housekeeping I/Os.
Using controller based mirroring the host only has to send
the data once, but that controller becomes a single point
of failure. You can use mirroring either at the host level
or the controller level. Benchmark each and balance the
tradeoffs. If you want some device redundancy, but want
the highest performance, maybe host striping and controller
mirroring is the right answer.
|
| re: Concurrent transfers.
The simple answer is one, but...
A pair of Fast/Wide SCSI devices can exchange one 16-bit word
of data every clock tick, with the clock ticking at 10 MHz.
For long transfers, not all the data may be transferred at
once, to prevent other devices from getting starved for
bandwidth. I think protocol transfers use 8-bit words at
5 MHz, which cuts into the total bandwidth available.
So, at any discrete instant (a clock tick) only one word is
being transferred. Over a collection of such instants
many transfers may appear to be in progress. If a single
device can't saturate the bus, then two or more devices
may be able to use the spare cycles.
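For the peak numbers, the arithmetic is trivial (arbitration and
other protocol overhead ignored):

    #include <stdio.h>

    int main(void)
    {
        /* Fast/Wide SCSI data phase: one 16-bit word per tick at 10 MHz */
        double data_bytes_per_sec  = 2.0 * 10e6;   /* = 20 MB/s peak */
        /* command/status phases: 8-bit words at 5 MHz (shared overhead) */
        double proto_bytes_per_sec = 1.0 * 5e6;    /* =  5 MB/s */

        printf("peak data rate:  %.0f MB/s\n", data_bytes_per_sec / 1e6);
        printf("protocol phases: %.0f MB/s\n", proto_bytes_per_sec / 1e6);
        return 0;
    }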
And, there is more to being able to accept multiple commands
than concurrent seeking. If an individual SCSI device
supports command queuing, it can accept multiple commands,
sort them, break them up, combine them internally, etc.,
to offer higher throughput. In an array controller, the
multiple commands could be completed independently since
each device of the array can operate independently.
re: Controller doing I/O.
What kind of controller? For your proposed I/O load, I don't
think it really matters much, because you'll have saturated
the bus long before you saturate the ability of the controller
to handle I/Os.
|