T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
512.1 | No problem that I can see | SWAMI::LAMIA | Free radicals of the world, UNIONIZE! | Fri Jul 17 1987 13:42 | 19 |
| > access it later.) The data must be "sorted" in order of date/time
> stamps, but the records can arrive out of time order (i.e. a 12:00
> record can arrive before an 11:00 record). But, later viewing (both
Hmm, this isn't very clear, but I assume that you understand what
you mean and you know how to collect and distinguish the dates.
> data collection without interruption. The quantity of data isn't
> all that big, and it's not an extremely time-critical situation.
> (For instance, they may collect 200 bytes per minute and want
> to retain the most recent 2-hours' worth of activity.)
Let's see... 200 bytes/min * 60 min/hr * 12 working hr/day * 7 day/wk * 2 wk
= 2,016,000 bytes = 3938 blocks every 2 weeks.
I don't think this is big enough to worry about using CONVERT to
reclaim deleted space any more than once every couple of weeks, or even
once a month! Just make sure you tune the RMS file carefully for good
insertion performance of records in roughly sorted key order.
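For what it's worth, here is that arithmetic as a trivial C program (the 512-byte
disk block size is standard; the 12 working hours/day and the 2-week CONVERT
interval are just the assumptions above):

    /* sizing_sketch.c - rough growth estimate between CONVERT runs,
     * assuming 200 bytes/minute, 12 working hours/day, 7 days/week,
     * and a CONVERT every 2 weeks.                                   */
    #include <stdio.h>

    int main(void)
    {
        const long bytes_per_min = 200;
        const long min_per_day   = 60 * 12;            /* 12 working hours/day */
        const long days          = 7 * 2;              /* two weeks            */
        const long bytes         = bytes_per_min * min_per_day * days;
        const long blocks        = (bytes + 511) / 512; /* 512-byte disk blocks */

        printf("%ld bytes = %ld blocks every 2 weeks\n", bytes, blocks);
        return 0;
    }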
|
512.2 | Go ahead: Use indexed files with delete AND Convert/reclaim | CASEE::VANDENHEUVEL | Formerly known as BISTRO::HEIN | Fri Jul 17 1987 13:46 | 1 |
|
|
512.3 | how about this way... | DYO780::DYSERT | Barry Dysert | Fri Jul 17 1987 14:09 | 10 |
| I see that I shouldn't have even provided what I think is the obvious
solution because no other ideas have yet been presented. Let me
try this one: how about using the date/time stamp as an alternate
key, using anything else as the primary and doing REWRITEs, modifying
the alternate key. This would prevent the file from growing, no?
What I don't know is if this would cause performance problems (bucket
splits or something) and eventually require a CONVERT anyway.
Are there any other ideas, or at least some discussion on this second
method versus the one presented in .0? Thank you!
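To make that concrete, here is roughly the record layout and rewrite logic I have
in mind (plain C structures only, not actual RMS calls; the field names and slot
count are made up for illustration):

    /* Sketch of the fixed-pool / REWRITE idea: the primary key is a slot
     * number that never changes, and the date/time stamp is an alternate
     * key that gets modified on each REWRITE, so the file never grows.   */
    #include <time.h>

    #define NSLOTS 120            /* e.g. 2 hours at one record per minute */

    struct sample_rec {
        unsigned short slot;      /* primary key: 0 .. NSLOTS-1, fixed     */
        time_t         stamp;     /* alternate key: date/time of sample    */
        char           data[200]; /* the collected data                    */
    };

    /* Pick which existing record to REWRITE for a new sample: always the
     * slot holding the oldest timestamp.  In the real file you would GET
     * the record with the lowest timestamp via the alternate key rather
     * than scanning; this scan only illustrates the choice being made.   */
    static unsigned short oldest_slot(const struct sample_rec pool[])
    {
        unsigned short best = 0;
        for (unsigned short i = 1; i < NSLOTS; i++)
            if (pool[i].stamp < pool[best].stamp)
                best = i;
        return best;
    }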
|
512.4 | Alternate key does not `feel' right. | CASEE::VANDENHEUVEL | Formerly known as BISTRO::HEIN | Sat Jul 18 1987 05:22 | 35 |
| Using an alternate key will cause a lot more I/Os: for every record
updated, not only will the primary bucket be updated, but the
old AND new SIDR (secondary index data record) buckets will also be
read and updated. When retrieving records by an alternate key you
almost guarantee an I/O per record, unless the alternate key order
largely follows the primary key order (which might be the case here)
or you cache the whole data level of the file in global buffers.
Go indexed. Once you have a good solution you need no other. Right?
Nevertheless, given the relatively small and limited amount of data,
there are probably several alternatives. One idea that might prove
interesting is to make use of the fact that the records will probably
be arriving largely in key order. That opens the opportunity to
handle out-of-sequence records through an exception procedure such
as a forward/backward pointer. Thus you could use a relative file
(or even a fixed-length record sequential file) as follows:
Record 0 -> record number of record with lowest timestamp in file &
record number of record with highest timestamp in file
Record i is logically followed by record i+1 UNLESS diverted by the
presence of a key value in the pointer field.
You might be able to use sequential puts to the relative file to have
RMS handle the free-slot handling... until EOF. At EOF you must
wrap around to a low key value.
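A minimal sketch of what that layout might look like, with plain C structures
standing in for the relative file's numbered records (the names and sizes are
invented, and this shows only the logic, not actual RMS calls):

    /* Circular log in a relative file, per the scheme above: record 0
     * holds the lowest/highest pointers, records 1..NRECS are data
     * slots, and each slot carries an optional "diverted" link for the
     * occasional out-of-sequence record.                               */
    #define NRECS 1440                     /* data records 1..NRECS     */

    struct header_rec {                    /* relative record 0         */
        unsigned long lowest;              /* rec# of oldest timestamp  */
        unsigned long highest;             /* rec# of newest timestamp  */
    };

    struct data_rec {                      /* relative records 1..NRECS */
        long          stamp;               /* date/time of the sample   */
        unsigned long next;                /* 0 => record i+1 follows;
                                              nonzero => diverted to
                                              this record number        */
        char          data[200];
    };

    /* Advance to the logically next record, wrapping at EOF. */
    static unsigned long next_rec(const struct data_rec *cur,
                                  unsigned long curno)
    {
        if (cur->next != 0)                /* out-of-sequence divert    */
            return cur->next;
        return (curno % NRECS) + 1;        /* wrap NRECS -> 1           */
    }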
With a sequential file RMS cannot tell you whether a record
exists or is deleted, so you might consider a record bitmap
to find free space.
Hein.
|
512.5 | Piece of cake! | ALBANY::KOZAKIEWICZ | You can call me Al... | Sun Jul 19 1987 11:56 | 31 |
| An application I wrote a number of years ago sounds like what you are trying
to do. It collected data from a process control system on a time domain
basis and stored the data in several files. The data was used for process
optimization and we wanted to purge "old" data on an automatic basis. This
resulted in two classes of files - high resolution data which was to be
retained for 7 days and lower resolution data which was kept for 6 months.
The solution was to size and populate the files with null records to their
eventual capacity up front. A hashing algorithm was applied to the date and
time in such a manner as to "wrap around" upon itself after 7 days or 6
months. The result of this was used as the primary key. The null records
inserted into the file had all the possible combinations of this key
represented. For example, on the 7 day file, we collected data every 15
minutes. The hashed key became the day-of-week and the hour and minute of day.
3:30 PM Wednesday would yield 41530, for example. The rest of the record
consisted of the "real" date, time, and process data. The application which
stored the data would fetch the record with the appropriate primary key,
modify all the other fields, and rewrite the record. Using DTR or whatever to
analyze the data in the file was straightforward because the date and time (the
primary way of accessing the data from a user's standpoint) were represented in
a normal fashion in alternate keys.
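To make the hashing concrete, the key computation amounted to something like
this (assuming Sunday = 1 through Saturday = 7, so Wednesday = 4; the function
name is made up):

    /* Hash a date/time into the wrap-around primary key described above:
     * day-of-week, hour, and minute packed as Dhhmm.
     * Wednesday 3:30 PM -> 4*10000 + 15*100 + 30 = 41530.               */
    #include <time.h>

    long hashed_key(time_t when)
    {
        struct tm t = *localtime(&when);
        int dow = t.tm_wday + 1;           /* tm_wday: 0=Sun .. 6=Sat    */
        return (long)dow * 10000 + t.tm_hour * 100 + t.tm_min;
    }

Because the key only encodes day-of-week, hour, and minute, it repeats itself
every seven days, which is what lets the file wrap around without growing.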
The original version of this system was done with RMS-11 prologue 1 files, so
I didn't have the luxury of on-line garbage collection. By populating the
file in advance, and never changing the primary key, I was able to realize
the goal of a stable file which didn't require occasional cleanup. I have
used this same technique elsewhere, always based on the date/time. I can speak
from experience when I point out that any period that doesn't roughly
correspond to some interval on a calendar (week, month, year) is a real bitch
to implement because of the hashing algorithm (try to do 10 days, for
instance!).
|
512.6 | Beware of pre-loading `empty' records with compression. | CASEE::VANDENHEUVEL | Formerly known as BISTRO::HEIN | Mon Jul 20 1987 05:29 | 11 |
| Re .5
Beware of data compression when trying to preload records
into an indexed file, intending to update them later with
no change to the structure:
The `empty' records that are typically used are in fact long
strings of a single character (space or zero, probably).
Such records will be compressed to repeat counts only, and
subsequent updates are guaranteed to increase the size of
the record in the buckets, thus potentially causing splits!
|
512.7 | | ALBANY::KOZAKIEWICZ | You can call me Al... | Mon Jul 20 1987 09:38 | 7 |
| re: -1
Yes, bucket splits will occur until all the records have "real" data in them.
Actually, since the original application was written under RSX, this wasn't
a problem (no compression). When transferred to VMS, data compression was
disabled.
|
512.8 | thanks to all | DYO780::DYSERT | Barry Dysert | Mon Jul 20 1987 11:05 | 5 |
| I really like your suggestion, Al (.5). Although I haven't yet
coded a test program, I presume that you won't incur any eventual
bucket splits or continual file growth. I'll discuss the various
ideas presented by everyone and let the customer decide what he
thinks is best. Thanks for everyone's input!
|
512.9 | TRIED GLOBAL SECTIONS ? | TROPPO::RICKARD | Doug Rickard - waterfall minder. | Sun Aug 02 1987 22:49 | 15 |
|
I had a similar problem one time but after several tries I finally
gave up on ISAM files. Instead I mapped a global section file which
was big enough to hold the window of data and used it as a circular
buffer. Because of the simultaneous access capabilities, other
processes could be accessing the same data at the same time as the
data acquisition program was putting it in. Every entry was time
stamped, and I wrote my own code to work through the window and put
sliding averages, etc. into external ISAM files. Worked a treat, and I
can highly recommend the particular approach. Otherwise, the hashed
approach mentioned earlier is a real neat way to go.
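If it helps, the mapped window boils down to something like this (shown with
POSIX mmap as a portable stand-in for the VMS global-section mapping; the names
and sizes are invented):

    /* Circular buffer laid over a mapped section file: the writer bumps
     * 'head' after filling a slot, readers scan the window between 'tail'
     * and 'head'.  mmap is used here only as a stand-in for mapping a
     * VMS global section.                                               */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>

    #define NSLOTS 120                       /* size of the data window  */

    struct slot { time_t stamp; char data[200]; };

    struct section {
        volatile unsigned long head;         /* next slot to write       */
        volatile unsigned long tail;         /* oldest slot still valid  */
        struct slot ring[NSLOTS];
    };

    struct section *map_section(const char *path)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, sizeof(struct section),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? NULL : (struct section *)p;
    }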
Doug Rickard.
|