'S' THE SEQUOIA CONNECTION 'S'
~'E'~ VOLUME 1 NUMBER 1 ~'E'~
'~'~Q~'~' NOVEMBER 1991 '~'~Q~'~'
'~'U'~' '~'U'~'
O The monthly newsletter on technologies, O
I people, and events of the SEQUOIA 2000 PROJECT I
***A*** ***A**
********* ********
********TERTIARY STORAGE SYSTEMS*****FILE SYSTEMS****DATA BASES********
***********REPOSITORY****NETWORKING****SCIENTIFIC VISUALIZATION**********
**************************GLOBAL*CHANGE*SCIENCE**************************
Sequoia 2000 is a large scale collaboration between global change
scientists and computer scientists throughout the University of
California system, with funding and participation provided by state and
federal government agencies, and industry. It is a research project in
interactive, multi-terabyte information systems, high speed networking,
and scientific visualization in support of global change science. Global
change research, in turn, serves as a test bed for the advanced
distributed information systems developed by the computer scientists.
@ Copyright Sequoia 2000 Project & Digital Equipment Corporation 1991.
-------------------------------------------------------------------------
Send subscription and information requests to RDVAX::Sequoia
or
[email protected]
-------------------------------------------------------------------------
Ira Machefsky - Publisher & Editor
Anita Scholte - Associate Editor & ASCII Horticulturist
***********************************
Table of Contents
This issue 623 lines
I. Editor's Column - The Genesis of Sequoia 2000
II. Executive Overview of Sequoia 2000
III. Sequoia 2000 Technical Report
IV. Sequoia 2000 Background Reader
V. Next month in Sequoia Connection...
VI. Electronic Order Form for Papers
**********************************
I. Editor's Column
INTRODUCTION
Welcome to the first issue of the "Sequoia Connection". With
this electronic newsletter we hope to keep the community of people
interested in Sequoia up to date on the latest discoveries,
innovations, technologies, and events of the project.
The tide of electronic information that besets us every day has
reached full flood, and no one is so well aware of this as your editor,
who has done his share to contribute to this inundation. It is with
this in mind that we hope to keep this newsletter always informative
and worth your reading time. Any suggestions you may have in this
regard are always welcome. We are considering shifting the format of
this newsletter from ASCII to Postscript, which would make it more
readable as a printed document but less readable (totally unreadable)
to most subscribers in their on-line mail systems. If you have any
feelings about this format change we would like to hear from you.
This issue of the newsletter is devoted to familiarizing you
with the project through a variety of background readings. Section II
is a four-page executive overview of the project, and Section III tells
you how to get a Postscript copy of the Sequoia Technical Report.
Section IV highlights the Sequoia Background Reader, a desert isle
selection of readings on the global change science and the computer and
information systems technologies involved in the project. If you want a
good grounding in Sequoia 2000 these are "must read" items. The
Technical Report was derived from the proposal which launched the
project, and the background reader brought the computer scientists and
global change scientists up to speed in their respective disciplines.
We'll offer more background papers in next month's issue.
THE GENESIS OF SEQUOIA 2000
With all the interest in Sequoia it is worth reflecting for a
moment on how the project came to be. About one year ago Digital's
External Research Program (ERP) was casting about for what we were then
calling our next "Flagship Project". Project Athena, Digital's largest
and one of its most successful external research projects, was coming
to an end. On the strength of the Athena experience, the ERP staff
decided that it was definitely worth having at least one project that
was much larger and more ambitious than any of the 200 or so other
university-based projects that were under way around the world at any
given time.
But what was that project to be? In the '80s the university
research world was ablaze with ideas of "3M" (Megabit, Megabyte,
Megapixel - referring to network, storage, and graphics capacity)
workstations that were, at decade's end, to issue in the client-server
architected RISC workstation products that would send the established
computer world into upheaval. ERP had no shortage of research proposals
from universities who, almost with a single voice, trumpeted these
technologies as winners for the '90s. Athena became the most celebrated
and successful of these projects. With the culmination of research
interest in these technologies at the beginning of the '90s, was there
any consensus among the research community about the next great
direction to pursue? What were the key research ideas that would lead
on to the great products for the turn of the century?
It was with a mind to answer these questions that Digital's
External Research Program issued a Request for Proposals (RFP) to
major universities around the world. The RFP guidelines asked
universities to write a short "think piece" that would delineate the
opportunities for research in strategic areas where major breakthroughs
might be possible and what each school's role in supporting such an
undertaking might be. In addition ERP was looking for proposals that
were interdisciplinary, even inter-university, and application driven,
seeking to avoid projects that were either too narrow or based on a
pure technology "push" without concern for ultimate application.
About 25 such "think pieces" were submitted as a response to
the RFP. From these 25, six "think pieces" were selected to be developed
into full-fledged proposals. A remarkably large number of the "think
pieces" dealt with research in the area of large-scale information
management. This was both an interesting and surprising result, since
ERP had previously seen no such consensus on important research areas.
A thorough review of the six finalists' proposals was conducted
by the Digital engineering and marketing community, culminating with a
command presentation by each university's PIs of their proposal in the
General Doriot Auditorium before a select group of Digital research,
engineering and marketing people, impaneled to advise ERP on a final
selection.
Sequoia 2000 was the project ultimately selected from this
year-long scrutiny. The rest will be history.
**********************************
II. Executive Overview of Sequoia 2000
This executive overview of the project is intended as a thumbnail
sketch of the basic research ideas and objectives of the program. It
serves as a good general introduction to the project or summary for
those who have no need to explore the details of the research program.
It is approximately four pages long.
SEQUOIA 2000
LARGE CAPACITY OBJECT SERVERS
TO SUPPORT GLOBAL CHANGE RESEARCH
July 31, 1991
Principal Investigators:
Michael Stonebraker
Computer Science Division
University of California
549 Evans Hall
Berkeley, CA 94720
(415) 642-5799
[email protected]
Jeff Dozier
University of California
Center for Remote Sensing and Environmental Optics
1140 Girvetz Hall
Santa Barbara, CA 93106
(805) 893-2309
[email protected]
Faculty Investigators
Michael Bailey, San Diego Supercomputer Center, San Diego
Tim Barnett, Scripps Institution of Oceanography, San Diego
Hans-Werner Braun, San Diego Supercomputer Center, San Diego
Michael Buckland, School of Library and Information Studies, Berkeley
Ralph Cicerone, Department of Geosciences, Irvine
Frank Davis, Center for Remote Sensing and Environmental Optics, Santa
Barbara
Domenico Ferrari, Computer Science Division, Berkeley
Catherine Gautier, Center for Remote Sensing and Environmental Optics,
Santa Barbara
Michael Ghil, Department of Atmospheric Sciences, Los Angeles
Randy Katz, Computer Science Division, Berkeley
Ray Larson, School of Library and Information Studies, Berkeley
C. Roberto Mechoso, Climate Dynamics Center, Los Angeles
David Neelin, Department of Atmospheric Sciences, Los Angeles
John Ousterhout, Computer Science Division, Berkeley
Joseph Pasquale, Computer Science Department, San Diego
David Patterson, Computer Science Division, Berkeley
George Polyzos, Computer Science Department, San Diego
John Roads, Scripps Institution of Oceanography, San Diego
Lawrence Rowe, Computer Science Division, Berkeley
Ray Smith, Center for Remote Sensing and Environmental Optics, Santa
Barbara
Richard Somerville, Scripps Institution of Oceanography, San Diego
Richard Turco, Institute of Geophysics and Planetary Physics, Los Angeles
COOPERATING ORGANIZATIONS
DEC Colorado Springs Research Laboratory
DEC San Francisco Research Laboratory
Exabyte Corp.
Hewlett-Packard Labs
National Center for Atmospheric Research
National Meteorological Center, NOAA
San Diego Supercomputer Center
State of California Air Resources Board
State of California Department of Water Resources
TRW
United States Geological Survey
University of California, Berkeley
University of California, Los Angeles
University of California, Office of the President
University of California, San Diego
University of California, Santa Barbara
University of Colorado
1. MOTIVATION FOR THE PROJECT
Among the most important challenges that will confront
the scientific and computing communities during the 1990s is
the development of models to predict the impact of Global
Change on the planet Earth. Specific issues include the
greenhouse effect, ozone depletion, scarcity of potable
water, deforestation, and the increasing toxicity of the
atmosphere.
One responsibility of Earth System Scientists is to
inform the development of public policy, particularly with
respect to costly remedies to control the impact of human
enterprise on the global environment. Clearly, human
activities accelerate natural rates of change. However, it
is difficult to predict the long-term effects of even well-
documented changes, because our understanding of variations
caused by nature is so poor. Therefore, it is imperative
that our predictive capabilities be improved.
Throughout the UC System are many leading scientists
who study various aspects of global change. Associated with
Sequoia 2000 are three of the stellar groups: the Center for
Remote Sensing and Environmental Optics (CRSEO) on the Santa
Barbara campus, the UCLA Climate Dynamics Center, and the
Climate Research Division (CRD) of Scripps Institution of
Oceanography at San Diego.
UC Global Change researchers have learned that serious
problems in the data systems available to them impede their
ability to access needed data and thereby do research
[CEES91]. In particular, five major shortcomings in current
data systems have been identified:
1) Current storage management system technology is inade-
quate to store and access the massive amounts of data
required.
Currently, researchers need access to datasets on the
order of one terabyte, and these datasets are growing
rapidly. Clearly, tertiary memory is a requirement. How-
ever, current system software, including file systems and
data base systems, offers no support for this type of
multi-level storage hierarchy. Moreover, current tertiary
memory devices (such as tape and optical disk) are exceed-
ingly slow, and innovative hardware and software are
required to mask these long access delays through sophisti-
cated caching, and increase effective transfer bandwidth by
compression techniques and parallel device utilization.
None of the necessary support is incorporated in currently
available commercial systems.
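
The caching idea in item 1 can be illustrated with a minimal sketch.
It assumes a hypothetical two-level arrangement in which a small, fast
cache sits in front of a slow tertiary store. The names `TertiaryStore`
and `CachedStore`, the capacity, and the least-recently-used eviction
policy are all assumptions made for illustration; the overview does not
specify a design.

```python
from collections import OrderedDict


class TertiaryStore:
    """Stand-in for a slow tape or optical-disk archive (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0  # counts slow accesses

    def read(self, key):
        self.reads += 1
        return self.blocks[key]


class CachedStore:
    """Keeps the most recently used blocks in a small, fast cache."""

    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        value = self.backing.read(key)  # slow path: go to tertiary storage
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value


archive = TertiaryStore({n: f"block-{n}" for n in range(10)})
store = CachedStore(archive, capacity=2)
for key in [0, 1, 0, 1, 2, 0]:
    store.read(key)
print(archive.reads)  # slow reads actually issued for six requests
```

In this toy run, repeated requests for blocks 0 and 1 are served from
the cache, so fewer slow reads reach the archive than requests were made.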
2) Current I/O and networking technologies do not support
the data transfer rates required for browsing and visualiza-
tion.
Examination of satellite data or output from models of
the Earth's processes requires that we visualize data sets
or model outputs in various ways. A particularly challeng-
ing technique is to fast-forward satellite data in either
the temporal or spatial dimension. The desired effect is
similar to that achieved by the TV weather forecasters who
show, in a 20-second animated summary, movement of a storm
based on a composite sequence of images collected from a
weather satellite over a 24-hour period. Time-lapse movies
of concentrations of atmospheric ozone over the Antarctic
``ozone hole'' show interesting spatial-temporal patterns.
Time-lapse movies and rapid display of two-dimensional sec-
tions through three-dimensional data place severe demands on
the whole I/O system to generate data at usable rates.
Additionally, severe networking problems arise when
investigators are geographically remote from the I/O server.
Not only is a high bandwidth link required that can deliver
20-30 images per second (i.e. up to 600 Mbits/sec), but also
the network must guarantee delivery of required data without
pauses that would degrade real-time viewing. Current com-
mercial networking technology cannot support such
``guaranteed delivery'' contracts.
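
The bandwidth figure above can be checked with a little arithmetic.
The sketch below assumes that the 600 Mbits/sec upper bound corresponds
to 30 uncompressed images per second and that one Mbit is 10^6 bits;
the overview does not state an image size, so the per-image figure is
only what those two numbers imply. The function names are illustrative.

```python
def implied_bits_per_frame(mbits_per_sec, frames_per_sec):
    """Bits in one frame if the link is fully used by uncompressed frames."""
    return mbits_per_sec * 1e6 / frames_per_sec


def required_mbits_per_sec(frames_per_sec, bits_per_frame):
    """Sustained link rate, in Mbit/s, needed for uncompressed frames."""
    return frames_per_sec * bits_per_frame / 1e6


bits = implied_bits_per_frame(600, 30)
megabytes_per_frame = bits / 8 / 1e6
rate_at_20_fps = required_mbits_per_sec(20, bits)
print(megabytes_per_frame, rate_at_20_fps)
```

Under these assumptions each image is about 2.5 Mbytes, and the low end
of the range (20 images per second) would need about 400 Mbits/sec.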
3) Current visualization software is too primitive to allow
Global Change researchers to render the data returned for
useful interactive viewing on a user workstation.
Global Change researchers would like, for example, to
roam through an AVIRIS ``image cube,'' displaying any three
of the 224 spectral bands in RGB color, while at the same
time displaying information, perhaps graphically, about all
224 bands. Today, each scientist must develop a substantial
amount of device-specific and dataset-specific display and
rendering code to perform such functions. Even after
development, such code faces substantial performance prob-
lems if there is not enough space to buffer the information
from an entire dataset locally.
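
As a rough illustration of "displaying any three of the 224 spectral
bands in RGB color", the sketch below picks three bands out of a toy
image cube. The cube layout (`cube[band][row][col]`), the synthetic
values, and the function names are assumptions for illustration only;
real AVIRIS data formats and display code are not described in this
overview.

```python
NUM_BANDS = 224  # spectral bands, as stated in the text


def make_toy_cube(rows, cols):
    """Build a tiny synthetic cube indexed as cube[band][row][col]."""
    return [
        [[band * 1000 + r * cols + c for c in range(cols)] for r in range(rows)]
        for band in range(NUM_BANDS)
    ]


def rgb_composite(cube, red, green, blue):
    """Return rows of (R, G, B) tuples drawn from three chosen bands."""
    for band in (red, green, blue):
        if not 0 <= band < len(cube):
            raise IndexError(f"band {band} out of range")
    rows, cols = len(cube[0]), len(cube[0][0])
    return [
        [(cube[red][r][c], cube[green][r][c], cube[blue][r][c]) for c in range(cols)]
        for r in range(rows)
    ]


cube = make_toy_cube(rows=2, cols=2)
image = rgb_composite(cube, red=30, green=20, blue=10)
print(image[0][0])
```

A real viewer would also need to scale the values for the display and
fetch only the three requested bands from storage, which is where the
buffering problem mentioned above arises.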
4) Current data base systems are inadequate to store the
diverse types of data required.
Earth System Scientists require access to the following
disparate kinds of data for their remote sensing applica-
tions:
Point Data for specific geographic points. In situ
snow measurements include depth and vertical profiles
of density, grain size, temperature, and composition,
measured at specific sites and times by researchers
traveling on skis.
Vector Data. Topographic maps are often organized as
polygons of constant elevation (i.e. a single datum
applying to a region enclosed by a polygon, which is
typically represented as a vector of points). Other
vector data include drainage basin boundaries, stream
channels, etc.
Raster Data. Many satellite and aircraft remote sens-
ing instruments produce a regular array of point meas-
urements. The array may be 3-dimensional if multiple
measurements are made at each location. This ``image
cube'' (2 spatial plus 1 spectral dimension) is
repeated every time the satellite completes an orbit.
The volumes are large; for example, a single frame from
the AVIRIS NASA aircraft instrument contains 140
Mbytes.
Text Data. Global Change researchers have large quan-
tities of textual data including computer programs,
descriptions of data sets, descriptions of results of
simulations, technical reports, etc. that need to be
organized for easy retrieval.
Current commercial relational data base systems (e.g.
DB2, RDB, ORACLE, and INGRES) are not good at managing
these kinds of data. During the last several years a
variety of next generation DBMSs have been built, including
IRIS [WILK90], ORION [KIM90], POSTGRES [STON90], and Star-
burst [HAAS90]. The more general of these systems appear to
be usable, at least to some extent, for point, vector, and
text data. However, none are adequate for the full range of
needed capabilities.
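
The four kinds of data listed under item 4 can be sketched as simple
record types. This is only an illustration of why one flat relational
schema fits them poorly; the class and field names are hypothetical.
The raster dimensions used at the end (512 lines by 614 samples, 224
bands, 2 bytes per sample) are an assumption based on commonly cited
AVIRIS scene parameters and are not given in this overview; they are
chosen because they land near the 140 Mbytes quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointDatum:
    """A measurement at one site and time, e.g. snow depth."""
    lat: float
    lon: float
    time: str
    values: dict


@dataclass
class VectorDatum:
    """One datum applying to a region enclosed by a polygon."""
    boundary: List[Tuple[float, float]]
    value: float


@dataclass
class RasterDatum:
    """A regular array of measurements; only the shape is modeled here."""
    rows: int
    cols: int
    bands: int
    bytes_per_sample: int

    def nbytes(self) -> int:
        return self.rows * self.cols * self.bands * self.bytes_per_sample


@dataclass
class TextDatum:
    """A document such as a technical report or data set description."""
    title: str
    body: str


# Assumed scene dimensions (see the note above); not from the overview.
frame = RasterDatum(rows=512, cols=614, bands=224, bytes_per_sample=2)
print(frame.nbytes())
```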
5) It is extremely difficult to share the objects noted
above with other interested researchers.
Most of the data objects that Earth System Scientists
wish to store are ones that they also wish to share with
other researchers. For example, the Santa Barbara group has
written a computer program that will analyze an image and
detect the outline of the snow cover in the image. Not only
do they need to store this program, but they also wish to
share it with other interested scientists around the coun-
try. In the same vein, they would like to share technical
reports, data sets and the output of simulation runs. Like-
wise, they require access to similar objects produced by
research groups in other places.
Effective sharing of these classes of data objects
requires an on-line, distributed repository that could cata-
log available objects, and then provide browsing support to
a scientist. We call this the electronic repository. It
consists of software capabilities for indexing and browsing
an object base built into and on top of a DBMS. Such
software is not currently available.
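
A toy version of the cataloging and browsing functions described above
might look like the following. It is an in-memory sketch under stated
assumptions: the `Repository` class, its methods, and the sample object
names are hypothetical, and a real electronic repository would be
distributed and built on top of a DBMS, as the text says.

```python
from collections import defaultdict


class Repository:
    """Toy in-memory catalog; a real one would sit on top of a DBMS."""

    def __init__(self):
        self.objects = {}
        self.index = defaultdict(set)  # keyword -> object names

    def register(self, name, kind, site, keywords):
        """Catalog an object along with the keywords that describe it."""
        self.objects[name] = {"kind": kind, "site": site}
        for word in keywords:
            self.index[word.lower()].add(name)

    def browse(self, *keywords):
        """Return names of objects tagged with every given keyword."""
        sets = [self.index.get(word.lower(), set()) for word in keywords]
        if not sets:
            return sorted(self.objects)
        return sorted(set.intersection(*sets))


repo = Repository()
# Hypothetical entries, loosely modeled on the examples in the text.
repo.register("snow_outline", "program", "Santa Barbara", ["snow", "image"])
repo.register("snow_report", "technical report", "Santa Barbara", ["snow", "report"])
repo.register("sim_output", "simulation output", "Los Angeles", ["climate", "model"])
print(repo.browse("snow"))
print(repo.browse("snow", "image"))
```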
2. OBJECTIVES OF SEQUOIA 2000
In summary, Global Change researchers require a massive
amount of information to be effectively organized in an
electronic repository. They also require ad-hoc collections
of information to be quickly accessed and transported to
their workstations for visualization. The hardware, file
system, DBMS, networking, and visualization solutions
currently available are totally inadequate to support the
needs of this community.
The problems faced by Global Change researchers are
faced by other users as well. Most of the Grand Challenge
problems share these characteristics [CPM91], i.e. they
require large amounts of data, accessed in diverse ways from
a remote site quickly, with an electronic repository to
enhance collaboration. Moreover, these issues are also
broadly applicable to the computing community at large.
Consider, for example, an automobile insurance application.
Such a company wishes to store police reports, diagrams of
each accident site and pictures of damaged autos. Such
image data types will cause existing data bases to expand by
factors of 1000 or more, and insurance data bases are likely
to be measured in Terabytes in the near future. Further-
more, the same networking and access problems will appear,
although the queries may be somewhat simpler. Lastly, visu-
alization of accident sites is likely to be similar in com-
plexity to visualization of satellite images.
The purpose of the Sequoia 2000 project is to build a
five-way partnership to work on these issues. The first
element of the partnership is a technical team, primarily
computer and information scientists, from several campuses
of the University of California. They will attack a
specific set of research issues surrounding the above prob-
lems as well as build prototype information systems.
The second element of the partnership is a collection
of Global Change researchers, primarily from the Santa Bar-
bara, Los Angeles, and San Diego campuses, whose investiga-
tions have substantial data storage and access requirements.
These researchers will serve as users of the prototype sys-
tems and will provide feedback and guidance to the technical
team.
The third element of the partnership is a collection of
public agencies who must implement policies affected by
Global Change. We have chosen to include the California
Department of Water Resources (DWR), the California Air
Resources Board (ARB) and the United States Geological Sur-
vey (USGS). These agencies are end users of the Global
Change data and research being investigated. They are also
interested in the technology for use in their own research.
The fourth element of the partnership is DEC, which
will provide extensive hardware support and key research
participants for the project.
Lastly, the fifth element of the partnership is a col-
lection of other industrial participants, who can serve as a
sounding board for our ideas and participate in technology
transfer. Exabyte, Hewlett-Packard, and TRW are the initial
members of this group, and we are actively soliciting addi-
tional participants.
We call this proposal Sequoia 2000, after the long-
lived trees of the Sierra Nevada. Successful research on
Global Change will allow humans to better adapt to a chang-
ing Earth, and the 2000 designator shows that the project is
working on the critical issues facing the planet Earth as we
enter the next century.
REFERENCES
[CEES91] Committee on Earth and Environmental Sciences, Our
Changing Planet: The FY 1992 U.S. Global Change
Research Program, Office of Science and Technology
Policy, Washington, D.C. (1991).
[CPM91] Committee on Physical, Mathematical and Engineer-
ing Sciences, Grand Challenges: High Performance
Computing and Communications, Office of Science
and Technology Policy, Washington, D.C. (1991).
[HAAS90] Haas, L. et al., ``Starburst Mid-Flight: As the
Dust Clears,'' IEEE Transactions on Knowledge and
Data Engineering (1990).
[KIM90] Kim, W. et al., ``Architecture of the ORION Next-
Generation Database System,'' IEEE Transactions on
Knowledge and Data Engineering (March 1990).
[STON90] Stonebraker, M. et al., ``The Implementation of
POSTGRES,'' IEEE Transactions on Knowledge and
Data Engineering (March 1990).
[WILK90] Wilkinson, K. et al., ``The IRIS Architecture and
Implementation,'' IEEE Transactions on Knowledge
and Data Engineering (March 1990).
***********************************
III. Sequoia 2000 Technical Report
The Sequoia 2000 Technical Report is a detailed description and
justification of the research agenda for the Sequoia project. It is
derived from the Sequoia 2000 proposal submitted to Digital's External
Research Program and is co-authored by project PIs Mike Stonebraker and
Jeff Dozier.
It is available on-line in Postscript format by "replying" to
this message, or sending mail to RDVAX::Sequoia, or sending internet
mail to [email protected] with a request for the Sequoia
Technical Report. An abstract follows:
Technical Report #91/1
"Large Capacity Object Servers to Support Global Change Research"
by Michael Stonebraker and Jeff Dozier (July 1991).
ABSTRACT:
Improved data management is crucial to the success of current
scientific investigations of Global Change. New modes of research,
especially the synergistic interactions between observations and
model-based simulations, will require massive amounts of diverse
data to be stored, organized, accessed, distributed, visualized, and
analyzed. Achieving the goals of the U.S. Global Change Research
Program will largely depend on more advanced data management systems
that will allow scientists to manipulate large-scale data sets and
climate system models.
Refinements in computing - specifically involving storage,
networking, distributed file systems, extensible distributed data
base management, and visualization - can be applied to a range of
Global Change applications through a series of specific
investigation scenarios. Computer scientists and environmental
researchers at several U.C. campuses will collaborate to address
these challenges. This project complements both NASA's EOS project
and UCAR's (University Corporation for Atmospheric Research) Climate
Systems Modeling Program in addressing the gigantic data
requirements of Earth System Science research before the turn of the
century. Therefore, we have named it SEQUOIA 2000, after the giant
trees of the Sierra Nevada, the largest organisms on the Earth's
land surface.
***********************************
IV. Sequoia 2000 Background Reader
The project scientists were confronted with the dilemma of
developing a shared, common background of knowledge on global change
science, and information and computer science relevant to the project.
The Sequoia 2000 Background Reader was created to begin this process.
It consists of basic articles representing each discipline and is highly
recommended background reading for the project.
The Sequoia Background Reader is available only in hard copy.
Send requests for it to RDVAX::Sequoia or internet requests to
[email protected]. A bibliographic summary of the contents of
the Background Reader follows.
Sequoia 2000 Background Reader
"Planning for the EOS Data and Information System (EOSDIS)"
J. Dozier & H.K. Ramapriyan. Global Environmental Change, R.W.
Corell and P.A. Anderson (eds.). NATO ASI Series vol. I1,
Springer-Verlag, Berlin, 1991.
"The Global Change Computing Initiative"
Gary Boyles. DEC Publication, Colorado, 1991.
"Interdecadal Oscillations and the Warming Trend in Global
Temperature Time Series"
M. Ghil and R. Vautard. Nature, vol. 350, no. 6316, pp.324-
327, March 1991.
"Computer Simulation of the Greenhouse Effect"
Warren M. Washington and Thomas W. Bettge. Computers in
Physics, May/June 1990.
"The Recent Climate Record: What it Can and Cannot Tell Us"
Thomas R. Karl, J. Dan Tarpley, Robert G. Quayle, Henry F.
Diaz, David A. Robinson, and Raymond S. Bradley. Reviews of
Geophysics, vol. 27, no.3, pp.405-430, August 1989.
"Future Trends in Database Systems"
Michael Stonebraker. IEEE Transactions on Knowledge and Data
Engineering, vol.1, no.1, pp. 33-44, March 1989.
"National and International Implications of the Linked Systems
Protocol for Online Bibliographic Systems"
Michael K. Buckland and Clifford A. Lynch. Cataloging &
Classification Quarterly, vol.8, no.3, pp.15-33, Spring 1988.
"Data Storage in 2000 - Trends in Data Storage Technologies"
M.H. Kryder. IEEE Transactions on Magnetics, vol. 25, no.6,
pp. 4358-4363, November 1989.
"A Scheme for Real Time Channel Establishment in Wide-Area Networks"
Domenico Ferrari and Dinesh C. Verma. IEEE Journal on Selected
Areas in Communications, vol.8, no.3, pp.368-379, April 1990.
"The Cost of Messages"
Jim Gray. Proceedings of Principles of Distributed Systems,
Toronto, Canada, ACM Press, 1989.
*****
***********************************
V. The December Issue...
Sequoia video and multimedia research...Sequoia networking
research...Sequoia at EDUCOM (UCSD Prof. Joe Pasquale's research
featured)...two new tech reports by Prof. Randy Katz, UCB...More
Sequoia background readings...Who will DEC hire to fill its three
Sequoia research slots?...and more...
***********************************
VI. Sequoia 2000 Electronic Order Form
NOTE: IF YOU WOULD LIKE A COPY OF ANY OF THE PAPERS MENTIONED TO DATE
IN "SEQUOIA 2000", PLEASE SUBMIT THE FOLLOWING ELECTRONIC ORDER FORM TO
RDVAX::SEQUOIA OR FOR INTERNET MAIL, [email protected].
FOR HARD COPY PAPERS PLEASE INCLUDE YOUR NAME, ADDRESS, MAIL STOP, AND
TELEPHONE NUMBER. ON-LINE PAPERS WILL BE SENT WEEKLY, HARD COPY MONTHLY.
( ) BKGD.0001 Sequoia 2000 Background Reader - Hard Copy Only
( ) BKGD.0002 Sequoia 2000 Technical Report - Online Postscript
************************THE END************************