T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
410.1 | KDD Nuggets 96:34 | IJSAPL::OLTHOF | Spellchecked Henry Although | Fri Nov 08 1996 08:12 | 820 |
410.2 | count me in | FOUNDR::BARNETT_T | | Tue Nov 12 1996 01:01 | 5 |
410.3 | Read note 406 and enroll on the WEB site | UTROP1::dhcppc.uto.dec.com::olthof_h | Spellchecked Henry Although | Tue Nov 12 1996 08:04 | 9 |
410.4 | KDD Nuggets 96:35 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sun Nov 17 1996 12:36 | 886 |
410.5 | KDD Nuggets 96:36 | IJSAPL::OLTHOF | Spellchecked Henry Although | Fri Nov 22 1996 08:40 | 862 |
410.6 | KDD Nuggets 96:37 | IJSAPL::OLTHOF | Spellchecked Henry Although | Wed Nov 27 1996 11:10 | 929 |
410.7 | KDD Nuggets 96:38 | IJSAPL::OLTHOF | Spellchecked Henry Although | Tue Dec 10 1996 07:11 | 821 |
410.8 | KDD Nuggets 96:39 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sat Dec 14 1996 07:44 | 1078 |
410.9 | KDD Nuggets 96:40 | IJSAPL::OLTHOF | Spellchecked Henry Although | Fri Dec 20 1996 09:07 | 770 |
410.10 | KDD Nuggets 97:01 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sun Jan 05 1997 19:16 | 827 |
410.11 | 97:02 | IJSAPL::OLTHOF | Spellchecked Henry Although | Fri Jan 10 1997 10:21 | 655 |
410.12 | 97:03 | IJSAPL::OLTHOF | Spellchecked Henry Although | Mon Jan 20 1997 14:17 | 561 |
410.13 | 97:04 | IJSAPL::OLTHOF | Spellchecked Henry Although | Mon Feb 03 1997 11:47 | 1444 |
| Knowledge Discovery Nuggets 97:04, e-mailed 97-01-28
News:
* GPS, Information Week on Debunking Data-Mining Myths
http://www.techweb.com/se/directlink.cgi?IWK19970120S0042
* N. Uffenheimer, EDS in the data warehouse, datamining, DSS areas
Publications:
* J. P. Brown, Data Mining: What Needs To Be Done, And Why.
http://www.hal-pc.org/~jpbrown
* F. Famili, Intelligent Data Analysis Journal - First Issue is live,
http://www.elsevier.com/locate/ida
Siftware:
* B. Li, Parallel C4.5,
http://merv.cs.nyu.edu:8001/~binli/pc4.5/
Positions:
* E. Babb, Jobs in data mining in London,
http://www.parsys.com/dafs.htm
* D. Berleant, Tenure Track, Teaching and Research at U. of Arkansas
Meetings:
* D. Stodder, Data Mining Summit program,
http://www.dbsummit.com
--
KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge
Discovery in Databases (KDD) community, focusing on the latest research and
applications.
Submissions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, send a message to [email protected] with
subscribe kdd-nuggets
in the first line (the rest of the message and subject are ignored).
See http://info.gte.com/~kdd/subscribe.html for details.
Nuggets frequency is approximately 3 times a month.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Variations on the old chestnut on how to use programming languages
to shoot yourself in the foot ...
HTML: You shoot yourself in the foot, but the bullet takes 10 minutes
to get there.
VRML: You have to fight your way through 3 levels of DOOM before you
can shoot yourself in the foot with a blaster cannon.
JAVA: You shoot yourself and everyone else on the internet in the foot.
JAVASCRIPT: You shoot yourself and everyone else on the internet in
the foot with rubber bullets.
PERL: You try to shoot yourself in the foot, but can't figure out
the instructions that came with the gun.
TCL: You shoot yourself in the foot with a cap gun.
Thanks to L. Brothers
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 27 Jan 1997 17:11:18 -0500
From: [email protected] (Gregory Piatetsky-Shapiro)
Subject: Information week on Debunking Data-Mining Myths -
see
http://www.techweb.com/se/directlink.cgi?IWK19970120S0042
for full text
January 20, 1997, Issue: 614
Section: InformationWeek Labs
Debunking Data-Mining Myths --
Don't let contradictory claims about
data mining keep you from improving
your business
By Robert D. Small
A great deal of what is said about data mining is
incomplete, exaggerated, or wrong. Data mining has
taken the business world by storm, but as with many
new technologies, there seems to be a direct
relationship between its potential benefits and the
quantity of often-contradictory claims, or myths,
about its capabilities and weaknesses. It's difficult to
fight these myths, which are based on
misunderstandings, hopes, and fears. The new
technology cycle typically goes like this: Enthusiasm
for an innovation leads to spectacular assertions.
Ignorant of the technology's true capabilities, users
jump in without adequate preparation or training.
Then, sobering reality sets in. Finally, frustrated and
unhappy, users complain about the new technology
and urge a return to "business as usual." When you
undertake a data-mining project, avoid a cycle of
unrealistic expectations followed by disappointment.
Understand the facts instead, and your data-mining
efforts will be successful. Simply put, data mining
is used to discover patterns and relationships in your
data in order to help you make better business
decisions.
Myth: Data mining produces surprising results that
will utterly transform your business.
Fact: Most often, the results of data mining yield
steady improvement to an already successful
organization, often contributing important incremental
changes rather than revolutionary ones.
Nevertheless, data mining can lead to significant
change in several ways. First, it may give the talented
business manager a small advantage each year, on
each project, with each customer. Compounded over
a period of time, these small advantages turn into a
large competitive edge. For example, a catalog retailer
that can better target its mailing list can increase
profits by reducing the cost of mailings while
increasing the number of orders. Over time, this can
result in a substantially more profitable business.
Second, data mining occasionally does uncover one
of those rare "breakthrough" facts, such as scientists'
noticing the association between the fatal Reye's
syndrome and children taking aspirin.
In short, data mining is a powerful search tool for
forward-looking companies.
Myth: Data-mining techniques are so sophisticated
that they can substitute for domain knowledge or for
experience in analysis and model building.
Fact: No analysis technique can replace experience
and knowledge of the business and its markets. On
the contrary, data mining makes education and
experience in many areas more important than
ever. While experts may need to learn new analytical
techniques to stay current and make leading-edge
contributions, someone who's an expert only in
analytical techniques, without having knowledge of
the business, is of no help.
Experience in building models, however, can ensure
more profitable use of data mining, since data
mining is simply the newest tool for building models.
The less domain knowledge a data mining expert
brings to a problem, the more important it is to
perform the data mining in close cooperation with
people who understand the business.
Similarly, the less skill and experience that business
experts have in modeling and using the associated
tools, the more help they need from data-mining
experts in leveraging their business knowledge.
For example, financial analysts seeking to increase the
return on their clients' investments may ask an expert
data miner to analyze a large, complex database on
previous clients. The data miner may discover that
certain variables predict success in investing, but it
takes a financier to know whether it's legal to influence
those variables.
Myth: Data-mining tools automatically find the
patterns you're looking for, without being told what to
do.
Fact: Data mining is most cost-effective when used
to solve a particular problem. Although a data-mining
tool can indeed explore your data and uncover
relationships, it still needs to be directed toward a
specific goal. Simply giving a data-mining tool a
mailing list and expecting it to find customer profiles
that improve the efficiency of a direct-mail campaign
is not particularly effective. You need to be more
specific in your goals. For example, to improve the
value of mailing-list responses, your model might
emphasize customers who have previously bought
expensive items; to increase the number of
responses, your model might emphasize customers
who have responded to previous mailings.
Myth: Data mining is useful only in certain areas, such
as marketing, sales, and fraud detection.
Fact: Virtually any process from pharmacology to
customer service can be studied, understood, and
improved using data mining. These techniques are
being applied to such diverse applications as
manufacturing process control, human resources, and
food-service management.
Data mining is useful wherever data can be collected.
Of course, in some instances, cost/benefit
calculations might show that the time and effort of the
analysis is not worth the likely return. For example,
suppose you suspect that if you collect just one more
piece of information about your customers, you could
double the number of orders you received. But you
also know that mailing to twice as many people will
also double the number of orders. If gathering the
data is more expensive than sending the extra
mailings, then it makes sense to increase the mailings
rather than mine the data.
Myth: The methods used in data mining are
fundamentally different from the older quantitative
model-building techniques.
Fact: All methods now used in data mining are natural
extensions and generalizations of analytical methods
known for decades. Neural nets, a special case of
projection pursuit regression, were developed in the
1940s. CART (classification and regression trees)
methods were used by social scientists in the 1960s.
K-nearest neighbor, a form of density estimation, has
been used for a half-century.
All these methods, just like regression
techniques, model relationships between a set of
profile variables and an outcome.
What's new in data mining is that we're now applying
these techniques to more general business problems,
thanks to the increased availability of data and
inexpensive processing power.
Furthermore, because communication between the
business community and methodologists, who are
mainly academics, has often been poor, there was,
until recently, no user-friendly software for
implementing these methods. The recent interest in
data mining is in part due to the improved user
interfaces that make these techniques more available
to business experts.
The rise of these powerful methods is a great step
forward, but the old tools are still valuable. Varieties
of regression techniques, discriminant analysis, and
even simple graphs can help reveal hidden patterns.
No single method solves all or even a majority of
problems. Successful data mining requires a portfolio
of tools, both old and new.
Myth: Data mining is an extremely complex process.
Fact: The algorithms of data mining may be complex,
but new tools have made those algorithms easier to
apply. Often, just the correct application of relatively
simple analyses, graphs, and tables can reveal a great
deal about our business. Much of the difficulty in
applying data mining comes from the same
data-organization issues that arise when using any
modeling techniques. These include data preparation
tasks (such as deciding which variables to include and
how to encode them) and deciding how to interpret
and take advantage of the results.
Myth: Only massive databases are worth mining.
Fact: It's true that many methods used in data mining
were specifically developed for analyzing very large
data sets, and that many data-mining applications
involve massive data sets. But a moderately sized or
small data set can also yield valuable information. For
example, buying patterns may depend most strongly
on the day of the week or the time of the year. A
modest database consisting of only "day" and "sales"
could show this pattern, give the retailer some idea of
its magnitude, and allow for planning of inventory and
staffing.
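
As an illustration (not from the article), the "day" and "sales"
analysis described above can be sketched in a few lines of Python. The
records and the helper name mean_sales_by_day are hypothetical and the
figures are invented; the point is only that a simple group-and-average
is enough to expose a weekday pattern.

```python
from collections import defaultdict

# Hypothetical (day, sales) records; the figures are invented for illustration.
records = [
    ("Mon", 120.0), ("Tue", 95.0), ("Sat", 310.0),
    ("Mon", 130.0), ("Tue", 105.0), ("Sat", 290.0),
]

def mean_sales_by_day(rows):
    """Return {day: average sales} for a list of (day, sales) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, sales in rows:
        totals[day] += sales
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}

averages = mean_sales_by_day(records)
# The day with the highest average is a candidate for extra stock and staff.
busiest_day = max(averages, key=averages.get)
print(averages, busiest_day)
```

With real data the same grouping could be repeated by month or season to
estimate the size of the effect before committing to a larger database.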
Even when building a massive database, try out some
simple analysis on the data while the database is still
moderate in size. You may decide to collect the data
differently or to collect different data altogether.
Myth: Data mining is more effective with more data,
so all existing data should be brought into any
data-mining effort.
Fact: More data items are useful only if they
contribute more information about the issues at hand,
or goals. Otherwise, they can be worse than
worthless. A database may have a great deal of
information about an item (or about the relationship
between items) but nothing about other items that are
actually closely related. For example, a company may
have information about how customers use one credit
card, but nothing about how those customers use
their other credit cards.
However, adding data with little information content
can actually lower the predictive power of the
database. By including irrelevant data or adding
multiple measurements of the same item, the utility of
the data-mining results will be reduced. For example,
if you include age as well as birth date, the analysis
tool will discover that both factors are equally relevant
and will therefore assign a lower weight to both
measures as predictors.
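
A minimal sketch of the age and birth-date effect, under stated
assumptions: the data are invented, the outcome is exactly twice the age,
and the model is a plain least-squares fit by gradient descent starting
from zero weights (the helper fit_least_squares is hypothetical, not any
particular tool's method). With one copy of the variable the fit gives
it the full weight; with two identical copies the same total is split
between them. Other fitting procedures may divide the weight differently,
since a duplicated predictor leaves the split undetermined.

```python
def fit_least_squares(rows, targets, steps=500, lr=0.01):
    """Fit linear weights (no intercept) by gradient descent from zero."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(steps):
        grads = [0.0] * n_features
        for row, y in zip(rows, targets):
            error = sum(w * x for w, x in zip(weights, row)) - y
            for j, x in enumerate(row):
                grads[j] += 2.0 * error * x / len(rows)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

# Hypothetical data: age in decades, and an outcome that is exactly 2 * age.
ages = [2.0, 3.5, 4.1, 5.0, 6.2]
outcome = [2.0 * a for a in ages]

# Model A uses age only; model B adds a second, identical copy of age
# (standing in for an age derived from the birth date).
w_single = fit_least_squares([[a] for a in ages], outcome)
w_double = fit_least_squares([[a, a] for a in ages], outcome)
print(w_single, w_double)
```

In this sketch the single weight converges to about 2.0, while the two
duplicated predictors each end up near 1.0.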
Myth: Building a data-mining model on a sample of a
database is ineffective, because sampling loses the
information in the unused data.
Fact: The thrust of almost all developments in the
study of sampling is to maximize the amount of
information gained per unit of effort expended.
Keep in mind that your data probably already
represents a sample of a larger population. When you
analyze your customer database to help acquire new
customers, you're basing your model on a sample of
the total population.
Under some circumstances, you may be forced to
sample. Not all your data may be relevant to the
problem at hand or reflect the population you're trying
to model. Many data warehouses include historical
data that reflects conditions (such as unexpired
patents) that no longer apply, rendering it
inappropriate for building a model to guide future
decisions.
Sometimes full-scale data-gathering is not practical.
For example, if you'd like to learn about customers'
satisfaction with your new product or service, but it
takes an hour to administer a customer satisfaction
survey, you'll most likely decide to limit your analysis
to a sample.
In fact, a relatively small random probability sample,
correctly taken, can yield excellent results. Although
there are 60 million or more voters in a presidential
race, the final poll before the election, which is based
on two-thousandths of 1% of those voters, is seldom
off by more than 2%. If we had a database of all 60
million voters and hundreds of measurements on each
one, we couldn't build a better model for predicting
the winner.
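
The polling arithmetic can be checked with a short calculation. This is
a sketch under textbook assumptions that the article does not state: a
simple random sample, a proportion near 50%, and the usual normal
approximation with z = 1.96 for 95% confidence. The function names are
illustrative.

```python
import math

def sample_size(population, fraction):
    """Number of respondents when `fraction` of `population` is polled."""
    return round(population * fraction)

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random
    sample of size n; z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(p * (1.0 - p) / n)

voters = 60_000_000
fraction = 0.002 / 100          # two-thousandths of 1%
n = sample_size(voters, fraction)
moe = margin_of_error(n)
print(n, round(moe * 100, 1))
```

Under these assumptions the sample is about 1,200 voters and the margin
of error is a little under 3 percentage points, the same order as the
accuracy quoted above. The formula depends on the sample size, not on the
60 million, which is why the small sample does so well.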
Even when it's possible to build the model on the
entire database, you may choose not to. It's often a
better use of resources to build and evaluate many
models using samples of the data, rather than rely on a
single model using all the data.
Myth: Data mining is another fad that will soon fade,
allowing us to return to standard business practice.
Fact: Although the name may change, data mining as
a vital application will not go away. Companies have
been using related quantitative techniques in many
parts of their businesses for a long time. Data mining
is just one more advance in a research process that
has been ongoing since the beginning of the 20th
century. A recent increase in the power of computers,
coupled with cheap electronic methods for capturing
large amounts of data, brings us to this step now.
Data mining can't be ignored: the data is there, the
methods are numerous, and the advantages that
knowledge discovery brings to a business are
tremendous. Companies whose data-mining efforts
are guided by "mythology" will find themselves at a
serious competitive disadvantage to those
organizations taking a measured, rational approach
based on facts.
Robert D. Small is VP of Research of Two Crows
Corp. in Potomac, Md. He can be reached at
[email protected].
SIDEBAR: Six Steps For Successful Data Mining
- Identify the goal
- Assemble the relevant data
- Choose your analysis methods
- Decide which software tool is best for implementing
the method
- Run the analysis
- Decide how to implement the results
Data: Two Crows Corp.
Copyright © 1997 CMP Media Inc.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Thu, 16 Jan 1997 22:15:57 -0800
Subject: EDS inroads into the data warehouse, datamining, DSS areas
EDS, the largest computer service provider in the world, has established
a focused consulting practice in the area of data warehousing, data
mining and decision support systems. EDS built a world-class integration
lab (in the domain of the insurance industry) to demonstrate
applications, test tools, integrate solution components and build proofs
of concept. For a free white paper and additional information, please
contact Nathan Uffenheimer at (972)604-8915.
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "jpbrown" <[email protected]>
Organization: Ultimate Resources
Date: Thu, 16 Jan 1997 13:57:24 -0006
Subject: What Needs To Be Done, And Why.
Descriptive Introduction: The Databases that are the core of
Data Warehousing are not just repositories. Together, they
form an interactive machine that makes it possible to learn
much more about the constituent population or populations.
This expands on: http://www.hal-pc.org/~jpbrown
Text: Most data collections are hybrid in one way or another.
I have spent several years studying many actual cases. Over and
over again, I ran into the apples and oranges problem, where
there are sub-populations that are very different, one from
another. I do not need to tell you how confusing the results
of analysis can be, if these situations are ignored.
I have continued to devise ways to detect the anomalies of the
hybrid database, always assuming that some aspects of this problem
may be present, or may develop with the passage of time. If they
do develop as time goes on, there needs to be a method for
detecting the onset of Change. I have developed, and expect
to continue to develop, new methods to make effective, reliable
analyses in cases where hybrid sub-populations are recognized.
In using these techniques you can:
* take an unfamiliar population and diagnose potential problems.
* identify the causes of the problems.
* apply different methods that will measure the analyzability of
naturally occurring hybrid populations.
* suggest ways to increase the utility of data, or to point out
that some types of data are incurably unhelpful.
* use different techniques (Autoclassification) to separate out
sub-populations, based on predictability or other sources
of coherence.
* make reliable predictions.
* detect and remedy Changes in causal systems that would
otherwise reduce reliability.
So far, the great strides that have been taken in Databases, Data
Marts and Data Warehouses, have been advances in Data Manipulation.
The next great strides will be taken in SuperInduction, and they
will be applied before, during, and after the various steps of
manipulation.
The resulting Output:
* will be based, without prejudice (objectively), on the Input.
* will also have had the benefit of many kinds of new knowledge,
developed during the analytical process.
* and will be ideally presented to produce the best possible
results for the corporate user (Decision Support).
If you have gone through the Web Site http://www.hal-pc.org/~jpbrown
and you want to see some of the extra complex links, let me know at
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 17 Jan 1997 08:46:53 -0500
From: [email protected] (Fazel Famili)
Subject: Intelligent Data Analysis Journal - First Issue is live
Intelligent Data Analysis - An International Journal (New)
An electronic, Web-based journal
Published by Elsevier Science
URL: http://www.elsevier.com/locate/ida
http://www.elsevier.nl/locate/ida
The first issue of the Intelligent Data Analysis journal is now live. This is
a quarterly journal published by Elsevier Science Inc. The journal is
planning to offer a number of new features that are not currently available
in paper journals: (i) an alerting service notifying subscribers of new
papers in the journal, (ii) links to large-scale data collections, (iii)
links to secondary collections of data related to material presented in the
journal, (iv) the ability to test new search mechanisms on the collection
of journal articles, (v) links to related bibliographic material, and (vi)
inclusion of 3-D objects and multiple color graphs.
Please refer to one of the above sites, which contain the articles of the
first issue and the journal home page (e.g. Aims and Scope, Author
Submission Guidelines, and more).
Best wishes,
A. Famili
Editor-in-Chief
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 23 Jan 1997 10:02:10 -0500
From: [email protected] (Bin Li)
Subject: new siftware entry for PC4.5
Could you add an entry in the Siftware page for our parallel
C4.5 classification tool? Thanks,
_______
Bin Li
----------------------------------------------------------------------------
Siftware: Parallel C4.5 (PC4.5)
*URL: http://merv.cs.nyu.edu:8001/~binli/pc4.5/
*Description: If you have C4.5 and a network of workstations that are
accessible to you, PC4.5 will help you better use C4.5. PC4.5 offers you
these advantages:
1. It is faster. In an N-trial C4.5 run, a single process builds N
classification trees one by one and then picks the best one. In
PC4.5, the N trials are each handled by a process and each process
is run on a different machine (if N or more machines are available).
2. It is fault-tolerant. PC4.5 automatically assigns a process to
a machine if the machine is idle (i.e. no activity by the machine's
owner). If the owner of a machine comes back or it fails during
a PC4.5 computation, the PC4.5 process automatically retreats and
resumes on a different machine that is idle.
3. It supports multiple platforms. PC4.5 runs on SunOS, Solaris and
Linux machines (for HPUX, IRIX, and ALPHA, please contact author).
Networked multi-platform workstations can run PC4.5 processes of a
single PC4.5 program at the same time.
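
The "N trials, keep the best" pattern from item 1 can be sketched as
follows. This is only a local analogy in Python, not PC4.5 or PLinda
code: run_trial is a hypothetical stand-in that returns a pseudo-random
error rate instead of building a tree, and a thread pool on one machine
replaces the network of workstations. It shows the structure (independent
trials farmed out to workers, results reduced to the best one), not the
speed-up or the fault tolerance.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_trial(seed):
    """Hypothetical stand-in for one C4.5 trial: returns (error_rate, seed)."""
    rng = random.Random(seed)      # deterministic per seed
    return (rng.random(), seed)    # a real trial would build and score a tree

def best_of_trials(n_trials, max_workers=4):
    """Run the trials concurrently and keep the one with the lowest error."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_trial, range(n_trials)))
    return min(results)

best_error, best_seed = best_of_trials(10)
print(best_error, best_seed)
```

Because each trial is independent, the result does not depend on which
worker ran which trial; that independence is what makes the trials easy
to spread over machines.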
PC4.5 is built with the Persistent Linda (PLinda) system, a software system
for robust distributed parallel computing developed at New York University.
To get more information on PLinda, please visit our web site at
http://merv.cs.nyu.edu:8001/~binli/plinda/ or send email to
[email protected].
Both PC4.5 and PLinda are research efforts led by professor Dennis Shasha.
Important: You must have the original C4.5 package in order to use PC4.5.
To get C4.5, please contact Dr. J. R. Quinlan ([email protected]).
*Discovery tasks: Classification
*Platform(s): Unix (SunOS, Solaris, Linux; please contact author for HPUX,
IRIX, and ALPHA)
*Contact: Bin Li
715 Broadway, Rm 715
New York, NY 10003
(212) 998-3485
email: [email protected] (preferred)
*Status: Public Domain (source code)
*Source of information: ftp://cs.nyu.edu/pub/plinda/pc4.5.tar.gz
*Updated: 1997-01-22 by Bin Li, [email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 17 Jan 1997 12:04:57 +0000
From: "Ed Babb" <[email protected]>
Subject: kdd- job in data mining
OPPORTUNITY IN DATA MINING!
PARSYS is a leading European supplier of parallel systems and technology. They
are currently the lead partner in a large multinational ESPRIT project aimed at
building a parallel data mining file server. Consequently, they are looking for
people interested in data mining systems and with experience of parallel
computers, database technology and machine learning.
The positions involve adapting learning techniques such as rule induction,
neural networks, genetic algorithms to run on a parallel computer. Also helping
to adapt an existing database system to run on a parallel machine. Enthusiasm
for producing fast algorithms in C is essential.
At least a 2.1 degree in Computing, Artificial Intelligence or equivalent is
needed. In addition, several years' relevant experience is desirable. Salary
will depend on age and experience.
Please post your CV stating current salary to: Ed Babb, PARSYS LTD, Boundary
House, Boston Road, Hanwell, London, W7 2QE, UK. Alternatively, email him at
[email protected] if you wish to make any brief informal enquiries.
Please see http://www.parsys.com/dafs.htm for summary of the DAFS project.
*********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected] (BERLEANT DANIEL J)
Date: Tue, 21 Jan 1997 08:20:37 -0600
Subject: POSITION: Tenure Track, Teaching and Research
This is an informal request for inquiries from people interested in
the tenure track position offered by our dept. starting next
September. Feel free to spread the word.
If you are interested in teaching two software related courses per
semester (typically one undergrad, one grad) and in doing research in
empirical NLP, text processing, information retrieval from full text,
data/knowledge mining from full text, etc., AND you have or are getting
a Ph.D. and a formal qualification in engineering (Bachelor's, Master's,
or Ph.D. degree with the word "engineering" in it or issued by a
dept., college, campus, or university with the word "engineering" in
its name, etc.), please email me to discuss applying.
If you don't think you have an engineering degree, check - maybe
you'll be surprised.
I am very interested in promoting applications from people in the
above mentioned areas and look forward to responding forthrightly to
your inquiry.
Best Regards,
Daniel Berleant
Dept. of Computer Systems Engineering
University of Arkansas, Fayetteville
Phone: (501) 575-5590
Fax: (501) 575-5339
Email: [email protected]
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 20 Jan 97 12:54:56 PST
From: "Dave Stodder" <[email protected]>
To: [email protected]
Subject: Data Mining Summit program
As you know, the 1997 Data Mining Summit is coming up Feb. 18-21 in
San Francisco. The conference is sponsored by Miller Freeman Inc.'s
Database Programming & Design and DBMS magazines.
We have a great lineup of speakers: Usama Fayyad, Evangelos
Simoudis, Kamran Parsaye, Larry Kershberg, Bob Vere, Gene Feruzza,
and others, including case studies. The complete program is located
at www.dbsummit.com.
I am attaching files of the complete program, if it would be
possible to include it with KDD Nuggets.
Thanks very much,
David
David Stodder
Conference Chair, Data Mining Summit
Editor-in-Chief, Database Programming & Design
411 Borel Ave., Suite 100
San Mateo, CA 94402
(415) 655-4290, Fax (415) 655-4350
Internet: [email protected]
Tuesday, February 18
Data Mining and the Internet:
New Dimensions in Knowledge Discovery
Chaired by David Stodder
Editor-in-Chief
Database Programming & Design
Successful application of data mining tools and knowledge discovery
methods can have a tremendous effect on an organization. Combined with the
Internet, data mining explodes into a new world of possibility. Electronic
commerce and other activity will create huge new resources of data that
businesses can mine for greater efficiency and customer service. But perhaps
more importantly, data mining combined with Internet-based applications has
the potential to deliver whole new areas of profitable decision support
services.
This special seminar will focus on the dynamic combination of data mining,
advanced databases, and the Internet. Bringing a series of experts together,
this all-day session will cover key topics, including:
-- Development and use of intelligent software agents
-- How data mining fits with the technology advances made by commercial
search engines and browsers
-- Case studies of organizations that have created effective data mining
applications for Internet customers
-- Developments in heterogeneous database access to enable wider use of
data mining
-- Data mining and knowledge discovery methods that work best for
creating Internet-aware applications
-- Advances in graphics and data visualization that will impact Internet
data mining applications
For the latest news about this seminar, including the scheduled speakers,
please check back with this Web site. The complete program will be in place
in early December.
Wednesday, February 19
8:30 - 9:35
OLAP and Data Mining: Bridging the Gap
Part I
Kamran Parsaye
CEO
Information Discovery Inc.
To date, most observers have viewed data mining and online analytical
processing (OLAP) as separate components of decision support. It has been
difficult to link the two largely because no coherent theory exists upon
which to build a relationship. In this keynote speech, Parsaye will
introduce a unified theory and methodology for OLAP and data mining. He will
describe in detail how the two activities can reinforce each other.
Parsaye will begin by describing the "dimensions" of decision support and
how data mining activity fits into one of the dimensions. Data mining within
a single dimension is a rough approximation of multidimensional mining.
Parsaye will describe how a lack of attention to dimensionality in data
mining can result in unexpected results reminiscent of the "lossless join"
problem in the early days of relational databases.
In the second part of his presentation, Parsaye will present a formal
framework for mining OLAP data and will introduce a new set of
multidimensional normalization constructs that allow us to understand OLAP
discovery.
In this session you will learn:
- How OLAP, data mining, and other activities fit together in the four
"spaces," or dimensions, of decision support
- Limitations of normalization and star schemas for data mining activities
- New structures that go beyond star schemas
- A methodology for applying OLAP data mining, with three distinct processes
of episodic, strategic, and continuous mining for specific user groups
within corporate environments.
Kamran Parsaye is CEO of Information Discovery Inc. He has developed
commercial data mining applications since the mid-1980s. Parsaye has a range
of experience in the software industry both in research and in business, and
has provided guidance to top-level management of leading industrial,
financial, and government organizations. He is coauthor of Intelligent
Database Tools & Applications (John Wiley & Sons, 1993).
9:45 - 10:50
OLAP and Data Mining: Bridging the Gap
Part II
Kamran Parsaye
CEO
Information Discovery Inc.
(For description, see above)
Break 10:50 - 11:10
11:10 - 12:15
Institutionalizing Knowledge Discovery: Creating a New Business Process
Tej Anand
Director of Knowledge Discovery
Human Interface Technology Center
NCR Corp.
Practitioners are slowly beginning to accept that knowledge discovery is
much more than just the application of machine learning or statistical
algorithms to a dataset. Researchers understand that a knowledge discovery
process exists, and they even agree on what basic tasks make up that
process. However, for knowledge discovery to move beyond finding
"interesting trivia" to become a business process akin to marketing, the
details behind the knowledge discovery process must be expounded. Anand will
take the process apart to reveal its details; he will offer practical ideas
for accomplishing business goals through a new understanding of the process.
In this session you will learn:
- Why knowledge discovery is so difficult (contrary to what you might have heard)
- Why you cannot buy a tool to "do" knowledge discovery for you
- How process templates can remind the practitioner of tasks he or she must
complete and can provide a framework for making, recording, and auditing
decisions during the knowledge discovery process
- How process guides help the practitioner select data transformation
techniques, interpret data visualizations, select the correct machine
learning or statistical algorithm, and interpret results
- How embedding templates and guides into tools will allow knowledge
discovery to become an institutionalized business process.
Tej Anand is director of the knowledge discovery team at NCR Corp.'s Human
Interface Technology Center. In 1993, he established this business and
technical consulting team to help retail, insurance, consumer packaged
goods, and other commercial enterprises realize business insights hidden in
their operational data. Team members also conduct research and development
to create knowledge discovery processes and data mining tools. Prior to
joining NCR, Anand developed data mining tools for A.C. Nielsen Co. He has
also been a member of the research staff at Philips Laboratories, where he
did research in the area of artificial intelligence software systems.
12:15 - 1:30
Lunch
Track A: Algorithms and Methods
1:30 - 2:35
Data Mining and the KDD Process: Algorithms and Limitations
Part I
Usama Fayyad
Senior Researcher
Microsoft Research
This two-part talk will provide an overview of the rapidly growing area of
knowledge discovery in databases (KDD). Fayyad will define KDD goals,
present motivations guiding the KDD process, and discuss how KDD relates to
data mining. He will then focus on the core data mining methods. These
methods have their origins in statistics, pattern recognition, artificial
intelligence (machine learning), databases, and parallel computing. Fayyad
will explore the limitations and challenges of each major data mining
method. He will break these methods down into classes and will cover a
sampling of algorithms for each class, outlining its advantages and limitations.
The goal of this two-part presentation is to provide a detailed snapshot of
the current state of data mining methods, how they fit into the KDD process,
and what key challenges developers should be aware of when applying them.
Fayyad will focus primarily on the technical aspects of the algorithms
rather than their use in particular implementations.
In this session you will learn:
- Definitions of KDD and data mining and how the two areas fit together
- Dominant data mining methods used in the field and the specific problems
they address
- Critical limitations and challenges of each method
- How to avoid pitfalls when applying data mining methods.
Usama Fayyad is a senior researcher at Microsoft Research. His interests
include knowledge discovery in large databases, data mining, machine
learning theory and applications, statistical pattern recognition, and
clustering. Before joining Microsoft in 1996, he headed the Machine Learning
Systems Group at the Jet Propulsion Laboratory (JPL), California Institute
of Technology, where he developed data mining systems for automated science
data analysis. He remains affiliated with JPL as a distinguished visiting
scientist. Fayyad received the JPL 1993 Lew Allen Award for Excellence in
Research and the 1994 NASA Exceptional Achievement Medal. He was program
cochair of KDD-94 and KDD-95 (the First International Conference on
Knowledge Discovery and Data Mining). He is general chair of KDD-96, an
editor-in-chief of the journal Data Mining and Knowledge Discovery, and
coeditor of Advances in Knowledge Discovery and Data Mining (MIT Press, 1996).
2:45 - 3:50
Data Mining and the KDD Process: Algorithms and Limitations
Part II
Usama Fayyad
Microsoft Research
(For description, see above)
3:50 - 4:15
Break
4:15 - 5:00
Data Mining: The View from IBM
5:00 - 5:45
Data Mining: The View from Tandem Computers
Track B: Case Studies in Data Mining
1:30 - 2:35
Leveraging Customer Information for Competitive Advantage
Lisa Modisette
Director of Wireless Intelligent Solutions
Lightbridge Inc.
The cellular phone industry today looks much like the credit-card industry
of a few years ago. The market is growing at nearly 50 percent a year but
will reach a saturation point soon, just as the credit card industry has.
"Churn," or customer attrition, is a growing problem for the maturing
cellular phone industry. In this case study, Modisette will describe how
data mining techniques that worked so well in the credit card industry to
prevent and reverse customer attrition may be applied to the wireless
telecommunications industry.
Modisette will describe how Lightbridge Inc., a wireless communications
provider, has used data mining tools to retain good customers at minimal
cost. Data mining tools make use of existing customer transactional and
demographic data, allowing companies to quickly and easily discover customer
needs. Detailed customer knowledge will enable carriers to prepare for a
more saturated market and offer new businesses based on customer knowledge.
In this session you will learn:
- How Lightbridge uses data mining and churn modeling techniques to combat
customer attrition
- Specific predictive modeling techniques and their effectiveness
- How to get the most out of existing data and acquire a deeper knowledge of
customer behavior.
Lisa Modisette is responsible for the development and marketing of
Lightbridge Inc.'s Wireless Intelligence line of products and services,
designed to provide decision support and database marketing to wireless
carriers. She joined Lightbridge in 1994 and has driven the development of
the new decision-support product line since its inception. Modisette has
experience in identifying customer needs and in creating and maximizing the
use of decision-support systems, database marketing, and customer
segmentation. Modisette also has expertise in OLAP, business intelligence,
database marketing, product management, sales training, and a variety of
information technology. Before joining Lightbridge, she was director of the
telecommunications industry practice at Metaphor Inc., an IBM subsidiary.
She has a B.A. in marketing from the University of Colorado.
2:45 - 3:50
Business Experiences with Data Mining
Evangelos Simoudis
Director of Data Mining Solutions
IBM Corp.
Health care and insurance are two industries that offer interesting
opportunities for data mining applications. In this presentation, Simoudis
will describe how two businesses have developed production data mining
systems. The Health Insurance Commission (HIC), an agency of the Australian
government, processes claims for Australia's Medicare, Medibank Private,
Pharmaceutical Benefits, and Child Care programs. HIC uses data mining to
help reduce costs by ensuring that all medical tests and services are
appropriately prescribed and accurately billed.
John Hancock, an insurance and financial services provider, has a marketing
and services database to support the company's cross-selling efforts and to
accurately identify future customer service requirements. Hancock developed
a survey of 55,000 targeted users; it uses data mining to provide profiles
based on survey results.
In this session you will learn:
- Case study examples of data mining methods used for reducing costs and
profiling customers
- The technology/business integration important for data mining success
- Important processes to ensure accurate results from data mining.
Evangelos Simoudis is IBM's director of Data Mining Solutions. Before
joining IBM, Simoudis led Lockheed Corp.'s data mining research, and was
responsible for the commercial introduction and marketing of Lockheed's
Recon data mining system for financial and retail markets. Simoudis also
spent six years as a member of the principal research staff at Digital
Equipment Corp.'s Artificial Intelligence Center. He conducted research on
machine learning, pattern recognition, knowledge-based systems, and
distributed artificial intelligence; Digital has incorporated his research
work in products for engineering design and diagnostics. Simoudis has
written extensively on data mining and machine learning, and is the North
American editor of the Artificial Intelligence Review.
3:50 - 4:15
Break
4:15 - 5:00
Data Mining: The View from Angoss Software
Thursday, February 20
8:30 - 9:35
Keynote Speech
Speaker TBA
9:45 - 10:50
Weaving Detail into the Big Picture
Denise M. Barnhart
Chief, Corporate Analysis Division
Army and Air Force Exchange Service
"There's too much data ... but it's just not enough." With the continued
growth of very large databases (VLDBs) and the mushrooming need for quick
access to progressively smaller details of the retail business, corporations
risk losing sight of the larger view, the brighter opportunity, or the
insidious trend. The Army and Air Force Exchange Service (AAFES), which
provides $6 billion in goods and services to military servicemen and
servicewomen around the world, has taken on this challenge. In a case study
presentation, Barnhart will describe AAFES's extensive use of massively
parallel analytical processing and data mining. The organization uses this
advanced technology for retail research and integrating analysis results
with operational and strategic processes.
In this session you will learn:
- How AAFES uses neural nets to understand demographics and project market
potential
- Neural net applications that let an organization view data both at the
total business level and at the detailed level of specific items in a retail
store
- How AAFES calculates relationships between retail items and categories and
links these categories to demographic characteristics
- Techniques for the cross-utilization of multiple databases for configuring
retail stores to maximize corporate earnings per square foot
- How to overcome challenges in integrating database patterns into the
corporate strategic vision.
Denise Barnhart is chief of the Corporate Analysis Division, part of the
Army and Air Force Exchange Service's (AAFES's) Strategic Planning
Directorate. AAFES is a profit-generating agency of the Defense Department.
Barnhart joined AAFES in 1976 as a CPA and has since specialized in the
strategic optimization of stores for the benefit of both customer
satisfaction and bottom line. She was an early proponent of the day-to-day
use of neural nets in planning store construction in the late '80s. Today,
AAFES wholly plans mall sales and earnings levels, store mix, sizing, and
parking requirements with neural net analyses. With the refinement of retail
point-of-sale in the '90s, Barnhart has extended corporate strengths in
local markets.
10:50 - 11:10
Break
11:10 - 12:15
The Visualization of Large, Complex Datasets
Georges Grinstein
Professor, Institute for Visualization and Perception Research
University of Massachusetts Lowell
Visualization is the translation of data, sampled or generated, into some
perceptual presentation, most typically visual, to provide insights into the
data. It represents the mapping of data into a symbolic representation
useful for researchers, analysts, scientists, and business managers. This
"mapping," or interaction, can occur at several stages of the visualization
presentation pipeline; it directs the transformations or alters the
presentation of data.
Visualization is no longer simply an application of computer graphics. While
computer graphics remain the underpinning technology of this discipline,
visualization now includes, and must support, databases, real-time
interaction, networking, supercomputing, multimedia, visual programming,
systems theory, and human perception. This development has provided some
very fertile ground for integrating knowledge discovery, statistics, and
visualization.
In this talk Grinstein will highlight key research issues in the
visualization of large, complex informational spaces.
In this session you will learn:
- A brief history of visualization, from initial efforts to extend data
presentation beyond the classic pixel-driven techniques to the current
challenge of encompassing domain knowledge
- How visualization and data mining can work together to provide rich
user-exploration and analysis environments
- How to make astute use of visualization techniques.
Georges Grinstein is a professor of computer science at University of
Massachusetts in Lowell, Massachusetts. He also serves as director of the
university's Institute for Visualization and Perception Research and is
principal engineer with MITRE Corp.'s Center for Air Force C3I Systems.
Track A: Algorithms and Methods
1:30 - 2:35
Improving Prediction Performance with Genetic Algorithms
Steven Vere
President
Ultragem Data Mining Co.
Data mining with genetic algorithms is a new technology aimed at improving
prediction performance. However, many of today's commercial data mining
products actually incorporate older machine learning algorithms, such as ID3
and CART. These systems use heuristic algorithms to generate decision rules.
Being heuristic, they do not guarantee the best in prediction performance;
in most cases, we now know they do not. Ten years ago, these technologies
represented a good trade-off between prediction performance and training
speed. But in today's high-speed computing environment, it is possible to
use the controlled, brute computational force of genetic algorithms to find
the higher performing prediction rules that heuristic algorithms overlook.
In this presentation Vere will describe techniques for efficiently applying
the genetic algorithm paradigm to large data mining problems.
In this session you will learn:
- The definition and description of genetic algorithms
- Applications of genetic algorithms to data mining and numerical prediction
problems
- How specific techniques, such as averaging the predictions of sets of
genetically generated classifiers, can significantly enhance performance.
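The last point above, averaging the predictions of a set of classifiers, can be illustrated with a minimal sketch. It is not Vere's method: the three threshold rules are hypothetical stand-ins for genetically generated classifiers, and the field names and values are made up for illustration.

```python
from statistics import mean

# Hypothetical stand-ins for genetically generated classifiers: each maps a
# record (a dict of numeric fields) to a predicted probability of the target.
def rule_a(rec):
    return 0.9 if rec["usage"] > 100 else 0.2

def rule_b(rec):
    return 0.8 if rec["tenure"] < 12 else 0.3

def rule_c(rec):
    return 0.7 if rec["complaints"] >= 2 else 0.1

def average_prediction(classifiers, rec):
    """Return the unweighted mean of the classifiers' predictions for rec."""
    return mean(clf(rec) for clf in classifiers)

ensemble = [rule_a, rule_b, rule_c]
record = {"usage": 150, "tenure": 6, "complaints": 0}  # made-up example record

score = average_prediction(ensemble, record)  # mean of 0.9, 0.8, and 0.1
label = score >= 0.5                          # simple majority-style cutoff
```

An unweighted mean is the simplest choice; weighting each classifier by its accuracy on held-out data is a common variation.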
Steven Vere is president and founder of Ultragem Data Mining Co., a data
mining consulting company specializing in the commercial application of
evolutionary algorithms. He has over 20 years of experience in machine
learning and artificial intelligence. Vere has served as a member of the
computer science faculty at the University of Illinois, Chicago and has also
held senior technical and management positions at the NASA Jet Propulsion
Laboratory, Lockheed R&D Division, and Bank of America. His work has
appeared in research journals, AI Encyclopedia, and Scientific American; he
will be featured on a future episode of Beyond 2000, a television
documentary series. Vere holds a Ph.D. in computer science from the University
of California at Los Angeles.
2:45 - 3:50
Data Mining: Finding the Total Business Solution
Gene Feruzza
President, Customer Management Services
Too often, we view data mining as only data visualization, predictive
modeling, or some other specific technique. Although these components are
important, supporting the total business solution requires that we take a
much broader scope. In this talk, Feruzza will focus on data mining processes in
real-world applications developed in telecommunications, financial services,
utilities, and online services. He will describe the cyclical nature of
successful data mining, first focusing on the data infrastructure (data mart
or warehouse) and data access and manipulation. Feruzza will then describe
the role, and integration, of modeling processes and technologies, including
rule-based techniques, traditional statistics, neural networks, and genetic
approaches. He will discuss experiences with delivering the knowledge
obtained from the technology to the business user, and how to promote the
strategic integration of technology and business applications.
In this session you will learn:
- How broad the scope of data mining needs to be to be successful
- Why it's important to embrace and support all modeling technologies, not
just one
- Solutions to common pitfalls based on data mining experiences
- Best practices for delivering knowledge gained to the business user
- Why data mining should be a cyclical, "living" process.
Gene Feruzza has extensive experience with advanced segmentation techniques
utilizing basic statistics and regression modeling, rule-based segmentation,
neural network modeling along with evolutionary and hybrid modeling
architectures. For 12 years he has provided integrated marketing and
business solutions for clients in telecommunications, electric utilities,
financial services, aerospace, manufacturing, and retail. He has worked for
two leading neural network hardware and software providers (HNC and
NeuralWare) as an instructor and consultant. He has also developed and marketed
his own database management and segmentation software. Feruzza graduated
from the University of Pittsburgh with a BS in computer science and mathematics.
4:15 - 5:00
Data Mining: The View from NeoVista
7:30 - 9:00
1:30 - 3:00
Birds of a Feather
Breakout Sessions
Success with data mining depends on an intimate knowledge of specific
industry application requirements. After the first Data Mining Summit last
April, we received many requests to include in the program organized
"networking" sessions for attendees to discuss specific industry challenges.
To close out the Second Annual Data Mining Summit, we invite attendees to
join in our special Birds of a Feather sessions, which will focus on data
mining issues faced by specific industries. A vertical industry expert will
lead each discussion group.
Come and share your questions and experiences with other like-minded data
mining practitioners! Depending on popularity, we plan to offer Birds of a
Feather sessions about data mining in the following industries:
- Retailing
- Health care
- Financial services
- Telecommunications
To help us organize the Birds of a Feather sessions ahead of the conference,
please use the registration form to choose which vertical industry session
you would like to attend.
Track B: Case Studies in Data Mining
1:30 - 2:35
Artificial Intelligence and Process-Delay Analysis: A Decision-Tree Case Study
Bob Evans
Member, Advanced Technology Staff
RR Donnelley & Sons Co.
Cylinder wear (called "banding") causes serious delays in the rotogravure
printing process and has plagued the industry for decades. A process-delay
analysis initiative at RR Donnelley & Sons' Gallatin, Tennessee plant has
reduced the incidence of cylinder banding to near negligible levels. In this
presentation, Evans will describe the Evans-Fisher Process Analysis Model, a
solution driven by decision-tree induction. Through case study examples, he
will describe the use of this powerful artificial intelligence method for
data mining. Evans will also address some of the business and social issues
associated with data collection and analysis.
At RR Donnelley, database technology is the vehicle for solving process
problems. Evans will show how decision-tree induction may be viewed as
automated query generation. Attendees will see examples of queries generated
by this tool. Evans will explain how decision-tree induction guides users
away from the "blind alleys" that can frustrate data mining efforts.
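The idea of decision-tree induction as automated query generation can be sketched as follows. This is only an illustration, not the Evans-Fisher model: the tree is written by hand rather than induced, and the attribute names (humidity, press_speed), the cut points, and the table name press_runs are all hypothetical.

```python
# A hand-written toy tree; in practice an induction algorithm would build it.
# Internal node: (attribute, cut, subtree_if_below, subtree_if_at_or_above)
# Leaf: a class label string.
TREE = ("humidity", 60,
        "no_band",
        ("press_speed", 1800, "no_band", "band"))

def paths_to(tree, target, conditions=()):
    """Yield the tuple of test conditions along each path ending in target."""
    if isinstance(tree, str):
        if tree == target:
            yield conditions
        return
    attr, cut, below, at_or_above = tree
    yield from paths_to(below, target, conditions + (f"{attr} < {cut}",))
    yield from paths_to(at_or_above, target, conditions + (f"{attr} >= {cut}",))

def to_query(tree, target, table="press_runs"):
    """Render every path to target as one SQL SELECT with OR-ed clauses."""
    clauses = ["(" + " AND ".join(p) + ")" for p in paths_to(tree, target)]
    if not clauses:
        # No leaf carries this label, so the query should match nothing.
        return f"SELECT * FROM {table} WHERE 1 = 0"
    return f"SELECT * FROM {table} WHERE " + " OR ".join(clauses)

query = to_query(TREE, "band")
```

Each root-to-leaf path becomes one conjunction of tests, so the tree as a whole reads as a set of queries that a user could otherwise have written by hand.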
In this session you will learn:
- How to astutely define and collect data for decision-tree induction
- Case study examples of how the Evans-Fisher Process Analysis Model was
developed and applied
- How to use artificial intelligence and data mining to solve complex
industrial problems.
Bob Evans is on the advanced technology staff of RR Donnelley & Sons Co. in
Gallatin, Tennessee. He is also an adjunct assistant professor of computer
science at Volunteer State Community College in Tennessee. A 33-year
employee of RR Donnelley, he is responsible for implementing and upgrading
process-delay analysis using current data mining technology. He has
published several articles and has given presentations on shop-floor
applications of artificial intelligence. Computer scientists frequently cite
his application of decision-tree induction to cylinder bands as a successful
example of the transfer of data mining technology from the research
laboratory to an industrial environment. Evans holds an A.B. degree in
mathematics from Indiana University and a Master of Engineering degree in
computer science from Vanderbilt University.
2:45 - 3:50
Fraud Detection Systems: Combining Data Mining and Machine Learning
Tom Fawcett, Foster Provost
Members of the Technical Staff
Machine Learning Project
NYNEX Science and Technology
In this presentation, Fawcett and Provost will describe a framework that
combines data mining and machine learning techniques to design fraud
detection methods. Fraud detection is based on profiling customer behavior
and checking for anomalies. The domain of this case study is cloning fraud
in cellular telephony, but the methods involved are more widely applicable:
any domain in which fraudulent usage is superimposed upon legitimate usage
(as in credit card fraud) is a candidate. Fawcett and Provost use a
rule-learning program to uncover indicators of fraudulent behavior from a
large database of cellular calls. They will show how they use these
indicators to construct profilers and how their system combines evidence
from multiple profilers to generate high-confidence alarms.
In this session you will learn:
- How to create a profitable synergy of data mining and machine learning
- How to address the intricacies of building data mining systems under
real-world constraints
- Complications that arise when trying to assign cost/benefit trade-offs (the
cost of handling a false alarm differs from the cost of missing fraudulent
usage, which varies among fraud cases).
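The cost/benefit point above can be made concrete with a small sketch. It is not the speakers' system: the scores, labels, and cost figures are invented, and for simplicity it assumes one fixed cost per missed fraud case, whereas the text notes that this cost varies from case to case.

```python
def total_cost(threshold, scored_cases, cost_false_alarm, cost_missed_fraud):
    """Cost of raising an alarm on every case whose score is >= threshold.

    scored_cases is a list of (score, is_fraud) pairs.
    """
    cost = 0.0
    for score, is_fraud in scored_cases:
        alarm = score >= threshold
        if alarm and not is_fraud:
            cost += cost_false_alarm      # analyst time wasted on a good customer
        elif not alarm and is_fraud:
            cost += cost_missed_fraud     # fraudulent usage that went undetected
    return cost

def best_threshold(scored_cases, cost_false_alarm, cost_missed_fraud):
    """Pick the alarm threshold with the lowest total cost on the given cases."""
    # Try every observed score, plus infinity ("never raise an alarm").
    candidates = sorted({s for s, _ in scored_cases}) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(t, scored_cases, cost_false_alarm, cost_missed_fraud),
    )

# Made-up (score, is_fraud) pairs for illustration only.
cases = [(0.1, False), (0.4, False), (0.5, True), (0.7, False), (0.9, True)]

cheap_alarms = best_threshold(cases, cost_false_alarm=1, cost_missed_fraud=20)
costly_alarms = best_threshold(cases, cost_false_alarm=20, cost_missed_fraud=1)
```

With cheap alarms the chosen threshold drops so that both fraud cases are caught; with costly alarms it rises and one fraud case is deliberately missed. The same detector therefore calls for different operating points under different cost assumptions.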
Tom Fawcett works in machine learning, data mining, and knowledge-based
systems. He has worked at NYNEX Science & Technology, GTE Laboratories, and
MITRE Corp. Fawcett holds a Ph.D. from the University of Massachusetts at
Amherst. While at GTE, his machine-learning system was used for automated
adaptation in telecommunications network management. He developed and
maintained a large knowledge-based mission planning system for MITRE.
Fawcett has published articles addressing the representation problem in
machine learning and has done research in case-based reasoning.
Foster Provost works on machine learning and data mining at NYNEX Science
and Technology, where, in addition to developing methods for the automated
design of fraud detection systems, he has also made advances by combining
data mining techniques with decision-analytic techniques for cost-effective
technician dispatch. Prior to joining NYNEX, Provost worked on data mining
in scientific domains, including botanical toxicology, high-energy physics,
and infant mortality. His work produced advances in rule learning, scaling
up machine learning methods to large databases, using background knowledge
to guide learning, and selecting inductive bias. Provost holds a Ph.D. from
the University of Pittsburgh, where he held IBM and Mellon graduate
fellowships. He received a B.S. in physics and mathematics from Duquesne
University. He is a recent recipient of NYNEX's President's Award.
4:15 - 5:00
Data Mining: The View from DataMind
7:30 - 9:00
Birds of a Feather
1:30 - 3:00
(For description, see above)
Friday, February 21
8:30 - 9:35
Data Mining 1997/98: Key Trends & Market Perspectives
Aaron Zornes
Executive Vice President and ADS Service Director
Meta Group
Although the data mining market garnered less than $100 million in 1996,
industry analysts at Meta Group forecast the market will explode to more
than $800 million by the year 2000. During 2Q96, Meta Group surveyed 250+
Global 2000-size business users of data mining products and services in
retailing, healthcare, financial services, and telecommunications. This
presentation will highlight key survey findings regarding adoption criteria,
timelines, technical parameters, and leading business applications. Meta
Group's study investigated not only the traditional uses of data mining
technology, such as fraud prevention and credit card authorization within
the financial services industry, but also investigated rapidly emerging
requirements stemming from data warehouse implementations and Web-enabled
commerce and marketing.
In this session you will learn:
- How to interpret early user adoption rates by industry segments
- What will be the impact of emerging systems integrators and data bureaus
- What's behind current data quality, data warehouse, and data visualization
trends
Aaron Zornes is executive vice president and ADS service director for Meta
Group. He is a leading authority on the software industry as it relates to
applications development and delivery, especially data warehousing and
second-generation multitier client/server applications. Zornes has devoted
more than 20 years to line and strategic management roles in leading vendor
and user organizations, including executive and managerial positions at
Ingres Corp., Wang Laboratories Inc., Software AG of North America, and
Cincom Systems Inc. He is a frequent author and keynote speaker on data
warehousing, data mining, advanced client/server tools, and customer-centric
application architectures. Since 1992, he has been conference chair of DCI's
Data Warehouse World conference series.
9:45 - 10:50
Knowledge Rovers: Configurable Agents to Support Enterprise Information
Infrastructures
Larry Kerschberg
Professor and Chair, Information and Software Systems Engineering
School of Information Technology and Engineering
George Mason University
Knowledge rovers represent a family of cooperating intelligent agents that
can support a collection of scenarios, decision-makers, and tasks. These
rovers play specific roles within the enterprise information infrastructure
to support users, maintain complex views, and mine and refine data into
knowledge. Rovers can roam the Internet, seeking, locating, negotiating for,
and retrieving data and knowledge specific to their mission.
For decision-makers to make appropriate use of information, the current
flood of data must be filtered and transformed. In this presentation,
Kerschberg will describe knowledge rovers and the data mining and software
agent technology that creates them. He will highlight important rovers and
how they fit into data warehouse, data mine, and data mart architectures.
Kerschberg will describe Field Agent rovers that discover new resources,
collect data, and bring back information; Information Curator rovers that
refine data into knowledge and place it in an information repository; and
Domain Servers that from within the repository facilitate access to multiple
data types, such as images, text, formatted data, and simulation data
related to a particular domain. Finally, Kerschberg will discuss Sentinel
rovers that monitor Domain Servers for interesting events, patterns, and
specified conditions to alert decision-makers and take actions on their behalf.
In this session you will learn:
- The role of intelligent agents in supporting enterprise information
architectures
- How to integrate a family of configurable rovers for discovery,
integration, and evolution of information
- The interrelationship among concepts such as data warehouses, data mines,
and information repositories in the enterprise information infrastructure
- The concept of virtual data mines and data mining over multiple
heterogeneous data sources.
Larry Kerschberg is professor and chair of the Department of Information and
Software Systems Engineering in the School of Information Technology and
Engineering at George Mason University in Virginia. He is also director of
the university's Center for Information Systems Integration and Evolution.
His research focuses on intelligent agents, intelligent information
integration, data mining and knowledge discovery in databases, and expert
database systems. His research is funded in part by DARPA. Kerschberg is
also President of KRM Inc., which pursues research and development in
knowledge rovers and mediators in intelligent information systems. He is
editor-in-chief of the International Journal of Intelligent Information
Systems, published by Kluwer Academic Publishing Co. Kerschberg organized
and has served as program chair of the First and Second International
Conferences on Expert Database Systems. He holds a Ph.D. in engineering from
Case Western Reserve University.
10:50 - 11:10
Break
11:10 - 12:15
Privacy Issues and Data Mining
Panel Session Chaired by
David Stodder,
Editor-in-Chief,
Database Programming & Design
Data mining tools, when combined with large, sophisticated databases,
already offer businesses and other organizations powerful new abilities to
learn more about clients, customers, citizens, and taxpayers. The Internet
and Web-enabled commerce will create vast sources of data and new ways to
package information databases as products and services. Privacy and security
specialists are becoming increasingly concerned that basic privacy rights
could be trampled in the race to provide modern, intelligent information
services. Businesses must take new security measures to protect proprietary
data, and learn how to resolve the tug-of-war with competitors and service
contractors over just who owns the data.
This panel session will feature a selection of experienced users, security
experts, and data mining professionals, who will focus on privacy and
security concerns that broadly affect the practice of data mining. The panel
will discuss what measures governments and businesses are taking, and should
take, with regard to data mining and the development of new information services.
David Stodder is editor-in-chief of Database Programming & Design. He has
been with the publication since its inception in 1987. He has served on the
advisory board of several industry conferences, including IDUG North
America, DCI's Database and Client/Server World, and Blenheim/NDN's DB/Expo.
He is also chair of Miller Freeman Inc.'s VLDB Summit, Object/Relational
Summit, and Business Rules Summit conferences.
|
410.14 | 97:05 | IJSAPL::OLTHOF | Spellchecked Henry Although | Wed Feb 05 1997 09:17 | 1172 |
| Knowledge Discovery Nuggets 97:05, e-mailed 97-02-04
News:
* W. Kloesgen, KDD-97: Call For Panel Proposals
* E. Colet, Announcing a regular posting of NBA data mining patterns,
http://www.nba.com/news_feat/
* GPS, Business Week Feb 3, 1997 Story on Data Mining
* B. Griffin, Tools for quantifying newsgroups and email postings?
* M. Rebhan, GeneCards: genes, proteins and diseases.
http://bioinformatics.weizmann.ac.il/cards
Publications:
* A. Basu, CFP: INFORMS Journal on Computing Special issue on
Knowledge Discovery and Data Mining
* M. Singh, CFP: IEEE Internet Computing, Special issue on Agents
http://www.computer.org/pubs/internet/
Positions:
* W. Buntine, PhD/Masters Research Assistantship at Berkeley
Meetings:
* D. Gordon, CFP: ICML-97 Workshop on ML application in the real world
http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html
* M. Smyth, Learning Methods Course by Hinton and Jordan,
Washington, D.C., May 2 -- 3, 1997
* J. Zytkow, Forthcoming events related to Data Mining
PKDD'97, ISMIS-97 and KDD-97
--
KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge
Discovery in Databases (KDD) community, focusing on the latest research and
applications.
Submissions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, email to [email protected] message with
subscribe kdd-nuggets
in the first line (the rest of the message and subject are ignored).
See http://info.gte.com/~kdd/subscribe.html for details.
Nuggets frequency is approximately 3 times a month.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a 50-50 chance that something can go wrong, then 9
times out of ten it will. (Paul Harvey News, 1979)
Excerpted from "Quotes, damned quotes and..." by John Bibby.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 15:01:47 +0100
From: [email protected] (Willi Kloesgen)
Subject: KDD-97: Call for Panel Proposals
As in previous KDD conferences, the KDD-97 program will include panel
discussions. A great panel requires an interesting topic, good
speakers, and proper preparation. To facilitate all three we solicit
early suggestions. Please submit suggestions for topics and preferably also
for panelists who could represent diverse positions on or approaches to the
topic. Suggested topics should relate to any of the main KDD-97 topics (see
http://www-aig.jpl.nasa.gov/kdd97).
The panel topics should be of general interest for a
large part of the KDD audience and allow several (controversial) approaches
to be discussed.
Please email informal suggestions by April 2, 1997 (earlier if possible) to:
Willi Kloesgen
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Edward Colet"<[email protected]>
Date: Wed, 29 Jan 1997 18:00:02 -0400
Subject: Announcing a regular posting of NBA data mining patterns.
National Basketball Association teams have been using IBM's Advanced Scout
data mining application to discover trends and patterns in game data.
Now a selected set of discovered patterns is also made available to fans
via a regular posting on the Internet before and after NBA/NBC's game of
the week. The reported patterns are based on analyses of the teams'
previous game(s), and additional commentary is added following the game.
The patterns can be found in the regular feature of the NBA website
entitled, "Beyond the Boxscore" (found under "News and Features"). The
NBA website is at "http://www.nba.com", and the data mining results are
under "http://www.nba.com/news_feat/". There are also links to more
information on Advanced Scout at "http://www.nba.com/ad/ibm", and at
"http://www.research.ibm.com/scout/home.html/".
Regards,
Ed Colet.
*********************************************
IBM T.J. Watson Research Center
30 Saw Mill River Road
Hawthorne NY 10532
phone: 914-784-6621; tie-line 863
fax: 914-784-7455
email: [email protected]
*********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 09:57:37 -0500
From: [email protected] (Gregory Piatetsky-Shapiro)
Subject: Business Week Feb 3, 1997 Story on Data Mining
Last week's Business Week has a very nice story by John Verity on
"Coaxing Meaning out of Raw Data" (p. 134).
It described several successful customer modeling applications
at MCI, cellular fraud detection, US West, JPL, Walmart, and more,
and featured quotes from Usama Fayyad, Herb Edelstein, Steven Vere,
and others.
"A huge opportunity is opening up", according to Usama,
but "the devil really is in the details", according
to NeoVista CEO John Harte.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 31 Jan 1997 11:41:47 -0800
From: [email protected] (Brian Griffin)
Organization: Netscape
Subject: Recommendation
Can you please recommend the best PD and commercial data mining tool for
quantifying newsgroups and email postings?
Thank you very much,
Brian Griffin
Manager, Technical Support
Netscape Communications Corp.
[GPS -- if you do know such tools, please cc to [email protected] and
I will summarize to the list]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 29 Jan 1997 05:05:46 +0200
From: Michael Rebhan <[email protected]>
Organization: Weizmann Institute of Science
Subject: GeneCards: genes, proteins and diseases.
http://bioinformatics.weizmann.ac.il/cards
This database aims at integrating knowledge about all human genes, their
products, and their involvement in diseases. And although it already
integrates what is easily available in different heterogeneous databases,
the authors are planning to use technology from Artificial Intelligence,
including Knowledge Discovery in Databases (KDD) tools, to expand the
current resource. We would like to hear opinions from people inside the
AI/KDD community regarding the following projects:
a) a user guidance system that recognizes problems caused by "poorly
designed" search strategies and suggests intelligent options that
might take the user as fast as possible to the wanted
information (this system should thus replace an expert in the
retrieval of biomedical information as much as possible).
b) knowledge extraction tools that take data from free text, such as
abstracts of papers in Medline, to gather data about the relationships
between genes/proteins (which one interacts directly with which one,
and so on), and about the role of a particular gene/protein in the
pathogenesis of a particular disease.
Although both projects are still more or less ill-defined, we are very
interested in your ideas. If you are also fascinated by this challenge,
please email Michael Rebhan ([email protected]).
Michael Rebhan, Ph.D.
Weizmann Institute of Science, Dept. Biol. Serv.,
Bioinformatics Unit, Rehovot 76100, Israel (FAX: +972-8-934-4113)
WWW: http://bioinfo.weizmann.ac.il/cards/rebhan.html
Email: [email protected]
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 27 Jan 1997 08:48:06 -0700
From: Amit Basu <[email protected]>
Subject: cfp for INFORMS Journal on Computing
Call for Papers on Knowledge Discovery and Data Mining
for the INFORMS Journal on Computing
The knowledge and data management area of the INFORMS Journal on Computing
invites technical papers on the analysis, design and management of knowledge
discovery and data mining methods and systems. Selected papers will be
published in a special cluster on this topic. The journal is an official
publication of the Institute for Operations Research and the Management
Sciences, and focuses on the interface between operations
research/management science and computer science. Papers that deal with
algorithms for system design, methods for efficient information management,
and analytical or empirical studies of system performance are welcome.
Topics of interest include (but are not limited to):
* performance analysis of KD/DM algorithms (efficiency, scalability,
reliability, etc.)
* the use of optimization methods in KD/DM
* comparative studies of KD/DM versus other exploratory data analysis
methods, including traditional statistical and mathematical
programming models
* analysis of context-specific KD/DM methods
* neural networks in KD/DM
* performance analysis of uncertainty management methods in KD/DM
* analysis of KD/DM algorithms in large-scale, distributed and/or
heterogeneous database systems
* efficiency and scalability analysis of KD/DM algorithms for specialized
databases (spatial, temporal, multimedia, statistical, etc.)
* analysis of data mining methods on confidential data
* efficient data preprocessing methods (e.g., scrubbing, sampling and
reduction) for data mining
* performance of KD/DM methods on multidimensional data
Manuscripts should be prepared according to JoC guidelines.
Deadline: July 31, 1997. Four (4) copies of each manuscript should be
submitted to Professor Amit Basu, the Area Editor for Knowledge and Data
Management, at the following address:
Owen Graduate School of Management
Vanderbilt University
Nashville, TN 37203
TEL: 615-322-7043
FAX: 615-343-7177
email: [email protected]
For more information, please contact Professor Basu at the above address, or
the Editor-in-Chief of JoC, Professor Bruce Golden, at the address below:
College of Business and Management
University of Maryland
College Park, MD 20742
TEL: 301-405-2232
FAX: 301-314-9157
email: [email protected]
------------------------------------------------------------------------------
Amit Basu
Associate Professor
Owen Graduate School of Management
Vanderbilt University
Nashville, TN 37203
TEL: 615-322-7043
FAX: 615-343-7177
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: IEEE Internet Computing: Agents
From: [email protected] (Munindar Singh)
Date: Wed, 29 Jan 1997 10:27:57 -0500 (EST)
IEEE Internet Computing
http://www.computer.org/pubs/internet/
CALL FOR PAPERS
IEEE Internet Computing is a new bimonthly magazine from the IEEE Computer
Society designed to help the engineer productively use the ever expanding
technologies and resources of the Internet. Internet Computing and IC on-line
will provide developers and users with the latest advances in Internet-based
computer applications and supporting technologies such as the World Wide Web,
Java programming, and Internet-based agents. Through the use of peer-reviewed
articles as well as essays, interviews, and roundtable discussions, IC will
address the Internet's widening impact on engineering practice and society.
IC is soliciting regular papers and papers for theme issues, including one on
agents. To submit, send e-mail to any member of the editorial board.
Include a plain text abstract, and a URL from which the paper can be viewed.
Members of the editorial board are listed on the IC web page. Author
guidelines are available at http://www.computer.org/pubs/internet/auguide.htm
Topics include system engineering issues such as agents, agent
message protocols, engineering ontologies, web scaling, intelligent
search, on-line catalogs, distributed document authoring, electronic
design notebooks, electronic libraries, security, remote instruction,
distributed project management, reusable service access and validation,
electronic commerce, and Intranets.
-----------------------------------------------------------
UPCOMING THEME ISSUES
------------------------------
Agents:
What kinds of agents are performing useful work on the Internet? Papers
should clearly define both the applications and technologies being used
as well as the sense of "agent." Applications should be demonstrable.
Issues include security, mobility, and agent communication languages.
Claims about the efficacy of one approach or language should be
supported by examples from applications.
Editorial Board Contacts:
Munindar Singh, [email protected]
or
Michael Huhns, [email protected]
Due date: March 15, 1997
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 25 Jan 1997 10:50:41 -0800
From: Wray Buntine <[email protected]>
Subject: PhD/Masters Research Assistantship
PhD/Masters Research Assistantships
Field: probabilistic algorithms, data analysis/mining and
optimization for CAD
Place: Electrical Engineering and Computer Science
University of California, Berkeley
The CAD group in the EECS Dept. at UC Berkeley is offering research support
for its Masters and Doctoral program. Research areas include but are not
limited to the use of data mining/analysis/engineering techniques in CAD or
optimization, and probabilistic methods for optimization or specialized
compilation.
The Electronic Design Technology (EDT) field is concerned with
computer-automated or computer-assisted design of complex electronic systems. With
current hardware capabilities advancing rapidly, a key bottleneck is the
development of advanced algorithms for optimization and simulation of
partial, abstract or completed designs. Our task is to design, code and
experiment with new algorithms, methodologies, and software technologies for
alleviating this bottleneck. The task can include the use of data
mining/analysis to understand the nature of the optimization task, or in
order to develop adaptive optimization methods.
The ideal candidate should have a background in computer science, electrical
engineering or related disciplines, should be an accomplished or developing
programmer, and should have an interest in the theory and mathematical
techniques used in optimization, data analysis, or probabilistic methods.
Candidates who wish to apply are invited to respond with a copy of their CV
to:
Professor R. Newton URL: http://www.eecs.berkeley.edu/~newton
Dr. Wray Buntine URL: http://www.eecs.berkeley.edu/~wray
Dr. Andrew Mayer URL: http://www.eecs.berkeley.edu/~mayer
Dept. of Electrical Engineering and Computer Sciences
520 Cory Hall
University of California at Berkeley
Berkeley, CA, 94720
The CAD Group URL: http://www-cad.eecs.berkeley.edu
EECS, UC Berkeley URL: http://www.eecs.berkeley.edu
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Wed, 29 Jan 97 14:44:16 EST
Subject: ICML-97 workshop CFPs
CALL FOR PAPERS
ML APPLICATION IN THE REAL WORLD:
METHODOLOGICAL ASPECTS AND IMPLICATIONS
Workshop at the Fourteenth
International Conference on Machine
Learning (ICML-97)
Nashville, Tennessee
July 12, 1997
WWW-page: http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html
Description
Application of Machine Learning techniques to solve real-world problems
has gained more and more interest over the last decade. In spite of this
attention, the ML application process is still lacking a generally accepted
terminology, let alone commonly accepted approaches or solutions.
Several initiatives, both conferences and workshops, have been held
concerning this topic.
The ICML-93 workshop of Langley and Kodratoff on ML applications and
the ICML-95 workshop on 'Applying Machine Learning in Practice' by
Aha, Catlett, Hirsh and Riddle form the successful precedents of this workshop.
The focus of the ICML-95 workshop was the 'characterization of the
expertise used by machine learning experts during the course of applying
learning algorithms to practical applications'. In the last year a
significant research effort has been devoted to applications
of learning algorithms. This is reflected in the recent interest in
Data Mining and KDD, as seen for instance at the international KDD
conferences (1995 in Montreal and 1996 in Portland, OR). Since the
application of ML techniques is also very relevant to the KDD community,
it is not surprising that it features in those conferences as well.
The workshop will build on all these events, but
will emphasise the processes underlying the application of ML in
practice. Methodological issues, as well as issues concerning the kinds
and roles of knowledge needed for applying ML will form a major focus
of the workshop.
It aims at building upon some of the results of discussions at the
ICML-95 workshop on 'Applying Machine Learning in Practice'
and at the same time tries to move forward to a consensus regarding a
methodology for the application of learning algorithms in practice.
The workshop "ML Application in the real world; methodological aspects and
implications" focuses on the methodological principles underlying
successful application of ML techniques. Apart from powerful ML
algorithms, good application strategies have to be defined. This implies a
thorough understanding of the initial problem definition and its relation
to the chain of tasks that leads towards a successful solution. Therefore a
two-dimensional approach regarding the process of ML application is
needed. The first dimension deals with the whole cycle of analysing the
setting, problem definition, knowledge extraction, database interaction,
learning, evaluation and iteration in real-world domains, while the second
dimension forms an "inner loop" to this cycle, in which the problem
definition is used to refine the task at hand and map it onto available
algorithms for learning, pre- and postprocessing and evaluation of
results.
Concerning these issues there is no clear distinction between ML and KDD,
and therefore this workshop will be equally interesting for
researchers from both communities.
This workshop does not focus on (methods for) developing new algorithms.
Moreover, case studies will only contribute to the workshop discussion if
general application principles can be derived from them.
Intended Participants and Audience
The workshop primarily aims at scientists and practitioners that apply ML
and related techniques to solve problems in the real world. To attend
the workshop, one should submit a paper, a one page extended abstract or
a statement of interest. In case of too much interest from
participants, the program committee will select participants on the
basis of workshop relevance. Ideally, the audience contains a mix of
university and industrial participants.
Workshop program
The program for this one-day workshop will have a maximum of 10
presentations. Some invited presentations will be part of the program.
Presentations will take 30 minutes (15-20 minutes presentation and 10-15
minutes discussion). Speakers are asked to focus their presentation on
the basis of a topic list that will be compiled during the review
process. To foster discussion and debate, accepted papers will be given
to a critic beforehand; by these means critics will be prepared to
debate presentations. At the end of the workshop, there will be a
plenary discussion session. Accepted papers will be distributed via the
workshop WWW-page before the workshop, to stimulate the discussion.
Accepted papers will also be published in workshop proceedings.
Papers are welcomed concerning (but not limited to) the following
topics:
* Methodological approaches focusing on the process of ML application,
or sub-processes, such as problem definition and refinement,
application design, data acquisition, pre- and postprocessing, task
analysis etc.
* Making explicit the kinds and roles of knowledge that are necessary
for execution of ML applications.
* Matching of problem definitions to specific techniques and
multi-technique configurations.
* Impact of methodologies for empirical research on the application of
ML-techniques.
* Identification of the relation of different ML strategies to given
problem types and identification of the characteristics that play a
role in describing the initial problems.
* Embedding of the ML application process in more general methodologies
for (knowledge) system development.
* Frameworks for support of (ML-)novices and experts for setting up
applications and reuse of previous application(part)s.
* Case studies, describing successful ML applications, that abstract
from the implementational aspects and focus on identification of the
choices that are made when designing the application i.e. the
(meta-)knowledge involved, etc.
* Comparison of the process of ML application with processes for
application of related techniques (e.g. statistical data analysis).
Submission guidelines
* Submitted papers should not exceed 3500 words or 8 pages Times Roman
12pt.
* The title page should contain paper title, author name(s), affiliations and
full addresses including e-mail of the corresponding author, as well as the
paper abstract and five keywords at most.
* Papers are reviewed by at least three members of the program committee on
their relevance for the workshop discussions.
* For preparation of the camera ready copies, an ICML style file will be
available.
Tentative Submission Schedule
* Submission deadline: March 22, 1997
* Notification of acceptance: April 9, 1997
* Camera ready copy + PS-file: May 1, 1997
* Papers available on WWW: June 15, 1997
* Workshop date: July 12, 1997
Electronic paper submissions are preferred. Please send your submission
to:
[email protected].
If Postscript printing is not available, paper submissions (4 hardcopies,
preferably double sided) can be sent to:
ICML Workshop "ML APPLICATION IN THE REAL WORLD"
p/o ATO-DLO, Floor Verdenius
Postbus 17
6700 AA Wageningen
Netherlands
Program Committee
Dr. Pieter Adriaans (Syllogic, Houten, The Netherlands)
Prof. C. Brodley (Purdue University, West Lafayette, IND, USA)
Prof. David Hand (Open University, Milton Keynes, United Kingdom)
Prof. Yves Kodratoff (LRI, Paris, France)
Dr. Vassilis Moustakis (Technical University of Crete, Chania, Greece)
Prof. Gholamreza Nakhaeizadeh (Daimler Benz AG Research, Ulm, Germany)
Dr. R. Kohavi (Silicon Graphics, Mountain View, CA, USA)
Dr. Enric Plaza i Cervera (IIIA-CSIC, Bellaterra, Catalonia, Spain)
Dr. Foster J. Provost (NYNEX Science & Technology, White Plains, NY,
USA)
Dr. P. Riddle (University of Auckland, New Zealand)
Dr. Celine Rouveirol (LRI, Paris, France)
Prof. Derek Sleeman (University of Aberdeen, United Kingdom)
Drs. Maarten van Someren (SWI, Amsterdam, The Netherlands)
Prof. Rudi Studer (University of Karlsruhe, Germany)
Organising Committee
Robert Engels (University of Karlsruhe, Germany)
[email protected]
Juergen Herrmann (University of Dortmund, Germany)
[email protected]
Bob Evans (RR Donnelley, Gallatin TN, USA)
[email protected]
Floor Verdenius (ATO-DLO, Wageningen, The Netherlands)
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Learning Methods Tutorial -- Washington DC, May 1997
Date: Sat, 1 Feb 1997 12:19:02 -0500 (EST)
**************************************************************
*** ***
*** Learning Methods for Prediction, Classification, ***
*** Novelty Detection and Time Series Analysis ***
*** ***
*** Washington, D.C., May 2 -- 3, 1997 ***
*** ***
*** Geoffrey Hinton, University of Toronto ***
*** Michael Jordan, Massachusetts Inst. of Tech. ***
*** ***
**************************************************************
A two-day intensive Tutorial on Advanced Learning Methods will be held
on May 2nd and 3rd, 1997, at the Hyatt Regency on Capitol Hill,
Washington D.C. Space is available for up to 50 participants for the
course.
The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data. These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection,
time series analysis, diagnosis, optimization, system identification
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.
The course will be devoted equally to the conceptual foundations of
recent developments in machine learning and to the deployment of these
tools in applied settings. Case studies will be described to show how
learning systems can be developed in real-world settings. Architectures
and algorithms will be presented in some detail, but with a minimum of
mathematical formalism and with a focus on intuitive understanding.
Emphasis will be placed on using machine learning methods as tools that can
be combined to solve the problem at hand.
WHO SHOULD ATTEND THIS COURSE?
The course is intended for engineers, data analysts, scientists,
managers and others who would like to understand the basic principles
underlying learning systems. The focus will be on neural network models
and related graphical models such as mixture models, hidden Markov
models, Kalman filters and belief networks. No previous exposure to
machine learning algorithms is necessary although a degree in engineering
or science (or equivalent experience) is desirable. Those attending
can expect to gain an understanding of the current state-of-the-art
in machine learning and be in a position to make informed decisions
about whether this technology is relevant to specific problems in
their area of interest.
COURSE OUTLINE
* Overview of learning systems; LMS, perceptrons and support vectors;
  generalized linear models; multilayer networks; recurrent networks;
  weight decay, regularization and committees; optimization methods;
  active learning; applications to prediction, classification and control
* Graphical models: Markov random fields and Bayesian belief networks;
  junction trees and probabilistic message passing; calculating most
  probable configurations; Boltzmann machines; influence diagrams;
  structure learning algorithms; applications to diagnosis, density
  estimation, novelty detection and sensitivity analysis
* Clustering; mixture models; mixtures of experts models; the EM
  algorithm; decision trees; hidden Markov models; variations on
  hidden Markov models; applications to prediction, classification
  and time series modeling
* Subspace methods; mixtures of principal component modules; factor
  analysis and its relation to PCA; Kalman filtering; switching
  mixtures of Kalman filters; tree-structured Kalman filters;
  applications to novelty detection and system identification
* Approximate methods: sampling methods, variational methods;
  graphical models with sigmoid units and noisy-OR units; factorial
  HMMs; the Helmholtz machine; computationally efficient upper
  and lower bounds for graphical models
REGISTRATION
Standard Registration: $700
Student Registration: $400
Cancellation Policy: Cancellation before Friday April 25th, 1997,
incurs a penalty of $150.00. Cancellation after Friday April 25th,
1997, incurs a penalty of one-half of Registration Fee.
Registration Fee includes Course Materials, breakfast, coffee breaks,
and lunch.
On-site Registration is possible. Payment of on-site registration must
be in US Dollar amounts, by Money Order or Check (preferably drawn on
a US Bank account).
Those interested in participating should return the completed
Registration Form and Fee as soon as possible, as the total number of
places is limited by the size of the venue.
Please print this form, and fill in the hard copy to return by mail.
REGISTRATION FORM
Learning Methods for Prediction, Classification,
Novelty Detection and Time Series Analysis
Friday, May 2 - Saturday, May 3, 1997
Washington, D.C., USA.
--------------------------------------
Please complete this form (type or print)
Name ___________________________________________________
Last First Middle
Firm or Institution ______________________________________
Standard Registration ____ Student Registration ____
Mailing Address (for receipt) _________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
Country Phone FAX
__________________________________________________________
email address
(Lunch Menu - tick as appropriate):
___ Vegetarian ___ Non-Vegetarian
Fee payment must be made by MONEY ORDER or PERSONAL CHECK. All amounts
are given in US dollar figures. Make fee payable to Prof. Michael
Jordan. Mail it, together with this completed Registration Form to:
Professor Michael Jordan
Dept. of Brain and Cognitive Sciences
M.I.T.
E10-034D
77 Massachusetts Avenue
Cambridge, MA 02139
USA
HOTEL ACCOMMODATION
Hotel accommodation is the personal responsibility of each participant.
The Tutorial will be held in
Hyatt Regency on Capitol Hill
400 New Jersey Avenue, NW
Washington, DC 20001
1-800-233-1234 or (202) 737-1234
on May 2 -- 3, 1997.
The hotel has reserved a block of rooms for participants of the course. The
special room rates for participants are:
U.S. $139.00 (Single/Double) per night + tax
You must reserve accommodation before *April 1, 1997* to avail of this
special rate. Please be aware that these prices do not include State
or City taxes.
ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at
http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
Marney Smyth
E-mail: [email protected]
Phone: 617 258-8928
Fax: 617 258-6779
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 22:47:43 -0600
From: jan zytkow <[email protected]>
Dear Colleague:
You may be interested in the following forthcoming events related to
machine discovery. Please notice that there is still time to submit a
paper to each of these events:
1. PKDD'97 -- 1st European Symposium on Principles of Data Mining
and Knowledge Discovery, Trondheim, Norway, June 25-27, 1997
Deadline for submissions: February 17
2. International Symposium on Methodologies for Intelligent Systems
(ISMIS-97), Charlotte, North Carolina, October 15-18, 1997
Machine discovery and learning is a strong theme at ISMIS
Deadline for submissions: March 1.
3. The Third International Conference on Knowledge Discovery and Data
Mining (KDD-97), Newport Beach, California, August 14-17, 1997
Deadline for submissions: March 10 (Cover page by March 3).
Best regards,
-- Jan Zytkow
------------------------------------------------------------------
1.
------------------------------------------------------------------
New deadline for submitting papers to PKDD-97
The original deadline for submitting papers to the 1997 Principles of
Knowledge Discovery in Databases was Wednesday, February 5. This
deadline has been extended, so that PKDD-97 papers are now due on
Monday, February 17, 1997
Notice of acceptance: March 17
Camera ready copies: April 4
Submit by email (preferred) to [email protected] or by airmail to
Jan Komorowski
Department of Computer Systems
Norwegian University of Science and Technology
7034 Trondheim, Norway
Papers should be in English and not exceed ten single-spaced pages of
12pt font. The first page should begin with title, authors,
affiliations, surface and e-mail addresses, and an abstract of about
200 words.
The proceedings of the Symposium will be published in the Springer
Verlag Lecture Notes AI Series (www.springer.de/comp/comp.html) and
available at PKDD-97, June 25-27.
Watch the updated PKDD'97 WWW page for further details:
http://www.idt.ntnu.no/pkdd97
If you have already sent off your paper but would like to resubmit by
the new deadline, please send email to [email protected]
---------------------------------------------------------------------------
PKDD'97 -- 1st European Symposium on Principles of
Data Mining and Knowledge Discovery
Trondheim, Norway
June 25-27, 1997
Program Committee
* Pieter Adriaans
* Attilio Giordana
* David Hand
* Bob Henery
* Mikhail Kiselev
* Willi Kloesgen
* Yves Kodratoff
* Jan Komorowski
* Heikki Mannila
* Marjorie Moulet
* Steve Muggleton
* Zdzislaw Pawlak
* Gregory Piatetsky-Shapiro
* Zbigniew Ras
* Lorenza Saitta
* Erik Sandewall
* Wei-Min Shen
* Arno Siebes
* Andrzej Skowron
* Derek Sleeman
* Shusaku Tsumoto
* Raul Valdes-Perez
* Rudiger Wirth
* Stefan Wrobel
* Wojtek Ziarko
* Jan Zytkow
Introduction
Data Mining and Knowledge Discovery (KDD) have recently emerged from a
combination of many research areas: databases, statistics, machine
learning, automated scientific discovery, inductive programming,
artificial intelligence, visualization, decision science, and high
performance computing.
While each of these areas can contribute in specific ways, KDD focuses
on the value that is added by creative combination of the contributing
areas. The goal of PKDD'97 is to provide a European-based forum for
interaction among all theoreticians and practitioners interested in
data mining. Fostering an interdisciplinary collaboration is one
desired outcome, but the main long-term focus is on theoretical
principles for the emerging discipline of KDD, especially those new
principles that go beyond each of the contributing areas.
To promote these goals, PKDD'97 will be organized into tracks around
the key areas contributing to KDD. For each area an ideal paper should
focus on how its methods advance KDD's goals and principles.
Both theoretical and applied submissions are sought. Reviewers will
assess the contribution towards the main goals of PKDD'97, in addition
to the usual requirements of novelty, clarity and significance.
Applied papers should go beyond an individual application, presenting
an explicit method that promises a degree of generality within some
stage of the discovery process, such as preprocessing, mining,
visualization, use of prior knowledge, knowledge refinement, and
evaluation. Theoretical papers should demonstrate how they advance the
process of data mining and knowledge discovery.
------------------------------------------------------------------
2.
------------------------------------------------------------------
**** C A L L   F O R   P A P E R S ****
TENTH INTERNATIONAL SYMPOSIUM ON
METHODOLOGIES FOR INTELLIGENT SYSTEMS (ISMIS'97)
Hilton Hotel, Charlotte, North Carolina
October 15-18, 1997
SPONSORS
UNC-Charlotte, Oak Ridge National Laboratory, Univ. of Warsaw, and others.
PURPOSE OF THE SYMPOSIUM
This Symposium is intended to attract individuals who are actively
engaged both in theoretical and practical aspects of intelligent systems.
The goal is to provide a platform for a useful exchange between
theoreticians and practitioners, and to foster the cross-fertilization
of ideas in the following areas:
* Evolutionary Computation
* Intelligent Information Systems
* Learning and Knowledge Discovery
* Knowledge Representation and Integration
* Logic for Artificial Intelligence
* Robotics, Motion and Machine Vision
* Soft Computing
* Methodologies (modeling, design, validation, performance evaluation).
In addition, we solicit papers dealing with Applications of Intelligent
Systems in complex/novel domains, e.g. human genome, global change,
manufacturing, health care, etc.
SYMPOSIUM CHAIRS
Francois G. Pin (Oak Ridge National Lab.)
Zbigniew W. Ras (UNC-Charlotte & Polish Acad. Sci.)
Andrzej Skowron (U. Warsaw, Poland)
PROGRAM COMMITTEE
Luigia Carlucci Aiello (U. Roma, Italy)
Thomas Baeck (Inf. Centrum Dortmund & U. Leiden, The Netherlands)
Alan Biermann (Duke Univ.)
Jacques Calmet (U. Karlsruhe, Germany)
Jaime Carbonell (CMU)
Wesley Chu (UCLA)
Kenneth DeJong (GMU)
Robert Demolombe (CERT/ONERA, France)
Jon Doyle (MIT)
Toshio Fukuda (Nagoya U., Japan)
Attilio Giordana (U. Torino, Italy)
Diana Gordon (Naval Research Lab.)
Mirsad Hadzikadic (Carolinas HealthCare System)
Jiawei Han (Simon Fraser U., Canada)
David Hislop (Army Research Office)
Matthias Jarke (RWTH Aachen, Germany)
John Y. Jiang (Pacific Bell Lab.)
Willi Kloesgen (GMD, Germany)
Yves Kodratoff (U. Paris VI, France)
Jan Komorowski (U. Trondheim, Norway)
Alberto Martelli (U. Torino, Italy)
Robert Meersman (U. Brussels, Belgium)
Zbigniew Michalewicz (UNC-Charlotte & Polish Acad. Sci.)
Ryszard Michalski (GMU & Polish Acad. Sci.)
Jack Minker (U. Maryland)
Ephraim Nissan (U. Greenwich, UK)
Lin Padgham (RMIT U., Australia)
Rohit Parikh (CUNY)
Lynne Parker (ORNL)
Gregory Piatetsky-Shapiro (GTE Lab.)
Henri Prade (U. Paul Sabatier, France)
Luc De Raedt (U. Leuven, Belgium)
Marek Rusinkiewicz (MCC)
Lorenza Saitta (U. Torino, Italy)
Erik Sandewall (Linkoping U., Sweden)
Yoav Shoham (Stanford U.)
Richmond Thomason (U. Pittsburgh)
Jing Xiao (UNCC)
Carlo Zaniolo (UCLA)
Gian Piero Zarri (CNRS, France)
Maria Zemankova (NSF)
Jan M. Zytkow (Wichita State U. & Polish Acad. Sci.)
INVITED TALKS
Alan Biermann (Duke Univ.)
"Multimedia Dialogue: Theory and Practice"
Jaime Carbonell (CMU)
"Automated Text Summarization" or "Learning from the WEB"
Wesley Chu (UCLA)
"A knowledge-based multimedia medical distributed database system"
Michael Lowry (NASA Ames)
"V&V of AI systems that control deep-space spacecraft"
Gregory Piatetsky-Shapiro (GTE Lab.)
"Data Mining and Knowledge Discovery: The Second Generation"
Gio Wiederhold (Stanford U.)
"Achieving scalability through an Ontology Algebra"
ORGANIZING COMMITTEE
Brian Bachman (First Union)
Mirsad Hadzikadic (Carolinas HealthCare System)
Karen Harber (ORNL)
Mieczyslaw Klopotek (Polish Acad. Sci.)
M.S. Narasimha (IBM-Charlotte)
Zbigniew W. Ras (UNC-Charlotte)
PAPER SUBMISSION
Authors are invited to submit four copies of their manuscript
(maximum 12 pages) to one of the addresses below:
Papers from US and Canada:
Francois G. Pin, ISMIS'97
ORNL, Bldg. 7601, M.S. 6305
P.O. Box 2008
Oak Ridge, TN 37831-6305
e-mail: [email protected]
fax: 423-574-4624
tel: 423-574-6130
Papers from Europe:
Andrzej Skowron, ISMIS'97
Univ. of Warsaw
Dept. of Mathematics
Banacha 2
PL-02-097 Warsaw, POLAND
e-mail: [email protected]
tel: 48-(22)-658-3449
All other papers:
Zbigniew W. Ras, ISMIS'97
Univ. of North Carolina
Dept. of Comp. Science
Charlotte, N.C. 28223
e-mail: [email protected]
fax: 704-547-3516
tel: 704-547-4567
Submissions should include a title page (1 copy) specifying the
title, all authors with their affiliations, abstract (100-200 words),
up to 10 keywords (begin the keyword list with at least one of the
ISMIS areas listed above); and the preferred address of the contact
author, including a telephone number, fax number, and e-mail address
(if available). The remainder of the paper can include up to 11 pages,
attached to the title page.
If possible, the title page should be ADDITIONALLY submitted via email
(in plain text) to <[email protected]> to facilitate submissions processing.
IMPORTANT DATES
Submission of Papers: March 1, 1997
Acceptance Notification: May 25, 1997
Final Paper: July 1, 1997
PUBLICATION
Papers accepted for Regular Sessions will be published by
Springer-Verlag in LNCS/LNAI.
Poster Session proceedings will be published by Oak Ridge
National Laboratory.
Both proceedings will be available at the symposium.
WWW
Information about ISMIS'97 can be found on
http://www.ipipan.waw.pl/~klopotek/ismis97.html
------------------------------------------------------------------
3.
------------------------------------------------------------------
The Third International Conference on
Knowledge Discovery and Data Mining (KDD-97)
August 14-17, 1997
Newport Beach, California, U.S.A.
Sponsored by the American Association for Artificial Intelligence
----------------------------------------------------------------------------
Call for Papers
The rapid growth of data and information has created a need and
an opportunity for extracting knowledge from databases, and both
researchers and application developers have been responding to that need.
Knowledge discovery in databases (KDD), also referred to as data mining, is
an area of common interest to researchers in machine discovery, statistics,
databases, knowledge acquisition, machine learning, data visualization, high
performance computing, and knowledge-based systems. KDD applications have
been developed for astronomy, biology, finance, insurance, marketing,
medicine, and many other fields.
The third international conference on knowledge discovery and
data mining (KDD-97) will follow up the success of KDD-95 and KDD-96
by bringing together researchers and application developers from
different areas focusing on unifying themes.
Suggested Topics
The topics of interest include, but are not limited to:
Theory and Foundational Issues in KDD
* Data and knowledge representation for KDD
* Probabilistic modeling and uncertainty management in KDD
* Modeling of structured, unstructured and multimedia data
* Fundamental advances in search, retrieval, and discovery methods
* Definitions, formalisms, and theoretical issues in KDD
Data Mining Methods and Algorithms
* Algorithmic complexity, efficiency and scalability issues in data
mining
* Probabilistic and statistical models and methods
* Using prior domain knowledge and re-use of discovered knowledge
* Parallel and distributed data mining techniques
* High dimensional datasets and data preprocessing
* Unsupervised discovery and predictive modeling
KDD Process and Human Interaction
* Models of the KDD process
* Methods for evaluating subjective relevance and utility
* Data and knowledge visualization
* Interactive data exploration and discovery
* Privacy and security
Applications
* Data mining systems and data mining tools
* Application of KDD in business, science, medicine and engineering
* Application of KDD methods for mining knowledge in text, image, audio,
sensor, numeric, categorical or mixed format data
* Resource and knowledge discovery using the Internet
This list of topics is not intended to be exhaustive but an indication of
typical topics of interest. Prospective authors are encouraged to submit
papers on any topics of relevance to knowledge discovery and data mining.
Demonstration Sessions
KDD-97 also invites working demonstrations of discovery systems.
Contact information for details is provided below.
Submission and Review Criteria
Both research and applications papers are solicited. All submitted papers
will be reviewed on the basis of technical quality, relevance to KDD,
novelty, significance, and clarity. Authors are encouraged to make their
work accessible to readers from other disciplines by including a carefully
written introduction. Papers should clearly state their relevance to KDD.
Please submit 7 hardcopies of a short paper (a maximum of 9
single-spaced pages not including cover page and bibliography, 1 inch
margins, and 12pt font) to be received by March 10, 1997. A cover
page must include the full address and email of the author(s), the paper
title, a 200-word abstract, and up to 5 keywords. This cover page must accompany
the paper. In addition, an ascii version of the cover page must be
submitted electronically by March 3, 1997 (earlier if possible),
preferably using a WWW form located at
http://www-aig.jpl.nasa.gov/kdd97/. If the WWW form cannot be used,
please submit the ascii cover page by email to
[email protected], using the template available by ftp at
http://www-aig.jpl.nasa.gov/kdd97/.
Please mail the 7 hardcopies of the full papers to:
AAAI (KDD-97)
445 Burgess Drive
Menlo Park, CA 94025-3496 USA
Phone: (+1 415) 328-3123
Fax: (+1 415) 321-4457
Email: [email protected]
Web Site: http://www.aaai.org.
Important Dates
* Submissions Due: March 10, 1997
* Acceptance Notice: April 28, 1997
* Camera-ready paper due: May 26, 1997
KDD-97 Organization
-------------------
General Conference Chair
Ramasamy Uthurusamy (General Motors Corporation, USA)
Program Co-Chairs
David Heckerman (Microsoft Research, USA)
Heikki Mannila (University of Helsinki, Finland)
Daryl Pregibon (AT&T Research, USA)
Publicity Chair
Paul Stolorz (Jet Propulsion Laboratory, USA)
Tutorial Chair
Padhraic Smyth (UC Irvine, USA)
Demo and Poster Sessions Chair
Tej Anand (NCR Corporation, USA)
Awards Chair
Gregory Piatetsky-Shapiro (GTE Laboratories, USA)
Panel Chair
Willi Kloesgen
Contact Information
-------------------
For further information, send inquiries regarding
* submission logistics to AAAI at [email protected]
Phone: (+1 415) 328-3123
Fax: (+1 415) 321-4457
* KDD-97 sponsorship and industry participation to
Ramasamy Uthurusamy [email protected]
Phone: 810-696-0669
Fax: 810-696-0580
* technical program and content to [email protected]
* demo and poster sessions to [email protected]
* general and publicity issues to [email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
410.15 | 97:06 | IJSAPL::OLTHOF | Spellchecked Henry Although | Wed Feb 12 1997 22:35 | 710 |
| Knowledge Discovery Nuggets 97:06, e-mailed 97-02-12
News:
* E. Colet, ESPN to regularly show the application of data mining
http://www.nba.com/allstar97/asgame/beyond.html
* K. Parsaye, IDI Press Release: "Bridge Between OLAP and Data Mining"
Publications:
* R. Greiner, CLNL 4: Computational Learning Theory and Natural
Learning Systems, v. IV: Making Learning Systems Practical,
http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html
* R. Kohavi, MLJ Spec Issue on Applications of Machine Learning
and the Knowledge Discovery Process, deadline: March 4.
http://reality.sgi.com/ronnyk/mljapps/
Positions:
* H. Mannila, Postdoctoral position in data mining /
pattern matching / spatial data,
http://www.cs.helsinki.fi/~mannila
* F. Provost, KB system developer positions at NYNEX
Science and Technology
Meetings:
* S. Cartmell, PADD 97 update --
http://www.demon.co.uk/ar/PADD97/
* B. Zupan, IDAMAP-97: Reminder and brief Second CFP
* G. Widmer, ECML'97 - Papers & Registration Info
http://is.vse.cz/ecml97/home.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the Data
Mining and Knowledge Discovery in Databases (KDD) community, focusing on
the latest research and applications.
Submissions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, email to [email protected] a message with
subscribe kdd-nuggets
in the first line (the rest of the message and subject are ignored).
See http://info.gte.com/~kdd/subscribe.html for details.
Nuggets frequency is approximately 3-4 times a month.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site, http://info.gte.com/~kdd
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Arguing with engineers is like mud-wrestling with pigs.
Sooner or later you'll realize that they like it.
Thanks to Tom Lanning
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Edward Colet"<[email protected]>
Date: Tue, 11 Feb 1997 16:11:17 -0400
Subject: "ESPN to regularly show the application of data mining"
On Sunday mornings from 9:00-9:30 (EST), ESPN will regularly broadcast
a show called "NBA Matchups presented by IBM". The show will feature
in-depth analysis of player and team match-ups based on trends and patterns
found by Advanced Scout that pertain to the National Basketball Association
(NBA) game of the week. The game of the week is aired later that afternoon
on NBC. Bob Hill (former coach of the San Antonio Spurs), Fred "Mad Dog"
Carter and Mark Jones (both of ESPN) are the hosts, and an invited guest
will round out the panel (last week's guest was Red Auerbach). As some of you
may know, several NBA coaches have been using IBM's Advanced Scout data
mining application to discover trends and patterns in game data. Advanced
Scout is also the basis for the "Beyond the Box Score" feature on the
NBA website (www.nba.com; look under "News and Features"
if you don't see it on the home page).
Thanks,
Ed Colet.
*********************************************
IBM T.J. Watson Research Center
30 Saw Mill River Road
Hawthorne NY 10532
phone: 914-784-6621; tie-line 863
fax: 914-784-7455
email: [email protected]
*********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 5 Feb 1997 10:10:31 -0800
From: [email protected] (IDI)
Subject: OLAP & DM Press Release
************************************************************************
Special Release
CONTACT: IDI MARKETING COMMUNICATIONS
(310) 937-3600
Breakthrough Merges
OLAP and DataMining
The Bridge Between OLAP and Data Mining
Impacts all Corporate Decision Support Plans
Los Angeles -- January 27, 1997
The 2nd Annual Data Mining Summit in San Francisco, California on February
19, 1997 is likely to be remembered as the event in which On Line Analytical
Processing (OLAP) and datamining came together for the first time and took
uniform shape.
Up until now, most corporations had considered data mining and OLAP as
individual and disparate components of their decision support system,
because no coherent theory and methodology existed for a relationship. The
1997 Data Mining Summit will bridge this gap and will forever change the way
corporations view and use decision support systems.
At the Keynote Address for the Summit, Dr. Kamran Parsaye, CEO of
Information Discovery, Inc. will introduce a fundamentally new theory and
methodology for connecting OLAP and data mining, showing that they must be
merged in order to avoid incorrect and misleading results during data analysis.
"The bridge between OLAP and data mining is not a luxury but a necessity,"
said Dr. Parsaye. "OLAP analyses and datamining need to be performed
together if we are to trust the results from either" he added. "In the early
days of relational databases, before normalization theory was introduced,
people were getting incorrect results. Now, unless OLAP and data mining are
performed together a similar situation can prevail" he said.
The keynote address will show that whenever data analysis takes place, it
happens within some "dimension", and datamining along a single axis is
merely a rough approximation of multi-dimensional mining. Lack of attention
to dimensionality in data mining can result in unexpected results. And,
decision support errors can take a long time to be uncovered -- if ever. A
companion paper in the February issue of Database Programming and Design
magazine details examples of this phenomenon and outlines a uniform approach
for dealing with both OLAP and datamining.
At the keynote, Dr. Parsaye will also describe how OLAP and data mining fit
together in the context of the Four Spaces of Decision Support. This
methodology for applying OLAP data mining has three distinct processes of
episodic, strategic and continuous mining for specific user groups within
corporate environments.
"Integration between OLAP and data mining can not take place at the desktop
level and must be performed on the server" said Dr. Parsaye. "IS departments
that hand their users OLAP data to be mined on the desktop could be
unknowingly getting their users into serious trouble" he said.
The impact of the new result on corporate planning for decision support and
data warehousing can be significant. Business users and IS departments can
no longer just consider an OLAP product and a separate data mining system
but will need to consider both at once to avoid the pitfalls outlined in the
keynote. This will also accelerate the use of products for both OLAP and
data mining.
For more information on the DataMining Summit please visit
http://www.dbsummit.com on the internet, or call (415) 905 2267. For more
information on Information Discovery, Inc. please visit
http://www.datamining.com on the internet.
[note: any comments from readers on appropriateness of posting
commercial press releases such as above? GPS]
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 18:56:37 -0500 (EST)
From: Russell Greiner <[email protected]>
To: [email protected], [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: CLNL v4 is here!
CC: [email protected], [email protected]
We are pleased to announce that the book
"Computational Learning Theory and Natural Learning Systems
Volume IV: Making Learning Systems Practical"
(ed. Russell Greiner, Thomas Petsche, and Stephen Jose Hanson)
is now available from MIT Press; see
http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html
for details.
Cheers,
Russ Greiner
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 4 Feb 1997 23:23:12 -0800
From: Ronny Kohavi <[email protected]>
Subject: CFP: Special Machine Learning issue on applications of ML
This is a short reminder that the submission deadline for the special
issue of Machine Learning is in a few weeks. For more information, see
http://reality.sgi.com/ronnyk/mljapps/
*** Submission deadline: 4 Mar 1997
_____________________________________________________________________________
Machine Learning
Special Issue on
Applications of Machine Learning
and the Knowledge Discovery Process
Guest editors: Ronny Kohavi and Foster Provost
With the explosion in size of business and scientific databases
(VLDBs), the opportunities and pressure to mine the data and make
novel discoveries have increased dramatically. For many problems,
basic statistical summaries are not sufficient and there is a clear
and recognized need for solutions involving a machine learning
component. For example, modern businesses constantly seek to gain
competitive advantage by tailoring actions to different customer
segments and avoiding the trap of targeting the "average customer."
This special issue of the journal Machine Learning will be dedicated
to papers describing work in which machine learning technologies have
been applied to solve significant real-world problems. In particular,
it will focus on the application of Machine Learning technology, the
simplifying assumptions that *cannot* be made in a real-world
application, and the processes that are involved in going from the raw
data to the final knowledge that decision makers seek.
_____________________________________________________________________________
Ronny Kohavi and Foster Provost
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Wed, 12 Feb 1997 09:58:29 +0500
Subject: 3 positions at NYNEX S&T
KB system developer positions at NYNEX Science and Technology
The Integrated Network Services Testing & Analysis (INSTA) group at
NYNEX Science & Technology has three openings for knowledge-based (KB)
diagnostic system developers. The group is involved in building
monitoring, testing and diagnostic systems using state of the art AI
technologies for advanced Telecom networks and circuits. The group has
been building systems that support complete testing and diagnosis of
circuits, both from the central office and in the field. Systems
already built and deployed test and diagnose residential telephone
lines and some of the business services. In addition to these, the
group is currently looking at ISDN and broadband services.
The selected candidate would work on one or more of the following
projects:
- Building a KB system to assist field technicians in testing and
troubleshooting faults in telecom circuits. The candidate
will also explore complementing this KB with the KB performing
centralized testing from the Central Office.
- Building a KB system for automated centralized testing and diagnosis of
Special (Business) service circuits.
- Building monitoring, testing, and diagnostic systems for broadband
circuits.
- Building an intelligent interactive assistant to aid testers in
testing and diagnosing circuits.
Suitable candidates must have the following:
===========================================
- Background in Computer Science or Computer Engineering or Electrical
Engineering.
- Experience in all aspects of building knowledge-based systems
including knowledge acquisition, knowledge engineering, domain and
task modeling, testing, validation, and evaluation of the
knowledge-based systems.
- Good understanding of various AI techniques such as model-based
reasoning, case-based reasoning, neural nets, and machine learning.
- Good analytical skills.
- Quick learner - to quickly acquire relevant domain knowledge.
- Good system building experience
Experience with the following would be a plus:
===================================================================
- Knowledge of data-analysis tools (e.g., statistical tools)
- Unix, C, C++, LISP, ARTIM, CLIPS...
- Distributed client-server architectures
- Databases, data warehousing
- Telecom experience: Operation Support Systems, Residential Lines,
Special services, broadband services, telecom network and circuit
testing, alarm monitoring, etc.
If interested, please mail a hard copy of your resume to:
Yuling Wu
NYNEX Science & Technology
400 Westchester Av.
White Plains, NY 10604
or email the postscript version to:
[email protected]
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 7 Feb 1997 14:21:24 +0200 (EET)
From: Heikki Mannila <[email protected]>
Subject: Postdoc position in Helsinki: data mining / pattern matching
/ spatial data
Postdoctoral position in
data mining / pattern matching / spatial data
University of Helsinki
Department of Computer Science
The pattern matching and data mining group in the Department of Computer
Science, University of Helsinki, has an opening for a postdoc researcher
in the areas of data mining, pattern matching, or spatial data.
The research group combines methods from pattern matching, statistics,
and databases to develop methods for the analysis of large data sets.
The group does theoretical and applied research. Currently, special
emphasis is given to work related to bioinformatics and geoinformatics.
The group is one of the leading ones in data mining and string matching.
For further information, see
http://www.cs.helsinki.fi/~mannila
http://www.cs.helsinki.fi/~ukkonen
Applicants should have a recent Ph.D. or equivalent. The appointment
is initially for one year, starting from September 1997.
Applications should contain a curriculum vitae, a list of three referees
and a letter addressing the applicant's suitability for the position.
Applications and inquiries may be submitted by email to
[email protected] or [email protected]
before February 28, 1997.
Heikki Mannila Esko Ukkonen
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 5 Feb 1997 18:21:51 +0000
From: Steve Cartmell <[email protected]>
Subject: PA EXPO97 UPDATE
PRACTICAL APPLICATION EXPO97
==============================
CONFERENCE UPDATE
===================
Westminster Central Hall, London, 21-25 April, 1997
The Practical Application EXPO97 brings together four events under one
roof: PAAM97 - The Practical Application of Intelligent Agents and
Multi-Agents; PADD97 - The Practical Application of Knowledge Discovery and
Data Mining; PACT97 - The Practical Application of Constraint Technology; and
PAP97 - The Practical Application of Prolog.
PLEASE VISIT OUR RECENTLY UPDATED WEB PAGES FOR FURTHER INFORMATION ON
Tutorials
Invited Talks
Exhibition
Venue
Hotel reservations
Registration
http://www.demon.co.uk/ar/Expo97/
http://www.demon.co.uk/ar/PAP97/
http://www.demon.co.uk/ar/PACT97/
http://www.demon.co.uk/ar/PAAM97/
http://www.demon.co.uk/ar/PADD97/
The Practical Application Company
PO Box 137
Blackpool
Lancs FY2 9UN
UK
Tel: +44 (0)1253 358081
Fax: +44 (0)1253 353811
email: [email protected]
WWW: http://www.demon.co.uk/ar/TPAC/
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Blaz Zupan <[email protected]>
Subject: IDAMAP-97: Reminder and brief Second CFP
Date: Wed, 5 Feb 1997 10:35:20 +0100 (MET)
Reminder and brief Second Call for Papers for
IDAMAP-97
INTELLIGENT DATA ANALYSIS IN MEDICINE AND PHARMACOLOGY
Saturday, August 23, 1997
Workshop W15 at IJCAI-97
August 23-29, 1997, Nagoya, Japan
Paper submission deadline is March 3, 1997. Submit 8-12 page papers by
e-mail (postscript) and 3 hard-copies by surface mail to:
Nada Lavrac, Blaz Zupan
J. Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
email: [email protected]
For up-to-date workshop information please check:
http://www-ai.ijs.si/ailab/activities/idamap97.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 10 Feb 1997 18:00:38 +0100 (MET)
From: Gerhard Widmer <[email protected]>
Subject: ECML'97 - Papers & Registration Info
-------------------------------------------------------------------------
NINTH EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97)
Prague, Czech Republic, April 23-26 1997
******************************************************
ECML'97: LIST OF ACCEPTED PAPERS and REGISTRATION INFO
******************************************************
-------------------------------------------------------------------------
The list of accepted papers, INCLUDING ALL ABSTRACTS, is now
available from the ECML-97 WWW home page:
http://is.vse.cz/ecml97/home.html
This page also gives access to
- the 4 post-conference ECML/MLNet WORKSHOPS and
- ECML-97 REGISTRATION INFORMATION and the ECML REGISTRATION FORM.
- A preliminary version of the CONFERENCE PROGRAMME will be available soon.
For further questions about the program, contact Gerhard Widmer
at [email protected]; for questions regarding registration,
contact the local organizers at [email protected].
For those without access to the WWW, please find below
- titles and contact addresses for the 4 MLNet workshops,
- the list of papers (w/o abstracts),
- an ascii version of the registration form.
------------------------------------------------------------------
ECML / MLNet WORKSHOPS (Saturday, April 26):
WS 1: Data-Driven Learning of Natural Language Processing Tasks
Contact: Walter Daelemans,
P.O. BOX 90153, NL-5000 LE Tilburg, The Netherlands.
Tel: +31 13 4663070, Fax: +31 13 4663110,
E-mail: [email protected]
WS1 WWW Page: http://www.cs.unimaas.nl/ecml97/
WS 2: Case-Based Learning: Beyond Classification of Feature Vectors
Contact: Dietrich Wettschereck,
GMD, FIT.KI, Schloss Birlinghoven,
53754 Sankt Augustin, Germany
Tel: +49-2241-14-2097, Fax: +49-2241-14-2072,
E-mail: [email protected]
WS2 WWW Page: http://nathan.gmd.de/persons/dietrich.wettschereck/ecmlws.html
WS 3: Learning in Dynamically Changing Domains:
Theory Revision and Context Dependence Issues
Contact: Gholamreza Nakhaeizadeh,
Research Center of Daimler-Benz AG, Ulm, Germany
E-mail: [email protected]
WS3 WWW Page: http://www.amsta.leeds.ac.uk/statistics/ecml97/dyn.htm
WS 4: Machine Learning and Human-Agent Interaction
Contact: Michael Kaiser,
Institute for Real-Time Computer Systems & Robotics
University of Karlsruhe, Kaiserstrasse 12,
D-76128 Karlsruhe, Germany
E-Mail: [email protected]
WS4 WWW Page: http://wwwipr.ira.uka.de/events/hai97/
Common dates for all workshops:
Deadline for submissions: February 15
Notification of acceptance: March 8
Camera-ready copy due: April 1
------------------------------------------------------------------
PAPERS ACCEPTED FOR PRESENTATION AT ECML'97:
INVITED TALKS / PAPERS:
Learning Complex Probabilistic Models (tentative title)
Stuart J. Russell, University of California, Berkeley, USA
Constructing and Sharing Perceptual Distinctions
Luc Steels, Free University of Brussels (VUB) and
Sony Computer Science Laboratory, Paris
On Prediction by Data Compression
Paul Vitanyi, CWI, Amsterdam
Ming Li, City University of Hong Kong
LONG TALKS/PAPERS:
Induction of Feature Terms with INDIE
Eva Armengol & Enric Plaza, IIIA, Barcelona, Spain
Integrated Learning and Planning Based on Truncating Temporal Differences
Pawel Cichosz, Warsaw University of Technology, Warsaw, Poland
Theta-subsumption for Structural Matching
Luc De Raedt, Katholieke Universiteit Leuven, Belgium
Peter Idestam-Almquist, Stockholm University, Sweden
Gunther Sablon, Katholieke Universiteit Leuven, Belgium
Constructing Intermediate Concepts by Decomposition of Real Functions
Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko
University of Ljubljana and Jozef Stefan Institute, Ljubljana, Slovenia
Conditions for Occam's Razor Applicability and Noise Elimination
Dragan Gamberger, Rudjer Boskovic Institute, Zagreb, Croatia
Nada Lavrac, Jozef Stefan Institute, Ljubljana, Slovenia
Learning Different Types of New Attributes by Combining
the Neural Network and Iterative Attribute Construction
Yuh-Jyh Hu, University of California, Irvine, USA
Finite-Element Methods with Local Triangulation Refinement
for Continuous Reinforcement Learning Problems
Remi Munos, CEMAGREF, Antony, France
Compression-based Pruning of Decision Lists
Bernhard Pfahringer, University of Waikato, New Zealand
NeuroLinear: A System for Extracting Oblique Decision Rules
from Neural Networks
Rudy Setiono & Huan Liu, National University of Singapore
Model Combination in the Multiple-data-batches Scenario
Kai Ming Ting, University of Waikato, New Zealand
Boon Toh Low, Chinese University of Hong Kong
Natural Ideal Operators in Inductive Logic Programming
Fabien Torre & Celine Rouveirol, LRI, Paris, France
Ibots Learn Genuine Team Solutions
Cristina Versino & Luca Maria Gambardella, IDSIA, Switzerland
Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
Ricardo Vilalta, Gunnar Blix, Larry Rendell,
University of Illinois at Urbana-Champaign, USA
SHORT TALKS/PAPERS:
Exploiting Qualitative Knowledge to Enhance Skill Acquisition
Cristina Baroglio, Universita di Torino, Italy
Classification by Voting Feature Intervals
Gülsen Demiröz & H. Altay Güvenir,
Bilkent University, Ankara, Turkey
Metrics on Terms and Clauses
Alan Hutchinson, King's College, London, UK
Learning When Negative Examples Abound
Miroslav Kubat, Robert Holte, Stan Matwin, University of Ottawa, Canada
A Model for Generalization based on Confirmatory Induction
Nicolas Lachiche, INRIA Lorraine, France
Pierre Marquis, Universite d'Artois, France
Learning Linear Constraints in Inductive Logic Programming
Lionel Martin & Christel Vrain, Universite d'Orleans, France
Inductive Genetic Programming with Decision Trees
Nikolay Nikolaev, American University in Bulgaria
Vanyo Slavov, New Bulgarian University, Sofia, Bulgaria
Parallel and Distributed Search for Structure in Multivariate Time Series
Tim Oates, Matthew Schmill, Paul Cohen
University of Massachusetts, Amherst, USA
Probabilistic Incremental Program Evolution: Stochastic Search
Through Program Space
Rafal Salustowicz & Jürgen Schmidhuber, IDSIA, Switzerland
The GRG Knowledge Discovery System:
Design Principles and Architectural Overview
Ning Shan, Macro International Inc., Calverton, MD, USA
Howard Hamilton & Nick Cercone, University of Regina, Canada
Learning and Exploitation do not Conflict under Minimax Optimality
Csaba Szepesvari, University of Szeged, Hungary
Search-based Class Discretization
Luis Torgo & Joao Gama, University of Porto, Portugal
A Case Study in Loyalty and Satisfaction Research
Koen Vanhoof, Josee Bloemer, K. Pauwels
Limburgs Universitair Centrum, Belgium
---------------------------------------------------------------------
REGISTRATION FORM - ECML 97
(The deadline: March 25, 1997)
TO BE FAXED TO (42-2) 6731 0503 OR MAILED TO:
    Action M Agency,
    Vrsovicka 68
    101 00 - Praha 10
    Czech Republic
(Please note that after March 1, 1997, the country number (42)
will be changed to (420).)
FILL IN CAPITAL LETTERS, PLEASE
last name: first name:
Prof./Dr./Mr./Ms. affiliation:
university/dept.:
street: town:
Code: country:
phone: fax:
e-mail:
name of accompanying person(s):
date (time) of arrival: date of departure:
number of nights:
I will attend workshop: 1. 2. 3. 4. (tick, please)
ACCOMMODATION: Krystal Hotel (Conference site)
an individual choice up to price per night:
Room: single double NAME OF PERSON SHARING THE ROOM:
special needs (vegetarian, disabled, etc.):
CONFERENCE FEES:                  BEFORE / AFTER FEBRUARY 20, 1997
CONFERENCE FEE (APRIL 23-25)      DM 270.00 / 320.00
MLNet WORKSHOP FEE (APRIL 26)     DM  35.00 /  35.00
ACCOMPANYING PERSON FEE           DM  80.00 / 100.00
ACCOMMODATION DEPOSIT:            DM 150.00
ACCOMMODATION BALANCE:
(NUMBER OF NIGHTS MINUS THE DEPOSIT)
SOCIAL PROGRAM:
SIGHTSEEING TOUR OF PRAGUE        DM  25.00
TA FANTASTIKA THEATRE             DM  27.00
TRIP & FAREWELL PARTY             DM  65.00
TOTAL AMOUNT:
PAYMENT BY CREDIT CARD:
AMEX   VISA   Master Card / Eurocard   JCB   Diners Club
Number:
Expire:   /      Four-digit code (for AMEX cards only):   / / / /
I, the undersigned, authorize the Action M Agency to withdraw from
my account the equivalent in Czech Crowns of
the total amount of DM                          Your Signature
I agree to the withdrawal from my credit card of
the accommodation balance (after March 25)      Your Signature
PAYMENT BY BANK TRANSFER:
Name of the bank
Date of payment Your Signature
|
410.16 | 9707 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sat Mar 01 1997 14:29 | 1415 |
| Knowledge Discovery Nuggets 97:07, e-mailed 97-02-24
Publications:
* GPS, Review of Adv. KDDM in NeuroVe$t journal
Siftware:
* R. Kohavi, SGI MineSet Available for Varsity Members
http://www.sgi.com/Products/software/MineSet
Positions:
* T. Gutschow, Data Mining Research Position at HNC Software Inc.
* C. Shearer, Vacancies - Data Mining Tool Development & Consulting :
UK & US, at ISL
* W. Zhang, Job: Machine Learning at Boeing
Meetings:
* M. P. Singh, 2nd CFP: Workshop on Agent Theories, Architectures,
and Languages (ATAL), Providence, RI, July 24-26, 1997
http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/
* H. M. Chung, CFP: track on Data Mining at AIS-97,
Indianapolis, Indiana, August 15-17, 1997
http://hsb.baylor.edu/ramsower/ais.ac.97
* L. DeRaedt, CFP: IJCAI-97 workshop on Frontiers of Inductive
Logic Programming, 25 August 1997
* M. Manago, 2-day course on Data Mining & CBR in San Francisco for
U. of Berkeley Extension, March 24-25, 1997
* M. Manago, Tutorial + Seminar on CBR & Data Mining,
London, 17-19 March 1997,
http://www.unicom.co.uk
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the latest
research and applications. Submissions should be emailed,
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, email to [email protected] a message with
subscribe kdd-nuggets
in the first line (the rest of the message and subject are ignored).
See http://info.gte.com/~kdd/subscribe.html for details.
Nuggets frequency is approximately 3 times a month.
Back issues of KD Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd/
-- Gregory Piatetsky-Shapiro (editor)
(p.s. this is my last week at GTE.
Starting today, I can be reached at [email protected] .
After March 1, 1997 I will continue to edit and distribute KD Nuggets
and maintain KD Mine pages at a new web site -- details to be announced soon!
The [email protected] and [email protected] email addresses would still
work for a while. GPS)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: What is the link between a large number of meetings and a
large number of job announcements?
A: Somebody got to work, while all those other people go to meetings
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 16 Feb 1997 12:20:06 -0500
From: gps0 (Gregory Piatetsky-Shapiro)
Subject: NeuroVe$t journal and Data Mining for Financial Applications
Here, reprinted with permission, is the review of the AKDDM book from
***
NeuroVe$t Journal, Jan/Feb 1997, pg. 49, Reviews in Brief section -
Advances in Knowledge Discovery and Data Mining
Advances in Knowledge Discovery and Data Mining (AKDDM) provides
a well-edited collection of material from the 1994 KDD (Knowledge
Discovery in Databases) Workshop, and several additional invited papers.
In all, 23 papers presented in 7 chapters are included along with a
useful appendix on KDD terminology and resources on the Internet.
Coupled with an extensive index and a very good job of editing, AKDDM
makes for a very accessible and worthwhile collection of papers.
Of particular interest to investors and traders, especially those
using data-driven computer technologies, are "A Statistical Perspective
on Knowledge Discovery in Databases" by Elder and Pregibon, which
provides a very good introduction to the topics. In "Finding Patterns in
Time Series", Berndt and Clifford include in their studies a look at
various technical analysis patterns of daily DJIA prices from 1989 to
1993, using pattern templates that vary in length from 9 to 12 trading
days. "Integrating Inductive and Deductive Reasoning for Data Mining" by
Simoudis, Livezey and Kerber involves the creation of portfolios of 100
stocks from 7 years of data on 1500 stocks. "Predicting Equity Returns
from Securities Data with Minimal Rule Generation" by Apte and Hong
describes a minimal rule generation technique for forecasting 1-month S&P
500 returns using 40 fundamental and technical variables (not
specifically identified).
Unfortunately, there is scant mention of the specifics of rough
sets, nearest neighbor classifiers, learning vector quantizers,
self-organizing maps, fuzzy logic and other tools of interest to
practitioners and applied researchers working in the field. And, on more
than a couple of occasions, the authors (including the editors) appear to
venture beyond their respective areas of expertise. However, the few
shortcomings are overshadowed by several very good introductory studies.
Seldom do I recommend collections of workshop or conference papers
to the general audience. However, AKDDM represents an exception.
Despite its weaknesses, it provides a valuable introduction to a
relatively new, yet increasingly important area of applied research.
Financial practitioners who are particularly interested in data mining
will certainly want to take a look.
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and
Ramasamy Uthurusamy (editors). 1996. The MIT Press, 55 Hayward Street,
Cambridge, MA 02142. 620 pages. US$50. ISBN 0-262-56097-6.
617-253-5643. -- James Hampton
***
(c) Copyright 1997 Finance & Technology Publishing,
P.O. Box 764, Haymarket, VA 20168. Reprinted
with permission of the publisher from NeuroVe$t Journal, Jan/Feb 1997.
Details on NeuroVe$t Journal (now named J. of Computational Intelligence
in Finance) are at http://ourworld.compuserve.com/homepages/ftpub
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 15 Feb 1997 12:14:00 -0800
From: Ronny Kohavi <[email protected]>
Subject: SGI MineSet Available for Varsity Members
Reply-to: [email protected]
Silicon Graphics' MineSet
Available to Varsity Members
----------------------------
MineSet(TM) version 1.1 is the second release of SGI's product for
data mining and exploratory data analysis. MineSet integrates tools
for data access, transformations, analytical data mining, and visual
data mining. See http://www.sgi.com/Products/software/MineSet for
more information.
In addition to 30-day free evaluation copies available to any site,
with the new release of SGI's Varsity program CDs (happening now),
varsity members can get PERMANENT MineSet licenses.
Any educational institution is eligible. To qualify, the institution
must have an infrastructure capable of handling technical software
support for its Silicon Graphics users who have purchased Varsity
Program software packages. THE VARSITY PROGRAM AGREEMENT MUST BE
COMPLETED AND SIGNED BY THE INSTITUTION AND APPROVED BY SILICON
GRAPHICS.
The institution buys the right to distribute Varsity Program Developer
Package right-to-use licenses in multiples of 10 or 25. These licenses
are maintained by purchasing yearly support. Thus, the cost of
ownership is significantly reduced in the second year and beyond.
How Does this Work
------------------
SGI Varsity sites will get Varsity CD-ROMs with MineSet or they can
download it directly from
http://www.sgi.com/Products/Evaluation/evaluation.html
To get a permanent license, the site administrator can use the VPX
(varsity ID) number to get a license from
http://www.sgi.com/Products/license.html (click the radio
button for varsity).
See http://www.sgi.com/silicon_campus/varsity.html for
more information about SGI's Varsity program.
For questions about MineSet, send e-mail to [email protected]
or visit our site at: http://www.sgi.com/Products/software/MineSet
--
Ronny Kohavi ([email protected])
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Gutschow, Todd" <[email protected]>
Subject: Data Mining Research Position at HNC Software Inc.
Date: Wed, 12 Feb 1997 17:51:58 -0800
The Technology Development Group at HNC Software Incorporated has an
opening for a Manager of Data Mining Technology Research. The Technology
Development Group is responsible for the core data analysis, data
mining, and data modeling technology used in all HNC vertical solution
products. The position will report to the Vice President of Technology
Development and will be located at HNC's headquarters facility in
San Diego, CA.
Duties/Job Description:
Conduct research into new data mining algorithms in support of
the Database Mining Marksman and other HNC products. Identify and
coordinate data mining technology projects across all HNC operating
groups. Monitor the data mining research literature to identify
promising new techniques. Support product development and marketing
activities via customer presentations, conference talks, and white
papers.
Required Qualifications (Experience/Skills):
MS or Ph.D. in computer science, engineering, mathematics or other
hard science (e.g., physics, chemistry, etc.). Five or more years
experience in implementing and evaluating new statistical data analysis, neural
networks, and/or data mining algorithms. Good software development
skills. Experience with modern software development processes and
tools (e.g., C++, Object oriented design, etc.). Strong communication and
presentation skills.
Preferred Qualifications (Experience/Skills):
Strong algorithm diagnosis and troubleshooting skills. Experience with
database marketing and its associated data analysis problems. Project
management experience.
If you know someone with the above qualifications who is interested in
employment opportunities with HNC, please ask them to fax, mail or
e-mail resumes immediately to:
Human Resources Department
HNC Software Inc.
5930 Cornerstone Court West
San Diego, CA 92121
FAX: (619) 452-6524
E-mail: [email protected]
Reference Job No. 293
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Colin Shearer <[email protected]>
Date: Thu, 13 Feb 97 14:36:13 GMT
Subject: VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US
VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US
===============================================================
Integral Solutions Limited (ISL) is a leading supplier of advanced decision
support technology, specialising in data mining.
Our award-winning Clementine tool combines multiple modelling techniques
(neural networks, rule induction, regression) with data visualisation and
manipulation to extract high-value decision making knowledge from large bodies
of historical data. A rich visual programming interface makes Clementine
accessible to non-technologist "data owners" - business, rather than IT,
experts - and provides high productivity for "power" users. Clementine has
established a leading position in the data mining market, and is in use in a
wide range of industry sectors including finance, retail, telecoms,
pharmaceuticals, utilities, broadcasting, defence. Applications are diverse
and include demand prediction, customer profiling, risk assessment, turnover
forecasting, process optimisation, fault pre-emption and fraud detection.
We have an urgent need to recruit top-quality technical personnel. Current
vacancies are:
Data Mining Tool Developers
---------------------------
Basingstoke, UK.
To work on the ongoing development of Clementine.
Candidates should have an interest in, and ideally experience of implementing,
advanced modelling and data analysis techniques; experience of commercial data
mining tool development is desirable but not essential. Experience of some or
all of the following would also be useful:
    Unix                    GUI Development
    VMS                     Pop11
    X Windows / Motif       C
    Windows 95 / NT         SQL
    Databases/ODBC          Statistics
Applicants should have a 2.1 or better at first degree; a relevant second
degree may be an advantage. Technical excellence is expected, but must be
combined with first rate communications and interpersonal skills and a desire
for close contact with customers. Recent graduates and those with commercial
experience will both be considered.
Data Mining Consultants
-----------------------
Basingstoke, UK; King of Prussia, PA, USA.
To apply Clementine to customers' business problems. The role will include
pre-sales consulting, training, and developing solutions.
Candidates should be degree-qualified (2.1 or better) and, ideally, should
have experience of data analysis and modelling in a business environment.
Excellent communication and interpersonal skills are vital, and candidates
should display initiative, creativity, enthusiasm (and the ability to convey
it to clients) and self-management skills.
As ISL's clients span many markets, our consultants need the ability to
assimilate knowledge of any client's business, understand their problems, and
fit a data mining solution to these. However, we also encourage applications
from those with a specific business/sector specialisation (for example finance
(banking, insurance), retail or manufacturing).
We are willing to consider applications both from experienced consultants and
from any other candidates who believe they have the aptitude to be developed
into first-class consultants.
This is an opportunity to join a small (30 people) but dynamic and rapidly
developing company in an exciting business/technology area. ISL provides a
stimulating and technically challenging environment with considerable scope
for professional development.
ISL is an equal opportunities employer. We encourage applications from new
graduates through to experienced professionals. Salaries/benefits are
competitive, and commensurate with relevant experience.
Please apply with CV to:
For UK:
    Linda Montgomery,
    Integral Solutions Limited,
    Berk House,
    Basing View,
    Basingstoke,
    RG21 4RG
    UK
    Fax  : +44 1256 63467
    Email: [email protected]
For US:
    Kevin Peyton
    ISL Decision Systems Inc.
    630 Freedom Business Center
    King of Prussia
    PA 19406
    USA
    Fax  : (610) 768 7774
    Email: [email protected]
Tell us why you are the ideal candidate for a position at ISL.
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 16:52:04 -0800
From: [email protected] (Wei Zhang)
Subject: Job: machine learning at Boeing
**Outstanding Machine Learning Researcher needed**
The Boeing Company, the world's largest aerospace company, is actively
working on research projects in advanced computing technologies including
projects involving NASA, FAA, Air Traffic Control, and Global
Positioning as well as airplane and manufacturing research.
The Research and Technology organization located in Bellevue,
Washington, near Seattle, has an open position for a machine learning
researcher. We are the primary computing research organization for
Boeing and have contributed heavily to both short term technology
advances and to long range planning and development.
BACKGROUND REQUIRED: Machine Learning, Knowledge Discovery, Data
Mining, Statistics, Artificial Intelligence or related field.
RESEARCH AREAS: We are developing and applying techniques for data
mining and statistical analyses of diverse types of data, including:
safety incidents, flight data recorders, reliability, maintenance,
manufacturing, and quality assurance data. These are not areas where
most large R&D data mining efforts are currently focused. Research
areas include data models, data mining algorithms, statistics, and
visualization. Issues related to our projects also include pattern
recognition, multidimensional time series, and temporal databases. We
can achieve major practical impacts in the short-term both at Boeing
and in the airline industry, which may result in a safer and more
cost-effective air travel industry.
A Ph.D. in Computer Science or equivalent experience is highly
desirable for the position. We strongly encourage diversity in
backgrounds including both academic and industrial
experiences. Knowledge of machine learning, statistics, and data
mining is an important factor. Experience with databases and
programming (C/C++, JAVA, and Splus) is desirable.
APPLICATION: If you meet the requirements and you are interested,
please send your resume via e-mail in plain ASCII format to
[email protected] (Wei Zhang). You can also send it via
US mail to
Wei Zhang
The Boeing Company
PO Box 3707, MS 7L-66
Seattle, WA 98124-2207
Application deadline is April 30, 1997.
The Boeing Company is an equal opportunity employer.
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Note -- CFPs lately are getting too long! Please send short
versions, with all the wonderful details at your conference website! GPS]
From: [email protected]
Subject: 2nd CFP: Agent Theories, Architectures, and Languages, 1997 (4th Intl Wshop)
Date: Mon, 17 Feb 1997 18:20:54 -0500 (EST)
Reply-To: [email protected]
SECOND CALL FOR PAPERS
The Fourth International Workshop on
Agent Theories, Architectures, and Languages (ATAL)
Providence, Rhode Island, USA
July 24-26, 1997
http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/
Intelligent agents are one of the most important developments in computer
science in the 1990s. Agents are of interest in many important application
areas, ranging from human-computer interaction to industrial process
control. The ATAL workshop series aims to bring together researchers
interested in the agent-level, micro aspects of agent technology.
Specifically, ATAL-97 will address issues such as theories of rational
agency, software architectures for intelligent agents, methodologies and
programming languages for realising agents, and software tools for applying
and evaluating agent systems. Papers that consider macro-level, societal
issues of agent-based systems are welcome only if they explicitly relate to
the workshop themes. ATAL-97 will be held over the three days immediately
preceding the AAAI-97 conference, also being held in Providence. The ATAL-97
proceedings will be formally published as volume four of the Intelligent
Agents series from Springer-Verlag.
TIMETABLE
Submissions due April 18, 1997
Notifications sent May 23, 1997
Prefinal versions due July 1, 1997
Workshop July 24-26, 1997
[edited for brevity -- full details at URL above. GPS]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 12:15:07 -0800 (PST)
From: H Michael Chung <[email protected]>
Call for Papers
Association of Information Systems 1997 Americas Conference
Indianapolis, Indiana, August 15-17, 1997
Mini-track on
"Tools and Applications of
Data Mining, Induction, and Knowledge Discovery:
In Search of a Mighty Tool"
Minitrack Chair: H. Michael Chung, CSULB
Description
This minitrack covers broader issues related to data mining, induction, and
knowledge discovery in the areas of business and management applications.
Tools based on regression analysis, information theoretic methods, genetic
algorithms, and neural networks have been applied to discover patterns of
financial fraud, to capture customer profiles for marketing, to predict
fluctuations in stock prices, to control product quality, and to diagnose
telecommunication network problems, among others. Expert decisions,
environmental/normative datasets, and Internet databases are considered for
discovering information and knowledge.
There are many issues that should be addressed in order to reap quality
knowledge by applying sophisticated algorithms that would satisfy user
needs. Some of the relevant topics include
- Applications of Inductive Learning, Data Mining, and Knowledge
Discovery
- Data Warehousing
- Statistical Inference of Data Mining
- Knowledge Acquisition
- WWW Database and Agents
- Evaluation of Tools
- Economics of Decisions
- Data Visualization
- Learning Systems
***************Important Dates***************
Electronic Submission Deadline: March 1st, 1997
Notification of Acceptance: April 15th, 1997
Camera Ready Copy Due: May 4th, 1997
***************Submission Guidelines******************
Each submission must be FORWARDED ELECTRONICALLY AS A WORD PROCESSING FILE
(MS WORD OR WORDPERFECT FORMAT) ATTACHED TO AN E-MAIL MESSAGE to the
mini-track chair, H. Michael Chung. If this is not possible, then authors
should contact the mini-track chair and arrange for a suitable workaround.
Each submission is limited to THREE PAGES IN LENGTH (APPROXIMATELY 1,750
WORDS) INCLUDING ALL FIGURES, TABLES, APPENDICES, AND REFERENCES, and must
include the following:
a) The name, e-mail address, mailing address, university/organizational
affiliation, and phone/fax numbers of the contact person for the submission
in the first few lines of the file,
b) The submission title and the author's(s') name(s), the author's(s')
e-mail address(es), mailing address(es), and author's(s')
organization/university affiliation(s),
c) An abstract of the submission,
d) The body of the submission, and
e) A list of references or a bibliography.
All conference submissions and the submission review processes will be
managed through e-mail. The receipt of submissions will be quickly
confirmed by the mini-track chair. Submissions should follow the style
guidelines of the MIS Quarterly. All camera-ready copy preparation details
will be provided to submitting authors by the mini-track chairs through
e-mail upon acceptance.
Please send any questions and all submissions for the Data Mining mini-track to
H. Michael Chung
Department of Information Systems
College of Business Administration
California State University, Long Beach
Long Beach, CA 90840-8506
TEL (562) 985-7691
FAX (562) 985-5543
INTERNET [email protected]
For additional information on the 1997 AIS Americas Conference,
please see the homepage, http://hsb.baylor.edu/ramsower/ais.ac.97.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 15:05:21 +0100 (MET)
From: Luc De Raedt <[email protected]>
CALL FOR PARTICIPATION and PAPERS
IJCAI-97 Workshop on
FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING
Monday 25 August 1997
GENERAL INFORMATION
The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan,
will take place on August 25, immediately prior to
the start of the main IJCAI conference.
TECHNICAL DESCRIPTION
Inductive logic programming (ILP) is a recent subfield of
artificial intelligence that studies the induction of first order formulae
from examples. The purpose of this workshop is twofold:
on the one hand, we wish to widen the scope of ILP
by investigating its relations to neighboring fields,
and on the other hand, we wish to make ILP more accessible
for researchers from neighboring fields.
The workshop therefore solicits papers
that lie at the frontiers of ILP with neighboring fields.
A non-exclusive list of interesting topics for the workshop includes :
* ILP and Software Engineering:
what has ILP to offer to Software Engineering ?,
and in what way can Software Engineering help to design ILP systems
and applications ?
* ILP for Knowledge Discovery in Databases : ILP aims
at learning complex rules involving multiple relations from small
databases, whereas KDD typically induces simple rules about a
single relation from a large database. Furthermore, ILP allows
background knowledge to be exploited in a variety of ways. Can KDD and
ILP be successfully combined ?
* ILP and Computational or Algorithmic Learning Theory :
though many results have been obtained concerning the learnability
of inductive logic programming, most of the results are negative
and most of the positive results are reducible to propositional learning
methods. Is there a mismatch of COLT with ILP ? and if so,
what can be done about it ?
* ILP versus propositional learning methods :
Since the very start of ILP, researchers and practitioners of
machine learning have wondered about the relation between
ILP and propositional learning methods. Theoretical and experimental
questions that arise include:
when to use ILP and when to use propositional learning methods ?
under what circumstances can ILP be reduced to propositional learning ?
what is the price to pay for using first order logic in
terms of efficiency ?
* ILP and Knowledge Representation : ILP has traditionally employed
computational logic to represent hypotheses and observations.
Alternative well-founded knowledge representation formalisms have received
little attention (with the exception of CLASSIC).
What can ILP learn from Knowledge Representation ?
and in what well-founded Knowledge Representation formalisms
is induction feasible ?
* ILP in multistrategy learning : Multistrategy learning
combines multiple learning strategies. What role can ILP
play for multistrategy learning ?
* ILP and Probabilistic reasoning: in contrast to
propositional learning methods, ILP has not used
probabilistic representations. How can ILP incorporate
such representations ? and how can it interact with
methods such as Bayes nets or Hidden Markov Models ?
* ILP for Intelligent Information Retrieval:
The rapid development of
the World Wide Web has spawned significant interest in intelligent
information retrieval. In particular, algorithms for
reliably classifying textual documents into given categories (like
interesting/uninteresting) would be useful for a wide variety of tasks.
Currently, most learning algorithms are not able to make use of
structural information like word order, successive words, structure of
the text, etc. Can ILP algorithms offer advantages over conventional
information retrieval or machine learning algorithms for this sort of
task?
* Applications of ILP in subfields of AI : ILP has been applied
to other subfields of AI, including natural language processing,
intelligent agents and planning.
Further applications of ILP within AI are solicited.
Both position papers about the relation of ILP to other fields, as well
as research papers that make specific technical contributions
are solicited. However, to stimulate discussion, it is expected
that each technical paper also clarifies the position
of ILP with regard to the neighboring field(s) it addresses.
In addition to the presentation of position and technical papers,
the workshop will also feature a panel discussion
on the frontiers of ILP and possibly an invited talk.
ORGANISERS
Luc De Raedt (chair and primary contact)
Saso Dzeroski
Koichi Furukawa
Fumio Mizoguchi
Stephen Muggleton
PROGRAMME COMMITTEE
Francesco Bergadano (Italy)
Luc De Raedt (co-chair, Belgium)
Saso Dzeroski (Slovenia)
Johannes Furnkranz (Austria)
Koichi Furukawa (Japan)
David Page (U.K.)
Fumio Mizoguchi (Japan)
Ray Mooney (U.S.A.)
Stephen Muggleton (co-chair, U.K.)
CALL FOR PARTICIPATION
Participation is open to all members of the AI Community.
However, to encourage interaction and a broad exchange of ideas
the number of participants will be strictly limited
(preferably under 30 and certainly under 40).
Participants will be selected on the basis of submissions.
Three types of submissions will be considered :
1) technical contributions (ideally, a 3 to 5 page extended abstract,
in the IJCAI Proceedings Format, 3000-4000 words),
2) position papers (ideally, a 1 to 3 page abstract
in the IJCAI Proceedings Format, 1000 - 3000 words)
3) a statement of interest (ideally, a one page motivation of why you
would like to participate, 300- 500 words)
Only submissions of type 1) and 2) will be considered
for presentation at the workshop and inclusion in the workshop notes.
Submissions should be received no later than April 1, 1997,
and must include first author's complete contact information,
including address, email, phone, and fax number. Though 1 April
is the hard deadline, the authors are encouraged to submit
their material by 24 March, in order to facilitate the reviewing process.
Double submissions with the ILP-97 Workshop (which is to take
place in Prague, September 1997) are allowed.
SUBMISSIONS
Submit papers by email (postscript) and surface mail (2 copies) to
Luc De Raedt
Dept. of Computer Science
Katholieke Universiteit Leuven
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
Email : [email protected]
IMPORTANT DATES
- Paper submission : 1 April
- Notification to Authors : 21 April
- Camera ready copy : the submissions themselves
will serve as camera ready copy
(submissions in the IJCAI Proceedings Style are strongly preferred,
see http://www.ijcai.org/ijcai-97/ for details)
PUBLICATION
The accepted submissions will be included in the workshop notes
to be distributed at the workshop.
Post-conference publication of a selection of the workshop papers
will be considered and discussed at the workshop.
COSTS
To cover costs, a fee of $US 50 will be charged,
in addition to the normal IJCAI-97 conference registration fee.
Attendees of IJCAI workshops will be required to register
for the main IJCAI conference.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "MANAGO" <[email protected]>
Subject: 2 days course on Data Mining & CBR in San Francisco for University of Berkeley Extension
Date: Tue, 18 Feb 1997 17:23:09 +0100
Continuing Education in Engineering
University of California Berkeley Extension
Intensive short course at the San Francisco Airport
Course Organizer
Michel Manago, Acknosoft
Course Lecturers
Dr Usama Fayyad, Senior Researcher, Microsoft Research
Dr Michel Manago, President, Acknosoft international
Dr Evangelos Simoudis, Vice President, Data Mining and Decision Support
Solutions at IBM
Data Mining and Case-Based Reasoning (CBR): Principles and Applications
An intensive two-day course
Monday-Tuesday, March 24-25, 1997
San Francisco Airport
Course Description
The objective of this course is to present technologies for making better
use of data for decision-making purposes. Data mining techniques are used
to extract decision knowledge: for instance, in the form of a decision tree
or decision rules from a database. Case-based reasoning is the name given
to problem-solving methods that make direct use of past experiences (cases)
rather than a corpus of general knowledge. Data mining (DM) and case-based
reasoning (CBR) technologies can be used to:
* Explore and analyze databases and generate hypotheses about the data;
* Anticipate future events (decision support);
* Solve a new problem, whose solution is unknown, by retrieving and
adapting similar problems that have been previously solved.
According to the META Group, the market for data mining is estimated at
$800 million by the year 2000. It is considered to be one of the three key
technologies that will have the biggest impact on information technologies
in the third millennium.
The course addresses both practical and theoretical issues. We will compare
and contrast the technologies, present the architecture of CBR and DM
systems, describe some algorithms, and more. We will also show how cases
are indexed for efficient retrieval, how the similarity between new and
past cases is assessed, how cases can be represented, and how to use
domain knowledge in addition to data. We will characterize application
domains and reveal the underlying methodology for building an
application. We will identify the
market and present real applications in various domains such as technical
maintenance (diagnosis of Boeing 737 aircraft engines), customer support
(help desk for troubleshooting SEPRO robots in the plastic industry),
configuration (layouts of composite parts of an autoclave at Lockheed),
financial decision support, retail, and fraud detection.
Who Should Attend
This course is intended for:
* Business analysts who want to have an in-depth overview of data mining
technology and learn what it can really do and cannot do
* IT managers and technical staff who are in charge of engineering business
information systems and who want to learn how to implement data mining
solutions
* End-users who need to make better use of their data for decision making
* Customer service managers, maintenance managers, manufacturing managers,
financial decision makers who want to learn how to solve problems more
efficiently and at reduced costs
Anyone with a specific application in mind can benefit from the course,
which provides an overview of the technologies as well as of the
applications. Non-technical people will benefit from the basics of the
course, such as general principles and overview of applications
(quantification of business benefits, for example). There are no
prerequisites; this tutorial describes basic notions and illustrates these
with meaningful examples from a variety of applications in technical
maintenance, customer support, manufacturing, banking, and the consumer
market. Computer skills are not required.
Schedule
Monday-Tuesday, March 24-25, 1997
Registration: 8:00 am Monday
Lectures: 8:30 am-4:30 pm daily
Lunches: noon-1:00 pm daily
Location
Embassy Suites Hotel, San Francisco Airport, 150 Anza Blvd., Burlingame,
California.
Fee
The fee is $895 (EDP 326611). This includes:
* 2 days of instruction (1.4 ceu)
* Comprehensive course notes
* Daily lunches and refreshments
Topic Outline
Day One
From Data to Decisions
This brief introduction will provide to the attendees a common ground that
will enable them to understand and participate in the rest of the tutorial.
We will define knowledge discovery in databases (KDD) and case-based
reasoning (CBR).
Introduction to Knowledge Discovery in Databases
In this section we will:
Provide a general architecture for a generic KDD system that will enable
the subsequent discussion of the fundamental KDD issues, presentation of
the various KDD techniques, and description of various existing KDD
systems.
Present the basic knowledge discovery process, from the initial stages of
selecting data and cleaning of the selected data, to the identification of
important attributes and the final stages of integrating the extracted
knowledge into a decision support system.
Briefly discuss the various types of data mining techniques that are
commonly used for KDD. A brief introduction of CBR will be made.
Outline the core research issues in the field of KDD, as well as present
how these issues relate to fundamental AI issues such as representation and
search.
Preparing Data for Mining
The quality of the knowledge extracted by a KDD system from a data set is
related to the quality of the provided data. In this part of the tutorial
we will:
Examine various data problems, e.g., noisy data, incomplete data,
low-information content data, etc.
Discuss how each such problem affects the KDD operation.
Present techniques for solving certain of these problems, e.g., data
cleaning techniques. The large size of the databases that must be analyzed
necessitates the use of sampling techniques and the application of
dimensionality reduction techniques on a data set before a data mining
method is applied to it. We will present commonly used sampling methods and
discuss how they can be implemented. We will also discuss commonly used
dimensionality reduction techniques from statistics, e.g., principal
component analysis, and the use of domain knowledge for identifying
important attributes of a data set. Due to the particular prevalence and
importance of time-series data in a variety of application domains, we will
discuss techniques for preprocessing such data before it is presented to a
KDD system.
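As an illustration of the sampling step mentioned above, here is a minimal
sketch in Python. The course text does not name a specific sampling method;
reservoir sampling is chosen here only as an assumption, because it draws a
uniform sample in one pass over a table too large to hold in memory. The
function name and its parameters are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return up to k rows drawn uniformly at random from an iterable.

    The iterable is read once, so its length need not be known in advance.
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)         # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)      # inclusive bounds: 0 <= j <= i
            if j < k:
                sample[j] = row        # replace an entry with probability k/(i+1)
    return sample


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5))
```

If the source holds fewer than k rows, all of them are returned in order.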
Data Mining and Technique Selection
We will present data mining techniques from five basic areas: (1)
artificial intelligence, (2) neural networks, (3) statistics, (4)
multidimensional and deductive databases, and (5) data visualization.
With each type of technique we will present its pros and cons with respect
to the generic KDD model defined in the tutorial's first part.
Databases and Visualization Techniques
Multidimensional and deductive databases merge knowledge-based techniques
with database technology. Recently such databases have been successfully
coupled with relational and legacy database management systems, providing
analysts with unique ways to express and automatically test hypotheses on
very large data sets. In addition, research on very large databases has
resulted in a variety of KDD techniques, such as association discovery and
sequence discovery. These techniques are based on simple database
operations, such as aggregation, and are applicable to specific types of
data, such as those commonly collected by large retail chains. We will
provide an introduction to multidimensional and deductive databases,
discuss data warehousing concepts, present how these techniques can be
applied on KDD tasks, and review the current research on databases.
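The remark that association discovery rests on simple aggregation can be
sketched as follows. This is an illustrative Python sketch, not the method of
any system named in the course: it counts single items and item pairs over
retail baskets and keeps pair rules that pass support and confidence
thresholds. The function name, the thresholds, and the sample baskets are all
assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules "lhs -> rhs" between single items using only counting.

    support    = share of baskets containing both items
    confidence = share of baskets containing lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))                  # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))   # every unordered pair once
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical baskets, for illustration only.
    baskets = [["bread", "milk"], ["bread", "butter"],
               ["bread", "milk", "butter"], ["milk"]]
    for lhs, rhs, support, confidence in pair_rules(baskets):
        print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Both counters correspond to GROUP BY style aggregates, which is why such
rules can be computed close to the database.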
Visualization has traditionally been used for the presentation of results
obtained by other methods, e.g., statistical analysis. We will discuss how
interactive visualization techniques can be used for knowledge discovery
operations. We will begin with simple techniques (scatter plots and line
plots, for example) and proceed with modern 3-D visualization techniques.
Some Examples of KDD Applications
We will first develop a set of criteria for comparing KDD systems. We will
then review in depth two such systems developed by the authors and
considered by the research community as representing the state-of-the-art:
IBM's customer segmentation data mining system and JPL's SKICAT system. In
addition to presenting the architecture of each system and discussing the
KDD methods it integrates, we will present a detailed account of how the
systems have been applied on financial, retail, manufacturing, astronomy,
and large image databases in planetary sciences.
Demonstration of a Data Mining System and Applications
Summary of the Day and Discussion
Summary, recap, overview of the basic unifying themes, and pointers to
available literature on KDD and future work.
Day Two
Overview of Case-Based Reasoning (CBR) Technology
In this introduction, we will present an overview of CBR, detail the CBR
cycle, and explain the main characteristics of CBR technology.
Applications of CBR in Technical Domains
We will present several CBR applications in technical domains. These deal
with maintenance, customer support, manufacturing, design, rapid evaluation
of production costs, and sales support.
Troubleshooting CFM56-3 engines for the Boeing 737. Time spent by airline
maintenance operators to solve engine failures and related costs (flight
delays or cancellations) are a major concern. The use of intelligent
diagnostic software contributes to improving customer support and reduces
the cost of ownership by improving troubleshooting accuracy and reducing
airplane downtime. We will examine this application from the engine
manufacturer perspective (CFM international/Snecma) as well as from the
client's perspective (British Airways). Integration of the CBR
troubleshooting with electronic technical documentation. Demonstration.
A help desk for troubleshooting SEPRO robots in the plastic industry. Case
study from a small size company (160 employees) that has adopted CBR for its
customer support services. Demonstration.
Improving feedback from experience in manufacturing. We will present the
ongoing Noemie data warehousing and data mining project. Noemie aims at
increasing the quality and reliability of equipment for the oil industry.
Case study from the manufacturer perspective (Schlumberger) as well as from
the end-user's perspective (Norsk Hydro).
CBR: How It Works
Based on the review of applications that will have been presented during
the morning, we will go into the details of the algorithms and present how
they have been used. In particular, we will describe mechanisms for:
retrieving cases; assessing the similarity; and indexing cases. We will
describe the link between induction, a form of KDD, and CBR. We also will
present some sample algorithms.
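To make the retrieval and similarity mechanisms concrete, here is a minimal
Python sketch. It is not taken from the course material or from any product
named above: the case layout, the weighted attribute-match similarity, and
the troubleshooting data are all assumptions made for illustration. A plain
linear scan stands in for a real case index.

```python
def similarity(query, case, weights):
    """Weighted share of attributes on which the query and a stored case agree.

    Returns a value between 0.0 and 1.0. Attributes missing from the query
    simply do not count as matches.
    """
    total = sum(weights.values())
    matched = sum(
        weight
        for attr, weight in weights.items()
        if attr in query and query[attr] == case["problem"].get(attr)
    )
    return matched / total if total else 0.0


def retrieve(query, case_base, weights, k=1):
    """Return the k stored cases most similar to the query (linear scan)."""
    ranked = sorted(case_base,
                    key=lambda case: similarity(query, case, weights),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical troubleshooting cases, for illustration only.
    case_base = [
        {"problem": {"symptom": "no_power", "led": "off", "model": "A"},
         "solution": "replace fuse"},
        {"problem": {"symptom": "arm_stuck", "led": "red", "model": "A"},
         "solution": "recalibrate axis"},
        {"problem": {"symptom": "arm_stuck", "led": "green", "model": "B"},
         "solution": "lubricate joint"},
    ]
    weights = {"symptom": 2.0, "led": 1.0, "model": 1.0}
    query = {"symptom": "arm_stuck", "led": "red"}
    best = retrieve(query, case_base, weights)[0]
    print(best["solution"], similarity(query, best, weights))
```

The retrieved solution is only a starting point; adapting it to the new
problem is a separate step of the CBR cycle and is not shown.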
Comparing CBR with Other Technologies
During this part of the tutorial, we will compare CBR and other
technologies for decision making. In particular, we will look at rule-based
expert systems, classical statistics, neural networks, and standard
database queries. We will review a case study done at a banking institution
for comparing credit scoring, CBR, and rule-based expert systems.
Case-Based Reasoning in Practice
During this final presentation, we will detail the basic steps and a
methodology for building a CBR system. We will describe how to model cases,
state how cases can be acquired from scratch or from existing databases,
review potential sources for the cases, and explain how to choose an
algorithm. We will also investigate organizational issues for assuring case
quality and explain how human factors have to be taken into consideration
when delivering a CBR application.
Summary of the Tutorial and Discussion
Lecturers
Usama Fayyad, Ph.D., is a Senior Researcher at Microsoft Research. He is
also a Distinguished Visiting Scientist at the Jet Propulsion Laboratory
(JPL), California Institute of Technology, and an adjunct professor of
computer science at University of Southern California. Prior to joining
Microsoft Research, he headed the Machine Learning Systems Group at JPL and
was Principal Investigator of the Science Data Analysis and Visualization
Task and other tasks involving machine learning applications. He received
his Ph.D. in computer science and engineering from the University of
Michigan, Ann Arbor. He is a recipient of the NASA Exceptional Achievement
Medal (1994) and the 1993 Lew Allen Award for Excellence at JPL. He has
co-chaired Knowledge Discovery in Database conferences KDD-94 and KDD-95,
and is general chair of KDD-96. He is a co-editor of Advances in Knowledge
Discovery and Data Mining (AAAI/MIT Press 1996), and Editor-in-Chief of a
new journal on this topic (Kluwer).
Michel Manago, Ph.D., is the scientific and managing director of AcknoSoft.
Dr. Manago graduated from the University of Illinois at Urbana-Champaign
and obtained his Ph.D. at the University of Paris, writing his thesis on
"Integration of Symbolic and Numeric Techniques in Machine Learning." He
has applied DM and CBR in technical domains such as diagnosis of Boeing 737
engines, customer support for marine diesel engines and robots, maintenance
of trains, reliability analysis of gas meters, experience feedback to
increase quality of production when manufacturing oil equipment, nuclear
safety, design of plastic parts in the manufacturing industry, and active
sale support over the Internet. He is author of the KATE line of products
for DM and CBR. He is editor of the book Advances in Case Based Reasoning
(Springer Verlag, 1995) and author of the report "A Review of Industrial
Case-Based Reasoning." He received the Information Technologies European
Award in 1995 (the European "Nobel prize" in computer technologies), among
other honors.
Evangelos Simoudis, Ph.D., is Vice President, Data Mining and Decision
Support Solutions at IBM, where he is responsible for the development and
deployment of data mining solutions to IBM's customers worldwide. Prior to
joining IBM, Dr. Simoudis was a Group Leader of the Data Comprehension
Group at the Lockheed AI Center where, since 1991, he led the development
and market introduction of the Recon data mining system and led research on
knowledge discovery in databases, machine learning, case-based reasoning
and their application to financial, retail, and fraud detection problems.
In 1994 Dr. Simoudis and his team were awarded Lockheed's Pursuit of
Excellence Award for their work on the Recon system. Dr. Simoudis is also
an adjunct assistant professor at the computer engineering department of
Santa Clara University. Dr. Simoudis holds a Ph.D. in computer science from
Brandeis University, an M.S. in computer science from the University of
Oregon, a B.S. in electrical engineering from the California Institute of
Technology, and a B.A. in physics from Grinnell College.
Enrollment Information
Enrollment may be made by companies or individuals. Enrollment is limited
and advance enrollment is required. Upon request, a place in the course
will be reserved for individuals who require time to obtain authorization.
To reserve a place, call (510) 642-4151, or fax (510) 642-6027.
How to enroll
By phone: You may enroll by phone if you use MasterCard, Visa, or American
Express; call (510) 642-4111.
By fax: If you use MasterCard, Visa, or American Express, fill out the form
on the back of this brochure and send it via fax number (510) 642-0374.
Please be sure to fax the entire form including the mailing label, if there
is one. Please provide all the information requested on the form.
By mail: Fill out and return the enrollment form provided.
By purchase order: Companies, agencies, and other organizations may pay
course fees by purchase order.
Enrollments must be accompanied by the full fee or by purchase order
authorization. You may pay by check or use MasterCard, Visa, or American
Express. Make checks payable to the UC Regents.
For efficient enrollment processing, we must have the Priority Code from
this publication, whether or not it is addressed to you. This five-digit
code (three numbers and two letters) appears on the mailing label above the
addressee's name. If there is no label on your copy, the code appears in a
box in the middle of the address surface.
Cancellation policy: Any cancellation is subject to a $30 processing fee.
Cancellations received less than five working days from the start of the
course are subject to a $100 cancellation fee. Substitutions may be made at
any time. If the course is not held for any reason, UC Berkeley Extension's
liability is limited to refund of the full course fee.
Confirming your enrollment: If you enroll by mail and have not received an
enrollment confirmation five days prior to the scheduled date of the
course, please call (510) 642-4151 to confirm that the course will convene
as scheduled.
Housing
A group of rooms will be set aside at the Embassy Suites Hotel, San
Francisco Airport, 150 Anza Blvd., Burlingame, California, and reservation
information will be sent to enrollees. Participants may reserve rooms in
advance with Embassy Suites, phone (415) 342-4600 or fax (415) 342-8109.
Special rates will be available; participants in these courses should so
identify themselves when requesting room reservations. Reservations must
be made no later than one month before the date of your course. After this
date room reservations will be accepted only on a rate and space
availability basis.
Airport transportation and parking
Courtesy shuttle service is provided between the hotel and the airport.
There is ample free parking available at the hotel.
Continuing education units (ceu)
These units are a nationally recognized means of recording noncredit study
and are accepted by many employers and relicensure agencies as evidence of
a serious commitment to career advancement and the maintenance of
professional competence. One ceu is awarded for each 10 hours of
attendance. If you want us to keep a record of your ceu study you must fill
out and return a form that will be distributed in class.
Program Coordinator
Linda Reid, Continuing Education in Engineering, University Extension,
University of California, Berkeley
Program Representative
Natalie Dennis, Continuing Education in Engineering, University Extension,
University of California, Berkeley
General Information
If you have questions
Call (510) 642-4151, e-mail [email protected], fax (510) 642-6027,
or write to Continuing Education in Engineering, University Extension, UC
Berkeley,
1995 University Ave., Berkeley, CA 94720-7010
The University of California, in accordance with applicable federal and
state law and University policy, prohibits discrimination, including
harassment, on the basis of race, color, national origin, religion, sex,
disability, age, medical condition (cancer-related), ancestry, marital
status, citizenship, sexual orientation, or status as a Vietnam-era veteran
or special disabled veteran. This nondiscrimination policy covers
admission, access, and treatment in University programs and activities.
Inquiries may be directed as follows: sex discrimination and sexual
harassment: Carmen McKines, Title IX Compliance Officer, (510) 643-7895;
disability discrimination and access: Ward Newmeyer, A.D.A./504 Compliance
Officer, (510) 643-5116 (voice or TTY/TDD); age discrimination: Alan T.
Kolling, Age Discrimination Act Coordinator, (510) 642-6392. Other
inquiries may be directed to the Academic Compliance Office, 200 California
Hall, #1500, (510) 642-2795.
CONTRACT TRAINING
Enlist our experts at your location
At UC Berkeley Extension we're committed to working with you and your staff
to help achieve your objectives. Through the Berkeley Partnership for
Professional Development, we'll meet with you to analyze your staff's
training needs, then custom-design a program to satisfy your special
requirements. Or you can select from our many established courses.
Contract training offers:
* Choice of format: from workshops and sequential classes to multiday
residential seminars
* Highly qualified instructors
* Convenient location: on-site at your company or at a facility of your
choice
* Courses tailored to your needs
To discuss your training needs,
call Karl Johnson at (510) 642-4151 or fax (510) 642-6027
ENROLL BY FAX with MasterCard, Visa, American Express, or a company
purchase order: (510) 642-0374.
Or enroll by phone with MasterCard, Visa, or American Express: (510)
642-4111.
Please give us the Priority Code (see below) if you enroll by phone.
To enroll by mail, return this entire page. Please do not remove the
mailing label.
Mail to: Dept. B, UC Berkeley Extension, 1995 University Ave., Berkeley, CA
94720.
Name
last
first
middle
Position
Company name
BUSINESS ADDRESS
number
street
mail stop
city
state
zip
Daytime phone
Fax number
These numbers are requested so that you can be notified if there is a
change in the schedule or status of your course.
Priority Code 6 0 9 ___ ___
For efficient processing, we must have the Priority Code from this
publication, whether or not it is addressed to you. This 5-digit code (3
numbers and 2 letters) appears on the mailing label above the addressee's
name. If there is no label on your copy, the code appears in a box in the
middle of the address surface.
I enclose $ ___________to cover_______enrollments in:
_____ Data Mining and Case-Based Reasoning
$895
EDP 326611
To pay by check, make check payable to the UC Regents.
To use [ ] MasterCard [ ] Visa [ ] American Express, check appropriate box and give:
account number
date card expires
authorizing signature
For companies/agencies:
_____ Purchase order enclosed
(For proper processing this form must accompany your purchase order.)
Michel Manago
AcknoSoft
58 rue du Dessous des Berges
75013 Paris - France
tel : (33 1) 44 24 88 00, fax : (33 1) 44 24 88 66
web : http://www.AcknoSoft.com
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "MANAGO" <[email protected]>
Subject: Tutorial on CBR & Data Mining in London + 2 days seminar
on applications of CBR & Data Mining
Date: Tue, 18 Feb 1997 17:32:40 +0100
The following events are taking place in London on 17-19 March 1997
For registration please see the website (http://www.unicom.co.uk).
Principles & Applications of CBR & Data Mining
UNICOM Tutorial + Seminar Organized by Dr Michel Manago, Acknosoft
OBJECTIVES:
The objective of the tutorial is to present technologies for making better
use of data for decision making purposes. Induction is a data mining
technique that is used to extract decision Knowledge, for instance in the
form of a decision tree or decision rules, from a database. Case-Based
Reasoning is the name given to problem solving methods that make direct use
of past experiences (cases) rather than a corpus of general Knowledge. The
technologies can be used for:
1. Exploring and analysing databases and generating hypotheses about the
data
2. Anticipating future events (decision support)
3. Solving a new problem, whose solution is unknown, by retrieving and
adapting similar problems that have been previously solved.
During this course, we will describe the underlying techniques and
methodologies to improve the decision making process by making better use
of data. The course will address both theoretical and practical issues. We
will compare and contrast the technologies, present the architecture of a
CBR and a DM System, describe some algorithms etc. We will show how cases
are indexed for efficient retrieval, how the similarity between new and
past cases is assessed, how cases can be represented, and how to use domain
knowledge in addition to data. We will characterise application domains and
reveal the underlying methodology for building an application. We will identify
the market and delineate real applications in various domains.
A. From data to decisions
The brief Introduction will provide to the attendees a common ground that
will enable them to understand and participate in the rest of the tutorial.
We will define Data Mining (induction) and Case-Based Reasoning (CBR).
B. Introduction to induction
In this section we will:
1. Present how to generate a decision tree by induction
2. Present the inductive process, from the initial stages of selecting data
to the identification of important attributes, and the final stages of
integrating the extracted knowledge into a decision support system.
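The step of identifying important attributes when inducing a decision tree
can be sketched as follows. The tutorial text does not say which criterion is
used; information gain (as in ID3-style induction) is assumed here purely for
illustration, and the function names and the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting rows on `attr`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attrs, target):
    """Pick the attribute with the highest information gain as the next test."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    rows = [
        {"outlook": "sunny", "windy": "no", "play": "yes"},
        {"outlook": "sunny", "windy": "yes", "play": "yes"},
        {"outlook": "rainy", "windy": "no", "play": "no"},
        {"outlook": "rainy", "windy": "yes", "play": "no"},
    ]
    print(best_attribute(rows, ["outlook", "windy"], "play"))
```

A full tree is obtained by applying the same choice recursively to each
subset of records until the subsets are pure or no attributes remain.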
C. Presentation of Case-Based Reasoning (CBR) technology
In this introduction, we will present an overview of CBR, detail the CBR
cycle and explain the main characteristics of CBR technology.
D. CBR : how it works
We will go into the details of the algorithms and present how they have
been used. In particular, we will describe mechanisms for :
1. retrieving cases
2. assessing the similarity
3. indexing cases
We will describe the link between induction, a form of KDD, and CBR.
Finally, we will present some sample algorithms.
E. Preparing Data for CBR and Data Mining
The quality of the knowledge extracted by a decision support system from a
data set is related to the quality of the provided data. In this part of
the tutorial we will examine various data problems, e.g., noisy data,
incomplete data, low-information content data, etc.
F. Comparing induction and CBR with other technologies
During this part of the tutorial, we will compare KDD & CBR and other
technologies for decision making. In particular, we will look at rule-based
expert systems, classical statistics, neural networks and standard database
queries. We will review a case study done at a banking institution for
comparing credit scoring, CBR and rule-based expert systems.
G. Applications of CBR and data mining
During this final presentation, we will detail the basic steps and a
methodology for building a CBR system. We will describe how to model cases,
state how cases can be acquired from scratch or from existing databases,
review potential sources for the cases and explain how to choose an
algorithm. We will also investigate organisational issues for assuring case
quality and explain how human factors have to be taken into consideration
when delivering a CBR application. We will also try to characterise the
market for CBR and data mining.
H. Summary of the tutorial and discussion
PRESENTER:
Dr Michel Manago graduated from the University of Illinois in
Urbana-Champaign in 1983. He obtained his PhD in 1988 at University of
Paris on "Integration of Symbolic and Numeric Techniques in Machine
Learning". Since 1991, Dr Manago has been the scientific and managing
director of AcknoSoft where he has been "putting the technology to use".
Michel Manago is the father of the KATE line of products for taking smart
decisions from data. He was chairman of the 2nd European workshop on CBR in
1994, editor of the book Advances in Case Based Reasoning (Springer Verlag,
1995) and author of the report "A review of industrial Case Based
Reasoning". Dr Michel Manago received the Information Technologies European
Award in 1995 (the European "Nobel prize" in computer technologies), the
1st prize for innovative software application at the XPS trade show in
Germany in 1995 and the 1996 Application of the Year award by the French
computer magazine "Decision micros et reseaux".
CBR and Data Mining: Putting the Technology to Use
BACKGROUND
Companies have gathered vast amounts of data that is not well used. Some
corporate databases almost work in write-only mode! Well exploited, this
mass of data could be turned into strategic corporate knowledge :
- the marketing department wants to discover trends in buyer behaviour
- the after sales division must work more efficiently so that the company
keeps customers
- the financial department wants to assess risks in a better way
- quality management and control must be improved...
However, going from data to decisions is not an easy task.
Innovative computer technologies such as data mining and Case Based
Reasoning (CBR) will help you solve complex problems in domains where
experience plays a critical role in good decision making, and let you
develop a solution with only a short delay and a guaranteed payback.
(C) Copyright AcknoSoft, 1997
OBJECTIVES:
The goal of this seminar is to get a clear view about the state of the art
of applying data mining and CBR technologies to solving practical problems.
The emphasis of the seminar will be on presentations done by users of the
technology as opposed to technology providers. They will share their
experience and delineate the benefits as well as the difficulties of
putting the technologies into use. The themes that will be covered by the
speakers include
- What are CBR and data mining?
- Features of the software products they have used to build their
application
- Comparison of data mining and CBR with other technologies
- Methodologies for case acquisition and maintenance
- Ensuring case quality and monitoring it over time
- Organisational issues that needed to be solved in order to field the
application
- Human factors
- Overcoming technological risks
- Cost and benefits of using data mining and CBR in various domains
The goal of the seminar is to present a clear view about issues that are in
common when building CBR and data mining applications in different domains
(banking, insurance, customer support and help desk, manufacturing,
energy). We will focus on general topics such as how to assess the costs
and quantify the benefits of using the technology, how to model cases so
that they contain the right sort of knowledge for decision making purposes,
how to use the tools to build systems that analyse cases efficiently or how
to manage a CBR project from the customer's perspective.
Benefits of Attending
-Find out how to make the knowledge of your specialists available to everyone in
your organisation
-Learn how to solve problems more quickly without the burden of building
expert systems
-Capitalise your experience
-Elicit the user point of view
-Share experience with other CBR application developers
-Find out how to analyse and distill your data into usable knowledge
-Take smart decisions that are based on your experience
Programme
Day 1
Brief introduction by Michel Manago
Short presentation about Data Mining and CBR, introduction of the
objectives of the seminar.
Using data mining and CBR at Deloitte & Touche
Olivier Curet and Jonathan Killin, Deloitte & Touche Consulting Group UK,
|
410.17 | 97:08 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sat Mar 01 1997 14:30 | 667 |
| Knowledge Discovery Nuggets 97:08, e-mailed 97-02-28
News:
* GPS, New Location for KD Mine and KD Nuggets: www.kdnuggets.com
* W. Kloesgen, KDD-97: Second Call For Panel Proposals
* P. Maiste, Price Waterhouse announces new data mining services
* T. Denecke, Query: Data Mining and Workflow Management ?
* D. Throop, Query: Finding approximately duplicate records ?
Publications:
* P. Stolorz, CFP: DMKD special issue on scalable computing
http://www.research.microsoft.com/research/datamine/dmkdpar
Siftware:
* G6G, Intelligent Software Web Site,
http://www.intelligent-dir.com
Positions:
* W. Buntine, summer students and scientist positions in
autonomous data analysis
* B. Masand, KDD Job at GTE Laboratories, Waltham, Ma
* S. Wrobel, Two positions in Machine Learning/Data Mining at GMD
--
KD Nuggets is a free electronic newsletter for the Data Mining and Knowledge
Discovery community, focusing on the latest research and applications.
Submissions are most welcome and should be emailed,
with a DESCRIPTIVE subject line (and a URL) to [email protected]
To subscribe, email to [email protected] a message with
subscribe kdd-nuggets
in the first line (the rest of the message and subject are ignored).
See http://www.kdnuggets.com/subscribe.html for details.
Nuggets frequency is 3-4 times a month.
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools),
and a wealth of other information on Data Mining and Knowledge Discovery
are available at the Knowledge Discovery Mine site http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"An experimental science is supposed to do experiments
that find generalities. It's not just supposed to
tally up a long list of individual cases and their
unique life stories. That's butterfly collecting."
Richard C. Lewontin, biology professor at Harvard University
Thanks to Yolanda Gil
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Feb 1997 09:41:10 -0500 (EST)
From: GPS <[email protected]>
Subject: New Location of KD Mine -- www.kdnuggets.com
I have set up a new location for the Knowledge Discovery Mine web site
-- www.kdnuggets.com --
which is operational today, Feb 28, 1997.
I will continue to maintain and improve that site in my new job --
see www.kdnuggets.com/gps.html
The GTE location at info.gte.com/~kdd will remain for some time,
but I will not be updating it.
I will also continue to edit and email Knowledge Discovery Nuggets
(I have dropped the second D to emphasize the more general focus).
It will be gradually transitioned to the kdnuggets.com site,
but in the meantime will continue to be distributed from GTE.
The changeover should be transparent to all subscribers.
--
Gregory Piatetsky-Shapiro
please address KD Nuggets related email to [email protected]
(which is an alias for [email protected])
other email to me to [email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 26 Feb 1997 14:41:27 +0100
From: [email protected] (Willi Kloesgen)
Subject: KDD-97 organization -- call for panels
As in previous KDD conferences, the KDD-97 program will include panel
discussions. A great panel requires an interesting topic, good
speakers, and proper preparation. To facilitate all three we solicit
early suggestions. Please submit suggestions for topics and preferably also
for panelists who could represent diverse positions on or approaches to the
topic. Suggested topics should relate to any of the main KDD-97 topics (see
http://www-aig.jpl.nasa.gov/kdd97).
The panel topics should be of general interest for a
large part of the KDD audience and allow several (controversial) approaches
to be discussed.
Please email informal suggestions by April 2, 1997 (earlier if possible) to:
Willi Kloesgen
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Feb 1997 13:34:22 +0100
From: Tom Denecke <[email protected]>
Subject: Data Mining and Workflow Management
I am a student of Business Science working on a research project,
"controlling of workflow processes".
My idea is to use data mining techniques to evaluate the control data
of workflow systems. My problem is that I am not very familiar with the
technical terms, so it would be great to get a hint about which methodologies
would fit this application domain.
Here is a short description of the kind of information that can be obtained:
There are several process instances of each process type (for example
auditing).
After the execution of 100 instances, there is a lot of data for this
process type which can be explored:
- processing and idle time
- who executed the process (employee, role, orga. unit)
- which kind of workflow
- which activities were executed
- data about the process object (which customer, article ...)
- which other processes are running
- metrics concerning quality and cost of a process/activity
- ...
We would like to generate rules about the process performance
(bottleneck detection, when does a process perform well, ...).
I would be very grateful for any information on whether similar
problems have been solved with data mining techniques, or just a literature
hint.
Thank you very much
Tom Denecke
- MBA -
WWU Muenster
Rudolf-Harbig-Weg 24
48149 Muenster
PHONE + 49 251 89 75 65
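A simple first look at the kind of workflow log described above is to
aggregate idle time per activity and flag the activities that stand out.
The sketch below is only an illustration under stated assumptions: the
record fields "activity" and "idle_time", the function name and the
threshold factor are all hypothetical, and the method is a plain
aggregation heuristic rather than a rule-induction or data mining
algorithm.

```python
from collections import defaultdict
from statistics import mean


def bottlenecks(records, factor=1.5):
    """Return {activity: mean idle time} for activities whose mean idle
    time exceeds `factor` times the mean idle time over all records."""
    idle_by_activity = defaultdict(list)
    for rec in records:
        # Each record is assumed to be a dict with these two keys.
        idle_by_activity[rec["activity"]].append(rec["idle_time"])
    if not idle_by_activity:
        return {}
    overall = mean(t for times in idle_by_activity.values() for t in times)
    return {
        activity: mean(times)
        for activity, times in idle_by_activity.items()
        if mean(times) > factor * overall
    }
```

Activities flagged this way could then serve as the target class for a
rule learner that uses the other logged attributes (employee, role,
process object, concurrent processes) as inputs.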
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
From: [email protected]
Date: Fri, 28 Feb 97 08:09:07 EST
Subject: Press Release: Opening of a Knowledge Discovery Center
IMMEDIATE RELEASE
CONTACT:
Price Waterhouse Management Consulting in New York:
Jan Butler
212-819-4838, [email protected]
Liza Kurtz
212-995-5680, ext. 210, [email protected]
PRICE WATERHOUSE LLC ANNOUNCES NEW DATA MINING SERVICES AND OPENING OF
KNOWLEDGE DISCOVERY CENTER
New York, NY - February 26 - Price Waterhouse Management Consulting, a
recognized leader in delivering data warehouse services to global companies,
introduces Data Mining Services for helping clients achieve strategic value
from the mounds of data often accumulated in the course of business. An
integrated offering of Price Waterhouse's Global Data Warehouse Practice, the
Data Mining Services range from introductory seminars on data mining and
knowledge discovery to full data mining system implementations. To support
these offerings, Price Waterhouse has opened the Knowledge Discovery Center in
Bethesda, Maryland.
"Data mining has recently moved to the forefront of business executives'
strategic data warehouse initiatives, driven by a significant growth in the
amount of data that companies collect on their customers, processes, and
finances," said Mike Schroeck, Global Data Warehouse Practice Leader for Price
Waterhouse. Data mining technologies use sophisticated, automated algorithms to
discover hidden patterns, correlations, and interacting relationships among the
hundreds of strategic data elements collected by an organization. The impact of
data mining on a company's bottom line, whether through increased revenues or
decreased costs, is often enormous.
A leader in data mining knowledge and research, Price Waterhouse has performed
a comprehensive, hands-on evaluation of many of the leading data mining tools
currently available on the market, and has spoken at a variety of conferences
and trade shows on the subject. With years of analytical modeling and data
analysis experience, Price Waterhouse can help clients get the greatest return
on their data mining investment. "We are dedicated to offering value-added data
mining analyses to our clients. The time for businesses to take advantage of
these tools and algorithms has never been better," says Dr. Glenn Galfond,
Partner in charge of Price Waterhouse's Management Analytics practice, which is
spearheading the firm's Data Mining Services.
The Data Mining Services offered by Price Waterhouse include Data Mining 101,
Data Mining Proof, Data Mining Service, and Data Mining Solutions. Data Mining
101 is a half-day beginner's course in data mining. The course provides an
overview of the technology, examples of how it has been successfully used, and
a demonstration of the leading data mining tools. Data Mining Proof is a short
proof of concept project, in which Price Waterhouse mines a small extract of a
client's data for quick, but rewarding results. This allows the client to see
data mining's potential in a hands-on environment. Clients also receive a copy
of PW's comprehensive Data Mining Tool Evaluation report.
For companies that are ready to delve more deeply into data mining but do not
have the necessary in-house resources, Data Mining Service offers a full range
of data mining outsourcing options, including data extraction, data cleansing,
and data mining. For companies that wish to implement enterprise-wide data
mining systems, Data Mining Solutions offers Price Waterhouse's proven data
mining and data warehousing methodology and full-scale systems implementation
experience.
The Knowledge Discovery Center will be used to support these services and to
provide an environment for demonstrating the latest data mining tools and training
clients in their use. Price Waterhouse has equipped the Center with many of the
leading data mining tools. The technologies and algorithms available in the
Center encompass the full breadth of data mining capabilities. Galfond adds,
"Price Waterhouse has invested heavily in the research and evaluation of the
leading data mining tools. Our clients can take advantage of this investment
while reaping the benefits that data mining brings to their companies."
Price Waterhouse Management Consulting delivers enterprise-wide solutions to
large multinational clients through integrated Information Technology and
Change Integration services. With in-depth knowledge of selected industries
and business process expertise, Price Waterhouse Management Consulting works
with clients worldwide, from strategy through implementation, to help them
improve business performance. Price Waterhouse Management Consulting services
are provided in the U.S. by Price Waterhouse LLC.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Please cc responses to the [email protected]
since the problem is of general interest. GPS]
From: "Throop, David R" <[email protected]>
Subject: Looking for phrase matching tool
Date: Tue, 25 Feb 1997 10:03:30 -0600
Dr. Piatetsky-Shapiro,
Thank you for your excellent website on data mining. I'm hoping you
might help me, or point me towards someone who can.
I'm looking for a piece of commercial software that may or may not
exist. I couldn't find it on your pages, but your stuff is the closest
I've found. So I'm asking you for any pointers.
We have several databases which have lists of components (pieces of the
International Space Station.) These databases have no common key. They
do, however, have English-language descriptions of the components (on
the order of 20 - 50 characters long.) However, these descriptions are
not identical. For instance, a certain power switch is known by two
different names:
RPCM N1-3B-C Switch14 and N1-3B-RPCM-C-RPC-14
As you see, the order of the identifiers is different, one set uses the
term 'switch' where another uses 'RPC', and the '14' is concatenated
with no space on one side.
Anyway, I'm looking for a piece of software that could go through the
databases, (armed with a dictionary, list of abbreviations, synonyms
etc) and come up with a set of best guesses about which items match.
Do you know of such a tool, either as a commercial product or a research
program?
Thanks
David Throop
281 212 9369
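One common way to approach the matching problem described above is to
reduce each description to a set of normalised tokens and score pairs by
token overlap (Jaccard similarity). The sketch below is a minimal
illustration, not an existing product: the SYNONYMS table, the function
names and the threshold are hypothetical, and a real table would have to
be built from the project's own dictionary and abbreviation lists.

```python
import re

# Hypothetical synonym table: each entry maps a term to a canonical token.
SYNONYMS = {"switch": "rpc"}


def tokens(description):
    """Return the set of canonical, lower-case tokens in a description."""
    # Insert a space where two or more letters run into digits, so that
    # "Switch14" becomes "Switch 14" while "N1" and "3B" stay whole.
    spaced = re.sub(r"(?<=[A-Za-z]{2})(?=\d)", " ", description)
    raw = re.split(r"[^A-Za-z0-9]+", spaced.lower())
    return {SYNONYMS.get(tok, tok) for tok in raw if tok}


def similarity(a, b):
    """Jaccard similarity of the two token sets, ignoring token order."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(left, right, threshold=0.5):
    """Map each description in `left` to its best-scoring candidate in
    `right`, keeping only pairs that reach the threshold."""
    if not right:
        return {}
    result = {}
    for a in left:
        score, b = max((similarity(a, b), b) for b in right)
        if score >= threshold:
            result[a] = (b, score)
    return result
```

With this synonym table, the two names quoted in the message reduce to
the same token set, so they score as a full match. Token-set comparison
ignores word order by design; descriptions with misspellings would also
need an approximate string comparison for individual tokens.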
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 27 Feb 1997 22:43:45 -0800 (PST)
From: DMKDPAR <[email protected]>
Subject: CFP for DMKD special issue on scalable computing
============================================================================
CALL FOR PAPERS
============================================================================
DATA MINING AND KNOWLEDGE DISCOVERY
Special Issue on
Scalable High-Performance Computing for KDD
Guest editors: Paul Stolorz and Ron Musick
==========================================
http://www.research.microsoft.com/research/datamine/dmkdpar
Traditional computational techniques and computer architectures are
routinely overwhelmed by the sheer volume and complexity of information
generated from data-gathering instruments, computational and
experimental methodologies, and business operations. The fundamental
problem of extracting knowledge and insight from massive databases and
datasets is shared across a wide range of fields in business,
academia and government. The new field of Data Mining and Knowledge
Discovery in Databases (KDD) has arisen as an interdisciplinary response
to this situation, merging ideas drawn from disciplines such as statistics,
pattern recognition, machine learning, databases, visualization and
high performance computing.
This special issue of Data Mining and Knowledge Discovery is devoted
to the challenge of applying data mining and knowledge discovery methods
to large, complex datasets. Implementation of data mining ideas in
high-performance computing environments is crucial for coping with
large-scale data. In particular, parallel and distributed systems are
needed to ensure system scalability as datasets grow inexorably in size
and scope. These environments include dedicated massively parallel
supercomputers, super-servers built from clusters of commodity
workstations and high-speed network interfaces, and heterogeneous
networks distributed over regional, national and global scales.
High-performance and parallel computing holds the promise of scaling
to large data sets, allowing the data mining component to search a much
larger set of patterns and models than traditional computational platforms
and algorithms would allow. In addition, it promises to render the KDD
process much more interactive by allowing fast response times for
difficult search and model fitting problems.
Data Mining and Knowledge Discovery, published by Kluwer Academic
publishers, is the flagship publication in the rapidly growing area of
KDD. In this special issue we solicit the most dramatic new
developments in high performance large-scale KDD applications, highlighting
the promise of the technology and identifying the main challenges for
the future. Technically innovative papers that describe new theoretical
developments, or tackle the application of practical data mining
approaches to real problems and datasets on parallel and distributed
architectures, are solicited. Topics of interest include, but are
not limited to, the intersection of KDD with the following fields:
Parallel implementations of datamining & KDD methods:
Classification and regression: e.g. decision trees, neural nets
Pattern recognition
Belief nets and other Bayesian approaches
Genetic programming
Association rules
Statistical inference
Similarity detection and measurement
Clustering and density estimation
Change-detection
Text retrieval
Content-based indexing
Data visualization
Trend Analysis
Integration of KDD techniques with scalable I/O systems:
Data warehouses & federated databases
Parallel file systems
High-performance network interfaces
Intelligent data layout
Out-of-core algorithms
Parallel relational querying
High performance storage systems
Hierarchical and distributed storage
Methods to control complexity:
Random sampling
Anytime algorithms applied to datamining techniques
New complex data-type algorithms (e.g. not based on feature vectors)
Domain simplification techniques
Inference error/confidence characterization
Parallel, clustered and/or distributed applications:
Datamining on commodity-based clusters and networks
Web-oriented datamining
Novel applications and case studies
Knowledge discovery systems and tools
SUBMISSION INSTRUCTIONS
Electronic submissions are STRONGLY ENCOURAGED. Postscript copies
of papers may be emailed to [email protected]. Latex style
files and related instructions can be obtained at the web site
http://www.research.microsoft.com/research/datamine.
===============
IMPORTANT DATES
===============
**************************************
SUBMISSION DEADLINE: May 8, 1997
ACCEPTANCE NOTIFICATION: June 20, 1997
**************************************
Enquiries about the submission process and scope of the special issue
may be sent to [email protected].
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
From: [email protected]
Date: Mon, 24 Feb 1997 22:47:04 -0500
Subject: SAIC and G6G Develop an Intelligent Software Web Site
"SAIC and G6G Develop an Intelligent Software Web Site"
NEW Web-Site Address is: www.intelligent-dir.com
Science Applications International Corporation's (SAIC) Asset Source for
Software Engineering Technology (ASSET) Division has teamed up with G6G
Consulting Group (G6G) and co-developed a groundbreaking new World Wide
Web (Web) site focused on "intelligent software."
The new site contains the entire content of "The G6G Directory of
Intelligent Software," a publication that contains over 750 abstracts
covering 15 advanced technology corridors.
"The G6G Directory of Intelligent Software" contains product abstracts in
Expert (Knowledge-Based) Systems, Fuzzy Logic, Hypermedia, Hypertext and
Multimedia, Intelligent Software Tools, Neural Networks, Object-Oriented
Programming, Virtual Reality, Voice & Speech Systems, and other areas.
The directory is further categorized by over 140 sub-categories of "what"
the product can be used for or "what it is" such as:
- Data Mining                     - Manufacturing Systems
- Diagnostic Systems              - Modeling
- Help Desk Systems               - Network Systems
- Help Authoring Systems          - Stock Market
- Knowledge Management            - Software/Hardware
- Lending and Learning Systems    - Software Development
- Customer Support Systems        - and many others.
The directory content on this Web site will be updated on a weekly
basis. The combination of G6G's directory and ASSET's on-line free and
commercial product inventory will present a powerful complement of
information on the Web. Knowledge engineers, software engineers,
developers and other users of intelligent software products will find
www.intelligent-dir.com to be extremely useful.
This valuable free resource will help create a sense of community in the
world of intelligent software by providing an on-line source of
searchable information about intelligent software products and vendors.
__________________________________________________
The G6G Directory of Intelligent Software
--------------------------------------------------
http://www.intelligent-dir.com
--------------------------------------------------
SAIC/ASSET            G6G Consulting Group
(304) 284-9000        (310) 458-4187
[email protected]     [email protected]
__________________________________________________
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 18 Feb 1997 14:39:31 -0800
From: Wray Buntine <[email protected]>
Subject: summer students and scientist positions in autonomous data analysis
Please note the two sets of positions below.
Research scientist
2 summer students, or longer term support for PhD
The summer student position could be transferred into
longer term support for focussed PhD research if the
interest is right.
Wray Buntine
======================= Scientist
NASA's Center of Excellence in Information Technology at
Ames Research Center invites candidates to apply for a position as
Research Scientist in Information Technology:
Position description:
* We seek applicants to join a small team of space scientists and
computer scientists in developing NASA's next generation smart spacecraft
with on-board, autonomous data analysis systems. The group includes
leading space scientists (Ted Roush, Virginia Gulick) and leading data
analysts (Wray Buntine, Peter Cheeseman), and their counterparts at JPL.
* The team is doing the research and development required for
the task, and has a multi-year program with deliverables
planned. This is not a pure research position, and requires
dedication in seeing completion of the R&D milestones.
* The applicant will be responsible for the information technology side
of R&D, with guidance from senior space scientists on the project.
* The research has strong links with on-going work at the Center of
Excellence and is an integral part of NASA's long term goals.
Candidate requirements:
* Strong interest in demonstrating autonomous analysis systems to
enhance science understanding in operational tests, with the ultimate
goal of putting such systems in space.
* Ph.D. degree in Computer Science, Electrical Engineering, or related
field, and applied experience, possibly within the PhD. In
exceptional cases, an M.S. degree with relevant work experience will
suffice.
* Knowledge of neural or probabilistic networks, machine learning,
statistical pattern recognition, image processing, science data
processing, probabilistic algorithms, or related topics is essential.
* Strong communication and organizational skills with the ability to lead
a small team and interact with scientists.
* Strong C programming and Unix skills (experimental, not
necessarily production), with experience in programming mathematical
algorithms: C++, Java, MatLab, IDL.
Application deadline:
* March 15th, 1997 (hardcopy required -- see below).
Please send any questions by e-mail to the addresses below, and type
"PI for Autonomous data analysis" as your header line.
Dr. Ted Roush: [email protected]
Dr. Wray Buntine: [email protected]
Full applications (which must include a resume and the names and addresses
of at least two people familiar with your work) should be sent by surface
mail (no e-mail, ftp or html applications will be accepted) to:
Dr. Steve Lesh
Attn: PI for Autonomous data analysis
Mail Stop 269-1
NASA Ames Research Center
Moffett Field, CA, 94035-1000
============================== Summer students or Student Assistantship
NASA's Center of Excellence in Information Technology at
Ames Research Center invites current PhD students to apply for
a summer position (possibly two available).
Position description:
* We seek applicants to join a small team of space scientists and
computer scientists in developing NASA's next generation of smart
spacecraft with on-board, autonomous data analysis systems. The group
includes leading space scientists (Ted Roush, Virginia Gulick) and
leading data analysts (Wray Buntine, Peter Cheeseman).
* We are working with spectrometers and a CCD camera, and are
building resource-bounded autonomous classification systems,
and trainable object recognizers.
* The successful student will have considerable flexibility
within the goals of the project to contribute.
* An ideal summer project would produce demonstration software together
with a conference paper.
Candidate requirements:
* Knowledge of neural or probabilistic networks, machine learning,
statistical pattern recognition, image processing, science data
processing, probabilistic algorithms, or related topics is essential.
* Strong C programming and Unix skills (experimental, not
necessarily production), with experience in programming mathematical
algorithms: C++, Java, MatLab, IDL.
* Interest in revisiting the project at a later date.
Application deadline:
* We will accept applications on a continuing basis until
the beginning of summer, and will take good applicants as they apply.
Please send any questions by e-mail to the addresses below, and type
"PI for Autonomous data analysis" as your header line.
Dr. Ted Roush: [email protected]
Dr. Wray Buntine: [email protected]
Full applications (which must include a resume and the names and addresses
of at least two people familiar with your work) should be sent by surface
mail (no e-mail, ftp or html applications will be accepted) to:
Dr. Steve Lesh
Attn: summer student for Autonomous data analysis
Mail Stop 269-1
NASA Ames Research Center
Moffett Field, CA, 94035-1000
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Feb 1997 14:11:12 -0500
From: [email protected] (Brij Masand)
Subject: KDD Job at GTE Laboratories, Waltham, Ma
**** An Outstanding Applied Researcher/Developer needed for the **********
**** Knowledge Discovery in Databases project at GTE Laboratories **********
Description: Participate in the design and development of
state-of-the-art systems for data mining and knowledge discovery. The
focus of the job is on applied research in KDD, including development
of prototypes to demonstrate innovative business applications of KDD.
The candidate will join one of the leading R&D teams in the
area of data mining and knowledge discovery. Our current projects
include predictive customer modeling for GTE's cellular telephone
markets. We are applying multiple learning and discovery methods to
very large, high-dimensional real-world databases, involving millions
of records and Gbytes of data and have created KDD-based solutions
that are being deployed in the field.
The ideal candidate will have a Ph.D. in Machine Learning or
related fields and 2-3 years of experience, or an M.S. with equivalent
experience. The candidate should have experience with machine
learning algorithms, be familiar with statistical theory, have
practical experience with databases, and be proficient with
Web/Internet tools. Excellent coding skills in C/Unix environment and
an ability to quickly pick up new systems and languages are needed. Good
communication skills, the ability to work in a team, and good coding
and system maintenance practices are very desirable.
GTE Laboratories Incorporated, located in Waltham, MA, is the central
research facility for GTE. GTE is among the largest local
exchange telephone carriers and the second largest mobile service
provider in the United States. Our research facility is located on a
quiet 50 acre campus-like setting in Waltham, MA, 20 minutes from
downtown Boston. Our salaries are competitive, and our outstanding
benefits include medical/life/dental insurance, saving
and investment plans, and an on-site fitness center.
Please send a resume and a cover letter
(preferably by e-mail, in ASCII) to:
[email protected]
or by fax to 617.466.3342 (Attn: Brij Masand)
I will be travelling till Mar 12th and will reply to email responses
after that. thanks! -- Brij Masand ([email protected])
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: Two positions in Machine Learning/Data Mining at GMD
Date: Fri, 28 Feb 97 13:55:06 +0100
From: [email protected]
Two positions in Machine Learning/Data Mining at GMD
GMD's FIT.KI department (the AI research division of the
Institute for Applied Computer Science) is looking to
fill two scientist positions (M.S./Diplom or postdoc level) in the area of
Machine Learning/Data Mining.
We are looking for excellent people with a strong background in one
or both of these areas, preferably combining both theoretical/scientific
and application/software-engineering skills. Applications at both the
postdoctoral and the M.S. level are welcome.
You will be working as a research scientist in one of our current
ML/DM projects, KESO or ILP2, and will be part of FIT's data mining
group, currently consisting of 4 people. Scientific work, writing and
presentation of papers, and application and software work will all be
part of your job. M.S. level applicants will be given time to complete their
Ph.D.s while at GMD.
Both positions are to be filled as soon as possible, for a period of initially
two or three years, renewable for up to five years. Salary is according to
the BAT IIa tariff, in the range of approx. DEM 50.000 to DEM 80.000 depending
on age, qualifications, and marital status. For more information about FIT.KI, see
http://nathan.gmd.de; for more information about the ML/data mining group, see
http://nathan.gmd.de/projects/ml/home.html.
If you are interested in such a position, please send your application
material to
Dr. Stefan Wrobel
GMD, FIT.KI
Schloss Birlinghoven
53754 Sankt Augustin
Germany
[email protected]
to be received no later than March 23, 1997 (preferably by paper mail,
but E-Mail is o.k. if otherwise you cannot meet the deadline). Please
include at least a brief curriculum vitae, description of your qualifications,
research experience and future research interests, degree/grade information
(if relevant) and if applicable, a selection of three of your best publications
(full text copy). We are looking forward to your application!
--------------------------------------------------------------
Dr. Stefan Wrobel
GMD -- German Natl. Research Center for Information Technology
FIT.KI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Tel.: +49/2241/14-0, Fax: -2889 E-Mail: [email protected]
WWW http://nathan.gmd.de/persons/stefan.wrobel.html
Secr.: D. Boethgen Tel. -2731, E-Mail: [email protected]
|
410.18 | 97:09 | IJSAPL::OLTHOF | Spellchecked Henry Although | Thu Mar 13 1997 10:15 | 576 |
| Knowledge Discovery Nuggets 97:09, e-mailed 97-03-10
News:
* P. Domingos, Re: Looking for phrase matching tool
* R. Jain, Tandem Data Mining Announcement,
http://www.tandem.com
Siftware:
* R. Quinlan, C5.0: Successor to C4.5,
http://www.rulequest.com
Positions:
* P. Norvig, Job offered in information extraction and learning,
data mining, http://www.junglee.com
* M. Bramer, Research Fellowship in Knowledge Discovery
* X. Liu, Research Studentship in Intelligent Data Analysis,
http://web.dcs.bbk.ac.uk/~hui/IDA/home.html
* D. Sleeman, University of Aberdeen, Chair of Computing Science
http://www.csd.abdn.ac.uk/people/chair_fp.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the latest
research and applications.
Submissions are most welcome and should be emailed, with a DESCRIPTIVE
subject line (and a URL) to [email protected].
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"),
and a wealth of other information on Data Mining and Knowledge Discovery are
available at the Knowledge Discovery Mine site http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer *****************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers, or of KD Nuggets
***********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is no security, only opportunity
General MacArthur
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To: [email protected]
cc: [email protected], [email protected]
Subject: Re: Looking for phrase matching tool
Date: Fri, 28 Feb 1997 13:43:11 -0800
From: "Pedro M. Domingos" <[email protected]>
Alvaro Monge and Charles Elkan of UC San Diego ([email protected],
[email protected]) have one such program. They have a paper in the
proceedings of KDD-96 (p. 267) that describes their system, and also gives
references to other work in the area.
Pedro
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Note: the following is a commercial announcement. GPS]
From: JAIN_ROHIT%t16@fedex
Date: 28 Feb 97 15:08:00 -0600
To: [email protected]
Cc: [email protected], [email protected]
Subject: Tandem's Feb. 11 announcement
Hi folks,
It seems that Nuggets covers announcements made by many companies.
I am wondering what would be needed on Tandem's part to have you include
that announcement in Nuggets. You can get to the announcement from our home
page at http://www.tandem.com. I have also included parts of it in this
message.
Rohit Jain
Contact:
Kristine Austin
Tandem Computers Incorporated
Tel: +1 (408) 285 6645
World Wide Web Home Page Address: http://www.tandem.com
Tandem Object Relational Data Mining Architecture Drives Next Generation of
Knowledge Discovery
Cupertino, CA, February 11, 1997. Tandem Computers Incorporated today
announced a revolutionary approach in bringing complete knowledge discovery
to business users through its Object Relational Data Mining technology. For
the first time, the complete warehouse data set is available for real-time
data mining, resulting in reduced processing time, more complete results,
and significantly easier management. This new architecture establishes a
standard SQL interface between client data mining tools and both object
relational and relational database engines. The database engine will perform
specialized data manipulation functions required by the data mining algorithms.
Tandem's Object Relational Data Mining architecture takes full advantage of
the capabilities of relational database engines resulting in the ability to
mine larger volumes of data and better performance.
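The idea of letting the database engine do the data manipulation that a
mining algorithm needs can be illustrated with plain standard SQL. The
sketch below is not Tandem's interface: the announcement does not describe
the SQL extensions themselves, so the example assumes only an ordinary
GROUP BY, and the table name, column names, and the class_counts helper are
hypothetical. It shows the engine returning the per-value class counts that
a decision tree client would otherwise tally by scanning rows itself.

```python
import sqlite3

# Hypothetical toy table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, fraud INTEGER)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("north", 0), ("north", 1), ("south", 0), ("south", 0), ("north", 1)],
)


def class_counts(conn, table, attribute, label):
    """Ask the engine for (attribute value, class label, count) triples.

    These counts are the sufficient statistics for scoring a candidate
    split, so the client never has to fetch the individual rows.
    Identifiers are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    sql = (
        f"SELECT {attribute}, {label}, COUNT(*) FROM {table} "
        f"GROUP BY {attribute}, {label} ORDER BY {attribute}, {label}"
    )
    return conn.execute(sql).fetchall()


counts = class_counts(conn, "claims", "region", "fraud")
print(counts)  # [('north', 0, 1), ('north', 1, 2), ('south', 0, 2)]
```

Only three short tuples cross the client/server boundary here instead of
the whole table, which is the general argument for mining the complete data
set inside the engine rather than exporting samples.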
By integrating the best-of-breed data mining software with a relational
database, Tandem's Object Relational Data Mining will enable business
professionals to more effectively uncover and exploit valuable patterns and
trends hidden in their data. This architecture will enhance knowledge
discovery in solutions such as credit card marketing, claims analysis,
retail basket analysis, and others.
The interface between data mining tools and the database engine is enabled
through the use of SQL extensions, ultimately allowing customers to enjoy a
much wider range of data mining clients. Tandem will promote the
establishment of de facto standards for these extensions with other database
vendors and data mining tool providers. "Initially, the use of SQL
extensions will greatly enhance the way traditional alphanumeric data types
are mined today," said Abhay Mehta, Tandem's director of Object Relational
Data Mining Development. As technology evolves, this architecture will
enable the fast, efficient mining of more complex data types such as image,
voice, video, and other multimedia objects. In the second half of 1997,
Tandem's ServerWare database will be the first to combine all of the
elements into a powerful knowledge discovery business environment.
"Tandem will be able to build on its success in the data warehouse
marketplace to position itself well in the high-end macromining segment of
the data mining arena," said Dr. Wolfgang Martin, program director, META
Group. "Tandem's approach is unique in that it opens up the powerful
ServerWare database, and other database management systems, to a wide range
of data mining functions while accommodating future data mining developments
and complex data types."
Tandem's data mining partners have been selected so that customers can
benefit from their combined breadth of data mining algorithms and for the
ability of their tools to work in a high-performance parallel environment
necessary to take advantage of this new architecture. Data mining partners
include leading companies such as Angoss Software International Limited,
Data Distilleries B.V., Magnify Incorporated, NeoVista Solutions
Incorporated, and Syllogic B.V.
ANGOSS Software International Limited
ANGOSS KnowledgeSEEKER excels in applications including fraud detection,
target marketing, process control, and risk management.
KnowledgeSEEKER displays results in a decision tree format by uncovering
valuable relationships and correlations in the dataset, and by writing
predictive rules. This format can be easily understood by any business end
user. KnowledgeSEEKER turns data into valuable business knowledge.
Data Distilleries B.V.
Data Distilleries' Data Surveyor uses highly efficient decision tree based
search strategies and database optimization techniques, enabling it to take
into account hundreds of variables to mine finance, retail, insurance, and
database marketing databases. At the end of the data mining process, Data
Surveyor produces a graphical representation of the discovered relationships
and an overview of all actions and results during the mining process.
Magnify Incorporated
Magnify's PATTERN software is an open set of modular software tools for
mining, managing, and analyzing very large data sets. The PATTERN system
includes several specialized applications, such as PATTERN:Detect for
detecting fraud, anomalies, and rare events and PATTERN:Profit for
predicting the delinquency, bankruptcy, credit usage, and profitability of
customers. The PATTERN system incorporates algorithms for parallel and
distributed variants of classification, regression, and optimization trees,
and a variety of other data mining algorithms.
NeoVista Solutions Incorporated
NeoVista Solutions' Decision Series suite of knowledge discovery tools is
directed towards solving data mining challenges in a variety of markets,
including retail, insurance, telecommunications, and healthcare.
The Decision Series suite includes pattern discovery tools based on neural
networks, clustering, genetic algorithms, and association rules.
Syllogic B.V.
The Syllogic Data Mining Tool supports all stages in the data mining
process, including data selection, data cleaning, enrichment, coding,
discovery, and visualization. Using a toolbox approach, the tool combines
various database analysis techniques, such as decision trees, association
rules, k-nearest neighbor, clustering, and visualization to solve business
challenges in the finance, transportation, government, and system and
network management segments.
To help customers stay on the leading edge of data mining, Tandem is also
partnering with key universities such as Simon Fraser University in order to
benefit from the results of their on-going research. This alliance includes
parallelizing existing and next-generation data mining algorithms and
techniques.
"Tandem is making a major investment in data mining and in driving its
widespread deployment as a business tool," said Bill Heil, senior vice
president and general manager of Tandem's ServerWare business unit. "By
focusing on the Tandem ServerWare database engine and partnering with
best-of-breed solutions providers and researchers, we are able to supply
customers with the industry's most advanced and comprehensive range of data
mining solutions. What we are offering is an extensible approach designed to
keep customers at the forefront of the latest developments in knowledge
discovery."
Availability
Tandem's Object Relational Data Mining solutions will be available starting
in the third quarter of 1997. With these solutions, customers will be able
to take advantage of the industry's most scalable performance for mining
databases residing on either Microsoft Windows NT Server based platforms
(including Tandem's recently introduced S-series servers based on Windows NT
Server) or on Tandem's massively scalable NonStop Himalaya servers.
About Tandem
Founded in 1974, Tandem Computers Incorporated designs and delivers
technology solutions that companies rely on to compete in a business world
that runs 24 hours a day. A US$1.9 billion company headquartered in
Cupertino, California, Tandem has offices, strategic partners, and providers
in more than 50 countries around the world.
Tandem, Himalaya, NonStop, Object Relational Data Mining, ServerWare, and
the Tandem logo are trademarks or registered trademarks of Tandem Computers
Incorporated in the United States and/or other countries. Microsoft and
Windows NT are either trademarks or registered trademarks of Microsoft
Corporation in the United States and other countries. All other brand or
product names are trademarks or registered trademarks of their respective
companies.
Contact:
Kristine Austin
Tandem Computers Incorporated
Tel: +1 (408) 285 6645
World Wide Web Home Page Address: http://www.tandem.com
Tandem Introduces Object Relational Data Mining Solutions and Services for
Vertical Markets
Business-driven offerings target card marketing, micromerchandising,
claims analysis, and other key applications
Cupertino, CA, February 11, 1997. Applying its vertical market expertise and
new Object Relational Data Mining architecture to real-world business
problems, Tandem Computers Incorporated today launched a series of Object
Relational Data Mining solutions packages for card marketing,
micromerchandising, and insurance claims analysis. Tandem also announced new
consulting services designed to allow companies to quickly enjoy low-risk,
discovery-driven decision making.
The solutions and services are based on Tandem's revolutionary new Object
Relational Data Mining architecture. This enables customers to efficiently
mine their entire database, not merely samples, for useful patterns and
trends. The result is a more effective realization of the full business
value of data. "Object Relational Data Mining solutions add significant new
functionality to customer segmentation and predictive modeling techniques,"
said Jonathan Kalman, managing director of MRJ Technology Solutions, a
leading specialty systems integrator. "Tandem is taking a profoundly
different approach by integrating its powerful database, capable of handling
an entire organization's data, with leading data mining tools."
Delivering full value of business data
The new solutions packages will comprise the cross-platform Tandem
ServerWare database, appropriate integrated data mining and other analysis
tools from leading solutions partners, Tandem S-series massively scalable
Himalaya and/or Microsoft Windows NT Server based hardware platforms,
application and reporting templates, data models, and Directional Consulting
services. Though specially tested and packaged, the solutions are all easily
customizable. Initial solutions include:
Card Marketing
Aimed at card acquirers and issuers, this solutions package applies Object
Relational Data Mining architecture and other decision support technology to
improve the effectiveness of cardholder retention and acquisition efforts.
This provides a better understanding of when certain customers are likely to
leave and why, leading to more effective customer segmentation, increased
response rates to marketing promotions, and improved margins through
targeted product development and pricing.
Micromerchandising
This package enables retailers to mine immense volumes of detailed
merchandising data, resulting in improved in-stock positions, reduced
markdowns by better understanding buying patterns and trends, enhanced
promotional effectiveness, and improved store profitability through more
precise forecasting.
Claims Analysis
Aimed at insurance providers looking to contain underwriting costs and
improve loss ratios, this package uses Object Relational Data Mining
technology to support new product development, fraud profiling and
detection, better service provider alliances, and more exact underwriting
experience comparisons.
Immediate customer reaction to these benefits is positive. Said Juan
Verastigui, director of Claims System Development at USAA, a leading
insurance company, "Tandem's Object Relational Data Mining architecture and
the way it leverages the parallel ServerWare database will provide USAA with
the ability to derive full value from all our claims data, and not just
subsets. The resulting faster and more complete answers to our business
queries will have a very positive effect on our bottom line."
Looking ahead, Object Relational Data Mining architecture will enable the
mining of complex data types that include voice, video and images.
Said MRJ's Jonathan Kalman, "Object Relational Data Mining solutions provide
immediate value with traditional data types, and extensibility to meet
future multimedia analysis needs."
Directional Consulting, new Object Relational Data Mining services
Tandem's Directional Consulting services are an integral part of the new
solutions packages and are also available separately. These services define
a low-risk, high-return methodology proven over many Tandem based data
warehousing implementations for exploring and understanding how data mining
can support particular business initiatives.
Directional Consulting services use a phased approach to having data mining
production environments up and running within 90 days. The process begins
with establishing priorities for implementation of Object Relational Data
Mining and proceeds to a proof of concept phase to verify that the
selected data mining solutions will meet expectations.
System design, data modeling, and implementation then follow, culminating
with the establishment of a robust, scalable operational environment that
supports application evolution and growth.
Availability
Tandem Card Marketing, Micromerchandising, and Claims Analysis solutions
will be available beginning in the first quarter of 1997. These will be
enhanced to take advantage of Object Relational Data Mining technology in
the third quarter of 1997.
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 5 Mar 1997 23:31:07 -0500 (EST)
From: [email protected] (Ross Quinlan)
Subject: Successor to C4.5
I have developed a new inductive program called C5.0. Its main advantages are:
* new, faster methods for generating rules
* support for boosting
* optional non-uniform misclassification costs
Further information and free demonstration versions are available from
http://www.rulequest.com
Ross Quinlan
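To make "non-uniform misclassification costs" concrete: when errors are not
equally expensive, the class chosen for a case is the one with the lowest
expected cost rather than the one with the highest probability. The sketch
below illustrates that general decision rule only. It is not taken from
C5.0, whose internals the posting does not describe, and the function name,
class labels, and cost figures are hypothetical.

```python
def min_cost_class(probs, costs):
    """Return the class with the lowest expected misclassification cost.

    probs: dict mapping class -> estimated probability for one case
    costs: dict mapping (true_class, predicted_class) -> cost of that error;
           pairs that are absent (including correct predictions) cost 0
    """
    def expected_cost(predicted):
        return sum(
            p * costs.get((true, predicted), 0.0) for true, p in probs.items()
        )

    return min(probs, key=expected_cost)


# Hypothetical credit example: the case is probably "good" ...
probs = {"good": 0.7, "bad": 0.3}

# ... so with equal error costs the most probable class wins.
uniform = {("good", "bad"): 1.0, ("bad", "good"): 1.0}
print(min_cost_class(probs, uniform))  # good

# If accepting a bad case costs five times as much as rejecting a good
# one, the cheaper decision is to predict "bad" despite its lower probability.
skewed = {("good", "bad"): 1.0, ("bad", "good"): 5.0}
print(min_cost_class(probs, skewed))  # bad
```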
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Feb 1997 15:33:40 -0800
From: [email protected] (Peter Norvig)
Organization: Junglee Corp.
To: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
Subject: Job offered in information extraction and learning, data mining
Junglee is looking for full-time employees and summer interns to work on
information discovery and data mining from text documents. We're
looking for creative hard-working people with experience in some of the
following: agents, databases, information extraction, parsing, regular
expressions, language design, statistics, machine learning, and GUI
design.
Junglee develops Internet and Intranet information technology for the
future and pushes it to market today. Technology that raises eyebrows
and drops barriers. Founded in 1996 by four PhD students from the
Stanford University Computer Science Department and a Silicon Valley
veteran, Junglee Corporation has excellent funding, high-profile
customers, and a strong revenue plan.
Our Virtual DataBase (VDB) engine is fueled by our ability for data
source description, extraction, and attribute mapping. Imagine
capturing data from hundreds of disparate unstructured web sites,
mixing that with data from other heterogeneous, distributed database
and non-database sources and turning it all into a relational aggregate
with the power of full SQL queries and the ease and portability of
HTML user interfaces. We call these applications PALs - powerful
information sites where people can ask for and get an answer.
Several of our PALs are up on the web today at www.junglee.com and
www.washingtonpost.com; we are currently building more of them for
some well-known companies.
One of the key aspects of the technology is discovering/mining
information from text. The project is led by Peter Norvig, who has done
extensive work on Natural Language Processing, Machine Learning, and
other Artificial Intelligence problems. While this project involves
significant ground-breaking research, it is definitely a development
project, not just research.
Please send responses to [email protected] or by fax to 408-522-9470
and mention this posting.
--
Peter Norvig [email protected]
Junglee Corporation, 1250 Oakmead Parkway, Suite 310, Sunnyvale CA 94086
phone: 408-522-9482 fax: 408-522-9470
http://www.junglee.com http://www.norvig.com
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Max Bramer" <[email protected]>
Organization: University of Portsmouth
To: [email protected], [email protected],
[email protected],
[email protected], [email protected], [email protected]
Date: Sat, 1 Mar 1997 17:05:45 +0000
Subject: Research Fellowship in Knowledge Discovery
Reply-to: [email protected]
UNIVERSITY OF PORTSMOUTH
DEPARTMENT OF INFORMATION SCIENCE
RESEARCH FELLOWSHIP IN KNOWLEDGE DISCOVERY
Salary: stlg17,472 - stlg20,381 (Pay award pending)
Closing Date: 21 March, 1997
(Note: This is an extension to the previously announced closing date.)
Reference: RTEC 0149 (G)
Applications are invited for a two-year Research Fellowship in the
Department of Information Science to commence as soon as possible.
The successful candidate will work closely with Professor Max Bramer (Head
of the Department of Information Science) to develop research in the area of
Knowledge Discovery and Data Mining. The Department currently has projects
in the sub-areas of automatic induction of classification rules from
examples, Case Based Reasoning, Neural Networks, Genetic Algorithms and
related statistical techniques.
Applicants should have a good honours degree in Computer Science or related
subject. Preference will be given to candidates who have (or expect soon to
receive) a higher degree in a relevant discipline.
Relevant commercial experience would also be an advantage.
Informal enquiries may be made to Professor Bramer, either by telephone
(01705) 844444 or by electronic mail ([email protected]), or to Simon
Thompson on (01705) 844097 ([email protected]). Further information
about the department is also available from the World Wide Web at
http://www.sis.port.ac.uk.
Further particulars are available from:
Personnel Office
University House
Winston Churchill Avenue
Portsmouth PO1 2UP
England
Telephone (01705) 843421 (24 hour answerphone) E-mail: [email protected]
http://www.port.ac.uk/
IMPORTANT NOTE: All applications should be sent (preferably on paper not by
email) to the Personnel Office NOT to the Department of Information Science.
_______________________________________________________
Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444 Fax: +44-(0)1705-844006 email:
[email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected] (Xiaohui Liu)
Date: Tue, 4 Mar 97 12:17:57 GMT
To: [email protected]
Subject: Re: EPSRC CASE Research Studentship in Intelligent Data Analysis
BIRKBECK COLLEGE
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF LONDON
EPSRC CASE Research Studentship in Intelligent Data Analysis
Applications are invited for an EPSRC CASE PhD studentship, within the
Intelligent Data Analysis (IDA) Group, at the Department of Computer
Science, Birkbeck College. The three-year studentship is for the
investigation of intelligent data analysis techniques for research
problems in process industries, funded by Honeywell Hi-Spec
Solutions, UK and Honeywell Technology Center, USA. The successful
candidate will have a tax-free salary of at least 10,000 pounds (there are
experience, age-related and dependants additions), and will be expected to
work on a joint research project between Birkbeck and Honeywell on "Causal
Modeling for Time Series Data".
The IDA Group at Birkbeck conducts research into the application of
computationally intelligent techniques to data analysis problems.
The group has enjoyed successful collaboration with several external
organisations in industry and medicine on a variety of IDA research
projects, funded by government agencies, industrial sponsorships and
charity organisations. The group is to host the second IDA conference in
London this August.
Applicants should have at least a 2(i) in Computer Science or related
subject, with a good background in Artificial Intelligence or Statistics,
or a 2(i) in Chemical Engineering with strong computing background.
Please submit a CV as soon as possible, but not later than 31 March 1997,
to Dr X Liu, Department of Computer Science, Birkbeck College, Malet
Street, London WC1E 7HX, UK. Phone Dr Liu on 0171-631 6711 or email him
([email protected]) if you wish to make an informal enquiry.
Information regarding this project and research activities of the IDA
Group at Birkbeck can be accessed on the World Wide Web via URL:
http://web.dcs.bbk.ac.uk/~hui/IDA/home.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Derek Sleeman <[email protected]>
Date: Sun, 2 Mar 1997 15:01:52 GMT
To: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected], [email protected]
Cc: [email protected]
Subject: CHAIR VACANCY (for Posting)
Announcement of Post (Closing date: early MARCH)
University of Aberdeen
Chair of Computing Science
Applications are invited for the post of Professor of Computing
Science. The new Professor will play a key role in strengthening
the teaching and research activities of the Department of
Computing Science. The new Professor will provide academic
leadership in the development of the Department's existing areas
of interest, Artificial Intelligence and Databases. Candidates
should have an international reputation with an excellent record
of innovative research as measured by publications and grant
income. Applications from academics, research managers and others
from Industry and public sector Institutions will be considered.
Further, as the University of Aberdeen has recently made a major
research investment in the Institute of Medical Sciences, it would
be an advantage if the person had experience of working with
Medical/Healthcare professionals. The person appointed will be
expected to acquire a significant role in the management of the
Department.
Informal enquiries may be directed to Professor A R Forrester,
Vice-Principal and Dean of the Faculty of Science and Engineering:
Email: [email protected]
Tel: +44 (0)1224 272081
Fax: +44 (0)1224 272082
More details of the Department's research activities can be found
on our research pages at http://www.csd.abdn.ac.uk/research/index.html
or contact Professor Derek Sleeman, Head of Department:
Email: [email protected]
Tel: +44 (0)1224 272295/6
Fax: +44 (0)1224 273422
For further particulars of this post, see:
http://www.csd.abdn.ac.uk/people/chair_fp.html
|
410.19 | 97:10 | IJSAPL::OLTHOF | Spellchecked Henry Although | Fri Mar 21 1997 14:47 | 793 |
| Knowledge Discovery Nuggets 97:10, e-mailed 97-03-19
News:
* J. Brown, Report on DM Summit in San Francisco, Feb 18-21, 1997
* B. Pearlmutter, Abbadingo One: DFA Learning Competition
http://abbadingo.cs.unm.edu/
Siftware:
* K. Schirmer, smart information services GmbH,
http://www.newscan-online.de
Positions:
* G. John, IBM DATA MINING ANALYST POSITIONS,
http://www.ibm.com/bi
* B. Perry, HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java)
http://www.wins.hrl.com
Meetings:
* M. Bramer, Expert Systems 97: Call for Papers
http://www.sis.port.ac.uk/sges/es97.html
* M. Smyth, Hinton-Jordan Learning Methods Tutorial, May 1997,
http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
* L. De Raedt, Final call for IJCAI-97 Workshop on
Frontiers of inductive logic programming
* S. Dzeroski, ILP-97: CFP Reminder
http://www-ai.ijs.si/SasoDzeroski/ilp97.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining
and Knowledge Discovery community, focusing on the latest research and
applications.
Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject
line (and a URL) to [email protected]. Please keep meeting announcements
short and put all the details on the meeting web page !
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a
wealth of other information on Data Mining and Knowledge Discovery are available
at the Knowledge Discovery Mine site http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ************************************
All opinions expressed herein are those of the contributors and not necessarily
of their respective employers or of KD Nuggets
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Knowledge is the antidote to fear
Ralph Waldo Emerson
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Mar 1997 21:12:01 -0600
From: "J.P.Brown" <[email protected]>
Subject: Second Annual Data Mining Summit
The Second Annual Data Mining Summit was held February 19-21, 1997,
at the San Francisco Regency Hyatt. As I was not at every session, this
is a generalization - no names, no pack drill.
The majority of the delegates were from the United States and Canada.
Nine other countries were represented, from Europe, South America and
Asia. There were presentations all the way from the "Biggies" to the
"Start-Ups". From the Past to the Present, there were papers on
specific Data Mining techniques, and much reliance on subjective
approaches. A thought-provoking paper with present-day relevance
covered the Public Perception of Data-Mining. From the Present to the
Future, there were extensions to accepted ideas and some concepts
moving toward a more controversial emphasis on objectivity.
The Basics, and some Specialties, were covered in detail, and
attention was paid to the Dimensions of Decision Support and to
On-Line Analytical Processing, both subjects of great importance.
Some intensely practical, no-nonsense success stories were presented,
and some novel perspectives on iterative "living" processes.
As well as successful Data Mining examples, Limitations, Challenges
and Possible Pitfalls were pointed out. Solutions were suggested.
Before these demonstrably useful techniques can become the work
horses of the future, a new generation of Tool Support must prove
itself to be effective. This has begun to happen, and the competition
between these new user-friendly applications will be interesting to
participate in.
Little attention was paid to variation with the passage of time.
There seems to be a prevalent assumption that "situations" will not
change. This is "writing the history of the future" as opposed to the
approach which starts off by "predicting the past", and then keeps a
constant, trigger-happy lookout for significant change.
The approaches which were considered varied from simple functions,
to Algorithms, to Genetic Algorithms. Complex hybrid populations
could be separated in several ways. Rules could be used, and
Artificial Neural Nets. Agents could do it, if they were made to be
versatile enough. Visualization was important because we can "think
with our eyes". Some of you will know that I am of the "all of the
above" school.
From my own personal point of view, the Data Mining Summit was
encouraging. The next move will be to put the pieces together, and to
consciously emphasize our goals. Those who want to know more about
the "all of the above" school, could try http://www.hal-pc.org/~jpbrown
and then let me know what they think.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 9 Mar 97 23:45 MST
From: "Barak Pearlmutter" <[email protected]>
To: [email protected]
Subject: Abbadingo One: DFA Learning Competition
Thought database miners might want to whet their teeth on these little
datasets. Although neither as big nor as lucrative as the big boys, they
are a bit more controlled, and give an opportunity to test an algorithm
against all the competition.
Abbadingo One: DFA Learning Competition
Announcement
&
Call for Participation
In order to encourage the development of better grammar induction
algorithms, the Abbadingo One competition will award at least $1,024 to
the designer of the system that is most successful at discovering the
structure of random deterministic finite automata, as assessed by a
graded series of nine benchmark problems. The competition ends on
15-Nov-1997.
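To make the task concrete, here is a minimal sketch of the kind of data a DFA learner works from: a hidden random automaton and a set of strings labelled by whether it accepts them. The state count, the binary alphabet, the sampling scheme and all function names are assumptions for illustration only; the announcement does not describe the competition's actual data format.

```python
import random

def random_dfa(n_states, alphabet="01", seed=0):
    """Build a random DFA: a transition table plus a set of accepting states."""
    rng = random.Random(seed)
    delta = {(q, a): rng.randrange(n_states)
             for q in range(n_states) for a in alphabet}
    accepting = {q for q in range(n_states) if rng.random() < 0.5}
    return delta, accepting

def accepts(dfa, string, start=0):
    """Run the DFA on a string and report whether it ends in an accepting state."""
    delta, accepting = dfa
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in accepting

def labelled_sample(dfa, n_strings, max_len, alphabet="01", seed=1):
    """Draw random strings and label each with the hidden DFA's verdict."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_strings):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        sample.append((s, accepts(dfa, s)))
    return sample

if __name__ == "__main__":
    target = random_dfa(n_states=8)  # hypothetical problem size
    for s, label in labelled_sample(target, n_strings=5, max_len=10):
        print(repr(s), label)
```

A learner would see only the labelled strings and would have to propose an automaton that labels unseen strings the same way as the hidden target.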
This competition is being sponsored by, among others,
* The Computer Science Department at the University of New Mexico,
which is providing computational support.
* The Kluwer Academic journal "Machine Learning," which will give
priority treatment to a paper describing the award winning algorithm.
* The Santa Fe Institute, which will host the award ceremony.
* The "Journal of Artificial Intelligence Research."
For details retrieve http://abbadingo.cs.unm.edu/
Good luck, and may the best algorithm win!
--
Competition Kevin J. Lang <[email protected]>
organizers: Barak A. Pearlmutter <[email protected]>
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
Date: Tue, 11 Mar 1997 21:30:46 +0100
From: Kai Schirmer <[email protected]>
Subject: smart information services GmbH
Hello!
We would like to introduce ourselves and are interested in being listed
in your company overview on data mining and knowledge discovery.
Formed in early 1995, smart information services GmbH is located in
Potsdam near Berlin in Germany. The company's activities center on
application development, service and research using advanced information
technologies in the area of Intelligent Information Retrieval.
Smart information is currently developing a news categorizing and
filtering system (newscan) using advanced text processing techniques.
Further activities focus on fact extraction from financial news and
automated classification of news from business news wires for signaling,
filtering and routing tasks.
The newscan news filtering system and service offers business
professionals a smart, easy and cost-effective way of gaining current
awareness in a rapidly changing world. A true knowledge exchange
company, smart information provides electronic information services
which intelligently interconnect content providers and subscribers.
Its interactive, customized services include newscan for corporate
workgroups and enterprises. Newscan is a premium business intelligence
service customized to the specific needs of clients that focuses on the
industry news that's critical to their business. It provides customers
with "custom-tailored" news based on a profile that describes their
markets, news needs and specialized interests. Using advanced filtering
techniques, newscan selects highly relevant news by scanning some 3,000
to 4,000 German and English news items daily and delivers only those
relevant to each customer in time for each business day.
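As a rough illustration of profile-based filtering in general, the sketch below scores each news item by the share of a customer's profile terms that it mentions and keeps the items above a threshold. This is not newscan's method, which the announcement does not describe; the scoring rule, the threshold and the function names are all assumptions.

```python
import re

def tokens(text):
    """Lower-case a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevance(profile_terms, news_text):
    """Fraction of profile terms that occur in the news item (0.0 to 1.0)."""
    profile = {t.lower() for t in profile_terms}
    if not profile:
        return 0.0
    return len(profile & tokens(news_text)) / len(profile)

def filter_news(profile_terms, items, threshold=0.5):
    """Keep items scoring at or above the threshold, most relevant first."""
    scored = [(relevance(profile_terms, item), item) for item in items]
    kept = [(score, item) for score, item in scored if score >= threshold]
    return [item for score, item in sorted(kept, key=lambda p: -p[0])]

if __name__ == "__main__":
    profile = ["steel", "merger", "germany"]  # hypothetical customer profile
    items = [
        "Steel merger announced in Germany",
        "Football results from Sunday",
        "German steel output rises",
    ]
    for item in filter_news(profile, items):
        print(item)
```

Exact term matching is deliberately naive here: "German" does not match the profile term "germany", which is one reason practical systems rely on more advanced text processing.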
Smart information is a partner in the Esprit project ECRAN. ECRAN will
develop a new generation of Information Extraction (IE) applications, to
be included in telematic services having a large textual content. ECRAN
will analyse free texts (initially, financial information from
specialised newswire services, and market information on the internet)
extracting information content. The information can be compared against
a model of user requirements so that the system can precisely identify
text of interest to a customer.
By using the results of the ECRAN project, specific financial, economic
and political information from standardised news will be extracted and
stored in a database format. The information extraction is based on
lexicon tuning technologies and sophisticated template handling. Once
stored in a database format the extracted facts can be analysed in
combination with time series.
Currently smart information is preparing a European research project on
information mining in heterogeneous environments. The main ideas are
described in the following.
In the past few years, the abundance of continuous data sources and the
connectivity offered by local and worldwide public and private networks
have grown at explosive rates while the price of bandwidth has steadily
fallen, and these trends show no sign of slowing. Thus, staggering
amounts of new information are continuously made available to private
users, business firms and professional operators. Extracting the
information relevant for a given business or position from an
overwhelming flood of data, and being able to use it for tactical and
strategic planning, as well as decision support on the fly, is vital for
business survival and leadership, but it is getting less and less
amenable to human handling. On the other hand, an ever increasing part
of current information flows passes through computer networks, which
makes them amenable to automatic filtering, processing and
interpretation. Together, these two situations demonstrate both the need
for and the feasibility of systems that filter and integrate information
from different data sources, some static and well structured (legacy
databases), some dynamic and with a variable degree of standardization,
from rigidly defined records, to multimedia documents, to free text,
speech and images.
Please link to our web-site "www.newscan-online.de".
Yours sincerely
Kai Schirmer
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 11 Mar 1997 20:43:00 -0800 (PST)
From: George John <[email protected]>
Subject: IBM DATA MINING ANALYST POSITIONS (please post/redistribute)
Help! We're drowning in work! IBM needs 10 more analysts for its
highly successful data mining group. Join our team of high-caliber
PhD's in an exciting multi-faceted career in data mining:
* Analyze data for customers using IBM's industry-leading data mining
products
* Interact directly with senior management at Fortune 500 companies
* Teach data mining classes to our customers and develop course materials
* Travel, see the world! (One member of our team just got back
from Paris, another is heading to Australia for two weeks... these
are not vacations, it's their job!)
* Interact with researchers and product developers, discuss ideas for
new data mining algorithms, new visualizations, and new features
for our products
* Assist sales reps in customer visits, be the "technical person" to
answer hard questions
* Work with the marketing group to help develop brochures, etc.
* Attend trade shows and conferences, learn more about the industry
and talk to customers
* Use SQL/AWK/PERL/SAS to process data (ooh, the excitement!)
The ideal candidate
* has an excellent understanding of the data analysis process and has
participated in several projects
* is strongly technically proficient in at least some areas of data
mining (background in statistics, machine learning, neural nets, or
pattern recognition, or related), with a desire to learn more
* has excellent communication and presentation skills
* is a self-starter, good at quickly becoming a productive member of
a team
* is a fast learner, can quickly become an expert in a new industry
and work with IBM consultants to productively apply data mining
* has some unix skills, knows enough AWK and PERL to be self-sufficient
in processing data
* has a good sense of humor, fun to work with, enjoys taking co-workers
out to dinner, insists on paying every time, etc...
Positions are available for both senior applicants (professors, PhD's,
MBA's, or 4+ years relevant business experience) and more junior members
(MS, BS, less job experience). Salaries are competitive, and based on
experience. The jobs are focused on business, but some amount of time
spent on research may be negotiated. IBM's data mining group is growing
quickly, and offers excellent career opportunities.
For more information on data mining at IBM, see the webpage for IBM
Global Business Intelligence Solutions (our parent organization) at
http://www.ibm.com/bi
Send resume to George H. John, [email protected].
ASCII (plain text) via email is *strongly* preferred.
Please put "DMJOBS-97:" then your name in the subject.
Hardcopy may be sent to
George H. John
IBM Almaden Research Center
650 Harry Rd / D2
San Jose, CA 95120-6099
FAX: 408-927-2100 (put "Attn: George John" on cover sheet)
IBM is an equal opportunity employer.
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 12 Mar 1997 15:25:34 -0800
From: [email protected] (Brad Perry)
Subject: HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java) http://www.wins.hrl.com
We are currently looking to fill an intern or part-time position for a PhD candidate at Hughes Research Laboratories (HRL). The position will be a summer internship capable of extending into a part-time position during the school year. HRL is located in Malibu, CA and is the central research lab for Hughes Electronics Corporation.
Our group is investigating the use of agent, data mining, and database technologies to support information management, discovery, and analysis in large-scale dynamic Internet environments.
Our two primary research areas involve:
* Information exploitation techniques to effectively identify and disseminate semantically relevant information to large user populations, especially with the use of satellite broadcast channels.
* Data mining techniques to extract, represent, and manipulate semantic cues from large-scale and distributed information sources.
The candidate should have a background in DAI, agent architectures, machine learning, and data mining. Experience with KQML, KIF, and/or Java is a definite plus. This position entails research and prototype development.
Required:
* PhD candidate in Computer Science (or related field)
* Good OO programming skills (implementation of prototypes will
be required).
* Unix programming background.
* Good oral and written communication skills.
Desirable:
* Machine Learning or Data Mining background
* Java programming experience (or C/C++, at least).
* Ontologies.
* Multidatabase systems.
* Distributed object systems (CORBA, RMI, etc.)
Please email your resume to Son Dao at [email protected], or mail to:
Son Dao
Hughes Research Laboratories
3011 Malibu Canyon Road
Malibu, CA 90265
HRL is an equal opportunity employer.
------
Brad Perry
Hughes Research Laboratories [email protected] (310) 317-5683
UCLA [email protected] (310) 206-4561
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Max Bramer" <[email protected]>
To: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
Date: Sun, 9 Mar 1997 17:05:52 +0000
Subject: Expert Systems 97: Call for Papers
Reply-to: [email protected]
BRITISH COMPUTER SOCIETY
SPECIALIST GROUP ON EXPERT SYSTEMS
ANNUAL CONFERENCE - EXPERT SYSTEMS '97 (ES97)
CALL FOR PAPERS
The 17th annual Conference of the British Computer Society Specialist Group
on Expert Systems, ES97, is being held at St. John's College, Cambridge
between 15th and 17th December 1997. The objective of the ES series of
conferences is to bring together researchers and application developers
from business, industrial and academic communities to discuss issues and
solutions to problems based on techniques derived from Artificial
Intelligence.
The Conference continues to build on the success of previous years, with a
two-track event containing fully refereed technical and applications
papers.
For the Technical Stream, contributions are invited in the form of papers
of up to 5,000 words on knowledge-based systems and related areas of
Artificial Intelligence. Papers representing original work on theoretical
and applied AI relating to: constraint satisfaction; intelligent agents;
knowledge engineering methods; machine learning; model-based reasoning;
verification and validation of KBS; natural language understanding;
case-based reasoning; knowledge discovery in databases; and other related
areas are welcome.
For the Applications Stream, contributions are invited in the form of
papers of up to 5,000 words presenting case studies of knowledge based
systems that address real-world problems such as: diagnosis, monitoring,
scheduling and selection. Most importantly, the papers should highlight the
critical elements of success and the lessons learned.
Papers submitted to both streams will be refereed and those accepted will
again be published in book form in the "Research and Development in Expert
Systems" and "Applications and Innovations in Expert Systems" series (for
the technical and application streams respectively).
To assist us with our planning of the conference, anyone intending to
submit a paper should provide a short abstract, with title, at the earliest
opportunity to the Conference Secretariat.
Authors should indicate the stream to which their papers are being
submitted. Please include your full name and postal address in any email
submissions.
Formatting instructions for papers will be sent as soon as the title and
abstract are received.
Four copies of papers should be submitted to arrive no later than Friday
20th June 1997. Submissions should be sent in paper form by post to the
Conference Secretariat.
Please note that presenters of submitted papers will be asked to cover
their costs of attending the conference by paying at the SGES members'
academic rate.
TUTORIALS & WORKSHOPS
The Conference Committee invites proposals for tutorials or workshops to be
presented on Monday 15 December. Proposals for full and half day tutorials,
from an individual or group of presenters should be directed in the first
instance to the Conference Secretariat.
EXHIBITION
A table top exhibition will run alongside the Conference. There will be a
limited number of spaces available and potential exhibitors are encouraged
to book early, as these will be on a first-come, first-served basis.
SPONSORSHIP
The Conference Committee is keen to make contact with any organisations who
may wish to sponsor the Conference, in whole or in part. Sponsorship of an
international conference such as ES97 will ensure the highest visibility
for the benefactor, both through the appearance of the company logo on all
promotional literature and in references to the Conference in all media
exposure prior to and after the event.
CONFERENCE COMMITTEE:
Conference Chair: Prof Max Bramer, University of Portsmouth, Southsea, PO4
8JF [email protected]
Deputy Conference Chair: Dr Ian Watson, University of Salford, Salford, M5
4WT [email protected]
Technical Programme Chair: Dr John Hunt, University of Wales, Dept of
Computer Science, Aberystwyth, Dyfed SY23 3DB [email protected]
Applications Programme Chair: Mrs Ann Macintosh, Artificial Intelligence
Applications Institute, Edinburgh, EH1 1HN [email protected]
CONFERENCE SECRETARIAT:
Ms. Kit Stones, The Conference Team
17 Spring Road
Kempston, Bedford MK42 8LS
Tel/Fax +44 (0)1234-302490
[email protected]
IMPORTANT DATES:
Title/Abstract notification: now
Full paper submission: 20 June 1997
Notification of acceptance: 8 August 1997
Camera ready papers due: 19 September 1997
WORLD WIDE WEB ADDRESS FOR CONFERENCE INFORMATION:
http://www.sis.port.ac.uk/sges/es97.html
_______________________________________________________
Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444 Fax: +44-(0)1705-844006
email: [email protected]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Hinton-Jordan Learning Methods Tutorial, May 1997
Date: Mon, 10 Mar 1997 06:09:19 -0500 (EST)
**************************************************************
*** ***
*** Learning Methods for Prediction, Classification, ***
*** Novelty Detection and Time Series Analysis ***
*** ***
*** Washington, D.C., May 2 -- 3, 1997 ***
*** ***
*** Geoffrey Hinton, University of Toronto ***
*** Michael Jordan, Massachusetts Inst. of Tech. ***
*** ***
**************************************************************
A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2 -- 3, 1997, at the Hyatt Regency on Capitol Hill, Washington
D.C. Space is available for up to 50 participants for the course.
The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data. These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection, time
series analysis, diagnosis, optimization, system identification and
control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.
The course will be devoted equally to the conceptual foundations of
recent developments in machine learning and to the deployment of these
tools in applied settings. Case studies will be described to show how
learning systems can be developed in real-world settings. Architectures
and algorithms will be presented in some detail, but with a minimum of
mathematical formalism and with a focus on intuitive understanding.
Emphasis will be placed on using machine learning methods as tools that can be
combined to solve the problem at hand.
WHO SHOULD ATTEND THIS COURSE?
The course is intended for engineers, data analysts, scientists,
managers and others who would like to understand the basic principles
underlying learning systems. The focus will be on neural network models
and related graphical models such as mixture models, hidden Markov
models, Kalman filters and belief networks. No previous exposure to
machine learning algorithms is necessary although a degree in
engineering or science (or equivalent experience) is desirable. Those
attending can expect to gain an understanding of the current
state-of-the-art in machine learning and be in a position to make
informed decisions about whether this technology is relevant to specific
problems in their area of interest.
COURSE OUTLINE
Overview of learning systems; LMS, perceptrons and support vectors;
generalized linear models; multilayer networks; recurrent networks;
weight decay, regularization and committees; optimization methods;
active learning; applications to prediction, classification and control
Graphical models: Markov random fields and Bayesian belief networks;
junction trees and probabilistic message passing; calculating most
probable configurations; Boltzmann machines; influence diagrams;
structure learning algorithms; applications to diagnosis, density
estimation, novelty detection and sensitivity analysis
Clustering; mixture models; mixtures of experts models; the EM
algorithm; decision trees; hidden Markov models; variations on hidden
Markov models; applications to prediction, classification and time
series modeling
Subspace methods; mixtures of principal component modules; factor
analysis and its relation to PCA; Kalman filtering; switching mixtures
of Kalman filters; tree-structured Kalman filters; applications to
novelty detection and system identification
Approximate methods: sampling methods, variational methods; graphical
models with sigmoid units and noisy-OR units; factorial HMMs; the
Helmholtz machine; computationally efficient upper and lower bounds for
graphical models
REGISTRATION
Standard Registration: $700
Student Registration: $400
Cancellation Policy: Cancellation before Friday April 25th, 1997, incurs
a penalty of $150.00. Cancellation after Friday April 25th, 1997, incurs
a penalty of one-half of Registration Fee.
Registration Fee includes Course Materials, breakfast, coffee breaks, and lunch.
On-site Registration is possible. Payment of on-site registration must
be in US Dollar amounts, by Money Order or Check (preferably drawn on a
US Bank account).
Those interested in participating should return the completed
Registration Form and Fee as soon as possible, as the total number of
places is limited by the size of the venue.
[edited for space]
ADDITIONAL INFORMATION
A registration form and hotel information
are available from the course's WWW page at
http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
Marney Smyth
E-mail: [email protected]
Phone: 617 258-8928
Fax: 617 258-6779
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 14 Mar 1997 16:47:02 +0100 (MET)
From: Luc De Raedt <[email protected]>
To: [email protected], [email protected]
Subject: Final CFP Frontiers of ILP Workshop at IJCAI
FINAL CALL FOR PARTICIPATION and PAPERS
IJCAI-97 Workshop on
FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING
Monday 25 August 1997
==========================================================================
GENERAL INFORMATION
The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan,
will take place on August 25, immediately prior to
the start of the main IJCAI conference.
TECHNICAL DESCRIPTION
Inductive logic programming (ILP) is a recent subfield of
artificial intelligence that studies the induction of first order formulae
from examples. The purpose of this workshop is twofold:
on the one hand, we wish to widen the scope of ILP
by investigating its relations to neighboring fields,
and on the other hand, we wish to make ILP more accessible
for researchers from neighboring fields.
The workshop therefore solicits papers
that lie at the frontiers of ILP with neighboring fields.
A non-exclusive list of interesting topics for the workshop includes :
* ILP and Software Engineering:
what has ILP to offer to Software Engineering ?,
and in what way can Software Engineering help to design ILP systems
and applications ?
* ILP for Knowledge Discovery in Databases : ILP aims
at learning complex rules involving multiple relations from small
databases, whereas KDD typically induces simple rules about a
single relation from a large database. Furthermore, ILP allows
background knowledge to be exploited in a variety of ways. Can KDD and
ILP be successfully combined ?
* ILP and Computational or Algorithmic Learning Theory :
though many results have been obtained concerning the learnability
of inductive logic programming, most of the results are negative
and most of the positive results are reducible to propositional learning
methods. Is there a mismatch of COLT with ILP ? and if so,
what can be done about it ?
* ILP versus propositional learning methods :
Since the very start of ILP, researchers and practitioners of
machine learning have wondered about the relation between
ILP and propositional learning methods. Theoretical and experimental
questions that arise include:
when to use ILP and when to use propositional learning methods ?
under what circumstances can ILP be reduced to propositional learning ?
what is the price to pay for using first order logic in
terms of efficiency ?
* ILP and Knowledge Representation : ILP has traditionally employed
computational logic to represent hypotheses and observations.
Alternative well-founded knowledge representation formalisms have received
little attention (with the exception of CLASSIC).
What can ILP learn from Knowledge Representation ?
and in what well-founded Knowledge Representation formalisms
is induction feasible ?
* ILP in multistrategy learning : Multistrategy learning
combines multiple learning strategies. What role can ILP
play for multistrategy learning ?
* ILP and Probabilistic reasoning: in contrast to
propositional learning methods, ILP has not used
probabilistic representations. How can ILP incorporate
such representations ? and how can it interact with
methods such as Bayes nets or Hidden Markov Models ?
* ILP for Intelligent Information Retrieval:
The rapid development of
the World Wide Web has spawned significant interest in intelligent
information retrieval. In particular, algorithms for
reliably classifying textual documents into given categories (like
interesting/uninteresting) would be useful for a wide variety of tasks.
Currently, most learning algorithms are not able to make use of
structural information like word order, successive words, structure of
the text, etc. Can ILP algorithms offer advantages over conventional
information retrieval or machine learning algorithms for this sort of
task?
* Applications of ILP in subfields of AI : ILP has been applied
to other subfields of AI, including natural language processing,
intelligent agents and planning.
Further applications of ILP within AI are solicited.
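The contrast drawn above between rules over multiple relations and rules about a single relation can be made concrete with a small sketch. It checks one hand-written first-order clause against labelled examples and background facts; a real ILP system would search a space of such clauses rather than test a single one. The family facts, the examples and the function names are all invented for illustration.

```python
# Background knowledge: one relation, parent(X, Y), stored as a set of facts.
parent = {("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("eve", "ann")}

# Labelled examples of the target relation grandparent(X, Z).
positives = {("ann", "cat"), ("ann", "dan"), ("eve", "bob")}
negatives = {("ann", "bob"), ("bob", "ann"), ("cat", "dan")}

def chained(x, z, facts):
    """Candidate clause: grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    people = {p for pair in facts for p in pair}
    return any((x, y) in facts and (y, z) in facts for y in people)

def direct(x, z, facts):
    """Simpler candidate with no intermediate variable: parent(X, Z)."""
    return (x, z) in facts

def consistent(covers, facts, pos, neg):
    """Accept a clause if it covers every positive and no negative example."""
    return (all(covers(x, z, facts) for x, z in pos)
            and not any(covers(x, z, facts) for x, z in neg))

if __name__ == "__main__":
    print("chained clause consistent:",
          consistent(chained, parent, positives, negatives))
    print("direct clause consistent:",
          consistent(direct, parent, positives, negatives))
```

The chained clause needs the existential variable Y that links two parent facts, which is exactly what an attribute-value rule over the (X, Z) pair alone cannot express.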
Both position papers about the relation of ILP to other fields, as well
as research papers that make specific technical contributions
are solicited. However, to stimulate discussion, it is expected
that each technical paper also clarifies the position
of ILP with regard to the neighboring field(s) it addresses.
In addition to the presentation of position and technical papers,
the workshop will also feature a panel discussion
on the frontiers of ILP and possibly an invited talk.
ORGANISERS
Luc De Raedt (chair and primary contact)
Saso Dzeroski
Koichi Furukawa
Fumio Mizoguchi
Stephen Muggleton
PROGRAMME COMMITTEE
Francesco Bergadano (Italy)
Luc De Raedt (co-chair, Belgium)
Saso Dzeroski (Slovenia)
Johannes Furnkranz (Austria)
Koichi Furukawa (Japan)
David Page (U.K.)
Fumio Mizoguchi (Japan)
Ray Mooney (U.S.A.)
Stephen Muggleton (co-chair, U.K.)
CALL FOR PARTICIPATION
Participation is open to all members of the AI Community.
However, to encourage interaction and a broad exchange of ideas
the number of participants will be strictly limited
(preferably under 30 and certainly under 40).
Participants will be selected on the basis of submissions.
Three types of submissions will be considered :
1) technical contributions (ideally, a 3 to 5 page extended abstract,
in the IJCAI Proceedings Format, 3000-4000 words),
2) position papers (ideally, a 1 to 3 page abstract
in the IJCAI Proceedings Format, 1000 - 3000 words)
3) a statement of interest (ideally, a one page motivation of why you
would like to participate, 300- 500 words)
Only submissions of type 1) and 2) will be considered
for presentation at the workshop and inclusion in the workshop notes.
Submissions should be received no later than April 1, 1997,
and must include first author's complete contact information,
including address, email, phone, and fax number. Though 1 April
is the hard deadline, the authors are encouraged to submit
their material by 24 March, in order to facilitate the reviewing process.
Double submissions with the ILP-97 Workshop (which is to take
place in Prague, September 1997) are allowed.
SUBMISSIONS
Submit papers by email (postscript) and surface mail (2 copies) to
Luc De Raedt
Dept. of Computer Science
Katholieke Universiteit Leuven
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
Email : [email protected]
IMPORTANT DATES
- Paper submission : 1 April
- Notification to Authors : 21 April
- Camera ready copy : the submissions themselves
will serve as camera ready copy
(submissions in the IJCAI Proceedings Style are strongly preferred,
see http://www.ijcai.org/ijcai-97/ for details)
PUBLICATION
The accepted submissions will be included in the workshop notes
to be distributed at the workshop.
Post-conference publication of a selection of the workshop papers
will be considered and discussed at the workshop.
COSTS
To cover costs, a fee of $US 50 will be charged,
in addition to the normal IJCAI-97 conference registration fee.
Attendees of IJCAI workshops will be required to register
for the main IJCAI conference.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: ILP-97: CFP Reminder
Date: Mon, 17 Mar 1997 15:49:23 +0100
From: Saso Dzeroski <[email protected]>
The Seventh International Workshop on
Inductive Logic Programming
17-19 September 1997, Prague, Czech Republic
The deadline for paper submissions is 31 March 1997.
-------------
Invited talks will include:
"Data Mining: Algorithms and Limitations" by Usama Fayyad,
"Complexity of Logic Programming" by Georg Gottlob, and
"ILP and CLP" by Jean-Francois Puget.
For more information see
http://www-ai.ijs.si/SasoDzeroski/ilp97.html
|
410.20 | 97:11 | IJSAPL::OLTHOF | Spellchecked Henry Although | Tue Apr 01 1997 09:39 | 904 |
| Knowledge Discovery Nuggets 97:11, e-mailed 97-03-28
News:
* GPS, KDD-97 Tutorials Program
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
* J. Wiegand, KDD tools/methods for detection of skin malignancies?
Publications:
* P. Vitanyi, Kolmogorov Complexity and Applications, 2nd ed.,
http://www.cwi.nl/~paulv/kolmogorov.html
* R. Caldwell, Special Issue and Competition on
Improving Generalization for Nonlinear Financial Forecasting Models
http://ourworld.compuserve.com/homepages/ftpub/call.htm
Positions:
* V. Petraglia, Thinking Machines, Consultant Positions
* M. Ramoni, Research Studentships at the Knowledge Media Institute
Meetings:
* G. Widmer, ECML-97 Preliminary Programme,
23-25 April 1997, Prague, Czech Republic
http://is.vse.cz/ecml97/home.html
* J. Han, SIGMOD-97 Data Mining Workshop, May 11, 1997
http://fas.sfu.ca/cs/conf/dmkd97.html
* W. Wothke, Chicago ASA Data Mining meeting, May 2, 1997
http://www.smallwaters.com/datamine
* GPS, Data Mining'97 : Increasing Corporate Performance,
Paris, June 2-4, 1997, http://www.datamining.org/events.htm
* S. Tafolla, XpertUser Conference, 2-5 November 1997, Boston,
http://www.XpertUser.com
--
Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining
and Knowledge Discovery community, focusing on the latest research and
applications.
Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject
line (and a URL) to [email protected]. Submissions may be edited for
brevity.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a
wealth of other information on Data Mining and Knowledge Discovery are available
at Knowledge Discovery Mine site http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
********************* Official disclaimer ************************************
All opinions expressed herein are those of the contributors and not necessarily
of their respective employers or of KD Nuggets
*****************************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first and simplest emotion which we discover in the human mind, is curiosity.
--Edmund Burke
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 27 Mar 1997, 17:12:15
From: GPS <[email protected]>
Subject: KDD-97 Tutorials
The KDD-97 conference will have a day of excellent tutorials by leading
researchers; many thanks to P. Smyth for putting it together.
See http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
for full details.
================================================================
KDD97 Tutorial Abstracts and Speakers
Tutorial 1: Data Mining and KDD: An Overview
Usama Fayyad, Microsoft Research and
Evangelos Simoudis, IBM.
We present a basic tutorial of this new and emerging area and
emphasize relations to constituent communities including statistics,
databases, pattern recognition, learning, and visualization. The
tutorial provides a basic overview of the KDD process for extracting
knowledge from databases and covers the basics of each step in the
process including: data warehousing, selection and cleaning,
data transformation, data mining, evaluation, and visualization.
We also cover a sampling of successful applications and outline
challenges and issues to be addressed.
-------------
Tutorial 2: Modelling Data and Discovering Knowledge
David Hand, Open University, UK.
Our aim is to extract knowledge from large bodies of data. The size of
these bodies means that we cannot do it unaided, but must use fast computers,
applying sophisticated statistical tools. Attempts to automate the process
of knowledge extraction date from at least the early 1980s, with the work on
statistical expert systems. We examine this work, noting its successes and
failures and, especially, what researchers in data mining and knowledge
discovery can learn from those efforts. We examine what data are, what
information is, and what knowledge is. We contrast modelling with
discovery, especially in the context of large data sets. We examine high
level modelling issues, such as overfitting, generalisability,
overmodelling, and model evaluation. And we examine high level exploration
issues such as the discovery of accidental artefacts. The confluence of
computing and statistics in some areas provides a nice backdrop against
which to examine these issues, and we briefly discuss neural networks and
classification trees from these two perspectives.<p>
<hr>
<h2> Tutorial 3: Text Mining - Theory and Practice</h2>
<h3> Ronen Feldman, Bar-Ilan University, Israel. </h3>
Knowledge Discovery in Databases (KDD) focuses on the computerized
exploration of large amounts of data and on the discovery of interesting
patterns within them. While most work on KDD has been concerned with
structured databases, there has been little work on handling the huge
amount of information that is available only in unstructured textual form.
In this tutorial we will present the general theory of Text Mining and will
demonstrate several systems that use these principles to enable interactive
exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used
by these systems. The systems that will be presented are KDT, which is a
system for Knowledge Discovery in Texts; FACT, which discovers associations
amongst keywords labeling the items in a collection of textual documents;
and the Text Explorer, which is a system that provides a high level language
for interactive exploration of textual collections.
We will present a general architecture for text mining and will outline the
algorithms and data structures behind the systems. We will give special
emphasis to incremental algorithms and to efficient data structures.
<p>
<hr>
<h2> Tutorial 4: Exploratory Data Analysis using Interactive Dynamic Graphics
</h2>
<h3> Deborah Swayne, Bell Communications Research
and Diane Cook, Iowa State University. </h3>
Researchers and software designers in the field of data mining
are just beginning to make extensive use of graphical methods.
Interactive dynamic data visualization has been explored
in the field of statistics for over twenty years, and we
propose that much of what has been learned in statistics is
relevant for data mining.
This class is an introduction to interactive data visualization as
it is practiced as part of exploratory data analysis. The XGobi
software, publicly available dynamic visualization software, will
be used in the analysis of examples from biology, business,
physics, engineering, and telecommunications.
The examples will illustrate a set of general visualization principles
which are embodied in specific methods such as brushing and
identification of points in simple scatterplots, three dimensional
rotations, rotations in higher dimensions such as the grand tour, and
directed searches in higher dimensions for interesting two dimensional
views using projection pursuit and manual control.
<p>
<hr><h2> Tutorial 5: Visual Techniques for Exploring Databases </h2>
<h3> Daniel Keim, University of Munich.</h3>
For data exploration to be effective, it is important to include the human in
the exploration process and combine the flexibility, creativity, and general
knowledge of the human with the enormous storage capacity and the
computational power of today's computers. Visual database exploration aims
at integrating the human in the exploration process, applying its perceptual
abilities to the large data sets available in today's computer systems. The
basic idea of visual data exploration is to present the data in some visual
form, allowing the human to get insight into the data and draw conclusions.
Visual data exploration techniques have proven to be of high value in
exploratory data analysis and they also have a high potential for exploring
large databases. Visual database exploration is especially powerful for the
first steps of the data mining process, namely understanding the data and
generating hypotheses about the data, but it may also significantly
contribute to the actual knowledge discovery by guiding the search using
visual feedback.
The goal of the tutorial is to show the potential of visualization technology
for exploring large databases. The tutorial provides an overview of the
state-of-the-art in data visualization and provides a classification of the
existing data visualization techniques. Besides describing each of the
classes, the tutorial focuses on new developments in data visualization,
which are relevant to the area of knowledge discovery, and describes a wide
range of recently developed techniques for visualizing large amounts of
arbitrary multi-attribute data which does not have any two- or
three-dimensional semantics and therefore does not lend itself to an easy
display. A detailed comparison shows the strengths and weaknesses of the
existing techniques and reveals potentials for further improvements. Several
examples demonstrate the benefits of visualization techniques for exploring
databases. The tutorial concludes with an overview of existing database
exploration and visualization systems, including research prototypes as well
as commercial products.
<p>
<hr><h2> Tutorial 6: OLAP and Data Warehousing</h2>
<h3> Surajit Chaudhuri, Microsoft Research and
Umesh Dayal, Hewlett Packard Labs. </h3>
On-Line Analytical Processing (OLAP) and Data Warehousing technologies
enable enterprises to gain competitive advantage by exploiting the
ever-growing amount of data that is collected and stored in corporate
databases and files for better and faster decision making. Over the
past few years, these technologies have experienced explosive growth,
both in the number of products and services offered, and in the extent
of coverage in the trade press. Vendors (including all database companies)
are paying increasing attention to all aspects of decision support.
The area opens up interesting research directions, with ties to past
work in database systems, but with different assumptions and
requirements. Only very recently, however, has the database research
community started to understand and address some of these issues.
This tutorial presents an overview of OLAP and data warehousing, and an
in-depth study of selected aspects. An outline of the tutorial follows:
1. Introduction: definitions, evolution, differences from OLTP, architectures
2. Models and Tools: conceptual model for OLAP,
front-end tools (e.g., multidimensional spreadsheets),
database design (e.g., star and snowflake schema).
3. Database Server technologies for Decision Support
Queries: specialized indexing techniques,
specialized join and scan methods,
data partitioning and use of parallelism,
intelligent processing of aggregates,
complex query processing,
extensions to SQL,
ROLAP vs. MOLAP.
4. Other Services for OLAP/Data warehousing:
data cleaning, loading and refresh,
tools for warehouse, system and process management,
metadata management and the role of repository.
5. State of Commercial Practice.
6. Research Issues.
The target audience is
researchers and developers interested in learning about the concepts,
products and the technical innovations in the area of decision support
technologies.
<p>
<hr><h2> Tutorial 7: Statistical Models for Categorical Response Data</h2>
<h3> William DuMouchel, AT&T Research. </h3>
This tutorial will survey the most common models and methods statisticians
use to fit and test relationships among categorical (discrete) data. Most
of these techniques are described in statistics texts such as
<i> Categorical
Data Analysis </i>, by Alan Agresti, (Wiley 1990) and are widely available in
popular computer packages such as SAS and Splus. Therefore it is almost de
rigueur for someone with a new classification technique to compare the
proposal to one or more of these standard methods. The tutorial will focus
on loglinear and logistic regression models, and related models such as
probit, poisson regression, and survival models. In the short time
available, priority will be given to explaining why these techniques are so
popular among statisticians, and to how the basic models have been extended
to handle variables having more than two categories or when some of the
variables have continuous or ordinal scales. Examples of model fitting,
model search and model comparison using SAS and Splus will be presented and
discussed.
For Biographical Information on Presenters
see the web site http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
Contact Information:
<a href="http://www.ics.uci.edu/~smyth"> Padhraic Smyth </a>
University of California, Irvine (KDD-97 Tutorials Chair).
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Fri, 21 Mar 1997 20:04:25 -0500 (EST)
I am searching for KDD tools/approaches for searching through clinical data to
help develop and fine-tune medical imaging or detection equipment.
Specifically, early detection of skin malignancies.
Perhaps there is a group somewhere working on this.
Thank you.
Best wishes,
Jeff Wiegand
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 19 Mar 1997 15:48:16 +0100
From: [email protected]
Ming Li and Paul Vitanyi,
AN INTRODUCTION TO KOLMOGOROV COMPLEXITY AND ITS APPLICATIONS,
REVISED AND EXPANDED SECOND EDITION, Springer-Verlag, New York, 1997,
xx+637 pp, 41 illus. Hardcover $49.95/ISBN 0-387-94868-6
(Graduate Texts in Computer Science Series)
After four years and two printings the second edition has now appeared. During
the preparation the book has been out of stock for a year. In interaction with
many readers and teachers of courses and seminars, all reported errors and
problems have been corrected. The book is revised and expanded by about
90 pages. The price has been *lowered* by over $9.
See the web page "http://www.cwi.nl/~paulv/kolmogorov.html".
From the ``PREFACE TO THE SECOND EDITION'':
When this book was conceived ten years ago,
few scientists realized the width of scope and the
power for applicability of the central ideas. Partially
because of the enthusiastic reception of the first edition,
open problems have been solved and new applications have been
developed. We have added new material on the relation between
data compression and minimum description length induction,
computational learning, and universal prediction; circuit theory; distributed
algorithmics; instance complexity; CD compression;
computational complexity; Kolmogorov random graphs;
shortest encoding of routing tables in communication networks;
resource-bounded computable universal distributions; average case properties;
the equality of statistical entropy and expected Kolmogorov complexity;
and so on. Apart from being used by researchers and
as reference work, the book is now commonly used for graduate courses
and seminars. In recognition of this fact, the second
edition has been produced in textbook style. We have
preserved as much as possible the ordering of
the material as it was in the first edition.
The many exercises bunched together at the ends of
some chapters have been moved to the appropriate sections.
The comprehensive bibliography on Kolmogorov complexity
at the end of the book has been updated, as have
the ``History and References'' sections of the chapters.
Many readers were kind enough to express their appreciation
for the first edition and to send notification of typos, errors,
and comments. Their number is too large to thank them individually,
so we thank them all collectively.
BLURB:
Written by two experts in the field, this is the only
comprehensive and unified treatment of the central ideas of
Kolmogorov complexity and their applications. Kolmogorov complexity is the
theory dealing with the quantity of information in individual objects.
Kolmogorov complexity is known variously as `algorithmic
information', `algorithmic entropy', `Kolmogorov-Chaitin
complexity', `descriptional complexity', `shortest program length',
`algorithmic randomness', and others.
The book is ideal for advanced undergraduate students, graduate students
and researchers in computer science, mathematics, cognitive sciences,
artificial intelligence, philosophy, statistics and physics.
The book is self contained in the sense that it contains the basic requirements
of computability theory, probability theory, information theory, and coding.
Included are also numerous problem sets, comments, source references and hints
to the solutions of problems, course outlines for classroom use, as well as a
great deal of new material not included in the first edition.
If you are seriously interested in using the text in a course,
contact Springer-Verlag's Editor for Computer Science, Martin
Gilchrist, for a complimentary copy.
Martin Gilchrist [email protected]
Suite 200, 3600 Pruneridge Ave. (408) 249-9314
Santa Clara, CA 95051
If you are interested in the text but won't be teaching a course,
we understand that Springer-Verlag sells the book, too.
To order, call toll-free 1-800-SPRINGER (1-800-777-4643); N.J.
residents call 201-348-4033. For information regarding
examination copies for course adoptions, write Springer-Verlag
New York, Inc., 175 Fifth Avenue, New York, NY 10010.
You can order through the Web site: "http://www.springer-ny.com/"
For U.S.A./Canada/Mexico- e-mail: [email protected] or fax an
order form to: 201-348-4505.
For orders outside U.S.A./Canada/Mexico send this form to: [email protected]
Or call toll free: 800-SPRINGER - 8:30 am to 5:30 pm ET (that's 777-4643 and
201-348-4033 in NJ). Write to Springer-Verlag New York, Inc., 175 Fifth Avenue,
New York, NY, 10010.
Visit your local scientific bookstore. Mail payments may be made by check,
purchase order, or credit card (see note below). Prices are payable in U.S.
currency or its equivalent and are subject to change without notice. Remember,
your 30-day return privilege is always guaranteed!
Your complete address is necessary to fulfill your order.
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Randall Caldwell <[email protected]>
Subject: CFP: Improving Generalization for Nonlinear Financial
Forecasting Models
Journal of Computational Intelligence in Finance
Call for Papers
Special Issue and Competition on
"Improving Generalization for Nonlinear Financial Forecasting Models"
The Journal of Computational Intelligence in Finance, a peer-reviewed
technical journal, published by Finance & Technology Publishing, is
seeking papers for review and publication in 1997 on "Improving
Generalization for Nonlinear Financial Forecasting Models". For
comparison of methods submitted, the target variable series and
performance metrics are specified (though not required).
PUBLICATION DATE
November 1997
PAPER SUBMISSION DEADLINE
June 30, 1997
MOTIVATION
The critical issue in applying neural networks and other data-driven
forecasting systems is generalization, the performance on data not used
for training. The key to generalization behavior is model complexity.
Too simple a model cannot approximate the true relationship, and overly
complex models adjust to the noise in the data. Nearly all financial
applications of nonparametric models (such as neural networks and genetic
algorithms) vary model complexity by adjusting the number of parameters.
This special issue intends to highlight other methods to improve
generalization, in particular regularization (e.g., neural network
weight decay and smoothing) and techniques for combining models. Of
particular interest are nonlinear methods including neural networks,
genetic algorithms, nearest neighbor networks, polynomial networks,
fuzzy logic, and hybrids.
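
As an illustration of the weight-decay idea mentioned above, here is a minimal sketch in Python. It is not taken from the call for papers: it assumes the simplest possible model (one weight, no intercept, squared-error loss plus an L2 penalty), and the function name `fit_with_weight_decay` is hypothetical. For that model the penalized least-squares weight has the closed form w = sum(x*y) / (sum(x*x) + weight_decay).

```python
def fit_with_weight_decay(xs, ys, weight_decay):
    """Fit y ~ w * x by minimizing sum((y - w*x)^2) + weight_decay * w^2.

    Hypothetical helper for illustration only. Setting the derivative of
    the penalized loss to zero gives w = sum(x*y) / (sum(x*x) + weight_decay).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + weight_decay)


# With no penalty the fit recovers the exact slope of this toy data;
# increasing the penalty shrinks the weight toward zero, which lowers
# effective model complexity without changing the number of parameters.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = fit_with_weight_decay(xs, ys, 0.0)
penalized = fit_with_weight_decay(xs, ys, 14.0)
```

The same principle carries over to neural networks, where the penalty is applied to all weights during training rather than solved in closed form.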
Nearly all studies apply cross-validation to select the best model.
Alternatives to cross-validation include 'analytical' selection rules
such as Akaike's Information Criterion, Schwarz's Information Criterion,
and a number of others. Of particular interest are the statistical
properties (i.e., bias and variance) of model selection methods in
estimating out-of-sample performance.
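
For readers unfamiliar with the analytical selection rules named above, the following Python sketch shows one common textbook form of the two criteria. It is an assumption, not part of the call: it uses the Gaussian-error versions, written up to an additive constant as AIC = n ln(MSE) + 2k and SIC = n ln(MSE) + k ln(n), where n is the number of observations, k the number of fitted parameters, and MSE the in-sample mean squared error. The function names are hypothetical.

```python
import math


def akaike_ic(n_obs, mse, n_params):
    """Akaike's criterion for Gaussian errors, up to an additive constant."""
    return n_obs * math.log(mse) + 2 * n_params


def schwarz_ic(n_obs, mse, n_params):
    """Schwarz's criterion (also called BIC), same assumptions as above."""
    return n_obs * math.log(mse) + n_params * math.log(n_obs)


# Two hypothetical candidate models fitted on the same 100 observations:
# the larger model fits slightly better in-sample but pays a larger penalty.
small = {"mse": 1.00, "k": 3}
large = {"mse": 0.98, "k": 10}
aic_prefers_small = akaike_ic(100, small["mse"], small["k"]) < akaike_ic(100, large["mse"], large["k"])
sic_prefers_small = schwarz_ic(100, small["mse"], small["k"]) < schwarz_ic(100, large["mse"], large["k"])
```

In both cases the model with the lower criterion value is selected; Schwarz's rule penalizes extra parameters more heavily than Akaike's whenever ln(n) exceeds 2, that is, for n of 8 or more.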
DATA, TARGET VARIABLES and PERFORMANCE METRICS
Data: daily prices of a financial time series (see below)
Target Variable: the relative difference in percent (RDP) between
today's closing price and the price five (5) days ahead
Performance Metrics: MSE (target); nRMSE and DS (to be used in the
analysis).
Participants are encouraged to use the forecast data, target variable and
performance metrics specified for this special issue, which are available
on the Web to those who submit a satisfactory abstract (including brief
biography) as outlined below. Participants are not restricted regarding
the data used as inputs to their predictors. Especially interesting
original methods using other forecast data, target variables and
performance metrics will also be considered.
The forecast series is derived from daily closing prices for a financial
time series. The target variable is the relative difference in
percent (RDP) between today's closing price and the closing price
five (5) days ahead. The date, the underlying price series and the
target variable series are all provided in the downloadable data file.
The target metric is the MSE. Also, authors' analysis should include
the normalized RMSE (RMSE normalized using the standard deviation of
actual RDP values), and Directional Symmetry (percentage of correctly
predicted directions with respect to the target variable).
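
The target variable and metrics described above can be sketched in Python as follows. This is a reading of the prose, not the official definition, which is given on the journal's web page. The assumptions are: RDP at day t is 100 * (p[t+5] - p[t]) / p[t]; nRMSE divides the RMSE by the population standard deviation of the actual RDP values; and Directional Symmetry counts a prediction as correct when its sign matches the sign of the actual RDP. All function names are hypothetical.

```python
import math


def rdp(prices, horizon=5):
    """Relative difference in percent between p[t] and p[t + horizon] (assumed form)."""
    return [100.0 * (prices[t + horizon] - prices[t]) / prices[t]
            for t in range(len(prices) - horizon)]


def mse(actual, predicted):
    """Mean squared error between actual and predicted target values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the (population) standard deviation of the actual values."""
    mean = sum(actual) / len(actual)
    std = math.sqrt(sum((a - mean) ** 2 for a in actual) / len(actual))
    return math.sqrt(mse(actual, predicted)) / std


def directional_symmetry(actual, predicted):
    """Percentage of predictions whose sign agrees with the actual value (assumed form)."""
    hits = sum(1 for a, p in zip(actual, predicted) if (a >= 0) == (p >= 0))
    return 100.0 * hits / len(actual)


# Toy usage with made-up prices and predictions.
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 90.9]
actual = rdp(prices)            # two target values: about +10 and about -10
predicted = [5.0, -5.0]
scores = (mse(actual, predicted), nrmse(actual, predicted),
          directional_symmetry(actual, predicted))
```

An nRMSE below 1 means the predictor beats the trivial forecast of always predicting the mean of the actual series.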
The forecast data provided is separated into in-sample (10 years of
daily data) and out-of-sample (2 years of daily data) sets. Participants
are not restricted regarding the data used as input to their predictors.
However, all data used should be disclosed in the paper presentation,
including the details of all techniques and formulas used to pre-process
the data. Details on the predictor and the methods used for improving
generalization should be presented in the paper.
FORECAST HORIZON AND RE-TRAINING
Participants should test performance of their predictors over the entire
two-year out-of-sample dataset. Of interest are results of analyses and
performance of predictors over the entire two-year prediction period:
(1) without re-training and
(2) with re-training (optional).
The results from (1) and (2) can be useful for estimating the limits
of the forecasting horizon for the prediction methods presented.
For additional details on the forecast data, target variable and
performance metrics, see:
http://ourworld.compuserve.com/homepages/ftpub/call.htm
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Mar 1997 11:07:08 -0500
From: Vaughn Petraglia <[email protected]>
Subject: Thinking Machines, Consultant Positions
Thinking Machines Professional Services
Senior Consultant Data Mining
San Francisco bay area and other locations
3/12/97
As a member of the new Thinking Machines Professional Services
Organization, you will be responsible for all aspects of bidding and
delivering consulting products and service to many of our most important
customers. You will lead or participate in small teams of seasoned
professionals to help our customers use Darwin to find new business
opportunities hidden in their very large databases and data warehouses.
Major job functions include:
1. Working with a TMC Account Executive to understand the customer or
prospect's requirements, you will provide technical guidance through
the sales cycle.
2. Developing project plans, risk analyses, and formal services bids.
3. Organizing and managing all resources needed to complete the project
within budget and on time.
4. Providing hands on data analysis and data mining consulting.
5. Consulting and skills transfer on the Darwin product.
6. Follow-up to ensure customer satisfaction.
The ideal candidate will have:
1. Project management experience.
2. Excellent written and oral communications skills.
3. Advanced degree in an analytical field or equivalent experience.
4. Experience in data analysis, database systems, knowledge based systems
or data mining.
5. Experience in parallel algorithms and parallel computer systems is
desirable.
Contact: Vaughn Petraglia
[email protected]
Thinking Machines
14 Crosby Dr.
Bedford, MA 01730
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Mar 1997 18:45:32 +0000
From: Marco Ramoni <[email protected]>
Subject: Research Studentships at the Knowledge Media Institute
The Knowledge Media Institute (KMi) is home to internationally recognised
researchers in Educational Multimedia, Collaboration Technologies,
Artificial Intelligence, Cognitive Science, and Human-Computer Interaction.
KMi offers students an intellectually challenging environment with
exceptional research and computer facilities. We are currently seeking
applications for full-time, 3-year research studentships in the following
areas:
- Migratory Interfaces and Mobile Computing
- Virtual Intelligence and Knowledge Discovery
- Knowledge Management and Knowledge Modelling
- Sharing and Reusing Design Knowledge over the WWW
Applicants are typically expected to have a degree in computer science,
artificial intelligence, cognitive science, psychology, or a related
discipline. As KMi only accepts a very small number of research students
per year, admission is highly competitive. To apply, send a CV and short
project proposal (3 pages) along with a completed application form.
Successful candidates must be willing to live within reasonable commuting
distance from Milton Keynes, and be available to start on October 1, 1997.
Applicants are strongly encouraged to visit the KMi web site
(http://kmi.open.ac.uk/studentships) for more information on ongoing KMi
projects and the studentships.
An application form with further particulars can be obtained by contacting
Ms. Ortenz Rose by email ([email protected]), telephone (+44 (1908) 653
800) or post (Knowledge Media Institute, The Open University, Walton Hall,
Milton Keynes, MK7 6AA, UK). Informal advice on these studentships can be
obtained by contacting Dr. Tamara Sumner, admissions co-ordinator, by email
at [email protected] or by telephone at the number above.
Closing date for applications: 18 April 1997
Further particulars are attached below.
Virtual Intelligence and Knowledge Discovery
Marco Ramoni (KMi)
http://kmi.open.ac.uk/~marco
The Virtual Intelligence Project and the Knowledge Discovery Project at the
Knowledge Media Institute seek a candidate PhD student to work at the
intersection of their areas of research. The Virtual Intelligence Project
focuses on the development of distributed Artificial Intelligence
applications over the World Wide Web. The Knowledge Discovery Project
investigates probabilistic and statistical methods to extract reusable
knowledge sources from databases. The PhD project will fall into their
joint effort to develop a distributed knowledge discovery architecture over
the World Wide Web. The successful candidate will be able to choose a
research topic among a variety of key issues underlying this research,
ranging from methodological aspects of knowledge extraction and distributed
artificial intelligence to design and development issues of the
architecture.
More information on the Virtual Intelligence Project is available at:
http://kmi.open.ac.uk/~marco/projects/wai/vip
More information on the Knowledge Discovery Project is available at:
http://kmi.open.ac.uk/~marco/projects/kdd
For more information on this studentship, contact Marco Ramoni at
[email protected].
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Mar 1997 15:21:43 +0100 (MET)
From: Gerhard Widmer <[email protected]>
Subject: ECML-97 Preliminary Programme
9th EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97)
23-25 April 1997, Prague, Czech Republic
PRELIMINARY PROGRAMME
Up-to-date information on the conference (including registration information)
can be found at
http://is.vse.cz/ecml97/home.html
This programme with complete abstracts of all talks and links to the
workshops is also available at
http://www.ai.univie.ac.at/ecml/programme.html
-----------------------------------------------------------------------------
--------------------
WEDNESDAY, APRIL 23:
9.00 - 9.30 Welcome
9.30 - 10.30 INVITED TALK:
Uncertain Learning Agents
Stuart Russell, University of California, Berkeley, USA
10.30 - 11.00 Coffee Break
11.00 - 11.30 Integrated Learning and Planning Based on
Truncating Temporal Differences
Pawel Cichosz
11.30 - 12.00 Finite-Element Methods with Local Triangulation Refinement
for Continuous Reinforcement Learning Problems
Remi Munos
12.00 - 12.15 Learning and Exploitation Do Not Conflict
Under Minimax Optimality
Csaba Szepesvari
12.15 - 12.30 Exploiting Qualitative Knowledge to Enhance Skill Acquisition
Cristina Baroglio
12.30 - 14.00 Lunch
14.00 - 15.00 INVITED TALK:
Constructing and Sharing Perceptual Distinctions
Luc Steels, Free University of Brussels (VUB) and
Sony Computer Science Laboratory, Paris
15.00 - 15.30 Ibots Learn Genuine Team Solutions
Cristina Versino, Luca Maria Gambardella
15.30 - 16.00 Coffee Break
16.00 - 16.30 NeuroLinear: A System for Extracting Oblique Decision Rules
from Neural Networks
Rudy Setiono, Huan Liu
16.30 - 17.00 Learning Different Types of New Attributes by Combining the
Neural Network and Iterative Attribute Construction
Yuh-Jyh Hu
17.00 - 17.45 Commenting Session
-------------------
THURSDAY, APRIL 24:
9.00 - 10.00 INVITED TALK:
On Prediction by Data Compression
Paul Vitanyi, CWI, Amsterdam
10.00 - 10.30 Conditions for Occam's Razor Applicability and
Noise Elimination
Dragan Gamberger, Nada Lavrac
10.30 - 11.00 Coffee Break
11.00 - 11.30 Compression-Based Pruning of Decision Lists
Bernhard Pfahringer
11.30 - 11.45 Inductive Genetic Programming with Decision Trees
Nikolay I. Nikolaev, Vanio Slavov
11.45 - 12.00 Probabilistic Incremental Program Evolution:
Stochastic Search Through Program Space
Rafal Salustowicz, Juergen Schmidhuber
12.00 - 12.30 Constructing Intermediate Concepts by Decomposition
of Real Functions
Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko
12.30 - 14.00 Lunch
14.00 - 14.30 Global Data Analysis and the Fragmentation Problem in
Decision Tree Induction
Ricardo Vilalta, Gunnar Blix, Larry Rendell
14.30 - 15.00 Model Combination in the Multiple-Data-Batches Scenario
Kai Ming Ting, Boon Toh Low
15.00 - 15.30 Commenting Session
15.30 - 16.00 Coffee Break
16.00 - 17.00 Poster Session
17.00 - open ECML Community Meeting
-----------------
FRIDAY, APRIL 25:
9.00 - 9.15 A Case Study in Loyalty and Satisfaction Research
Koen Vanhoof, Josee Bloemer, Koen Pauwels
9.15 - 9.30 Inducing and Using Decision Rules in the
GRG Knowledge Discovery System
Ning Shan, Howard J. Hamilton, Nick Cercone
9.30 - 9.45 Learning When Negative Examples Abound
Miroslav Kubat, Robert Holte, Stan Matwin
9.45 - 10.00 Search-Based Class Discretization
Luis Torgo, Joao Gama
10.00 - 10.15 Classification by Voting Feature Intervals
Gülsen Demiröz, H. Altay Güvenir
10.15 - 10.30 A Model for Generalization Based on Confirmatory Induction
Nicolas Lachiche, Pierre Marquis
10.30 - 11.00 Coffee Break
11.00 - 11.30 Natural Ideal Operators in Inductive Logic Programming
Fabien Torre, Celine Rouveirol
11.30 - 12.00 Theta-subsumption for Structural Matching
Luc De Raedt, Peter Idestam-Almquist, Gunther Sablon
12.00 - 12.30 Induction of Feature Terms with INDIE
Eva Armengol, Enric Plaza
12.30 - 12.45 Metrics on Terms and Clauses
Alan Hutchinson
12.45 - 13.00 Learning Linear Constraints in Inductive Logic Programming
Lionel Martin, Christel Vrain
Afternoon off - trip and farewell party (optional; see social programme)
------------------
SATURDAY, APRIL 26:
ECML/MLNet WORKSHOPS:
WS 1: Data-Driven Learning of Natural Language Processing Tasks
WS 2: Case-Based Learning: Beyond Classification of Feature Vectors
WS 3: Learning in Dynamically Changing Domains:
Theory Revision and Context Dependence Issues
WS 4: Machine Learning and Human-Agent Interaction
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Jiawei Han <[email protected]>
Date: Tue, 18 Mar 1997 22:05:37 -0800 (PST)
Subject: SIGMOD'97 Data Mining Workshop: Call for Participation
Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'97)
in cooperation with ACM-SIGMOD'97
Tucson, Arizona, May 11, 1997
(URL: http://fas.sfu.ca/cs/conf/dmkd97.html)
PROGRAM
The workshop will be held one day before the SIGMOD/PODS'97 conference.
The program is as follows:
8:30--8:35 Opening Remarks
8:35--9:30 Invited Talk
9:30--9:45 Coffee Break
9:45--11:00 Session I Clustering/Classification
A Fast Clustering Algorithm to Cluster Very Large Categorical
Data Sets in Data Mining
Zhexue Huang
Clustering Based On Association Rule Hypergraphs
Eui-Hong Han, George Karypis, Vipin Kumar and Bamshad Mobasher
Ontology-based Induction of High Level Classification Rules
Merwyn G. Taylor, Kilian Stoffel and James A. Hendler
11:00--11:15 Coffee Break
11:15--12:30 Session II Applications
An efficient domain-independent algorithm for detecting
approximately duplicate database records
Alvaro E. Monge and Charles P. Elkan
An Application of Adaptive Data Mining: Facilitating
Web Information Access
Parvathi Chundi and Umeshwar Dayal
Efficient Roll-Up and Drill-Down Analysis for Large Data Sets
Min Wang and Bala Iyer
12:30--14:15 Lunch, Posters, Demos
14:15--15:30 Session III Association Rules
Mining Association Patterns from Nested Databases
Ke Wang
Maintenance of Discovered Association Rules: When to update?
S.D. Lee and David W. Cheung
Efficient Algorithms for Discovering Frequent Sets in
Incremental Databases
Ronen Feldman, Yonatan Aumann, Amihood Amir and Heikki Mannila
15:30--15:45 Coffee Break
15:45--17:00 Session IV Miscellany
Sharing Processing in Data Mining Systems
Arun Swami and Brian Lent
A Pattern Discovery Algebra
Alexander Tuzhilin
On the Complexity of Mining Temporal Trends
Jef Wijsen and Robert Meersman
17:00--18:00 Summary Discussion
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 09:29:50 -0600
From: Werner Wothke <[email protected]>
Subject: Chicago ASA Data Mining Conference, May 2, 1997
The Chicago Chapter of the American Statistical Association is
presenting a Data Mining conference on May 2, titled
A Hard Look at Data Mining
The idea of the conference is to peel away most of the hype and present
the local statistical and data analysis community with some solid
technical and statistical information. A web site with additional
information can be found at
http://www.smallwaters.com/datamine
With best wishes,
Werner Wothke
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Paris Data Mining'97 Event, June 2-4
See http://www.datamining.org/events.htm for full information
<h2 align=center>Data Mining'97 : Increasing Corporate Performance</h2>
<h2 align=center>Meridien Montparnasse Hotel, Paris, June 2-4, 1997</h2>
<h3>THE DATA MINING MARKET : TRENDS AND EVOLUTION</h3>
<dl>
<li>Market and players
<li>Perspectives and trends : Data Mining in 2000 and beyond
<li>Mining the Net : maximizing external data retrieval and analysis
<li>Data Mining and the law : situation and perspectives
</dl>
<h3>INTRODUCTION TO DATA MINING</h3>
<dl>
<li>More than a media phenomenon, what are the real issues for data mining ?
<li>Corporate data bases : retrieval and output
<li>The latest technologies
<li>Technology-human interface
</dl>
<h3>DATA MINING BEST PRACTICE</h3>
<dl>
<li>Data warehousing, On Line Analytical Processing and data mining
<li>Data and their representation for data mining
<li>Optimizing access to stored information
<li>Utilizing data mining to further management strategies
<li>Using data mining to measure corporate performance
</dl>
<h3>DATA MINING APPLICATIONS</h3>
<dl>
<li>Direct marketing and data mining : customer satisfaction and retention
<li>Geomarketing and data mining
<li>Marketing strategy and data mining : optimizing a commercial network
<li>Finance and data mining : credit management and risk assessment
<li>Adapting to changing markets through implementing data mining processes in all fields of business
</dl>
<p><strong>A unique opportunity to meet your potential customers and peers and hear the latest from the competition !</strong></p>
<p>This forum will be a premier opportunity to network & exchange business cards with CEOs, VPs, and managers of :
<dl>
<li>Finance
<li>Marketing
<li>Sales
<li>Strategic Planning
<li>Information Systems
<li>Advertising above and below the line
</dl>
In the fields of :
<dl>
<li>Financial services
<li>Insurance
<li>Mail order companies
<li>Retail
<li>Healthcare
<li>Computing, Telecommunications
<li>Government
<li>Transport and logistics
</dl>
</p>
<p><strong>This Conference will be a premiere in Europe. Come join us in Paris!</strong></p>
<p>For further information and registration, please contact us at <a href='mailto:[email protected]'>[email protected]</a></p>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 23 Mar 1997 17:05:26 -0800
KNOWLEDGE ACCELERATION
The 1997 XpertUser Conference
2 - 5 November 1997
Boston, Massachusetts
http://www.XpertUser.com
In support of its XpertRule(r) and Profiler(tm) products, Attar Software
announces its 1997 XpertUser Conference entitled: "Knowledge
Acceleration." The Conference, to be held in Boston, MA, 2 - 5 November 1997, features a keynote
address by Professor Donald Michie, a pioneer in the field of Machine
Intelligence. In addition, there are planned tutorials on data mining
and knowledge engineering as well as application demonstrations, and
technical sessions with Dr. Akeel Al-Attar, and other experts from Attar's
world-wide customer base. The Conference web page is at http://www.XpertUser.com.
The registration fee is $695 until 1 July, when it increases to $895.
|
410.21 | 97:12 | IJSAPL::OLTHOF | Spellchecked Henry Although | Wed Apr 23 1997 12:45 | 459 |
| Knowledge Discovery Nuggets 97:12, e-mailed 97-04-10
News:
* E. Colet, Advanced Scout News --
http://www.nextstep.com/new_this_week/120/advancedscout.html
* A. Andrusiewicz, Query -- Mining Association Rules
Publications:
* H. Motoda, Final CFP: IEEE Expert Special Issue on
Feature Transformation and Subset Selection
Siftware:
* O. Leng, WinViz for Excel,
http://jsaic.iti.gov.sg/projects/vizMain.html
Positions:
* W. Jones, Knowledge Discovery Research at U. of Alabama at Birmingham
(UAB), http://www.cis.uab.edu/info/kdrg/kdrg.html
* R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore
http://www.nsrc.nus.sg
Meetings:
* R. Tibshirani, Modern Regression and Classification course,
New York , June 23-24, 1997
http://stat.stanford.edu/~trevor/mrc.finance.html
* PADD97, Practical Application of Knowledge Discovery and Data Mining
Conference Program, London, 23-25 April 1997,
http://www.demon.co.uk/ar/PADD97/
* M. Conkling, Data Warehousing Best Practices & Implementation
Conference
Chicago May 27-June 1, 1997,
http://www.dw-institute.com/
* GPS, Data Mining'97 : Increasing Corporate Performance,
Paris, June 2-4, 1997, cancelled
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
To subscribe, see http://www.kdnuggets.com/subscribe.html
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No matter how neutral the topic, your message will offend SOMEONE.
Murphy's laws of BBS, thanks to
http://www.calweb.com/~logon/murphy.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Edward Colet"<[email protected]>
Date: Wed, 26 Mar 1997 16:30:56 -0400
Subject: Advanced Scout
Readers may be interested in some recent updates on the data mining/KDD
work of IBM Research's Advanced Scout Project (the data mining application
used in the National Basketball Association). These can be found in
newspapers, TV, the web and the SIGMOD/PODS schedule. Specifically, the
press coverage of Advanced Scout appeared in the Los Angeles Times,
2/17/97, page C4. Also, the TV show, "NextStep" showed a feature on
Advanced Scout that aired in the San Francisco area on 3/8/97. A broadcast
of this feature will air nationwide on the Discovery channel at a later
date. The URL for the NextStep feature called "Hard-wired Hoops" can be
found at : http://www.nextstep.com/new_this_week/120/advancedscout.html
Also available on the Web is an online posting containing the abstract and
bio for the keynote address on data mining at SIGMOD/PODS, 1997 to be given
by Inderpal. The URL is:
http://mundos.ifsm.umbc.edu/~ramesh/sigmod97/advprog.html
It is accessible from within both the SIGMOD and the PODS schedules.
Thanks,
Ed Colet.
*********************************************
IBM T.J. Watson Research Center
30 Saw Mill River Road
Hawthorne NY 10532
phone: 914-784-6621; tie-line 863
fax: 914-784-7455
email: [email protected]
*********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 27 Mar 1997 12:04:21 +1000 (EST)
From: Anna Andrusiewicz <[email protected]>
Hi,
I am working on a problem that may be related to mining generalized
association rules. The basic problem involves mining student enrolment
histories in order to figure out what subjects are being taken by what
kinds of students.
I would like to conduct a case study on the enrolments data I have, and
was wondering if anyone knows of a public domain system for mining
association, or multi-level association rules.
Any help offered will be much appreciated - thank you,
Anna Andrusiewicz
School of Information Technology
The University of Queensland, Australia
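
For readers new to the term, the following is a minimal sketch of what
mining association rules over enrolment records could look like. It is
not taken from any particular system: the subject codes, the thresholds,
and the function name `association_rules` are illustrative assumptions,
and only rules between pairs of subjects are considered (no multi-level
hierarchy).

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for single-item rules X -> Y found in a list of item sets."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of students taking both subjects
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y taken | x taken)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical enrolment histories: one set of subject codes per student.
enrolments = [
    {"DB101", "AI201", "STAT110"},
    {"DB101", "AI201"},
    {"DB101", "STAT110"},
    {"AI201", "STAT110"},
]
rules = association_rules(enrolments, min_support=0.5, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

Support filters out rare combinations, and confidence measures how often
the consequent subject appears among students who took the antecedent.
A full system would extend the same counting to larger item sets.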
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Subject: Final Call for Papers: IEEE Special Issue
Date: Sat, 29 Mar 97 17:13:06 +0900
Final Call For Papers
IEEE Expert Special Issue on
Feature Transformation and Subset Selection
Guest Editors: Huan Liu and Hiroshi Motoda
(edited for space ... see Nuggets 96:37 for full CFP
http://www.kdnuggets.com/nuggets/96/n37.html#item4)
III. SUBMISSION REQUIREMENTS and SCHEDULE
High quality, original papers that deal with real-world problems
are solicited. All the submitted manuscripts will be subject
to a rigorous review process. Manuscripts should be prepared in
accordance with the IEEE Expert "submission guidelines".
Manuscripts should be approximately 5,000 words long, preferably
not exceeding 10 references. This special issue is scheduled to
appear in late 1997.
Important Dates:
Submission April 30 (FIRM DEADLINE)
Notification June 30
Prospective authors should submit six copies of the completed
manuscript to one of the guest editors:
Huan Liu
S16 #4-17
Dept of Info Sys & Comp Sci
National University of Singapore
Kent Ridge, Singapore, 119260
[email protected]

Hiroshi Motoda
Institute of Scientific & Industrial Research
Osaka University
Ibaraki, Osaka 567, Japan
[email protected]
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 29 Mar 1997 12:08:21 +0800
From: Ong Hwee Leng <[email protected]>
Subject: WinViz for Excel
A version of WinViz which runs with Excel 7.0 on Win95 is available for
sale. WinViz is a multi-dimensional visualisation tool developed at the
Information Technology Institute. More info & self-running demos can be
found at
http://jsaic.iti.gov.sg/projects/vizMain.html
-Hwee-Leng Ong
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 24 Mar 1997 09:39:26 +0600
From: [email protected] (Warren Jones)
Knowledge Discovery Research at University of Alabama at Birmingham (UAB)
URL:http://www.cis.uab.edu/info/kdrg/kdrg.html
This multidisciplinary research group is concentrating on healthcare
applications, specifically on surveillance problems. The group consists
of representatives from Computer and Information Sciences, Pathology and
Health Informatics. A tool called Hawkeye has been developed which
searches temporally organized medical data, builds associations and
applies interestingness heuristics for the identification of trends of
interest to medical domain experts. Hawkeye is also an example of a
large scalable KDD system which requires the utilization of all stages
of the KDD process. One of the important surveillance problems being
investigated is the spread of antibiotic resistance.
This Group provides a very attractive opportunity for UAB computer
science graduate students to become involved in KDD research with a
medical emphasis. Four Ph.D. students are currently associated with the
Group and its on-going research. Graduate Assistantships are available
for prospective Ph.D. students who are interested in entering the
program in Fall 1997 with a research interest in the directions of the
Knowledge Discovery Research Group.
UAB is a comprehensive urban institution in Alabama's largest city,
which has a population of almost a million. Student enrollment exceeds
16,400, including more than 3,500 graduate students. The Academic Health
Center is well-known for its interdisciplinary biomedical research. The
computer science graduate program has an enrollment of 50, half of whom
are Ph.D. students. The campus encompasses a seventy-block area on
Birmingham's Southside, offering all of the advantages of a university
within a major city.
Warren T. Jones, Ph.D. Chair
Department of Computer and Information Sciences
University of Alabama at Birmingham
Birmingham, AL 35294-1170
Ph: (205)934-8657
Fax: (205)934-5473
[email protected]
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Robert Straughan <[email protected]>
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)
Staff Title: Group Leader - Senior Consultant, Commercial Applications
Date Required: 1 June 1997
Job Description: National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC). NSRC
currently facilitates services and solutions to the Singapore industry
in the field of Computer Aided Engineering, Chemical Applications and
Electronics. Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness. NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing. The Group Leader shall take
overall responsibility in promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and to
solicit for business. The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.
Skills Required: Minimum Masters Degree. Specialisation within the
field of Computer Science and Business Administration. At least 5
years experience from a financial institution or in retail marketing
within the field of Data Mining / Data Analysis. Extensive managerial
experience, in particular project management, business analysis and
negotiation skills. Strong knowledge of statistical analysis and
selection / building of appropriate modelling techniques to solve
business problems. A good understanding of the algorithms used in Data
Mining (neural networks, classifications etc.). Have previously used
IBM SP2 and tools such as Intelligent Miner and Darwin as well as
statistical packages such as SAS and SPSS.
Relocation assistance, allowances for housing, children's education and
transportation apply. Salary will be commensurate with qualifications
and experience.
You can obtain more details by contacting [email protected] or visit
our web site at http://www.nsrc.nus.sg.
Resumes can be sent to:
Administration Manager
NSRC
89 Science Park Drive
The Rutherford #01-05/08
Singapore 118261
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 23 Mar 97 22:45 EST
Subject: Modern Regression and Classification course - New York
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ +++
+++ Modern Regression and Classification: +++
+++ +++
+++ Statistical prediction methods for finance +++
+++ and marketing +++
+++ +++
+++ +++
+++ New York City: June 23-24, 1997 +++
+++ +++
+++ Trevor Hastie, Stanford University +++
+++ Rob Tibshirani, University of Toronto +++
+++ +++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
This two-day course will give a detailed overview of statistical models
for regression and classification. Known as machine-learning in
computer science and artificial intelligence, and pattern recognition
in engineering, this is a hot field with powerful applications in
finance, science and industry.
This course covers a wide range of models from linear regression
through various classes of more flexible models to fully nonparametric
regression models, both for the regression problem and for
classification.
This special version of our popular MRC course is tailored to financial
and marketing professionals.
Although a firm theoretical motivation will be presented, the emphasis
will be on practical applications and implementations, especially in
the finance and marketing areas. The course will include many examples
and case studies, and participants should leave the course well-armed
to tackle real problems with realistic tools. The instructors are at
the forefront in research in this area.
After a brief overview of linear regression tools, methods for
one-dimensional and multi-dimensional smoothing are presented, as well
as techniques that assume a specific structure for the regression
function. These include splines, wavelets, additive models, MARS
(multivariate adaptive regression splines), projection pursuit
regression, neural networks and regression trees. All of these can be
adapted to the time-series framework for predicting future trends from
the past.
The same hierarchy of techniques is available for classification
problems. Classical tools such as linear discriminant analysis and
logistic regression can be enriched to account for nonlinearities and
interactions. Generalized additive models and flexible discriminant
analysis, neural networks and radial basis functions, classification
trees and kernel estimates are all such generalizations. Other
specialized techniques for classification including nearest-neighbor
rules and learning vector quantization will also be covered.
Apart from describing these techniques and their applications to a wide
range of problems, the course will also cover model selection
techniques, such as cross-validation and the bootstrap, and diagnostic
techniques for model assessment.
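
As an illustration of the model selection idea mentioned above, here is
a minimal sketch of k-fold cross-validation for a straight-line
regression. It is not taken from the course materials: the function
names `fit_line` and `k_fold_mse`, the fold assignment, and the toy data
are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def k_fold_mse(xs, ys, k=5):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point goes to this fold
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_out]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in held_out]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (a + b * x)) ** 2 for x, y in test) / len(test)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Toy data on an exact line, so the held-out error should be essentially zero.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
cv_error = k_fold_mse(xs, ys, k=5)
print(cv_error)
```

Each point is predicted by a model that never saw it during fitting, so
the averaged error estimates performance on new data. Comparing this
figure across candidate models is one way to choose among them.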
Software for these techniques will be illustrated, and a comprehensive
set of course notes will be provided to each attendee.
Additional information is available at the Website:
http://stat.stanford.edu/~trevor/mrc.finance.html
************************************************************
Some quotes from past attendees:
"... the best presentation by professional statisticians I have
ever had the pleasure of attending"
"Superior to most courses in all aspects"
"I really liked how you emphasized concepts rather than
mathematical expressions"
"Your 2-day course has saved me months of research"
*************************************************************
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rob Tibshirani, Dept of Preventive Med & Biostats, and Dept of Statistics
Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats). FAX: 416 978-8299
computer fax 416-978-1525 (please call or email me to inform)
[email protected]. ftp://utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Mon, 31 Mar 1997 13:15:16 -0500 (EST)
Subject: PADD97
PADD97 - The First International Conference and Exhibition on
====================================================
The Practical Application of Knowledge Discovery and Data Mining
=========================================================
23rd April - 25th April 1997
REGISTRATION http://www.demon.co.uk/ar/Expo97/
INFORMATION http://www.demon.co.uk/ar/PADD97/
TUTORIALS
Usama Fayyad, Microsoft Research, USA
Evangelos Simoudis, IBM, USA
Data Mining and the KDD Process
Blaise Egan, Huw Roberts, BT Laboratories, UK
Knowledge Discovery - Practical Methodology and Case Studies
Luc De Raedt, Catholic University of Leuven, Belgium
Principles and Practice of Inductive Logic Programming
INVITED SPEAKERS
Stephen Muggleton, Oxford University, UK
Declarative Knowledge Discovery in Industrial Databases
Usama Fayyad, Microsoft Research, USA
Data Mining: Algorithms, Challenges and Limitations
Xindong Wu, Monash University, Australia
Building Intelligent Learning Database Systems
Stephen Pass, Red Brick Systems, UK
Data Mining and Data Warehouses - The Power of Integration
Neil Mackin, White Cross Systems, UK
The Application of WhiteCross MPP Servers to Data Mining
PRACTICAL APPLICATION EXPO97
==============================
CONFERENCE REGISTRATION
=========================
Westminster Central Hall, London, 21-25 April, 1997
PADD97 is part of The Practical Application EXPO97 which brings together
four events under one roof: PAAM97 - The Practical Application of
Intelligent Agents and Multi-Agents; PADD97 - The Practical Application of
Knowledge Discovery and Data Mining; PACT97 - The Practical Application of
Constraint Technology; and PAP97 - The Practical Application of Prolog.
REGISTRATION NOW AVAILABLE AT
http://www.demon.co.uk/ar/Expo97/
PLEASE VISIT OUR WEB PAGES FOR FURTHER INFORMATION ON
Programmes
Tutorials
Invited Talks
Exhibition
Venue
Hotel reservations
http://www.demon.co.uk/ar/PAP97/
http://www.demon.co.uk/ar/PACT97/
http://www.demon.co.uk/ar/PAAM97/
http://www.demon.co.uk/ar/PADD97/
The Practical Application Company
PO Box 137
Blackpool
Lancs FY2 9UN
UK
Tel: +44 (0)1253 358081
Fax: +44 (0)1253 353811
email: [email protected]
WWW: http://www.demon.co.uk/ar/TPAC/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 31 Mar 97 12:50:10 -0600 (CST)
From: Melinda Conkling <[email protected]>
Subject: Data warehousing event
Hi -- The Data Warehousing Institute (www.dw-institute.com) is holding its
Best Practices & Implementation Conference in Chicago May 27-June 1, 1997.
All conference information (including how to register) can be found
on-line.
Thanks! -- Melinda
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Paris Data Mining'97 Event, June 2-4 -- cancelled
I have been informed by Gaelle Piernikarch, organizer of the
above conference, that it has been cancelled and
may be rescheduled for fall.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
410.22 | 97:13 | IJSAPL::OLTHOF | Spellchecked Henry Although | Wed Apr 23 1997 12:46 | 655 |
| Knowledge Discovery Nuggets 97:13, e-mailed 97-04-16
News:
* GPS, new address for subscribing to KD nuggets,
[email protected]
* G. Prisco, Query: Knowledge Discovery in Network Alarm Databases
Publications:
* J. Fuernkranz, AAI Spec Issue on First-Order Knowledge Discovery
in Databases, http://www.ai.univie.ac.at/ilp_kdd/aai-si.html
* T. Anand, Review of "Seven Methods for Transforming Corporate Data
into Business Intelligence" by Vasant Dhar and Roger Stein
* S. Kaski, Thesis on data exploration with SOMs available,
http://nucleus.hut.fi/~sami/thesis/thesis.html
Siftware:
* L. Zoob, SemioMap, the Discovery Search Application
http://www.semio.com
* S.D. BYERS, new version of ace.glm for Splus
http://lib.stat.cmu.edu/S/ace.glm
Positions:
* R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore
http://www.nsrc.nus.sg
* N. Dayanand, Manager of the Data Analysis and Applications group
http://www.think.com
Meetings:
* J. Komorowski, PKDD'97 -- Preliminary symposium program,
http://www.idt.ntnu.no/pkdd97/
* ICML-Colt, ICML-97/Colt-97 call for participation
http://cswww.vuse.vanderbilt.edu/~mlccolt/
* X. Wu, CFP: IEEE Knowledge and Data Engineering Exchange
Workshop (KDEX-97), Nov 3, 1997, Newport Beach, CA, USA
http://www.sd.monash.edu.au/kdex-97
* M. Smyth, Hinton -- Jordan Learning Methods course:
spaces still available,
http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 is not equal to 3 - not even for very large values of 2.
Grabel's Law
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 16 Apr 1997 09:41:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: New address for subscribing to KD Nuggets --
[email protected]
Thanks to many of you for the good words about Nuggets.
Last week I completed the transfer of the Nuggets server
(now called Knowledge Discovery Nuggets rather than KDD Nuggets
to emphasize the broader scope) to the kdnuggets.com site.
To subscribe, please email [email protected] a
1-line message with
subscribe kdnuggets
(to unsubscribe, message should be unsubscribe kdnuggets)
See http://www.kdnuggets.com/subscribe.html for details.
Please address all submissions for Nuggets to [email protected] ;
Email to the old Nuggets address [email protected] will probably be forwarded to
[email protected] for some time, but it is better to send email to the
new address.
-- GPS
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 14 Apr 97 12:48:49 PDT
From: Giuseppe Prisco <[email protected]>
Subject: Knowledge Discovery in Switching Network Alarm Databases
We are interested in the application of KDD methods to a public switching
network alarm database. Our goal is to improve maintenance and severe alarm
prevention. Our research started by studying the TASA system and its
sequence analysis algorithm. Any help would be appreciated, in particular:
- suggestions, experiences etc.
- suggestions about (possibly free) software for searching for significant
sequences.
- contacts with any Italian University, in order to start a possible thesis
work on that topic.
Thank you
_________________________________________
Giuseppe Prisco - Software Analyst
Telesoft s.p.a SPR/SSCT
Via degli Agrostemmi, 30 S.Palomba - Roma 00040
tel 06/71035723
email [email protected]
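
To make the idea of "significant sequences" in an alarm log concrete,
here is a minimal sketch that counts how often one alarm type is
followed by another within a short time window. It is not the TASA
algorithm; the alarm names, the window length, and the function name
`frequent_followups` are assumptions made for illustration.

```python
from collections import Counter


def frequent_followups(events, window, min_count):
    """Count ordered pairs (a, b) where alarm b occurs at most `window`
    time units after alarm a, and keep pairs seen at least `min_count` times."""
    events = sorted(events)  # (timestamp, alarm_type) tuples in time order
    counts = Counter()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break  # later events are even further away
            if a != b:
                counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical alarm log: (seconds since start, alarm type).
alarms = [
    (0, "LINK_DOWN"), (5, "HIGH_BER"),
    (60, "LINK_DOWN"), (63, "HIGH_BER"),
    (200, "FAN_FAIL"),
    (300, "LINK_DOWN"), (304, "HIGH_BER"),
]
patterns = frequent_followups(alarms, window=10, min_count=2)
print(patterns)
```

A pair that recurs often, such as one alarm regularly preceding a severe
one, is a candidate rule for maintenance staff to review. Longer
sequences would need a more complete episode-mining method.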
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 01 Apr 1997 12:50:19 +0200
From: Johannes Fuernkranz <[email protected]>
2nd Call For Papers
Applied Artificial Intelligence
Special issue on
First-Order Knowledge Discovery in Databases
(URL: http://www.ai.univie.ac.at/ilp_kdd/aai-si.html)
A recent MLnet Workshop, held at the ICML-96, focussed on a discussion of
the potential contribution of ILP for KDD. Information on the workshop
including a short summary and all accepted papers can be found at
http://www.ai.univie.ac.at/ilp_kdd/. The general conclusion was that ILP
can be a valuable tool for data mining, its main advantages being the
expressiveness of first-order logic as a representation language and the
ability of many ILP systems to use strong language biases for restricting
the huge search space. ILP has a high flexibility in incorporating various
forms of background knowledge, which can be invaluable for large KDD tasks.
The special issue on "First-Order Knowledge Discovery in Databases" of the
Applied Artificial Intelligence Journal will thus welcome papers that focus
on one or more of the following topics:
* Embedding ILP into the KDD process
* Necessary pre- and post-processing steps for real-world applications
* Interfacing ILP systems with database managers
* Scalability of ILP for real-world databases
* Criteria for quantifying the complexity of ILP problems
* Evaluation of gain and price of ILP versus propositional learning
* Non-classification learning and discovery in a first-order framework
* Benefits of using background knowledge and/or strong explicit biases
* Innovative real-world applications of ILP
Papers on related subjects are also welcome, but a strong focus on
applications and database issues is required for all submissions.
see http://www.ai.univie.ac.at/ilp_kdd/aai-si.html for full details
on Submissions
Submission Deadline: April 30, 1997
[edited for space. GPS]
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Anand, Tej" <[email protected]>
Subject: book review for Nuggets
Date: Fri, 4 Apr 1997 16:58:14 -0500
Book Review: "Seven Methods for Transforming Corporate Data into Business
Intelligence" by Vasant Dhar and Roger Stein,
(Prentice-Hall, 1997).
(see http://www.prenhall.com/allbooks/be_0132820064.html for more
on this book. GPS)
It has been quite a while since I have been able to read a
technical/business book in its entirety, but recently I accomplished
this feat with "Seven Methods for Transforming Corporate Data into
Business Intelligence" by Vasant Dhar and Roger Stein. Usually I am
unable to complete a technical/business book because either it is so
high-level (and abstract) that I cannot appreciate how the material
would apply to me, or it is so detailed that I am totally lost "in the
trees".
Seven Methods... is different. This short book starts off by providing
a framework for representing objectives and requirements for
"intelligent systems" (systems that embed AI techniques or systems
that explicitly represent knowledge) using a business oriented
vocabulary. This framework not only helps select the "appropriate"
technique but also helps formulate the problem in a way that makes that
selection transparent. The business vocabulary helps explain the
selection to management and business types.
The book then describes seven data-intensive modeling techniques (tree
induction, analogical reasoning, fuzzy logic, rule-based systems,
neural nets, genetic algorithms, and OLAP) using the framework. While
these chapters are written to enable business-oriented people to get a
quick understanding of the techniques, they are also great for
technical folks because they can provide us knowledge about techniques
in which we are not experts. All techniques are treated with uniform
depth, which makes it a handy reference. The explanation of the
techniques is highly visual with almost every other page containing a
high quality graphic that explains how the techniques work. One
quibble: Chapter 10, titled Machine Learning, could have been more
aptly titled "Tree Induction".
The book ends with seven detailed (8-10 pages each) case studies of
successful applications of each of the techniques. Each case study is
described using the same framework. This is where the rubber meets the
road, and for the seven case studies selected the framework holds up
very well.
My only real complaint with this book is that it does not talk about using
multiple techniques together.
Btw: I felt this book was so well written that I promptly lent it to my
manager for weekend reading.
Disclaimer: Although we have never worked together, Roger Stein and I
for a brief time shared the same employer: Dun & Bradstreet, Roger at
Moody's and I at A.C. Nielsen. One of the case studies is about
Spotlight, a system with which I was associated.
-Tej Anand
NCR Corporation
Human Interface Technology Center
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 6 Apr 1997 21:54:10 +0300
From: Sami Kaski <[email protected]>
Subject: Thesis on data exploration with SOMs available
The following Dr.Tech. thesis is available at
http://nucleus.hut.fi/~sami/thesis/thesis.html (html-version)
http://nucleus.hut.fi/~sami/thesis.ps.gz (compressed postscript, 300K)
http://nucleus.hut.fi/~sami/thesis.ps (postscript, 2M)
The articles that belong to the thesis can be accessed through the page
http://nucleus.hut.fi/~sami/thesis/node3.html
Data Exploration Using Self-Organizing Maps
Samuel Kaski
Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200 (Rakentajanaukio 2C)
FIN-02015 HUT, Finland
Finding structures in vast multidimensional data sets, be they
measurement data, statistics, or textual documents, is difficult and
time-consuming. Interesting, novel relations between the data items
may be hidden in the data. The self-organizing map (SOM) algorithm of
Kohonen can be used to aid the exploration: the structures in the data
sets can be illustrated on special map displays.
In this work, the methodology of using SOMs for exploratory data
analysis or data mining is reviewed and developed further. The
properties of the maps are compared with the properties of related
methods intended for visualizing high-dimensional multivariate data
sets. In a set of case studies the SOM algorithm is applied to
analyzing electroencephalograms, to illustrating structures of the
standard of living in the world, and to organizing full-text document
collections.
Measures are proposed for evaluating the quality of different types of
maps in representing a given data set, and for measuring the
robustness of the illustrations the maps produce. The same measures
may also be used for comparing the knowledge that different maps
represent.
Feature extraction must in general be tailored to the application, as
is done in the case studies. There exists, however, an algorithm
called the adaptive-subspace self-organizing map, recently developed
by Kohonen, which may be of help. It extracts invariant features
automatically from a data set. The algorithm is here characterized in
terms of an objective function, and demonstrated to be able to
identify input patterns subject to different transformations.
Moreover, it could also aid in feature exploration: the kernels that
the algorithm creates to achieve invariance can be illustrated on map
displays similar to those that are used for illustrating the data
sets.
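[A minimal sketch of the basic SOM training loop mentioned in the abstract
above, for readers who have not seen it. It is not code from the thesis:
the function names, the linearly decaying learning rate and neighbourhood
width, and the toy data are all assumptions made for illustration.]

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=40, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}.

    Assumed schedule (not from the thesis): learning rate and neighbourhood
    width both shrink linearly over the run.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Start every map unit at a randomly chosen data sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood measured on the map grid, not in data space.
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy data: two well-separated groups of 2-D points (hypothetical).
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
som = train_som(group_a + group_b, rows=3, cols=3)
for x in group_a + group_b:
    print(x, "->", best_matching_unit(som, x))
```

Printing the best-matching unit for each sample is the simplest form of the
"map display" idea: samples that are close in the data space should land on
the same or neighbouring map units.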
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 1997 17:43:04 -0700
From: Laurie Zoob <[email protected]>
Subject: SemioMap, the Discovery Search Application
Semio Corporation, a newly formed start-up company, is using
computational semiotics to identify patterns and relationships in
text-based information on the internet and intranet. Using data
visualization, the relationships are automatically displayed in a
graphical, navigable map. There is a working alpha version/early beta
of the software at http://www.semio.com. The initial product is called
SemioMap, the Discovery Search application. SemioMap is targeted toward
the corporate intranet market.
We are currently seeking data mining, knowledge discovery and data base
oriented companies as development partners. If you are interested in
receiving more information, please email me at [email protected].
Best,
Laurie Zoob
Director, Business Development
--
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Laurie Zoob Phone: (415) 802-2943
Director Business Development Fax: (415) 802-2942
Semio Corporation Email: [email protected]
One Dolphin Drive http://www.semio.com
Redwood Shores, CA 94065
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 26 Mar 1997 13:07:39 -0800 (PST)
From: "S.D. BYERS" <[email protected]>
Subject: new version of ace.glm
Dear Splus and GLM users,
I have written a new version of ace.glm for Splus and it is
now available in the S archive at Statlib at
http://lib.stat.cmu.edu/S/ace.glm
This simple function performs the ACE transformation detection
algorithm for generalized linear models using the weighted linear model
obtained from the GLM at convergence of the fitting algorithm.
It generalizes ace.logit, ACE for logistic regression.
A paper describing ace.logit and its uses can be found at
http://www.stat.washington.edu/tech.reports/raftery-richardson.ps
These functions can be powerful tools in Generalised Linear Modelling.
The new ace.glm will work for any GLM that has a family defined in Splus.
It will also work for any link function defined for these families.
Previously, ace.glm worked only for the canonical link function.
By default, ace.glm will pleasantly plot your ACE output if a graphics
device is open.
I would like to hear about any use/abuse/errors that may arise.
Thanks,
Simon Byers,
University of Washington Statistics.
[email protected]
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Robert Straughan <[email protected]>
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)
Staff Title: Group Leader - Senior Consultant, Commercial Applications
Date Required: 1 June 1997
Job Description: National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC). NSRC
currently facilitates services and solutions to the Singapore industry
in the field of Computer Aided Engineering, Chemical Applications and
Electronics. Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness. NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing. The Group Leader shall take
overall responsibility in promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and to
solicit for business. The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.
Skills Required: Minimum Masters Degree. Specialisation within the
field of Computer Science and Business Administration. At least 5
years experience from a financial institution or in retail marketing
within the field of Data Mining / Data Analysis. Extensive managerial
experience, in particular project management, business analysis and
negotiation skills. Strong knowledge of statistical analysis and
selection / building of appropriate modelling techniques to solve
business problems. A good understanding of the algorithms used in Data
Mining (neural networks, classifications etc.). Have previously used
IBM SP2 and tools such as Intelligent Miner and Darwin as well as
statistical packages such as SAS and SPSS.
Relocation assistance, allowances for housing, children's education and
transportation apply. Salary will be commensurate with qualifications
and experience.
You can obtain more details by contacting [email protected] or visit
our web site at http://www.nsrc.nus.sg.
Resumes can be sent to:
Administration Manager
NSRC
89 Science Park Drive
The Rutherford #01-05/08
Singapore 118261
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 04 Apr 1997 14:41:09 -0500
From: Nalini Dayanand <[email protected]>
Subject: Job Announcement-Please post
THINKING MACHINES CORPORATION is a leading provider of knowledge discovery
software and services. TMC's high end datamining software suite enables
users to extract meaningful information from large databases. For more
information please see http://www.think.com. The company is seeking an
individual to join the development organization as Manager of the Data
Analysis and Applications group.
The manager of the data analysis and applications group will provide
leadership and individual contribution in the design, development and
deployment of data mining applications, prototypes and application
frameworks. Responsibilities include
* working with product marketing and clients to identify opportunities for
data mining applications
* providing leadership and individual contribution in requirements
definition and application/prototype/framework development
* organizing and managing a team of analysts, software engineers and
technology engineers responsible for the development of specific
applications/prototypes/frameworks
* providing feedback to the development organization on potential
enhancements to existing products
Experience in telecommunications and/or financial services is desirable
but not essential.
If your background and interests match these expectations, please send your
resume via fax, email or regular mail to
Nalini Dayanand
Thinking Machines Corporation
14 Crosby Drive
Bedford, MA 01730
Fax: (617) 276-0444
email: [email protected]
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Jan Komorowski <[email protected]>
Subject: PKDD'97 -- Preliminary symposium program
PKDD'97 -- 1st European Symposium on Principles of Data Mining and
Knowledge Discovery, Trondheim, Norway, June 24-27, 1997. Preliminary
symposium program and registration information:
http://www.idt.ntnu.no/pkdd97/
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 97 15:04:39 CDT
From: [email protected] (ICML-COLT Administration)
Subject: COLT/ICML
Call for Participation
  Tenth Annual Conference on           Fourteenth International
  Computational Learning Theory        Conference on Machine Learning
  (COLT-97)                            (ICML-97)
  July 6-9                             July 8-11
COLT/ICML Tutorials on July 8
ICML-affiliated Workshops on July 12
Vanderbilt University
Nashville, Tennessee, USA
The organizers of COLT-97 and ICML-97 invite you to participate
in one or both of these conferences. In hopes of encouraging
interactions between the learning theory and machine learning
communities, the conferences are loosely coupled by joint
tutorials, a day of joint technical sessions, a joint banquet,
and otherwise through co-location at Vanderbilt University in
Nashville, Tennessee.
Find all the latest information about COLT-97 and ICML-97 at
http://cswww.vuse.vanderbilt.edu/~mlccolt/, including lists
of papers to be presented, registration and housing material,
information on tutorials and workshops, invited speakers,
travel, and the like. You may also obtain registration and
housing material by writing to [email protected].
--------------------
Registration costs and applicable dates are:
             Early             Late
             (until June 2)    (after June 2)
COLT         $140              $180
ICML         $140              $180
COLT/ICML    $240              $310
--------------------
Registration for one of three ICML-affiliated Workshops
on
(1) reinforcement learning,
(2) automata induction, grammatical inference, and language
acquisition, or
(3) machine learning application in the real world
is $25 until June 2, and $35 after June 2.
--------------------
ICML-97 acknowledges generous support from the Daimler-Benz
Corporation. COLT-97 acknowledges generous support from
ATT and is held in cooperation with ACM SIGACT and SIGART.
Both conferences are sponsored by Vanderbilt University.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 11 Apr 1997 11:03:04 +1000 (EST)
From: [email protected] (Xindong Wu)
Subject: CFP: IEEE KDEX-97
1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
--------------------------------------------------------------------
Sponsored by the IEEE Computer Society and Co-located with
the 9th IEEE Tools with Artificial Intelligence Conference
November 3, 1997, Newport Beach, California, U.S.A.
===================================================
Call for Papers
The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases. The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals. The workshop is currently
scheduled for a one-day duration, but depending on the final program
it might be extended to a second day.
Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer. Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society. A selected number of the accepted papers will
possibly be expanded and revised for publication in the IEEE
Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the
International Journal of Artificial Intelligence Tools. Educational
material related to papers published in the IEEE-TKDE will be posted
on the IEEE-TKDE home page.
The theme of the workshop is "AI MEETS DATABASES". Topics of interest
include, but are not limited to:
- Computer supported cooperative processing and interoperable
systems
- Data sharing, data warehousing and meta-data management
- Distributed intelligent mediators and agents
- Distributed object management
- Dynamic knowledge
- Evaluation and measurement of knowledge and database systems
- High-performance issues (including architectures, knowledge
representation techniques, inference mechanisms, algorithms and
integration methods)
- Information structures and interaction
- Intelligent search, data mining and content-based retrieval
- Knowledge and data engineering systems
- Quality assurance for knowledge and data engineering systems
(correctness, reliability, security, survivability and
performance)
- Software re-engineering and intelligent software information
systems
- Spatio-temporal, active, mobile and multimedia data
- Emerging applications (biomedical systems, decision support,
geographical databases, Internet technologies and applications,
digital libraries, etc.)
All submissions should be limited to a maximum of 5,000 words. Six
hardcopies should be forwarded to the following address.
Xindong Wu (KDEX-97)
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia
Phone: +61 3 9903 1025
Fax: +61 3 9903 1077
E-mail: [email protected]
Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract. This cover page must accompany the paper.
************ I m p o r t a n t D a t e s *****************
* 6 copies of full papers received by: June 15, 1997 *
* acceptance/rejection notices: July 31, 1997 *
* final camera-readies due by: August 31, 1997 *
* workshop: November 3, 1997 *
************************************************************
Further Information
===================
WWW: http://www.sd.monash.edu.au/kdex-97
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Hinton -- Jordan Learning Methods course : spaces still available
Date: Thu, 10 Apr 1997 07:38:25 -0400 (EDT)
some spaces still available ...
**************************************************************
*** ***
*** Learning Methods for Prediction, Classification, ***
*** Novelty Detection and Time Series Analysis ***
*** ***
*** Washington, D.C., May 2 -- 3, 1997 ***
*** ***
*** Geoffrey Hinton, University of Toronto ***
*** Michael Jordan, Massachusetts Inst. of Tech. ***
*** ***
**************************************************************
A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2 -- 3rd, 1997, at the Hyatt Regency on Capitol Hill, Washington
D.C. Space is available for up to 50 participants for the course.
The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data. These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection,
time series analysis, diagnosis, optimization, system identification
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.
(edited for space)
ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at
http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
Marney Smyth
E-mail: [email protected]
Phone: 617 258-8928
Fax: 617 258-6779
|
410.23 | 97:14 | IJSAPL::OLTHOF | Spellchecked Henry Although | Thu Apr 24 1997 12:47 | 539 |
| Knowledge Discovery Nuggets 97:14, e-mailed 97-04-23
News:
* E. Bertino, Query: data mining from wafers manufacturing process ?
Publications:
* M. Ramoni, Technical Reports on Bayesian Knowledge Discovery,
http://kmi.open.ac.uk/~marco/projects/kdd
* Tom Mitchell, Text book for Data Mining: Machine Learning
http://www.cs.cmu.edu/~tom/mlbook.html
Siftware:
* R. Quinlan, Windows Version of C5.0 ("See5") Available Now
http://www.rulequest.com
* Stanley Rice, Postcoordinate Software
http://www.cruzio.com/~autospec/darwin.htm
* Pamela Lerwick, IDIS Special Release
http://www.datamining.com
Positions:
* R. King, Ph.D. Studentships in Data Mining at University of Wales, UK
* Fred J. Damerau, Research Associate in Text Mining/Information
Extraction
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more is available at Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Restlessness and discontent are the necessities of progress.
--Thomas A. Edison
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Thu, 17 Apr 1997 09:44:45 +0200 (METDST)
Subject: data mining from wafers manufacturing process
At our University, we are starting an application project
dealing with data from a wafer manufacturing process.
We are thinking of using data mining techniques
to try to address the following problem.
Some of those wafers are faulty. There is a database keeping track
of the entire manufacturing process for each wafer and collecting
a large amount of data concerning each step of the manufacturing
process (there are about 300 steps; each step is characterized
by about 100 parameters). Our problem is to use data mining techniques
to help the diagnosis, that is, to see which step
may have caused the problem.
I was wondering whether you are aware of any use of data mining
techniques for similar problems. We also have to acquire
some suitable data mining tools.
I would appreciate any suggestion you may give me on this
issue.
Best regards Elisa
----------------------------------------------------------------------------
Prof. Elisa Bertino
Dipartimento di Scienze dell'Informazione
Universita' di Milano
Via Comelico 39/41
20135 Milano (Italy)
tel: (+39)2-55006227
fax: (+39)2-55006253
e-mail: [email protected]
[email protected]
www http://mercurio.sm.dsi.unimi.it/~bertino/
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 9 Apr 1997 19:23:44 +0100
From: Marco Ramoni <[email protected]>
Subject: Technical Reports Available
The following reports are available on the World Wide Web. Further
information about the Bayesian Knowledge Discovery Project can be
reached at
http://kmi.open.ac.uk/~marco/projects/kdd
Marco
____________________________________________________________________________
Title: Efficient Parameter Learning in Bayesian Networks from
Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
TR number: KMI-TR-41
Date: January 1997
Keywords: Bayesian Belief Networks; Machine Learning,
Probabilistic Reasoning, Missing Data.
Abstract:
Current methods to learn conditional probabilities from incomplete
databases use a common strategy: they complete the database by
inferring somehow the missing data from the available information and
then learn from the completed database. This paper introduces a new
method - called bound and collapse (BC) - which does not follow this
strategy. BC starts by bounding the set of estimates consistent with the
available information and then collapses the resulting set to a point
estimate via a convex combination of the extreme points, with weights
depending on the assumed pattern of missing data. Experiments
comparing BC to Gibbs sampling are also provided.
WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-41-abstract.html
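[A rough illustration of the bound-then-collapse idea described in the
abstract above, reduced to a single discrete variable with no parents and
plain relative frequencies. This is an assumption-laden sketch, not the
authors' implementation: the function name, the frequency-based estimates
and the example counts are all hypothetical. The reports themselves deal
with conditional probabilities in a full network.]

```python
def bound_and_collapse(counts, n_missing, missing_weights=None):
    """Sketch of bound and collapse for one discrete variable.

    counts          -- observed count for each state
    n_missing       -- number of entries whose state is unknown
    missing_weights -- assumed probability that a missing entry falls in each
                       state; defaults to the observed proportions, which
                       corresponds to assuming data are missing at random
    """
    n_obs = sum(counts)
    n = n_obs + n_missing
    k = len(counts)

    # Bound: each state's frequency lies between "no missing entry is in this
    # state" and "every missing entry is in this state".
    bounds = [(c / n, (c + n_missing) / n) for c in counts]

    if missing_weights is None:
        missing_weights = [c / n_obs for c in counts]

    # Extreme point j: the distribution obtained by assigning all missing
    # entries to state j.
    extremes = [[(c + (n_missing if i == j else 0)) / n
                 for i, c in enumerate(counts)]
                for j in range(k)]

    # Collapse: convex combination of the extreme points.
    estimate = [sum(w * e[i] for w, e in zip(missing_weights, extremes))
                for i in range(k)]
    return bounds, estimate


# Hypothetical data: 30 and 10 observed cases in two states, 10 missing.
bounds, est_mar = bound_and_collapse([30, 10], 10)
_, est_skewed = bound_and_collapse([30, 10], 10, missing_weights=[0.0, 1.0])
print(bounds, est_mar, est_skewed)
```

In this simplified setting the missing-at-random weights reproduce the
observed proportions, while other assumed patterns of missingness move the
point estimate elsewhere inside the bounds without any iterative imputation.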
____________________________________________________________________________
Title: Learning Bayesian Networks from Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
Reference: Technical Report KMI-TR-43
Date: February 1997
Keywords: Bayesian Belief Networks, Bayesian Learning, Missing Data, Model
Selection
Abstract:
Bayesian approaches to learn the graphical structure of Bayesian Belief
Networks (BBNs) from databases share the assumption that the
database is complete, that is, no entry is reported as unknown. Attempts
to relax this assumption often involve the use of expensive iterative
methods to discriminate among different structures. This paper
introduces a deterministic method to learn the graphical structure of a
BBN from a possibly incomplete database. Experimental evaluations
show a significant robustness of this method and a remarkable
independence of its execution time from the number of missing data.
WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-43-abstract.html
____________________________________________________________________________
Title: The Use of Exogenous Knowledge to Learn Bayesian Networks
from Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
TR number: KMI-TR-44
Date: February 1997
Keywords: Information extraction, Uncertainty and noise in data,
Bayesian inference.
Abstract:
Current methods to learn Bayesian Networks from incomplete
databases share the common assumption that the unreported data are
missing at random. This paper describes a method - called Bound and
Collapse (BC) - to learn Bayesian Networks from incomplete databases
which allows the analyst to efficiently integrate the information
provided by the database and the exogenous knowledge about the pattern
of missing data. BC starts by bounding the set of estimates consistent
with the available information and then collapses the resulting set to
a point estimate via a convex combination of the extreme points, with
weights depending on the assumed pattern of missing data. Experiments
comparing BC to Gibbs sampling are also provided.
WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-44-abstract.html
____________________________________________________________________________
Title: Discovering Bayesian Networks in Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
1.Knowledge Media Institute, The Open University.
2.Department of Actuarial Science and Statistics, City University.
TR number: KMI-TR-46
Date: March 1997
Keywords: Information extraction, Uncertainty and noise in data,
Bayesian inference.
Abstract:
Bayesian Belief Networks (BBNs) are becoming increasingly
popular in the Knowledge Discovery and Data Mining community. A
BBN is defined by a graphical structure of conditional dependencies
among the domain variables and a set of probability distributions
defining these dependencies. In this way, BBNs provide a compact
formalism - grounded in the well-developed mathematics of
probability theory - able to predict variable values, explain
observations, and visualize dependencies among variables. During
the past few years, several efforts have been made to develop
methods able to extract both the graphical structure and the
conditional probabilities of a BBN from a database. All these
methods share the assumption that the database at hand is complete,
that is, it does not report any entry as unknown. When this
assumption fails, these methods have to resort to expensive iterative
procedures which are infeasible for large databases. This paper
describes a new Knowledge Discovery system based on an efficient
method able to extract the graphical structure and the probability
distributions of a BBN from possibly incomplete databases. An
application using a large real-world database will illustrate methods
and concepts underlying the system and will assess its advantages as
a Knowledge Discovery system.
WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-46-abstract.html
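[For readers new to BBNs, a minimal sketch of the definition given in the
abstract above: a graph of dependencies plus one probability table per
variable. The two-variable network, its variable names and its numbers are
invented for illustration and do not come from the report. Inference is
done by brute-force enumeration, which is only practical for tiny networks.]

```python
from itertools import product

# Graphical structure: each variable lists its parents (hypothetical network).
parents = {"Rain": [], "WetGrass": ["Rain"]}

# Probability that each variable is True, given the values of its parents.
cpt = {
    "Rain": {(): 0.2},
    "WetGrass": {(True,): 0.9, (False,): 0.1},
}


def joint(assignment):
    """Joint probability of a full assignment: product of the local tables."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all assignments."""
    variables = list(parents)
    num = den = 0.0
    for values in product([True, False], repeat=len(variables)):
        a = dict(zip(variables, values))
        if any(a[v] != val for v, val in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(a)
        den += p
        if a[target]:
            num += p
    return num / den


p_wet = query("WetGrass", {})                      # predict a variable value
p_rain_given_wet = query("Rain", {"WetGrass": True})  # explain an observation
print(p_wet, p_rain_given_wet)
```

The first query is the "predict variable values" use mentioned in the
abstract, and the second is the "explain observations" use: the same tables
answer both directions.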
____________________________________________________________________________
Marco Ramoni
Knowledge Media Institute Phone: +44-1908-65-5721
The Open University Fax: +44-1908-65-3169
Walton Hall Email: [email protected]
Milton Keynes MK7 6AA URL: http://kmi.open.ac.uk/~marco
UNITED KINGDOM CUSeeMe: 137.108.81.18
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 16 Apr 1997 10:24:19 -0400
From: Tom Mitchell <[email protected]>
Subject: Text book for Data Mining: Machine Learning by Tom Mitchell
DATAMINING TEXTBOOK: Machine Learning, Tom Mitchell, McGraw Hill
McGraw Hill announces immediate availability of a new textbook that
covers the primary algorithms used in datamining. MACHINE LEARNING
provides a thorough, interdisciplinary introduction to the key
algorithms used in datamining.
Free inspection copies are available for instructors, by contacting
Betsy Jones (McGraw Hill) at (630) 789-5057.
The chapter outline is:
1. Introduction
2. Concept Learning and the General-to-Specific Ordering
3. Decision Tree Learning
4. Artificial Neural Networks
5. Evaluating Hypotheses
6. Bayesian Learning
7. Computational Learning Theory
8. Instance-Based Learning
9. Genetic Algorithms
10. Learning Sets of Rules
11. Analytical Learning
12. Combining Inductive and Analytical Learning
13. Reinforcement Learning
(414 pages)
This book is intended for upper-level undergraduates, graduate
students, and professionals working in the area of datamining, machine
learning, and statistics. The text includes over a hundred homework
exercises, along with web-accessible code and datasets (e.g., neural
networks applied to face recognition, Bayesian learning applied to
text classification).
For further information and ordering instructions, see
http://www.cs.cmu.edu/~tom/mlbook.html
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected] (Ross Quinlan)
Date: Wed, 16 Apr 1997 07:47:28 -0400 (EDT)
Subject: Windows Version of C5.0 ("See5") Available Now
Please see http://www.rulequest.com for details. As with the
Unix version, a scaled-down demonstration version is free, and
there is also a free 10-day trial of the real thing.
Ross
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
Date: Sat, 19 Apr 97 11:51:52 PDT
From: Stanley Rice <[email protected]>
Now that spring is sprung, what about tasting some
PRECOORDINATE WINES FROM POSTCOORDINATE BOTTLES? ;-)
Like the taste of wine, relevance is not objective to us. It
is subjective, without crisp definition, dependent on our
context, describable only by fuzzy postcoordinations. SIGs
as well as individuals recognize relevance only in context.
With a little help from our friends we can optimize
relevance. But most folks have never even heard the word
postcoordination. Precoordinate systems still predominate--
Yahoo categories, single topic and alphabetical filings--at
work, at school, and at home.
The Internet, AltaVista-style search engines, and Thematic
concept filtering will change a lot of that before long. The
change may come more smoothly because old precoordinations
can be included under postcoordinations, and actually be
much enhanced thereby. Just putting the old wine in the new
bottles can multiply its bouquet and value. (No, there is
nothing for sale here.)
Examples of postcoordination possibilities with included
fuzzy precoordinations, suited to electronic libraries,
corporate intranets (and many other "incoherent" but
currently precoordinated collections) are given at:
http://www.cruzio.com/~autospec/darwin.htm
(Darwin's "The Voyage of the Beagle" is used to illustrate
Dewey precoordinations included under postcoordinations.)
Want a different kind of example? Consider "Correlating
Symptoms and Remedies," which includes uses for various
kinds of traditional diagnostic precoordinations:
http://www.cruzio.com/~autospec/accessf.htm
On the Autospec home page (address below) we look at
postcoordination of contextual and conceptual filtering from
many points of view. Your reactions are always appreciated.
In any case, relax and have another glass. It's spring! ;-)
Regards, Stan Rice
--
THEMATICS: Conceptual & Marketing Access to Text and Media
AUTOSPEC, Inc. Santa Cruz, CA. Stan Rice Voice: (408) 457-1430
Home page for Autospec: http://www.cruzio.com/~autospec/
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
Date: Tue, 22 Apr 1997 11:09:49 -0700
From: Pamela Lerwick <[email protected]>
Subject: IDIS Special Release
Contact: IDI Marketing Communications
(310) 936-3600
New Machine-Man Paradigm
Refocuses Data Mining
Novel Approach Based on Explainable Intranet Documents Introduces New
Languages and Techniques for Data Mining
____________________________________________________________________________
Los Angeles -- April 21, 1997
The 1997 Database World Conference in Boston will witness the birth of a
new computing paradigm for decision support -- certain to affect the way
corporations use and benefit from computers. While most computing to date
has focused on man-machine interaction, this new and novel approach
introduces machine-man interaction.
In man-machine systems, humans view machines as "order-takers" -- we tell
machines what to do, not help them tell us what they know. This one-way
bias is manifest even in the term man-machine itself.
While the direction of man-machine systems has been from man to machine,
the focus of machine-man interaction is from machine to man, assisting
machines to say their piece -- delivering the benefits of the immense
knowledge they possess. This does not mean natural language output, but is
based on a specific and novel approach to model building, data
structuring, language design and information delivery.
With a database query language or a programming language, the user types or
otherwise inputs a query or program -- the machine then tries to understand
it and generate a response. In machine-man interaction, the machine types
up a set of statements as an "explainable document" and the user
understands them to improve decision making.
This dramatic new idea will be first presented at the Database World
Conference in Boston, on May 20, 1997 by Dr. Kamran Parsaye, CEO of
Information Discovery, Inc.
He will discuss the far reaching consequences of this paradigm for
corporate computing.
The NASA Scientific and Technical Information Program defines a man-machine
system as: "A System in which the functions of the man and the machine are
interrelated and necessary for the operation of the system." Similarly, Dr.
Parsaye defines a machine-man system as: "A System in which the functions
of the machine and the man are interrelated and necessary for the thinking
of the man."
For a machine to tell us anything, it needs a suitable language of
expression. It needs to be able to phrase its knowledge in terms of a
language understandable by us. When dealing with computer systems, the term
"language" has often been used in the context of programming languages and
query languages. In machine-man interaction, we need languages that help
machines express their knowledge for our benefit -- i.e. knowledge
expression languages.
Programming and query languages have to be understandable by computers;
knowledge expression languages have to be comprehensible to human users --
they are the tools machines use to help us. Dr. Parsaye will illustrate how
traditional languages and systems such as SQL or OLAP are inadequate due to
their focus on one-way interaction models.
Machine-man interaction requires three distinct language facilities: first,
a language to organize the environment and develop scripts, etc., as one
does in any system; second, a language to let a developer or analyst define
models, set up scenarios and specify terms for the lexicon to be used by
the machine (i.e. an interactive document composition language); and third,
a language to allow the machine to express knowledge (i.e. a knowledge
expression language).
Using agent technology on the inter/intranet, machine-man systems have a
life of their own. They look for patterns with agents, perform discovery
and, when there is something interesting to say, generate an "explainable
document" on the intranet in plain English (or Italian, French, etc.)
accompanied by graphs. Machines need no longer be just order-takers, but
can be the finders and communicators of knowledge.
The impact of the new paradigm on corporate planning for decision support
and data warehousing will be significant. Business users and IS departments
need no longer just consider "tools" as a method of data mining, but can
rely on automatically generated Java-based explainable documents with rich
text and graphic content. This will simultaneously accelerate the use of
Java, intranets, data warehousing and data mining.
For more information on the Database World Conference please visit DCI at
http://www.DCIexpo.com on the internet, or call (508) 470-3870. For more
information on Information Discovery, Inc. please visit
http://www.datamining.com on the internet or call (310) 937-3600.
Pamela Lerwick
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 14 Apr 1997 17:14:00 +0100
From: ROSS DONALD KING <[email protected]>
Subject: Ph.D. Studentships
Field: data mining, machine learning, ILP, scientific discovery
Place: University of Wales, Aberystwyth
Wales, UK
Applications are invited for Ph.D. Studentships in the area of data mining
in the Centre for Intelligent Systems at the Department of Computer
Science, University of Wales, Aberystwyth.
The Centre for Intelligent Systems has a particular interest in
knowledge rich data mining systems, Inductive Logic programming,
and applications in biology and chemistry.
Applicants should have at least a 2(i) in Computer Science or related
subject, with a good background in Artificial Intelligence or
Statistics.
More information can be obtained from
Professor Mark Lee or Dr. Ross D. King
Department of Computer Science,
University of Wales,
Penglais,
Aberystwyth,
Ceredigion, SY23 3DB,
Wales, UK
Tel: +44 1970 622420
Fax: +44 1970 622455
Email: [email protected] [email protected]
or from the URLs:
http://www.aber.ac.uk/~dcswww/Public/Recruitment/Proposals/
http://www.aber.ac.uk/~dcswww/Public/Research/
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 17 Apr 97 09:32:42 EDT
From: "Fred J. Damerau (862-2214)" <[email protected]>
Subject: Research Associate Position in Text Mining/Information Extraction
The Natural Language Understanding Group at the IBM T. J. Watson
Research Laboratory (Yorktown Heights, NY 10566) is looking for
a Research Associate with the qualifications listed below. The
position will most likely be initially for one year, but it is
renewable. The successful candidate will work on our text mining/
information extraction project, with a particular emphasis on
applying machine learning techniques to various issues in document
management. The project combines state-of-the-art research on machine
learning in text mining with practical production-level systems building.
________________________________________________________________
Qualifications:
The ideal candidate would have the following knowledge and experience.
Education: MA/MS in computer science or other field with extensive
background in computer science.
Programming languages:
Extensive knowledge and experience in C/C++ required; Java a plus.
Specialized Background:
Experience in implementing machine learning algorithms and/or
natural language processing algorithms.
Operating systems:
Required: Familiarity with Windows95/NT and Unix/AIX,
Helpful: Familiarity with OS/2
System programming/API experience on these operating systems not required.
General Software Development:
Familiarity with issues of large scale software development, e.g.,
API design and use, creation and integration of DLLs/Libraries,
source code control systems etc.
Candidates should send resumes and supporting letters to:
Thomas Hampp
eMail: [email protected]
phone: 914-945-1714
End of message
|
410.24 | 97:15 | IJSAPL::OLTHOF | Spellchecked Henry Although | Tue May 06 1997 11:34 | 1146 |
| Knowledge Discovery Nuggets 97:15, e-mailed 97-05-04
News:
* R. Uthurusamy, KDD-97 Overview and Tutorials
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
* R. Uthurusamy, KDD-97 Workshop, Integration of Data Mining and
Data Visualization
http://www.cs.uml.edu/~grinstei/kddvis-workshop.html
* R. Uthurusamy, KDD-97 Registration Information
http://www-aig.jpl.nasa.gov/kdd97-docs/registrationinfo.html
* Peter Turney, data mining from wafers manufacturing process
Siftware:
* Nicolas Bissantz, Delta Miner 3.0
http://www.bissantz.de
Positions:
* Pablo Tamayo, Job Position at Thinking Machines
Meetings:
* E. Horvitz, Call for participation, UAI-97,
http://cuai97.microsoft.com
* Gordian Institute, "Making Sense of Data: Computer-Aided
Pattern Discovery", July 14-18, Charlottesville, Virginia
http://www.gordianknot.com
* R. Zicari, COMDEX Internet & OBJECT WORLD Frankfurt '97 (Oct 7-10)
http://www.ltt.de
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A gentleman is not a pot
Confucius
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Registration Information
KDD-97 Registration Brochure
Third International Conference on Knowledge Discovery and Data Mining (KDD-97)
August 14-17, 1997
Sponsored by the American Association for Artificial Intelligence
http://www.aaai.org
KDD-97: A Preview
The rapid growth of data and information has created a need and an
opportunity for extracting knowledge from databases, and both researchers
and application developers have been responding to that need. Knowledge
discovery in databases (KDD), also referred to as data mining, is an area
of common interest to researchers in machine discovery, statistics,
databases, knowledge acquisition, machine learning, data visualization,
high performance computing, and knowledge-based systems. KDD applications
have been developed for astronomy, biology, finance, insurance, marketing,
medicine, and many other fields.
The Third International Conference on Knowledge Discovery and Data Mining
(KDD-97) will follow up the success of KDD-95 and KDD-96 by bringing
together researchers and application developers from different areas
focusing on unifying themes.
KDD-97 Organization
General Conference Chair:
Ramasamy Uthurusamy, General Motors Corporation, USA
Program Cochairs:
David Heckerman, Microsoft Research, USA
Heikki Mannila, University of Helsinki, Finland
Daryl Pregibon, AT&T Laboratories, USA
Publicity Chair:
Paul Stolorz, Jet Propulsion Laboratory, USA
Tutorial Chair:
Padhraic Smyth, University of California, Irvine, USA
Demo and Poster Sessions Chair:
Tej Anand, NCR Corporation, USA
Awards Chair:
Gregory Piatetsky-Shapiro, Geneve Consulting, USA
Keynote Speaker:
Peter Huber, Universitat Bayreuth, Germany
"From Large to Huge. A Statistician's Reactions to KDD & DM"
The statistics and AI communities are confronted by the same challenge, the
onslaught of ever larger data collections, but the two communities have
reacted independently and differently. What could they learn from each
other if they looked over the fence? What is amiss on either side?
KDD-97 Tutorial Abstracts and Speakers
--------------------------------------
Full info on tutorials available at
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
All tutorials will be presented on Thursday, August 14, 1997. The times
listed below are tentative. Admission to the tutorials is included in your
conference registration fee. Registrants can attend up to four consecutive
tutorials, including four tutorial syllabi.
Time                 Session 1              Session 2
8:00 to 10:00am      T1 - Fayyad and Simoudis (single session)
10:30am to 12:30pm   T2 - Hand              T3 - Feldman
1:30 to 3:30pm       T4 - Swayne and Cook   T5 - Chaudhuri and Dayal
4:00 to 6:00pm       T6 - Keim              T7 - DuMouchel
Tutorial 1: 8:00-10:00am
Data Mining and KDD: An Overview
Usama Fayyad, Microsoft Research and Evangelos Simoudis, IBM
We present a basic tutorial of this new and emerging area and emphasize
relations to constituent communities, including statistics, databases,
pattern recognition, learning, and visualization. The tutorial provides a
basic overview of the KDD process for extracting knowledge from databases
and covers the basics of each step in the process including: data
warehousing, selection and cleaning, data transformation, data mining,
evaluation, and visualization. We also cover a sampling of successful
applications and outline challenges and issues to be addressed.
Dr. Usama Fayyad is a Senior Researcher at Microsoft Research, the Decision
Theory & Adaptive Systems Group. His research interests include knowledge
discovery in large databases, data mining, machine learning, statistical
pattern recognition, and clustering. After receiving the Ph.D. degree in
1991, he joined the Jet Propulsion Laboratory (JPL), California Institute
of Technology (until 1996). At JPL, he headed the Machine Learning Systems
Group where he developed data mining systems for analysis of large
scientific databases.
Dr. Evangelos Simoudis is Vice President, Global Business Intelligence
Solutions - IBM North America, where he is responsible for the development
and deployment of data mining and decision support solutions to IBM's
customers worldwide. Dr. Simoudis received a B.A. in Physics from Grinnell
College, a B.S. in Electrical Engineering from California Institute of
Technology, an M.S. in Computer Science from the University of Oregon, and
a Ph.D. in Computer Science from Brandeis University.
Tutorial 2: 10:30am-12:30pm
Modelling Data and Discovering Knowledge
David Hand, Open University, UK
Our aim is to extract knowledge from large bodies of data. The size of
these bodies means that we cannot do it unaided, but must use fast
computers, applying sophisticated statistical tools. Attempts to automate
the process of knowledge extraction date from at least the early 1980s,
with the work on statistical expert systems. We examine this work, noting
its successes and failures and, especially, what researchers in data mining
and knowledge discovery can learn from those efforts. We examine what data
are, what information is, and what knowledge is. We contrast modelling with
discovery, especially in the context of large data sets. We examine high
level modelling issues, such as overfitting, generalisability,
overmodelling, and model evaluation. And we examine high level exploration
issues such as the discovery of accidental artefacts. The confluence of
computing and statistics in some areas provides a nice backdrop against
which to examine these issues, and we briefly discuss neural networks and
classification trees from these two perspectives.
Dr. David Hand is Professor of Statistics at the Open University. His
research interests include the foundations of statistics, statistical
computing, and multivariate statistics, the latter especially as applied to
classification problems. His applications interests include medicine,
finance, and psychology. He is Editor-in-Chief of Statistics and Computing
and has published fourteen books, the most recent of which is
Construction and Assessment of Classification Rules, Wiley, January 1997.
Tutorial 3: 10:30am-12:30pm
Text Mining - Theory and Practice
Ronen Feldman, Bar-Ilan University, Israel
Knowledge Discovery in Databases (KDD) focuses on the computerized
exploration of large amounts of data and on the discovery of interesting
patterns within them. While most work on KDD has been concerned with
structured databases, there has been little work on handling the huge
amount of information that is available only in unstructured textual form.
In this tutorial we will present the general theory of Text Mining and will
demonstrate several systems that use these principles to enable interactive
exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used
by these systems. The systems that will be presented are KDT, which is the
system for Knowledge Discovery in Texts; FACT, which discovers associations
among keywords labeling the items in a collection of textual documents; and
the Text Explorer, which is a system that provides a high level language
for interactive exploration of textual collections. We will present a
general architecture for text mining and will outline the algorithms and
data structures behind the systems. We will give special emphasis to
incremental algorithms and to efficient data structures.
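To make the idea of keyword associations concrete, here is a minimal Python sketch. It is not the FACT algorithm (the abstract does not describe it); it only counts how often keywords label the same document and reports rules by support and confidence. The function name, the thresholds, and the sample keywords are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_associations(docs, min_support=2, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) from keyword-labelled docs.

    docs is an iterable of keyword collections, one per document.
    support    = number of documents labelled with both keywords
    confidence = support / number of documents labelled with lhs
    """
    single = Counter()  # documents per keyword
    pair = Counter()    # documents per unordered keyword pair
    for keywords in docs:
        kws = sorted(set(keywords))
        single.update(kws)
        pair.update(combinations(kws, 2))

    rules = []
    for (a, b), n in pair.items():
        if n < min_support:
            continue
        # Each pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, n, confidence))
    return sorted(rules)


# Hypothetical keyword labels for four documents.
docs = [
    {"coffee", "brazil"},
    {"coffee", "brazil", "export"},
    {"coffee", "export"},
    {"brazil", "soccer"},
]
rules = keyword_associations(docs)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support}  confidence={confidence:.2f}")
```

A production system would need the incremental algorithms and efficient data structures the abstract mentions; this sketch recounts everything from scratch on each call.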
Dr. Ronen Feldman is a lecturer at the Mathematics and Computer Science
Department of Bar-Ilan University in Israel. He received his B.Sc. in Math,
Physics and Computer Science from the Hebrew University, and his Ph.D. in
Computer Science from Cornell University. His main research is in the area
of Machine Learning and Data Mining. He is currently coordinating several
research projects for developing dedicated text mining systems. These
systems work on plain text collections and on the Internet.
Tutorial 4: 1:30-3:30pm
Exploratory Data Analysis using Interactive Dynamic Graphics
Deborah Swayne, Bell Communications Research and Dianne Cook, Iowa State
University
Researchers and software designers in the field of data mining are just
beginning to make extensive use of graphical methods. Interactive dynamic
data visualization has been explored in the field of statistics for over
twenty years, and we propose that much of what has been learned in
statistics is relevant for data mining. This class is an introduction to
interactive data visualization as it is practiced as part of exploratory
data analysis. The XGobi software, publicly available dynamic visualization
software, will be used in the analysis of examples from biology, business,
physics, engineering, and telecommunications. The examples will illustrate
a set of general visualization principles which are embodied in specific
methods such as brushing and identification of points in simple
scatterplots, three dimensional rotations, rotations in higher dimensions
such as the grand tour, and directed searches in higher dimensions for
interesting two dimensional views using projection pursuit and manual
control.
Deborah Swayne has worked at Bellcore since that company's inception in
1985, and is currently a member of the Statistics and Data Mining Research
Group. Her research focusses on software methods for visualizing data. She
is one of the authors of the XGobi software, originally developed at
Bellcore. She has a Bachelor's degree in African Linguistics from the
University of Wisconsin at Madison, and a Master's degree in Statistics
from Rutgers University.
Dr. Dianne Cook is an Assistant Professor in the Department of Statistics,
Iowa State University. She received her PhD from Rutgers University in May
1993, and has conducted research into dynamic statistical graphics. Her
interests include using these methods for understanding high-dimensional
data, and adapting them for analyzing geographically referenced data with
multiple measurements at each site.
Tutorial 5: 1:30-3:30pm
OLAP and Data Warehousing
Surajit Chaudhuri, Microsoft Research and Umesh Dayal, Hewlett Packard
Laboratories
On-Line Analytical Processing (OLAP) and Data Warehousing technologies
enable enterprises to gain competitive advantage by exploiting the
ever-growing amount of data that is collected and stored in corporate
databases and files for better and faster decision making. Over the past
few years, these technologies have experienced explosive growth, both in
the number of products and services offered, and in the extent of coverage
in the trade press. Vendors (including all database companies) are paying
increasing attention to all aspects of decision support. The area opens up
interesting research directions, with ties to past work in database
systems, but with different assumptions and requirements. Only very
recently, however, has the database research community started to
understand and address some of these issues. This tutorial presents an
overview of OLAP and data warehousing, and an in-depth study of selected
aspects. An outline of the tutorial follows:
1. Introduction: definitions, evolution, differences from OLTP,
   architectures.
2. Models and Tools: conceptual model for OLAP, front-end tools (e.g.,
   multidimensional spreadsheets), database design (e.g., star and
   snowflake schema).
3. Database Server technologies for Decision Support Queries: specialized
   indexing techniques, specialized join and scan methods, data
   partitioning and use of parallelism, intelligent processing of
   aggregates, complex query processing, extensions to SQL, ROLAP vs.
   MOLAP.
4. Other Services for OLAP/Data warehousing: data cleaning, loading and
   refresh, tools for warehouse, system and process management, metadata
   management and the role of repository.
5. State of Commercial Practice.
6. Research Issues.
The target audience is researchers and developers interested in learning
about the concepts, products and the technical innovations in the area of
decision support technologies.
Dr. Surajit Chaudhuri is a researcher in the Database Research Group of
Microsoft Research. From 1992 to 1995, he was a Member of the Technical
Staff at Hewlett-Packard Laboratories, Palo Alto. He did his B.Tech at the
Indian Institute of Technology, Kharagpur and his Ph.D. at Stanford
University. In addition to query processing and optimization, Surajit is
interested in the areas of data mining, database design and uses of
databases for nontraditional applications.
Dr. Umesh Dayal is a senior researcher at Hewlett-Packard Labs, Palo Alto,
California. His current research interests are in distributed information
systems, workflow management, data mining, and information management
issues related to the emerging global information infrastructure. He
received his Ph.D. and S.M. degrees from Harvard University, his M.E. and
B.E. degrees from the Indian Institute of Science, and his B.Sc. degree
from Osmania University, India.
Tutorial 6: 4:00-6:00pm
Visual Techniques for Exploring Databases
Daniel Keim, University of Munich
For data exploration to be effective, it is important to include the human
in the exploration process and combine the flexibility, creativity, and
general knowledge of the human with the enormous storage capacity and the
computational power of today's computers. Visual database exploration aims
at integrating the human in the exploration process, applying its
perceptual abilities to the large data sets available in today's computer
systems. The basic idea of visual data exploration is to present the data
in some visual form, allowing the human to get insight into the data and
draw conclusions. Visual data exploration techniques have proven to be of
high value in exploratory data analysis and they also have a high potential
for exploring large databases. Visual database exploration is especially
powerful for the first steps of the data mining process, namely
understanding the data and generating hypotheses about the data, but it may
also significantly contribute to the actual knowledge discovery by guiding
the search using visual feedback. The goal of the tutorial is to show the
potential of visualization technology for exploring large databases. The
tutorial provides an overview of the state-of-the-art in data visualization
and provides a classification of the existing data visualization
techniques. Besides describing each of the classes, the tutorial focuses on
new developments in data visualization, which are relevant to the area of
knowledge discovery, and describes a wide range of recently developed
techniques for visualizing large amounts of arbitrary multi-attribute data
which does not have any two- or three-dimensional semantics and therefore
does not lend itself to an easy display. A detailed comparison shows the
strengths and weaknesses of the existing techniques and reveals potential
for further improvements. Several examples demonstrate the benefits of
visualization techniques for exploring databases. The tutorial concludes
with an overview of existing database exploration and visualization
systems, including research prototypes as well as commercial products.
Dr. Daniel Keim is one of the leading experts in the field of visual
database exploration, and he was the chief engineer in designing the VisDB
system - a visual database exploration system. Dr. Keim received his
diploma (equivalent to an MS degree) in Computer Science from the
University of Dortmund in 1990 and his Ph.D. in Computer Science from the
University of Munich in 1994. Currently, he is a teaching and research
assistant (approximately equivalent to an assistant professor) at the
Institute for Computer Science of the University of Munich, Germany.
Tutorial 7: 4:00-6:00pm
Statistical Models for Categorical Response Data
William DuMouchel, AT&T Research
This tutorial will survey the most common models and methods statisticians
use to fit and test relationships among categorical (discrete) data. Most
of these techniques are described in statistics texts such as Categorical
Data Analysis, by Alan Agresti (Wiley 1990), and are widely available in
popular computer packages such as SAS and Splus. Therefore it is almost de
rigueur for someone with a new classification technique to compare the
proposal to one or more of these standard methods. The tutorial will focus
on loglinear and logistic regression models, and related models such as
probit, Poisson regression, and survival models. In the short time
available, priority will be given to explaining why these techniques are so
popular among statisticians, and to how the basic models have been extended
to handle variables having more than two categories or when some of the
variables have continuous or ordinal scales. Examples of model fitting,
model search and model comparison using SAS and Splus will be presented and
discussed.
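For readers unfamiliar with logistic regression, the following is a minimal Python sketch of fitting a one-predictor model, P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))), by gradient ascent on the log-likelihood. It is only an illustration: the tutorial itself uses SAS and Splus, and the function names, learning rate, and the small data set below are invented for this example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient ascent.

    The gradient of the log-likelihood with respect to (b0, b1) is the
    sum over observations of (y - p) and (y - p) * x, where p is the
    current predicted probability.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Invented binary response data: larger x makes y = 1 more likely.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
for x in (0.5, 2.25, 4.0):
    print(f"x={x}: estimated P(y=1) = {sigmoid(b0 + b1 * x):.3f}")
```

Statistical packages normally fit this model with iteratively reweighted least squares rather than plain gradient ascent; gradient ascent is used here only because it fits in a few lines.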
Dr. William DuMouchel has been on the faculties of UC Berkeley, University of
Michigan, University of London, MIT and Columbia University. From 1987 to
1992 he was Chief Statistical Scientist at BBN Software Products, helping
to design and develop commercial software advisory systems for data
analysis and experimental design. He is currently at AT&T Labs - Research,
Florham Park, New Jersey.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Workshop
KDD-97 Workshop - August 17, 1997 8:30am-5pm
---------------------------------------------
Issues in the Integration of Data Mining and Data Visualization
---------------------------------------------------------------
Details:http://www.cs.uml.edu/~grinstei/kddvis-workshop.html
Data visualization deals with the effective portrayal of data with a goal
towards insight about the data. Typically, the data is of high volume,
multidimensional in nature, and does not lend itself to easy display. The
data is also often non-spatial and temporal in nature.
Data visualization software systems are very popular with end-user domain
scientists who require visual tools to explore and analyze their data.
These visual tools, however, are used strictly as output of the exploration
process and have received much attention, whereas the input issues to the
exploration process still have not. The KDD community is concerned with two
aspects of visualization techniques: 1. Their use at the "back-end" of the
exploration process to help understand models extracted by data mining
algorithms, and 2. Scalability issues in visualization: how do we make it
efficient in the context of large databases where data access is
expensive. The visualization community looks at KDD and analytic methods
also as applications to generate displays. However, visualization can be
used as input to KDD and analytic tools; it can also be used to support
computational steering. An effective visualization front-end can guide a
data mining algorithm in its search and may result in much better and more
easily acceptable solutions. This workshop will continue the discussions
started at the first two workshops and focus on these and other issues that
make a case for integrating KDD and visualization technologies.
Two previous workshops (Siggraph '90 and Visualization '91) have dealt with
areas such as high-level requirements for data structures and access
software, and data visualization environments. The first and second
workshops on Database Issues for Data Visualization were held in 1993 and
1995 and explored the fundamental issues. A number of experimental,
prototype, and research systems were presented. The second workshop also
saw the beginnings of interest in data mining and visualization integration.
This trend, so significant in the commercial sector today, is in its
infancy and is in need of much research attention.
Position statements and papers are welcome on the following issues as they
relate to KDD and data visualization integration. We would like to keep
discussions focused on the end result, which is improving the integration
of data mining and knowledge discovery systems with visualization:
* Requirements Visualization places on Knowledge Discovery Systems
* Data Models and Access Structures
* Modeling the User - Tasks, Processes, Support Issues
* Advanced User Interfaces for Data Mining
* Visual Languages for Data Mining
* System Integration Issues
* Computational Steering for Data Mining
* Scalability to Large Databases
* Distributed, Heterogeneous Data Set Issues - Data and Computation Sharing
* Examples of Integrated Systems
* Applications of Integrated Systems
Workshop Paper Submissions (Deadline June 15)
Papers (and position papers to be expanded for final publication) are
solicited that present research results in the integration of data mining
and visualization. Papers should be limited to 5,000 words and may be
accompanied by NTSC video. These should describe some original research on
the particular subject, and how it fits in with the overall theme of the
workshop. Proper references should be cited.
Workshop Registration Fee
Registration forms will be sent to the accepted participants. There is a
single registration fee of US $100 which covers the workshop sessions,
preprints, and coffee breaks.
Workshop Organizers
Georges Grinstein
Institute for Visualization and Perception Research
University of Massachusetts at Lowell
Lowell, MA 01854, USA
Email: [email protected]
Fax: +1-508-934-3551 * Phone: +1-508-934-3627
Andreas Wierse
Institute for Computer Applications
Dep. Computersimulation and Visualization
Pfaffenwaldring 27
D-70550 Stuttgart, Germany
Email: [email protected]
Fax: +49(0)711-682357 * Phone: +49-711-685-5796
Usama Fayyad
Microsoft Research
Redmond, WA 98052-6399, USA
Email: [email protected]
Fax: +1-206-936-7329 * Phone: +1-206-703-1528
---------------------------------------------
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Demos/Exhibits of Knowledge Discovery Products
-----------------------------------------------------
Following the success of the demonstration sessions in previous KDD
conferences, the KDD-97 program will also include demonstrations of
knowledge discovery products, knowledge discovery applications and research
prototypes. Unlike previous demonstration sessions, we will clearly
differentiate between commercial product demonstrations and research
demonstrations.
We are inviting commercial vendors to exhibit at KDD-97. The exhibitor fee
for KDD-97 will be a nominal $250.00. Exhibitors will be provided with a
6ft table top. In this space vendors will be allowed to distribute product
or company literature, show product demonstrations and set up signage.
Vendors will have to bring all necessary hardware and software that they
will require for their demonstrations.
The exhibit area will be open during the following hours: Aug. 15th: 12:30-5pm
For your information total attendance at KDD-96 was 457. Of these 35% were
affiliated with universities and 65% were affiliated with industry. If you
would like to exhibit at KDD-97 please fill out the registration form and
send it along with the name of your Product(s) and/or Service(s) and a 200
word (maximum) Description of Product(s)/Service(s) to: AAAI, KDD-97
Exhibit, 445 Burgess Drive, Menlo Park, CA 94025, USA. The description
will be included in the conference program.
We are also soliciting demonstrations of research prototypes at KDD-97.
This demonstration session will be held on August 15 from 12:30 to 5:00
PM. We have a limited budget for providing hardware for research
demonstrations. This year we will give priority to demonstrations that are
in conjunction with accepted papers at KDD-97. Within budget and space
constraints we will make every effort to accommodate as many demonstrations
as possible. If you would like your demonstration to be considered for
KDD-97 please provide the following information to Tej Anand
([email protected]) by June 1, 1997.
* Name of Demonstration:
* Title of Paper: (If this demonstration is in conjunction with a
paper/poster at KDD-97)
* Development Team:
* Affiliations of Development Team Members:
* Contact Telephone#:
* Description of Demonstration: (A short description of approx. 200 words)
* What is unique about your system or application?: (No more than 50 words)
* Status: Research Prototype/Commercially available product/Fielded application
* Hardware Required: (Please state any special memory or disk requirements)
* Operating System: (Please state specific version number)
* WAN Connection Required: Yes/No
(If Yes, please state any special modem requirements)
* Will you bring your own hardware?: Yes/No
* Any other requirements:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Registration Information
A registration application is attached to this online brochure. The KDD-97
program registration includes admission to four tutorials, four tutorial
syllabi, technical and demo sessions, the opening reception, the KDD-97
Conference Proceedings and mid-morning & afternoon coffee breaks. Onsite
registration will be located in the foyer outside the California Ballroom,
Newport Beach Marriott Hotel and Tennis Club, lobby level.
Early Registration (Postmarked by June 10)
AAAI Members
Regular $295 Students $95
Nonmembers
Regular $375 Students $155
Late Registration (Postmarked by July 15)
AAAI Members
Regular $350 Students $125
Nonmembers
Regular $425 Students $180
On-Site Registration (Postmarked after July 15 or onsite.)
AAAI Members
Regular $400 Students $150
Nonmembers
Regular $475 Students $210
Workshop Registration
Registration forms will be sent to the accepted participants. There is a
separate registration fee of US $100 which covers the workshop sessions,
preprints, and coffee breaks.
Payment Information
Prepayment of registration fees is required. Checks, international money
orders, bank transfers and travelers' checks must be in US dollars.
American Express, MasterCard, VISA, and government purchase orders are also
accepted. Registration applications postmarked after the early
registration deadline will be subject to the late registration fees.
Registration applications postmarked after the late registration deadline
will be subject to on-site registration fees. Student registrations must be
accompanied by proof of full-time student status.
Refund Requests
The deadline for refund requests is July 25, 1997. All refund requests
must be made in writing. A $75.00 processing fee will be assessed for all
refunds.
Registration Hours
Registration hours will be Thursday-Saturday, August 14-16, 7:30am-6:00pm
and Sunday, August 17, 8:00am-3:00pm. All attendees must pick up their
registration packets for admittance to programs.
Housing
AAAI has reserved a block of rooms at the Newport Beach Marriott Hotel at
reduced conference rates. Conference attendees must contact the hotel
directly and identify themselves as KDD-97 registrants to qualify for the
reduced rates. Hotel rooms are priced as singles (1 person, 1 bed),
doubles (2 persons, 2 beds), triples (3 persons, 2 beds), quads (4 persons,
2 beds). Rooms will be assigned on a first-come, first-served basis. All
rooms are subject to a 10% occupancy tax.
Headquarters Hotel:
Newport Beach Marriott Hotel
900 Newport Center Drive
Newport Beach, CA 92660
Phone: 714-640-4000
Fax: 714-640-4918
Single room: $105.00
Double room: $115.00
Check-in time: 4:00pm
Check-out time: 12:00 noon
Cut-off date for reservations: July 24, 1997.
All reservation requests for arrival after 6:00 pm must be accompanied by a
first night room deposit, or guaranteed with a major credit card. The
Newport Beach Marriott Hotel will not hold any reservations after 6:00 pm
unless guaranteed by one of the above methods. Reservations received after
the cut-off time will be accepted on a space or rate available basis.
Reservations accepted without a credit card guarantee or advance deposit
are subject to cancellation at 6:00 pm on the day of arrival.
Air Transportation and Car Rental
Newport Beach, California - Get there for less!
Discounted fares have been negotiated for this event. Call Conventions in
America at 1-800-929-4242 and ask for Group #428. You will receive 5%-10%
off the lowest applicable fares on American Airlines, or the guaranteed
lowest available fare on any carrier. Travel between August 11-21, 1997.
All attendees booking through CIA will receive free flight insurance and be
entered in their bi-monthly drawing for worldwide travel for two on
American Airlines! Hertz Rent A Car is also offering special low
conference rates, with unlimited free mileage.
Call Conventions in America - 1-800-929-4242, ask for Group #428.
Reservation hours: M-F 6:30am-5:00pm Pacific Time.
Outside US and Canada, call 619-453-3686/Fax 619-453-7679.
Internet: [email protected]/24-hour emergency service 1-800-748-5520.
If you call direct: American 1-800-433-1790, ask for index #S 9485.
Hertz 1-800-654-2240, ask for CV#24250.
Ground Transportation
The following information provided is the best available at press time.
Please confirm fares when making reservations.
Airport Connections
The Newport Beach Marriott Hotel provides complimentary airport
transportation to/from John Wayne/Orange County Airport.
Super Shuttle: 714-517-6600. The fare from LAX Los Angeles International
Airport to Newport Beach Marriott Hotel is $21.00 per person. Reservations
24 hours in advance are recommended. Discover Card, travelers' checks and
cash are accepted.
Taxi
Taxis are available at John Wayne Airport. Approximate fare from the
airport to downtown Newport Beach is $14.00. Orange County Yellow Cab
Service: 714-546-1311. The approximate taxi fare from LAX Los Angeles
International Airport to Newport Beach Marriott Hotel is $75.00-80.00.
Bus
Greyhound/Trailways Lines. The depot is located at 100 W. Winston Road,
Anaheim, CA 92805. For information on fares and scheduling, call
714-999-1256.
Rail
The Amtrak (Southern Pacific Railroad) stations are located at Santa Ana,
Irvine and Anaheim. For general information and ticketing, call
1-800-872-7245.
City Transit System
OCTD (Orange County Transit District) serves Newport Beach, Balboa Island
and Corona del Mar. Basic local fare is $1.00. For general information
call 714-636-RIDE.
Parking
Parking is available at the Newport Beach Marriott Hotel. The daily rate
for valet parking is $6.00, and $8.00 overnight. Self-parking is
complimentary.
Disclaimer: In offering American Airlines, Hertz Rent A Car, Newport Beach
Marriott Hotel, and all other service providers (hereinafter referred to
as "Supplier(s)") for the Third International Conference on Knowledge
Discovery and Data Mining, AAAI acts only in the capacity of agent for the
Suppliers which are the providers of the service. Because AAAI has no
control over the personnel, equipment or operations of providers of
accommodations or other services included as part of the KDD-97 program,
AAAI assumes no responsibility for and will not be liable for any personal
delay, inconveniences or other damage suffered by conference participants
which may arise by reason of (1) any wrongful or negligent acts or
omissions on the part of any Supplier or its employees, (2) any defect in
or failure of any vehicle, equipment or instrumentality owned, operated or
otherwise used by any Supplier, or (3) any wrongful or negligent acts or
omissions on the part of any other party not under the control, direct or
otherwise, of AAAI.
Newport Beach, California!
Newport Beach is located along the beautiful Pacific Ocean in Orange
County, California, nestled south of Los Angeles, north of San Diego,
southwest of Disneyland in Anaheim, and adjacent to John Wayne/Orange
County Airport. Surrounded by one of the largest small-boat harbors in the
world and lazily stretching itself along more than six miles of scenic
Pacific coastline, Newport Beach beckons national and international
visitors to moor at the magnificent harbor and discover "The Colorful
Coast".
Newport Beach Visitor Information
A Concierge Desk is available in the Newport Beach Marriott Hotel. They
can assist with dining reservations, directions, tour bookings,
entertainment suggestions, and transportation information. Maps and
brochures are available.
URL: http://www.newport.lib.ca.us/NBCVB/NBCVB.html
************************************************************************
KDD-97 PREREGISTRATION APPLICATION
Name:
Company/Univ:
Dept/MS:
Address (Specify Home or Business):
City:
State:
Zip:
Phone & FAX:
Membership No:
Email Address:
************************************************************************
TECHNICAL PROGRAM (Includes Proceedings)
EARLY REGISTRATION LATE REGISTRATION
(postmarked by June 10) (postmarked by July 15)
AAAI Member Nonmember AAAI Member Nonmember
Regular Student Regular Student Regular Student Regular Student
$295 $95 $375 $155 $350 $125 $425 $180
(Students must send proof of student status to the AAAI Office. By joining
AAAI now, you can qualify for member rates. Membership information is
available from [email protected] or http://www.aaai.org.)
Total KDD-97 Conference Fee: ______
************************************************************************
TUTORIAL PROGRAM
Thursday, August 14
(Conference fee includes up to 4 consecutive tutorials & accompanying syllabi)
8:00-10:00 AM T1
10:30 AM-12:30 PM T2, T3
1:30-3:30 PM T4, T5
4:00-6:00 PM T6, T7
Please list selected tutorial codes:
************************************************************************
KDD-97 Workshop
Sunday, August 17
$100 per person.
Total Workshop Fee: _______
************************************************************************
KDD-97 OPENING RECEPTION (Included in technical program registration)
Fee for spouse, child, or guest is $20 per person.
Total reception fee: ______
************************************************************************
Exhibit Registration
August 15, 1997
$250 per exhibitor. An exhibitor kit will be mailed upon receipt of
registration.
Total Exhibitor Fee: _______
************************************************************************
PAYMENT
Email registrations must be accompanied by a credit card number.
Total Amount Due: ______
Check one: Mastercard ___ Visa ___ American Express ___
Credit Card Account Number:
Expiration Date:
Name as it appears on card:
Forms cannot be processed if information is incomplete. The refund request
deadline is July 25, 1997. A $75.00 processing fee will be assessed for
refunds.
Registrations postmarked after July 15 are subject to onsite rates.
Mail completed application to [email protected] or fax to 415/321-4457.
Please note that there are security issues involved with the transmittal
of credit card information over the internet. AAAI will not be held liable
for any misuse of your credit card information during its transmittal
from you to AAAI.
For complete KDD-97 information, please visit AAAI's web site at
http://www.aaai.org.
Thank you for your registration! See you at KDD-97
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 24 Apr 1997 08:42:49 -0400
From: [email protected] (Peter Turney)
Subject: Re: data mining from wafers manufacturing process
Dear Elisa:
> At our University, we are starting an application project
> dealing with data from a wafer manufacturing process.
> We are thinking of using data mining techniques
> to try to address the following problem.
> Some of those wafers are faulty. There is a database keeping track
> of the entire manufacturing process for each wafer and collecting
> a large amount of data concerning each step of the manufacturing
> process (there are about 300 steps; each step is characterized
> by about 100 parameters). Our problem is to use data mining techniques
> to help with the diagnosis, that is, to see which step
> may have caused the problem.
>
> I was wondering whether you are aware of any use of data mining
> techniques for similar problems. We have also to acquire
> some suitable data mining tools.
Here are two relevant URLs for you:
1. ftp://ai.iit.nrc.ca/pub/iit-papers/NRC-39163.ps.Z
P. Turney. Data Engineering for the Analysis of Semiconductor
Manufacturing Data. IJCAI-95 Workshop on Data Engineering for
Inductive Learning: 50-59. 1995.
2. http://www.quadrillion.com/
Quadrillion Corporation, makers of Q-Yield
Best wishes,
Peter.
http://ai.iit.nrc.ca/staff/peter.html
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Thu, 24 Apr 97 11:18:57
Subject: FW: new entry for siftware section
<H2>Siftware: Delta Miner </H2>
<br><b>*URL:</b> <A HREF="http://www.bissantz.de"> http://www.bissantz.de</a>
<br><b>*Description:</b>
Delta Miner 3.0 is a suite of easy to handle data mining instruments for
financial controlling applications and database analysis.
<br><b>*Discovery tasks:</b> Clustering, Summarization,
Deviation Detection, Visualization
<br><b>*Comments:</b> Delta Miner 3.0 is a suite of data mining
instruments that analyzes complex data pools. Delta Miner's tools are
flexible: they lend themselves to a broad range of applications. A
common application is the analysis of financial controlling data. Delta
Miner guides the user quickly and easily through complex data structures
down to the significant facts. In contrast to the simple "Drill-down"
capabilities of typical EIS and MIS tools, Delta Miner integrates a high
level of helpful automation. The system is capable of recommending the
best analysis paths, thereby relieving the controller from tedious
routine tasks. In addition to identifying the important trends, the tool
also points to the causes of those trends. Further analyses inform the
user about the best possible countermeasures to negative developments.
The basic techniques of the Delta Miner were developed at FORWISS, where
since 1993, a research group led by Prof. Dr. Peter Mertens has
intensively investigated algorithms for Data Mining. At its first
presentation Delta Miner was recognized as one of the best three
products in the category "Business Management Solutions" at the Systems
'96 trade show in Munich. A demo version can be downloaded.
<br><b>*Platform(s):</b> Windows 95, NT
<br><b>*Contact:</b> <pre>
Bissantz Küppers & Company GmbH
Am Weichselgarten 7
91058 Erlangen
Germany
phone +49 9131 691-450
fax +49 9131 691-455
[email protected]
</pre>
<br><b>*Status: </b> Product
<br><b>*Updated:</b> 1997-04-11 by Dr. Nicolas Bissantz ([email protected])
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 30 Apr 1997 16:22:28 -0400
From: Pablo Tamayo <[email protected]>
Job Description:
Staff Member in the Technology Group
Researcher/Developer of Data Mining/KDD Technologies
Thinking Machines Corp.
4/30/97
- Provide technical and scientific expertise in core areas for Data
Mining and KDD, such as Machine Learning, Artificial Intelligence,
Statistics and High Performance Computing, to the development
organization and the company in general. Help to evaluate competing, new
or strategic technologies and algorithms for current or future releases
of Data Mining/KDD products (toolsets, KDD engines and vertical
applications).
- Design and develop state-of-the-art Machine Learning/Statistical
module prototypes. Be responsible for the support and maintenance of the
assigned modules. Collaborate with the Software Engineering Group
to integrate these prototypes into products' software architecture
following development-wide software engineering guidelines. Provide
parallelism and performance enhancements for algorithms. Help support
core algorithms in current products.
- Collaborate with the Data Analysis, Professional Services and
Technical Sales groups to study and choose appropriate algorithms and
methods for proof of concept studies or to integrate permanent solutions
for customers.
- Help write patents and provide technical assistance in patent related
issues.
- Represent the company in relevant conferences, workshops, trade shows
or forums and follow Data Mining/KDD literature and trends in the KDD
academic and commercial communities.
If you are interested please contact:
Dr. Pablo Tamayo
[email protected]
Thinking Machines Corp.
14 Crosby Dr.
Bedford, MA 01730
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Eric Horvitz <[email protected]>
Date: Wed, 23 Apr 1997 13:53:39 -0700
Thirteenth Conference on Uncertainty in Artificial Intelligence
Please refer to the UAI '97 home page at http://cuai97.microsoft.com for
updated information on this summer's UAI conference and registration
procedures. UAI will follow right after AAAI in Providence. The page
also includes other information of interest, including details (...and
even some reading assignments) for the UAI '97 Full Day Course on
Uncertain Reasoning on Thursday, July 31. The pages also contain
information on accommodations in Providence.
Looking forward to seeing you this summer,
Eric Horvitz
Conference Chair
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Fri, 25 Apr 1997 14:45:21 -0400
Subject: The Gordian Institute's "Making Sense of Data: Computer-Aided
Pattern Discovery" course is scheduled for July 14-18 in
Charlottesville, Virginia. Refer to http://www.gordianknot.com
------------------------------------------------------------------------
The Gordian Institute, a division of American Heuristics Corporation (AHC),
has scheduled the next offering of "Making Sense of Data: Computer-Aided
Pattern Discovery" for July 14-18, 1997 in the historic town of
Charlottesville, near Monticello.
The intensive four and one-half day data mining course will take place in
Charlottesville, Virginia with a start date of July 14, 1997. The course
includes live interactive demonstrations using data from real-world
applications. Participants need only have prior working experience with
computers and familiarity with data related problems to benefit from the
course.
Attendees will explore a host of advanced computing techniques and software
tools used to discover useful patterns hidden in data. The course surveys
modern algorithms drawn from the fields of statistics, machine learning, data
mining and inductive modeling which automatically build classifiers or
estimators from a database. You may never find another course that succinctly
covers the essential parts of so many aspects of "data mining" with both
theoretical and practical insights. Topics to be presented are:
-Pattern Discovery: An Overview
-Inducing Models from Data: Benefits and Dangers
-The Data Mining Process
-Perspectives of Related Fields:
-Statistics, Machine Learning, Data Mining
and Artificial Intelligence
-Data Issues
-Case Diagnostics (Outlier, Influential, Leverage Points)
-Feature Creation and Selection
-Classical Statistical Techniques
-Linear: Regression and Discriminant Analysis
-Nonparametric: Scatterplot Smoothers,
Nearest Neighbors, Kernels
-Key General Tools:
-Scientific Visualization
-Resampling
-Optimization
-Clustering
-Modern Methods
-Neural Networks
-Polynomial Networks (ASPN, AIM)
-Decision Trees (CART)
-Brief Survey of Other Methods
-Projection Pursuit
-ASH (Average Shifted Histograms)
-MARS (Multivariate Adaptive Regression Splines)
-Radial Basis Functions
-Comparing and Combining Methods
While increasingly awash in data, most organizations are unable to fully
extract the useful information embedded within. The practical techniques
taught in this course can help you to discover and make sense of hidden
patterns. A key element of corporate efficiency must be the extraction of
important information to support the decision making process and accurately
predict and plan for future needs. Those from government, industry and
academia who see the need for non-linear modeling techniques, and who have
particular applications not adequately solved with classic modeling techniques
are target candidates for this course.
Direct Quote from Course Evaluation Sheet:
"I felt this course was far superior to many others that I have been exposed
to. Most notably, the instructors were not only clearly experts but were not
biased toward any one software package or technique. The instructors also
emphasized targeting the users' specific applications (including analyzing
sample data brought in by the students). This is exceptionally useful. Great
value for the $. What was most valuable to me was the presentation of a broad
range of both analytical techniques and software tools for solving various
problems. This helps to give me the 'big picture' and allows me to best
determine what technologies are most applicable and useful to me."
-Andy Kalish, Eastman Kodak
The Instructors:
John F. Elder IV, PhD, and Dean Abbott of Quantitative Solutions explain the
methods used inside leading commercial and academic software, providing
practical tips and techniques on feature extraction and neural network problem
solving. The course instructors each have more than a decade of experience in
applying adaptive, data-driven techniques to practical problems.
Dr. Elder has developed or refined some of the methods covered in this course.
He is Chief Scientist at Quantitative Solutions and Adjunct Professor at the
University of Virginia, and has authored four book chapters and numerous
articles on adaptive methods of pattern discovery. He has been a researcher
at Rice University and at an engineering consulting firm, and was Director of
Research for an investment management company. Dr. Elder is a frequent
lecturer on pattern discovery techniques, and is the technical chair of the
Adaptive and Learning Systems Group of the IEEE Systems, Man, and Cybernetics
Society.
Dean W. Abbott is a Senior Research Scientist at Quantitative Solutions. He
has applied data mining techniques to challenges in optimum guidance and
control, optical character recognition, image pattern recognition, and radar
and multi-spectral signal processing. Mr. Abbott has developed pattern
recognition software that is sold commercially, and has written and lectured
on novel applications of feature selection, polynomial network, and pattern
recognition techniques to solve real-world problems in several fields.
Pricing Information:
Registration for this four and one-half day course is $1995. Government and
academic discounts may apply. Lodging details and directions may be viewed at
http://www.gordianknot.com, or obtained by providing a fax number or Email
address to (800) 405-2114 or [email protected]. You may also send a
message to [email protected] with "newsletter" in the subject field to
receive a quarterly electronic newsletter from The Gordian Institute.
If you have remaining questions regarding the course, a knowledgeable
representative may be contacted directly at (800) 405-2114. Seats may also be
secured through Gordian's web site. Space is limited to 24 seats, so go to
your browser, set it to http://www.gordianknot.com and reserve your place!
__________________________
The Gordian Institute
http://www.gordianknot.com
[email protected]
(800) 405-2114
__________________________
The parent company, American Heuristics Corporation (AHC) is a founding member
of the West Virginia High Technology Consortium, with headquarters in
Triadelphia, West Virginia. AHC is an advanced software technology consulting
company applying hybrid software solutions to complex technical problems in
business, industry and government. AHC may be found on the web at:
http://www.heuristics.com
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Prof. Zicari" <[email protected]>
Date: Sun, 27 Apr 1997 00:10:18 +0200 (METDST)
I would like to inform you that the conference programs of
COMDEX Internet & OBJECT WORLD Frankfurt '97 (October 7-10)
are now available online at:
http://www.ltt.de
The web site will be updated on a regular basis.
If you have any questions, please send me an e-mail at
[email protected].
Best Regards
Roberto Zicari
Chair Advisory Board,
COMDEX Internet & OBJECT WORLD Frankfurt.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
410.25 | 97:16 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sun May 11 1997 19:52 | 765 |
| Knowledge Discovery Nuggets 97:16, e-mailed 97-05-08
Publications:
* GPS, first issue of DMKD journal is published!
http://www.research.microsoft.com/research/datamine/
* Gerhard Widmer, CfP: MLJ Special Issue on Context Sensitivity and
Concept Drift
http://www.ai.univie.ac.at/mlj_specissue/
Siftware:
* Larry Bouchie, Cognos new Data Mining Tool: Scenario
* Aleksander Oehrn, Rosetta - rough-set tool for data analysis
http://www.idt.unit.no/~aleks/rosetta/rosetta.html
Positions:
* Gregory Piatetsky-Shapiro, Data Mining Company looking for
experts in decision trees and/or bayesian networks
* Donal Lyons, Data Mining Research Position in Ireland
* Yike Guo, Data Mining Job at Fujitsu (Japan)
Meetings:
* Pavel Brazdil, The Workshop on "Extraction of Knowledge from Data Bases" (EKBD'97), Coimbra, Portugal, October 6-9, 1997
http://alma.uc.pt:80/~epia97/EKBD97.html
* Michael Berthold, IDA-97 Call for Participation
http://web.dcs.bbk.ac.uk/ida97.html
* Staal Vinterbo, PKDD'97 Call for participation,
Trondheim, Norway, June 24-27, 1997,
http://www.idi.ntnu.no/pkdd97/
* Rob Tibshirani, Statistical prediction methods for finance and marketing,
New York City: June 23-24, 1997,
http://stat.stanford.edu/~trevor/mrc.finance.html
* Angi Voss, Workshop on Social Agents at ECSCW97 Conference
September 7, 1997
http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
About the Deep Blue -- Kasparov match,
"I just think we should look at this as a chess match," he said, "between the
world's greatest chess player and Garry Kasparov."
Louis Gerstner, IBM Chairman
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 8 May 1997 09:41:10 -0500 (EST)
From: GPS <[email protected]>
Subject: First Issue of DMKD journal
The first issue of DMKD journal has finally been published!
see http://www.research.microsoft.com/research/datamine/vol1-1/default.htm
The beautiful black and white cover shows an Escher-inspired picture
of several robots inside a mysterious structure (a data mine?), and
contents include
an editorial by Usama Fayyad, 4 excellent technical papers,
* Statistical Themes and Lessons for Data Mining
Clark Glymour, David Madigan, Daryl Pregibon, Padhraic Smyth
* Data Cube: A Relational Aggregation Operator Generalizing Group-by,
Cross-Tab, and Sub Totals
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh
* On Bias, Variance, 0/1 - loss, and the Curse-of-Dimensionality
Jerome H. Friedman
* Bayesian Networks for Data Mining, David Heckerman
and a brief application summary:
* Advanced Scout: Data Mining and Knowledge Discovery in NBA data,
Inderpal Bhandari, Ed Colet, Jennifer Parker, Zachary Pines, Rajiv Pratap, Krishnakumar Ramanujam
Sample copies of first issue will be mailed soon.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 30 Apr 1997 11:09:50 +0200 (MET DST)
From: Gerhard Widmer <[email protected]>
Subject: CfP: MLJ Special Issue on Context Sensitivity and Concept Drift
Machine Learning Journal
Special Issue on Context Sensitivity and Concept Drift
Miroslav Kubat and Gerhard Widmer, Guest Editors
MOTIVATION AND RESEARCH ISSUES
In many machine learning applications, the features given to the
learning program do not capture all aspects of the application problem.
This is a limitation shared with all forms of modeling -- even the
person who formulates the learning problem may not be aware of all of
the relevant context. Examples from the history of machine learning
and pattern recognition include omitting illumination features in
computer vision and omitting language accents in speech recognition
systems. A similar problem arises when the relevant features are
included, but the training examples do not provide enough variation
of those features to permit the learning algorithm to detect their
relevance. For example, if foreign accent features are included in a
speech recognition system, but all training examples are from native
speakers, then the foreign accent features will be ignored by the
learning system.
Relevant context may also change with time, so that a classifier
trained on one set of training examples (where a contextual feature
was absent or held constant) may suddenly begin to perform badly when
the context changes. Gradual or abrupt changes in context often
become apparent in the form of "concept drift". For situations
where a concept gradually evolves over time in a certain general
direction (such as the concept "computer"), the term "concept
evolution" has sometimes been used. Tracking concept drift on-line
requires a learner to continually monitor its performance and adjust
its hypotheses if necessary. It might also require the learner to
"forget" old, outdated information.
In batch learning, problems may arise if the training data were
collected in batches from different contexts, or if the training
data were gathered in one setting but the test data are drawn from
a different setting. Again, effective learning requires the recognition
of such discontinuities and the ability to adapt hypotheses to
different conditions.
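As a rough illustration of the on-line case described above (monitor
performance, adjust, and "forget" outdated information), here is a minimal
Python sketch. It is not a method proposed in this call. The class name, the
window size and the error threshold are hypothetical choices made for this
example, and the learner is deliberately trivial: it ignores features and
predicts the majority label in a sliding window.

```python
from collections import Counter, deque


class WindowedMajorityLearner:
    """Toy on-line learner (hypothetical): majority vote over recent labels."""

    def __init__(self, window=10, error_threshold=0.3):
        # A bounded deque drops the oldest example automatically,
        # which is one simple way to "forget" old information.
        self.memory = deque(maxlen=window)
        self.recent_errors = deque(maxlen=window)
        self.error_threshold = error_threshold

    def predict(self):
        """Return the most frequent label in memory, or None if empty."""
        if not self.memory:
            return None
        return Counter(self.memory).most_common(1)[0][0]

    def update(self, label):
        """Score the current hypothesis on `label`, then learn from it.

        Returns True when the recent error rate exceeds the threshold,
        in which case the memory is discarded before learning resumes.
        """
        prediction = self.predict()
        if prediction is not None:
            self.recent_errors.append(prediction != label)
        window_full = len(self.recent_errors) == self.recent_errors.maxlen
        drift = (
            window_full
            and sum(self.recent_errors) / len(self.recent_errors)
            > self.error_threshold
        )
        if drift:
            self.memory.clear()
            self.recent_errors.clear()
        self.memory.append(label)
        return drift
```

Feeding the learner a stream whose label changes partway through should make
update() return True shortly after the change, once enough recent predictions
have been wrong. A real system would replace the majority vote with an actual
classifier and would probably use a more careful drift test.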
This special issue is devoted to theoretical and empirical studies
of methods for detecting missing context, tracking concept drift,
adapting learned knowledge to new contexts, and identifying and
reasoning about contextual effects and concept changes in learning.
We encourage submissions addressing one or more of the following
research issues:
. on-line tracking of concept drift and concept evolution
. theoretical results concerning concept drift and contextual influences
. formal definitions of context and its effects on concept learning
. real-world applications involving context changes and/or concept drift
. representation of context-sensitive concepts
. representation of context
. recognition of context and reasoning about context
. adaptation of learned knowledge to new contexts
Both theoretical and more practically oriented papers are welcome,
but we do encourage papers that provide real-world examples of context
sensitivity and concept drift and compare multiple ways of addressing
the problems that arise.
SUBMISSION INFORMATION:
The expected length is 8000-12000 words for a full paper, or 2000-4000
words for a Research Note (full-page figures count for 400 words).
Electronic submission via e-mail is STRONGLY ENCOURAGED. Postscript
files (compressed or gzipped, uuencoded) should be sent to
[email protected].
For hardcopy submissions, please send 5 copies of the manuscript to:
Gerhard Widmer
Austrian Research Institute for Artificial Intelligence
Schottengasse 3
A-1010 Vienna
Austria
Tel: +43-1-53532810
Fax: +43-1-5320652
e-mail: [email protected]
The submission deadline is September 15, 1997.
see http://www.ai.univie.ac.at/mlj_specissue/ for full details.
The special issue is scheduled to appear in the summer of 1998.
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 28 Apr 1997 13:38:14 +0200
To: [email protected]
From: Aleksander Oehrn <[email protected]>
Subject: Rosetta availability
===================================================
Rosetta -- A Rough Set Toolkit for Analysis of Data
===================================================
Rosetta is a toolkit for analyzing tabular data within the framework of
rough set theory, and consists of a computational kernel and a GUI
front-end. The Rosetta GUI reflects the contents of the kernel, and runs on
PCs operating under Windows NT or Windows 95.
A limited version of Rosetta is made publicly available for non-commercial
use. The downloadable program is limited in the sense that algorithms from
the embedded RSES library are not applicable to decision tables larger than
some predetermined size (currently 500 objects and 20 attributes).
http://www.idt.unit.no/~aleks/rosetta/rosetta.html
The software (including documentation) is provided "as is" without warranty
of any kind.
Kernel architecture and front-end designed and implemented at the Knowledge
Systems Group, Dept. of Computer and Information Science, Norwegian
University of Science and Technology, Norway. Sections of the computational
kernel (RSES) developed at the Logic Group, Inst. of Mathematics,
University of Warsaw, Poland.
Rosetta is designed to support the overall KDD process; from initial
browsing and preprocessing of the data, via reduct computation and rule
generation, to validation and analysis of the extracted rules.
Features currently offered by the computational kernel include, among
others:
- Completion of decision tables with missing values
according to various completion strategies.
- Computation of partitions and rough set approximations
within the variable precision model.
- Sampling of subtables for validation purposes.
- Discretization of numerical attributes with various
discretization algorithms.
- Computation of reducts (both in the standard sense as well
as object-related ones). Various approximation algorithms
(e.g. genetic algorithms) are offered, as well as exhaustive
computation via discernibility matrices. Dynamic reducts can
be computed.
- Generation of propositional rules.
- Shortening and pruning of sets of reducts and rules.
- Exporting of rules, reducts and tables, e.g. to Prolog.
- Application of synthesized rules to unseen examples by means
of various classification strategies, e.g. voting.
- Generation of confusion matrices.
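As a rough illustration of the rough set approximations mentioned in the
list above, here is a minimal Python sketch. It does not use Rosetta or the
RSES library, and it shows only the classical lower and upper approximation
rather than the variable precision model; the decision table, the attribute
names and the function names are invented for the example.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group object ids into indiscernibility classes over `attrs`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical decision table: conditions 'temp' and 'cough', decision 'flu'.
TABLE = {
    1: {"temp": "high", "cough": "yes", "flu": "yes"},
    2: {"temp": "high", "cough": "yes", "flu": "no"},
    3: {"temp": "high", "cough": "no", "flu": "yes"},
    4: {"temp": "normal", "cough": "no", "flu": "no"},
    5: {"temp": "normal", "cough": "yes", "flu": "no"},
}

flu_cases = {obj for obj, row in TABLE.items() if row["flu"] == "yes"}
lower, upper = approximations(TABLE, ["temp", "cough"], flu_cases)
print("lower:", sorted(lower))
print("upper:", sorted(upper))
```

Objects 1 and 2 share the same condition values but differ in the decision,
so they fall in the upper approximation only; object 3 is the only certain
member of the concept.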
Features currently offered by the Rosetta GUI include, among others:
- Full Windows GUI conformance.
- Organization of project items in a tree-structure in order to
retain data-navigational abilities.
- Viewing of all structures in intuitive grid environments, using
terms from the modelling domain.
- Context-sensitive menus.
- Drag and drop functionality.
- Masking of attributes, enabling one to work with "virtual"
tables.
- Automatic generation of annotations, thus documenting the
modelling session.
- A prototype environment for interactive classification and guidance
on the basis of incomplete information, using a selected set of
synthesized rules.
- On-line help.
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 7 May 1997 17:37:13 -0400
From: Larry Bouchie <[email protected]>
Cognos' Scenario data mining product was released
last month. Cognos' main Web page is at http://www.cognos.com and the
Scenario site is at http://www.cognos.com/busintell/products/scenario.html
Concise background and a review are at
http://www8.zdnet.com/pcweek/reviews/0505/05mining.html
COGNOS UNVEILS SCENARIO FOR DATA MINING
-- New Data Mining Software Joins Cognos' Market-Leading Business
Intelligence Tools, PowerPlay For OLAP And Impromptu For Query &
Reporting --
BURLINGTON, MA, March 3, 1997 -- Cognos (NASDAQ:COGNF; TSE:CSN) today
announced its newest business intelligence tool, Scenario, for
enterprise-wide guided data analysis and data mining. Scenario extends the
industry's most comprehensive business intelligence product family, joining
Cognos' market-leading PowerPlay, the universal online analytical
processing (OLAP) client, and the award-winning Impromptu query and
reporting tool.
Designed for spotting patterns and exceptions in business data that might
otherwise be missed, Scenario's sophisticated interface allows users to
readily visualize the business information being uncovered. It automates
the discovery and ranking of critical factors impacting a business, exposes
hidden relationships between factors and establishes thresholds and
benchmarks. An intuitive, cost-effective desktop tool, Scenario liberates
data mining from what is typically an expensive and time-consuming process.
Insights derived using Scenario are achieved directly by those best
positioned to use the knowledge and effect rapid change.
Designed to support faster business decision-making, Scenario:
* makes data mining immediately accessible to decision makers;
* simplifies business data analysis by filtering out insignificant business
variables and relationships;
* validates business hypotheses by showing and ranking critical factors and
relationships;
* leads to new business insights by automating information discovery; and
* integrates with Impromptu and PowerPlay as best-of-breed components in
the Cognos enterprise business intelligence solution.
"With Scenario, Cognos is delivering a very important technology to
business analysts," said George Azrak, national director of IS development
at Domino's Pizza. Domino's Pizza has been working with early versions of
Scenario, and has provided Cognos with valuable input from an end user's
point of view.
"Accessible data mining is the long-awaited third wave in the data
warehousing revolution," said Alan Rottenberg, Cognos' senior vice
president, Business Intelligence Tools. "First query and reporting brought
data to the desktop, then OLAP technologies enabled the convenient
navigation of massive data warehouses. Data mining is the technological
leap that automates the information discovery process."
Rottenberg continued, "Impromptu gives access to the numbers and data on
which a business runs. PowerPlay lets individual managers explore that
data without an army of programmers. Scenario works alongside both of
those products to refine business data to distinguish what really matters.
Drawing a straight line to the bottom line, this product completes the
spectrum of business intelligence tools that can arm knowledge workers with
the insight to truly understand the data that drives a business -- and to
reap the competitive rewards."
Scenario uses statistical methods that go beyond "tree" analysis. For
example, one such method is a data segmentation capability based on CHAID
(Chi-Squared Automatic Interaction Detection) technology. CHAID allows
users to find statistically relevant relationships and trends within large
repositories of business data by "refining" it down to the most useful
nuggets that have the greatest effect on the results being tracked.
Subsequent releases of Scenario will include neural-network modeling and
forecasting capabilities, using technologies from recently acquired Right
Information Systems.
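To make the chi-squared idea behind CHAID concrete, here is a minimal Python
sketch of its central step: scoring each candidate predictor by the Pearson
chi-squared statistic of its contingency table against the outcome. This is
not Cognos' implementation; full CHAID also merges categories and compares
adjusted p-values, which is omitted here. The records, field names and
function name are invented for the example.

```python
from collections import Counter


def chi_squared(rows, predictor, outcome):
    """Pearson chi-squared statistic for the predictor-by-outcome table."""
    n = len(rows)
    joint = Counter((r[predictor], r[outcome]) for r in rows)
    predictor_totals = Counter(r[predictor] for r in rows)
    outcome_totals = Counter(r[outcome] for r in rows)
    stat = 0.0
    for p, p_count in predictor_totals.items():
        for o, o_count in outcome_totals.items():
            expected = p_count * o_count / n
            observed = joint[(p, o)]  # Counter gives 0 for unseen pairs
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical customer records.
ROWS = [
    {"region": "east", "channel": "web", "repeat": "yes"},
    {"region": "east", "channel": "web", "repeat": "yes"},
    {"region": "east", "channel": "store", "repeat": "yes"},
    {"region": "east", "channel": "store", "repeat": "yes"},
    {"region": "west", "channel": "web", "repeat": "no"},
    {"region": "west", "channel": "web", "repeat": "no"},
    {"region": "west", "channel": "store", "repeat": "no"},
    {"region": "west", "channel": "store", "repeat": "no"},
]

scores = {p: chi_squared(ROWS, p, "repeat") for p in ("region", "channel")}
best = max(scores, key=scores.get)
print(scores, "-> split on", best)
```

In this toy table the region is perfectly associated with repeat purchases
while the channel is independent of them, so the region would be chosen as
the first segmentation variable.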
Pricing and Availability
Available from Cognos for $695, Scenario 1.0 for Windows 95 or Windows NT
requires an IBM-compatible 486 PC and 8 MB of RAM.
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 8 May 1997 10:40:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Looking for experts in decision trees and/or bayesian networks
** Data Mining Consulting and Integration Company is looking for
experts in decision trees and/or bayesian networks **
TASK: Participate in the design, development, and deployment of leading
edge integrated data mining and customer modeling systems, primarily in
the financial area. Perform quick data mining studies using a variety of
different approaches and tools.
The candidates will join a team of world-class experts in data
warehousing, data mining and knowledge discovery.
Ideal candidates will have a Ph.D. in Machine Learning, Statistics,
or related fields and 2-3 years of experience, or an M.S. with
equivalent experience. The candidates should have expertise with
different modeling approaches, but primarily
with decision trees/rules or with bayesian belief networks.
The candidates should be familiar with statistical theory and have practical
experience with databases.
Excellent coding skills in C/Java/Unix environment along with
good system maintenance practices and the ability to
quickly pick up new systems and languages are needed.
The candidates should also have good communication skills, be
able to work in a team, and be able to enjoy the exciting atmosphere of
a start-up company.
Most of all, candidates should have the passion for developing and
applying innovative methods for solving practical problems.
We offer very competitive salaries, and our outstanding benefits include
profit sharing, stock options, medical/dental insurance, and a 401(k)
plan.
The data mining branch of the company is conveniently located in the
Cambridge area, easily accessible by public transportation.
Proper work authorization required.
Please email your resume and a cover letter (in plain ASCII, please) to:
Gregory Piatetsky-Shapiro, Ph.D.
Director of Applied Research
Geneve Consulting Group
545 Concord Ave
Cambridge MA 02138
email: [email protected]
tel: 617-661-1358
fax: 617-491-4936
URL: http://www.kdnuggets.com/gps.html
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: Data Mining Research Position possibility.
Date: Sat, 26 Apr 1997 11:57:24 +0100
From: Donal Lyons <[email protected]>
Currently there is EU funding available for experienced researchers to
spend a year in countries such as Ireland. I wish to explore the
possibility of using this funding to help develop a Data Mining Interest
Group within the School of Systems and Data Studies in Trinity College,
Dublin.
I'd like to discuss this further with any experienced EU researchers who
are at least tentatively interested.
Regards,
Donal.
Donal Lyons, Phone (1000-1700 GMT) +353 1 608 1919
Lecturer (Information Systems) Phone Messages +353 1 608 1767
School of Systems & Data Studies
Trinity College, Dublin 2, FAX on request
Ireland.
................http://www2.tcd.ie/Statistics/staff/dlyons.html........
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 5 May 97 11:48 BST
From: Yike Guo <[email protected]>
Subject: Job in Japan
A Fujitsu subsidiary company which is developing OLAP and datamining tools
is now looking for a foreign engineer who is interested in working in Japan.
Career opportunity for a programming engineer in Japan
Duties
Designing and programming data mining products which include
a visualizing OLAP client.
Requirements
- BS or MS degree related to computer science
- C programming skill (VC++ on NT background is best)
- Familiarity with datamining, visualization, or OLAP
- Native English speaker
Contact
Fujitsu SWE, Manager Mr. Katoh
E-mail: [email protected]
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 29 Apr 1997 19:30:03 +0200 (MET DST)
From: Pavel Brazdil <[email protected]>
Call for Participation
The Workshop on "Extraction of Knowledge from Data Bases"
EKBD'97
http://alma.uc.pt:80/~epia97/EKBD97.html
Under the auspices of the
Portuguese Conference on Artificial Intelligence (EPIA'97) Coimbra,
Portugal, October 6-9, 1997
October, 7-8, 1997
Coimbra University Physics Building
Aims of the Workshop
This workshop is in the area of Extraction (or Discovery) of Knowledge from
Data Bases and Data Mining, which are rather recent but expanding
rapidly. The objective of the workshop is to discuss methods for non-trivial
extraction of information which is implicit in the existing data and which
can be represented in a high-level language so as to facilitate interpretation.
EKBD'97 welcomes original papers in English on the following topics:
- Machine Learning methods useful in KDD and Data Mining,
(decision tree /rule induction, relational learning (ILP) etc.)
- Statistical methods useful in KDD and Data Mining,
(multivariate analysis, principal components, clustering, regression
methods etc.),
- Reduction of complexity through preprocessing,
(identification of relevant attributes, data sampling, clustering, etc.),
- Data summarization and consolidation,
- Languages useful in describing user's hypotheses,
- Applications of KDD and Data Mining,
- other related areas of interest.
Workshop Format and Attendance Requirements:
The workshop will include invited talks, paper presentations and a panel
discussion. The workshop will last 1-2 days.
Papers in English of no more than 15 pages are welcome.
Attendees should be registered for the main EPIA conference.
(see http://alma.uc.pt:80/~epia97)
Submit 3 copies of the full paper to the address below:
Pavel Brazdil
LIACC, Universidade do Porto,
R. Campo Alegre, 823,
4150 PORTO, PORTUGAL
Text format should follow Springer Verlag Lecture Notes Series.
English is the official language of the workshop.
Important dates:
June, 16: submissions due
July, 15: notifications sent
September, 8: final versions due
Programme Committee:
Pavel Brazdil, Univ.Porto (chair)
Arlindo Oliveira, IST
Carlos Bento, U. Coimbra
Ernesto Costa, U. Coimbra
Fernando Moura-Pires, UNL-FCT
Fernando Nicolau, UNL-FCT
Helena Bacelar Nicolau, UNL-FCT
Joaquim Pinto da Costa, Univ. Porto
Paulo Azevedo, Univ. Minho
Paula Brito, Univ. Porto
Paulo Gomes, INE, Porto
Organizing Committee:
Pavel Brazdil (chair)
LIACC, Universidade do Porto, R. Campo Alegre, 823,
4150 PORTO, PORTUGAL
email: [email protected]
Tel.: (02) 600 1672, Fax: (02) 600 3654
Fernando Moura-Pires
UNL-FCT, Dept. Informatica, Quinta da Torre
2825 Monte da Caparica, PORTUGAL
email: [email protected]
Tel.: (01) 295 4464, Fax: (01) 295 5641
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: IDA Call for Participation
Date: Thu, 8 May 1997 17:43:12 +0200
From: Michael Berthold <[email protected]>
CALL FOR PARTICIPATION
The Second International Symposium on Intelligent Data Analysis (IDA-97)
Birkbeck College, University of London
4th-6th August 1997
In Cooperation with
AAAI, ACM SIGART, BCS SGES, IEEE SMC, and SSAISB
[ http://web.dcs.bbk.ac.uk/ida97.html ]
You are invited to participate in IDA-97, to be held in the heart of London.
IDA-97 will be a single-track conference consisting of oral and poster
presentations, invited speakers, demonstrations and exhibitions. The
conference Call for Papers introduced a theme, "Reasoning About Data",
and many papers complement this theme, but other, exciting topics have emerged,
including exploratory data analysis, data quality, knowledge discovery and
data-analysis tools, as well as the perennial technologies of classification
and soft computing. A new and exciting theme involves analyzing time series
data from physical systems, such as medical instruments, environmental data
and industrial processes.
Information regarding registration as well as the preliminary technical
program can be found on the IDA-97 web page (address listed above). Please
note that there are reduced rates for early registration (before 2nd June).
Also there are still a limited number of spaces available for exhibition,
and potential exhibitors are encouraged to book early (the application
deadline is 2nd June).
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Staal Vinterbo" <[email protected]>
Date: Tue, 6 May 1997 18:05:56 +0200
To: [email protected]
Subject: PKDD'97 Call for participation
Dear Sir,
On behalf of Prof. Komorowski, I ask that the following call for
participation be distributed via the KDD Nuggets mailing list.
Thank you.
PKDD'97 -- Call For Participation
1st European Symposium on Principles of
Data Mining and Knowledge Discovery
Trondheim, Norway
June 24-27, 1997
Tutorials: June 24-25
Symposium: June 26-27
Data Mining and Knowledge Discovery (KDD) have recently emerged from a
combination of many research areas: databases, statistics, machine
learning, automated scientific discovery, inductive programming,
artificial intelligence, visualization, decision science, and high
performance computing.
While each of these areas can contribute in specific ways, KDD focuses on
the value that is added by creative combination of the contributing areas.
The goal of PKDD'97 is to provide a European-based forum for interaction
among all theoreticians and practitioners interested in data mining.
Fostering an interdisciplinary collaboration is one desired outcome, but
the main long-term focus is on theoretical principles for the emerging
discipline of KDD, especially those new principles that go beyond each of
the contributing areas.
Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for
detailed information and news about the symposium.
Registration Information is available at
http://www.idi.ntnu.no/pkdd97/fees.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 4 May 97 12:10 EDT
Subject: Modern Regression and Classification course - New York
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ +++
+++ Modern Regression and Classification: +++
+++ +++
+++ Statistical prediction methods for finance +++
+++ and marketing +++
+++ +++
+++ +++
+++ New York City: June 23-24, 1997 +++
+++ +++
+++ Trevor Hastie, Stanford University +++
+++ Rob Tibshirani, University of Toronto +++
+++ +++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
This two-day course will give a detailed overview of statistical models
for regression and classification. Known as machine-learning in
computer science and artificial intelligence, and pattern recognition
in engineering, this is a hot field with powerful applications in
finance, science and industry.
This course covers a wide range of models from linear regression
through various classes of more flexible models to fully nonparametric
regression models, both for the regression problem and for
classification.
This special version of our popular MRC course is tailored to financial
and marketing professionals.
Although a firm theoretical motivation will be presented, the emphasis
will be on practical applications and implementations, especially in
the finance and marketing areas. The course will include many examples
and case studies, and participants should leave the course well-armed
to tackle real problems with realistic tools. The instructors are at
the forefront in research in this area.
After a brief overview of linear regression tools, methods for
one-dimensional and multi-dimensional smoothing are presented, as well
as techniques that assume a specific structure for the regression
function. These include splines, wavelets, additive models, MARS
(multivariate adaptive regression splines), projection pursuit
regression, neural networks and regression trees. All of these can be
adapted to the time-series framework for predicting future trends from
the past.
The same hierarchy of techniques is available for classification
problems. Classical tools such as linear discriminant analysis and
logistic regression can be enriched to account for nonlinearities and
interactions. Generalized additive models and flexible discriminant
analysis, neural networks and radial basis functions, classification
trees and kernel estimates are all such generalizations. Other
specialized techniques for classification including nearest- neighbor
rules and learning vector quantization will also be covered.
Apart from describing these techniques and their applications to a wide
range of problems, the course will also cover model selection
techniques, such as cross-validation and the bootstrap, and diagnostic
techniques for model assessment.
Software for these techniques will be illustrated, and a comprehensive
set of course notes will be provided to each attendee.
Additional information is available at the Website:
http://stat.stanford.edu/~trevor/mrc.finance.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 05 May 1997 12:45:27 +0200
From: Angi Voss <[email protected]>
Subject: Workshop on Social Agents
"Social Agents in Web-Based Collaboration"
at the ECSCW'97 Conference
September 7, 1997
Organizers: Thomas Kreifelts, Angi Voss, Gloria Mark, Arnstein Borstad,
Vidar Hepsoe
Abstract
--------
We see signs today that the Web is moving toward an environment where
new social and collaborative interactions are being realized. Rather
than continuing to evolve as a single-user environment, the Web is
beginning to be regarded as an environment where reciprocity and
awareness of others' activities have an important function. Software
agents can help develop and support the process of reciprocity by
helping people find others with similar interests, and helping match
knowledge to the right people. Agents can also help people collectively
construct knowledge, shaped around their needs.
This full-day workshop is intended for designers and researchers from
academia and industry to discuss the role of agents in dealing with
social information. How can social agents be integrated into
collaborative relationships so that information and expertise can be
distributed and matched to the right people, where appropriate
relationships can be developed, and where collective knowledge can be
established?
Participation requires the submission of an input paper (3-6 pages) that
should try to address the points described above, from any of the
following aspects:
-experiences with agent use in collaboration
-design of agent systems
-application areas
-interface design
The paper should be sent for review by June 15 to:
Thomas Kreifelts
GMD-FIT.CSCW
D-53754 Sankt Augustin
Germany
Email: [email protected]
Fax: +49-2241-142084
Electronic submission is encouraged, HTML being the preferred format.
The selection of participants will be based on the input papers.
Accepted participants will be notified before the end of June so that
they can take advantage of early registration by July 1. For those who
are interested in submitting a paper to the workshop, but are not able
to meet the June 15 deadline, please contact the organizers as soon as
possible expressing your interest to participate in the workshop. The
accepted input papers will be distributed electronically in advance to
the workshop participants. The workshop will be structured around the
presentation of selected input papers to stimulate the discussion. Note
that participation in the workshop requires participation in the ECSCW
97 conference.
Important Dates:
----------------
June 15, 1997 - Deadline for submissions
end of June - Notification of acceptance
July 1, 1997 - Early registration deadline for the ECSCW '97
conference
September 7, 1997 - The Workshop
For more information: http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html
Angi Voss GMD FIT D-53754 Sankt Augustin
phone: (+49) 2241-142726
fax: (+49) 2241-142384
e-mail: [email protected]
URL: http://nathan.gmd.de/persons/angi.voss.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
410.26 | 97:17 | IJSAPL::OLTHOF | Spellchecked Henry Although | Sat May 17 1997 12:40 | 697 |
| Knowledge Discovery Nuggets 97:17, e-mailed 97-05-15
Publications:
* Phil Chan, CFP: MLJ special issue on IMLM,
http://www.cs.fit.edu/~imlm/
Siftware:
* P. Spedding, Cognos' Scenario Wins PC Week Labs Analyst's
Choice Award, http://www8.zdnet.com/pcweek/reviews/0505/05mining.html
Positions:
* COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science &
Technology (OGI), http://www.cse.ogi.edu/CompFin/
* George Smith, Research Assistant Position at UEA, Norwich, UK
Meetings:
* Lipo Wang, 2nd Pacific-Asia Conference on Knowledge Discovery and
Data Mining (PAKDD-98), Melbourne, Australia, 15-17 April 1998,
http://www.sd.monash.edu.au/pakdd-98
* David Leake, ICCBR-97: First Call for Participation,
http://www.iccbr.org/iccbr-97.html
* Hakan Erdogmus, CASCON'97 CfP, http://www.cas.ibm.ca/cascon/
* John R. Koza, GP-97 Revised Call for Participation,
http://www-cs-faculty.stanford.edu/~koza/gp97.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more is available at Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the fool would persist in his folly he would become wise.
William Blake
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "IMLM Workshop (pkc)" <[email protected]>
Subject: CFP: MLJ special issue on IMLM
Dear colleagues,
Here is a CFP for the Machine Learning Journal special issue on IMLM.
Submission is due on Oct 1st, 97. Hope you can submit. Thanks.
Phil, Sal, and Dave
------
CALL FOR PAPERS
Machine Learning Journal
Special Issue on
Integrating Multiple Learned Models
for Improving and Scaling Machine Learning Algorithms
Most modern Machine Learning, Statistics and KDD techniques use a
single model or learning algorithm at a time, or at most select one
model from a set of candidate models. Recently however, there has been
considerable interest in techniques that integrate the collective
predictions of a set of models in some principled fashion. With such
techniques often the predictive accuracy and/or the training
efficiency of the overall system can be improved, since one can "mix
and match" among the relative strengths of the models being combined.
Any aspect of integrating multiple models is appropriate for the
special issue. However we intend the focus of the special issue to be
on the issues of improving prediction accuracy and improving training
efficiency in the context of large databases.
Submissions are sought in, but not limited to, the following topics:
1) Techniques that generate and/or integrate multiple learned
models. Examples are schemes that generate and combine
models by
* using different training data distributions
(in particular by training over different partitions
of the data)
* using different sampling techniques to generate different
partitions
* using different output classification schemes
(for example using output codes)
* using different hyperparameters or training heuristics
(primarily as a tool for generating multiple models)
2) Systems and architectures to implement such strategies.
For example,
* parallel and distributed multiple learning systems
* multi-agent learning over inherently distributed data
3) Techniques that analyze the integration of multiple learned models for
* selecting/pruning models
* estimating the overall accuracy
* comparing different integration methods
* tradeoff of accuracy and simplicity/comprehensibility
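As a small illustration of topic 1 above (generating models from different
partitions of the data and integrating their predictions), here is a minimal
Python sketch. It is not taken from any submission or from the editors; the
base learner (a one-feature threshold rule), the data and all names are
invented for the example, and majority voting is just one of many possible
integration schemes.

```python
from collections import Counter


def train_stump(samples):
    """Fit a threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def majority_vote(thresholds, x):
    """Combine the stumps' predictions for x by simple majority."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical (feature, label) pairs.
DATA = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]

# Three disjoint partitions, one model per partition.
partitions = [DATA[k::3] for k in range(3)]
thresholds = [train_stump(part) for part in partitions]

print("thresholds:", thresholds)
print("vote for x=6:", majority_vote(thresholds, 6))
```

Each model sees only a third of the data, so the partitions could be
processed in parallel; an odd number of models avoids tied votes.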
Schedule:
October 1: Deadline for submissions
December 15: Deadline for getting decisions back to authors
March 15: Deadline for authors to submit final versions
August 1998: Publication
Submission Guidelines:
1) Manuscripts should conform to the formatting instructions in:
http://www.cs.orst.edu/~tgd/mlj/info-for-authors.html
The first author will be the primary contact unless otherwise stated.
2) Authors should send 5 copies of the manuscript to:
Karen Cullen
Machine Learning Editorial Office
Attn: Special Issue on IMLM
Kluwer Academic Press
101 Philip Drive
Assinippi Park
Norwell, MA 02061
617-871-6300
617-871-6528 (fax)
[email protected]
and one copy to:
Philip Chan
MLJ Special Issue on IMLM
Computer Science
Florida Institute of Technology
150 W. University Blvd.
Melbourne, FL 32901
407-768-8000 x7280 (x8062) (407-674-7280/8062 after 6/1/97)
407-984-8461 (fax)
3) Please also send an ASCII title page (title, authors, email, abstract,
and keywords) and a postscript version of the manuscript to
[email protected].
General Inquiries:
Please address general inquiries to:
[email protected]
Up-to-date information is maintained on WWW at:
http://www.cs.fit.edu/~imlm/
Co-Editors:
Philip Chan, Florida Institute of Technology [email protected]
Salvatore Stolfo, Columbia University [email protected]
David Wolpert, IBM Almaden Research Center [email protected]
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]
From: "Spedding, Patrick" <[email protected]>
Subject: Cognos' Scenario Wins PC Week Labs Analyst's Choice Award
Date: Fri, 9 May 1997 05:36:20 -0400
Cognos' Scenario Wins PC Week Labs Analyst's Choice Award
BURLINGTON, Mass., May 6 /PRNewswire/ -- Cognos'(R) (Nasdaq: COGNF;
Toronto: CSN) Scenario(TM) data mining tool won PC Week Labs Analyst's
Choice Award after a head-to-head review with a competing product.
Scenario's "innovative interface makes it the coolest software package
we've seen this year," said the review, which cited its superiority,
power and graphics.
Scenario extends the industry's most comprehensive business intelligence
product family, joining Cognos' market-leading PowerPlay(R), the
universal OLAP client, and award-winning Impromptu(R) query and reporting
tool.
"This award substantiates Cognos' belief that data mining in the
hands of business users offers up a powerful, functional and affordable
competitive edge," said Alan Rottenberg, senior vice president, Business
Intelligence products. "Putting data mining capabilities into the hands
of decision makers and knowledge workers extends our strategy of
enabling them to react quickly to newfound knowledge, whether in
operational systems or data warehouses.
Scenario joins Cognos' other award-winning business intelligence tools
for fastest time to results, lowest cost of ownership and unparalleled ease
of use."
PC Week Labs, the world's largest independent testing laboratory,
applauded both Cognos' Scenario and the competitor for bringing new data
mining techniques to the PC. "But in head-to-head testing," it wrote,
"Scenario safely mined more usable information than its competitor,
making it our top pick."
Designed for spotting patterns and exceptions in business data that
might otherwise be missed, Scenario's sophisticated interface allows
users to readily visualize the business information being uncovered. It
automates the discovery and ranking of critical factors impacting a
business, exposes hidden relationships between factors and establishes
thresholds and benchmarks.
An intuitive, cost-effective desktop tool, Scenario liberates data mining
from what is typically an expensive and time-consuming process. Insights
derived using Scenario are achieved directly by those best positioned to
use the knowledge and effect rapid change.
Scenario 1.0, released in April 1997, is available from Cognos for
$695. It runs on Windows 95 and Windows NT and requires an
IBM-compatible 486 PC and 8 MB of RAM.
(See http://www8.zdnet.com/pcweek/reviews/0505/05mining.html for the PC
Week comparison of Scenario and BusinessMiner. GPS)
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 7 May 1997 11:46:09 -0700 (PDT)
From: Computational Finance <[email protected]>
Subject: Computational Finance Graduate Programs
=======================================================================
COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science &
Technology (OGI)
Master of Science Concentrations in
Computer Science & Engineering (CSE)
Electrical Engineering (EE)
Upcoming MS Application Deadline for Fall 1997: May 15 & June 15!
New! Certificate Program Designed for Part-Time Students.
For more information, contact OGI Admissions at (503)690-1027 or
[email protected], or visit our Web site at:
http://www.cse.ogi.edu/CompFin/
=======================================================================
Computational Finance Overview:
Advances in computing technology now enable the widespread use of
sophisticated, computationally intensive analysis techniques applied to
finance and financial markets. The real-time analysis of tick-by-tick
financial market data, and the real-time management of portfolios of
thousands of securities are now sweeping the financial industry. This has
opened up new job opportunities for scientists, engineers, and computer
science professionals in the field of Computational Finance.
The strong demand within the financial industry for technically
sophisticated graduates is addressed at OGI by the Master of Science and
Certificate Programs in Computational Finance. Unlike a standard two-year
MBA, the programs are directed at training scientists, engineers, and
technically oriented financial professionals in the area of quantitative
finance.
The master's programs lead to a Master of Science in Computer Science and
Engineering (CSE track) or in Electrical Engineering (EE track). The MS
programs can be completed within 12 months on a full-time basis. In
addition, OGI has introduced a Certificate program designed to provide
professionals in engineering and finance a means of upgrading their skills
or acquiring new skills in quantitative finance on a part-time basis.
The Computational Finance MS concentrations feature a unique combination
of courses that provides a solid foundation in finance at a non-trivial,
quantitative level, plus the essential core knowledge and skill sets of
computer science or the information technology areas of electrical
engineering. These skills are important for advanced analysis of markets
and for the development of state-of-the-art investment analysis, portfolio
management, trading, derivatives pricing, and risk management systems.
The MS in CSE is ideal preparation for students interested in securing
positions in information systems in the financial industry, while the MS
in EE provides rigorous training for students interested in pursuing
careers as quantitative analysts at leading-edge financial firms.
The curriculum is strongly project-oriented, using state-of-the-art
computing facilities and live/historical data from the world's major
financial markets provided by Dow Jones Telerate. Students are trained in
the use of high-level numerical and analytical software packages for
analyzing financial data.
OGI has established itself as a leading institution in research and
education in Computational Finance. Moreover, OGI has strong research
programs in a number of areas that are highly relevant for work in
quantitative analysis and information systems in the financial industry.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 13 May 1997 14:40:06 +0100 (BST)
From: [email protected] (George Smith)
Subject: Research Assistant Position at UEA, Norwich, UK
The School of Information Systems, University of East
Anglia, Norwich has a vacancy for a
Research Assistant
to work on a project entitled "Datamining in the
Telecommunications Sector".
A computer graduate with at least a 2(I) degree in computing
or an allied subject is sought for a two-year post
starting August 1st, 1997, or as soon as possible
thereafter.
The appointee will work within a leading telecommunications
company, Nortel plc, on a day-to-day basis but
will be an employee of the University of East Anglia.
Opportunities will exist for registration for a part-time
higher degree at the University. A successful applicant will
be expected to have a high degree of numeracy and
a strong computing background. Preference will be given to
those who, in addition, have some knowledge
(and expertise) in one or more of the following:
evolutionary computation, operations research, artificial
intelligence or telecommunications.
The research is sponsored jointly by the Teaching Company
Scheme and by Nortel plc and involves the
development and application of various inference and
heuristic techniques, including genetic algorithms,
simulated annealing and tabu search, to elicit knowledge
from large scale data sets generated within the
telecommunications industry.
Initial salary to be determined but expected to be around
16K UK pounds.
Applicants are invited to telephone Dr George D Smith (+44
(0) 1603 593260) or email [email protected] for
further information.
Applications in the form of a covering letter plus three
copies of a CV, including the names and addresses of
three referees, should be sent to:
Dr George D Smith
School of Information Systems
University of East Anglia
Norwich
NR4 7TJ, UK
on or before Friday 6th June 1997.
Tel: + 44 (0)1603 593260
FAX: + 44 (0)1603 593344
Email: [email protected]
www: http://www.sys.uea.ac.uk/Teaching/Staff/gds.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 12 May 1997 16:14:32 +1000
From: Lipo Wang <[email protected]>
Subject: CFP: Conference on Knowledge Discovery and Data Mining (PAKDD-98)
======================================================================
C A L L   F O R   P A P E R S
======================================================================
The Second Pacific-Asia Conference on
Knowledge Discovery and Data Mining (PAKDD-98)
----------------------------------------------
Melbourne, Australia, 15-17 April 1998
======================================
URL: http://www.sd.monash.edu.au/pakdd-98
The Second Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD-98) will provide an international forum for the sharing
of original research results and practical development experiences
among researchers and application developers from different KDD
related areas such as machine learning, databases, statistics,
knowledge acquisition, data visualization, software re-engineering,
and knowledge-based systems. It will follow the success of PAKDD-97
held in Singapore in 1997 by bringing together participants from
universities, industry and government.
Papers on all aspects of knowledge discovery and data mining are
welcome. Areas of interest include, but are not limited to:
- Data and Dimensionality Reduction
- Data Mining Algorithms and Tools
- Data Mining and Data Warehousing
- Data Mining on the Internet
- Data Mining Metrics
- Data Preprocessing and Postprocessing
- Data and Knowledge Visualization
- Deduction and Induction in KDD
- Discretisation of Continuous Data
- Distributed Data Mining
- KDD Framework and Process
- Knowledge Representation and Acquisition in KDD
- Knowledge Reuse and Role of Domain Knowledge
- Knowledge Acquisition in Software Re-Engineering and Software
Information Systems
- Induction of Rules and Decision Trees
- Management Issues in KDD
- Machine Learning, Statistical and Visualization Aspects of KDD
(including Neural Networks and Inductive Logic Programming)
- Mining in-the-large vs Mining in-the-small
- Noise Handling
- Security and Privacy Issues in KDD
- Successful/Innovative KDD Applications in Science, Government,
Business and Industry.
Both research and applications papers are solicited. All submitted
papers will be reviewed on the basis of technical quality, relevance
to KDD, significance, and clarity. Accepted papers will be published
in the conference proceedings by an international publisher. A
selected number of the accepted papers will be expanded and revised
for inclusion in a special issue of an international journal.
All submissions should be limited to a maximum of 5,000 words. Four
hardcopies should be forwarded to the following address.
Professor Ramamohanarao Kotagiri (PAKDD '98)
Department of Computer Science
The University of Melbourne
Parkville, VIC 3052
Australia
Please include a cover page containing the title, authors (names,
postal and email addresses), a 200-word abstract and up to 5
keywords. This cover page must accompany the paper.
*************** I m p o r t a n t   D a t e s ***************
* 4 copies of full papers received by: October 16, 1997 *
* acceptance notices: December 22, 1997 *
* final camera-readies due by: January 30, 1998 *
*************************************************************
Conference Chairs:
==================
Ross Quinlan Sydney University
Bala Srinivasan Monash University
Program Chairs:
===============
Xindong Wu Monash University
Ramamohanarao Kotagiri Melbourne University
Organising Committee Co-Chairs:
===============================
Kevin Korb Monash University
Graham Williams CSIRO, Australia
PAKDD-98 Publicity Chair:
=========================
Lipo Wang Deakin University
PAKDD-98 Tutorial Chair:
========================
Jon Oliver Monash University
PAKDD-98 Treasurer:
===================
Michelle Riseley Monash University
Program Committee:
==================
Grigoris Antoniou James Boyce Ivan Bratko
Mike Cameron-Jones Arbee Chen David Cheung
Vic Ciesielski Honghua Dai John Debenham
Olivier de Vel Tharam Dillon Guozhu Dong
Peter Eklund Usama Fayyad Matjaz Gams
Yike Guo David Hand Evan Harris
David Heckerman David Kemp Masaru Kitsuregawa
Kevin Korb Hingyan Lee Jae-Kyu Lee
Deyi Li Bing Liu Huan Liu
Zhi-Qiang Liu Hongjun Lu Dickson Lukose
Kia Makki Heikki Mannila Peter Milne
Shinichi Morishita Hiroshi Motoda Hwee-Leng Ong
Jon Oliver Maria Orlowska G. Piatetsky-Shapiro
Niki Pissinou Peter Ross Claude Sammut
S. Seshadri Hayri Sever Arun Sharma
Heinz Schmidt Evangelos Simoudis Atsuhiro Takasu
Takao Terano B. Thuraisingham Kai Ming Ting
David Urpani R. Uthurusamy Lipo Wang
Geoff Webb Graham Williams Beat Wuthrich
Xin Yao John Zeleznikow Dian-cheng Zhang
Ming Zhao Zijian Zheng Ning Zhong
Justin Zobel
Further Information
===================
Dr Xindong Wu
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia
Phone: +61 3 9903 1025
Fax: +61 3 9903 1077
Email: [email protected]
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 6 May 1997 13:08:00 -0500 (EST)
From: "David Leake" <[email protected]>
Subject: ICCBR-97: First Call for Participation
ICCBR-97
Second International Conference on Case-Based Reasoning
Brown University
Providence, Rhode Island, July 25-27, 1997
Note: The early registration deadline is May 28, 1997 (extended from May 20).
Additional information is available from http://www.iccbr.org/iccbr-97.html
Questions should be sent to [email protected].
--------------- Conference Overview ---------------
In 1995, the first International Conference on Case-Based Reasoning (ICCBR-95)
was held in Sesimbra, Portugal, as the start of a biennial series. ICCBR-97,
the Second International Conference on Case-Based Reasoning, will be held at
Brown University in Providence, Rhode Island, on July 25-27, immediately prior
to AAAI-97 and IAAI-97.
The program of ICCBR-97 will include both research and applications. The
three-day conference will feature invited talks, paper and poster sessions,
and panels presenting both mature work and new ideas, selected from over
100 submissions to the conference. The conference aims to achieve a
vibrant interchange between researchers and practitioners with different
perspectives on fundamentally related issues, in order to examine and
advance the state of the art in case-based reasoning and related fields.
Topics to be addressed in conference presentations include:
* Case representation, indexing and retrieval, similarity assessment, case
adaptation, and analogical reasoning
* Case-based and instance-based learning, index learning, and integrating
CBR with other learning methods
* Case-based reasoning and related approaches for task areas such as
education, design, and medicine
* Integration of CBR with other AI methods and comparisons to other
approaches
* Methods and systems for decision support, knowledge management, and
intelligent information retrieval
* Novel application areas for case-based techniques, deployed applications
with significant impact, and lessons learned from application
development
(See http://www.iccbr.org/iccbr-97.html for details on registration, etc.)
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 8 May 1997 10:17:04 -0500
From: "Erdogmus" <[email protected]>
Subject: CASCON'97 CfP
CASCON'97 web site: http://www.cas.ibm.ca/cascon/
--
CASCON'97: Meeting of Minds
November 10-13, 1997
International Plaza Hotel
Mississauga, Ontario, Canada
Dear Colleague,
CASCON '97, the seventh annual IBM Center for Advanced Studies Conference,
is upon us. CASCON provides an excellent opportunity for academic,
governmental, and industrial research communities to share their work. We
encourage you to submit papers. The deadline for paper submissions is June 27;
however, we would like to know about your intention to submit a paper earlier
(by May 16, if possible). If you are thinking about submitting a paper, please
register as soon as possible on our web site at
http://www.cser.ca:8001/
All you have to do is to fill out a simple online form specifying a
tentative title and some keywords. This information can easily be changed
any time using the automated system.
This year, we are soliciting papers in a wide range of topics including,
but not limited to, the following:
- Distributed systems and applications: Internet and the WWW, electronic
commerce, tele-learning, tele-medicine, CSCW, multimedia, distributed
object technologies, Java, performance analysis, high-speed networks,
and applications management
- Database technology: data mining, knowledge recovery, digital
libraries, and data warehousing
- User technologies: human-computer interaction, navigation, and GUIs
- Software engineering and practices: maintenance, design recovery, program
understanding, visualization, reuse, frameworks and design patterns,
development environments, reliability, testing and validation,
metrics, and real-time systems
- Compiler technology: new techniques, compiler development, optimization,
parallelism, and architectures
For more information about CASCON'97, please visit the web site
http://www.cas.ibm.ca/cascon/
We are looking forward to your participation.
Dr. Hakan Erdogmus
CASCON'97 Program Co-chair
[email protected]
CASCON'97 web site: http://www.cas.ibm.ca/cascon/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 10 May 1997 13:09:26 -0700 (PDT)
From: "John R. Koza" <[email protected]>
Subject: GP-97 Revised Call for Participation
CALL FOR PARTICIPATION
Genetic Programming 1997 Conference (GP-97)
July 13 - 16 (Sunday - Wednesday), 1997
Fairchild Auditorium - Stanford University - Stanford, California
-----------------------------------------------------------------------
In cooperation with American Association for Artificial Intelligence (AAAI),
Association for Computing Machinery (ACM), SIGART, and Society for Industrial
and Applied Mathematics (SIAM)
-----------------------------------------------------------------------
WWW FOR GP-97: http://www-cs-faculty.stanford.edu/~koza/gp97.html
-----------------------------------------------------------------------
NOTE: You are urged to make your housing arrangements as early as possible
since convenient hotel locations are limited. Also, if you are driving
to the Stanford campus, please be aware of parking lot construction in
the area of Fairchild Auditorium and allow a little extra time
(particularly on the first Monday session) to find a parking place.
-----------------------------------------------------------------------
Genetic programming is an automatic programming technique for evolving
computer programs that solve (or approximately solve) problems. Starting with
a primordial ooze of thousands of randomly created computer programs, a
population of programs is progressively evolved over many generations using
the Darwinian principle of survival of the fittest, a sexual recombination
operation, and occasional mutation.
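The evolutionary loop described above can be illustrated with a small
sketch. This is only a toy, not code from the conference or from any GP
system: the function set, terminal set, tournament selection, elitism, depth
cap, and every name below (random_tree, graft, evolve, and so on) are
assumptions chosen for brevity. It evolves arithmetic expression trees
toward a set of (x, y) fitness cases.

```python
import operator
import random

# Illustrative function and terminal sets (assumptions, not from the announcement).
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMS = ["x", 1.0, 2.0]
MAX_DEPTH = 6  # cap on tree depth to keep programs small

def random_tree(rng, depth=3):
    """Grow a random program: a terminal, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def depth_of(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def error(tree, cases):
    """Sum of squared errors over the fitness cases (lower is fitter)."""
    total = 0.0
    for x, y in cases:
        diff = evaluate(tree, x) - y
        total += diff * diff
    return total

def pick_subtree(rng, tree):
    """Walk down from the root and stop at a random node."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = tree[rng.randint(1, 2)]
    return tree

def graft(rng, tree, donor):
    """Return a copy of tree with one randomly chosen subtree replaced by donor."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, graft(rng, left, donor), right)
    return (op, left, graft(rng, right, donor))

def tournament(rng, scored, k=3):
    """Fitness-based selection: the best of k randomly drawn individuals."""
    return min(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(cases, pop_size=100, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        scored = sorted(((error(t, cases), t) for t in population),
                        key=lambda pair: pair[0])
        history.append(scored[0][0])
        next_gen = [scored[0][1]]  # elitism: the fittest program survives unchanged
        while len(next_gen) < pop_size:
            mother, father = tournament(rng, scored), tournament(rng, scored)
            child = graft(rng, mother, pick_subtree(rng, father))  # recombination
            if rng.random() < 0.1:
                child = graft(rng, child, random_tree(rng, 2))     # occasional mutation
            if depth_of(child) > MAX_DEPTH:
                child = mother  # reject oversized offspring
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda t: error(t, cases))
    return best, history
```

A possible use is to build fitness cases for a target such as x*x + x + 1
on a grid of x values and call evolve(cases). Because the fittest program is
copied into each new generation, the best error in this sketch can never
increase from one generation to the next.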
The first annual genetic programming conference in 1996 featured 15 tutorials,
2 invited speakers, 3 parallel tracks, 73 papers and 17 poster papers in the
proceedings book, 27 late-breaking papers in a separate book distributed
to conference attendees, and 288 attendees. A description of GP-96 appears in
the October 1996 issue of Scientific American
(http://www.sciam.com/WEB/1096issue/1096techbus3.html). This second annual
conference in 1997 reflects the rapid growth of this field in which over 600
technical papers have been published since 1992. For the August 5, 1996 article
in E. E. Times on the GP-96 conference and the August 12, 1996 article in
E. E. Times on John Holland's invited speech at GP-96, go to
http://www.techweb.com/search/search.html
There will be 36 long, 33 short, and 15 poster papers at the Second Annual
Genetic Programming Conference, to be held on July 13-16 (Sunday - Wednesday),
1997 at Stanford University.
In addition, there will be late-breaking papers (published in a separate
book in mid June after the June 11 deadline for late-breaking papers).
Topics include, but are not limited to,
applications of genetic programming, theoretical foundations of
genetic programming, implementation issues, technique extensions, cellular
encoding, evolvable hardware, evolvable machine language programs, automated
evolution of program architecture, evolution and use of mental models,
automatic programming of multi-agent strategies, distributed artificial
intelligence, auto-parallelization of algorithms, automated circuit synthesis,
automatic programming of cellular automata, induction, system identification,
control, automated design, data and image compression, image analysis, pattern
recognition, molecular biology applications, grammar induction, and
parallelization. Papers describing recent developments are also solicited in
the following additional areas: genetic algorithms, classifier systems,
evolutionary programming and evolution strategies, artificial life and
evolutionary robotics, DNA computing, and evolvable hardware.
-----------------------------------------------------------------------
full information at http://www-cs-faculty.stanford.edu/~koza/gp97.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
410.27 | 97:18 | IJSAPL::OLTHOF | Spellchecked Henry Although | Thu Jun 05 1997 22:54 | 749 |
| Knowledge Discovery Nuggets 97:18, e-mailed 97-05-27
News:
* Ronny Kohavi, Silicon Graphics' MineSet used in Incyte's LifeTools 3D
http://www.incyte.com/press/1997/PR9712-LT3D.html
* R. Zicari, COMDEX Internet Application Awards,
http://www.ltt.de
* Brij Masand, HPCwire: Robert Grossman discusses managing, mining
large data sets
Publications:
* GPS, First Issue of DMKD journal is available on-line in PDF
format, http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE
* Andy Pryke, Bibliography of KDD and Data Mining Papers,
http://www.cs.bham.ac.uk/~anp/papers.html
Meetings:
* D. Fischer, COLT/ICML Early Registration deadline June 2,
http://cswww.vuse.vanderbilt.edu/~mlccolt/
* Jan Komorowski, PKDD'97 -- Call For Participation,
http://www.idi.ntnu.no/pkdd97/
* David Heckerman, Summer School on PROBABILISTIC GRAPHICAL MODELS
http://www.newton.cam.ac.uk/programs/nnm.html
* Vasant Honavar, CFP: Workshop on Automata Induction,
Grammatical Inference, and Language Acquisition at ICML-97
http://www.cs.iastate.edu/~honavar/mlworkshop.html
* Honghua Dai, KDEX-97: IEEE Knowledge and Data Engineering
Exchange Workshop, http://www.sd.monash.edu.au/kdex-97
* Gordon, CFP: ICML-97 workshop on Reinforcement Learning
http://www.cs.cmu.edu/~ggordon/ml97ws
--
Knowledge Discovery Nuggets is a free electronic newsletter for the
Data Mining and Knowledge Discovery community, focusing on the
latest research and applications.
Submissions are most welcome and should be emailed, with a
DESCRIPTIVE subject line (and a URL) to [email protected].
Submissions may be edited for length.
Please keep CFP and meetings announcements short and provide
a URL for details.
To subscribe, see http://www.kdnuggets.com/subscribe.html
KD Nuggets frequency is 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/
-- Gregory Piatetsky-Shapiro (editor)
[email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"When you come to a fork in the road, take it."
- Yogi Berra -
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 15 May 1997 22:22:53 -0700
From: Ronny Kohavi <[email protected]>
Subject: Silicon Graphics' MineSet used in Incyte's LifeTools 3D
A recent press release by Incyte Pharmaceuticals Inc. announces
LifeTools 3D, a powerful data mining and visualization software based
on Silicon Graphics' MineSet(tm) software suite of data analysis and
visualization tools. In collaboration with Silicon Graphics, Incyte
created customized functions that are specifically designed to help
researchers view, explore, and identify novel genes within LifeSeq.
See
http://www.incyte.com/press/1997/PR9712-LT3D.html
for details.
--
Ronny Kohavi ([email protected], http://robotics.stanford.edu/~ronnyk)
Engineering Manager, Analytical Data Mining.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Prof. Zicari" <[email protected]>
Date: Fri, 9 May 1997 23:39:14 +0200 (METDST)
Subject: COMDEX Internet Application Awards.
News Release
First COMDEX Internet Application Awards
IBM, Microsoft and SUN to sponsor Awards Program for the new generation of
Internet applications
Frankfurt -- April 1997. The three leading IT companies IBM, MICROSOFT and
SUN Microsystems will jointly support an international Awards Program
designed for the new generation of Internet-based applications for
business.
The first COMDEX Internet Application Awards will be given out in the
following three categories:
Best Intranet-based application for enterprise usage
Focus: Use of an Intranet for Institutional/Corporate knowledge
for competitive advantage.
Most Innovative Web Site
Focus: Best or most innovative Web Site with respect to user
interface, ease of use, and innovative content.
Best Transactional Internet Application
Focus: Database, interactive applications.
The Award winners will be selected among the submittals by a jury of
international experts. The Awards ceremony will take place on October 8,
1997 at the trade show COMDEX Internet & Object World Frankfurt'97 (October
7-10,1997, Sheraton Conference Center, Frankfurt/Main Airport).
"Successful Internet technologies like Java confirm us in considering the
Internet as the future base for enterprise computing. The COMDEX Internet
Application Awards program provides an excellent forum for honoring and
supporting outstanding Internet applications. We are looking forward to an
exciting contest", says Gert Haas, Marketing Director, SUN Microsystems,
Germany.
Microsoft's commitment to the Awards Program is explained by Karl-Heinz
Breitenbach, Customer Unit Manager Internet & Developer Customer Unit,
Microsoft Germany: "The availability of all relevant information at work is
the base for a fast and successful decision in a company. We therefore have
taken the challenge of providing 'information at your fingertips' very
early and this is reflected by our current product line. Internet
technology today allows to rapidly and reliably represent information
distributed in all branches of the company via a so called Intranet
solution. With the sponsorship of the COMDEX Internet Application Awards,
Microsoft confirms its commitment to innovative Internet technologies which
perfectly match our company goals."
Sanyaya Addanki, General Manager of Network Computing Solutions, IBM EMEA,
explains IBM's motivation for a sponsorship: "IBM is committed to providing
companies with solutions that link business critical applications and data
with the global reach and easy access of the web. We are proud to sponsor
the COMDEX Internet Application Awards Program, which fosters the
development of electronic business applications. Electronic business is the
cornerstone of IBM's network computing vision."
To obtain the entry kit:
download it from the web at: http://www.ltt.de
send an e-mail to: [email protected]
call LogOn at: +49-6173-9558-51
COMDEX Internet and Object World Frankfurt '97 are produced by SOFTBANK
COMDEX Inc. and LogOn Technology Transfer GmbH.
The show is sponsored by: Object Management Group (OMG), A1-Solutions,
Business Online, Computer Associates, Computer Zeitung, MID and redmond's.
Internet and Wireless are sponsored by Omnilink Internet Service Center and
ARtem.
Information on Conferences and Exhibition:
Christiane Sattler
LogOn Technology Transfer GmbH
Burgweg 14, D-61476 Kronberg/Ts., Germany
phone: +49-6173-9558-53
fax: +49-6173-9404-20
e-mail: [email protected]
Web: http://www.ltt.de
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[the following article is included with the permission of HPCwire. GPS]
Date: Fri, 23 May 1997 14:00:49 -0400
From: Brij Masand <[email protected]>
Subject: ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS
[From H P C w i r e *** May 23, 1997: Vol. 6, No. 20 ***]
ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS
by Alan Beck, editor in chief, HPCwire 05.23.97
=============================================================================
Chicago, Ill. -- Issues raised in the effective archiving, managing and
mining of very large data sets have significant pragmatic repercussions
throughout both commercial and scientific computing. To learn more about the
state of the art in this area, HPCwire interviewed Robert Grossman, professor
of mathematics, statistics and computer science at the University of Illinois
at Chicago, president of Magnify, and principal researcher in the Terabyte
Challenge.
-------------------
HPCwire: Please give an overview of the current status of the Terabyte
Challenge, including funding sources and participants.
GROSSMAN: "The Terabyte Challenge is an open, distributed test bed for
managing and mining massive data sets. The infrastructure for the Terabyte
Challenge is provided by the NSF sponsored National Scalable Cluster Project
(NSCP) and its industrial partners. The NSCP philosophy is to use commodity
components with high performance networking to build virtual platforms with
supercomputing power. The software tools developed for the Terabyte Challenge
seek to balance high performance computing with the high performance
input/output required by data intensive and data mining applications.
"Currently, the NSCP consists of approximately 25 nodes and 500 Gigabytes
of disk at both UIC and UPenn, together with smaller clusters at the
participating partners. The infrastructure will be more than doubling over
the next few months to over 100 nodes and 2 Terabytes of disk. Unlike
other centers, the NSCP is configured for managing and mining large data
sets, ranging in size from 100 to 500 Gigabytes.
"We are currently planning the third Annual Terabyte Challenge, which
will take place at SC 97. The first two took place at Supercomputing 95 and
96 (both won High Performance Computing Challenge Awards).
"Currently, the University of Illinois at Chicago, the University of
Pennsylvania, and the University of Maryland form the core academic team. Two
industrial partners, HUBS (Philadelphia) and Magnify, Inc. (Chicago), will also
be working closely on this year's Terabyte Challenge. Funding is provided by
NSF to the NSCP Consortium, by DOE to UIC and UPenn, and by DOD to Magnify.
We expect additional partners to join us. If interested, please contact RLG.
"Current applications include mining scientific data (UIC and UPenn),
mining medical data (UIC and UPenn), detecting network intrusions with data
mining (Magnify, Inc), and data intensive computing in support of virtual
reality (HUBS).
"The web site http://www.lac.uic.edu will contain additional information
shortly."
HPCwire: What progress has been made in scaling algorithms for very large
data sets?
GROSSMAN: "I use the 10x rule: one can expect to archive 10-100x more data
than one can manage, and manage 10-100x more data than one can mine. This
makes sense since archiving requires a simple retrieval of files or objects,
managing requires the ability to perform simple queries, and mining requires
statistically and numerically intensive queries. At SC 96, we mined data sets
that were roughly 100-250 Gigabytes in size using 10-25 nodes. At SC 97, we
hope to mine 500-1000 Gigabytes of data on 50-100 nodes. I want to emphasize
that one can manage and perform simple queries of much larger data sets (up
to tens of Terabytes), but the detailed data mining of even a few hundred
gigabytes of data is a challenge today.
"Parallelizing data mining algorithms can be done in several ways. Most
data mining algorithms are sufficiently compute-intensive that they work best
when the data and the working space required for the algorithm fit into
memory. For large data sets this is clearly not possible and the
challenge is to balance the i/o requirements of the algorithm with the cpu
requirements. Several approaches are possible:
"For the purposes here, we assume that the data mining process consists of
several steps, including 1) extracting patterns, 2) using these patterns
automatically to build predictive models, and 3) selecting or combining
multiple predictive models to produce a single decision. In each of the four
methods described next, one or more subsets of the data are chosen and mined.
The methods differ in how the subsets are chosen: the subsets may be created
by random draws, by a partition of the data, by a cover of the data, or by a
range-based query of the data.
"In sample-based data mining, one samples a large data set and then
extracts patterns or builds a model. This is the most common approach. It
works well for patterns that are still easily found after down sampling. It
has the advantage that the compute time is vastly reduced (since the data to
be mined is vastly smaller) and the disadvantage that the patterns obtained
are often not indicative of the whole data set -- this is closely related to
the problem of over-fitting. This approach is most often not parallelized,
although sometimes sampling can be done in parallel and the results combined
into one model using model averaging techniques.
"In partition-based data mining, the data set is partitioned into distinct
subsets which fit into memory, each partition is separately mined to produce
a collection of predictive models, and then the predictive models are
combined using model selection and model averaging techniques. This type of
data mining is easily parallelized, since one (or more) processors can be
assigned to each partition.
"Cover-based data mining is similar to partition-based data mining, but
the different subsets to be mined can be overlapping. This is closely
related to what is called local mining, in which the patterns extracted use
data which is localized in some fashion, say based on the N closest data
points to a fixed reference point.
"Attribute-based data mining creates different subsets to be mined by using
an attribute-based query of the underlying data set. For example, all objects
whose first attribute is less than 1.1 and whose second attribute is equal to
"A", etc. are selected and then mined.
"For more information, see R. L. Grossman, Scaling Data Mining Algorithms
Using Cover-based Learning with Model Selection and Model Averaging,
http://www.magnify.com "
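The four ways of choosing subsets described in this answer (random draws, a
partition, a cover, and an attribute query) can be sketched in a few lines.
The following is an illustration under stated assumptions, not the NSCP or
Magnify software: the toy records, the one-parameter "model" (a least-squares
slope), the unweighted averaging, and all function names are hypothetical,
chosen only to make the differences between the methods concrete.

```python
import random
from statistics import mean

def make_data(n, seed=0):
    """Hypothetical toy data set of (x, label, y) records with y close to 3*x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)
        label = rng.choice("AB")
        y = 3.0 * x + rng.gauss(0.0, 0.1)
        rows.append((x, label, y))
    return rows

def sample_subset(rows, k, seed=0):
    """Sample-based: a single random draw of k records."""
    return random.Random(seed).sample(rows, k)

def partition_subsets(rows, parts):
    """Partition-based: disjoint subsets that together contain every record."""
    return [rows[i::parts] for i in range(parts)]

def cover_subsets(rows, centers, n_closest):
    """Cover-based (local): the n_closest records to each reference point.

    Subsets may overlap; whether they cover all of the data depends on how
    many reference points are used.
    """
    return [sorted(rows, key=lambda r: abs(r[0] - c))[:n_closest]
            for c in centers]

def attribute_subset(rows, x_max=1.1, label="A"):
    """Attribute-based: records selected by a query on their attributes."""
    return [r for r in rows if r[0] < x_max and r[1] == label]

def fit_slope(rows):
    """Stand-in for 'mining' a subset: least-squares slope a in y ~ a*x."""
    sxx = sum(x * x for x, _, _ in rows)
    sxy = sum(x * y for x, _, y in rows)
    return sxy / sxx

def averaged_model(subsets):
    """Combine the per-subset models by simple, unweighted model averaging."""
    return mean(fit_slope(s) for s in subsets)
```

In the partition-based case each call to fit_slope is independent of the
others, which is why that method parallelizes easily: one processor (or more)
can be given each partition, and only the small per-partition models need to
be brought together for averaging.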
HPCwire: How is the TC approaching the mining of highly distributed data?
GROSSMAN: "On the systems side, we have made good progress in this area.
The NSCP clusters at UIC and UPenn have been connected for several weeks now
by the vBNS at OC-3 (155 Mbps) speeds. Using this infrastructure we have
experimented with wide area data mining of scientific and medical data. We
are currently using this experience to develop new algorithms for wide area
data mining and to develop new generations of our data management and data
mining tools. The challenge is to develop a new class of algorithms for
extracting patterns from widely distributed data without the necessity of
first warehousing the data."
HPCwire: What progress has been made in better understanding dynamical
systems via data mining?
GROSSMAN: "Not as much as we would have liked. Data mining algorithms
today, by and large, work with data which is flat and static. The core
dynamical system concepts of a state vector and its evolution in time are
missing in most data mining algorithms. Hybrid systems is an emerging field
which combines dynamical systems with discrete structures such as rule
systems and automata. The latter can express the patterns discovered in data
mining. Researchers working in the NSCP are actively investigating exploiting
hybrid systems and related techniques to develop next generation data mining
algorithms which can utilize state information and work with time varying
data."
HPCwire: How is TC research being made available to the commercial sector?
Have any new products or partnerships resulted from TC-generated technology?
GROSSMAN: "The NSCP and the Terabyte Challenge have 1) published the core
ideas they have developed for data mining and data intensive computing, 2)
developed reference architectures and implementations for software tools to
support data mining (the UIC software tools PTool, JTool, and DMTool), and 3)
encouraged companies to exploit this technology for data intensive computing
and data mining.
"To date, HUBS in Philadelphia and Magnify, Inc. in Chicago have begun to
employ some of these ideas in the products and services they offer.
Currently, regional data mining centers are in the planning process in both
Chicago and Philadelphia."
HPCwire: How do you see the TC evolving over the next five years?
GROSSMAN: "The most exciting development is the expected transformation of
the NSCP into two regional data mining centers with very strong industrial
ties: one in Chicago and one in Philadelphia. This has three important
consequences: 1) First the compute, i/o, and networking infrastructure which
we can dedicate to data mining projects is expected to double this year and
hopefully to double again in about two years. 2) With our industrial
partners, we are actively working to demonstrate the practical feasibility of
mining massive data sets and to establish open standards for managing,
mining, and modeling massive data sets. 3) Using the vBNS network connecting
the centers in Chicago and Philadelphia, we are finding it easy to experiment
with the type of wide area data mining issues which we expect to take on an
increasingly important role for scientific, engineering, medical, and business
data mining applications.
"To summarize, during the next five years, we expect the Terabyte Challenge
not only to continue to push the boundaries of massive data mining through an
annual competition, but also, together with its industrial partners, to be
actively involved with establishing data mining standards and reference
implementations of software tools for managing, mining, and modeling massive
data sets.
"Additional participants for the 1997 competition are welcome. Please contact
one of the organizers if interested. Additional information
can be found at http://www.nscp.uic.edu "
--------------------
Alan Beck is editor in chief of HPCwire. Comments are always welcome and
should be directed to [email protected]
Copyright 1997 HPCwire. Redistribution of this article is forbidden by law
without the expressed written consent of the publisher. For a free trial
subscription to HPCwire, send e-mail to [email protected].
H P C w i r e
The Text-on-Demand E-zine for High Performance Computing
***************************************************************************
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 22 May 1997 15:05:54 -0400
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: First Issue of DMKD journal is available on-line in PDF format
The premiere issue of Data Mining and Knowledge Discovery journal
is available on-line, in PDF format, at
http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE
To read this very good (in my biased opinion) issue you need an Acrobat reader,
which you can download from http://www.adobe.com/acrobat/
Only the first issue will be freely available on-line,
but you can subscribe to the journal at the individual rate of $50
(the institutional rate is higher);
see http://www.wkap.nl/kapis/CGI-BIN/WORLD/journalhome.htm?1384-5810
for subscription information. Please support this journal!
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Fri, 23 May 97 22:12:09 BST
Subject: Nuggets: Bibliography of KDD and Data Mining Papers
The Master Bibliography of KDD and Data Mining Papers is a
bibliography of over 400 papers on the topics of Data Mining and
Knowledge Discovery in Databases (this includes closely related papers
on visualisation and machine learning). More than 70 of the papers are
online.
It is available in either BibTeX or HTML-annotated BibTeX format
from:
http://www.cs.bham.ac.uk/~anp/papers.html
A search interface is also available at:
http://www.cs.bham.ac.uk/~anp/bibtex/search.html
Any additional references or corrections are gratefully
received. Please email them to me, Andy Pryke, at
[email protected]. Only references in machine-readable format
(e.g. refer or, preferably, BibTeX) can be added, due to time
constraints.
Note that all the information I have about the papers is in the
bibliography, and many (330ish) of the papers are not available
online.
Please read the _collection_ copyright statement at
(http://www.cs.bham.ac.uk/~anp/bibtex/copyright.html).
If you find the bibliography useful, you may wish to send me a
postcard (details in the copyright statement).
Andy Pryke
--
Andy Pryke, Research Student, Computer Science, Birmingham University
Data Mining Information - http://www.cs.bham.ac.uk/~anp/TheDataMine.html
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 16 May 1997 19:09:05 -0500
From: [email protected] (Douglas H. Fisher)
Subject: COLT/ICML Early Registration
Early registration for the Tenth Annual Conference on
Computational Learning Theory (COLT-97) and/or the Fourteenth
International Conference on Machine Learning (ICML-97)
concludes June 2, 1997. Room blocks at area hotels and on campus
are also "released" June 2 (though rooms will likely still be available
after that date). See http://cswww.vuse.vanderbilt.edu/~mlccolt/
for more information.
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 16 May 1997 16:44:56 +0200 (MET DST)
From: Jan Komorowski <[email protected]>
Subject: PKDD'97 -- Call For Participation
1st European Symposium on Principles of
Data Mining and Knowledge Discovery in Databases
Trondheim, Norway
June 24-27, 1997
Tutorials: June 24-25
Symposium: June 26-27
This is an invitation to the 1st European Symposium on Principles of
Data Mining and Knowledge Discovery in Databases.
PKDD'97 is the first symposium in an intended series of meetings of
the data mining and knowledge discovery from databases (KDD) community
in Europe. The goal of the PKDD series is to provide a European-based
forum for interaction among all theoreticians and practitioners
interested in data mining and knowledge discovery. Fostering an
interdisciplinary collaboration is one desired outcome, but the main
long-term focus is on theoretical principles for the emerging
discipline of KDD, especially those new principles that go beyond each
of the contributing areas.
There were 50 papers submitted to PKDD'97. After the selection by the
program committee, the papers were assigned to three categories: 14
plenary papers, 13 parallel session papers and 11 poster papers that
include spotlight presentations in the plenary sessions. In
addition, four tutorials were selected: Rough Sets for Data Mining and
Knowledge Discovery, Techniques and Applications of KDD, High
Performance Data Mining, and Data Mining in the Telecommunications
Industry.
The proceedings are published by Springer Verlag.
The invited speakers include Evangelos Simoudis, USA, and Bjarne Foss,
Norway. They will provide their different perspectives on the field:
one is data mining for businesses and the other data mining seen from
the point of view of control theory. Panel discussions on the present
situation and the future development of the field are planned.
There will be software exhibitions of both commercial and academic
software.
Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for
detailed information and news about the symposium.
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: David Heckerman <[email protected]>
Subject: Summer School on PROBABILISTIC GRAPHICAL MODELS
Date: Fri, 16 May 1997 08:08:00 -0700
A Newton Institute EC Summer School
PROBABILISTIC GRAPHICAL MODELS
1 - 5 September 1997
Isaac Newton Institute, Cambridge, U.K.
Organisers: C M Bishop (Aston) and J Whittaker (Lancaster)
Probabilistic graphical models provide a very general framework for
representing complex probability distributions over sets of
variables. A powerful feature of the graphical model viewpoint is that
it unifies many of the common techniques used in pattern recognition
and machine learning including neural networks, latent variable
models, probabilistic expert systems, Boltzmann machines and Bayesian
belief networks. Indeed, the increasing interactions between the
neural computing and graphical modelling communities have resulted in
a number of powerful new ideas and techniques. The conference will
include several tutorial presentations on key topics as well as
advanced research talks.
Provisional themes:
Conditional independence; Bayesian belief networks; message
propagation; latent variable models; variational techniques; mean
field theory; learning and estimation; model search; EM and MCMC
algorithms; axiomatic approaches; causality; decision theory; neural
networks; information and coding theory; scientific applications and
examples.
Provisional list of speakers:
C M Bishop (Aston)          D J C MacKay (Cambridge)
R Cowell (City)             J Pearl (UCLA)
A P Dawid (UCL)             M D Perlman (Washington)
D Geiger (Technion)         M Piccioni (Aquila)
E George (Texas)            R Shachter (Stanford)
W Gilks (Cambridge)         J Q Smith (Warwick)
D Heckerman (Microsoft)     M Studeny (Prague)
G E Hinton (Toronto)        M Titterington (Glasgow)
T Jaakkola (UCSC)           J Whittaker (Lancaster)
M I Jordan (MIT)            S Lauritzen (Aalborg)
B Kappen (Nijmegen)         D Spiegelhalter (Cambridge)
M Kearns (AT&T)             S Russell (Berkeley)
This instructional conference will form a component of the Newton
Institute programme on Neural Networks and Machine Learning, organised
by C M Bishop, D Haussler, G E Hinton, M Niranjan and L G Valiant.
Further information about the programme is available via the WWW at
http://www.newton.cam.ac.uk/programs/nnm.html
Location and Costs:
The conference will take place in the Isaac Newton Institute and
accommodation for participants will be provided at Wolfson Court,
adjacent to the Institute. The conference package costs 270 UK pounds
which includes accommodation from Sunday 31 August to Friday 5
September, together with breakfast, lunch during the days that the
lectures take place and evening meals.
Applications:
To participate in the conference, please complete and
return an application form and, for students and postdoctoral fellows,
arrange for a letter of reference from a senior scientist. Limited
financial support is available for participants from appropriate
countries.
Application forms are available from the conference Web Page at
http://www.newton.cam.ac.uk/programs/nnmec.html
Completed forms and letters of recommendation should be sent to Heather
Dawson at the Newton Institute, or by e-mail to
[email protected]
*Closing Date for the receipt of applications and
letters of recommendation is 16 June 1997*
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Vasant Honavar <[email protected]>
Subject: Call for Participation: Workshop on Automata Induction,
Grammatical Inference, and Language Acquisition
Date: Thu, 8 May 1997 10:53:48 -0500 (CDT)
Workshop on
Automata Induction, Grammatical Inference, and Language Acquisition
The Fourteenth International Conference on Machine Learning (ICML-97)
July 12, 1997, Nashville, Tennessee
The Automata Induction, Grammatical Inference, and Language Acquisition
Workshop will be held on Saturday, July 12, 1997 during the Fourteenth
International Conference on Machine Learning (ICML-97) which will be
co-located with the Tenth Annual Conference on Computational Learning Theory
(COLT-97) at Nashville, Tennessee from July 8 through July 12, 1997.
Additional information on ICML-97 and COLT-97 can be found at
http://www.cs.iastate.edu/~honavar/mlworkshop.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 21 May 1997 12:23:13 +1000
From: Honghua Dai <[email protected]>
Subject: KDEX-97 Final Call for Papers
1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
--------------------------------------------------------------------
Sponsored by the IEEE Computer Society and Co-located with
the 9th IEEE Tools with Artificial Intelligence Conference
November 4, 1997, Newport Beach, California, U.S.A.
===================================================
Call for Papers
The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases. The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals. The workshop is currently
scheduled for a one-day duration, but depending on the final program
it might be extended to a second day.
Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer. Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society. A selected number of the accepted papers will
possibly be expanded and revised for publication in the IEEE
Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the
International Journal of Artificial Intelligence Tools. Educational
material related to papers published in the IEEE-TKDE will be posted
on the IEEE-TKDE home page.
The theme of the workshop is "AI MEETS DATABASES". Topics of interest
include, but are not limited to:
- Computer supported cooperative processing and interoperable
systems
- Data sharing, data warehousing and meta-data management
- Distributed intelligent mediators and agents
- Distributed object management
- Dynamic knowledge
- Evaluation and measurement of knowledge and database systems
- High-performance issues (including architectures, knowledge
representation techniques, inference mechanisms, algorithms and
integration methods)
- Information structures and interaction
- Intelligent search, data mining and content-based retrieval
- Knowledge and data engineering systems
- Quality assurance for knowledge and data engineering systems
(correctness, reliability, security, survivability and
performance)
- Software re-engineering and intelligent software information
systems
- Spatio-temporal, active, mobile and multimedia data
- Emerging applications (biomedical systems, decision support,
geographical databases, Internet technologies and applications,
digital libraries, etc.)
All submissions should be limited to a maximum of 5,000 words. Six
hardcopies should be forwarded to the following address.
Xindong Wu (KDEX-97)
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia
Phone: +61 3 9903 1025
Fax: +61 3 9903 1077
E-mail: [email protected]
Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract. This cover page must accompany the paper.
************ I m p o r t a n t D a t e s *****************
* 6 copies of full papers received by: June 15, 1997 *
* acceptance/rejection notices: July 31, 1997 *
* final camera-readies due by: August 31, 1997 *
* workshop: November 4, 1997 *
************************************************************
Further Information
===================
WWW: http://www.sd.monash.edu.au/kdex-97
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Tue, 20 May 97 10:30:38 EDT
Subject: CFP: ICML-97 workshop on REINFORCEMENT LEARNING: TO MODEL OR
NOT TO MODEL, THAT IS THE QUESTION
Workshop at the Fourteenth
International Conference on Machine
Learning (ICML-97)
Vanderbilt University, Nashville, TN
July 12, 1997
www.cs.cmu.edu/~ggordon/ml97ws
Recently there has been some disagreement in the reinforcement
learning community about whether finding a good control policy
is helped or hindered by learning a model of the system to be
controlled. Recent reinforcement learning successes
(Tesauro's TD-gammon, Crites' elevator control, Zhang and
Dietterich's space-shuttle scheduling) have all been in
domains where a human-specified model of the target system was
known in advance, and have all made substantial use of the
model. On the other hand, there have been real robot systems
which learned tasks either by model-free methods or via
learned models. The debate has been exacerbated by the lack
of fully-satisfactory algorithms on either side for
comparison.
Topics for discussion include (but are not limited to)
o Case studies in which a learned model either contributed to
or detracted from the solution of a control problem. In
particular, does one method have better data efficiency?
Time efficiency? Space requirements? Final control
performance? Scaling behavior?
o Computational techniques for finding a good policy, given a
model from a particular class -- that is, what are good
planning algorithms for each class of models?
o Approximation results of the form: if the real system is in
class A, and we approximate it by a model from class B, we
are guaranteed to get "good" results as long as we have
"sufficient" data.
o Equivalences between techniques of the two sorts: for
example, if we learn a policy of type A by direct method B,
it is equivalent to learning a model of type C and computing
its optimal controller.
o How to take advantage of uncertainty estimates in a learned
model.
o Direct algorithms combine their knowledge of the dynamics and
the goals into a single object, the policy. Thus, they may
have more difficulty than indirect methods if the goals change
(the "lifelong learning" question). Is this an essential
difficulty?
o Does the need for an online or incremental algorithm interact
with the choice of direct or indirect methods?
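
The direct/indirect distinction in the topics above can be made concrete with a minimal Python sketch. It is not taken from the workshop materials: the three-state chain, the reward, the discount factor, and the names step, direct_policy, and indirect_policy are all assumptions made for illustration. The direct learner replays Q-learning updates over the observed transitions; the indirect learner first builds a transition and reward table and then plans with value iteration.

```python
GAMMA = 0.9
ACTIONS = ("left", "right")
STATES = range(3)

def step(state, action):
    """Hypothetical 3-state chain: arriving in (or staying in) state 2 pays 1."""
    nxt = min(state + 1, 2) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 2 else 0.0)

# One observed transition per state-action pair: (s, a, reward, s').
experience = [(s, a, step(s, a)[1], step(s, a)[0]) for s in STATES for a in ACTIONS]

def greedy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

def direct_policy(experience, sweeps=200):
    """Model-free: Q-learning updates (learning rate 1) replayed over the data."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, r, nxt in experience:
            q[(s, a)] = r + GAMMA * max(q[(nxt, b)] for b in ACTIONS)
    return greedy(q)

def indirect_policy(experience, sweeps=200):
    """Model-based: learn a transition/reward table, then run value iteration."""
    model = {(s, a): (nxt, r) for s, a, r, nxt in experience}
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: max(model[(s, a)][1] + GAMMA * v[model[(s, a)][0]]
                    for a in ACTIONS)
             for s in STATES}
    q = {(s, a): r + GAMMA * v[nxt] for (s, a), (nxt, r) in model.items()}
    return greedy(q)
```

On this deterministic toy problem, with every state-action pair observed, both routes arrive at the same greedy policy. The open questions listed above concern what happens when the system is stochastic, the data are sparse, or the goals change.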
Full information at
www.cs.cmu.edu/~ggordon/ml97ws
Contact: Geoff Gordon ([email protected])
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|