Conference nsic00::eis_dw

Title:Executive Information Solutions & Data Warehousing Conference
Notice:Welcome to the Data Warehousing conference
Moderator:26002::HAGGERTY
Created:Thu Sep 01 1994
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:499
Total number of notes:2932

410.0. "KDD Nuggets" by UTROP1::dhcppc.uto.dec.com::olthof_h (Spellchecked Henry Although) Fri Nov 08 1996 08:06

410.1 "KDD Nuggets 96:34" by IJSAPL::OLTHOF (Spellchecked Henry Although) Fri Nov 08 1996 08:12, 820 lines
410.2 "count me in" by FOUNDR::BARNETT_T Tue Nov 12 1996 01:01, 5 lines
410.3 "Read note 406 and enroll on the WEB site" by UTROP1::dhcppc.uto.dec.com::olthof_h (Spellchecked Henry Although) Tue Nov 12 1996 08:04, 9 lines
410.4 "KDD Nuggets 96:35" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sun Nov 17 1996 12:36, 886 lines
410.5 "KDD Nuggets 96:36" by IJSAPL::OLTHOF (Spellchecked Henry Although) Fri Nov 22 1996 08:40, 862 lines
410.6 "KDD Nuggets 96:37" by IJSAPL::OLTHOF (Spellchecked Henry Although) Wed Nov 27 1996 11:10, 929 lines
410.7 "KDD Nuggets 96:38" by IJSAPL::OLTHOF (Spellchecked Henry Although) Tue Dec 10 1996 07:11, 821 lines
410.8 "KDD Nuggets 96:39" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sat Dec 14 1996 07:44, 1078 lines
410.9 "KDD Nuggets 96:40" by IJSAPL::OLTHOF (Spellchecked Henry Although) Fri Dec 20 1996 09:07, 770 lines
410.10 "KDD Nuggets 97:01" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sun Jan 05 1997 19:16, 827 lines
410.11 "97:02" by IJSAPL::OLTHOF (Spellchecked Henry Although) Fri Jan 10 1997 10:21, 655 lines
410.12 "97:03" by IJSAPL::OLTHOF (Spellchecked Henry Although) Mon Jan 20 1997 14:17, 561 lines
410.13 "97:04" by IJSAPL::OLTHOF (Spellchecked Henry Although) Mon Feb 03 1997 11:47, 1444 lines

	Knowledge Discovery Nuggets 97:04, e-mailed 97-01-28
News:
	* GPS, Information Week on Debunking Data-Mining Myths 
		http://www.techweb.com/se/directlink.cgi?IWK19970120S0042
	* N. Uffenheimer, EDS in the data warehouse, datamining, DSS areas
Publications:
	* J. P. Brown, Data Mining: What Needs To Be Done, And Why.
		http://www.hal-pc.org/~jpbrown
	* F. Famili, Intelligent Data Analysis Journal - First Issue is live,
		http://www.elsevier.com/locate/ida
Siftware:
	* B. Li, Parallel C4.5,
	 	http://merv.cs.nyu.edu:8001/~binli/pc4.5/
Positions:
	* E. Babb, Jobs in data mining in London,
		http://www.parsys.com/dafs.htm
	* D. Berleant, Tenure Track, Teaching and Research at U. of Arkansas
Meetings:
	* D. Stodder, Data Mining Summit program, 
		http://www.dbsummit.com
--
KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge 
Discovery in Databases (KDD) community, focusing on the latest research and 
applications.

Submissions are most welcome and should be emailed, 
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, email [email protected] a message with 
	subscribe kdd-nuggets 
in the first line (the rest of the message and subject are ignored). 
See http://info.gte.com/~kdd/subscribe.html for details.

Nuggets appears approximately 3 times a month. 
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), 
and a wealth of other information on Data Mining and Knowledge Discovery 
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) * 
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Variations on the old chestnut about how to use programming languages 
to shoot yourself in the foot ... 

HTML: You shoot yourself in the foot, but the bullet takes 10 minutes
to get there.

VRML: You have to fight your way through 3 levels of DOOM before you
can shoot yourself in the foot with a blaster cannon.

JAVA: You shoot yourself and everyone else on the internet in the foot.

JAVASCRIPT: You shoot yourself and everyone else on the internet in
the foot with rubber bullets.

PERL: You try to shoot yourself in the foot, but can't figure out
the instructions that came with the gun.

TCL: You shoot yourself in the foot with a cap gun.

			Thanks to L. Brothers
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 27 Jan 1997 17:11:18 -0500
From: [email protected] (Gregory Piatetsky-Shapiro)
Subject: Information week on Debunking Data-Mining Myths -
Content-Length: 23384

see 
http://www.techweb.com/se/directlink.cgi?IWK19970120S0042 
for full text

January 20, 1997, Issue: 614
                   Section: InformationWeek Labs


                   Debunking Data-Mining Myths --
                   Don't let contradictory claims about
                   data mining keep you from improving
                   your business 

                   By Robert D. Small

                   A great deal of what is said about data mining is
                   incomplete, exaggerated, or wrong. Data mining has
                   taken the business world by storm, but as with many
                   new technologies, there seems to be a direct
                   relationship between its potential benefits and the
                   quantity of often-contradictory claims, or myths,
                   about its capabilities and weaknesses. It's difficult to
                   fight these myths, which are based on
                   misunderstandings, hopes, and fears. The new
                   technology cycle typically goes like this: Enthusiasm
                   for an innovation leads to spectacular assertions.
                   Ignorant of the technology's true capabilities, users
                   jump in without adequate preparation or training.
                   Then, sobering reality sets in. Finally, frustrated and
                   unhappy, users complain about the new technology
                   and urge a return to "business as usual." When you
                   undertake a data-mining project, avoid a cycle of
                   unrealistic expectations followed by disappointment.
                   Understand the facts instead, and your data-mining
                   efforts will be successful. Simply put, data mining
                   is used to discover patterns and relationships in your
                   data in order to help you make better business
                   decisions.

                   Myth: Data mining produces surprising results that
                   will utterly transform your business.

                   Fact: Most often, the results of data mining yield
                   steady improvement to an already successful
                   organization, often contributing important incremental
                   changes rather than revolutionary ones.

                   Nevertheless, data mining can lead to significant
                   change in several ways. First, it may give the talented
                   business manager a small advantage each year, on
                   each project, with each customer. Compounded over
                   a period of time, these small advantages turn into a
                   large competitive edge. For example, a catalog retailer
                   that can better target its mailing list can increase
                   profits by reducing the cost of mailings while
                   increasing the number of orders. Over time, this can
                   result in a substantially more profitable business.
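The compounding argument is simple arithmetic. The sketch below assumes a hypothetical 2% yearly improvement (a figure chosen for illustration, not taken from the article) and shows how it accumulates over several horizons.

```python
def compounded_edge(yearly_gain: float, years: int) -> float:
    """Cumulative multiplier after `years` of a constant fractional yearly gain."""
    return (1.0 + yearly_gain) ** years

# Hypothetical 2% yearly edge, compounded over several horizons.
for years in (1, 5, 10):
    print(f"{years:2d} years: x{compounded_edge(0.02, years):.3f}")
```

Under that assumption, a 2% edge grows to roughly a 22% edge (x1.219) after ten years.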

                   Second, data mining occasionally does uncover one
                   of those rare "breakthrough" facts, such as scientists'
                   noticing the association between the potentially fatal
                   Reye's syndrome and children taking aspirin.

                   In short, data mining is a powerful search tool for
                   forward-looking companies.

                   Myth: Data-mining techniques are so sophisticated
                   that they can substitute for domain knowledge or for
                   experience in analysis and model building.

                   Fact: No analysis technique can replace experience
                   and knowledge of the business and its markets. On
                   the contrary, data mining makes education and
                   experience in many areas more important than
                   ever. While experts may need to learn new analytical
                   techniques to stay current and make leading-edge
                   contributions, someone who's an expert only in
                   analytical techniques, without having knowledge of
                   the business, is of no help.

                   Experience in building models, however, can ensure
                   more profitable use of data mining, since data
                   mining is simply the newest tool for building models.

                   The less domain knowledge a data mining expert
                   brings to a problem, the more important it is to
                   perform the data mining in close cooperation with
                   people who understand the business.

                   Similarly, the less skill and experience that business
                   experts have in modeling and using the associated
                   tools, the more help they need from data-mining
                   experts in leveraging their business knowledge.

                   For example, financial analysts seeking to increase the
                   return on their clients' investments may ask an expert
                   data miner to analyze a large, complex database on
                   previous clients. The data miner may discover that
                   certain variables predict success in investing, but it
                   takes a financier to know whether it's legal to influence
                   those variables.

                   Myth: Data-mining tools automatically find the
                   patterns you're looking for, without being told what to
                   do.

                   Fact: Data mining is most cost-effective when used
                   to solve a particular problem. Although a data-mining
                   tool can indeed explore your data and uncover
                   relationships, it still needs to be directed toward a
                   specific goal. Simply giving a data-mining tool a
                   mailing list and expecting it to find customer profiles
                   that improve the efficiency of a direct-mail campaign
                   is not particularly effective. You need to be more
                   specific in your goals. For example, to improve the
                   value of mailing-list responses, your model might
                   emphasize customers who have previously bought
                   expensive items; to increase the number of
                   responses, your model might emphasize customers
                   who have responded to previous mailings.
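The point that the goal determines the model can be shown by ranking the same mailing list two ways. This is a minimal sketch; the customer fields and figures are hypothetical, and a simple sort stands in for whatever scoring model a real tool would build.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    largest_past_order: float  # dollars; hypothetical field
    past_responses: int        # earlier mailings answered; hypothetical field

def top_by_value(customers: list[Customer], k: int) -> list[Customer]:
    """Goal: raise the value of responses, so favour past big spenders."""
    return sorted(customers, key=lambda c: c.largest_past_order, reverse=True)[:k]

def top_by_response(customers: list[Customer], k: int) -> list[Customer]:
    """Goal: raise the number of responses, so favour past responders."""
    return sorted(customers, key=lambda c: c.past_responses, reverse=True)[:k]

customers = [
    Customer("A", 950.0, 0),
    Customer("B", 40.0, 6),
    Customer("C", 600.0, 1),
    Customer("D", 25.0, 4),
]
print([c.name for c in top_by_value(customers, 2)])     # ['A', 'C']
print([c.name for c in top_by_response(customers, 2)])  # ['B', 'D']
```

The two goals select entirely different customers from the same list, which is why the tool has to be told which one matters.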

                   Myth: Data mining is useful only in certain areas, such
                   as marketing, sales, and fraud detection.

                   Fact: Virtually any process, from pharmacology to
                   customer service, can be studied, understood, and
                   improved using data mining. These techniques are
                   being applied to such diverse applications as
                   manufacturing process control, human resources, and
                   food-service management.

                   Data mining is useful wherever data can be collected.
                   Of course, in some instances, cost/benefit
                   calculations might show that the time and effort of the
                   analysis are not worth the likely return. For example,
                   suppose you suspect that if you collect just one more
                   piece of information about your customers, you could
                   double the number of orders you receive. But you
                   also know that mailing to twice as many people will
                   also double the number of orders. If gathering the
                   data is more expensive than sending the extra
                   mailings, then it makes sense to increase the mailings
                   rather than mine the data.
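The cost/benefit comparison in this example reduces to one inequality. The sketch below assumes, as the article does, that both routes double orders, and uses hypothetical figures (100,000 pieces at $0.50 each) for the current mailing.

```python
def cheaper_route(data_cost: float, current_mailings: int, cost_per_mailing: float) -> str:
    """Both options are assumed to double orders, so pick the cheaper one."""
    # Doubling the list means paying for `current_mailings` extra pieces.
    extra_mailing_cost = current_mailings * cost_per_mailing
    return "mine the data" if data_cost < extra_mailing_cost else "mail more"

# Hypothetical figures: 100,000 pieces at $0.50 each, so doubling costs $50,000.
print(cheaper_route(80_000, 100_000, 0.50))  # mail more
print(cheaper_route(20_000, 100_000, 0.50))  # mine the data
```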

                   Myth: The methods used in data mining are
                   fundamentally different from the older quantitative
                   model-building techniques.

                   Fact: All methods now used in data mining are natural
                   extensions and generalizations of analytical methods
                   known for decades. Neural nets, a special case of
                   projection pursuit regression, were developed in the
                   1940s. CART (classification and regression trees)
                   methods were used by social scientists in the 1960s.
                   K-nearest neighbor, a form of density estimation, has
                   been used for a half-century.

                   All these methods, just like regression
                   techniques, model relationships between a set of
                   profile variables and an outcome.
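The article names k-nearest neighbor among these methods. A minimal k-NN classifier makes the "profile variables to outcome" framing concrete; the profiles, outcomes and k=3 below are hypothetical.

```python
import math
from collections import Counter

Profile = tuple[float, ...]

def knn_predict(train: list[tuple[Profile, str]], query: Profile, k: int = 3) -> str:
    """Predict an outcome by majority vote among the k closest profiles."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(outcome for _, outcome in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical profiles: (age, purchases last year) -> response to a mailing.
train = [
    ((25, 1), "ignore"),
    ((30, 2), "ignore"),
    ((45, 8), "respond"),
    ((50, 9), "respond"),
    ((48, 7), "respond"),
]
print(knn_predict(train, (47, 8)))  # respond
print(knn_predict(train, (28, 1)))  # ignore
```

A real application would normally rescale each variable first, since raw Euclidean distance lets the variable with the largest range dominate.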

                   What's new in data mining is that we're now applying
                   these techniques to more general business problems,
                   thanks to the increased availability of data and
                   inexpensive processing power.

                   Furthermore, because communication between the
                   business community and methodologists, who are
                   mainly academics, has often been poor, there was,
                   until recently, no user-friendly software for
                   implementing these methods. The recent interest in
                   data mining is in part due to the improved user
                   interfaces that make these techniques more available
                   to business experts.

                   The rise of these powerful methods is a great step
                   forward, but the old tools are still valuable. Varieties
                   of regression techniques, discriminant analysis, and
                   even simple graphs can help reveal hidden patterns.
                   No single method solves all or even a majority of
                   problems. Successful data mining requires a portfolio
                   of tools, both old and new.

                   Myth: Data mining is an extremely complex process.

                   Fact: The algorithms of data mining may be complex,
                   but new tools have made those algorithms easier to
                   apply. Often, just the correct application of relatively
                   simple analyses, graphs, and tables can reveal a great
                   deal about your business. Much of the difficulty in
                   applying data mining comes from the same
                   data-organization issues that arise when using any
                   modeling technique. These include data-preparation
                   tasks (such as deciding which variables to include and
                   how to encode them) and deciding how to interpret
                   and take advantage of the results.

                   Myth: Only massive databases are worth mining.

                   Fact: It's true that many methods used in data mining
                   were specifically developed for analyzing very large
                   data sets, and that many data-mining applications
                   involve massive data sets. But a moderately sized or
                   small data set can also yield valuable information. For
                   example, buying patterns may depend most strongly
                   on the day of the week or the time of the year. A
                   modest database consisting of only "day" and "sales"
                   could show this pattern, give the retailer some idea of
                   its magnitude, and allow for planning of inventory and
                   staffing.
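For the "day" and "sales" database described here, a per-day average is all the analysis needed to expose the pattern. The sketch assumes hypothetical (day, sales) records.

```python
from collections import defaultdict

def mean_sales_by_day(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average sales for each day label in (day, sales) records."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for day, sales in records:
        totals[day] += sales
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}

records = [("Mon", 100), ("Sat", 300), ("Mon", 120), ("Sat", 340), ("Wed", 150)]
means = mean_sales_by_day(records)
busiest = max(means, key=means.get)
print(means, busiest)  # {'Mon': 110.0, 'Sat': 320.0, 'Wed': 150.0} Sat
```

The ratio between the busiest and quietest days also gives a first estimate of the pattern's magnitude for inventory and staffing plans.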

                   Even when building a massive database, try out some
                   simple analysis on the data while the database is still
                   moderate in size. You may decide to collect the data
                   differently or to collect different data altogether.

                   Myth: Data mining is more effective with more data,
                   so all existing data should be brought into any
                   data-mining effort.

                   Fact: More data items are useful only if they
                   contribute more information about the issues at hand,
                   or goals. Otherwise, they can be worse than
                   worthless. A database may have a great deal of
                   information about an item (or about the relationship
                   between items) but nothing about other items that are
                   actually closely related. For example, a company may
                   have information about how customers use one credit
                   card, but nothing about how those customers use
                   their other credit cards.

                   Worse, adding data with little information content
                   can lower the predictive power of the database.
                   Including irrelevant data, or multiple measurements
                   of the same item, reduces the utility of the
                   data-mining results. For example, if you include age
                   as well as birth date, the analysis tool will discover
                   that both factors are equally relevant and will
                   therefore assign a lower weight to both measures as
                   predictors.
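The redundancy is easy to see numerically. Age is just a reference year minus the birth year, so the two columns are perfectly (negatively) correlated and the second adds no information. The sketch assumes a hypothetical reference year of 1997 and whole-year ages. It illustrates the redundancy only and does not reproduce how any particular tool splits weights.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

REFERENCE_YEAR = 1997  # assumed reference year for computing age
birth_years = [1950, 1962, 1971, 1958, 1980]
ages = [REFERENCE_YEAR - b for b in birth_years]
print(round(pearson(ages, birth_years), 6))  # -1.0
```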

                   Myth: Building a data-mining model on a sample of a
                   database is ineffective, because sampling loses the
                   information in the unused data.

                   Fact: The thrust of almost all developments in the
                   study of sampling is to maximize the amount of
                   information gained per unit of effort expended.

                   Keep in mind that your data probably already
                   represents a sample of a larger population. When you
                   analyze your customer database to help acquire new
                   customers, you're basing your model on a sample of
                   the total population.

                   Under some circumstances, you may be forced to
                   sample. Not all your data may be relevant to the
                   problem at hand or reflect the population you're trying
                   to model. Many data warehouses include historical
                   data that reflects conditions (such as unexpired
                   patents) that no longer apply, rendering it
                   inappropriate for building a model to guide future
                   decisions.

                   Sometimes full-scale data-gathering is not practical.
                   For example, if you'd like to learn about customers'
                   satisfaction with your new product or service, but it
                   takes an hour to administer a customer satisfaction
                   survey, you'll most likely decide to limit your analysis
                   to a sample.

                   In fact, a relatively small random probability sample,
                   correctly taken, can yield excellent results. Although
                   there are 60 million or more voters in a presidential
                   race, the final poll before the election, which is based
                   on two-thousandths of 1% of those voters, is seldom
                   off by more than 2%. If we had a database of all 60
                   million voters and hundreds of measurements on each
                   one, we couldn't build a better model for predicting
                   the winner.
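The polling example rests on a standard result: the precision of a random sample depends on the sample size, not on the population size. The sketch computes the standard error of a sample proportion with and without the finite population correction, assuming a worst-case 50/50 race (p = 0.5). Two-thousandths of 1% of 60 million is 1,200 voters.

```python
import math
from typing import Optional

def standard_error(p: float, n: int, population: Optional[int] = None) -> float:
    """Standard error of a sample proportion, with the finite population
    correction applied when the population size is known."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se

voters = 60_000_000
n = voters * 2 // 100_000             # two-thousandths of 1% of 60 million = 1,200
se_finite = standard_error(0.5, n, population=voters)
se_infinite = standard_error(0.5, n)  # as if the population were unlimited
print(n, round(se_finite, 5), round(se_infinite, 5))  # 1200 0.01443 0.01443
```

The 60 million figure barely changes the answer; only a larger sample would tighten it, and then only in proportion to the square root of n.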

                   Even when it's possible to build the model on the
                   entire database, you may choose not to. It's often a
                   better use of resources to build and evaluate many
                   models using samples of the data, rather than rely on a
                   single model using all the data.
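One way to read "build and evaluate many models using samples" is repeated random subsampling: fit the same simple model on several samples and see how much the results vary. The sketch below fits a least-squares slope on synthetic data whose true slope is 3 (an assumption made for illustration); the data, sample sizes and seeds are all hypothetical.

```python
import random

def fit_slope(points: list[tuple[float, float]]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

def slopes_from_samples(data, n_models: int, sample_size: int, seed: int = 0) -> list[float]:
    """Fit one slope per random sample drawn without replacement from data."""
    rng = random.Random(seed)
    return [fit_slope(rng.sample(data, sample_size)) for _ in range(n_models)]

# Synthetic "full database": y = 3x plus Gaussian noise (illustrative assumption).
gen = random.Random(42)
xs = [gen.uniform(0, 10) for _ in range(10_000)]
data = [(x, 3.0 * x + gen.gauss(0, 1)) for x in xs]

slopes = slopes_from_samples(data, n_models=20, sample_size=200)
print(f"{min(slopes):.3f} .. {max(slopes):.3f}")
```

Each model sees only 2% of the rows, yet the spread of the twenty slopes shows directly how stable the fitted relationship is.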

                   Myth: Data mining is another fad that will soon fade,
                   allowing us to return to standard business practice.

                   Fact: Although the name may change, data mining as
                   a vital application will not go away. Companies have
                   been using related quantitative techniques in many
                   parts of their businesses for a long time. Data mining
                   is just one more advance in a research process that
                   has been ongoing since the beginning of the 20th
                   century. A recent increase in the power of computers,
                   coupled with cheap electronic methods for capturing
                   large amounts of data, brings us to this step now.

                   Data mining can't be ignored: the data is there, the
                   methods are numerous, and the advantages that
                   knowledge discovery brings to a business are
                   tremendous. Companies whose data-mining efforts
                   are guided by "mythology" will find themselves at a
                   serious competitive disadvantage to those
                   organizations taking a measured, rational approach
                   based on facts.

                   Robert D. Small is VP of Research of Two Crows
                   Corp. in Potomac, Md. He can be reached at
                   [email protected].

                   SIDEBAR: Six Steps For Successful Data Mining

                   - Identify the goal

                   - Assemble the relevant data

                   - Choose your analysis methods

                   - Decide which software tool is best for implementing
                   the method

                   - Run the analysis

                   - Decide how to implement the results

                   Data: Two Crows Corp.

                   Copyright © 1997 CMP Media Inc.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Thu, 16 Jan 1997 22:15:57 -0800
Subject: EDS inroads into the data warehouse, datamining, DSS areas

EDS, the largest computer service provider in the world, has established
a focused consulting practice in the area of data warehousing, data
mining and decision support systems. EDS built a world-class integration
lab (in the domain of the insurance industry) to demonstrate
applications, test tools, integrate solution components and build proofs
of concept. For a free white paper and additional information, please
contact Nathan Uffenheimer at (972)604-8915.


>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "jpbrown" <[email protected]>
Organization:  Ultimate Resources
Date:          Thu, 16 Jan 1997 13:57:24 -0006
Subject:       What Needs To Be Done, And Why.

Descriptive Introduction: The Databases that are the core of 
Data Warehousing are not just repositories. Together, they 
form an interactive machine that makes it possible to learn 
much more about the constituent population or populations.
This expands on:  http://www.hal-pc.org/~jpbrown

Text: Most data collections are hybrid in one way or another. 
I have spent several years studying many actual cases. Over and 
over again, I ran into the apples and oranges problem, where 
there are sub-populations that are very different, one from 
another. I do not need to tell you how confusing the results 
of analysis can be, if these situations are ignored.

I have continued to devise ways to detect the anomalies of the 
hybrid database, always assuming that some aspects of this problem 
may be present, or may develop with the passage of time. If they 
do develop as time goes on, there needs to be a method for 
detecting the onset of Change. I have developed, and expect 
to continue to develop, new methods to make effective, reliable 
analyses in cases where hybrid sub-populations are recognized.
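Brown does not say how his change detection works. As a stand-in, the cumulative-sum (CUSUM) chart is a standard technique for flagging the onset of a shift in a monitored mean. This is not his method, and the target, slack, threshold and data below are illustrative assumptions.

```python
from typing import Optional, Sequence

def cusum_alarm(values: Sequence[float], target: float, slack: float,
                threshold: float) -> Optional[int]:
    """One-sided (upward) CUSUM: index of the first alarm, or None if no shift is flagged."""
    s = 0.0
    for i, v in enumerate(values):
        # Accumulate only deviations larger than the allowed slack; never go below zero.
        s = max(0.0, s + (v - target - slack))
        if s > threshold:
            return i
    return None

# Hypothetical series whose mean shifts from 10 to 14 at index 5.
series = [10, 10, 10, 10, 10, 14, 14, 14, 14]
print(cusum_alarm(series, target=10, slack=1, threshold=5))  # 6
```

The alarm fires one step after the shift begins; a smaller threshold reacts faster at the cost of more false alarms.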

In using these techniques you can: 

*  take an unfamiliar population and diagnose potential problems.

*  identify the causes of the problems.

*  apply different methods that will measure the analyzability of 
   naturally occurring hybrid populations.

*  suggest ways to increase the utility of data, or point out 
   that some types of data are incurably unhelpful.

*  use different techniques (Autoclassification) to separate out
   sub-populations, based on predictability or other sources 
   of coherence.

*  make reliable predictions.

*  detect and remedy Changes in causal systems that would 
   otherwise reduce reliability.
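Brown's Autoclassification technique is not described here, so the sketch below substitutes plain one-dimensional k-means with k=2 to show the general idea of separating two sub-populations before analysing them. It is not his method, and the values are made up.

```python
def split_two_groups(values: list[float]) -> tuple[list[float], list[float]]:
    """1-D k-means with k=2 (Lloyd's algorithm). Needs at least two distinct values."""
    c_low, c_high = min(values), max(values)   # start centres at the extremes
    for _ in range(100):                       # safety cap; converges far sooner
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (c_low, c_high):
            break                              # assignments are stable
        c_low, c_high = new_low, new_high
    return low, high

# Hypothetical mixture of two very different sub-populations.
mixed = [9, 11, 48, 10, 52, 12, 50, 49]
low, high = split_two_groups(mixed)
print(sorted(low), sorted(high))  # [9, 10, 11, 12] [48, 49, 50, 52]
```

Averaging the mixed list gives 30.1, a value that describes neither group; analysing each group separately avoids that "apples and oranges" distortion.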

So far, the great strides that have been taken in Databases, Data 
Marts and Data Warehouses have been advances in Data Manipulation. 
The next great strides will be taken in SuperInduction, and they 
will be applied before, during, and after the various steps of 
manipulation.

The resulting Output:
 
*  will be based, without prejudice (objectively), on the Input.

*  will also have had the benefit of many kinds of new knowledge, 		
   developed during the analytical process.	
	
*  and will be ideally presented to produce the best possible 
   results for the corporate user (Decision Support).
 
If you have gone through the Web Site http://www.hal-pc.org/~jpbrown      
and you want to see some of the extra complex links, let me know at
[email protected]


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 17 Jan 1997 08:46:53 -0500
From: [email protected] (Fazel Famili)
Subject: Intelligent Data Analysis Journal - First Issue is live

	Intelligent Data Analysis - An International Journal (New)

		    An electronic, Web-based journal
		     Published by Elsevier Science


URL:  http://www.elsevier.com/locate/ida
      http://www.elsevier.nl/locate/ida  


The first issue of the Intelligent Data Analysis journal is now live. This is
a quarterly journal published by Elsevier Science Inc. The journal is
planning to offer a number of new features that are not currently available
in paper journals: (i) an alerting service notifying subscribers of new
papers in the journal, (ii) links to large-scale data collections, (iii)
links to secondary collections of data related to material presented in the
journal, (iv) the ability to test new search mechanisms on the collection
of journal articles, (v) links to related bibliographic material, and (vi)
inclusion of 3-D objects and multiple color graphs.

Please refer to either of the above sites for the articles in the first
issue and for the journal home page (Aims and Scope, Author Submission
Guidelines, and more).

Best wishes,

A. Famili
Editor-in-Chief

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 23 Jan 1997 10:02:10 -0500
From: [email protected] (Bin Li)
Subject: new siftware entry for PC4.5

Could you add an entry in the Siftware page for our parallel
C4.5 classification tool?  Thanks,
_______
Bin Li

----------------------------------------------------------------------------
Siftware: Parallel C4.5 (PC4.5)

*URL: http://merv.cs.nyu.edu:8001/~binli/pc4.5/

*Description: If you have C4.5 and a network of workstations that are 
accessible to you, PC4.5 will help you better use C4.5.  PC4.5 offers you 
these advantages:

    1. It is faster.  In an N-trial C4.5 run, a single process builds N
       classification trees one by one and then picks the best one.  In
       PC4.5, the N trials are each handled by a process and each process 
       is run on a different machine (if N or more machines are available).

    2. It is fault-tolerant.  PC4.5 automatically assigns a process to
       a machine if the machine is idle (i.e. no activity by the machine's
       owner).  If the owner of a machine comes back, or the machine fails,
       during a PC4.5 computation, the PC4.5 process automatically retreats
       and resumes on a different idle machine.

    3. It supports multiple platforms.  PC4.5 runs on SunOS, Solaris and
       Linux machines (for HPUX, IRIX, and ALPHA, please contact author).
       Networked multi-platform workstations can run PC4.5 processes of a 
       single PC4.5 program at the same time.
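The speed-up in point 1 comes from running the N independent trials at the same time and keeping the best one. The sketch below mimics that pattern with a thread pool inside a single Python process; it does not use PLinda or C4.5, and `run_trial` is a hypothetical placeholder that returns a pseudo-random error rate instead of building a tree.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_trial(seed: int) -> tuple[float, int]:
    """Hypothetical stand-in for one C4.5 trial; returns (error_rate, seed)."""
    rng = random.Random(seed)
    return rng.uniform(0.05, 0.30), seed

def best_of_n_trials(n: int, workers: int = 4) -> tuple[float, int]:
    """Run n independent trials concurrently and keep the lowest-error one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_trial, range(n)))
    return min(results)  # tuples compare by error rate first

error, seed = best_of_n_trials(8)
print(f"best trial: seed {seed}, error {error:.3f}")
```

Because the trials share nothing, the result is the same as running them one by one; only the wall-clock time changes, which is the property PC4.5 exploits across workstations.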

PC4.5 is built with the Persistent Linda (PLinda) system, a software system
for robust distributed parallel computing developed at New York University.
To get more information on PLinda, please visit our web site at
http://merv.cs.nyu.edu:8001/~binli/plinda/ or send email to
[email protected].

Both PC4.5 and PLinda are research efforts led by Professor Dennis Shasha.

Important: You must have the original C4.5 package in order to use PC4.5.  
To get C4.5, please contact Dr. J. R. Quinlan ([email protected]).

*Discovery tasks: Classification

*Platform(s): Unix (SunOS, Solaris, Linux; please contact author for HPUX,
IRIX, and ALPHA)

*Contact: Bin Li
	  715 Broadway, Rm 715
	  New York, NY 10003
	  (212) 998-3485
          email: [email protected] (preferred)

*Status: Public Domain (source code)

*Source of information: ftp://cs.nyu.edu/pub/plinda/pc4.5.tar.gz

*Updated: 1997-01-22 by Bin Li, [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 17 Jan 1997 12:04:57 +0000
From: "Ed Babb" <[email protected]>
Subject: kdd- job in data mining

OPPORTUNITY IN  DATA MINING!

PARSYS is a leading European supplier of parallel systems and technology.  They
are currently the lead partner in a large multinational ESPRIT project aimed at
building a parallel data mining file server. Consequently, they are looking for
people interested in data mining systems and with experience of parallel
computers, database technology and machine learning. 

The positions involve adapting learning techniques such as rule induction,
neural networks and genetic algorithms to run on a parallel computer, and
helping to adapt an existing database system to run on a parallel machine.
Enthusiasm for producing fast algorithms in C is essential. 

At least a 2.1 degree in Computing, Artificial Intelligence or equivalent is
needed. In addition, several years' relevant experience is desirable. Salary
will depend on age and experience. 

Please post your CV stating current salary to: Ed Babb, PARSYS LTD, Boundary
House, Boston Road, Hanwell, London, W7 2QE, UK. Alternatively, email him at
[email protected] if you wish to make any brief informal enquiries.

Please see http://www.parsys.com/dafs.htm for summary of the DAFS project.
*********************************************

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected] (BERLEANT DANIEL J)
Date: Tue, 21 Jan 1997 08:20:37 -0600
Subject: POSITION: Tenure Track, Teaching and Research

This is an informal request for inquiries from people interested in
the tenure track position offered by our dept. starting next
September. Feel free to spread the word.

If you are interested in teaching two software related courses per
semester (typically one undergrad, one grad) and in doing research in
empirical NLP, text processing, information retrieval from full text,
data/knowledge mining from full text, etc., AND you have (or are getting) a
Ph.D. and a formal qualification in engineering (Bachelor's, Master's,
or Ph.D. degree with the word "engineering" in it or issued by a
dept., college, campus, or university with the word "engineering" in
its name, etc.), please email me to discuss applying.
  If you don't think you have an engineering degree, check - maybe
you'll be surprised.
  I am very interested in promoting applications from people in the
above mentioned areas and look forward to responding forthrightly to
your inquiry.

  Best Regards,
  Daniel Berleant
  Dept. of Computer Systems Engineering
  University of Arkansas, Fayetteville
  Phone: (501) 575-5590
  Fax:   (501) 575-5339
  Email: [email protected]

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 20 Jan 97 12:54:56 PST
From: "Dave Stodder" <[email protected]>
To: [email protected]
Subject: Data Mining Summit program

       As you know, the 1997 Data Mining Summit is coming up Feb. 18-21 in 
       San Francisco. The conference is sponsored by Miller Freeman Inc.'s 
       Database Programming & Design and DBMS magazines.
       
       We have a great lineup of speakers: Usama Fayyad, Evangelos 
       Simoudis, Kamran Parsaye, Larry Kerschberg, Bob Vere, Gene Feruzza, 
       and others, including case studies. The complete program is located 
       at www.dbsummit.com.
       
       I am attaching the complete program in case it can be included 
       with KDD Nuggets.
       
       Thanks very much,
       
       David
       
       David Stodder
       Conference Chair, Data Mining Summit
       Editor-in-Chief, Database Programming & Design
       411 Borel Ave., Suite 100
       San Mateo, CA 94402
       (415) 655-4290, Fax (415) 655-4350
       Internet: [email protected]

Date: Mon, 27 Jan 1997 17:01:16 -0500 (EST)
To: [email protected]
From: Gregory Piatetsky <[email protected]>

Tuesday, February 18
Data Mining and the Internet:
New Dimensions in Knowledge Discovery
Chaired by David Stodder
Editor-in-Chief
Database Programming & Design

Successful application of data mining and knowledge discovery tools and
methods can have a tremendous effect on an organization. Combined with the
Internet, data mining explodes into a new world of possibility. Electronic
commerce and other activity will create huge new resources of data that
businesses can mine for greater efficiency and customer service. But perhaps
more importantly, data mining combined with Internet-based applications has
the potential to deliver whole new areas of profitable decision support
services.

This special seminar will focus on the dynamic combination of data mining,
advanced databases, and the Internet. Bringing a series of experts together,
this all-day session will cover key topics, including:

   -- Development and use of intelligent software agents
   -- How data mining fits with the technology advances made by commercial
search engines and browsers
   -- Case studies of organizations that have created effective data mining
applications for Internet customers
   -- Developments in heterogeneous database access to enable wider use of
data mining
   -- Data mining and knowledge discovery methods that work best for
creating Internet-aware applications
   -- Advances in graphics and data visualization that will impact Internet
data mining applications

For the latest news about this seminar, including the scheduled speakers,
please check back with this Web site. The complete program will be in place
in early December.

Wednesday, February 19

8:30 - 9:35
OLAP and Data Mining: Bridging the Gap
Part I
Kamran Parsaye
CEO
Information Discovery Inc.

To date, most observers have viewed data mining and online analytical
processing (OLAP) as separate components of decision support. It has been
difficult to link the two largely because no coherent theory exists upon
which to build a relationship. In this keynote speech, Parsaye will
introduce a unified theory and methodology for OLAP and data mining. He will
describe in detail how the two activities can reinforce each other.

Parsaye will begin by describing the "dimensions" of decision support and
how data mining activity fits into one of the dimensions. Data mining within
a single dimension is a rough approximation of multidimensional mining.
Parsaye will describe how a lack of attention to dimensionality in data
mining can produce unexpected results reminiscent of the "lossless join"
problem in the early days of relational databases.

In the second part of his presentation, Parsaye will present a formal
framework for mining OLAP data and will introduce a new set of
multidimensional normalization constructs that allow us to understand OLAP
discovery.

In this session you will learn:
- How OLAP, data mining, and other activities fit together in the four
"spaces," or dimensions, of decision support
- Limitations of normalization and star schemas for data mining activities
- New structures that go beyond star schemas
- A methodology for applying OLAP data mining, with three distinct processes
of episodic, strategic, and continuous mining for specific user groups
within corporate environments.

Kamran Parsaye is CEO of Information Discovery Inc. He has developed
commercial data mining applications since the mid-1980s. Parsaye has a range
of experience in the software industry both in research and in business, and
has provided guidance to top-level management of leading industrial,
financial, and government organizations. He is coauthor of Intelligent
Database Tools & Applications (John Wiley & Sons, 1993).

9:45 - 10:50
OLAP and Data Mining: Bridging the Gap
Part II
Kamran Parsaye
CEO
Information Discovery Inc.
(For description, see above)

10:50 - 11:10
Break

11:10 - 12:15
Institutionalizing Knowledge Discovery: Creating a New Business Process
Tej Anand
Director of Knowledge Discovery
Human Interface Technology Center
NCR Corp.

Practitioners are slowly beginning to accept that knowledge discovery is
much more than just the application of machine learning or statistical
algorithms to a dataset. Researchers understand that a knowledge discovery
process exists, and they even agree on what basic tasks make up that
process. However, for knowledge discovery to move beyond finding
"interesting trivia" to become a business process akin to marketing, the
details behind the knowledge discovery process must be expounded. Anand will
take the process apart to reveal its details; he will offer practical ideas
for accomplishing business goals through a new understanding of the process.

In this session you will learn:
- Why knowledge discovery is so difficult (contrary to what you might have
heard)
- Why you cannot buy a tool to "do" knowledge discovery for you
- How process templates can remind the practitioner of tasks he or she must
complete and can provide a framework for making, recording, and auditing
decisions during the knowledge discovery process
- How process guides help the practitioner select data transformation
techniques, interpret data visualizations, select the correct machine
learning or statistical algorithm, and interpret results
- How embedding templates and guides into tools will allow knowledge
discovery to become an institutionalized business process.

Tej Anand is director of the knowledge discovery team at NCR Corp.'s Human
Interface Technology Center. In 1993, he established this business and
technical consulting team to help retail, insurance, consumer packaged
goods, and other commercial enterprises realize business insights hidden in
their operational data. Team members also conduct research and development
to create knowledge discovery processes and data mining tools. Prior to
joining NCR, Anand developed data mining tools for A.C. Nielsen Co. He has
also been a member of the research staff at Philips Laboratories, where he
did research in the area of artificial intelligence software systems.

12:15 - 1:30
Lunch

Track A: Algorithms and Methods

1:30 - 2:35
Data Mining and the KDD Process: Algorithms and Limitations
Part I
Usama Fayyad
Senior Researcher
Microsoft Research

This two-part talk will provide an overview of the rapidly growing area of
knowledge discovery in databases (KDD). Fayyad will define KDD goals,
present motivations guiding the KDD process, and discuss how KDD relates to
data mining. He will then focus on the core data mining methods. These
methods have their origins in statistics, pattern recognition, artificial
intelligence (machine learning), databases, and parallel computing. Fayyad
will explore the limitations and challenges of each major data mining
method. He will break these methods down into classes and will cover a
sampling of algorithms for each class, outlining the advantages and
limitations of each.

The goal of this two-part presentation is to provide a detailed snapshot of
the current state of data mining methods, how they fit into the KDD process,
and what key challenges developers should be aware of when applying them.
Fayyad will focus primarily on the technical aspects of the algorithms
rather than their use in particular implementations.

In this session you will learn:
- Definitions of KDD and data mining and how the two areas fit together
- Dominant data mining methods used in the field and the specific problems
they address
- Critical limitations and challenges of each method
- How to avoid pitfalls when applying data mining methods.

Usama Fayyad is a senior researcher at Microsoft Research. His interests
include knowledge discovery in large databases, data mining, machine
learning theory and applications, statistical pattern recognition, and
clustering. Before joining Microsoft in 1996, he headed the Machine Learning
Systems Group at the Jet Propulsion Laboratory (JPL), California Institute
of Technology, where he developed data mining systems for automated science
data analysis. He remains affiliated with JPL as a distinguished visiting
scientist. Fayyad received the JPL 1993 Lew Allen Award for Excellence in
Research and the 1994 NASA Exceptional Achievement Medal. He was program
cochair of KDD-94 and KDD-95 (the First International Conference on
Knowledge Discovery and Data Mining). He is general chair of KDD-96, an
editor-in-chief of the journal Data Mining and Knowledge Discovery, and
coeditor of Advances in Knowledge Discovery and Data Mining (MIT Press,
1996).

2:45 - 3:50
Data Mining and the KDD Process: Algorithms and Limitations
Part II
Usama Fayyad
Microsoft Research
(For description, see above)

3:50 - 4:15
Break

4:15 - 5:00
Data Mining: The View from IBM

5:00 - 5:45
Data Mining: The View from Tandem Computers


Track B: Case Studies in Data Mining

1:30 - 2:35
Leveraging Customer Information for Competitive Advantage
Lisa Modisette
Director of Wireless Intelligent Solutions
Lightbridge Inc.

The cellular phone industry today looks much like the credit card industry
of a few years ago. The market is growing at nearly 50 percent a year but
will soon reach a saturation point, just as the credit card industry has.
"Churn," or customer attrition, is a growing problem for the maturing
cellular phone industry. In this case study, Modisette will describe how
data mining techniques that worked so well in the credit card industry to
prevent and reverse customer attrition may be applied to the wireless
telecommunications industry.

Modisette will describe how Lightbridge Inc., a wireless communications
provider, has used data mining tools to retain good customers at minimal
cost. Data mining tools make use of existing customer transactional and
demographic data, allowing companies to quickly and easily discover customer
needs. Detailed customer knowledge will enable carriers to prepare for a
more saturated market and offer new businesses based on customer knowledge.

In this session you will learn:
- How Lightbridge uses data mining and churn modeling techniques to combat
customer attrition
- Specific predictive modeling techniques and their effectiveness
- How to get the most out of existing data and acquire a deeper knowledge of
customer behavior.

Lisa Modisette is responsible for the development and marketing of
Lightbridge Inc.'s Wireless Intelligence line of products and services,
designed to provide decision support and database marketing to wireless
carriers. She joined Lightbridge in 1994 and has driven the development of
the new decision-support product line since its inception. Modisette has
experience in identifying customer needs and in creating and maximizing the
use of decision-support systems, database marketing, and customer
segmentation. Modisette also has expertise in OLAP, business intelligence,
database marketing, product management, sales training, and a variety of
information technology. Before joining Lightbridge, she was director of the
telecommunications industry practice at Metaphor Inc., an IBM subsidiary.
She has a B.A. in marketing from the University of Colorado.

2:45 - 3:50
Business Experiences with Data Mining
Evangelos Simoudis
Director of Data Mining Solutions
IBM Corp.

Health care and insurance are two industries that offer interesting
opportunities for data mining applications. In this presentation, Simoudis
will describe how two businesses have developed production data mining
systems. The Health Insurance Commission (HIC), an agency of the Australian
government, processes claims for Australia's Medicare, Medibank Private,
Pharmaceutical Benefits, and Child Care programs. HIC uses data mining to
help reduce costs by ensuring that all medical tests and services are
appropriately prescribed and accurately billed.

John Hancock, an insurance and financial services provider, has a marketing
and services database to support the company's cross-selling efforts and to
accurately identify future customer service requirements. Hancock developed
a survey of 55,000 targeted users; it uses data mining to provide profiles
based on survey results.

In this session you will learn:
-- Case study examples of data mining methods used for reducing costs and
profiling customers
-- The technology/business integration important for data mining success
-- Important processes to ensure accurate results from data mining

Evangelos Simoudis is IBM's director of Data Mining Solutions. Before
joining IBM, Simoudis led Lockheed Corp.'s data mining research, and was
responsible for the commercial introduction and marketing of Lockheed's
Recon data mining system for financial and retail markets. Simoudis also
spent six years as a member of the principal research staff at Digital
Equipment Corp.'s Artificial Intelligence Center. He conducted research on
machine learning, pattern recognition, knowledge-based systems, and
distributed artificial intelligence; Digital has incorporated his research
work in products for engineering design and diagnostics. Simoudis has
written extensively on data mining and machine learning, and is the North
American editor of the Artificial Intelligence Review.

3:50 - 4:15
Break

4:15 - 5:00
Data Mining: The View from Angoss Software


Thursday, February 20

8:30 - 9:35
Keynote Speech
Speaker TBA

9:45 - 10:50
Weaving Detail into the Big Picture
Denise M. Barnhart
Chief, Corporate Analysis Division
Army and Air Force Exchange Service

"There's too much data ... but it's just not enough." With the continued
growth of very large databases (VLDBs) and the mushrooming need for quick
access to progressively smaller details of the retail business, corporations
risk losing sight of the larger view, the brighter opportunity, or the
insidious trend. The Army and Air Force Exchange Service (AAFES), which
provides $6 billion in goods and services to military servicemen and
servicewomen around the world, has taken on this challenge. In a case study
presentation, Barnhart will describe AAFES's extensive use of massively
parallel analytical processing and data mining. The organization uses this
advanced technology for retail research and integrating analysis results
with operational and strategic processes.

In this session you will learn:
- How AAFES uses neural nets to understand demographics and project market
potential
- Neural net applications that let an organization view data both at the
total business level and at the detailed level of specific items in a retail
store
- How AAFES calculates relationships between retail items and categories and
links these categories to demographic characteristics
- Techniques for the cross-utilization of multiple databases for configuring
retail stores to maximize corporate earnings per square foot
- How to overcome challenges in integrating database patterns into the
corporate strategic vision.

Denise Barnhart is chief of the Corporate Analysis Division, part of the
Army and Air Force Exchange Service's (AAFES's) Strategic Planning
Directorate. AAFES is a profit-generating agency of the Defense Department.
Barnhart joined AAFES in 1976 as a CPA and has since specialized in the
strategic optimization of stores for the benefit of both customer
satisfaction and the bottom line. She was an early proponent of the
day-to-day use of neural nets in planning store construction in the late
'80s. Today, AAFES wholly plans mall sales and earnings levels, store mix,
sizing, and parking requirements with neural net analyses. With the
refinement of retail point-of-sale systems in the '90s, Barnhart has
extended corporate strengths in local markets.

10:50 - 11:10
Break

11:10 - 12:15
The Visualization of Large, Complex Datasets
Georges Grinstein
Professor, Institute for Visualization and Perception Research
University of Massachusetts Lowell

Visualization is the translation of data, sampled or generated, into some
perceptual presentation, most typically visual, to provide insights into the
data. It represents the mapping of data into a symbolic representation
useful for researchers, analysts, scientists, and business managers. This
"mapping," or interaction, can occur at several stages of the visualization
presentation pipeline; it directs the transformations or alters the
presentation of data.

Visualization is no longer simply an application of computer graphics. While
computer graphics remain the underpinning technology of this discipline,
visualization now includes, and must support, databases, real-time
interaction, networking, supercomputing, multimedia, visual programming,
systems theory, and human perception. This development has provided some
very fertile ground for integrating knowledge discovery, statistics, and
visualization.

In this talk Grinstein will highlight key research issues in the
visualization of large, complex informational spaces.

In this session you will learn:
- A brief history of visualization, from initial efforts to extend data
presentation beyond the classic pixel-driven techniques to the current
challenge of encompassing domain knowledge
- How visualization and data mining can work together to provide rich
user-exploration and analysis environments
- How to make astute use of visualization techniques.

Georges Grinstein is a professor of computer science at the University of
Massachusetts Lowell. He also serves as director of the university's
Institute for Visualization and Perception Research and is principal
engineer with MITRE Corp.'s Center for Air Force C3I Systems.

Track A: Algorithms and Methods

1:30 - 2:35
Improving Prediction Performance with Genetic Algorithms
Steven Vere
President
Ultragem Data Mining Co.

Data mining with genetic algorithms is a new technology aimed at improving
prediction performance. However, many of today's commercial data mining
products actually incorporate older machine learning algorithms, such as ID3
and CART. These systems use heuristic algorithms to generate decision rules.
Being heuristic, they do not guarantee the best in prediction performance;
in most cases, we now know they do not. Ten years ago, these technologies
represented a good trade-off between prediction performance and training
speed. But in today's high-speed computing environment, it is possible to
use the controlled, brute computational force of genetic algorithms to find
the higher performing prediction rules that heuristic algorithms overlook.
In this presentation Vere will describe techniques for efficiently applying
the genetic algorithm paradigm to large data mining problems.

In this session you will learn:
- The definition and description of genetic algorithms
- Applications of genetic algorithms to data mining and numerical prediction
problems
- How specific techniques, such as averaging the predictions of sets of
genetically generated classifiers, can significantly enhance performance.
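The ensemble idea in the last item can be illustrated briefly. This is a minimal Python sketch under stated assumptions, not Vere's method: the three threshold rules built by the hypothetical helper `make_threshold_rule` (with made-up features and cutoffs) stand in for classifiers a genetic search might return, and only the step of averaging their predictions is shown.

```python
from typing import Callable, Sequence

# A classifier maps a feature vector to a score in [0, 1].
Classifier = Callable[[Sequence[float]], float]

def make_threshold_rule(feature: int, cutoff: float) -> Classifier:
    """Hypothetical stand-in for one genetically generated rule:
    scores 1.0 when the chosen feature exceeds the cutoff, else 0.0."""
    return lambda x: 1.0 if x[feature] > cutoff else 0.0

def ensemble_predict(members: Sequence[Classifier], x: Sequence[float]) -> float:
    """Average the members' scores; the mean is the ensemble's prediction."""
    return sum(m(x) for m in members) / len(members)

# Three hypothetical rules, as a genetic search might have produced them.
rules = [make_threshold_rule(0, 0.5),
         make_threshold_rule(1, 2.0),
         make_threshold_rule(0, 0.8)]

score = ensemble_predict(rules, [0.7, 3.0])   # rules vote 1, 1, 0
label = score >= 0.5                          # majority-style decision
print(round(score, 3), label)
```

Averaging lets rules that err on different cases partly cancel each other out, which is the usual rationale for combining a set of classifiers rather than keeping only the single best one.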

Steven Vere is president and founder of Ultragem Data Mining Co., a data
mining consulting company specializing in the commercial application of
evolutionary algorithms. He has over 20 years of experience in machine
learning and artificial intelligence. Vere has served as a member of the
computer science faculty at the University of Illinois, Chicago and has also
held senior technical and management positions at the NASA Jet Propulsion
Laboratory, Lockheed R&D Division, and Bank of America. His work has
appeared in research journals, AI Encyclopedia, and Scientific American; he
will be featured on a future episode of Beyond 2000, a television
documentary series. Vere holds a Ph.D. in computer science from University
of California at Los Angeles.

2:45 - 3:50
Data Mining: Finding the Total Business Solution
Gene Feruzza
President, Customer Management Services

Too often, we view data mining as only data visualization, predictive
modeling, or some other specific technique. Although these components are
important, supporting the total business solution requires that we take a
much broader scope. In this talk, Feruzza will focus on data mining processes in
real-world applications developed in telecommunications, financial services,
utilities, and online services. He will describe the cyclical nature of
successful data mining, first focusing on the data infrastructure (data mart
or warehouse) and data access and manipulation. Feruzza will then describe
the role, and integration, of modeling processes and technologies, including
rule-based techniques, traditional statistics, neural networks, and genetic
approaches. He will discuss experiences with delivering the knowledge
obtained from the technology to the business user, and how to promote the
strategic integration of technology and business applications.

In this session you will learn:
-- How to view the full scope of what data mining needs in order to succeed
-- Why it's important to embrace and support all modeling technologies, not
just one
-- Solutions to common pitfalls based on data mining experiences
-- Best practices for delivering knowledge gained to the business user
-- Why data mining should be a cyclical, "living" process.

Gene Feruzza has extensive experience with advanced segmentation techniques
utilizing basic statistics and regression modeling, rule-based segmentation,
neural network modeling along with evolutionary and hybrid modeling
architectures. For 12 years he has provided integrated marketing and
business solutions for clients in telecommunications, electric utilities,
financial services, aerospace, manufacturing, and retail. He has worked for
two leading neural network hardware and software providers (HNC and Neural
Ware) as an instructor and consultant. He has also developed and marketed
his own database management and segmentation software. Feruzza graduated
from the University of Pittsburgh with a BS in computer science and
mathematics.

4:15 - 5:00
Data Mining: The View from NeoVista

7:30 - 9:00
1:30 - 3:00
Birds of a Feather
Breakout Sessions
Success with data mining depends on an intimate knowledge of specific
industry application requirements. After the first Data Mining Summit last
April, we received many requests to include organized "networking" sessions
in the program for attendees to discuss specific industry challenges.
To close out the Second Annual Data Mining Summit, we invite attendees to
join in our special Birds of a Feather sessions, which will focus on data
mining issues faced by specific industries. A vertical industry expert will
lead each discussion group.

Come and share your questions and experiences with other like-minded data
mining practitioners! Depending on popularity, we plan to offer Birds of a
Feather sessions about data mining in the following industries:

- Retailing
- Health care
- Financial services
- Telecommunications

To help us organize the Birds of a Feather sessions ahead of the conference,
please use the registration form to choose which vertical industry session
you would like to attend.

Track B: Case Studies in Data Mining

1:30 - 2:35
Artificial Intelligence and Process-Delay Analysis: A Decision-Tree Case Study
Bob Evans
Member, Advanced Technology Staff
RR Donnelley & Sons Co.

Cylinder wear (called "banding") causes serious delays in the rotogravure
printing process and has plagued the industry for decades. A process-delay
analysis initiative at RR Donnelley & Sons' Gallatin, Tennessee plant has
reduced the incidence of cylinder banding to near negligible levels. In this
presentation, Evans will describe the Evans-Fisher Process Analysis Model, a
solution driven by decision-tree induction. Through case study examples, he
will describe the use of this powerful artificial intelligence method for
data mining. Evans will also address some of the business and social issues
associated with data collection and analysis.

At RR Donnelley, database technology is the vehicle for solving process
problems. Evans will show how decision-tree induction may be viewed as
automated query generation. Attendees will see examples of queries generated
by this tool. Evans will explain how decision-tree induction guides users
away from the "blind alleys" that can frustrate data mining efforts.
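The idea that decision-tree induction can be viewed as automated query generation can be illustrated with a short Python sketch. Assuming a tree of simple threshold splits, every root-to-leaf path that ends in the class of interest becomes the WHERE clause of a query. The tree, the table name `press_runs`, and the columns `humidity` and `ink_viscosity` are hypothetical, and this is not the Evans-Fisher Process Analysis Model itself.

```python
from typing import List, Tuple, Union

# A leaf is a class label; an internal node is (column, threshold, left, right),
# where the left branch holds rows with column <= threshold.
Tree = Union[str, Tuple[str, float, "Tree", "Tree"]]

def paths_to_queries(tree: Tree, table: str, target: str) -> List[str]:
    """Turn every root-to-leaf path ending in `target` into a SQL query."""
    queries: List[str] = []

    def walk(node: Tree, conditions: List[str]) -> None:
        if isinstance(node, str):                 # reached a leaf
            if node == target:
                where = " AND ".join(conditions) or "1=1"
                queries.append(f"SELECT * FROM {table} WHERE {where}")
            return
        column, threshold, left, right = node
        walk(left, conditions + [f"{column} <= {threshold}"])
        walk(right, conditions + [f"{column} > {threshold}"])

    walk(tree, [])
    return queries

# Hypothetical tree over made-up press-run attributes.
tree: Tree = ("humidity", 78.0,
              "no_band",
              ("ink_viscosity", 55.0, "band", "no_band"))

for q in paths_to_queries(tree, "press_runs", "band"):
    print(q)
```

Each generated query retrieves only the rows the tree associates with the chosen class, so a user starts from conditions the data already supports instead of guessing at them.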

In this session you will learn:
- How to astutely define and collect data for decision-tree induction
- Case study examples of how the Evans-Fisher Process Analysis Model was
developed and applied
- How to use artificial intelligence and data mining to solve complex
industrial problems.

Bob Evans is on the advanced technology staff of RR Donnelley & Sons Co. in
Gallatin, Tennessee. He is also an adjunct assistant professor of computer
science at Volunteer State Community College in Tennessee. A 33-year
employee of RR Donnelley, he is responsible for implementing and upgrading
process-delay analysis using current data mining technology. He has
published several articles and has given presentations on shop-floor
applications of artificial intelligence. Computer scientists frequently cite
his application of decision-tree induction to cylinder bands as a successful
example of the transfer of data mining technology from the research
laboratory to an industrial environment. Evans holds an A.B. degree in
mathematics from Indiana University and a Master of Engineering degree in
computer science from Vanderbilt University.

2:45 - 3:50
Fraud Detection Systems: Combining Data Mining and Machine Learning
Tom Fawcett, Foster Provost
Members of the Technical Staff
Machine Learning Project
NYNEX Science and Technology

In this presentation, Fawcett and Provost will describe a framework that
combines data mining and machine learning techniques to design fraud
detection methods. Fraud detection is based on profiling customer behavior
and checking for anomalies. The domain of this case study is cloning fraud
in cellular telephony, but the methods involved are more widely applicable:
any domain in which fraudulent usage is superimposed upon legitimate usage
(as in credit card fraud) is a candidate. Fawcett and Provost use a
rule-learning program to uncover indicators of fraudulent behavior from a
large database of cellular calls. They will show how they use these
indicators to construct profilers and how their system combines evidence
from multiple profilers to generate high-confidence alarms.
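The profiling-and-combining approach described above can be sketched minimally in Python. The abstract does not say how the NYNEX system combines profiler outputs, so this sketch assumes a simple weighted sum compared against a threshold. The profiler (`profile_deviation`), the indicators, the usage history, the weights, and the threshold are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

def profile_deviation(history: List[float], today: float) -> float:
    """Hypothetical profiler: how many standard deviations today's value
    sits above this account's own history (0 if at or below normal)."""
    sd = pstdev(history) or 1.0       # avoid dividing by zero for flat histories
    return max(0.0, (today - mean(history)) / sd)

def raise_alarm(outputs: Dict[str, float], weights: Dict[str, float],
                threshold: float) -> bool:
    """Combine profiler outputs with a weighted sum and compare to a threshold."""
    evidence = sum(weights[name] * value for name, value in outputs.items())
    return evidence >= threshold

# Made-up daily usage history for one account, per fraud indicator.
history = {
    "night_calls": [2, 4, 4, 4, 5, 5, 7, 9],      # mean 5, std dev 2
    "long_distance_minutes": [10, 10, 10, 10],
}
today = {"night_calls": 9, "long_distance_minutes": 10}

outputs = {k: profile_deviation(history[k], today[k]) for k in history}
weights = {"night_calls": 1.0, "long_distance_minutes": 1.0}  # would be fit on labelled data
print(outputs, raise_alarm(outputs, weights, threshold=1.5))
```

Because each profiler compares an account against its own history, the same absolute usage can be normal for one customer and anomalous for another; the combining step then decides how much agreement among profilers is enough to alarm.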

In this session you will learn:
- How to create a profitable synergy of data mining and machine learning
- How to address the intricacies of building data mining systems under
real-world constraints
- Complications that arise when trying to assign cost/benefit trade-offs (the
cost of handling a false alarm differs from the cost of missing fraudulent
usage, which varies among fraud cases).

Tom Fawcett works in machine learning, data mining, and knowledge-based
systems. He has worked at NYNEX Science & Technology, GTE Laboratories, and
MITRE Corp. Fawcett holds a Ph.D. from the University of Massachusetts at
Amherst. At GTE, his machine-learning system was used for automated
adaptation in telecommunications network management. He developed and
maintained a large knowledge-based mission planning system for MITRE.
Fawcett has published articles addressing the representation problem in
machine learning and has done research in case-based reasoning.

Foster Provost works on machine learning and data mining at NYNEX Science
and Technology, where, in addition to developing methods for the automated
design of fraud detection systems, he has also made advances by combining
data mining techniques with decision-analytic techniques for cost-effective
technician dispatch. Prior to joining NYNEX, Provost worked on data mining
in scientific domains, including botanical toxicology, high-energy physics,
and infant mortality. His work produced advances in rule learning, scaling
up machine learning methods to large databases, using background knowledge
to guide learning, and selecting inductive bias. Provost holds a Ph.D. from
the University of Pittsburgh, where he held IBM and Mellon graduate
fellowships. He received a B.S. in physics and mathematics from Duquesne
University. He is a recent recipient of NYNEX's President's Award.


4:15 - 5:00
Data Mining: The View from DataMind

Friday, February 21

8:30 - 9:35
Data Mining 1997/98: Key Trends & Market Perspectives
Aaron Zornes
Executive Vice President and ADS Service Director
Meta Group

Although the data mining market garnered less than $100 million in 1996,
industry analysts at Meta Group forecast the market will explode to more
than $800 million by the year 2000. During 2Q96, Meta Group surveyed 250+
Global 2000-size business users of data mining products and services in
retailing, healthcare, financial services, and telecommunications. This
presentation will highlight key survey findings regarding adoption criteria,
timelines, technical parameters, and leading business applications. Meta
Group=92s study investigated not only the traditional uses of data mining
technology, such as fraud prevention and credit card authorization within
the financial services industry, but also investigated rapidly emerging
requirements stemming from data warehouse implementations and Web-enabled
commerce and marketing.
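As a rough check on how steep this forecast is: growing from $100 million
(1996) to $800 million (2000) implies a compound annual growth rate of
(800/100)^(1/4) - 1, or about 68% per year. Because the abstract gives
"less than" and "more than" figures, this is a lower bound on the implied
rate. The short Python sketch below does the arithmetic; the function name
implied_cagr is our own and not part of Meta Group's study.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value in `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the abstract: about $100M in 1996, $800M by 2000 (4 years).
rate = implied_cagr(100e6, 800e6, 2000 - 1996)
print(f"Implied annual growth: {rate:.1%}")  # Implied annual growth: 68.2%
```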

In this session you will learn:
- How to interpret early user adoption rates by industry segment
- What the impact of emerging systems integrators and data bureaus will be
- What's behind current data quality, data warehouse, and data visualization
trends

Aaron Zornes is executive vice president and ADS service director for Meta
Group. He is a leading authority on the software industry as it relates to
applications development and delivery, especially data warehousing and
second-generation multitier client/server applications. Zornes has devoted
more than 20 years to line and strategic management roles in leading vendor
and user organizations, including executive and managerial positions at
Ingres Corp., Wang Laboratories Inc., Software AG of North America, and
Cincom Systems Inc. He is a frequent author and keynote speaker on data
warehousing, data mining, advanced client/server tools, and customer-centric
application architectures. Since 1992, he has been conference chair of DCI's
Data Warehouse World conference series.

9:45 - 10:50
Knowledge Rovers: Configurable Agents to Support Enterprise Information
Infrastructures
Larry Kerschberg
Professor and Chair, Information and Software Systems Engineering
School of Information Technology and Engineering
George Mason University

Knowledge rovers represent a family of cooperating intelligent agents that
can support a collection of scenarios, decision-makers, and tasks. These
rovers play specific roles within the enterprise information infrastructure
to support users, maintain complex views, and mine and refine data into
knowledge. Rovers can roam the Internet, seeking, locating, negotiating for,
and retrieving data and knowledge specific to their mission.

For decision-makers to make appropriate use of information, the current
flood of data must be filtered and transformed. In this presentation,
Kerschberg will describe knowledge rovers and the data mining and software
agent technology that creates them. He will highlight important rovers and
how they fit into data warehouse, data mine, and data mart architectures.
Kerschberg will describe Field Agent rovers that discover new resources,
collect data, and bring back information; Information Curator rovers that
refine data into knowledge and place it in an information repository; and
Domain Servers that, from within the repository, facilitate access to
multiple data types, such as images, text, formatted data, and simulation
data related to a particular domain. Finally, Kerschberg will discuss
Sentinel rovers that monitor Domain Servers for interesting events,
patterns, and specified conditions to alert decision-makers and take
actions on their behalf.

In this session you will learn:
- The role of intelligent agents in supporting enterprise information
architectures
- How to integrate a family of configurable rovers for discovery,
integration, and evolution of information
- The interrelationship among concepts such as data warehouses, data mines,
and information repositories in the enterprise information infrastructure
- The concept of virtual data mines and data mining over multiple
heterogeneous data sources.

Larry Kerschberg is professor and chair of the Department of Information and
Software Systems Engineering in the School of Information Technology and
Engineering at George Mason University in Virginia. He is also director of
the university's Center for Information Systems Integration and Evolution.
His research focuses on intelligent agents, intelligent information
integration, data mining and knowledge discovery in databases, and expert
database systems. His research is funded in part by DARPA. Kerschberg is
also President of KRM Inc., which pursues research and development in
knowledge rovers and mediators in intelligent information systems. He is
editor-in-chief of the International Journal of Intelligent Information
Systems, published by Kluwer Academic Publishing Co. Kerschberg organized
and has served as program chair of the First and Second International
Conferences on Expert Database Systems. He holds a Ph.D. in engineering from
Case Western Reserve University.


10:50 - 11:10
Break

11:10 - 12:15
Privacy Issues and Data Mining
Panel Session Chaired by
David Stodder,
Editor-in-Chief,
Database Programming & Design

Data mining tools, when combined with large, sophisticated databases,
already offer businesses and other organizations powerful new abilities to
learn more about clients, customers, citizens, and taxpayers. The Internet
and Web-enabled commerce will create vast sources of data and new ways to
package information databases as products and services. Privacy and security
specialists are becoming increasingly concerned that basic privacy rights
could be trampled in the race to provide modern, intelligent information
services. Businesses must take new security measures to protect proprietary
data and learn how to resolve the tug-of-war with competitors and service
contractors over just who owns the data.

This panel session will feature a selection of experienced users, security
experts, and data mining professionals, who will focus on privacy and
security concerns that broadly affect the practice of data mining. The panel
will discuss what measures governments and businesses are taking, and should
take, with regard to data mining and the development of new information
services.

David Stodder is editor-in-chief of Database Programming & Design. He has
been with the publication since its inception in 1987. He has served on the
advisory board of several industry conferences, including IDUG North
America, DCI's Database and Client/Server World, and Blenheim/NDN's DB/Expo.
He is also chair of Miller Freeman Inc.'s VLDB Summit, Object/Relational
Summit, and Business Rules Summit conferences.
410.14. "97:05" by IJSAPL::OLTHOF (Spellchecked Henry Although) Wed Feb 05 1997 09:17 (1172 lines)
	Knowledge Discovery Nuggets 97:05, e-mailed 97-02-04
News:
	* W. Kloesgen, KDD-97: Call For Panel Proposals 
	* E. Colet, Announcing a regular posting of NBA data mining patterns,
		http://www.nba.com/news_feat/
	* GPS, Business Week Feb 3, 1997 Story on Data Mining
	* B. Griffin, Tools for quantifying newsgroups and email postings? 
	* M. Rebhan, GeneCards: genes, proteins and diseases.
		http://bioinformatics.weizmann.ac.il/cards
Publications:
	* A. Basu, CFP: INFORMS Journal on Computing Special issue on 
		Knowledge Discovery and Data Mining 
	* M. Singh, CFP: IEEE Internet Computing, Special issue on Agents
 		http://www.computer.org/pubs/internet/
Positions:
	* W. Buntine, PhD/Masters Research Assistantship at Berkeley
Meetings:
	* D. Gordon, CFP: ICML-97 Workshop on ML application in the real world
		http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html
	* M. Smyth, Learning Methods Course by Hinton and Jordan,
			Washington, D.C., May 2 -- 3, 1997 
	* J. Zytkow, Forthcoming events related to Data Mining 
		 PKDD'97, ISMIS-97 and KDD-97
--
KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge 
Discovery in Databases (KDD) community, focusing on the latest research and 
applications.

Submissions are most welcome and should be emailed, 
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]
To subscribe, send an email to [email protected] with
	subscribe kdd-nuggets 
as the first line of the message (the rest of the message and the subject
are ignored).
See http://info.gte.com/~kdd/subscribe.html for details.
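For readers who prefer to script the subscription, the sketch below builds
such a request with Python's standard email package. The list-server and
sender addresses are placeholders (the real address is not reproduced here);
the resulting message could then be handed to smtplib.SMTP.send_message,
which is not shown.

```python
from email.message import EmailMessage


def build_subscribe_message(list_server: str, sender: str) -> EmailMessage:
    """Build a subscription request whose first body line is the list command."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_server
    # Per the instructions above, the subject (and anything after line 1) is ignored.
    msg["Subject"] = "subscribe"
    msg.set_content("subscribe kdd-nuggets\n")
    return msg


# Placeholder addresses; substitute the real list-server address.
request = build_subscribe_message("list-server@example.org", "you@example.org")
print(request.get_content().splitlines()[0])  # subscribe kdd-nuggets
```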

Nuggets appears approximately 3 times a month. 
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), 
and a wealth of other information on Data Mining and Knowledge Discovery 
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) * 
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is a 50-50 chance that something can go wrong, then 9
times out of ten it will. (Paul Harvey News, 1979)
 	Excerpted from "Quotes, damned quotes and..." by John Bibby.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 15:01:47 +0100
From: [email protected] (Willi Kloesgen)
Subject: KDD-97: Call for Panel Proposals

As in previous KDD conferences, the KDD-97 program will include panel 
discussions.  A great panel requires an interesting topic, good
speakers, and proper preparation.   To facilitate all three we solicit
early suggestions. Please submit suggestions for topics and, preferably,
also for panelists who could represent diverse positions on or approaches
to the topic. Suggested topics should relate to any of the main KDD-97
topics (see http://www-aig.jpl.nasa.gov/kdd97).
The panel topics should be of general interest to a large part of the KDD
audience and allow several (controversial) approaches to be discussed.

Please email informal suggestions by April 2, 1997 (earlier if possible) to:

Willi Kloesgen

[email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Edward Colet"<[email protected]>
Date: Wed, 29 Jan 1997 18:00:02 -0400
Subject: Announcing a regular posting of NBA data mining patterns.

National Basketball Association teams have been using IBM's Advanced Scout
data mining application to discover trends and patterns in game data.
Now a selected set of discovered patterns is also made available to fans
via a regular posting on the Internet before and after NBA/NBC's game of
the week.  The reported patterns are based on analyses of the teams'
previous game(s), and additional commentary is added following the game.

The patterns can be found in the regular feature of the NBA website
entitled, "Beyond the Boxscore" (found under "News and Features").  The
NBA website is at "http://www.nba.com", and the data mining results are
under "http://www.nba.com/news_feat/".  There are also links to more
information on Advanced Scout at "http://www.nba.com/ad/ibm", and at
"http://www.research.ibm.com/scout/home.html/".

Regards,
Ed Colet.

 *********************************************
 IBM T.J. Watson Research Center
 30 Saw Mill River Road
 Hawthorne  NY  10532
 phone: 914-784-6621;  tie-line 863
 fax: 914-784-7455
 email: [email protected]
 *********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 09:57:37 -0500
From: [email protected] (Gregory Piatetsky-Shapiro)
Subject: Business Week Feb 3, 1997 Story on Data Mining

Last week's Business Week has a very nice story by John Verity,
"Coaxing Meaning out of Raw Data" (p. 134).
It describes several successful customer modeling applications
at MCI, cellular fraud detection, US West, JPL, Walmart, and more,
and features quotes from Usama Fayyad, Herb Edelstein, Steven Vere,
and others.

"A huge opportunity is opening up", according to Usama, 
but "the devil really is in the details", according 
to NeoVista CEO John Harte.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 31 Jan 1997 11:41:47 -0800
From: [email protected] (Brian Griffin)
Organization: Netscape
Subject: Recommendation

Can you please recommend the best public-domain (PD) and commercial data
mining tools for quantifying newsgroup and email postings?

Thank you very much,
Brian Griffin
Manager, Technical Support
Netscape Communications Corp.

[GPS -- if you do know such tools, please cc to [email protected] and 
I will summarize to the list]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 29 Jan 1997 05:05:46 +0200
From: Michael Rebhan <[email protected]>
Organization: Weizmann Institute of Science
Subject: GeneCards: genes, proteins and diseases.

http://bioinformatics.weizmann.ac.il/cards

This database aims to integrate knowledge about all human genes, their
products, and their involvement in diseases. Although it already
integrates what is readily available in different heterogeneous databases,
the authors plan to use technology from Artificial Intelligence,
including Knowledge Discovery in Databases (KDD) tools, to expand the
current resource. We would like to hear opinions from people inside the
AI/KDD community regarding the following projects:

a) a user guidance system that recognizes problems caused by "poorly
designed" search strategies and suggests intelligent options that might
take the user to the wanted information as fast as possible (this system
should thus replace, as far as possible, an expert in the retrieval of
biomedical information).

b) knowledge extraction tools that take data from free text, such as
abstracts of papers in Medline, to gather data about the relationships
between genes/proteins (which one interacts directly with which one,
and so on) and about the role of a particular gene/protein in the
pathogenesis of a particular disease.
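One common baseline for project b) is sentence-level co-occurrence
counting: two gene/protein names that repeatedly appear in the same sentence
are flagged as candidate relationships. The Python sketch below shows that
generic technique, not GeneCards' planned method; the names GENE1 to GENE3
and the abstract text are made-up placeholders, and co-occurrence alone says
nothing about whether or how two genes actually interact.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurring_pairs(abstracts: Iterable[str], gene_names: Set[str]) -> Counter:
    """Count sentences in which two known gene/protein names appear together."""
    pairs: Counter = Counter()
    for text in abstracts:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
            found = sorted(tokens & gene_names)
            # Each unordered pair is counted once per sentence.
            for a, b in combinations(found, 2):
                pairs[(a, b)] += 1
    return pairs


# Made-up placeholder text and names, for illustration only.
abstracts = [
    "GENE1 binds GENE2 in vitro. GENE3 was not affected.",
    "GENE2 is regulated by GENE1.",
]
print(cooccurring_pairs(abstracts, {"GENE1", "GENE2", "GENE3"}))
# Counter({('GENE1', 'GENE2'): 2})
```

A real system would need a curated name dictionary with synonyms and a
better sentence splitter; the sketch only shows the counting idea.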

Although both projects are still more or less ill-defined, we are very
interested in your ideas. If you are also fascinated by this challenge,
please email Michael Rebhan ([email protected]).

Michael Rebhan, Ph.D.     Weizmann Institute of Science, Dept. Biol. Serv.,
Bioinformatics Unit, Rehovot 76100, Israel        (FAX: +972-8-934-4113)
WWW: http://bioinfo.weizmann.ac.il/cards/rebhan.html
Email: [email protected]

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 27 Jan 1997 08:48:06 -0700
From: Amit Basu <[email protected]>
Subject: cfp for INFORMS Journal on Computing

Call for Papers on Knowledge Discovery and Data Mining
for the INFORMS Journal on Computing

The knowledge and data management area of the INFORMS Journal on Computing
invites technical papers on the analysis, design and management of knowledge
discovery and data mining methods and systems. Selected papers will be
published in a special cluster on this topic. The journal is an official
publication of the Institute for Operations Research and the Management
Sciences, and focuses on the interface between operations
research/management science and computer science. Papers that deal with
algorithms for system design, methods for efficient information management,
and analytical or empirical studies of system performance are welcome.
Topics of interest include (but are not limited to):

* performance analysis of KD/DM algorithms (efficiency, scalability,
  reliability, etc.)
* the use of optimization methods in KD/DM
* comparative studies of KD/DM versus other exploratory data analysis
  methods, including traditional statistical and mathematical programming
  models
* analysis of context-specific KD/DM methods
* neural networks in KD/DM
* performance analysis of uncertainty management methods in KD/DM
* analysis of KD/DM algorithms in large-scale, distributed and/or
  heterogeneous database systems
* efficiency and scalability analysis of KD/DM algorithms for specialized
  databases (spatial, temporal, multimedia, statistical, etc.)
* analysis of data mining methods on confidential data
* efficient data preprocessing methods (e.g., scrubbing, sampling and
  reduction) for data mining
* performance of KD/DM methods on multidimensional data

Manuscripts should be prepared according to JoC guidelines. 

Deadline: July 31, 1997. Four (4) copies of each manuscript should be
submitted to Professor Amit Basu, the Area Editor for Knowledge and Data
Management, at the following address:

Owen Graduate School of Management		
Vanderbilt University				
Nashville, TN 37203				
TEL: 615-322-7043					
FAX: 615-343-7177					
email: [email protected]	

For more information, please contact Professor Basu at the above address, or
the Editor-in-Chief of JoC, Professor Bruce Golden, at the address below:

College of Business and Management
University of Maryland
College Park, MD 20742
TEL: 301-405-2232
FAX: 301-314-9157
email: [email protected]
------------------------------------------------------------------------------
Amit Basu
Associate Professor
Owen Graduate School of Management
Vanderbilt University
Nashville, TN 37203
TEL: 615-322-7043
FAX: 615-343-7177

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: IEEE Internet Computing: Agents
From: [email protected] (Munindar Singh)
Date: Wed, 29 Jan 1997 10:27:57 -0500 (EST)

			   IEEE Internet Computing
		   http://www.computer.org/pubs/internet/

			       CALL FOR PAPERS

IEEE Internet Computing is a new bimonthly magazine from the IEEE Computer
Society designed to help the engineer productively use the ever-expanding
technologies and resources of the Internet. Internet Computing and IC on-line
will provide developers and users with the latest advances in Internet-based
computer applications and supporting technologies such as the World Wide Web,
Java programming, and Internet-based agents. Through the use of peer-reviewed
articles as well as essays, interviews, and roundtable discussions, IC will
address the Internet's widening impact on engineering practice and society.

IC is soliciting regular papers and papers for theme issues, including one on
agents.  To submit, send e-mail to any member of the editorial board.
Include a plain-text abstract and a URL from which the paper can be viewed.

Members of the editorial board are listed on the IC web page.  Author
guidelines are available at http://www.computer.org/pubs/internet/auguide.htm

Topics include system engineering issues such as agents, agent
message protocols, engineering ontologies, web scaling, intelligent
search, on-line catalogs, distributed document authoring, electronic
design notebooks, electronic libraries, security, remote instruction,
distributed project management, reusable service access and validation,
electronic commerce, and Intranets.

        -----------------------------------------------------------
			    UPCOMING THEME ISSUES
                      ------------------------------

Agents:
What kinds of agents are performing useful work on the Internet? Papers
should clearly define both the applications and technologies being used as
well as the sense of "agent." Applications should be demonstrable. Issues
include security, mobility, and agent communication languages. Claims about
the efficacy of one approach or language should be supported by examples
from applications.

Editorial Board Contacts: Munindar Singh ([email protected])
                          or Michael Huhns ([email protected])
Due date: March 15, 1997

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 25 Jan 1997 10:50:41 -0800
From: Wray Buntine <[email protected]>
Subject: PhD/Masters Research Assistantship

PhD/Masters Research Assistantships

Field:  probabilistic algorithms, data analysis/mining and 
        optimization for CAD
Place:  Electrical Engineering and Computer Science
        University of California, Berkeley

The CAD group in the EECS Dept. at UC Berkeley is offering research support
for its Masters and Doctoral program.  Research areas include, but are not
limited to, the use of data mining/analysis/engineering techniques in CAD or
optimization, and probabilistic methods for optimization or specialized
compilation.

The Electronic Design Technology (EDT) field is concerned with computer
automated or computer-assisted design of complex electronic systems.  With
current hardware capabilities advancing rapidly, a key bottleneck is the
development of advanced algorithms for optimization and simulation of
partial, abstract or completed designs.  Our task is to design, code and
experiment with new algorithms, methodologies, and software technologies for
alleviating this bottleneck.  The task can include the use of data
mining/analysis to understand the nature of the optimization task or to
develop adaptive optimization methods.

The ideal candidate should have a background in computer science, electrical
engineering or related disciplines, should be an accomplished or developing
programmer, and should have an interest in the theory and mathematical
techniques used in optimization, data analysis, or probabilistic methods.
Candidates who wish to apply are invited to respond with a copy of their CV
to:

Professor R. Newton     URL:  http://www.eecs.berkeley.edu/~newton
Dr. Wray Buntine        URL:  http://www.eecs.berkeley.edu/~wray
Dr. Andrew Mayer        URL:  http://www.eecs.berkeley.edu/~mayer

Dept. of Electrical Engineering and Computer Sciences
520 Cory Hall
University of California at Berkeley
Berkeley, CA, 94720

The CAD Group           URL:  http://www-cad.eecs.berkeley.edu
EECS, UC Berkeley       URL:  http://www.eecs.berkeley.edu

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Wed, 29 Jan 97 14:44:16 EST
Subject: ICML-97 workshop CFPs

                           CALL FOR PAPERS

                  ML APPLICATION IN THE REAL WORLD: 
              METHODOLOGICAL ASPECTS AND IMPLICATIONS

                      Workshop at the Fourteenth 
                 International Conference on Machine 
                          Learning (ICML-97)
                         Nashville, Tennessee
                            July 12, 1997
 
 WWW-page: http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html


Description 
Application of Machine Learning techniques to solve real-world problems
has gained more and more interest over the last decade. In spite of this
attention, the ML application process still lacks a generally accepted
terminology, let alone commonly accepted approaches or solutions.
Several initiatives, both conferences and workshops, have addressed
this topic.
The ICML-93 workshop on ML applications by Langley and Kodratoff and
the ICML-95 workshop on 'Applying Machine Learning in Practice' by
Aha, Catlett, Hirsh and Riddle are the successful precedents of this workshop.
The focus of the ICML-95 workshop was the 'characterization of the
expertise used by machine learning experts during the course of applying
learning algorithms to practical applications'. In the last year,
significant research effort has been devoted to applications
of learning algorithms. This is reflected in the recent interest in
Data Mining and KDD, for instance in the international KDD
conferences (1995 (Montreal) and 1996 (Portland, OR)). Since the
application of ML techniques is also very relevant to the KDD community,
it is not surprising that this is also reflected in those conferences.

The workshop will build on all these events, but
will emphasise the processes underlying the application of ML in
practice. Methodological issues, as well as issues concerning the kinds
and roles of knowledge needed for applying ML, will form a major focus
of the workshop.

It aims at building upon some of the results of discussions at the
ICML-95 workshop on "Application of ML techniques in practice" 
and at the same time tries to move toward a consensus regarding a
methodology on the application of learning algorithms in practice. 

The workshop "ML Application in the real world; methodological aspects and
implications" focuses on the methodological principles underlying
successful application of ML techniques. Apart from powerful ML
algorithms, good application strategies have to be defined. This implies a
thorough understanding of the initial problem definition and its relation
to the chain of tasks that leads towards a successful solution. Therefore a
two-dimensional approach regarding the process of ML application is
needed. The first dimension deals with the whole cycle of analysing the
setting, problem definition, knowledge extraction, database interaction,
learning, evaluation and iteration in real-world domains, while the second
dimension forms an "inner loop" within this cycle, in which the problem
definition is used to refine the task at hand and map it onto available
algorithms for learning, pre- and postprocessing and evaluation of
results.
Concerning these issues there is no clear distinction between ML and KDD,
and therefore this workshop will be equally interesting for
researchers from both communities.

This workshop does not focus on (methods for) developing new algorithms.
Moreover, case studies will only contribute to the workshop discussion if
general application principles can be derived from them.


Intended Participants and Audience
The workshop primarily aims at scientists and practitioners who apply ML
and related techniques to solve problems in the real world. To attend
the workshop, one should submit a paper, a one-page extended abstract or
a statement of interest. If there is more interest than the workshop can
accommodate, the program committee will select participants on the
basis of workshop relevance. Ideally, the audience contains a mix of
university and industrial participants.


Workshop program
The program for this one-day workshop will have a maximum of 10
presentations. Some invited presentations will be part of the program.
Presentations will take 30 minutes (15-20 minutes presentation and 10-15
minutes discussion). Speakers are asked to focus their presentations on
a topic list that will be compiled during the review
process. To foster discussion and debate, each accepted paper will be
given to a critic beforehand, so that the critics are prepared to
debate the presentations. At the end of the workshop, there will be a
plenary discussion session. Accepted papers will be distributed via the
workshop WWW-page before the workshop, to stimulate the discussion.
Accepted papers will also be published in workshop proceedings.

Papers are welcomed concerning (but not limited to) the following
topics:
* Methodological approaches focusing on the process of ML application,
  or sub-processes, such as problem definition and refinement,
  application design, data acquisition, pre- and postprocessing, task
  analysis etc. 
* Making explicit the kinds and roles of knowledge that are necessary
  for execution of ML applications.
* Matching of problem definitions to specific techniques and
  multi-technique configurations.
* Impact of methodologies for empirical research on the application of
  ML-techniques. 
* Identification of the relation of different ML strategies to given
  problem types and identification of the characteristics that play a
  role in describing the initial problems.
* Embedding of the ML application process in more general methodologies
  for (knowledge) system development.
* Frameworks to support (ML-)novices and experts in setting up
  applications and reusing (parts of) previous applications.
* Case studies, describing successful ML applications, that abstract
  from the implementational aspects and focus on identification of the
  choices that are made when designing the application, i.e., the 
  (meta-)knowledge involved, etc.
* Comparison of the process of ML application with processes for
  application of related techniques (e.g. statistical data analysis).


Submission guidelines
* Submitted papers should not exceed 3500 words or 8 pages Times Roman
  12pt.
* The title page should contain paper title, author name(s), affiliations and 
  full addresses including e-mail of the corresponding author, as well as the
  paper abstract and five keywords at most.
* Papers are reviewed by at least three members of the program committee for
  their relevance to the workshop discussions.
* For preparation of the camera-ready copies, an ICML style file will be
  available.


Tentative Submission Schedule
* Submission deadline:          March 22, 1997
* Notification of acceptance:   April 9, 1997
* Camera-ready copy + PS file:  May 1, 1997
* Papers available on WWW:      June 15, 1997
* Workshop date:                July 12, 1997


Electronic paper submissions are preferred. Please send your submission
to: 
  [email protected]. 

If Postscript printing is not available, paper submissions (4 hardcopies, 
preferably double sided) can be sent to:
  ICML Workshop "ML APPLICATION IN THE REAL WORLD" 
  p/o ATO-DLO, Floor Verdenius
  Postbus 17
  6700 AA Wageningen
  Netherlands


Program Committee
Dr. Pieter Adriaans           (Syllogic, Houten, The Netherlands)
Prof. C. Brodley              (Purdue University, West Lafayette, IND, USA)
Prof. David Hand              (Open University, Milton Keynes, United Kingdom)
Prof. Yves Kodratoff          (LRI, Paris, France)
Dr. Vassilis Moustakis        (Technical University of Crete, Chania, Greece)
Prof. Gholamreza Nakhaeizadeh (Daimler Benz AG Research, Ulm, Germany)
Dr. R. Kohavi                 (Silicon Graphics, Mountain View, CA, USA)
Dr. Enric Plaza i Cervera     (IIIA-CSIC, Bellaterra, Catalonia, Spain)
Dr. Foster J. Provost         (NYNEX Science & Technology, White Plains, NY, USA)
Dr. P. Riddle                 (University of Auckland, New Zealand)
Dr. Celine Rouveirol          (LRI, Paris, France)
Prof. Derek Sleeman           (University of Aberdeen, United Kingdom)
Drs. Maarten van Someren      (SWI, Amsterdam, The Netherlands)
Prof. Rudi Studer             (University of Karlsruhe, Germany)

Organising Committee
Robert Engels                 (University of Karlsruhe, Germany)
                              [email protected]
Juergen Herrmann              (University of Dortmund, Germany)
                              [email protected]
Bob Evans                     (RR Donnelley, Gallatin TN, USA)
                              [email protected]
Floor Verdenius               (ATO-DLO, Wageningen, The Netherlands)
                              [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Learning Methods Tutorial -- Washington DC, May 1997 
Date: Sat, 1 Feb 1997 12:19:02 -0500 (EST)


        **************************************************************
        ***                                                        ***
        ***     Learning Methods for Prediction, Classification,   ***
        ***       Novelty Detection and Time Series Analysis       ***
        ***                                                        ***
        ***          Washington, D.C., May 2 -- 3, 1997            ***
        ***                                                        ***
        ***        Geoffrey Hinton, University of Toronto          ***
        ***      Michael Jordan, Massachusetts Inst. of Tech.      ***
        ***                                                        ***
        **************************************************************


A two-day intensive Tutorial on Advanced Learning Methods will be held
on May 2nd and 3rd, 1997, at the Hyatt Regency on Capitol Hill,
Washington D.C.  Space is available for up to 50 participants for the
course.

The course will provide an in-depth discussion of the large collection 
of new tools that have become available in recent years for developing 
autonomous learning systems and for aiding in the analysis of complex 
multivariate data.  These tools include neural networks, hidden Markov 
models, belief networks, decision trees, memory-based methods, as well 
as increasingly sophisticated combinations of these architectures.  
Applications include prediction, classification, fault detection, 
time series analysis, diagnosis, optimization, system identification 
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.

The course will be devoted equally to the conceptual foundations of 
recent developments in machine learning and to the deployment of these 
tools in applied settings.  Case studies will be described to show how 
learning systems can be developed in real-world settings.  Architectures 
and algorithms will be presented in some detail, but with a minimum of 
mathematical formalism and with a focus on intuitive understanding.  
Emphasis will be placed on using machine learning methods as tools that
can be combined to solve the problem at hand.

WHO SHOULD ATTEND THIS COURSE?

The course is intended for engineers, data analysts, scientists,
managers and others who would like to understand the basic principles
underlying learning systems.  The focus will be on neural network models 
and related graphical models such as mixture models, hidden Markov 
models, Kalman filters and belief networks.  No previous exposure to 
machine learning algorithms is necessary, although a degree in engineering 
or science (or equivalent experience) is desirable.  Those attending 
can expect to gain an understanding of the current state-of-the-art 
in machine learning and be in a position to make informed decisions 
about whether this technology is relevant to specific problems in 
their area of interest.

COURSE OUTLINE

Overview of learning systems; LMS, perceptrons and support vectors; 
generalized linear models; multilayer networks; recurrent networks; 
weight decay, regularization and committees; optimization methods; 
active learning; applications to prediction, classification and control

Graphical models: Markov random fields and Bayesian belief networks;
junction trees and probabilistic message passing; calculating most 
probable configurations; Boltzmann machines; influence diagrams; 
structure learning algorithms; applications to diagnosis, density 
estimation, novelty detection and sensitivity analysis

Clustering; mixture models; mixtures of experts models; the EM 
algorithm; decision trees; hidden Markov models; variations on 
hidden Markov models; applications to prediction, classification 
and time series modeling

Subspace methods; mixtures of principal component modules; factor 
analysis and its relation to PCA; Kalman filtering; switching 
mixtures of Kalman filters; tree-structured Kalman filters; 
applications to novelty detection and system identification

Approximate methods: sampling methods, variational methods; 
graphical models with sigmoid units and noisy-OR units; factorial 
HMMs; the Helmholtz machine; computationally efficient upper 
and lower bounds for graphical models

REGISTRATION

Standard Registration: $700

Student Registration:  $400

Cancellation Policy: Cancellation before Friday, April 25th, 1997,
incurs a penalty of $150.00. Cancellation after Friday, April 25th,
1997, incurs a penalty of one-half of the Registration Fee.

The Registration Fee includes course materials, breakfast, coffee breaks,
and lunch.

On-site registration is possible. On-site payment must be in US
dollars, by money order or check (preferably drawn on a US bank
account).



Those interested in participating should return the completed
Registration Form and Fee as soon as possible, as the total number of
places is limited by the size of the venue.




     Please print this form and fill in the hard copy to return by mail.

                                REGISTRATION FORM

	   Learning Methods for Prediction, Classification,
	      Novelty Detection and Time Series Analysis

		Friday, May 2 - Saturday, May 3, 1997
			Washington, D.C., USA.
		--------------------------------------
				   
                      Please complete this form (type or print)

         Name   ___________________________________________________
                Last                 First                   Middle

         Firm or Institution  ______________________________________



        Standard Registration ____         Student Registration ____



         Mailing Address (for receipt)     _________________________

         __________________________________________________________

         __________________________________________________________

         __________________________________________________________
          Country                    Phone                      FAX

         __________________________________________________________
                               email address

         (Lunch Menu - tick as appropriate):


         ___ Vegetarian                           ___ Non-Vegetarian


Fee payment must be made by MONEY ORDER or PERSONAL CHECK. All amounts
are in US dollars. Make the fee payable to Prof. Michael Jordan.
Mail it, together with this completed Registration Form, to:

Professor Michael Jordan
Dept. of Brain and Cognitive Sciences
M.I.T.
E10-034D
77 Massachusetts Avenue
Cambridge, MA 02139
USA 




HOTEL ACCOMMODATION

Hotel accommodation is the personal responsibility of each participant.
				   
                        The Tutorial will be held in

                      
		    Hyatt Regency on Capitol Hill
		      400 New Jersey Avenue, NW
			 Washington, DC 20001
		   1-800-233-1234 or (202) 737-1234


			 on May 2 -- 3, 1997.

 The hotel has reserved a block of rooms for course participants. The
                  special room rate for participants is:

                  U.S. $139.00 (Single/Double) per night + tax 

You must reserve accommodation before *April 1, 1997* to qualify for this
special rate.  Please be aware that this rate does not include State
or City taxes.



ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at 

 http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

 Marney Smyth
 E-mail: [email protected]
 Phone:  617 258-8928
 Fax:    617 258-6779

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 22:47:43 -0600
From: jan zytkow <[email protected]>

Dear Colleague:   

You may be interested in the following forthcoming events related to
machine discovery.  Please note that there is still time to submit a
paper to each of these events:

1. PKDD'97 -- 1st European Symposium on Principles of Data Mining
   and Knowledge Discovery, Trondheim, Norway, June 25-27, 1997
   Deadline for submissions: February 17

2. International Symposium on Methodologies for Intelligent Systems 
   (ISMIS-97), Charlotte, North Carolina, October 15-18, 1997
   Machine discovery and learning is a strong theme at ISMIS
   Deadline for submissions: March 1.

3. The Third International Conference on Knowledge Discovery and Data
   Mining (KDD-97), Newport Beach, California, August 14-17, 1997
   Deadline for submissions: March 10 (Cover page by March 3).

Best regards,

  -- Jan Zytkow

------------------------------------------------------------------
1.
------------------------------------------------------------------

        New deadline for submitting papers to PKDD-97

The original deadline for submitting papers to the 1997 European
Symposium on Principles of Data Mining and Knowledge Discovery was
Wednesday, February 5.  This
deadline has been extended, so that PKDD-97 papers are now due on

                   Monday, February 17, 1997

Notice of acceptance: March 17
Camera ready copies: April 4

Submit by email (preferred) to [email protected] or by airmail to 

    Jan Komorowski
    Department of Computer Systems
    Norwegian University of Science and Technology
    7034 Trondheim, Norway

Papers should be in English and not exceed ten single-spaced pages of
12pt font.  The first page should begin with title, authors,
affiliations, postal and e-mail addresses, and an abstract of about
200 words.

The proceedings of the Symposium will be published in the Springer-Verlag
Lecture Notes in AI series (www.springer.de/comp/comp.html) and will be
available at PKDD-97, June 25-27.

Watch the updated PKDD'97 WWW page for further details:
         http://www.idt.ntnu.no/pkdd97

If you have already sent off your paper but would like to resubmit by
the new deadline, please send email to [email protected]

---------------------------------------------------------------------------

             PKDD'97 -- 1st European Symposium on Principles of
                     Data Mining and Knowledge Discovery
                              Trondheim, Norway
                              June 25-27, 1997


 Program Committee

    * Pieter Adriaans
    * Attilio Giordana
    * David Hand
    * Bob Henery
    * Mikhail Kiselev
    * Willi Kloesgen
    * Yves Kodratoff
    * Jan Komorowski
    * Heikki Mannila
    * Marjorie Moulet
    * Steve Muggleton
    * Zdzislaw Pawlak
    * Gregory Piatetsky-Shapiro
    * Zbigniew Ras
    * Lorenza Saitta
    * Erik Sandewall
    * Wei-Min Shen
    * Arno Siebes
    * Andrzej Skowron
    * Derek Sleeman
    * Shusaku Tsumoto
    * Raul Valdes-Perez
    * Rudiger Wirth
    * Stefan Wrobel
    * Wojtek Ziarko
    * Jan Zytkow

 Introduction

Data Mining and Knowledge Discovery (KDD) have recently emerged from a
combination of many research areas: databases, statistics, machine
learning, automated scientific discovery, inductive programming,
artificial intelligence, visualization, decision science, and high
performance computing.

While each of these areas can contribute in specific ways, KDD focuses
on the value that is added by the creative combination of the
contributing areas. The goal of PKDD'97 is to provide a European-based
forum for interaction among all theoreticians and practitioners
interested in data mining. Fostering interdisciplinary collaboration is
one desired outcome, but the main long-term focus is on theoretical
principles for the emerging discipline of KDD, especially those new
principles that go beyond each of the contributing areas.

To promote these goals, PKDD'97 will be organized into tracks around
the key areas contributing to KDD. For each area, an ideal paper should
focus on how its methods advance KDD's goals and principles.

Both theoretical and applied submissions are sought. Reviewers will
assess the contribution towards the main goals of PKDD'97, in addition
to the usual requirements of novelty, clarity and significance. Applied
papers should go beyond an individual application, presenting an
explicit method that promises a degree of generality within some stage
of the discovery process, such as preprocessing, mining, visualization,
use of prior knowledge, knowledge refinement, and evaluation.
Theoretical papers should demonstrate how they advance the process of
data mining and knowledge discovery.

------------------------------------------------------------------
2.
------------------------------------------------------------------
    ****   C A L L     F O R    P A P E R S  ****

TENTH INTERNATIONAL SYMPOSIUM ON
METHODOLOGIES FOR INTELLIGENT SYSTEMS (ISMIS'97)

Hilton Hotel, Charlotte, North Carolina
October 15-18, 1997

SPONSORS
UNC-Charlotte, Oak Ridge National Laboratory, Univ. of Warsaw, and others.

PURPOSE OF THE SYMPOSIUM 
This Symposium is intended to attract individuals who are actively 
engaged in both theoretical and practical aspects of intelligent systems. 
The goal is to provide a platform for a useful exchange between 
theoreticians and practitioners, and to foster the cross-fertilization 
of ideas in the following areas:
   * Evolutionary Computation 
   * Intelligent Information Systems
   * Learning and Knowledge Discovery
   * Knowledge Representation and Integration
   * Logic for Artificial Intelligence
   * Robotics, Motion and Machine Vision
   * Soft Computing 
   * Methodologies (modeling, design, validation, performance evaluation).
In addition, we solicit papers dealing with Applications of Intelligent
Systems in complex/novel domains, e.g. human genome, global change,
manufacturing, health care, etc.


SYMPOSIUM CHAIRS 
   Francois G. Pin (Oak Ridge National Lab.)
   Zbigniew W. Ras (UNC-Charlotte & Polish Acad. Sci.) 
   Andrzej Skowron (U. Warsaw, Poland)
 
PROGRAM COMMITTEE
Luigia Carlucci Aiello (U. Roma, Italy)
Thomas Baeck (Inf. Centrum Dortmund & U. Leiden, The Netherlands)
Alan Biermann (Duke Univ.)
Jacques Calmet (U. Karlsruhe, Germany)
Jaime Carbonell (CMU)
Wesley Chu (UCLA)
Kenneth DeJong (GMU) 
Robert Demolombe (CERT/ONERA, France)
Jon Doyle (MIT)
Toshio Fukuda (Nagoya U., Japan)
Attilio Giordana (U. Torino, Italy)
Diana Gordon (Naval Research Lab.)
Mirsad Hadzikadic (Carolinas HealthCare System)
Jiawei Han (Simon Fraser U., Canada)
David Hislop (Army Research Office)
Matthias Jarke (RWTH Aachen, Germany)
John Y. Jiang (Pacific Bell Lab.)
Willi Kloesgen (GMD, Germany)
Yves Kodratoff (U. Paris VI, France)
Jan Komorowski (U. Trondheim, Norway)
Alberto Martelli (U. Torino, Italy)
Robert Meersman (U. Brussels, Belgium)
Zbigniew Michalewicz (UNC-Charlotte & Polish Acad. Sci.)
Ryszard Michalski (GMU & Polish Acad. Sci.)
Jack Minker (U. Maryland)
Ephraim Nissan (U. Greenwich, UK)
Lin Padgham (RMIT U., Australia) 
Rohit Parikh (CUNY)
Lynne Parker (ORNL) 
Gregory Piatetsky-Shapiro (GTE Lab.)
Henri Prade (U. Paul Sabatier, France)
Luc De Raedt (U. Leuven, Belgium)
Marek Rusinkiewicz (MCC)
Lorenza Saitta (U. Torino, Italy)
Erik Sandewall (Linkoping U., Sweden)
Yoav Shoham (Stanford U.)
Richmond Thomason (U. Pittsburgh)
Jing Xiao (UNCC)
Carlo Zaniolo (UCLA)
Gian Piero Zarri (CNRS, France)
Maria Zemankova (NSF)
Jan M. Zytkow (Wichita State U. & Polish Acad. Sci.)

INVITED TALKS
Alan Biermann (Duke Univ.)
  "Multimedia Dialogue:  Theory and Practice"
Jaime Carbonell (CMU)
  "Automated Text Summarization" or "Learning from the WEB"
Wesley Chu (UCLA)
  "A knowledge-based multimedia medical distributed database system"
Michael Lowry (NASA Ames)
  "V&V of AI systems that control deep-space spacecraft"
Gregory Piatetsky-Shapiro (GTE Lab.)
  "Data Mining and Knowledge Discovery: The Second Generation"
Gio Wiederhold (Stanford U.)
  "Achieving scalability through an Ontology Algebra"

ORGANIZING COMMITTEE
Brian Bachman (First Union)
Mirsad Hadzikadic (Carolinas HealthCare System)
Karen Harber (ORNL)
Mieczyslaw Klopotek (Polish Acad. Sci.)
M.S. Narasimha (IBM-Charlotte)
Zbigniew W. Ras (UNC-Charlotte)

PAPER SUBMISSION 
Authors are invited to submit four copies of their manuscript 
(maximum 12 pages) to one of the addresses below: 

     Papers from US and Canada:      Papers from Europe: 
     Francois G. Pin, ISMIS'97       Andrzej Skowron, ISMIS'97
     ORNL, Bldg. 7601, M.S. 6305     Univ. of Warsaw
     P.O. Box 2008                   Dept. of Mathematics
     Oak Ridge, TN 37831-6305        Banacha 2
     e-mail: [email protected]            PL-02-097 Warsaw, POLAND
     fax: 423-574-4624               e-mail: [email protected]
     tel: 423-574-6130               tel: 48-(22)-658-3449

                 All other papers:
                 Zbigniew W. Ras, ISMIS'97
                 Univ. of North Carolina
                 Dept. of Comp. Science
                 Charlotte, N.C. 28223
                 e-mail: [email protected]
                 fax: 704-547-3516
                 tel: 704-547-4567

Submissions should include a title page (1 copy) specifying the
title, all authors with their affiliations, abstract (100-200 words), 
up to 10 keywords (begin the keyword list with at least one of the 
ISMIS areas listed above); and the preferred address of the contact 
author, including a telephone number, fax number, and e-mail address 
(if available). The remainder of the paper can include up to 11 pages, 
attached to the title page.
If possible, the title page should ADDITIONALLY be submitted via email 
(in plain text) to <[email protected]> to facilitate submission processing.

IMPORTANT DATES
Submission of Papers:    March 1, 1997 
Acceptance Notification: May 25, 1997 
Final Paper:             July 1, 1997

PUBLICATION
Papers accepted for Regular Sessions will be published by 
Springer-Verlag in LNCS/LNAI.  
Poster Session proceedings will be published by Oak Ridge 
National Laboratory.
Both proceedings will be available at the symposium.

WWW
Information about ISMIS'97 can be found on 
  http://www.ipipan.waw.pl/~klopotek/ismis97.html


------------------------------------------------------------------
3.
------------------------------------------------------------------

		   The Third International Conference on
		Knowledge Discovery and Data Mining (KDD-97)

			August 14-17, 1997
		Newport Beach, California, U.S.A.

Sponsored by the American Association for Artificial Intelligence
----------------------------------------------------------------------------

Call for Papers

The rapid growth of data and information has created a need and 
an opportunity for extracting knowledge from databases, and both 
researchers and application developers have been responding to that need.
Knowledge discovery in databases (KDD), also referred to as data mining, is
an area of common interest to researchers in machine discovery, statistics,
databases, knowledge acquisition, machine learning, data visualization, high
performance computing, and knowledge-based systems.  KDD applications have 
been developed for astronomy, biology, finance, insurance, marketing, 
medicine, and many other fields.

The third international conference on knowledge discovery and
data mining (KDD-97) will follow up on the success of KDD-95 and KDD-96 
by bringing together researchers and application developers from 
different areas, focusing on unifying themes.

Suggested Topics

The topics of interest include, but are not limited to:

Theory and Foundational Issues in KDD

   * Data and knowledge representation for KDD
   * Probabilistic modeling and uncertainty management in KDD
   * Modeling of structured, unstructured and multimedia data
   * Fundamental advances in search, retrieval, and discovery methods
   * Definitions, formalisms, and theoretical issues in KDD

Data Mining Methods and Algorithms

   * Algorithmic complexity, efficiency and scalability issues in data
     mining
   * Probabilistic and statistical models and methods
   * Using prior domain knowledge and re-use of discovered knowledge
   * Parallel and distributed data mining techniques
   * High dimensional datasets and data preprocessing
   * Unsupervised discovery and predictive modeling

KDD Process and Human Interaction

   * Models of the KDD process
   * Methods for evaluating subjective relevance and utility
   * Data and knowledge visualization
   * Interactive data exploration and discovery
   * Privacy and security

Applications

   * Data mining systems and data mining tools
   * Application of KDD in business, science, medicine and engineering
   * Application of KDD methods for mining knowledge in text, image, audio,
     sensor, numeric, categorical or mixed format data
   * Resource and knowledge discovery using the Internet

This list of topics is not intended to be exhaustive, but rather to indicate
typical topics of interest. Prospective authors are encouraged to submit
papers on any topics of relevance to knowledge discovery and data mining.

Demonstration Sessions

KDD-97 also invites working demonstrations of discovery systems. 
Contact information for details is provided below.

Submission and Review Criteria

Both research and applications papers are solicited. All submitted papers
will be reviewed on the basis of technical quality, relevance to KDD,
novelty, significance, and clarity. Authors are encouraged to make their
work accessible to readers from other disciplines by including a carefully
written introduction. Papers should clearly state their relevance to KDD.

Please submit 7 hardcopies of a short paper (a maximum of 9
single-spaced pages not including cover page and bibliography, 1 inch
margins, and 12pt font) to be received by March 10, 1997.  A cover
page must include each author's full address and email, the paper title,
a 200-word abstract, and up to 5 keywords. This cover page must accompany
the paper. In addition, an ASCII version of the cover page must be
submitted electronically by March 3, 1997 (earlier if possible),
preferably using a WWW form located at
http://www-aig.jpl.nasa.gov/kdd97/.  If the WWW form cannot be used,
please submit the ASCII cover page by email to
[email protected], using the template available at
http://www-aig.jpl.nasa.gov/kdd97/.

Please mail the 7 hardcopies of the full papers to:

     AAAI (KDD-97)
     445 Burgess Drive
     Menlo Park, CA 94025-3496 USA
     Phone: (+1 415) 328-3123
     Fax: (+1 415) 321-4457
     Email: [email protected]
     Web Site: http://www.aaai.org

Important Dates

   * Submissions Due: March 10, 1997 
   * Acceptance Notice: April 28, 1997 
   * Camera-ready paper due: May 26, 1997


KDD-97 Organization
-------------------

General Conference Chair

	Ramasamy Uthurusamy (General Motors Corporation, USA)     

Program Co-Chairs

	David Heckerman (Microsoft Research, USA)
	Heikki Mannila (University of Helsinki, Finland)
	Daryl Pregibon (AT&T Research, USA)

Publicity Chair

	Paul Stolorz (Jet Propulsion Laboratory, USA)     

Tutorial Chair

        Padhraic Smyth (UC Irvine, USA)

Demo and Poster Sessions Chair

	Tej Anand (NCR Corporation, USA)

Awards Chair

        Gregory Piatetsky-Shapiro (GTE Laboratories, USA)

Panel Chair

        Willi Kloesgen (GMD, Germany)


Contact Information
-------------------

For further information, send inquiries regarding

   * submission logistics to AAAI at [email protected]
     Phone: (+1 415) 328-3123
     Fax: (+1 415) 321-4457

   * KDD-97 sponsorship and industry participation to
     Ramasamy Uthurusamy  [email protected]
     Phone: 810-696-0669
     Fax:   810-696-0580

   * technical program and content to [email protected]

   * demo and poster sessions to [email protected]

   * general and publicity issues to [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.1597:06IJSAPL::OLTHOFSpellchecked Henry AlthoughWed Feb 12 1997 22:35710
	Knowledge Discovery Nuggets 97:06, e-mailed 97-02-12
News:
	* E. Colet, ESPN to regularly show the application of data mining
		http://www.nba.com/allstar97/asgame/beyond.html
	* K. Parsaye, IDI Press Release: "Bridge Between OLAP and Data Mining"
Publications:
	* R. Greiner, CLNL 4: Computational Learning Theory and Natural 
		Learning Systems, v. IV: Making Learning Systems Practical,
		http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html
	* R. Kohavi, MLJ Spec Issue on Applications of Machine Learning
                 and the Knowledge Discovery Process, deadline: March 4.
		http://reality.sgi.com/ronnyk/mljapps/
Positions:
	* H. Mannila, Postdoctoral position in data mining / 
 		pattern matching / spatial data,  
		http://www.cs.helsinki.fi/~mannila
	* F. Provost, KB system developer positions at NYNEX 
		Science and Technology
Meetings:
	* S. Cartmell, PADD 97 update -- 
		http://www.demon.co.uk/ar/PADD97/
	* B. Zupan, IDAMAP-97: Reminder and brief Second CFP
	* G. Widmer, ECML'97 - Papers & Registration Info
		http://is.vse.cz/ecml97/home.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the Data 
Mining and Knowledge Discovery in Databases (KDD) community, focusing on 
the latest research and applications.

Submissions are most welcome and should be emailed
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected].
To subscribe, send email to [email protected] with the message
	subscribe kdd-nuggets 
in the first line (the rest of the message and subject are ignored). 
See http://info.gte.com/~kdd/subscribe.html for details.

Nuggets is published approximately 3-4 times a month. 
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), 
and a wealth of other information on Data Mining and Knowledge Discovery 
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) * 
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Arguing with engineers is like mud-wrestling with pigs.
Sooner or later you'll realize that they like it.
		Thanks to Tom Lanning
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Edward Colet"<[email protected]>
Date: Tue, 11 Feb 1997 16:11:17 -0400
Subject: "ESPN to regularly show the application of data mining"

On Sunday mornings from 9:00-9:30 (EST), ESPN will regularly broadcast
a show called "NBA Matchups presented by IBM".   The show will feature
in-depth analysis of player and team match-ups based on trends and patterns
found by Advanced Scout that pertain to the National Basketball Association
(NBA) game of the week. The game of the week is aired later that afternoon
on NBC.   Bob Hill (former coach of the San Antonio Spurs), Fred "Mad Dog"
Carter and Mark Jones (both of ESPN) are the hosts, and an invited guest
will round out the panel (last week's guest was Red Auerbach). As some of you
may know, several NBA coaches have been using IBM's Advanced Scout data
mining application to discover trends and patterns in game data. Advanced
Scout is also the basis for the "Beyond the Box Score" feature on the 
NBA website (www.nba.com; look under "News and Features" 
if you don't see it on the home page).

Thanks,
Ed Colet.

 *********************************************
 IBM T.J. Watson Research Center
 30 Saw Mill River Road
 Hawthorne  NY  10532
 phone: 914-784-6621;  tie-line 863
 fax: 914-784-7455
 email: [email protected]
 *********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 5 Feb 1997 10:10:31 -0800
From: [email protected] (IDI)
Subject: OLAP & DM Press Release
************************************************************************
Special Release

CONTACT: IDI MARKETING COMMUNICATIONS  (310) 937-3600

Breakthrough Merges 
OLAP and Data Mining

The Bridge Between OLAP and Data Mining 
Impacts all Corporate Decision Support Plans

Los Angeles -- January 27, 1997

The 2nd Annual Data Mining Summit in San Francisco, California on February
19, 1997 is likely to be remembered as the event at which On-Line Analytical
Processing (OLAP) and data mining came together for the first time and took
uniform shape.

Until now, most corporations have considered data mining and OLAP to be
separate and disparate components of their decision support systems,
because no coherent theory or methodology existed to relate them. The
1997 Data Mining Summit will bridge this gap and will forever change the way
corporations view and use decision support systems.

In his keynote address at the Summit, Dr. Kamran Parsaye, CEO of
Information Discovery, Inc., will introduce a fundamentally new theory and
methodology for connecting OLAP and data mining, showing that they must be
merged in order to avoid incorrect and misleading results during data analysis.

"The bridge between OLAP and data mining is not a luxury but a necessity,"
said Dr. Parsaye. "OLAP analyses and data mining need to be performed
together if we are to trust the results from either," he added. "In the early
days of relational databases, before normalization theory was introduced,
people were getting incorrect results. Now, unless OLAP and data mining are
performed together, a similar situation can prevail," he said.

The keynote address will show that whenever data analysis takes place, it
happens within some "dimension", and that data mining along a single axis is
merely a rough approximation of multi-dimensional mining. Lack of attention
to dimensionality in data mining can lead to unexpected results, and
decision support errors can take a long time to be uncovered -- if ever. A
companion paper in the February issue of Database Programming and Design
magazine details examples of this phenomenon and outlines a uniform approach
for dealing with both OLAP and data mining.

At the keynote, Dr. Parsaye will also describe how OLAP and data mining fit
together in the context of the Four Spaces of Decision Support. This
methodology for applying OLAP data mining comprises three distinct processes,
episodic, strategic and continuous mining, aimed at specific user groups
within corporate environments.

"Integration between OLAP and data mining cannot take place at the desktop
level and must be performed on the server," said Dr. Parsaye. "IS departments
that hand their users OLAP data to be mined on the desktop could be
unknowingly getting their users into serious trouble," he said.

The impact of the new result on corporate planning for decision support and
data warehousing can be significant. Business users and IS departments can
no longer just consider an OLAP product and a separate data mining system
but will need to consider both at once to avoid the pitfalls outlined in the
keynote. This will also accelerate the use of products for both OLAP and
data mining.

For more information on the Data Mining Summit, please visit
http://www.dbsummit.com on the Internet, or call (415) 905-2267. For more
information on Information Discovery, Inc., please visit
http://www.datamining.com on the Internet.

[note: any comments from readers on appropriateness of posting 
commercial press releases such as above? GPS] 

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 3 Feb 1997 18:56:37 -0500 (EST)
From: Russell Greiner <[email protected]>
Subject: CLNL v4 is here!

We are pleased to announce that the book

  "Computational Learning Theory and Natural Learning Systems
   Volume IV: Making Learning Systems Practical"
  
   (ed. Russell Greiner, Thomas Petsche, and Stephen Jose Hanson)

is now available from MIT Press; see 
   http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html
for details.

Cheers,
	Russ Greiner

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 4 Feb 1997 23:23:12 -0800
From: Ronny Kohavi <[email protected]>
Subject: CFP: Special Machine Learning issue on applications of ML

This is a short reminder that the submission deadline for the special
issue of Machine Learning is in a few weeks.  For more information, see

                  http://reality.sgi.com/ronnyk/mljapps/

*** Submission deadline: 4 Mar 1997

_____________________________________________________________________________
                           Machine Learning 

                           Special Issue on
                   Applications of Machine Learning
                 and the Knowledge Discovery Process
				   
	    Guest editors: Ronny Kohavi and Foster Provost

   With the explosion in size of business and scientific databases
   (VLDBs), the opportunities and pressure to mine the data and make
   novel discoveries have increased dramatically.  For many problems,
   basic statistical summaries are not sufficient and there is a clear
   and recognized need for solutions involving a machine learning
   component. For example, modern businesses constantly seek to gain
   competitive advantage by tailoring actions to different customer
   segments and avoiding the trap of targeting the "average customer."

   This special issue of the journal Machine Learning will be dedicated
   to papers describing work in which machine learning technologies have
   been applied to solve significant real-world problems.  In particular,
   it will focus on the application of Machine Learning technology, the
   simplifying assumptions that *cannot* be made in a real-world
   application, and the processes that are involved in going from the raw
   data to the final knowledge that decision makers seek.
_____________________________________________________________________________

  Ronny Kohavi and Foster Provost

    [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Wed, 12 Feb 1997 09:58:29 +0500
Subject: 3 positions at NYNEX S&T

    KB system developer positions at NYNEX Science and Technology

The Integrated Network Services Testing & Analysis (INSTA) group at
NYNEX Science & Technology has three openings for knowledge-based (KB)
diagnostic system developers. The group is involved in building
monitoring, testing and diagnostic systems using state-of-the-art AI
technologies for advanced Telecom networks and circuits. The group has
been building systems that support complete testing and diagnosis of
circuits, both from the central office and in the field. Systems
already built and deployed test and diagnose residential telephone
lines and some of the business services. In addition to these, the
group is currently looking at ISDN and broadband services.

The selected candidate would work on one or more of the following
projects: 

- Building a KB system to assist field technicians in testing and
  troubleshooting faults in telecom circuits. The candidate will also
  explore complementing this KB with the KB that performs centralized
  testing from the Central Office.

- Building a KB system for automated centralized testing and diagnosis of
  Special (Business) service circuits.

- Building monitoring, testing, and diagnostic systems for broadband
  circuits. 

- Building an intelligent interactive assistant to aid testers in
  testing and diagnosing circuits. 



Suitable candidates must have the following:
===========================================

- Background in Computer Science or Computer Engineering or Electrical
  Engineering.

- Experience in all aspects of building knowledge-based systems
  including knowledge acquisition, knowledge engineering, domain and
  task modeling, testing, validation, and evaluation of the
  knowledge-based systems. 

- Good understanding of various AI techniques such as model-based
  reasoning, case-based reasoning, neural nets, and machine learning.

- Good analytical skills.

- Quick learner - to quickly acquire relevant domain knowledge.

- Good system-building experience.


Experience with the following would be a plus:
==============================================

- Knowledge of data-analysis tools (e.g., statistical tools)

- Unix, C, C++, LISP, ARTIM, CLIPS...

- Distributed client-server architectures

- Databases, data warehousing

- Telecom experience: Operation Support Systems, Residential Lines,
  Special services, broadband services, telecom network and circuit
  testing, alarm monitoring, etc.



If interested, please mail a hard copy of your resume to:

Yuling Wu
NYNEX Science & Technology
400 Westchester Av.
White Plains, NY 10604

or email the PostScript version to:

[email protected]

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 7 Feb 1997 14:21:24 +0200 (EET)
From: Heikki Mannila <[email protected]>
Subject: Postdoc position in Helsinki: data mining / pattern matching 
/ spatial data

                Postdoctoral position in
      data mining / pattern matching / spatial data

                 University of Helsinki
             Department of Computer Science

The pattern matching and data mining group in the Department of Computer
Science, University of Helsinki, has an opening for a postdoc researcher
in the areas of data mining, pattern matching, or spatial data.

The research group combines methods from pattern matching, statistics,
and databases to develop methods for the analysis of large data sets.
The group does theoretical and applied research.  Currently, special
emphasis is given to work related to bioinformatics and geoinformatics.
The group is one of the leading groups in data mining and string matching.

For further information, see

        http://www.cs.helsinki.fi/~mannila
        http://www.cs.helsinki.fi/~ukkonen

Applicants should have a recent Ph.D.  or equivalent.  The appointment
is initially for one year, starting from September 1997.

Applications should contain a curriculum vitae, a list of three referees
and a letter addressing the applicant's suitability for the position.
Applications and inquiries may be submitted by email to

[email protected] or [email protected]

before February 28, 1997.


Heikki Mannila          Esko Ukkonen
                                             
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 5 Feb 1997 18:21:51 +0000
From: Steve Cartmell <[email protected]>
Subject: PA EXPO97 UPDATE

                                    PRACTICAL APPLICATION EXPO97
                                    ============================
                                         CONFERENCE UPDATE
                                         =================

Westminster Central Hall, London, 21-25 April, 1997

The Practical Application EXPO97 brings together four events under one
roof: PAAM97 - The Practical Application of Intelligent Agents and
Multi-Agents; PADD97 - The Practical Application of Knowledge Discovery and
Data Mining; PACT97 - The Practical Application of Constraint Technology;
and PAP97 - The Practical Application of Prolog.

PLEASE VISIT OUR RECENTLY UPDATED WEB PAGES FOR FURTHER INFORMATION ON

Tutorials
Invited Talks
Exhibition
Venue
Hotel reservations
Registration

http://www.demon.co.uk/ar/Expo97/
http://www.demon.co.uk/ar/PAP97/
http://www.demon.co.uk/ar/PACT97/
http://www.demon.co.uk/ar/PAAM97/
http://www.demon.co.uk/ar/PADD97/



The Practical Application Company
PO Box 137
Blackpool
Lancs FY2 9UN
UK
Tel:  +44 (0)1253 358081
Fax: +44 (0)1253 353811
email:  [email protected]
WWW:  http://www.demon.co.uk/ar/TPAC/

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Blaz Zupan <[email protected]>
Subject: IDAMAP-97: Reminder and brief Second CFP
Date: Wed, 5 Feb 1997 10:35:20 +0100 (MET)

                 Reminder and brief Second Call for Papers for

                                  IDAMAP-97
            INTELLIGENT DATA ANALYSIS IN MEDICINE AND PHARMACOLOGY
                          Saturday, August 23, 1997

                          Workshop W15 at IJCAI-97
                    August 23-29, 1997, Nagoya, Japan


The paper submission deadline is March 3, 1997. Submit 8-12 page papers by
e-mail (PostScript) and three hard copies by surface mail to:

  Nada Lavrac, Blaz Zupan
  J. Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
  email:  [email protected]

For up-to-date workshop information please check:
http://www-ai.ijs.si/ailab/activities/idamap97.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 10 Feb 1997 18:00:38 +0100 (MET)
From: Gerhard Widmer <[email protected]>
Subject: ECML'97 - Papers & Registration Info
-------------------------------------------------------------------------

     NINTH EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97)

             Prague, Czech Republic, April 23-26 1997

       ******************************************************
       ECML'97: LIST OF ACCEPTED PAPERS and REGISTRATION INFO
       ******************************************************
-------------------------------------------------------------------------

The list of accepted papers, INCLUDING ALL ABSTRACTS, is now
available from the ECML-97 WWW home page:

   http://is.vse.cz/ecml97/home.html

This page also gives access to
- the 4 post-conference ECML/MLNet WORKSHOPS and
- ECML-97 REGISTRATION INFORMATION and the ECML REGISTRATION FORM.
A preliminary version of the CONFERENCE PROGRAMME will be available soon.

For further questions about the program, contact Gerhard Widmer
at [email protected]; for questions regarding registration,
contact the local organizers at [email protected].

For those without access to the WWW, please find below
- titles and contact addresses for the 4 MLNet workshops,
- the list of papers (w/o abstracts),
- an ascii version of the registration form.


------------------------------------------------------------------

ECML / MLNet WORKSHOPS (Saturday, April 26):

WS 1:		Data-Driven Learning of Natural Language Processing Tasks
Contact:	Walter Daelemans,
		P.O. BOX 90153, NL-5000 LE Tilburg, The Netherlands.
		Tel: +31 13 4663070, Fax: +31 13 4663110,
		E-mail: [email protected]
WS1 WWW Page:	http://www.cs.unimaas.nl/ecml97/

WS 2:		Case-Based Learning: Beyond Classification of Feature Vectors
Contact:	Dietrich Wettschereck,
		GMD, FIT.KI, Schloss Birlinghoven,
		53754 Sankt Augustin, Germany
		Tel: +49-2241-14-2097, Fax: +49-2241-14-2072,
		E-mail: [email protected]
WS2 WWW Page:	http://nathan.gmd.de/persons/dietrich.wettschereck/ecmlws.html

WS 3:		Learning in Dynamically Changing Domains:
		Theory Revision and Context Dependence Issues
Contact:	Gholamreza Nakhaeizadeh,
		Research Center of Daimler-Benz AG, Ulm, Germany
		E-mail: [email protected]
WS3 WWW Page:	http://www.amsta.leeds.ac.uk/statistics/ecml97/dyn.htm

WS 4:		Machine Learning and Human-Agent Interaction
Contact:	Michael Kaiser,
		Institute for Real-Time Computer Systems & Robotics
		University of Karlsruhe, Kaiserstrasse 12,
		D-76128 Karlsruhe, Germany
		E-Mail: [email protected]
WS4 WWW Page:	http://wwwipr.ira.uka.de/events/hai97/

Common dates for all workshops:
	Deadline for submissions:	February 15
	Notification of acceptance:	March 8
	Camera-ready copy due:		April 1

------------------------------------------------------------------

	PAPERS ACCEPTED FOR PRESENTATION AT ECML'97:


INVITED TALKS / PAPERS:

Learning Complex Probabilistic Models (tentative title)
   Stuart J. Russell, University of California, Berkeley, USA

Constructing and Sharing Perceptual Distinctions
   Luc Steels, Free University of Brussels (VUB) and
   Sony Computer Science Laboratory, Paris

On Prediction by Data Compression
   Paul Vitanyi, CWI, Amsterdam
   Ming Li, City University of Hong Kong



LONG TALKS/PAPERS:

   Induction of Feature Terms with INDIE
	Eva Armengol & Enric Plaza, IIIA, Barcelona, Spain

   Integrated Learning and Planning Based on Truncating Temporal Differences
	Pawel Cichosz, Warsaw University of Technology, Warsaw, Poland

   Theta-subsumption for Structural Matching
	Luc De Raedt, Katholieke Universiteit Leuven, Belgium
	Peter Idestam-Almquist, Stockholm University, Sweden
	Gunther Sablon, Katholieke Universiteit Leuven, Belgium

   Constructing Intermediate Concepts by Decomposition of Real Functions
	Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko
	University of Ljubljana and Jozef Stefan Institute, Ljubljana, Slovenia

   Conditions for Occam's Razor Applicability and Noise Elimination
	Dragan Gamberger, Rudjer Boskovic Institute, Zagreb, Croatia
	Nada Lavrac, Jozef Stefan Institute, Ljubljana, Slovenia

   Learning Different Types of New Attributes by Combining
   the Neural Network and Iterative Attribute Construction
	Yuh-Jyh Hu, University of California, Irvine, USA

   Finite-Element Methods with Local Triangulation Refinement
   for Continuous Reinforcement Learning Problems
	Remi Munos, CEMAGREF, Antony, France

   Compression-based Pruning of Decision Lists
	Bernhard Pfahringer, University of Waikato, New Zealand

   NeuroLinear: A System for Extracting Oblique Decision Rules
   from Neural Networks
	Rudy Setiono & Huan Liu, National University of Singapore

   Model Combination in the Multiple-data-batches Scenario
	Kai Ming Ting, University of Waikato, New Zealand
	Boon Toh Low, Chinese University of Hong Kong

   Natural Ideal Operators in Inductive Logic Programming
	Fabien Torre & Celine Rouveirol, LRI, Paris, France

   Ibots Learn Genuine Team Solutions
	Cristina Versino & Luca Maria Gambardella, IDSIA, Switzerland

   Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
	Ricardo Vilalta, Gunnar Blix, Larry Rendell,
	University of Illinois at Urbana-Champaign, USA



SHORT TALKS/PAPERS:

   Exploiting Qualitative Knowledge to Enhance Skill Acquisition
	Cristina Baroglio, Universita di Torino, Italy

   Classification by Voting Feature Intervals
	Gülsen Demiröz & H. Altay Güvenir,
	Bilkent University, Ankara, Turkey

   Metrics on Terms and Clauses
	Alan Hutchinson, King's College, London, UK

   Learning When Negative Examples Abound
	Miroslav Kubat, Robert Holte, Stan Matwin, University of Ottawa, Canada

   A Model for Generalization based on Confirmatory Induction
	Nicolas Lachiche, INRIA Lorraine, France
	Pierre Marquis, Universite d'Artois, France

   Learning Linear Constraints in Inductive Logic Programming
	Lionel Martin & Christel Vrain, Universite d'Orleans, France

   Inductive Genetic Programming with Decision Trees
	Nikolay Nikolaev, American University in Bulgaria
	Vanyo Slavov, New Bulgarian University, Sofia, Bulgaria

   Parallel and Distributed Search for Structure in Multivariate Time Series
	Tim Oates, Matthew Schmill, Paul Cohen
	University of Massachusetts, Amherst, USA

   Probabilistic Incremental Program Evolution: Stochastic Search
   Through Program Space
	Rafal Salustowicz & Jürgen Schmidhuber, IDSIA, Switzerland

   The GRG Knowledge Discovery System:
   Design Principles and Architectural Overview
	Ning Shan, Macro International Inc., Calverton, MD, USA
	Howard Hamilton & Nick Cercone, University of Regina, Canada

   Learning and Exploitation do not Conflict under Minimax Optimality
	Csaba Szepesvari, University of Szeged, Hungary

   Search-based Class Discretization
	Luis Torgo & Joao Gama, University of Porto, Portugal

   A Case Study in Loyalty and Satisfaction Research
	Koen Vanhoof, Josee Bloemer, K. Pauwels
	Limburgs Universitair Centrum, Belgium


---------------------------------------------------------------------

                    REGISTRATION FORM - ECML 97
                                 
                  (Deadline: March 25, 1997)

                                 
TO BE FAXED to (42-2) 6731 0503   OR MAILED to:  Action M Agency,
                                                Vrsovicka 68
                                                101 00 - Praha 10
                                                Czech Republic

Please note that after March 1, 1997, the country code (42)
will change to (420).
                        
          
                  FILL IN CAPITAL LETTERS, PLEASE


last name:                              first name:

Prof./Dr./Mr./Ms.:                      affiliation:

university/dept.:

street:                                 town:

postal code:                            country:

phone:                                  fax:

e-mail:


name of accompanying  person(s):

date (time) of arrival:                 date of departure:

number of nights:

I will attend workshop:  1.   2.   3.   4.  (tick, please)

ACCOMMODATION:  Krystal Hotel (conference site)
                or an individual choice, up to a price per night of:

Room:  single  double    NAME OF PERSON SHARING THE ROOM:
                                                                  
                                                                  
special needs  (vegetarian, disabled, etc.):

CONFERENCE FEES:                 BEFORE / AFTER FEBRUARY 20, 1997
CONFERENCE FEE (APRIL 23-25)     DM 270.00 / 320.00
MLNet WORKSHOP FEE (APRIL 26)    DM  35.00 /  35.00
ACCOMPANYING PERSON FEE          DM  80.00 / 100.00
ACCOMMODATION DEPOSIT:           DM 150.00
ACCOMMODATION BALANCE:
(COST OF ALL NIGHTS MINUS THE DEPOSIT)

SOCIAL PROGRAM:
SIGHTSEEING TOUR OF PRAGUE       DM 25.00
TA FANTASTIKA THEATRE            DM 27.00

TRIP & FAREWELL PARTY            DM 65.00

TOTAL AMOUNT:

                      PAYMENT BY CREDIT CARD:

 AMEX     VISA      Master Card / Eurocard      JCB      Diners Club

Number:

Expires:        /          Four-digit code (for AMEX cards only):   /   /   /   /


I, the undersigned, authorize the Action M Agency to withdraw from my
account the equivalent in Czech crowns of

the total amount of DM                  Your Signature

I authorize the withdrawal of the accommodation balance
from my credit card (after March 25)         Your Signature



                     PAYMENT BY BANK TRANSFER:

Name of  the bank
                                                                  
Date of payment                    Your  Signature
410.16  97:07  IJSAPL::OLTHOF  "Spellchecked Henry Although"  Sat Mar 01 1997 14:29  (1415 lines)
	Knowledge Discovery Nuggets 97:07, e-mailed 97-02-24
Publications:
	* GPS, Review of Adv. KDDM in NeuroVe$t journal
Siftware:
	* R. Kohavi, SGI MineSet Available for Varsity Members
		http://www.sgi.com/Products/software/MineSet
Positions:
	* T. Gutschow, Data Mining Research Position at HNC Software Inc.
	* C. Shearer, Vacancies - Data Mining Tool Development & Consulting : 
		UK & US, at ISL 
	* W. Zhang, Job: Machine Learning at Boeing
Meetings:
	* M. P. Singh, 2nd CFP:  Workshop on Agent Theories, Architectures, 
		and Languages (ATAL), Providence, RI, July 24-26, 1997
		http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/
	* H. M. Chung, CFP: track on Data Mining at AIS-97, 
		Indianapolis, Indiana, August 15-17, 1997
		http://hsb.baylor.edu/ramsower/ais.ac.97
	* L. DeRaedt, CFP: IJCAI-97 workshop on Frontiers of Inductive 
		Logic Programming, 25 August 1997
	* M. Manago, 2 days course on Data Mining & CBR in San Francisco for 
		U. of Berkeley Extension, March 24-25, 1997
	* M. Manago, Tutorial + Seminar on CBR & Data Mining,
		London, 17-19 March 1997, 
		http://www.unicom.co.uk
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the latest 
research and applications.  Submissions should be emailed, 
with a DESCRIPTIVE subject line (and a URL, when available) to [email protected]

To subscribe, send an email to [email protected] with the message
	subscribe kdd-nuggets 
in the first line (the rest of the message and subject are ignored). 
See http://info.gte.com/~kdd/subscribe.html for details.

KD Nuggets is published approximately 3 times a month.
Back issues of KD Nuggets, a catalog of S*i*ftware (data mining tools), 
and a wealth of other information on Data Mining and Knowledge Discovery 
are available at the Knowledge Discovery Mine site http://info.gte.com/~kdd/

	-- Gregory Piatetsky-Shapiro (editor)

(P.S. This is my last week at GTE.
Starting today, I can be reached at [email protected] .
After March 1, 1997, I will continue to edit and distribute KD Nuggets
and maintain the KD Mine pages at a new web site; details to be announced soon!
The [email protected] and [email protected] email addresses will still
work for a while. GPS)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) * 
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: What is the link between a large number of meetings and a 
large number of job announcements?  
A: Somebody's got to work while all those other people go to meetings.

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 16 Feb 1997 12:20:06 -0500
From: gps0 (Gregory Piatetsky-Shapiro)
Subject: NeuroVe$t journal and Data Mining for Financial Applications

Here, reprinted with permission, is the review of the AKDDM book from
***
NeuroVe$t Journal, Jan/Feb 1997, pg. 49, Reviews in Brief section -

Advances in Knowledge Discovery and Data Mining 
 
	Advances in Knowledge Discovery and Data Mining (AKDDM) provides
a well-edited collection of material from the 1994 KDD (Knowledge
Discovery in Databases) Workshop, and several additional invited papers.
In all, 23 papers presented in 7 chapters are included along with a
useful appendix on KDD terminology and resources on the Internet. 
Coupled with an extensive index and a very good job of editing, AKDDM
makes for a very accessible and worthwhile collection of papers.   
	Of particular interest to investors and traders, especially those
using data-driven computer technologies, is "A Statistical Perspective
on Knowledge Discovery in Databases" by Elder and Pregibon, which
provides a very good introduction to the topic.  "Finding Patterns in
Time Series" by Berndt and Clifford includes a look at
various technical analysis patterns of daily DJIA prices from 1989 to
1993, using pattern templates that vary in length from 9 to 12 trading
days.  "Integrating Inductive and Deductive Reasoning for Data Mining" by
Simoudis, Livezey and Kerber involves the creation of portfolios of 100
stocks from 7 years of data on 1500 stocks. "Predicting Equity Returns
from Securities Data with Minimal Rule Generation" by Apte and Hong
describes a minimal rule generation technique for forecasting 1-month S&P
500 returns using 40 fundamental and technical variables (not
specifically identified). 
	Unfortunately, there is scant mention of the specifics of rough
sets, nearest neighbor classifiers, learning vector quantizers,
self-organizing maps, fuzzy logic and other tools of interest to
practitioners and applied researchers working in the field.  And, on more
than a couple of occasions, the authors (including the editors) appear to
venture beyond their respective areas of expertise.  However, the few
shortcomings are overshadowed by several very good introductory studies.
	Seldom do I recommend collections of workshop or conference papers
to the general audience.  However, AKDDM represents an exception. 
Despite its weaknesses, it provides a valuable introduction to a
relatively new, yet increasingly important area of applied research. 
Financial practitioners who are particularly interested in data mining
will certainly want to take a look. 
	Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and
Ramasamy Uthurusamy (editors).  1996. The MIT Press, 55 Hayward Street,
Cambridge, MA 02142. 620 pages.  US$50.  ISBN 0-262-56097-6. 
617-253-5643. -- James Hampton 
***
(c) Copyright 1997 Finance & Technology Publishing, 
P.O. Box 764, Haymarket, VA 20168.  Reprinted
with permission of the publisher from NeuroVe$t Journal, Jan/Feb 1997.

Details on NeuroVe$t Journal (now named Journal of Computational
Intelligence in Finance) are at http://ourworld.compuserve.com/homepages/ftpub


>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 15 Feb 1997 12:14:00 -0800
From: Ronny Kohavi <[email protected]>
Subject: SGI MineSet Available for Varsity Members
Reply-to: [email protected]

                   Silicon Graphics' MineSet 
                 Available to Varsity Members
                 ----------------------------

MineSet(TM) version 1.1 is the second release of SGI's product for
data mining and exploratory data analysis. MineSet integrates tools
for data access, transformations, analytical data mining, and visual
data mining.  See http://www.sgi.com/Products/software/MineSet for
more information.

In addition to 30-day free evaluation copies available to any site,
with the new release of SGI's Varsity program CDs (happening now),
varsity members can get PERMANENT MineSet licenses.

Any educational institution is eligible. To qualify, the institution
must have an infrastructure capable of handling technical software
support for its Silicon Graphics users who have purchased Varsity
Program software packages. THE VARSITY PROGRAM AGREEMENT MUST BE
COMPLETED AND SIGNED BY THE INSTITUTION AND APPROVED BY SILICON
GRAPHICS.

The institution buys the right to distribute Varsity Program Developer
Package right-to-use licenses in multiples of 10 or 25. These licenses
are maintained by purchasing yearly support. Thus, the cost of
ownership is significantly reduced in the second year and beyond.


                     How Does This Work
                     ------------------

SGI Varsity sites will get Varsity CD-ROMs with MineSet or they can
download it directly from
   http://www.sgi.com/Products/Evaluation/evaluation.html

To get a permanent license, the site administrator can use the VPX
(varsity ID) number to get a license from
   http://www.sgi.com/Products/license.html (click the radio
                                             button for varsity).

See http://www.sgi.com/silicon_campus/varsity.html for
more information about SGI's Varsity program.

For questions about MineSet, send e-mail to [email protected]
or visit our site at: http://www.sgi.com/Products/software/MineSet

--

   Ronny Kohavi ([email protected])

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Gutschow, Todd" <[email protected]>
Subject: Data Mining Research Position at HNC Software Inc.
Date: Wed, 12 Feb 1997 17:51:58 -0800


The Technology Development Group at HNC Software Incorporated has an
opening for a Manager of Data Mining Technology Research. The Technology
Development Group is responsible for the core data analysis, data
mining, and data modeling technology used in all HNC vertical solution
products. The position will report to the Vice President of Technology
Development and will be located at HNC's headquarters facility in San
Diego, CA.

Duties/Job Description:
  Conduct research into new data mining algorithms in support of
  the Database Mining(R) Marksman and other HNC products. Identify and
  coordinate data mining technology projects across all HNC operating
  groups.
  Monitor the data mining research literature to identify promising new
  techniques.
  Support product development and marketing activities via customer
  presentations, conference talks, and white papers.

Required Qualifications (Experience/Skills):
  MS or Ph.D. in computer science, engineering, mathematics or another
  hard science (e.g., physics, chemistry). Five or more years of
  experience in implementing and evaluating new statistical data
  analysis, neural network, and/or data mining algorithms. Good software
  development skills. Experience with modern software development
  processes and tools (e.g., C++, object-oriented design). Strong
  communication and presentation skills.

Preferred Qualifications (Experience/Skills):
  Strong algorithm diagnosis and troubleshooting skills. Experience with
  database marketing and its associated data analysis problems. Project
  management experience.

  If you know someone with the above qualifications who is interested in
  employment opportunities with HNC, please ask them to fax, mail or
  e-mail resumes immediately to:

  Human Resources Department
  HNC Software Inc.
  5930 Cornerstone Court West
  San Diego, CA  92121
  FAX: (619) 452-6524
  E-mail: [email protected]
  Reference Job No. 293
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Colin Shearer <[email protected]>
Date: Thu, 13 Feb 97 14:36:13 GMT
Subject: VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US

        VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US
        ===============================================================

Integral Solutions Limited (ISL) is a leading supplier of advanced decision
support technology, specialising in data mining.

Our award-winning Clementine tool combines multiple modelling techniques
(neural networks, rule induction, regression) with data visualisation and
manipulation to extract high-value decision making knowledge from large bodies
of historical data. A rich visual programming interface makes Clementine
accessible to non-technologist "data owners" - business, rather than IT,
experts - and provides high productivity for "power" users. Clementine has
established a leading position in the data mining market, and is in use in a
wide range of industry sectors including finance, retail, telecoms,
pharmaceuticals, utilities, broadcasting, defence. Applications are diverse
and include demand prediction, customer profiling, risk assessment, turnover
forecasting, process optimisation, fault pre-emption and fraud detection.

We have an urgent need to recruit top-quality technical personnel. Current
vacancies are:


Data Mining Tool Developers
---------------------------

Basingstoke, UK.

To work on the ongoing development of Clementine.

Candidates should have an interest in, and ideally experience of implementing,
advanced modelling and data analysis techniques; experience of commercial data
mining tool development is desirable but not essential. Experience of some or
all of the following would also be useful:

    Unix                          GUI Development
    VMS                           Pop11
    X Windows / Motif             C
    Windows 95 / NT               SQL
    Databases/ODBC                Statistics

Applicants should have a 2.1 or better at first degree; a relevant second
degree may be an advantage. Technical excellence is expected, but must be
combined with first rate communications and interpersonal skills and a desire
for close contact with customers. Recent graduates and those with commercial
experience will both be considered.


Data Mining Consultants
-----------------------

Basingstoke, UK; King of Prussia, PA, USA.

To apply Clementine to customers' business problems. The role will include
pre-sales consulting, training, and developing solutions.

Candidates should be degree-qualified (2.1 or better) and, ideally, should
have experience of data analysis and modelling in a business environment.
Excellent communication and interpersonal skills are vital, and candidates
should display initiative, creativity, enthusiasm (and the ability to convey
it to clients) and self-management skills.

As ISL's clients span many markets, our consultants need the ability to
assimilate knowledge of any client's business, understand their problems, and
fit a data mining solution to these. However, we also encourage applications
from those with a specific business/sector specialisation (for example finance
(banking, insurance), retail or manufacturing).

We are willing to consider applications both from experienced consultants and
from any other candidates who believe they have the aptitude to be developed
into first-class consultants.



This is an opportunity to join a small (30 people) but dynamic and rapidly
developing company in an exciting business/technology area. ISL provides a
stimulating and technically challenging environment with considerable scope
for professional development.

ISL is an equal opportunities employer. We encourage applications from new
graduates through to experienced professionals. Salaries/benefits are
competitive, and commensurate with relevant experience.

Please apply with CV to:

For UK:                                 For US:

    Linda Montgomery,                       Kevin Peyton
    Integral Solutions Limited,             ISL Decision Systems Inc.
    Berk House,                             630 Freedom Business Center
    Basing View,                            King of Prussia
    Basingstoke,                            PA 19406
    RG21 4RG                                USA
    UK

    Fax  : +44 1256 63467                   Fax  : (610) 768 7774
    Email: [email protected]                 Email: [email protected]

Tell us why you are the ideal candidate for a position at ISL.

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 16:52:04 -0800
From: [email protected] (Wei Zhang)
Subject: Job: machine learning at Boeing

	**Outstanding Machine Learning Researcher needed**

The Boeing Company, the world's largest aerospace company, is actively
working on research projects in advanced computing technologies, including
projects involving NASA, the FAA, Air Traffic Control, and Global
Positioning, as well as airplane and manufacturing research.

The Research and Technology organization located in Bellevue,
Washington, near Seattle, has an open position for a machine learning
researcher.  We are the primary computing research organization for
Boeing and have contributed heavily to both short-term technology
advances and long-range planning and development.
 
BACKGROUND REQUIRED: Machine Learning, Knowledge Discovery, Data
Mining, Statistics, Artificial Intelligence or related field.
 	    
RESEARCH AREAS: We are developing and applying techniques for data
mining and statistical analyses of diverse types of data, including:
safety incidents, flight data recorders, reliability, maintenance,
manufacturing, and quality assurance data.  These are not areas where
most large R&D data mining efforts are currently focused.  Research
areas include data models, data mining algorithms, statistics, and
visualization. Issues related to our projects also include pattern
recognition, multidimensional time series, and temporal databases. We
can achieve major practical impact in the short term, both at Boeing
and in the airline industry, which may result in a safer and more
cost-effective air travel industry.
 
A Ph.D. in Computer Science or equivalent experience is highly
desirable for the position.  We strongly encourage diversity in
backgrounds including both academic and industrial
experiences. Knowledge of machine learning, statistics, and data
mining is an important factor. Experience with databases and
programming (C/C++, Java, and S-PLUS) is desirable.
   
APPLICATION: If you meet the requirements and you are interested,
please send your resume by e-mail in plain ASCII format to
[email protected] (Wei Zhang). You can also send it via
US mail to

Wei Zhang
The Boeing Company
PO Box 3707, MS 7L-66
Seattle, WA 98124-2207

Application deadline is April 30, 1997. 

The Boeing Company is an equal opportunity employer.

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Note -- CFPs lately are getting too long! Please send short
versions, with all the wonderful details at your conference website! GPS]

From: [email protected]
Subject: 2nd CFP: Agent Theories, Architectures, and Languages, 1997 (4th Intl Wshop)
Date: Mon, 17 Feb 1997 18:20:54 -0500 (EST)
Reply-To: [email protected]
			SECOND CALL FOR PAPERS

                    The Fourth International Workshop on
             Agent Theories, Architectures, and Languages (ATAL)

                        Providence, Rhode Island, USA
                              July 24-26, 1997
          http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/

Intelligent agents are one of the most important developments in computer
science in the 1990s. Agents are of interest in many important application
areas, ranging from human-computer interaction to industrial process
control. The ATAL workshop series aims to bring together researchers
interested in the agent-level, micro aspects of agent technology.
Specifically, ATAL-97 will address issues such as theories of rational
agency, software architectures for intelligent agents, methodologies and
programming languages for realising agents, and software tools for applying
and evaluating agent systems. Papers that consider macro-level, societal
issues of agent-based systems are welcome only if they explicitly relate to
the workshop themes. ATAL-97 will be held over the three days immediately
preceding the AAAI-97 conference, also being held in Providence. The ATAL-97
proceedings will be formally published as volume four of the Intelligent
Agents series from Springer-Verlag.

                                  TIMETABLE

 Submissions due               April 18, 1997
 Notifications sent            May 23, 1997
 Prefinal versions due         July 1, 1997
 Workshop                      July 24-26, 1997

[edited for brevity -- full details at URL above. GPS]
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 12:15:07 -0800 (PST)
From: H Michael Chung <[email protected]>

Call for Papers

Association of Information Systems 1997 Americas Conference
Indianapolis, Indiana, August 15-17, 1997

Mini-track on
"Tools and Applications of
Data Mining, Induction, and Knowledge Discovery:
In Search of a Mighty Tool"

Minitrack Chair: H. Michael Chung, CSULB

Description

This minitrack covers broader issues related to data mining, induction, and
knowledge discovery in the areas of business and management applications.

Tools based on regression analysis, information-theoretic methods, genetic
algorithms, and neural networks have been applied to discover patterns of
financial fraud, to capture customer profiles for marketing, to predict
fluctuations in stock prices, to control product quality, and to diagnose
telecommunication network problems, among others. Expert decisions,
environmental/normative datasets, and Internet databases are also being
considered as sources for discovering information and knowledge.

Many issues must be addressed before sophisticated algorithms can reap
quality knowledge that satisfies user needs. Relevant topics include:

- Applications of Inductive Learning, Data Mining, and Knowledge
  Discovery
- Data Warehousing
- Statistical Inference of Data Mining
- Knowledge Acquisition
- WWW Database and Agents
- Evaluation of Tools
- Economics of Decisions
- Data Visualization
- Learning Systems



***************Important Dates***************

Electronic Submission Deadline:	March 1st, 1997 

Notification of Acceptance:		April 15th, 1997

Camera-Ready Copy Due:			May 4th, 1997


***************Submission Guidelines******************

Each submission must be FORWARDED ELECTRONICALLY AS A WORD PROCESSING FILE
(MS WORD OR WORDPERFECT FORMAT) ATTACHED TO AN E-MAIL MESSAGE to the
mini-track chair, H. Michael Chung.  If this is not possible, then authors
should contact the mini-track chair and arrange for a suitable workaround.

Each submission is limited to THREE PAGES IN LENGTH (APPROXIMATELY 1,750
WORDS) INCLUDING ALL FIGURES, TABLES, APPENDICES, AND REFERENCES, and must
include the following:

a) The name, e-mail address, mailing address, university/organizational
affiliation, and phone/fax numbers of the contact person for the submission
in the first few lines of the file,

b) The submission title and the author's(s') name(s), the author's(s')
e-mail address(es), mailing address(es), and author's(s')
organization/university affiliation(s),

c) An abstract of the submission,

d) The body of the submission, and

e) A list of references or a bibliography.

All conference submissions and the submission review processes will be
managed through e-mail.  The receipt of submissions will be quickly
confirmed by the mini-track chair.  Submissions should follow the style
guidelines of the MIS Quarterly. All camera-ready copy preparation details
will be provided to submitting authors by the mini-track chair through
e-mail upon acceptance.

Please send any questions and all submissions for the Data Mining mini-track to

H. Michael Chung
Department of Information Systems
College of Business Administration
California State University, Long Beach
Long Beach, CA 90840-8506
TEL (562) 985-7691
FAX (562) 985-5543
INTERNET [email protected]

For additional information on the 1997 AIS Americas Conference,
please see the home page, http://hsb.baylor.edu/ramsower/ais.ac.97.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 17 Feb 1997 15:05:21 +0100 (MET)
From: Luc De Raedt <[email protected]>

               CALL FOR PARTICIPATION and PAPERS

                     IJCAI-97 Workshop on 

	     FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING 

                   Monday  25 August 1997

GENERAL INFORMATION 

The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan, 
will take place on August 25, immediately prior to 
the start of the main IJCAI conference. 

TECHNICAL DESCRIPTION

Inductive logic programming (ILP) is a recent subfield of
artificial intelligence that studies the induction of first order formulae 
from examples. The purpose of this workshop is twofold:
on the one hand, we wish to widen the scope of ILP
by investigating its relations to neighboring fields, 
and on the other hand, we wish to make ILP more accessible 
for researchers from neighboring fields.

The workshop therefore solicits papers
that lie at the frontiers of ILP with neighboring fields.
A non-exclusive list of interesting topics for the workshop includes:

* ILP and Software Engineering:
  What has ILP to offer to Software Engineering,
  and in what way can Software Engineering help to design ILP systems
  and applications?

* ILP for Knowledge Discovery in Databases: ILP aims
  at learning complex rules involving multiple relations from small
  databases, whereas KDD typically induces simple rules about a
  single relation from a large database. Furthermore, ILP can
  exploit background knowledge in a variety of ways. Can KDD and ILP be
  successfully combined?

* ILP and Computational or Algorithmic Learning Theory:
  though many results have been obtained concerning the learnability
  of inductive logic programming, most of the results are negative
  and most of the positive results are reducible to propositional learning
  methods. Is there a mismatch between COLT and ILP? If so,
  what can be done about it?

* ILP versus propositional learning methods:
  Since the very start of ILP, researchers and practitioners of
  machine learning have wondered about the relation between
  ILP and propositional learning methods. Theoretical and experimental
  questions that arise include:
  When should ILP be used, and when should propositional learning methods be used?
  Under what circumstances can ILP be reduced to propositional learning?
  What is the price to pay for using first-order logic in
  terms of efficiency?

* ILP and Knowledge Representation: ILP has traditionally employed
  computational logic to represent hypotheses and observations.
  Alternative well-founded knowledge representation formalisms have received
  little attention (with the exception of CLASSIC).
  What can ILP learn from Knowledge Representation,
  and in which well-founded Knowledge Representation formalisms
  is induction feasible?

* ILP in multistrategy learning: Multistrategy learning
  combines multiple learning strategies. What role can ILP
  play in multistrategy learning?

* ILP and Probabilistic reasoning: in contrast to
  propositional learning methods, ILP has not used
  probabilistic representations. How can ILP incorporate
  such representations, and how can it interact with
  methods such as Bayes nets or Hidden Markov Models?

* ILP for Intelligent Information Retrieval:
  The rapid development of
  the World Wide Web has spawned significant interest in intelligent
  information retrieval. In particular, algorithms that reliably classify
  textual documents into given categories (such as
  interesting/uninteresting) would be useful for a wide variety of tasks.
  Currently, most learning algorithms are not able to make use of
  structural information such as word order, successive words, the structure
  of the text, etc. Can ILP algorithms offer advantages over conventional
  information retrieval or machine learning algorithms for these sorts of
  tasks?

* Applications of ILP in subfields of AI: ILP has been applied
  to other subfields of AI, including natural language processing,
  intelligent agents and planning. 
  Further applications of ILP within AI are solicited.

Both position papers about the relation of ILP to other fields and
research papers that make specific technical contributions
are solicited. However, to stimulate discussion, each technical paper
is expected to also clarify the position
of ILP with regard to the neighboring field(s) it addresses.

In addition to the presentation of position and technical papers,
the workshop will also feature a panel discussion
on the frontiers of ILP and possibly an invited talk.

ORGANISERS

Luc De Raedt (chair and primary contact)
Saso Dzeroski
Koichi Furukawa  
Fumio Mizoguchi
Stephen Muggleton

PROGRAMME COMMITTEE

Francesco Bergadano  (Italy)
Luc De Raedt (co-chair, Belgium)
Saso Dzeroski (Slovenia)
Johannes Furnkranz  (Austria)
Koichi Furukawa  (Japan)
David Page (U.K.)
Fumio Mizoguchi  (Japan)
Ray Mooney (U.S.A.)
Stephen Muggleton (co-chair, U.K.)


CALL FOR PARTICIPATION

Participation is open to all members of the AI Community.
However, to encourage interaction and a broad exchange of ideas
the number of participants will be strictly limited
(preferably under 30 and certainly under 40). 

Participants will be selected on the basis of submissions.
Three types of submissions will be considered:
1) technical contributions (ideally, a 3 to 5 page extended abstract, 
                         in the IJCAI Proceedings Format, 3000-4000 words),
2) position papers  (ideally, a 1 to 3 page abstract
                  in the IJCAI Proceedings Format, 1000 - 3000 words)
3) a statement of interest (ideally, a one page motivation of why you 
     would like to participate, 300- 500 words) 
Only submissions of type 1) and 2) will be considered 
for presentation at the workshop and inclusion in the workshop notes. 

Submissions should be received no later than April 1, 1997,
and must include  first  author's  complete   contact  information, 
including address, email, phone, and fax number. Though 1 April
is the hard deadline, the authors are encouraged to submit
their material by 24 March, in order to facilitate the reviewing process. 


Double submissions with the ILP-97 Workshop (which is to take
place in Prague, September 1997) are allowed. 

SUBMISSIONS

Submit papers by email (postscript) and surface mail (2 copies) to

   Luc De Raedt
   Dept. of Computer Science 
   Katholieke Universiteit Leuven
   Celestijnenlaan 200A
   B-3001 Heverlee
   Belgium
   Email : [email protected]

IMPORTANT DATES

  - Paper submission : 1 April 
  - Notification to Authors : 21 April 
  - Camera ready copy : the submissions themselves
                        will serve as camera ready copy
     (submissions in the IJCAI Proceedings Style are strongly preferred,
     see http://www.ijcai.org/ijcai-97/ for details)

PUBLICATION

The accepted submissions will be included in the workshop notes
to be distributed at the workshop.
Post-conference publication of a selection of the workshop papers
will be considered and discussed at the  workshop.

COSTS

To cover costs, a fee of $US 50 will be charged, 
in addition to the normal IJCAI-97 conference registration fee.
Attendees of IJCAI workshops will be required to register
for the main IJCAI conference. 

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "MANAGO" <[email protected]>
Subject: 2 days course on Data Mining & CBR in San Francisco for University of Berkeley Extension
Date: Tue, 18 Feb 1997 17:23:09 +0100

Continuing Education in Engineering
University of California Berkeley Extension
Intensive short course at the San Francisco Airport

Course Organizer
Michel Manago, Acknosoft

Course Lecturers
Dr Usama Fayyad, Senior Researcher, Microsoft Research
Dr Michel Manago, President, Acknosoft international
Dr Evangelos Simoudis, Vice President, Data Mining and Decision Support
Solutions at IBM


Data Mining and Case-Based Reasoning (CBR): Principles and Applications
An intensive two-day course
Monday-Tuesday, March 24-25, 1997
San Francisco Airport

Course Description
The objective of this course is to present technologies for making better
use of data for decision-making purposes. Data mining techniques are used
to extract decision knowledge: for instance, in the form of a decision tree
or decision rules from a database. Case-based reasoning is the name given
to problem-solving methods that make direct use of past experiences (cases)
rather than a corpus of general knowledge. Data mining (DM) and case-based
reasoning (CBR) technologies can be used to:
 * 	Explore and analyze databases and generate hypotheses about the data;
 * 	Anticipate future events (decision support);
 * 	Solve a new problem, whose solution is unknown, by retrieving and
adapting similar problems that have been previously solved.
According to META Group, the market for data mining is expected to reach
$800 million by the year 2000. Data mining is considered to be one of the
three key technologies that will have the biggest impact on information
technologies in the third millennium.
The course addresses both practical and theoretical issues. We will compare
and contrast the technologies, present the architecture of CBR and DM
systems, describe some algorithms, and more. We will also show how cases
can be represented, how they are indexed for efficient retrieval, how the
similarity between new and past cases is assessed, and how domain knowledge
can be used in addition to data to characterize application domains; and we
will reveal the underlying methodology for building an application. We will identify the
market and present real applications in various domains such as technical
maintenance (diagnosis of Boeing 737 aircraft engines), customer support
(help desk for troubleshooting SEPRO robots in the plastic industry),
configuration (layouts of composite parts of an autoclave at Lockheed),
financial decision support, retail, and fraud detection.


Who Should Attend
This course is intended for:
 * Business analysts who want an in-depth overview of data mining
technology and to learn what it can and cannot really do
 * IT managers and technical staff who are in charge of engineering business
information systems and who want to learn how to implement data mining
solutions
 * End-users who need to make better use of their data for decision making
 * Customer service managers, maintenance managers, manufacturing managers,
financial decision makers who want to learn how to solve problems more
efficiently and at reduced costs
Anyone with a specific application in mind can benefit from the course,
which provides an overview of the technologies as well as of the
applications. Non-technical people will benefit from the basics of the
course, such as general principles and overview of applications
(quantification of business benefits, for example). There are no
prerequisites; this tutorial describes basic notions and illustrates these
with meaningful examples from a variety of applications in technical
maintenance, customer support, manufacturing, banking, and the consumer
market. Computer skills are not required.


Schedule
Monday-Tuesday, March 24-25, 1997
Registration: 8:00 am Monday
Lectures: 8:30 am-4:30 pm daily
Lunches: noon-1:00 pm daily


Location
Embassy Suites Hotel, San Francisco Airport, 150 Anza Blvd., Burlingame,
California.


Fee
The fee is $895 (EDP 326611). This includes:
 * 	2 days of instruction (1.4 ceu)
 * 	Comprehensive course notes
 * 	Daily lunches and refreshments 


Topic Outline

Day One

From Data to Decisions
This brief introduction will give attendees a common ground that
will enable them to understand and participate in the rest of the tutorial.
We will define knowledge discovery in databases (KDD) and case-based
reasoning (CBR).

Introduction to Knowledge Discovery in Databases
In this section we will:
 * Provide a general architecture for a generic KDD system that will enable
the subsequent discussion of the fundamental KDD issues, presentation of
the various KDD techniques, and description of various existing KDD
systems.
 * Present the basic knowledge discovery process, from the initial stages of
selecting data and cleaning of the selected data, to the identification of
important attributes and the final stages of integrating the extracted
knowledge into a decision support system.
 * Briefly discuss the various types of data mining techniques that are
commonly used for KDD, and give a brief introduction to CBR.
 * Outline the core research issues in the field of KDD, and show
how these issues relate to fundamental AI issues such as representation and
search.

Preparing Data for Mining
The quality of the knowledge extracted by a KDD system from a data set is
related to the quality of the provided data. In this part of the tutorial
we will:
 * Examine various data problems, e.g., noisy data, incomplete data,
low-information-content data, etc.
 * Discuss how each such problem affects the KDD operation.
 * Present techniques for solving some of these problems, e.g., data
cleaning techniques.
The large size of the databases that must be analyzed
necessitates the use of sampling techniques and the application of
dimensionality reduction techniques on a data set before a data mining
method is applied to it. We will present commonly used sampling methods and
discuss how they can be implemented. We will also discuss commonly used
dimensionality reduction techniques from statistics, e.g., principal
component analysis, and the use of domain knowledge for identifying
important attributes of a data set. Due to the particular prevalence and
importance of time-series data in a variety of application domains, we will
discuss techniques for preprocessing such data before it is presented to a
KDD system.
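As an illustration of the sampling step described above, here is a minimal sketch (not material from the course) of reservoir sampling, one common way to draw a fixed-size uniform random sample from a table too large to hold in memory, in a single pass. The function name `reservoir_sample` and its parameters are hypothetical; any iterable of records (for example, rows read from a database cursor) could be passed in.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Draw k records uniformly at random from a stream of unknown length.

    Algorithm R: keep the first k records, then let record number i
    (0-based) replace a random reservoir slot with probability k / (i + 1).
    If the stream has fewer than k records, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seeded generator for reproducible samples
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)      # fill the reservoir first
        else:
            j = rng.randint(0, i)         # uniform over 0..i inclusive
            if j < k:                     # happens with probability k/(i+1)
                reservoir[j] = record     # evict a random earlier pick
    return reservoir


if __name__ == "__main__":
    # Sample 5 "rows" out of a simulated table of one million row ids.
    print(reservoir_sample(range(1_000_000), 5, seed=42))
```

Every record ends up in the sample with probability k/n, where n is the total number of records, without n having to be known in advance; the trade-off is that the whole table must still be scanned once.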

Data Mining and Technique Selection
We will present data mining techniques from five basic areas: (1)
artificial intelligence, (2) neural networks, (3) statistics, (4)
multidimensional and deductive databases, and (5) data visualization.
With each type of technique we will present its pros and cons with respect
to the generic KDD model defined in the tutorial's first part.

Databases and Visualization Techniques
Multidimensional and deductive databases merge knowledge-based techniques
with database technology. Recently such databases have been successfully
coupled with relational and legacy database management systems, providing
analysts with unique ways to express and automatically test hypotheses on
very large data sets. In addition, research on very large databases has
resulted in a variety of KDD techniques, such as association discovery and
sequence discovery. These techniques are based on simple database
operations, such as aggregation, and are applicable to specific types of
data, such as those commonly collected by large retail chains. We will
provide an introduction to multidimensional and deductive databases,
discuss data warehousing concepts, present how these techniques can be
applied to KDD tasks, and review the current research on databases.
Visualization has traditionally been used for the presentation of results
obtained by other methods, e.g., statistical analysis. We will discuss how
interactive visualization techniques can be used for knowledge discovery
operations. We will begin with simple techniques (scatter plots and line
plots, for example) and proceed with modern 3-D visualization techniques.
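The association discovery mentioned above rests on counting how often items occur together, the kind of aggregation a database already performs well. The sketch below is an illustrative assumption, not the course's method: a simplified two-pass support count in the spirit of the Apriori algorithm, which discards infrequent single items before counting pairs, plus a confidence measure for a rule a -> b. The function names and the toy basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def frequent_pairs(baskets: List[Set[str]],
                   min_support: float) -> Dict[Tuple[str, str], float]:
    """Return item pairs whose support (fraction of baskets containing
    both items) is at least min_support."""
    n = len(baskets)
    if n == 0:
        return {}
    # Pass 1: count each item once per basket (like GROUP BY item).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, c in item_counts.items()
                      if c / n >= min_support}
    # Pass 2: count only pairs of frequent items; a pair can never be
    # more frequent than either of its members, so others are skipped.
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(items, 2))
    return {pair: c / n for pair, c in pair_counts.items()
            if c / n >= min_support}


def rule_confidence(baskets: List[Set[str]], a: str, b: str) -> float:
    """Confidence of the rule a -> b: share of baskets with a that also have b."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    return sum(1 for basket in with_a if b in basket) / len(with_a)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    print(frequent_pairs(baskets, 0.5))
    print(rule_confidence(baskets, "butter", "bread"))
```

On the four toy baskets with a minimum support of 0.5, only the pairs (bread, milk) and (bread, butter) qualify, each with support 0.5, and the rule butter -> bread has confidence 1.0.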

Some Examples of KDD Applications
We will first develop a set of criteria for comparing KDD systems. We will
then review in depth two such systems developed by the authors and
considered by the research community as representing the state-of-the-art:
IBM's customer segmentation data mining system and JPL's SKICAT system. In
addition to presenting the architecture of each system and discussing the
KDD methods it integrates, we will present a detailed account of how the
systems have been applied to financial, retail, manufacturing, astronomy,
and large image databases in planetary sciences.

Demonstration of a Data Mining System and Applications

Summary of the Day and Discussion
Summary, recap, overview of the basic unifying themes, and pointers to
available literature on KDD and future work.


Day Two

Overview of Case-Based Reasoning (CBR) Technology
In this introduction, we will present an overview of CBR, detail the CBR
cycle, and explain the main characteristics of CBR technology.

Applications of CBR in Technical Domains
We will present several CBR applications in technical domains. These deal
with maintenance, customer support, manufacturing, design, rapid evaluation
of production costs, and sales support.
Troubleshooting CFM56-3 engines for the Boeing 737. Time spent by airline
maintenance operators to solve engine failures and related costs (flight
delays or cancellations) are a major concern. The use of intelligent
diagnostic software contributes to improving customer support and reduces
the cost of ownership by improving troubleshooting accuracy and reducing
airplane downtime. We will examine this application from the engine
manufacturer's perspective (CFM International/Snecma) as well as from the
client's perspective (British Airways). Integration of the CBR
troubleshooting with electronic technical documentation. Demonstration.
A help desk for troubleshooting SEPRO robots in the plastic industry. Case
study from a small company (160 employees) that has adopted CBR for its
customer support services. Demonstration.
Improving feedback from experience in manufacturing. We will present the
ongoing Noemie data warehousing and data mining project. Noemie aims to
increase the quality and reliability of equipment for the oil industry.
Case study from the manufacturer's perspective (Schlumberger) as well as from
the end-user's perspective (Norsk Hydro).

CBR: How It Works
Based on the review of applications that will have been presented during
the morning, we will go into the details of the algorithms and present how
they have been used. In particular, we will describe mechanisms for
retrieving cases, assessing similarity, and indexing cases. We will
describe the link between induction, a form of KDD, and CBR. We also will
present some sample algorithms.
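The retrieval and similarity-assessment mechanisms listed above can be illustrated with a weighted nearest-neighbour search, one common (but not the only) way to retrieve cases. This is a minimal sketch under stated assumptions, not the KATE algorithms or the course material: each case is a set of named features plus a stored solution, numeric features are compared by distance scaled to an assumed value range, symbolic features by exact match, and per-feature weights are supplied by the user. All names (`Case`, `retrieve`, the feature names and solutions) are hypothetical, and the case base is scanned linearly rather than indexed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Value = Union[float, str]


@dataclass
class Case:
    features: Dict[str, Value]   # problem description (hypothetical features)
    solution: str                # what solved the problem last time


def local_similarity(x: Value, y: Value, value_range: float) -> float:
    """Similarity in [0, 1] between two values of one feature."""
    if isinstance(x, str) or isinstance(y, str) or value_range <= 0:
        return 1.0 if x == y else 0.0          # symbolic: exact match
    # numeric: 1 at equal values, falling linearly to 0 at value_range apart
    return max(0.0, 1.0 - abs(float(x) - float(y)) / value_range)


def global_similarity(query: Dict[str, Value], case: Case,
                      weights: Dict[str, float],
                      ranges: Dict[str, float]) -> float:
    """Weighted average of local similarities; missing features score 0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for name, weight in weights.items():
        if name in query and name in case.features:
            score += weight * local_similarity(
                query[name], case.features[name], ranges.get(name, 0.0))
    return score / total


def retrieve(query: Dict[str, Value], case_base: List[Case],
             weights: Dict[str, float], ranges: Dict[str, float],
             k: int = 1) -> List[Tuple[float, Case]]:
    """Return the k most similar cases, best first (linear scan, no index)."""
    scored = [(global_similarity(query, c, weights, ranges), c)
              for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    case_base = [
        Case({"temperature": 20.0, "symptom": "vibration"}, "replace bearing"),
        Case({"temperature": 5.0, "symptom": "overheat"}, "clean sensor"),
        Case({"temperature": 18.0, "symptom": "overheat"}, "check fuel nozzle"),
    ]
    query = {"temperature": 19.0, "symptom": "overheat"}
    weights = {"temperature": 1.0, "symptom": 2.0}
    ranges = {"temperature": 100.0}
    for score, case in retrieve(query, case_base, weights, ranges, k=2):
        print(f"{score:.3f}  {case.solution}")
```

In a real application an index (for example, a decision tree induced from the case base) would typically narrow the candidates before similarity is computed, and the retrieved solution would still need to be adapted to the new problem; both steps are left out here.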

Comparing CBR with Other Technologies
During this part of the tutorial, we will compare CBR and other
technologies for decision making. In particular, we will look at rule-based
expert systems, classical statistics, neural networks, and standard
database queries. We will review a case study done at a banking institution
for comparing credit scoring, CBR, and rule-based expert systems.

Case-Based Reasoning in Practice
During this final presentation, we will detail the basic steps and a
methodology for building a CBR system. We will describe how to model cases,
state how cases can be acquired from scratch or from existing databases,
review potential sources for the cases, and explain how to choose an
algorithm. We will also investigate organizational issues for assuring case
quality and explain how human factors have to be taken into consideration
when delivering a CBR application.

Summary of the Tutorial and Discussion


Lecturers
Usama Fayyad, Ph.D., is a Senior Researcher at Microsoft Research. He is
also a Distinguished Visiting Scientist at the Jet Propulsion Laboratory
(JPL), California Institute of Technology, and an adjunct professor of
computer science at University of Southern California. Prior to joining
Microsoft Research, he headed the Machine Learning Systems Group at JPL and
was Principal Investigator of the Science Data Analysis and Visualization
Task and other tasks involving machine learning applications. He received
his Ph.D. in computer science and engineering from the University of
Michigan, Ann Arbor. He is a recipient of the NASA Exceptional Achievement
Medal (1994) and the 1993 Lew Allen Award for Excellence at JPL. He has
co-chaired the Knowledge Discovery in Databases conferences KDD-94 and KDD-95,
and was general chair of KDD-96. He is a co-editor of Advances in Knowledge
Discovery and Data Mining (AAAI/MIT Press 1996), and Editor-in-Chief of a
new journal on this topic (Kluwer).

Michel Manago, Ph.D., is the scientific and managing director of AcknoSoft.
Dr. Manago graduated from the University of Illinois at Urbana-Champaign
and obtained his Ph.D. at the University of Paris, writing his thesis on
"Integration of Symbolic and Numeric Techniques in Machine Learning." He
has applied DM and CBR in technical domains such as diagnosis of Boeing 737
engines, customer support for marine diesel engines and robots, maintenance
of trains, reliability analysis of gas meters, experience feedback to
increase quality of production when manufacturing oil equipment, nuclear
safety, design of plastic parts in the manufacturing industry, and active
sale support over the Internet. He is author of the KATE line of products
for DM and CBR. He is editor of the book Advances in Case Based Reasoning
(Springer Verlag, 1995) and author of the report "A Review of Industrial
Case-Based Reasoning." He received the Information Technologies European
Award in 1995 (the European "Nobel prize" in computer technologies), among
other honors.

Evangelos Simoudis, Ph.D., is Vice President, Data Mining and Decision
Support Solutions at IBM, where he is responsible for the development and
deployment of data mining solutions to IBM's customers worldwide. Prior to
joining IBM, Dr. Simoudis was a Group Leader of the Data Comprehension
Group at the Lockheed AI Center where, since 1991, he led the development
and market introduction of the Recon data mining system and led research on
knowledge discovery in databases, machine learning, case-based reasoning
and their application to financial, retail, and fraud detection problems.
In 1994 Dr. Simoudis and his team were awarded Lockheed's Pursuit of
Excellence Award for their work on the Recon system. Dr. Simoudis is also
an adjunct assistant professor at the computer engineering department of
Santa Clara University. Dr. Simoudis holds a Ph.D. in computer science from
Brandeis University, an M.S. in computer science from the University of
Oregon, a B.S. in electrical engineering from the California Institute of
Technology, and a B.A. in physics from Grinnell College.



Enrollment Information
Enrollment may be made by companies or individuals. Enrollment is limited
and advance enrollment is required. Upon request, a place in the course
will be reserved for individuals who require time to obtain authorization.
To reserve a place, call (510) 642-4151, or fax (510) 642-6027.
How to enroll
By phone: You may enroll by phone if you use MasterCard, Visa, or American
Express; call (510) 642-4111.
By fax: If you use MasterCard, Visa, or American Express, fill out the form
on the back of this brochure and fax it to (510) 642-0374.
Please be sure to fax the entire form including the mailing label, if there
is one. Please provide all the information requested on the form.
By mail: Fill out and return the enrollment form provided.
By purchase order: Companies, agencies, and other organizations may pay
course fees by purchase order.
Enrollments must be accompanied by the full fee or by purchase order
authorization. You may pay by check or use MasterCard, Visa, or American
Express. Make checks payable to the UC Regents.
For efficient enrollment processing, we must have the Priority Code from
this publication, whether or not it is addressed to you. This five-digit
code (three numbers and two letters) appears on the mailing label above the
addressee's name. If there is no label on your copy, the code appears in a
box in the middle of the address surface.
Cancellation policy: Any cancellation is subject to a $30 processing fee.
Cancellations received fewer than five working days before the start of the
course are subject to a $100 cancellation fee. Substitutions may be made at
any time. If the course is not held for any reason, UC Berkeley Extension's
liability is limited to refund of the full course fee.
Confirming your enrollment: If you enroll by mail and have not received an
enrollment confirmation five days prior to the scheduled date of the
course, please call (510) 642-4151 to confirm that the course will convene
as scheduled.
Housing
A group of rooms will be set aside at the Embassy Suites Hotel, San
Francisco Airport, 150 Anza Blvd., Burlingame, California, and reservation
information will be sent to enrollees.  Participants may reserve rooms in
advance with Embassy Suites, phone (415) 342-4600 or fax (415) 342-8109.
Special rates will be available; participants in these courses should so
identify themselves when requesting room reservations.  Reservations must
be made no later than one month before the date of your course.  After this
date room reservations will be accepted only on a rate and space
availability basis.
Airport transportation and parking
Courtesy shuttle service is provided between the hotel and the airport.
There is ample free parking available at the hotel.
Continuing education units (ceu)
These units are a nationally recognized means of recording noncredit study
and are accepted by many employers and relicensure agencies as evidence of
a serious commitment to career advancement and the maintenance of
professional competence. One ceu is awarded for each 10 hours of
attendance. If you want us to keep a record of your ceu study, you must fill
out and return a form that will be distributed in class.
Program Coordinator
Linda Reid, Continuing Education in Engineering, University Extension,
University of California, Berkeley
Program Representative
Natalie Dennis, Continuing Education in Engineering, University Extension,
University of California, Berkeley



General Information




If you have questions

Call (510) 642-4151, e-mail [email protected], fax (510) 642-6027,
or write to Continuing Education in Engineering, University Extension, UC
Berkeley, 1995 University Ave., Berkeley, CA 94720-7010.

The University of California, in accordance with applicable federal and
state law and University policy, prohibits discrimination, including
harassment, on the basis of race, color, national origin, religion, sex,
disability, age, medical condition (cancer-related), ancestry, marital
status, citizenship, sexual orientation, or status as a Vietnam-era veteran
or special disabled veteran. This nondiscrimination policy covers
admission, access, and treatment in University programs and activities.
Inquiries may be directed as follows: sex discrimination and sexual
harassment: Carmen McKines, Title IX Compliance Officer, (510) 643-7895;
disability discrimination and access: Ward Newmeyer, A.D.A./504 Compliance
Officer, (510) 643-5116 (voice or TTY/TDD); age discrimination: Alan T.
Kolling, Age Discrimination Act Coordinator, (510) 642-6392. Other
inquiries may be directed to the Academic Compliance Office, 200 California
Hall, #1500, (510) 642-2795.





CONTRACT TRAINING

Enlist our experts at your location
At UC Berkeley Extension we're committed to working with you and your staff
to help achieve your objectives. Through the Berkeley Partnership for
Professional Development, we'll meet with you to analyze your staff's
training needs, then custom-design a program to satisfy your special
requirements. Or you can select from our many established courses.

Contract training offers:
- Choice of format: from workshops and sequential classes to multiday
residential seminars
- Highly qualified instructors
- Convenient location: on-site at your company or at a facility of your
choice
- Courses tailored to your needs

To discuss your training needs,
call Karl Johnson at (510) 642-4151 or fax (510) 642-6027



ENROLL BY FAX with MasterCard, Visa, American Express, or a company
purchase order: (510) 642-0374.
Or enroll by phone with MasterCard, Visa, or American Express: (510)
642-4111. 

Please give us the Priority Code (see below) if you enroll by phone.

To enroll by mail, return this entire page. Please do not remove the
mailing label.
Mail to: Dept. B, UC Berkeley Extension, 1995 University Ave., Berkeley, CA
94720.

Name
last
first
middle

Position
Company name

BUSINESS ADDRESS
number
street
mail stop
city
state
zip

Daytime phone
Fax number
These numbers are requested so that you can be notified if there is a
change in the schedule or status of your course.

Priority Code   6     0     9   ___ ___
For efficient processing, we must have the Priority Code from this
publication, whether or not it is addressed to you. This 5-digit code (3
numbers and 2 letters) appears on the mailing label above the addressee's
name. If there is no label on your copy, the code appears in a box in the
middle of the address surface.


I enclose $ ___________to cover_______enrollments in:

_____ Data Mining and Case-Based Reasoning
$895
EDP 326611


To pay by check, make check payable to the UC Regents.

To use [ ] MasterCard [ ] Visa [ ] American Express, check the appropriate box and give:
account number
date card expires
authorizing signature

For companies/agencies:
_____ Purchase order enclosed
(For proper processing this form must accompany your purchase order.)



Michel Manago
AcknoSoft
58 rue du Dessous des Berges
75013 Paris - France
tel : (33 1) 44 24 88 00, fax : (33 1) 44 24 88 66
web : http://www.AcknoSoft.com

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "MANAGO" <[email protected]>
Subject: Tutorial on CBR & Data Mining in London + 2 days seminar 
on applications of CBR & Data Mining
Date: Tue, 18 Feb 1997 17:32:40 +0100

The following events are taking place in London on 17-19 March 1997.
For registration, please see the website (http://www.unicom.co.uk).



Principles & Applications of CBR & Data Mining

UNICOM Tutorial + Seminar Organized by Dr Michel Manago, AcknoSoft

OBJECTIVES:
The objective of the tutorial is to present technologies for making better
use of data for decision-making purposes. Induction is a data mining
technique that is used to extract decision knowledge, for instance in the
form of a decision tree or decision rules, from a database. Case-Based
Reasoning is the name given to problem-solving methods that make direct use
of past experiences (cases) rather than a corpus of general knowledge. The
technologies can be used for:

1.      Exploring and analysing databases and generating hypotheses about the
data
2.      Anticipating future events (decision support)
3.      Solving a new problem, whose solution is unknown, by retrieving and
adapting similar problems that have been previously solved.

During this course, we will describe the underlying techniques and
methodologies for improving the decision-making process by making better use
of data. The course will address both theoretical and practical issues. We
will compare and contrast the technologies, present the architecture of a
CBR system and of a DM system, and describe some algorithms. We will show how
cases are indexed for efficient retrieval, how the similarity between new and
past cases is assessed, how cases can be represented and how to use domain
knowledge in addition to data. We will also characterise application domains
and reveal the underlying methodology for building an application. We will
identify the market and delineate real applications in various domains.

A. From data to decisions

This brief introduction will give attendees a common ground that
will enable them to understand and participate in the rest of the tutorial.
We will define Data Mining (induction) and Case-Based Reasoning (CBR).


B. Introduction to induction

In this section we will:

1. Show how to generate a decision tree by induction
2. Present the inductive process, from the initial stages of selecting data
to the identification of important attributes, and the final stages of
integrating the extracted knowledge into a decision support system.
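As a hedged illustration of decision tree induction, the Python sketch below computes the information-gain criterion used by ID3-style algorithms to choose the attribute at the root of the tree. The records and attribute names are hypothetical, and the tutorial may well use a different splitting criterion; building the full tree would repeat this choice recursively on each subset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in the entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain (the ID3 split criterion)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical credit-decision records, for illustration only.
rows = [
    {"income": "high", "employed": "yes", "decision": "accept"},
    {"income": "high", "employed": "no", "decision": "reject"},
    {"income": "low", "employed": "yes", "decision": "accept"},
    {"income": "low", "employed": "no", "decision": "reject"},
    {"income": "high", "employed": "yes", "decision": "accept"},
]
print(best_attribute(rows, ["income", "employed"], "decision"))
```

Here "employed" separates the two classes perfectly, so it wins over "income" and would become the root test of the tree.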


C. Presentation of Case-Based Reasoning (CBR) technology

In this introduction, we will present an overview of CBR, detail the CBR
cycle and explain the main characteristics of CBR technology.

D. CBR : how it works

We will go into the details of the algorithms and present how they have
been used. In particular, we will describe mechanisms for :

1. retrieving cases
2. assessing similarity
3. indexing cases. We will also describe the link between induction, a form of
KDD, and CBR.

Finally, we will present some sample algorithms.
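A minimal sketch of the retrieval and similarity steps, assuming a common weighted nearest-neighbour scheme rather than the specific algorithms presented in the tutorial: each attribute gets a local similarity in [0, 1], and the global similarity is their weighted average. The attribute names, weights, ranges and cases are hypothetical. Indexing is not shown, so retrieval here is a linear scan of the case base.

```python
from dataclasses import dataclass


@dataclass
class Case:
    features: dict  # attribute name -> value (number or symbol)
    solution: str   # outcome recorded for this past case


# Hypothetical attribute weights and numeric ranges (assumptions for illustration).
WEIGHTS = {"temperature": 2.0, "pressure": 1.0, "symptom": 3.0}
RANGES = {"temperature": 100.0, "pressure": 10.0}


def local_similarity(attr, a, b):
    """Similarity in [0, 1] between two values of one attribute."""
    if attr in RANGES:  # numeric: 1 minus the normalised distance
        return max(0.0, 1.0 - abs(a - b) / RANGES[attr])
    return 1.0 if a == b else 0.0  # symbolic: exact match only


def similarity(query, case):
    """Weighted average of local similarities over the attributes given in the query."""
    total = sum(WEIGHTS[a] for a in query)
    score = sum(WEIGHTS[a] * local_similarity(a, v, case.features[a]) for a, v in query.items())
    return score / total


def retrieve(query, case_base, k=1):
    """Return the k most similar past cases (the 'retrieve' step of the CBR cycle)."""
    return sorted(case_base, key=lambda c: similarity(query, c), reverse=True)[:k]


# Hypothetical case base of past diagnoses.
case_base = [
    Case({"temperature": 80, "pressure": 5, "symptom": "noise"}, "replace bearing"),
    Case({"temperature": 40, "pressure": 9, "symptom": "leak"}, "replace seal"),
    Case({"temperature": 75, "pressure": 4, "symptom": "leak"}, "tighten fitting"),
]
query = {"temperature": 78, "pressure": 5, "symptom": "leak"}
best = retrieve(query, case_base)[0]
print(best.solution)
```

The solution of the retrieved case is then reused, possibly after adaptation, for the new problem. Raising the weight of an attribute makes agreement on that attribute count for more in the ranking.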

E. Preparing Data for CBR and Data Mining

The quality of the knowledge extracted by a decision support system from a
data set depends on the quality of the data provided. In this part of
the tutorial we will examine various data problems, e.g., noisy data,
incomplete data, low-information content data, etc.
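The incomplete and low-information problems mentioned above can be detected with a simple column profile. The Python sketch below is an illustration under stated assumptions rather than a method from the tutorial: it treats `None` and empty strings as missing, flags a column as incomplete above a hypothetical 30% missing rate, and flags a column with at most one distinct value as carrying low information. Noise detection is not covered.

```python
def profile(rows, columns):
    """Per column: share of missing values and number of distinct non-missing values."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing_rate": 1 - len(present) / len(values),
            "distinct": len(set(present)),
        }
    return report


def flag_problems(report, max_missing=0.3):
    """Flag columns that are mostly missing (incomplete) or constant (low information)."""
    flags = {}
    for col, stats in report.items():
        issues = []
        if stats["missing_rate"] > max_missing:
            issues.append("incomplete")
        if stats["distinct"] <= 1:
            issues.append("low information")
        if issues:
            flags[col] = issues
    return flags


# Hypothetical customer records, for illustration only.
rows = [
    {"age": 34, "region": "EU", "status": "active"},
    {"age": None, "region": "EU", "status": "active"},
    {"age": 51, "region": "", "status": "active"},
    {"age": 29, "region": None, "status": "active"},
]
report = profile(rows, ["age", "region", "status"])
print(flag_problems(report))
```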

F. Comparing induction and CBR with other technologies

During this part of the tutorial, we will compare KDD and CBR with other
technologies for decision making. In particular, we will look at rule-based
expert systems, classical statistics, neural networks and standard database
queries. We will review a case study done at a banking institution
comparing credit scoring, CBR and rule-based expert systems.

G. Applications of CBR and data mining

During this final presentation, we will detail the basic steps and a
methodology for building a CBR system. We will describe how to model cases,
state how cases can be acquired from scratch or from existing databases,
review potential sources for the cases and explain how to choose an
algorithm. We will also investigate organisational issues for assuring case
quality and explain how human factors have to be taken into consideration
when delivering a CBR application. We will also try to characterise the
market for CBR and data mining.

H. Summary of the tutorial and discussion


PRESENTER:

Dr Michel Manago graduated from the University of Illinois at
Urbana-Champaign in 1983. He obtained his PhD in 1988 at the University of
Paris on "Integration of Symbolic and Numeric Techniques in Machine
Learning". Since 1991, Dr Manago has been the scientific and managing
director of AcknoSoft, where he has been "putting the technology to use".
Michel Manago is the father of the KATE line of products for taking smart
decisions from data. He was chairman of the 2nd European Workshop on CBR in
1994, editor of the book Advances in Case-Based Reasoning (Springer Verlag,
1995) and author of the report "A review of industrial Case Based
Reasoning". Dr Michel Manago received the Information Technologies European
Award in 1995 (the European "Nobel prize" in computer technologies), the
1st prize for innovative software application at the XPS trade show in
Germany in 1995 and the 1996 Application of the Year award from the French
computer magazine "Decision micros et réseaux".


CBR and Data Mining: Putting the Technology to Use

 BACKGROUND

Companies have gathered vast amounts of data that is not well used. Some
corporate databases almost work in write-only mode! Well exploited, this
mass of data could be turned into strategic corporate knowledge :

- the marketing department wants to discover trends in buyer behaviour
- the after-sales division must work more efficiently so that the company
  keeps customers
- the financial department wants to assess risks in a better way
- quality management and control must be improved...

However, going from data to decisions is not an easy task.

Innovative computer technologies such as data mining and Case-Based
Reasoning (CBR) will help you solve complex problems in domains where
experience plays a critical role in good decision making, and develop a
solution in only a short time, with a guaranteed payback.

(C) Copyright AcknoSoft, 1997

OBJECTIVES :

The goal of this seminar is to give a clear view of the state of the art
in applying data mining and CBR technologies to solving practical problems.
The emphasis of the seminar will be on presentations given by users of the
technology as opposed to technology providers. They will share their
experience and delineate the benefits as well as the difficulties of
putting the technologies into use. The themes that will be covered by the
speakers include:

- What are CBR and data mining?
- Features of the software products they have used to build their
  application
- Comparison of data mining and CBR with other technologies
- Methodologies for case acquisition and maintenance
- Ensuring case quality and monitoring it over time
- Organisational issues that needed to be solved in order to field the
  application
- Human factors
- Overcoming technological risks
- Cost and benefits of using data mining and CBR in various domains

The seminar will also present a clear view of the issues that are common
to building CBR and data mining applications in different domains
(banking, insurance, customer support and help desk, manufacturing,
energy). We will focus on general topics such as how to assess the costs
and quantify the benefits of using the technology, how to model cases so
that they contain the right sort of knowledge for decision-making purposes,
how to use the tools to build systems that analyse cases efficiently, and how
to manage a CBR project from the customer's perspective.

Benefits of Attending

-Find out how to make the knowledge of your specialists available to
 everyone in your organisation
-Learn how to solve problems more quickly without the burden of building
 expert systems
-Capitalise on your experience
-Elicit the user point of view
-Share experience with other CBR application developers
-Find out how to analyse and distill your data into usable knowledge
-Take smart decisions that are based on your experience

Programme

Day 1

Brief introduction by Michel Manago
Short presentation about Data Mining and CBR, introduction of the
objectives of the seminar.

Using data mining and CBR at Deloitte & Touche
Olivier Curet and Jonathan Killin, Deloitte & Touche Consulting Group UK,

410.17. "97:08" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sat Mar 01 1997 14:30 (667 lines)
	Knowledge Discovery Nuggets 97:08, e-mailed 97-02-28
News:
	* GPS, New Location for KD Mine and KD Nuggets: www.kdnuggets.com
	* W. Kloesgen, KDD-97: Second Call For Panel Proposals 
 	* P. Maiste, Price Waterhouse announces new data mining services
	* T. Denecke, Query: Data Mining and Workflow Management ?
	* D. Throop, Query: Finding approximately duplicate records ?
Publications:
	* P. Stolorz, CFP: DMKD special issue on scalable computing
	http://www.research.microsoft.com/research/datamine/dmkdpar
Siftware:
	* G6G, Intelligent Software Web Site, 
		http://www.intelligent-dir.com
Positions:
	* W. Buntine, summer students and scientist positions in 
		autonomous data analysis
	* B. Masand, KDD Job at GTE Laboratories, Waltham, MA
	* S. Wrobel, Two positions in Machine Learning/Data Mining at GMD
--
KD Nuggets is a free electronic newsletter for the Data Mining and Knowledge 
Discovery community, focusing on the latest research and applications.

Submissions are most welcome and should be emailed, 
with a DESCRIPTIVE subject line (and a URL), to [email protected]
To subscribe, send email to [email protected] with
	subscribe kdd-nuggets 
in the first line (the rest of the message and subject are ignored). 
See http://www.kdnuggets.com/subscribe.html for details.

Nuggets is published 3-4 times a month. 
Back issues of Nuggets, a catalog of S*i*ftware (data mining tools), 
and a wealth of other information on Data Mining and Knowledge Discovery 
are available at the Knowledge Discovery Mine site http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ***********************************
* All opinions expressed herein are those of the writers (or the moderator) * 
* and not necessarily of their respective employers (or GTE Laboratories)   *
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"An experimental science is supposed to do experiments
that find generalities.  It's not just supposed to 
tally up a long list of individual cases and their 
unique life stories.  That's butterfly collecting."
  Richard C. Lewontin, biology professor at Harvard University
		Thanks to Yolanda Gil
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Feb 1997 09:41:10 -0500 (EST)
From: GPS <[email protected]>
Subject: New Location of KD Mine -- www.kdnuggets.com

I have set up a new location for the Knowledge Discovery Mine web site 
	-- www.kdnuggets.com -- 
which is operational today, Feb 28, 1997.

I will continue to maintain and improve that site in my new job -- 
see www.kdnuggets.com/gps.html 

The GTE location at info.gte.com/~kdd will remain for some time, 
but I will not be updating it. 

I will also continue to edit and email Knowledge Discovery Nuggets 
(I have dropped the second D to emphasize the more general focus). 
It will be gradually transitioned to kdnuggets.com site, 
but in the meantime will continue to be distributed from GTE. 
The changeover should be transparent to all subscribers. 

-- 
	Gregory Piatetsky-Shapiro
	
Please address KD Nuggets-related email to [email protected]
	(which is an alias for [email protected])
and other email to me at [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 26 Feb 1997 14:41:27 +0100
From: [email protected] (Willi Kloesgen)
Subject: KDD-97 organization -- call for panels

As in previous KDD conferences, the KDD-97 program will include panel 
discussions.  A great panel requires an interesting topic, good
speakers, and proper preparation.   To facilitate all three we solicit
early suggestions. Please submit suggestions for topics and, preferably, also
for panelists who could represent diverse positions on, or approaches to, the
topic. Suggested topics should relate to any of the main KDD-97 topics (see
http://www-aig.jpl.nasa.gov/kdd97). 
The panel topics should be of general interest to a
large part of the KDD audience and allow several (controversial) approaches
to be discussed.

Please email informal suggestions by April 2, 1997 (earlier if possible) to:

Willi Kloesgen

[email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Feb 1997 13:34:22 +0100
From: Tom Denecke <[email protected]>
Subject: Data Mining and Workflow Management

I am a student of Business Science and am working on a research project,
"controlling of workflow processes".

My idea is to use data mining techniques to evaluate the control data
of workflow systems. My problem is that I am not very familiar with the
technical terms, so it would be great to get a hint as to which methodologies
would fit this application domain.

Here is a short description of the kind of information that is available:

There are several process instances of each process type (for example,
auditing).
After the execution of 100 instances, there is a lot of data for this
process type that can be explored:


 - processing and idle time
 - who executed the process (employee, role, orga. unit)
 - which kind of workflow
 - which activities were executed
 - data about the process object (which customer, article ...)
 - which other processes are running
 - metrics concerning quality and cost of a process/activity
 - ...

We would like to generate rules about process performance
(bottleneck detection, when does a process perform well, ...).

I would be very grateful for any information on whether similar
problems have been solved with data mining techniques, or even just a
literature hint.

Thank you very much

Tom Denecke
- MBA -

WWU Muenster
Rudolf-Harbig-Weg 24
48149 Muenster
PHONE + 49 251 89 75 65
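One hedged illustration of the kind of rule Denecke describes: the Python sketch below mines single-condition rules of the form "IF attribute = value THEN the instance is slow", scored by support and confidence as in association-rule mining, but restricted to one condition for brevity. The record fields (`role`, `unit`, `slow`), the thresholds, and the assumption that "slow" has already been derived from processing time are all hypothetical.

```python
from collections import defaultdict

# Hypothetical process-instance records. "slow" is assumed to have been derived
# beforehand, e.g. processing time above some chosen threshold.
instances = [
    {"role": "clerk", "unit": "north", "slow": True},
    {"role": "clerk", "unit": "south", "slow": True},
    {"role": "clerk", "unit": "north", "slow": True},
    {"role": "clerk", "unit": "south", "slow": False},
    {"role": "auditor", "unit": "north", "slow": False},
    {"role": "auditor", "unit": "south", "slow": False},
    {"role": "auditor", "unit": "south", "slow": True},
]


def single_condition_rules(records, target, min_support=2, min_confidence=0.6):
    """Return rules 'IF attr = value THEN target' as (attr, value, support, confidence)."""
    # (attr, value) -> [records matching the condition, of those, records where target is true]
    counts = defaultdict(lambda: [0, 0])
    for record in records:
        for attr, value in record.items():
            if attr == target:
                continue
            counts[(attr, value)][0] += 1
            counts[(attr, value)][1] += int(record[target])
    rules = []
    for (attr, value), (matched, hits) in counts.items():
        confidence = hits / matched
        if hits >= min_support and confidence >= min_confidence:
            rules.append((attr, value, hits, confidence))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


rules = single_condition_rules(instances, "slow")
for attr, value, support, confidence in rules:
    print(f"IF {attr} = {value} THEN slow  (support {support}, confidence {confidence:.2f})")
```

Decision tree induction over the same records is another common way to obtain such rules, including ones with several conditions.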
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]

From: [email protected]
Date: Fri, 28 Feb 97 08:09:07 EST
Subject: Press Release: Opening of a Knowledge Discovery Center

IMMEDIATE RELEASE

CONTACT:
Price Waterhouse Management Consulting in New York:
Jan Butler 
212- 819-4838, [email protected]
Liza Kurtz
212-995-5680, ext. 210, [email protected]

PRICE WATERHOUSE LLC ANNOUNCES NEW DATA MINING SERVICES AND OPENING OF 
KNOWLEDGE DISCOVERY CENTER
New York, NY - February 26 - Price Waterhouse Management Consulting, a 
recognized leader  in delivering data warehouse services to global companies, 
introduces Data Mining Services for helping clients achieve strategic value 
from the mounds of data often accumulated in the course of business. An 
integrated offering of Price Waterhouse's Global Data Warehouse Practice, the 
Data Mining Services range from introductory seminars on data mining and 
knowledge discovery to full data mining system implementations. To support 
these offerings, Price Waterhouse has opened the Knowledge Discovery Center in 
Bethesda, Maryland.
"Data mining has recently moved to the forefront of business executives' 
strategic data warehouse initiatives, driven by a significant growth in the 
amount of data that companies collect on their customers, processes, and 
finances," said Mike Schroeck, Global Data Warehouse Practice Leader for Price 
Waterhouse. Data mining technologies use sophisticated, automated algorithms to 
discover hidden patterns, correlations, and interacting relationships among the 
hundreds of strategic data elements collected by an organization. The impact of 
data mining on a company's bottom line, whether through increased revenues or 
decreased costs, is often enormous. 
A leader in data mining knowledge and research, Price Waterhouse has performed 
a comprehensive, hands-on evaluation of many of the leading data mining tools 
currently available on the market, and has spoken at a variety of conferences 
and trade shows on the subject. With years of analytical modeling and data 
analysis experience, Price Waterhouse can help clients get the greatest return 
on their data mining investment. "We are dedicated to offering value-added data 
mining analyses to our clients. The time for businesses to take advantage of 
these tools and algorithms has never been better," says Dr. Glenn Galfond, 
Partner in charge of Price Waterhouse's Management Analytics practice, which is 
spearheading the firm's Data Mining Services.
The Data Mining Services offered by Price Waterhouse include Data Mining 101, 
Data Mining Proof, Data Mining Service, and Data Mining Solutions. Data Mining 
101 is a half-day beginner's course in data mining. The course provides an 
overview of the technology, examples of how it has been successfully used, and 
a demonstration of the leading data mining tools. Data Mining Proof is a short 
proof of concept project, in which Price Waterhouse mines a small extract of a 
client's data for quick but rewarding results. This allows the client to see 
data mining's potential in a hands-on environment. Clients also receive a copy 
of PW's comprehensive Data Mining Tool Evaluation report.
For companies that are ready to delve more deeply into data mining but do not 
have the necessary in-house resources, Data Mining Service offers a full range 
of data mining outsourcing options, including data extraction, data cleansing, 
and data mining. For companies that wish to implement enterprise-wide data 
mining systems, Data Mining Solutions offers Price Waterhouse's proven data 
mining and data warehousing methodology and full-scale systems implementation 
experience. 
The Knowledge Discovery Center will be used to support these services, to 
provide an environment for demonstrating the latest data mining tools, and to 
train clients in their use. Price Waterhouse has equipped the Center with many 
of the leading data mining tools. The technologies and algorithms available in 
the Center encompass the full breadth of data mining capabilities. Galfond adds, 
"Price Waterhouse has invested heavily in the research and evaluation of the 
leading data mining tools. Our clients can take advantage of this investment 
while reaping the benefits that data mining brings to their companies." 
Price Waterhouse Management Consulting delivers enterprise-wide solutions to 
large multinational clients through integrated Information Technology and 
Change Integration services.  With in-depth knowledge of selected industries 
and business process expertise, Price Waterhouse Management Consulting works 
with clients worldwide, from strategy through implementation, to help them 
improve business performance.  Price Waterhouse Management Consulting services 
are provided in the U.S. by Price Waterhouse LLC.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{Please cc responses to the [email protected] 
since the problem is of general interest. GPS]

From: "Throop, David R" <[email protected]>
Subject: Looking for phrase matching tool
Date: Tue, 25 Feb 1997 10:03:30 -0600

Dr. Piatetsky-Shapiro,

Thank you for your excellent website on data mining.  I'm hoping you
might help me, or point me towards someone who can.

I'm looking for a piece of commercial software that may or may not
exist.  I couldn't find it on your pages, but your stuff is the closest
I've found.  So I'm asking you for any pointers.

We have several databases which have lists of components (pieces of the
International Space Station.)  These databases have no common key.  They
do, however, have English-language descriptions of the components (on
the order of 20-50 characters long).  However, these descriptions are
not identical.  For instance, a certain power switch is known by two
different names:
     RPCM N1-3B-C Switch14   and    N1-3B-RPCM-C-RPC-14
As you see, the order of the identifiers is different,  one set uses the
term 'switch' where another uses 'RPC', and the '14' is concatenated
with no space on one side.

Anyway, I'm looking for a piece of software that could go through the
databases, (armed with a dictionary, list of abbreviations, synonyms
etc) and come up with a set of best guesses about which items match.

Do you know of such a tool, either as a commercial product or a research
program?

Thanks
David Throop
281 212 9369
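A minimal Python sketch of the matching Throop describes, assuming one simple approach: split each description into letter runs and digit runs, map synonyms through a dictionary, and score pairs by the Jaccard similarity of their token sets. The synonym table, the 0.6 threshold and the pump and fan descriptions are hypothetical; only the RPCM example comes from the message. Splitting "N1" into "N" and "1" loses some structure, so a production tool would need more careful tokenising.

```python
import re

# Hypothetical synonym/abbreviation table; a real one would be built with domain experts.
SYNONYMS = {"SWITCH": "RPC"}


def tokens(description):
    """Upper-case the text, split it into letter runs and digit runs, and map synonyms."""
    raw = re.findall(r"[A-Z]+|\d+", description.upper())
    return {SYNONYMS.get(t, t) for t in raw}


def jaccard(a, b):
    """Shared tokens divided by all tokens (1.0 means identical token sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(left, right, threshold=0.6):
    """Pair each description in `left` with its most similar one in `right`, if it scores high enough."""
    if not right:
        return []
    right_tokens = [(text, tokens(text)) for text in right]
    matches = []
    for text in left:
        query = tokens(text)
        candidate, score = max(
            ((other, jaccard(query, other_tokens)) for other, other_tokens in right_tokens),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            matches.append((text, candidate, score))
    return matches


left = ["RPCM N1-3B-C Switch14", "Pump Module PM-2"]
right = ["N1-3B-RPCM-C-RPC-14", "Fan Assembly FA-7"]
print(best_matches(left, right))
```

Because token order is ignored and "Switch" is mapped to "RPC", the two names of the power switch produce identical token sets, while the unrelated pump description stays below the threshold and is left unmatched.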

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 27 Feb 1997 22:43:45 -0800 (PST)
From: DMKDPAR <[email protected]>
Subject: CFP for DMKD special issue on scalable computing

============================================================================
			   CALL FOR PAPERS
============================================================================

		  DATA MINING AND KNOWLEDGE DISCOVERY 

			   Special Issue on 
	      Scalable High-Performance Computing for KDD

	       Guest editors: Paul Stolorz and Ron Musick
	       ==========================================

    	http://www.research.microsoft.com/research/datamine/dmkdpar

    Traditional computational techniques and computer architectures are
    routinely overwhelmed by the sheer volume and complexity of information
    generated from data-gathering instruments, computational and 
    experimental methodologies, and business operations.  The fundamental
    problem of extracting knowledge and insight from massive databases and
    datasets is shared across a wide range of fields in business, 
    academia and government. The new field of Data Mining and Knowledge 
    Discovery in Databases (KDD) has arisen as an interdisciplinary response
    to this situation, merging ideas drawn from disciplines such as statistics, 
    pattern recognition, machine learning, databases, visualization and
    high performance computing.

    This special issue of Data Mining and Knowledge Discovery is devoted
    to the challenge of applying data mining and knowledge discovery methods
    to large, complex datasets. Implementation of data mining ideas in
    high-performance computing environments is crucial for coping with
    large-scale data.  In particular, parallel and distributed systems are
    needed to ensure system scalability as datasets grow inexorably in size
    and scope. These environments include dedicated massively parallel
    supercomputers, super-servers built from clusters of commodity
    workstations and high-speed network interfaces, and heterogeneous
    networks distributed over regional, national and global scales.
    High-performance and parallel computing holds the promise of scaling
    to large data sets, allowing the data mining component to search a much
    larger set of patterns and models than traditional computational platforms  
    and algorithms would allow. In addition, it promises to render the KDD
    process much more interactive by allowing fast response times for 
    difficult search and model fitting problems. 

    Data Mining and Knowledge Discovery, published by Kluwer Academic
    Publishers, is the flagship publication in the rapidly growing area of
    KDD.  In this special issue we solicit the most dramatic new 
    developments in high performance large-scale KDD applications, highlighting
    the promise of the technology and identifying the main challenges for
    the future.  Technically innovative papers that describe new theoretical
    developments, or tackle the application of practical data mining
    approaches to real problems and datasets on parallel and distributed 
    architectures, are solicited. Topics of interest include, but are
    not limited to, the intersection of KDD with the following fields:

    Parallel implementations of datamining & KDD methods:
        Classification and regression: e.g. decision trees, neural nets
        Pattern recognition
        Belief nets and other Bayesian approaches
        Genetic programming 
        Association rules
        Statistical inference
        Similarity detection and measurement
        Clustering and density estimation
        Change detection
        Text retrieval
        Content-based indexing
        Data visualization
        Trend analysis

    Integration of KDD techniques with scalable I/O systems:
        Data warehouses & federated databases
        Parallel file systems
        High-performance network interfaces
        Intelligent data layout
        Out-of-core algorithms
        Parallel relational querying
        High performance storage systems
        Hierarchical and distributed storage

    Methods to control complexity:
        Random sampling
        Anytime algorithms applied to datamining techniques
        Algorithms for new complex data types (e.g., not based on feature vectors)
        Domain simplification techniques
        Inference error/confidence characterization

    Parallel, clustered and/or distributed applications:
        Datamining on commodity-based clusters and networks
        Web-oriented datamining
        Novel applications and case studies
        Knowledge discovery systems and tools


    SUBMISSION INSTRUCTIONS
    Electronic submissions are STRONGLY ENCOURAGED. PostScript copies
    of papers may be emailed to [email protected]. LaTeX style
    files and related instructions can be obtained at the web site
    http://www.research.microsoft.com/research/datamine.


			   ===============
                           IMPORTANT DATES
			   ===============

                **************************************
                SUBMISSION DEADLINE:       May 8, 1997
                ACCEPTANCE NOTIFICATION: June 20, 1997
                **************************************

    Enquiries about the submission process and scope of the special issue 
    may be sent to [email protected].

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[The following is a commercial announcement. GPS]

From: [email protected]
Date: Mon, 24 Feb 1997 22:47:04 -0500
Subject: SAIC and G6G Develop an Intelligent Software Web Site

       "SAIC and G6G Develop an Intelligent Software Web Site"

          NEW Web-Site Address is:  www.intelligent-dir.com
 

Science Applications International Corporation's (SAIC) Asset Source for 
Software Engineering Technology (ASSET) Division has teamed up with G6G 
Consulting Group (G6G) to co-develop a groundbreaking new World Wide 
Web site focused on "intelligent software."

The new site contains the entire content of "The G6G Directory of 
Intelligent Software," a publication that contains over 750 abstracts 
covering 15 advanced technology corridors.

"The G6G Directory of Intelligent Software" contains product abstracts in 
Expert (Knowledge-Based) Systems, Fuzzy Logic, Hypermedia, Hypertext and 
Multimedia, Intelligent Software Tools, Neural Networks, Object-Oriented 
Programming, Virtual Reality, Voice & Speech Systems, and other areas. 
The directory is further categorized by over 140 sub-categories of "what" 
the product can be used for or "what it is" such as: 

    - Data Mining                    - Manufacturing Systems
    - Diagnostic Systems             - Modeling
    - Help Desk Systems              - Network Systems
    - Help Authoring Systems         - Stock Market
    - Knowledge Management           - Software/Hardware
    - Lending and Learning Systems   - Software Development
    - Customer Support Systems       - and many others.
     
The directory content on this Web site will be updated weekly. The
combination of G6G's directory and ASSET's on-line inventory of free and
commercial products will provide a powerful complement of information on
the Web. Knowledge engineers, software engineers, developers and other
users of intelligent software products will find www.intelligent-dir.com
extremely useful.

This valuable free resource will help create a sense of community in the 
world of intelligent software by providing an on-line source of 
searchable information about intelligent software products and vendors.


           __________________________________________________
                The G6G Directory of Intelligent Software 
           --------------------------------------------------
                      http://www.intelligent-dir.com  
           --------------------------------------------------
           SAIC/ASSET                    G6G Consulting Group
           (304) 284-9000                      (310) 458-4187
           [email protected]      [email protected]
           __________________________________________________

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 18 Feb 1997 14:39:31 -0800
From: Wray Buntine <[email protected]>
Subject:  summer students and scientist positions in autonomous data analysis

Please note the two sets of positions below.
	Research scientist
	2 summer students, or longer term support for PhD
The summer student position could be transferred into 
longer term support for focussed PhD research if the 
interest is right.

Wray Buntine

=======================   Scientist
                      
NASA's Center of Excellence in Information Technology at
Ames Research Center invites candidates to apply for a position as
Research Scientist in Information Technology:

Position description:

   * We seek applicants to join a small team of space scientists and 
     computer scientists in developing NASA's next generation smart spacecraft
     with on-board, autonomous data analysis systems.   The group includes 
     leading space scientists (Ted Roush, Virginia Gulick) and leading data
     analysts (Wray Buntine, Peter Cheeseman), and their counterparts at JPL.
   * The team is doing the research and development required for
     the task, and has a multi-year program with deliverables
     planned.  This is not a pure research position, and it requires
     dedication to completing the R&D milestones.
   * The applicant will be responsible for the information technology side
     of R&D, with guidance from senior space scientists on the project.
   * The research has strong links with on-going work at the Center of
     Excellence and is an integral part of NASA's long term goals.
        
Candidate requirements:

   * Strong interest in demonstrating autonomous analysis systems to
     enhance science understanding in operational tests, with the ultimate
     goal of putting such systems in space.
   * Ph.D. degree in Computer Science, Electrical Engineering, or related
     field, and applied experience, possibly within the PhD.  In 
     exceptional cases, an M.S. degree with relevant work experience will 
     suffice.
   * Knowledge of neural or probabilistic networks, machine learning, 
     statistical pattern recognition, image processing, science data
     processing, probabilistic algorithms, or related topics is essential. 
   * Strong communication and organizational skills with the ability to lead
     a small team and interact with scientists.
   * Strong C programming and Unix skills (experimental, not
     necessarily production), with experience in programming mathematical
     algorithms:  C++, Java, MatLab, IDL.

Application deadline:

   * March 15th, 1997 (hardcopy required -- see below).

Please send any questions by e-mail to the addresses below, and use
"PI for Autonomous data analysis" as your subject line. 

Dr. Ted Roush:    [email protected]
Dr. Wray Buntine:  [email protected]

Full applications (which must include a resume and the names and addresses
of at least two people familiar with your work) should be sent by surface
mail (no e-mail, ftp or html applications will be accepted) to:

Dr. Steve Lesh
Attn:  PI for Autonomous data analysis
Mail Stop  269-1 
NASA Ames Research Center
Moffett Field, CA, 94035-1000

============================== Summer students or Student Assistantship
                      
NASA's Center of Excellence in Information Technology at
Ames Research Center invites current PhD students to apply for 
a summer position (possibly two available).

Position description:

   * We seek applicants to join a small team of space scientists and 
     computer scientists in developing on-board, autonomous data analysis
     systems for NASA's next generation of smart spacecraft.   The group
     includes leading space scientists (Ted Roush, Virginia Gulick) and
     leading data analysts (Wray Buntine, Peter Cheeseman).
   * We are working with spectrometers and a CCD camera, and are
     building resource-bounded autonomous classification systems,
     and trainable object recognizers.  
   * The successful student will have considerable flexibility in how
     to contribute within the goals of the project.
   * An ideal summer project would produce demonstration software together
     with a conference paper.
        
Candidate requirements:

   * Knowledge of neural or probabilistic networks, machine learning, 
     statistical pattern recognition, image processing, science data
     processing, probabilistic algorithms, or related topics is essential. 
   * Strong C programming and Unix skills (experimental, not
     necessarily production), with experience in programming mathematical
     algorithms:  C++, Java, MatLab, IDL.
   * Interest in revisiting the project at a later date.

Application deadline:

   * We will accept applications on a continuing basis until
     the beginning of summer, and will take good applicants as they apply.

Please send any questions by e-mail to the addresses below, and use
"PI for Autonomous data analysis" as your subject line. 

Dr. Ted Roush:    [email protected]
Dr. Wray Buntine:  [email protected]

Full applications (which must include a resume and the names and addresses
of at least two people familiar with your work) should be sent by surface
mail (no e-mail, ftp or html applications will be accepted) to:

Dr. Steve Lesh
Attn:  summer student for Autonomous data analysis
Mail Stop  269-1 
NASA Ames Research Center
Moffett Field, CA, 94035-1000

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 21 Feb 1997 14:11:12 -0500
From: [email protected] (Brij Masand)
Subject:  KDD Job at GTE Laboratories, Waltham, Ma

**** An Outstanding Applied Researcher/Developer needed for the   ********** 
**** Knowledge Discovery in Databases project at GTE Laboratories **********


Description: Participate in the design and development of
state-of-the-art systems for data mining and knowledge discovery. The
focus of the job is on applied research in KDD, including development
of prototypes to demonstrate innovative business applications of KDD.

	The candidate will join one of the leading R&D teams in the
area of data mining and knowledge discovery. Our current projects
include predictive customer modeling for GTE's cellular telephone
markets.  We are applying multiple learning and discovery methods to
very large, high-dimensional real-world databases, involving millions
of records and Gbytes of data and have created KDD-based solutions
that are being deployed in the field.

	The ideal candidate will have a Ph.D. in Machine Learning or
related fields and 2-3 years of experience, or an M.S. with equivalent
experience.  The candidate should have experience with machine
learning algorithms, be familiar with statistical theory, have
practical experience with databases, and be proficient with
Web/Internet tools.  Excellent coding skills in C/Unix environment and
an ability to quickly pick up new systems and languages are needed.  Good
communication skills, the ability to work in a team, and good coding
and system maintenance practices are very desirable.

GTE Laboratories Incorporated, located in Waltham, MA, is the central
research facility for GTE.  GTE is among the largest local
exchange telephone carriers and is the second-largest mobile service
provider in the United States.  Our research facility is located in a
quiet 50-acre, campus-like setting in Waltham, MA, 20 minutes from
downtown Boston.  Our salaries are competitive, and our outstanding
benefits include medical/life/dental insurance, savings
and investment plans, and an on-site fitness center.

Please send a resume and a cover letter
(preferably by e-mail, in ASCII) to:

[email protected]

or by fax to 617.466.3342 (Attn: Brij Masand)

I will be travelling until March 12 and will reply to e-mail responses
after that. Thanks! -- Brij Masand  ([email protected])

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: Two positions in Machine Learning/Data Mining at GMD
Date: Fri, 28 Feb 97 13:55:06 +0100
From: [email protected]

Two positions in Machine Learning/Data Mining at GMD

GMD's FIT.KI department (the AI research division of the
Institute for Applied Computer Science) is looking to
fill two scientist positions (M.S./Diplom or postdoc level) in the area of

   Machine Learning/Data Mining.

We are looking for excellent people with a strong background in one
or both of these areas, preferably combining both theoretical/scientific
and application/software-engineering skills.  Applications at both the 
postdoctoral and the M.S. level are welcome.

You will be working as a research scientist in one of our current 
ML/DM projects, KESO or ILP2, and will be part of FIT's data mining
group, which currently consists of 4 people.  Scientific work, writing and 
presentation of papers, and application and software work will all be 
part of your job.  M.S. level applicants will be given time to complete their
Ph.D.s while at GMD.

Both positions are to be filled as soon as possible, initially for two or
three years, renewable for up to five years.  Salary follows the BAT IIa
tariff, approximately DEM 50,000 to DEM 80,000 depending on age,
qualifications, and marital status.  For more information about FIT.KI, see 
http://nathan.gmd.de, for more information about the ML/data mining group, see
http://nathan.gmd.de/projects/ml/home.html.

If you are interested in such a position, please send your application 
material to
   Dr. Stefan Wrobel
   GMD, FIT.KI
   Schloss Birlinghoven
   53754 Sankt Augustin
   Germany
   [email protected]
to be received no later than March 23, 1997 (preferably by paper mail, 
but e-mail is acceptable if you cannot otherwise meet the deadline).  Please 
include at least a brief curriculum vitae, description of your qualifications, 
research experience and future research interests, degree/grade information 
(if relevant) and if applicable, a selection of three of your best publications 
(full text copy).  We look forward to your application!

--------------------------------------------------------------
Dr. Stefan Wrobel
GMD -- German Natl. Research Center for Information Technology
FIT.KI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Tel.: +49/2241/14-0, Fax: -2889  E-Mail: [email protected]
WWW http://nathan.gmd.de/persons/stefan.wrobel.html
Secr.: D. Boethgen Tel. -2731, E-Mail: [email protected]
410.18. "97:09" by IJSAPL::OLTHOF (Spellchecked Henry Although) Thu Mar 13 1997 10:15 (576 lines)
Knowledge Discovery Nuggets 97:09, e-mailed 97-03-10

News:
	* P. Domingos, Re: Looking for phrase matching tool
	* R. Jain, Tandem Data Mining Announcement, 
		http://www.tandem.com	
Siftware:
	* R. Quinlan, C5.0: Successor to C4.5, 
		http://www.rulequest.com
Positions:
	* P. Norvig, Job offered in information extraction and learning,
 	data mining, http://www.junglee.com 
	* M. Bramer, Research Fellowship in Knowledge Discovery
	* X. Liu, Research Studentship in Intelligent Data Analysis,
		http://web.dcs.bbk.ac.uk/~hui/IDA/home.html
	* D. Sleeman, University of Aberdeen, Chair of Computing Science
		http://www.csd.abdn.ac.uk/people/chair_fp.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the latest 
research and applications.

Submissions are most welcome and should be emailed, with a DESCRIPTIVE 
subject line (and a URL) to [email protected]. 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets is published 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), 
and a wealth of other information on Data Mining and Knowledge Discovery are
available at the Knowledge Discovery Mine site http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ***************************** 
All opinions expressed herein are those of the contributors and not 
necessarily of their respective employers, or of KD Nuggets
***********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
There is no security, only opportunity
			General MacArthur
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
To: [email protected]
cc: [email protected], [email protected] 
Subject: Re: Looking for phrase matching tool 
Date: Fri, 28 Feb 1997 13:43:11 -0800
From: "Pedro M. Domingos" <[email protected]>

Alvaro Monge and Charles Elkan of UC San Diego ([email protected],
[email protected]) have one such program. They have a paper in the
proceedings of KDD-96 (p. 267) that describes their system, and also gives
references to other work in the area.

Pedro

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[Note: the following is a commercial announcement. GPS]
From: JAIN_ROHIT%t16@fedex
Date: 28 Feb 97 15:08:00 -0600
To: [email protected]
Cc: [email protected], [email protected] 
Subject: Tandem's Feb. 11 announcement

Hi folks,

Nuggets seems to cover announcements made by many companies.
I am wondering what would be needed on Tandem's part to have you include
that announcement in Nuggets.  You can get to the announcement from our home
page at http://www.tandem.com.  I have also included parts of it in this
message.

Rohit Jain

Contact:
            Kristine Austin
            Tandem Computers Incorporated
            Tel: +1 (408) 285 6645
 World Wide Web Home Page Address: http://www.tandem.com

Tandem Object Relational Data Mining Architecture Drives Next Generation of
Knowledge Discovery

Cupertino, CA, February 11, 1997 -- Tandem Computers Incorporated today
announced a revolutionary approach to bringing complete knowledge discovery
to business users through its Object Relational Data Mining technology. For
the first time, the complete warehouse data set is available for real-time
data mining, resulting in reduced processing time, more complete results,
and significantly easier management. This new architecture establishes a
standard SQL interface between client data mining tools and both object
relational and relational database engines. The database engine will perform
specialized data manipulation functions required by the data mining algorithms.
Tandem's Object Relational Data Mining architecture takes full advantage of
the capabilities of relational database engines resulting in the ability to
mine larger volumes of data and better performance.

By integrating the best-of-breed data mining software with a relational
database, Tandem's Object Relational Data Mining will enable business
professionals to more effectively uncover and exploit valuable patterns and
trends hidden in their data. This architecture will enhance knowledge
discovery in solutions such as credit card marketing, claims analysis,
retail basket analysis, and others.

The interface between data mining tools and the database engine is enabled
through the use of SQL extensions, ultimately allowing customers to enjoy a
much wider range of data mining clients. Tandem will promote the
establishment of de facto standards for these extensions with other database
vendors and data mining tool providers. "Initially, the use of SQL
extensions will greatly enhance the way traditional alphanumeric data types
are mined today," said Abhay Mehta, Tandem's director of Object Relational
Data Mining Development. "As technology evolves, this architecture will
enable the fast, efficient mining of more complex data types such as image,
voice, video, and other multimedia objects. In the second half of 1997,
Tandem's ServerWare database will be the first to combine all of the
elements into a powerful knowledge discovery business environment."

"Tandem will be able to build on its success in the data warehouse
marketplace to position itself well in the high-end macromining segment of
the data mining arena," said Dr. Wolfgang Martin, program director, META
Group. "Tandem's approach is unique in that it opens up the powerful
ServerWare database, and other database management systems, to a wide range
of data mining functions while accommodating future data mining developments
and complex data types."

Tandem's data mining partners have been selected so that customers can
benefit from their combined breadth of data mining algorithms and for the
ability of their tools to work in a high-performance parallel environment
necessary to take advantage of this new architecture. Data mining partners
include leading companies such as Angoss Software International Limited,
Data Distilleries B.V., Magnify Incorporated, NeoVista Solutions
Incorporated, and Syllogic B.V.

ANGOSS Software International Limited

ANGOSS' KnowledgeSEEKER excels in applications including fraud detection,
target marketing, process control, and risk management.
KnowledgeSEEKER displays results in a decision tree format by uncovering
valuable relationships and correlations in the dataset, and by writing
predictive rules. This format can be easily understood by any business end
user. KnowledgeSEEKER turns data into valuable business knowledge.

Data Distilleries B.V.

Data Distilleries' Data Surveyor uses highly efficient decision-tree-based
search strategies and database optimization techniques, enabling it to take
into account hundreds of variables to mine finance, retail, insurance, and
database marketing databases. At the end of the data mining process, Data
Surveyor produces a graphical representation of the discovered relationships
and an overview of all actions and results during the mining process.

Magnify Incorporated

Magnify's PATTERN software is an open set of modular software tools for
mining, managing, and analyzing very large data sets. The PATTERN system
includes several specialized applications, such as PATTERN:Detect for
detecting fraud, anomalies, and rare events and PATTERN:Profit for
predicting the delinquency, bankruptcy, credit usage, and profitability of
customers. The PATTERN system incorporates algorithms for parallel and
distributed variants of classification, regression, and optimization trees,
and a variety of other data mining algorithms.

NeoVista Solutions Incorporated

NeoVista Solutions' Decision Series suite of knowledge discovery tools is
directed towards solving data mining challenges in a variety of markets,
including retail, insurance, telecommunications, and healthcare.
The Decision Series suite includes pattern discovery tools based on neural
networks, clustering, genetic algorithms, and association rules.

Syllogic B.V.

The Syllogic Data Mining Tool supports all stages in the data mining
process, including data selection, data cleaning, enrichment, coding,
discovery, and visualization. Using a toolbox approach, the tool combines
various database analysis techniques, such as decision trees, association
rules, k-nearest neighbor, clustering, and visualization to solve business
challenges in the finance, transportation, government, and system and
network management segments.

To help customers stay on the leading edge of data mining, Tandem is also
partnering with key universities such as Simon Fraser University in order to
benefit from the results of their on-going research. This alliance includes
parallelizing existing and next-generation data mining algorithms and
techniques.

"Tandem is making a major investment in data mining and in driving its
widespread deployment as a business tool," said Bill Heil, senior vice
president and general manager of Tandem's ServerWare business unit. "By
focusing on the Tandem ServerWare database engine and partnering with
best-of-breed solutions providers and researchers, we are able to supply
customers with the industry's most advanced and comprehensive range of data
mining solutions. What we are offering is an extensible approach designed to
keep customers at the forefront of the latest developments in knowledge
discovery."

Availability

Tandem's Object Relational Data Mining solutions will be available starting
in the third quarter of 1997. With these solutions, customers will be able
to take advantage of the industry's most scalable performance for mining
databases residing on either Microsoft Windows NT Server-based platforms
(including Tandem's recently introduced S-series servers based on Windows
NT Server) or Tandem's massively scalable NonStop Himalaya servers.

About Tandem

Founded in 1974, Tandem Computers Incorporated designs and delivers
technology solutions that companies rely on to compete in a business world
that runs 24 hours a day. A US$1.9 billion company headquartered in
Cupertino, California, Tandem has offices, strategic partners, and providers
in more than 50 countries around the world.


Tandem, Himalaya, NonStop, Object Relational Data Mining, ServerWare, and
the Tandem logo are trademarks or registered trademarks of Tandem Computers
Incorporated in the United States and/or other countries. Microsoft and
Windows NT are either trademarks or registered trademarks of Microsoft
Corporation in the United States and other countries. All other brand or
product names are trademarks or registered trademarks of their respective
companies.


Tandem Introduces Object Relational Data Mining Solutions and Services for
Vertical Markets

       Business-driven offerings target card marketing, micromerchandising,
claims analysis, and other key applications

Cupertino, CA, February 11, 1997 -- Applying its vertical market expertise and
new Object Relational Data Mining architecture to real-world business
problems, Tandem Computers Incorporated today launched a series of Object
Relational Data Mining solutions packages for card marketing,
micromerchandising, and insurance claims analysis. Tandem also announced new
consulting services designed to allow companies to quickly enjoy low-risk,
discovery-driven decision making.

The solutions and services are based on Tandem's revolutionary new Object
Relational Data Mining architecture. This enables customers to efficiently
mine their entire database, not merely samples, for useful patterns and
trends. The result is a more effective realization of the full business
value of data. "Object Relational Data Mining solutions add significant new
functionality to customer segmentation and predictive modeling techniques,"
said Jonathan Kalman, managing director of MRJ Technology Solutions, a
leading specialty systems integrator. "Tandem is taking a profoundly
different approach by integrating its powerful database, capable of handling
an entire organization's data, with leading data mining tools."

Delivering full value of business data

The new solutions packages will comprise the cross-platform Tandem
ServerWare database; appropriate integrated data mining and other analysis
tools from leading solutions partners; massively scalable Tandem Himalaya
and/or Microsoft Windows NT Server-based hardware platforms (such as Tandem's
S-series); application and reporting templates; data models; and
Directional Consulting services. Though specially tested and packaged, the
solutions are all easily customizable. Initial solutions include:

Card Marketing

Aimed at card acquirers and issuers, this solutions package applies Object
Relational Data Mining architecture and other decision support technology to
improve the effectiveness of cardholder retention and acquisition efforts.
This provides a better understanding of when certain customers are likely to
leave and why, leading to more effective customer segmentation, increased
response rates to marketing promotions, and improved margins through
targeted product development and pricing.

Micromerchandising

This package enables retailers to mine immense volumes of detailed
merchandising data, resulting in improved in-stock positions, reduced
markdowns by better understanding buying patterns and trends, enhanced
promotional effectiveness, and improved store profitability through more
precise forecasting.

Claims Analysis

Aimed at insurance providers looking to contain underwriting costs and
improve loss ratios, this package uses Object Relational Data Mining
technology to support new product development, fraud profiling and
detection, better service provider alliances, and more exact underwriting
experience comparisons.

Initial customer reaction to these benefits is positive. Said Juan
Verastigui, director of Claims System Development at USAA, a leading
insurance company, "Tandem's Object Relational Data Mining architecture and
the way it leverages the parallel ServerWare database will provide USAA with
the ability to derive full value from all our claims data, and not just
subsets. The resulting faster and more complete answers to our business
queries will have a very positive effect on our bottom line."

Looking ahead, Object Relational Data Mining architecture will enable the
mining of complex data types that include voice, video and images.
Said MRJ's Jonathan Kalman, "Object Relational Data Mining solutions provide
immediate value with traditional data types, and extensibility to meet
future multimedia analysis needs."

Directional Consulting, new Object Relational Data Mining services

Tandem's Directional Consulting services are an integral part of the new
solutions packages and are also available separately. These services define
a low-risk, high-return methodology, proven over many Tandem-based data
warehousing implementations, for exploring and understanding how data mining
can support particular business initiatives.

Directional Consulting services use a phased approach to getting data mining
production environments up and running within 90 days. The process begins
with establishing priorities for implementation of Object Relational Data
Mining and proceeds to a "proof of concept" phase to verify that the
selected data mining solutions will meet expectations.
System design, data modeling, and implementation then follow, culminating
with the establishment of a robust, scalable operational environment that
supports application evolution and growth.

Availability

Tandem Card Marketing, Micromerchandising, and Claims Analysis solutions
will be available beginning in the first quarter of 1997. These will be
enhanced to take advantage of Object Relational Data Mining technology in
the third quarter of 1997.

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 5 Mar 1997 23:31:07 -0500 (EST) 
From: [email protected] (Ross Quinlan) 
Subject: Successor to C4.5

I have developed a new inductive program called C5.0.  Its main advantages are:

    * new, faster methods for generating rules
    * support for boosting
    * optional non-uniform misclassification costs
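The announcement does not say how C5.0 applies misclassification costs. As a
generic illustration only (not C5.0's actual algorithm), the sketch below
shows the standard idea behind cost-sensitive prediction: given estimated
class probabilities and a cost matrix, predict the class with the lowest
expected cost. The function name `min_cost_class` and the fraud/ok costs are
hypothetical.

```python
"""Minimum-expected-cost classification: a generic illustration of
non-uniform misclassification costs (not C5.0's actual code)."""
from typing import Dict


def min_cost_class(probs: Dict[str, float],
                   cost: Dict[str, Dict[str, float]]) -> str:
    """Return the class whose prediction has the lowest expected cost.

    probs[t]    -- estimated probability that the true class is t
    cost[p][t]  -- cost of predicting p when the true class is t
    """
    def expected_cost(pred: str) -> float:
        return sum(probs[true] * cost[pred][true] for true in probs)

    return min(cost, key=expected_cost)


# Hypothetical example: missing a fraud is 10 times worse than a false alarm.
probs = {"fraud": 0.2, "ok": 0.8}
cost = {
    "fraud": {"fraud": 0.0, "ok": 1.0},   # false alarm costs 1
    "ok":    {"fraud": 10.0, "ok": 0.0},  # missed fraud costs 10
}
uniform = {
    "fraud": {"fraud": 0.0, "ok": 1.0},
    "ok":    {"fraud": 1.0, "ok": 0.0},
}

print(min_cost_class(probs, cost))     # fraud (expected cost 0.8 vs 2.0)
print(min_cost_class(probs, uniform))  # ok    (expected cost 0.2 vs 0.8)
```

With uniform costs the more probable class wins; with skewed costs the same
probabilities can lead to the opposite decision.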

Further information and free demonstration versions are available from

    http://www.rulequest.com

Ross Quinlan

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 28 Feb 1997 15:33:40 -0800
From: [email protected] (Peter Norvig)
Organization: Junglee Corp.
To: [email protected], [email protected], [email protected],
        [email protected], [email protected], [email protected] 
Subject: Job offered in information extraction and learning, data mining

Junglee is looking for full-time employees and summer interns to work on
information discovery and data mining from text documents.  We're 
looking for creative hard-working people with experience in some of the
following: agents, databases, information extraction, parsing, regular
expressions, language design, statistics, machine learning, and GUI 
design.

Junglee develops Internet and Intranet information technology for the 
future and pushes it to market today. Technology that raises eyebrows 
and drops barriers.  Founded in 1996 by four PhD students from the 
Stanford University Computer Science Department and a Silicon Valley 
veteran, Junglee Corporation has excellent funding, high-profile 
customers, and a strong revenue plan.

Our Virtual DataBase (VDB) engine is fueled by our capabilities in data 
source description, extraction, and attribute mapping.  Imagine 
capturing data from hundreds of disparate unstructured web sites, 
mixing that with data from other heterogeneous, distributed database 
and non-database sources and turning it all into a relational aggregate 
with the power of full SQL queries and the ease and portability of 
HTML user interfaces.  We call these applications PALs - powerful 
information sites where people can ask for and get an answer. 
Several of our PALs are up on the web today at www.junglee.com and
www.washingtonpost.com; we are currently building more of them for 
some well-known companies.

One of the key aspects of the technology is discovering/mining 
information from text. The project is led by Peter Norvig, who has done 
extensive work on Natural Language Processing, Machine Learning, and 
other Artificial Intelligence problems. While this project involves 
significant ground-breaking research, it is definitely a development 
project, not just research.

Please send responses to [email protected] or by fax to 408-522-9470 
and mention this posting.


-- 
Peter Norvig, Junglee Corporation
1250 Oakmead Parkway, Suite 310
Sunnyvale CA 94086
phone: 408-522-9482   fax: 408-522-9470
[email protected]
http://www.junglee.com   http://www.norvig.com


>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: "Max Bramer" <[email protected]>
Organization: University of Portsmouth
To: [email protected], [email protected],
[email protected],
        [email protected], [email protected], [email protected] 
Date: Sat, 1 Mar 1997 17:05:45 +0000
Subject: Research Fellowship in Knowledge Discovery 
Reply-to: [email protected]

UNIVERSITY OF PORTSMOUTH

DEPARTMENT OF INFORMATION SCIENCE

RESEARCH FELLOWSHIP IN KNOWLEDGE DISCOVERY

Salary:  £17,472 - £20,381  (Pay award pending)

Closing Date:  21 March, 1997
(Note: This is an extension to the previously announced closing date.)

Reference:  RTEC 0149 (G)

Applications are invited for a two-year Research Fellowship in the
Department of Information Science to commence as soon as possible.

The successful candidate will work closely with Professor Max Bramer (Head
of the Department of Information Science) to develop research in the area of
Knowledge Discovery and Data Mining. The Department currently has projects
in the sub-areas of automatic induction of classification rules from
examples, Case Based Reasoning, Neural Networks, Genetic Algorithms and
related statistical techniques.

Applicants should have a good honours degree in Computer Science or related
subject. Preference will be given to candidates who have (or expect soon to
receive) a higher degree in a relevant discipline.
Relevant commercial experience would also be an advantage.

Informal enquiries may be made to Professor Bramer, either by telephone
(01705) 844444 or by electronic mail ([email protected]), or to Simon
Thompson on (01705) 844097 ([email protected]).  Further information
about the department is also available from the  World Wide Web at
http://www.sis.port.ac.uk.

Further particulars are available from:

Personnel Office
University House
Winston Churchill Avenue
Portsmouth PO1 2UP
England

Telephone (01705) 843421  (24 hour answerphone)
E-mail:  [email protected]
http://www.port.ac.uk/

IMPORTANT NOTE: All applications should be sent (preferably on paper, not by
email) to the Personnel Office, NOT to the Department of Information Science.
_______________________________________________________

Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444    Fax: +44-(0)1705-844006
email: [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected] (Xiaohui Liu)
Date: Tue, 4 Mar 97 12:17:57 GMT
To: [email protected]
Subject: Re: EPSRC CASE Research Studentship in Intelligent Data Analysis

BIRKBECK COLLEGE
                DEPARTMENT OF COMPUTER SCIENCE
                     UNIVERSITY OF LONDON


   EPSRC CASE Research Studentship in Intelligent Data Analysis


Applications are invited for an EPSRC CASE PhD studentship within the
Intelligent Data Analysis (IDA) Group at the Department of Computer
Science, Birkbeck College. The three-year studentship is for the
investigation of intelligent data analysis techniques for research
problems in process industries, funded by Honeywell Hi-Spec
Solutions, UK and Honeywell Technology Center, USA. The successful
candidate will receive a tax-free salary of at least 10,000 pounds (with
additions for experience, age and dependants), and will be expected to
work on a joint research project between Birkbeck and Honeywell on "Causal
Modeling for Time Series Data".

The IDA Group at Birkbeck conducts research into the application of
computationally intelligent techniques to data analysis problems.
The group has enjoyed successful collaboration with several external
organisations in industry and medicine on a variety of IDA research
projects, funded by government agencies, industrial sponsorships and
charity organisations. The group is to host the second IDA conference in
London this August.

Applicants should have at least a 2(i) in Computer Science or a related
subject, with a good background in Artificial Intelligence or Statistics,
or a 2(i) in Chemical Engineering with a strong computing background.
Please submit a CV as soon as possible, but not later than 31 March 1997,
to Dr X Liu, Department of Computer Science, Birkbeck College, Malet
Street, London WC1E 7HX, UK. Phone Dr Liu on 0171-631 6711 or email him
([email protected]) if you wish to make an informal enquiry.

Information  regarding this project  and research activities of the IDA
Group at Birkbeck can be accessed on the World Wide Web via URL:

          http://web.dcs.bbk.ac.uk/~hui/IDA/home.html


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Derek Sleeman <[email protected]>
Date: Sun, 2 Mar 1997 15:01:52 GMT
To: [email protected], [email protected], [email protected],
        [email protected], [email protected], [email protected], [email protected]
Cc: [email protected]
Subject: CHAIR VACANCY (for Posting)

Announcement of Post (Closing date: early MARCH)

University of Aberdeen

Chair of Computing Science


Applications are invited for the post of Professor of Computing
Science. The new Professor will play a key role in strengthening
the teaching and research activities of the Department of
Computing Science. The new Professor will provide academic
leadership in the development of the Department's existing areas
of interest, Artificial Intelligence and Databases. Candidates
should have an international reputation with an excellent record
of innovative research as measured by publications and grant
income. Applications from academics, research managers and others
from industry and public-sector institutions will be considered.
Further, as the University of Aberdeen has recently made a major
research investment in the Institute of Medical Sciences, it would
be an advantage if the person had experience of working with
Medical/Healthcare professionals. The person appointed will be
expected to acquire a significant role in the management of the
Department.

Informal enquiries may be directed to Professor A R Forrester,
Vice-Principal and Dean of the Faculty of Science and Engineering:

     Email: [email protected]
     Tel: +44 (0)1224 272081
     Fax: +44 (0)1224 272082

More details of the Department's research activities can be found
on our research pages at http://www.csd.abdn.ac.uk/research/index.html
or contact Professor Derek Sleeman, Head of Department:

     Email: [email protected]
     Tel: +44 (0)1224 272295/6
     Fax: +44 (0)1224 273422

For further particulars of this post, see:

http://www.csd.abdn.ac.uk/people/chair_fp.html
410.19 "97:10" by IJSAPL::OLTHOF (Spellchecked Henry Although) Fri Mar 21 1997 14:47 (793 lines)
Knowledge Discovery Nuggets 97:10, e-mailed 97-03-19
News:
    * J. Brown, Report on DM Summit in San Francisco, Feb 18-21, 1997
    * B. Pearlmutter, Abbadingo One: DFA Learning Competition
        http://abbadingo.cs.unm.edu/
Siftware:
    * K. Schirmer, smart information services GmbH,
        http://www.newscan-online.de
Positions:
    * G. John, IBM DATA MINING ANALYST POSITIONS,
        http://www.ibm.com/bi
    * B. Perry, HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java)
        http://www.wins.hrl.com
Meetings:
    * M. Bramer, Expert Systems 97: Call for Papers 
        http://www.sis.port.ac.uk/sges/es97.html
    * M. Smyth, Hinton-Jordan Learning Methods Tutorial, May 1997,
        http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
    * L. De Raedt, Final call for IJCAI-97 Workshop on 
	Frontiers of inductive logic programming
    * S. Dzeroski, ILP-97: CFP Reminder
	http://www-ai.ijs.si/SasoDzeroski/ilp97.html
--

Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining 
and Knowledge Discovery community, focusing on the latest research and 
applications.

Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject 
line (and a URL) to [email protected].  Please keep meeting announcements 
short and put all the details on the meeting web page!

To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a 
wealth of other information on Data Mining and Knowledge Discovery are available 
at the Knowledge Discovery Mine site http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ************************************ 
All opinions expressed herein are those of the contributors and not necessarily 
of their respective employers or of KD Nuggets
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Knowledge is the antidote to fear
		Ralph Waldo Emerson
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 17 Mar 1997 21:12:01 -0600
From: "J.P.Brown" <[email protected]>
Subject: Second Annual Data Mining Summit

The Second Annual Data Mining Summit was held, February 19-21, 1997,
at the San Francisco Regency Hyatt. As I was not at every session, this
is a generalization - no names, no pack drill.

The majority of the delegates were from the United States and Canada.
Nine other countries were represented, from Europe, South America and
Asia. There were presentations all the way from the "Biggies" to the
"Start-Ups". From the Past to the Present, there were papers on 
specific Data Mining techniques, and much reliance on subjective 
approaches. A thought-provoking paper with present-day relevance 
covered the Public Perception of Data-Mining. From the Present to the 
Future, there were extensions to accepted ideas and some concepts 
moving toward a more controversial emphasis on objectivity.

The Basics, and some Specialties, were covered in detail, and 
attention was paid to the Dimensions of Decision Support and to 
On-Line Analytical Processing, both subjects of great importance. 
Some intensely practical, no-nonsense success stories were presented,
and some novel perspectives on iterative "living" processes.

As well as successful Data Mining examples, Limitations, Challenges 
and Possible Pitfalls were pointed out. Solutions were suggested. 
Before these demonstrably useful techniques can become the workhorses
of the future, a new generation of Tool Support must prove 
itself to be effective. This has begun to happen, and the competition 
between these new user-friendly applications will be interesting to 
participate in. 

Little attention was paid to variation over time.
There seems to be a prevalent assumption that "situations" will not
change. This is "writing the history of the future" as opposed to the
approach which starts off by "predicting the past", and then keeps a
constant, trigger-happy lookout for significant change.

The approaches considered varied from simple functions, 
to Algorithms, to Genetic Algorithms. Complex hybrid populations 
could be separated in several ways. Rules could be used, as could 
Artificial Neural Nets. Agents could do it, if they were made to be 
versatile enough. Visualization was important because we can "think 
with our eyes". Some of you will know that I am of the "all of the 
above" school.

From my own personal point of view, the Data Mining Summit was
encouraging. The next move will be to put the pieces together, and to
consciously emphasize our goals. Those who want to know more about
the "all of the above" school could try http://www.hal-pc.org/~jpbrown
and then let me know what they think.


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Sun, 9 Mar 97 23:45 MST
From: "Barak Pearlmutter" <[email protected]> 
To: [email protected]
Subject: Abbadingo One: DFA Learning Competition 

Thought database miners might want to cut their teeth on these little
datasets. Although neither as big nor as lucrative as the big boys, they
are a bit more controlled, and give an opportunity to test an algorithm
against all the competition.


	       Abbadingo One: DFA Learning Competition

			     Announcement
				  &
			Call for Participation

In order to encourage the development of better grammar induction
algorithms, the Abbadingo One competition will award at least $1,024 to
the designer of the system that is most successful at discovering the
structure of random deterministic finite automata, as assessed by a
graded series of nine benchmark problems.  The competition ends on
15-Nov-1997. 
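To make the task concrete, here is a minimal sketch of the problem setup
only, not of any learning algorithm: a hidden DFA labels strings as
accepted or rejected, and a learner sees only the labelled strings. The
binary alphabet, the state count, and the random sampling scheme below are
illustrative assumptions, not the competition's actual generator; all names
(`DFA`, `random_dfa`, `labelled_sample`) are hypothetical.

```python
"""Sketch of a grammar-induction setup: labelled strings from a hidden DFA.
Illustrative only; not the Abbadingo benchmark generator."""
import random
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class DFA:
    start: int
    accepting: Set[int]
    delta: Dict[Tuple[int, str], int]  # (state, symbol) -> next state

    def accepts(self, s: str) -> bool:
        """Run the DFA on s (alphabet assumed to be {'0', '1'})."""
        state = self.start
        for ch in s:
            state = self.delta[(state, ch)]
        return state in self.accepting


def random_dfa(n_states: int, rng: random.Random) -> DFA:
    """Random complete DFA over {0,1}; each state accepts with prob. 0.5."""
    delta = {(q, a): rng.randrange(n_states)
             for q in range(n_states) for a in "01"}
    accepting = {q for q in range(n_states) if rng.random() < 0.5}
    return DFA(start=0, accepting=accepting, delta=delta)


def labelled_sample(dfa: DFA, n: int, max_len: int,
                    rng: random.Random) -> List[Tuple[str, bool]]:
    """Draw n random binary strings and label them with the hidden DFA."""
    sample = []
    for _ in range(n):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice("01") for _ in range(length))
        sample.append((s, dfa.accepts(s)))
    return sample


# Even-parity DFA: accepts strings containing an even number of 1s.
parity = DFA(start=0, accepting={0},
             delta={(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
print(parity.accepts("11"), parity.accepts("1"))  # True False

rng = random.Random(42)
target = random_dfa(8, rng)  # the "hidden" automaton
for s, label in labelled_sample(target, 5, 10, rng):
    print(repr(s), label)
```

A competitor's program would receive only the printed string/label pairs and
would have to infer an automaton that labels unseen strings correctly.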

This competition is being sponsored by, among others:
 * The Computer Science Department at the University of New Mexico,
   which is providing computational support.
 * The Kluwer Academic journal "Machine Learning," which will give
   priority treatment to a paper describing the award winning algorithm.
 * The Santa Fe Institute, which will host the award ceremony.
 * The "Journal of Artificial Intelligence Research."

For details retrieve http://abbadingo.cs.unm.edu/

Good luck, and may the best algorithm win!
--
Competition	Kevin J. Lang <[email protected]>
  organizers:	Barak A. Pearlmutter <[email protected]>

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[The following is a commercial announcement. GPS]
Date: Tue, 11 Mar 1997 21:30:46 +0100
From: Kai Schirmer <[email protected]> 
Subject: smart information services GmbH

Hello! 

We would like to introduce ourselves and are interested in being listed
in your company overview on data mining and knowledge discovery.

Formed in early 1995, smart information services GmbH is located in
Potsdam near Berlin in Germany. The company's activities center on
application development, service and research using advanced information
technologies in the area of Intelligent Information Retrieval. 

Smart information is currently developing a news categorizing and
filtering system (newscan) using advanced text processing techniques. 

Further activities focus on fact extraction from financial news and
automated classification of news from business news wires for signaling,
filtering and routing tasks. 

The newscan news filtering system and service offers business
professionals a smart, easy and cost-effective way of staying current
in a rapidly changing world. A true knowledge exchange
company, smart information provides electronic information services
which intelligently interconnect content providers and subscribers. 

Its interactive, customized services include newscan for corporate
workgroups and enterprises. Newscan is a premium business intelligence
service customized to the specific needs of clients that focuses on the
industry news that's critical to their business. It provides customers
with "custom-tailored" news based on a profile that describes their
markets, news needs and specialized interests. Using advanced filtering
techniques, newscan selects highly relevant news by scanning some 3,000
to 4,000 German and English news items daily and delivers only those relevant
to each customer in time for each business day. 

Smart information is a partner in the Esprit project ECRAN. ECRAN will
develop a new generation of Information Extraction (IE) applications, to
be included in telematic services having a large textual content. ECRAN
will analyse free texts (initially, financial information from
specialised newswire services, and market information on the internet)
extracting information content. The information can be compared against
a model of user requirements so that the system can precisely identify
text of interest to a customer. 

Using the results of the ECRAN project, specific financial, economic
and political information will be extracted from standardised news and
stored in a database format. The information extraction is based on
lexicon tuning technologies and sophisticated template handling. Once
stored in a database format, the extracted facts can be analysed in
combination with time series. 

Currently smart information is preparing a European research project on
information mining in heterogeneous environments. The main ideas are
described in the following. 

In the past few years, the abundance of continuous data sources and the
connectivity allowed by local and worldwide public and private networks
have grown steadily at explosive rates, while the price of bandwidth has
continued to fall, and this trend has shown no sign of slowing. As a
result, staggering amounts of new information are continuously made
available to private users, business firms and professional operators.
Extracting the information relevant to a given business or position from
an overwhelming flood of data, and being able to use it for tactical and
strategic planning as well as on-the-fly decision support, is vital for
business survival and leadership, but it is becoming less and less
amenable to human handling. On the other hand, an ever-increasing share
of current information flows passes through computer networks, which
makes it amenable to automatic filtering, processing and interpretation.
Together, these trends demonstrate both the need for and the feasibility
of systems that filter and integrate information from different data
sources, some static and well structured (legacy databases), others
dynamic and with a variable degree of standardization, ranging from
rigidly defined records to multimedia documents, free text, speech and
images. 

Please link to our web-site "www.newscan-online.de".

Yours sincerely 
Kai Schirmer



>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Tue, 11 Mar 1997 20:43:00 -0800 (PST) 
From: George John <[email protected]> 
Subject: IBM DATA MINING ANALYST POSITIONS (please post/redistribute)


IBM DATA MINING ANALYST POSITIONS (please post/redistribute)

Help!  We're drowning in work!  IBM needs 10 more analysts for its
highly successful data mining group.  Join our team of high-caliber
PhDs in an exciting multi-faceted career in data mining:

* Analyze data for customers using IBM's industry-leading data mining 
  products
* Interact directly with senior management at Fortune 500 companies 
* Teach data mining classes to our customers and develop course materials 
* Travel, see the world!  (One member of our team just got back
  from Paris, another is heading to Australia for two weeks... these
  are not vacations, it's their job!)
* Interact with researchers and product developers, discuss ideas for
  new data mining algorithms, new visualizations, and new features
  for our products
* Assist sales reps in customer visits, be the "technical person" to 
  answer hard questions 
* Work with the marketing group to help develop brochures, etc.
* Attend trade shows and conferences, learn more about the industry
  and talk to customers
* Use SQL/AWK/PERL/SAS to process data   (ooh, the excitement!)

The ideal candidate 
* has an excellent understanding of the data analysis process and has
  participated in several projects
* is strongly technically proficient in at least some areas of data 
  mining (background in statistics, machine learning, neural nets, or 
  pattern recognition, or related), with a desire to learn more 
* has excellent communication and presentation skills 
* is a self-starter, good at quickly becoming a productive member of 
  a team
* is a fast learner, can quickly become an expert in a new industry
  and work with IBM consultants to productively apply data mining 
* has some Unix skills, knows enough AWK and PERL to be self-sufficient
  in processing data
* has a good sense of humor, fun to work with, enjoys taking co-workers 
  out to dinner, insists on paying every time, etc...

Positions are available for both senior applicants (professors, PhDs,
MBAs, or 4+ years of relevant business experience) and more junior members
(MS, BS, less job experience).  Salaries are competitive, and based on
experience.  The jobs are focused on business, but some amount of time
spent on research may be negotiated.  IBM's data mining group is growing
quickly, and offers excellent career opportunities. 

For more information on data mining at IBM, see the webpage for IBM
Global Business Intelligence Solutions (our parent organization) at
http://www.ibm.com/bi

Send resume to George H. John, [email protected].
ASCII (plain text) via email is *strongly* preferred.  
Please put "DMJOBS-97:" then your name in the subject.
Hardcopy may be sent to 
George H. John
IBM Almaden Research Center
650 Harry Rd / D2
San Jose, CA 95120-6099
FAX: 408-927-2100 (put "Attn: George John" on cover sheet)

IBM is an equal opportunity employer.

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 12 Mar 1997 15:25:34 -0800
From: [email protected] (Brad Perry) 
Subject: HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java)  http://www.wins.hrl.com

We are looking to fill an intern or part-time position with a PhD candidate at Hughes Research Laboratories (HRL).  The position will be a summer internship that can extend into a part-time position during the school year.  HRL is located in Malibu, CA and is the central research lab for Hughes Electronics Corporation.

Our group is investigating the use of agent, data mining, and database technologies to support information management, discovery, and analysis in large-scale dynamic Internet environments.
Our two primary research areas involve:
   * Information exploitation techniques to effectively identify and disseminate semantically relevant information to large user populations, especially with the use of satellite broadcast channels.
   * Data mining techniques to extract, represent, and manipulate semantic cues from large-scale and distributed information sources.

The candidate should have a background in DAI, agent architectures, machine learning, and data mining.  Experience with KQML, KIF, and/or Java is a definite plus.  This position entails research and prototype development.

Required: 
  * PhD candidate in Computer Science (or related field)
  * Good OO programming skills (implementation of prototypes will
    be required).
  * Unix programming background.
  * Good oral and written communication skills.
	
Desirable:
  * Machine Learning or Data Mining background
  * Java programming experience (or C/C++, at least).
  * Ontologies.
  * Multidatabase systems.
  * Distributed object systems (CORBA, RMI, etc.)


Please email your resume to Son Dao at [email protected], or mail to:
   Son Dao 
   Hughes Research Laboratories
   3011 Malibu Canyon Road
   Malibu, CA 90265

HRL is an equal opportunity employer.  

------
Brad Perry
  Hughes Research Laboratories  [email protected] (310) 317-5683
  UCLA                          [email protected]   (310) 206-4561

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: "Max Bramer" <[email protected]> 
To: [email protected], [email protected], [email protected],
        [email protected], [email protected], [email protected] 
Date: Sun, 9 Mar 1997 17:05:52 +0000
Subject: Expert Systems 97: Call for Papers 
Reply-to: [email protected]

BRITISH COMPUTER SOCIETY
SPECIALIST GROUP ON EXPERT SYSTEMS

ANNUAL CONFERENCE - EXPERT SYSTEMS '97 (ES97)

CALL FOR PAPERS

The 17th annual Conference of the British Computer Society Specialist Group
on Expert Systems, ES97, is being held at St. John's College, Cambridge
between 15th and 17th December 1997.  The objective of the ES series of
conferences is to bring together researchers and application developers
from business, industrial and academic communities to discuss issues and
solutions to problems based on techniques derived from Artificial
Intelligence. 

The Conference continues to build on the success of previous years, with a
two-track event containing fully refereed technical and applications
papers. 

For the Technical Stream, contributions are invited in the form of papers
of up to 5,000 words on knowledge-based systems and related areas of
Artificial Intelligence. Papers representing original work on theoretical
and applied AI relating to: constraint satisfaction; intelligent agents;
knowledge engineering methods; machine learning; model-based reasoning;
verification and validation of KBS; natural language understanding;
case-based reasoning, knowledge discovery in databases and other related
areas are welcome. 

For the Applications Stream, contributions are invited in the form of
papers of up to 5,000 words presenting case studies of knowledge based
systems that address real-world problems such as: diagnosis, monitoring,
scheduling and selection. Most importantly, the papers should highlight the
critical elements of success and the lessons learned. 

Papers submitted to both streams will be refereed and those accepted will
again be published in book form in the "Research and Development in Expert
Systems" and "Applications and Innovations in Expert Systems" series (for
the technical and application streams respectively). 

To assist us with our planning of the conference, anyone intending to
submit a paper should provide a short abstract, with title, at the earliest
opportunity to the Conference Secretariat. 

Authors should indicate the stream to which their papers are being
submitted. Please include your full name and postal address in any email
submissions. 

Formatting instructions for papers will be sent as soon as the title and
abstract are received. 

Four copies of papers should be submitted to arrive no later than Friday
20th June 1997. Submissions should be sent in paper form by post to the
Conference Secretariat. 

Please note that presenters of submitted papers will be asked to cover
their costs of attending the conference by paying at the SGES members'
academic rate. 

TUTORIALS & WORKSHOPS

The Conference Committee invites proposals for tutorials or workshops to be
presented on Monday 15 December. Proposals for full- and half-day tutorials
from an individual or a group of presenters should be directed in the first
instance to the Conference Secretariat. 

EXHIBITION

A table top exhibition will run alongside the Conference. There will be a
limited number of spaces available and potential exhibitors are encouraged
to book early, as these will be on a first-come, first-served basis. 

SPONSORSHIP

The Conference Committee is keen to make contact with any organisations who
may wish to sponsor the Conference, in whole or in part. Sponsorship of an
international conference such as ES97 will ensure the highest visibility
for the benefactor, both through the appearance of the company logo on all
promotional literature and in references to the Conference in all media
exposure prior to and after the event. 

CONFERENCE COMMITTEE:

Conference Chair: Prof Max Bramer, University of Portsmouth, Southsea, PO4
8JF [email protected]

Deputy Conference Chair: Dr Ian Watson, University of Salford, Salford, M5
4WT [email protected]

Technical Programme Chair: Dr John Hunt, University of Wales, Dept of
Computer Science, Aberystwyth, Dyfed SY23 3DB [email protected]

Applications Programme Chair: Mrs Ann Macintosh, Artificial Intelligence
Applications Institute, Edinburgh, EH1 1HN [email protected]

CONFERENCE SECRETARIAT:

Ms. Kit Stones, The Conference Team 
17 Spring Road 
Kempston, Bedford MK42 8LS 
Tel/Fax +44 (0)1234-302490
[email protected]

IMPORTANT DATES:

Title/Abstract notification: now 
Full paper submission: 20 June 1997
Notification of acceptance: 8 August 1997 
Camera ready papers due: 19 September 1997

WORLD WIDE WEB ADDRESS FOR CONFERENCE INFORMATION:
http://www.sis.port.ac.uk/sges/es97.html
_______________________________________________________

Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444    Fax: +44-(0)1705-844006 
email: [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Hinton-Jordan Learning Methods Tutorial, May 1997 
Date: Mon, 10 Mar 1997 06:09:19 -0500 (EST) 

        **************************************************************
        ***                                                        ***
        ***     Learning Methods for Prediction, Classification,   ***
        ***       Novelty Detection and Time Series Analysis       ***
        ***                                                        ***
        ***          Washington, D.C., May 2 -- 3, 1997            ***
        ***                                                        ***
        ***        Geoffrey Hinton, University of Toronto          ***
        ***      Michael Jordan, Massachusetts Inst. of Tech.      ***
        ***                                                        ***
        **************************************************************


A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2-3, 1997, at the Hyatt Regency on Capitol Hill, Washington
D.C.  The course is limited to 50 participants. 

The course will provide an in-depth discussion of the large collection
of new tools that have become available in recent years for developing
autonomous learning systems and for aiding in the analysis of complex
multivariate data.  These tools include neural networks, hidden Markov
models, belief networks, decision trees, memory-based methods, as well
as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection, time
series analysis, diagnosis, optimization, system identification and
control, exploratory data analysis and many other problems in
statistics, machine learning and data mining. 

The course will be devoted equally to the conceptual foundations of
recent developments in machine learning and to the deployment of these
tools in applied settings.  Case studies will be described to show how
learning systems can be developed in real-world settings.  Architectures
and algorithms will be presented in some detail, but with a minimum of
mathematical formalism and with a focus on intuitive understanding.
Emphasis will be placed on using machine learning methods as tools that can be
combined to solve the problem at hand. 

WHO SHOULD ATTEND THIS COURSE?

The course is intended for engineers, data analysts, scientists,
managers and others who would like to understand the basic principles
underlying learning systems.  The focus will be on neural network models
and related graphical models such as mixture models, hidden Markov
models, Kalman filters and belief networks.  No previous exposure to
machine learning algorithms is necessary although a degree in
engineering or science (or equivalent experience) is desirable.  Those
attending can expect to gain an understanding of the current
state-of-the-art in machine learning and be in a position to make
informed decisions about whether this technology is relevant to specific
problems in their area of interest. 

COURSE OUTLINE

Overview of learning systems; LMS, perceptrons and support vectors;
generalized linear models; multilayer networks; recurrent networks;
weight decay, regularization and committees; optimization methods;
active learning; applications to prediction, classification and control

Graphical models: Markov random fields and Bayesian belief networks;
junction trees and probabilistic message passing; calculating most
probable configurations; Boltzmann machines; influence diagrams;
structure learning algorithms; applications to diagnosis, density
estimation, novelty detection and sensitivity analysis

Clustering; mixture models; mixtures of experts models; the EM
algorithm; decision trees; hidden Markov models; variations on hidden
Markov models; applications to prediction, classification and time
series modeling

Subspace methods; mixtures of principal component modules; factor
analysis and its relation to PCA; Kalman filtering; switching mixtures
of Kalman filters; tree-structured Kalman filters; applications to
novelty detection and system identification

Approximate methods: sampling methods, variational methods; graphical
models with sigmoid units and noisy-OR units; factorial HMMs; the
Helmholtz machine; computationally efficient upper and lower bounds for
graphical models
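As a concrete instance of the first topic in the outline, here is a minimal sketch of the LMS (Widrow-Hoff) rule for a single linear unit. It is an illustration only, not taken from the course materials; the function name `lms_train`, the step size and the epoch count are assumptions.

```python
import random


def lms_train(samples, n_features, step=0.05, epochs=50, seed=0):
    """Least-mean-squares (Widrow-Hoff) training of one linear unit.

    samples: list of (x, y) pairs, x a list of floats, y a float.
    Each update is w <- w + step * (y - w.x) * x; w[0] is the bias weight.
    """
    rng = random.Random(seed)
    w = [0.0] * (n_features + 1)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                    # stochastic (per-sample) updates
        for x, y in data:
            xb = [1.0] + list(x)             # constant input 1.0 for the bias
            pred = sum(wi * xi for wi, xi in zip(w, xb))
            err = y - pred
            w = [wi + step * err * xi for wi, xi in zip(w, xb)]
    return w


if __name__ == "__main__":
    # Noise-free data from y = 2 + 3x on a grid in [-1, 1].
    samples = [([k / 10], 2 + 3 * (k / 10)) for k in range(-10, 11)]
    print(lms_train(samples, n_features=1))
```

With noise-free data the weights approach (2, 3); with noisy data, constant-step LMS instead fluctuates around the least-squares solution, more tightly for smaller step sizes.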

REGISTRATION

Standard Registration: $700

Student Registration:  $400

Cancellation Policy: Cancellation before Friday April 25th, 1997, incurs
a penalty of $150.00. Cancellation after Friday April 25th, 1997, incurs
a penalty of one-half of Registration Fee. 

Registration Fee includes Course Materials, breakfast, coffee breaks, and lunch.

On-site Registration is possible. Payment of on-site registration must
be in US Dollar amounts, by Money Order or Check (preferably drawn on a
US Bank account). 

Those interested in participating should return the completed
Registration Form and Fee as soon as possible, as the total number of
places is limited by the size of the venue. 

[edited for space]
ADDITIONAL INFORMATION
A registration form and hotel information 
are available from the course's WWW page at 

 http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

 Marney Smyth
 E-mail: [email protected]
 Phone:  617 258-8928
 Fax:    617 258-6779
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 14 Mar 1997 16:47:02 +0100 (MET)
From: Luc De Raedt <[email protected]>
To: [email protected], [email protected]
Subject: Final CFP Frontiers of ILP Workshop at IJCAI

FINAL CALL FOR PARTICIPATION and PAPERS

                     IJCAI-97 Workshop on 

	     FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING 

                   Monday  25 August 1997

==========================================================================

GENERAL INFORMATION 

The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan, 
will take place on August 25, immediately prior to 
the start of the main IJCAI conference. 

TECHNICAL DESCRIPTION

Inductive logic programming (ILP) is a recent subfield of
artificial intelligence that studies the induction of first order formulae 
from examples. The purpose of this workshop is twofold:
on the one hand, we wish to widen the scope of ILP
by investigating its relations to neighboring fields, 
and on the other hand, we wish to make ILP more accessible 
for researchers from neighboring fields.

The workshop therefore solicits papers
that lie at the frontiers of ILP with neighboring fields.
A non-exclusive list of interesting topics for the workshop includes:

* ILP and Software Engineering: 
  what does ILP have to offer Software Engineering, 
  and in what way can Software Engineering help to design ILP systems
  and applications?

* ILP for Knowledge Discovery in Databases: ILP aims 
  at learning complex rules involving multiple relations from small 
  databases, whereas KDD typically induces simple rules about a 
  single relation from a large database. Furthermore, ILP can
  exploit background knowledge in a variety of ways. Can KDD and ILP be  
  successfully combined?  

* ILP and Computational or Algorithmic Learning Theory:
  though many results have been obtained concerning learnability
  in inductive logic programming, most of them are negative,
  and most of the positive results are reducible to propositional learning 
  methods. Is there a mismatch between COLT and ILP, and if so,
  what can be done about it?

* ILP versus propositional learning methods:
  Since the very start of ILP, researchers and practitioners of 
  machine learning have wondered about the relation between 
  ILP and propositional learning methods. Theoretical and experimental 
  questions that arise include:
  when to use ILP and when to use propositional learning methods?
  under what circumstances can ILP be reduced to propositional learning?
  what is the price to pay, in terms of efficiency, for using first-order
  logic?

* ILP and Knowledge Representation: ILP has traditionally employed
  computational logic to represent hypotheses and observations.
  Alternative well-founded knowledge representation formalisms have received
  little attention (with the exception of CLASSIC). 
  What can ILP learn from Knowledge Representation,
  and in which well-founded Knowledge Representation formalisms
  is induction feasible?

* ILP in multistrategy learning: Multistrategy learning 
  combines multiple learning strategies. What role can ILP
  play in multistrategy learning?

* ILP and Probabilistic reasoning: in contrast to 
  propositional learning methods, ILP has not used
  probabilistic representations. How can ILP incorporate
  such representations, and how can it interact with
  methods such as Bayes nets or Hidden Markov Models?

* ILP for Intelligent Information Retrieval: 
  The rapid development of
  the World Wide Web has spawned significant interest in intelligent
  information retrieval. In particular, algorithms that reliably
  classify textual documents into given categories (such as
  interesting/uninteresting) would be useful for a wide variety of tasks.
  Currently, most learning algorithms are not able to make use of
  structural information such as word order, successive words, structure of
  the text, etc. Can ILP algorithms offer advantages over conventional
  information retrieval or machine learning algorithms for these sorts of
  tasks?

* Applications of ILP in subfields of AI: ILP has been applied
  to other subfields of AI, including natural language processing,
  intelligent agents and planning. 
  Further applications of ILP within AI are solicited.
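To make the ILP/KDD contrast above concrete: a propositional learner sees a single table, whereas an ILP hypothesis is a clause whose body joins several relations through shared variables. The sketch below uses hypothetical facts and only evaluates (it does not learn) the clause `grandmother(X, Z) :- female(X), parent(X, Y), parent(Y, Z)`, counting the positive and negative examples it covers, which is the kind of quantity many ILP systems use to score candidate clauses.

```python
# Hypothetical background knowledge: two relations of a tiny database.
parent = {("ann", "bob"), ("bob", "carl"), ("bob", "dora"), ("eve", "fred")}
female = {"ann", "dora", "eve"}


def grandmother_clause(parent_facts, female_facts):
    """Bindings (X, Z) satisfying the body of
    grandmother(X, Z) :- female(X), parent(X, Y), parent(Y, Z).
    The shared variable Y forces a join of parent with itself."""
    return {
        (x, z)
        for (x, y1) in parent_facts
        if x in female_facts
        for (y2, z) in parent_facts
        if y1 == y2
    }


def coverage(covered, positives, negatives):
    """Number of positive and negative examples covered by a clause."""
    return len(covered & positives), len(covered & negatives)


if __name__ == "__main__":
    cov = grandmother_clause(parent, female)
    pos = {("ann", "carl"), ("ann", "dora")}
    neg = {("bob", "dora"), ("eve", "fred")}
    print(sorted(cov), coverage(cov, pos, neg))
```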

Both position papers about the relation of ILP to other fields and
research papers that make specific technical contributions
are solicited. However, to stimulate discussion, each technical paper
is expected to also clarify the position 
of ILP with regard to the neighboring field(s) it addresses.

In addition to the presentation of position and technical papers,
the workshop will also feature a panel discussion
on the frontiers of ILP and possibly an invited talk.

ORGANISERS

Luc De Raedt (chair and primary contact)
Saso Dzeroski
Koichi Furukawa  
Fumio Mizoguchi
Stephen Muggleton

PROGRAMME COMMITTEE

Francesco Bergadano  (Italy)
Luc De Raedt (co-chair, Belgium)
Saso Dzeroski (Slovenia)
Johannes Furnkranz  (Austria)
Koichi Furukawa  (Japan)
David Page (U.K.)
Fumio Mizoguchi  (Japan)
Ray Mooney (U.S.A.)
Stephen Muggleton (co-chair, U.K.)


CALL FOR PARTICIPATION

Participation is open to all members of the AI Community.
However, to encourage interaction and a broad exchange of ideas
the number of participants will be strictly limited
(preferably under 30 and certainly under 40). 

Participants will be selected on the basis of submissions.
Three types of submissions will be considered:
1) technical contributions (ideally, a 3 to 5 page extended abstract 
   in the IJCAI Proceedings Format, 3000-4000 words),
2) position papers (ideally, a 1 to 3 page abstract
   in the IJCAI Proceedings Format, 1000-3000 words),
3) a statement of interest (ideally, a one-page motivation of why you 
   would like to participate, 300-500 words).
Only submissions of type 1) and 2) will be considered 
for presentation at the workshop and inclusion in the workshop notes. 

Submissions should be received no later than April 1, 1997,
and must include the first author's complete contact information,
including address, email, phone, and fax number. Although 1 April
is the hard deadline, authors are encouraged to submit
their material by 24 March in order to facilitate the reviewing process. 

Double submissions with the ILP-97 Workshop (which is to take
place in Prague, September 1997) are allowed. 

SUBMISSIONS

Submit papers by email (postscript) and surface mail (2 copies) to

   Luc De Raedt
   Dept. of Computer Science 
   Katholieke Universiteit Leuven
   Celestijnenlaan 200A
   B-3001 Heverlee
   Belgium
   Email : [email protected]

IMPORTANT DATES

  - Paper submission: 1 April 
  - Notification to Authors: 21 April 
  - Camera-ready copy: the submissions themselves  
                       will serve as camera-ready copy
     (submissions in the IJCAI Proceedings Style are strongly preferred;
     see http://www.ijcai.org/ijcai-97/ for details)

PUBLICATION

The accepted submissions will be included in the workshop notes
to be distributed at the workshop.
Post-conference publication of a selection of the workshop papers
will be considered and discussed at the  workshop.

COSTS

To cover costs, a fee of $US 50 will be charged, 
in addition to the normal IJCAI-97 conference registration fee.
Attendees of IJCAI workshops will be required to register
for the main IJCAI conference. 

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subject: ILP-97: CFP Reminder
  Date:  Mon, 17 Mar 1997 15:49:23 +0100
  From: Saso Dzeroski <[email protected]>

	The Seventh International Workshop on
                Inductive Logic Programming

          17-19 September 1997, Prague, Czech Republic

The deadline for paper submissions is 31 March 1997. 
                                      -------------
Invited talks will include:
"Data Mining: Algorithms and Limitations" by Usama Fayyad,
"Complexity of Logic Programming" by Georg Gottlob, and
"ILP and CLP" by Jean-Francois Puget.

For more information see 
http://www-ai.ijs.si/SasoDzeroski/ilp97.html
410.20  97:11  IJSAPL::OLTHOF  Spellchecked Henry Although  Tue Apr 01 1997 09:39  (904 lines)
Knowledge Discovery Nuggets 97:11, e-mailed 97-03-28
News:
	* GPS, KDD-97 Tutorials Program
		http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
	* J. Wiegand, KDD tools/methods for detection of skin malignancies?
Publications:
	* P. Vitanyi, Kolmogorov Complexity and Applications, 2nd ed.,
		http://www.cwi.nl/~paulv/kolmogorov.html
	* R. Caldwell, Special Issue and Competition on
		Improving Generalization for Nonlinear Financial Forecasting Models
		http://ourworld.compuserve.com/homepages/ftpub/call.htm 
Positions:
	* V. Petraglia, Thinking Machines, Consultant Positions 
	* M. Ramoni, Research Studentships at the Knowledge Media Institute
Meetings:
	* G. Widmer, ECML-97 Preliminary Programme, 
		23-25 April 1997, Prague, Czech Republic 
		http://is.vse.cz/ecml97/home.html
	* J. Han, SIGMOD-97 Data Mining Workshop, May 11, 1997
		http://fas.sfu.ca/cs/conf/dmkd97.html 
	* W. Wothke, Chicago ASA Data Mining meeting, May 2, 1997
		http://www.smallwaters.com/datamine
	* GPS, Data Mining'97 : Increasing Corporate Performance, 
		Paris, June 2-4, 1997, http://www.datamining.org/events.htm 
	* S. Tafolla, XpertUser Conference, 2-5 November 1997, Boston,
		http://www.XpertUser.com
--
Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining 
and Knowledge Discovery community, focusing on the latest research and 
applications.
Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject 
line (and a URL) to [email protected].  Submissions may be edited for 
brevity. 
To subscribe, see http://www.kdnuggets.com/subscribe.html 
KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a 
wealth of other information on Data Mining and Knowledge Discovery are available 
at Knowledge Discovery Mine site http://www.kdnuggets.com/
	-- Gregory Piatetsky-Shapiro (editor)

********************* Official disclaimer ************************************ 
All opinions expressed herein are those of the contributors and not necessarily 
of their respective employers or of KD Nuggets
*****************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
The first and simplest emotion which we discover in the human mind, is curiosity. 
               --Edmund Burke
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: 27 Mar 1997,  17:12:15
From: GPS <[email protected]>
Subject: KDD-97 Tutorials 

KDD-97 conference will have a day of excellent tutorials by leading
researchers; many thanks to P. Smyth for putting it together.
See http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
for full details 
================================================================
<title> KDD97 Tutorial Abstracts and Speakers </title>
<h2> Tutorial 1: Data Mining and KDD: An Overview </h2>
<h3> Usama Fayyad, Microsoft Research and
Evangelos Simoudis, IBM.   </h3>

We present a basic tutorial of this new and emerging area and
emphasize relations to constituent communities including statistics,
databases, pattern recognition, learning, and visualization. The
tutorial provides a basic overview of the KDD process for extracting
knowledge from databases and covers the basics of each step in the
process including: data warehousing, selection and cleaning, 
data transformation, data mining, evaluation, and visualization. 
We also cover a sampling of successful applications and outline
challenges and issues to be addressed.<p>
<hr>
<h2> Tutorial 2: Modelling Data and Discovering Knowledge</h2>

<h3>  David Hand, Open University, UK. </h3>
Our aim is to extract knowledge from large bodies of data.  The size of 
these bodies means that we cannot do it unaided, but must use fast computers 
applying sophisticated statistical tools.  Attempts to automate the process 
of knowledge extraction date from at least the early 1980s, with the work on 
statistical expert systems.  We examine this work, noting its successes and 
failures and, especially, what researchers in data mining and knowledge 
discovery can learn from those efforts.  We examine what data are, what 
information is, and what knowledge is.  We contrast modelling with 
discovery, especially in the context of large data sets.  We examine 
high-level modelling issues, such as overfitting, generalisability, 
overmodelling, and model evaluation.  And we examine high-level exploration 
issues such as the discovery of accidental artefacts.  The confluence of 
computing and statistics in some areas provides a nice backdrop against 
which to examine these issues, and we briefly discuss neural networks and 
classification trees from these two perspectives.<p>
<hr>
<h2> Tutorial 3: Text Mining - Theory and Practice</h2>
<h3> Ronen Feldman, Bar-Ilan University, Israel. </h3>
Knowledge Discovery in Databases (KDD) focuses on the computerized
exploration of large amounts of data and on the discovery of interesting
patterns within them.  While most work on KDD has been concerned with
structured databases, there has been little work on handling the huge
amount of information that is available only in unstructured textual form. 
In this tutorial we will present the general theory of Text Mining and will
demonstrate several systems that use these principles to enable interactive
exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used
by these systems. The systems that will be presented are KDT, a
system for Knowledge Discovery in Texts; FACT, which discovers associations
amongst keywords labeling the items in a collection of textual documents;
and the Text Explorer, a system that provides a high-level language
for interactive exploration of textual collections.
We will present a general architecture for text mining and will outline the
algorithms and data structures behind the systems. We will give special
emphasis to incremental algorithms and to efficient data structures.
<p>
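The abstract does not describe FACT's algorithm, so the following is only a generic sketch of keyword association mining in the support/confidence style of association rules. Each document is reduced to the set of keywords labeling it; the function name, thresholds and example keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_associations(docs, min_support=2, min_confidence=0.6):
    """docs: list of keyword sets, one per document.

    Returns sorted rules (lhs, rhs, support, confidence) meaning
    'documents labeled lhs tend also to be labeled rhs'.
    support = number of documents containing both keywords;
    confidence = support / number of documents containing lhs.
    """
    single = Counter()
    pair = Counter()
    for kws in docs:
        single.update(kws)
        pair.update(frozenset(p) for p in combinations(sorted(kws), 2))
    rules = []
    for p, sup in pair.items():
        if sup < min_support:
            continue
        a, b = sorted(p)
        for lhs, rhs in ((a, b), (b, a)):     # try both rule directions
            conf = sup / single[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, sup, conf))
    return sorted(rules)


if __name__ == "__main__":
    docs = [{"iran", "oil", "opec"}, {"iran", "oil"},
            {"oil", "opec"}, {"iran", "usa"}]
    for rule in keyword_associations(docs):
        print(rule)
```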
<hr>
<h2> Tutorial 4: Exploratory Data Analysis using Interactive Dynamic Graphics
</h2>

<h3> Deborah Swayne, Bell Communications Research
and  Diane Cook, Iowa State University. </h3>
Researchers and software designers in the field of data mining
are just beginning to make extensive use of graphical methods.
Interactive dynamic data visualization has been explored
in the field of statistics for over twenty years, and we
propose that much of what has been learned in statistics is
relevant for data mining.
This class is an introduction to interactive data visualization as
it is practiced as part of exploratory data analysis.  The XGobi
software, publicly available dynamic visualization software, will
be used in the analysis of examples from biology, business,
physics, engineering, and telecommunications.
The examples will illustrate a set of general visualization principles
which are embodied in specific methods such as brushing and
identification of points in simple scatterplots, three dimensional
rotations, rotations in higher dimensions such as the grand tour, and
directed searches in higher dimensions for interesting two dimensional
views using projection pursuit and manual control.
<p>
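One ingredient of the grand tour mentioned above is choosing a random two-dimensional plane in p-dimensional space and projecting the data onto it. The sketch below shows only that step, under the assumption that a Gaussian draw followed by Gram-Schmidt is an acceptable way to pick the plane; roughly speaking, a real tour (such as the one in XGobi) also moves smoothly between successive planes, which this sketch omits. The function names are hypothetical.

```python
import math
import random


def _normalize(vec):
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def random_2frame(p, rng):
    """Two orthonormal p-vectors spanning a random plane
    (Gaussian draws orthonormalized by Gram-Schmidt)."""
    u = _normalize([rng.gauss(0.0, 1.0) for _ in range(p)])
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    d = sum(a * b for a, b in zip(u, v))
    v = _normalize([b - d * a for a, b in zip(u, v)])   # remove the u component
    return u, v


def project(points, frame):
    """Coordinates of each p-dimensional point in the 2-D plane `frame`."""
    u, v = frame
    return [(sum(a * b for a, b in zip(pt, u)), sum(a * b for a, b in zip(pt, v)))
            for pt in points]


if __name__ == "__main__":
    rng = random.Random(1)
    frame = random_2frame(5, rng)
    print(project([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], frame))
```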
<hr><h2> Tutorial 5: Visual Techniques for Exploring Databases </h2>
<h3> Daniel Keim, University of Munich.</h3>
For data exploration to be effective, it is important to include the human in
the exploration process and combine the flexibility, creativity, and general
knowledge of the human with the enormous storage capacity and the
computational power of today's computers. Visual database exploration aims
at integrating the human in the exploration process, applying human perceptual
abilities to the large data sets available in today's computer systems. The
basic idea of visual data exploration is to present the data in some visual
form, allowing the human to gain insight into the data and draw conclusions.
Visual data exploration techniques have proven to be of high value in
exploratory data analysis, and they also have high potential for exploring
large databases. Visual database exploration is especially powerful for the
first steps of the data mining process, namely understanding the data and
generating hypotheses about the data, but it may also significantly
contribute to the actual knowledge discovery by guiding the search using
visual feedback.
The goal of the tutorial is to show the potential of visualization technology
for exploring large databases. The tutorial provides an overview of the
state of the art in data visualization and a classification of the
existing data visualization techniques. Besides describing each of the
classes, the tutorial focuses on new developments in data visualization
that are relevant to knowledge discovery, and describes a wide
range of recently developed techniques for visualizing large amounts of
arbitrary multi-attribute data that have no inherent two- or
three-dimensional semantics and therefore do not lend themselves to easy
display. A detailed comparison shows the strengths and weaknesses of the
existing techniques and reveals potential for further improvement. Several
examples demonstrate the benefits of visualization techniques for exploring
databases. The tutorial concludes with an overview of existing database
exploration and visualization systems, including research prototypes as well
as commercial products.
<p>
<hr><h2> Tutorial 6: OLAP and Data Warehousing</h2>
<h3> Surajit Chaudhuri, Microsoft Research  and
 Umesh Dayal, Hewlett Packard Labs. </h3>

 
On-Line Analytical Processing (OLAP) and Data Warehousing technologies
enable enterprises to gain competitive advantage by exploiting the
ever-growing amount of data that is collected and stored in corporate
databases and files for better and faster decision making.  Over the
past few years, these technologies have experienced explosive growth,
both in the number of products and services offered, and in the extent
of coverage in the trade press. Vendors (including all database companies)
are paying increasing attention to all aspects of decision support.
The area opens up interesting research directions, with ties to past
work in database systems, but with different assumptions and
requirements. Only very recently, however, has the database research
community started to understand and address some of these issues. 
This tutorial presents an overview of OLAP and data warehousing, and an
in-depth study of selected aspects. An outline of the tutorial follows:
1. Introduction: definitions, evolution, differences from OLTP, architectures.
2. Models and Tools: conceptual model for OLAP; front-end tools (e.g.,
   multidimensional spreadsheets); database design (e.g., star and
   snowflake schemas).
3. Database Server Technologies for Decision Support Queries: specialized
   indexing techniques; specialized join and scan methods; data
   partitioning and use of parallelism; intelligent processing of
   aggregates; complex query processing; extensions to SQL; ROLAP vs. MOLAP.
4. Other Services for OLAP/Data Warehousing: data cleaning, loading and
   refresh; tools for warehouse, system and process management; metadata
   management and the role of the repository.
5. State of Commercial Practice.
6. Research Issues.
The target audience is researchers and developers interested in learning
about the concepts, products and technical innovations in the area of
decision support technologies.
<p>
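To illustrate the star-schema design and the "intelligent processing of aggregates" in the outline above, here is a small sketch: a fact table whose rows point into two dimension tables, summed over every subset of the dimensions, similar in spirit to SQL's GROUP BY CUBE. The tables, column names and figures are hypothetical, and a real OLAP server would compute and store such aggregates far more cleverly than this brute-force loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical star schema: one fact table keyed to two dimension tables.
store_dim = {1: "East", 2: "West", 3: "East"}   # store_id -> region
date_dim = {10: 1996, 11: 1997}                 # date_id  -> year
sales_fact = [                                   # (store_id, date_id, amount)
    (1, 10, 100.0), (2, 10, 50.0), (3, 11, 70.0), (2, 11, 30.0),
]


def cube(facts):
    """Sum `amount` over every subset of the (region, year) dimensions.

    Returns {grouping: {key: total}}; the empty grouping () is the grand total.
    """
    dims = ("region", "year")
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for store_id, date_id, amount in facts:
                # Join the fact row to its dimension rows.
                row = {"region": store_dim[store_id], "year": date_dim[date_id]}
                agg[tuple(row[d] for d in group)] += amount
            result[group] = dict(agg)
    return result


if __name__ == "__main__":
    for grouping, totals in cube(sales_fact).items():
        print(grouping, totals)
```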
<hr><h2> Tutorial 7: Statistical Models for Categorical Response Data</h2>
<h3> William DuMouchel, AT&T Research. </h3>
This tutorial will survey the most common models and methods statisticians
use to fit and test relationships among categorical (discrete) data.  Most
of these techniques are described in statistics texts such as 
<i> Categorical 
Data Analysis </i> by Alan Agresti (Wiley, 1990) and are widely available in
popular computer packages such as SAS and Splus.  Therefore it is almost de
rigueur for someone with a new classification technique to compare the
proposal with one or more of these standard methods. The tutorial will focus
on loglinear and logistic regression models, and related models such as
probit, Poisson regression, and survival models.  In the short time
available, priority will be given to explaining why these techniques are so
popular among statisticians, and to how the basic models have been extended
to handle variables having more than two categories or when some of the
variables have continuous or ordinal scales.  Examples of model fitting,
model search and model comparison using SAS and Splus will be presented and
discussed. 
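For readers new to the logistic model mentioned above, here is a minimal sketch of fitting P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))) by maximum likelihood with Newton-Raphson, which for this model is the same as iteratively reweighted least squares. It is an illustration in plain Python, not the SAS or Splus procedures the tutorial uses; the function name, iteration count and toy data are assumptions, and it presumes the data are not perfectly separable (otherwise the maximum-likelihood estimate does not exist).

```python
import math


def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood (b0, b1) for a one-predictor logistic regression,
    found by Newton-Raphson. xs: floats; ys: 0/1 labels."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p              # score vector (gradient of log-likelihood)
            g1 += (y - p) * x
            h00 += w                 # observed information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + I^{-1} g, using the explicit 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    print(fit_logistic(xs, ys))
```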
For biographical information on the presenters, see the web site
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
Contact information:
<a href="http://www.ics.uci.edu/~smyth"> Padhraic Smyth </a> 
University of California, Irvine (KDD-97 Tutorials Chair).
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected]
Date: Fri, 21 Mar 1997 20:04:25 -0500 (EST)

I am looking for KDD tools/approaches for searching through clinical data to
help develop and fine-tune medical imaging or detection equipment,
specifically for the early detection of skin malignancies.
Perhaps there is a group somewhere working on this.

Thank you.
Best wishes,
Jeff Wiegand 

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 19 Mar 1997 15:48:16 +0100
From: [email protected]

Ming Li and Paul Vitanyi,
AN INTRODUCTION TO KOLMOGOROV COMPLEXITY AND ITS APPLICATIONS,
REVISED AND EXPANDED SECOND EDITION, Springer-Verlag, New York, 1997, 
xx+637 pp, 41 illus. Hardcover $49.95/ISBN 0-387-94868-6
(Graduate Texts in Computer Science Series)

After four years and two printings, the second edition has now appeared. During
its preparation the book was out of stock for a year. Through interaction with
many readers and teachers of courses and seminars, all reported errors and 
problems have been corrected. The book has been revised and expanded by about 
90 pages. The price has been *lowered* by over $9.
See the web page "http://www.cwi.nl/~paulv/kolmogorov.html".

From the ``PREFACE TO THE SECOND EDITION'':

When this book was conceived ten years ago,
few scientists realized the width of scope and the
power for applicability of the central ideas. Partially
because of the enthusiastic reception of the first edition,
open problems have been solved and new applications have been
developed. We have added new material on the relation between
data compression and  minimum description length induction,
computational learning, and universal prediction; circuit theory; distributed
algorithmics; instance complexity; CD compression;
computational complexity; Kolmogorov random graphs;
shortest encoding of routing tables in communication networks;
resource-bounded computable universal distributions; average case properties;
the equality of statistical entropy and expected Kolmogorov complexity;
and so on. Apart from being used by researchers and
as reference work, the book is now commonly used for graduate courses 
and seminars. In recognition of this fact, the second
edition has been produced in textbook style. We have
preserved as much as possible the ordering of
the material as it was in the first edition.
The many exercises bunched together at the ends of
some chapters have been moved to the appropriate sections.
The comprehensive bibliography on Kolmogorov complexity
at the end of the book has been updated, as have
the ``History and References'' sections of the chapters.
Many readers were kind enough to express their appreciation
for the first edition and to send notification of typos, errors,
and comments. Their number is too large to thank them individually,
so we thank them all collectively.


BLURB:

Written by two experts in the field, this is the only
comprehensive and unified treatment of the central ideas of
Kolmogorov complexity (the theory dealing with the quantity of
information in individual objects) and their applications.
Kolmogorov complexity is known variously as `algorithmic
information', `algorithmic entropy', `Kolmogorov-Chaitin
complexity', `descriptional complexity', `shortest program length',
`algorithmic randomness', and others.
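Kolmogorov complexity itself is not computable, but any real compressor gives a computable upper bound on it, up to the additive constant needed to describe the decompressor. The sketch below uses Python's standard `zlib` module purely for illustration; it is not a method taken from the book, and the function name is hypothetical.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding of `data`: a computable upper
    bound (up to an additive constant) on its Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    regular = b"01" * 5000          # highly regular: a short description exists
    random_ = os.urandom(10000)     # typical random string: nearly incompressible
    print(compressed_length(regular), compressed_length(random_))
```

The regular string compresses to a few dozen bytes, while the random one stays close to its original 10,000 bytes, matching the intuition that a random string has no description much shorter than itself.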

The book is ideal for advanced undergraduate students, graduate students
and researchers in computer science, mathematics, cognitive sciences,
artificial intelligence, philosophy, statistics and physics.
The book is self-contained in the sense that it covers the basic requirements
of computability theory, probability theory, information theory, and coding.
Also included are numerous problem sets, comments, source references and hints
to the solutions of problems, course outlines for classroom use, as well as a 
great deal of new material not included in the first edition.


If you are seriously interested in using the text in a course,
contact Springer-Verlag's Editor for Computer Science, Martin
Gilchrist, for a complimentary copy.

     Martin Gilchrist                   [email protected]
     Suite 200, 3600 Pruneridge Ave.    (408) 249-9314
     Santa Clara, CA 95051

If you are interested in the text but won't be teaching a course,
we understand that Springer-Verlag sells the book, too.
To order, call toll-free 1-800-SPRINGER (1-800-777-4643); N.J.
residents call 201-348-4033. For information regarding
examination copies for course adoptions, write Springer-Verlag
New York, Inc., 175 Fifth Avenue, New York, NY 10010. 
You can order through the Web site: "http://www.springer-ny.com/"

For U.S.A./Canada/Mexico- e-mail: [email protected] or fax an
order form to: 201-348-4505.
For orders outside U.S.A./Canada/Mexico send this form to: [email protected]
Or call toll free: 800-SPRINGER - 8:30 am to 5:30 pm ET (that's 777-4643 and 
201-348-4033 in NJ). Write to Springer-Verlag New York, Inc., 175 Fifth Avenue,
New York, NY, 10010.

Visit your local scientific bookstore. Mail payments may be made by check, 
purchase order, or credit card (see note below). Prices are payable in U.S. 
currency or its equivalent and are subject to change without notice. Remember, 
your 30-day return privilege is always guaranteed!

Your complete address is necessary to fulfill your order.

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Randall Caldwell <[email protected]>
Subject: CFP: Improving Generalization for Nonlinear Financial 
Forecasting Models
                           
         Journal of Computational Intelligence in Finance
                      Call for Papers
              Special Issue and Competition on
"Improving Generalization for Nonlinear Financial Forecasting Models"


The Journal of Computational Intelligence in Finance, a peer-reviewed 
technical journal published by Finance & Technology Publishing, is 
seeking papers for review and publication in 1997 on "Improving 
Generalization for Nonlinear Financial Forecasting Models".  To allow 
comparison of the submitted methods, a target variable series and 
performance metrics are specified (though their use is not required). 

PUBLICATION DATE

  November 1997

PAPER SUBMISSION DEADLINE

  June 30, 1997

MOTIVATION
 
The critical issue in applying neural networks and other data-driven
forecasting systems is generalization, the performance on data not used 
for training. The key to generalization behavior is model complexity. 
Too simple a model cannot approximate the true relationship, and overly 
complex models adjust to the noise in the data. Nearly all financial 
applications of nonparametric models (such as neural networks and genetic
algorithms) vary model complexity by adjusting the number of parameters. 
This special issue intends to highlight other methods to improve 
generalization, in particular regularization (e.g., neural network 
weight decay and smoothing) and techniques for combining models.  Of
particular interest are nonlinear methods including neural networks,
genetic algorithms, nearest neighbor networks, polynomial networks,
fuzzy logic, and hybrids.
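Weight decay, one of the regularization methods named above, adds a penalty
on the size of the weights to the training error, so the fitted model cannot
chase noise as freely. The sketch below is a minimal illustration under stated
assumptions: a one-input linear model stands in for a neural network, it is
trained by plain gradient descent on MSE plus decay * w**2, and the function
name and default learning rate are hypothetical choices, not anything the call
prescribes.

```python
def fit_weight_decay(xs, ys, decay, lr=0.01, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on MSE + decay * w**2.

    Hypothetical helper: a linear model stands in for a neural network.
    The L2 penalty ("weight decay") pulls w toward zero on every step;
    the bias b is conventionally left unpenalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error term.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Gradient of the penalty decay * w**2 is 2 * decay * w.
        grad_w += 2 * decay * w
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On the points (0, 1), (1, 3), (2, 5), (3, 7), (4, 9), which lie on y = 2x + 1,
decay=0.0 recovers a slope of about 2 and an intercept of about 1, while
decay=1.0 shrinks the slope to about 4/3 and raises the intercept to about 7/3
to compensate.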

Nearly all studies apply cross-validation to select the best model.
Alternatives to cross-validation include 'analytical' selection rules 
such as Akaike's Information Criterion, Schwarz's Information Criterion, 
and a number of others. Of particular interest are the statistical
properties (i.e., bias and variance) of model selection methods in
estimating out-of-sample performance.
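For least-squares fits with Gaussian errors, the two analytical criteria named
above are commonly written, up to additive constants, as
AIC = n ln(RSS/n) + 2k and SIC (BIC) = n ln(RSS/n) + k ln(n), where n is the
number of observations, RSS the residual sum of squares and k the number of
fitted parameters. A minimal sketch, assuming the RSS of each candidate model
is already known; the helper names are hypothetical:

```python
import math


def aic(rss, n, k):
    """Akaike's Information Criterion for a least-squares fit (up to a constant)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Schwarz's (Bayesian) Information Criterion: penalizes k by ln(n), not 2."""
    return n * math.log(rss / n) + k * math.log(n)


def select(candidates, n, criterion):
    """Return the name of the candidate with the lowest criterion value.

    candidates: dict mapping model name -> (rss, number_of_parameters)
    """
    return min(candidates,
               key=lambda name: criterion(candidates[name][0], n, candidates[name][1]))
```

For example, with n = 100 and hypothetical candidates
{"small": (50.0, 2), "medium": (40.0, 5), "large": (33.0, 12)},
select(..., aic) returns "large" while select(..., bic) returns "medium": the
ln(100), roughly 4.6, penalty per parameter in SIC is stiffer than AIC's 2, so
it favors the smaller model.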

DATA, TARGET VARIABLES and PERFORMANCE METRICS

Data: daily prices of a financial time series (see below)
Target Variable: the relative difference in percent (RDP) between 
today's closing price and the price five (5) days ahead
Performance Metrics: MSE (target metric); nRMSE and DS (to be reported in
the analysis).
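One reading of the target definition is RDP(t) = 100 * (p(t+5) - p(t)) / p(t),
where p(t) is the closing price on trading day t. The sketch below assumes that
reading; the downloadable data file and the call's web page remain the
authoritative definition, and the function name is hypothetical.

```python
def rdp(prices, horizon=5):
    """Relative difference in percent between each close and the close
    `horizon` trading days later: 100 * (p[t + horizon] - p[t]) / p[t].

    The last `horizon` days have no future price, so the result is
    shorter than `prices` by `horizon` entries.
    """
    return [100.0 * (prices[t + horizon] - prices[t]) / prices[t]
            for t in range(len(prices) - horizon)]
```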

Participants are encouraged to use the forecast data, target variable and 
performance metrics specified for this special issue, which are available
on the Web to those who submit a satisfactory abstract (including brief 
biography) as outlined below. Participants are not restricted regarding 
the data used as inputs to their predictors.  Especially interesting 
original methods using other forecast data, target variables and 
performance metrics will also be considered.

The forecast series is derived from daily closing prices for a financial 
time series.  The target variable is the relative difference in
percent (RDP) between today's closing price and the closing price 
five (5) days ahead.  The date, the underlying price series and the
target variable series are all provided in the downloadable data file.
The target metric is the MSE.  Also, authors' analysis should include 
the normalized RMSE (RMSE normalized using the standard deviation of 
actual RDP values), and Directional Symmetry (percentage of correctly 
predicted directions with respect to the target variable).
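The three metrics can be computed as below. This is a sketch under assumptions
the call does not spell out: the standard deviation in nRMSE is the population
form (divide by n), and Directional Symmetry counts a forecast as correct when
it has the same sign as the actual RDP, with zero treated as a match. Function
names are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the (population) standard deviation of the actual values."""
    mean = sum(actual) / len(actual)
    std = math.sqrt(sum((a - mean) ** 2 for a in actual) / len(actual))
    return math.sqrt(mse(actual, predicted)) / std


def directional_symmetry(actual, predicted):
    """Percentage of forecasts whose sign matches the sign of the actual RDP.

    Assumption: a zero on either side counts as a match (product >= 0).
    """
    hits = sum(1 for a, p in zip(actual, predicted) if a * p >= 0)
    return 100.0 * hits / len(actual)
```

A forecast that always equals the mean of the actual values scores an nRMSE of
exactly 1.0 under this definition, so values below 1.0 beat that naive
baseline.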

The forecast data provided is separated into in-sample (10 years of 
daily data) and out-of-sample (2 years of daily data) sets.  Participants 
are not restricted regarding the data used as input to their predictors.  
However, all data used should be disclosed in the paper presentation, 
including the details of all techniques and formulas used to pre-process 
the data.  Details on the predictor and the methods used for improving 
generalization should be presented in the paper.
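Because the two sets are defined by time, the split should follow the calendar
rather than a random shuffle; otherwise later observations leak into training.
A minimal sketch, assuming each row is a (date, value) pair and that the cutoff
date separating the 10-year and 2-year segments is supplied by the user (the
function name is hypothetical):

```python
from datetime import date


def chronological_split(rows, cutoff):
    """Split (date, value) rows into in-sample (before cutoff) and
    out-of-sample (on or after cutoff) sets, preserving time order.

    Shuffling is deliberately avoided: mixing future observations into
    training would leak information the forecaster could not have had.
    """
    rows = sorted(rows, key=lambda r: r[0])
    in_sample = [r for r in rows if r[0] < cutoff]
    out_sample = [r for r in rows if r[0] >= cutoff]
    return in_sample, out_sample
```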

FORECAST HORIZON AND RE-TRAINING

Participants should test performance of their predictors over the entire
two-year out-of-sample dataset.  Of interest are results of analyses and
performance of predictors over the entire two-year prediction period:

(1) without re-training and 
(2) with re-training (optional).

The results from (1) and (2) can be useful for estimating the limits
of the forecasting horizon for the prediction methods presented.
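The two schemes can be compared with a walk-forward loop: forecast each
out-of-sample point in order, reveal its true value only afterwards, and either
keep the in-sample model (1) or refit on everything seen so far every k steps
(2). A minimal sketch; the fit/predict interface and the mean-forecasting toy
model are hypothetical stand-ins for a real predictor, and the target series
alone serves as the history:

```python
def evaluate(train, test, fit, retrain_every=None):
    """Forecast each out-of-sample target in order.

    fit(history) must return a zero-argument predict callable (hypothetical
    interface). retrain_every=None gives scheme (1): fit once on in-sample
    data only. retrain_every=k gives scheme (2): refit on all data seen so
    far every k steps. Returns the list of forecasts.
    """
    history = list(train)
    model = fit(history)
    forecasts = []
    for i, actual in enumerate(test):
        if retrain_every and i > 0 and i % retrain_every == 0:
            model = fit(history)  # expanding window: in-sample plus revealed targets
        forecasts.append(model())
        history.append(actual)  # reveal the true value only after forecasting it
    return forecasts


def mean_model(history):
    """Toy stand-in for a real predictor: always forecasts the historical mean."""
    m = sum(history) / len(history)
    return lambda: m
```

Comparing the error of the two forecast lists over successive stretches of the
two-year period shows how quickly a model that is never refit goes stale.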

For additional details on the forecast data, target variable and
performance metrics, see:

http://ourworld.compuserve.com/homepages/ftpub/call.htm 

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 21 Mar 1997 11:07:08 -0500
From: Vaughn Petraglia <[email protected]>
Subject: Thinking Machines, Consultant Positions 

Thinking Machines Professional Services
Senior Consultant Data Mining
San Francisco bay area and other locations
3/12/97

As a member of the new Thinking Machines Professional Services
Organization, you will be responsible for all aspects of bidding and
delivering consulting products and services to many of our most important
customers.  You will lead or participate in small teams of seasoned
professionals to help our customers use Darwin to find new business
opportunities hidden in their very large databases and data warehouses.  

Major job functions include:

 
1.  Working with a TMC Account Executive to understand the customer's or
    prospect's requirements, and providing technical guidance through
    the sales cycle.
2.  Developing project plans, risk analyses, and formal services bids.
3.  Organizing and managing all resources needed to complete the project
    within budget and on time.
4.  Providing hands-on data analysis and data mining consulting.
5.  Consulting and skills transfer on the Darwin product.
6.  Following up to ensure customer satisfaction.

The ideal candidate will have:

1.  Project management experience.
2.  Excellent written and oral communications skills.
3.  Advanced degree in an analytical field or equivalent experience.
4.  Experience in data analysis, database systems, knowledge-based systems
    or data mining. 
5.  Experience in parallel algorithms and parallel computer systems is
    desirable. 


Contact:  Vaughn Petraglia
          [email protected]

         Thinking Machines
         14 Crosby Dr.
         Bedford, MA 01730

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 21 Mar 1997 18:45:32 +0000
From: Marco Ramoni <[email protected]>
Subject: Research Studentships at the Knowledge Media Institute

The Knowledge Media Institute (KMi) is home to internationally recognised
researchers in Educational Multimedia, Collaboration Technologies,
Artificial Intelligence, Cognitive Science, and Human-Computer Interaction.
KMi offers students an intellectually challenging environment with
exceptional research and computer facilities. We are currently seeking
applications for full-time, 3-year research studentships in the following
areas:

- Migratory Interfaces and Mobile Computing
- Virtual Intelligence and Knowledge Discovery
- Knowledge Management and Knowledge Modelling
- Sharing and Reusing Design Knowledge over the WWW

Applicants are typically expected to have a degree in computer science,
artificial intelligence, cognitive science, psychology, or a related
discipline. As KMi only accepts a very small number of research students
per year, admission is highly competitive. To apply, send a CV and short
project proposal (3 pages) along with a completed application form.
Successful candidates must be willing to live within reasonable commuting
distance from Milton Keynes, and be available to start on October 1, 1997.

Applicants are strongly encouraged to visit the KMi web site
(http://kmi.open.ac.uk/studentships) for more information on ongoing KMi
projects and the studentships.

An application form with further particulars can be obtained by contacting
Ms. Ortenz Rose by email ([email protected]), telephone  (+44  (1908) 653
800) or post (Knowledge Media Institute, The Open University, Walton Hall,
Milton Keynes, MK7 6AA, UK). Informal advice on these studentships can be
obtained by contacting Dr. Tamara Sumner, admissions co-ordinator, by email
at [email protected] or by telephone at the number above.

Closing date for applications: 18 April 1997

Further particulars are attached below.

Virtual Intelligence and Knowledge Discovery

Marco Ramoni (KMi)
http://kmi.open.ac.uk/~marco

The Virtual Intelligence Project and the Knowledge Discovery Project at the
Knowledge Media Institute seek a candidate PhD student to work at the
intersection of their areas of research. The Virtual Intelligence Project
focuses on the development of distributed Artificial Intelligence
applications over the World Wide Web. The Knowledge Discovery Project
investigates probabilistic and statistical methods to extract reusable
knowledge sources from databases. The PhD project will be part of their
joint effort to develop a distributed knowledge discovery architecture over
the World Wide Web. The successful candidate will be able to choose a
research topic among a variety of key issues underlying this research,
ranging from methodological aspects of knowledge extraction and distributed
artificial intelligence to design and development issues of the
architecture.

More information on the Virtual Intelligence Project is available at:
http://kmi.open.ac.uk/~marco/projects/wai/vip

More information on the Knowledge Discovery Project is available at:
http://kmi.open.ac.uk/~marco/projects/kdd

For more information on this studentship, contact Marco Ramoni at
[email protected].

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 17 Mar 1997 15:21:43 +0100 (MET)
From: Gerhard Widmer <[email protected]>
Subject: ECML-97 Preliminary Programme


9th EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97)
23-25 April 1997, Prague, Czech Republic 
PRELIMINARY PROGRAMME 

Up-to-date information on the conference (including registration information)
can be found at 
http://is.vse.cz/ecml97/home.html
This programme with complete abstracts of all talks and links to the
workshops is also available at
http://www.ai.univie.ac.at/ecml/programme.html
-----------------------------------------------------------------------------

--------------------
WEDNESDAY, APRIL 23:

 9.00 -  9.30 	Welcome

 9.30 - 10.30	INVITED TALK:
		Uncertain Learning Agents
		Stuart Russell, University of California, Berkeley, USA

10.30 - 11.00	Coffee Break

11.00 - 11.30	Integrated Learning and Planning Based on
		Truncating Temporal Differences 
		Pawel Cichosz

11.30 - 12.00	Finite-Element Methods with Local Triangulation Refinement
		for Continuous Reinforcement Learning Problems
		Remi Munos

12.00 - 12.15	Learning and Exploitation Do Not Conflict
		Under Minimax Optimality
		Csaba Szepesvari

12.15 - 12.30	Exploiting Qualitative Knowledge to Enhance Skill Acquisition
		Cristina Baroglio

12.30 - 14.00	Lunch

14.00 - 15.00	INVITED TALK:
		Constructing and Sharing Perceptual Distinctions
		Luc Steels, Free University of Brussels (VUB) and
		Sony Computer Science Laboratory, Paris

15.00 - 15.30	Ibots Learn Genuine Team Solutions
		Cristina Versino, Luca Maria Gambardella


15.30 - 16.00 	Coffee Break

16.00 - 16.30	NeuroLinear: A System for Extracting Oblique Decision Rules
		from Neural Networks
		Rudy Setiono, Huan Liu

16.30 - 17.00	Learning Different Types of New Attributes by Combining the
		Neural Network and Iterative Attribute Construction
		Yuh-Jyh Hu

17.00 - 17.45	Commenting Session


-------------------
THURSDAY, APRIL 24:

 9.00 - 10.00	INVITED TALK:
		On Prediction by Data Compression
		Paul Vitanyi, CWI, Amsterdam

10.00 - 10.30	Conditions for Occam's Razor Applicability and
		Noise Elimination
		Dragan Gamberger, Nada Lavrac


10.30 - 11.00	Coffee Break


11.00 - 11.30	Compression-Based Pruning of Decision Lists
		Bernhard Pfahringer

11.30 - 11.45	Inductive Genetic Programming with Decision Trees
		Nikolay I. Nikolaev, Vanio Slavov

11.45 - 12.00	Probabilistic Incremental Program Evolution:
		Stochastic Search Through Program Space
		Rafal Salustowicz, Juergen Schmidhuber

12.00 - 12.30	Constructing Intermediate Concepts by Decomposition
		of Real Functions
		Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko

12.30 - 14.00	Lunch

14.00 - 14.30	Global Data Analysis and the Fragmentation Problem in
		Decision Tree Induction
		Ricardo Vilalta, Gunnar Blix, Larry Rendell

14.30 - 15.00	Model Combination in the Multiple-Data-Batches Scenario
		Kai Ming Ting, Boon Toh Low

15.00 - 15.30	Commenting Session

15.30 - 16.00 	Coffee Break

16.00 - 17.00	Poster Session

17.00 - open	ECML Community Meeting



-----------------
FRIDAY, APRIL 25:

 9.00 -  9.15	A Case Study in Loyalty and Satisfaction Research
		Koen Vanhoof, Josee Bloemer, Koen Pauwels

 9.15 -  9.30	Inducing and Using Decision Rules in the
		GRG Knowledge Discovery System
		Ning Shan, Howard J. Hamilton, Nick Cercone

 9.30 -  9.45	Learning When Negative Examples Abound
		Miroslav Kubat, Robert Holte, Stan Matwin

 9.45 - 10.00	Search-Based Class Discretization
		Luis Torgo, Joao Gama

10.00 - 10.15	Classification by Voting Feature Intervals
		Gülsen Demiröz, H. Altay Güvenir

10.15 - 10.30	A Model for Generalization Based on Confirmatory Induction
		Nicolas Lachiche, Pierre Marquis


10.30 - 11.00	Coffee Break


11.00 - 11.30	Natural Ideal Operators in Inductive Logic Programming
		Fabien Torre, Celine Rouveirol

11.30 - 12.00	Theta-subsumption for Structural Matching
		Luc De Raedt, Peter Idestam-Almquist, Gunther Sablon

12.00 - 12.30	Induction of Feature Terms with INDIE
		Eva Armengol, Enric Plaza

12.30 - 12.45	Metrics on Terms and Clauses
		Alan Hutchinson

12.45 - 13.00	Learning Linear Constraints in Inductive Logic Programming
		Lionel Martin, Christel Vrain


Afternoon off - trip and farewell party (optional; see social programme)


------------------
SATURDAY, APRIL 26:

ECML/MLNet WORKSHOPS: 
WS 1: Data-Driven Learning of Natural Language Processing Tasks 
WS 2: Case-Based Learning: Beyond Classification of Feature Vectors
WS 3: Learning in Dynamically Changing Domains: Theory Revision and
      Context Dependence Issues
WS 4: Machine Learning and Human-Agent Interaction 


>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Jiawei Han <[email protected]>
Date: Tue, 18 Mar 1997 22:05:37 -0800 (PST)
Subject: SIGMOD'97 Data Mining Workshop:  Call for Participation

Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'97)
in cooperation with ACM-SIGMOD'97
Tucson, Arizona, May 11, 1997
(URL: http://fas.sfu.ca/cs/conf/dmkd97.html)
PROGRAM
The workshop will be held one day before the SIGMOD/PODS'97 conference.  
The program is as follows:
8:30--8:35	Opening Remarks
8:35--9:30	Invited Talk
9:30--9:45	Coffee Break
9:45--11:00	Session I	Clustering/Classification

		A Fast Clustering Algorithm to Cluster Very Large Categorical 
		Data Sets in Data Mining
		Zhexue Huang

		Clustering Based On Association Rule Hypergraphs
		Eui-Hong Han, George Karypis, Vipin Kumar and Bamshad Mobasher

		Ontology-based Induction of High Level Classification Rules
		Merwyn G. Taylor, Kilian Stoffel and James A. Hendler

11:00--11:15	Coffee Break
11:15--12:30	Session II	Applications

		An efficient domain-independent algorithm for detecting 
		approximately duplicate database records
		Alvaro E. Monge and Charles P. Elkan

		An Application of Adaptive Data Mining: Facilitating 
		Web Information Access
		Parvathi Chundi and Umeshwar Dayal

		Efficient Roll-Up and Drill-Down Analysis for Large Data Sets 
		Min Wang and Bala Iyer

12:30--14:15	Lunch, Posters, Demos
14:15--15:30	Session III	Association Rules

		Mining Association Patterns from Nested Databases
		Ke Wang

		Maintenance of Discovered Association Rules: When to update?
		S.D. Lee and David W. Cheung

		Efficient Algorithms for Discovering Frequent Sets in 
		Incremental Databases 
		Ronen Feldman, Yonatan Aumann, Amihood Amir and Heikki Mannila 

15:30--15:45	Coffee Break
15:45--17:00	Session IV	Miscellany 

		Sharing Processing in Data Mining Systems
		Arun Swami and Brian Lent

		A Pattern Discovery Algebra
		Alexander Tuzhilin

		On the Complexity of Mining Temporal Trends
		Jef Wijsen and Robert Meersman

17:00--18:00	Summary Discussion


>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 09:29:50 -0600
From: Werner Wothke <[email protected]>
Subject: Chicago ASA Data Mining Conference, May 2, 1997

The Chicago Chapter of the American Statistical Association is
presenting a data mining conference on May 2, titled
"A Hard Look at Data Mining".
The idea of the conference is to peel away most of the hype and present
the local statistical and data analysis community with some solid
technical and statistical information. A web site with additional
information can be found at
http://www.smallwaters.com/datamine

With best wishes,

Werner Wothke

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 20 Mar 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Paris Data Mining'97 Event, June 2-4

See http://www.datamining.org/events.htm for full information
Data Mining'97: Increasing Corporate Performance
Meridien Montparnasse Hotel, Paris, June 2-4, 1997

THE DATA MINING MARKET: TRENDS AND EVOLUTION
- Market and players
- Perspectives and trends: data mining in 2000 and beyond
- Mining the Net: maximizing external data retrieval and analysis
- Data mining and the law: situation and perspectives

INTRODUCTION TO DATA MINING
- More than a media phenomenon: what are the real issues for data mining?
- Corporate databases: retrieval and output
- The latest technologies
- The technology-human interface

DATA MINING BEST PRACTICE
- Data warehousing, On-Line Analytical Processing, and data mining
- Data and their representation for data mining
- Optimizing access to stored information
- Using data mining to further management strategies
- Using data mining to measure corporate performance

DATA MINING APPLICATIONS
- Direct marketing and data mining: customer satisfaction and retention
- Geomarketing and data mining
- Marketing strategy and data mining: optimizing a commercial network
- Finance and data mining: credit management and risk assessment
- Adapting to changing markets by implementing data mining processes in
  all fields of business

A unique opportunity to meet your potential customers and peers and hear
the latest from the competition!

This forum will be a premier opportunity to network and exchange business
cards with CEOs, VPs, and managers of:
- Finance
- Marketing
- Sales
- Strategic Planning
- Information Systems
- Advertising above and below the line

in the fields of:
- Financial services
- Insurance
- Mail order companies
- Retail
- Healthcare
- Computing, Telecommunications
- Government
- Transport and logistics

This conference will be a first in Europe. Come join us in Paris!
For further information and registration, please contact us at
[email protected].

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 23 Mar 1997 17:05:26 -0800
		KNOWLEDGE ACCELERATION
            The 1997 XpertUser Conference
                 2 - 5 November 1997
               Boston, Massachusetts
             http://www.XpertUser.com

In support of its XpertRule(r) and Profiler(tm) products, Attar Software
announces its 1997 XpertUser Conference, entitled "Knowledge
Acceleration". The conference, to be held in Boston, MA, 2 - 5 November
1997, features a keynote address by Professor Donald Michie, a pioneer in
the field of Machine Intelligence. In addition, tutorials on data mining
and knowledge engineering are planned, as well as application
demonstrations and technical sessions with Dr. Akeel Al-Attar and other
experts from Attar's worldwide customer base. The conference web page is
at http://www.XpertUser.com.
The registration fee is $695 until 1 July, when it increases to $895.

410.21. "97:12" by IJSAPL::OLTHOF (Spellchecked Henry Although) Wed Apr 23 1997 12:45 (459 lines)

Knowledge Discovery Nuggets 97:12, e-mailed 97-04-10
News:
    * E. Colet, Advanced Scout News --
        http://www.nextstep.com/new_this_week/120/advancedscout.html
    * A. Andrusiewicz, Query -- Mining Association Rules
Publications:
    * H. Motoda, Final CFP: IEEE Expert Special Issue on
        Feature Transformation and Subset Selection
Siftware:
    * O. Leng, WinViz for Excel,
        http://jsaic.iti.gov.sg/projects/vizMain.html
Positions:
    * W. Jones, Knowledge Discovery Research at U. of Alabama at
        Birmingham (UAB), http://www.cis.uab.edu/info/kdrg/kdrg.html
    * R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore,
        http://www.nsrc.nus.sg
Meetings:
    * R. Tibshirani, Modern Regression and Classification course,
        New York, June 23-24, 1997,
        http://stat.stanford.edu/~trevor/mrc.finance.html
    * PADD97, Practical Application of Knowledge Discovery and Data Mining
        Conference Program, London, 23-25 April 1997,
        http://www.demon.co.uk/ar/PADD97/
    * M. Conkling, Data Warehousing Best Practices & Implementation
        Conference, Chicago, May 27-June 1, 1997,
        http://www.dw-institute.com/
    * GPS, Data Mining'97: Increasing Corporate Performance,
        Paris, June 2-4, 1997, cancelled
--

Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

Back issues of KD Nuggets, a catalog of data mining tools 
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site,
http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
	[email protected]

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
No matter how neutral the topic, your message will offend SOMEONE.
	Murphy's laws of BBS, thanks to 
	http://www.calweb.com/~logon/murphy.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: "Edward Colet"<[email protected]>
Date: Wed, 26 Mar 1997 16:30:56 -0400
Subject: Advanced Scout 

Readers may be interested in some recent updates on the data mining/KDD
work of IBM Research's Advanced Scout Project (the data mining application
used in the National Basketball Association).  These can be found in
newspapers, TV, the web and the SIGMOD/PODS schedule.  Specifically, the
press coverage of Advanced Scout appeared in the Los Angeles Times,
2/17/97, page C4.  Also, the TV show, "NextStep" showed a feature on
Advanced Scout that aired in the San Francisco area on 3/8/97.  A broadcast
of this feature will air nationwide on the Discovery channel at a later
date.  The URL for the NextStep feature called "Hard-wired Hoops" is:
http://www.nextstep.com/new_this_week/120/advancedscout.html

Also available on the Web is an online posting containing the abstract and
bio for the keynote address on data mining at SIGMOD/PODS, 1997 to be given
by Inderpal.  The URL is: 
http://mundos.ifsm.umbc.edu/~ramesh/sigmod97/advprog.html
It is accessible from both the SIGMOD and the PODS schedules.

Thanks,
Ed Colet.

 *********************************************
 IBM T.J. Watson Research Center
 30 Saw Mill River Road
 Hawthorne  NY  10532
 phone: 914-784-6621;  tie-line 863
 fax: 914-784-7455
 email: [email protected]
 *********************************************
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 27 Mar 1997 12:04:21 +1000 (EST)
From: Anna Andrusiewicz <[email protected]>

Hi,

I am working on a problem that may be related to mining generalized
association rules. The basic problem involves mining student enrolment
histories in order to figure out what subjects are being taken by what
kinds of students. 
 
I would like to conduct a case study on the enrolments data I have, and
was wondering if anyone knows of a public-domain system for mining
association rules or multi-level association rules.

Any help offered will be much appreciated - thank you,
 
 Anna Andrusiewicz
 School of Information Technology
 The University of Queensland, Australia
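Anna asks for an existing system, and this is not one. For readers unfamiliar
with what such systems compute, here is a minimal sketch of the classic
level-wise Apriori approach to frequent itemsets and association rules,
assuming each student's enrolment history is reduced to a set of subject codes
(any codes used with it are hypothetical):

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that
    appears in at least min_support transactions (level-wise Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(frozenset(sub) in frequent
                                       for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Support counting: one pass over the transactions per level.
        frequent = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                frequent[c] = count
        result.update(frequent)
        k += 1
    return result


def rules(itemsets, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules X -> Y with
    confidence = support(X u Y) / support(X) >= min_confidence."""
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence
```

With four hypothetical enrolments {CS101, MATH1}, {CS101, MATH1, STAT2},
{CS101, STAT2}, {MATH1, STAT2} and min_support=2, every single subject and
every pair is frequent but the triple is not, and each pairwise rule such as
CS101 -> MATH1 has confidence 2/3. For multi-level rules, one simple approach
is to add each subject's ancestors in a taxonomy (for example, its department)
to every transaction before mining.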

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected]
Subject: Final Call for Papers: IEEE Special Issue
Date: Sat, 29 Mar 97 17:13:06 +0900

            		Final Call For Papers

       		  IEEE Expert Special Issue on 

        Feature Transformation and Subset Selection

        Guest Editors: Huan Liu and Hiroshi Motoda

(edited for space ... see Nuggets 96:37 for full CFP
http://www.kdnuggets.com/nuggets/96/n37.html#item4) 

III. SUBMISSION REQUIREMENTS and SCHEDULE

High-quality, original papers that deal with real-world problems
are solicited. All submitted manuscripts will be subject to a
rigorous review process. Manuscripts should be prepared in
accordance with the IEEE Expert "submission guidelines".
Manuscripts should be approximately 5,000 words long, preferably
with no more than 10 references. This special issue is scheduled to
appear in late 1997.

Important Dates:

Submission      April 30 (FIRM DEADLINE)

Notification    June 30

Prospective authors should submit six copies of the completed
manuscript to one of the guest editors:

Huan Liu
S16 #4-17
Dept of Info Sys & Comp Sci
National University of Singapore
Kent Ridge, Singapore, 119260
[email protected]

Hiroshi Motoda
Institute of Scientific & Industrial Research
Osaka University
Ibaraki, Osaka 567, Japan
[email protected]

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Sat, 29 Mar 1997 12:08:21 +0800
From: Ong Hwee Leng <[email protected]>
Subject: WinViz for Excel

A version of WinViz which runs with Excel 7.0 on Win95 is available for
sale.  WinViz is a multi-dimensional visualisation tool developed at the
Information Technology Institute.  More info & self-running demos can be
found at 
        http://jsaic.iti.gov.sg/projects/vizMain.html

-Hwee-Leng Ong

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 24 Mar 1997 09:39:26 +0600
From: [email protected] (Warren Jones)

Knowledge Discovery Research at University of Alabama at Birmingham (UAB)
URL:http://www.cis.uab.edu/info/kdrg/kdrg.html

This multidisciplinary research group is concentrating on healthcare
applications, specifically on surveillance problems. The group consists
of representatives from Computer and Information Sciences, Pathology and
Health Informatics. A tool called Hawkeye has been developed which
searches temporally organized medical data, builds associations and
applies interestingness heuristics to identify trends of interest to
medical domain experts. Hawkeye is also an example of a large, scalable
KDD system that requires all stages of the KDD process. One of the
important surveillance problems being investigated is the spread of
antibiotic resistance.

This group provides a very attractive opportunity for UAB computer
science graduate students to become involved in KDD research with a
medical emphasis. Four Ph.D. students are currently associated with the
group and its ongoing research. Graduate assistantships are available
for prospective Ph.D. students who are interested in entering the
program in Fall 1997 with a research interest in the directions of the
Knowledge Discovery Research Group.

UAB is a comprehensive urban institution in Alabama's largest city, with
a population of almost a million. Student enrollment exceeds 16,400,
including more than 3,500 graduate students. The Academic Health Center
is well known for its interdisciplinary biomedical research. The computer
science graduate program has an enrollment of 50, half of whom are Ph.D.
students. The campus encompasses a seventy-block area on Birmingham's
Southside, offering all of the advantages of a university within a major
city.

Warren T. Jones, Ph.D. Chair
Department of Computer and Information Sciences
University of Alabama at Birmingham
Birmingham, AL 35294-1170
Ph: (205)934-8657
Fax: (205)934-5473
[email protected]

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Robert Straughan <[email protected]>
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)

Staff Title:	Group Leader - Senior Consultant, Commercial Applications
Date Required:	1 June 1997

Job Description:  National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC).  NSRC
currently provides services and solutions to Singapore industry
in the fields of Computer Aided Engineering, Chemical Applications and
Electronics.  Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness.  NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing.  The Group Leader shall take
overall responsibility for promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and for
soliciting business.  The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.

Skills Required:  Minimum Master's degree.  Specialisation within the
field of Computer Science and Business Administration.  At least 5
years' experience at a financial institution or in retail marketing
within the field of Data Mining / Data Analysis.  Extensive managerial
experience, in particular project management, business analysis and
negotiation skills.  Strong knowledge of statistical analysis and
selection / building of appropriate modelling techniques to solve
business problems.  A good understanding of the algorithms used in Data
Mining (neural networks, classification, etc.).  Previous experience
with the IBM SP2, tools such as Intelligent Miner and Darwin, and
statistical packages such as SAS and SPSS.

Relocation assistance, allowances for housing, children's education and
transportation apply.  Salary will be commensurate with qualifications
and experience.

You can obtain more details by contacting [email protected] or by
visiting our web site at http://www.nsrc.nus.sg.

Resumes can be sent to:

                                Administration Manager
                                NSRC
                                89 Science Park Drive
                                The Rutherford #01-05/08
                                Singapore 118261

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected]
Date: Sun, 23 Mar 97 22:45 EST
Subject: Modern Regression and Classification course - New York
        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        +++                                                        +++
        +++        Modern Regression and Classification:           +++
        +++                                                        +++
        +++            Statistical prediction methods for finance  +++
        +++                      and marketing                     +++
        +++                                                        +++
        +++                                                        +++
        +++         New York City: June 23-24, 1997                +++
        +++                                                        +++
        +++            Trevor Hastie, Stanford University          +++
        +++          Rob Tibshirani, University of Toronto         +++
        +++                                                        +++
        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


This two-day course will give a detailed overview of statistical models
for regression and classification. Known as machine learning in
computer science and artificial intelligence, and pattern recognition
in engineering, this is a hot field with powerful applications in
finance, science and industry.

This course covers a wide range of models from linear regression
through various classes of more flexible models to fully nonparametric
regression models, both for the regression problem and for
classification.

This special version of our popular MRC course is tailored to financial
and marketing professionals.

Although a firm theoretical motivation will be presented, the emphasis
will be on practical applications and implementations, especially in
the finance and marketing areas. The course will include many examples
and case studies, and participants should leave the course well-armed
to tackle real problems with realistic tools. The instructors are at
the forefront of research in this area.

After a brief overview of linear regression tools, methods for
one-dimensional and multi-dimensional smoothing are presented, as well
as techniques that assume a specific structure for the regression
function. These include splines, wavelets, additive models, MARS
(multivariate adaptive regression splines), projection pursuit
regression, neural networks and regression trees. All of these can be
adapted to the time-series framework for predicting future trends from
the past.
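As a rough illustration of what one-dimensional smoothing means, here is a minimal Python sketch of a Gaussian-kernel (Nadaraya-Watson) smoother. It is a generic example and not taken from the course materials; the function name `kernel_smooth`, the `bandwidth` parameter and the data are illustrative assumptions. Each fitted value is a weighted average of the observed responses, with weights that fall off with distance from the target point.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0] using a Gaussian kernel."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Weight each observation by how close its x is to x0.
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    # Locally weighted average of the responses.
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical data: y = x^2 on an evenly spaced grid from -2 to 2.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
curve = [kernel_smooth(xs, ys, x0, bandwidth=0.3) for x0 in xs]
```

The bandwidth acts as the smoothing parameter: a small value follows the data closely, while a large one averages over a wider neighborhood and flattens features such as the minimum at x = 0. Choosing it is a model-selection question of the kind the course addresses with cross-validation.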

The same hierarchy of techniques is available for classification
problems. Classical tools such as linear discriminant analysis and
logistic regression can be enriched to account for nonlinearities and
interactions. Generalized additive models and flexible discriminant
analysis, neural networks and radial basis functions, classification
trees and kernel estimates are all such generalizations. Other
specialized techniques for classification, including nearest-neighbor
rules and learning vector quantization, will also be covered.
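The nearest-neighbor rule mentioned above is simple enough to sketch in a few lines. The following Python example is a generic k-nearest-neighbor classifier using Euclidean distance and a majority vote; the function name, the default `k=3` and the toy data are assumptions for illustration, not material from the course.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    # Indices of the training points, nearest first.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # most_common keeps first-seen order among equal counts, so ties are
    # broken in favor of the label whose closest neighbor is nearer.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-class training set.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
```

For example, `knn_classify(points, labels, (0.5, 0.5))` votes among the three "a" points. The choice of k trades variance (small k) against bias (large k), and in practice it would itself be picked by cross-validation.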

Apart from describing these techniques and their applications to a wide
range of problems, the course will also cover model selection
techniques, such as cross-validation and the bootstrap, and diagnostic
techniques for model assessment.
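Cross-validation is easy to confuse with a single train/test split, so a small sketch may help. The Python example below estimates prediction error by k-fold cross-validation and uses it to compare a constant model with a straight-line fit. The fold assignment (every k-th point), the helper names and the data are illustrative assumptions; the bootstrap, also mentioned above, is not shown.

```python
def fit_mean(xs, ys):
    """Constant model: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def k_fold_cv_mse(xs, ys, fit, k=5):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Deterministic split: points fold, fold + k, fold + 2k, ... are held out.
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        # Score the model only on points it did not see during fitting.
        total += sum((ys[i] - model(xs[i])) ** 2 for i in held_out)
    return total / n

# Hypothetical data lying exactly on a line.
xs = list(range(20))
ys = [3 * x + 2 for x in xs]
```

Here `k_fold_cv_mse(xs, ys, fit_line)` is essentially zero while `fit_mean` gives a large error, so cross-validation prefers the line. With real, noisy data one would normally shuffle the points before assigning folds.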

Software for these techniques will be illustrated, and a comprehensive
set of course notes will be provided to each attendee.

Additional information is available at the Website:

http://stat.stanford.edu/~trevor/mrc.finance.html

************************************************************
Some quotes from past attendees: 

     "... the best presentation by professional statisticians I have
       ever had the pleasure of attending"
     "Superior to most courses in all aspects" 
     "I really liked how you emphasized concepts rather than
        mathematical expressions" 
     "Your 2-day course has saved me months of research" 
*************************************************************

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Rob Tibshirani, Dept of Preventive Med & Biostats, and Dept of Statistics
Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats). FAX: 416 978-8299
computer fax  416-978-1525 (please call or email me to inform)
[email protected]. ftp://utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Mon, 31 Mar 1997 13:15:16 -0500 (EST)
Subject: PADD97

PADD97 - The First International Conference and Exhibition on
====================================================
The Practical Application of Knowledge Discovery and Data Mining
=========================================================

 23rd April - 25th April 1997

REGISTRATION      http://www.demon.co.uk/ar/Expo97/

INFORMATION       http://www.demon.co.uk/ar/PADD97/

TUTORIALS
Usama Fayyad, Microsoft Research, USA
Evangelos Simoudis, IBM, USA
         Data Mining and the KDD Process
Blaise Egan, Huw Roberts, BT Laboratories, UK
         Knowledge Discovery - Practical Methodology and Case Studies
Luc De Raedt, Catholic University of Leuven, Belgium
         Principles and Practice of Inductive Logic Programming

INVITED SPEAKERS
Stephen Muggleton, Oxford University, UK
        Declarative Knowledge Discovery in Industrial Databases
Usama Fayyad,  Microsoft Research, USA
        Data Mining: Algorithms, Challenges and Limitations
Xindong Wu, Monash University, Australia
        Building Intelligent Learning Database Systems
Stephen Pass, Red Brick Systems, UK
        Data Mining and Data Warehouses - The Power of Integration
Neil Mackin, White Cross Systems, UK
        The Application of WhiteCross MPP Servers to Data Mining


                                    PRACTICAL APPLICATION  EXPO97
                                ==============================
                                            CONFERENCE REGISTRATION
                                        =========================
Westminster Central Hall, London, 21-25 April, 1997

PADD97 is part of The Practical Application EXPO97, which brings together
four events under one roof: PAAM97 - The Practical Application of
Intelligent Agents and Multi-Agents; PADD97 - The Practical Application of
Knowledge Discovery and Data Mining; PACT97 - The Practical Application of
Constraint Technology; and PAP97 - The Practical Application of Prolog.

REGISTRATION NOW AVAILABLE AT

http://www.demon.co.uk/ar/Expo97/


PLEASE VISIT OUR WEB PAGES FOR FURTHER INFORMATION ON

Programmes
Tutorials
Invited Talks
Exhibition
Venue
Hotel reservations

http://www.demon.co.uk/ar/PAP97/
http://www.demon.co.uk/ar/PACT97/
http://www.demon.co.uk/ar/PAAM97/
http://www.demon.co.uk/ar/PADD97/

The Practical Application Company
PO Box 137
Blackpool
Lancs FY2 9UN
UK
Tel:  +44 (0)1253 358081
Fax: +44 (0)1253 353811
email:  [email protected]
WWW:  http://www.demon.co.uk/ar/TPAC/

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 31 Mar 97 12:50:10 -0600 (CST)
From: Melinda Conkling <[email protected]>
Subject: Data warehousing event

Hi -- The Data Warehousing Institute (www.dw-institute.com) is holding its
Best Practices & Implementation Conference in Chicago May 27-June 1, 1997.
All conference information (including how to register) can be found
on-line. 
Thanks! -- Melinda
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Paris Data Mining'97 Event, June 2-4 -- cancelled

I have been informed by Gaelle Piernikarch, organizer of the 
above conference, that it has been cancelled and 
may be rescheduled for fall.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.22. "97:13" by IJSAPL::OLTHOF (Spellchecked Henry Although) Wed Apr 23 1997 12:46
Knowledge Discovery Nuggets 97:13, e-mailed 97-04-16
News:
    * GPS, new address for subscribing to KD Nuggets,
            [email protected]
    * G. Prisco, Query: Knowledge Discovery in Network Alarm Databases
Publications:
    * J. Fuernkranz, AAI Spec Issue on First-Order Knowledge Discovery
        in Databases, http://www.ai.univie.ac.at/ilp_kdd/aai-si.html
    * T. Anand, Review of "Seven Methods for Transforming Corporate Data 
        into Business Intelligence"  by Vasant Dhar and Roger Stein
    * S. Kaski, Thesis on data exploration with SOMs available, 
	http://nucleus.hut.fi/~sami/thesis/thesis.html
Siftware:
    * L. Zoob, SemioMap, the Discovery Search Application
	http://www.semio.com
    * S.D. BYERS, new version of ace.glm for Splus
	http://lib.stat.cmu.edu/S/ace.glm
Positions:
    * R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore
	http://www.nsrc.nus.sg
    * N. Dayanand, Manager of the Data Analysis and Applications group
	http://www.think.com
Meetings:
    * J. Komorowski, PKDD'97 -- Preliminary symposium program,
	http://www.idt.ntnu.no/pkdd97/
    * ICML-Colt, ICML-97/Colt-97 call for participation
	http://cswww.vuse.vanderbilt.edu/~mlccolt/
    * X. Wu, CFP: IEEE Knowledge and Data Engineering Exchange
           Workshop (KDEX-97), Nov 3, 1997, Newport Beach, CA, USA
	http://www.sd.monash.edu.au/kdex-97
    * M. Smyth, Hinton -- Jordan Learning Methods course:
        spaces still available
	http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
Please keep CFP and meetings announcements short and provide 
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools 
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
        [email protected]
********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
2 is not equal to 3 - not even for very large values of 2. 
          Grabel's Law
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 16 Apr 1997 09:41:10 -0500 (EST) 
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: New address for subscribing to KD Nuggets --
[email protected]

Thanks to many of you for the good words about Nuggets. 
Last week I completed the transfer of the Nuggets server
(now called Knowledge Discovery Nuggets rather than KDD Nuggets
to emphasize the broader scope) to the kdnuggets.com site.

To subscribe, please email [email protected] a
one-line message with

subscribe kdnuggets

(to unsubscribe, the message should be: unsubscribe kdnuggets)

See http://www.kdnuggets.com/subscribe.html for details.

Please address all submissions for Nuggets to [email protected] ;
Email to the old Nuggets address [email protected] will probably be forwarded to 
[email protected] for some time, but it is better to send email to the
new address.  

-- GPS
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 14 Apr 97 12:48:49 PDT
From: Giuseppe Prisco <[email protected]>
Subject: Knowledge Discovery in Switching Network Alarm Databases

We are interested in the application of KDD methods to a public switching
network alarm database. Our goal is to improve maintenance and the
prevention of severe alarms. Our research started by studying the TASA
system experience and its sequence analysis algorithm. Any help would be
appreciated, in particular:

- suggestions, experiences etc.

- suggestions about (possibly free) software for searching for significant
sequences.

- contacts with any Italian university, in order to start possible thesis
work on that topic.

Thank you 
_________________________________________

Giuseppe Prisco - Software Analyst
Telesoft s.p.a       SPR/SSCT
Via degli Agrostemmi, 30 S.Palomba - Roma 00040
tel 06/71035723

email [email protected]
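For readers wondering what searching for significant sequences in an alarm log can look like, here is a deliberately simplified Python sketch of window-based serial-episode counting, in the spirit of the frequent-episode methods associated with work such as TASA. It is not TASA's algorithm: it only considers ordered pairs of alarm types, assumes integer timestamps, and the alarm names, window width and frequency threshold are all hypothetical.

```python
from itertools import permutations

def serial_pair_frequency(events, first, second, width):
    """Fraction of sliding windows [s, s + width) in which an alarm of type
    `first` is followed, later in the same window, by an alarm of type `second`.
    `events` is a list of (integer_time, alarm_type) pairs."""
    times = [t for t, _ in events]
    # Every window that overlaps the log, as in sliding-window episode counting.
    starts = range(min(times) - width + 1, max(times) + 1)
    hits = 0
    for s in starts:
        window = [(t, a) for t, a in events if s <= t < s + width]
        first_times = [t for t, a in window if a == first]
        if first_times and any(a == second and t > min(first_times)
                               for t, a in window):
            hits += 1
    return hits / len(starts)

def frequent_serial_pairs(events, width, min_frequency):
    """All ordered pairs of distinct alarm types whose window frequency
    reaches min_frequency."""
    alarm_types = sorted({a for _, a in events})
    result = {}
    for first, second in permutations(alarm_types, 2):
        f = serial_pair_frequency(events, first, second, width)
        if f >= min_frequency:
            result[(first, second)] = f
    return result

# Hypothetical alarm log: (time in seconds, alarm type).
log = [(1, "LINK_DOWN"), (2, "TRUNK_FAIL"), (10, "LINK_DOWN"),
       (11, "TRUNK_FAIL"), (20, "FAN_ALARM")]
```

With `width=3` and `min_frequency=0.1`, only LINK_DOWN followed by TRUNK_FAIL is reported. A real system would also count longer episodes and derive rules with confidence values from them.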

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Tue, 01 Apr 1997 12:50:19 +0200
From: Johannes Fuernkranz <[email protected]>

 			    2nd Call For Papers
                       Applied Artificial Intelligence
                              Special issue on
                First-Order Knowledge Discovery in Databases
	   (URL: http://www.ai.univie.ac.at/ilp_kdd/aai-si.html)

A recent MLnet Workshop, held at ICML-96, focussed on a discussion of
the potential contribution of ILP to KDD. Information on the workshop,
including a short summary and all accepted papers, can be found at
http://www.ai.univie.ac.at/ilp_kdd/. The general conclusion was that ILP
can be a valuable tool for data mining, its main advantages being the
expressiveness of first-order logic as a representation language and the
ability of many ILP systems to use strong language biases for restricting
the huge search space. ILP is highly flexible in incorporating various
forms of background knowledge, which can be invaluable for large KDD tasks.
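To make the point about first-order expressiveness concrete, the Python sketch below checks which examples a single first-order clause covers, the basic coverage test an ILP system repeats while searching for rules. The clause grandparent(X, Z) :- parent(X, Y), parent(Y, Z), the facts and the examples are hypothetical. Because Y is a variable shared between the two body literals, one clause applies to every individual in the database; a propositional learner would instead need a pre-computed attribute for each such relationship.

```python
# Hypothetical background knowledge: parent(X, Y) facts.
PARENT = {("ann", "bob"), ("bob", "carl"), ("bob", "dora"), ("eve", "fay")}

def covers_grandparent(x, z, parent_facts):
    """Does grandparent(X, Z) :- parent(X, Y), parent(Y, Z) prove (x, z)?
    Tries every binding of the shared variable Y."""
    return any((y, z) in parent_facts
               for (p, y) in parent_facts if p == x)

def coverage(examples, parent_facts):
    """Number of examples the clause covers."""
    return sum(covers_grandparent(x, z, parent_facts) for x, z in examples)

positives = [("ann", "carl"), ("ann", "dora")]
negatives = [("bob", "carl"), ("eve", "carl")]
```

Here the clause covers both positive examples and neither negative one. An ILP system would evaluate many candidate clauses this way, with its language bias limiting which clauses it generates.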

The special issue on "First-Order Knowledge Discovery in Databases" of the
Applied Artificial Intelligence Journal will thus welcome papers that focus
on one or more of the following topics:

   * Embedding ILP into the KDD process
   * Necessary pre- and post-processing steps for real-world applications
   * Interfacing ILP systems with database managers
   * Scalability of ILP for real-world databases
   * Criteria for quantifying the complexity of ILP problems
   * Evaluation of gain and price of ILP versus propositional learning
   * Non-classification learning and discovery in a first-order framework
   * Benefits of using background knowledge and/or strong explicit biases
   * Innovative real-world applications of ILP

Papers on related subjects are also welcome, but a strong focus on
applications and database issues is required for all submissions.

see http://www.ai.univie.ac.at/ilp_kdd/aai-si.html for full details 
on Submissions

Submission Deadline: April 30, 1997

[edited for space. GPS]
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Anand, Tej" <[email protected]>
Subject: book review for Nuggets
Date: Fri, 4 Apr 1997 16:58:14 -0500

Book Review: "Seven Methods for Transforming Corporate Data into Business 
Intelligence"  by Vasant Dhar and Roger Stein, 
(Prentice-Hall, 1997).

(see http://www.prenhall.com/allbooks/be_0132820064.html for more 
on this book.  GPS)

It has been quite a while since I have been able to read a
technical/business book in its entirety, but recently I accomplished
this feat with "Seven Methods for Transforming Corporate Data into
Business Intelligence" by Vasant Dhar and Roger Stein. Usually I am
unable to complete a technical/business book because either it is so
high-level (and abstract) that I cannot appreciate how the material
would apply to me, or it is so detailed that I am totally lost "in the
trees".

Seven Methods... is different. This short book starts off by providing
a framework for representing objectives and requirements for
"intelligent systems" (systems that embed AI techniques or systems
that explicitly represent knowledge) using a business oriented
vocabulary. This framework not only helps select the "appropriate"
technique but also helps formulate the problem in a way that makes that
selection transparent. The business vocabulary helps explain the
selection to management and business types.

The book then describes seven data-intensive modeling techniques (tree
induction, analogical reasoning, fuzzy logic, rule-based systems,
neural nets, genetic algorithms, and OLAP) using the framework. While
these chapters are written to enable business-oriented people to get a
quick understanding of the techniques, they are also great for
technical folks because they can provide us knowledge about techniques
in which we are not experts. All techniques are treated with uniform
depth, which makes it a handy reference. The explanation of the
techniques is highly visual with almost every other page containing a
high quality graphic that explains how the techniques work. One
quibble: Chapter 10, titled Machine Learning, could have been more
aptly titled "Tree Induction".

The book ends with seven detailed (8-10 pages each) case studies of
successful applications of each of the techniques. Each case study is
described using the same framework. This is where the rubber meets the
road, and for the seven case studies selected the framework holds up
very well.

My only real complaint with this book is that it does not talk about using
multiple techniques together.

Btw: I felt this book was so well written that I promptly lent it to my
manager for weekend reading.

Disclaimer: Although we have never worked together, Roger Stein and I
for a brief time shared the same employer: Dun & Bradstreet, Roger at
Moody's and I at A.C. Nielsen. One of the case studies is about
Spotlight, a system with which I was associated.

-Tej Anand
NCR Corporation
Human Interface Technology Center


>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sun, 6 Apr 1997 21:54:10 +0300
From: Sami Kaski <[email protected]>
Subject: Thesis on data exploration with SOMs available

The following Dr.Tech. thesis is available at

http://nucleus.hut.fi/~sami/thesis/thesis.html (html-version)
http://nucleus.hut.fi/~sami/thesis.ps.gz       (compressed postscript, 300K)
http://nucleus.hut.fi/~sami/thesis.ps          (postscript, 2M)

The articles that belong to the thesis can be accessed through the page

http://nucleus.hut.fi/~sami/thesis/node3.html

Data Exploration Using Self-Organizing Maps

Samuel Kaski

Helsinki University of Technology
Neural Networks Research Centre
P.O.Box 2200 (Rakentajanaukio 2C)
FIN-02015 HUT, Finland

Finding structures in vast multidimensional data sets, be they
measurement data, statistics, or textual documents, is difficult and
time-consuming. Interesting, novel relations between the data items
may be hidden in the data. The self-organizing map (SOM) algorithm of
Kohonen can be used to aid the exploration: the structures in the data
sets can be illustrated on special map displays.
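For readers new to Kohonen's algorithm, the Python sketch below shows the basic online SOM update on a one-dimensional chain of units. For each data vector, the closest unit (the best-matching unit) and its neighbors on the map are pulled toward that vector. This is a minimal sketch under stated assumptions: the linear learning-rate decay, the shrinking Gaussian neighborhood and all names and data are illustrative choices, not the settings used in the thesis.

```python
import math
import random

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Online training of a one-dimensional SOM (a chain of n_units units).
    Returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start each unit at a randomly chosen data vector.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = n_units / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)   # neighborhood shrinks over time
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = min(range(n_units), key=lambda j: math.dist(weights[j], x))
            for j in range(n_units):
                # Gaussian neighborhood measured along the chain, not in data space.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights

# Hypothetical data: two well-separated clusters in the plane.
clusters = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = train_som(clusters, n_units=4)
```

With `lr0` at most 1, every update moves a unit part of the way toward a data vector, so the units stay within the range of the data; exactly where they settle depends on the seed and the schedules.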

In this work, the methodology of using SOMs for exploratory data
analysis or data mining is reviewed and developed further. The
properties of the maps are compared with the properties of related
methods intended for visualizing high-dimensional multivariate data
sets. In a set of case studies the SOM algorithm is applied to
analyzing electroencephalograms, to illustrating structures of the
standard of living in the world, and to organizing full-text document
collections.

Measures are proposed for evaluating the quality of different types of
maps in representing a given data set, and for measuring the
robustness of the illustrations the maps produce.  The same measures
may also be used for comparing the knowledge that different maps
represent.
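The thesis proposes its own measures, which are not reproduced here. As a point of reference, the Python sketch below computes two widely used SOM quality measures: the quantization error (average distance from each data vector to its best-matching unit) and the topographic error (the share of data vectors whose two closest units are not neighbors on the map). Adjacency is assumed to mean a grid distance of at most 1, which fits a chain or a rectangular grid with 4-neighbor adjacency; the codebooks and data are hypothetical.

```python
import math

def quantization_error(data, codebook):
    """Average distance from each data vector to its best-matching unit."""
    return sum(min(math.dist(x, w) for w in codebook) for x in data) / len(data)

def topographic_error(data, codebook, grid):
    """Fraction of data vectors whose best- and second-best-matching units
    are not adjacent on the map; grid[j] is unit j's map coordinate."""
    errors = 0
    for x in data:
        ranked = sorted(range(len(codebook)), key=lambda j: math.dist(x, codebook[j]))
        best, second = ranked[0], ranked[1]
        if math.dist(grid[best], grid[second]) > 1:   # not neighbors on the map
            errors += 1
    return errors / len(data)

# Hypothetical 1-D map of three units laid out along a line.
grid = [(0,), (1,), (2,)]
ordered = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # map neighbors are close in data space
twisted = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]   # same vectors, map order scrambled
points = [(0.0, 0.5), (2.0, 0.0)]
```

Both codebooks contain the same vectors, so their quantization errors are equal (0.25). The scrambled map, however, has a topographic error of 0.5 versus 0.0, which is the kind of difference the second measure is designed to detect.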

Feature extraction must in general be tailored to the application, as
is done in the case studies. There exists, however, an algorithm
called the adaptive-subspace self-organizing map, recently developed
by Kohonen, which may be of help. It extracts invariant features
automatically from a data set. The algorithm is here characterized in
terms of an objective function, and demonstrated to be able to
identify input patterns subject to different transformations.
Moreover, it could also aid in feature exploration: the kernels that
the algorithm creates to achieve invariance can be illustrated on map
displays similar to those that are used for illustrating the data
sets.

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 10 Apr 1997 17:43:04 -0700
From: Laurie Zoob <[email protected]>
Subject: SemioMap, the Discovery Search Application

Semio Corporation, a newly formed start-up company, is using
computational semiotics to identify patterns and relationships in
text-based information on the internet and intranet. Using data
visualization, the relationships are automatically displayed in a
graphical, navigable map.  There is a working alpha version/early beta
of the software at http://www.semio.com.  The initial product is called
SemioMap, the Discovery Search application.  SemioMap is targeted toward
the corporate intranet market.

We are currently seeking data mining, knowledge discovery and database-oriented
companies as development partners.  If you are interested in
receiving more information, please email me at [email protected].

Best,
Laurie Zoob
Director, Business Development
-- 
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Laurie Zoob				Phone:	(415) 802-2943
Director Business Development		Fax:	(415) 802-2942
Semio Corporation			Email:	[email protected]
One Dolphin Drive 			http://www.semio.com
Redwood Shores, CA 94065
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 26 Mar 1997 13:07:39 -0800 (PST)
From: "S.D. BYERS" <[email protected]>
Subject: new version of ace.glm 

Dear Splus and GLM users,
        I have written  a new version  of ace.glm for Splus and it is
now available in the S archive at Statlib at
http://lib.stat.cmu.edu/S/ace.glm

        This simple function performs the ACE transformation detection
algorithm for generalized linear models using the weighted linear model 
obtained from the GLM at convergence of the fitting algorithm. 
It generalizes ace.logit, ACE for logistic regression.  
A paper describing ace.logit and its uses can be found at

 http://www.stat.washington.edu/tech.reports/raftery-richardson.ps

These functions can be powerful tools in Generalised Linear Modelling.
The new ace.glm will work for any GLM that has a family defined in Splus.  
It will also work for any link function defined for these families. 
Previously, ace.glm worked only for the canonical link function.
By default, ace.glm will pleasantly plot your ACE output if a graphics 
device is open. 
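The phrase "the weighted linear model obtained from the GLM at convergence" refers to the fact that GLMs are usually fitted by iteratively reweighted least squares (IRLS), where each step is a weighted linear regression of a working response on the predictors. The Python sketch below shows that fitting loop for logistic regression with one predictor and the canonical logit link. It is not ace.glm, which is an S-Plus function whose internals are not reproduced here, and it omits the ACE step itself; the function name and toy data are hypothetical.

```python
import math

def irls_logistic(xs, ys, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by iteratively reweighted least squares.
    Returns (b0, b1, z, w), where z and w are the working response and weights
    of the weighted linear model from the last iteration (i.e. at convergence)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * x for x in xs]                  # linear predictor
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]   # fitted probabilities
        w = [m * (1.0 - m) for m in mu]                  # IRLS weights for the logit link
        z = [e + (y - m) / wi for e, y, m, wi in zip(eta, ys, mu, w)]  # working response
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, xs))
        s2 = sum(wi * x * x for wi, x in zip(w, xs))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * x * zi for wi, x, zi in zip(w, xs, z))
        det = s0 * s2 - s1 * s1
        b0 = (s2 * t0 - s1 * t1) / det
        b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1, z, w

# Hypothetical, non-separable binary data.
doses = [0, 1, 2, 3, 4, 5]
outcomes = [0, 0, 1, 0, 1, 1]
b0, b1, z, w = irls_logistic(doses, outcomes)
```

According to the announcement, ace.glm applies ACE to this final weighted linear model of z on the predictors; that step is not sketched here. A fixed number of iterations stands in for a proper convergence check, and the data must not be perfectly separable, or the coefficients diverge.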

I would like to hear about any use/abuse/errors that may arise.

        Thanks,
                Simon Byers,
                        University of Washington Statistics.
				[email protected]
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Robert Straughan <[email protected]>
Subject: Senior Consultant in Data Mining at NSRC in Singapore
Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT)

Staff Title:	Group Leader - Senior Consultant, Commercial Applications
Date Required:	1 June 1997

Job Description:  National Supercomputing Research Centre (NSRC) is
Singapore's national centre for High Performance Computing (HPC).  NSRC
currently facilitates services and solutions to the Singapore industry
in the field of Computer Aided Engineering, Chemical Applications and
Electronics.  Commercial Applications has been identified as a new
growth area, where HPC can make a significant impact on the commercial
industries' competitiveness.  NSRC has therefore decided to expand into
this field and is currently looking for a person with extensive
industrial experience in the field of Data Mining within finance,
banking, insurance, or retail marketing.  The Group Leader shall take
overall responsibility in promoting NSRC's capabilities within the
field of Data Mining to the commercial industry in Singapore and to
solicit for business.  The Group Leader shall work closely with NSRC's
existing staff within this field to develop the best possible strategy
to target potential commercial organisations.

Skills Required:  Minimum Masters Degree.  Specialisation within the
field of Computer Science and Business Administration.  At least 5
years experience from a financial institution or in retail marketing
within the field of Data Mining / Data Analysis.  Extensive managerial
experience, in particular project management, business analysis and
negotiation skills.  Strong knowledge of statistical analysis and
selection / building of appropriate modelling techniques to solve
business problems.  A good understanding of the algorithms used in Data
Mining (neural networks, classifications etc.).  Have previously used
IBM SP2 and tools such as Intelligent Miner and Darwin as well as
statistical packages such as SAS and SPSS.

Relocation assistance, allowances for housing, children's education and
transportation apply.  Salary will be commensurate with qualifications
and experience.

You can obtain more details by contacting [email protected] or visit
our web site at http://www.nsrc.nus.sg.

Resumes can be sent to:

                                Administration Manager
                                NSRC
                                89 Science Park Drive
                                The Rutherford #01-05/08
                                Singapore 118261

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 04 Apr 1997 14:41:09 -0500
From: Nalini Dayanand <[email protected]>
Subject: Job Announcement-Please post

THINKING MACHINES CORPORATION is a leading provider of knowledge discovery
software and services. TMC's high end datamining software suite enables 
users to extract meaningful information from large databases. For more 
information please see http://www.think.com. The company is seeking an 
individual to join the development organization as Manager of the Data  
Analysis and Applications group.

The manager of the data analysis and applications group will provide 
leadership and individual contribution in the design, development and 
deployment of data mining applications, prototypes and application 
frameworks.  Responsibilities include 

* working with product marketing and clients to identify opportunities for 
  data mining applications
* providing leadership and individual contribution in requirements 
  definition and application/prototype/framework development 
* organizing and managing a team of analysts, software engineers and 
  technology engineers responsible for  the development of specific 
  applications/prototypes/frameworks
* providing feedback to the development organization on potential 
  enhancements to existing products


Experience in the telecommunications and/or financial services industries
is desirable but not essential.

If your background and interests match these expectations, please send your
resume via fax, email or regular mail to

Nalini Dayanand
Thinking Machines Corporation
14 Crosby Drive
Bedford, MA 01730

Fax: (617) 276-0444
email: [email protected]

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Jan Komorowski <[email protected]>
Subject: PKDD'97 -- Preliminary symposium program 

PKDD'97 -- 1st European Symposium on Principles of Data Mining and
Knowledge Discovery, Trondheim, Norway, June 24-27, 1997.  Preliminary
symposium program and registration information: 
http://www.idt.ntnu.no/pkdd97/

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

Date: Thu, 10 Apr 97 15:04:39 CDT
From: [email protected] (ICML-COLT Administration)
Subject: COLT/ICML

                    Call for Participation

  Tenth Annual Conference on        Fourteenth International
Computational Learning Theory    Conference on Machine Learning
        (COLT-97)                         (ICML-97)

        July 6-9                          July 8-11

                COLT/ICML Tutorials on July 8
             ICML-affiliated Workshops on July 12

                     Vanderbilt University
                   Nashville, Tennessee, USA

The organizers of COLT-97 and ICML-97 invite you to participate
in one or both of these conferences. In hopes of encouraging 
interactions between the learning theory and machine learning 
communities, the conferences are loosely coupled by joint 
tutorials, a day of joint technical sessions, a joint banquet, 
and otherwise through co-location at Vanderbilt University in 
Nashville, Tennessee. 

Find all the latest information about COLT-97 and ICML-97 at
http://cswww.vuse.vanderbilt.edu/~mlccolt/, including lists 
of papers to be presented, registration and housing material, 
information on tutorials and workshops, invited speakers, 
travel, and the like. You may also obtain registration and 
housing material by writing to [email protected].

                      --------------------

Registration costs and applicable dates are:

                Early                Late 
            (until June 2)      (after June 2)

COLT            $140                 $180
ICML            $140                 $180
COLT/ICML       $240                 $310

                      --------------------

Registration for one of the three ICML-affiliated workshops on
    (1) reinforcement learning, 
    (2) automata induction, grammatical inference, and language 
        acquisition, or
    (3) machine learning application in the real world

is $25 until June 2, and $35 after June 2.

                      --------------------
ICML-97 acknowledges generous support from the Daimler-Benz
Corporation. COLT-97 acknowledges generous support from
AT&T and is held in cooperation with ACM SIGACT and SIGART.
Both conferences are sponsored by Vanderbilt University.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 11 Apr 1997 11:03:04 +1000 (EST)
From: [email protected] (Xindong Wu)
Subject: CFP: IEEE KDEX-97

1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
 --------------------------------------------------------------------
      Sponsored by the IEEE Computer Society and Co-located with
      the 9th IEEE Tools with Artificial Intelligence Conference

          November 3, 1997, Newport Beach, California, U.S.A.
          ===================================================

Call for Papers

The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases. The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals. The workshop is currently
scheduled to last one day, but depending on the final program
it might be extended to a second day.

Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer. Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society. A selected number of the accepted papers will
possibly be expanded and revised for publication in the IEEE
Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the
International Journal of Artificial Intelligence Tools. Educational
material related to papers published in the IEEE-TKDE will be posted
on the IEEE-TKDE home page.

The theme of the workshop is "AI MEETS DATABASES".  Topics of interest
include, but are not limited to:

  - Computer supported cooperative processing and interoperable
    systems
  - Data sharing, data warehousing and meta-data management
  - Distributed intelligent mediators and agents
  - Distributed object management
  - Dynamic knowledge
  - Evaluation and measurement of knowledge and database systems
  - High-performance issues (including architectures, knowledge
    representation techniques, inference mechanisms, algorithms and
    integration methods)
  - Information structures and interaction
  - Intelligent search, data mining and content-based retrieval
  - Knowledge and data engineering systems
  - Quality assurance for knowledge and data engineering systems
    (correctness, reliability, security, survivability and
    performance)
  - Software re-engineering and intelligent software information
    systems
  - Spatio-temporal, active, mobile and multimedia data
  - Emerging applications (biomedical systems, decision support,
    geographical databases, Internet technologies and applications,
    digital libraries, etc.)

All submissions should be limited to a maximum of 5,000 words. Six
hard copies should be sent to the following address:
 
     Xindong Wu (KDEX-97)
     Department of Software Development
     Monash University
     900 Dandenong Road
     Caulfield East, Melbourne 3145
     Australia

     Phone: +61 3 9903 1025
     Fax: +61 3 9903 1077
     E-mail: [email protected]

Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract. This cover page must accompany the paper.

    ************ I m p o r t a n t   D a t e s *****************
    * 6 copies of full papers received by:    June 15,    1997 *
    * acceptance/rejection notices:           July 31,    1997 *
    * final camera-readies due by:            August 31,  1997 *
    * workshop:                               November 3, 1997 *
    ************************************************************

Further Information
===================

      WWW: http://www.sd.monash.edu.au/kdex-97

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Hinton -- Jordan Learning Methods course : spaces still available
Date: Thu, 10 Apr 1997 07:38:25 -0400 (EDT)


some spaces still available ...


        **************************************************************
        ***                                                        ***
        ***     Learning Methods for Prediction, Classification,   ***
        ***       Novelty Detection and Time Series Analysis       ***
        ***                                                        ***
        ***          Washington, D.C., May 2 -- 3, 1997            ***
        ***                                                        ***
        ***        Geoffrey Hinton, University of Toronto          ***
        ***      Michael Jordan, Massachusetts Inst. of Tech.      ***
        ***                                                        ***
        **************************************************************


A two-day intensive Tutorial on Advanced Learning Methods will be held
May 2--3, 1997, at the Hyatt Regency on Capitol Hill, Washington,
D.C. Space is available for up to 50 participants.

The course will provide an in-depth discussion of the large collection 
of new tools that have become available in recent years for developing 
autonomous learning systems and for aiding in the analysis of complex 
multivariate data.  These tools include neural networks, hidden Markov 
models, belief networks, decision trees, memory-based methods, as well 
as increasingly sophisticated combinations of these architectures.  
Applications include prediction, classification, fault detection, 
time series analysis, diagnosis, optimization, system identification 
and control, exploratory data analysis and many other problems in
statistics, machine learning and data mining.

(edited for space)

ADDITIONAL INFORMATION
A registration form is available from the course's WWW page at 

 http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

 Marney Smyth
 E-mail: [email protected]
 Phone:  617 258-8928
 Fax:    617 258-6779
410.23. "97:14" by IJSAPL::OLTHOF (Spellchecked Henry Although) Thu Apr 24 1997 12:47 (539 lines)
Knowledge Discovery Nuggets 97:14, e-mailed 97-04-23
News:
    * E. Bertino, Query: data mining from wafers manufacturing process?
Publications:
    * M. Ramoni, Technical Reports on Bayesian Knowledge Discovery,
            http://kmi.open.ac.uk/~marco/projects/kdd
    * Tom Mitchell, Textbook for Data Mining: Machine Learning
            http://www.cs.cmu.edu/~tom/mlbook.html
Siftware:
    * R. Quinlan, Windows Version of C5.0 ("See5") Available Now
            http://www.rulequest.com
    * Stanley Rice, Postcoordinate Software
            http://www.cruzio.com/~autospec/darwin.htm
    * Pamela Lerwick, IDIS Special Release
            http://www.datamining.com
Positions:
    * R. King, Ph.D. Studentships in Data Mining at University of Wales, UK
    * Fred J. Damerau, Research Associate in Text Mining/Information
            Extraction
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
Please keep CFP and meetings announcements short and provide 
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
        [email protected]

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Restlessness and discontent are the necessities of progress. 
     --Thomas A. Edison
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

From: [email protected]
Date: Thu, 17 Apr 1997 09:44:45 +0200 (METDST)
Subject: data mining from wafers manufacturing process 

At our university, we are starting an application project
dealing with data from a wafer manufacturing process.
We are considering using data mining techniques
to address the following problem.
Some of the wafers are faulty. A database keeps track
of the entire manufacturing process for each wafer, collecting
a large amount of data about each step of the manufacturing
process (there are about 300 steps; each step is characterized by
about 100 parameters). Our goal is to use data mining techniques
to help with the diagnosis, that is, to determine which step
may have caused the problem.

I was wondering whether you are aware of any use of data mining
techniques for similar problems. We also need to acquire
some suitable data mining tools.

I would appreciate any suggestions you can give me on this
issue.

Best regards Elisa
-------------------------------------------------------------------------------
Prof. Elisa Bertino
Dipartimento di Scienze dell'Informazione
Universita' di Milano
Via Comelico 39/41
20135 Milano (Italy)

tel: (+39)2-55006227
fax:  (+39)2-55006253

e-mail:         [email protected]
                      [email protected]
www            http://mercurio.sm.dsi.unimi.it/~bertino/
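The diagnosis problem in the query above (about 300 steps, each with about 100 parameters, plus a faulty/good label per wafer) could be approached first with a simple screening pass. The Python sketch below is an illustration under our own assumptions, not a method proposed in the query: it assumes the data can be loaded as nested lists `records[wafer][step][parameter]` plus a list of fault flags (a hypothetical layout). For each parameter it divides the gap between the faulty and good means by the overall standard deviation, then ranks steps by their highest-scoring parameter. With roughly 30,000 parameters, some large gaps will occur by chance, so the ranking only suggests where engineers should look first.

```python
from statistics import mean, pstdev


def rank_steps(records, faulty):
    """Rank process steps by how strongly any of their parameters
    separates faulty wafers from good ones.

    records[w][s][p] -- value of parameter p at step s for wafer w
                        (hypothetical layout)
    faulty[w]        -- True if wafer w was found faulty
    Both groups (faulty and good) must be non-empty.
    """
    n_wafers = len(records)
    n_steps = len(records[0])
    scores = []
    for s in range(n_steps):
        best = 0.0
        for p in range(len(records[0][s])):
            values = [records[w][s][p] for w in range(n_wafers)]
            bad = [v for v, f in zip(values, faulty) if f]
            good = [v for v, f in zip(values, faulty) if not f]
            spread = pstdev(values)
            if spread == 0:
                continue  # a constant parameter cannot explain faults
            # standardized gap between the two group means
            gap = abs(mean(bad) - mean(good)) / spread
            best = max(best, gap)
        scores.append((best, s))
    # highest score first; ties broken by step number (descending)
    scores.sort(reverse=True)
    return [s for _, s in scores]


# Hypothetical data: 4 wafers, 2 steps, 2 parameters per step.
records = [
    [[1.0, 5.0], [2.0, 10.0]],  # good
    [[1.1, 5.1], [2.0, 10.2]],  # good
    [[1.0, 5.0], [2.1, 14.0]],  # faulty
    [[1.1, 5.1], [2.0, 14.1]],  # faulty
]
faulty = [False, False, True, True]
print(rank_steps(records, faulty))  # [1, 0]
```

A real study would add a significance test with a multiple-comparison correction, or a classifier such as a decision tree, on top of this kind of screen.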
                 
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 9 Apr 1997 19:23:44 +0100
From: Marco Ramoni <[email protected]>
Subject: Technical Reports Available

The following reports are available on the World Wide Web. Further
information about the Bayesian Knowledge Discovery Project can be
reached at

		http://kmi.open.ac.uk/~marco/projects/kdd

Marco
______________________________________________________________________________

Title: Efficient Parameter Learning in Bayesian Networks from
Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
    1.Knowledge Media Institute, The Open University.
    2.Department of Actuarial Science and Statistics, City University.

TR number: KMI-TR-41
Date: January 1997
Keywords: Bayesian Belief Networks, Machine Learning,
Probabilistic Reasoning, Missing Data.

Abstract:
Current methods for learning conditional probabilities from incomplete
databases use a common strategy: they complete the database by
somehow inferring the missing data from the available information and
then learn from the completed database. This paper introduces a new
method - called bound and collapse (BC) - which does not follow this
strategy. BC starts by bounding the set of estimates consistent with the
available information and then collapses the resulting set to a point
estimate via a convex combination of the extreme points, with weights
depending on the assumed pattern of missing data. Experiments
comparing BC to Gibbs sampling are also provided.

WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-41-abstract.html
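The bound-and-collapse idea in the KMI-TR-41 abstract can be illustrated for a single discrete variable. The sketch below is a simplified reading of the abstract, not the authors' code. It ignores any prior counts a full Bayesian treatment would add, and the function name and the weights `phi` are illustrative. Here `phi` is the assumed probability that a missing entry belongs to each state. Each extreme point assigns all missing entries to one state, and the point estimate is the `phi`-weighted convex combination of those extreme points.

```python
def bound_and_collapse(counts, n_missing, phi):
    """Sketch of bound-and-collapse for one discrete variable X (no prior).

    counts[k]  -- cases in which X = k was observed
    n_missing  -- cases in which X was not recorded
    phi[k]     -- assumed probability that a missing entry is really k
                  (the analyst's view of the missing-data pattern; sums to 1)
    Returns (lower bounds, upper bounds, point estimates) for P(X = k).
    """
    total = sum(counts) + n_missing
    states = range(len(counts))
    # Bound: extreme point j puts every missing entry into state j.
    extremes = [
        [(counts[k] + (n_missing if k == j else 0)) / total for k in states]
        for j in states
    ]
    lower = [counts[k] / total for k in states]
    upper = [(counts[k] + n_missing) / total for k in states]
    # Collapse: convex combination of the extreme points, weighted by phi.
    estimate = [sum(phi[j] * extremes[j][k] for j in states) for k in states]
    return lower, upper, estimate


lower, upper, estimate = bound_and_collapse([6, 2], 2, [0.5, 0.5])
print(lower, upper, [round(p, 3) for p in estimate])
```

Take 6 and 2 observed cases, 2 missing entries and `phi = (0.5, 0.5)`. The two probabilities are then bounded by [0.6, 0.8] and [0.2, 0.4], and the collapsed estimate is (0.7, 0.3). No iteration is needed, which is the efficiency argument the abstract makes against Gibbs sampling.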


______________________________________________________________________________

Title: Learning Bayesian Networks from Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
    1.Knowledge Media Institute, The Open University.
    2.Department of Actuarial Science and Statistics, City University.

Reference: Technical Report KMI-TR-43
Date: February 1997
Keywords: Bayesian Belief Networks, Bayesian Learning, Missing Data, Model
Selection

Abstract:
Bayesian approaches to learning the graphical structure of Bayesian Belief
Networks (BBNs) from databases share the assumption that the
database is complete, that is, no entry is reported as unknown. Attempts
to relax this assumption often involve the use of expensive iterative
methods to discriminate among different structures. This paper
introduces a deterministic method to learn the graphical structure of a
BBN from a possibly incomplete database. Experimental evaluations
show that this method is significantly robust and that its execution
time is remarkably independent of the amount of missing data.

WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-43-abstract.html

_____________________________________________________________________________

Title: The Use of Exogenous Knowledge to Learn Bayesian Networks
from Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
    1.Knowledge Media Institute, The Open University.
    2.Department of Actuarial Science and Statistics, City University.

TR number: KMI-TR-44
Date: February 1997
Keywords: Information extraction, Uncertainty and noise in data,
Bayesian inference.

Abstract:
Current methods to learn Bayesian Networks from incomplete
databases share the common assumption that the unreported data are
missing at random. This paper describes a method - called Bound and
Collapse (BC) - to learn Bayesian Networks from incomplete databases
which allows the analyst to efficiently integrate the information
provided by the database and the exogenous knowledge about the pattern
of missing data. BC starts by bounding the set of estimates consistent
with the available information and then collapses the resulting set to
a point estimate via a convex combination of the extreme points, with
weights depending on the assumed pattern of missing data. Experiments
comparing BC to Gibbs sampling are also provided.

WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-44-abstract.html

____________________________________________________________________________


Title: Discovering Bayesian Networks in Incomplete Databases
Authors: Marco Ramoni [1] and Paola Sebastiani [2]
    1.Knowledge Media Institute, The Open University.
    2.Department of Actuarial Science and Statistics, City University.

TR number: KMI-TR-46
Date: March 1997
Keywords: Information extraction, Uncertainty and noise in data,
Bayesian inference.


Abstract:
Bayesian Belief Networks (BBNs) are becoming increasingly
popular in the Knowledge Discovery and Data Mining community. A
BBN is defined by a graphical structure of conditional dependencies
among the domain variables and a set of probability distributions
defining these dependencies. In this way, BBNs provide a compact
formalism - grounded in the well-developed mathematics of
probability theory - able to predict variable values, explain
observations, and visualize dependencies among variables. During
the past few years, several efforts have been made to develop
methods able to extract both the graphical structure and the
conditional probabilities of a BBN from a database. All these
methods share the assumption that the database at hand is complete,
that is, it does not report any entry as unknown. When this
assumption fails, these methods have to resort to expensive iterative
procedures which are infeasible for large databases. This paper
describes a new Knowledge Discovery system based on an efficient
method able to extract the graphical structure and the probability
distributions of a BBN from possibly incomplete databases. An
application using a large real-world database will illustrate methods
and concepts underlying the system and will assess its advantages as
a Knowledge Discovery system.

WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-46-abstract.html
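The KMI-TR-46 abstract defines a BBN as a graph of conditional dependencies plus a set of probability distributions. Under that definition, the joint distribution factorizes as the product of each variable's probability given its parents. The minimal Python sketch below uses a hypothetical three-node chain, Cloudy -> Rain -> WetGrass, with made-up probabilities. Marginals are computed by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical network: Cloudy -> Rain -> WetGrass (all Boolean).
parents = {"Cloudy": [], "Rain": ["Cloudy"], "WetGrass": ["Rain"]}

# cpt[var][parent values] = P(var is True | parents); numbers are made up.
cpt = {
    "Cloudy": {(): 0.5},
    "Rain": {(True,): 0.8, (False,): 0.1},
    "WetGrass": {(True,): 0.9, (False,): 0.2},
}


def joint(assignment):
    """P(assignment) = product over variables of P(var | its parents)."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def marginal(var, value):
    """Sum the joint over every assignment in which var == value."""
    names = list(parents)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if assignment[var] == value:
            total += joint(assignment)
    return total


print(round(marginal("Rain", True), 3))      # 0.45
print(round(marginal("WetGrass", True), 3))  # 0.515
```

The compactness the abstract mentions comes from storing only the small per-variable tables (here 1 + 2 + 2 numbers) instead of the full 2^3-entry joint table.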

______________________________________________________________________________
Marco Ramoni
Knowledge Media Institute          Phone: +44-1908-65-5721
The Open University                Fax:   +44-1908-65-3169
Walton Hall                        Email: [email protected]
Milton Keynes MK7 6AA              URL:   http://kmi.open.ac.uk/~marco
UNITED KINGDOM                     CUSeeMe: 137.108.81.18

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 16 Apr 1997 10:24:19 -0400
From: Tom Mitchell <[email protected]>
Subject: Textbook for Data Mining: Machine Learning by Tom Mitchell

 DATAMINING TEXTBOOK:  Machine Learning, Tom Mitchell, McGraw Hill

McGraw Hill announces immediate availability of a new textbook that
covers the primary algorithms used in datamining.  MACHINE LEARNING
provides a thorough, interdisciplinary introduction to the key
algorithms used in datamining. 

Free inspection copies are available for instructors, by contacting
Betsy Jones (McGraw Hill) at (630) 789-5057.

The chapter outline is:

     1. Introduction 
     2. Concept Learning and the General-to-Specific Ordering 
     3. Decision Tree Learning 
     4. Artificial Neural Networks 
     5. Evaluating Hypotheses 
     6. Bayesian Learning 
     7. Computational Learning Theory 
     8. Instance-Based Learning 
     9. Genetic Algorithms 
     10. Learning Sets of Rules 
     11. Analytical Learning 
     12. Combining Inductive and Analytical Learning 
     13. Reinforcement Learning

     (414 pages)

This book is intended for upper-level undergraduates, graduate
students, and professionals working in the area of datamining, machine
learning, and statistics.  The text includes over a hundred homework
exercises, along with web-accessible code and datasets (e.g., neural
networks applied to face recognition, Bayesian learning applied to
text classification).

For further information and ordering instructions, see
http://www.cs.cmu.edu/~tom/mlbook.html

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected] (Ross Quinlan)
Date: Wed, 16 Apr 1997 07:47:28 -0400 (EDT)
Subject: Windows Version of C5.0 ("See5") Available Now

Please see http://www.rulequest.com for details.  As with the
Unix version, a scaled-down demonstration version is free, and
there is also a free 10-day trial of the real thing.

Ross

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[The following is a commercial announcement. GPS]
Date: Sat, 19 Apr 97 11:51:52 PDT
From: Stanley Rice <[email protected]>

Now that spring is sprung, what about tasting some
PRECOORDINATE WINES FROM POSTCOORDINATE BOTTLES? ;-)

Like the taste of wine, relevance is not objective to us. It
is subjective, without crisp definition, dependent on our
context, describable only by fuzzy postcoordinations. SIGs
as well as individuals recognize relevance only in context.

With a little help from our friends we can optimize
relevance. But most folks have never even heard the word
postcoordination. Precoordinate systems still predominate--
Yahoo categories, single topic and alphabetical filings--at
work, at school, and at home.

The Internet, AltaVista-style search engines, and Thematic
concept filtering will change a lot of that before long. The
change may come more smoothly because old precoordinations
can be included under postcoordinations, and actually be
much enhanced thereby. Just putting the old wine in the new
bottles can multiply its bouquet and value. (No, there is
nothing for sale here.)

Examples of postcoordination possibilities with included
fuzzy precoordinations, suited to electronic libraries,
corporate intranets (and many other "incoherent" but
currently precoordinated collections) are given at:

http://www.cruzio.com/~autospec/darwin.htm

(Darwin's "The Voyage of the Beagle" is used to illustrate
Dewey precoordinations included under postcoordinations.)
Want a different kind of example? Consider "Correlating
Symptoms and Remedies," which includes uses for various
kinds of traditional diagnostic precoordinations:

http://www.cruzio.com/~autospec/accessf.htm

On the Autospec home page (address below) we look at
postcoordination of contextual and conceptual filtering from
many points of view. Your reactions are always appreciated.
In any case, relax and have another glass. It's spring! ;-)

Regards, Stan Rice

-- 
THEMATICS: Conceptual & Marketing Access to Text and Media
AUTOSPEC, Inc. Santa Cruz, CA. Stan Rice   Voice: (408) 457-1430
Home page for Autospec:  http://www.cruzio.com/~autospec/
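In library-science terms, a precoordinate system combines subject terms at indexing time, filing each item under one fixed heading. A postcoordinate system instead stores individual terms and lets the searcher combine them at query time. The Python sketch below is a generic illustration of that distinction, not a description of Autospec's software. The chapter identifiers and terms are hypothetical. The Dewey-style class is stored as just one more term, so the old precoordination is kept but can also be intersected with finer terms.

```python
from collections import defaultdict


def build_index(doc_terms):
    """Inverted index: map each index term to the set of documents carrying it."""
    index = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            index[term].add(doc)
    return index


def postcoordinate_search(index, *terms):
    """Combine terms at search time: documents carrying ALL the given terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical chapters; the precoordinate (Dewey-style) class is kept
# as one more term alongside finer-grained subject terms.
doc_terms = {
    "ch1": {"590 Zoology", "Galapagos", "finches"},
    "ch2": {"550 Earth sciences", "Patagonia", "fossils"},
    "ch3": {"550 Earth sciences", "Galapagos", "volcanoes"},
}
index = build_index(doc_terms)

print(sorted(postcoordinate_search(index, "Galapagos")))  # ['ch1', 'ch3']
print(postcoordinate_search(index, "Galapagos", "550 Earth sciences"))  # {'ch3'}
```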

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[The following is a commercial announcement. GPS]

Date: Tue, 22 Apr 1997 11:09:49 -0700
From: Pamela Lerwick <[email protected]>
Subject: IDIS Special Release
					
Contact: IDI Marketing Communications
(310) 936-3600

New Machine-Man Paradigm 
Refocuses Data Mining

Novel Approach Based on Explainable Intranet Documents Introduces New
Languages and Techniques for Data Mining

_____________________________________________________________________________

Los Angeles -- April 21, 1997

The 1997 Database World Conference in Boston will witness the birth of a
new computing paradigm for decision support -- certain to affect the way
corporations use and benefit from computers. While most computing to date
has focused on man-machine interaction, this novel approach introduces
machine-man interaction.

In man-machine systems, humans view machines as "order-takers" -- we tell
machines what to do rather than help them tell us what they know. This
one-way bias is manifest even in the term man-machine itself.

While the direction of man-machine systems has been from man to machine,
the focus of machine-man interaction is from machine to man, helping
machines say their piece -- delivering the benefits of the immense
knowledge they possess. This does not mean natural language output, but is
based on a specific and novel approach to model building, data
structuring, language design and information delivery.

With a database query language or a programming language, the user types
or otherwise inputs a query or program -- the machine then tries to
understand it and generate a response. In machine-man interaction, the
machine writes up a set of statements as an "explainable document", and
the user reads and understands them to improve decision making.

This dramatic new idea will be presented for the first time at the
Database World Conference in Boston on May 20, 1997, by Dr. Kamran
Parsaye, CEO of Information Discovery, Inc. He will discuss the
far-reaching consequences of this paradigm for corporate computing.

The NASA Scientific and Technical Information Program defines a man-machine
system as: "A System in which the functions of the man and the machine are
interrelated and necessary for the operation of the system." Similarly, Dr.
Parsaye defines a machine-man system as: "A System in which the functions
of the machine and the man are interrelated and necessary for the thinking
of the man."

For a machine to tell us anything, it needs a suitable language of
expression. It needs to be able to phrase its knowledge in terms of a
language understandable by us. When dealing with computer systems, the term
"language" has often been used in the context of programming languages and
query languages. In machine-man interaction, we need languages that help
machines express their knowledge for our benefit -- i.e. knowledge
expression languages. 

Programming and query languages have to be understandable by computers;
knowledge expression languages have to be comprehensible to human users --
they are the tools machines use to help us. Dr. Parsaye will illustrate how
traditional languages and systems such as SQL or OLAP are inadequate due to
their focus on one-way interaction models.

Machine-man interaction requires three distinct language facilities: first,
a language to organize the environment and develop scripts, etc., as one
does in any system; second, a language to let a developer or analyst define
models, set up scenarios and specify terms for the lexicon to be used by
the machine (i.e. an interactive document composition language); and
third, a language to allow the machine to express knowledge (i.e. a
knowledge expression language).

Using agent technology on the Internet or an intranet, machine-man systems
have a life of their own. They use agents to look for patterns and perform
discovery, and when there is something interesting to say, they generate an
"explainable document" on the intranet in plain English (or Italian,
French, etc.) accompanied by graphs. Machines need no longer be just
order-takers, but can be the finders and communicators of knowledge.

The impact of the new paradigm on corporate planning for decision support
and data warehousing will be significant. Business users and IS departments
need no longer just consider "tools" as a method of data mining, but can
rely on automatically generated Java-based explainable documents with rich
text and graphic content. This will simultaneously accelerate the use of
Java, intranets, data warehousing and data mining.

For more information on the Database World Conference please visit DCI at
http://www.DCIexpo.com on the internet, or call (508) 470-3870. For more
information on Information Discovery, Inc. please visit
http://www.datamining.com on the internet or call (310) 937-3600.

Pamela Lerwick
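The release describes a "knowledge expression" step, in which the machine states a finding in plain English instead of waiting for a query. That step can be pictured with a very small template-based sketch. This is purely illustrative and hypothetical. It is not Information Discovery's product or language, and the segment names, rates and `min_lift` threshold are invented. The sketch reports only segments whose response rate departs from the overall rate by a chosen factor, and it stays silent about the rest.

```python
def explain(segment_rates, overall_rate, min_lift=1.5):
    """Turn per-segment response rates into plain-English statements.

    Only segments whose rate is at least min_lift times the overall rate,
    or at most 1/min_lift times it, are considered worth saying anything about.
    """
    sentences = []
    for segment, rate in sorted(segment_rates.items()):
        lift = rate / overall_rate
        if lift >= min_lift:
            direction = "more"
        elif lift <= 1 / min_lift:
            direction = "less"
        else:
            continue  # unremarkable segment: nothing to report
        sentences.append(
            f"Segment {segment} responds {direction} often than average: "
            f"{rate:.0%} versus {overall_rate:.0%} overall "
            f"({lift:.1f} times the average rate)."
        )
    return sentences


for line in explain({"A": 0.3, "B": 0.1, "C": 0.05}, overall_rate=0.1):
    print(line)
```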

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 14 Apr 1997 17:14:00 +0100
From: ROSS DONALD KING <[email protected]>
Subject: Ph.D. Studentships

Field:  data mining, machine learning, ILP, scientific discovery
       
Place:  University of Wales, Aberystwyth
        Wales, UK

Applications are invited for Ph.D. Studentships in the area of data mining
in the Centre for Intelligent Systems at the Department of Computer
Science, University of Wales, Aberystwyth.

The Centre for Intelligent Systems has a particular interest in
knowledge-rich data mining systems, Inductive Logic Programming,
and applications in biology and chemistry.

Applicants should have at least a 2(i) in Computer Science or a related
subject, with a good background in Artificial Intelligence or
Statistics.

More information can be obtained from 
Professor Mark Lee or Dr. Ross D. King

Department of Computer Science, 
University of Wales, 
Penglais, 
Aberystwyth, 
Ceredigion, SY23 3DB, 
Wales, UK 

Tel: +44 1970 622420 
Fax: +44 1970 622455 
Email: [email protected] [email protected]

or from the URLs:
http://www.aber.ac.uk/~dcswww/Public/Recruitment/Proposals/
http://www.aber.ac.uk/~dcswww/Public/Research/


>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 17 Apr 97 09:32:42 EDT
From: "Fred J. Damerau (862-2214)" <[email protected]>
Subject: Research Associate Position in Text Mining/Information Extraction

The Natural Language Understanding Group at the IBM T. J. Watson
Research Laboratory (Yorktown Heights, NY 10566) is looking for
a Research Associate with the qualifications listed below.  The
position will most likely be initially for one year, but it is
renewable.  The successful candidate will work on our text mining/
information extraction project, with a particular emphasis on
applying machine learning techniques to various issues in document
management. The project combines state-of-the-art research on machine
learning in text mining with practical production-level systems building.

________________________________________________________________
Qualifications:

The ideal candidate would have the following knowledge and experience.

Education: MA/MS in computer science or other field with extensive
           background in computer science.

Programming languages:
Extensive knowledge and experience in C/C++ required; Java a plus.

Specialized Background:
Experience in implementing machine learning algorithms and/or
natural language processing algorithms.

Operating systems:
Required: Familiarity with Windows95/NT and Unix/AIX,
Helpful:  Familiarity with OS/2
System programming/API experience on these operating systems not required.

General Software Development:
Familiarity with issues of large scale software development, e.g.,
API design and use, creation and integration of DLLs/Libraries,
source code control systems etc.

Candidates should send resumes and supporting letters to:

Thomas Hampp
eMail: [email protected]
phone: 914-945-1714

End of message
410.24. "97:15" by IJSAPL::OLTHOF (Spellchecked Henry Although) Tue May 06 1997 11:34 (1146 lines)
Knowledge Discovery Nuggets 97:15, e-mailed 97-05-04
News:
	* R. Uthurusamy, KDD-97 Overview and Tutorials 
		http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html
	* R. Uthurusamy, KDD-97 Workshop, Integration of Data Mining and 
		Data Visualization
		http://www.cs.uml.edu/~grinstei/kddvis-workshop.html
	* R. Uthurusamy, KDD-97 Registration Information
		http://www-aig.jpl.nasa.gov/kdd97-docs/registrationinfo.html
	* Peter Turney, data mining from wafers manufacturing process
Siftware:
	* Nicolas Bissantz, Delta Miner 3.0
		http://www.bissantz.de 
Positions:
	* Pablo Tamayo, Job Position at Thinking Machines
Meetings:
	* E. Horvitz, Call for participation, UAI-97,
		http://cuai97.microsoft.com
	* Gordian Institute, "Making Sense of Data: Computer-Aided
		Pattern Discovery", July 14-18, Charlottesville, Virginia
		http://www.gordianknot.com
	* R. Zicari, COMDEX Internet & OBJECT WORLD Frankfurt '97 (Oct 7-10)
		http://www.ltt.de
~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
A gentleman is not a pot 
                        Confucius 
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Registration Information

KDD-97 Registration Brochure

Third International Conference on Knowledge Discovery and Data Mining (KDD-97)
August 14-17, 1997

Sponsored by the American Association for Artificial Intelligence
http://www.aaai.org

KDD-97: A Preview

The rapid growth of data and information has created a need and an
opportunity for extracting knowledge from databases, and both researchers
and application developers have been responding to that need.  Knowledge
discovery in databases (KDD), also referred to as data mining, is an area
of common interest to researchers in machine discovery, statistics,
databases, knowledge acquisition, machine learning, data visualization,
high performance computing, and knowledge-based systems.  KDD applications
have been developed for astronomy, biology, finance, insurance, marketing,
medicine, and many other fields.

The Third International Conference on Knowledge Discovery and Data Mining
(KDD-97) will build on the success of KDD-95 and KDD-96 by bringing
together researchers and application developers from different areas,
focusing on unifying themes.

KDD-97 Organization

General Conference Chair:
Ramasamy Uthurusamy, General Motors Corporation, USA

Program Cochairs:
David Heckerman, Microsoft Research, USA
Heikki Mannila, University of Helsinki, Finland
Daryl Pregibon, AT&T Laboratories, USA

Publicity Chair:
Paul Stolorz, Jet Propulsion Laboratory, USA

Tutorial Chair:
Padhraic Smyth, University of California, Irvine, USA

Demo and Poster Sessions Chair:
Tej Anand, NCR Corporation, USA

Awards Chair:
Gregory Piatetsky-Shapiro, Geneve Consulting, USA

Keynote Speaker:

Peter Huber, Universität Bayreuth, Germany

	"From Large to Huge. A Statistician's Reactions to KDD & DM"

The statistics and AI communities are confronted by the same challenge, the
onslaught of ever larger data collections, but the two communities have
reacted independently and differently. What could they learn from each
other if they looked over the fence? What is amiss on either side?


KDD-97 Tutorial Abstracts and Speakers
--------------------------------------
Full info on tutorials available at
http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html

All tutorials will be presented on Thursday, August 14, 1997.  The times
listed below are tentative. Admission to the tutorials is included in your
conference registration fee. Registrants can attend up to four consecutive
tutorials and receive the corresponding four tutorial syllabi.

8:00 to 10:00am       T1 - Fayyad and Simoudis (single session)

                      Session 1                Session 2
10:30am to 12:30pm    T2 - Hand                T3 - Feldman
1:30 to 3:30pm        T4 - Swayne and Cook     T5 - Chaudhuri and Dayal
4:00 to 6:00pm        T6 - Keim                T7 - DuMouchel


Tutorial 1: 8:00-10:00am
Data Mining and KDD: An Overview
Usama Fayyad, Microsoft Research and Evangelos Simoudis, IBM

We present a basic tutorial of this new and emerging area and emphasize
relations to constituent communities, including statistics, databases,
pattern recognition, learning, and visualization. The tutorial provides a
basic overview of the KDD process for extracting knowledge from databases
and covers the basics of each step in the process including: data
warehousing, selection and cleaning, data transformation, data mining,
evaluation, and visualization. We also cover a sampling of successful
applications and outline challenges and issues to be addressed.

Dr. Usama Fayyad is a Senior Researcher at Microsoft Research, in the Decision
Theory & Adaptive Systems Group. His research interests include knowledge
discovery in large databases, data mining, machine learning, statistical
pattern recognition, and clustering. After receiving the Ph.D. degree in
1991, he joined the Jet Propulsion Laboratory (JPL), California Institute
of Technology (until 1996). At JPL, he headed the Machine Learning Systems
Group where he developed data mining systems for analysis of large
scientific databases.

Dr. Evangelos Simoudis is Vice President, Global Business Intelligence
Solutions - IBM North America, where he is responsible for the development
and deployment of data mining and decision support solutions to IBM's
customers worldwide. Dr. Simoudis received a B.A. in Physics from Grinnell
College, a B.S. in Electrical Engineering from California Institute of
Technology, an M.S. in Computer Science from the University of Oregon, and
a Ph.D. in Computer Science from Brandeis University.


Tutorial 2: 10:30am-12:30pm
Modelling Data and Discovering Knowledge
David Hand, Open University, UK

Our aim is to extract knowledge from large bodies of data. The size of
these bodies means that we cannot do it unaided, but must use fast
computers, applying sophisticated statistical tools. Attempts to automate
the process of knowledge extraction date from at least the early 1980s,
with the work on statistical expert systems. We examine this work, noting
its successes and failures and, especially, what researchers in data mining
and knowledge discovery can learn from those efforts.  We examine what data
are, what information is, and what knowledge is. We contrast modelling with
discovery, especially in the context of large data sets. We examine high
level modelling issues, such as overfitting, generalisability,
overmodelling, and model evaluation. And we examine high level exploration
issues such as the discovery of accidental artefacts. The confluence of
computing and statistics in some areas provides a nice backdrop against
which to examine these issues, and we briefly discuss neural networks and
classification trees from these two perspectives.

Dr. David Hand is Professor of Statistics at the Open University. His
research interests include the foundations of statistics, statistical
computing, and multivariate statistics, the latter especially as applied to
classification problems. His applications interests include medicine,
finance, and psychology. He is Editor-in-Chief of Statistics and Computing
and has published fourteen books, the most recent of which is
Construction and Assessment of Classification Rules, Wiley, January 1997.


Tutorial 3: 10:30am-12:30pm
Text Mining - Theory and Practice
Ronen Feldman, Bar-Ilan University, Israel

Knowledge Discovery in Databases (KDD) focuses on the computerized
exploration of large amounts of data and on the discovery of interesting
patterns within them. While most work on KDD has been concerned with
structured databases, there has been little work on handling the huge
amount of information that is available only in unstructured textual form.
In this tutorial we will present the general theory of Text Mining and will
demonstrate several systems that use these principles to enable interactive
exploration of large textual collections. We will describe generic
techniques for text categorization and information extraction that are used
by these systems. The systems that will be presented are KDT which is the
system for Knowledge Discovery in Texts; FACT, which discovers associations
among keywords labeling the items in a collection of textual documents; and
the Text Explorer, which is a system that provides a high level language
for interactive exploration of textual collections. We will present a
general architecture for text mining and will outline the algorithms and
data structures behind the systems. We will give special emphasis to
incremental algorithms and to efficient data structures.

Dr. Ronen Feldman is a lecturer at the Mathematics and Computer Science
Department of Bar-Ilan University in Israel. He received his B.Sc. in Math,
Physics and Computer Science from the Hebrew University, and his Ph.D. in
Computer Science from Cornell University. His main research is in the area
of Machine Learning and Data Mining. He is currently coordinating several
research projects for developing dedicated text mining systems. These
systems work on plain text collections and on the Internet.


Tutorial 4: 1:30-3:30pm
Exploratory Data Analysis using Interactive Dynamic Graphics
Deborah Swayne, Bell Communications Research and Dianne Cook, Iowa State
University

Researchers and software designers in the field of data mining are just
beginning to make extensive use of graphical methods. Interactive dynamic
data visualization has been explored in the field of statistics for over
twenty years, and we propose that much of what has been learned in
statistics is relevant for data mining. This class is an introduction to
interactive data visualization as it is practiced as part of exploratory
data analysis. The XGobi software, publicly available dynamic visualization
software, will be used in the analysis of examples from biology, business,
physics, engineering, and telecommunications. The examples will illustrate
a set of general visualization principles which are embodied in specific
methods such as brushing and identification of points in simple
scatterplots, three-dimensional rotations, rotations in higher dimensions
such as the grand tour, and directed searches in higher dimensions for
interesting two-dimensional views using projection pursuit and manual
control.

Deborah Swayne has worked at Bellcore since that company's inception in
1985, and is currently a member of the Statistics and Data Mining Research
Group. Her research focusses on software methods for visualizing data. She
is one of the authors of the XGobi software, originally developed at
Bellcore. She has a Bachelor's degree in African Linguistics from the
University of Wisconsin at Madison, and a Master's degree in Statistics
from Rutgers University.

Dr. Dianne Cook is an Assistant Professor in the Department of Statistics,
Iowa State University. She received her PhD from Rutgers University in May
1993, and has conducted research into dynamic statistical graphics. Her
interests include using these methods for understanding high-dimensional
data, and adapting them for analyzing geographically referenced data with
multiple measurements at each site.


Tutorial 5: 1:30-3:30pm
OLAP and Data Warehousing
Surajit Chaudhuri, Microsoft Research and Umesh Dayal, Hewlett Packard
Laboratories

On-Line Analytical Processing (OLAP) and Data Warehousing technologies
enable enterprises to gain competitive advantage by exploiting the
ever-growing amount of data that is collected and stored in corporate
databases and files for better and faster decision making. Over the past
few years, these technologies have experienced explosive growth, both in
the number of products and services offered, and in the extent of coverage
in the trade press. Vendors (including all database companies) are paying
increasing attention to all aspects of decision support. The area opens up
interesting research directions, with ties to past work in database
systems, but with different assumptions and requirements. Only very
recently, however, has the database research community started to
understand and address some of these issues. This tutorial presents an
overview of OLAP and data warehousing, and an in-depth study of selected
aspects. An outline of the tutorial follows:

1. Introduction: definitions, evolution, differences from OLTP,
   architectures.
2. Models and Tools: conceptual model for OLAP, front-end tools (e.g.,
   multidimensional spreadsheets), database design (e.g., star and
   snowflake schema).
3. Database Server technologies for Decision Support Queries: specialized
   indexing techniques, specialized join and scan methods, data
   partitioning and use of parallelism, intelligent processing of
   aggregates, complex query processing, extensions to SQL, ROLAP vs.
   MOLAP.
4. Other Services for OLAP/Data warehousing: data cleaning, loading and
   refresh, tools for warehouse, system and process management, metadata
   management and the role of repository.
5. State of Commercial Practice.
6. Research Issues.

The target audience is researchers and developers interested in learning
about the concepts, products and the technical innovations in the area of
decision support technologies.

Dr. Surajit Chaudhuri is a researcher in the Database Research Group of
Microsoft Research. From 1992 to 1995, he was a Member of the Technical
Staff at Hewlett-Packard Laboratories, Palo Alto. He did his B.Tech at the
Indian Institute of Technology, Kharagpur and his Ph.D. at Stanford
University. In addition to query processing and optimization, Surajit is
interested in the areas of data mining, database design and uses of
databases for nontraditional applications.

Dr. Umesh Dayal is a senior researcher at Hewlett-Packard Labs, Palo Alto,
California. His current research interests are in distributed information
systems, workflow management, data mining, and information management
issues related to the emerging global information infrastructure. He
received his Ph.D. and S.M. degrees from Harvard University, his M.E. and
B.E. degrees from the Indian Institute of Science, and his B.Sc. degree
from Osmania University, India.


Tutorial 6: 4:00-6:00pm
Visual Techniques for Exploring Databases
Daniel Keim, University of Munich

For data exploration to be effective, it is important to include the human
in the exploration process and combine the flexibility, creativity, and
general knowledge of the human with the enormous storage capacity and the
computational power of today's computers. Visual database exploration aims
at integrating the human in the exploration process, applying human
perceptual abilities to the large data sets available in today's computer
systems. The basic idea of visual data exploration is to present the data
in some visual form, allowing the human to get insight into the data and
draw conclusions. Visual data exploration techniques have proven to be of
high value in exploratory data analysis and they also have a high potential
for exploring large databases. Visual database exploration is especially
powerful for the first steps of the data mining process, namely
understanding the data and generating hypotheses about the data, but it may
also significantly contribute to the actual knowledge discovery by guiding
the search using visual feedback. The goal of the tutorial is to show the
potential of visualization technology for exploring large databases. The
tutorial provides an overview of the state-of-the-art in data visualization
and provides a classification of the existing data visualization
techniques. Besides describing each of the classes, the tutorial focuses on
new developments in data visualization, which are relevant to the area of
knowledge discovery, and describes a wide range of recently developed
techniques for visualizing large amounts of arbitrary multi-attribute data
which does not have any two- or three-dimensional semantics and therefore
does not lend itself to an easy display. A detailed comparison shows the
strengths and weaknesses of the existing techniques and reveals potential
for further improvements. Several examples demonstrate the benefits of
visualization techniques for exploring databases. The tutorial concludes
with an overview of existing database exploration and visualization
systems, including research prototypes as well as commercial products.

Dr. Daniel Keim is one of the leading experts in the field of visual
database exploration, and he was the chief engineer in designing the VisDB
system - a visual database exploration system. Dr. Keim received his
diploma (equivalent to an MS degree) in Computer Science from the
University of Dortmund in 1990 and his Ph.D. in Computer Science from the
University of Munich in 1994. Currently, he is a teaching and research
assistant (approximately equivalent to an assistant professor) at the
Institute for Computer Science of the University of Munich, Germany.


Tutorial 7: 4:00-6:00pm
Statistical Models for Categorical Response Data
William DuMouchel, AT&T Research

This tutorial will survey the most common models and methods statisticians
use to fit and test relationships among categorical (discrete) data. Most
of these techniques are described in statistics texts such as Categorical
Data Analysis by Alan Agresti (Wiley, 1990) and are widely available in
popular computer packages such as SAS and S-Plus. Therefore it is almost de
rigueur for anyone proposing a new classification technique to compare it
with one or more of these standard methods. The tutorial will focus
on loglinear and logistic regression models, and related models such as
probit, Poisson regression, and survival models. In the short time
available, priority will be given to explaining why these techniques are so
popular among statisticians, and to how the basic models have been extended
to handle variables having more than two categories or when some of the
variables have continuous or ordinal scales. Examples of model fitting,
model search and model comparison using SAS and S-Plus will be presented and
discussed.

Dr. William DuMouchel has been on the faculties of UC Berkeley, University of
Michigan, University of London, MIT and Columbia University. From 1987 to
1992 he was Chief Statistical Scientist at BBN Software Products, helping
to design and develop commercial software advisory systems for data
analysis and experimental design.  He is currently at AT&T Labs - Research,
Florham Park, New Jersey.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Workshop 

KDD-97 Workshop - August 17, 1997  8:30am-5pm
---------------------------------------------

     Issues in the Integration of Data Mining and Data Visualization
     ---------------------------------------------------------------

Details: http://www.cs.uml.edu/~grinstei/kddvis-workshop.html

Data visualization deals with the effective portrayal of data with a goal
towards insight about the data. Typically, the data is of high volume,
multidimensional in nature, and does not lend itself to easy display. The
data is also often non-spatial and temporal in nature.

Data visualization software systems are very popular with end-user domain
scientists who require visual tools to explore and analyze their data.
These visual tools, however, are used strictly at the output end of the
exploration process; output issues have received much attention, whereas
input issues to the exploration process still have not. The KDD community
is concerned with two aspects of visualization techniques: 1. their use at
the "back-end" of the exploration process to help understand models
extracted by data mining algorithms, and 2. scalability: how to make
visualization efficient in the context of large databases where data
access is expensive. The visualization community, for its part, looks at
KDD and analytic methods as applications that generate displays. However,
visualization can be
used as input to KDD and analytic tools; it can also be used to support
computational steering. An effective visualization front-end can guide a
data mining algorithm in its search and may result in much better and more
easily acceptable solutions. This workshop will continue the discussions
started at the first two workshops and focus on these and other issues that
make a case for integrating KDD and visualization technologies.

Two previous workshops (Siggraph '90 and Visualization '91) have dealt with
areas such as high-level requirements for data structures and access
software, and data visualization environments. The first and second
workshops on Database Issues for Data Visualization were held in 1993 and
1995 and explored the fundamental issues. A number of experimental,
prototype, and research systems were presented. The second workshop also
saw the first signs of interest in integrating data mining and visualization.
This trend, so significant in the commercial sector today, is in its
infancy and is in need of much research attention.

Position statements and papers are welcome on the following issues as they
relate to KDD and data visualization integration. We would like to keep
discussions focused on the end result, which is improving the integration
of data mining and knowledge discovery systems with visualization:

* Requirements Visualization places on Knowledge Discovery Systems
* Data Models and Access Structures
* Modeling the User - Tasks, Processes, Support Issues
* Advanced User Interfaces for Data Mining
* Visual Languages for Data Mining
* System Integration Issues
* Computational Steering for Data Mining
* Scalability to Large Databases
* Distributed, Heterogeneous Data Set Issues - Data and Computation Sharing
* Examples of Integrated Systems
* Applications of Integrated Systems

Workshop Paper Submissions (Deadline June 15)

Papers (and position papers to be expanded for final publication) are
solicited that present research results in the integration of data mining
and visualization. Papers should be limited to 5,000 words and may be
accompanied by NTSC video. These should describe some original research on
the particular subject, and how it fits in with the overall theme of the
workshop. Proper references should be cited.

Workshop Registration Fee

Registration forms will be sent to the accepted participants. There is a
single registration fee of US $100 which covers the workshop sessions,
preprints, and coffee breaks.

Workshop Organizers

Georges Grinstein
Institute for Visualization and Perception Research
University of Massachusetts at Lowell
Lowell, MA 01854, USA
Email: [email protected]
Fax: +1-508-934-3551 * Phone: +1-508-934-3627

Andreas Wierse
Institute for Computer Applications
Dep. Computersimulation and Visualization
Pfaffenwaldring 27
D-70550 Stuttgart, Germany
Email: [email protected]
Fax: +49(0)711-682357 * Phone: +49-711-685-5796

Usama Fayyad
Microsoft Research
Redmond, WA 98052-6399, USA
Email: [email protected]
Fax: +1-206-936-7329 * Phone: +1-206-703-1528
---------------------------------------------

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Demos/Exhibits of Knowledge Discovery Products
-----------------------------------------------------

Following the success of the demonstration sessions in previous KDD
conferences, the KDD-97 program will also include demonstrations of
knowledge discovery products, knowledge discovery applications and research
prototypes.  Unlike previous demonstration sessions, we will clearly
differentiate between commercial product demonstrations and research
demonstrations.

We are inviting commercial vendors to exhibit at KDD-97. The exhibitor fee
for KDD-97 will be a nominal $250.00.  Exhibitors will be provided with a
6-ft tabletop.  In this space vendors will be allowed to distribute product
or company literature, show product demonstrations and set up signage.
Vendors will have to bring all necessary hardware and software that they
will require for their demonstrations.

The exhibit area will be open during the following hours: Aug. 15th: 12:30-5pm

For your information, total attendance at KDD-96 was 457. Of these, 35% were
affiliated with universities and 65% were affiliated with industry.  If you
would like to exhibit at KDD-97 please fill out the registration form and
send it along with the name of your Product(s) and/or Service(s) and a 200
word (maximum) Description of Product(s)/Service(s) to: AAAI, KDD-97
Exhibit, 445 Burgess Drive, Menlo Park, CA 94025, USA.  The description
will be included in the conference program.

We are also soliciting demonstrations of research prototypes at KDD-97.
This demonstration session will be held on August 15 from 12:30 to 5:00
PM.  We have a limited budget for providing hardware for research
demonstrations. This year we will give priority to demonstrations that are
in conjunction with accepted papers at KDD-97.  Within budget and space
constraints we will make every effort to accommodate as many demonstrations
as possible.  If you would like your demonstration to be considered for
KDD-97 please provide the following information to Tej Anand
([email protected]) by June 1, 1997.

* Name of Demonstration:
* Title of Paper: (If this demonstration is in conjunction with a
                   paper/poster at KDD-97)
* Development Team:
* Affiliations of Development Team Members:
* Contact Telephone#:

* Description of Demonstration: (A short description of approx. 200 words)
* What is unique about your system or application?: (No more than 50 words)
* Status: Research Prototype/Commercially available product/Fielded application
* Hardware Required: (Please state any special memory or disk requirements)
* Operating System: (Please state specific version number)
* WAN Connection Required: Yes/No 
                 (If Yes, please state any special modem requirements)
* Will you bring your own hardware?: Yes/No
* Any other requirements:

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 24 Apr 1997 18:06:38 -0400
From: [email protected] (R. Uthurusamy)
Subject: KDD-97 Registration Information

A registration application is attached to this online brochure. The KDD-97
program registration includes admission to up to four tutorials, four tutorial
syllabi, technical and demo sessions, the opening reception, the KDD-97
Conference Proceedings and mid-morning & afternoon coffee breaks. Onsite
registration will be located in the foyer outside the California Ballroom,
Newport Beach Marriott Hotel and Tennis Club, lobby level.

Early Registration (Postmarked by June 10)
AAAI Members
Regular $295 Students $95
Nonmembers
Regular $375 Students $155

Late Registration (Postmarked by July 15)
AAAI Members
Regular $350 Students $125
Nonmembers
Regular $425 Students $180

On-Site Registration (Postmarked after July 15 or onsite.)
AAAI Members
Regular $400 Students $150
Nonmembers
Regular $475 Students $210

Workshop Registration

Registration forms will be sent to the accepted participants. There is a
separate registration fee of US $100 which covers the workshop sessions,
preprints, and coffee breaks.

Payment Information

Prepayment of registration fees is required. Checks, international money
orders, bank transfers and travelers' checks must be in US dollars.
American Express, MasterCard, VISA, and government purchase orders are also
accepted.  Registration applications postmarked after the early
registration deadline will be subject to the late registration fees.
Registration applications postmarked after the late registration deadline
will be subject to on-site registration fees. Student registrations must be
accompanied by proof of full-time student status.
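The postmark rule above amounts to a simple table lookup. The following is a minimal Python sketch, not an AAAI tool: `registration_fee`, `FEES` and the deadline constants are hypothetical names, the figures are the early and late technical-program rates listed above, and on-site rates are deliberately left out.

```python
from datetime import date

# Postmark deadlines from the KDD-97 registration schedule.
EARLY_DEADLINE = date(1997, 6, 10)
LATE_DEADLINE = date(1997, 7, 15)

# Technical-program fees in US dollars, keyed by (aaai_member, student).
# Only the early and late tiers are encoded; on-site rates are omitted.
FEES = {
    "early": {(True, False): 295, (True, True): 95,
              (False, False): 375, (False, True): 155},
    "late": {(True, False): 350, (True, True): 125,
             (False, False): 425, (False, True): 180},
}


def registration_fee(postmark: date, aaai_member: bool, student: bool) -> int:
    """Return the fee implied by the postmark date (hypothetical helper)."""
    if postmark <= EARLY_DEADLINE:
        tier = "early"
    elif postmark <= LATE_DEADLINE:
        tier = "late"
    else:
        # After the late deadline, on-site fees apply; not modeled here.
        raise ValueError("postmarked after July 15, 1997: on-site rates apply")
    return FEES[tier][(aaai_member, student)]


# A nonmember student postmarking on June 20 misses the early deadline.
print(registration_fee(date(1997, 6, 20), aaai_member=False, student=True))  # 180
```

Deadlines are compared inclusively, matching the "postmarked by" wording.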

Refund Requests

The deadline for refund requests is July 25, 1997.  All refund requests
must be made in writing. A $75.00 processing fee will be assessed for all
refunds.

Registration Hours

Registration hours will be Thursday-Saturday, August 14-16, 7:30am-6:00pm
and Sunday, August 17, 8:00am-3:00pm. All attendees must pick up their
registration packets for admittance to programs.

Housing

AAAI has reserved a block of rooms at the Newport Beach Marriott Hotel at
reduced conference rates.  Conference attendees must contact the hotel
directly and identify themselves as KDD-97 registrants to qualify for the
reduced rates.  Hotel rooms are priced as singles (1 person, 1 bed),
doubles (2 persons, 2 beds), triples (3 persons, 2 beds), quads (4 persons,
2 beds).  Rooms will be assigned on a first-come, first-served basis.  All
rooms are subject to a 10% occupancy tax.

Headquarters Hotel:

Newport Beach Marriott Hotel
900 Newport Center Drive
Newport Beach, CA 92660
Phone: 714-640-4000
Fax: 714-640-4918
Single room: $105.00
Double room: $115.00
Check-in time: 4:00pm
Check-out time: 12:00 noon

Cut-off date for reservations: July 24, 1997.
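As a rough budgeting aid, here is a minimal Python sketch of the room cost, assuming the 10% occupancy tax is applied to the nightly rate alone (no incidentals or parking) and that fractional cents are simply truncated; the hotel's actual rounding policy is not stated. `stay_cost_cents` is a hypothetical helper, and the rates are the single and double figures listed above.

```python
# Nightly room rates in US cents (integer cents avoid float rounding).
ROOM_RATES_CENTS = {"single": 10500, "double": 11500}
OCCUPANCY_TAX_PERCENT = 10


def stay_cost_cents(room: str, nights: int) -> int:
    """Room cost plus 10% occupancy tax, truncating any fractional cent."""
    subtotal = ROOM_RATES_CENTS[room] * nights
    tax = subtotal * OCCUPANCY_TAX_PERCENT // 100
    return subtotal + tax


# Four nights in a single room.
total = stay_cost_cents("single", 4)
print(f"${total / 100:.2f}")  # $462.00
```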

All reservation requests for arrival after 6:00 pm must be accompanied by a
first night room deposit, or guaranteed with a major credit card.  The
Newport Beach Marriott Hotel will not hold any reservations after 6:00 pm
unless guaranteed by one of the above methods.  Reservations received after
the cut-off date will be accepted on a space or rate available basis.

Reservations accepted without a credit card guarantee or advance deposit
are subject to cancellation at 6:00 pm on the day of arrival.

Air Transportation and Car Rental

Newport Beach, California - Get there for less!
Discounted fares have been negotiated for this event.  Call Conventions in
America at 1-800-929-4242 and ask for Group #428.  You will receive 5%-10%
off the lowest applicable fares on American Airlines, or the guaranteed
lowest available fare on any carrier. Travel between August 11-21, 1997.
All attendees booking through CIA will receive free flight insurance and be
entered in their bi-monthly drawing for worldwide travel for two on
American Airlines!  Hertz Rent A Car is also offering special low
conference rates, with unlimited free mileage.

Call Conventions in America - 1-800-929-4242, ask for Group #428.
Reservation hours: M-F 6:30am-5:00pm Pacific Time.
Outside US and Canada, call 619-453-3686/Fax 619-453-7679.
Internet: [email protected]/24-hour emergency service 1-800-748-5520.
If you call direct: American 1-800-433-1790, ask for index #S 9485.
Hertz 1-800-654-2240, ask for CV#24250.

Ground Transportation

The following information provided is the best available at press time.
Please confirm fares when making reservations.

Airport Connections

The Newport Beach Marriott Hotel provides complimentary airport
transportation to/from John Wayne/Orange County Airport.

Super Shuttle: 714-517-6600. The fare from LAX Los Angeles International
Airport to Newport Beach Marriott Hotel is $21.00 per person. Reservations
24 hours in advance are recommended.  Discover Card, traveller's checks and
cash are accepted.

Taxi

Taxis are available at John Wayne Airport.  Approximate fare from the
airport to downtown Newport Beach is $14.00.  Orange County Yellow Cab
Service: 714-546-1311.  The approximate taxi fare from LAX Los Angeles
International Airport to Newport Beach Marriott Hotel is $75.00-80.00.

Bus
Greyhound/Trailways Lines.  The depot is located at 100 W. Winston Road,
Anaheim, CA 92805.  For information on fares and scheduling, call
714-999-1256.

Rail

The Amtrak (Southern Pacific Railroad) stations are located at Santa Ana,
Irvine and Anaheim.  For general information and ticketing, call
1-800-872-7245.

City Transit System

OCTD (Orange County Transit District) serves Newport Beach, Balboa Island
and Corona del Mar. Basic local fare is $1.00.  For general information
call 714-636-RIDE.

Parking

Parking is available at the Newport Beach Marriott Hotel.   The daily rate
for valet parking is $6.00, and $8.00 overnight.  Self-parking is
complimentary.

Disclaimer: In offering American Airlines, Hertz Rent A Car, Newport Beach
Marriott Hotel, and all other service providers, (hereinafter referred to
as "Supplier(s)") for the Third International Conference on Knowledge
Discovery and Data Mining, AAAI acts only in the capacity of agent for the
Suppliers which are the providers of the service.  Because AAAI has no
control over the personnel, equipment or operations of providers of
accommodations or other services included as part of the KDD-97  program,
AAAI assumes no responsibility for and will not be liable for any personal
delay, inconveniences or other damage suffered by conference participants
which may arise by reason of (1) any wrongful or negligent acts or
omissions on the part of any Supplier or its employees, (2) any defect in
or failure of any vehicle, equipment or instrumentality owned, operated or
otherwise used by any Supplier, or (3) any wrongful or negligent acts or
omissions on the part of any other party not under the control, direct or
otherwise, of AAAI.

Newport Beach, California!

Newport Beach is located along the beautiful Pacific Ocean in Orange
County, California, nestled south of Los Angeles, north of San Diego,
southwest of Disneyland in Anaheim, and adjacent to John Wayne/Orange
County Airport.  Surrounded by one of the largest small-boat harbors in the
world and lazily stretching itself along more than six miles of scenic
Pacific coastline, Newport Beach beckons national and international
visitors to moor at the magnificent harbor and discover "The Colorful
Coast".

Newport Beach Visitor Information

A Concierge Desk is available in the Newport Beach Marriott Hotel. They
can assist with dining reservations, directions, tour bookings,
entertainment suggestions, and transportation information. Maps and
brochures are available.
URL: http://www.newport.lib.ca.us/NBCVB/NBCVB.html
************************************************************************

                     KDD-97 PREREGISTRATION APPLICATION

Name:
Company/Univ:
Dept/MS:
Address (Specify Home or Business):

City:
State:
Zip:
Phone & FAX:
Membership No:
Email Address:

************************************************************************

TECHNICAL PROGRAM  (Includes Proceedings)

      EARLY REGISTRATION                        LATE REGISTRATION
     (postmarked by June 10)                 (postmarked by July 15)
  AAAI Member        Nonmember              AAAI Member      Nonmember
  Regular Student  Regular Student        Regular Student  Regular Student
   $295    $95      $375    $155           $350    $125       $425   $180

(Students must send proof of student status to the AAAI Office. By joining
AAAI now, you can qualify for member rates. Membership information is
available from [email protected] or http://www.aaai.org.)

Total KDD-97 Conference Fee: ______

************************************************************************

TUTORIAL PROGRAM
Thursday, August 14
(Conference fee includes up to 4 consecutive tutorials & accompanying syllabi)

8:00-10:00 AM                   T1
10:30 AM-12:30 PM               T2, T3
1:30-3:30 PM                    T4, T5
4:00-6:00 PM                    T6, T7

Please list selected tutorial codes:

************************************************************************

KDD-97 Workshop
Sunday, August 17

$100 per person.

Total Workshop Fee: _______

************************************************************************

KDD-97 OPENING RECEPTION (Included in technical program registration)
Fee for spouse, child, or guest is $20 per person.

Total reception fee: ______

************************************************************************

Exhibit Registration
August 15, 1997

$250 per exhibitor. An exhibitor kit will be mailed upon receipt of
registration.

Total Exhibitor Fee: _______

************************************************************************


                             PAYMENT

Email registrations must be accompanied by a credit card number.

Total Amount Due: ______

Check one: Mastercard ___  Visa ___  American Express ___

Credit Card Account Number:

Expiration Date:

Name as it appears on card:

Forms cannot be processed if information is incomplete. The refund request
deadline is July 25, 1997. A $75.00 processing fee will be assessed for
refunds.
Registrations postmarked after July 15 are subject to onsite rates.

Mail completed application to [email protected] or fax to 415/321-4457.
Please note that there are security issues involved with the transmittal
of credit card information over the internet. AAAI will not be held liable
for any misuse of your credit card information during its transmittal
from you to AAAI.

For complete KDD-97 information, please visit AAAI's web site at
http://www.aaai.org.

Thank you for your registration!  See you at KDD-97

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 24 Apr 1997 08:42:49 -0400
From: [email protected] (Peter Turney)
Subject: Re: data mining from wafers manufacturing process

Dear Elisa:

> At our University, we are starting an application project
> dealing with data from a wafer manufacturing process.
> We are thinking of using data mining techniques
> to try to address the following problem.
> Some of those wafers are faulty. There is a database keeping track
> of the entire manufacturing process for each wafer and collecting
> large amounts of data concerning each step of the manufacturing
> process (there are about 300 steps; each step is characterized by
> about 100 parameters).  Our problem is to use data mining techniques
> to help with the diagnosis, that is, to see which step
> may have caused the problem.
> 
> I was wondering whether you are aware of any use of data mining
> techniques for similar problems. We also have to acquire
> some suitable data mining tools.

Here are two relevant URLs for you:

1.   ftp://ai.iit.nrc.ca/pub/iit-papers/NRC-39163.ps.Z

     P. Turney. Data Engineering for the Analysis of Semiconductor 
     Manufacturing Data. IJCAI-95 Workshop on Data Engineering for 
     Inductive Learning: 50-59. 1995. 


2.   http://www.quadrillion.com/

     Quadrillion Corporation, makers of Q-Yield

Best wishes,
Peter.

http://ai.iit.nrc.ca/staff/peter.html

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected]
Date: Thu, 24 Apr 97 11:18:57    
Subject: FW: new entry for siftware section 

<H2>Siftware: Delta Miner </H2>
<br><b>*URL:</b> <A HREF="http://www.bissantz.de"> http://www.bissantz.de</a> 
<br><b>*Description:</b> 
Delta Miner 3.0 is a suite of easy-to-use data mining tools for 
financial controlling applications
and database analysis.  
<br><b>*Discovery tasks:</b> Clustering, Summarization,
 Deviation Detection, Visualization  

<br><b>*Comments:</b> Delta Miner 3.0 is a suite of data mining
tools that analyze complex data pools. Delta Miner's tools are
flexible and lend themselves to a broad range of applications; a
common one is the analysis of financial controlling data. Delta
Miner guides the user quickly and easily through complex data structures
down to the significant facts. In contrast to the simple "drill-down"
capabilities of typical EIS and MIS tools, Delta Miner offers a high
level of automation: the system can recommend the
best analysis paths, relieving the controller of tedious
routine tasks. In addition to identifying the important trends, the tool
also points to the causes of those trends. Further analyses inform the
user about the best possible countermeasures to negative developments.
The basic techniques of the Delta Miner were developed at FORWISS, where,
since 1993, a research group led by Prof. Dr. Peter Mertens has
intensively investigated algorithms for data mining. At its first
presentation, Delta Miner was recognized as one of the three best
products in the category "Business Management Solutions" at the Systems
'96 trade show in Munich.  A demo version can be downloaded. 

<br><b>*Platform(s):</b> Windows 95, NT
<br><b>*Contact:</b> <pre>
Bissantz Küppers & Company GmbH 
Am Weichselgarten 7 
91058 Erlangen 
Germany 
phone +49 9131 691-450 
fax +49 9131 691-455 
[email protected]
</pre>
<br><b>*Status: </b> Product
<br><b>*Updated:</b> 1997-04-11 by Dr. Nicolas Bissantz ([email protected]) 

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 30 Apr 1997 16:22:28 -0400
From: Pablo Tamayo <[email protected]>

Job Description:

Staff Member in the Technology Group
Researcher/Developer of Data Mining/KDD Technologies
Thinking Machines Corp.
4/30/97

- Provide technical and scientific expertise in core areas for Data
Mining and KDD, such as Machine Learning, Artificial Intelligence,
Statistics and High Performance Computing, to the development
organization and the company in general. Help to evaluate competing, new
or strategic technologies and algorithms for current or future releases
of Data Mining/KDD products (toolsets, KDD engines and vertical
applications).

- Design and develop state-of-the-art Machine Learning/Statistical
module prototypes, and be responsible for the support and maintenance of the
assigned modules.  Collaborate with the Software Engineering Group
to integrate these prototypes into the products' software architecture,
following development-wide software engineering guidelines. Provide
parallelism and performance enhancements for algorithms.  Help support
core algorithms in current products.

- Collaborate with the Data Analysis, Professional Services and
Technical Sales groups to study and choose appropriate algorithms and
methods for proof of concept studies or to integrate permanent solutions
for customers.

- Help write patents and provide technical assistance on patent-related
issues. 

- Represent the company at relevant conferences, workshops, trade shows
and forums, and follow Data Mining/KDD literature and trends in the KDD
academic and commercial communities.

If you are interested please contact:

Dr. Pablo Tamayo 
[email protected]
Thinking Machines Corp.
14 Crosby Dr.
Bedford, MA 01730

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Eric Horvitz <[email protected]>
Date: Wed, 23 Apr 1997 13:53:39 -0700

Thirteenth Conference on Uncertainty in Artificial Intelligence

Please refer to the UAI '97 home page at http://cuai97.microsoft.com for
updated information on this summer's UAI conference and registration
procedures. UAI will follow right after AAAI in Providence. The page
also includes other information of interest, including details (...and
even some reading assignments) for the UAI '97 Full Day Course on
Uncertain Reasoning on Thursday, July 31.  The page also contains
information on accommodations in Providence. 

Looking forward to seeing you this summer,

      Eric Horvitz
      Conference Chair

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Fri, 25 Apr 1997 14:45:21 -0400
Subject: The Gordian Institute's "Making Sense of Data: Computer-Aided 
         Pattern Discovery" course is scheduled for July 14-18 in
         Charlottesville, Virginia.  Refer to http://www.gordianknot.com
------------------------------------------------------------------------

The Gordian Institute, a division of American Heuristics Corporation (AHC), 
has selected the historic town of Charlottesville, near Monticello, as the 
venue for the next offering of "Making Sense of Data: Computer-Aided Pattern 
Discovery," to be held July 14-18, 1997.  

The intensive four-and-one-half-day data mining course begins on July 14, 1997, 
in Charlottesville, Virginia.  The course includes live interactive 
demonstrations using data from real-world applications.  Participants need 
only prior working experience with computers and familiarity with 
data-related problems to benefit from the course.

Attendees will explore a host of advanced computing techniques and software 
tools used to discover useful patterns hidden in data.  The course surveys 
modern algorithms drawn from the fields of statistics, machine learning, data 
mining and inductive modeling that automatically build classifiers or 
estimators from a database.  You may never find another course that succinctly 
covers the essential parts of so many aspects of "data mining" with both 
theoretical and practical insights.  Topics to be presented are:

   -Pattern Discovery: An Overview
       -Inducing Models from Data: Benefits and Dangers 
       -The Data Mining Process 
       -Perspectives of Related Fields:
            -Statistics, Machine Learning, Data Mining
             and Artificial Intelligence
   -Data Issues
       -Case Diagnostics (Outlier, Influential, Leverage Points) 
       -Feature Creation and Selection
   -Classical Statistical Techniques
       -Linear: Regression and Discriminant Analysis 
       -Nonparametric: Scatterplot Smoothers, 
        Nearest Neighbors, Kernels
   -Key General Tools:
       -Scientific Visualization 
       -Resampling 
       -Optimization 
       -Clustering
   -Modern Methods
       -Neural Networks 
       -Polynomial Networks (ASPN, AIM) 
       -Decision Trees (CART)
   -Brief Survey of Other Methods
       -Projection Pursuit 
       -ASH (Average Shifted Histograms) 
       -MARS (Multivariate Adaptive Regression Splines) 
       -Radial Basis Functions
   -Comparing and Combining Methods

While increasingly awash in data, most organizations are unable to fully 
extract the useful information embedded within.  The practical techniques 
taught in this course can help you to discover and make sense of hidden 
patterns.  A key element of corporate efficiency must be the extraction of 
important information to support the decision-making process and to accurately 
predict and plan for future needs.  Those from government, industry and 
academia who see the need for non-linear modeling techniques, and who have 
particular applications not adequately solved with classic modeling techniques, 
are target candidates for this course. 


                   Direct Quote from Course Evaluation Sheet:
"I felt this course was far superior to many  others that I have been exposed 
to.  Most notably, the instructors were not only clearly experts but were not 
biased toward any one software package or technique.  The instructors also 
emphasized targeting the users' specific applications (including analyzing 
sample data brought in by the students).  This is exceptionally useful.  Great 
value for the $.  What was most valuable to me was the presentation of a broad 
range of both analytical techniques and software tools for solving various 
problems.  This helps to give me the 'big picture' and allows me to best 
determine what technologies are most applicable and useful to me."
                     -Andy Kalish, Eastman Kodak


The Instructors:
John F. Elder IV, PhD, and Dean Abbott of Quantitative Solutions explain the 
methods used inside leading commercial and academic software, providing 
practical tips and techniques on feature extraction and neural network problem 
solving.  The course instructors each have more than a decade of experience in 
applying adaptive, data-driven techniques to practical problems.  

Dr. Elder has developed or refined some of the methods covered in this course.  
He is Chief Scientist at Quantitative Solutions and Adjunct Professor at the 
University of Virginia, and has authored four book chapters and numerous 
articles on adaptive methods of pattern discovery.  He has been a researcher 
at Rice University and at an engineering consulting firm, and was Director of 
Research for an investment management company.  Dr. Elder is a frequent 
lecturer on pattern discovery techniques, and is the technical chair of the 
Adaptive and Learning Systems Group of the IEEE Systems, Man, and Cybernetics 
Society.

Dean W. Abbott is a Senior Research Scientist at Quantitative Solutions.  He 
has applied data mining techniques to challenges in optimum guidance and 
control, optical character recognition, image pattern recognition, and radar 
and multi-spectral signal processing.  Mr. Abbott has developed pattern 
recognition software that is sold commercially, and has written and lectured 
on novel applications of feature selection, polynomial network, and pattern 
recognition techniques to solve real-world problems in several fields.


Pricing Information:
Registration for this four and one-half day course is $1995.  Government and 
academic discounts may apply.  Lodging details and directions may be viewed at 
http://www.gordianknot.com, or obtained by providing a fax number or Email 
address to (800) 405-2114 or [email protected].  You may also send a 
message to  [email protected]  with "newsletter" in the subject field to 
receive a quarterly electronic newsletter from The Gordian Institute.  

If you have remaining questions regarding the course, a knowledgeable 
representative may be contacted directly at (800) 405-2114.  Seats may also be 
secured through Gordian's web site.  Space is limited to 24 seats, so go to 
your browser, set it to  http://www.gordianknot.com  and reserve your place!

                       __________________________
                         The Gordian Institute
                       http://www.gordianknot.com
                          [email protected]
                            (800) 405-2114
                       __________________________


The parent company, American Heuristics Corporation (AHC), is a founding member 
of the West Virginia High Technology Consortium, with headquarters in 
Triadelphia, West Virginia.  AHC is an advanced software technology consulting 
company applying hybrid software solutions to complex technical problems in 
business, industry and government.  AHC may be found on the web at
http://www.heuristics.com

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Prof. Zicari" <[email protected]>
Date: Sun, 27 Apr 1997 00:10:18 +0200 (METDST)

I would like to inform you that the conference programs of
COMDEX Internet & OBJECT WORLD Frankfurt '97 (October 7-10)
are now available online at:

http://www.ltt.de

The web site will be updated regularly. 

If you have any questions, please send me an e-mail at 
[email protected].
 
Best Regards

Roberto Zicari
Chair Advisory Board,
COMDEX Internet & OBJECT WORLD Frankfurt.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

410.25 "97:16" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sun May 11 1997 19:52 (765 lines)
Knowledge Discovery Nuggets 97:16, e-mailed 97-05-08
Publications:
	* GPS, first issue of DMKD journal is published!
		http://www.research.microsoft.com/research/datamine/ 
	* Gerhard Widmer, CfP: MLJ Special Issue on Context Sensitivity and
	Concept Drift
		http://www.ai.univie.ac.at/mlj_specissue/
Siftware:
	* Larry Bouchie, Cognos' new Data Mining Tool: Scenario
	* Aleksander Oehrn, Rosetta - rough-set tool for data analysis
		http://www.idt.unit.no/~aleks/rosetta/rosetta.html 
Positions:
	* Gregory Piatetsky-Shapiro, Data Mining Company looking for 
	experts in decision trees and/or bayesian networks  
	* Donal Lyons, Data Mining Research Position in Ireland
	* Yike Guo, Data Mining Job at Fujitsu (Japan)
Meetings:
	* Pavel Brazdil, The Workshop on "Extraction of Knowledge from Data
	Bases" (EKBD'97), Coimbra, Portugal, October 6-9, 1997 
		http://alma.uc.pt:80/~epia97/EKBD97.html
	* Michael Berthold, IDA-97 Call for Participation
		http://web.dcs.bbk.ac.uk/ida97.html
	* Staal Vinterbo, PKDD'97 Call for participation, 
		Trondheim, Norway, June 24-27, 1997,
		http://www.idi.ntnu.no/pkdd97/
	* Rob Tibshirani, Statistical prediction methods for finance
		and marketing, New York City: June 23-24, 1997,
		http://stat.stanford.edu/~trevor/mrc.finance.html
	* Angi Voss, Workshop on Social Agents at ECSCW97 Conference,
		September 7, 1997
		http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
Please keep CFP and meetings announcements short and provide 
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets is published 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools 
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site 
at http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
        [email protected]

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily those of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
About the Deep Blue -- Kasparov match, 
"I just think we should look at this as a chess match," he said, "between the
          world's greatest chess player and Garry Kasparov." 
                Louis Gerstner, IBM Chairman
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 8 May 1997 09:41:10 -0500 (EST) 
From: GPS <[email protected]>
Subject: First Issue of DMKD journal 

The first issue of the DMKD journal has finally been published! 
See http://www.research.microsoft.com/research/datamine/vol1-1/default.htm

The beautiful black-and-white cover shows an Escher-inspired picture
of several robots inside a mysterious structure (a data mine?). The
contents include an editorial by Usama Fayyad and four excellent
technical papers:

* Statistical Themes and Lessons for Data Mining
Clark Glymour, David Madigan, Daryl Pregibon, Padhraic Smyth 

* Data Cube: A Relational Aggregation Operator Generalizing Group-by, 
Cross-Tab, and Sub Totals
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh

* On Bias, Variance, 0/1 - loss, and the Curse-of-Dimensionality
Jerome H. Friedman 

* Bayesian Networks for Data Mining, David Heckerman

and a brief application summary: 
* Advanced Scout: Data Mining and Knowledge Discovery in NBA data,
Inderpal Bhandari, Ed Colet, Jennifer Parker, Zachary Pines, Rajiv Pratap, Krishnakumar Ramanujam

Sample copies of the first issue will be mailed soon.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 30 Apr 1997 11:09:50 +0200 (MET DST)
From: Gerhard Widmer <[email protected]>
Subject: CfP: MLJ Special Issue on Context Sensitivity and Concept Drift 


                       Machine Learning Journal
             Special Issue on Context Sensitivity and Concept Drift
           Miroslav Kubat and Gerhard Widmer, Guest Editors

   MOTIVATION AND RESEARCH ISSUES

   In many machine learning applications, the features given to the
   learning program do not capture all aspects of the application problem.
   This is a limitation shared by all forms of modeling -- even the
   person who formulates the learning problem may not be aware of all of
   the relevant context. Examples from the history of machine learning
   and pattern recognition include omitting illumination features in
   computer vision and omitting language accents in speech recognition
   systems. A similar problem arises when the relevant features are
   included, but the training examples do not provide enough variation
   of those features to permit the learning algorithm to detect their
   relevance. For example, if foreign accent features are included in a
   speech recognition system, but all training examples are from native
   speakers, then the foreign accent features will be ignored by the
   learning system.  

   Relevant context may also change with time, so that a classifier
   trained on one set of training examples (where a contextual feature
   was absent or held constant) may suddenly begin to perform badly when
   the context changes.  Gradual or abrupt changes in context often
   become apparent in the form of concept drift. For situations
   where a concept gradually evolves over time in a certain general
   direction (such as the concept "computer"), the term concept
   evolution has sometimes been used. Tracking concept drift on-line
   requires a learner to continually monitor its performance and adjust
   its hypotheses if necessary. It might also require the learner to
   "forget" old, outdated information.
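   A minimal Python sketch of the on-line tracking idea described above,
   assuming a single categorical feature and a sliding window as the
   forgetting mechanism (the class, function and stream names are
   hypothetical illustrations, not a method from the special issue). Each
   example is first used to test the learner and then to train it, and
   only the most recent `window` examples vote on a prediction.

```python
from collections import Counter, deque

class WindowedLearner:
    """Predicts a label from one categorical feature using only recent examples."""

    def __init__(self, window=None):
        # maxlen=None keeps every example, i.e. the learner never forgets.
        self.examples = deque(maxlen=window)

    def predict(self, x):
        # Majority label among remembered examples with the same feature value.
        votes = Counter(y for (xi, y) in self.examples if xi == x)
        return votes.most_common(1)[0][0] if votes else None

    def update(self, x, y):
        # Once the deque is full, appending silently drops the oldest example.
        self.examples.append((x, y))

def prequential_accuracy(learner, stream, last_n):
    """Test-then-train over the stream; accuracy on the final last_n examples."""
    hits = []
    for x, y in stream:
        hits.append(learner.predict(x) == y)
        learner.update(x, y)
    tail = hits[-last_n:]
    return sum(tail) / len(tail)

def make_stream(n=200, drift_at=100):
    """Hypothetical stream: x alternates 'a'/'b'; the label mapping flips at drift_at."""
    stream = []
    for i in range(n):
        x = "a" if i % 2 == 0 else "b"
        old_concept = "pos" if x == "a" else "neg"
        new_concept = "neg" if x == "a" else "pos"
        stream.append((x, old_concept if i < drift_at else new_concept))
    return stream
```

   On this toy stream the windowed learner (window=20) is fully accurate
   over the last 50 examples once its window holds only post-drift data,
   while the learner that never forgets keeps predicting the old concept
   and misses all of them. The window size trades reaction speed against
   sensitivity to noise.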

   In batch learning, problems may arise if the training data were
   collected in batches from different contexts, or if the training
   data were gathered in one setting but the test data are drawn from
   a different setting.  Again, effective learning requires the recognition
   of such discontinuities and the ability to adapt hypotheses to
   different conditions.

   This special issue is devoted to theoretical and empirical studies
   of methods for detecting missing context, tracking concept drift, 
   adapting learned knowledge to new contexts, and identifying and
   reasoning about contextual effects and concept changes in learning.
   We encourage submissions addressing one or more of the following
   research issues:

   . on-line tracking of concept drift and concept evolution
   . theoretical results concerning concept drift and contextual influences
   . formal definitions of context and its effects on concept learning
   . real-world applications involving context changes and/or concept drift
   . representation of context-sensitive concepts
   . representation of context
   . recognition of context and reasoning about context
   . adaptation of learned knowledge to new contexts

   Both theoretical and more practically oriented papers are welcome,
   but we do encourage papers that provide real-world examples of context
   sensitivity and concept drift and compare multiple ways of addressing
   the problems that arise.


   SUBMISSION INFORMATION:

   The expected length is 8000-12000 words for a full paper, or 2000-4000
   words for a Research Note (full-page figures count for 400 words).
   Electronic submission via e-mail is STRONGLY ENCOURAGED. PostScript
   files (compressed or gzipped, uuencoded) should be sent to
   [email protected].

   For hardcopy submissions, please send 5 copies of the manuscript to:

      Gerhard Widmer
      Austrian Research Institute for Artificial Intelligence
      Schottengasse 3  
      A-1010 Vienna
      Austria
      Tel: +43-1-53532810
      Fax: +43-1-5320652
      e-mail: [email protected]

   The submission deadline is September 15, 1997.

   see http://www.ai.univie.ac.at/mlj_specissue/ for full details.

   The special issue is scheduled to appear in the summer of 1998.

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 28 Apr 1997 13:38:14 +0200
To: [email protected]
From: Aleksander Oehrn <[email protected]>
Subject: Rosetta availability

===================================================
Rosetta -- A Rough Set Toolkit for Analysis of Data
===================================================

Rosetta is a toolkit for analyzing tabular data within the framework of
rough set theory, and consists of a computational kernel and a GUI
front-end. The Rosetta GUI reflects the contents of the kernel, and runs on
PCs operating under Windows NT or Windows 95.

A limited version of Rosetta is made publicly available for non-commercial
use. The downloadable program is limited in the sense that algorithms from
the embedded RSES library are not applicable to decision tables larger than
some predetermined size (currently 500 objects and 20 attributes).

   http://www.idt.unit.no/~aleks/rosetta/rosetta.html

The software (including documentation) is provided "as is" without warranty
of any kind.

Kernel architecture and front-end designed and implemented at the Knowledge
Systems Group, Dept. of Computer and Information Science, Norwegian
University of Science and Technology, Norway. Sections of the computational
kernel (RSES) developed at the Logic Group, Inst. of Mathematics,
University of Warsaw, Poland.

Rosetta is designed to support the overall KDD process: from initial
browsing and preprocessing of the data, via reduct computation and rule
generation, to validation and analysis of the extracted rules.

Some of the features currently offered by the computational kernel
include:

- Completion of decision tables with missing values
  according to various completion strategies.
- Computation of partitions and rough set approximations
  within the variable precision model.
- Sampling of subtables for validation purposes.
- Discretization of numerical attributes with various
  discretization algorithms.
- Computation of reducts (both in the standard sense as well
  as object-related ones). Various approximation algorithms
  (e.g. genetic algorithms) are offered, as well as exhaustive
  computation via discernibility matrices. Dynamic reducts can
  be computed.
- Generation of propositional rules.
- Shortening and pruning of sets of reducts and rules. 
- Exporting of rules, reducts and tables, e.g. to Prolog. 
- Application of synthesized rules to unseen examples by means
  of various classification strategies, e.g. voting.
- Generation of confusion matrices. 
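
To illustrate what exhaustive reduct computation via discernibility
matrices involves, here is a small Python sketch (not Rosetta's or RSES's
API; the table and function names are hypothetical). For every pair of
objects with different decisions it records the set of condition
attributes on which they differ; a decision-relative reduct is then a
minimal attribute subset that intersects every non-empty entry. The
brute-force search over subsets is only feasible for small tables, which
is why approximation algorithms such as genetic algorithms are also
offered.

```python
from itertools import combinations

# Hypothetical toy decision table: three condition attributes, one decision.
CONDITIONS = ["outlook", "windy", "humid"]
TOY_TABLE = [
    {"outlook": "sunny", "windy": "no",  "humid": "high",   "play": "no"},
    {"outlook": "sunny", "windy": "yes", "humid": "high",   "play": "no"},
    {"outlook": "rainy", "windy": "no",  "humid": "high",   "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "humid": "normal", "play": "no"},
]

def discernibility_entries(table, conditions, decision):
    """Non-empty entries of the decision-relative discernibility matrix."""
    entries = set()
    for i, j in combinations(range(len(table)), 2):
        u, v = table[i], table[j]
        if u[decision] != v[decision]:
            diff = frozenset(a for a in conditions if u[a] != v[a])
            if diff:  # an empty entry would mean the table is inconsistent
                entries.add(diff)
    return entries

def reducts(table, conditions, decision):
    """All minimal attribute subsets hitting every entry (exhaustive search)."""
    entries = discernibility_entries(table, conditions, decision)
    found = []
    for size in range(len(conditions) + 1):
        for subset in combinations(conditions, size):
            s = frozenset(subset)
            if any(r <= s for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(s & e for e in entries):
                found.append(s)
    return found
```

For the toy table the entries are {outlook}, {outlook, windy} and
{windy, humid}, so the reducts are {outlook, windy} and {outlook, humid}.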

Some of the features currently offered by the Rosetta GUI include:

- Full Windows GUI conformance. 
- Organization of project items in a tree-structure in order to
  retain data-navigational abilities. 
- Viewing of all structures in intuitive grid environments, using
  terms from the modelling domain.
- Context-sensitive menus. 
- Drag and drop functionality. 
- Masking of attributes, enabling one to work with "virtual"
  tables. 
- Automatic generation of annotations, thus documenting the
  modelling session. 
- A prototype environment for interactive classification and guidance
  on the basis of incomplete information, using a selected set of
  synthesized rules. 
- On-line help. 
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 7 May 1997 17:37:13 -0400
From: Larry Bouchie <[email protected]>

Cognos' Scenario data mining product was released
last month. Cognos' main Web page is at http://www.cognos.com and the
Scenario site is at http://www.cognos.com/busintell/products/scenario.html

Concise background and a review are at
http://www8.zdnet.com/pcweek/reviews/0505/05mining.html

COGNOS UNVEILS SCENARIO FOR DATA MINING
-- New Data Mining Software Joins Cognos' Market-Leading Business
Intelligence Tools, PowerPlay For OLAP And Impromptu For Query &
Reporting --

BURLINGTON, MA, March 3, 1997 -- Cognos (NASDAQ:COGNF; TSE:CSN) today
announced its newest business intelligence tool, Scenario, for
enterprise-wide guided data analysis and data mining.  Scenario extends the
industry's most comprehensive business intelligence product family, joining
Cognos' market-leading PowerPlay, the universal online analytical
processing (OLAP) client, and the award-winning Impromptu query and
reporting tool.

Designed for spotting patterns and exceptions in business data that might
otherwise be missed, Scenario's sophisticated interface allows users to
readily visualize the business information being uncovered.  It automates
the discovery and ranking of critical factors impacting a business, exposes
hidden relationships between factors and establishes thresholds and
benchmarks.  An intuitive, cost-effective desktop tool, Scenario liberates
data mining from what is typically an expensive and time-consuming process.
Insights derived using Scenario are achieved directly by those best
positioned to use the knowledge and effect rapid change.

Designed to support faster business decision-making, Scenario:
 * makes data mining immediately accessible to decision makers;
 * simplifies business data analysis by filtering out insignificant business
   variables and relationships;
 * validates business hypotheses by showing and ranking critical factors and
   relationships;
 * leads to new business insights by automating information discovery; and
 * integrates with Impromptu and PowerPlay as best-of-breed components in
   the Cognos enterprise business intelligence solution.

"With Scenario, Cognos is delivering a very important technology to
business analysts," said George Azrak, national director of IS development
at Domino's Pizza.  Domino's Pizza has been working with early versions of
Scenario, and has provided Cognos with valuable input from an end user's
point of view.

"Accessible data mining is the long-awaited third wave in the data
warehousing revolution," said Alan Rottenberg, Cognos' senior vice
president, Business Intelligence Tools.  "First query and reporting brought
data to the desktop, then OLAP technologies enabled the convenient
navigation of massive data warehouses.  Data mining is the technological
leap that automates the information discovery process."

Rottenberg continued, "Impromptu gives access to the numbers and data on
which a business runs.  PowerPlay lets individual managers explore that
data without an army of programmers.  Scenario works alongside both of
those products to refine business data to distinguish what really matters.
Drawing a straight line to the bottom line, this product completes the
spectrum of business intelligence tools that can arm knowledge workers with
the insight to truly understand the data that drives a business -- and to
reap the competitive rewards."

Scenario uses statistical methods that go beyond "tree" analysis.  For
example, one such method is a data segmentation capability based on CHAID
(Chi-Squared Automatic Interaction Detection) technology.  CHAID allows
users to find statistically relevant relationships and trends within large
repositories of business data by "refining" it down to the most useful
nuggets that have the greatest effect on the results being tracked.
Subsequent releases of Scenario will include neural-network modeling and
forecasting capabilities, using technologies from recently acquired Right
Information Systems.
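
The core step behind CHAID-style segmentation, choosing the predictor
whose cross-tabulation with the target is most significant under a
chi-squared test, can be sketched in a few lines of Python. This is an
illustrative assumption, not Scenario's implementation: full CHAID also
merges similar predictor categories and applies Bonferroni-adjusted
p-values before splitting, and both steps are omitted here. The function
names and toy records are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1 (no SciPy needed)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for df = 2 (exp) or df = 1 (erfc) ...
    q, k = (math.exp(-y), 2) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 1)
    # ... then step up two degrees of freedom at a time:
    # Q(k + 2) = Q(k) + y**(k/2) * exp(-y) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(y) - y - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def pearson_chi2(xs, ys):
    """Pearson chi-squared statistic and degrees of freedom for the xs-by-ys table."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

def best_split(records, predictors, target):
    """(predictor, p_value) for the predictor most significantly tied to target."""
    ys = [r[target] for r in records]
    best = None
    for name in predictors:
        stat, df = pearson_chi2([r[name] for r in records], ys)
        if df == 0:
            continue  # constant predictor or constant target: nothing to test
        p = chi2_sf(stat, df)
        if best is None or p < best[1]:
            best = (name, p)
    return best

# Hypothetical toy records: region separates responders, channel does not.
RECORDS = [
    {"region": "north", "channel": "web",  "responded": "yes"},
    {"region": "north", "channel": "mail", "responded": "yes"},
    {"region": "north", "channel": "web",  "responded": "yes"},
    {"region": "north", "channel": "mail", "responded": "yes"},
    {"region": "south", "channel": "web",  "responded": "no"},
    {"region": "south", "channel": "mail", "responded": "no"},
    {"region": "south", "channel": "web",  "responded": "no"},
    {"region": "south", "channel": "mail", "responded": "no"},
]
```

On these records, region gives chi-squared = 8 on 1 degree of freedom
(p about 0.005) while channel gives chi-squared = 0, so
best_split(RECORDS, ["region", "channel"], "responded") picks region.
A CHAID tree would then repeat the same test inside each resulting
segment.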

Pricing and Availability
Available from Cognos for $695, Scenario 1.0 for Windows 95 or Windows NT
requires an IBM-compatible 486 PC and 8 MB of RAM.

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 8 May 1997 10:40:10 -0500 (EST) 
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Looking for experts in decision trees and/or bayesian networks

** Data Mining Consulting and Integration Company is looking for 
experts in decision trees and/or bayesian networks **

TASK: Participate in the design, development, and deployment of leading
edge integrated data mining and customer modeling systems, primarily in
the financial area.  Perform quick data mining studies using a variety of 
different approaches and tools.

The candidates will join a team of world-class experts in data
warehousing, data mining and knowledge discovery.

Ideal candidates will have a Ph.D. in Machine Learning, Statistics,
or related fields and 2-3 years of experience, or an M.S. with
equivalent experience.  The candidates should have expertise with
different modeling approaches, but primarily
with decision trees/rules or with bayesian belief networks.
The candidates should be familiar with statistical theory and have practical
experience with databases. 

Excellent coding skills in a C/Java/Unix environment, along with 
good system maintenance practices and the ability to
quickly pick up new systems and languages, are needed.  
The candidates should also have good communication skills, be 
able to work in a team, and be able to enjoy the exciting atmosphere of
a start-up company. 

Most of all, candidates should have a passion for developing and
applying innovative methods for solving practical problems.  
 
We offer very competitive salaries, and our outstanding benefits include
profit sharing, stock options, medical/dental insurance, and a 401(k)
plan. 

The data mining branch of the company is conveniently located in the
Cambridge area, easily accessible by public transportation. 

Proper work authorization required.

Please email your resume and a cover letter (in plain ASCII, please) to: 

Gregory Piatetsky-Shapiro, Ph.D.
Director of Applied Research
Geneve Consulting Group
545 Concord Ave
Cambridge MA 02138
	email: [email protected]
	tel: 617-661-1358 
	fax: 617-491-4936
	URL: http://www.kdnuggets.com/gps.html

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Subject: Data Mining Research Position possibility.
Date: Sat, 26 Apr 1997 11:57:24 +0100
From: Donal Lyons <[email protected]>

Currently there is EU funding available for experienced researchers to
spend a year in countries such as Ireland.  I wish to explore the
possibility of using this funding to help develop a Data Mining Interest
Group within the School of Systems and Data Studies in Trinity College,
Dublin.

I'd like to discuss this further with any experienced EU researchers who
are at least tentatively interested.

Regards,
Donal.

Donal Lyons,                      Phone (1000-1700 GMT) +353 1 608 1919
Lecturer (Information Systems)    Phone Messages        +353 1 608 1767
School of Systems & Data Studies
Trinity College, Dublin 2,        FAX                   on request
Ireland.
................http://www2.tcd.ie/Statistics/staff/dlyons.html........
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 5 May 97 11:48 BST
From: Yike Guo <[email protected]>
Subject: Job in Japan 

A Fujitsu subsidiary company which is developing OLAP and datamining tools
is now looking for a foreign engineer who is interested in working in Japan.

  Career opportunity for a programming engineer in Japan

    Duties
      Designing and programming data mining products which include
      a visualizing OLAP client.

    Requirements
      - BS or MS degree related to computer science
      - C programming skill (VC++ on NT background is best)
      - Familiarity with datamining, visualization, or OLAP
      - Native English speaker

    Contact
      Fujitsu SWE, Manager Mr. Katoh
      E-mail: [email protected]

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Tue, 29 Apr 1997 19:30:03 +0200 (MET DST)
From: Pavel Brazdil <[email protected]>

                          Call for Participation 
           The Workshop on "Extraction of Knowledge from Data Bases" 
                                 EKBD'97 
                   http://alma.uc.pt:80/~epia97/EKBD97.html

                         Under the auspices of the 
      Portuguese Conference on Artificial Intelligence (EPIA'97) Coimbra, 
                        Portugal, October 6-9, 1997 

October 7-8, 1997
Coimbra University Physics Building 

Aims of the Workshop 
This workshop is in the area of Extraction (or Discovery) of Knowledge from
Data Bases and Data Mining, which are relatively recent but rapidly expanding
fields. The objective of the workshop is to discuss methods for non-trivial
extraction of information which is implicit in the existing data and which
can be represented in a high-level language so as to facilitate interpretation.
EKBD'97 welcomes original papers in English on the following topics:

- Machine Learning methods useful in KDD and Data Mining, 
  (decision tree /rule induction, relational learning (ILP) etc.) 

- Statistical methods useful in KDD and Data Mining, 
  (multivariate analysis, principal components, clustering, regression
  methods etc.), 

- Reduction of complexity through preprocessing, 
  (identification of relevant attributes, data sampling, clustering, etc.), 

- Data summarization and consolidation, 
- Languages useful in describing user's hypotheses, 
- Applications of KDD and Data Mining,
- other related areas of interest. 

Workshop Format and Attendance Requirements: 
The workshop will include invited talks, paper presentations and a panel
discussion. The workshop will last 1-2 days. 

Papers in English of no more than 15 pages are welcome.
Attendees should be registered for the main EPIA conference.
(see http://alma.uc.pt:80/~epia97)

Submit 3 copies of the full paper to the address below: 
Pavel Brazdil 
LIACC, Universidade do Porto, 
R. Campo Alegre, 823, 
4150 PORTO, PORTUGAL 

Text format should follow Springer Verlag Lecture Notes Series. 
English is the official language of the workshop. 

Important dates: 
June 16: submissions due
July 15: notifications sent
September 8: final versions due

Programme Committee:
Pavel Brazdil, Univ.Porto (chair) 
Arlindo Oliveira, IST 
Carlos Bento, U. Coimbra 
Ernesto Costa, U. Coimbra 
Fernando Moura-Pires, UNL-FCT 
Fernando Nicolau, UNL-FCT 
Helena Bacelar Nicolau, UNL-FCT 
Joaquim Pinto da Costa, Univ. Porto 
Paulo Azevedo, Univ. Minho 
Paula Brito, Univ. Porto 
Paulo Gomes, INE, Porto 

Organizing Committee:
Pavel Brazdil (chair) 
LIACC, Universidade do Porto, R. Campo Alegre, 823,
4150 PORTO, PORTUGAL
email: [email protected]
Tel.: (02) 600 1672, Fax: (02) 600 3654

Fernando Moura-Pires 
UNL-FCT, Dept. Informatica, Quinta da Torre
2825 Monte da Caparica, PORTUGAL
email: [email protected]
Tel.: (01) 295 4464, Fax: (01) 295 5641 

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Subject: IDA Call for Participation
Date: Thu, 8 May 1997 17:43:12 +0200
From: Michael Berthold <[email protected]>

                        CALL FOR PARTICIPATION

  The Second International Symposium on Intelligent Data Analysis (IDA-97)
                 Birkbeck College, University of London
                         4th-6th August 1997

                         In Cooperation with
          AAAI, ACM SIGART, BCS SGES, IEEE SMC, and SSAISB

               [ http://web.dcs.bbk.ac.uk/ida97.html ]

You are invited to participate in IDA-97, to be held in the heart of London. 
IDA-97 will be a single-track conference consisting of oral and poster 
presentations, invited speakers, demonstrations and exhibitions. The 
conference Call for Papers introduced a theme, "Reasoning About Data",
and many papers complement this theme, but other exciting topics have also
emerged, including exploratory data analysis, data quality, knowledge discovery and
data-analysis tools, as well as the perennial technologies of classification 
and soft computing. A new and exciting theme involves analyzing time series 
data from physical systems, such as medical instruments, environmental data
and industrial processes.

Information regarding registration as well as the preliminary technical
program can be found on the IDA-97 web page (address listed above). Please
note that there are reduced rates for early registration (before 2nd June).
There are also still a limited number of spaces available for exhibition,
and potential exhibitors are encouraged to book early (the application
deadline is 2nd June).
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Staal Vinterbo" <[email protected]>
Date: Tue, 6 May 1997 18:05:56 +0200
Subject: PKDD'97 Call for participation

Dear Sir,
On behalf of Prof. Komorowski, I ask that the following call for
participation be distributed via the KDD Nuggets mailing list.
Thank you.

                      PKDD'97 -- Call For Participation

                   1st European Symposium on Principles of
                     Data Mining and Knowledge Discovery
                              Trondheim, Norway
                              June 24-27, 1997

                           Tutorials: June 24-25
                           Symposium: June 26-27

 Data Mining and Knowledge Discovery (KDD) have recently emerged from a
 combination of many research areas: databases, statistics, machine
 learning, automated scientific discovery, inductive programming,
 artificial intelligence, visualization, decision science, and high
 performance computing.

 While each of these areas can contribute in specific ways, KDD focuses on
 the value that is added by creative combination of the contributing areas.
 The goal of PKDD'97 is to provide a European-based forum for interaction
 among all theoreticians and practitioners interested in data mining.
 Fostering an interdisciplinary collaboration is one desired outcome, but
 the main long-term focus is on theoretical principles for the emerging
 discipline of KDD, especially those new principles that go beyond each of
 the contributing areas.

Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for
detailed information and news about the symposium.

Registration Information is available at 
http://www.idi.ntnu.no/pkdd97/fees.html

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 4 May 97 12:10 EDT
Subject: Modern Regression and Classification course -  New York

        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        +++                                                        +++
        +++        Modern Regression and Classification:           +++
        +++                                                        +++
        +++            Statistical prediction methods for finance  +++
        +++                      and marketing                     +++
        +++                                                        +++
        +++                                                        +++
        +++         New York City: June 23-24, 1997                +++
        +++                                                        +++
        +++            Trevor Hastie, Stanford University          +++
        +++          Rob Tibshirani, University of Toronto         +++
        +++                                                        +++
        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This two-day course will give a detailed overview of statistical models
for regression and classification. Known as machine learning in
computer science and artificial intelligence, and pattern recognition
in engineering, this is a hot field with powerful applications in
finance, science and industry.

This course covers a wide range of models from linear regression
through various classes of more flexible models to fully nonparametric
regression models, both for the regression problem and for
classification.

This special version of our popular MRC course is tailored to financial
and marketing professionals.

Although a firm theoretical motivation will be presented, the emphasis
will be on practical applications and implementations, especially in
the finance and marketing areas. The course will include many examples
and case studies, and participants should leave the course well-armed
to tackle real problems with realistic tools. The instructors are at
the forefront in research in this area.

After a brief overview of linear regression tools, methods for
one-dimensional and multi-dimensional smoothing are presented, as well
as techniques that assume a specific structure for the regression
function. These include splines, wavelets, additive models, MARS
(multivariate adaptive regression splines), projection pursuit
regression, neural networks and regression trees. All of these can be
adapted to the time-series framework for predicting future trends from
the past.

The same hierarchy of techniques is available for classification
problems. Classical tools such as linear discriminant analysis and
logistic regression can be enriched to account for nonlinearities and
interactions. Generalized additive models and flexible discriminant
analysis, neural networks and radial basis functions, classification
trees and kernel estimates are all such generalizations. Other
specialized techniques for classification, including nearest-neighbor
rules and learning vector quantization will also be covered.

Apart from describing these techniques and their applications to a wide
range of problems, the course will also cover model selection
techniques, such as cross-validation and the bootstrap, and diagnostic
techniques for model assessment.

Software for these techniques will be illustrated, and a comprehensive
set of course notes will be provided to each attendee.

Additional information is available at the Website:

http://stat.stanford.edu/~trevor/mrc.finance.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 05 May 1997 12:45:27 +0200
From: Angi Voss <[email protected]>
Subject: Workshop on Social Agents

               "Social Agents in Web-Based Collaboration"

                         at the ECSCW'97 Conference

                              September 7, 1997

Organizers: Thomas Kreifelts, Angi Voss, Gloria Mark, Arnstein Borstad,
Vidar Hepsoe

Abstract
--------

We see signs today that the Web is moving toward an environment where
new social and collaborative interactions are being realized. Rather
than continuing to evolve as a single-user environment, the Web is
beginning to be regarded as an environment where reciprocity and
awareness of others' activities have an important function. Software
agents can help develop and support the process of reciprocity by
helping people find others with similar interests, and helping match
knowledge to the right people. Agents can also help people collectively
construct knowledge, shaped around their needs. 

This full-day workshop is intended for designers and researchers from
academia and industry to discuss the role of agents in dealing with
social information. How can social agents be integrated into
collaborative relationships so that information and expertise can be
distributed and matched to the right people, where appropriate
relationships can be developed, and where collective knowledge can be
established?

Participation requires the submission of an input paper (3-6 pages) that
should try to address the points described above, from any of the
following aspects:

-experiences with agent use in collaboration
-design of agent systems
-application areas
-interface design

The paper should be sent for review by June 15 to:

                    Thomas Kreifelts
                    GMD-FIT.CSCW
                    D-53754 Sankt Augustin
                    Germany 
                    Email: [email protected]
                    Fax: +49-2241-142084 

Electronic submission is encouraged, HTML being the preferred format.
The selection of participants will be based on the input papers.
Accepted participants will be notified before the end of June so that
they can take advantage of early registration by July 1. Those who
are interested in submitting a paper to the workshop but cannot
meet the June 15 deadline should contact the organizers as soon as
possible to express their interest in participating. The
accepted input papers will be distributed electronically in advance to
the workshop participants. The workshop will be structured around the
presentation of selected input papers to stimulate discussion. Note
that participation in the workshop requires participation in the
ECSCW '97 conference.


Important Dates:
----------------

   June 15, 1997 - Deadline for submissions

   end of June - Notification of acceptance

   July 1, 1997 - Early registration deadline for the ECSCW '97 conference

  September 7, 1997 - The Workshop

For more information: http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html

Angi Voss	GMD FIT 	D-53754 Sankt Augustin
phone: (+49) 2241-142726
fax: (+49) 2241-142384
e-mail: [email protected]
URL:  http://nathan.gmd.de/persons/angi.voss.html

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.26 "97:17" by IJSAPL::OLTHOF (Spellchecked Henry Although) Sat May 17 1997 12:40, 697 lines
Knowledge Discovery Nuggets 97:17, e-mailed 97-05-15
Publications:
    * Phil Chan, CFP: MLJ special issue on IMLM, 
	http://www.cs.fit.edu/~imlm/
Siftware:
    * P. Spedding, Cognos' Scenario Wins PC Week Labs Analyst's 
	Choice Award, http://www8.zdnet.com/pcweek/reviews/0505/05mining.html
Positions:
    * COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science &
	Technology (OGI), http://www.cse.ogi.edu/CompFin/ 
    * George Smith, Research Assistant Position at UEA, Norwich, UK
Meetings:
    * Lipo Wang, 2nd Pacific-Asia Conference on Knowledge Discovery and
       Data Mining (PAKDD-98), Melbourne, Australia, 15-17 April 1998,
        http://www.sd.monash.edu.au/pakdd-98
    * David Leake, ICCBR-97: First Call for Participation,
	http://www.iccbr.org/iccbr-97.html
    * Hakan Erdogmus, CASCON'97 CfP, http://www.cas.ibm.ca/cascon/ 
    * John R. Koza, GP-97 Revised Call for Participation,
	http://www-cs-faculty.stanford.edu/~koza/gp97.html
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
Please keep CFP and meetings announcements short and provide 
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets is published 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
        [email protected]

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily those of their respective employers (or of KD Nuggets).
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
If the fool would persist in his folly he would become wise.
        William Blake
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: "IMLM Workshop (pkc)" <[email protected]>
Subject: CFP: MLJ special issue on IMLM

Dear colleagues,

Here is a CFP for the Machine Learning Journal special issue on IMLM.
Submissions are due October 1, 1997.  We hope you can submit.  Thanks.

Phil, Sal, and Dave

------
                             CALL FOR PAPERS
 
                        Machine Learning Journal
                            Special Issue on
 
 
                  Integrating Multiple Learned Models
         for Improving and Scaling Machine Learning Algorithms
 
 
 Most modern Machine Learning, Statistics and KDD techniques use a
 single model or learning algorithm at a time, or at most select one
 model from a set of candidate models. Recently, however, there has been
 considerable interest in techniques that integrate the collective
 predictions of a set of models in some principled fashion.  With such
 techniques, the predictive accuracy and/or the training efficiency of
 the overall system can often be improved, since one can "mix and match"
 among the relative strengths of the models being combined.
 
 Any aspect of integrating multiple models is appropriate for the
 special issue.  However, we intend the focus of the special issue to be
 on the issues of improving prediction accuracy and improving training
 efficiency in the context of large databases.
 
 
 Submissions are sought in, but not limited to, the following topics:
 
 1) Techniques that generate and/or integrate multiple learned
    models.   Examples are schemes that generate and combine
    models by
 
         * using different training data distributions
                 (in particular by training over different partitions
                 of the data)
         * using different sampling techniques to generate different 
                 partitions
         * using different output classification schemes
                 (for example using output codes)
         * using different hyperparameters or training heuristics
                 (primarily as a tool for generating multiple models)
 
 2) Systems and architectures to implement such strategies.
    For example,
 
         * parallel and distributed multiple learning systems
         * multi-agent learning over inherently distributed data
 
 3) Techniques that analyze the integration of multiple learned models for
 
         * selecting/pruning models
         * estimating the overall accuracy
         * comparing different integration methods
         * tradeoff of accuracy and simplicity/comprehensibility
 
Schedule:
 
         October 1: Deadline for submissions
         December 15: Deadline for getting decisions back to authors
         March 15: Deadline for authors to submit final versions
         August 1998: Publication
 

 Submission Guidelines:

 1) Manuscripts should conform to the formatting instructions in:

	 http://www.cs.orst.edu/~tgd/mlj/info-for-authors.html 

    The first author will be the primary contact unless otherwise stated.

 2) Authors should send 5 copies of the manuscript to:
 
         Karen Cullen
         Machine Learning Editorial Office
         Attn: Special Issue on IMLM
         Kluwer Academic Press
         101 Philip Drive
         Assinippi Park
         Norwell, MA 02061
         617-871-6300
         617-871-6528 (fax)
         [email protected]
 
    and one copy to:
 
         Philip Chan
         MLJ Special Issue on IMLM
         Computer Science
         Florida Institute of Technology
         150 W. University Blvd.
         Melbourne, FL 32901
         407-768-8000 x7280 (x8062) (407-674-7280/8062 after 6/1/97)
         407-984-8461 (fax)

 3) Please also send an ASCII title page (title, authors, email, abstract, 
    and keywords) and a postscript version of the manuscript to 
    [email protected].

 
 General Inquiries: 

   Please address general inquiries to: 

       [email protected] 

   Up-to-date information is maintained on WWW at: 

       http://www.cs.fit.edu/~imlm/


 Co-Editors:
 
         Philip Chan, Florida Institute of Technology    [email protected]
         Salvatore Stolfo, Columbia University           [email protected]
         David Wolpert, IBM Almaden Research Center      [email protected]

>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[The following is a commercial announcement. GPS]

From: "Spedding, Patrick" <[email protected]>
Subject: Cognos' Scenario Wins PC Week Labs Analyst's Choice Award 
Date: Fri, 9 May 1997 05:36:20 -0400

Cognos' Scenario Wins PC Week Labs Analyst's Choice Award 

    BURLINGTON, Mass., May 6 /PRNewswire/ -- Cognos'(R) (Nasdaq: COGNF;
Toronto: CSN) Scenario(TM) data mining tool won PC Week Labs Analyst's
Choice Award after a head-to-head review with a competing product. Scenario's
"innovative interface makes it the coolest software package we've seen
this year," said the review, which cited its superiority, power and graphics.
Scenario extends the industry's most comprehensive business intelligence
product family, joining Cognos' market-leading PowerPlay(R), the
universal OLAP client, and award-winning Impromptu(R) query and reporting 
tool.

    "This award substantiates Cognos' belief that data mining in the
hands of business users offers up a powerful, functional and affordable
competitive edge," said Alan Rottenberg, senior vice president, Business
Intelligence products.  "Putting data mining capabilities into the hands
of decision makers and knowledge workers extends our strategy of enabling
them to react quickly to newfound knowledge, whether in operational systems
or data warehouses. Scenario joins Cognos' other award-winning business
intelligence tools for fastest time to results, lowest cost of ownership
and unparalleled ease of use."
    PC Week Labs, the world's largest independent testing laboratory,
applauded both Cognos' Scenario and the competitor for bringing new data
mining techniques to the PC.  "But in head-to-head testing," it wrote,
"Scenario safely mined more usable information than its competitor,
making it our top pick."
    Designed for spotting patterns and exceptions in business data that
might otherwise be missed, Scenario's sophisticated interface allows users
to readily visualize the business information being uncovered.  It
automates the discovery and ranking of critical factors impacting a
business, exposes hidden relationships between factors and establishes
thresholds and benchmarks.  An intuitive, cost-effective desktop tool,
Scenario liberates data mining from what is typically an expensive and
time-consuming process.  Insights derived using Scenario are achieved
directly by those best positioned to use the knowledge and effect rapid
change.
    Scenario 1.0, released in April 1997, is available from Cognos for
$695. It runs on Windows 95 and Windows NT and requires an IBM-compatible
486 PC and 8 MB of RAM.

(see http://www8.zdnet.com/pcweek/reviews/0505/05mining.html for the PC Week
comparison of Scenario and BusinessMiner.  GPS)

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Wed, 7 May 1997 11:46:09 -0700 (PDT)
From: Computational Finance <[email protected]>
Subject: Computational Finance Graduate Programs
======================================================================= 

COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science &
Technology (OGI)

Master of Science Concentrations in
Computer Science & Engineering (CSE)
Electrical Engineering (EE)

Upcoming MS Application Deadlines for Fall 1997:  May 15 & June 15!

New! Certificate Program Designed for Part-Time Students. 

For more information, contact OGI Admissions at (503)690-1027 or
[email protected], or visit our Web site at: 
http://www.cse.ogi.edu/CompFin/

======================================================================= 

Computational Finance Overview:

Advances in computing technology now enable the widespread use of
sophisticated, computationally intensive analysis techniques applied to
finance and financial markets. The real-time analysis of tick-by-tick
financial market data, and the real-time management of portfolios of
thousands of securities are now sweeping the financial industry. This has
opened up new job opportunities for scientists, engineers, and computer
science professionals in the field of Computational Finance.

The strong demand within the financial industry for technically
sophisticated graduates is addressed at OGI by the Master of Science and
Certificate Programs in Computational Finance. Unlike a standard two-year
MBA, the programs are directed at training scientists, engineers, and
technically oriented financial professionals in the area of quantitative
finance.

The master's programs lead to a Master of Science in Computer Science and
Engineering (CSE track) or in Electrical Engineering (EE track). The MS
programs can be completed within 12 months on a full-time basis. In
addition, OGI has introduced a Certificate program designed to provide
professionals in engineering and finance a means of upgrading their skills
or acquiring new skills in quantitative finance on a part-time basis.

The Computational Finance MS concentrations feature a unique combination
of courses that provides a solid foundation in finance at a non-trivial,
quantitative level, plus the essential core knowledge and skill sets of
computer science or the information technology areas of electrical
engineering. These skills are important for advanced analysis of markets
and for the development of state-of-the-art investment analysis, portfolio
management, trading, derivatives pricing, and risk management systems. 

The MS in CSE is ideal preparation for students interested in securing
positions in information systems in the financial industry, while the MS
in EE provides rigorous training for students interested in pursuing
careers as quantitative analysts at leading-edge financial firms.

The curriculum is strongly project-oriented, using state-of-the-art
computing facilities and live/historical data from the world's major
financial markets provided by Dow Jones Telerate. Students are trained in
the use of high-level numerical and analytical software packages for
analyzing financial data. 

OGI has established itself as a leading institution in research and
education in Computational Finance. Moreover, OGI has strong research
programs in a number of areas that are highly relevant for work in
quantitative analysis and information systems in the financial industry.

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Tue, 13 May 1997 14:40:06 +0100 (BST)
From: [email protected] (George Smith)
Subject: Research Assistant Position at UEA, Norwich, UK

The School of Information Systems, University of East
Anglia, Norwich has a vacancy for a

Research Assistant

to work on a project entitled "Datamining in the
Telecommunications Sector".


A computer graduate with at least a 2(I) degree in computing
or allied subject is sought for a two year post
starting August 1st, 1997, or as soon as possible
thereafter.

The appointee will work within a leading telecommunications
company, Nortel plc, on a day-to-day basis but
will be an employee of the University of East Anglia.
Opportunities will exist for registration for a part-time
higher degree at the University. A successful applicant will
be expected to have a high degree of numeracy and
a strong computing background. Preference will be given to
those who, in addition, have some knowledge
(and expertise) in one or more of the following:
evolutionary computation, operations research, artificial
intelligence or telecommunications.

The research is sponsored jointly by the Teaching Company
Scheme and by Nortel plc and involves the
development and application of various inference and
heuristic techniques, including genetic algorithms,
simulated annealing and tabu search, to elicit knowledge
from large-scale data sets generated within the
telecommunications industry.

Initial salary to be determined but expected to be around
16K UK pounds.

Applicants are invited to telephone Dr George D Smith (+44
(0) 1603 593260) or email [email protected] for
further information.

Applications in the form of a covering letter plus three
copies of a CV, including the names and addresses of
three referees, should be sent to:

      Dr George D Smith
      School of Information Systems
      University of East Anglia
      Norwich
      NR4 7TJ, UK

on or before Friday 6th June 1997.

  Tel: + 44 (0)1603 593260
  FAX: + 44 (0)1603 593344
  Email: [email protected]
  www:   http://www.sys.uea.ac.uk/Teaching/Staff/gds.html

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Mon, 12 May 1997 16:14:32 +1000
From: Lipo Wang <[email protected]>
Subject: CFP: Conference on Knowledge Discovery and Data Mining (PAKDD-98)
======================================================================
                      C A L L  F O R  P A P E R S
======================================================================

                 The Second Pacific-Asia Conference on

             Knowledge Discovery and Data Mining (PAKDD-98)
             ----------------------------------------------

                 Melbourne, Australia, 15-17 April 1998
                 ======================================

               URL: http://www.sd.monash.edu.au/pakdd-98

The Second Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD-98) will provide an international forum for the sharing
of original research results and practical development experiences
among researchers and application developers from different KDD-related
areas such as machine learning, databases, statistics, knowledge
acquisition, data visualization, software re-engineering, and
knowledge-based systems. It will build on the success of PAKDD-97,
held in Singapore in 1997, by bringing together participants from
universities, industry and government.

Papers on all aspects of knowledge discovery and data mining are
welcome.  Areas of interest include, but are not limited to:

   - Data and Dimensionality Reduction
   - Data Mining Algorithms and Tools
   - Data Mining and Data Warehousing
   - Data Mining on the Internet
   - Data Mining Metrics
   - Data Preprocessing and Postprocessing
   - Data and Knowledge Visualization
   - Deduction and Induction in KDD
   - Discretisation of Continuous Data
   - Distributed Data Mining
   - KDD Framework and Process
   - Knowledge Representation and Acquisition in KDD
   - Knowledge Reuse and Role of Domain Knowledge
   - Knowledge Acquisition in Software Re-Engineering and Software 
     Information Systems
   - Induction of Rules and Decision Trees
   - Management Issues in KDD
   - Machine Learning, Statistical and Visualization Aspects of KDD
     (including Neural Networks and Inductive Logic Programming)
   - Mining in-the-large vs Mining in-the-small
   - Noise Handling
   - Security and Privacy Issues in KDD
   - Successful/Innovative KDD Applications in Science, Government,
     Business and Industry.

Both research and applications papers are solicited. All submitted
papers will be reviewed on the basis of technical quality, relevance
to KDD, significance, and clarity. Accepted papers will be published
in the conference proceedings by an international publisher. A
selected number of the accepted papers will be expanded and revised
for inclusion in a special issue of an international journal.

All submissions should be limited to a maximum of 5,000 words. Four
hardcopies should be forwarded to the following address:

     Professor Ramamohanarao Kotagiri (PAKDD '98)
     Department of Computer Science
     The University of Melbourne
     Parkville, VIC 3052
     Australia

Please include a cover page containing the title, authors (names,
postal and email addresses), a 200-word abstract and up to 5
keywords. This cover page must accompany the paper.

     *************** I m p o r t a n t   D a t e s ***************
     * 4 copies of full papers received by:     October 16, 1997 *
     * acceptance notices:                     December 22, 1997 *
     * final camera-readies due by:             January 30, 1998 *
     *************************************************************

Conference Chairs:
==================

        Ross Quinlan            Sydney University
        Bala Srinivasan         Monash University

Program Chairs:
===============

        Xindong Wu              Monash University
        Ramamohanarao Kotagiri  Melbourne University

Organising Committee Co-Chairs:
===============================

        Kevin Korb              Monash University
        Graham Williams         CSIRO, Australia

PAKDD-98 Publicity Chair:       
=========================

        Lipo Wang               Deakin University

PAKDD-98 Tutorial Chair:
======================== 

        Jon Oliver              Monash University
   
PAKDD-98 Treasurer: 
===================

        Michelle Riseley        Monash University

Program Committee:
==================

    Grigoris Antoniou   James Boyce         Ivan Bratko
    Mike Cameron-Jones  Arbee Chen          David Cheung
    Vic Ciesielski      Honghua Dai         John Debenham
    Olivier de Vel      Tharam Dillon       Guozhu Dong
    Peter Eklund        Usama Fayyad        Matjaz Gams
    Yike Guo            David Hand          Evan Harris
    David Heckerman     David Kemp          Masaru Kitsuregawa
    Kevin Korb          Hingyan Lee         Jae-Kyu Lee
    Deyi Li             Bing Liu            Huan Liu
    Zhi-Qiang Liu       Hongjun Lu          Dickson Lukose
    Kia Makki           Heikki Mannila      Peter Milne
    Shinichi Morishita  Hiroshi Motoda      Hwee-Leng Ong
    Jon Oliver          Maria Orlowska      G. Piatetsky-Shapiro
    Niki Pissinou       Peter Ross          Claude Sammut
    S. Seshadri         Hayri Sever         Arun Sharma
    Heinz Schmidt       Evangelos Simoudis  Atsuhiro Takasu
    Takao Terano        B. Thuraisingham    Kai Ming Ting
    David Urpani        R. Uthurusamy       Lipo Wang
    Geoff Webb          Graham Williams     Beat Wuthrich
    Xin Yao             John Zeleznikow     Dian-cheng Zhang
    Ming Zhao           Zijian Zheng        Ning Zhong
                        Justin Zobel

Further Information
===================
 
        Dr Xindong Wu
        Department of Software Development
        Monash University
        900 Dandenong Road
        Caulfield East, Melbourne 3145
        Australia
 
        Phone: +61 3 9903 1025
        Fax: +61 3 9903 1077
        Email: [email protected]

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Tue, 6 May 1997 13:08:00 -0500 (EST)
From: "David Leake" <[email protected]>
Subject: ICCBR-97: First Call for Participation

				 ICCBR-97
	  Second International Conference on Case-Based Reasoning

			     Brown University
		Providence, Rhode Island, July 25-27, 1997


Note:  The early registration deadline is May 28, 1997 (extended from May 20).
Additional information is available from http://www.iccbr.org/iccbr-97.html
Questions should be sent to [email protected].

	    --------------- Conference Overview ---------------

In 1995, the first International Conference on Case-Based Reasoning (ICCBR-95)
was held in Sesimbra, Portugal, as the start of a biennial series.  ICCBR-97,
the Second International Conference on Case-Based Reasoning, will be held at
Brown University in Providence, Rhode Island, on July 25-27, immediately prior
to AAAI-97 and IAAI-97.

The program of ICCBR-97 will include both research and applications.  The
three-day conference will feature invited talks, paper and poster sessions,
and panels presenting both mature work and new ideas, selected from over
100 submissions to the conference.  The conference aims to achieve a
vibrant interchange between researchers and practitioners with different
perspectives on fundamentally related issues, in order to examine and
advance the state of the art in case-based reasoning and related fields.

Topics to be addressed in conference presentations include:

   * Case representation, indexing and retrieval, similarity assessment, case
     adaptation, and analogical reasoning
   * Case-based and instance-based learning, index learning, and integrating
     CBR with other learning methods
   * Case-based reasoning and related approaches for task areas such as
     education, design, and medicine
   * Integration of CBR with other AI methods and comparisons to other
     approaches
   * Methods and systems for decision support, knowledge management, and
     intelligent information retrieval
   * Novel application areas for case-based techniques, deployed applications
     with significant impact, and lessons learned from application
     development 

(See http://www.iccbr.org/iccbr-97.html for details on registration, etc.)
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: 8 May 1997 10:17:04 -0500
From: "Erdogmus" <[email protected]>
Subject: CASCON'97 CfP

CASCON'97 web site:  http://www.cas.ibm.ca/cascon/
--
                       CASCON'97: Meeting of Minds

                           November 10-13, 1997
                        International Plaza Hotel
                       Mississauga, Ontario, Canada

Dear Colleague,

CASCON '97, the seventh annual IBM Center for Advanced Studies Conference,
is upon us. CASCON provides an excellent opportunity for academic,
governmental, and industrial research communities to share their work. We
encourage you to submit papers. The deadline for paper submissions is
June 27; however, we would like to know about your intention to submit a
paper earlier (by May 16, if possible). If you are thinking about
submitting a paper, please register as soon as possible on our web site at
             http://www.cser.ca:8001/
All you have to do is fill out a simple online form specifying a
tentative title and some keywords. This information can easily be changed
at any time using the automated system.

This year, we are soliciting papers on a wide range of topics, including
but not limited to the following:

 - Distributed systems and applications: Internet and the WWW, electronic
     commerce, tele-learning, tele-medicine, CSCW, multimedia, distributed
     object technologies, Java, performance analysis, high-speed networks,
     and applications management

 - Database technology: data mining, knowledge recovery, digital
     libraries, and data warehousing

 - User technologies: human-computer interaction, navigation, and GUIs

 - Software engineering and practices: maintenance, design recovery, program
     understanding, visualization, reuse, frameworks and design patterns,
     development environments, reliability, testing and validation,
     metrics, and real-time systems

 - Compiler technology: new techniques, compiler development, optimization,
     parallelism, and architectures

For more information about CASCON'97, please visit the web site
             http://www.cas.ibm.ca/cascon/

We look forward to your participation.

Dr. Hakan Erdogmus
CASCON'97 Program Co-chair
[email protected]

CASCON'97 web site:  http://www.cas.ibm.ca/cascon/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Sat, 10 May 1997 13:09:26 -0700 (PDT)
From: "John R. Koza" <[email protected]>
Subject: GP-97 Revised Call for Participation 

CALL FOR PARTICIPATION

Genetic Programming 1997 Conference (GP-97)
July 13 - 16 (Sunday - Wednesday), 1997
Fairchild Auditorium - Stanford  University - Stanford, California
-----------------------------------------------------------------------
In cooperation with American Association for Artificial Intelligence (AAAI), 
Association for Computing Machinery (ACM), SIGART, and Society for Industrial 
and Applied Mathematics (SIAM)
-----------------------------------------------------------------------
WWW FOR GP-97: http://www-cs-faculty.stanford.edu/~koza/gp97.html
-----------------------------------------------------------------------
NOTE: You are urged to make your housing arrangements as early as possible
since convenient hotel locations are limited.  Also, if you are driving
to the Stanford campus, please be aware of parking lot construction in
the area of Fairchild Auditorium and allow a little extra time 
(particularly on the first Monday session) to find a parking place.
-----------------------------------------------------------------------
Genetic programming is an automatic programming technique for evolving 
computer programs that solve (or approximately solve) problems.  Starting with 
a primordial ooze of thousands of randomly created computer programs, a 
population of programs is progressively evolved over many generations using 
the Darwinian principle of survival of the fittest, a sexual recombination 
operation, and occasional mutation.   
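
The evolutionary loop just described (random initial programs, selection of
the fittest, subtree crossover as the recombination operation, occasional
mutation) can be sketched in a few dozen lines of Python. This is a toy
illustration, not code from the GP-97 organizers: the primitive set
(add/sub/mul, x, 1.0), population size, tournament size of 5, 10% mutation
rate, depth cap of 8 and the target function x^2 + x are all arbitrary
assumptions chosen to keep the example small.

```python
import operator
import random

# Illustrative primitive set (an assumption): binary arithmetic plus terminals x and 1.0.
FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMS = ["x", 1.0]
MAX_DEPTH = 8  # cap on tree depth so programs cannot bloat without bound


def random_tree(depth, rng):
    """Grow a random program as a nested tuple such as ("add", "x", 1.0)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    name = rng.choice(list(FUNCS))
    return (name, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from nodes(tree[i], path + (i,))


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(a, b, rng):
    """Recombination: graft a random subtree of b onto a random point of a."""
    point, _ = rng.choice(list(nodes(a)))
    _, donor = rng.choice(list(nodes(b)))
    return replace(a, point, donor)


def mutate(tree, rng):
    point, _ = rng.choice(list(nodes(tree)))
    return replace(tree, point, random_tree(2, rng))


def error(tree, cases):
    """Fitness as total absolute error over the fitness cases (lower is fitter)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


def evolve(cases, pop_size=100, generations=25, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(4, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(error(t, cases), t) for t in pop]

        def tournament():
            # Survival of the fittest: the best of 5 randomly chosen individuals.
            return min(rng.sample(scored, 5), key=lambda s: s[0])[1]

        nxt = [min(scored, key=lambda s: s[0])[1]]  # elitism: keep the current best
        while len(nxt) < pop_size:
            child = crossover(tournament(), tournament(), rng)
            if rng.random() < 0.1:
                child = mutate(child, rng)
            if depth(child) > MAX_DEPTH:
                child = tournament()  # reject over-deep offspring, copy a parent instead
            nxt.append(child)
        pop = nxt
    return min(pop, key=lambda t: error(t, cases))


cases = [(x, x * x + x) for x in range(-3, 4)]  # fitness cases for the target x^2 + x
best = evolve(cases)
print(best, error(best, cases))
```

Programs are plain nested tuples so that crossover and mutation reduce to
"pick a path, splice in a subtree"; real GP systems add typed primitives,
ramped initialization and bloat control far beyond a simple depth cap.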
 
The first annual genetic programming conference in 1996 featured 15 tutorials,
2 invited speakers, 3 parallel tracks, 73 papers and 17 poster papers in the
proceedings book, and 27 late-breaking papers in a separate book distributed
to conference attendees; it drew 288 attendees. A description of GP-96 appears
in the October 1996 issue of Scientific American
(http://www.sciam.com/WEB/1096issue/1096techbus3.html). This second annual
conference in 1997 reflects the rapid growth of this field, in which over 600
technical papers have been published since 1992. For the August 5, 1996
EE Times article on the GP-96 conference and the August 12, 1996 EE Times
article on John Holland's invited speech at GP-96, go to
http://www.techweb.com/search/search.html

There will be 36 long, 33 short, and 15 poster papers at the Second Annual
Genetic Programming Conference, to be held on July 13-16 (Sunday - Wednesday),
1997 at Stanford University. In addition, there will be late-breaking papers
(published in a separate book in mid-June, after the June 11 deadline for
late-breaking papers).
Topics include, but are not limited to,
applications of genetic programming, theoretical foundations of 
genetic programming, implementation issues, technique extensions, cellular 
encoding, evolvable hardware, evolvable machine language programs, automated 
evolution of program architecture, evolution and use of mental models, 
automatic programming of multi-agent strategies, distributed artificial 
intelligence, auto-parallelization of algorithms, automated circuit synthesis, 
automatic programming of cellular automata, induction, system identification, 
control, automated design, data and image compression, image analysis, pattern 
recognition, molecular biology applications, grammar induction, and 
parallelization.  Papers describing recent developments are also solicited in 
the following additional areas: genetic algorithms, classifier systems, 
evolutionary programming and evolution strategies, artificial life and 
evolutionary robotics, DNA computing, and evolvable hardware.
-----------------------------------------------------------------------

full information at http://www-cs-faculty.stanford.edu/~koza/gp97.html
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.27. "97:18" by IJSAPL::OLTHOF (Spellchecked Henry Although) Thu Jun 05 1997 22:54 (749 lines)
Knowledge Discovery Nuggets 97:18, e-mailed 97-05-27
 News:
   *  Ronny Kohavi, Silicon Graphics' MineSet used in Incyte's LifeTools 3D
        http://www.incyte.com/press/1997/PR9712-LT3D.html
   * R. Zicari, COMDEX Internet Application Awards,
        http://www.ltt.de
   * Brij Masand, HPCwire: Robert Grossman discusses managing, mining
        large data sets
Publications:
    * GPS, First Issue of DMKD journal is available on-line in PDF
    format, http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE
    * Andy Pryke, Bibliography of KDD and Data Mining Papers,
          http://www.cs.bham.ac.uk/~anp/papers.html
Meetings:
    * D. Fisher, COLT/ICML Early Registration deadline June 2,
        http://cswww.vuse.vanderbilt.edu/~mlccolt/
    * Jan Komorowski, PKDD'97 -- Call For Participation, 
        http://www.idi.ntnu.no/pkdd97/
    * David Heckerman, Summer School on PROBABILISTIC GRAPHICAL MODELS
        http://www.newton.cam.ac.uk/programs/nnm.html
    * Vasant Honavar, CFP: Workshop on Automata Induction,
        Grammatical Inference, and Language Acquisition at ICML-97
        http://www.cs.iastate.edu/~honavar/mlworkshop.html
    * Honghua Dai, KDEX-97: IEEE Knowledge and Data Engineering 
        Exchange Workshop, http://www.sd.monash.edu.au/kdex-97
    * Gordon, CFP: ICML-97 workshop on Reinforcement Learning
        http://www.cs.cmu.edu/~ggordon/ml97ws
--
Knowledge Discovery Nuggets is a free electronic newsletter for the 
Data Mining and Knowledge Discovery community, focusing on the 
latest research and applications.

Submissions are most welcome and should be emailed, with a 
DESCRIPTIVE subject line (and a URL) to [email protected]. 
Submissions may be edited for length.
Please keep CFP and meeting announcements short and provide
a URL for details.
 
To subscribe, see http://www.kdnuggets.com/subscribe.html 

KD Nuggets is published 3-4 times a month.
Back issues of KD Nuggets, a catalog of data mining tools
("Siftware"), pointers to Data Mining Companies, Relevant Websites,
Meetings, and more are available at the Knowledge Discovery Mine site
at http://www.kdnuggets.com/

	-- Gregory Piatetsky-Shapiro (editor)
        [email protected]

********************* Official disclaimer ***************************
All opinions expressed herein are those of the contributors and not 
necessarily of their respective employers (or of KD Nuggets)
*********************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
"When you come to a fork in the road, take it."
          - Yogi Berra - 
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 15 May 1997 22:22:53 -0700
From: Ronny Kohavi <[email protected]>
Subject: Silicon Graphics' MineSet used in Incyte's LifeTools 3D

A recent press release by Incyte Pharmaceuticals Inc. announces
LifeTools 3D, powerful data mining and visualization software based
on Silicon Graphics' MineSet(tm) software suite of data analysis and
visualization tools. In collaboration with Silicon Graphics, Incyte
created customized functions that are specifically designed to help
researchers view, explore, and identify novel genes within LifeSeq.

See 
  http://www.incyte.com/press/1997/PR9712-LT3D.html
for details.

--
   Ronny Kohavi ([email protected], http://robotics.stanford.edu/~ronnyk)
   Engineering Manager, Analytical Data Mining.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: "Prof. Zicari" <[email protected]>
Date: Fri, 9 May 1997 23:39:14 +0200 (METDST)
Subject: COMDEX Internet Application Awards.

News Release

First COMDEX Internet Application Awards
IBM, Microsoft and SUN to sponsor Awards Program for the new generation of
Internet applications

Frankfurt -- April 1997. The three leading IT companies IBM, Microsoft and
SUN Microsystems will jointly support an international Awards Program
designed for the new generation of Internet-based applications for
business. 

The first COMDEX Internet Application Awards will be given in the
following three categories:

  - Best Intranet-based application for enterprise usage
        Focus: use of an Intranet for institutional/corporate knowledge
        for competitive advantage.
  - Most Innovative Web Site
        Focus: best or most innovative Web site with respect to user
        interface, ease of use, and innovative content.
  - Best Transactional Internet Application
        Focus: database, interactive applications.

The award winners will be selected from the submissions by a jury of
international experts. The awards ceremony will take place on October 8,
1997 at the trade show COMDEX Internet & Object World Frankfurt '97 (October
7-10, 1997, Sheraton Conference Center, Frankfurt/Main Airport).

"Successful Internet technologies like Java confirm us in considering the
Internet as the future base for enterprise computing. The COMDEX Internet
Application Awards program provides an excellent forum for honoring and
supporting outstanding Internet applications. We are looking forward to an
exciting contest", says Gert Haas, Marketing Director, SUN Microsystems,
Germany.

Microsoft's commitment to the Awards Program is explained by Karl-Heinz
Breitenbach, Customer Unit Manager Internet & Developer Customer Unit,
Microsoft Germany: "The availability of all relevant information at work is
the base for a fast and successful decision in a company. We therefore have
taken the challenge of providing 'information at your fingertips' very
early and this is reflected by our current product line. Internet
technology today allows to rapidly and reliably represent information
distributed in all branches of the company via a so called Intranet
solution. With the sponsorship of the COMDEX Internet Application Awards,
Microsoft confirms its commitment to innovative Internet technologies which
perfectly match our company goals."

Sanyaya Addanki, General Manager of Network Computing Solutions, IBM EMEA,
explains IBM's motivation for a sponsorship: "IBM is committed to providing
companies with solutions that link business critical applications and data
with the global reach and easy access of the web. We are proud to sponsor
the COMDEX Internet Application Awards Program, which fosters the
development of electronic business applications. Electronic business is the
cornerstone of IBM's network computing vision."

To obtain the entry kit:

  - download it from the web at:  http://www.ltt.de
  - send an e-mail to:            [email protected]
  - call LogOn at:                +49-6173-9558-51

COMDEX Internet and Object World Frankfurt '97 are produced by SOFTBANK
COMDEX Inc. and LogOn Technology Transfer GmbH.
The show is sponsored by: Object Management Group (OMG), A1-Solutions,
Business Online, Computer Associates, Computer Zeitung, MID and redmond's.
Internet and Wireless are sponsored by Omnilink Internet Service Center and
ARtem.

Information on Conferences and Exhibition:
Christiane Sattler
LogOn Technology Transfer GmbH
Burgweg 14, D-61476 Kronberg/Ts., Germany
phone:   +49-6173-9558-53
fax:     +49-6173-9404-20
e-mail:  [email protected]
Web:     http://www.ltt.de
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
[the following article is included with the permission of HPCwire.  GPS]

Date: Fri, 23 May 1997 14:00:49 -0400
From: Brij Masand <[email protected]>
Subject: ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS           

[From  H P C w i r e *** May 23, 1997:  Vol. 6, No. 20 ***]

ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS           
by Alan Beck, editor in chief, HPCwire 05.23.97
=============================================================================

  Chicago, Ill. -- Issues raised in the effective archiving, managing and
mining of very large data sets have significant pragmatic repercussions
throughout both commercial and scientific computing. To learn more about the
state of the art in this area, HPCwire interviewed Robert Grossman, professor
of mathematics, statistics and computer science at the University of Illinois
at Chicago, president of Magnify, and principal researcher in the Terabyte
Challenge.

-------------------

  HPCwire: Please give an overview of the current status of the Terabyte
Challenge, including funding sources and participants.

  GROSSMAN: "The Terabyte Challenge is an open, distributed test bed for
managing and mining massive data sets. The infrastructure for the Terabyte
Challenge is provided by the NSF-sponsored National Scalable Cluster Project
(NSCP) and its industrial partners. The NSCP philosophy is to use commodity
components with high-performance networking to build virtual platforms with
supercomputing power. The software tools developed for the Terabyte Challenge
seek to balance high performance computing with the high performance
input/output required by data intensive and data mining applications.

  "Currently, the NSCP consists of approximately 25 nodes and 500 Gigabytes
of disk at both UIC and UPenn, together with smaller clusters at the
participating partners. The infrastructure will more than double over
the next few months, to over 100 nodes and 2 Terabytes of disk. Unlike
other centers, the NSCP is configured for managing and mining large data
sets, ranging in size from 100 to 500 Gigabytes.

  "We are currently planning the third Annual Terabyte Challenge, which
will take place at SC 97. The first two took place at Supercomputing 95 and
96 (both won High Performance Computing Challenge Awards).

  "Currently, the University of Illinois at Chicago, the University of
Pennsylvania, and the University of Maryland form the core academic team. Two
industrial partners, HUBS (Philadelphia) and Magnify, Inc. (Chicago), will also
be working closely on this year's Terabyte Challenge. Funding is provided by
NSF to the NSCP Consortium, by DOE to UIC and UPenn, and by DOD to Magnify.
We expect additional partners to join us. If interested, please contact RLG.

  "Current applications include mining scientific data (UIC and UPenn),
mining medical data (UIC and UPenn), detecting network intrusions with data
mining (Magnify, Inc), and data intensive computing in support of virtual
reality (HUBS).

  "The web site http://www.lac.uic.edu will contain additional information
shortly."

  HPCwire: What progress has been made in scaling algorithms for very large
data sets?

  GROSSMAN: "I use the 10x rule: one can expect to archive 10-100x more data
than one can manage, and manage 10-100x more data than one can mine. This
makes sense since archiving requires a simple retrieval of files or objects,
managing requires the ability to perform simple queries, and mining requires
statistically and numerically intensive queries. At SC 96, we mined data sets
that were roughly 100-250 Gigabytes in size using 10-25 nodes. At SC 97, we
hope to mine 500-1000 Gigabytes of data on 50-100 nodes. I want to emphasize
that one can manage and perform simple queries of much larger data sets (up
to tens of Terabytes), but the detailed data mining of even a few hundred
gigabytes of data is a challenge today."
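
Read literally, the 10x rule turns one known quantity (how much you can mine)
into rough ranges for how much you can manage and archive. The arithmetic
below is only a back-of-the-envelope sketch of that rule of thumb; the
function names are hypothetical, and the 10 and 100 factors are the interview's
rough ratios, not guarantees. The second helper checks the SC 96 and SC 97
figures quoted above, which both work out to roughly 10 GB mined per node.

```python
def tenfold_rule(mineable_gb, low=10, high=100):
    """Rough capacity ranges implied by the 10x rule of thumb (all sizes in GB):
    manage 10-100x what you mine, archive 10-100x what you manage."""
    manage = (mineable_gb * low, mineable_gb * high)
    archive = (manage[0] * low, manage[1] * high)
    return {"mine": mineable_gb, "manage": manage, "archive": archive}


def gb_per_node(data_gb, nodes):
    """Average data mined per cluster node."""
    return data_gb / nodes


print(tenfold_rule(250))
# {'mine': 250, 'manage': (2500, 25000), 'archive': (25000, 2500000)}
print(gb_per_node(250, 25), gb_per_node(1000, 100))  # SC 96 upper end, SC 97 goal
# 10.0 10.0
```

Mining 250 GB thus suggests managing roughly 2.5 to 25 TB, which matches the
"up to tens of Terabytes" figure Grossman gives for simple queries.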

  "Parallelizing data mining algorithms can be done in several ways. Most
data mining algorithms are sufficiently compute-intensive that they work best
when the data and the working space required for the algorithm fit into
memory. For large data sets this is clearly not possible, and the
challenge is to balance the i/o requirements of the algorithm with the cpu
requirements. Several approaches are possible:

  "For the purposes here, we assume that the data mining process consists of
several steps, including 1) extracting patterns, 2) using these patterns
automatically to build predictive models, and 3) selecting or combining
multiple predictive models to produce a single decision. In each of the four
methods described next, one or more subsets of the data are chosen and mined.
The methods differ in how the subsets are chosen: the subsets may be created
by random draws, by a partition of the data, by a cover of the data, or by a
range based query of the data.

  "In sample based data mining, one samples a large data set and then
extracts patterns or builds a model. This is the most common approach. It
works well for patterns that are still easily found after down sampling. It
has the advantage that the compute time is vastly reduced (since the data to
be mined is vastly smaller) and the disadvantage that the patterns obtained
are often not indicative of the whole data set -- this is closely related to
the problem of over-fitting. This approach is most often not parallelized,
although sometimes sampling can be done in parallel and the results combined
into one model using model averaging techniques.
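
A minimal sketch of sample-based mining with model averaging, assuming the
"model" is an ordinary least-squares line and that averaging means taking the
mean of the per-sample coefficients (for linear models this is the same as
averaging their predictions). The data, sample sizes and function names are
illustrative assumptions, not taken from Grossman's work.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def sample_based_model(data, sample_size, n_samples, seed=0):
    """Fit one model per random sample, then average the models' coefficients."""
    rng = random.Random(seed)
    models = [fit_line(rng.sample(data, sample_size)) for _ in range(n_samples)]
    a = sum(m[0] for m in models) / n_samples
    b = sum(m[1] for m in models) / n_samples
    return a, b


# Synthetic stand-in for a large data set: y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
data = [(x, 2 * x + 1 + noise.gauss(0, 1)) for x in range(10_000)]
slope, intercept = sample_based_model(data, sample_size=500, n_samples=10)
print(round(slope, 3), round(intercept, 1))
```

Each call to `fit_line` touches only 500 rows, which is where the compute
savings come from; the samples are independent, so they could also be fitted
in parallel before averaging.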

  "In partitioned based data mining, the data set is partitioned into distinct
subsets which fit into memory, each partition is separately mined to produce
a collection of predictive models, and then the predictive models are
combined using model selection and model averaging techniques. This type of
data mining is easily parallelized, since one (or more) processors can be
assigned to each partition.
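
The partition-based scheme can be sketched as: split the data into disjoint
subsets, mine each subset independently, then combine the resulting models.
In this sketch the per-partition "model" is a one-threshold rule (a decision
stump) and combination is a simple majority vote; both choices, and all names,
are illustrative assumptions. Threads keep the example short; for CPU-bound
mining in CPython, real parallelism would need processes or, as on the NSCP
clusters, separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, k):
    """Split the data into k disjoint subsets (round-robin)."""
    return [data[i::k] for i in range(k)]


def fit_stump(rows):
    """Mine one partition: find the threshold t for the rule 'predict 1 if x >= t'
    that classifies the most rows correctly (ties go to the smallest t)."""
    best_t, best_correct = None, -1
    for t in sorted({x for x, _ in rows}):
        correct = sum((x >= t) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def partitioned_mine(data, k):
    # One worker per partition, each producing its own model.
    with ThreadPoolExecutor(max_workers=k) as pool:
        thresholds = list(pool.map(fit_stump, partition(data, k)))

    def predict(x):
        votes = sum(x >= t for t in thresholds)
        return int(2 * votes > len(thresholds))  # majority vote over the k models

    return thresholds, predict


data = [(x, int(x >= 50)) for x in range(100)]
thresholds, predict = partitioned_mine(data, 4)
print(thresholds)                 # [52, 53, 50, 51]
print(predict(10), predict(60))   # 0 1
```

Each partition sees a slightly different slice of the data and so learns a
slightly different threshold; the vote smooths those differences out.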

  "Cover-based data mining is similar to partitioned based data mining, but
the different subsets to be mined can be overlapping.  This is closely
related to what is called local mining, in which the patterns extracted use
data which is localized in some fashion, say based on the N closest data
points to a fixed reference point.
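
Cover-based (local) mining can be illustrated by building one neighbourhood
per reference point from its N closest records, letting those neighbourhoods
overlap, and mining each one separately. Here the local "pattern" is simply
the mean of a target value, and prediction uses the model whose reference
point is closest; these simplifications and the helper names are assumptions
made for the sketch.

```python
import math


def nearest_n(data, ref, n):
    """The n records whose feature vectors are closest (Euclidean) to ref."""
    return sorted(data, key=lambda rec: math.dist(rec[0], ref))[:n]


def cover_mine(data, refs, n):
    """Mine each neighbourhood in the cover; neighbourhoods may overlap.
    The local 'pattern' here is just the mean target value."""
    models = {}
    for ref in refs:
        subset = nearest_n(data, ref, n)
        models[ref] = sum(y for _, y in subset) / len(subset)
    return models


def predict(models, x):
    """Use the local model whose reference point is closest to x."""
    return models[min(models, key=lambda ref: math.dist(ref, x))]


# Records are (feature_vector, target) pairs.
data = [((float(i),), float(i)) for i in range(10)]
models = cover_mine(data, refs=[(0.0,), (9.0,)], n=6)
print(models)                   # {(0.0,): 2.5, (9.0,): 6.5}
print(predict(models, (7.0,)))  # 6.5
```

With n=6 the two neighbourhoods share the records at 4 and 5, which is exactly
what distinguishes a cover from a partition.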

  "Attribute-based data mining creates different subsets to be mined by using
an attribute based query of the underlying data set. For example, all objects
whose first attribute is less than 1.1 and whose second attribute is equal to
"A", etc. are selected and then mined.

  "For more information, see R. L. Grossman, Scaling Data Mining Algorithms
Using Cover-based Learning with Model Selection and Model Averaging,
http://www.magnify.com "
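
The attribute-based example above (first attribute less than 1.1, second
attribute equal to "A") translates directly into a query predicate. In the
sketch below the selected subset is then "mined" with a deliberately trivial
step, finding the frequent values of a third column; the rows, the third
column and the function names are hypothetical.

```python
from collections import Counter


def range_query(rows, predicate):
    """Attribute-based selection: keep the rows that satisfy the query."""
    return [row for row in rows if predicate(row)]


def frequent_values(rows, column, min_support):
    """A deliberately simple mining step: values of one column whose relative
    frequency in the selected subset is at least min_support."""
    if not rows:
        return {}
    counts = Counter(row[column] for row in rows)
    return {v: c / len(rows) for v, c in counts.items() if c / len(rows) >= min_support}


rows = [
    (0.5, "A", "buy"),
    (1.0, "A", "buy"),
    (0.9, "A", "skip"),
    (2.0, "A", "skip"),  # excluded: first attribute is not < 1.1
    (0.3, "B", "skip"),  # excluded: second attribute is not "A"
]
subset = range_query(rows, lambda r: r[0] < 1.1 and r[1] == "A")
print(frequent_values(subset, column=2, min_support=0.5))  # {'buy': 0.6666666666666666}
```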

  HPCwire: How is the TC approaching the mining of highly distributed data?

  GROSSMAN: "On the systems side, we have made good progress in this area.
The NSCP clusters at UIC and UPenn have been connected for several weeks now
by the vBNS at OC-3 (155 Mbps) speeds. Using this infrastructure we have
experimented with wide area data mining of scientific and medical data. We
are currently using this experience to develop new algorithms for wide area
data mining and to develop new generations of our data management and data
mining tools. The challenge is to develop a new class of algorithms for
extracting patterns from widely distributed data without the necessity of
first warehousing the data."
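
One general way to extract global statistics from distributed data without
first warehousing it is to have each site compute a small summary locally and
ship only the summaries. The sketch below does this for count, mean and
variance, using Welford's single-pass update at each site and Chan et al.'s
pairwise formula to merge summaries. It illustrates the idea only and is not
the NSCP's algorithm; the site names and data are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    n: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def summarize(values):
    """Run at each site: one pass over the local data (Welford's update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two site summaries (Chan et al.'s pairwise formula); no raw records move."""
    n = a.n + b.n
    if n == 0:
        return Summary(0, 0.0, 0.0)
    delta = b.mean - a.mean
    return Summary(n, a.mean + delta * b.n / n, a.m2 + b.m2 + delta * delta * a.n * b.n / n)


site_chicago = [1.0, 2.0, 3.0, 4.0]   # hypothetical local data at site 1
site_philadelphia = [10.0, 20.0]      # hypothetical local data at site 2
combined = merge(summarize(site_chicago), summarize(site_philadelphia))
print(combined.n, combined.mean, combined.m2 / combined.n)  # count, mean, population variance

# Cross-check against a centralized computation (feasible here only because the data is tiny).
everything = site_chicago + site_philadelphia
print(statistics.mean(everything), statistics.pvariance(everything))
```

Only three numbers per site cross the network, regardless of how many records
each site holds; richer models need richer summaries, which is where the
algorithmic challenge Grossman describes comes in.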

  HPCwire: What progress has been made in better understanding dynamical
systems via data mining?

  GROSSMAN: "Not as much as we would have liked. Data mining algorithms
today, by and large, work with data which is flat and static. The core
dynamical system concepts of a state vector and its evolution in time are
missing in most data mining algorithms. Hybrid systems is an emerging field
which combines dynamical systems with discrete structures such as rule
systems and automata. The latter can express the patterns discovered in data
mining. Researchers working in the NSCP are actively investigating exploiting
hybrid systems and related techniques to develop next generation data mining
algorithms which can utilize state information and work with time varying
data."
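
To make the idea of a hybrid system concrete, the toy simulation below pairs
a continuous state (a temperature evolving in time under mode-dependent
dynamics) with a discrete automaton (a heat/off mode switched by rules). It
is a textbook-style thermostat, not an example from the NSCP research; the
dynamics, thresholds and step size are assumptions chosen for illustration.

```python
def simulate_thermostat(temp, steps, dt=0.1, low=18.0, high=22.0):
    """Hybrid system: continuous state `temp` plus a discrete mode ("heat" or "off")."""
    mode = "heat" if temp < low else "off"
    trace = []
    for _ in range(steps):
        # Continuous dynamics, integrated with forward Euler:
        # heating raises temp at a constant rate; otherwise it decays toward 10 degrees.
        rate = 5.0 if mode == "heat" else -0.5 * (temp - 10.0)
        temp += rate * dt
        # Discrete transitions: the rule-system / automaton part of the hybrid system.
        if mode == "heat" and temp >= high:
            mode = "off"
        elif mode == "off" and temp <= low:
            mode = "heat"
        trace.append((round(temp, 3), mode))
    return trace


trace = simulate_thermostat(15.0, 500)
print(trace[:3], trace[-3:])
```

A trace like this (state vector plus mode over time) is the kind of
time-varying data that flat, static mining algorithms cannot represent
directly.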

  HPCwire: How is TC research being made available to the commercial sector?
Have any new products or partnerships resulted from TC-generated technology?

  GROSSMAN: "The NSCP and the Terabyte Challenge have 1) published the core
ideas  they have developed for data mining and data intensive computing, 2)
developed reference architectures and implementations for software tools to
support data mining (the UIC software tools PTool, JTool, and DMTool), and 3)
encouraged companies to exploit this technology for data intensive computing
and data mining.

  "To date, HUBS in Philadelphia and Magnify, Inc. in Chicago have begun to
employ some of these ideas in the products and services they offer.
Currently, regional data mining centers are in the planning process in both
Chicago and Philadelphia."

  HPCwire: How do you see the TC evolving over the next five years?

  GROSSMAN: "The most exciting development is the expected transformation of
the NSCP into two regional data mining centers with very strong industrial
ties: one in Chicago and one in Philadelphia. This has three important
consequences: 1) First, the compute, i/o, and networking infrastructure which
we can dedicate to data mining projects is expected to double this year and
hopefully to double again in about two years.  2) With our industrial
partners, we are actively working to demonstrate the practical feasibility of
mining massive data sets and to establish open standards for managing,
mining, and modeling massive data sets. 3) Using the vBNS network connecting
the centers in Chicago and Philadelphia, we are finding it easy to experiment
with the type of wide area data mining issues which we expect to take on an
increasingly important role for scientific, engineering, medical, and business
data mining applications.

  "To summarize, during the next five years, we expect the Terabyte Challenge
not only to continue to push the boundaries of massive data mining through an
annual competition, but also, together with its industrial partners, to be
actively involved with establishing data mining standards and reference
implementations of software tools for managing, mining, and modeling massive
data sets.

  "Additional participants for 1997 competition are welcome. Please contact
one of the organizers if interested. Additional information
can be found at http://www.nscp.uic.edu "

--------------------
Alan Beck is editor in chief of HPCwire. Comments are always welcome and
should be directed to [email protected]

Copyright 1997 HPCwire. Redistribution of this article is forbidden by law
without the expressed written consent of the publisher. For a free trial 
subscription to HPCwire, send e-mail to [email protected].
                            H P C w i r e
The Text-on-Demand E-zine for High Performance Computing
*************************************************************************** 



>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Thu, 22 May 1997 15:05:54 -0400
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: First Issue of DMKD journal is available on-line in PDF format

The premiere issue of Data Mining and Knowledge Discovery journal
is available on-line, in PDF format, at 
http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE

To read this very good (in my biased opinion) issue you need an Acrobat reader,
which you can download from http://www.adobe.com/acrobat/

Only the first issue will be freely available on-line,
but you can subscribe to the journal at the $50 individual rate (the
institutional rate is higher);
see http://www.wkap.nl/kapis/CGI-BIN/WORLD/journalhome.htm?1384-5810
for subscription information. Please support this journal!

>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: [email protected]
Date: Fri, 23 May 97 22:12:09 BST
Subject: Nuggets: Bibliography of KDD and Data Mining Papers

The Master Bibliography of KDD and Data Mining Papers is a
bibliography of over 400 papers on the topics of Data Mining and
Knowledge Discovery in Databases (this includes closely related papers
on visualisation and machine learning). More than 70 of the papers are
online.

It is available in either BibTeX or HTML-annotated BibTeX format
from:

  http://www.cs.bham.ac.uk/~anp/papers.html


A search interface is also available at:

  http://www.cs.bham.ac.uk/~anp/bibtex/search.html


Any additional references or corrections are gratefully
received. Please email them to me, Andy Pryke, at
[email protected] Only references in machine-readable format
(e.g. refer or, preferably, BibTeX) can be added, due to time
constraints.

Note that all the information I have about the papers is in the
bibliography, and many (330ish) of the papers are not available
online.

Please read the _collection_ copyright statement at
(http://www.cs.bham.ac.uk/~anp/bibtex/copyright.html). 

If you find the bibliography useful, you may wish to send me a
postcard (details in the copyright statement).

Andy Pryke
--
   Andy Pryke, Research Student, Computer Science, Birmingham University
Data Mining Information - http://www.cs.bham.ac.uk/~anp/TheDataMine.html 

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 16 May 1997 19:09:05 -0500
From: [email protected] (Douglas H. Fisher)
Subject: COLT/ICML Early Registration

Early registration for the Tenth Annual Conference on
Computational Learning Theory (COLT-97) and/or the Fourteenth
International Conference on Machine Learning (ICML-97)
concludes June 2, 1997. Room blocks at area hotels and on campus
are also "released" June 2 (though rooms will likely still be available
after that date). See http://cswww.vuse.vanderbilt.edu/~mlccolt/
for more information.

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Date: Fri, 16 May 1997 16:44:56 +0200 (MET DST)
From: Jan Komorowski <[email protected]>
Subject: PKDD'97 -- Call For Participation

                   1st European Symposium on Principles of
			  Data Mining and Knowledge Discovery in Databases
                              Trondheim, Norway
                              June 24-27, 1997

                           Tutorials: June 24-25
                           Symposium: June 26-27

This is an invitation to the 1st European Symposium on Principles of
Data Mining and Knowledge Discovery in Databases.  

PKDD'97 is the first symposium in an intended series of meetings of
the data mining and knowledge discovery from databases (KDD) community
in Europe.  The goal of the PKDD series is to provide a European-based
forum for interaction among all theoreticians and practitioners
interested in data mining and knowledge discovery.  Fostering an
interdisciplinary collaboration is one desired outcome, but the main
long-term focus is on theoretical principles for the emerging
discipline of KDD, especially those new principles that go beyond each
of the contributing areas.
 
There were 50 papers submitted to PKDD'97.  After selection by the
program committee, the accepted papers were assigned to three categories:
14 plenary papers, 13 parallel session papers and 11 poster papers that
include spotlight presentations in the plenary sessions.  In
addition, four tutorials were selected: Rough Sets for Data Mining and
Knowledge Discovery, Techniques and Applications of KDD, High
Performance Data Mining, and Data Mining in the Telecommunications
Industry.

The proceedings are published by Springer-Verlag.

The invited speakers include Evangelos Simoudis, USA, and Bjarne Foss,
Norway. They will provide different perspectives on the field: one on
data mining for business, the other on data mining from the point of
view of control theory.  Panel discussions on the present situation
and the future development of the field are planned.

There will be software exhibitions of both commercial and academic
software. 

Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for
detailed information and news about the symposium.

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: David Heckerman <[email protected]>
Subject: Summer School on PROBABILISTIC GRAPHICAL MODELS
Date: Fri, 16 May 1997 08:08:00 -0700

		   A Newton Institute EC Summer School

                     PROBABILISTIC GRAPHICAL MODELS

          	         1 - 5 September 1997

                 Isaac Newton Institute, Cambridge, U.K.

        Organisers: C M Bishop (Aston) and J Whittaker (Lancaster)


Probabilistic graphical models provide a very general framework for
representing complex probability distributions over sets of
variables. A powerful feature of the graphical model viewpoint is that
it unifies many of the common techniques used in pattern recognition
and machine learning including neural networks, latent variable
models, probabilistic expert systems, Boltzmann machines and Bayesian
belief networks. Indeed, the increasing interactions between the
neural computing and graphical modelling communities have resulted in
a number of powerful new ideas and techniques. The conference will
include several tutorial presentations on key topics as well as
advanced research talks.


Provisional themes:

Conditional independence; Bayesian belief networks; message
propagation; latent variable models; variational techniques; mean
field theory; learning and estimation; model search; EM and MCMC
algorithms; axiomatic approaches; causality; decision theory; neural
networks; information and coding theory; scientific applications and
examples.


Provisional list of speakers:

        C M Bishop (Aston)          D J C MacKay (Cambridge)
        R Cowell (City)             J Pearl (UCLA)
        A P Dawid (UCL)             M D Perlman (Washington)
        D Geiger (Technion)         M Piccioni (Aquila)
        E George (Texas)            R Shachter (Stanford)
        W Gilks (Cambridge)         J Q Smith (Warwick)
        D Heckerman (Microsoft)     M Studeny (Prague)
        G E Hinton (Toronto)        M Titterington (Glasgow)
        T Jaakkola (UCSC)           J Whittaker (Lancaster)
        M I Jordan (MIT)            S Lauritzen (Aalborg)
        B Kappen (Nijmegen)         D Spiegelhalter (Cambridge)
        M Kearns (AT&T)             S Russell (Berkeley)

This instructional conference will form a component of the Newton
Institute programme on Neural Networks and Machine Learning, organised
by C M Bishop, D Haussler, G E Hinton, M Niranjan and L G Valiant.
Further information about the programme is available via the WWW at

         http://www.newton.cam.ac.uk/programs/nnm.html


Location and Costs: 

The conference will take place in the Isaac Newton Institute and
accommodation for participants will be provided at Wolfson Court,
adjacent to the Institute. The conference package costs 270 UK pounds
and includes accommodation from Sunday 31 August to Friday 5
September, together with breakfast, lunch on the days that lectures
take place, and evening meals.


Applications: 

To participate in the conference, please complete and
return an application form and, for students and postdoctoral fellows,
arrange for a letter of reference from a senior scientist. Limited
financial support is available for participants from appropriate
countries.

Application forms are available from the conference Web Page at

         http://www.newton.cam.ac.uk/programs/nnmec.html

Completed forms and letters of recommendation should be sent to Heather
Dawson at the Newton Institute, or by e-mail to
[email protected]

	*Closing Date for the receipt of applications and 
             letters of recommendation is 16 June 1997*

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
From: Vasant Honavar <[email protected]>
Subject: Call for Participation: Workshop on Automata Induction, 
Grammatical Inference, and Language Acquisition
Date: Thu, 8 May 1997 10:53:48 -0500 (CDT)

                                Workshop on
    Automata Induction, Grammatical Inference, and Language Acquisition
   The Fourteenth International Conference on Machine Learning (ICML-97)
                    July 12, 1997, Nashville, Tennessee

The Automata Induction, Grammatical Inference, and Language Acquisition
Workshop will be held on Saturday, July 12, 1997 during the Fourteenth
International Conference on Machine Learning (ICML-97), which will be
co-located with the Tenth Annual Conference on Computational Learning Theory
(COLT-97) in Nashville, Tennessee from July 8 through July 12, 1997.
Additional information can be found at 
http://www.cs.iastate.edu/~honavar/mlworkshop.html

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 21 May 1997 12:23:13 +1000
From: Honghua Dai <[email protected]>
Subject: KDEX-97 Final Call for Papers

 1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
 --------------------------------------------------------------------
      Sponsored by the IEEE Computer Society and Co-located with
      the 9th IEEE Tools with Artificial Intelligence Conference

          November 4, 1997, Newport Beach, California, U.S.A.
          ===================================================

Call for Papers

The 1997 IEEE Knowledge and Data Engineering Exchange Workshop
(KDEX-97) will provide an international forum for researchers,
educators and practitioners to exchange and evaluate information and
experiences related to state-of-the-art issues and trends in the areas
of artificial intelligence and databases.  The goal of this workshop
is to expedite technology transfer from researchers to practitioners,
to assess the impact of emerging technologies on current research
directions, and to identify emerging research opportunities.
Educators will present material and techniques for effectively
transferring state-of-the-art knowledge and data engineering
technologies to students and professionals.  The workshop is currently
scheduled for one day, but depending on the final program it might be
extended to a second day.

Submissions can be in the form of survey papers, experience reports,
and educational material to facilitate technology transfer.  Accepted
papers will be published in the workshop proceedings by the IEEE
Computer Society.  A selection of the accepted papers may be expanded
and revised for publication in the IEEE Transactions on Knowledge and
Data Engineering (IEEE-TKDE) and the International Journal of
Artificial Intelligence Tools.  Educational material related to papers
published in the IEEE-TKDE will be posted on the IEEE-TKDE home page.

The theme of the workshop is "AI MEETS DATABASES".  Topics of interest
include, but are not limited to:

  - Computer supported cooperative processing and interoperable
    systems
  - Data sharing, data warehousing and meta-data management
  - Distributed intelligent mediators and agents
  - Distributed object management
  - Dynamic knowledge
  - Evaluation and measurement of knowledge and database systems
  - High-performance issues (including architectures, knowledge
    representation techniques, inference mechanisms, algorithms and
    integration methods)
  - Information structures and interaction
  - Intelligent search, data mining and content-based retrieval
  - Knowledge and data engineering systems
  - Quality assurance for knowledge and data engineering systems
    (correctness, reliability, security, survivability and
    performance)
  - Software re-engineering and intelligent software information
    systems
  - Spatio-temporal, active, mobile and multimedia data
  - Emerging applications (biomedical systems, decision support,
    geographical databases, Internet technologies and applications,
    digital libraries, etc.)

All submissions should be limited to a maximum of 5,000 words. Six
hard copies should be sent to the following address:
 
     Xindong Wu (KDEX-97)
     Department of Software Development
     Monash University
     900 Dandenong Road
     Caulfield East, Melbourne 3145
     Australia

     Phone: +61 3 9903 1025
     Fax: +61 3 9903 1077
     E-mail: [email protected]

Please include a cover page containing the title, authors (names,
postal and email addresses, telephone and fax numbers), and an
abstract.  This cover page must accompany the paper.

    ************ I m p o r t a n t   D a t e s *****************
    * 6 copies of full papers received by:    June 15,    1997 *
    * acceptance/rejection notices:           July 31,    1997 *
    * final camera-readies due by:            August 31,  1997 *
    * workshop:                               November 4, 1997 *
    ************************************************************

Further Information
===================

      WWW: http://www.sd.monash.edu.au/kdex-97
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Tue, 20 May 97 10:30:38 EDT
Subject: CFP: ICML-97 workshop on REINFORCEMENT LEARNING:  TO MODEL OR 
                  NOT TO MODEL, THAT IS THE QUESTION

                      Workshop at the Fourteenth 
                  International Conference on Machine 
                          Learning (ICML-97)

                 Vanderbilt University, Nashville, TN
                            July 12, 1997

                    www.cs.cmu.edu/~ggordon/ml97ws

Recently there has been some disagreement in the reinforcement 
learning community about whether finding a good control policy 
is helped or hindered by learning a model of the system to be 
controlled.  Recent reinforcement learning successes 
(Tesauro's TD-gammon, Crites' elevator control, Zhang and 
Dietterich's space-shuttle scheduling) have all been in 
domains where a human-specified model of the target system was 
known in advance, and have all made substantial use of the 
model.  On the other hand, there have been real robot systems 
which learned tasks either by model-free methods or via 
learned models.  The debate has been exacerbated by the lack 
of fully satisfactory algorithms on either side for 
comparison.

Topics for discussion include (but are not limited to):

  o Case studies in which a learned model either contributed to 
    or detracted from the solution of a control problem.  In 
    particular, does one method have better data efficiency?  
    Time efficiency?  Space requirements?  Final control
    performance?  Scaling behavior?
  o Computational techniques for finding a good policy, given a 
    model from a particular class -- that is, what are good 
    planning algorithms for each class of models?
  o Approximation results of the form: if the real system is in 
    class A, and we approximate it by a model from class B, we 
    are guaranteed to get "good" results as long as we have 
    "sufficient" data.  
  o Equivalences between techniques of the two sorts: for 
    example, if we learn a policy of type A by direct method B, 
    it is equivalent to learning a model of type C and computing 
    its optimal controller.
  o How to take advantage of uncertainty estimates in a learned 
    model.
  o Direct algorithms combine their knowledge of the dynamics and 
    the goals into a single object, the policy. Thus, they may 
    have more difficulty than indirect methods if the goals change 
    (the "lifelong learning" question). Is this an essential 
    difficulty?
  o Does the need for an online or incremental algorithm interact 
    with the choice of direct or indirect methods?

Full information is available at 
                    www.cs.cmu.edu/~ggordon/ml97ws
Contact:   Geoff Gordon ([email protected])

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~