
Conference nsic00::eis_dw

Title:	Executive Information Solutions & Data Warehousing Conference
Notice:	Welcome to the Data Warehousing conference
Moderator:	26002::HAGGERTY

Created:	Thu Sep 01 1994
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	499
Total number of notes:	2932

410.0. "KDD Nuggets" by UTROP1::dhcppc.uto.dec.com::olthof_h (Spellchecked Henry Although) Fri Nov 08 1996 08:06

T.R	Title	User	Personal Name	Date	Lines
410.1	KDD Nuggets 96:34	IJSAPL::OLTHOF	Spellchecked Henry Although	`Fri Nov 08 1996 08:12`	820
410.2	count me in	FOUNDR::BARNETT_T		`Tue Nov 12 1996 01:01`	5
410.3	Read note 406 and enroll on the WEB site	UTROP1::dhcppc.uto.dec.com::olthof_h	Spellchecked Henry Although	`Tue Nov 12 1996 08:04`	9
410.4	KDD Nuggets 96:35	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sun Nov 17 1996 12:36`	886
410.5	KDD Nuggets 96:36	IJSAPL::OLTHOF	Spellchecked Henry Although	`Fri Nov 22 1996 08:40`	862
410.6	KDD Nuggets 96:37	IJSAPL::OLTHOF	Spellchecked Henry Although	`Wed Nov 27 1996 11:10`	929
410.7	KDD Nuggets 96:38	IJSAPL::OLTHOF	Spellchecked Henry Although	`Tue Dec 10 1996 07:11`	821
410.8	KDD Nuggets 96:39	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sat Dec 14 1996 07:44`	1078
410.9	KDD Nuggets 96:40	IJSAPL::OLTHOF	Spellchecked Henry Although	`Fri Dec 20 1996 09:07`	770
410.10	KDD Nuggets 97:01	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sun Jan 05 1997 19:16`	827
410.11	97:02	IJSAPL::OLTHOF	Spellchecked Henry Although	`Fri Jan 10 1997 10:21`	655
410.12	97:03	IJSAPL::OLTHOF	Spellchecked Henry Although	`Mon Jan 20 1997 14:17`	561
410.13	97:04	IJSAPL::OLTHOF	Spellchecked Henry Although	`Mon Feb 03 1997 11:47`	1444
	Knowledge Discovery Nuggets 97:04, e-mailed 97-01-28 News: * GPS, Information Week on Debunking Data-Mining Myths http://www.techweb.com/se/directlink.cgi?IWK19970120S0042 * N. Uffenheimer, EDS in the data warehouse, datamining, DSS areas Publications: * J. P. Brown, Data Mining: What Needs To Be Done, And Why. http://www.hal-pc.org/~jpbrown * F. Famili, Intelligent Data Analysis Journal - First Issue is live, http://www.elsevier.com/locate/ida Siftware: * B. Li, Parallel C4.5, http://merv.cs.nyu.edu:8001/~binli/pc4.5/ Positions: * E. Babb, Jobs in data mining in London, http://www.parsys.com/dafs.htm * D. Berleant, Tenure Track, Teaching and Research at U. of Arkansas Meetings: * D. Stodder, Data Mining Summit program, http://www.dbsummit.com -- KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery in Databases (KDD) community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL, when available) to [email protected] To subscribe, email to [email protected] message with subscribe kdd-nuggets in the first line (the rest of the message and subject are ignored). See http://info.gte.com/~kdd/subscribe.html for details. Nuggets frequency is approximately 3 times a month. 
Back issues of Nuggets, a catalog of Siftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery is available at Knowledge Discovery Mine site http://info.gte.com/~kdd -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ********************************* * All opinions expressed herein are those of the writers (or the moderator) * * and not necessarily of their respective employers (or GTE Laboratories) * ***************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Variations on old chestnut on how to use programming languages to shoot yourself in the foot ... HTML: You shoot yourself in the foot, but the bullet takes 10 minutes to get there. VRML: You have to fight your way through 3 levels of DOOM before you can shoot yourself in the foot with a blaster cannon. JAVA: You shoot yourself and everyone else on the internet in the foot. JAVASCRIPT: You shoot yourself and everyone else on the internet in the foot with rubber bullets. PERL: You try to shoot yourself in the foot, but can't figure out the instructions that came with the gun. TCL: You shoot yourself in the foot with a cap gun. Thanks to L. Brothers >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 27 Jan 1997 17:11:18 -0500 From: [email protected] (Gregory Piatetsky-Shapiro) Subject: Information week on Debunking Data-Mining Myths - Content-Length: 23384 see http://www.techweb.com/se/directlink.cgi?IWK19970120S0042 for full text January 20, 1997, Issue: 614 Section: InformationWeek Labs Debunking Data-Mining Myths -- Don't let contradictory claims about data mining keep you from improving your business By Robert D. Small A great deal of what is said about data mining is incomplete, exaggerated, or wrong. 
Data mining has taken the business world by storm, but as with many new technologies, there seems to be a direct relationship between its potential benefits and the quantity of often-contradictory claims, or myths, about its capabilities and weaknesses. It's difficult to fight these myths, which are based on misunderstandings, hopes, and fears. The new technology cycle typically goes like this: Enthusiasm for an innovation leads to spectacular assertions. Ignorant of the technology's true capabilities, users jump in without adequate preparation or training. Then, sobering reality sets in. Finally, frustrated and unhappy, users complain about the new technology and urge a return to "business as usual." When you undertake a data-mining project, avoid a cycle of unrealistic expectations followed by disappointment. Understand the facts instead, and your data-mining efforts will be successful. - Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions. Myth: Data mining produces surprising results that will utterly transform your business. Fact: Most often, the results of data mining yield steady improvement to an already successful organization, often contributing important incremental changes rather than revolutionary ones. Nevertheless, data mining can lead to significant change in several ways. First, it may give the talented business manager a small advantage each year, on each project, with each customer. Compounded over a period of time, these small advantages turn into a large competitive edge. For example, a catalog retailer that can better target its mailing list can increase profits by reducing the cost of mailings while increasing the number of orders. Over time, this can result in a substantially more profitable business. 
Second, data mining occasionally does uncover one of those rare "breakthrough" facts, such as scientists' noticing the association between the fatal Reye's syndrome and children taking aspirin. In short, data mining is a powerful search tool for forward-looking companies. Myth: Data-mining techniques are so sophisticated that they can substitute for domain knowledge or for experience in analysis and model building. Fact: No analysis technique can replace experience and knowledge of the business and its markets. On the contrary, data mining makes education and experience in many areas more important than ever. While experts may need to learn new analytical techniques to stay current and make leading-edge contributions, someone who's an expert only in analytical techniques, without having knowledge of the business, is of no help. Experience in building models, however, can ensure more profitable use of data mining, since data mining is simply the newest tool for building models. The less domain knowledge a data mining expert brings to a problem, the more important it is to perform the data mining in close cooperation with people who understand the business. Similarly, the less skill and experience that business experts have in modeling and using the associated tools, the more help they need from data-mining experts in leveraging their business knowledge. For example, financial analysts seeking to increase the return on their clients' investments may ask an expert data miner to analyze a large, complex database on previous clients. The data miner may discover that certain variables predict success in investing, but it takes a financier to know whether it's legal to influence those variables. Myth: Data-mining tools automatically find the patterns you're looking for, without being told what to do. Fact: Data mining is most cost-effective when used to solve a particular problem. 
Although a data-mining tool can indeed explore your data and uncover relationships, it still needs to be directed toward a specific goal. Simply giving a data-mining tool a mailing list and expecting it to find customer profiles that improve the efficiency of a direct-mail campaign is not particularly effective. You need to be more specific in your goals. For example, to improve the value of mailing-list responses, your model might emphasize customers who have previously bought expensive items; to increase the number of responses, your model might emphasize customers who have responded to previous mailings. Myth: Data mining is useful only in certain areas, such as marketing, sales, and fraud detection. Fact: Virtually any process from pharmacology to customer service can be studied, understood, and improved using data mining. These techniques are being applied to such diverse applications as manufacturing process control, human resources, and food-service management. Data mining is useful wherever data can be collected. Of course, in some instances, cost/benefit calculations might show that the time and effort of the analysis is not worth the likely return. For example, suppose you suspect that if you collect just one more piece of information about your customers, you could double the number of orders you received. But you also know that mailing to twice as many people will also double the number of orders. If gathering the data is more expensive than sending the extra mailings, then it makes sense to increase the mailings rather than mine the data. Myth: The methods used in data mining are fundamentally different from the older quantitative model-building techniques. Fact: All methods now used in data mining are natural extensions and generalizations of analytical methods known for decades. Neural nets, a special case of projection pursuit regression, were developed in the 1940s. 
CART (classification and regression trees) methods were used by social scientists in the 1960s. K-nearest neighbor, a form of density estimation, has been used for a half-century. All these methods, just like regression techniques, model relationships between a set of profile variables and an outcome. What's new in data mining is that we're now applying these techniques to more general business problems, thanks to the increased availability of data and inexpensive processing power. Furthermore, because communication between the business community and methodologists, who are mainly academics, has often been poor, there was, until recently, no user-friendly software for implementing these methods. The recent interest in data mining is in part due to the improved user interfaces that make these techniques more available to business experts. The rise of these powerful methods is a great step forward, but the old tools are still valuable. Varieties of regression techniques, discriminant analysis, and even simple graphs can help reveal hidden patterns. No single method solves all or even a majority of problems. Successful data mining requires a portfolio of tools, both old and new. Myth: Data mining is an extremely complex process. Fact: The algorithms of data mining may be complex, but new tools have made those algorithms easier to apply. Often, just the correct application of relatively simple analyses, graphs, and tables can reveal a great deal about our business. Much of the difficulty in applying data mining comes from the same data-organization issues that arise when using any modeling techniques. These include data preparation tasks, such as deciding which variables to include and how to encode them, and deciding how to interpret and take advantage of the results. Myth: Only massive databases are worth mining. 
Fact: It's true that many methods used in data mining were specifically developed for analyzing very large data sets, and that many data-mining applications involve massive data sets. But a moderately sized or small data set can also yield valuable information. For example, buying patterns may depend most strongly on the day of the week or the time of the year. A modest database consisting of only "day" and "sales" could show this pattern, give the retailer some idea of its magnitude, and allow for planning of inventory and staffing. Even when building a massive database, try out some simple analysis on the data while the database is still moderate in size. You may decide to collect the data differently or to collect different data altogether. Myth: Data mining is more effective with more data, so all existing data should be brought into any data-mining effort. Fact: More data items are useful only if they contribute more information about the issues at hand, or goals. Otherwise, they can be worse than worthless. A database may have a great deal of information about an item (or about the relationship between items) but nothing about other items that are actually closely related. For example, a company may have information about how customers use one credit card, but nothing about how those customers use their other credit cards. However, adding data with little information content can actually lower the predictive power of the database. By including irrelevant data or adding multiple measurements of the same item, the utility of the data-mining results will be reduced. For example, if you include age as well as birth date, the analysis tool will discover that both factors are equally relevant and will therefore assign a lower weight to both measures as predictors. Myth: Building a data-mining model on a sample of a database is ineffective, because sampling loses the information in the unused data. 
Fact: The thrust of almost all developments in the study of sampling is to maximize the amount of information gained per unit of effort expended. Keep in mind that your data probably already represents a sample of a larger population. When you analyze your customer database to help acquire new customers, you're basing your model on a sample of the total population. Under some circumstances, you may be forced to sample. Not all your data may be relevant to the problem at hand or reflect the population you're trying to model. Many data warehouses include historical data that reflects conditions (such as unexpired patents) that no longer apply, rendering it inappropriate for building a model to guide future decisions. Sometimes full-scale data-gathering is not practical. For example, if you'd like to learn about customers' satisfaction with your new product or service, but it takes an hour to administer a customer satisfaction survey, you'll most likely decide to limit your analysis to a sample. In fact, a relatively small random probability sample, correctly taken, can yield excellent results. Although there are 60 million or more voters in a presidential race, the final poll before the election, which is based on two-thousandths of 1% of those voters, is seldom off by more than 2%. If we had a database of all 60 million voters and hundreds of measurements on each one, we couldn't build a better model for predicting the winner. Even when it's possible to build the model on the entire database, you may choose not to. It's often a better use of resources to build and evaluate many models using samples of the data, rather than rely on a single model using all the data. Myth: Data mining is another fad that will soon fade, allowing us to return to standard business practice. Fact: Although the name may change, data mining as a vital application will not go away. Companies have been using related quantitative techniques in many parts of their businesses for a long time. 
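As a rough check on the polling figures in the sampling discussion above, the sketch below computes the sample size implied by "two-thousandths of 1%" of 60 million voters and the textbook margin of error for a simple random sample of that size. The function names, the 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5 are assumptions added for illustration; they are not from the article, and real polls use more elaborate sampling designs.

```python
import math


def sample_size(population: int, fraction: float) -> int:
    """Number of respondents when a given fraction of the population is polled."""
    return round(population * fraction)


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion.

    Assumes a simple random sample; p = 0.5 is the worst case and
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


voters = 60_000_000           # figure quoted in the article
fraction = 0.002 / 100        # "two-thousandths of 1%"

n = sample_size(voters, fraction)
moe = margin_of_error(n)

print(f"sample size: {n}")
print(f"margin of error: {moe * 100:.1f} percentage points")
```

Under these assumptions the sample is 1,200 voters and the margin of error comes out at roughly 2.8 percentage points, the same order of magnitude as the article's "seldom off by more than 2%." Note that the formula depends on the sample size and not on the size of the population, which is the point the article is making.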
Data mining is just one more advance in a research process that has been ongoing since the beginning of the 20th century. A recent increase in the power of computers, coupled with cheap electronic methods for capturing large amounts of data, brings us to this step now. Data mining can't be ignored: the data is there, the methods are numerous, and the advantages that knowledge discovery brings to a business are tremendous. Companies whose data-mining efforts are guided by "mythology" will find themselves at a serious competitive disadvantage to those organizations taking a measured, rational approach based on facts. Robert D. Small is VP of Research of Two Crows Corp. in Potomac, Md. He can be reached at [email protected]. SIDEBAR: Six Steps For Successful Data Mining - Identify the goal - Assemble the relevant data - Choose your analysis methods - Decide which software tool is best for implementing the method - Run the analysis - Decide how to implement the results Data: Two Crows Corp. Copyright © 1997 CMP Media Inc. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Thu, 16 Jan 1997 22:15:57 -0800 Subject: EDS in roads into the data warehouse, datamining, DSS areas EDS, the largest computer service provider in the world, has established a focused consulting practice in the area of data warehousing, data mining and decision support systems. EDS built a world-class integration lab (in the domain of the insurance industry) to demonstrate applications, test tools, integrate solution components and build proofs of concept. For a free white paper and additional information, please contact Nathan Uffenheimer at (972) 604-8915. >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "jpbrown" <[email protected]> Organization: Ultimate Resources Date: Thu, 16 Jan 1997 13:57:24 -0006 Subject: What Needs To Be Done, And Why. 
Descriptive Introduction: The Databases that are the core of Data Warehousing are not just repositories. Together, they form an interactive machine that makes it possible to learn much more about the constituent population or populations. This expands on: http://www.hal-pc.org/~jpbrown Text: Most data collections are hybrid in one way or another. I have spent several years studying many actual cases. Over and over again, I ran into the apples and oranges problem, where there are sub-populations that are very different, one from another. I do not need to tell you how confusing the results of analysis can be, if these situations are ignored. I have continued to devise ways to detect the anomalies of the hybrid database, always assuming that some aspects of this problem may be present, or may develop with the passage of time. If they do develop as time goes on, there needs to be a method for detecting the onset of Change. I have developed, and expect to continue to develop, new methods to make effective, reliable analyses in cases where hybrid sub-populations are recognized. In using these techniques you can: * take an unfamiliar population and diagnose potential problems. * identify the causes of the problems. * apply different methods that will measure the analyzability of naturally occurring hybrid populations. * suggest ways to increase the utility of data, or to point out that some types of data are incurably unhelpful. * use different techniques (Autoclassification) to separate out sub-populations, based on predictability or other sources of coherence. * make reliable predictions. * detect and remedy Changes in causal systems that would otherwise reduce reliability. So far, the great strides that have been taken in Databases, Data Marts and Data Warehouses, have been advances in Data Manipulation. The next great strides will be taken in SuperInduction, and they will be applied before, during, and after the various steps of manipulation. 
The resulting Output: * will be based, without prejudice (objectively), on the Input. * will also have had the benefit of many kinds of new knowledge, developed during the analytical process. * and will be ideally presented to produce the best possible results for the corporate user (Decision Support). If you have gone through the Web Site http://www.hal-pc.org/~jpbrown and you want to see some of the extra complex links, let me know at [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 17 Jan 1997 08:46:53 -0500 From: [email protected] (Fazel Famili) Subject: Intelligent Data Analysis Journal - First Issue is live Intelligent Data Analysis - An International Journal (New) An electronic, Web-based journal Published by Elsevier Science URL: http://www.elsevier.com/locate/ida http://www.elsevier.nl/locate/ida The first issue of the Intelligent Data Analysis journal is now live. This is a quarterly journal published by Elsevier Science Inc. The journal is planning to offer a number of new features that are not currently available in paper journals: (i) an alerting service notifying subscribers of new papers in the journal, (ii) links to large-scale data collections, (iii) links to secondary collections of data related to material presented in the journal, (iv) the ability to test new search mechanisms on the collection of journal articles, (v) links to related bibliographic material, and (vi) inclusion of 3-D objects and multiple color graphs. Please refer to one of the above sites, which contain the articles for the first issue and the journal home page (e.g. Aims and Scope, Author Submission Guidelines, and more). Best wishes, A. 
Famili Editor-in-Chief >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 23 Jan 1997 10:02:10 -0500 From: [email protected] (Bin Li) Subject: new siftware entry for PC4.5 Could you add an entry in the Siftware page for our parallel C4.5 classification tool? Thanks, _______ Bin Li ---------------------------------------------------------------------------- Siftware: Parallel C4.5 (PC4.5) URL: http://merv.cs.nyu.edu:8001/~binli/pc4.5/ Description: If you have C4.5 and a network of workstations that are accessible to you, PC4.5 will help you better use C4.5. PC4.5 offers you these advantages: 1. It is faster. In an N trial c4.5 run, a single process builds N classification trees one by one and then picks the best one. In PC4.5, the N trials are each handled by a process and each process is run on a different machine (if N or more machines are available). 2. It is fault-tolerant. PC4.5 automatically assigns a process to a machine if the machine is idle (i.e. no activity by the machine's owner). If the owner of a machine comes back or it fails during a PC4.5 computation, the PC4.5 process automatically retreats and resumes on a different machine that is idle. 3. It supports multiple platforms. PC4.5 runs on SunOS, Solaris and Linux machines (for HPUX, IRIX, and ALPHA, please contact author). Networked multi-platform workstations can run PC4.5 processes of a single PC4.5 program at the same time. PC4.5 is built with the Persistent Linda (PLinda) system, a software system for robust distributed parallel computing developed at New York University. To get more information on PLinda, please visit our web site at http://merv.cs.nyu.edu:8001/~binli/plinda/ or send email to [email protected]. Both PC4.5 and PLinda are research efforts led by professor Dennis Shasha. Important: You must have the original C4.5 package in order to use PC4.5. To get C4.5, please contact Dr. J. R. 
Quinlan ([email protected]). Discovery tasks: Classification Platform(s): Unix (SunOS, Solaris, Linux; please contact author for HPUX, IRIX, and ALPHA) Contact: Bin Li 715 Broadway, Rm 715 New York, NY 10003 (212) 998-3485 email: [email protected] (preferred) Status: Public Domain (source code) Source of information: ftp://cs.nyu.edu/pub/plinda/pc4.5.tar.gz Updated: 1997-01-22 by Bin Li, [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: 17 Jan 1997 12:04:57 +0000 From: "Ed Babb" <[email protected]> Subject: kdd- job in data mining OPPORTUNITY IN DATA MINING! PARSYS is a leading European supplier of parallel systems and technology. They are currently the lead partner in a large multinational ESPRIT project aimed at building a parallel data mining file server. Consequently, they are looking for people interested in data mining systems and with experience of parallel computers, database technology and machine learning. The positions involve adapting learning techniques such as rule induction, neural networks, and genetic algorithms to run on a parallel computer. They also involve helping to adapt an existing database system to run on a parallel machine. Enthusiasm for producing fast algorithms in C is essential. At least a 2.1 degree in Computing, Artificial Intelligence or equivalent is needed. In addition, several years' relevant experience is desirable. Salary will depend on age and experience. Please post your CV stating current salary to: Ed Babb, PARSYS LTD, Boundary House, Boston Road, Hanwell, London, W7 2QE, UK. Alternatively email him on [email protected] if you wish to make any brief informal enquiries. Please see http://www.parsys.com/dafs.htm for a summary of the DAFS project. 
********************************************* >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] (BERLEANT DANIEL J) Date: Tue, 21 Jan 1997 08:20:37 -0600 Subject: POSITION: Tenure Track, Teaching and Research This is an informal request for inquiries from people interested in the tenure track position offered by our dept. starting next September. Feel free to spread the word. If you are interested in teaching two software related courses per semester (typically one undergrad, one grad) and in doing research in empirical NLP, text processing, information retrieval from full text, data/knowledge mining from full text, etc., AND you have/are getting Ph.D. and a formal qualification in engineering (Bachelor's, Master's, or Ph.D. degree with the word "engineering" in it or issued by a dept., college, campus, or university with the word "engineering" in its name, etc.), please email me to discuss applying. If you don't think you have an engineering degree, check - maybe you'll be surprised. I am very interested in promoting applications from people in the above mentioned areas and look forward to responding forthrightly to your inquiry. Best Regards, Daniel Berleant Dept. of Computer Systems Engineering University of Arkansas, Fayetteville Phone: (501) 575-5590 Fax: (501) 575-5339 Email: [email protected] >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 20 Jan 97 12:54:56 PST From: "Dave Stodder" <[email protected]> To: [email protected] Subject: Data Mining Summit program As you know, the 1997 Data Mining Summit is coming up Feb. 18-21 in San Francisco. The conference is sponsored by Miller Freeman Inc.'s Database Programming & Design and DBMS magazines. We have a great lineup of speakers: Usama Fayyad, Evangelos Simoudis, Kamran Parsaye, Larry Kershberg, Bob Vere, Gene Feruzza, and others, including case studies. The complete program is located at www.dbsummit.com. 
I am attaching files of the complete program, if it would be possible to include it with KDD Nuggets. Thanks very much, David David Stodder Conference Chair, Data Mining Summit Editor-in-Chief, Database Programming & Design 411 Borel Ave., Suite 100 San Mateo, CA 94402 (415) 655-4290, Fax (415) 655-4350 Internet: [email protected] Return-Path: <[email protected]> Date: Mon, 27 Jan 1997 17:01:16 -0500 (EST) X-Sender: [email protected] (Unverified) X-Mailer: Windows Eudora Pro Version 2.2 (16) Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable To: [email protected] From: Gregory Piatetsky <[email protected]> Content-Length: 36234 Tuesday, February 18 Data Mining and the Internet: New Dimensions in Knowledge Discovery Chaired by David Stodder Editor-in-Chief Database Programming & Design Successful application of data mining tools and knowledge discovery tools methods can have a tremendous effect on an organization. Combined with the Internet, data mining explodes into a new world of possibility. Electronic commerce and other activity will create huge new resources of data that businesses can mine for greater efficiency and customer service. But perhaps more importantly, data mining combined with Internet-based applications has the potential to deliver whole new areas of profitable decision support services. This special seminar will focus on the dynamic combination of data mining, advanced databases, and the Internet. 
Bringing a series of experts together, this all-day session will cover key topics, including: -- Development and use of intelligent software agents -- How data mining fits with the technology advances made by commercial search engines and browsers -- Case studies of organizations that have created effective data mining applications for Internet customers -- Developments in heterogeneous database access to enable wider use of data mining -- Data mining and knowledge discovery methods that work best for creating Internet-aware applications -- Advances in graphics and data visualization that will impact Internet data mining applications For the latest news about this seminar, including the scheduled speakers, please check back with this Web site. The complete program will be in place in early December. Wednesday, February 19 8:30 - 9:35 OLAP and Data Mining: Bridging the Gap Part I Kamran Parsaye CEO Information Discovery Inc. To date, most observers have viewed data mining and online analytical processing (OLAP) as separate components of decision support. It has been difficult to link the two largely because no coherent theory exists upon which to build a relationship. In this keynote speech, Parsaye will introduce a unified theory and methodology for OLAP and data mining. He will describe in detail how the two activities can reinforce each other. Parsaye will begin by describing the "dimensions" of decision support and how data mining activity fits into one of the dimensions. Data mining within a single dimension is a rough approximation of multidimensional mining. Parsaye will describe how a lack of attention to dimensionality in data mining can result in unexpected results reminiscent of the "lossless join" problem in the early days of relational databases. 
In the second part of his presentation, Parsaye will present a formal framework for mining OLAP data and will introduce a new set of multidimensional normalization constructs that allow us to understand OLAP discovery. In this session you will learn: - How OLAP, data mining, and other activities fit together in the four "spaces," or dimensions, of decision support - Limitations of normalization and star schemas for data mining activities - New structures that go beyond star schemas - A methodology for applying OLAP data mining, with three distinct processes of episodic, strategic, and continuous mining for specific user groups within corporate environments. Kamran Parsaye is CEO of Information Discovery Inc. He has developed commercial data mining applications since the mid-1980s. Parsaye has a range of experience in the software industry both in research and in business, and has provided guidance to top-level management of leading industrial, financial, and government organizations. He is coauthor of Intelligent Database Tools & Applications (John Wiley & Sons, 1993). 9:45 - 10:50 OLAP and Data Mining: Bridging the Gap Part II Kamran Parsaye CEO Information Discovery Inc. (For description, see above) Break 10:50 - 11:10 11:10 - 12:15 Institutionalizing Knowledge Discovery: Creating a New Business Process Tej Anand Director of Knowledge Discovery Human Interface Technology Center NCR Corp. Practitioners are slowly beginning to accept that knowledge discovery is much more than just the application of machine learning or statistical algorithms to a dataset. Researchers understand that a knowledge discovery process exists, and they even agree on what basic tasks make up that process. However, for knowledge discovery to move beyond finding "interesting trivia" to become a business process akin to marketing, the details behind the knowledge discovery process must be expounded. 
Anand will take the process apart to reveal its details; he will offer practical ideas for accomplishing business goals through a new understanding of the process. In this session you will learn: - Why knowledge discovery is so difficult (contrary to what you might have heard) - Why you cannot buy a tool to "do" knowledge discovery for you - How process templates can remind the practitioner of tasks he or she must complete and can provide a framework for making, recording, and auditing decisions during the knowledge discovery process - How process guides help the practitioner select data transformation techniques, interpret data visualizations, select the correct machine learning or statistical algorithm, and interpret results - How embedding templates and guides into tools will allow knowledge discovery to become an institutionalized business process. Tej Anand is director of the knowledge discovery team at NCR Corp.'s Human Interface Technology Center. In 1993, he established this business and technical consulting team to help retail, insurance, consumer packaged goods, and other commercial enterprises realize business insights hidden in their operational data. Team members also conduct research and development to create knowledge discovery processes and data mining tools. Prior to joining NCR, Anand developed data mining tools for A.C. Nielsen Co. He has also been a member of the research staff at Philips Laboratories, where he did research in the area of artificial intelligence software systems. 12:15 - 1:30 Lunch Track A: Algorithms and Methods 1:30 - 2:35 Data Mining and the KDD Process: Algorithms and Limitations Part I Usama Fayyad Senior Researcher Microsoft Research This two-part talk will provide an overview of the rapidly growing area of knowledge discovery in databases (KDD). Fayyad will define KDD goals, present motivations guiding the KDD process, and discuss how KDD relates to data mining. He will then focus on the core data mining methods. 
These methods have their origins in statistics, pattern recognition, artificial intelligence (machine learning), databases, and parallel computing. Fayyad will explore the limitations and challenges of each major data mining method. He will break these methods down into classes and will cover a sampling of algorithms for each class, outlining its advantages and limitations. The goal of this two-part presentation is to provide a detailed snapshot of the current state of data mining methods, how they fit into the KDD process, and what key challenges developers should be aware of when applying them. Fayyad will focus primarily on the technical aspects of the algorithms rather than their use in particular implementations. In this session you will learn: - Definitions of KDD and data mining and how the two areas fit together - Dominant data mining methods used in the field and the specific problems they address - Critical limitations and challenges of each method - How to avoid pitfalls when applying data mining methods. Usama Fayyad is a senior researcher at Microsoft Research. His interests include knowledge discovery in large databases, data mining, machine learning theory and applications, statistical pattern recognition, and clustering. Before joining Microsoft in 1996, he headed the Machine Learning Systems Group at the Jet Propulsion Laboratory (JPL), California Institute of Technology, where he developed data mining systems for automated science data analysis. He remains affiliated with JPL as a distinguished visiting scientist. Fayyad received the JPL 1993 Lew Allen Award for Excellence in Research and the 1994 NASA Exceptional Achievement Medal. He was program cochair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining). He is general chair of KDD-96, an editor-in-chief of the journal Data Mining and Knowledge Discovery, and coeditor of Advances in Knowledge Discovery and Data Mining (MIT Press, 1996). 
2:45 - 3:50 Data Mining and the KDD Process: Algorithms and Limitations Part II Usama Fayyad Microsoft Research (For description, see above) 3:50 - 4:15 Break 4:15 - 5:00 Data Mining: The View from IBM 5:00 - 5:45 Data Mining: The View from Tandem Computers Track B: Case Studies in Data Mining 1:30 - 2:35 Leveraging Customer Information for Competitive Advantage Lisa Modisette Director of Wireless Intelligent Solutions Lightbridge Inc. The cellular phone industry today looks much like the credit-card industry of a few years ago. The market is growing at nearly 50 percent a year but will reach a saturation point soon, just as the credit card industry has. "Churn," or customer attrition, is a growing problem for the maturing cellular phone industry. In this case study, Modisette will describe how data mining techniques that worked so well in the credit card industry to prevent and reverse customer attrition may be applied to the wireless telecommunications industry. Modisette will describe how Lightbridge Inc., a wireless communications provider, has used data mining tools to retain good customers at minimal cost. Data mining tools make use of existing customer transactional and demographic data, allowing companies to quickly and easily discover customer needs. Detailed customer knowledge will enable carriers to prepare for a more saturated market and offer new businesses based on customer knowledge. In this session you will learn: - How Lightbridge uses data mining and churn modeling techniques to combat customer attrition - Specific predictive modeling techniques and their effectiveness - How to get the most out of existing data and acquire a deeper knowledge of customer behavior. Lisa Modisette is responsible for the development and marketing of Lightbridge Inc.'s Wireless Intelligence line of products and services, designed to provide decision support and database marketing to wireless carriers. 
She joined Lightbridge in 1994 and has driven the development of the new decision-support product line since its inception. Modisette has experience in identifying customer needs and in creating and maximizing the use of decision-support systems, database marketing, and customer segmentation. Modisette also has expertise in OLAP, business intelligence, database marketing, product management, sales training, and a variety of information technology. Before joining Lightbridge, she was director of the telecommunications industry practice at Metaphor Inc., an IBM subsidiary. She has a B.A. in marketing from the University of Colorado. 2:45 - 3:50 Business Experiences with Data Mining Evangelos Simoudis Director of Data Mining Solutions IBM Corp. Health care and insurance are two industries that offer interesting opportunities for data mining applications. In this presentation, Simoudis will describe how two businesses have developed production data mining systems. The Health Insurance Commission (HIC), an agency of the Australian government, processes claims for Australia's Medicare, Medibank Private, Pharmaceutical Benefits, and Child Care programs. HIC uses data mining to help reduce costs by ensuring that all medical tests and services are appropriately prescribed and accurately billed. John Hancock, an insurance and financial services provider, has a marketing and services database to support the company's cross-selling efforts and to accurately identify future customer service requirements. Hancock developed a survey of 55,000 targeted users; it uses data mining to provide profiles based on survey results. In this session you will learn: -- Case study examples of data mining methods used for reducing costs and profiling customers -- The technology/business integration important for data mining success -- Important processes to ensure accurate results from data mining Evangelos Simoudis is IBM's director of Data Mining Solutions. 
Before joining IBM, Simoudis led Lockheed Corp.'s data mining research, and was responsible for the commercial introduction and marketing of Lockheed's Recon data mining system for financial and retail markets. Simoudis also spent six years as a member of the principal research staff at Digital Equipment Corp.'s Artificial Intelligence Center. He conducted research on machine learning, pattern recognition, knowledge-based systems, and distributed artificial intelligence; Digital has incorporated his research work in products for engineering design and diagnostics. Simoudis has written extensively on data mining and machine learning, and is the North American editor of the Artificial Intelligence Review. 3:50 - 4:15 Break 4:15 - 5:00 Data Mining: The View from Angoss Software Thursday, February 20 8:30 - 9:35 Keynote Speech Speaker TBA 9:45 - 10:50 Weaving Detail into the Big Picture Denise M. Barnhart Chief, Corporate Analysis Division Army and Air Force Exchange Service "There's too much data ... but it's just not enough." With the continued growth of very large databases (VLDBs) and the mushrooming need for quick access to progressively smaller details of the retail business, corporations risk losing sight of the larger view, the brighter opportunity, or the insidious trend. The Army and Air Force Exchange Service (AAFES), which provides $6 billion in goods and services to military servicemen and servicewomen around the world, has taken on this challenge. In a case study presentation, Barnhart will describe AAFES's extensive use of massively parallel analytical processing and data mining. The organization uses this advanced technology for retail research and integrating analysis results with operational and strategic processes. 
In this session you will learn: - How AAFES uses neural nets to understand demographics and project market potential - Neural net applications that let an organization view data both at the total business level and at the detailed level of specific items in a retail store - How AAFES calculates relationships between retail items and categories and links these categories to demographic characteristics - Techniques for the cross-utilization of multiple databases for configuring retail stores to maximize corporate earnings per square foot - How to overcome challenges in integrating database patterns into the corporate strategic vision. Denise Barnhart is chief of the Corporate Analysis Division, part of the Army and Air Force Exchange Service's (AAFES's) Strategic Planning Directorate. AAFES is a profit-generating agency of the Defense Department. Barnhart joined AAFES in 1976 as a CPA and has since specialized in the strategic optimization of stores for the benefit of both customer satisfaction and the bottom line. She was an early proponent of the day-to-day use of neural nets in planning store construction in the late '80s. Today, AAFES wholly plans mall sales and earnings levels, store mix, sizing, and parking requirements with neural net analyses. With the refinement of retail point-of-sale in the '90s, Barnhart has extended corporate strengths in local markets. 10:50 - 11:10 Break 11:10 - 12:15 The Visualization of Large, Complex Datasets Georges Grinstein Professor, Institute for Visualization and Perception Research University of Massachusetts Lowell Visualization is the translation of data, sampled or generated, into some perceptual presentation, most typically visual, to provide insights into the data. It represents the mapping of data into a symbolic representation useful for researchers, analysts, scientists, and business managers. 
This "mapping," or interaction, can occur at several stages of the visualization presentation pipeline; it directs the transformations or alters the presentation of data. Visualization is no longer simply an application of computer graphics. While computer graphics remain the underpinning technology of this discipline, visualization now includes, and must support, databases, real-time interaction, networking, supercomputing, multimedia, visual programming, systems theory, and human perception. This development has provided some very fertile ground for integrating knowledge discovery, statistics, and visualization. In this talk Grinstein will highlight key research issues in the visualization of large, complex informational spaces. In this session you will learn: - A brief history of visualization, from initial efforts to extend data presentation beyond the classic pixel-driven techniques to the current challenge of encompassing domain knowledge - How visualization and data mining can work together to provide rich user-exploration and analysis environments - How to make astute use of visualization techniques. Georges Grinstein is a professor of computer science at University of Massachusetts in Lowell, Massachusetts. He also serves as director of the university's Institute for Visualization and Perception Research and is principal engineer with MITRE Corp.'s Center for Air Force C3I Systems. Track A: Algorithms and Methods 1:30 - 2:35 Improving Prediction Performance with Genetic Algorithms Steven Vere President Ultragem Data Mining Co. Data mining with genetic algorithms is a new technology aimed at improving prediction performance. However, many of today's commercial data mining products actually incorporate older machine learning algorithms, such as ID3 and CART. These systems use heuristic algorithms to generate decision rules. Being heuristic, they do not guarantee the best in prediction performance; in most cases, we now know they do not. 
Ten years ago, these technologies represented a good trade-off between prediction performance and training speed. But in today's high-speed computing environment, it is possible to use the controlled, brute computational force of genetic algorithms to find the higher performing prediction rules that heuristic algorithms overlook. In this presentation Vere will describe techniques for efficiently applying the genetic algorithm paradigm to large data mining problems. In this session you will learn: - The definition and description of genetic algorithms - Applications of genetic algorithms to data mining and numerical prediction problems - How specific techniques, such as averaging the predictions of sets of genetically generated classifiers, can significantly enhance performance. Steven Vere is president and founder of Ultragem Data Mining Co., a data mining consulting company specializing in the commercial application of evolutionary algorithms. He has over 20 years of experience in machine learning and artificial intelligence. Vere has served as a member of the computer science faculty at the University of Illinois, Chicago and has also held senior technical and management positions at the NASA Jet Propulsion Laboratory, Lockheed R&D Division, and Bank of America. His work has appeared in research journals, AI Encyclopedia, and Scientific American; he will be featured on a future episode of Beyond 2000, a television documentary series. Vere holds a Ph.D. in computer science from University of California at Los Angeles. 2:45 - 3:50 Data Mining: Finding the Total Business Solution Gene Feruzza President, Customer Management Services Too often, we view data mining as only data visualization, predictive modeling, or some other specific technique. Although these components are important, supporting the total business solution requires that we take a much broader scope. 
In this talk, Feruzza will focus on data mining processes in real-world applications developed in telecommunications, financial services, utilities, and online services. He will describe the cyclical nature of successful data mining, first focusing on the data infrastructure (data mart or warehouse) and data access and manipulation. Feruzza will then describe the role, and integration, of modeling processes and technologies, including rule-based techniques, traditional statistics, neural networks, and genetic approaches. He will discuss experiences with delivering the knowledge obtained from the technology to the business user, and how to promote the strategic integration of technology and business applications. In this session you will learn: -- How broadly to view the scope of data mining in order to be successful -- Why it's important to embrace and support all modeling technologies, not just one -- Solutions to common pitfalls based on data mining experiences -- Best practices for delivering knowledge gained to the business user -- Why data mining should be a cyclical, "living" process. Gene Feruzza has extensive experience with advanced segmentation techniques utilizing basic statistics and regression modeling, rule-based segmentation, and neural network modeling, along with evolutionary and hybrid modeling architectures. For 12 years he has provided integrated marketing and business solutions for clients in telecommunications, electric utilities, financial services, aerospace, manufacturing, and retail. He has worked for two leading neural network hardware and software providers (HNC and Neural Ware) as an instructor and consultant. He has also developed and marketed his own database management and segmentation software. Feruzza graduated from the University of Pittsburgh with a BS in computer science and mathematics. 
4:15 - 5:00 Data Mining: The View from NeoVista 7:30 - 9:00 1:30 - 3:00 Birds of a Feather Breakout Sessions Success with data mining depends on an intimate knowledge of specific industry application requirements. After the first Data Mining Summit last April, we received many requests to include in the program organized "networking" sessions for attendees to discuss specific industry challenges. To close out the Second Annual Data Mining Summit, we invite attendees to join in our special Birds of a Feather sessions, which will focus on data mining issues faced by specific industries. A vertical industry expert will lead each discussion group. Come and share your questions and experiences with other like-minded data mining practitioners! Depending on popularity, we plan to offer Birds of a Feather sessions about data mining in the following industries: - Retailing - Health care - Financial services - Telecommunications To help us organize the Birds of a Feather sessions ahead of the conference, please use the registration form to choose which vertical industry session you would like to attend. Track B: Case Studies in Data Mining 1:30 - 2:35 Artificial Intelligence and Process-Delay Analysis: A Decision-Tree Case Study Bob Evans Member, Advanced Technology Staff RR Donnelley & Sons Co. Cylinder wear (called "banding") causes serious delays in the rotogravure printing process and has plagued the industry for decades. A process-delay analysis initiative at RR Donnelley & Sons' Gallatin, Tennessee plant has reduced the incidence of cylinder banding to near negligible levels. In this presentation, Evans will describe the Evans-Fisher Process Analysis Model, a solution driven by decision-tree induction. Through case study examples, he will describe the use of this powerful artificial intelligence method for data mining. Evans will also address some of the business and social issues associated with data collection and analysis. 
At RR Donnelley, database technology is the vehicle for solving process problems. Evans will show how decision-tree induction may be viewed as automated query generation. Attendees will see examples of queries generated by this tool. Evans will explain how decision-tree induction guides users away from the "blind alleys" that can frustrate data mining efforts. In this session you will learn: - How to astutely define and collect data for decision-tree induction - Case study examples of how the Evans-Fisher Process Analysis Model was developed and applied - How to use artificial intelligence and data mining to solve complex industrial problems. Bob Evans is on the advanced technology staff of RR Donnelley & Sons Co. in Gallatin, Tennessee. He is also an adjunct assistant professor of computer science at Volunteer State Community College in Tennessee. A 33-year employee of RR Donnelley, he is responsible for implementing and upgrading process-delay analysis using current data mining technology. He has published several articles and has given presentations on shop-floor applications of artificial intelligence. Computer scientists frequently cite his application of decision-tree induction to cylinder bands as a successful example of the transfer of data mining technology from the research laboratory to an industrial environment. Evans holds an A.B. degree in mathematics from Indiana University and a Master of Engineering degree in computer science from Vanderbilt University. 2:45 - 3:50 Fraud Detection Systems: Combining Data Mining and Machine Learning Tom Fawcett, Foster Provost Members of the Technical Staff Machine Learning Project NYNEX Science and Technology In this presentation, Fawcett and Provost will describe a framework that combines data mining and machine learning techniques to design fraud detection methods. Fraud detection is based on profiling customer behavior and checking for anomalies. 
The domain of this case study is cloning fraud in cellular telephony, but the methods involved are more widely applicable: any domain in which fraudulent usage is superimposed upon legitimate usage (as in credit card fraud) is a candidate. Fawcett and Provost use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. They will show how they use these indicators to construct profilers and how their system combines evidence from multiple profilers to generate high-confidence alarms. In this session you will learn: - How to create a profitable synergy of data mining and machine learning - How to address the intricacies of building data mining systems under real-world constraints - Complications that arise when trying to assign cost/benefit trade-offs (the cost of handling a false alarm differs from the cost of missing fraudulent usage, which varies among fraud cases). Tom Fawcett works in machine learning, data mining, and knowledge-based systems. He has worked at NYNEX Science & Technology, GTE Laboratories, and MITRE Corp. Fawcett holds a Ph.D. from the University of Massachusetts at Amherst. While at GTE, his machine-learning system was used for automated adaptation in telecommunications network management. He developed and maintained a large knowledge-based mission planning system for MITRE. Fawcett has published articles addressing the representation problem in machine learning and has done research in case-based reasoning. Foster Provost works on machine learning and data mining at NYNEX Science and Technology, where, in addition to developing methods for the automated design of fraud detection systems, he has also made advances by combining data mining techniques with decision-analytic techniques for cost-effective technician dispatch. Prior to joining NYNEX, Provost worked on data mining in scientific domains, including botanical toxicology, high-energy physics, and infant mortality. 
His work produced advances in rule learning, scaling up machine learning methods to large databases, using background knowledge to guide learning, and selecting inductive bias. Provost holds a Ph.D. from the University of Pittsburgh, where he held IBM and Mellon graduate fellowships. He received a B.S. in physics and mathematics from Duquesne University. He is a recent recipient of NYNEX's President's Award. 4:15 - 5:00 Data Mining: The View from DataMind 7:30 - 9:00 Birds of a Feather 1:30 - 3:00 Success with data mining depends on an intimate knowledge of specific industry application requirements. After the first Data Mining Summit last April, we received many requests to include in the program organized "networking" sessions for attendees to discuss specific industry challenges. To close out the Second Annual Data Mining Summit, we invite attendees to join in our special Birds of a Feather sessions, which will focus on data mining issues faced by specific industries. A vertical industry expert will lead each discussion group. Come and share your questions and experiences with other like-minded data mining practitioners! Depending on popularity, we plan to offer Birds of a Feather sessions about data mining in the following industries: - Retailing - Health care - Financial services - Telecommunications To help us organize the Birds of a Feather sessions ahead of the conference, please use the registration form to choose which vertical industry session you would like to attend. Friday, February 21 8:30 - 9:35 Data Mining 1997/98: Key Trends & Market Perspectives Aaron Zornes Executive Vice President and ADS Service Director Meta Group Although the data mining market garnered less than $100 million in 1996, industry analysts at Meta Group forecast the market will explode to more than $800 million by the year 2000. 
During 2Q96, Meta Group surveyed 250+ Global 2000-size business users of data mining products and services in retailing, healthcare, financial services, and telecommunications. This presentation will highlight key survey findings regarding adoption criteria, timelines, technical parameters, and leading business applications. Meta Group's study investigated not only the traditional uses of data mining technology, such as fraud prevention and credit card authorization within the financial services industry, but also rapidly emerging requirements stemming from data warehouse implementations and Web-enabled commerce and marketing. In this session you will learn: - How to interpret early user adoption rates by industry segments - What will be the impact of emerging systems integrators and data bureaus - What's behind current data quality, data warehouse, and data visualization trends Aaron Zornes is executive vice president and ADS service director for Meta Group. He is a leading authority on the software industry as it relates to applications development and delivery, especially data warehousing and second-generation multitier client/server applications. Zornes has devoted more than 20 years to line and strategic management roles in leading vendor and user organizations, including executive and managerial positions at Ingres Corp., Wang Laboratories Inc., Software AG of North America, and Cincom Systems Inc. He is a frequent author and keynote speaker on data warehousing, data mining, advanced client/server tools, and customer-centric application architectures. Since 1992, he has been conference chair of DCI's Data Warehouse World conference series. 
9:45 - 10:50 Knowledge Rovers: Configurable Agents to Support Enterprise Information Infrastructures Larry Kerschberg Professor and Chair, Information and Software Systems Engineering School of Information Technology and Engineering George Mason University Knowledge rovers represent a family of cooperating intelligent agents that can support a collection of scenarios, decision-makers, and tasks. These rovers play specific roles within the enterprise information infrastructure to support users, maintain complex views, and mine and refine data into knowledge. Rovers can roam the Internet, seeking, locating, negotiating for, and retrieving data and knowledge specific to their mission. For decision-makers to make appropriate use of information, the current flood of data must be filtered and transformed. In this presentation, Kerschberg will describe knowledge rovers and the data mining and software agent technology that creates them. He will highlight important rovers and how they fit into data warehouse, data mine, and data mart architectures. Kerschberg will describe Field Agent rovers that discover new resources, collect data, and bring back information; Information Curator rovers that refine data into knowledge and place it in an information repository; and Domain Servers that, from within the repository, facilitate access to multiple data types, such as images, text, formatted data, and simulation data related to a particular domain. Finally, Kerschberg will discuss Sentinel rovers that monitor Domain Servers for interesting events, patterns, and specified conditions to alert decision-makers and take actions on their behalf. 
In this session you will learn: - The role of intelligent agents in supporting enterprise information architectures - How to integrate a family of configurable rovers for discovery, integration, and evolution of information - The interrelationship among concepts such as data warehouses, data mines, and information repositories in the enterprise information infrastructure - The concept of virtual data mines and data mining over multiple heterogeneous data sources. Larry Kerschberg is professor and chair of the Department of Information and Software Systems Engineering in the School of Information Technology and Engineering at George Mason University in Virginia. He is also director of the university's Center for Information Systems Integration and Evolution. His research focuses on intelligent agents, intelligent information integration, data mining and knowledge discovery in databases, and expert database systems. His research is funded in part by DARPA. Kerschberg is also President of KRM Inc., which pursues research and development in knowledge rovers and mediators in intelligent information systems. He is editor-in-chief of the International Journal of Intelligent Information Systems, published by Kluwer Academic Publishing Co. Kerschberg organized and has served as program chair of the First and Second International Conferences on Expert Database Systems. He holds a Ph.D. in engineering from Case Western Reserve University. 10:50 - 11:10 Break 11:10 - 12:15 Privacy Issues and Data Mining Panel Session Chaired by David Stodder, Editor-in-Chief, Database Programming & Design Data mining tools, when combined with large, sophisticated databases, already offer businesses and other organizations powerful new abilities to learn more about clients, customers, citizens, and taxpayers. The Internet and Web-enabled commerce will create vast sources of data and new ways to package information databases as products and services. 
Privacy and security specialists are becoming increasingly concerned that basic privacy rights could be trampled in the race to provide modern, intelligent information services. Businesses must take new security measures to protect proprietary data, and learn how to resolve the tug-of-war with competitors and service contractors over just who owns the data. This panel session will feature a selection of experienced users, security experts, and data mining professionals, who will focus on privacy and security concerns that broadly affect the practice of data mining. The panel will discuss what measures governments and businesses are taking, and should take, with regard to data mining and the development of new information services. David Stodder is editor-in-chief of Database Programming & Design. He has been with the publication since its inception in 1987. He has served on the advisory board of several industry conferences, including IDUG North America, DCI's Database and Client/Server World, and Blenheim/NDN's DB/Expo. He is also chair of Miller Freeman Inc.'s VLDB Summit, Object/Relational Summit, and Business Rules Summit conferences.
410.14	97:05	IJSAPL::OLTHOF	Spellchecked Henry Although	`Wed Feb 05 1997 09:17`	1172
	Knowledge Discovery Nuggets 97:05, e-mailed 97-02-04 News: * W. Kloesgen, KDD-97: Call For Panel Proposals * E. Colet, Announcing a regular posting of NBA data mining patterns, http://www.nba.com/news_feat/ * GPS, Business Week Feb 3, 1997 Story on Data Mining * B. Griffin, Tools for quantifying newgroups and email postings? * M. Rebhan, GeneCards: genes, proteins and diseases. http://bioinformatics.weizmann.ac.il/cards Publications: * A. Basu, CFP: INFORMS Journal on Computing Special issue on Knowledge Discovery and Data Mining * M. Singh, CFP: IEEE Internet Computing, Special issue on Agents http://www.computer.org/pubs/internet/ Positions: * W. Buntine, PhD/Masters Research Assistantship at Berkeley Meetings: * D. Gordon, CFP: ICML-97 Workshop on ML application in the real world http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html * M. Smyth, Learning Methods Course by Hinton and Jordan, Washington, D.C., May 2 -- 3, 1997 * J. Zytkow, Forthcoming events related to Data Mining PKDD'97, ISMIS-97 and KDD-97 -- KDD Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery in Databases (KDD) community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL, when available) to [email protected] To subscribe, email to [email protected] message with subscribe kdd-nuggets in the first line (the rest of the message and subject are ignored). See http://info.gte.com/~kdd/subscribe.html for details. Nuggets frequency is approximately 3 times a month. 
Back issues of Nuggets, a catalog of Siftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery is available at Knowledge Discovery Mine site http://info.gte.com/~kdd -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ********************************* * All opinions expressed herein are those of the writers (or the moderator) * * and not necessarily of their respective employers (or GTE Laboratories) * *************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If there is a 50-50 chance that something can go wrong, then 9 times out of ten it will. (Paul Harvey News, 1979) Excerpted from "Quotes, damned quotes and..." by John Bibby. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 3 Feb 1997 15:01:47 +0100 From: [email protected] (Willi Kloesgen) Subject: KDD-97: Call for Panel Proposals As in previous KDD conferences, the KDD-97 program will include panel discussions. A great panel requires an interesting topic, good speakers, and proper preparation. To facilitate all three we solicit early suggestions. Please submit suggestions for topics and preferably also for panelists who could represent diverse positions or approaches of the topic. Suggested topics should relate to any of the main KDD-97 topics (see http://www-aig.jpl.nasa.gov/kdd97). The panel topics should be of general interest for a large part of the KDD audience and allow several (controversial) approaches to be discussed. Please email informal suggestions by April 2, 1997 (earlier if possible) to: Willi Kloesgen [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Edward Colet"<[email protected]> Date: Wed, 29 Jan 1997 18:00:02 -0400 Subject: Announcing a regular posting of NBA data mining patterns. 
National Basketball Association teams have been using IBM's Advanced Scout data mining application to discover trends and patterns in game data. Now a selected set of discovered patterns are also made available to fans via a regular posting on the Internet before and after NBA/NBC's game of the week. The reported patterns are based on analyses of the teams previous game(s), and additional commentary is added following the game. The patterns can be found in the regular feature of the NBA website entitled, "Beyond the Boxscore" (found under "News and Features"). The NBA website is at "http://www.nba.com", and the data mining results are under "http://www.nba.com/news_feat/". There are also links to more information on Advanced Scout at "http://www.nba.com/ad/ibm", and at " http://www.research.ibm.com/scout/home.html/". Regards, Ed Colet. ***************************************** IBM T.J. Watson Research Center 30 Saw Mill River Road Hawthorne NY 10532 phone: 914-784-6621; tie-line 863 fax: 914-784-7455 email: [email protected] ******************************************* >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 3 Feb 1997 09:57:37 -0500 From: [email protected] (Gregory Piatetsky-Shapiro) Subject: Business Week Feb 3, 1997 Story on Data Mining Last week's Business Week has a very nice story by John Verity on "Coaxing Meaning out of Raw Data" (p. 134). It described several successful customer modeling applications at MCI, cellular fraud detection, US West, JPL, Walmart, and more and featured quotes from Usama Fayyad, Herb Edelstein, Steven Vere, and others. "A huge opportunity is opening up", according to Usama, but "the devil really is in the details", according to NeoVista CEO John Harte. 
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 31 Jan 1997 11:41:47 -0800 From: [email protected] (Brian Griffin) Organization: Netscape Subject: Recommendation Can you please recommend the best PD and commercial data mining tool for quantifying newsgroups and email postings? Thank you very much, Brian Griffin Manager, Technical Support Netscape Communications Corp. [GPS -- if you do know such tools, please cc to [email protected] and I will summarize to the list] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 29 Jan 1997 05:05:46 +0200 From: Michael Rebhan <[email protected]> Organization: Weizmann Institute of Science Subject: GeneCards: genes, proteins and diseases. http://bioinformatics.weizmann.ac.il/cards This database aims at integrating knowledge about all human genes, their products, and their involvement in diseases. Although it already integrates what is easily available in different heterogeneous databases, the authors are planning to use technology from Artificial Intelligence, including Knowledge Discovery in Databases (KDD) tools, to expand the current resource. We would like to hear opinions from people inside the AI/KDD community regarding the following projects: a) a user guidance system that recognizes problems caused by "poorly designed" search strategies entered by the user, and suggests intelligent options that might take him/her as fast as possible to the wanted information (this system should thus somehow replace an expert in the retrieval of biomedical information as much as possible). 
b) knowledge extraction tools taking data from free text, like from abstracts of papers in Medline, to gather data about the relationships between genes/proteins (which one interacts directly with which one a.s.o.), and about the role of a particular gene/protein in the pathogenesis of a particular disease Although both projects are still more or less ill-defined, we are very interested in your ideas. If you are also fascinated by this challenge, please email Michael Rebhan ([email protected]). Michael Rebhan, Ph.D. Weizmann Institute of Science, Dept. Biol. Serv., Bioinformatics Unit, Rehovot 76100, Israel (FAX: +972-8-934-4113) WWW: http://bioinfo.weizmann.ac.il/cards/rebhan.html Email: [email protected] >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 27 Jan 1997 08:48:06 -0700 From: Amit Basu <[email protected]> Subject: cfp for INFORMS Journal on Computing Call for Papers on Knowledge Discovery and Data Mining for the INFORMS Journal on Computing The knowledge and data management area of the INFORMS Journal on Computing invites technical papers on the analysis, design and management of knowledge discovery and data mining methods and systems. Selected papers will be published in a special cluster on this topic. The journal is an official publication of the Institute for Operations Research and Management Sciences, and focuses on the interface between operations research/management science and computer science. Papers that deal with algorithms for system design, methods for efficient information management, and analytical or empirical studies of system performance are welcome. Topics of interest include (but are not limited to): * performance analysis of KD/DM algorithms (efficiency, scalability, reliability, etc.) 
* the use of optimization methods in KD/DM * comparative studies of KD/DM versus other exploratory data analysis methods, including traditional statistical and mathematical programming models * analysis of context-specific KD/DM methods * neural networks in KD/DM * performance analysis of uncertainty management methods in KD/DM * analysis of KD/DM algorithms in large-scale, distributed and/or heterogeneous database systems * efficiency and scalability analysis of KD/DM algorithms for specialized databases (spatial, temporal, multimedia, statistical, etc.) * analysis of data mining methods on confidential data * efficient data preprocessing methods (e.g., scrubbing, sampling and reduction) for data mining * performance of KD/DM methods on multidimensional data Manuscripts should be prepared according to JoC guidelines. Deadline: July 31, 1997. Four (4) copies of each manuscript should be submitted to Professor Amit Basu, the Area Editor for Knowledge and Data Management, at the following address: Owen Graduate School of Management Vanderbilt University Nashville, TN 37203 TEL: 615-322-7043 FAX: 615-343-7177 email: [email protected] For more information, please contact Professor Basu at the above address, or the Editor-in-Chief of JoC, Professor Bruce Golden, at the address below: College of Business and Management University of Maryland College Park, MD 20742 TEL: 301-405-2232 FAX: 301-314-9157 email: [email protected] ------------------------------------------------------------------------------ Amit Basu Associate Professor Owen Graduate School of Management Vanderbilt University Nashville, TN 37203 TEL: 615-322-7043 FAX: 615-343-7177 >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subject: IEEE Internet Computing: Agents From: [email protected] (Munindar Singh) Date: Wed, 29 Jan 1997 10:27:57 -0500 (EST) IEEE Internet Computing http://www.computer.org/pubs/internet/ CALL FOR PAPERS IEEE Internet Computing is a new bimonthly magazine from the 
IEEE Computer Society designed to help the engineer productively use the ever expanding technologies and resources of the Internet. Internet Computing and IC on-line will provide developers and users with the latest advances in Internet-based computer applications and supporting technologies such as the World Wide Web, Java programming, and Internet-based agents. Through the use of peer-reviewed articles as well as essays, interviews, and roundtable discussions, IC will address the Internet's widening impact on engineering practice and society. IC is soliciting regular papers and papers for theme issues, including one on agents. To submit, send e-mail to any member of the editorial board. Include a plain text abstract, and a URL from which the paper can be viewed. Members of the editorial board are listed on the IC web page. Author guidelines are available at http://www.computer.org/pubs/internet/auguide.htm Topics include system engineering issues such as agents, agent message protocols, engineering ontologies, web scaling, intelligent search, on-line catalogs, distributed document authoring, electronic design notebooks, electronic libraries, security, remote instruction, distributed project management, reusable service access and validation, electronic commerce, and Intranets.
-----------------------------------------------------------
UPCOMING THEME ISSUES
------------------------------
Agents: What kinds of agents are performing useful work on the Internet? Papers should clearly define both the applications and technologies being used as well as the sense of "agent." Applications should be demonstrable. Issues include security, mobility, and agent communication languages. Claims about the efficacy of one approach or language should be supported by examples from applications.

Editorial Board Contacts:
Munindar Singh [email protected]
or
Michael Huhns [email protected]

Due date: March 15, 1997
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sat, 25 Jan 1997 10:50:41 -0800 From: Wray Buntine <[email protected]> Subject: PhD/Masters Research Assistantship PhD/Masters Research Assistantships Field: probabilistic algorithms, data analysis/mining and optimization for CAD Place: Electrical Engineering and Computer Science University of California, Berkeley The CAD group in the EECS Dept. at UC Berkeley is offering research support for its Masters and Doctoral program. Research areas include but are not limited to the use of data mining/analysis/engineering techniques in CAD or optimization, and probabilistic methods for optimization or specialized compilation. The Electronic Design Technology (EDT) field is concerned with computer automated or computer-assisted design of complex electronic systems. With current hardware capabilities advancing rapidly, a key bottleneck is the development of advanced algorithms for optimization and simulation of partial, abstract or completed designs. Our task is to design, code and experiment with new algorithms, methodologies, and software technologies for alleviating this bottleneck. The task can include the use of data mining/analysis to understand the nature of the optimization task, or in order to develop adaptive optimization methods. The ideal candidate should have a background in computer science, electrical engineering or related disciplines, should be an accomplished or developing programmer, and should have an interest in the theory and mathematical techniques used in optimization, data analysis, or probabilistic methods. Candidates who wish to apply are invited to respond with a copy of their CV to: Professor R. Newton URL: http://www.eecs.berkeley.edu/~newton Dr. 
Wray Buntine URL: http://www.eecs.berkeley.edu/~wray Dr. Andrew Mayer URL: http://www.eecs.berkeley.edu/~mayer Dept. of Electrical Engineering and Computer Sciences 520 Cory Hall University of California at Berkeley Berkeley, CA, 94720 The CAD Group URL: http://www-cad.eecs.berkeley.edu EECS, UC Berkeley URL: http://www.eecs.berkeley.edu >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Wed, 29 Jan 97 14:44:16 EST Subject: ICML-97 workshop CFPs CALL FOR PAPERS ML APPLICATION IN THE REAL WORLD: METHODOLOGICAL ASPECTS AND IMPLICATIONS Workshop at the Fourteenth International Conference on Machine Learning (ICML-97) Nashville, Tennessee July 12, 1997 WWW-page: http://www.aifb.uni-karlsruhe.de/WBS/ICML97/ICML97.html Description Application of Machine Learning techniques to solve real-world problems has gained more and more interest over the last decade. In spite of this attention, the ML application process is still lacking a generally accepted terminology, let alone commonly accepted approaches or solutions. Several initiatives, both conferences and workshops, have been held concerning this topic. The ICML-93 workshop of Langley and Kodratoff on ML applications as well as the ICML-95 workshop on 'Applying Machine Learning in Practice' by Aha, Catlett, Hirsh and Riddle form the successful precedents of this workshop. The focus of the ICML-95 workshop was the 'characterization of the expertise used by machine learning experts during the course of applying learning algorithms to practical applications'. In the last year a significant research effort has been spent that deals with applications of learning algorithms. A reflection of this is the recent interest in Data Mining and KDD, as for instance reflected in the international KDD conference (1995 (Montreal) and 1996 (Portland, OR)). 
Since the application of ML-techniques is also very relevant to the KDD-community it is not surprising that this is also reflected in those conferences. The workshop will draw along the lines of all these events, but will emphasise the processes underlying the application of ML in practice. Methodological issues, as well as issues concerning the kinds and roles of knowledge needed for applying ML will form a major focus of the workshop. It aims at building upon some of the results of discussions at the ICML-95 workshop on "Application of ML techniques in practice" and at the same time tries to move forward to a consensus regarding a methodology on the application of learning algorithms in practice. The workshop "ML Application in the real world; methodological aspects and implications" focuses on the methodological principles underlying successful application of ML techniques. Apart from powerful ML algorithms, good application strategies have to be defined. This implies a thorough understanding of the initial problem definition and its relation to the chain of tasks that leads towards a successful solution. Therefore a two-dimensional approach regarding the process of ML application is needed. The first dimension deals with the whole cycle of analysing the setting, problem definition, knowledge extraction, database interaction, learning, evaluation and iteration in real-world domains, where the second dimension forms an "inner loop" to this cycle, where the problem definition is used to refine the task at hand and map it on available algorithms for learning, pre- and postprocessing and evaluation of results. Concerning these issues there is no clear distinction between ML and KDD, and therefore this workshop will be equally interesting for researchers from both communities. This workshop does not focus on (methods for) developing new algorithms. 
Moreover, case studies will only contribute to the workshop discussion if general application principles can be derived from them. Intended Participants and Audience The workshop primarily aims at scientists and practitioners that apply ML and related techniques to solve problems in the real world. To attend the workshop, one should submit a paper, a one page extended abstract or a statement of interest. In case of too much interest from participants, the program committee will select participants on the basis of workshop relevance. Ideally, the audience contains a mix of university and industrial participants. Workshop program The program for this one-day workshop will have a maximum of 10 presentations. Some invited presentations will be part of the program. Presentations will take 30 minutes (15-20 minutes presentation and 10-15 minutes discussion). Speakers are asked to focus their presentation on the basis of a topic list that will be compiled during the review process. To foster discussion and debate, accepted papers will be given to a critic beforehand; by these means critics will be prepared to debate presentations. At the end of the workshop, there will be a plenary discussion session. Accepted papers will be distributed via the workshop WWW-page before the workshop, to stimulate the discussion. Accepted papers will also be published in workshop proceedings. Papers are welcomed concerning (but not limited to) the following topics: * Methodological approaches focusing on the process of ML application, or sub-processes, such as problem definition and refinement, application design, data acquisition, pre- and postprocessing, task analysis etc. * Making explicit the kinds and roles of knowledge that are necessary for execution of ML applications. * Matching of problem definitions on specific techniques and multi-technique configurations. * Impact of methodologies for empirical research on the application of ML-techniques. 
* Identification of the relation of different ML strategies to given problem types and identification of the characteristics that play a role in describing the initial problems. * Embedding of the ML application process in more general methodologies for (knowledge) system development. * Frameworks for support of (ML-)novices and experts for setting up applications and reuse of previously application(part)s. * Case studies, describing successful ML applications, that abstract from the implementational aspects and focus on identification of the choices that are made when designing the application i.e. the (meta-)knowledge involved, etc. * Comparison of the process of ML application with processes for application of related techniques (e.g. statistical data analysis). Submission guidelines * Submitted papers should not exceed 3500 words or 8 pages Times Roman 12pt. * The title page should contain paper title, author name(s), affiliations and full addresses including e-mail of the corresponding author, as well as the paper abstract and five keywords at most. * Papers are reviewed by at least three members of the program committee on their relevance for the workshop discussions. * For preparation of the camera ready copies, an ICML style file will be available. Tentative Submission Schedule * Submission deadline: March 22, 1997 * Notification of acceptance: April 9, 1997 * Camera ready copy + PS-file: May 1, 1997 * Papers available on WWW: June 15, 1997 * Workshop date: July 12, 1997 Electronic paper submissions are preferred. Please send your submission to: [email protected]. If Postscript printing is not available, paper submissions (4 hardcopies, preferably double sided) can be sent to: ICML Workshop "ML APPLICATION IN THE REAL WORLD" p/o ATO-DLO, Floor Verdenius Postbus 17 6700 AA Wageningen Netherlands Program Committee Dr. Pieter Adriaans (Syllogic, Houten, The Netherlands) Prof. C. Brodley (Purdue University, West Lafayette, IND, USA) Prof. 
David Hand (Open University, Milton Keynes, United Kingdom) Prof. Yves Kodratoff (LRI, Paris, France) Dr. Vassilis Moustakis (Technical University of Crete, Chania, Greece) Prof. Gholamreza Nakhaeizadeh (Daimler Benz AG Research, Ulm, Germany) Dr. R. Kohavi (Silicon Graphics, Mountain View, CA, USA) Dr. Enric Plaza i Cervera (IIIA-CSIC, Bellaterra, Catalonia, Spain) Dr. Foster J. Provost (NYNEX Science & Technology, White Plains, NY, USA) Dr. P. Riddle (University of Auckland, New Zealand) Dr. Celine Rouveirol (LRI, Paris, France) Prof. Derek Sleeman (University of Aberdeen, United Kingdom) Drs. Maarten van Someren (SWI, Amsterdam, The Netherlands) Prof. Rudi Studer (University of Karlsruhe, Germany) Organising Committee Robert Engels (University of Karlsruhe, Germany) [email protected] Juergen Herrmann (University of Dortmund, Germany) [email protected] Bob Evans (RR Donnelley, Gallatin TN, USA) [email protected] Floor Verdenius (ATO-DLO, Wageningen, The Netherlands) [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Marney Smyth <[email protected]> Subject: Learning Methods Tutorial -- Washington DC, May 1997 Date: Sat, 1 Feb 1997 12:19:02 -0500 (EST) ************************************************************ * * * Learning Methods for Prediction, Classification, * * Novelty Detection and Time Series Analysis * * * * Washington, D.C., May 2 -- 3, 1997 * * * * Geoffrey Hinton, University of Toronto * * Michael Jordan, Massachusetts Inst. of Tech. * * * ************************************************************ A two-day intensive Tutorial on Advanced Learning Methods will be held on May 2nd and 3rd, 1997, at the Hyatt Regency on Capitol Hill, Washington D.C. Space is available for up to 50 participants for the course. 
The course will provide an in-depth discussion of the large collection of new tools that have become available in recent years for developing autonomous learning systems and for aiding in the analysis of complex multivariate data. These tools include neural networks, hidden Markov models, belief networks, decision trees, memory-based methods, as well as increasingly sophisticated combinations of these architectures. Applications include prediction, classification, fault detection, time series analysis, diagnosis, optimization, system identification and control, exploratory data analysis and many other problems in statistics, machine learning and data mining. The course will be devoted equally to the conceptual foundations of recent developments in machine learning and to the deployment of these tools in applied settings. Case studies will be described to show how learning systems can be developed in real-world settings. Architectures and algorithms will be presented in some detail, but with a minimum of mathematical formalism and with a focus on intuitive understanding. Emphasis will be placed on using machine methods as tools that can be combined to solve the problem at hand. WHO SHOULD ATTEND THIS COURSE? The course is intended for engineers, data analysts, scientists, managers and others who would like to understand the basic principles underlying learning systems. The focus will be on neural network models and related graphical models such as mixture models, hidden Markov models, Kalman filters and belief networks. No previous exposure to machine learning algorithms is necessary although a degree in engineering or science (or equivalent experience) is desirable. Those attending can expect to gain an understanding of the current state-of-the-art in machine learning and be in a position to make informed decisions about whether this technology is relevant to specific problems in their area of interest. 
COURSE OUTLINE Overview of learning systems; LMS, perceptrons and support vectors; generalized linear models; multilayer networks; recurrent networks; weight decay, regularization and committees; optimization methods; active learning; applications to prediction, classification and control Graphical models: Markov random fields and Bayesian belief networks; junction trees and probabilistic message passing; calculating most probable configurations; Boltzmann machines; influence diagrams; structure learning algorithms; applications to diagnosis, density estimation, novelty detection and sensitivity analysis Clustering; mixture models; mixtures of experts models; the EM algorithm; decision trees; hidden Markov models; variations on hidden Markov models; applications to prediction, classification and time series modeling Subspace methods; mixtures of principal component modules; factor analysis and its relation to PCA; Kalman filtering; switching mixtures of Kalman filters; tree-structured Kalman filters; applications to novelty detection and system identification Approximate methods: sampling methods, variational methods; graphical models with sigmoid units and noisy-OR units; factorial HMMs; the Helmholtz machine; computationally efficient upper and lower bounds for graphical models REGISTRATION Standard Registration: $700 Student Registration: $400 Cancellation Policy: Cancellation before Friday April 25th, 1997, incurs a penalty of $150.00. Cancellation after Friday April 25th, 1997, incurs a penalty of one-half of Registration Fee. Registration Fee includes Course Materials, breakfast, coffee breaks, and lunch. On-site Registration is possible. Payment of on-site registration must be in US Dollar amounts, by Money Order or Check (preferably drawn on a US Bank account). Those interested in participating should return the completed Registration Form and Fee as soon as possible, as the total number of places is limited by the size of the venue. 
Please print this form, and fill in the hard copy to return by mail

REGISTRATION FORM

Learning Methods for Prediction, Classification, Novelty Detection and Time Series Analysis
Friday, May 2 - Saturday, May 3, 1997
Washington, D.C., USA.
--------------------------------------
Please complete this form (type or print)

Name ___________________________________________________
Last First Middle
Firm or Institution ______________________________________
Standard Registration ____ Student Registration ____
Mailing Address (for receipt) _________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
Country Phone FAX
__________________________________________________________
email address

(Lunch Menu - tick as appropriate): ___ Vegetarian ___ Non-Vegetarian

Fee payment must be made by MONEY ORDER or PERSONAL CHECK. All amounts are given in US dollar figures. Make fee payable to Prof. Michael Jordan. Mail it, together with this completed Registration Form to:

Professor Michael Jordan
Dept. of Brain and Cognitive Sciences
M.I.T. E10-034D
77 Massachusetts Avenue
Cambridge, MA 02139 USA

HOTEL ACCOMMODATION

Hotel accommodation is the personal responsibility of each participant. The Tutorial will be held in

Hyatt Regency on Capitol Hill
400 New Jersey Avenue, NW
Washington, DC 20001
1-800-233-1234 or (202) 737-1234

on May 2 -- 3, 1997. The hotel has reserved a block of rooms for participants of the course. The special room rates for participants are: U.S. $139.00 (Single/Double) per night + tax. You must reserve accommodation before April 1, 1997 to avail of this special rate. Please be aware that these prices do not include State or City taxes.
ADDITIONAL INFORMATION A registration form is available from the course's WWW page at http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/ Marney Smyth E-mail: [email protected] Phone: 617 258-8928 Fax: 617 258-6779 >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 3 Feb 1997 22:47:43 -0600 From: jan zytkow <[email protected]> Dear Colleague: You may be interested in the following forthcoming events related to machine discovery. Please notice that there is still time to submit a paper to each of these events: 1. PKDD'97 -- 1st European Symposium on Principles of Data Mining and Knowledge Discovery, Trondheim, Norway, June 25-27, 1997 Deadline for submissions: February 17 2. International Symposium on Methodologies for Intelligent Systems (ISMIS-97), Charlotte, North Carolina, October 15-18, 1997 Machine discovery and learning is a strong theme at ISMIS Deadline for submissions: March 1. 3. The Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, California, August 14-17, 1997 Deadline for submissions: March 10 (Cover page by March 3). Best regards, -- Jan Zytkow ------------------------------------------------------------------ 1. ------------------------------------------------------------------ New deadline for submitting papers to PKDD-97 The original deadline for submitting papers to the 1997 Principles of Knowledge Discovery in Databases was Wednesday, February 5. This deadline has been extended, so that PKDD-97 papers are now due on Monday, February 17, 1997 Notice of acceptance: March 17 Camera ready copies: April 4 Submit by email (preferred) to [email protected] or by airmail to Jan Komorowski Department of Computer Systems Norwegian University of Science and Technology 7034 Trondheim, Norway Papers should be in English and not exceed ten single-spaced pages of 12pt font. 
The first page should begin with title, authors, affiliations, surface and e-mail addresses, and an abstract of about 200 words. The proceedings of the Symposium will be published in the Springer Verlag Lecture Notes AI Series (www.springer.de/comp/comp.html) and available at PKDD-97, June 25-27. Watch the updated PKDD'97 WWW page for further details: http://www.idt.ntnu.no/pkdd97 If you have already sent off your paper but would like to resubmit by the new deadline, please send email to [email protected]
---------------------------------------------------------------------------
PKDD'97 -- 1st European Symposium on Principles of Data Mining and Knowledge Discovery
Trondheim, Norway
June 25-27, 1997

Introduction

Data Mining and Knowledge Discovery (KDD) have recently emerged from a combination of many research areas: databases, statistics, machine learning, automated scientific discovery, inductive programming, artificial intelligence, visualization, decision science, and high performance computing.

While each of these areas can contribute in specific ways, KDD focuses on the value that is added by creative combination of the contributing areas. The goal of PKDD'97 is to provide a European-based forum for interaction among all theoreticians and practitioners interested in data mining. Fostering an interdisciplinary collaboration is one desired outcome, but the main long-term focus is on theoretical principles for the emerging discipline of KDD, especially those new principles that go beyond each of the contributing areas.

To promote these goals, PKDD'97 will be organized into tracks around the key areas contributing to KDD. For each area an ideal paper should focus on how its methods advance KDD's goals and principles. Both theoretical and applied submissions are sought. Reviewers will assess the contribution towards the main goals of PKDD'97, in addition to the usual requirements of novelty, clarity and significance. Applied papers should go beyond an individual application, presenting an explicit method that promises a degree of generality within some stage of the discovery process, such as preprocessing, mining, visualization, use of prior knowledge, knowledge refinement, and evaluation. Theoretical papers should demonstrate how they advance the process of data mining and knowledge discovery.

Program Committee
* Pieter Adriaans
* Attilio Giordana
* David Hand
* Bob Henery
* Mikhail Kiselev
* Willi Kloesgen
* Yves Kodratoff
* Jan Komorowski
* Heikki Mannila
* Marjorie Moulet
* Steve Muggleton
* Zdzislaw Pawlak
* Gregory Piatetsky-Shapiro
* Zbigniew Ras
* Lorenza Saitta
* Erik Sandewall
* Wei-Min Shen
* Arno Siebes
* Andrzej Skowron
* Derek Sleeman
* Shusaku Tsumoto
* Raul Valdes-Perez
* Rudiger Wirth
* Stefan Wrobel
* Wojtek Ziarko
* Jan Zytkow

------------------------------------------------------------------
2.
------------------------------------------------------------------
** C A L L F O R P A P E R S **

TENTH INTERNATIONAL SYMPOSIUM ON METHODOLOGIES FOR INTELLIGENT SYSTEMS (ISMIS'97)
Hilton Hotel, Charlotte, North Carolina
October 15-18, 1997

SPONSORS
UNC-Charlotte, Oak Ridge National Laboratory, Univ. of Warsaw, and others.

PURPOSE OF THE SYMPOSIUM
This Symposium is intended to attract individuals who are actively engaged both in theoretical and practical aspects of intelligent systems. The goal is to provide a platform for a useful exchange between theoreticians and practitioners, and to foster the cross-fertilization of ideas in the following areas:
* Evolutionary Computation
* Intelligent Information Systems
* Learning and Knowledge Discovery
* Knowledge Representation and Integration
* Logic for Artificial Intelligence
* Robotics, Motion and Machine Vision
* Soft Computing
* Methodologies (modeling, design, validation, performance evaluation).
In addition, we solicit papers dealing with Applications of Intelligent Systems in complex/novel domains, e.g. human genome, global change, manufacturing, health care, etc.

SYMPOSIUM CHAIRS
Francois G. Pin (Oak Ridge National Lab.)
Zbigniew W. Ras (UNC-Charlotte & Polish Acad. Sci.)
Andrzej Skowron (U. Warsaw, Poland)

PROGRAM COMMITTEE
Luigia Carlucci Aiello (U. Roma, Italy)
Thomas Baeck (Inf. Centrum Dortmund & U. Leiden, The Netherlands)
Alan Biermann (Duke Univ.)
Jacques Calmet (U. Karlsruhe, Germany)
Jaime Carbonell (CMU)
Wesley Chu (UCLA)
Kenneth DeJong (GMU)
Robert Demolombe (CERT/ONERA, France)
Jon Doyle (MIT)
Toshio Fukuda (Nagoya U., Japan)
Attilio Giordana (U. Torino, Italy)
Diana Gordon (Naval Research Lab.)
Mirsad Hadzikadic (Carolinas HealthCare System)
Jiawei Han (Simon Fraser U., Canada)
David Hislop (Army Research Office)
Matthias Jarke (RWTH Aachen, Germany)
John Y. Jiang (Pacific Bell Lab.)
Willi Kloesgen (GMD, Germany)
Yves Kodratoff (U. Paris VI, France)
Jan Komorowski (U. Trondheim, Norway)
Alberto Martelli (U. Torino, Italy)
Robert Meersman (U. Brussels, Belgium)
Zbigniew Michalewicz (UNC-Charlotte & Polish Acad. Sci.)
Ryszard Michalski (GMU & Polish Acad. Sci.)
Jack Minker (U. Maryland)
Ephraim Nissan (U. Greenwich, UK)
Lin Padgham (RMIT U., Australia)
Rohit Parikh (CUNY)
Lynne Parker (ORNL)
Gregory Piatetsky-Shapiro (GTE Lab.)
Henri Prade (U. Paul Sabatier, France)
Luc De Raedt (U. Leuven, Belgium)
Marek Rusinkiewicz (MCC)
Lorenza Saitta (U. Torino, Italy)
Erik Sandewall (Linkoping U., Sweden)
Yoav Shoham (Stanford U.)
Richmond Thomason (U. Pittsburgh)
Jing Xiao (UNCC)
Carlo Zaniolo (UCLA)
Gian Piero Zarri (CNRS, France)
Maria Zemankova (NSF)
Jan M. Zytkow (Wichita State U. & Polish Acad. Sci.)

INVITED TALKS
Alan Biermann (Duke Univ.)
"Multimedia Dialogue: Theory and Practice"
Jaime Carbonell (CMU)
"Automated Text Summarization" or "Learning from the WEB"
Wesley Chu (UCLA)
"A knowledge-based multimedia medical distributed database system"
Michael Lowry (NASA Ames)
"V&V of AI systems that control deep-space spacecraft"
Gregory Piatetsky-Shapiro (GTE Lab.)
"Data Mining and Knowledge Discovery: The Second Generation"
Gio Wiederhold (Stanford U.)
"Achieving scalability through an Ontology Algebra"

ORGANIZING COMMITTEE
Brian Bachman (First Union)
Mirsad Hadzikadic (Carolinas HealthCare System)
Karen Harber (ORNL)
Mieczyslaw Klopotek (Polish Acad. Sci.)
M.S. Narasimha (IBM-Charlotte)
Zbigniew W. Ras (UNC-Charlotte)

PAPER SUBMISSION
Authors are invited to submit four copies of their manuscript (maximum 12 pages) to one of the addresses below:

Papers from US and Canada:
Francois G. Pin, ISMIS'97
ORNL, Bldg. 7601, M.S. 6305
P.O. Box 2008
Oak Ridge, TN 37831-6305
e-mail: [email protected]
fax: 423-574-4624
tel: 423-574-6130

Papers from Europe:
Andrzej Skowron, ISMIS'97
Univ. of Warsaw
Dept. of Mathematics
Banacha 2
PL-02-097 Warsaw, POLAND
e-mail: [email protected]
tel: 48-(22)-658-3449

All other papers:
Zbigniew W. Ras, ISMIS'97
Univ. of North Carolina
Dept. of Comp. Science
Charlotte, N.C. 28223
e-mail: [email protected]
fax: 704-547-3516
tel: 704-547-4567

Submissions should include a title page (1 copy) specifying the title, all authors with their affiliations, abstract (100-200 words), up to 10 keywords (begin the keyword list with at least one of the ISMIS areas listed above), and the preferred address of the contact author, including a telephone number, fax number, and e-mail address (if available). The remainder of the paper can include up to 11 pages, attached to the title page. If possible, the title page should be ADDITIONALLY submitted via email (in plain text) to <[email protected]> to facilitate submissions processing.
IMPORTANT DATES Submission of Papers: March 1, 1997 Acceptance Notification: May 25, 1997 Final Paper: July 1, 1997 PUBLICATION Papers accepted for Regular Sessions will be published by Springer-Verlag in LNCS/LNAI. Poster Session proceedings will be published by Oak Ridge National Laboratory. Both proceedings will be available at the symposium. WWW Information about ISMIS'97 can be found on http://www.ipipan.waw.pl/~klopotek/ismis97.html ------------------------------------------------------------------ 3. ------------------------------------------------------------------ The Third International Conference on Knowledge Discovery and Data Mining (KDD-97) August 14-17, 1997 Newport Beach, California, U.S.A. Sponsored by the American Association for Artificial Intelligence ---------------------------------------------------------------------------- Call for Papers The rapid growth of data and information has created a need and an opportunity for extracting knowledge from databases, and both researchers and application developers have been responding to that need. Knowledge discovery in databases (KDD), also referred to as data mining, is an area of common interest to researchers in machine discovery, statistics, databases, knowledge acquisition, machine learning, data visualization, high performance computing, and knowledge-based systems. KDD applications have been developed for astronomy, biology, finance, insurance, marketing, medicine, and many other fields. The third international conference on knowledge discovery and data mining (KDD-97) will follow up the success of KDD-95 and KDD-96 by bringing together researchers and application developers from different areas focusing on unifying themes. 
Suggested Topics The topics of interest include, but are not limited to: Theory and Foundational Issues in KDD * Data and knowledge representation for KDD * Probabilistic modeling and uncertainty management in KDD * Modeling of structured, unstructured and multimedia data * Fundamental advances in search, retrieval, and discovery methods * Definitions, formalisms, and theoretical issues in KDD Data Mining Methods and Algorithms * Algorithmic complexity, efficiency and scalability issues in data mining * Probabilistic and statistical models and methods * Using prior domain knowledge and re-use of discovered knowledge * Parallel and distributed data mining techniques * High dimensional datasets and data preprocessing * Unsupervised discovery and predictive modeling KDD Process and Human Interaction * Models of the KDD process * Methods for evaluating subjective relevance and utility * Data and knowledge visualization * Interactive data exploration and discovery * Privacy and security Applications * Data mining systems and data mining tools * Application of KDD in business, science, medicine and engineering * Application of KDD methods for mining knowledge in text, image, audio, sensor, numeric, categorical or mixed format data * Resource and knowledge discovery using the Internet This list of topics is not intended to be exhaustive but an indication of typical topics of interest. Prospective authors are encouraged to submit papers on any topics of relevance to knowledge discovery and data mining. Demonstration Sessions KDD-97 also invites working demonstrations of discovery systems. Contact information for details is provided below. Submission and Review Criteria Both research and applications papers are solicited. All submitted papers will be reviewed on the basis of technical quality, relevance to KDD, novelty, significance, and clarity. 
Authors are encouraged to make their work accessible to readers from other disciplines by including a carefully written introduction. Papers should clearly state their relevance to KDD. Please submit 7 hardcopies of a short paper (a maximum of 9 single-spaced pages not including cover page and bibliography, 1 inch margins, and 12pt font) to be received by March 10, 1997. A cover page must include author(s) full address, email, paper title and a 200 word abstract, and up to 5 keywords. This cover page must accompany the paper. In addition, an ascii version of the cover page must be submitted electronically by March 3, 1997 (earlier if possible), preferably using a WWW form located at http://www-aig.jpl.nasa.gov/kdd97/. If the WWW form cannot be used, please submit the ascii cover page by email to [email protected], using the template available by ftp at http://www-aig.jpl.nasa.gov/kdd97/. Please mail the 7 hardcopies of the full papers to: AAAI (KDD-97) 445 Burgess Drive Menlo Park, CA 94025-3496 USA Phone: (+1 415) 328-3123 Fax: (+1 415) 321-4457 Email: [email protected] Web Site: http://www.aaai.org. 
Important Dates * Submissions Due: March 10, 1997 * Acceptance Notice: April 28, 1997 * Camera-ready paper due: May 26, 1997 KDD-97 Organization ------------------- General Conference Chair Ramasamy Uthurusamy (General Motors Corporation, USA) Program Co-Chairs David Heckerman (Microsoft Research, USA) Heikki Mannila (University of Helsinki, Finland) Daryl Pregibon (AT&T Research, USA) Publicity Chair Paul Stolorz (Jet Propulsion Laboratory, USA) Tutorial Chair Padhraic Smyth (UC Irvine, USA) Demo and Poster Sessions Chair Tej Anand (NCR Corporation, USA) Awards Chair Gregory Piatetsky-Shapiro (GTE Laboratories, USA) Panel Chair Willi Kloesgen Contact Information ------------------- For further information, send inquiries regarding * submission logistics to AAAI at [email protected] Phone: (+1 415) 328-3123 Fax: (+1 415) 321-4457 * KDD-97 sponsorship and industry participation to Ramasamy Uthurusamy [email protected] Phone: 810-696-0669 Fax: 810-696-0580 * technical program and content to [email protected] * demo and poster sessions to [email protected] * general and publicity issues to [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.15	97:06	IJSAPL::OLTHOF	Spellchecked Henry Although	`Wed Feb 12 1997 22:35`	710
	Knowledge Discovery Nuggets 97:06, e-mailed 97-02-12 News: * E. Colet, ESPN to regularly show the application of data mining http://www.nba.com/allstar97/asgame/beyond.html * K. Parsaye, IDI Press Release: "Bridge Between OLAP and Data Mining" Publications: * R. Greiner, CLNL 4: Computational Learning Theory and Natural Learning Systems, v. IV: Making Learning Systems Practical, http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html * R. Kohavi, MLJ Spec Issue on Applications of Machine Learning and the Knowledge Discovery Process, deadline: March 4. http://reality.sgi.com/ronnyk/mljapps/ Positions: * H. Mannila, Postdoctoral position in data mining / pattern matching / spatial data, http://www.cs.helsinki.fi/~mannila * F. Provost, KB system developer positions at NYNEX Science and Technology Meetings: * S. Cartmell, PADD 97 update -- http://www.demon.co.uk/ar/PADD97/ * B. Zupan, IDAMAP-97: Reminder and brief Second CFP * G. Widmer, ECML'97 - Papers & Registration Info http://is.vse.cz/ecml97/home.html -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery in Databases (KDD) community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL, when available) to [email protected] To subscribe, email to [email protected] message with subscribe kdd-nuggets in the first line (the rest of the message and subject are ignored). See http://info.gte.com/~kdd/subscribe.html for details. Nuggets frequency is approximately 3-4 times a month. 
Back issues of Nuggets, a catalog of Siftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery is available at Knowledge Discovery Mine site http://info.gte.com/~kdd -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ********************************* * All opinions expressed herein are those of the writers (or the moderator) * * and not necessarily of their respective employers (or GTE Laboratories) * *************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Arguing with engineers is like mud-wrestling with pigs. Sooner or later you'll realize that they like it. Thanks to Tom Lanning >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Edward Colet"<[email protected]> Date: Tue, 11 Feb 1997 16:11:17 -0400 Subject: "ESPN to regularly show the application of data mining" On Sunday mornings from 9:00-9:30 (EST), ESPN will regularly broadcast a show called "NBA Matchups presented by IBM". The show will feature in-depth analysis of player and team match-ups based on trends and patterns found by Advanced Scout that pertain to the National Basketball Association (NBA) game of the week. The game of the week is aired later that afternoon on NBC. Bob Hill (former coach of the San Antonio Spurs), Fred "Mad Dog" Carter and Mark Jones (both of ESPN) are the hosts, and an invited guest will round out the panel (last week's guest was Red Auerbach). As some of you may know, several NBA coaches have been using IBM's Advanced Scout data mining application to discover trends and patterns in game data. Advanced Scout is also the basis for the "Beyond the Box Score" feature on the NBA website (www.nba.com. Look under "News and Features" if you don't see it off the home page). Thanks, Ed Colet. ***************************************** IBM T.J. 
Watson Research Center 30 Saw Mill River Road Hawthorne NY 10532 phone: 914-784-6621; tie-line 863 fax: 914-784-7455 email: [email protected] ***************************************** >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 5 Feb 1997 10:10:31 -0800 From: [email protected] (IDI) Subject: OLAP & DM Press Release ******************************************************************** Special Release CONTACT: IDI MARKETING COMMUNICATIONS (310) 937-3600 Breakthrough Merges OLAP and DataMining The Bridge Between OLAP and Data Mining Impacts all Corporate Decision Support Plans Los Angeles -- January 27, 1997 The 2nd Annual Data Mining Summit in San Francisco, California on February 19, 1997 is likely to be remembered as the event in which On Line Analytical Processing (OLAP) and datamining came together for the first time and took uniform shape. Up until now, most corporations had considered data mining and OLAP as individual and disparate components of their decision support system, because no coherent theory and methodology existed for a relationship. The 1997 Data Mining Summit will bridge this gap and will forever change the way corporations view and use decision support systems. At the Keynote Address for the Summit, Dr. Kamran Parsaye, CEO of Information Discovery, Inc. will introduce a fundamentally new theory and methodology for connecting OLAP and data mining, showing that they must be merged in order to avoid incorrect and misleading results during data analysis. "The bridge between OLAP and data mining is not a luxury but a necessity," said Dr. Parsaye. "OLAP analyses and datamining need to be performed together if we are to trust the results from either" he added. "In the early days of relational databases, before normalization theory was introduced, people were getting incorrect results. Now, unless OLAP and data mining are performed together a similar situation can prevail" he said. 
The keynote address will show that whenever data analysis takes place, it happens within some "dimension", and datamining along a single axis is merely a rough approximation of multi-dimensional mining. Lack of attention to dimensionality in data mining can result in unexpected results. And, decision support errors can take a long time to be uncovered -- if ever. A companion paper in the February issue of Database Programming and Design magazine details examples of this phenomenon and outlines a uniform approach for dealing with both OLAP and datamining.

At the keynote, Dr. Parsaye will also describe how OLAP and data mining fit together in the context of the Four Spaces of Decision Support. This methodology for applying OLAP data mining has three distinct processes of episodic, strategic and continuous mining for specific user groups within corporate environments.

"Integration between OLAP and data mining cannot take place at the desktop level and must be performed on the server" said Dr. Parsaye. "IS departments that hand their users OLAP data to be mined on the desktop could be unknowingly getting their users into serious trouble" he said.

The impact of the new result on corporate planning for decision support and data warehousing can be significant. Business users and IS departments can no longer just consider an OLAP product and a separate data mining system but will need to consider both at once to avoid the pitfalls outlined in the keynote. This will also accelerate the use of products for both OLAP and data mining.

For more information on the DataMining Summit please visit http://www.dbsummit.com on the internet, or call (415) 905 2267. For more information on Information Discovery, Inc. please visit http://www.datamining.com on the internet.

[note: any comments from readers on appropriateness of posting commercial press releases such as above?
GPS] >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 3 Feb 1997 18:56:37 -0500 (EST) From: Russell Greiner <[email protected]> To: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Subject: CLNL v4 is here! CC: [email protected], [email protected] Content-Length: 363 We are pleased to announce that the book "Computational Learning Theory and Natural Learning Systems Volume IV: Making Learning Systems Practical" (ed. Russell Greiner, Thomas Petsche, and Stephen Jose Hanson) is now available from MIT Press; see http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html for details. Cheers, Russ Greiner >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 4 Feb 1997 23:23:12 -0800 From: Ronny Kohavi <[email protected]> Subject: CFP: Special Machine Learning issue on applications of ML This is a short reminder that the submission deadline for the special issue of Machine Learning is in a few weeks. For more information, see http://reality.sgi.com/ronnyk/mljapps/ * Submission deadline: 4 Mar 1997 _____________________________________________________________________________ Machine Learning Special Issue on Applications of Machine Learning and the Knowledge Discovery Process Guest editors: Ronny Kohavi and Foster Provost With the explosion in size of business and scientific databases (VLDBs), the opportunities and pressure to mine the data and make novel discoveries have increased dramatically. For many problems, basic statistical summaries are not sufficient and there is a clear and recognized need for solutions involving a machine learning component. For example, modern businesses constantly seek to gain competitive advantage by tailoring actions to different customer segments and avoiding the trap of targeting the "average customer." 
This special issue of the journal Machine Learning will be dedicated to papers describing work in which machine learning technologies have been applied to solve significant real-world problems. In particular, it will focus on the application of Machine Learning technology, the simplifying assumptions that cannot be made in a real-world application, and the processes that are involved in going from the raw data to the final knowledge that decision makers seek.
_____________________________________________________________________________
Ronny Kohavi and Foster Provost [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Wed, 12 Feb 1997 09:58:29 +0500
Subject: 3 positions at NYNEX S&T

KB system developer positions at NYNEX Science and Technology

The Integrated Network Services Testing & Analysis (INSTA) group at NYNEX Science & Technology has three openings for knowledge-based (KB) diagnostic system developers. The group is involved in building monitoring, testing and diagnostic systems using state of the art AI technologies for advanced Telecom networks and circuits. The group has been building systems that support complete testing and diagnosis of circuits, both from the central office and in the field. Systems already built and deployed test and diagnose residential telephone lines and some of the business services. In addition to these, the group is currently looking at ISDN and broadband services.

The selected candidate would work on one or more of the following projects:
- Building KB system for assisting field technicians out in the field in testing and troubleshooting faults in telecomm circuits. The candidate will also explore complementing this KB with the KB performing centralized testing from the Central Office.
- Building KB system for automated centralized testing and diagnosis of Special (Business) service circuits.
- Building monitoring, testing, and diagnostic systems for broadband circuits.
- Building an intelligent interactive assistant to aid testers in testing and diagnosing circuits.

Suitable candidates must have the following:
===========================================
- Background in Computer Science or Computer Engineering or Electrical Engineering.
- Experience in all aspects of building knowledge-based systems including knowledge acquisition, knowledge engineering, domain and task modeling, testing, validation, and evaluation of the knowledge-based systems.
- Good understanding of various AI techniques such as model-based reasoning, case-based reasoning, neural nets, and machine learning.
- Good analytical skills.
- Quick learner - to quickly acquire relevant domain knowledge.
- Good system building experience

Experience with the following would be a plus:
===================================================================
- Knowledge of data-analysis tools (e.g. statistical tools)
- Unix, C, C++, LISP, ARTIM, CLIPS...
- Distributed Client server Architectures
- databases, database warehousing
- Telecomm experience: Operation Support Systems, Residential Lines, Special services, broadband services, telecomm network and circuit testing, alarm monitoring etc.

If interested, please mail a hard copy of your resume to:
Yuling Wu
NYNEX Science & Technology
400 Westchester Av.
White Plains, NY 10604 or email the postscript version to: [email protected] >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 7 Feb 1997 14:21:24 +0200 (EET) From: Heikki Mannila <[email protected]> Subject: Postdoc position in Helsinki: data mining / pattern matching / spatial data Content-Length: 1366 Postdoctoral position in data mining / pattern matching / spatial data University of Helsinki Department of Computer Science The pattern matching and data mining group in the Department of Computer Science, University of Helsinki, has an opening for a postdoc researcher in the areas of data mining, pattern matching, or spatial data. The research group combines methods from pattern matching, statistics, and databases to develop methods for the analysis of large data sets. The group does theoretical and applied research. Currently, special emphasis is given to work related to bioinformatics and geoinformatics. The group is one of the leading ones in data mining and string matching. For further information, see http://www.cs.helsinki.fi/~mannila http://www.cs.helsinki.fi/~ukkonen Applicants should have a recent Ph.D. or equivalent. The appointment is initially for one year, starting from September 1997. Applications should contain a curriculum vita, a list of three referees and a letter addressing the applicant's suitability for the position. Applications and inquiries may be submitted by email to [email protected] or [email protected] before February 28, 1997. 
Heikki Mannila Esko Ukkonen >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 5 Feb 1997 18:21:51 +0000 From: Steve Cartmell <[email protected]> Subject: PA EXPO97 UPDATE PRACTICAL APPLICATION EXPO97 ============================== CONFERENCE UPDATE =================== Westminster Central Hall, London, 21-25 April, 1997 The Practical Application EXPO97 brings together four events under one roof: PAAM97 - The Practical Application of Intelligent Agents and Multi-Agents; PADD97- The Practical Application of Knowledge Discovery and Data Mining; PACT97-The Practical Application of Constraint Technology and PAP97-The Practical Application of Prolog. PLEASE VISIT OUR RECENTLY UPDATED WEB PAGES FOR FURTHER INFORMATION ON Tutorials Invited Talks Exhibition Venue Hotel reservations Registration http://www.demon.co.uk/ar/Expo97/ http://www.demon.co.uk/ar/PAP97/ http://www.demon.co.uk/ar/PACT97/ http://www.demon.co.uk/ar/PAAM97/ http://www.demon.co.uk/ar/PADD97/ The Practical Application Company PO Box 137 Blackpool Lancs FY2 9UN UK Tel: +44 (0)1253 358081 Fax: +44 (0)1253 353811 email: [email protected] WWW: http://www.demon.co.uk/ar/TPAC/ >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Blaz Zupan <[email protected]> Subject: IDAMAP-97: Reminder and brief Second CFP Date: Wed, 5 Feb 1997 10:35:20 +0100 (MET) Reminder and brief Second Call for Papers for IDAMAP-97 INTELLIGENT DATA ANALYSIS IN MEDICINE AND PHARMACOLOGY Saturday, August 23, 1997 Workshop W15 at IJCAI-97 August 23-29, 1997, Nagoya, Japan Paper submission deadline is March 3, 1997. Submit 8-12 page papers by e-mail (postscript) and 3 hard-copies by surface mail to: Nada Lavrac, Blaz Zupan J. 
Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia email: [email protected] For up-to-date workshop information please check: http://www-ai.ijs.si/ailab/activities/idamap97.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 10 Feb 1997 18:00:38 +0100 (MET) From: Gerhard Widmer <[email protected]> Subject: ECML'97 - Papers & Registration Info ------------------------------------------------------------------------- NINTH EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97) Prague, Czech Republic, April 23-26 1997 **************************************************** ECML'97: LIST OF ACCEPTED PAPERS and REGISTRATION INFO **************************************************** ------------------------------------------------------------------------- The list of accepted papers, INCLUDING ALL ABSTRACTS, is now available from the ECML-97 WWW home page: http://is.vse.cz/ecml97/home.html This page also gives access to - the 4 post-conference ECML/MLNet WORKSHOPS and - ECML-97 REGISTRATION INFORMATION and the ECML REGISTRATION FORM. - A preliminary version of the CONFERENCE PROGRAMME will be available soon. For further questions about the program, contact Gerhard Widmer at [email protected], for questions regarding registration, contact the local organizers at [email protected]. For those without access to the WWW, please find below - titles and contact addresses for the 4 MLNet workshops, - the list of papers (w/o abstracts), - an ascii version of the registration form. ------------------------------------------------------------------ ECML / MLNet WORKSHOPS (Saturday, April 26): WS 1: Data-Driven Learning of Natural Language Processing Tasks Contact: Walter Daelemans, P.O. BOX 90153, NL-5000 LE Tilburg, The Netherlands. 
Tel: +31 13 4663070, Fax: +31 13 4663110, E-mail: [email protected]
WS1 WWW Page: http://www.cs.unimaas.nl/ecml97/

WS 2: Case-Based Learning: Beyond Classification of Feature Vectors
Contact: Dietrich Wettschereck, GMD, FIT.KI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Tel: +49-2241-14-2097, Fax: +49-2241-14-2072, E-mail: [email protected]
WS2 WWW Page: http://nathan.gmd.de/persons/dietrich.wettschereck/ecmlws.html

WS 3: Learning in Dynamically Changing Domains: Theory Revision and Context Dependence Issues
Contact: Gholamreza Nakhaeizadeh, Research Center of Daimler-Benz AG, Ulm, Germany
E-mail: [email protected]
WS3 WWW Page: http://www.amsta.leeds.ac.uk/statistics/ecml97/dyn.htm

WS 4: Machine Learning and Human-Agent Interaction
Contact: Michael Kaiser, Institute for Real-Time Computer Systems & Robotics, University of Karlsruhe, Kaiserstrasse 12, D-76128 Karlsruhe, Germany
E-Mail: [email protected]
WS4 WWW Page: http://wwwipr.ira.uka.de/events/hai97/

Common dates for all workshops:
Deadline for submissions: February 15
Notification of acceptance: March 8
Camera-ready copy due: April 1

------------------------------------------------------------------
PAPERS ACCEPTED FOR PRESENTATION AT ECML'97:

INVITED TALKS / PAPERS:

Learning Complex Probabilistic Models (tentative title)
Stuart J.
Russell, University of California, Berkeley, USA Constructing and Sharing Perceptual Distinctions Luc Steels, Free University of Brussels (VUB) and Sony Computer Science Laboratory, Paris On Prediction by Data Compression Paul Vitanyi, CWI, Amsterdam Ming Li, City University of Hong Kong LONG TALKS/PAPERS: Induction of Feature Terms with INDIE Eva Armengol & Enric Plaza, IIIA, Barcelona, Spain Integrated Learning and Planning Based on Truncating Temporal Differences Pawel Cichosz, Warsaw University of Technology, Warsaw, Poland Theta-subsumption for Structural Matching Luc De Raedt, Katholieke Universiteit Leuven, Belgium Peter Idestam-Almquist, Stockholm University, Sweden Gunther Sablon, Katholieke Universiteit Leuven, Belgium Constructing Intermediate Concepts by Decomposition of Real Functions Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko University of Ljubljana and Jozef Stefan Institute, Ljubljana, Slovenia Conditions for Occam's Razor Applicability and Noise Elimination Dragan Gamberger, Rudjer Boskovic Institute, Zagreb, Croatia Nada Lavrac, Jozef Stefan Institute, Ljubljana, Slovenia Learning Different Types of New Attributes by Combining the Neural Network and Iterative Attribute Construction Yuh-Jyh Hu, University of California, Irvine, USA Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems Remi Munos, CEMAGREF, Antony, France Compression-based Pruning of Decision Lists Bernhard Pfahringer, University of Waikato, New Zealand NeuroLinear: A System for Extracting Oblique Decision Rules from Neural Networks Rudy Setiono & Huan Liu, National University of Singapore Model Combination in the Multiple-data-batches Scenario Kai Ming Ting, University of Waikato, New Zealand Boon Toh Low, Chinese University of Hong Kong Natural Ideal Operators in Inductive Logic Programming Fabien Torre & Celine Rouveirol, LRI, Paris, France Ibots Learn Genuine Team Solutions Cristina Versino & Luca Maria Gambardella, 
IDSIA, Switzerland

Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
Ricardo Vilalta, Gunnar Blix, Larry Rendell, University of Illinois at Urbana-Champaign, USA

SHORT TALK/PAPERS:

Exploiting Qualitative Knowledge to Enhance Skill Acquisition
Cristina Baroglio, Universita di Torino, Italy

Classification by Voting Feature Intervals
Gülsen Demiröz & H. Altay Güvenir, Bilkent University, Ankara, Turkey

Metrics on Terms and Clauses
Alan Hutchinson, King's College, London, UK

Learning When Negative Examples Abound
Miroslav Kubat, Robert Holte, Stan Matwin, University of Ottawa, Canada

A Model for Generalization based on Confirmatory Induction
Nicolas Lachiche, INRIA Lorraine, France
Pierre Marquis, Universite d'Artois, France

Learning Linear Constraints in Inductive Logic Programming
Lionel Martin & Christel Vrain, Universite d'Orleans, France

Inductive Genetic Programming with Decision Trees
Nikolay Nikolaev, American University in Bulgaria
Vanyo Slavov, New Bulgarian University, Sofia, Bulgaria

Parallel and Distributed Search for Structure in Multivariate Time Series
Tim Oates, Matthew Schmill, Paul Cohen, University of Massachusetts, Amherst, USA

Probabilistic Incremental Program Evolution: Stochastic Search Through Program Space
Rafal Salustowicz & Jürgen Schmidhuber, IDSIA, Switzerland

The GRG Knowledge Discovery System: Design Principles and Architectural Overview
Ning Shan, Macro International Inc., Calverton, MD, USA
Howard Hamilton & Nick Cercone, University of Regina, Canada

Learning and Exploitation do not Conflict under Minimax Optimality
Csaba Szepesvari, University of Szeged, Hungary

Search-based Class Discretization
Luis Torgo & Joao Gama, University of Porto, Portugal

A Case Study in Loyalty and Satisfaction Research
Koen Vanhoof, Josee Bloemer, K.
Pauwels Limburgs Universitair Centrum, Belgium
---------------------------------------------------------------------
REGISTRATION FORM - ECML 97 (The deadline: March 25, 1997)
TO BE FAXED (42-2) 6731 0503 OR MAILED:
Action M Agency
Vrsovicka 68
101 00 - Praha 10
Czech Republic
(Please note that after March 1, 1997, the country number (42) will be changed to (420).)
FILL IN CAPITAL LETTERS, PLEASE
last name: first name: Prof./Dr./Mr./Ms.
affiliation: university/dept.:
street: town: Code: country:
phone: fax: e-mail:
name of accompanying person(s):
date (time) of arrival: date of departure: number of nights:
I will attend workshop: 1. 2. 3. 4. (tick, please)
ACCOMMODATION: Krystal hotel (Conference site) / an individual choice up to price per night:
Room: single / double
NAME OF PERSON SHARING THE ROOM:
special needs (vegetarian, disabled, etc.):
CONFERENCE FEES: BEFORE / AFTER FEBRUARY 20, 1997
CONFERENCE FEE (APRIL 23-25) DM 270.00 / 320.00
MLNet WORKSHOP FEE (APRIL 26) DM 35.00 / 35.00
ACCOMPANYING PERSON FEE DM 80.00 / 100.00
ACCOMMODATION DEPOSIT: DM 150.00
ACCOMMODATION BALANCE: (NUMBER OF NIGHTS MINUS THE DEPOSIT)
SOCIAL PROGRAM:
SIGHTSEEING TOUR OF PRAGUE DM 25.00
TA FANTASTIKA THEATRE DM 27.00
TRIP & FAREWELL PARTY DM 65.00
TOTAL AMOUNT:
PAYMENT BY CREDIT CARD: AMEX / VISA / Master Card / Eurocard / JCB / Diners club
Number: Expire: /
Four-numbers code (for amex cards only): / / / /
I, the undersigned, give the authorization to the Action M Agency to withdraw from my account the equivalent in Czech Crowns of the total amount of DM
Your Signature
I agree to withdraw from my credit card the accommodation balance (after March 25)
Your Signature
PAYMENT BY BANK TRANSFER: Name of the bank Date of payment Your Signature
410.16	9707	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sat Mar 01 1997 14:29`	1415
	Knowledge Discovery Nuggets 97:07, e-mailed 97-02-24 Publications: * GPS, Review of Adv. KDDM in NeuroVe$t journal Siftware: * R. Kohavi, SGI MineSet Available for Varsity Members http://www.sgi.com/Products/software/MineSet Positions: * T. Gutschow, Data Mining Research Position at HNC Software Inc. * C. Shearer, Vacancies - Data Mining Tool Development & Consulting : UK & US, at ISL * W. Zhang, Job: Machine Learning at Boeing Meetings: * M. P. Singh, 2nd CFP: Workshop on Agent Theories, Architectures, and Languages (ATAL), Providence, RI, July 24-26, 1997 http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/ * H. M. Chung, CFP: track on Data Mining at AIS-97, Indianapolis, Indiana, August 15-17, 1997 http://hsb.baylor.edu/ramsower/ais.ac.97 * L. DeRaedt, CFP: IJCAI-97 workshop on Frontiers of Inductive Logic Programming, 25 August 1997 * M. Manago, 2 days course on Data Mining & CBR in San Francisco for U. of Berkeley Extension, March 24-25, 1997 * M. Manago, Tutorial + Seminar on CBR & Data Mining, London, 17-19 March 1997, http://www.unicom.co.uk -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions should be emailed, with a DESCRIPTIVE subject line (and a URL, when available) to [email protected] To subscribe, email to [email protected] message with subscribe kdd-nuggets in the first line (the rest of the message and subject are ignored). See http://info.gte.com/~kdd/subscribe.html for details. Nuggets frequency is approximately 3 times a month. Back issues of KD Nuggets, a catalog of Siftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery is available at Knowledge Discovery Mine site http://info.gte.com/~kdd/ -- Gregory Piatetsky-Shapiro (editor) (p.s. this is my last week at GTE. Starting today, I can be reached at [email protected] . 
After March 1, 1997 I will continue to edit and distribute KD Nuggets and maintain KD Mine pages at a new web site -- details to be announced soon! The [email protected] and [email protected] email addresses would still work for a while. GPS) ******************* Official disclaimer ********************************* * All opinions expressed herein are those of the writers (or the moderator) * * and not necessarily of their respective employers (or GTE Laboratories) * *************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Q: What is the link between a large number of meetings and a large number of job announcements? A: Somebody got to work, while all those other people go to meetings >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sun, 16 Feb 1997 12:20:06 -0500 From: gps0 (Gregory Piatetsky-Shapiro) Subject: NeuroVe$t journal and Data Mining for Financial Applications] Content-Length: 3383 Here, reprinted with permission, is the review of AKDDM book from * NeuroVe$t Journal, Jan/Feb 1996, pg.49, Reviews in Brief section - Advances in Knowledge Discovery and Data Mining Advances in Knowledge Discovery and Data Mining (AKDDM) provides a well-edited collection of material from the 1994 KDD (Knowledge Discovery in Databases) Workshop, and several additional invited papers. In all, 23 papers presented in 7 chapters are included along with a useful appendix on KDD terminology and resources on the Internet. Coupled with an extensive index and a very good job of editing, AKDDM makes for a very accessible and worthwhile collection of papers. Of particular interest to investors and traders, especially those using data-driven computer technologies, are "A Statistical Perspective on Knowledge Discovery in Databases" by Elder and Pregibon, which provides a very good introduction to the topics. 
"Finding Patterns in Time Series" by Berndt and Clifford include in their studies a look at various technical analysis patterns of daily DJIA prices from 1989 to 1993, using pattern templates that vary in length from 9 to 12 trading days. "Integrating Inductive and Deductive Reasoning for Data Mining" by Simoudis, Livezey and Kerber involves the creation of portfolios of 100 stocks from 7 years of data on 1500 stocks. "Predicting Equity Returns from Securities Data with Minimal Rule Generation" by Apte and Hong describes a minimal rule generation technique for forecasting 1-month S&P 500 returns using 40 fundamental and technical variables (not specifically identified). Unfortunately, there is scant mention of the specifics of rough sets, nearest neighbor classifiers, learning vector quantizers, self-organizing maps, fuzzy logic and other tools of interest to practitioners and applied researchers working in the field. And, on more than a couple of occasions, the authors (including the editors) appear to venture beyond their respective areas of expertise. However, the few shortcomings are overshadowed by several very good introductory studies. Seldom do I recommend collections of workshop or conference papers to the general audience. However, AKDDM represents an exception. Despite its weaknesses, it provides a valuable introduction to a relatively new, yet increasingly important area of applied research. Financial practitioners who are particularly interested in data mining will certainly want to take a look. Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthursusamy (editors). 1996. The MIT Press, 55 Hayward Street, Cambridge, MA 02142. 620 pages. US$50. ISBN 0-262-56097-6. 617-253-5643. -- James Hampton * (c) Copyright 1997 Finance & Technology Publishing, P.O. Box 764, Haymarket, VA 20168. Reprinted with permission of the publisher from NeuroVe$t Journal, Jan/Feb 1997. Details on NeuroVe$t Journal (now named J. 
of Computational Intelligence in Finance) are at http://ourworld.compuserve.com/homepages/ftpub >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sat, 15 Feb 1997 12:14:00 -0800 From: Ronny Kohavi <[email protected]> Subject: SGI MineSet Available for Varsity Members Reply-to: [email protected] Silicon Graphics' MineSet Available to Varsity Members ---------------------------- MineSet(TM) version 1.1 is the second release of SGI's product for data mining and exploratory data analysis. MineSet integrates tools for data access, transformations, analytical data mining, and visual data mining. See http://www.sgi.com/Products/software/MineSet for more information. In addition to 30-day free evaluation copies available to any site, with the new release of SGI's Varsity program CDs (happening now), Varsity members can get PERMANENT MineSet licenses. Any educational institution is eligible. To qualify, the institution must have an infrastructure capable of handling technical software support for its Silicon Graphics users who have purchased Varsity Program software packages. THE VARSITY PROGRAM AGREEMENT MUST BE COMPLETED AND SIGNED BY THE INSTITUTION AND APPROVED BY SILICON GRAPHICS. The institution buys the right to distribute Varsity Program Developer Package right-to-use licenses in multiples of 10 or 25. These licenses are maintained by purchasing yearly support. Thus, the cost of ownership is significantly reduced in the second year and beyond. How Does this Work ------------------ SGI Varsity sites will get Varsity CD-ROMs with MineSet or they can download it directly from http://www.sgi.com/Products/Evaluation/evaluation.html To get a permanent license, the site administrator can use the VPX (varsity ID) number to get a license from http://www.sgi.com/Products/license.html (click the radio button for varsity). See http://www.sgi.com/silicon_campus/varsity.html for more information about SGI's Varsity program. 
For questions about MineSet, send e-mail to [email protected] or visit our site at: http://www.sgi.com/Products/software/MineSet -- Ronny Kohavi ([email protected]) >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Gutschow, Todd" <[email protected]> Subject: Data Mining Research Position at HNC Software Inc. Date: Wed, 12 Feb 1997 17:51:58 -0800 The Technology Development Group at HNC Software Incorporated has an opening for a Manager of Data Mining Technology Research. The Technology Development Group is responsible for the core data analysis, data mining, and data modeling technology used in all HNC vertical solution products. The position will report to the Vice President of Technology Development and will be located at HNC's headquarters facility in San Diego, CA. Duties/Job Description: Conduct research into new data mining algorithms in support of the Database Mining Marksman and other HNC products. Identify and coordinate data mining technology projects across all HNC operating groups. Monitor the data mining research literature to identify promising new techniques. Support product development and marketing activities via customer presentations, conference talks, and white papers. Required Qualifications (Experience/Skills): MS or Ph.D. in computer science, engineering, mathematics or other hard science (e.g., physics, chemistry, etc.). Five or more years experience in implementing and evaluating new statistical data analysis, neural networks, and/or data mining algorithms. Good software development skills. Experience with modern software development processes and tools (e.g., C++, Object oriented design, etc.). Strong communication and presentation skills. Preferred Qualifications (Experience/Skills): Strong algorithm diagnosis and troubleshooting skills. Experience with database marketing and its associated data analysis problems. Project management experience. 
If you know someone with the above qualifications who is interested in employment opportunities with HNC, please ask them to fax, mail or e-mail resumes immediately to: Human Resources Department HNC Software Inc. 5930 Cornerstone Court West San Diego, CA 92121 FAX: (619) 452-6524 E-mail: [email protected] Reference Job No. 293 >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Colin Shearer <[email protected]> Date: Thu, 13 Feb 97 14:36:13 GMT Subject: VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US VACANCIES - DATA MINING TOOL DEVELOPMENT & CONSULTING : UK & US =============================================================== Integral Solutions Limited (ISL) is a leading supplier of advanced decision support technology, specialising in data mining. Our award-winning Clementine tool combines multiple modelling techniques (neural networks, rule induction, regression) with data visualisation and manipulation to extract high-value decision making knowledge from large bodies of historical data. A rich visual programming interface makes Clementine accessible to non-technologist "data owners" - business, rather than IT, experts - and provides high productivity for "power" users. Clementine has established a leading position in the data mining market, and is in use in a wide range of industry sectors including finance, retail, telecoms, pharmaceuticals, utilities, broadcasting, defence. Applications are diverse and include demand prediction, customer profiling, risk assessment, turnover forecasting, process optimisation, fault pre-emption and fraud detection. We have an urgent need to recruit top-quality technical personnel. Current vacancies are: Data Mining Tool Developers --------------------------- Basingstoke, UK. To work on the ongoing development of Clementine. 
Candidates should have an interest in, and ideally experience of implementing, advanced modelling and data analysis techniques; experience of commercial data mining tool development is desirable but not essential. Experience of some or all of the following would also be useful: Unix GUI Development VMS Pop11 X Windows / Motif C Windows 95 / NT SQL Databases/ODBC Statistics Applicants should have a 2.1 or better at first degree; a relevant second degree may be an advantage. Technical excellence is expected, but must be combined with first rate communications and interpersonal skills and a desire for close contact with customers. Recent graduates and those with commercial experience will both be considered. Data Mining Consultants ----------------------- Basingstoke, UK; King of Prussia, PA, USA. To apply Clementine to customers' business problems. The role will include pre-sales consulting, training, and developing solutions. Candidates should be degree-qualified (2.1 or better) and, ideally, should have experience of data analysis and modelling in a business environment. Excellent communication and interpersonal skills are vital, and candidates should display initiative, creativity, enthusiasm (and the ability to convey it to clients) and self-management skills. As ISL's clients span many markets, our consultants need the ability to assimilate knowledge of any client's business, understand their problems, and fit a data mining solution to these. However, we also encourage applications from those with a specific business/sector specialisation (for example finance (banking, insurance), retail or manufacturing). We are willing to consider applications both from experienced consultants and from any other candidates who believe they have the aptitude to be developed into first-class consultants. This is an opportunity to join a small (30 people) but dynamic and rapidly developing company in an exciting business/technology area. 
ISL provides a stimulating and technically challenging environment with considerable scope for professional development. ISL is an equal opportunities employer. We encourage applications from new graduates through to experienced professionals. Salaries/benefits are competitive, and commensurate with relevant experience. Please apply with CV to:

For UK:
Linda Montgomery, Integral Solutions Limited, Berk House, Basing View, Basingstoke, RG21 4RG, UK
Fax: +44 1256 63467
Email: [email protected]

For US:
Kevin Peyton, ISL Decision Systems Inc., 630 Freedom Business Center, King of Prussia, PA 19406, USA
Fax: (610) 768 7774
Email: [email protected]

Tell us why you are the ideal candidate for a position at ISL. >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 17 Feb 1997 16:52:04 -0800 From: [email protected] (Wei Zhang) Subject: Job: machine learning at Boeing Outstanding Machine Learning Researcher needed The Boeing Company, the world's largest aerospace company, is actively working on research projects in advanced computing technologies including projects involving NASA, FAA, Air Traffic Control, and Global Positioning as well as airplane and manufacturing research. The Research and Technology organization located in Bellevue, Washington, near Seattle, has an open position for a machine learning researcher. We are the primary computing research organization for Boeing and have contributed heavily to both short term technology advances and to long range planning and development. BACKGROUND REQUIRED: Machine Learning, Knowledge Discovery, Data Mining, Statistics, Artificial Intelligence or related field. RESEARCH AREAS: We are developing and applying techniques for data mining and statistical analyses of diverse types of data, including: safety incidents, flight data recorders, reliability, maintenance, manufacturing, and quality assurance data. These are not areas where most large R&D data mining efforts are currently focused. 
Research areas include data models, data mining algorithms, statistics, and visualization. Issues related to our projects also include pattern recognition, multidimensional time series, and temporal databases. We can achieve major practical impacts in the short-term both at Boeing and in the airline industry, which may result in a safer and more cost-effective air travel industry. A Ph.D. in Computer Science or equivalent experience is highly desirable for the position. We strongly encourage diversity in backgrounds including both academic and industrial experiences. Knowledge of machine learning, statistics, and data mining are important factors. Experience with databases and programming (C/C++, JAVA, and Splus) is desirable. APPLICATION: If you meet the requirements and you are interested, please send your resume via e-mail in plain ASCII format to [email protected] (Wei Zhang). You can also send it via US mail to Wei Zhang The Boeing Company PO Box 3707, MS 7L-66 Seattle, WA 98124-2207 Application deadline is April 30, 1997. The Boeing Company is an equal opportunity employer. >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Note -- CFPs lately are getting too long! please send short versions with all the wonderful details at your conference website! GPS] From: [email protected] Subject: 2nd CFP: Agent Theories, Architectures, and Languages, 1997 (4th Intl Wshop) Date: Mon, 17 Feb 1997 18:20:54 -0500 (EST) Reply-To: [email protected] SECOND CALL FOR PAPERS The Fourth International Workshop on Agent Theories, Architectures, and Languages (ATAL) Providence, Rhode Island, USA July 24-26, 1997 http://www.csc.ncsu.edu/faculty/mpsingh/activities/atal/ Intelligent agents are one of the most important developments in computer science in the 1990s. 
Agents are of interest in many important application areas, ranging from human-computer interaction to industrial process control. The ATAL workshop series aims to bring together researchers interested in the agent-level, micro aspects of agent technology. Specifically, ATAL-97 will address issues such as theories of rational agency, software architectures for intelligent agents, methodologies and programming languages for realising agents, and software tools for applying and evaluating agent systems. Papers that consider macro-level, societal issues of agent-based systems are welcome only if they explicitly relate to the workshop themes. ATAL-97 will be held over the three days immediately preceding the AAAI-97 conference, also being held in Providence. The ATAL-97 proceedings will be formally published as volume four of the Intelligent Agents series from Springer-Verlag. TIMETABLE Submissions due April 18, 1997 Notifications sent May 23, 1997 Prefinal versions due July 1, 1997 Workshop July 24-26, 1997 [edited for brevity -- full details at URL above. GPS] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 17 Feb 1997 12:15:07 -0800 (PST) From: H Michael Chung <[email protected]> Call for Papers Association of Information Systems 1997 Americas Conference Indianapolis, Indiana, August 15-17, 1997 Mini-track on "Tools and Applications of Data Mining, Induction, and Knowledge Discovery: In Search of a Mighty Tool" Minitrack Chair: H. Michael Chung, CSULB Description This minitrack covers broader issues related to data mining, induction, and knowledge discovery in the areas of business and management applications. 
Tools based on regression analysis, information theoretic methods, genetic algorithms, and neural networks have been applied to discover patterns of financial fraud, to capture customer profiles for marketing, to predict fluctuations in stock prices, to control product quality, and to diagnose telecommunication network problems, among others . Expert decisions, environmental/normative datasets, and Internet database are considered for discovering information and knowledge. There are many issues that should be addressed in order to reap quality knowledge by applying sophisticated algorithms that would satisfy user needs. Some of the relevant topics include - Applications of Inductive Learning, Data Mining, and Knowledge Discovery - Data Warehousing - Statistical Inference of Data Mining - Knowledge Acquisition - WWW Database and Agents - Evaluation of Tools - Economics of Decisions - Data Visualization - Learning Systems ***********Important Dates*********** Electronic Submission Deadline: March 1st, 1997 Notification of Acceptance: April 15th, 1997 Camera Ready Copy Due: May 4th, 1997 ***********Submission Guidelines**************** Each submission must be FORWARDED ELECTRONICALLY AS A WORD PROCESSING FILE (MS WORD OR WORDPERFECT FORMAT) ATTACHED TO AN E-MAIL MESSAGE to the mini-track chair, H. Michael Chung. If this is not possible, then authors should contact the mini-track chair and arrange for a suitable workaround. 
Each submission is limited to THREE-PAGES IN LENGTH (APPROXIMATELY 1,750 WORDS) INCLUDING ALL FIGURES, TABLES, APPENDICES, AND REFERENCES, and must include the following: a) The name, e-mail address, mailing address, university/organizational affiliation, and phone/fax numbers of the contact person for the submission in the first few lines of the file, b) The submission title and the author's(s') name(s), the author's(s') e-mail address(es), mailing address(es), and author's(s') organization/university affiliation(s), c) An abstract of the submission, d) The body of the submission, and e) A list of references or a bibliography. All conference submissions and the submission review processes will be managed through e-mail. The receipt of submissions will be quickly confirmed by the mini-track chair. Submissions should follow the style guidelines of the MIS Quarterly. All camera-ready copy preparation details will be provided to submitting authors by the mini-track chairs through e-mail upon acceptance. Please send any questions and all submissions to Data Mining mini-track to H. Michael Chung Department of Information Systems College of Business Administration California State University, Long Beach Long Beach, CA 90840-8506 TEL (562) 985-7691 FAX (562) 985-5543 INTERNET [email protected] For additional information on the 1997 AIS Americas Conference, please see the homepages, http://hsb.baylor.edu/ramsower/ais.ac.97. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 17 Feb 1997 15:05:21 +0100 (MET) From: Luc De Raedt <[email protected]> CALL FOR PARTICIPATION and PAPERS IJCAI-97 Workshop on FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING Monday 25 August 1997 GENERAL INFORMATION The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan, will take place on August 25, immediately prior to the start of the main IJCAI conference. 
TECHNICAL DESCRIPTION Inductive logic programming (ILP) is a recent subfield of artificial intelligence that studies the induction of first order formulae from examples. The purpose of this workshop is twofold: on the one hand, we wish to widen the scope of ILP by investigating its relations to neighboring fields, and on the other hand, we wish to make ILP more accessible for researchers from neighboring fields. The workshop therefore solicits papers that lie at the frontiers of ILP with neighboring fields. A non-exclusive list of interesting topics for the workshop includes : * ILP and Software Engineering: what has ILP to offer to Software Engineering ?, and in what way can Software Engineering help to design ILP systems and applications ? * ILP for Knowledge Discovery in Databases : ILP aims at learning complex rules involving multiple relations from small databases, whereas KDD typically induces simple rules about a single relation from a large database. Furthermore, ILP allows background knowledge to be exploited in a variety of ways. Can KDD and ILP be successfully combined ? * ILP and Computational or Algorithmic Learning Theory : though many results have been obtained concerning the learnability of inductive logic programming, most of the results are negative and most of the positive results are reducible to propositional learning methods. Is there a mismatch of COLT with ILP ? and if so, what can be done about it ? * ILP versus propositional learning methods : Since the very start of ILP, researchers and practitioners of machine learning have wondered about the relation between ILP and propositional learning methods. Theoretical and experimental questions that arise include: when to use ILP and when to use propositional learning methods ? under what circumstances can ILP be reduced to propositional learning ? what is the price to pay for using first order logic in terms of efficiency ? 
* ILP and Knowledge Representation : ILP has traditionally employed computational logic to represent hypotheses and observations. Alternative well-founded knowledge representation formalisms have received little attention (with the exception of CLASSIC). What can ILP learn from Knowledge Representation ? and in what well-founded Knowledge Representation formalisms is induction feasible ? * ILP in multistrategy learning : Multistrategy learning combines multiple learning strategies. What role can ILP play for multistrategy learning ? * ILP and Probabilistic reasoning: in contrast to propositional learning methods, ILP has not used probabilistic representations. How can ILP incorporate such representations ? and how can it interact with methods such as Bayes nets or Hidden Markov Models ? * ILP for Intelligent Information Retrieval: The rapid development of the World Wide Web has spawned significant interest in intelligent information retrieval. In particular, algorithms for reliably classifying textual documents into given categories (like interesting/uninteresting) would be useful for a wide variety of tasks. Currently, most learning algorithms are not able to make use of structural information like word order, successive words, structure of the text, etc. Can ILP algorithms offer advantages over conventional information retrieval or machine learning algorithms for this sort of task? * Applications of ILP in subfields of AI : ILP has been applied to other subfields of AI, including natural language processing, intelligent agents and planning. Further applications of ILP within AI are solicited. Both position papers about the relation of ILP to other fields, as well as research papers that make specific technical contributions are solicited. However, to stimulate discussion, it is expected that each technical paper also clarifies the position of ILP with regard to the neighboring field(s) it addresses. 
Except for the presentation of position and technical papers, the workshop will also feature a panel discussion on the frontiers of ILP and possibly an invited talk. ORGANISERS Luc De Raedt (chair and primary contact) Saso Dzeroski Koichi Furukawa Fumio Mizoguchi Stephen Muggleton PROGRAMME COMMITTEE Francesco Bergadano (Italy) Luc De Raedt (co-chair, Belgium) Saso Dzeroski (Slovenia) Johannes Furnkranz (Austria) Koichi Furukawa (Japan) David Page (U.K.) Fumio Mizoguchi (Japan) Ray Mooney (U.S.A.) Stephen Muggleton (co-chair, U.K.) CALL FOR PARTICIPATION Participation is open to all members of the AI Community. However, to encourage interaction and a broad exchange of ideas the number of participants will be strictly limited (preferably under 30 and certainly under 40). Participants will be selected on the basis of submissions. Three types of submissions will be considered : 1) technical contributions (ideally, a 3 to 5 page extended abstract, in the IJCAI Proceedings Format, 3000-4000 words), 2) position papers (ideally, a 1 to 3 page abstract in the IJCAI Proceedings Format, 1000 - 3000 words) 3) a statement of interest (ideally, a one page motivation of why you would like to participate, 300- 500 words) Only submissions of type 1) and 2) will be considered for presentation at the workshop and inclusion in the workshop notes. Submissions should be received no later than April 1, 1997, and must include first author's complete contact information, including address, email, phone, and fax number. Though 1 April is the hard deadline, the authors are encouraged to submit their material by 24 March, in order to facilitate the reviewing process. Double submissions with the ILP-97 Workshop (which is to take place in Prague, September 1997) are allowed. SUBMISSIONS Submit papers by email (postscript) and surface mail (2 copies) to Luc De Raedt Dept. 
of Computer Science Katholieke Universiteit Leuven Celestijnenlaan 200A B-3001 Heverlee Belgium Email : [email protected] IMPORTANT DATES - Paper submission : 1 April - Notification to Authors : 21 April - Camera ready copy : the submissions themselves will serve as camera ready copy (submissions in the IJCAI Proceedings Style are strongly preferred, see http://www.ijcai.org/ijcai-97/ for details) PUBLICATION The accepted submissions will be included in the workshop notes to be distributed at the workshop. Post-conference publication of a selection of the workshop papers will be considered and discussed at the workshop. COSTS To cover costs, a fee of $US 50 will be charged, in addition to the normal IJCAI-97 conference registration fee. Attendees of IJCAI workshops will be required to register for the main IJCAI conference. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "MANAGO" <[email protected]> Subject: 2 days course on Data Mining & CBR in San Francisco for University of Berkeley Extension Date: Tue, 18 Feb 1997 17:23:09 +0100 Continuing Education in Engineering University of California Berkeley Extension Intensive short course at the San Francisco Airport Course Organizer Michel Manago, Acknosoft Course Lecturers Dr Usama Fayyad, Senior Researcher, Microsoft Research Dr Michel Manago, President, Acknosoft international Dr Evangelos Simoudis, Vice President, Data Mining and Decision Support Solutions at IBM Data Mining and Case-Based Reasoning (CBR): Principles and Applications An intensive two-day course Monday-Tuesday, March 24-25, 1997 San Francisco Airport Course Description The objective of this course is to present technologies for making better use of data for decision-making purposes. Data mining techniques are used to extract decision knowledge: for instance, in the form of a decision tree or decision rules from a database. 
Case-based reasoning is the name given to problem-solving methods that make direct use of past experiences (cases) rather than a corpus of general knowledge. Data mining (DM) and case-based reasoning (CBR) technologies can be used to: * Explore and analyze databases and generate hypotheses about the data; * Anticipate future events (decision support); * Solve a new problem, whose solution is unknown, by retrieving and adapting similar problems that have been previously solved. According to the META Group, the market for data mining is estimated at $800 million by the year 2000. It is considered to be one of the three key technologies that will have the biggest impact on information technologies in the third millennium. The course addresses both practical and theoretical issues. We will compare and contrast the technologies, present the architecture of CBR and DM systems, describe some algorithms, and more. We will also show how cases are indexed for efficient retrieval, how the similarity between new and past cases is assessed, how cases can be represented, and how to use domain knowledge in addition to data to characterize application domains and reveal the underlying methodology for building an application. We will identify the market and present real applications in various domains such as technical maintenance (diagnosis of Boeing 737 aircraft engines), customer support (help desk for troubleshooting SEPRO robots in the plastic industry), configuration (layouts of composite parts of an autoclave at Lockheed), financial decision support, retail, and fraud detection. 
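[Editor's illustration, not part of the course announcement.] The retrieval step described above, assessing the similarity between a new case and past cases and reusing the closest match, might be sketched as follows. This is a minimal sketch only: the weighted attribute-matching measure, the function names `similarity` and `retrieve`, and the help-desk style case layout are all assumptions chosen for illustration, not the method taught in the course or used in any Acknosoft product.

```python
def similarity(case_a, case_b, weights):
    """Weighted fraction (0..1) of attributes on which two cases agree.

    An attribute counts as a match only if both cases record it and the
    recorded values are equal. The weights are hypothetical importance
    scores supplied by a domain expert.
    """
    total = sum(weights.values())
    matched = sum(
        w for attr, w in weights.items()
        if attr in case_a and attr in case_b and case_a[attr] == case_b[attr]
    )
    return matched / total


def retrieve(case_base, query, weights, k=1):
    """Return the k stored cases whose problem part is most similar to query."""
    ranked = sorted(
        case_base,
        key=lambda case: similarity(case["problem"], query, weights),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical case base: each case pairs a problem description with the
# solution that worked last time.
CASE_BASE = [
    {"problem": {"symptom": "no power", "model": "A", "age": "new"},
     "solution": "replace fuse"},
    {"problem": {"symptom": "arm jams", "model": "A", "age": "old"},
     "solution": "lubricate joint"},
    {"problem": {"symptom": "arm jams", "model": "B", "age": "new"},
     "solution": "recalibrate sensor"},
]
WEIGHTS = {"symptom": 3.0, "model": 1.0, "age": 1.0}

if __name__ == "__main__":
    new_problem = {"symptom": "arm jams", "model": "B"}
    best = retrieve(CASE_BASE, new_problem, WEIGHTS, k=1)[0]
    print("Suggested solution:", best["solution"])
```

A linear scan like this is only practical for small case bases; the indexing the course mentions would replace the `sorted` call with a lookup structure, and the adaptation step (modifying the retrieved solution to fit the new problem) is not shown.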
Who Should Attend This course is intended for: * Business analysts who want to have an in-depth overview of data mining technology and learn what it can really do and cannot do * IT managers and technical staff who are in charge of engineering business information systems and who want to learn how to implement data mining solutions * End-users who need to make better use of their data for decision making * Customer service managers, maintenance managers, manufacturing managers, financial decision makers who want to learn how to solve problems more efficiently and at reduced costs Anyone with a specific application in mind can benefit from the course, which provides an overview of the technologies as well as of the applications. Non-technical people will benefit from the basics of the course, such as general principles and overview of applications (quantification of business benefits, for example). There are no prerequisites; this tutorial describes basic notions and illustrates these with meaningful examples from a variety of applications in technical maintenance, customer support, manufacturing, banking, and the consumer market. Computer skills are not required. Schedule Monday-Tuesday, March 24-25, 1997 Registration: 8:00 am Monday Lectures: 8:30 am-4:30 pm daily Lunches: noon-1:00 pm daily Location Embassy Suites Hotel, San Francisco Airport, 150 Anza Blvd., Burlingame, California. Fee The fee is $895 (EDP 326611). This includes: * 2 days of instruction (1.4 ceu) * Comprehensive course notes * Daily lunches and refreshments Topic Outline Day One From Data to Decisions This brief introduction will provide to the attendees a common ground that will enable them to understand and participate in the rest of the tutorial. 
We will define knowledge discovery (KDD) in databases and case-based reasoning (CBR) Introduction to Knowledge Discovery in Databases In this section we will: Provide a general architecture for a generic KDD system that will enable the subsequent discussion of the fundamental KDD issues, presentation of the various KDD techniques, and description of various existing KDD systems. Present the basic knowledge discovery process, from the initial stages of selecting data and cleaning of the selected data, to the identification of important attributes and the final stages of integrating the extracted knowledge into a decision support system. Briefly discuss the various types of data mining techniques that are commonly used for KDD. A brief introduction of CBR will be made. Outline the core research issues in the field of KDD, as well as present how these issues relate to fundamental AI issues such as representation and search. Preparing Data for Mining The quality of the knowledge extracted by a KDD system from a data set is related to the quality of the provided data. In this part of the tutorial we will: Examine various data problems, e.g., noisy data, incomplete data, low-information content data, etc. Discuss how each such problem affects the KDD operation. Present techniques for solving certain of these problems, e.g., data cleaning techniques. The large size of the databases that must be analyzed necessitates the use of sampling techniques and the application of dimensionality reduction techniques on a data set before a data mining method is applied to it. We will present commonly used sampling methods and discuss how they can be implemented. We will also discuss commonly used dimensionality reduction techniques from statistics, e.g., principal component analysis, and the use of domain knowledge for identifying important attributes of a data set. 
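As an illustration of sampling a database too large to hold in memory, here is a minimal sketch of reservoir sampling. The announcement does not say which sampling methods the course covers, so this technique is an assumption chosen for illustration; the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Draw a uniform random sample of up to k rows in a single pass.

    `rows` can be any iterable (for example a database cursor), so the
    full data set never has to be loaded at once.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # fill the reservoir with the first k rows
            sample.append(row)
        else:
            # keep the new row with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

Each row ends up in the sample with equal probability, and a fixed `seed` makes the draw repeatable, which helps when a mining run has to be reproduced.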
Due to the particular prevalence and importance of time-series data in a variety of application domains, we will discuss techniques for preprocessing such data before it is presented to a KDD system. Data Mining and Technique Selection We will present data mining techniques from five basic areas: (1) artificial intelligence, (2) neural networks, (3) statistics, (4) multidimensional and deductive databases, and (5) data visualization. With each type of technique we will present its pros and cons with respect to the generic KDD model defined in the tutorial's first part. Databases and Visualization Techniques Multidimensional and deductive databases merge knowledge-based techniques with database technology. Recently such databases have been successfully coupled with relational and legacy database management systems, providing analysts with unique ways to express and automatically test hypotheses on very large data sets. In addition, research on very large databases has resulted in a variety of KDD techniques, such as association discovery and sequence discovery. These techniques are based on simple database operations, such as aggregation, and are applicable to specific types of data, such as those commonly collected by large retail chains. We will provide an introduction to multidimensional and deductive databases, discuss data warehousing concepts, present how these techniques can be applied on KDD tasks, and review the current research on databases. Visualization has traditionally been used for the presentation of results obtained by other methods, e.g., statistical analysis. We will discuss how interactive visualization techniques can be used for knowledge discovery operations. We will begin with simple techniques (scatter plots and line plots, for example) and proceed with modern 3-D visualization techniques. Some Examples of KDD Applications We will first develop a set of criteria for comparing KDD systems. 
We will then review in depth two such systems developed by the authors and considered by the research community as representing the state-of-the-art: IBM's customer segmentation data mining system and JPL's SKICAT system. In addition to presenting the architecture of each system and discussing the KDD methods it integrates, we will present a detailed account of how the systems have been applied to financial, retail, manufacturing, astronomy, and large image databases in planetary sciences.

Demonstration of a Data Mining System and Applications

Summary of the Day and Discussion
Summary, recap, overview of the basic unifying themes, and pointers to available literature on KDD and future work.

Day Two

Overview of Case-Based Reasoning (CBR) Technology
In this introduction, we will present an overview of CBR, detail the CBR cycle, and explain the main characteristics of CBR technology.

Applications of CBR in Technical Domains
We will present several CBR applications in technical domains. These deal with maintenance, customer support, manufacturing, design, rapid evaluation of production costs, and sales support.

Troubleshooting CFM56-3 engines for the Boeing 737. Time spent by airline maintenance operators to solve engine failures and related costs (flight delays or cancellations) are a major concern. The use of intelligent diagnostic software contributes to improving customer support and reduces the cost of ownership by improving troubleshooting accuracy and reducing airplane downtime. We will examine this application from the engine manufacturer perspective (CFM International/Snecma) as well as from the client's perspective (British Airways). Integration of the CBR troubleshooting with electronic technical documentation. Demonstration.

A help desk for troubleshooting SEPRO robots in the plastic industry. Case study from a small company (160 employees) that has adopted CBR for its customer support services. Demonstration.
Improving feedback from experience in manufacturing. We will present the ongoing Noemie data warehousing and data mining project. Noemie aims at increasing the quality and reliability of equipment for the oil industry. Case study from the manufacturer perspective (Schlumberger) as well as from the end-user's perspective (Norsk Hydro).

CBR: How It Works
Based on the review of applications that will have been presented during the morning, we will go into the details of the algorithms and present how they have been used. In particular, we will describe mechanisms for: retrieving cases; assessing the similarity; and indexing cases. We will describe the link between induction, a form of KDD, and CBR. We also will present some sample algorithms.

Comparing CBR with Other Technologies
During this part of the tutorial, we will compare CBR and other technologies for decision making. In particular, we will look at rule-based expert systems, classical statistics, neural networks, and standard database queries. We will review a case study done at a banking institution for comparing credit scoring, CBR, and rule-based expert systems.

Case-Based Reasoning in Practice
During this final presentation, we will detail the basic steps and a methodology for building a CBR system. We will describe how to model cases, state how cases can be acquired from scratch or from existing databases, review potential sources for the cases, and explain how to choose an algorithm. We will also investigate organizational issues for assuring case quality and explain how human factors have to be taken into consideration when delivering a CBR application.

Summary of the Tutorial and Discussion

Lecturers
Usama Fayyad, Ph.D., is a Senior Researcher at Microsoft Research. He is also a Distinguished Visiting Scientist at the Jet Propulsion Laboratory (JPL), California Institute of Technology, and an adjunct professor of computer science at the University of Southern California.
Prior to joining Microsoft Research, he headed the Machine Learning Systems Group at JPL and was Principal Investigator of the Science Data Analysis and Visualization Task and other tasks involving machine learning applications. He received his Ph.D. in computer science and engineering from the University of Michigan, Ann Arbor. He is a recipient of the NASA Exceptional Achievement Medal (1994) and the 1993 Lew Allen Award for Excellence at JPL. He has co-chaired Knowledge Discovery in Databases conferences KDD-94 and KDD-95, and is general chair of KDD-96. He is a co-editor of Advances in Knowledge Discovery and Data Mining (AAAI/MIT Press 1996), and Editor-in-Chief of a new journal on this topic (Kluwer).

Michel Manago, Ph.D., is the scientific and managing director of AcknoSoft. Dr. Manago graduated from the University of Illinois at Urbana-Champaign and obtained his Ph.D. at the University of Paris, writing his thesis on "Integration of Symbolic and Numeric Techniques in Machine Learning." He has applied DM and CBR in technical domains such as diagnosis of Boeing 737 engines, customer support for marine diesel engines and robots, maintenance of trains, reliability analysis of gas meters, experience feedback to increase quality of production when manufacturing oil equipment, nuclear safety, design of plastic parts in the manufacturing industry, and active sale support over the Internet. He is author of the KATE line of products for DM and CBR. He is editor of the book Advances in Case Based Reasoning (Springer Verlag, 1995) and author of the report "A Review of Industrial Case-Based Reasoning." He received the Information Technologies European Award in 1995 (the European "Nobel prize" in computer technologies), among other honors.

Evangelos Simoudis, Ph.D., is Vice President, Data Mining and Decision Support Solutions at IBM, where he is responsible for the development and deployment of data mining solutions to IBM's customers worldwide. Prior to joining IBM, Dr.
Simoudis was a Group Leader of the Data Comprehension Group at the Lockheed AI Center where, since 1991, he led the development and market introduction of the Recon data mining system and led research on knowledge discovery in databases, machine learning, case-based reasoning and their application to financial, retail, and fraud detection problems. In 1994 Dr. Simoudis and his team were awarded Lockheed's Pursuit of Excellence Award for their work on the Recon system. Dr. Simoudis is also an adjunct assistant professor at the computer engineering department of Santa Clara University. Dr. Simoudis holds a Ph.D. in computer science from Brandeis University, an M.S. in computer science from the University of Oregon, a B.S. in electrical engineering from the California Institute of Technology, and a B.A. in physics from Grinnell College. Enrollment Information Enrollment may be made by companies or individuals. Enrollment is limited and advance enrollment is required. Upon request, a place in the course will be reserved for individuals who require time to obtain authorization. To reserve a place, call (510) 642-4151, or fax (510) 642-6027. How to enroll By phone: You may enroll by phone if you use MasterCard, Visa, or American Express; call (510) 642-4111. By fax: If you use MasterCard, Visa, or American Express, fill out the form on the back of this brochure and send it via fax number (510) 642-0374. Please be sure to fax the entire form including the mailing label, if there is one. Please provide all the information requested on the form. By mail: Fill out and return the enrollment form provided. By purchase order: Companies, agencies, and other organizations may pay course fees by purchase order. Enrollments must be accompanied by the full fee or by purchase order authorization. You may pay by check or use MasterCard, Visa, or American Express. Make checks payable to the UC Regents. 
For efficient enrollment processing, we must have the Priority Code from this publication, whether or not it is addressed to you. This five-digit code (three numbers and two letters) appears on the mailing label above the addressee's name. If there is no label on your copy, the code appears in a box in the middle of the address surface. Cancellation policy: Any cancellation is subject to a $30 processing fee. Cancellations received less than five working days from the start of the course are subject to a $100 cancellation fee. Substitutions may be made at any time. If the course is not held for any reason, UC Berkeley Extension's liability is limited to refund of the full course fee. Confirming your enrollment: If you enroll by mail and have not received an enrollment confirmation five days prior to the scheduled date of the course, please call (510) 642-4151 to confirm that the course will convene as scheduled. Housing A group of rooms will be set aside at the Embassy Suites Hotel, San Francisco Airport, 150 Anza Blvd., Burlingame, California, and reservation information will be sent to enrollees. Participants may reserve rooms in advance with Embassy Suites, phone (415) 342-4600 or fax (415) 342-8109. Special rates will be available; participants in these courses should so identify themselves when requesting room reservations. Reservations must be made no later than one month before the date of your course. After this date room reservations will be accepted only on a rate and space availability basis. Airport transportation and parking Courtesy shuttle service is provided between the hotel and the airport. There is ample free parking available at the hotel. Continuing education units (ceu) These units are a nationally recognized means of recording noncredit study and are accepted by many employers and relicensure agencies as evidence of a serious commitment to career advancement and the maintenance of professional competence. 
One ceu is awarded for each 10 hours of attendance. If you want us to keep a record of your ceu study you must fill out and return a form that will be distributed in class.
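The ceu rule can be checked against the course schedule. The sketch below is a worked calculation, assuming that the daily lunch hour does not count as attendance; that assumption is not stated in the brochure, but it reproduces the 1.4 ceu quoted in the fee section. The function names are hypothetical.

```python
from datetime import datetime


def attendance_hours(start, end, lunch_start, lunch_end, days):
    """Total lecture hours, excluding the lunch break (an assumption)."""
    fmt = "%H:%M"

    def hours_between(a, b):
        delta = datetime.strptime(b, fmt) - datetime.strptime(a, fmt)
        return delta.total_seconds() / 3600

    per_day = hours_between(start, end) - hours_between(lunch_start, lunch_end)
    return days * per_day


def ceu(hours_attended):
    """One ceu is awarded for each 10 hours of attendance."""
    return hours_attended / 10


# Lectures 8:30 am-4:30 pm, lunch noon-1:00 pm, two days
hours = attendance_hours("08:30", "16:30", "12:00", "13:00", days=2)
print(hours, ceu(hours))  # 14.0 1.4
```

Seven lecture hours on each of two days give 14 hours, which at one ceu per 10 hours matches the 1.4 ceu listed for the course.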
Program Coordinator Linda Reid, Continuing Education in Engineering, University Extension, University of California, Berkeley Program Representative Natalie Dennis, Continuing Education in Engineering, University Extension, University of California, Berkeley If you have questions Call (510) 642-4151, e-mail [email protected], fax (510) 642-6027, or write to Continuing Education in Engineering, University Extension, UC Berkeley, 1995 University Ave., Berkeley, CA 94720-7010 The University of California, in accordance with applicable federal and state law and University policy, prohibits discrimination, including harassment, on the basis of race, color, national origin, religion, sex, disability, age, medical condition (cancer-related), ancestry, marital status, citizenship, sexual orientation, or status as a Vietnam-era veteran or special disabled veteran. This nondiscrimination policy covers admission, access, and treatment in University programs and activities. Inquiries may be directed as follows: sex discrimination and sexual harassment: Carmen McKines, Title IX Compliance Officer, (510) 643-7895; disability discrimination and access: Ward Newmeyer, A.D.A./504 Compliance Officer, (510) 643-5116 (voice or TTY/TDD); age discrimination: Alan T. Kolling, Age Discrimination Act Coordinator, (510) 642-6392. Other inquiries may be directed to the Academic Compliance Office, 200 California Hall, #1500, (510) 642-2795. CONTRACT TRAINING Enlist our experts at your location At UC Berkeley Extension we're committed to working with you and your staff to help achieve your objectives. Through the Berkeley Partnership for Professional Development, we'll meet with you to analyze your staff's training needs, then custom-design a program to satisfy your special requirements. Or you can select from our many established courses. 
Contract training offers:
* Choice of format: from workshops and sequential classes to multiday residential seminars
* Highly qualified instructors
* Convenient location: on-site at your company or at a facility of your choice
* Courses tailored to your needs

To discuss your training needs, call Karl Johnson at (510) 642-4151 or fax (510) 642-6027

ENROLL BY FAX with MasterCard, Visa, American Express, or a company purchase order: (510) 642-0374. Or enroll by phone with MasterCard, Visa, or American Express: (510) 642-4111. Please give us the Priority Code (see below) if you enroll by phone. To enroll by mail, return this entire page. Please do not remove the mailing label. Mail to: Dept. B, UC Berkeley Extension, 1995 University Ave., Berkeley, CA 94720.

Name (last, first, middle)
Position
Company name
BUSINESS ADDRESS (number, street, mail stop, city, state, zip)
Daytime phone
Fax number
These numbers are requested so that you can be notified if there is a change in the schedule or status of your course.

Priority Code 6 0 9 ___ ___
For efficient processing, we must have the Priority Code from this publication, whether or not it is addressed to you. This 5-digit code (3 numbers and 2 letters) appears on the mailing label above the addressee's name. If there is no label on your copy, the code appears in a box in the middle of the address surface.

I enclose $ ___________ to cover _______ enrollments in:
_____ Data Mining and Case-Based Reasoning $895 EDP 326611

To pay by check, make check payable to the UC Regents.
To use MasterCard, Visa, or American Express, check the appropriate box and give: account number, date card expires, authorizing signature.
For companies/agencies: _____ Purchase order enclosed (For proper processing this form must accompany your purchase order.)
Michel Manago AcknoSoft 58 rue du Dessous des Berges 75013 Paris - France tel : (33 1) 44 24 88 00, fax : (33 1) 44 24 88 66 web : http://www.AcknoSoft.com

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "MANAGO" <[email protected]>
Subject: Tutorial on CBR & Data Mining in London + 2 days seminar on applications of CBR & Data Mining
Date: Tue, 18 Feb 1997 17:32:40 +0100

The following events are taking place in London on 17-19 March 1997. For registration please see the website (http://www.unicom.co.uk).

Principles & Applications of CBR & Data Mining
UNICOM Tutorial + Seminar
Organized by Dr Michel Manago, Acknosoft

OBJECTIVES:
The objective of the tutorial is to present technologies for making better use of data for decision making purposes. Induction is a data mining technique that is used to extract decision knowledge, for instance in the form of a decision tree or decision rules, from a database. Case-Based Reasoning is the name given to problem solving methods that make direct use of past experiences (cases) rather than a corpus of general knowledge. The technologies can be used to:
1. Explore and analyse databases and generate hypotheses about the data
2. Anticipate future events (decision support)
3. Solve a new problem, whose solution is unknown, by retrieving and adapting similar problems that have been previously solved.

During this course, we will describe the underlying techniques and methodologies to improve the decision making process by making better use of data. The course will address both theoretical and practical issues. We will compare and contrast the technologies, present the architecture of a CBR and a DM System, describe some algorithms etc.
We will show how cases are indexed for efficient retrieval, how the similarity between new and past cases is assessed, how cases can be represented, and how to use domain knowledge in addition to data. We will characterise application domains and reveal the underlying methodology for building an application. We will identify the market and delineate real applications in various domains.

A. From data to decisions
This brief introduction will give attendees a common ground that will enable them to understand and participate in the rest of the tutorial. We will define Data Mining (induction) and Case-Based Reasoning (CBR).

B. Introduction to induction
In this section we will:
1. Present how to generate a decision tree by induction
2. Present the inductive process, from the initial stages of selecting data to the identification of important attributes, and the final stages of integrating the extracted knowledge into a decision support system.

C. Presentation of Case-Based Reasoning (CBR) technology
In this introduction, we will present an overview of CBR, detail the CBR cycle and explain the main characteristics of CBR technology.

D. CBR : how it works
We will go into the details of the algorithms and present how they have been used. In particular, we will describe mechanisms for:
1. retrieving cases
2. assessing the similarity
3. indexing cases.
We will describe the link between induction, a form of KDD, and CBR. Finally, we will present some sample algorithms.

E. Preparing Data for CBR and Data Mining
The quality of the knowledge extracted by a decision support system from a data set is related to the quality of the provided data. In this part of the tutorial we will examine various data problems, e.g., noisy data, incomplete data, low-information content data, etc.

F. Comparing induction and CBR with other technologies
During this part of the tutorial, we will compare KDD & CBR and other technologies for decision making.
In particular, we will look at rule-based expert systems, classical statistics, neural networks and standard database queries. We will review a case study done at a banking institution for comparing credit scoring, CBR and rule-based expert systems.

G. Applications of CBR and data mining
During this final presentation, we will detail the basic steps and a methodology for building a CBR system. We will describe how to model cases, state how cases can be acquired from scratch or from existing databases, review potential sources for the cases and explain how to choose an algorithm. We will also investigate organisational issues for assuring case quality and explain how human factors have to be taken into consideration when delivering a CBR application. We will also try to characterise the market for CBR and data mining.

H. Summary of the tutorial and discussion

PRESENTER:
Dr Michel Manago graduated from the University of Illinois at Urbana-Champaign in 1983. He obtained his PhD in 1988 at the University of Paris on "Integration of Symbolic and Numeric Techniques in Machine Learning". Since 1991, Dr Manago has been the scientific and managing director of AcknoSoft where he has been "putting the technology to use". Michel Manago is the father of the KATE line of products for taking smart decisions from data. He was chairman of the 2nd European workshop on CBR in 1994, editor of the book Advances in Case Based Reasoning (Springer Verlag, 1995) and author of the report "A review of industrial Case Based Reasoning". Dr Michel Manago received the Information Technologies European Award in 1995 (the European "Nobel prize" in computer technologies), the 1st prize for innovative software application at the XPS trade show in Germany in 1995 and the 1996 Application of the Year award by the French computer magazine "Decision micros et rouseaux".

CBR and Data Mining: Putting the Technology to Use

BACKGROUND
Companies have gathered vast amounts of data that is not well used.
Some corporate databases almost work in write-only mode! Well exploited, this mass of data could be turned into strategic corporate knowledge:
- the marketing department wants to discover trends in buyer behaviour
- the after sales division must work more efficiently so that the company keeps customers
- the financial department wants to assess risks in a better way
- quality management and control must be improved...

However, going from data to decisions is not an easy task. Innovative computer technologies such as data mining and Case Based Reasoning (CBR) will help you solve complex problems in domains where experience plays a critical role in good decision making. They let you develop a solution with only a short delay and a guaranteed payback.

(C) Copyright AcknoSoft, 1997

OBJECTIVES:
The goal of this seminar is to get a clear view of the state of the art in applying data mining and CBR technologies to practical problems. The emphasis of the seminar will be on presentations given by users of the technology as opposed to technology providers. They will share their experience and delineate the benefits as well as the difficulties of putting the technologies into use. The themes that will be covered by the speakers include:
- What are CBR and data mining?
- Features of the software products they have used to build their application
- Comparison of data mining and CBR with other technologies
- Methodologies for case acquisition and maintenance
- Ensuring case quality and monitoring it over time
- Organisational issues that needed to be solved in order to field the application
- Human factors
- Overcoming technological risks
- Cost and benefits of using data mining and CBR in various domains

The goal of the seminar is to present a clear view of the issues that are common to building CBR and data mining applications in different domains (banking, insurance, customer support and help desk, manufacturing, energy).
We will focus on general topics such as how to assess the costs and quantify the benefits of using the technology, how to model cases so that they contain the right sort of knowledge for decision making purposes, how to use the tools to build systems that analyse cases efficiently or how to manage a CBR project from the customer's perspective.

Benefits of Attending
- Find out how the knowledge of your specialists can be made available to everyone in your organisation
- Learn how to solve problems more quickly without the burden of building expert systems
- Capitalise on your experience
- Elicit the user point of view
- Share experience with other CBR application developers
- Find out how to analyse and distill your data into usable knowledge
- Take smart decisions that are based on your experience

Programme

Day 1

Brief introduction by Michel Manago
Short presentation about Data Mining and CBR, introduction of the objectives of the seminar.

Using data mining and CBR at Deloitte & Touche
Olivier Curet and Jonathan Killin, Deloitte & Touche Consulting Group UK,=
410.17	97:08	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sat Mar 01 1997 14:30`	667
	Knowledge Discovery Nuggets 97:08, e-mailed 97-02-28 News: * GPS, New Location for KD Mine and KD Nuggets: www.kdnuggets.com * W. Kloesgen, KDD-97: Second Call For Panel Proposals * P. Maiste, Price Waterhouse announces new data mining services * T. Denecke, Query: Data Mining and Workflow Management ? * D. Throop, Query: Finding approximately duplicate records ? Publications: * P. Stolorz, CFP: DMKD special issue on scalable computing http://www.research.microsoft.com/research/datamine/dmkdpar Siftware: * G6G, Intelligent Software Web Site, http://www.intelligent-dir.com Positions: * W. Buntine, summer students and scientist positions in autonomous data analysis * B. Masand, KDD Job at GTE Laboratories, Waltham, Ma * S. Wrobel, Two positions in Machine Learning/Data Mining at GMD -- KD Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected] To subscribe, email to [email protected] message with subscribe kdd-nuggets in the first line (the rest of the message and subject are ignored). See http://www.kdnuggets.com/subscribe.html for details. Nuggets frequency is 3-4 times a month. 
Back issues of Nuggets, a catalog of Siftware (data mining tools), and a wealth of other information on Data Mining and Knowledge Discovery are available at the Knowledge Discovery Mine site http://www.kdnuggets.com/

-- Gregory Piatetsky-Shapiro (editor)

******************* Official disclaimer *********************************
* All opinions expressed herein are those of the writers (or the moderator) *
* and not necessarily of their respective employers (or GTE Laboratories) *
***************************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"An experimental science is supposed to do experiments that find generalities. It's not just supposed to tally up a long list of individual cases and their unique life stories. That's butterfly collecting."
Richard C. Lewontin, biology professor at Harvard University
Thanks to Yolanda Gil

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 28 Feb 1997 09:41:10 -0500 (EST)
From: GPS <[email protected]>
Subject: New Location of KD Mine -- www.kdnuggets.com

I have set up a new location for the Knowledge Discovery Mine web site -- www.kdnuggets.com -- which is operational today, Feb 28, 1997. I will continue to maintain and improve that site in my new job -- see www.kdnuggets.com/gps.html

The GTE location at info.gte.com/~kdd will remain for some time, but I will not be updating it. I will also continue to edit and email Knowledge Discovery Nuggets (I have dropped the second D to emphasize the more general focus). It will be gradually transitioned to the kdnuggets.com site, but in the meantime will continue to be distributed from GTE. The changeover should be transparent to all subscribers.
-- Gregory Piatetsky-Shapiro please address KD Nuggets related email to [email protected] (which is an alias for [email protected]) other email to me to [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 26 Feb 1997 14:41:27 +0100 From: [email protected] (Willi Kloesgen) Subject: KDD-97 organization -- call for panels As in previous KDD conferences, the KDD-97 program will include panel discussions. A great panel requires an interesting topic, good speakers, and proper preparation. To facilitate all three we solicit early suggestions. Please submit suggestions for topics and preferably also for panelists who could represent diverse positions on or approaches to the topic. Suggested topics should relate to any of the main KDD-97 topics (see http://www-aig.jpl.nasa.gov/kdd97). The panel topics should be of general interest to a large part of the KDD audience and allow several (controversial) approaches to be discussed. Please email informal suggestions by April 2, 1997 (earlier if possible) to: Willi Kloesgen [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 21 Feb 1997 13:34:22 +0100 From: Tom Denecke <[email protected]> Subject: Data Mining and Workflow Management I am a student of Business Science working on a research project, "controlling of workflow processes". My idea is to use data mining techniques to evaluate the control data of workflow systems. My problem is that I am not very familiar with the technical terms, so it would be great to get a hint about which methodologies would fit this application domain. Here is a short description of the kind of information that is available: There are several process instances of each process type (for example auditing). After the execution of 100 instances, there exists a lot of data for this process type, which can be explored. - processing and idle time - who executed the process (employee, role, orga. 
unit) - which kind of workflow - which activities were executed - data about the process object (which customer, article ...) - which other processes are running - metrics concerning quality and cost of a process/activity - ... We would like to generate rules about the process performance (bottleneck detection, when does a process perform well, ...). I would be very grateful for any information on whether similar problems have been solved with data mining techniques, or just for literature hints. Thank you very much Tom Denecke - MBA - WWU Muenster Rudolf-Harbig-Weg 24 48149 Muenster PHONE + 49 251 89 75 65 >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. GPS] From: [email protected] Date: Fri, 28 Feb 97 08:09:07 EST Subject: Press Release: Opening of a Knowledge Discovery Center IMMEDIATE RELEASE CONTACT: Price Waterhouse Management Consulting in New York: Jan Butler 212-819-4838, [email protected] Liza Kurtz 212-995-5680, ext. 210, [email protected] PRICE WATERHOUSE LLC ANNOUNCES NEW DATA MINING SERVICES AND OPENING OF KNOWLEDGE DISCOVERY CENTER New York, NY - February 26 - Price Waterhouse Management Consulting, a recognized leader in delivering data warehouse services to global companies, introduces Data Mining Services for helping clients achieve strategic value from the mounds of data often accumulated in the course of business. An integrated offering of Price Waterhouse's Global Data Warehouse Practice, the Data Mining Services range from introductory seminars on data mining and knowledge discovery to full data mining system implementations. 
To support these offerings, Price Waterhouse has opened the Knowledge Discovery Center in Bethesda, Maryland. "Data mining has recently moved to the forefront of business executives' strategic data warehouse initiatives, driven by a significant growth in the amount of data that companies collect on their customers, processes, and finances," said Mike Schroeck, Global Data Warehouse Practice Leader for Price Waterhouse. Data mining technologies use sophisticated, automated algorithms to discover hidden patterns, correlations, and interacting relationships among the hundreds of strategic data elements collected by an organization. The impact of data mining on a company's bottom line, whether through increased revenues or decreased costs, is often enormous. A leader in data mining knowledge and research, Price Waterhouse has performed a comprehensive, hands-on evaluation of many of the leading data mining tools currently available on the market, and has spoken at a variety of conferences and trade shows on the subject. With years of analytical modeling and data analysis experience, Price Waterhouse can help clients get the greatest return on their data mining investment. "We are dedicated to offering value-added data mining analyses to our clients. The time for businesses to take advantage of these tools and algorithms has never been better," says Dr. Glenn Galfond, Partner in charge of Price Waterhouse's Management Analytics practice, which is spearheading the firm's Data Mining Services. The Data Mining Services offered by Price Waterhouse include Data Mining 101, Data Mining Proof, Data Mining Service, and Data Mining Solutions. Data Mining 101 is a half-day beginner's course in data mining. The course provides an overview of the technology, examples of how it has been successfully used, and a demonstration of the leading data mining tools. 
Data Mining Proof is a short proof-of-concept project, in which Price Waterhouse mines a small extract of a client's data for quick but rewarding results. This allows the client to see data mining's potential in a hands-on environment. Clients also receive a copy of PW's comprehensive Data Mining Tool Evaluation report. For companies that are ready to delve more deeply into data mining but do not have the necessary in-house resources, Data Mining Service offers a full range of data mining outsourcing options, including data extraction, data cleansing, and data mining. For companies that wish to implement enterprise-wide data mining systems, Data Mining Solutions offers Price Waterhouse's proven data mining and data warehousing methodology and full-scale systems implementation experience. The Knowledge Discovery Center will be used to support these services and to provide an environment for demonstrating the latest data mining tools and training clients in their use. Price Waterhouse has equipped the Center with many of the leading data mining tools. The technologies and algorithms available in the Center encompass the full breadth of data mining capabilities. Galfond adds, "Price Waterhouse has invested heavily in the research and evaluation of the leading data mining tools. Our clients can take advantage of this investment while reaping the benefits that data mining brings to their companies." Price Waterhouse Management Consulting delivers enterprise-wide solutions to large multinational clients through integrated Information Technology and Change Integration services. With in-depth knowledge of selected industries and business process expertise, Price Waterhouse Management Consulting works with clients worldwide, from strategy through implementation, to help them improve business performance. Price Waterhouse Management Consulting services are provided in the U.S. by Price Waterhouse LLC. 
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Please cc responses to [email protected] since the problem is of general interest. GPS] From: "Throop, David R" <[email protected]> Subject: Looking for phrase matching tool Date: Tue, 25 Feb 1997 10:03:30 -0600 Dr. Piatetsky-Shapiro, Thank you for your excellent website on data mining. I'm hoping you might help me, or point me towards someone who can. I'm looking for a piece of commercial software that may or may not exist. I couldn't find it on your pages, but your stuff is the closest I've found. So I'm asking you for any pointers. We have several databases which have lists of components (pieces of the International Space Station). These databases have no common key. They do, however, have English-language descriptions of the components (on the order of 20 - 50 characters long). However, these descriptions are not identical. For instance, a certain power switch is known by two different names: RPCM N1-3B-C Switch14 and N1-3B-RPCM-C-RPC-14 As you see, the order of the identifiers is different, one set uses the term 'switch' where another uses 'RPC', and the '14' is concatenated with no space on one side. Anyway, I'm looking for a piece of software that could go through the databases (armed with a dictionary, list of abbreviations, synonyms etc.) and come up with a set of best guesses about which items match. Do you know of such a tool, either as a commercial product or a research program? 
Thanks David Throop 281 212 9369 >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 27 Feb 1997 22:43:45 -0800 (PST) From: DMKDPAR <[email protected]> Subject: CFP for DMKD special issue on scalable computing ============================================================================ CALL FOR PAPERS ============================================================================ DATA MINING AND KNOWLEDGE DISCOVERY Special Issue on Scalable High-Performance Computing for KDD Guest editors: Paul Stolorz and Ron Musick ========================================== http://www.research.microsoft.com/research/datamine/dmkdpar Traditional computational techniques and computer architectures are routinely overwhelmed by the sheer volume and complexity of information generated from data-gathering instruments, computational and experimental methodologies, and business operations. The fundamental problem of extracting knowledge and insight from massive databases and datasets is shared across a wide range of fields in business, academia and government. The new field of Data Mining and Knowledge Discovery in Databases (KDD) has arisen as an interdisciplinary response to this situation, merging ideas drawn from disciplines such as statistics, pattern recognition, machine learning, databases, visualization and high performance computing. This special issue of Data Mining and Knowledge Discovery is devoted to the challenge of applying data mining and knowledge discovery methods to large, complex datasets. Implementation of data mining ideas in high-performance computing environments is crucial for coping with large-scale data. In particular, parallel and distributed systems are needed to ensure system scalability as datasets grow inexorably in size and scope. 
These environments include dedicated massively parallel supercomputers, super-servers built from clusters of commodity workstations and high-speed network interfaces, and heterogeneous networks distributed over regional, national and global scales. High-performance and parallel computing holds the promise of scaling to large data sets, allowing the data mining component to search a much larger set of patterns and models than traditional computational platforms and algorithms would allow. In addition, it promises to render the KDD process much more interactive by allowing fast response times for difficult search and model fitting problems. Data Mining and Knowledge Discovery, published by Kluwer Academic publishers, is the flagship publication in the rapidly growing area of KDD. In this special issue we solicit the most dramatic new developments in high performance large-scale KDD applications, highlighting the promise of the technology and identifying the main challenges for the future. Technically innovative papers that describe new theoretical developments, or tackle the application of practical data mining approaches to real problems and datasets on parallel and distributed architectures, are solicited. Topics of interest include, but are not limited to, the intersection of KDD with the following fields: Parallel implementations of datamining & KDD methods: Classification and regression: e.g. 
decision trees, neural nets Pattern recognition Belief nets and other Bayesian approaches Genetic programming Association rules Statistical inference Similarity detection and measurement Clustering and density estimation Change-detection Text retrieval Content-based indexing Data visualization Trend Analysis Integration of KDD techniques with scalable I/O systems: Data warehouses & federated databases Parallel file systems High-performance network interfaces Intelligent data layout Out-of-core algorithms Parallel relational querying High performance storage systems Hierarchical and distributed storage Methods to control complexity: Random sampling Anytime algorithms applied to datamining techniques New complex data-type algorithms (eg. not based on feature vectors) Domain simplification techniques Inference error/confidence characterization Parallel, clustered and/or distributed applications: Datamining on commodity-based clusters and networks Web-oriented datamining Novel applications and case studies Knowledge discovery systems and tools SUBMISSION INSTRUCTIONS Electronic submissions are STRONGLY ENCOURAGED. Postscript copies of papers may be emailed to [email protected]. Latex style files and related instructions can be obtained at the web site http://www.research.microsoft.com/research/datamine. =============== IMPORTANT DATES =============== ********************************** SUBMISSION DEADLINE: May 8, 1997 ACCEPTANCE NOTIFICATION: June 20, 1997 ************************************ Enquiries about the submission process and scope of the special issue may be sent to [email protected]. >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. 
GPS] From: [email protected] Date: Mon, 24 Feb 1997 22:47:04 -0500 Subject: SAIC and G6G Develop an Intelligent Software Web Site "SAIC and G6G Develop an Intelligent Software Web Site" NEW Web-Site Address is: www.intelligent-dir.com Science Applications International Corporation's (SAIC) Asset Source for Software Engineering Technology (ASSET) Division has teamed up with G6G Consulting Group (G6G) and co-developed a ground breaking new World Wide Web (Web) site focused on "intelligent software." The new site contains the entire content of "The G6G Directory of Intelligent Software," a publication that contains over 750 abstracts covering 15 advanced technology corridors. "The G6G Directory of Intelligent Software" contains product abstracts in Expert (Knowledge-Based) Systems, Fuzzy Logic, Hypermedia, Hypertext and Multimedia, Intelligent Software Tools, Neural Networks, Object-Oriented Programming, Virtual Reality, Voice & Speech Systems, and other areas. The directory is further categorized by over 140 sub-categories of "what" the product can be used for or "what it is" such as: - Data Mining - Manufacturing Systems - Diagnostic Systems - Modeling - Help Desk Systems - Network Systems - Help Authoring Systems - Stock Market - Knowledge Management - Software/Hardware - Lending and Learning Systems - Software Development - Customer Support Systems - and many others. The directory content on this Web site will be updated on a weekly basis. The combination of G6G's directory and ASSET's on-line free and commercial product inventory will present a powerful complement of information on the Web. Knowledge engineers, software engineers, developers and other users of intelligent software products will find www.intelligent-dir.com to be extremely useful. This valuable free resource will help create a sense of community in the world of intelligent software by providing an on-line source of searchable information about intelligent software products and vendors. 
__________________________________________________ The G6G Directory of Intelligent Software -------------------------------------------------- http://www.intelligent-dir.com -------------------------------------------------- SAIC/ASSET G6G Consulting Group (304) 284-9000 (310) 458-4187 [email protected] [email protected] __________________________________________________ >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 18 Feb 1997 14:39:31 -0800 From: Wray Buntine <[email protected]> Subject: summer students and scientist positions in autonomous data analysis Please note the two sets of positions below. Research scientist 2 summer students, or longer term support for PhD The summer student position could be transferred into longer term support for focussed PhD research if the interest is right. Wray Buntine ======================= Scientist NASA's Center of Excellence in Information Technology at Ames Research Center invites candidates to apply for a position as Research Scientist in Information Technology: Position description: * We seek applicants to join a small team of space scientists and computer scientists in developing NASA's next generation smart spacecraft with on-board, autonomous data analysis systems. The group includes leading space scientists (Ted Roush, Virginia Gulick) and leading data analysts (Wray Buntine, Peter Cheeseman), and their counterparts at JPL. * The team is doing the research and development required for the task, and has a multi-year program with deliverables planned. This is not a pure research position, and requires dedication in seeing completion of the R&D milestones. * The applicant will be responsible for the information technology side of R&D, with guidance from senior space scientists on the project. * The research has strong links with on-going work at the Center of Excellence and is an integral part of NASA's long term goals. 
Candidate requirements: * Strong interest in demonstrating autonomous analysis systems to enhance science understanding in operational tests, with the ultimate goal of putting such systems in space. * Ph.D. degree in Computer Science, Electrical Engineering, or related field, and applied experience, possibly within the PhD. In exceptional cases, an M.S. degree with relevant work experience will suffice. * Knowledge of neural or probabilistic networks, machine learning, statistical pattern recognition, image processing, science data, processing, probabilistic algorithms, or related topics is essential. * Strong communication and organizational skills with the ability to lead a small team and interact with scientists. * Strong C programming and Unix skills (experimental, not necessarily production), with experience in programming mathematical algorithms: C++, Java, MatLab, IDL. Application deadline: * March 15th, 1997 (hardcopy required -- see below). Please send any questions by e-mail to the addresses below, and type "PI for Autonomous data analysis" as your header line. Dr. Ted Roush: [email protected] Dr. Wray Buntine: [email protected] Full applications (which must include a resume and the names and addresses of at least two people familiar with your work) should be sent by surface mail (no e-mail, ftp or html applications will be accepted) to: Dr. Steve Lesh Attn: PI for Autonomous data analysis Mail Stop 269-1 NASA Ames Research Center Moffett Field, CA, 94035-1000 ============================== Summer students or Student Assistantship NASA's Center of Excellence in Information Technology at Ames Research Center invites current PhD students to apply for a summer position (possibly two available). Position description: * We seek applicants to join a small team of space scientists and computer scientists in developing NASA's next generation of smart space-craft on-board, autonomous data analysis systems. 
The group includes leading space scientists (Ted Roush, Virginia Gulick) and leading data analysts (Wray Buntine, Peter Cheeseman). * We are working with spectrometers and a CCD camera, and are building resource-bounded autonomous classification systems, and trainable object recognizers. * The successful student will have considerable flexibility within the goals of the project to contribute. * An ideal summer project would produce demonstration software together with a conference paper. Candidate requirements: * Knowledge of neural or probabilistic networks, machine learning, statistical pattern recognition, image processing, science data, processing, probabilistic algorithms, or related topics is essential. * Strong C programming and Unix skills (experimental, not necessarily production), with experience in programming mathematical algorithms: C++, Java, MatLab, IDL. * Interest in revisiting the project at a later date. Application deadline: * We will accept applications on a continuing basis until the beginning of summer, and will take good applicants as they apply. Please send any questions by e-mail to the addresses below, and type "PI for Autonomous data analysis" as your header line. Dr. Ted Roush: [email protected] Dr. Wray Buntine: [email protected] Full applications (which must include a resume and the names and addresses of at least two people familiar with your work) should be sent by surface mail (no e-mail, ftp or html applications will be accepted) to: Dr. 
Steve Lesh Attn: summer student for Autonomous data analysis Mail Stop 269-1 NASA Ames Research Center Moffett Field, CA, 94035-1000 >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 21 Feb 1997 14:11:12 -0500 From: [email protected] (Brij Masand) Subject: KDD Job at GTE Laboratories, Waltham, Ma ** An Outstanding Applied Researcher/Developer needed for the ****** Knowledge Discovery in Databases project at GTE Laboratories ******** Description: Participate in the design and development of state-of-the-art systems for data mining and knowledge discovery. The focus of the job is on applied research in KDD, including development of prototypes to demonstrate innovative business applications of KDD. The candidate will join one of the leading R&D teams in the area of data mining and knowledge discovery. Our current projects include predictive customer modeling for GTE's cellular telephone markets. We are applying multiple learning and discovery methods to very large, high-dimensional real-world databases, involving millions of records and Gbytes of data, and have created KDD-based solutions that are being deployed in the field. The ideal candidate will have a Ph.D. in Machine Learning or related fields and 2-3 years of experience, or an M.S. with equivalent experience. The candidate should have experience with machine learning algorithms, be familiar with statistical theory, have practical experience with databases, and be proficient with Web/Internet tools. Excellent coding skills in a C/Unix environment and an ability to quickly pick up new systems and languages are needed. Good communication skills, the ability to work in a team, and good coding and system maintenance practices are very desirable. GTE Laboratories Incorporated, located in Waltham, MA, is the central research facility for GTE. GTE is among the largest local exchange telephone carriers and the second largest mobile service provider in the United States. 
Our research facility is located on a quiet 50 acre campus-like setting in Waltham, MA, 20 minutes from downtown Boston. Our salaries are competitive, and our outstanding benefits include medical/life/dental insurance, saving and investment plans, and an on-site fitness center. Please send a resume and a cover letter (preferably by e-mail, in ASCII) to: [email protected] or by fax to 617.466.3342 (Attn: Brij Masand) I will be travelling till Mar 12th and will reply to email responses after that. thanks! -- Brij Masand ([email protected]) >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subject: Two positions in Machine Learning/Data Mining at GMD Date: Fri, 28 Feb 97 13:55:06 +0100 From: [email protected] Two positions in Machine Learning/Data Mining at GMD GMD's FIT.KI department (the AI research division of the Institute for Applied Computer Science) is looking to fill two scientist positions (M.S./Diplom or postdoc level) in the area of Machine Learning/Data Mining. We are looking for excellent people with a strong background in one or both of these areas, preferably combining both theoretical/scientific and application/software-engineering skills. Applications at both the postdoctoral and the M.S. level are welcome. You will be working as a research scientist in one of our current ML/DM projects, KESO or ILP2, and will be part of FIT's data mining group consisting of currently 4 people. Scientific work, writing and presentation of papers, and application and software work will both be part of your job. M.S. level applicants will be given time to complete their Ph.D.s while at GMD. Both positions are to be filled as soon as possible, for a period of initially two or three years, renewable for up to five years. Salary is according to the BAT IIa tariff, in the range of approx. DEM 50.000 to DEM 80.000 depending on age, qualifications, and marital status. 
For more information about FIT.KI, see http://nathan.gmd.de, for more information about the ML/data mining group, see http://nathan.gmd.de/projects/ml/home.html. If you are interested in such a position, please send your application material to Dr. Stefan Wrobel GMD, FIT.KI Schloss Birlinghoven 53754 Sankt Augustin Germany [email protected] to be received no later than March 23, 1997 (preferably by paper mail, but E-Mail is o.k. if otherwise you cannot meet the deadline). Please include at least a brief curriculum vitae, description of your qualifications, research experience and future research interests, degree/grade information (if relevant) and if applicable, a selection of three of your best publications (full text copy). We are looking forward to your application! -------------------------------------------------------------- Dr. Stefan Wrobel GMD -- German Natl. Research Center for Information Technology FIT.KI, Schloss Birlinghoven, 53754 Sankt Augustin, Germany Tel.: +49/2241/14-0, Fax: -2889 E-Mail: [email protected] WWW http://nathan.gmd.de/persons/stefan.wrobel.html Secr.: D. Boethgen Tel. -2731, E-Mail: [email protected]
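[A note on D. Throop's query earlier in this issue about approximately duplicate records. The following is only a minimal sketch of one possible approach, not a description of any tool mentioned in this issue: break each description into letter and digit runs, map synonyms to a common form, and score pairs by token-set overlap (Jaccard similarity). The names SYNONYMS, tokens, similarity and best_guesses are hypothetical, and the one-entry synonym table is an assumption taken from the 'switch'/'RPC' example in the query.]

```python
import re

# Hypothetical synonym table (an assumption for illustration); a real one
# would come from the project's own dictionary of abbreviations.
SYNONYMS = {"SWITCH": "RPC"}


def tokens(description, synonyms=SYNONYMS):
    """Return the set of normalized tokens in a component description."""
    # Uppercase, then pull out runs of letters and runs of digits, so that
    # "Switch14" yields "SWITCH" and "14" and punctuation is ignored.
    raw = re.findall(r"[A-Z]+|[0-9]+", description.upper())
    # Replace each token by its canonical synonym where one is listed.
    return {synonyms.get(tok, tok) for tok in raw}


def similarity(a, b, synonyms=SYNONYMS):
    """Jaccard similarity of the two token sets (1.0 means same tokens)."""
    ta, tb = tokens(a, synonyms), tokens(b, synonyms)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_guesses(left, right, synonyms=SYNONYMS):
    """For each description in `left`, return (left, best right, score).

    `right` must be non-empty.
    """
    guesses = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b, synonyms))
        guesses.append((a, best, similarity(a, best, synonyms)))
    return guesses
```

[With the two names from the query, similarity("RPCM N1-3B-C Switch14", "N1-3B-RPCM-C-RPC-14") comes out as 1.0, because token order and punctuation are ignored and 'Switch' is mapped to 'RPC'. Splitting at every letter/digit boundary is a crude choice that also breaks "N1" into "N" and "1"; a real matcher would likely weight rare tokens more heavily and tolerate misspellings.]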
410.18	97:09	IJSAPL::OLTHOF	Spellchecked Henry Although	`Thu Mar 13 1997 10:15`	576
	Knowledge Discovery Nuggets 97:09, e-mailed 97-03-10 News: * P. Domingos, Re: Looking for phrase matching tool * R. Jain, Tandem Data Mining Announcement, http://www.tandem.com Siftware: * R. Quinlan, C5.0: Successor to C4.5, http://www.rulequest.com Positions: * P. Norvig, Job offered in information extraction and learning, data mining, http://www.junglee.com * M. Bramer, Research Fellowship in Knowledge Discovery * X. Liu, Research Studentship in Intelligent Data Analysis, http://web.dcs.bbk.ac.uk/~hui/IDA/home.html * D. Sleeman, University of Aberdeen, Chair of Computing Science http://www.csd.abdn.ac.uk/people/chair_fp.html -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a wealth of other information on Data Mining and Knowledge Discovery are available at the Knowledge Discovery Mine site http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ************************* All opinions expressed herein are those of the contributors and not necessarily of their respective employers, or of KD Nuggets ********************************************************************* ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is no security, only opportunity General MacArthur >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To: [email protected] cc: [email protected], [email protected] Subject: Re: Looking for phrase matching tool Date: Fri, 28 Feb 1997 13:43:11 -0800 From: "Pedro M. 
Domingos" <[email protected]> Alvaro Monge and Charles Elkan of UC San Diego ([email protected], [email protected]) have one such program. They have a paper in the proceedings of KDD-96 (p. 267) that describes their system, and also gives references to other work in the area. Pedro >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [Note: the following is a commercial announcement. GPS] From: JAIN_ROHIT%t16@fedex Date: 28 Feb 97 15:08:00 -0600 To: [email protected] Cc: [email protected], [email protected] Subject: Tandem's Feb. 11 announcement Hi folks, It seems that Nuggets covers announcements made by many companies. I am wondering what would be needed on Tandem's part to have you include that announcement in Nuggets. You can get to the announcement from our home page at http://www.tandem.com. I have also included parts of it in this message. Rohit Jain Contact: Kristine Austin Tandem Computers Incorporated Tel: +1 (408) 285 6645 World Wide Web Home Page Address: http://www.tandem.com Tandem Object Relational Data Mining Architecture Drives Next Generation of Knowledge Discovery Cupertino, CA -- February 11, 1997 -- Tandem Computers Incorporated today announced a revolutionary approach in bringing complete knowledge discovery to business users through its Object Relational Data Mining technology. For the first time, the complete warehouse data set is available for real-time data mining, resulting in reduced processing time, more complete results, and significantly easier management. This new architecture establishes a standard SQL interface between client data mining tools and both object relational and relational database engines. The database engine will perform specialized data manipulation functions required by the data mining algorithms. Tandem's Object Relational Data Mining architecture takes full advantage of the capabilities of relational database engines, resulting in better performance and the ability to mine larger volumes of data. 
By integrating the best-of-breed data mining software with a relational database, Tandem's Object Relational Data Mining will enable business professionals to more effectively uncover and exploit valuable patterns and trends hidden in their data. This architecture will enhance knowledge discovery in solutions such as credit card marketing, claims analysis, retail basket analysis, and others. The interface between data mining tools and the database engine is enabled through the use of SQL extensions, ultimately allowing customers to enjoy a much wider range of data mining clients. Tandem will promote the establishment of de facto standards for these extensions with other database vendors and data mining tool providers. "Initially, the use of SQL extensions will greatly enhance the way traditional alphanumeric data types are mined today," said Abhay Mehta, Tandem's director of Object Relational Data Mining Development. As technology evolves, this architecture will enable the fast, efficient mining of more complex data types such as image, voice, video, and other multimedia objects. In the second half of 1997, Tandem's ServerWare database will be the first to combine all of the elements into a powerful knowledge discovery business environment. "Tandem will be able to build on its success in the data warehouse marketplace to position itself well in the high-end macromining segment of the data mining arena," said Dr. Wolfgang Martin, program director, META Group. Tandem's approach is unique in that it opens up the powerful ServerWare database, and other database management systems, to a wide range of data mining functions while accommodating future data mining developments and complex data types. Tandem's data mining partners have been selected so that customers can benefit from their combined breadth of data mining algorithms and for the ability of their tools to work in a high-performance parallel environment necessary to take advantage of this new architecture. 
Data mining partners include leading companies such as Angoss Software International Limited, Data Distilleries B.V., Magnify Incorporated, NeoVista Solutions Incorporated, and Syllogic B.V. ANGOSS Software International Limited ANGOSS KnowledgeSEEKER excels in applications including fraud detection, target marketing, process control, and risk management. KnowledgeSEEKER displays results in a decision tree format by uncovering valuable relationships and correlations in the dataset, and by writing predictive rules. This format can be easily understood by any business end user. KnowledgeSEEKER turns data into valuable business knowledge. Data Distilleries B.V. Data Distilleries' Data Surveyor uses highly efficient decision tree based search strategies and database optimization techniques, enabling it to take into account hundreds of variables to mine finance, retail, insurance, and database marketing databases. At the end of the data mining process, Data Surveyor produces a graphical representation of the discovered relationships and an overview of all actions and results during the mining process. Magnify Incorporated Magnify's PATTERN software is an open set of modular software tools for mining, managing, and analyzing very large data sets. The PATTERN system includes several specialized applications, such as PATTERN:Detect for detecting fraud, anomalies, and rare events and PATTERN:Profit for predicting the delinquency, bankruptcy, credit usage, and profitability of customers. The PATTERN system incorporates algorithms for parallel and distributed variants of classification, regression, and optimization trees, and a variety of other data mining algorithms. NeoVista Solutions Incorporated NeoVista Solutions' Decision Series suite of knowledge discovery tools are directed towards solving data mining challenges in a variety of markets, including retail, insurance, telecommunications, and healthcare. 
The Decision Series suite includes pattern discovery tools based on neural networks, clustering, genetic algorithms, and association rules. Syllogic B.V. The Syllogic Data Mining Tool supports all stages in the data mining process, including data selection, data cleaning, enrichment, coding, discovery, and visualization. Using a toolbox approach, the tool combines various database analysis techniques, such as decision trees, association rules, k-nearest neighbor, clustering, and visualization to solve business challenges in the finance, transportation, government, and system and network management segments. To help customers stay on the leading edge of data mining, Tandem is also partnering with key universities such as Simon Fraser University in order to benefit from the results of their on-going research. This alliance includes parallelizing existing and next-generation data mining algorithms and techniques. "Tandem is making a major investment in data mining and in driving its widespread deployment as a business tool," said Bill Heil, senior vice president and general manager of Tandem's ServerWare business unit. "By focusing on the Tandem ServerWare database engine and partnering with best-of-breed solutions providers and researchers, we are able to supply customers with the industry's most advanced and comprehensive range of data mining solutions. What we are offering is an extensible approach designed to keep customers at the forefront of the latest developments in knowledge discovery." Availability Tandem's Object Relational Data Mining solutions will be available starting in the third quarter of 1997. With these solutions, customers will be able to take advantage of the industry's most scalable performance for mining databases residing on either Microsoft Windows NT Server based platforms (including Tandem's recently introduced S-series servers based on Windows NT Server) or on Tandem's massively scalable NonStop Himalaya servers. 
About Tandem Founded in 1974, Tandem Computers Incorporated designs and delivers technology solutions that companies rely on to compete in a business world that runs 24 hours a day. A US$1.9 billion company headquartered in Cupertino, California, Tandem has offices, strategic partners, and providers in more than 50 countries around the world. Tandem, Himalaya, NonStop, Object Relational Data Mining, ServerWare, and the Tandem logo are trademarks or registered trademarks of Tandem Computers Incorporated in the United States and/or other countries. Microsoft and Windows NT are either trademarks or registered trademarks of Microsoft Corporation in the United States and other countries. All other brand or product names are trademarks or registered trademarks of their respective companies. Contact: Kristine Austin Tandem Computers Incorporated Tel: +1 (408) 285 6645 World Wide Web Home Page Address: http://www.tandem.com Tandem Introduces Object Relational Data Mining Solutions and Services for Vertical Markets Business-driven offerings target card marketing, micromerchandising, claims analysis, and other key applications Cupertino, CA February 11, 1997 Applying its vertical market expertise and new Object Relational Data Mining architecture to real-world business problems, Tandem Computers Incorporated today launched a series of Object Relational Data Mining solutions packages for card marketing, micromerchandising, and insurance claims analysis. Tandem also announced new consulting services designed to allow companies to quickly enjoy low-risk, discovery-driven decision making. The solutions and services are based on Tandem's revolutionary new Object Relational Data Mining architecture. This enables customers to efficiently mine their entire database, not merely samples, for useful patterns and trends. The result is a more effective realization of the full business value of data. 
"Object Relational Data Mining solutions add significant new functionality to customer segmentation and predictive modeling techniques," said Jonathan Kalman, managing director of MRJ Technology Solutions, a leading specialty systems integrator. "Tandem is taking a profoundly different approach by integrating its powerful database, capable of handling an entire organization's data, with leading data mining tools." Delivering full value of business data The new solutions packages will comprise the cross-platform Tandem ServerWare database, appropriate integrated data mining and other analysis tools from leading solutions partners, Tandem S-series massively scalable Himalaya and/or Microsoft Windows NT Server based hardware platforms, application and reporting templates, data models, and Directional Consulting services. Though specially tested and packaged, the solutions are all easily customizable. Initial solutions include: Card Marketing Aimed at card acquirers and issuers, this solutions package applies Object Relational Data Mining architecture and other decision support technology to improve the effectiveness of cardholder retention and acquisition efforts. This provides a better understanding of when certain customers are likely to leave and why, leading to more effective customer segmentation, increased response rates to marketing promotions, and improved margins through targeted product development and pricing. Micromerchandising This package enables retailers to mine immense volumes of detailed merchandising data, resulting in improved in-stock positions, reduced markdowns by better understanding buying patterns and trends, enhanced promotional effectiveness, and improved store profitability through more precise forecasting. 
Claims Analysis Aimed at insurance providers looking to contain underwriting costs and improve loss ratios, this package uses Object Relational Data Mining technology to support new product development, fraud profiling and detection, better service provider alliances, and more exact underwriting experience comparisons. Immediate customer reaction to these benefits is positive. Said Juan Verastigui, director of Claims System Development at USAA, a leading insurance company, "Tandem's Object Relational Data Mining architecture and the way it leverages the parallel ServerWare database will provide USAA with the ability to derive full value from all our claims data, and not just subsets. The resulting faster and more complete answers to our business queries will have a very positive effect on our bottom line." Looking ahead, Object Relational Data Mining architecture will enable the mining of complex data types that include voice, video and images. Said MRJ's Jonathan Kalman, "Object Relational Data Mining solutions provide immediate value with traditional data types, and extensibility to meet future multimedia analysis needs." Directional Consulting, new Object Relational Data Mining services Tandem's Directional Consulting services are an integral part of the new solutions packages and are also available separately. These services define a low-risk, high-return methodology proven over many Tandem based data warehousing implementations for exploring and understanding how data mining can support particular business initiatives. Directional Consulting services use a phased approach to having data mining production environments up and running within 90 days. The process begins with establishing priorities for implementation of Object Relational Data Mining and proceeds to a proof of concept phase to verify that the selected data mining solutions will meet expectations. 
System design, data modeling, and implementation then follow, culminating with the establishment of a robust, scalable operational environment that supports application evolution and growth. Availability Tandem Card Marketing, Micromerchandising, and Claims Analysis solutions will be available beginning in the first quarter of 1997. These will be enhanced to take advantage of Object Relational Data Mining technology in the third quarter of 1997. >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 5 Mar 1997 23:31:07 -0500 (EST) From: [email protected] (Ross Quinlan) Subject: Successor to C4.5 I have developed a new inductive program called C5.0. Its main advantages are: * new, faster methods for generating rules * support for boosting * optional non-uniform misclassification costs Further information and free demonstration versions are available from http://www.rulequest.com Ross Quinlan >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 28 Feb 1997 15:33:40 -0800 From: [email protected] (Peter Norvig) Organization: Junglee Corp. 
To: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Subject: Job offered in information extraction and learning, data mining Junglee is looking for full-time employees and summer interns to work on information discovery and data mining from text documents. We're looking for creative hard-working people with experience in some of the following: agents, databases, information extraction, parsing, regular expressions, language design, statistics, machine learning, and GUI design. Junglee develops Internet and Intranet information technology for the future and pushes it to market today. Technology that raises eyebrows and drops barriers. Founded in 1996 by four PhD students from the Stanford University Computer Science Department and a Silicon Valley veteran, Junglee Corporation has excellent funding, high-profile customers, and a strong revenue plan. Our Virtual DataBase (VDB) engine is fueled by our ability for data source description, extraction, and attribute mapping. Imagine capturing data from hundreds of disparate unstructured web sites, mixing that with data from other heterogeneous, distributed database and non-database sources and turning it all into a relational aggregate with the power of full SQL queries and the ease and portability of HTML user interfaces. We call these applications PALs - powerful information sites where people can ask for and get an answer. Several of our PALs are up on the web today at www.junglee.com and www.washingtonpost.com; we are currently building more of them for some well-known companies. One of the key aspects of the technology is discovering/mining information from text. The project is led by Peter Norvig, who has done extensive work on Natural Language Processing, Machine Learning, and other Artificial Intelligence problems. While this project involves significant ground-breaking research, it is definitely a development project, not just research. 
Please send responses to [email protected] or by fax to 408-522-9470 and mention this posting. -- Peter Norvig [email protected] Junglee Corporation phone: 408-522-9482 1250 Oakmead Parkway fax: 408-522-9470 Suite 310 http://www.junglee.com Sunnyvale CA 94086 http://www.norvig.com >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Max Bramer" <[email protected]> Organization: University of Portsmouth To: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Date: Sat, 1 Mar 1997 17:05:45 +0000 Subject: Research Fellowship in Knowledge Discovery Reply-to: [email protected] UNIVERSITY OF PORTSMOUTH DEPARTMENT OF INFORMATION SCIENCE RESEARCH FELLOWSHIP IN KNOWLEDGE DISCOVERY Salary: stlg17,472 - stlg20,381 (Pay award pending) Closing Date: 21 March, 1997 (Note: This is an extension to the previously announced closing date.) Reference: RTEC 0149 (G) Applications are invited for a two-year Research Fellowship in the Department of Information Science to commence as soon as possible. The successful candidate will work closely with Professor Max Bramer (Head of the Department of Information Science) to develop research in the area of Knowledge Discovery and Data Mining. The Department currently has projects in the sub-areas of automatic induction of classification rules from examples, Case Based Reasoning, Neural Networks, Genetic Algorithms and related statistical techniques. Applicants should have a good honours degree in Computer Science or related subject. Preference will be given to candidates who have (or expect soon to receive) a higher degree in a relevant discipline. Relevant commercial experience would also be an advantage. Informal enquiries may be made to Professor Bramer, either by telephone (01705) 844444 or by electronic mail ([email protected]), or to Simon Thompson on (01705) 844097 ([email protected]). 
Further information about the department is also available from the World Wide Web at http://www.sis.port.ac.uk. Further particulars are available from: Personnel Office University House Winston Churchill Avenue Portsmouth PO1 2UP England Telephone (01705) 843421 (24 hour answerphone) E-mail: [email protected] http://www.port.ac.uk/ IMPORTANT NOTE: All applications should be sent (preferably on paper not by email) to the Personnel Office NOT to the Department of Information Science. _______________________________________________________ Professor Max Bramer Department of Information Science University of Portsmouth Milton, Southsea PO4 8JF, England Tel: +44-(0)1705-844444 Fax: +44-(0)1705-844006 email: [email protected] >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] (Xiaohui Liu) Date: Tue, 4 Mar 97 12:17:57 GMT To: [email protected] Subject: Re: EPSRC CASE Research Studentship in Intelligent Data Analysis BIRKBECK COLLEGE DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF LONDON EPSRC CASE Research Studentship in Intelligent Data Analysis Applications are invited for an EPSRC CASE PhD studentship, within the Intelligent Data Analysis (IDA) Group, at the Department of Computer Science, Birkbeck College. The three-year studentship is for the investigation of intelligent data analysis techniques for research problems in process industries, funded by Honeywell Hi-Spec Solutions, UK and Honeywell Technology Center, USA. The successful candidate will have a tax-free salary of at least 10,000 pounds (there are experience, age-related and dependants additions), and will be expected to work on a joint research project between Birkbeck and Honeywell on "Causal Modeling for Time Series Data". The IDA Group at Birkbeck conducts research into the application of computationally intelligent techniques to data analysis problems. 
The group has enjoyed successful collaboration with several external organisations in industry and medicine on a variety of IDA research projects, funded by government agencies, industrial sponsorships and charity organisations. The group is to host the second IDA conference in London this August. Applicants should have at least a 2(i) in Computer Science or related subject, with a good background in Artificial Intelligence or Statistics, or a 2(i) in Chemical Engineering with strong computing background. Please submit a CV as soon as possible, but not later than 31 March 1997, to Dr X Liu, Department of Computer Science, Birkbeck College, Malet Street, London WC1E 7HX, UK. Phone Dr Liu on 0171-631 6711 or email him ([email protected]) if you wish to make an informal enquiry. Information regarding this project and research activities of the IDA Group at Birkbeck can be accessed on the World Wide Web via URL: http://web.dcs.bbk.ac.uk/~hui/IDA/home.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Derek Sleeman <[email protected]> Date: Sun, 2 Mar 1997 15:01:52 GMT To: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Cc: [email protected] Subject: CHAIR VACANCY (for Posting) Announcement of Post (Closing date: early MARCH) University of Aberdeen Chair of Computing Science Applications are invited for the post of Professor of Computing Science. The new Professor will play a key role in strengthening the teaching and research activities of the Department of Computing Science. The new Professor will provide academic leadership in the development of the Department's existing areas of interest, Artificial Intelligence and Databases. Candidates should have an international reputation with an excellent record of innovative research as measured by publications and grant income. 
Applications from academics, research managers and others from Industry and public sector Institutions will be considered. Further, as the University of Aberdeen has recently made a major research investment in the Institute of Medical Sciences, it would be an advantage if the person had experience of working with Medical/Healthcare professionals. The person appointed will be expected to acquire a significant role in the management of the Department. Informal enquiries may be directed to Professor A R Forrester, Vice-Principal and Dean of the Faculty of Science and Engineering: Email: [email protected] Tel: +44 (0)1224 272081 Fax: +44 (0)1224 272082 More details of the Department's research activities can be found on our research pages at http://www.csd.abdn.ac.uk/research/index.html or contact Professor Derek Sleeman, Head of Department: Email: [email protected] Tel: +44 (0)1224 272295/6 Fax: +44 (0)1224 273422 For further particulars of this post, see: http://www.csd.abdn.ac.uk/people/chair_fp.html
410.19	97:10	IJSAPL::OLTHOF	Spellchecked Henry Although	`Fri Mar 21 1997 14:47`	793
	Knowledge Discovery Nuggets 97:10, e-mailed 97-03-19 News: * J. Brown, Report on DM Summit in San Francisco, Feb 18-21, 1997 * B. Pearlmutter, Abbadingo One: DFA Learning Competition http://abbadingo.cs.unm.edu/ Siftware: * K. Schirmer, smart information services GmbH, http://www.newscan-online.de Positions: * G. John, IBM DATA MINING ANALYST POSITIONS, http://www.ibm.com/bi * B. Perry, HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java) http://www.wins.hrl.com Meetings: * M. Bramer, Expert Systems 97: Call for Papers http://www.sis.port.ac.uk/sges/es97.html * M. Smyth, Hinton-Jordan Learning Methods Tutorial, May 1997, http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/ * L. De Raedt, Final call for IJCAI-97 Workshop on Frontiers of inductive logic programming * S. Dzeroski, ILP-97: CFP Reminder http://www-ai.ijs.si/SasoDzeroski/ilp97.html -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep meeting announcements short and put all the details on the meeting web page ! To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a wealth of other information on Data Mining and Knowledge Discovery is available at Knowledge Discovery Mine site http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ******************************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers or of KD Nuggets *************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Knowledge is the antidote to fear Ralph Waldo Emerson >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 17 Mar 1997 21:12:01 -0600 From: "J.P.Brown" <[email protected]> Subject: Second Annual Data Mining Summit The Second Annual Data Mining Summit was held, February 19-21, 1997, at the San Francisco Regency Hyatt. As I was not at every session, this is a generalization - no names, no pack drill. The majority of the delegates were from the United States and Canada. Nine other countries were represented, from Europe, South America and Asia. There were presentations all the way from the "Biggies" to the "Start-Ups". From the Past to the Present, there were papers on specific Data Mining techniques, and much reliance on subjective approaches. A thought-provoking paper with present-day relevance covered the Public Perception of Data-Mining. From the Present to the Future, there were extensions to accepted ideas and some concepts moving toward a more controversial emphasis on objectivity. The Basics, and some Specialties, were covered in detail, and attention was paid to the Dimensions of Decision Support and to On-Line Analytical Processing, both subjects of great importance. Some intensely practical, no-nonsense success stories were presented, and some novel perspectives on iterative "living" processes. 
As well as successful Data Mining examples, Limitations, Challenges and Possible Pitfalls were pointed out. Solutions were suggested. Before these demonstrably useful techniques can become the work horses of the future, a new generation of Tool Support must prove itself to be effective. This has begun to happen, and the competition between these new user-friendly applications will be interesting to participate in. Little attention to variations with the passage of time could be noted. There seems to be a prevalent assumption that "situations" will not change. This is "writing the history of the future" as opposed to the approach which starts off by "predicting the past", and then keeps a constant, trigger-happy lookout for significant change. The approaches which were considered varied from simple functions, to Algorithms, to Genetic Algorithms. Complex hybrid populations could be separated in several ways. Rules could be used, and Artificial Neural Nets. Agents could do it, if they were made to be versatile enough. Visualization was important because we can "think with our eyes". Some of you will know that I am of the "all of the above" school. From my own personal point of view the Data Mining Summit was encouraging. The next move will be to put the pieces together, and to consciously emphasize our goals. Those who want to know more about the "all of the above" school could try http://www.hal-pc.org/~jpbrown and then let me know what they think. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sun, 9 Mar 97 23:45 MST From: "Barak Pearlmutter" <[email protected]> To: [email protected] Subject: Abbadingo One: DFA Learning Competition Thought database miners might want to whet their teeth on these little datasets. Although neither as big nor as lucrative as the big boys, they are a bit more controlled, and give an opportunity to test an algorithm against all the competition. 
Abbadingo One: DFA Learning Competition Announcement & Call for Participation In order to encourage the development of better grammar induction algorithms, the Abbadingo One competition will award at least $1,024 to the designer of the system that is most successful at discovering the structure of random deterministic finite automata, as assessed by a graded series of nine benchmark problems. The competition ends on 15-Nov-1997. This competition is being sponsored by, among others, * The Computer Science Department at the University of New Mexico, which is providing computational support. * The Kluwer Academic journal "Machine Learning," which will give priority treatment to a paper describing the award winning algorithm. * The Santa Fe Institute, which will host the award ceremony. * The "Journal of Artificial Intelligence Research." For details retrieve http://abbadingo.cs.unm.edu/ Good luck, and may the best algorithm win! -- Competition Kevin J. Lang <[email protected]> organizers: Barak A. Pearlmutter <[email protected]> >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. GPS] Date: Tue, 11 Mar 1997 21:30:46 +0100 From: Kai Schirmer <[email protected]> Subject: smart information services GmbH Hello! We would like to introduce ourselves and are interested in being listed in your company overview on data mining and knowledge discovery. Formed in early 1995, smart information services GmbH is located in Potsdam near Berlin in Germany. The company's activities center on application development, service and research using advanced information technologies in the areas of Intelligent Information Retrieval. Smart information is currently developing a news categorizing and filtering system (newscan) using advanced text processing techniques. Further activities focus on fact extraction from financial news and automated classification of news from business news wires for signaling, filtering and routing tasks. 
The newscan news filtering system and service offers business professionals a smart, easy and cost-effective way of gaining current awareness in a rapidly changing world. A true knowledge exchange company, smart information provides electronic information services which intelligently interconnect content providers and subscribers. Its interactive, customized services include newscan for corporate workgroups and enterprises. Newscan is a premium business intelligence service customized to the specific needs of clients that focuses on the industry news that's critical to their business. It provides customers with "custom-tailored" news based on a profile that describes their markets, news needs and specialized interests. Using advanced filtering techniques, newscan selects highly relevant news by scanning some 3,000 to 4,000 German and English news items daily and delivers only those relevant to each customer in time for each business day. Smart information is a partner in the Esprit project ECRAN. ECRAN will develop a new generation of Information Extraction (IE) applications, to be included in telematic services having a large textual content. ECRAN will analyse free texts (initially, financial information from specialised newswire services, and market information on the internet) extracting information content. The information can be compared against a model of user requirements so that the system can precisely identify text of interest to a customer. By using the results of the ECRAN project, specific financial, economic and political information from standardised news will be extracted and stored in a database format. The information extraction is based on lexicon tuning technologies and sophisticated template handling. Once stored in a database format the extracted facts can be analysed in combination with time series. Currently smart information is preparing a European research project on information mining in heterogeneous environments. 
The main ideas are described in the following. In the past few years, the abundance of continuous data sources and the connectivity allowed by local and worldwide public and private networks have grown at explosive rates, while the price of bandwidth has continuously decreased, and this trend has shown no sign of slowing. Thus, staggering amounts of new information are continuously made available to private users, business firms and professional operators. Extracting the information relevant for a given business or position from an overwhelming flood of data, and being able to use it for tactical and strategic planning, as well as decision support on the fly, is vital for business survival and leadership, but it is getting less and less amenable to human handling. On the other hand, an ever increasing part of current information fluxes passes through computer networks, which makes them amenable to automatic filtering, processing and interpretation. Together, these two developments demonstrate both the need for and the feasibility of systems that filter and integrate information from different data sources, sometimes static and well structured (legacy databases), sometimes dynamic and with a variable degree of standardization, from rigidly defined records, to multimedia documents, to free text, speech, images. Please link to our web-site "www.newscan-online.de". Yours sincerely Kai Schirmer >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 11 Mar 1997 20:43:00 -0800 (PST) From: George John <[email protected]> Subject: IBM DATA MINING ANALYST POSITIONS (please post/redistribute) IBM DATA MINING ANALYST POSITIONS (please post/redistribute) Help! We're drowning in work! IBM needs 10 more analysts for its highly successful data mining group. 
Join our team of high-caliber PhD's in an exciting multi-faceted career in data mining: * Analyze data for customers using IBM's industry-leading data mining products * Interact directly with senior management at Fortune 500 companies * Teach data mining classes to our customers and develop course materials * Travel, see the world! (One member of our team just got back from Paris, another is heading to Australia for two weeks... these are not vacations, it's their job!) * Interact with researchers and product developers, discuss ideas for new data mining algorithms, new visualizations, and new features for our products * Assist sales reps in customer visits, be the "technical person" to answer hard questions * Work with the marketing group to help develop brochures, etc. * Attend trade shows and conferences, learn more about the industry and talk to customers * Use SQL/AWK/PERL/SAS to process data (ooh, the excitement!) The ideal candidate * has an excellent understanding of the data analysis process and has participated in several projects * is strongly technically proficient in at least some areas of data mining (background in statistics, machine learning, neural nets, or pattern recognition, or related), with a desire to learn more * has excellent communication and presentation skills * is a self-starter, good at quickly becoming a productive member of a team * is a fast learner, can quickly become an expert in a new industry and work with IBM consultants to productively apply data mining * has some unix skills, knows enough AWK and PERL to be self-sufficient in processing data * has a good sense of humor, fun to work with, enjoys taking co-workers out to dinner, insists on paying every time, etc... Positions are available for both senior applicants (professors, PhD's, MBA's, or 4+ years relevant business experience) and more junior members (MS, BS, less job experience). Salaries are competitive, and based on experience. 
The jobs are focused on business, but some amount of time spent on research may be negotiated. IBM's data mining group is growing quickly, and offers excellent career opportunities. For more information on data mining at IBM, see the webpage for IBM Global Business Intelligence Solutions (our parent organization) at http://www.ibm.com/bi

Send resume to George H. John, [email protected]. ASCII (plain text) via email is strongly preferred. Please put "DMJOBS-97:" then your name in the subject. Hardcopy may be sent to

George H. John
IBM Almaden Research Center
650 Harry Rd / D2
San Jose, CA 95120-6099
FAX: 408-927-2100 (put "Attn: George John" on cover sheet)

IBM is an equal opportunity employer.

>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 12 Mar 1997 15:25:34 -0800
From: [email protected] (Brad Perry)
Subject: HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java)
http://www.wins.hrl.com

Subject: HRL Job Opening: Research Intern/Parttime (KDD, DAI, Java)

We are currently looking to fill an intern, or part-time, position for a PhD candidate at Hughes Research Laboratories (HRL). The position will be a summer internship capable of extending into a part-time position during the school year. HRL is located in Malibu, CA and represents the central research lab for Hughes Electronics Corporation.

Our group is investigating the use of agent, data mining, and database technologies to support information management, discovery, and analysis in large-scale dynamic Internet environments. Our two primary research areas involve:

* Information exploitation techniques to effectively identify and disseminate semantically relevant information to large user populations, especially with the use of satellite broadcast channels.
* Data mining techniques to extract, represent, and manipulate semantic cues from large-scale and distributed information sources.

The candidate should have a background in DAI, agent architectures, machine learning, and data mining.
Experience with KQML, KIF, and/or Java a definite plus. This position entails research and prototype development.

Required:
* PhD candidate in Computer Science (or related field)
* Good OO programming skills (implementation of prototypes will be required).
* Unix programming background.
* Good oral and written communication skills.

Desirable:
* Machine Learning or Data Mining background
* Java programming experience (or C/C++, at least).
* Ontologies.
* Multidatabase systems.
* Distributed object systems (CORBA, RMI, etc.)

Please email your resume to Son Dao at [email protected], or mail to:

Son Dao
Hughes Research Laboratories
3011 Malibu Canyon Road
Malibu, CA 90265

HRL is an equal opportunity employer.

------
Brad Perry
Hughes Research Laboratories [email protected] (310) 317-5683
UCLA [email protected] (310) 206-4561

>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: "Max Bramer" <[email protected]>
To: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Date: Sun, 9 Mar 1997 17:05:52 +0000
Subject: Expert Systems 97: Call for Papers
Reply-to: [email protected]

BRITISH COMPUTER SOCIETY SPECIALIST GROUP ON EXPERT SYSTEMS
ANNUAL CONFERENCE - EXPERT SYSTEMS '97 (ES97)
CALL FOR PAPERS

The 17th annual Conference of the British Computer Society Specialist Group on Expert Systems, ES97, is being held at St. John's College, Cambridge between 15th and 17th December 1997. The objective of the ES series of conferences is to bring together researchers and application developers from business, industrial and academic communities to discuss issues and solutions to problems based on techniques derived from Artificial Intelligence. The Conference continues to build on the success of previous years, with a two-track event containing fully refereed technical and applications papers.
For the Technical Stream, contributions are invited in the form of papers of up to 5,000 words on knowledge-based systems and related areas of Artificial Intelligence. Papers representing original work on theoretical and applied AI relating to: constraint satisfaction; intelligent agents; knowledge engineering methods; machine learning; model-based reasoning; verification and validation of KBS; natural language understanding; case-based reasoning, knowledge discovery in databases and other related areas are welcome. For the Applications Stream, contributions are invited in the form of papers of up to 5,000 words presenting case studies of knowledge based systems that address real-world problems such as: diagnosis, monitoring, scheduling and selection. Most importantly, the papers should highlight the critical elements of success and the lessons learned. Papers submitted to both streams will be refereed and those accepted will again be published in book form in the "Research and Development in Expert Systems" and "Applications and Innovations in Expert Systems" series (for the technical and application streams respectively). To assist us with our planning of the conference, anyone intending to submit a paper should provide a short abstract, with title, at the earliest opportunity to the Conference Secretariat. Authors should indicate the stream to which their papers are being submitted. Please include your full name and postal address in any email submissions. Formatting instructions for papers will be sent as soon as the title and abstract are received. Four copies of papers should be submitted to arrive no later than Friday 20th June 1997. Submissions should be sent in paper form by post to the Conference Secretariat. Please note that presenters of submitted papers will be asked to cover their costs of attending the conference by paying at the SGES members' academic rate. 
TUTORIALS & WORKSHOPS The Conference Committee invites proposals for tutorials or workshops to be presented on Monday 15 December. Proposals for full and half day tutorials, from an individual or group of presenters should be directed in the first instance to the Conference Secretariat. EXHIBITION A table top exhibition will run alongside the Conference. There will be a limited number of spaces available and potential exhibitors are encouraged to book early, as these will be on a first-come, first-served basis. SPONSORSHIP The Conference Committee is keen to make contact with any organisations who may wish to sponsor the Conference, in whole or in part. Sponsorship of an international conference such as ES97 will ensure the highest visibility for the benefactor, both through the appearance of the company logo on all promotional literature and in references to the Conference in all media exposure prior to and after the event. CONFERENCE COMMITTEE: Conference Chair: Prof Max Bramer, University of Portsmouth, Southsea, PO4 8JF [email protected] Deputy Conference Chair: Dr Ian Watson, University of Salford, Salford, M5 4WT [email protected] Technical Programme Chair: Dr John Hunt, University of Wales, Dept of Computer Science, Aberystwyth, Dyfed SY23 3DB [email protected] Applications Programme Chair: Mrs Ann Macintosh, Artificial Intelligence Applications Institute, Edinburgh, EH1 1HN [email protected] CONFERENCE SECRETARIAT: Ms. 
Kit Stones, The Conference Team
17 Spring Road
Kempston, Bedford MK42 8LS
Tel/Fax +44 (0)1234-302490
[email protected]

IMPORTANT DATES:
Title/Abstract notification: now
Full paper submission: 20 June 1997
Notification of acceptance: 8 August 1997
Camera ready papers due: 19 September 1997

WORLD WIDE WEB ADDRESS FOR CONFERENCE INFORMATION:
http://www.sis.port.ac.uk/sges/es97.html

_______________________________________________________
Professor Max Bramer
Department of Information Science
University of Portsmouth
Milton, Southsea PO4 8JF, England
Tel: +44-(0)1705-844444 Fax: +44-(0)1705-844006
email: [email protected]

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Marney Smyth <[email protected]>
Subject: Hinton-Jordan Learning Methods Tutorial, May 1997
Date: Mon, 10 Mar 1997 06:09:19 -0500 (EST)

************************************************************
*                                                          *
*     Learning Methods for Prediction, Classification,     *
*     Novelty Detection and Time Series Analysis           *
*                                                          *
*     Washington, D.C., May 2 -- 3, 1997                   *
*                                                          *
*     Geoffrey Hinton, University of Toronto               *
*     Michael Jordan, Massachusetts Inst. of Tech.         *
*                                                          *
************************************************************

A two-day intensive Tutorial on Advanced Learning Methods will be held May 2 -- 3rd, 1997, at the Hyatt Regency on Capitol Hill, Washington D.C. Space is available for up to 50 participants for the course.

The course will provide an in-depth discussion of the large collection of new tools that have become available in recent years for developing autonomous learning systems and for aiding in the analysis of complex multivariate data. These tools include neural networks, hidden Markov models, belief networks, decision trees, memory-based methods, as well as increasingly sophisticated combinations of these architectures.
Applications include prediction, classification, fault detection, time series analysis, diagnosis, optimization, system identification and control, exploratory data analysis and many other problems in statistics, machine learning and data mining. The course will be devoted equally to the conceptual foundations of recent developments in machine learning and to the deployment of these tools in applied settings. Case studies will be described to show how learning systems can be developed in real-world settings. Architectures and algorithms will be presented in some detail, but with a minimum of mathematical formalism and with a focus on intuitive understanding. Emphasis will be placed on using machine methods as tools that can be combined to solve the problem at hand. WHO SHOULD ATTEND THIS COURSE? The course is intended for engineers, data analysts, scientists, managers and others who would like to understand the basic principles underlying learning systems. The focus will be on neural network models and related graphical models such as mixture models, hidden Markov models, Kalman filters and belief networks. No previous exposure to machine learning algorithms is necessary although a degree in engineering or science (or equivalent experience) is desirable. Those attending can expect to gain an understanding of the current state-of-the-art in machine learning and be in a position to make informed decisions about whether this technology is relevant to specific problems in their area of interest. 
COURSE OUTLINE

Overview of learning systems; LMS, perceptrons and support vectors; generalized linear models; multilayer networks; recurrent networks; weight decay, regularization and committees; optimization methods; active learning; applications to prediction, classification and control

Graphical models: Markov random fields and Bayesian belief networks; junction trees and probabilistic message passing; calculating most probable configurations; Boltzmann machines; influence diagrams; structure learning algorithms; applications to diagnosis, density estimation, novelty detection and sensitivity analysis

Clustering; mixture models; mixtures of experts models; the EM algorithm; decision trees; hidden Markov models; variations on hidden Markov models; applications to prediction, classification and time series modeling

Subspace methods; mixtures of principal component modules; factor analysis and its relation to PCA; Kalman filtering; switching mixtures of Kalman filters; tree-structured Kalman filters; applications to novelty detection and system identification

Approximate methods: sampling methods, variational methods; graphical models with sigmoid units and noisy-OR units; factorial HMMs; the Helmholtz machine; computationally efficient upper and lower bounds for graphical models

REGISTRATION

Standard Registration: $700
Student Registration: $400

Cancellation Policy: Cancellation before Friday April 25th, 1997, incurs a penalty of $150.00. Cancellation after Friday April 25th, 1997, incurs a penalty of one-half of Registration Fee.

Registration Fee includes Course Materials, breakfast, coffee breaks, and lunch. On-site Registration is possible. Payment of on-site registration must be in US Dollar amounts, by Money Order or Check (preferably drawn on a US Bank account).

Those interested in participating should return the completed Registration Form and Fee as soon as possible, as the total number of places is limited by the size of the venue.
[edited for space]

ADDITIONAL INFORMATION

A registration form and hotel information are available from the course's WWW page at http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/

Marney Smyth
E-mail: [email protected]
Phone: 617 258-8928
Fax: 617 258-6779

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 14 Mar 1997 16:47:02 +0100 (MET)
From: Luc De Raedt <[email protected]>
To: [email protected], [email protected]
Subject: Final CFP Frontiers of ILP Workshop at IJCAI

FINAL CALL FOR PARTICIPATION and PAPERS
IJCAI-97 Workshop on FRONTIERS OF INDUCTIVE LOGIC PROGRAMMING
Monday 25 August 1997
==========================================================================

GENERAL INFORMATION

The IJCAI-97 one day workshop on "Frontiers of ILP" in Nagoya, Japan, will take place on August 25, immediately prior to the start of the main IJCAI conference.

TECHNICAL DESCRIPTION

Inductive logic programming (ILP) is a recent subfield of artificial intelligence that studies the induction of first order formulae from examples. The purpose of this workshop is twofold: on the one hand, we wish to widen the scope of ILP by investigating its relations to neighboring fields, and on the other hand, we wish to make ILP more accessible for researchers from neighboring fields. The workshop therefore solicits papers that lie at the frontiers of ILP with neighboring fields. A non-exclusive list of interesting topics for the workshop includes:

* ILP and Software Engineering: what has ILP to offer to Software Engineering, and in what way can Software Engineering help to design ILP systems and applications?
* ILP for Knowledge Discovery in Databases: ILP aims at learning complex rules involving multiple relations from small databases, whereas KDD typically induces simple rules about a single relation from a large database. Furthermore, ILP allows background knowledge to be exploited in a variety of ways. Can KDD and ILP be successfully combined?
* ILP and Computational or Algorithmic Learning Theory: though many results have been obtained concerning the learnability of inductive logic programming, most of the results are negative and most of the positive results are reducible to propositional learning methods. Is there a mismatch of COLT with ILP? And if so, what can be done about it?
* ILP versus propositional learning methods: Since the very start of ILP, researchers and practitioners of machine learning have wondered about the relation between ILP and propositional learning methods. Theoretical and experimental questions that arise include: when to use ILP and when to use propositional learning methods? Under what circumstances can ILP be reduced to propositional learning? What is the price to pay for using first order logic in terms of efficiency?
* ILP and Knowledge Representation: ILP has traditionally employed computational logic to represent hypotheses and observations. Alternative well-founded knowledge representation formalisms have received little attention (with the exception of CLASSIC). What can ILP learn from Knowledge Representation? And in what well-founded Knowledge Representation formalisms is induction feasible?
* ILP in multistrategy learning: Multistrategy learning combines multiple learning strategies. What role can ILP play for multistrategy learning?
* ILP and Probabilistic reasoning: in contrast to propositional learning methods, ILP has not used probabilistic representations. How can ILP incorporate such representations? And how can it interact with methods such as Bayes nets or Hidden Markov Models?
* ILP for Intelligent Information Retrieval: The rapid development of the World Wide Web has spawned significant interest in intelligent information retrieval. In particular, algorithms for reliably classifying textual documents into given categories (like interesting/uninteresting) would be useful for a wide variety of tasks.
Currently, most learning algorithms are not able to make use of structural information like word order, successive words, structure of the text, etc. Can ILP algorithms offer advantages over conventional information retrieval or machine learning algorithms for this sort of task?
* Applications of ILP in subfields of AI: ILP has been applied to other subfields of AI, including natural language processing, intelligent agents and planning. Further applications of ILP within AI are solicited.

Both position papers about the relation of ILP to other fields and research papers that make specific technical contributions are solicited. However, to stimulate discussion, it is expected that each technical paper also clarifies the position of ILP with regard to the neighboring field(s) it addresses.

In addition to the presentation of position and technical papers, the workshop will also feature a panel discussion on the frontiers of ILP and possibly an invited talk.

ORGANISERS

Luc De Raedt (chair and primary contact)
Saso Dzeroski
Koichi Furukawa
Fumio Mizoguchi
Stephen Muggleton

PROGRAMME COMMITTEE

Francesco Bergadano (Italy)
Luc De Raedt (co-chair, Belgium)
Saso Dzeroski (Slovenia)
Johannes Furnkranz (Austria)
Koichi Furukawa (Japan)
David Page (U.K.)
Fumio Mizoguchi (Japan)
Ray Mooney (U.S.A.)
Stephen Muggleton (co-chair, U.K.)

CALL FOR PARTICIPATION

Participation is open to all members of the AI Community. However, to encourage interaction and a broad exchange of ideas the number of participants will be strictly limited (preferably under 30 and certainly under 40). Participants will be selected on the basis of submissions.
Three types of submissions will be considered:

1) technical contributions (ideally, a 3 to 5 page extended abstract, in the IJCAI Proceedings Format, 3000-4000 words),
2) position papers (ideally, a 1 to 3 page abstract in the IJCAI Proceedings Format, 1000-3000 words),
3) a statement of interest (ideally, a one page motivation of why you would like to participate, 300-500 words).

Only submissions of type 1) and 2) will be considered for presentation at the workshop and inclusion in the workshop notes. Submissions should be received no later than April 1, 1997, and must include first author's complete contact information, including address, email, phone, and fax number. Though 1 April is the hard deadline, the authors are encouraged to submit their material by 24 March, in order to facilitate the reviewing process. Double submissions with the ILP-97 Workshop (which is to take place in Prague, September 1997) are allowed.

SUBMISSIONS

Submit papers by email (postscript) and surface mail (2 copies) to

Luc De Raedt
Dept. of Computer Science
Katholieke Universiteit Leuven
Celestijnenlaan 200A
B-3001 Heverlee
Belgium
Email: [email protected]

IMPORTANT DATES

- Paper submission: 1 April
- Notification to Authors: 21 April
- Camera ready copy: the submissions themselves will serve as camera ready copy (submissions in the IJCAI Proceedings Style are strongly preferred, see http://www.ijcai.org/ijcai-97/ for details)

PUBLICATION

The accepted submissions will be included in the workshop notes to be distributed at the workshop. Post-conference publication of a selection of the workshop papers will be considered and discussed at the workshop.

COSTS

To cover costs, a fee of $US 50 will be charged, in addition to the normal IJCAI-97 conference registration fee. Attendees of IJCAI workshops will be required to register for the main IJCAI conference.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subject: ILP-97: CFP Reminder Date: Mon, 17 Mar 1997 15:49:23 +0100 From: Saso Dzeroski <[email protected]> The Seventh International Workshop on Inductive Logic Programming 17-19 September 1997, Prague, Czech Republic The deadline for paper submissions is 31 March 1997. ------------- Invited talks will include: "Data Mining: Algorithms and Limitations" by Usama Fayyad, "Complexity of Logic Programming" by Georg Gottlob, and "ILP and CLP" by Jean-Francois Puget. For more information see http://www-ai.ijs.si/SasoDzeroski/ilp97.html
410.20	97:11	IJSAPL::OLTHOF	Spellchecked Henry Although	`Tue Apr 01 1997 08:39`	904
	Knowledge Discovery Nuggets 97:11, e-mailed 97-03-28 News: * GPS, KDD-97 Tutorials Program http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html * J. Wiegand, KDD tools/methods for detection of skin malignancies? Publications: * P. Vitanyi, Kolmogorov Complexity and Applications, 2nd ed., http://www.cwi.nl/~paulv/kolmogorov.html * R. Caldwell, Special Issue and Competition on Improving Generalization for Nonlinear Financial Forecasting Models http://ourworld.compuserve.com/homepages/ftpub/call.htm Positions: * V. Petraglia, Thinking Machines, Consultant Positions * M. Ramoni, Research Studentships at the Knowledge Media Institute Meetings: * G. Widmer, ECML-97 Preliminary Programme, 23-25 April 1997, Prague, Czech Republic http://is.vse.cz/ecml97/home.html * J. Han, SIGMOD-97 Data Mining Workshop, May 11, 1997 http://fas.sfu.ca/cs/conf/dmkd97.html * W. Wothke, Chicago ASA Data Mining meeting, May 2, 1997 http://www.smallwaters.com/datamine * GPS, Data Mining'97 : Increasing Corporate Performance, Paris, June 2-4, 1997, http://www.datamining.org/events.htm * S. Tafolla, XpertUser Conference, 2-5 November 1997, Boston, http://www.XpertUser.com -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Submissions may be edited for brevity. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), and a wealth of other information on Data Mining and Knowledge Discovery are available at Knowledge Discovery Mine site http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) ******************* Official disclaimer ******************************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers or of KD Nuggets *************************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The first and simplest emotion which we discover in the human mind, is curiosity. --Edmund Burke >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: 27 Mar 1997, 17:12:15 From: GPS <[email protected]> Subject: KDD-97 Tutorials KDD-97 conference will have a day of excellent tutorials by leading researchers-many thanks to P. Smyth for putting it together. See http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html for full details ================================================================ <title> KDD97 Tutorial Abstracts and Speakers </title> <h2> Tutorial 1: Data Mining and KDD: An Overview </h2> <h3> Usama Fayyad, Microsoft Research and Evangelos Simoudis, IBM. </h3> We present a basic tutorial of this new and emerging area and emphasize relations to constituent communities including statistics, databases, pattern recognition, learning, and visualization. The tutorial provides a basic overview of the KDD process for extracting knowledge from databases and covers the basics of each step in the process including: data warehousing, selection and cleaning, data transformation, data mining, evaluation, and visualization. We also cover a sampling of successful applications and outline challenges and issues to be addressed.<p> <hr> <h2> Tutorial 2: Modelling Data and Discovering Knowledge</h2> <h3> David Hand, Open University, UK. 
</h3>
Our aim is to extract knowledge from large bodies of data. The size of these bodies means that we cannot do it unaided, but must use fast computers, applying sophisticated statistical tools. Attempts to automate the process of knowledge extraction date from at least the early 1980s, with the work on statistical expert systems. We examine this work, noting its successes and failures and, especially, what researchers in data mining and knowledge discovery can learn from those efforts. We examine what data are, what information is, and what knowledge is. We contrast modelling with discovery, especially in the context of large data sets. We examine high level modelling issues, such as overfitting, generalisability, overmodelling, and model evaluation. And we examine high level exploration issues such as the discovery of accidental artefacts. The confluence of computing and statistics in some areas provides a nice backdrop against which to examine these issues, and we briefly discuss neural networks and classification trees from these two perspectives.<p>
<hr>
<h2> Tutorial 3: Text Mining - Theory and Practice</h2>
<h3> Ronen Feldman, Bar-Ilan University, Israel. </h3>
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. In this tutorial we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will describe generic techniques for text categorization and information extraction that are used by these systems.
The systems that will be presented are KDT, which is a system for Knowledge Discovery in Texts; FACT, which discovers associations amongst keywords labeling the items in a collection of textual documents; and the Text Explorer, which is a system that provides a high level language for interactive exploration of textual collections. We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. We will give special emphasis to incremental algorithms and to efficient data structures. <p>
<hr>
<h2> Tutorial 4: Exploratory Data Analysis using Interactive Dynamic Graphics </h2>
<h3> Deborah Swayne, Bell Communications Research and Diane Cook, Iowa State University. </h3>
Researchers and software designers in the field of data mining are just beginning to make extensive use of graphical methods. Interactive dynamic data visualization has been explored in the field of statistics for over twenty years, and we propose that much of what has been learned in statistics is relevant for data mining. This class is an introduction to interactive data visualization as it is practiced as part of exploratory data analysis. The XGobi software, publicly available dynamic visualization software, will be used in the analysis of examples from biology, business, physics, engineering, and telecommunications. The examples will illustrate a set of general visualization principles which are embodied in specific methods such as brushing and identification of points in simple scatterplots, three dimensional rotations, rotations in higher dimensions such as the grand tour, and directed searches in higher dimensions for interesting two dimensional views using projection pursuit and manual control.
<p>
<hr><h2> Tutorial 5: Visual Techniques for Exploring Databases </h2>
<h3> Daniel Keim, University of Munich.</h3>
For data exploration to be effective, it is important to include the human in the exploration process and combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and the computational power of today's computers. Visual database exploration aims at integrating the human in the exploration process, applying human perceptual abilities to the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to get insight into the data and draw conclusions. Visual data exploration techniques have proven to be of high value in exploratory data analysis and they also have a high potential for exploring large databases. Visual database exploration is especially powerful for the first steps of the data mining process, namely understanding the data and generating hypotheses about the data, but it may also significantly contribute to the actual knowledge discovery by guiding the search using visual feedback. The goal of the tutorial is to show the potential of visualization technology for exploring large databases. The tutorial provides an overview of the state-of-the-art in data visualization and provides a classification of the existing data visualization techniques. Besides describing each of the classes, the tutorial focuses on new developments in data visualization which are relevant to the area of knowledge discovery, and describes a wide range of recently developed techniques for visualizing large amounts of arbitrary multi-attribute data which does not have any two- or three-dimensional semantics and therefore does not lend itself to an easy display. A detailed comparison shows the strengths and weaknesses of the existing techniques and reveals potential for further improvements.
Several examples demonstrate the benefits of visualization techniques for exploring databases. The tutorial concludes with an overview of existing database exploration and visualization systems, including research prototypes as well as commercial products. <p> <hr><h2> Tutorial 6: OLAP and Data Warehousing</h2> <h3> Surajit Chaudhuri, Microsoft Research and Umesh Dayal, Hewlett Packard Labs. </h3> On-Line Analytical Processing (OLAP) and Data Warehousing technologies enable enterprises to gain competitive advantage by exploiting the ever-growing amount of data that is collected and stored in corporate databases and files for better and faster decision making. Over the past few years, these technologies have experienced explosive growth, both in the number of products and services offered, and in the extent of coverage in the trade press. Vendors (including all database companies) are paying increasing attention to all aspects of decision support. The area opens up interesting research directions, with ties to past work in database systems, but with different assumptions and requirements. Only very recently, however, has the database research community started to understand and address some of these issues. This tutorial presents an overview of OLAP and data warehousing, and an in-depth study of selected aspects. An outline of the tutorial follows: 1. Introduction: definitions, evolution, differences from OLTP, architectures 2. Models and Tools: conceptual model for OLAP, front-end tools (e.g., multidimensional spreadsheets), database design (e.g., star and snowflake schema). 3. Database Server technologies for Decision Support Queries: specialized indexing techniques, specialized join and scan methods, data partitioning and use of parallelism, intelligent processing of aggregates, complex query processing, extensions to SQL, ROLAP vs. MOLAP. 4. 
Other Services for OLAP/Data warehousing: data cleaning, loading and refresh, tools for warehouse, system and process management, metadata management and the role of repository.
5. State of Commercial Practice.
6. Research Issues.
The target audience is researchers and developers interested in learning about the concepts, products and the technical innovations in the area of decision support technologies. <p>
<hr><h2> Tutorial 7: Statistical Models for Categorical Response Data</h2>
<h3> William DuMouchel, AT&T Research. </h3>
This tutorial will survey the most common models and methods statisticians use to fit and test relationships among categorical (discrete) data. Most of these techniques are described in statistics texts such as <i> Categorical Data Analysis </i>, by Alan Agresti, (Wiley 1990) and are widely available in popular computer packages such as SAS and Splus. Therefore it is almost de rigueur for someone with a new classification technique to compare the proposal to one or more of these standard methods. The tutorial will focus on loglinear and logistic regression models, and related models such as probit, Poisson regression, and survival models. In the short time available, priority will be given to explaining why these techniques are so popular among statisticians, and to how the basic models have been extended to handle variables having more than two categories or when some of the variables have continuous or ordinal scales. Examples of model fitting, model search and model comparison using SAS and Splus will be presented and discussed.

For Biographical Information on Presenters see the web site http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html

Contact Information: <a href="http://www.ics.uci.edu/~smyth"> Padhraic Smyth </a> University of California, Irvine (KDD-97 Tutorials Chair).
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Fri, 21 Mar 1997 20:04:25 -0500 (EST) I am searching for KDD tools/approaches for searching through clinical data to help develop and fine-tune medical imaging or detection equipment. Specifically, early detection of skin malignancies. Perhaps there is a group somewhere working on this. Thank you. Best wishes, Jeff Wiegand >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 19 Mar 1997 15:48:16 +0100 From: [email protected] Ming Li and Paul Vitanyi, AN INTRODUCTION TO KOLMOGOROV COMPLEXITY AND ITS APPLICATIONS, REVISED AND EXPANDED SECOND EDITION, Springer-Verlag, New York, 1997, xx+637 pp, 41 illus. Hardcover $49.95/ISBN 0-387-94868-6 (Graduate Texts in Computer Science Series) After four years and two printings the second edition has now appeared. During the preparation the book has been out of stock for a year. In interaction with many readers and teachers of courses and seminars, all reported errors and problems have been corrected. The book is revised and expanded by about 90 pages. The price has been lowered by over $9. See the web page "http://www.cwi.nl/~paulv/kolmogorov.html". From the ``PREFACE TO THE SECOND EDITION'': When this book was conceived ten years ago, few scientists realized the width of scope and the power for applicability of the central ideas. Partially because of the enthusiastic reception of the first edition, open problems have been solved and new applications have been developed. 
We have added new material on the relation between data compression and minimum description length induction, computational learning, and universal prediction; circuit theory; distributed algorithmics; instance complexity; CD compression; computational complexity; Kolmogorov random graphs; shortest encoding of routing tables in communication networks; resource-bounded computable universal distributions; average case properties; the equality of statistical entropy and expected Kolmogorov complexity; and so on. Apart from being used by researchers and as reference work, the book is now commonly used for graduate courses and seminars. In recognition of this fact, the second edition has been produced in textbook style. We have preserved as much as possible the ordering of the material as it was in the first edition. The many exercises bunched together at the ends of some chapters have been moved to the appropriate sections. The comprehensive bibliography on Kolmogorov complexity at the end of the book has been updated, as have the ``History and References'' sections of the chapters. Many readers were kind enough to express their appreciation for the first edition and to send notification of typos, errors, and comments. Their number is too large to thank them individually, so we thank them all collectively. BLURB: Written by two experts in the field, this is the only comprehensive and unified treatment of the central ideas and their applications of Kolmogorov complexity---the theory dealing with the quantity of information in individual objects. Kolmogorov complexity is known variously as `algorithmic information', `algorithmic entropy', `Kolmogorov-Chaitin complexity', `descriptional complexity', `shortest program length', `algorithmic randomness', and others. The book is ideal for advanced undergraduate students, graduate students and researchers in computer science, mathematics, cognitive sciences, artificial intelligence, philosophy, statistics and physics. 
The book is self contained in the sense that it contains the basic requirements of computability theory, probability theory, information theory, and coding. Included are also numerous problem sets, comments, source references and hints to the solutions of problems, course outlines for classroom use, as well as a great deal of new material not included in the first edition. If you are seriously interested in using the text in the course, contact Springer-Verlag's Editor for Computer Science, Martin Gilchrist, for a complimentary copy. Martin Gilchrist [email protected] Suite 200, 3600 Pruneridge Ave. (408) 249-9314 Santa Clara, CA 95051 If you are interested in the text but won't be teaching a course, we understand that Springer-Verlag sells the book, too. To order, call toll-free 1-800-SPRINGER (1-800-777-4643); N.J. residents call 201-348-4033. For information regarding examination copies for course adoptions, write Springer-Verlag New York, Inc. , 175 Fifth Avenue, New York,NY 10010. You can order through the Web site: "http://www.springer-ny.com/" For U.S.A./Canada/Mexico- e-mail: [email protected] or fax an order form to: 201-348-4505. For orders outside U.S.A./Canada/Mexico send this form to: [email protected] Or call toll free: 800-SPRINGER - 8:30 am to 5:30 pm ET (that's 777-4643 and 201-348-4033 in NJ). Write to Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY, 10010. Visit your local scientific bookstore. Mail payments may be made by check, purchase order, or credit card (see note below). Prices are payable in U.S. currency or its equivalent and are subject to change without notice. Remember, your 30-day return privilege is always guaranteed! Your complete address is necessary to fulfill your order. 
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Randall Caldwell <[email protected]> Subject: CFP: Improving Generalization for Nonlinear Financial Forecasting Models Journal of Computational Intelligence in Finance Call for Papers Special Issue and Competition on "Improving Generalization for Nonlinear Financial Forecasting Models" The Journal of Computational Intelligence in Finance, a peer-reviewed technical journal, published by Finance & Technology Publishing, is seeking papers for review and publication in 1997 on "Improving Generalization for Nonlinear Financial Forecasting Models". For comparison of methods submitted, the target variable series and performance metrics are specified (though not required). PUBLICATION DATE November 1997 PAPER SUBMISSION DEADLINE June 30, 1997 MOTIVATION The critical issue in applying neural networks and other data-driven forecasting systems is generalization, the performance on data not used for training. The key to generalization behavior is model complexity. Too simple a model cannot approximate the true relationship, and overly complex models adjust to the noise in the data. Nearly all financial applications of nonparametric models (such as neural networks and genetic algorithms) vary model complexity by adjusting the number of parameters. This special issue intends to highlight other methods to improve generalization, in particular regularization (e.g., neural network weight decay and smoothing) and techniques for combining models. Of particular interest are nonlinear methods including neural networks, genetic algorithms, nearest neighbor networks, polynomial networks, fuzzy logic, and hybrids. Nearly all studies apply cross-validation to select the best model. Alternatives to cross-validation include 'analytical' selection rules such as Akaike's Information Criterion, Schwarz's Information Criterion, and a number of others. 
Of particular interest are the statistical properties (i.e., bias and variance) of model selection methods in estimating out-of-sample performance. DATA, TARGET VARIABLES and PERFORMANCE METRICS Data: daily prices of a financial time series (see below) Target Variable: the relative difference in percent (RDP) between today's closing price and the price five (5) days ahead Performance Metrics: MSE (target); nRMSE and DS (to be used in the analysis). Participants are encouraged to use the forecast data, target variable and performance metrics specified for this special issue, which are available on the Web to those who submit a satisfactory abstract (including brief biography) as outlined below. Participants are not restricted regarding the data used as inputs to their predictors. Especially interesting original methods using other forecast data, target variables and performance metrics will also be considered. The forecast series is derived from daily closing prices for a financial time series. The target variable is the relative difference in percent (RDP) between today's closing price and the closing price five (5) days ahead. The date, the underlying price series and the target variable series are all provided in the downloadable data file. The target metric is the MSE. Also, authors' analysis should include the normalized RMSE (RMSE normalized using the standard deviation of actual RDP values), and Directional Symmetry (percentage of correctly predicted directions with respect to the target variable). The forecast data provided is separated into in-sample (10 years of daily data) and out-of-sample (2 years of daily data) sets. Participants are not restricted regarding the data used as input to their predictors. However, all data used should be disclosed in the paper presentation, including the details of all techniques and formulas used to pre-process the data. 
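The target variable and the three metrics described above can be sketched in a few lines of Python. This is only an illustration, not the journal's reference implementation: the call for papers gives no explicit formulas, so the sketch assumes that RDP is measured relative to today's closing price, that the normalizing standard deviation is the population standard deviation of the actual RDP values, and that "direction" means the sign of the RDP. All function names are hypothetical.

```python
import math
from statistics import pstdev


def rdp_target(prices, horizon=5):
    """Relative difference in percent between the close at t and at t + horizon.

    Assumption: the difference is taken relative to today's close p[t].
    """
    return [100.0 * (prices[t + horizon] - prices[t]) / prices[t]
            for t in range(len(prices) - horizon)]


def mse(actual, predicted):
    """Mean squared error between actual and predicted RDP values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the standard deviation of the actual RDP values.

    Assumption: population standard deviation (the text does not say which).
    """
    return math.sqrt(mse(actual, predicted)) / pstdev(actual)


def directional_symmetry(actual, predicted):
    """Percentage of forecasts whose sign matches the sign of the actual RDP.

    Assumption: an RDP of exactly zero is counted as a non-negative move.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if (a >= 0) == (p >= 0))
    return 100.0 * hits / len(actual)


if __name__ == "__main__":
    closes = [100, 100, 100, 100, 100, 110, 90]   # made-up prices
    actual = rdp_target(closes)                    # two 5-day-ahead targets
    forecast = [8.0, -6.0]                         # made-up predictions
    print(actual, mse(actual, forecast),
          nrmse(actual, forecast), directional_symmetry(actual, forecast))
```

With these definitions, an nRMSE below 1 would mean the predictor beats a constant forecast equal to the mean of the actual RDP series, which is one reason the normalization is useful alongside the raw MSE.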
Details on the predictor and the methods used for improving generalization should be presented in the paper. FORECAST HORIZON AND RE-TRAINING Participants should test performance of their predictors over the entire two-year out-of-sample dataset. Of interest are results of analyses and performance of predictors over the entire two-year prediction period: (1) without re-training and (2) with re-training (optional). The results from (1) and (2) can be useful for estimating the limits of the forecasting horizon for the prediction methods presented. For additional details on the forecast data, target variable and performance metrics, see: http://ourworld.compuserve.com/homepages/ftpub/call.htm >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 21 Mar 1997 11:07:08 -0500 From: Vaughn Petraglia <[email protected]> Subject: Thinking Machines, Consultant Positions Thinking Machines Professional Services Senior Consultant Data Mining San Francisco bay area and other locations 3/12/97 As a member of the new Thinking Machines Professional Services Organization, you will be responsible for all aspects of bidding and delivering consulting products and services to many of our most important customers. You will lead or participate in small teams of seasoned professionals to help our customers use Darwin to find new business opportunities hidden in their very large databases and data warehouses. Major job functions include: 1. Working with a TMC Account Executive to understand the customer's or prospect's requirements and providing technical guidance through the sales cycle. 2. Developing project plans, risk analyses, and formal services bids. 3. Organizing and managing all resources needed to complete the project within budget and on time. 4. Providing hands-on data analysis and data mining consulting. 5. Consulting and skills transfer on the Darwin product. 6. Follow-up to ensure customer satisfaction. The ideal candidate will have: 1. 
Project management experience. 2. Excellent written and oral communications skills. 3. Advanced degree in an analytical field or equivalent experience. 4. Experience in data analysis, database systems, knowledge based systems or data mining. 5. Experience in parallel algorithms and parallel computer systems is desirable. Contact: Vaughn Petraglia [email protected] Thinking Machines 14 Crosby Dr. Bedford, Ma 01730 >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 21 Mar 1997 18:45:32 +0000 From: Marco Ramoni <[email protected]> Subject: Research Studentships at the Knowledge Media Institute The Knowledge Media Institute (KMi) is home to internationally recognised researchers in Educational Multimedia, Collaboration Technologies, Artificial Intelligence, Cognitive Science, and Human-Computer Interaction. KMi offers students an intellectually challenging environment with exceptional research and computer facilities. We are currently seeking applications for full-time, 3-year research studentships in the following areas: - Migratory Interfaces and Mobile Computing - Virtual Intelligence and Knowledge Discovery - Knowledge Management and Knowledge Modelling - Sharing and Reusing Design Knowledge over the WWW Applicants are typically expected to have a degree in computer science, artificial intelligence, cognitive science, psychology, or a related discipline. As KMi only accepts a very small number of research students per year, admission is highly competitive. To apply, send a CV and short project proposal (3 pages) along with a completed application form. Successful candidates must be willing to live within reasonable commuting distance from Milton Keynes, and be available to start on October 1, 1997. Applicants are strongly encouraged to visit the KMi web site (http://kmi.open.ac.uk/studentships) for more information on ongoing KMi projects and the studentships. An application form with further particulars can be obtained by contacting Ms. 
Ortenz Rose by email ([email protected]), telephone (+44 (1908) 653 800) or post (Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK). Informal advice on these studentships can be obtained by contacting Dr. Tamara Sumner, admissions co-ordinator, by email at [email protected] or by telephone at the number above. Closing date for applications: 18 April 1997 Further particulars are attached below. Virtual Intelligence and Knowledge Discovery Marco Ramoni (KMi) http://kmi.open.ac.uk/~marco The Virtual Intelligence Project and the Knowledge Discovery Project at the Knowledge Media Institute seek a candidate PhD student to work at the intersection of their areas of research. The Virtual Intelligence Project focuses on the development of distributed Artificial Intelligence applications over the World Wide Web. The Knowledge Discovery Project investigates probabilistic and statistical methods to extract reusable knowledge sources from databases. The PhD project will fall into their joint effort to develop a distributed knowledge discovery architecture over the World Wide Web. The successful candidate will be able to choose a research topic among a variety of key issues underlying this research, ranging from methodological aspects of knowledge extraction and distributed artificial intelligence to design and development issues of the architecture. More information on the Virtual Intelligence Project is available at: http://kmi.open.ac.uk/~marco/projects/wai/vip More information on the Knowledge Discovery Project is available at: http://kmi.open.ac.uk/~marco/projects/kdd For more information on this studentship, contact Marco Ramoni at [email protected]. 
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 17 Mar 1997 15:21:43 +0100 (MET) From: Gerhard Widmer <[email protected]> Subject: ECML-97 Preliminary Programme 9th EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-97) 23-25 April 1997, Prague, Czech Republic PRELIMINARY PROGRAMME Up-to-date information on the conference (including registration information) can be found at http://is.vse.cz/ecml97/home.html This programme with complete abstracts of all talks and links to the workshops is also available at http://www.ai.univie.ac.at/ecml/programme.html ----------------------------------------------------------------------------- -------------------- WEDNESDAY, APRIL 23: 9.00 - 9.30 Welcome 9.30 - 10.30 INVITED TALK: Uncertain Learning Agents Stuart Russell, University of California, Berkeley, USA 10.30 - 11.00 Coffee Break 11.00 - 11.30 Integrated Learning and Planning Based on Truncating Temporal Differences Pawel Cichosz 11.30 - 12.00 Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems Remi Munos 12.00 - 12.15 Learning and Exploitation Do Not Conflict Under Minimax Optimality Csaba Szepesvari 12.15 - 12.30 Exploiting Qualitative Knowledge to Enhance Skill Acquisition Cristina Baroglio 12.30 - 14.00 Lunch 14.00 - 15.00 INVITED TALK: Constructing and Sharing Perceptual Distinctions Luc Steels, Free University of Brussels (VUB) and Sony Computer Science Laboratory, Paris 15.00 - 15.30 Ibots Learn Genuine Team Solutions Cristina Versino, Luca Maria Gambardella 15.30 - 16.00 Coffee Break 16.00 - 16.30 NeuroLinear: A System for Extracting Oblique Decision Rules from Neural Networks Rudy Setiono, Huan Liu 16.30 - 17.00 Learning Different Types of New Attributes by Combining the Neural Network and Iterative Attribute Construction Yuh-Jyh Hu 17.00 - 17.45 Commenting Session ------------------- THURSDAY, APRIL 24: 9.00 - 10.00 INVITED TALK: On Prediction by Data Compression Paul Vitanyi, CWI, 
Amsterdam 10.00 - 10.30 Conditions for Occam's Razor Applicability and Noise Elimination Dragan Gamberger, Nada Lavrac 10.30 - 11.00 Coffee Break 11.00 - 11.30 Compression-Based Pruning of Decision Lists Bernhard Pfahringer 11.30 - 11.45 Inductive Genetic Programming with Decision Trees Nikolay I. Nikolaev, Vanio Slavov 11.45 - 12.00 Probabilistic Incremental Program Evolution: Stochastic Search Through Program Space Rafal Salustowicz, Juergen Schmidhuber 12.00 - 12.30 Constructing Intermediate Concepts by Decomposition of Real Functions Janez Demsar, Blaz Zupan, Marko Bohanec, Ivan Bratko 12.30 - 14.00 Lunch 14.00 - 14.30 Global Data Analysis and the Fragmentation Problem in Decision Tree Induction Ricardo Vilalta, Gunnar Blix, Larry Rendell 14.30 - 15.00 Model Combination in the Multiple-Data-Batches Scenario Kai Ming Ting, Boon Toh Low 15.00 - 15.30 Commenting Session 15.30 - 16.00 Coffee Break 16.00 - 17.00 Poster Session 17.00 - open ECML Community Meeting ----------------- FRIDAY, APRIL 25: 9.00 - 9.15 A Case Study in Loyalty and Satisfaction Research Koen Vanhoof, Josee Bloemer, Koen Pauwels 9.15 - 9.30 Inducing and Using Decision Rules in the GRG Knowledge Discovery System Ning Shan, Howard J. Hamilton, Nick Cercone 9.30 - 9.45 Learning When Negative Examples Abound Miroslav Kubat, Robert Holte, Stan Matwin 9.45 - 10.00 Search-Based Class Discretization Luis Torgo, Joao Gama 10.00 - 10.15 Classification by Voting Feature Intervals G"ulsen Demir"oz, H. 
Altay G"uvenir 10.15 - 10.30 A Model for Generalization Based on Confirmatory Induction Nicolas Lachiche, Pierre Marquis 10.30 - 11.00 Coffee Break 11.00 - 11.30 Natural Ideal Operators in Inductive Logic Programming Fabien Torre, Celine Rouveirol 11.30 - 12.00 Theta-subsumption for Structural Matching Luc De Raedt, Peter Idestam-Almquist, Gunther Sablon 12.00 - 12.30 Induction of Feature Terms with INDIE Eva Armengol, Enric Plaza 12.30 - 12.45 Metrics on Terms and Clauses Alan Hutchinson 12.45 - 13.00 Learning Linear Constraints in Inductive Logic Programming Lionel Martin, Christel Vrain Afternoon off - trip and farewell party (optional; see social programme) ------------------ SATURDAY, APRIL 26: ECML/MLNet WORKSHOPS: WS 1: Data-Driven Learning of Natural Language Processing Tasks WS 2: Case-Based Learning: Beyond Classification of Feature Vectors WS 3: Learning in Dynamically Changing Domains: Theory Revision and Context Dependence Issues WS 4: Machine Learning and Human-Agent Interaction >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Jiawei Han <[email protected]> Date: Tue, 18 Mar 1997 22:05:37 -0800 (PST) Subject: SIGMOD'97 Data Mining Workshop: Call for Participation Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'97) in cooperation with ACM-SIGMOD'97 Tucson, Arizona, May 11, 1997 (URL: http://fas.sfu.ca/cs/conf/dmkd97.html) PROGRAM The workshop will be held one day before the SIGMOD/PODS'97 conference. The program is as follows: 8:30--8:35 Opening Remarks 8:35--9:30 Invited Talk 9:30--9:45 Coffee Break 9:45--11:00 Session I Clustering/Classification A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining Zhexue Huang Clustering Based On Association Rule Hypergraphs Eui-Hong Han, George Karypis, Vipin Kumar and Bamshad Mobasher Ontology-based Induction of High Level Classification Rules Merwyn G. Taylor, Kilian Stoffel and James A. 
Hendler 11:00--11:15 Coffee Break 11:15--12:30 Session II Applications An efficient domain-independent algorithm for detecting approximately duplicate database records Alvaro E. Monge and Charles P. Elkan An Application of Adaptive Data Mining: Facilitating Web Information Access Parvathi Chundi and Umeshwar Dayal Efficient Roll-Up and Drill-Down Analysis for Large Data Sets Min Wang and Bala Iyer 12:30--14:15 Lunch, Posters, Demos 14:15--15:30 Session III Association Rules Mining Association Patterns from Nested Databases Ke Wang Maintenance of Discovered Association Rules: When to update? S.D. Lee and David W. Cheung Efficient Algorithms for Discovering Frequent Sets in Incremental Databases Ronen Feldman, Yonatan Aumann, Amihood Amir and Heikki Mannila 15:30--15:45 Coffee Break 15:45--17:00 Session IV Miscellany Sharing Processing in Data Mining Systems Arun Swami and Brian Lent A Pattern Discovery Algebra Alexander Tuzhilin On the Complexity of Mining Temporal Trends Jef Wijsen and Robert Meersman 17:00-18:00 Summary Discussion >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 20 Mar 1997 09:29:50 -0600 From: Werner Wothke <[email protected]> Subject: Chicago ASA Data Mining Conference, May 2, 1997 The Chicago Chapter of the American Statistical Association is presenting a Data Mining conference on May 2, titled A Hard Look at Data Mining The idea of the conference is to peel away most of the hype and present the local statistical and data analysis community with some solid technical and statistical information. 
A web site with additional information can be found at http://www.smallwaters.com/datamine With best wishes, Werner Wothke >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 20 Mar 1997 17:48:34 -0500 From: Gregory Piatetsky-Shapiro <[email protected]> Subject: Paris Data Mining'97 Event, June 2-4 See http://www.datamining.org/events.htm for full information <h2 align=center>Data Mining'97 : Increasing Corporate Performance</h2> <h2 align=center>Meridien Montparnasse Hotel, Paris, June 2-4, 1997</h2> <h3>THE DATA MINING MARKET : TRENDS AND EVOLUTION</h3> <dl> <li>Market and players <li>Perspectives and trends : Data Mining in 2000 and beyond <li>Mining the Net : maximizing external data retrieval and analysis <li>Data Mining and the law : situation and perspectives </dl> <h3>INTRODUCTION TO DATA MINING</h3> <dl> <li>More than a media phenomenon, what are the real issues for data mining ? <li>Corporate data bases : retrieval and output <li>The latest technologies <li>Technology-human interface </dl> <h3>DATA MINING BEST PRACTICE</h3> <dl> <li>Data warehousing, On Line Analytical Processing and data mining <li>Data and their representation for data mining <li>Optimizing access to stored information <li>Utilizing data mining to further management strategies <li>Using data mining to measure corporate performance </dl> <h3>DATA MINING APPLICATIONS</h3> <dl> <li>Direct marketing and data mining : customer satisfaction and retention <li>Geomarketing and data mining <li>Marketing strategy and data mining : optimizing a commercial network <li>Finance and data mining : credit management and risk assessment <li>Adapting to changing markets through implementing data mining processes in all fields of business </dl> <p><strong>A unique opportunity to meet your potential customers and peers and hear the latest from the competition !</strong></p> <p>This forum will be a premier opportunity to network & exchange business cards with 
CEOs, VPs, and managers of : <dl> <li>Finance <li>Marketing <li>Sales <li>Strategic Planning <li>Information Systems <li>Advertising above and below the line </dl> In the fields of : <dl> <li>Financial services <li>Insurance <li>Mail order companies <li>Retail <li>Healthcare <li>Computing, Telecommunications <li>Government <li>Transport and logistics </dl> </p> <p><strong>This Conference will be a premiere in Europe. Come join us in Paris!</strong></p> <p>For further information and registration, please contact us at <a href='mailto:[email protected]'>[email protected]</a></p> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Sun, 23 Mar 1997 17:05:26 -0800 KNOWLEDGE ACCELERATION The 1997 XpertUser Conference 2 - 5 November 1997 Boston, Massachusetts http://www.XpertUser.com In support of its XpertRule(r) and Profiler(tm) products, Attar Software announces its 1997 XpertUser Conference entitled: "Knowledge Acceleration." The Conference, to be held in Boston, MA, 2 - 5 November 1997, features a keynote address by Professor Donald Michie, a pioneer in the field of Machine Intelligence. In addition, there are planned tutorials on data mining and knowledge engineering as well as application demonstrations, and technical sessions with Dr. Akeel Al-Attar, and other experts from Attar's world-wide customer base. The Conference web page is at http://www.XpertUser.com. The registration fee is $695 until 1 July, when it increases to $895.
410.21	97:12	IJSAPL::OLTHOF	Spellchecked Henry Although	`Wed Apr 23 1997 11:45`	459
	Knowledge Discovery Nuggets 97:12, e-mailed 97-04-10 News: * E. Colet, Advanced Scout News -- http://www.nextstep.com/new_this_week/120/advancedscout.html * A. Andrusiewicz, Query -- Mining Association Rules Publications: * H. Motoda, Final CFP: IEEE Expert Special Issue on Feature Transformation and Subset Selection Siftware: * O. Leng, WinViz for Excel, http://jsaic.iti.gov.sg/projects/vizMain.html Positions: * W. Jones, Knowledge Discovery Research at U. of Alabama at Birmingham (UAB), http://www.cis.uab.edu/info/kdrg/kdrg.html * R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore http://www.nsrc.nus.sg Meetings: * R. Tibshirani, Modern Regression and Classification course, New York , June 23-24, 1997 http://stat.stanford.edu/~trevor/mrc.finance.html * PADD97, Practical Application of Knowledge Discovery and Data Mining Conference Program, London, 23-25 April 1997, http://www.demon.co.uk/ar/PADD97/ * M. Conkling, Data Warehousing Best Practices & Implementation Conference Chicago May 27-June 1, 1997, http://www.dw-institute.com/ * GPS, Data Mining'97 : Increasing Corporate Performance, Paris, June 2-4, 1997, cancelled -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. 
To subscribe, see http://www.kdnuggets.com/subscribe.html Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ***************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ No matter how neutral the topic, your message will offend SOMEONE. Murphy's laws of BBS, thanks to http://www.calweb.com/~logon/murphy.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Edward Colet"<[email protected]> Date: Wed, 26 Mar 1997 16:30:56 -0400 Subject: Advanced Scout Readers may be interested in some recent updates on the data mining/KDD work of IBM Research's Advanced Scout Project (the data mining application used in the National Basketball Association). These can be found in newspapers, TV, the web and the SIGMOD/PODS schedule. Specifically, the press coverage of Advanced Scout appeared in the Los Angeles Times, 2/17/97, page C4. Also, the TV show, "NextStep" showed a feature on Advanced Scout that aired in the San Francisco area on 3/8/97. A broadcast of this feature will air nationwide on the Discovery channel at a later date. The URL for the NextStep feature called "Hard-wired Hoops" can be found at : http://www.nextstep.com/new_this_week/120/advancedscout.html Also available on the Web is an online posting containing the abstract and bio for the keynote address on data mining at SIGMOD/PODS, 1997 to be given by Inderpal. The URL is: http://mundos.ifsm.umbc.edu/~ramesh/sigmod97/advprog.html. It's accessible from within both the SIGMOD or the PODS schedules. 
Thanks, Ed Colet. ***************************************** IBM T.J. Watson Research Center 30 Saw Mill River Road Hawthorne NY 10532 phone: 914-784-6621; tie-line 863 fax: 914-784-7455 email: [email protected] ***************************************** >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 27 Mar 1997 12:04:21 +1000 (EST) From: Anna Andrusiewicz <[email protected]> Hi, I am working on a problem that may be related to mining generalized association rules. The basic problem involves mining student enrolment histories in order to figure out what subjects are being taken by what kinds of students. I would like to conduct a case study on the enrolments data I have, and was wondering if anyone knows of a public domain system for mining association, or multi-level association rules. Any help offered will be much appreciated - thank you, Anna Andrusiewicz School of Information Technology The University of Queensland, Australia >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Subject: Final Call for Papers: IEEE Special Issue Date: Sat, 29 Mar 97 17:13:06 +0900 Final Call For Papers IEEE Expert Special Issue on Feature Transformation and Subset Selection Guest Editors: Huan Liu and Hiroshi Motoda (edited for space ... see Nuggets 96:37 for full CFP http://www.kdnuggets.com/nuggets/96/n37.html#item4) III. SUBMISSION REQUIREMENTS and SCHEDULE High quality, original papers that deal with real-world problems are solicited. All the submitted manuscripts will be subject to a rigorous review process. Manuscripts should be prepared in accordance with the IEEE Expert "submission guidelines". Manuscripts should be approximately 5,000 words long, preferably not exceeding 10 references. This special issue is scheduled to appear in late 1997. 
Important Dates: Submission April 30 (FIRM DEADLINE) Notification June 30 Prospective authors should submit six copies of the completed manuscript to one of the guest editors: Huan Liu Hiroshi Motoda S16 #4-17 Institute of Scientific & Industrial Dept of Info Sys & Comp Sci Research National University of Singapore Osaka University Kent Ridge, Singapore, 119260 Ibaraki, Osaka 567, Japan [email protected] [email protected] >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sat, 29 Mar 1997 12:08:21 +0800 From: Ong Hwee Leng <[email protected]> Subject: WinViz for Excel A version of WinViz which runs with Excel 7.0 on Win95 is available for sale. WinViz is a multi-dimensional visualisation tool developed at the Information Technology Institute. More info & self-running demos can be found at http://jsaic.iti.gov.sg/projects/vizMain.html -Hwee-Leng Ong >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 24 Mar 1997 09:39:26 +0600 From: [email protected] (Warren Jones) Knowledge Discovery Research at University of Alabama at Birmingham (UAB) URL:http://www.cis.uab.edu/info/kdrg/kdrg.html This multidisciplinary research group is concentrating on healthcare applications, specifically on surveillance problems. The group consists of representatives from Computer and Information Sciences, Pathology and Health Informatics. A tool called Hawkeye has been developed which searches temporally organized medical data, builds associations and applies interestingness heuristics for the identification of trends of interest to medical domain experts. Hawkeye is also an example of a large scalable KDD system which requires the utilization of all stages of the KDD process. One of the important surveillance problems being investigated is the spread of antibiotic resistance. This Group provides a very attractive opportunity for UAB computer science graduate students to become involved in KDD research with a medical emphasis. Four Ph.D. 
students are currently associated with the Group and its on-going research. Graduate Assistantships are available for prospective Ph.D.students who are interested in entering the program Fall 1997 with a research interest in the directions of the Knowledge Discovery Research Group. UAB is a comprehensive urban institution in Alabama's largest city of almost a million population. Student enrollment exceeds 16,400, including more than 3,500 graduate students. The Academic Health Center is well-known for its interdisciplinary biomedical research. The computer science graduate program has an enrollment of 50, half of which are Ph.D. students. The campus encompasses a seventy-block area on Birmingham's Southside, offering all of the advantages of a university within a major city. Warren T. Jones, Ph.D. Chair Department of Computer and Information Sciences University of Alabama at Birmingham Birmingham, AL 35294-1170 Ph: (205)934-8657 Fax: (205)934-5473 [email protected] >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Robert Straughan <[email protected]> Subject: Senior Consultant in Data Mining at NSRC in Singapore Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT) Staff Title: Group Leader - Senior Consultant, Commercial Applications Date Required: 1 June 1997 Job Description: National Supercomputing Research Centre (NSRC) is Singapore's national centre for High Performance Computing (HPC). NSRC currently facilitates services and solutions to the Singapore industry in the field of Computer Aided Engineering, Chemical Applications and Electronics. Commercial Applications has been identified as a new growth area, where HPC can make a significant impact on the commercial industries' competitiveness. NSRC has therefore decided to expand into this field and is currently looking for a person with extensive industrial experience in the field of Data Mining within finance, banking, insurance, or retail marketing. 
The Group Leader shall take overall responsibility in promoting NSRC's capabilities within the field of Data Mining to the commercial industry in Singapore and to solicit for business. The Group Leader shall work closely with NSRC's existing staff within this field to develop the best possible strategy to target potential commercial organisations. Skills Required: Minimum Masters Degree. Specialisation within the field of Computer Science and Business Administration. At least 5 years experience from a financial institution or in retail marketing within the field of Data Mining / Data Analysis. Extensive managerial experience, in particular project management, business analysis and negotiation skills. Strong knowledge of statistical analysis and selection / building of appropriate modelling techniques to solve business problems. A good understanding of the algorithms used in Data Mining (neural networks, classifications etc.). Have previously used IBM SP2 and tools such as Intelligent Miner and Darwin as well as statistical packages such as SAS and SPSS. Relocation assistance, allowances for housing, children's education and transportation apply. Salary will be commensurate with qualifications and experience. You can obtain more details by contacting [email protected] or visit our web site at http://www.nsrc.nus.sg. 
Resumes can be sent to:

Administration Manager
NSRC
89 Science Park Drive
The Rutherford #01-05/08
Singapore 118261
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Sun, 23 Mar 97 22:45 EST
Subject: Modern Regression and Classification course - New York

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++                                                        +++
+++   Modern Regression and Classification:                +++
+++                                                        +++
+++   Statistical prediction methods for finance           +++
+++   and marketing                                        +++
+++                                                        +++
+++                                                        +++
+++   New York City: June 23-24, 1997                      +++
+++                                                        +++
+++   Trevor Hastie, Stanford University                   +++
+++   Rob Tibshirani, University of Toronto                +++
+++                                                        +++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This two-day course will give a detailed overview of statistical models for regression and classification. Known as machine-learning in computer science and artificial intelligence, and pattern recognition in engineering, this is a hot field with powerful applications in finance, science and industry.

This course covers a wide range of models from linear regression through various classes of more flexible models to fully nonparametric regression models, both for the regression problem and for classification.

This special version of our popular MRC course is tailored to financial and marketing professionals. Although a firm theoretical motivation will be presented, the emphasis will be on practical applications and implementations, especially in the finance and marketing areas. The course will include many examples and case studies, and participants should leave the course well-armed to tackle real problems with realistic tools. The instructors are at the forefront in research in this area.

After a brief overview of linear regression tools, methods for one-dimensional and multi-dimensional smoothing are presented, as well as techniques that assume a specific structure for the regression function.
These include splines, wavelets, additive models, MARS (multivariate adaptive regression splines), projection pursuit regression, neural networks and regression trees. All of these can be adapted to the time-series framework for predicting future trends from the past. The same hierarchy of techniques is available for classification problems. Classical tools such as linear discriminant analysis and logistic regression can be enriched to account for nonlinearities and interactions. Generalized additive models and flexible discriminant analysis, neural networks and radial basis functions, classification trees and kernel estimates are all such generalizations. Other specialized techniques for classification including nearest- neighbor rules and learning vector quantization will also be covered. Apart from describing these techniques and their applications to a wide range of problems, the course will also cover model selection techniques, such as cross-validation and the bootstrap, and diagnostic techniques for model assessment. Software for these techniques will be illustrated, and a comprehensive set of course notes will be provided to each attendee. Additional information is available at the Website: http://stat.stanford.edu/~trevor/mrc.finance.html ******************************************************** Some quotes from past attendees: "... the best presentation by professional statisticians I have ever had the pleasure of attending" "Superior to most courses in all aspects" "I really liked how you emphasized concepts rather than mathematical expressions" "Your 2-day course has saved me months of research" *********************************************************** ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Rob Tibshirani, Dept of Preventive Med & Biostats, and Dept of Statistics Univ of Toronto, Toronto, Canada M5S 1A8. Phone: 416-978-4642 (PMB), 416-978-0673 (stats). 
FAX: 416 978-8299
computer fax 416-978-1525 (please call or email me to inform)
[email protected].
ftp: //utstat.toronto.edu/pub/tibs
http://www.utstat.toronto.edu/~tibs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Mon, 31 Mar 1997 13:15:16 -0500 (EST)
Subject: PADD97

PADD97 - The First International Conference and Exhibition on
The Practical Application of Knowledge Discovery and Data Mining
================================================================

23rd April - 25th April 1997

REGISTRATION http://www.demon.co.uk/ar/Expo97/
INFORMATION  http://www.demon.co.uk/ar/PADD97/

TUTORIALS

Usama Fayyad, Microsoft Research, USA
Evangelos Simoudis, IBM, USA
   Data Mining and the KDD Process

Blaise Egan, Huw Roberts, BT Laboratories, UK
   Knowledge Discovery - Practical Methodology and Case Studies

Luc De Raedt, Catholic University of Leuven, Belgium
   Principles and Practice of Inductive Logic Programming

INVITED SPEAKERS

Stephen Muggleton, Oxford University, UK
   Declarative Knowledge Discovery in Industrial Databases

Usama Fayyad, Microsoft Research, USA
   Data Mining: Algorithms, Challenges and Limitations

Xindong Wu, Monash University, Australia
   Building Intelligent Learning Database Systems

Stephen Pass, Red Brick Systems, UK
   Data Mining and Data Warehouses - The Power of Integration

Neil Mackin, White Cross Systems, UK
   The Application of WhiteCross MPP Servers to Data Mining

PRACTICAL APPLICATION EXPO97
==============================
CONFERENCE REGISTRATION
=========================

Westminster Central Hall, London, 21-25 April, 1997

PADD97 is part of The Practical Application EXPO97 which brings together four events under one roof: PAAM97 - The Practical Application of Intelligent Agents and Multi-Agents; PADD97- The Practical Application of Knowledge Discovery and Data Mining; PACT97-The Practical
Application of Constraint Technology and PAP97-The Practical Application of Prolog.

REGISTRATION NOW AVAILABLE AT http://www.demon.co.uk/ar/Expo97/

PLEASE VISIT OUR WEB PAGES FOR FURTHER INFORMATION ON

Programmes
Tutorials
Invited Talks
Exhibition
Venue
Hotel reservations

http://www.demon.co.uk/ar/PAP97/
http://www.demon.co.uk/ar/PACT97/
http://www.demon.co.uk/ar/PAAM97/
http://www.demon.co.uk/ar/PADD97/

The Practical Application Company
PO Box 137
Blackpool
Lancs FY2 9UN
UK
Tel: +44 (0)1253 358081
Fax: +44 (0)1253 353811
email: [email protected]
WWW: http://www.demon.co.uk/ar/TPAC/
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 31 Mar 97 12:50:10 -0600 (CST)
From: Melinda Conkling <[email protected]>
Subject: Data warehousing event

Hi -- The Data Warehousing Institute (www.dw-institute.com) is holding its Best Practices & Implementation Conference in Chicago May 27-June 1, 1997. All conference information (including how to register) can be found on-line.

Thanks! -- Melinda
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 1997 17:48:34 -0500
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: Paris Data Mining'97 Event, June 2-4 -- cancelled

I have been informed by Gaelle Piernikarch, organizer of the above conference, that it has been cancelled and may be rescheduled for fall.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.22	97:13	IJSAPL::OLTHOF	Spellchecked Henry Although	`Wed Apr 23 1997 11:46`	655
	Knowledge Discovery Nuggets 97:13, e-mailed 97-04-16 News: * GPS, new address for subscribing to KD nuggets, [email protected] * G. Prisco, Query: Knowledge Discovery in Network Alarm Databases Publications: * J. Fuernkranz, AAI Spec Issue on First-Order Knowledge Discovery in Databases, http://www.ai.univie.ac.at/ilp_kdd/aai-si.html * T. Anand, Review of "Seven Methods for Transforming Corporate Data into Business Intelligence" by Vasant Dhar and Roger Stein * S. Kaski, Thesis on data exploration with SOMs available, http://nucleus.hut.fi/~sami/thesis/thesis.html Siftware: * L. Zoob, SemioMap, the Discovery Search Application http://www.semio.com * S.D. BYERS, new version of ace.glm for Splus http://lib.stat.cmu.edu/S/ace.glm Positions: * R. Straughan, Senior Consultant in Data Mining at NSRC in Singapore http://www.nsrc.nus.sg * N. Dayanand, Manager of the Data Analysis and Applications group http://www.think.com Meetings: * J. Komorowski, PKDD'97 -- Preliminary symposium program, http://www.idt.ntnu.no/pkdd97/ * ICML-Colt, ICML-97/Colt-97 call for participation http://cswww.vuse.vanderbilt.edu/~mlccolt/ * X. Wu, CFP: IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97), Nov 3, 1997, Newport Beach, CA, USA http://www.sd.monash.edu.au/kdex-97 * M. Smyth, Hinton -- Jordan Learning Methods course: spaces still available, http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/ -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more are available at the Knowledge Discovery Mine site at http://www.kdnuggets.com/

-- Gregory Piatetsky-Shapiro (editor) [email protected]

******************* Official disclaimer ***********************
All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets)
*******************************************************************

~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 is not equal to 3 - not even for very large values of 2.
   Grabel's Law
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Wed, 16 Apr 1997 09:41:10 -0500 (EST)
From: Gregory Piatetsky-Shapiro <[email protected]>
Subject: New address for subscribing to KD Nuggets -- [email protected]

Thanks to many of you for the good words about Nuggets. Last week I completed the transfer of the Nuggets server (now called Knowledge Discovery Nuggets rather than KDD Nuggets, to emphasize the broader scope) to the kdnuggets.com site.

To subscribe, please email [email protected] a 1-line message with
   subscribe kdnuggets
(to unsubscribe, the message should be
   unsubscribe kdnuggets)
See http://www.kdnuggets.com/subscribe.html for details.

Please address all submissions for Nuggets to [email protected] ; Email to the old Nuggets address [email protected] will probably be forwarded to [email protected] for some time, but it is better to send email to the new address.

-- GPS
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 14 Apr 97 12:48:49 PDT
From: Giuseppe Prisco <[email protected]>
Subject: Knowledge Discovery in Switching Network Alarm Databases

We are interested in the application of KDD methods to a public switching network alarm database. Our goal is to improve maintenance and severe alarm prevention.
Our research started by studying the TASA system experience and its sequence analysis algorithm. Any help would be appreciated, in particular:
- suggestions, experiences etc.
- suggestions about (possibly free) software for searching for significant sequences.
- contacts with any Italian university, in order to start possible thesis work on that topic.

Thank you
_________________________________________
Giuseppe Prisco - Software Analyst
Telesoft s.p.a SPR/SSCT
Via degli Agrostemmi, 30
S.Palomba - Roma 00040
tel 06/71035723
email [email protected]
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 01 Apr 1997 12:50:19 +0200
From: Johannes Fuernkranz <[email protected]>

2nd Call For Papers
Applied Artificial Intelligence
Special issue on First-Order Knowledge Discovery in Databases
(URL: http://www.ai.univie.ac.at/ilp_kdd/aai-si.html)

A recent MLnet Workshop, held at the ICML-96, focussed on a discussion of the potential contribution of ILP for KDD. Information on the workshop including a short summary and all accepted papers can be found at http://www.ai.univie.ac.at/ilp_kdd/. The general conclusion was that ILP can be a valuable tool for data mining, its main advantages being the expressiveness of first-order logic as a representation language and the ability of many ILP systems to use strong language biases for restricting the huge search space. ILP has high flexibility in incorporating various forms of background knowledge, which can be invaluable for large KDD tasks.
The special issue on "First-Order Knowledge Discovery in Databases" of the Applied Artificial Intelligence Journal will thus welcome papers that focus on one or more of the following topics: * Embedding ILP into the KDD process * Necessary pre- and post-processing steps for real-world applications * Interfacing ILP systems with database managers * Scalability of ILP for real-world databases * Criteria for quantifying the complexity of ILP problems * Evaluation of gain and price of ILP versus propositional learning * Non-classification learning and discovery in a first-order framework * Benefits of using background knowledge and/or strong explicit biases * Innovative real-world applications of ILP Papers on related subjects are also welcome, but a strong focus on applications and database issues is required for all submissions. see http://www.ai.univie.ac.at/ilp_kdd/aai-si.html for full details on Submissions Submission Deadline: April 30, 1997 [edited for space. GPS] >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Anand, Tej" <[email protected]> Subject: book review for Nuggets Date: Fri, 4 Apr 1997 16:58:14 -0500 Book Review: "Seven Methods for Transforming Corporate Data into Business Intelligence" by Vasant Dhar and Roger Stein, (Prentice-Hall, 1997). (see http://www.prenhall.com/allbooks/be_0132820064.html for more on this book. GPS) It has been quite a while since I have been able to read a technical/business book in its entirety, but recently I accomplished this feat with "Seven Methods for Transforming Corporate Data into Business Intelligence" by Vasant Dhar and Roger Stein. Usually I am unable to complete a technical/business book because either it is so high-level (and abstract) that I cannot appreciate how the material would apply to me, or it is so detailed that I am totally lost "in the trees". Seven Methods... is different. 
This short book starts off by providing a framework for representing objectives and requirements for "intelligent systems" (systems that embed AI techniques or systems that explicitly represent knowledge) using a business oriented vocabulary. This framework not only helps select the "appropriate" technique but it helps in formulating the problem that makes that selection transparent. The business vocabulary helps explain the selection to management and business types. The book then describes seven data-intensive modeling techniques (tree induction, analogical reasoning, fuzzy logic, rule-based systems, neural nets, genetic algorithms, and OLAP) using the framework. While these chapters are written to enable business-oriented people to get a quick understanding of the techniques, they are also great for technical folks because they can provide us knowledge about techniques in which we are not experts. All techniques are treated with uniform depth, which makes it a handy reference. The explanation of the techniques is highly visual with almost every other page containing a high quality graphic that explains how the techniques work. One quibble: Chapter 10, titled Machine Learning, could have been more aptly titled "Tree Induction". The book ends with seven detailed (8-10 pages each) case studies of successful applications of each of the techniques. Each case study is described using the same framework. This is where the rubber meets the road, and for the seven case studies selected the framework holds up very well. My only real complaint with this book is that it does not talk about using multiple techniques together. Btw: I felt this book was so well written that I promptly lent it to my manager for weekend reading. Disclaimer: Although we have never worked together, Roger Stein and I for a brief time shared the same employer: Dun & Bradstreet, Roger at Moody's and I at A.C Nielsen. One of the case studies is about Spotlight, a system with which I was associated. 
-Tej Anand NCR Corporation Human Interface Technology Center >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sun, 6 Apr 1997 21:54:10 +0300 From: Sami Kaski <[email protected]> Subject: Thesis on data exploration with SOMs available The following Dr.Tech. thesis is available at http://nucleus.hut.fi/~sami/thesis/thesis.html (html-version) http://nucleus.hut.fi/~sami/thesis.ps.gz (compressed postscript, 300K) http://nucleus.hut.fi/~sami/thesis.ps (postscript, 2M) The articles that belong to the thesis can be accessed through the page http://nucleus.hut.fi/~sami/thesis/node3.html Data Exploration Using Self-Organizing Maps Samuel Kaski Helsinki University of Technology Neural Networks Research Centre P.O.Box 2200 (Rakentajanaukio 2C) FIN-02015 HUT, Finland Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robustness of the illustrations the maps produce. The same measures may also be used for comparing the knowledge that different maps represent. 
Feature extraction must in general be tailored to the application, as is done in the case studies. There exists, however, an algorithm called the adaptive-subspace self-organizing map, recently developed by Kohonen, which may be of help. It extracts invariant features automatically from a data set. The algorithm is here characterized in terms of an objective function, and demonstrated to be able to identify input patterns subject to different transformations. Moreover, it could also aid in feature exploration: the kernels that the algorithm creates to achieve invariance can be illustrated on map displays similar to those that are used for illustrating the data sets. >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 10 Apr 1997 17:43:04 -0700 From: Laurie Zoob <[email protected]> Subject: SemioMap, the Discovery Search Application Semio Corporation, a newly formed start-up company, is using computational semiotics to identify patterns and relationships in text-based information on the internet and intranet. Using data visualization, the relationships are automatically displayed in a graphical, navigable map. There is a working alpha version/early beta of the software at http://www.semio.com. The initial product is called, SemioMap, the Discovery Search application. SemioMap is targeted toward the corporate intranet market. We are currently seeking data mining, knowledge discovery and data base oriented companies as development partners. If you are interested in receiving more information, please email me at [email protected]. 
Best, Laurie Zoob Director, Business Development -- :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: Laurie Zoob Phone: (415) 802-2943 Director Business Development Fax: (415) 802-2942 Semio Corporation Email: [email protected] One Dolphin Drive http://www.semio.com Redwood Shores, CA 94065 ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 26 Mar 1997 13:07:39 -0800 (PST) From: "S.D. BYERS" <[email protected]> Subject: new version of ace.glm Dear Splus and GLM users, I have written a new version of ace.glm for Splus and it is now available in the S archive at Statlib at http://lib.stat.cmu.edu/S/ace.glm This simple function performs the ACE transformation detection algorithm for generalized linear models using the weighted linear model obtained from the GLM at convergence of the fitting algorithm. It generalizes ace.logit, ACE for logistic regression. A paper describing ace.logit and its uses can be found at http://www.stat.washington.edu/tech.reports/raftery-richardson.ps These functions can be powerful tools in Generalised Linear Modelling. The new ace.glm will work for any GLM that has a family defined in Splus. It will also work for any link function defined for these families. Previously, ace.glm worked only for the canonical link function. By default, ace.glm will pleasantly plot your ACE output if a graphics device is open. I would like to hear about any use/abuse/errors that may arise. Thanks, Simon Byers, University of Washington Statistics. 
[email protected] >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Robert Straughan <[email protected]> Subject: Senior Consultant in Data Mining at NSRC in Singapore Date: Sat, 5 Apr 1997 09:06:47 +0800 (SGT) Staff Title: Group Leader - Senior Consultant, Commercial Applications Date Required: 1 June 1997 Job Description: National Supercomputing Research Centre (NSRC) is Singapore's national centre for High Performance Computing (HPC). NSRC currently facilitates services and solutions to the Singapore industry in the field of Computer Aided Engineering, Chemical Applications and Electronics. Commercial Applications has been identified as a new growth area, where HPC can make a significant impact on the commercial industries' competitiveness. NSRC has therefore decided to expand into this field and is currently looking for a person with extensive industrial experience in the field of Data Mining within finance, banking, insurance, or retail marketing. The Group Leader shall take overall responsibility in promoting NSRC's capabilities within the field of Data Mining to the commercial industry in Singapore and to solicit for business. The Group Leader shall work closely with NSRC's existing staff within this field to develop the best possible strategy to target potential commercial organisations. Skills Required: Minimum Masters Degree. Specialisation within the field of Computer Science and Business Administration. At least 5 years experience from a financial institution or in retail marketing within the field of Data Mining / Data Analysis. Extensive managerial experience, in particular project management, business analysis and negotiation skills. Strong knowledge of statistical analysis and selection / building of appropriate modelling techniques to solve business problems. A good understanding of the algorithms used in Data Mining (neural networks, classifications etc.). 
Have previously used IBM SP2 and tools such as Intelligent Miner and Darwin as well as statistical packages such as SAS and SPSS.

Relocation assistance, allowances for housing, children's education and transportation apply. Salary will be commensurate with qualifications and experience.

You can obtain more details by contacting [email protected] or visit our web site at http://www.nsrc.nus.sg.

Resumes can be sent to:

Administration Manager
NSRC
89 Science Park Drive
The Rutherford #01-05/08
Singapore 118261
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 04 Apr 1997 14:41:09 -0500
From: Nalini Dayanand <[email protected]>
Subject: Job Announcement-Please post

THINKING MACHINES CORPORATION is a leading provider of knowledge discovery software and services. TMC's high end datamining software suite enables users to extract meaningful information from large databases. For more information please see http://www.think.com.

The company is seeking an individual to join the development organization as Manager of the Data Analysis and Applications group.

The manager of the data analysis and applications group will provide leadership and individual contribution in the design, development and deployment of data mining applications, prototypes and application frameworks.

Responsibilities include
* working with product marketing and clients to identify opportunities for data mining applications
* providing leadership and individual contribution in requirements definition and application/prototype/framework development
* organizing and managing a team of analysts, software engineers and technology engineers responsible for the development of specific applications/prototypes/frameworks
* providing feedback to the development organization on potential enhancements to existing products

Experience in telecommunications and/or financial services is desirable but not essential.
If your background and interests match these expectations, please send your resume via fax, email or regular mail to

Nalini Dayanand
Thinking Machines Corporation
14 Crosby Drive
Bedford, MA 01730
Fax: (617) 276-0444
email: [email protected]
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: Jan Komorowski <[email protected]>
Subject: PKDD'97 -- Preliminary symposium program

PKDD'97 -- 1st European Symposium on Principles of Data Mining and Knowledge Discovery, Trondheim, Norway, June 24-27, 1997.

Preliminary symposium program and registration information: http://www.idt.ntnu.no/pkdd97/
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Thu, 10 Apr 97 15:04:39 CDT
From: [email protected] (ICML-COLT Administration)
Subject: COLT/ICML Call for Participation

Tenth Annual Conference on         Fourteenth International
Computational Learning Theory      Conference on Machine Learning
(COLT-97)                          (ICML-97)
July 6-9                           July 8-11

COLT/ICML Tutorials on July 8
ICML-affiliated Workshops on July 12

Vanderbilt University
Nashville, Tennessee, USA

The organizers of COLT-97 and ICML-97 invite you to participate in one or both of these conferences. In hopes of encouraging interactions between the learning theory and machine learning communities, the conferences are loosely coupled by joint tutorials, a day of joint technical sessions, a joint banquet, and otherwise through co-location at Vanderbilt University in Nashville, Tennessee.

Find all the latest information about COLT-97 and ICML-97 at http://cswww.vuse.vanderbilt.edu/~mlccolt/, including lists of papers to be presented, registration and housing material, information on tutorials and workshops, invited speakers, travel, and the like. You may also obtain registration and housing material by writing to [email protected].
--------------------
Registration costs and applicable dates are:

             Early             Late
             (until June 2)    (after June 2)
COLT         $140              $180
ICML         $140              $180
COLT/ICML    $240              $310

--------------------
Registration for one of three ICML-affiliated Workshops on (1) reinforcement learning, (2) automata induction, grammatical inference, and language acquisition, or (3) machine learning application in the real world is $25 until June 2, and $35 after June 2.

--------------------
ICML-97 acknowledges generous support from the Daimler-Benz Corporation. COLT-97 acknowledges generous support from ATT and is held in cooperation with ACM SIGACT and SIGART. Both conferences are sponsored by Vanderbilt University.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Fri, 11 Apr 1997 11:03:04 +1000 (EST)
From: [email protected] (Xindong Wu)
Subject: CFP: IEEE KDEX-97

1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97)
--------------------------------------------------------------------
Sponsored by the IEEE Computer Society and Co-located with the 9th IEEE Tools with Artificial Intelligence Conference

November 3, 1997, Newport Beach, California, U.S.A.
===================================================

Call for Papers

The 1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97) will provide an international forum for researchers, educators and practitioners to exchange and evaluate information and experiences related to state-of-the-art issues and trends in the areas of artificial intelligence and databases. The goal of this workshop is to expedite technology transfer from researchers to practitioners, to assess the impact of emerging technologies on current research directions, and to identify emerging research opportunities. Educators will present material and techniques for effectively transferring state-of-the-art knowledge and data engineering technologies to students and professionals.
The workshop is currently scheduled for a one-day duration, but depending on the final program it might be extended to a second day. Submissions can be in the form of survey papers, experience reports, and educational material to facilitate technology transfer. Accepted papers will be published in the workshop proceedings by the IEEE Computer Society. A selected number of the accepted papers will possibly be expanded and revised for publication in the IEEE Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the International Journal of Artificial Intelligence Tools. Educational material related to papers published in the IEEE-TKDE will be posted on the IEEE-TKDE home page. The theme of the workshop is "AI MEETS DATABASES". Topics of interest include, but are not limited to: - Computer supported cooperative processing and interoperable systems - Data sharing, data warehousing and meta-data management - Distributed intelligent mediators and agents - Distributed object management - Dynamic knowledge - Evaluation and measurement of knowledge and database systems - High-performance issues (including architectures, knowledge representation techniques, inference mechanisms, algorithms and integration methods) - Information structures and interaction - Intelligent search, data mining and content-based retrieval - Knowledge and data engineering systems - Quality assurance for knowledge and data engineering systems (correctness, reliability, security, survivability and performance) - Software re-engineering and intelligent software information systems - Spatio-temporal, active, mobile and multimedia data - Emerging applications (biomedical systems, decision support, geographical databases, Internet technologies and applications, digital libraries, etc.) All submissions should be limited to a maximum of 5,000 words. Six hardcopies should be forwarded to the following address. 
Xindong Wu (KDEX-97) Department of Software Development Monash University 900 Dandenong Road Caulfield East, Melbourne 3145 Australia Phone: +61 3 9903 1025 Fax: +61 3 9903 1077 E-mail: [email protected] Please include a cover page containing the title, authors (names, postal and email addresses, telephone and fax numbers), and an abstract. This cover page must accompany the paper. ********** I m p o r t a n t D a t e s *************** * 6 copies of full papers received by: June 15, 1997 * * acceptance/rejection notices: July 31, 1997 * * final camera-readies due by: August 31, 1997 * * workshop: November 3, 1997 * ********************************************************** Further Information =================== WWW: http://www.sd.monash.edu.au/kdex-97 >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Marney Smyth <[email protected]> Subject: Hinton -- Jordan Learning Methods course : spaces still available Date: Thu, 10 Apr 1997 07:38:25 -0400 (EDT) some spaces still available ... ********************************************************** * * * Learning Methods for Prediction, Classification, * * Novelty Detection and Time Series Analysis * * * * Washington, D.C., May 2 -- 3, 1997 * * * * Geoffrey Hinton, University of Toronto * * Michael Jordan, Massachusetts Inst. of Tech. * * * ************************************************************ A two-day intensive Tutorial on Advanced Learning Methods will be held May 2 -- 3rd, 1997, at the Hyatt Regency on Capitol Hill, Washington D.C. Space is available for up to 50 participants for the course. The course will provide an in-depth discussion of the large collection of new tools that have become available in recent years for developing autonomous learning systems and for aiding in the analysis of complex multivariate data. 
These tools include neural networks, hidden Markov models, belief networks, decision trees, memory-based methods, as well as increasingly sophisticated combinations of these architectures. Applications include prediction, classification, fault detection, time series analysis, diagnosis, optimization, system identification and control, exploratory data analysis and many other problems in statistics, machine learning and data mining. (edited for space) ADDITIONAL INFORMATION A registration form is available from the course's WWW page at http://www.ai.mit.edu/projects/cbcl/web-pis/jordan/course/ Marney Smyth E-mail: [email protected] Phone: 617 258-8928 Fax: 617 258-6779
410.23	97:14	IJSAPL::OLTHOF	Spellchecked Henry Although	`Thu Apr 24 1997 11:47`	539
	Knowledge Discovery Nuggets 97:14, e-mailed 97-04-23 News: * E. Bertino, Query: data mining from wafers manufacturing process ? Publications: * M. Ramoni, Technical Reports on Bayesian Knowledge Discovery, http://kmi.open.ac.uk/~marco/projects/kdd * Tom Mitchell, Text book for Data Mining: Machine Learning http://www.cs.cmu.edu/~tom/mlbook.html Siftware: * R. Quinlan, Windows Version of C5.0 ("See5") Available Now http://www.rulequest.com * Stanley Rice, Postcoordinate Software http://www.cruzio.com/~autospec/darwin.htm * Pamela Lerwick, IDIS Special Release http://www.datamining.com Positions: * R. King, Ph.D. Studentships in Data Mining at University of Wales, UK * Fred J. Damerau, Research Associate in Text Mining/Information Extraction -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ******************************************************************* ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Restlessness and discontent are the necessities of progress. --Thomas A. 
Edison >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Thu, 17 Apr 1997 09:44:45 +0200 (METDST) Subject: data mining from wafers manufacturing process At our University, we are starting an application project dealing with data from a wafer manufacturing process. We are thinking of using data mining techniques to try to address the following problem. Some of those wafers are faulty. There is a database keeping track of the entire manufacturing process for each wafer and collecting a large amount of data concerning each step of the manufacturing process (there are about 300 steps; each step is characterized by about 100 parameters). Our problem is to use data mining techniques to help with the diagnosis, that is, to see which step may have caused the problem. I was wondering whether you are aware of any use of data mining techniques for similar problems. We also have to acquire some suitable data mining tools. I would appreciate any suggestions you may give me on this issue. Best regards Elisa ---------------------------------------------------------------------------- --- Prof. Elisa Bertino Dipartimento di Scienze dell'Informazione Universita' di Milano Via Comelico 39/41 20135 Milano (Italy) tel: (+39)2-55006227 fax: (+39)2-55006253 e-mail: [email protected] [email protected] www http://mercurio.sm.dsi.unimi.it/~bertino/ >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 9 Apr 1997 19:23:44 +0100 From: Marco Ramoni <[email protected]> Subject: Technical Reports Available The following reports are available on the World Wide Web. 
Further information about the Bayesian Knowledge Discovery Project can be reached at http://kmi.open.ac.uk/~marco/projects/kdd Marco ____________________________________________________________________________ __ Title: Efficient Parameter Learning in Bayesian Networks from Incomplete Databases Authors: Marco Ramoni [1] and Paola Sebastiani [2] 1.Knowledge Media Institute, The Open University. 2.Department of Actuarial Science and Statistics, City University. TR number: KMI-TR-41 Date: January 1997 Keywords: Bayesian Belief Networks; Machine Learning, Probabilistic Reasoning, Missing Data. Abstract: Current methods to learn conditional probabilities from incomplete databases use a common strategy: they complete the database by inferring somehow the missing data from the available information and then learn from the completed database. This paper introduces a new method - called bound and collapse (BC) - which does not follow this strategy. BC starts by bounding the set of estimates consistent with the available information and then collapses the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data. Experiments comparing BC to the Gibbs Samplings are also provided. WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-41-abstract.html ____________________________________________________________________________ __ Title: Learning Bayesian Networks from Incomplete Databases Authors: Marco Ramoni [1] and Paola Sebastiani [2] 1.Knowledge Media Institute, The Open University. 2.Department of Actuarial Science and Statistics, City University. Reference: Technical Report KMI-TR-43 Date: February 1997 Keywords: Bayesian Belief Networks, Bayesian Learning, Missing Data, Model Selection Abstract: Bayesian approaches to learn the graphical structure of Bayesian Belief Networks (BBNs) from databases share the assumption that the database is complete, that is, no entry is reported as unknown. 
Attempts to relax this assumption often involve the use of expensive iterative methods to discriminate among different structures. This paper introduces a deterministic method to learn the graphical structure of a BBN from a possibly incomplete database. Experimental evaluations show a significant robustness of this method and a remarkable independence of its execution time from the number of missing data. WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-43-abstract.html ____________________________________________________________________________ _ Title: The Use of Exogenous Knowledge to Learn Bayesian Networks from Incomplete Databases Authors: Marco Ramoni [1] and Paola Sebastiani [2] 1.Knowledge Media Institute, The Open University. 2.Department of Actuarial Science and Statistics, City University. TR number: KMI-TR-44 Date: February 1997 Keywords: Information extraction, Uncertainty and noise in data, Bayesian inference. Abstract: Current methods to learn Bayesian Networks from incomplete databases share the common assumption that the unreported data are missing at random. This paper describes a method - called Bound and Collapse (BC) - to learn Bayesian Networks from incomplete databases which allows the analyst to efficiently integrate the information provided by the database and the exogenous knowledge about the pattern of missing data. BC starts by bounding the set of estimates consistent with the available information and then collapses the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data. Experiments comparing BC to the Gibbs Samplings are also provided. WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-44-abstract.html ____________________________________________________________________________ Title: Discovering Bayesian Networks in Incomplete Databases Authors: Marco Ramoni [1] and Paola Sebastiani [2] 1.Knowledge Media Institute, The Open University. 
2.Department of Actuarial Science and Statistics, City University. TR number: KMI-TR-46 Date: March 1997 Keywords: Information extraction, Uncertainty and noise in data, Bayesian inference. Abstract: Bayesian Belief Networks (BBNs) are becoming increasingly popular in the Knowledge Discovery and Data Mining community. A BBN is defined by a graphical structure of conditional dependencies among the domain variables and a set of probability distributions defining these dependencies. In this way, BBNs provide a compact formalism - grounded in the well-developed mathematics of probability theory - able to predict variable values, explain observations, and visualize dependencies among variables. During the past few years, several efforts have been addressed to develop methods able to extract both the graphical structure and the conditional probabilities of a BBN from a database. All these methods share the assumption that the database at hand is complete, that is, it does not report any entry as unknown. When this assumption fails, these methods have to resort to expensive iterative procedures which are infeasible for large databases. This paper describes a new Knowledge Discovery system based on an efficient method able to extract the graphical structure and the probability distributions of a BBN from possibly incomplete databases. An application using a large real-world database will illustrate methods and concepts underlying the system and will assess its advantages as a Knowledge Discovery system. 
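[Illustration, not part of the original report abstracts.] The bound-and-collapse (BC) idea named in the KMI-TR-41 and KMI-TR-44 abstracts above can be sketched for the simplest possible case: one discrete variable with no parents, some entries of which are missing. The sketch below is only a reading of the two-step description in the abstracts, not the authors' implementation. The function names bound and collapse, the uniform Dirichlet-style prior, and the example counts are all assumptions made for illustration.

```python
from fractions import Fraction


def _prior_or_uniform(counts, prior):
    # Assumption: default to one pseudo-count per state (uniform prior).
    return [1] * len(counts) if prior is None else list(prior)


def bound(counts, n_missing, prior=None):
    """Bound step: per-state lower and upper probability estimates.

    The lower bound for a state assumes none of the missing entries
    belong to it; the upper bound assumes all of them do.
    """
    prior = _prior_or_uniform(counts, prior)
    total = sum(counts) + sum(prior) + n_missing
    lower = [Fraction(c + a, total) for c, a in zip(counts, prior)]
    upper = [Fraction(c + a + n_missing, total) for c, a in zip(counts, prior)]
    return lower, upper


def collapse(counts, n_missing, weights, prior=None):
    """Collapse step: convex combination of the extreme completions.

    Extreme point j assigns every missing entry to state j. weights[j]
    encodes the assumed pattern of missing data (the assumed chance that
    a missing entry is really state j) and should sum to 1.
    """
    prior = _prior_or_uniform(counts, prior)
    k = len(counts)
    total = sum(counts) + sum(prior) + n_missing
    extremes = [
        [Fraction(counts[i] + prior[i] + (n_missing if i == j else 0), total)
         for i in range(k)]
        for j in range(k)
    ]
    return [sum(w * e[i] for w, e in zip(weights, extremes)) for i in range(k)]


# Hypothetical data: 6 and 2 observed cases in two states, 2 cases missing.
lower, upper = bound([6, 2], 2)
estimate = collapse([6, 2], 2, [Fraction(1, 2), Fraction(1, 2)])
print(lower, upper, estimate)
```

Equal weights correspond to assuming that a missing entry is equally likely to be either state. Changing the weights moves the point estimate between the bounds without another pass over the data, which is where a non-iterative method of this kind would save time over iterative completion.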
WWW: http://kmi.open.ac.uk/kmi-abstracts/kmi-tr-46-abstract.html ____________________________________________________________________________ __ Marco Ramoni, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes MK7 6AA, UNITED KINGDOM Phone: +44-1908-65-5721 Fax: +44-1908-65-3169 Email: [email protected] URL: http://kmi.open.ac.uk/~marco CUSeeMe: 137.108.81.18 >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 16 Apr 1997 10:24:19 -0400 From: Tom Mitchell <[email protected]> Subject: Text book for Data Mining: Machine Learning by Tom Mitchell DATAMINING TEXTBOOK: Machine Learning, Tom Mitchell, McGraw Hill McGraw Hill announces immediate availability of a new textbook that covers the primary algorithms used in datamining. MACHINE LEARNING provides a thorough, interdisciplinary introduction to the key algorithms used in datamining. Free inspection copies are available for instructors, by contacting Betsy Jones (McGraw Hill) at (630) 789-5057. The chapter outline is: 1. Introduction 2. Concept Learning and the General-to-Specific Ordering 3. Decision Tree Learning 4. Artificial Neural Networks 5. Evaluating Hypotheses 6. Bayesian Learning 7. Computational Learning Theory 8. Instance-Based Learning 9. Genetic Algorithms 10. Learning Sets of Rules 11. Analytical Learning 12. Combining Inductive and Analytical Learning 13. Reinforcement Learning (414 pages) This book is intended for upper-level undergraduates, graduate students, and professionals working in the area of datamining, machine learning, and statistics. The text includes over a hundred homework exercises, along with web-accessible code and datasets (e.g., neural networks applied to face recognition, Bayesian learning applied to text classification). 
For further information and ordering instructions, see http://www.cs.cmu.edu/~tom/mlbook.html >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] (Ross Quinlan) Date: Wed, 16 Apr 1997 07:47:28 -0400 (EDT) Subject: Windows Version of C5.0 ("See5") Available Now Please see http://www.rulequest.com for details. As with the Unix version, a scaled-down demonstration version is free, and there is also a free 10-day trial of the real thing. Ross >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. GPS] Date: Sat, 19 Apr 97 11:51:52 PDT From: Stanley Rice <[email protected]> Now that spring is sprung, what about tasting some PRECOORDINATE WINES FROM POSTCOORDINATE BOTTLES? ;-) Like the taste of wine, relevance is not objective to us. It is subjective, without crisp definition, dependent on our context, describable only by fuzzy postcoordinations. SIGs as well as individuals recognize relevance only in context. With a little help from our friends we can optimize relevance. But most folks have never even heard the word postcoordination. Precoordinate systems still predominate-- Yahoo categories, single topic and alphabetical filings--at work, at school, and at home. The Internet, AltaVista-style search engines, and Thematic concept filtering will change a lot of that before long. The change may come more smoothly because old precoordinations can be included under postcoordinations, and actually be much enhanced thereby. Just putting the old wine in the new bottles can multiply its bouquet and value. (No, there is nothing for sale here.) 
Examples of postcoordination possibilities with included fuzzy precoordinations, suited to electronic libraries, corporate intranets (and many other "incoherent" but currently precoordinated collections) are given at: http://www.cruzio.com/~autospec/darwin.htm (Darwin's "The Voyage of the Beagle" is used to illustrate Dewey precoordinations included under postcoordinations.) Want a different kind of example? Consider "Correlating Symptoms and Remedies," which includes uses for various kinds of traditional diagnostic precoordinations: http://www.cruzio.com/~autospec/accessf.htm On the Autospec home page (address below) we look at postcoordination of contextual and conceptual filtering from many points of view. Your reactions are always appreciated. In any case, relax and have another glass. It's spring! ;-) Regards, Stan Rice -- THEMATICS: Conceptual & Marketing Access to Text and Media AUTOSPEC, Inc. Santa Cruz, CA. Stan Rice Voice: (408) 457-1430 Home page for Autospec: http://www.cruzio.com/~autospec/ >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. GPS] Date: Tue, 22 Apr 1997 11:09:49 -0700 From: Pamela Lerwick <[email protected]> Subject: IDIS Special Release Contact: IDI Marketing Communications (310) 936-3600 New Machine-Man Paradigm Refocuses Data Mining Novel Approach Based on Explainable Intranet Documents Introduces New Languages and Techniques for Data Mining ____________________________________________________________________________ _ Los Angeles -- April 21, 1997 The 1997 Database World Conference in Boston will witness the birth of a new computing paradigm for decision support -- certain to affect the way corporations use and benefit from computers. While most computing to date has focused on man-machine interaction, this new and novel approach introduces machine-man interaction. 
In man-machine systems, humans view machines as "order-takers" -- we tell machines what to do, not help them tell us what they know. This one-way bias is manifest even in the term man-machine itself. While the direction of man-machine systems has been from man to machine, the focus of machine-man interaction is from machine to man, assisting machines to say their piece -- delivering the benefits of the immense knowledge they possess. This does not mean natural language output, but is based on a specific and novel approach to model building, data structuring, language design and information delivery. With a database query language or a programming language, the user types or otherwise inputs a query or program -- the machine then tries to understand it and generate a response. In machine-man interaction, the machine types up a set of statements as an "explainable document" and the user understands them to improve decision making. This dramatic new idea will be first presented at the Database World Conference in Boston, on May 20, 1997 by Dr. Kamran Parsaye, CEO of Information Discovery, Inc. He will discuss the far reaching consequences of this paradigm for corporate computing. The NASA Scientific and Technical Information Program defines a man-machine system as: "A System in which the functions of the man and the machine are interrelated and necessary for the operation of the system." Similarly, Dr. Parsaye defines a machine-man system as: "A System in which the functions of the machine and the man are interrelated and necessary for the thinking of the man." For a machine to tell us anything, it needs a suitable language of expression. It needs to be able to phrase its knowledge in terms of a language understandable by us. When dealing with computer systems, the term "language" has often been used in the context of programming languages and query languages. 
In machine-man interaction, we need languages that help machines express their knowledge for our benefit -- i.e. knowledge expression languages. Programming and query languages have to be understandable by computers; knowledge expression languages have to be comprehensible to human users -- they are the tools machines use to help us. Dr. Parsaye will illustrate how traditional languages and systems such as SQL or OLAP are inadequate due to their focus on one-way interaction models. Machine-man interaction requires three distinct language facilities: First a language to organize the environment and develop scripts, etc. as one does in any system, second a language to let a developer or analyst define models, set up scenarios and specify terms for the lexicon to be used by the machine (i.e. an interactive document composition language), and third a language to allow the machine to express knowledge (i.e. a knowledge expression language.) Using agent technology on the inter/intranet, machine-man systems have a life of their own. They look for patterns with agents, perform discovery and when there is something interesting to say, they generate an "explainable document" on the intranet in plain English (or Italian, French, etc.) accompanied by graphs. Machines need no longer be just order-takers, but can be the finders and communicators of knowledge. The impact of the new paradigm on corporate planning for decision support and data warehousing will be significant. Business users and IS departments need no longer just consider "tools" as a method of data mining, but can rely on automatically generated Java-based explainable documents with rich text and graphic content. This will simultaneously accelerate the use of Java, intranets, data warehousing and data mining. For more information on the Database World Conference please visit DCI at http://www.DCIexpo.com on the internet, or call (508) 470-3870. For more information on Information Discovery, Inc. 
please visit http://www.datamining.com on the internet or call (310) 937-3600. Pamela Lerwick >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 14 Apr 1997 17:14:00 +0100 From: ROSS DONALD KING <[email protected]> Subject: Ph.D. Studentships Field: data mining, machine learning, ILP, scientific discovery Place: University of Wales, Aberystwyth Wales, UK Applications are invited for Ph.D. Studentships in the area of data mining in the Centre for Intelligent Systems at the Department of Computer Science, University of Wales, Aberystwyth. The Centre for Intelligent Systems has a particular interest in knowledge rich data mining systems, Inductive Logic programming, and applications in biology and chemistry. Applicants should have at least a 2(i) in Computer Science or related subject, with a good background in Artificial Intelligence or Statistics. More information can be obtained from Professor Mark Lee or Dr. Ross D. King Department of Computer Science, University of Wales, Penglais, Aberystwyth, Ceredigion, SY23 3DB, Wales, UK Tel: +44 1970 622420 Fax: +44 1970 622455 Email: [email protected] [email protected] or from the URLs: http://www.aber.ac.uk/~dcswww/Public/Recruitment/Proposals/ http://www.aber.ac.uk/~dcswww/Public/Research/ >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 17 Apr 97 09:32:42 EDT From: "Fred J. Damerau (862-2214)" <[email protected]> Subject: Research Associate Position in Text Mining/Information Extraction The Natural Language Understanding Group at the IBM T. J. Watson Research Laboratory (Yorktown Heights, NY 10566) is looking for a Research Associate with the qualifications listed below. The position will most likely be initially for one year, but it is renewable. The successful candidate will work on our text mining/ information extraction project, with a particular emphasis on applying machine learning techniques to various issues in document management. 
The project combines state-of-the-art research on machine learning in text mining with practical production-level systems building. ________________________________________________________________ Qualifications: The ideal candidate would have the following knowledge and experience. Education: MA/MS in computer science or other field with extensive background in computer science. Programming languages: Extensive knowledge and experience in C/C++ required; Java a plus. Specialized Background: Experience in implementing machine learning algorithms and/or natural language processing algorithms. Operating systems: Required: Familiarity with Windows95/NT and Unix/AIX, Helpful: Familiarity with OS/2 System programming/API experience on these operating systems not required. General Software Development: Familiarity with issues of large scale software development, e.g., API design and use, creation and integration of DLLs/Libraries, source code control systems etc. Candidates should send resumes and supporting letters to: Thomas Hampp eMail: [email protected] phone: 914-945-1714 End of message
410.24	97:15	IJSAPL::OLTHOF	Spellchecked Henry Although	`Tue May 06 1997 10:34`	1146
	Knowledge Discovery Nuggets 97:15, e-mailed 97-05-04 News: * R. Uthurusamy, KDD-97 Overview and Tutorials http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html * R. Uthurusamy, KDD-97 Workshop, Integration of Data Mining and Data Visualization http://www.cs.uml.edu/~grinstei/kddvis-workshop.html * R. Uthurusamy, KDD-97 Registration Information http://www-aig.jpl.nasa.gov/kdd97-docs/registrationinfo.html * Peter Turney, data mining from wafers manufacturing process Siftware: * Nicolas Bissantz, Delta Miner 3.0 http://www.bissantz.de Positions: * Pablo Tamayo, Job Position at Thinking Machines Meetings: * E. Horvitz, Call for participation, UAI-97, http://cuai97.microsoft.com * Gordian Institute, "Making Sense of Data: Computer-Aided Pattern Discovery", July 14-18, Charlottesville, Virginia http://www.gordianknot.com * R. Zicari, COMDEX Internet & OBJECT WORLD Frankfurt`97 (Oct 7-10) http://www.ltt.de -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ******************************************************************* ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A gentleman is not a pot Confucius >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 24 Apr 1997 18:06:38 -0400 From: [email protected] (R. Uthurusamy) Subject: KDD-97 Registration Information KDD-97 Registration Brochure Third International Conference on Knowledge Discovery and Data Mining (KDD-97) August 14-17, 1997 Sponsored by the American Association for Artificial Intelligence http://www.aaai.org KDD-97: A Preview The rapid growth of data and information has created a need and an opportunity for extracting knowledge from databases, and both researchers and application developers have been responding to that need. Knowledge discovery in databases (KDD), also referred to as data mining, is an area of common interest to researchers in machine discovery, statistics, databases, knowledge acquisition, machine learning, data visualization, high performance computing, and knowledge-based systems. KDD applications have been developed for astronomy, biology, finance, insurance, marketing, medicine, and many other fields. The Third International Conference on Knowledge Discovery and Data Mining (KDD-97) will follow up the success of KDD-95 and KDD-96 by bringing together researchers and application developers from different areas focusing on unifying themes. 
KDD-97 Organization General Conference Chair: Ramasamy Uthurusamy, General Motors Corporation, USA Program Cochairs: David Heckerman, Microsoft Research, USA Heikki Mannila, University of Helsinki, Finland Daryl Pregibon, AT&T Laboratories, USA Publicity Chair: Paul Stolorz, Jet Propulsion Laboratory, USA Tutorial Chair: Padhraic Smyth, University of California, Irvine, USA Demo and Poster Sessions Chair: Tej Anand, NCR Corporation, USA Awards Chair: Gregory Piatetsky-Shapiro, Geneve Consulting, USA Keynote Speaker: Peter Huber, Universitat Bayreuth, Germany "From Large to Huge. A Statistician's Reactions to KDD & DM" The statistics and AI communities are confronted by the same challenge, the onslaught of ever larger data collections, but the two communities have reacted independently and differently. What could they learn from each other if they looked over the fence? What is amiss on either side? KDD-97 Tutorial Abstracts and Speakers -------------------------------------- Full info on tutorials available at http://www-aig.jpl.nasa.gov/kdd97-docs/kdd97.tutorials.html All tutorials will be presented on Thursday, August 14, 1997. The times listed below are tentative. Admission to the tutorials is included in your conference registration fee. Registrants can attend up to four consecutive tutorials, including four tutorial syllabi.

                      Session 1              Session 2
8:00 to 10:00am       T1 - Fayyad and Simoudis (single session)
10:30am to 12:30pm    T2 - Hand              T3 - Feldman
1:30 to 3:30pm        T4 - Swayne and Cook   T5 - Chaudhuri and Dayal
4:00 to 6:00pm        T6 - Keim              T7 - DuMouchel

Tutorial 1: 8:00-10:00am Data Mining and KDD: An Overview Usama Fayyad, Microsoft Research and Evangelos Simoudis, IBM We present a basic tutorial of this new and emerging area and emphasize relations to constituent communities, including statistics, databases, pattern recognition, learning, and visualization. 
The tutorial provides a basic overview of the KDD process for extracting knowledge from databases and covers the basics of each step in the process including: data warehousing, selection and cleaning, data transformation, data mining, evaluation, and visualization. We also cover a sampling of successful applications and outline challenges and issues to be addressed. Dr. Usama Fayyad is a Senior Researcher at Microsoft Research, the Decision Theory & Adaptive Systems Group. His research interests include knowledge discovery in large databases, data mining, machine learning, statistical pattern recognition, and clustering. After receiving the Ph.D. degree in 1991, he joined the Jet Propulsion Laboratory (JPL), California Institute of Technology (until 1996). At JPL, he headed the Machine Learning Systems Group where he developed data mining systems for analysis of large scientific databases. Dr. Evangelos Simoudis is Vice President, Global Business Intelligence Solutions - IBM North America, where he is responsible for the development and deployment of data mining and decision support solutions to IBM's customers worldwide. Dr. Simoudis received a B.A. in Physics from Grinnell College, a B.S. in Electrical Engineering from California Institute of Technology, an M.S. in Computer Science from the University of Oregon, and a Ph.D. in Computer Science from Brandeis University. Tutorial 2: 10:30am-12:30pm Modelling Data and Discovering Knowledge David Hand, Open University, UK Our aim is to extract knowledge from large bodies of data. The size of these bodies means that we cannot do it unaided, but must use fast computers, applying sophisticated statistical tools. Attempts to automate the process of knowledge extraction date from at least the early 1980s, with the work on statistical expert systems. We examine this work, noting its successes and failures and, especially, what researchers in data mining and knowledge discovery can learn from those efforts. 
We examine what data are, what information is, and what knowledge is. We contrast modelling with discovery, especially in the context of large data sets. We examine high level modelling issues, such as overfitting, generalisability, overmodelling, and model evaluation. And we examine high level exploration issues such as the discovery of accidental artefacts. The confluence of computing and statistics in some areas provides a nice backdrop against which to examine these issues, and we briefly discuss neural networks and classification trees from these two perspectives. Dr. David Hand is Professor of Statistics at the Open University. His research interests include the foundations of statistics, statistical computing, and multivariate statistics, the latter especially as applied to classification problems. His applications interests include medicine, finance, and psychology. He is Editor-in-Chief of Statistics and Computing and has published fourteen books, the most recent of which is Construction and Assessment of Classification Rules, Wiley, January 1997. Tutorial 3: 10:30am-12:30pm Text Mining - Theory and Practice Ronen Feldman, Bar-Ilan University, Israel Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. In this tutorial we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will describe generic techniques for text categorization and information extraction that are used by these systems. 
The systems that will be presented are KDT, which is the system for Knowledge Discovery in Texts; FACT, which discovers associations among keywords labeling the items in a collection of textual documents; and the Text Explorer, which is a system that provides a high level language for interactive exploration of textual collections. We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. We will give special emphasis to incremental algorithms and to efficient data structures. Dr. Ronen Feldman is a lecturer at the Mathematics and Computer Science Department of Bar-Ilan University in Israel. He received his B.Sc. in Math, Physics and Computer Science from the Hebrew University, and his Ph.D. in Computer Science from Cornell University. His main research is in the area of Machine Learning and Data Mining. He is currently coordinating several research projects for developing dedicated text mining systems. These systems work on plain text collections and on the Internet. Tutorial 4: 1:30-3:30pm Exploratory Data Analysis using Interactive Dynamic Graphics Deborah Swayne, Bell Communications Research and Dianne Cook, Iowa State University Researchers and software designers in the field of data mining are just beginning to make extensive use of graphical methods. Interactive dynamic data visualization has been explored in the field of statistics for over twenty years, and we propose that much of what has been learned in statistics is relevant for data mining. This class is an introduction to interactive data visualization as it is practiced as part of exploratory data analysis. The XGobi software, publicly available dynamic visualization software, will be used in the analysis of examples from biology, business, physics, engineering, and telecommunications. 
The examples will illustrate a set of general visualization principles which are embodied in specific methods such as brushing and identification of points in simple scatterplots, three dimensional rotations, rotations in higher dimensions such as the grand tour, and directed searches in higher dimensions for interesting two dimensional views using projection pursuit and manual control. Deborah Swayne has worked at Bellcore since that company's inception in 1985, and is currently a member of the Statistics and Data Mining Research Group. Her research focusses on software methods for visualizing data. She is one of the authors of the XGobi software, originally developed at Bellcore. She has a Bachelor's degree in African Linguistics from the University of Wisconsin at Madison, and a Master's degree in Statistics from Rutgers University. Dr. Dianne Cook is an Assistant Professor in the Department of Statistics, Iowa State University. She received her PhD from Rutgers University in May 1993, and has conducted research into dynamic statistical graphics. Her interests include using these methods for understanding high-dimensional data, and adapting them for analyzing geographically referenced data with multiple measurements at each site. Tutorial 5: 1:30-3:30pm OLAP and Data Warehousing Surajit Chaudhuri, Microsoft Research and Umesh Dayal, Hewlett Packard Laboratories On-Line Analytical Processing (OLAP) and Data Warehousing technologies enable enterprises to gain competitive advantage by exploiting the ever-growing amount of data that is collected and stored in corporate databases and files for better and faster decision making. Over the past few years, these technologies have experienced explosive growth, both in the number of products and services offered, and in the extent of coverage in the trade press. Vendors (including all database companies) are paying increasing attention to all aspects of decision support. 
The area opens up interesting research directions, with ties to past work in database systems, but with different assumptions and requirements. Only very recently, however, has the database research community started to understand and address some of these issues. This tutorial presents an overview of OLAP and data warehousing, and an in-depth study of selected aspects. An outline of the tutorial follows: 1. Introduction: definitions, evolution, differences from OLTP, architectures 2. Models and Tools: conceptual model for OLAP, front-end tools (e.g., multidimensional spreadsheets), database design (e.g., star and snowflake schema). 3. Database Server technologies for Decision Support Queries: specialized indexing techniques, specialized join and scan methods, data partitioning and use of parallelism, intelligent processing of aggregates, complex query processing, extensions to SQL, ROLAP vs. MOLAP. 4. Other Services for OLAP/Data warehousing: data cleaning, loading and refresh, tools for warehouse, system and process management, metadata management and the role of repository. 5. State of Commercial Practice. 6. Research Issues. The target audience is researchers and developers interested in learning about the concepts, products and the technical innovations in the area of decision support technologies. Dr. Surajit Chaudhuri is a researcher in the Database Research Group of Microsoft Research. From 1992 to 1995, he was a Member of the Technical Staff at Hewlett-Packard Laboratories, Palo Alto. He did his B.Tech at the Indian Institute of Technology, Kharagpur and his Ph.D. at Stanford University. In addition to query processing and optimization, Surajit is interested in the areas of data mining, database design and uses of databases for nontraditional applications. Dr. Umesh Dayal is a senior researcher at Hewlett-Packard Labs, Palo Alto, California. 
His current research interests are in distributed information systems, workflow management, data mining, and information management issues related to the emerging global information infrastructure. He received his Ph.D. and S.M. degrees from Harvard University, his M.E. and B.E. degrees from the Indian Institute of Science, and his B.Sc. degree from Osmania University, India. Tutorial 6: 4:00-6:00pm Visual Techniques for Exploring Databases Daniel Keim, University of Munich For data exploration to be effective, it is important to include the human in the exploration process and combine the flexibility, creativity, and general knowledge of the human with the enormous storage capacity and the computational power of today's computers. Visual database exploration aims at integrating the human in the exploration process, applying its perceptual abilities to the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to get insight into the data and draw conclusions. Visual data exploration techniques have proven to be of high value in exploratory data analysis and they also have a high potential for exploring large databases. Visual database exploration is especially powerful for the first steps of the data mining process, namely understanding the data and generating hypotheses about the data, but it may also significantly contribute to the actual knowledge discovery by guiding the search using visual feedback. The goal of the tutorial is to show the potential of visualization technology for exploring large databases. The tutorial provides an overview of the state-of-the-art in data visualization and provides a classification of the existing data visualization techniques. 
Besides describing each of the classes, the tutorial focuses on new developments in data visualization, which are relevant to the area of knowledge discovery, and describes a wide range of recently developed techniques for visualizing large amounts of arbitrary multi-attribute data which does not have any two- or three-dimensional semantics and therefore does not lend itself to an easy display. A detailed comparison shows the strengths and weaknesses of the existing techniques and reveals potential for further improvements. Several examples demonstrate the benefits of visualization techniques for exploring databases. The tutorial concludes with an overview of existing database exploration and visualization systems, including research prototypes as well as commercial products. Dr. Daniel Keim is one of the leading experts in the field of visual database exploration, and he was the chief engineer in designing the VisDB system - a visual database exploration system. Dr. Keim received his diploma (equivalent to an MS degree) in Computer Science from the University of Dortmund in 1990 and his Ph.D. in Computer Science from the University of Munich in 1994. Currently, he is a teaching and research assistant (approximately equivalent to an assistant professor) at the Institute for Computer Science of the University of Munich, Germany. Tutorial 7: 4:00-6:00pm Statistical Models for Categorical Response Data William DuMouchel, AT&T Research This tutorial will survey the most common models and methods statisticians use to fit and test relationships among categorical (discrete) data. Most of these techniques are described in statistics texts such as Categorical Data Analysis, by Alan Agresti (Wiley 1990), and are widely available in popular computer packages such as SAS and Splus. Therefore it is almost de rigueur for someone with a new classification technique to compare the proposal to one or more of these standard methods. 
The tutorial will focus on loglinear and logistic regression models, and related models such as probit, Poisson regression, and survival models. In the short time available, priority will be given to explaining why these techniques are so popular among statisticians, and to how the basic models have been extended to handle variables having more than two categories or when some of the variables have continuous or ordinal scales. Examples of model fitting, model search and model comparison using SAS and Splus will be presented and discussed. Dr. William DuMouchel has been on the faculties of UC Berkeley, University of Michigan, University of London, MIT and Columbia University. From 1987 to 1992 he was Chief Statistical Scientist at BBN Software Products, helping to design and develop commercial software advisory systems for data analysis and experimental design. He is currently at AT&T Labs - Research, Florham Park, New Jersey. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 24 Apr 1997 18:06:38 -0400 From: [email protected] (R. Uthurusamy) Subject: KDD-97 Workshop KDD-97 Workshop - August 17, 1997 8:30am-5pm --------------------------------------------- Issues in the Integration of Data Mining and Data Visualization --------------------------------------------------------------- Details: http://www.cs.uml.edu/~grinstei/kddvis-workshop.html Data visualization deals with the effective portrayal of data with a goal towards insight about the data. Typically, the data is of high volume, multidimensional in nature, and does not lend itself to easy display. The data is also often non-spatial and temporal in nature. Data visualization software systems are very popular with end-user domain scientists who require visual tools to explore and analyze their data. These visual tools, however, are used strictly as output of the exploration process; that use has received much attention, whereas the input issues of the exploration process have not. 
The KDD community is concerned with two aspects of visualization techniques: 1. Their use at the "back-end" of the exploration process to help understand models extracted by data mining algorithms, and 2. Scalability issues in visualization: how do we make it efficient in the context of large databases where data access is expensive. The visualization community looks at KDD and analytic methods also as applications to generate displays. However, visualization can be used as input to KDD and analytic tools; it can also be used to support computational steering. An effective visualization front-end can guide a data mining algorithm in its search and may result in much better and more easily acceptable solutions. This workshop will continue the discussions started at the first two workshops and focus on these and other issues that make a case for integrating KDD and visualization technologies. Two previous workshops (Siggraph '90 and Visualization '91) have dealt with areas such as high-level requirements for data structures and access software, and data visualization environments. The first and second workshops on Database Issues for Data Visualization were held in 1993 and 1995 and explored the fundamental issues. A number of experimental, prototype, and research systems were presented. The second workshop also saw an emerging interest in data mining and visualization integration. This trend, so significant in the commercial sector today, is in its infancy and is in need of much research attention. Position statements and papers are welcome on the following issues as they relate to KDD and data visualization integration. 
We would like to keep discussions focused on the end result, which is improving the integration of data mining and knowledge discovery systems with visualization: * Requirements Visualization places on Knowledge Discovery Systems * Data Models and Access Structures * Modeling the User - Tasks, Processes, Support Issues * Advanced User Interfaces for Data Mining * Visual Languages for Data Mining * System Integration Issues * Computational Steering for Data Mining * Scalability to Large Databases * Distributed, Heterogeneous Data Set Issues - Data and Computation Sharing * Examples of Integrated Systems * Applications of Integrated Systems Workshop Paper Submissions (Deadline June 15) Papers (and position papers to be expanded for final publication) are solicited that present research results in the integration of data mining and visualization. Papers should be limited to 5,000 words and may be accompanied by NTSC video. These should describe some original research on the particular subject, and how it fits in with the overall theme of the workshop. Proper references should be cited. Workshop Registration Fee Registration forms will be sent to the accepted participants. There is a single registration fee of US $100 which covers the workshop sessions, preprints, and coffee breaks. Workshop Organizers Georges Grinstein Institute for Visualization and Perception Research University of Massachusetts at Lowell Lowell, MA 01854, USA Email: [email protected] Fax: +1-508-934-3551 * Phone: +1-508-934-3627 Andreas Wierse Institute for Computer Applications Dep. 
Computersimulation and Visualization Pfaffenwaldring 27 D-70550 Stuttgart, Germany Email: [email protected], Fax: +49(0)711-682357 * Phone: +49-711-685-5796 Usama Fayyad Microsoft Research Redmond, WA 98052-6399, USA Email: [email protected] Fax: +1-206-936-7329 * Phone +1-206-703-1528 --------------------------------------------- >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 24 Apr 1997 18:06:38 -0400 From: [email protected] (R. Uthurusamy) Subject: KDD-97 Demos/Exhibits of Knowledge Discovery Products ----------------------------------------------------- Following the success of the demonstration sessions in previous KDD conferences, the KDD-97 program will also include demonstrations of knowledge discovery products, knowledge discovery applications and research prototypes. Unlike previous demonstration sessions, we will clearly differentiate between commercial product demonstrations and research demonstrations. We are inviting commercial vendors to exhibit at KDD-97. The exhibitor fee for KDD-97 will be a nominal $250.00. Exhibitors will be provided with a 6ft table top. In this space vendors will be allowed to distribute product or company literature, show product demonstrations and set up signage. Vendors will have to bring all necessary hardware and software that they will require for their demonstrations. The exhibit area will be open during the following hours: Aug. 15th: 12:30-5pm For your information total attendance at KDD-96 was 457. Of these 35% were affiliated with universities and 65% were affiliated with industry. If you would like to exhibit at KDD-97 please fill out the registration form and send it along with the name of your Product(s) and/or Service(s) and a 200 word (maximum) Description of Product(s)/Service(s) to: AAAI, KDD-97 Exhibit, 445 Burgess Drive, Menlo Park, CA 94025, USA. The description will be included in the conference program. We are also soliciting demonstrations of research prototypes at KDD-97. 
This demonstration session will be held on August 15 from 12:30 to 5:00 PM. We have a limited budget for providing hardware for research demonstrations. This year we will give priority to demonstrations that are in conjunction with accepted papers at KDD-97. Within budget and space constraints we will make every effort to accommodate as many demonstrations as possible. If you would like your demonstration to be considered for KDD-97 please provide the following information to Tej Anand ([email protected]) by June 1, 1997. * Name of Demonstration: * Title of Paper: (If this demonstration is in conjunction with a paper/poster at KDD-97) * Development Team: * Affiliations of Development Team Members: * Contact Telephone#: * Description of Demonstration: (A short description of approx. 200 words) * What is unique about your system or application?: (No more than 50 words) * Status: Research Prototype/Commercially available product/Fielded application * Hardware Required: (Please state any special memory or disk requirements) * Operating System: (Please state specific version number) * WAN Connection Required: Yes/No (If Yes, please state any special modem requirements) * Will you bring your own hardware?: Yes/No * Any other requirements: >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 24 Apr 1997 18:06:38 -0400 From: [email protected] (R. Uthurusamy) Subject: KDD-97 Registration Information A registration application is attached to this online brochure. The KDD-97 program registration includes admission to four tutorials, 4 tutorial syllabi, technical and demo sessions, the opening reception, the KDD-97 Conference Proceedings and mid-morning & afternoon coffee breaks. Onsite registration will be located in the foyer outside the California Ballroom, Newport Beach Marriott Hotel and Tennis Club, lobby level. 
Early Registration (Postmarked by June 10) AAAI Members Regular $295 Students $95 Nonmembers Regular $375 Students $155 Late Registration (Postmarked by July 15) AAAI Members Regular $350 Students $125 Nonmembers Regular $425 Students $180 On-Site Registration (Postmarked after July 15 or onsite.) AAAI Members Regular $400 Students $150 Nonmembers Regular $475 Students $210 Workshop Registration Registration forms will be sent to the accepted participants. There is a separate registration fee of US $100 which covers the workshop sessions, preprints, and coffee breaks. Payment Information Prepayment of registration fees is required. Checks, international money orders, bank transfers and travelers' checks must be in US dollars. American Express, MasterCard, VISA, and government purchase orders are also accepted. Registration applications postmarked after the early registration deadline will be subject to the late registration fees. Registration applications postmarked after the late registration deadline will be subject to on-site registration fees. Student registrations must be accompanied by proof of full-time student status. Refund Requests The deadline for refund requests is July 25, 1997. All refund requests must be made in writing. A $75.00 processing fee will be assessed for all refunds. Registration Hours Registration hours will be Thursday-Saturday, August 14-16, 7:30am-6:00pm and Sunday, August 17, 8:00am-3:00pm. All attendees must pick up their registration packets for admittance to programs. Housing AAAI has reserved a block of rooms at the Newport Beach Marriott Hotel at reduced conference rates. Conference attendees must contact the hotel directly and identify themselves as KDD-97 registrants to qualify for the reduced rates. Hotel rooms are priced as singles (1 person, 1 bed), doubles (2 persons, 2 beds), triples (3 persons, 2 beds), quads (4 persons, 2 beds). Rooms will be assigned on a first-come, first-served basis. 
All rooms are subject to a 10% occupancy tax. Headquarters Hotel: Newport Beach Marriott Hotel 900 Newport Center Drive Newport Beach, CA 92660 Phone: 714-640-4000 Fax: 714-640-4918 Single room: $105.00 Double room: $115.00 Check-in time: 4:00pm Check-out time: 12:00 noon Cut-off date for reservations: July 24, 1997. All reservation requests for arrival after 6:00 pm must be accompanied by a first night room deposit, or guaranteed with a major credit card. The Newport Beach Marriott Hotel will not hold any reservations after 6:00 pm unless guaranteed by one of the above methods. Reservations received after the cut-off time will be accepted on a space or rate available basis. Reservations accepted without a credit card guarantee or advance deposit are subject to cancellation at 6:00 pm on the day of arrival. Air Transportation and Car Rental Newport Beach, California - Get there for less! Discounted fares have been negotiated for this event. Call Conventions in America at 1-800-929-4242 and ask for Group #428. You will receive 5%-10% off the lowest applicable fares on American Airlines, or the guaranteed lowest available fare on any carrier. Travel between August 11-21, 1997. All attendees booking through CIA will receive free flight insurance and be entered in their bi-monthly drawing for worldwide travel for two on American Airlines! Hertz Rent A Car is also offering special low conference rates, with unlimited free mileage. Call Conventions in America - 1-800-929-4242, ask for Group #428. Reservation hours: M-F 6:30am-5:00pm Pacific Time. Outside US and Canada, call 619-453-3686/Fax 619-453-7679. Internet: [email protected]/24-hour emergency service 1-800-748-5520. If you call direct: American 1-800-433-1790, ask for index #S 9485. Hertz 1-800-654-2240, ask for CV#24250. Ground Transportation The following information provided is the best available at press time. Please confirm fares when making reservations. 
Airport Connections The Newport Beach Marriott Hotel provides complimentary airport transportation to/from John Wayne/Orange County Airport. Super Shuttle: 714-517-6600. The fare from LAX Los Angeles International Airport to Newport Beach Marriott Hotel is $21.00 per person. Reservations 24 hours in advance are recommended. Discover Card, traveller's checks and cash are accepted. Taxi Taxis are available at John Wayne Airport. Approximate fare from the airport to downtown Newport Beach is $14.00. Orange County Yellow Cab Service: 714-546-1311. The approximate taxi fare from LAX Los Angeles International Airport to Newport Beach Marriott Hotel is $75.00-80.00. Bus Greyhound/Trailways Lines. The depot is located at 100 W. Winston Road, Anaheim, CA 92805. For information on fares and scheduling, call 714-999-1256. Rail The Amtrak (Southern Pacific Railroad) stations are located at Santa Ana, Irvine and Anaheim. For general information and ticketing, call 1-800-872-7245. City Transit System OCTD (Orange County Transit District) serves Newport Beach, Balboa Island and Corona del Mar. Basic local fare is $1.00. For general information call 714-636-RIDE. Parking Parking is available at the Newport Beach Marriott Hotel. The daily rate for valet parking is $6.00, and $8.00 overnight. Self-parking is complimentary. Disclaimer: In offering American Airlines, Hertz Rent A Car, Newport Beach Marriott Hotel, and all other service providers (hereinafter referred to as "Supplier(s)") for the Third International Conference on Knowledge Discovery and Data Mining, AAAI acts only in the capacity of agent for the Suppliers which are the providers of the service. 
Because AAAI has no control over the personnel, equipment or operations of providers of accommodations or other services included as part of the KDD-97 program, AAAI assumes no responsibility for and will not be liable for any personal delay, inconveniences or other damage suffered by conference participants which may arise by reason of (1) any wrongful or negligent acts or omissions on the part of any Supplier or its employees, (2) any defect in or failure of any vehicle, equipment or instrumentality owned, operated or otherwise used by any Supplier, or (3) any wrongful or negligent acts or omissions on the part of any other party not under the control, direct or otherwise, of AAAI. Newport Beach, California! Newport Beach is located along the beautiful Pacific Ocean in Orange County, California, nestled south of Los Angeles, north of San Diego, southwest of Disneyland in Anaheim, and adjacent to John Wayne/Orange County Airport. Surrounded by one of the largest small-boat harbors in the world and lazily stretching itself along more than six miles of scenic Pacific coastline, Newport Beach beckons national and international visitors to moor at the magnificent harbor and discover "The Colorful Coast". Newport Beach Visitor Information A Concierge Desk is available in the Newport Beach Marriott Hotel. They can assist with dining reservations, directions, tour bookings, entertainment suggestions, and transportation information. Maps and brochures are available. 
URL: http://www.newport.lib.ca.us/NBCVB/NBCVB.html ********************************************************************** KDD-97 PREREGISTRATION APPLICATION Name: Company/Univ: Dept/MS: Address (Specify Home or Business): City: State: Zip: Phone & FAX: Membership No: Email Address: ******************************************************************** TECHNICAL PROGRAM (Includes Proceedings) EARLY REGISTRATION LATE REGISTRATION (postmarked by June 10) (postmarked by July 15) AAAI Member Nonmember AAAI Member Nonmember Regular Student Regular Student Regular Student Regular Student $295 $95 $375 $155 $350 $125 $425 $180 (Students must send proof of student status to the AAAI Office. By joining AAAI now, you can qualify for member rates. Membership information is available from [email protected] or http://www.aaai.org.) Total KDD-97 Conference Fee: ______ ******************************************************************** TUTORIAL PROGRAM Thursday, August 14 (Conference fee includes up to 4 consecutive tutorials & accompanying syllabi) 8:00-10:00 AM T1 10:30 AM-12:30 PM T2, T3 1:30-3:30 PM T4, T5 4:00-6:00 PM T6, T7 Please list selected tutorial codes: ******************************************************************** KDD-97 Workshop Sunday, August 17 $100 per person. Total Workshop Fee: _______ ******************************************************************** KDD-97 OPENING RECEPTION (Included in technical program registration) Fee for spouse, child, or guest is $20 per person. Total reception fee: ______ ******************************************************************** Exhibit Registration August 15, 1997 $250 per exhibitor. An exhibitor kit will be mailed upon receipt of registration. Total Exhibitor Fee: _______ ********************************************************************** PAYMENT Email registrations must be accompanied by a credit card number. 
Total Amount Due: ______ Check one: Mastercard ___ Visa ___ American Express ___ Credit Card Account Number: Expiration Date: Name as it appears on card: Forms cannot be processed if information is incomplete. The refund request deadline is July 25, 1997. A $75.00 processing fee will be assessed for refunds. Registrations postmarked after July 15 are subject to onsite rates. Mail completed application to [email protected] or fax to 415/321-4457. Please note that there are security issues involved with the transmittal of credit card information over the internet. AAAI will not be held liable for any misuse of your credit card information during its transmittal from you to AAAI. For complete KDD-97 information, please visit AAAI's web site at http://www.aaai.org. Thank you for your registration! See you at KDD-97 >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 24 Apr 1997 08:42:49 -0400 From: [email protected] (Peter Turney) Subject: Re: data mining from wafers manufacturing process Dear Elisa: > At our University, we are starting an application project > dealing with data from a wafer manufacturing process. > We are thinking of using data mining techniques > to try to address the following problem. > Some of those wafers are faulty. There is a database keeping track > of the entire manufacturing process for each wafer and collecting > a large amount of data concerning each step of the manufacturing > process (there are about 300 steps; each step is characterized > by about 100 parameters). Our problem is to use data mining techniques > in helping the diagnosis, that is, to see which step > may have caused the problem. > > I was wondering whether you are aware of any use of data mining > techniques for similar problems. We also have to acquire > some suitable data mining tools. Here are two relevant URLs for you: 1. ftp://ai.iit.nrc.ca/pub/iit-papers/NRC-39163.ps.Z P. Turney. Data Engineering for the Analysis of Semiconductor Manufacturing Data. 
IJCAI-95 Workshop on Data Engineering for Inductive Learning: 50-59. 1995. 2. http://www.quadrillion.com/ Quadrillion Corporation, makers of Q-Yield Best wishes, Peter. http://ai.iit.nrc.ca/staff/peter.html >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Thu, 24 Apr 97 11:18:57 Subject: FW: new entry for siftware section <H2>Siftware: Delta Miner </H2> <br><b>URL:</b> <A HREF="http://www.bissantz.de"> http://www.bissantz.de</a> <br><b>Description:</b> Delta Miner 3.0 is a suite of easy to handle data mining instruments for financial controlling applications and database analysis. <br><b>Discovery tasks:</b> Clustering, Summarization, Deviation Detection, Visualization <br><b>Comments:</b> Delta Miner 3.0 is a suite of data mining instruments that analyzes complex data pools. Delta Miner's tools are flexible: they lend themselves to a broad range of applications. A common application is the analysis of financial controlling data. Delta Miner guides the user quickly and easily through complex data structures down to the significant facts. In contrast to the simple "Drill-down" capabilities of typical EIS and MIS tools, Delta Miner integrates a high level of helpful automation. The system is capable of recommending the best analysis paths, thereby relieving the controller from tedious routine tasks. In addition to identifying the important trends, the tool also points to the causes of those trends. Further analyses inform the user about the best possible countermeasures to negative developments. The basic techniques of Delta Miner were developed at FORWISS, where since 1993 a research group led by Prof. Dr. Peter Mertens has intensively investigated algorithms for Data Mining. At its first presentation Delta Miner was recognized as one of the best three products in the category "Business Management Solutions" at the Systems '96 trade show in Munich. A demo version can be downloaded. 
<br><b>Platform(s):</b> Windows 95, NT <br><b>Contact:</b> <pre> Bissantz Küppers & Company GmbH Am Weichselgarten 7 91058 Erlangen Germany phone +49 9131 691-450 fax +49 9131 691-455 [email protected] </pre> <br><b>Status: </b> Product <br><b>Updated:</b> 1997-04-11 by Dr. Nicolas Bissantz ([email protected]) >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 30 Apr 1997 16:22:28 -0400 From: Pablo Tamayo <[email protected]> Job Description: Staff Member in the Technology Group Researcher/Developer of Data Mining/KDD Technologies Thinking Machines Corp. 4/30/97 - Provide technical and scientific expertise in core areas for Data Mining and KDD, such as Machine Learning, Artificial Intelligence, Statistics and High Performance Computing, to the development organization and the company in general. Help to evaluate competing, new or strategic technologies and algorithms for current or future releases of Data Mining/KDD products (toolsets, KDD engines and vertical applications). - Design and develop state-of-the-art Machine Learning/Statistical module prototypes. Be responsible for the support and maintenance of the assigned modules. Collaborate with the Software Engineering Group to integrate these prototypes into products' software architecture following development-wide software engineering guidelines. Provide parallelism and performance enhancements for algorithms. Help support core algorithms in current products. - Collaborate with the Data Analysis, Professional Services and Technical Sales groups to study and choose appropriate algorithms and methods for proof of concept studies or to integrate permanent solutions for customers. - Help write patents and provide technical assistance in patent related issues. - Represent the company in relevant conferences, workshops, trade shows or forums and follow Data Mining/KDD literature and trends in the KDD academic and commercial communities. If you are interested please contact: Dr. 
Pablo Tamayo [email protected] Thinking Machines Corp. 14 Crosby Dr. Bedford, MA 01730 >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Eric Horvitz <[email protected]> Date: Wed, 23 Apr 1997 13:53:39 -0700 Thirteenth Conference on Uncertainty in Artificial Intelligence Please refer to the UAI '97 home page at http://cuai97.microsoft.com for updated information on this summer's UAI conference and registration procedures. UAI will follow right after AAAI in Providence. The page also includes other information of interest, including details (...and even some reading assignments) for the UAI '97 Full Day Course on Uncertain Reasoning on Thursday, July 31. The pages also contain information on accommodations in Providence. Looking forward to seeing you this summer, Eric Horvitz Conference Chair >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Fri, 25 Apr 1997 14:45:21 -0400 Subject: The Gordian Institute's "Making Sense of Data: Computer-Aided Pattern Discovery" course is scheduled for July 14-18 in Charlottesville, Virginia. Refer to http://www.gordianknot.com ------------------------------------------------------------------------ The Gordian Institute, a division of American Heuristics Corporation (AHC), has scheduled the next offering of "Making Sense of Data: Computer-Aided Pattern Discovery" for July 14-18, 1997, in the historic town of Charlottesville, Virginia, near Monticello. The intensive four and one-half day data mining course includes live interactive demonstrations using data from real-world applications. Participants need only have prior working experience with computers and familiarity with data-related problems to benefit from the course. Attendees will explore a host of advanced computing techniques and software tools used to discover useful patterns hidden in data. 
The course surveys modern algorithms drawn from the fields of statistics, machine learning, data mining and inductive modeling which automatically build classifiers or estimators from a database. You may never find another course that succinctly covers the essential parts of so many aspects of "data mining" with both theoretical and practical insights. Topics to be presented are: -Pattern Discovery: An Overview -Inducing Models from Data: Benefits and Dangers -The Data Mining Process -Perspectives of Related Fields: -Statistics, Machine Learning, Data Mining and Artificial Intelligence -Data Issues -Case Diagnostics (Outlier, Influential, Leverage Points) -Feature Creation and Selection -Classical Statistical Techniques -Linear: Regression and Discriminant Analysis -Nonparametric: Scatterplot Smoothers, Nearest Neighbors, Kernels -Key General Tools: -Scientific Visualization -Resampling -Optimization -Clustering -Modern Methods -Neural Networks -Polynomial Networks (ASPN, AIM) -Decision Trees (CART) -Brief Survey of Other Methods -Projection Pursuit -ASH (Average Shifted Histograms) -MARS (Multivariate Adaptive Regression Splines) -Radial Basis Functions -Comparing and Combining Methods While increasingly awash in data, most organizations are unable to fully extract the useful information embedded within. The practical techniques taught in this course can help you to discover and make sense of hidden patterns. A key element of corporate efficiency must be the extraction of important information to support the decision making process and accurately predict and plan for future needs. Those from government, industry and academia who see the need for non-linear modeling techniques, and who have particular applications not adequately solved with classic modeling techniques are target candidates for this course. Direct Quote from Course Evaluation Sheet: "I felt this course was far superior to many others that I have been exposed to. 
Most notably, the instructors were not only clearly experts but were not biased toward any one software package or technique. The instructors also emphasized targeting the users' specific applications (including analyzing sample data brought in by the students). This is exceptionally useful. Great value for the $. What was most valuable to me was the presentation of a broad range of both analytical techniques and software tools for solving various problems. This helps to give me the 'big picture' and allows me to best determine what technologies are most applicable and useful to me." -Andy Kalish, Eastman Kodak The Instructors: John F. Elder IV, PhD, and Dean Abbott of Quantitative Solutions explain the methods used inside leading commercial and academic software, providing practical tips and techniques on feature extraction and neural network problem solving. The course instructors each have more than a decade of experience in applying adaptive, data-driven techniques to practical problems. Dr. Elder has developed or refined some of the methods covered in this course. He is Chief Scientist at Quantitative Solutions and Adjunct Professor at the University of Virginia, and has authored four book chapters and numerous articles on adaptive methods of pattern discovery. He has been a researcher at Rice University and at an engineering consulting firm, and was Director of Research for an investment management company. Dr. Elder is a frequent lecturer on pattern discovery techniques, and is the technical chair of the Adaptive and Learning Systems Group of the IEEE Systems, Man, and Cybernetics Society. Dean W. Abbott is a Senior Research Scientist at Quantitative Solutions. He has applied data mining techniques to challenges in optimum guidance and control, optical character recognition, image pattern recognition, and radar and multi-spectral signal processing. Mr. 
Abbott has developed pattern recognition software that is sold commercially, and has written and lectured on novel applications of feature selection, polynomial network, and pattern recognition techniques to solve real-world problems in several fields. Pricing Information: Registration for this four and one-half day course is $1995. Government and academic discounts may apply. Lodging details and directions may be viewed at http://www.gordianknot.com, or obtained by providing a fax number or Email address to (800) 405-2114 or [email protected]. You may also send a message to [email protected] with "newsletter" in the subject field to receive a quarterly electronic newsletter from The Gordian Institute. If you have remaining questions regarding the course, a knowledgeable representative may be contacted directly at (800) 405-2114. Seats may also be secured through Gordian's web site. Space is limited to 24 seats, so point your browser to http://www.gordianknot.com and reserve your place! __________________________ The Gordian Institute http://www.gordianknot.com [email protected] (800) 405-2114 __________________________ The parent company, American Heuristics Corporation (AHC), is a founding member of the West Virginia High Technology Consortium, with headquarters in Triadelphia, West Virginia. AHC is an advanced software technology consulting company applying hybrid software solutions to complex technical problems in business, industry and government. AHC may be found on the web at: http://www.heuristics.com >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Prof. Zicari" <[email protected]> Date: Sun, 27 Apr 1997 00:10:18 +0200 (METDST) I would like to inform you that the conference programs of COMDEX Internet & OBJECT WORLD Frankfurt '97 (October 7-10) are now available online at: http://www.ltt.de The web site will be updated on a regular basis. If you have any questions, please send me an e-mail at [email protected]. 
Best Regards Roberto Zicari Chair Advisory Board, COMDEX Internet & OBJECT WORLD Frankfurt. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.25	97:16	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sun May 11 1997 18:52`	765
	Knowledge Discovery Nuggets 97:16, e-mailed 97-05-08 Publications: * GPS, first issue of DMKD journal is published! http://www.research.microsoft.com/research/datamine/ * Gerhard Widmer, CfP: MLJ Special Issue on Context Sensitivity and Concept Drift (http://www.ai.univie.ac.at/mlj_specissue/) Siftware: * Larry Bouchie, Cognos new Data Mining Tool: Scenario * Aleksander Oehrn, Rosetta - rough-set tool for data analysis http://www.idt.unit.no/~aleks/rosetta/rosetta.html Positions: * Gregory Piatetsky-Shapiro, Data Mining Company looking for experts in decision trees and/or bayesian networks * Donal Lyons, Data Mining Research Position in Ireland * Yike Guo, Data Mining Job at Fujitsu (Japan) Meetings: * Pavel Brazdil, The Workshop on "Extraction of Knowledge from Data Bases" (EKBD'97), Coimbra, Portugal, October 6-9, 1997 http://alma.uc.pt:80/~epia97/EKBD97.html * Michael Berthold, IDA-97 Call for Participation http://web.dcs.bbk.ac.uk/ida97.html * Staal Vinterbo, PKDD'97 Call for participation, Trondheim, Norway, June 24-27, 1997, http://www.idi.ntnu.no/pkdd97/ * Rob Tibshirani, Statistical prediction methods for finance and marketing, New York City: June 23-24, 1997, http://stat.stanford.edu/~trevor/mrc.finance.html * Angi Voss, Workshop on Social Agents at ECSCW97 Conference September 7, 1997 http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ******************************************************************* ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ About the Deep Blue -- Kasparov match, "I just think we should look at this as a chess match," he said, "between the world's greatest chess player and Garry Kasparov." Louis Gerstner, IBM Chairman >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 8 May 1997 09:41:10 -0500 (EST) From: GPS <[email protected]> Subject: First Issue of DMKD journal The first issue of DMKD journal has finally been published! see http://www.research.microsoft.com/research/datamine/vol1-1/default.htm The beautiful black and white cover shows an Escher-inspired picture of several robots inside a mysterious structure (a data mine?), and contents include an editorial by Usama Fayyad, 4 excellent technical papers, * Statistical Themes and Lessons for Data Mining Clark Glymour, David Madigan, Daryl Pregibon, Padhraic Smyth * Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh * On Bias, Variance, 0/1 - loss, and the Curse-of-Dimensionality Jerome H. 
Friedman * Bayesian Networks for Data Mining, David Heckerman and a brief application summary: * Advanced Scout: Data Mining and Knowledge Discovery in NBA data, Inderpal Bhandari, Ed Colet, Jennifer Parker, Zachary Pines, Rajiv Pratap, Krishnakumar Ramanujam Sample copies of first issue will be mailed soon. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 30 Apr 1997 11:09:50 +0200 (MET DST) From: Gerhard Widmer <[email protected]> Subject: CfP: MLJ Special Issue on Context Sensitivity and Concept Drift Machine Learning Journal Special Issue on Context Sensitivity and Concept Drift Miroslav Kubat and Gerhard Widmer, Guest Editors MOTIVATION AND RESEARCH ISSUES In many machine learning applications, the features given to the learning program do not capture all aspects of the application problem. This is a limitation shared with all forms of modeling -- even the person who formulates the learning problem may not be aware of all of the relevant context. Examples from the history of machine learning and pattern recognition include omitting illumination features in computer vision and omitting language accents in speech recognition systems. A similar problem arises when the relevant features are included, but the training examples do not provide enough variation of those features to permit the learning algorithm to detect their relevance. For example, if foreign accent features are included in a speech recognition system, but all training examples are from native speakers, then the foreign accent features will be ignored by the learning system. Relevant context may also change with time, so that a classifier trained on one set of training examples (where a contextual feature was absent or held constant) may suddenly begin to perform badly when the context changes. Gradual or abrupt changes in context often become apparent in the form of {\em concept drift}. 
For situations where a concept gradually evolves over time in a certain general direction (such as the concept ``computer''), the term {\em concept evolution} has sometimes been used. Tracking concept drift on-line requires a learner to continually monitor its performance and adjust its hypotheses if necessary. It might also require the learner to "forget" old, outdated information. In batch learning, problems may arise if the training data were collected in batches from different contexts, or if the training data were gathered in one setting but the test data are drawn from a different setting. Again, effective learning requires the recognition of such discontinuities and the ability to adapt hypotheses to different conditions. This special issue is devoted to theoretical and empirical studies of methods for detecting missing context, tracking concept drift, adapting learned knowledge to new contexts, and identifying and reasoning about contextual effects and concept changes in learning. We encourage submissions addressing one or more of the following research issues: . on-line tracking of concept drift and concept evolution . theoretical results concerning concept drift and contextual influences . formal definitions of context and its effects on concept learning . real-world applications involving context changes and/or concept drift . representation of context-sensitive concepts . representation of context . recognition of context and reasoning about context . adaptation of learned knowledge to new contexts Both theoretical and more practically oriented papers are welcome, but we do encourage papers that provide real-world examples of context sensitivity and concept drift and compare multiple ways of addressing the problems that arise. SUBMISSION INFORMATION: The expected length is 8000-12000 words for a full paper, or 2000-4000 words for a Research Note (full-page figures count for 400 words). Electronic submission via e-mail is STRONGLY ENCOURAGED. 
Postscript files (compressed or gzipped, uuencoded) should be sent to [email protected]. For hardcopy submissions, please send 5 copies of the manuscript to: Gerhard Widmer Austrian Research Institute for Artificial Intelligence Schottengasse 3 A-1010 Vienna Austria Tel: +43-1-53532810 Fax: +43-1-5320652 e-mail: [email protected] The submission deadline is September 15, 1997. see http://www.ai.univie.ac.at/mlj_specissue/ for full details. The special issue is scheduled to appear in the summer of 1998. >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 28 Apr 1997 13:38:14 +0200 To: [email protected] From: Aleksander Oehrn <[email protected]> Subject: Rosetta availability =================================================== Rosetta -- A Rough Set Toolkit for Analysis of Data =================================================== Rosetta is a toolkit for analyzing tabular data within the framework of rough set theory, and consists of a computational kernel and a GUI front-end. The Rosetta GUI reflects the contents of the kernel, and runs on PCs operating under Windows NT or Windows 95. A limited version of Rosetta is made publicly available for non-commercial use. The downloadable program is limited in the sense that algorithms from the embedded RSES library are not applicable to decision tables larger than some predetermined size (currently 500 objects and 20 attributes). http://www.idt.unit.no/~aleks/rosetta/rosetta.html The software (including documentation) is provided "as is" without warranty of any kind. Kernel architecture and front-end designed and implemented at the Knowledge Systems Group, Dept. of Computer and Information Science, Norwegian University of Science and Technology, Norway. Sections of the computational kernel (RSES) developed at the Logic Group, Inst. of Mathematics, University of Warsaw, Poland. 
Rosetta is designed to support the overall KDD process; from initial browsing and preprocessing of the data, via reduct computation and rule generation, to validation and analysis of the extracted rules. Some of the features currently offered by the computational kernel include amongst others: - Completion of decision tables with missing values according to various completion strategies. - Computation of partitions and rough set approximations within the variable precision model. - Sampling of subtables for validation purposes. - Discretization of numerical attributes with various discretization algorithms. - Computation of reducts (both in the standard sense as well as object-related ones). Various approximation algorithms (e.g. genetic algorithms) are offered, as well as exhaustive computation via discernibility matrices. Dynamic reducts can be computed. - Generation of propositional rules. - Shortening and pruning of sets of reducts and rules. - Exporting of rules, reducts and tables, e.g. to Prolog. - Application of synthesized rules to unseen examples by means of various classification strategies, e.g. voting. - Generation of confusion matrices. Some of the features currently offered by the Rosetta GUI include amongst others: - Full Windows GUI conformance. - Organization of project items in a tree-structure in order to retain data-navigational abilities. - Viewing of all structures in intuitive grid environments, using terms from the modelling domain. - Context-sensitive menus. - Drag and drop functionality. - Masking of attributes, enabling one to work with "virtual" tables. - Automatic generation of annotations, thus documenting the modelling session. - A prototype environment for interactive classification and guidance on the basis of incomplete information, using a selected set of synthesized rules. - On-line help. 
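The "partitions and rough set approximations" feature listed above can be illustrated with a minimal Python sketch. This is not Rosetta's code or API. It implements the standard (Pawlak) lower and upper approximations rather than the variable precision model, and the toy decision table, attribute names, and function names are all hypothetical.

```python
from collections import defaultdict


def partition(table, attributes):
    """Group object ids into indiscernibility classes: objects that agree
    on every listed attribute fall into the same block."""
    blocks = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj_id)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the object set `target`.

    lower: union of blocks entirely contained in the target (certain members)
    upper: union of blocks that overlap the target (possible members)
    """
    lower, upper = set(), set()
    for block in partition(table, attributes):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper


# Hypothetical toy decision table: two condition attributes, one decision.
TABLE = {
    1: {"headache": "yes", "temp": "high", "flu": "yes"},
    2: {"headache": "yes", "temp": "high", "flu": "no"},
    3: {"headache": "no", "temp": "high", "flu": "yes"},
    4: {"headache": "no", "temp": "normal", "flu": "no"},
}

flu_cases = {obj for obj, row in TABLE.items() if row["flu"] == "yes"}
lower, upper = approximations(TABLE, ["headache", "temp"], flu_cases)
print("lower:", sorted(lower))  # objects that certainly have flu
print("upper:", sorted(upper))  # objects that possibly have flu
```

Objects 1 and 2 are indiscernible on the condition attributes but differ in the decision, so they appear only in the upper approximation. The gap between the two sets is the boundary region that rough-set rule generation has to treat as uncertain.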
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 7 May 1997 17:37:13 -0400 From: Larry Bouchie <[email protected]> Cognos' Scenario data mining product was released last month. Cognos' main Web page is at http://www.cognos.com and the Scenario site is at http://www.cognos.com/busintell/products/scenario.html Concise background and a review are at http://www8.zdnet.com/pcweek/reviews/0505/05mining.html COGNOS UNVEILS SCENARIO FOR DATA MINING -- New Data Mining Software Joins Cognos' Market-Leading Business Intelligence Tools, PowerPlay For OLAP And Impromptu For Query & Reporting -- BURLINGTON, MA, March 3, 1997 -- Cognos (NASDAQ:COGNF; TSE:CSN) today announced its newest business intelligence tool, Scenario, for enterprise-wide guided data analysis and data mining. Scenario extends the industry's most comprehensive business intelligence product family, joining Cognos' market-leading PowerPlay, the universal online analytical processing (OLAP) client, and the award-winning Impromptu query and reporting tool. Designed for spotting patterns and exceptions in business data that might otherwise be missed, Scenario's sophisticated interface allows users to readily visualize the business information being uncovered. It automates the discovery and ranking of critical factors impacting a business, exposes hidden relationships between factors and establishes thresholds and benchmarks. An intuitive, cost-effective desktop tool, Scenario liberates data mining from what is typically an expensive and time-consuming process. Insights derived using Scenario are achieved directly by those best positioned to use the knowledge and effect rapid change. 
Designed to support faster business decision-making, Scenario: * makes data mining immediately accessible to decision makers; * simplifies business data analysis by filtering out insignificant business variables and relationships; * validates business hypotheses by showing and ranking critical factors and relationships; * leads to new business insights by automating information discovery; and * integrates with Impromptu and PowerPlay as best-of-breed components in the Cognos enterprise business intelligence solution. "With Scenario, Cognos is delivering a very important technology to business analysts," said George Azrak, national director of IS development at Domino's Pizza. Domino's Pizza has been working with early versions of Scenario, and has provided Cognos with valuable input from an end user's point of view. "Accessible data mining is the long-awaited third wave in the data warehousing revolution," said Alan Rottenberg, Cognos' senior vice president, Business Intelligence Tools. "First query and reporting brought data to the desktop, then OLAP technologies enabled the convenient navigation of massive data warehouses. Data mining is the technological leap that automates the information discovery process." Rottenberg continued, "Impromptu gives access to the numbers and data on which a business runs. PowerPlay lets individual managers explore that data without an army of programmers. Scenario works alongside both of those products to refine business data to distinguish what really matters. Drawing a straight line to the bottom line, this product completes the spectrum of business intelligence tools that can arm knowledge workers with the insight to truly understand the data that drives a business -- and to reap the competitive rewards." Scenario uses statistical methods that go beyond "tree" analysis. For example, one such method is a data segmentation capability based on CHAID (Chi-Squared Automatic Interaction Detection) technology. 
CHAID allows users to find statistically relevant relationships and trends within large repositories of business data by "refining" it down to the most useful nuggets that have the greatest effect on the results being tracked. Subsequent releases of Scenario will include neural-network modeling and forecasting capabilities, using technologies from recently acquired Right Information Systems. Pricing and Availability Available from Cognos for $695, Scenario 1.0 for Windows 95 or Windows NT requires an IBM-compatible 486 PC and 8 MB of RAM. >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 8 May 1997 10:40:10 -0500 (EST) From: Gregory Piatetsky-Shapiro <[email protected]> Subject: Looking for experts in decision trees and/or bayesian networks Data Mining Consulting and Integration Company is looking for experts in decision trees and/or bayesian networks TASK: Participate in the design, development, and deployment of leading edge integrated data mining and customer modeling systems, primarily in the financial area. Perform quick data mining studies using a variety of different approaches and tools. The candidates will join a team of world-class experts in data warehousing, data mining and knowledge discovery. Ideal candidates will have a Ph.D. in Machine Learning, Statistics, or related fields and 2-3 years of experience, or an M.S. with equivalent experience. The candidates should have expertise with different modeling approaches, but primarily with decision trees/rules or with bayesian belief networks. The candidates should be familiar with statistical theory and have practical experience with databases. Excellent coding skills in a C/Java/Unix environment along with good system maintenance practices and the ability to quickly pick up new systems and languages are needed. The candidates should also have good communication skills, be able to work in a team, and be able to enjoy the exciting atmosphere of a start-up company. 
Most of all, candidates should have the passion for developing and applying innovative methods for solving practical problems. We offer very competitive salaries, and our outstanding benefits include profit sharing, stock options, medical/dental insurance, and a 401(k) plan. The data mining branch of the company is conveniently located in the Cambridge area, easily accessible by public transportation. Proper work authorization required. Please email your resume and a cover letter (in plain ASCII, please) to: Gregory Piatetsky-Shapiro, Ph.D. Director of Applied Research Geneve Consulting Group 545 Concord Ave Cambridge MA 02138 email: [email protected] tel: 617-661-1358 fax: 617-491-4936 URL: http://www.kdnuggets.com/gps.html >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subject: Data Mining Research Position possibility. Date: Sat, 26 Apr 1997 11:57:24 +0100 From: Donal Lyons <[email protected]> Currently there is EU funding available for experienced researchers to spend a year in countries such as Ireland. I wish to explore the possibility of using this funding to help develop a Data Mining Interest Group within the School of Systems and Data Studies in Trinity College, Dublin. I'd like to discuss this further with any experienced EU researchers who are at least tentatively interested. Regards, Donal. Donal Lyons, Phone (1000-1700 GMT) +353 1 608 1919 Lecturer (Information Systems) Phone Messages +353 1 608 1767 School of Systems & Data Studies Trinity College, Dublin 2, FAX on request Ireland. ................http://www2.tcd.ie/Statistics/staff/dlyons.html........ >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 5 May 97 11:48 BST From: Yike Guo <[email protected]> Subject: Job in Japan A Fujitsu subsidiary company which is developing OLAP and datamining tools is now looking for a foreign engineer who is interested in working in Japan. 
Career opportunity for a programming engineer in Japan Duties Designing and programming data mining products which include a visualizing OLAP client. Requirements - BS or MS degree related to computer science - C programming skill (VC++ on NT background is best) - Familiarity with datamining, visualization, or OLAP - Native English speaker Contact Fujitsu SWE, Manager Mr. Katoh E-mail: [email protected] >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 29 Apr 1997 19:30:03 +0200 (MET DST) From: Pavel Brazdil <[email protected]> Call for Participation The Workshop on "Extraction of Knowledge from Data Bases" EKBD'97 http://alma.uc.pt:80/~epia97/EKBD97.html Under the auspices of the Portuguese Conference on Artificial Intelligence (EPIA'97) Coimbra, Portugal, October 6-9, 1997. October 7-8, 1997, Coimbra University Physics Building. Aims of the Workshop This workshop is in the area of Extraction (or Discovery) of Knowledge from Data Bases and Data Mining, which are rather recent but rapidly expanding fields. The objective of the workshop is to discuss methods for non-trivial extraction of information which is implicit in the existing data and which can be represented in a high-level language so as to facilitate interpretation. EKBD'97 welcomes original papers in English on the following topics: - Machine Learning methods useful in KDD and Data Mining (decision tree/rule induction, relational learning (ILP), etc.), - Statistical methods useful in KDD and Data Mining (multivariate analysis, principal components, clustering, regression methods, etc.), - Reduction of complexity through preprocessing (identification of relevant attributes, data sampling, clustering, etc.), - Data summarization and consolidation, - Languages useful in describing user's hypotheses, - Applications of KDD and Data Mining, - other related areas of interest. 
Workshop Format and Attendance Requirements: The workshop will include invited talks, paper presentations and a panel discussion. The workshop will last 1-2 days. Papers in English of no more than 15 pages are welcome. Attendees should be registered for the main EPIA conference (see http://alma.uc.pt:80/~epia97). Submit 3 copies of the full paper to the address below: Pavel Brazdil LIACC, Universidade do Porto, R. Campo Alegre, 823, 4150 PORTO, PORTUGAL Text format should follow Springer Verlag Lecture Notes Series. English is the official language of the workshop. Important dates: June 16: submissions due July 15: notifications sent September 8: final versions due Programme Committee: Pavel Brazdil, Univ. Porto (chair) Arlindo Oliveira, IST Carlos Bento, U. Coimbra Ernesto Costa, U. Coimbra Fernando Moura-Pires, UNL-FCT Fernando Nicolau, UNL-FCT Helena Bacelar Nicolau, UNL-FCT Joaquim Pinto da Costa, Univ. Porto Paulo Azevedo, Univ. Minho Paula Brito, Univ. Porto Paulo Gomes, INE, Porto Organizing Committee: Pavel Brazdil (chair) LIACC, Universidade do Porto, R. Campo Alegre, 823, 4150 PORTO, PORTUGAL email: [email protected] Tel.: (02) 600 1672, Fax: (02) 600 3654 Fernando Moura-Pires UNL-FCT, Dept. Informatica, Quinta da Torre 2825 Monte da Caparica, PORTUGAL email: [email protected] Tel.: (01) 295 4464, Fax: (01) 295 5641 >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Subject: IDA Call for Participation Date: Thu, 8 May 1997 17:43:12 +0200 From: Michael Berthold <[email protected]> CALL FOR PARTICIPATION The Second International Symposium on Intelligent Data Analysis (IDA-97) Birkbeck College, University of London 4th-6th August 1997 In Cooperation with AAAI, ACM SIGART, BCS SGES, IEEE SMC, and SSAISB [ http://web.dcs.bbk.ac.uk/ida97.html ] You are invited to participate in IDA-97, to be held in the heart of London. 
IDA-97 will be a single-track conference consisting of oral and poster presentations, invited speakers, demonstrations and exhibitions. The conference Call for Papers introduced a theme, "Reasoning About Data", and many papers complement this theme, but other, exciting topics have emerged, including exploratory data analysis, data quality, knowledge discovery and data-analysis tools, as well as the perennial technologies of classification and soft computing. A new and exciting theme involves analyzing time series data from physical systems, such as medical instruments, environmental data and industrial processes. Information regarding registration as well as the preliminary technical program can be found on the IDA-97 web page (address listed above). Please note that there are reduced rates for early registration (before 2nd June). Also there are still a limited number of spaces available for exhibition, and potential exhibitors are encouraged to book early (the application deadline is 2nd June). >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Staal Vinterbo" <[email protected]> Date: Tue, 6 May 1997 18:05:56 +0200 Subject: PKDD'97 Call for participation Dear Sir. I am asking on behalf of Prof. Komorowski that the following call for participation be distributed via the KDD Nuggets mailing list. Thank you. 
PKDD'97 -- Call For Participation 1st European Symposium on Principles of Data Mining and Knowledge Discovery Trondheim, Norway June 24-27, 1997 Tutorials: June 24-25 Symposium: June 26-27 Data Mining and Knowledge Discovery (KDD) have recently emerged from a combination of many research areas: databases, statistics, machine learning, automated scientific discovery, inductive programming, artificial intelligence, visualization, decision science, and high performance computing. While each of these areas can contribute in specific ways, KDD focuses on the value that is added by creative combination of the contributing areas. The goal of PKDD'97 is to provide a European-based forum for interaction among all theoreticians and practitioners interested in data mining. Fostering an interdisciplinary collaboration is one desired outcome, but the main long-term focus is on theoretical principles for the emerging discipline of KDD, especially those new principles that go beyond each of the contributing areas. Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for detailed information and news about the symposium. Registration Information is available at http://www.idi.ntnu.no/pkdd97/fees.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Sun, 4 May 97 12:10 EDT Subject: Modern Regression and Classification course - New York ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++ +++ +++ Modern Regression and Classification: +++ +++ +++ +++ Statistical prediction methods for finance +++ +++ and marketing +++ +++ +++ +++ +++ +++ New York City: June 23-24, 1997 +++ +++ +++ +++ Trevor Hastie, Stanford University +++ +++ Rob Tibshirani, University of Toronto +++ +++ +++ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ This two-day course will give a detailed overview of statistical models for regression and classification. 
Known as machine-learning in computer science and artificial intelligence, and pattern recognition in engineering, this is a hot field with powerful applications in finance, science and industry. This course covers a wide range of models from linear regression through various classes of more flexible models to fully nonparametric regression models, both for the regression problem and for classification. This special version of our popular MRC course is tailored to financial and marketing professionals. Although a firm theoretical motivation will be presented, the emphasis will be on practical applications and implementations, especially in the finance and marketing areas. The course will include many examples and case studies, and participants should leave the course well-armed to tackle real problems with realistic tools. The instructors are at the forefront in research in this area. After a brief overview of linear regression tools, methods for one-dimensional and multi-dimensional smoothing are presented, as well as techniques that assume a specific structure for the regression function. These include splines, wavelets, additive models, MARS (multivariate adaptive regression splines), projection pursuit regression, neural networks and regression trees. All of these can be adapted to the time-series framework for predicting future trends from the past. The same hierarchy of techniques is available for classification problems. Classical tools such as linear discriminant analysis and logistic regression can be enriched to account for nonlinearities and interactions. Generalized additive models and flexible discriminant analysis, neural networks and radial basis functions, classification trees and kernel estimates are all such generalizations. Other specialized techniques for classification including nearest- neighbor rules and learning vector quantization will also be covered. 
Apart from describing these techniques and their applications to a wide range of problems, the course will also cover model selection techniques, such as cross-validation and the bootstrap, and diagnostic techniques for model assessment. Software for these techniques will be illustrated, and a comprehensive set of course notes will be provided to each attendee. Additional information is available at the Website: http://stat.stanford.edu/~trevor/mrc.finance.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 05 May 1997 12:45:27 +0200 From: Angi Voss <[email protected]> Subject: Workshop on Social Agents "Social Agents in Web-Based Collaboration" at the ECSCW'97 Conference September 7, 1997 Organizers: Thomas Kreifelts, Angi Voss, Gloria Mark, Arnstein Borstad, Vidar Hepsoe Abstract -------- We see signs today that the Web is moving toward an environment where new social and collaborative interactions are being realized. Rather than continuing to evolve as a single-user environment, the Web is beginning to be regarded as an environment where reciprocity and awareness of others' activities have an important function. Software agents can help develop and support the process of reciprocity by helping people find others with similar interests, and helping match knowledge to the right people. Agents can also help people collectively construct knowledge, shaped around their needs. This full-day workshop is intended for designers and researchers from academia and industry to discuss the role of agents in dealing with social information. How can social agents be integrated into collaborative relationships so that information and expertise can be distributed and matched to the right people, where appropriate relationships can be developed, and where collective knowledge can be established? 
Participation requires the submission of an input paper (3-6 pages) that should try to address the points described above, from any of the following aspects: -experiences with agent use in collaboration -design of agent systems -application areas -interface design The paper should be sent for review by June 15 to: Thomas Kreifelts GMD-FIT.CSCW D-53754 Sankt Augustin Germany Email: [email protected] Fax: +49-2241-142084 Electronic submission is encouraged, HTML being the preferred format. The selection of participants will be based on the input papers. Accepted participants will be notified before the end of June so that they can take advantage of early registration by July 1. For those who are interested in submitting a paper to the workshop, but are not able to meet the June 15 deadline, please contact the organizers as soon as possible expressing your interest to participate in the workshop. The accepted input papers will be distributed electronically in advance to the workshop participants. The workshop will be structured around the presentation of selected input papers to stimulate the discussion. Note that participation in the workshop requires participation in the ECSCW 97 conference. Important Dates: ---------------- June 15, 1997 - Deadline for submissions end of June - Notification of acceptance ...July 1, 1997 - Early registration deadline for the ECSCW '97 conference September 7, 1997 - The Workshop For more information: http://orgwis.gmd.de/projects/SAW/ecscw97SoAg.html Angi Voss GMD FIT D-53754 Sankt Augustin phone: (+49) 2241-142726 fax: (+49) 2241-142384 e-mail: [email protected] URL: http://nathan.gmd.de/persons/angi.voss.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.26	97:17	IJSAPL::OLTHOF	Spellchecked Henry Although	`Sat May 17 1997 11:40`	697
	Knowledge Discovery Nuggets 97:17, e-mailed 97-05-15 Publications: * Phil Chan, CFP: MLJ special issue on IMLM, http://www.cs.fit.edu/~imlm/ Siftware: * P. Spedding, Cognos' Scenario Wins PC Week Labs Analyst's Choice Award, http://www8.zdnet.com/pcweek/reviews/0505/05mining.html Positions: * COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science & Technology (OGI), http://www.cse.ogi.edu/CompFin/ * George Smith, Research Assistant Position at UEA, Norwich, UK Meetings: * Lipo Wang, 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, 15-17 April 1998, http://www.sd.monash.edu.au/pakdd-98 * David Leake, ICCBR-97: First Call for Participation, http://www.iccbr.org/iccbr-97.html * Hakan Erdogmus, CASCON'97 CfP, http://www.cas.ibm.ca/cascon/ * John R. Koza, GP-97 Revised Call for Participation, http://www-cs-faculty.stanford.edu/~koza/gp97.html -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ******************************************************************* ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If the fool would persist in his folly he would become wise. William Blake >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "IMLM Workshop (pkc)" <[email protected]> Subject: CFP: MLJ special issue on IMLM Dear colleagues, Here is a CFP for the Machine Learning Journal special issue on IMLM. Submission is due on Oct 1st, 97. Hope you can submit. Thanks. Phil, Sal, and Dave ------ CALL FOR PAPERS Machine Learning Journal Special Issue on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms Most modern Machine Learning, Statistics and KDD techniques use a single model or learning algorithm at a time, or at most select one model from a set of candidate models. Recently however, there has been considerable interest in techniques that integrate the collective predictions of a set of models in some principled fashion. With such techniques often the predictive accuracy and/or the training efficiency of the overall system can be improved, since one can "mix and match" among the relative strengths of the models being combined. Any aspect of integrating multiple models is appropriate for the special issue. 
However we intend the focus of the special issue to be on the issues of improving prediction accuracy and improving training efficiency in the context of large databases. Submissions are sought in, but not limited to, the following topics: 1) Techniques that generate and/or integrate multiple learned models. Examples are schemes that generate and combine models by * using different training data distributions (in particular by training over different partitions of the data) * using different sampling techniques to generate different partitions * using different output classification schemes (for example using output codes) * using different hyperparameters or training heuristics (primarily as a tool for generating multiple models) 2) Systems and architectures to implement such strategies. For example, * parallel and distributed multiple learning systems * multi-agent learning over inherently distributed data 3) Techniques that analyze the integration of multiple learned models for * selecting/pruning models * estimating the overall accuracy * comparing different integration methods * tradeoff of accuracy and simplicity/comprehensibility Schedule: October 1: Deadline for submissions December 15: Deadline for getting decisions back to authors March 15: Deadline for authors to submit final versions August 1998: Publication Submission Guidelines: 1) Manuscripts should conform to the formatting instructions in: http://www.cs.orst.edu/~tgd/mlj/info-for-authors.html The first author will be the primary contact unless otherwise stated. 2) Authors should send 5 copies of the manuscript to: Karen Cullen Machine Learning Editorial Office Attn: Special Issue on IMLM Kluwer Academic Press 101 Philip Drive Assinippi Park Norwell, MA 02061 617-871-6300 617-871-6528 (fax) [email protected] and one copy to: Philip Chan MLJ Special Issue on IMLM Computer Science Florida Institute of Technology 150 W. University Blvd. 
Melbourne, FL 32901 407-768-8000 x7280 (x8062) (407-674-7280/8062 after 6/1/97) 407-984-8461 (fax) 3) Please also send an ASCII title page (title, authors, email, abstract, and keywords) and a postscript version of the manuscript to [email protected]. General Inquiries: Please address general inquiries to: [email protected] Up-to-date information is maintained on WWW at: http://www.cs.fit.edu/~imlm/ Co-Editors: Philip Chan, Florida Institute of Technology [email protected] Salvatore Stolfo, Columbia University [email protected] David Wolpert, IBM Almaden Research Center [email protected] >~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [The following is a commercial announcement. GPS] From: "Spedding, Patrick" <[email protected]> Subject: Cognos' Scenario Wins PC Week Labs Analyst's Choice Award Date: Fri, 9 May 1997 05:36:20 -0400 Cognos' Scenario Wins PC Week Labs Analyst's Choice Award BURLINGTON, Mass., May 6 /PRNewswire/ -- Cognos'(R) (Nasdaq: COGNF; Toronto: CSN) Scenario(TM) data mining tool won PC Week Labs Analyst's Choice Award after a head-to-head review with a competing product. Scenario's "innovative interface makes it the coolest software package we've seen this year," said the review, which cited its superiority, power and graphics. Scenario extends the industry's most comprehensive business intelligence product family, joining Cognos' market-leading PowerPlay(R), the universal OLAP client, and award-winning Impromptu(R) query and reporting tool. "This award substantiates Cognos' belief that data mining in the hands of business users offers up a powerful, functional and affordable competitive edge," said Alan Rottenberg, senior vice president, Business Intelligence products. "Putting data mining capabilities into the hands of decision makers and knowledge workers extends our strategy of enabling them to react quickly to newfound knowledge, whether in operational systems or data warehouses. 
Scenario joins Cognos' other award-winning business intelligence tools for fastest time to results, lowest cost of ownership and unparalleled ease of use." PC Week Labs, the world's largest independent testing laboratory, applauded both Cognos' Scenario and the competitor for bringing new data mining techniques to the PC. "But in head-to-head testing," it wrote, "Scenario safely mined more usable information than its competitor, making it our top pick." Designed for spotting patterns and exceptions in business data that might otherwise be missed, Scenario's sophisticated interface allows users to readily visualize the business information being uncovered. It automates the discovery and ranking of critical factors impacting a business, exposes hidden relationships between factors and establishes thresholds and benchmarks. An intuitive, cost-effective desktop tool, Scenario liberates data mining from what is typically an expensive and time-consuming process. Insights derived using Scenario are achieved directly by those best positioned to use the knowledge and effect rapid change. Scenario 1.0, released in April 1997, is available from Cognos for $695. It runs on Windows 95 and Windows NT and requires an IBM-compatible 486 PC and 8 MB of RAM. (see http://www8.zdnet.com/pcweek/reviews/0505/05mining.html for PC Week comparison of Scenario and BusinessMiner. GPS) >~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 7 May 1997 11:46:09 -0700 (PDT) From: Computational Finance <[email protected]> Subject: Computational Finance Graduate Programs ======================================================================= COMPUTATIONAL FINANCE at the Oregon Graduate Institute of Science & Technology (OGI) Master of Science Concentrations in Computer Science & Engineering (CSE) Electrical Engineering (EE) Upcoming MS Application Deadline for Fall 1997: May 15 & June 15! New! Certificate Program Designed for Part-Time Students. 
For more information, contact OGI Admissions at (503)690-1027 or [email protected], or visit our Web site at: http://www.cse.ogi.edu/CompFin/ ======================================================================= Computational Finance Overview: Advances in computing technology now enable the widespread use of sophisticated, computationally intensive analysis techniques applied to finance and financial markets. The real-time analysis of tick-by-tick financial market data, and the real-time management of portfolios of thousands of securities is now sweeping the financial industry. This has opened up new job opportunities for scientists, engineers, and computer science professionals in the field of Computational Finance. The strong demand within the financial industry for technically sophisticated graduates is addressed at OGI by the Master of Science and Certificate Programs in Computational Finance. Unlike a standard two year MBA, the programs are directed at training scientists, engineers, and technically oriented financial professionals in the area of quantitative finance. The master's programs lead to a Master of Science in Computer Science and Engineering (CSE track) or in Electrical Engineering (EE track). The MS programs can be completed within 12 months on a full-time basis. In addition, OGI has introduced a Certificate program designed to provide professionals in engineering and finance a means of upgrading their skills or acquiring new skills in quantitative finance on a part-time basis. The Computational Finance MS concentrations feature a unique combination of courses that provides a solid foundation in finance at a non-trivial, quantitative level, plus the essential core knowledge and skill sets of computer science or the information technology areas of electrical engineering. 
These skills are important for advanced analysis of markets and for the development of state-of-the-art investment analysis, portfolio management, trading, derivatives pricing, and risk management systems. The MS in CSE is ideal preparation for students interested in securing positions in information systems in the financial industry, while the MS in EE provides rigorous training for students interested in pursuing careers as quantitative analysts at leading-edge financial firms. The curriculum is strongly project-oriented, using state-of-the-art computing facilities and live/historical data from the world's major financial markets provided by Dow Jones Telerate. Students are trained in the use of high-level numerical and analytical software packages for analyzing financial data. OGI has established itself as a leading institution in research and education in Computational Finance. Moreover, OGI has strong research programs in a number of areas that are highly relevant for work in quantitative analysis and information systems in the financial industry. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 13 May 1997 14:40:06 +0100 (BST) From: [email protected] (George Smith) Subject: Research Assistant Position at UEA, Norwich, UK The School of Information Systems, University of East Anglia, Norwich has a vacancy for a Research Assistant to work on a project entitled "Datamining in the Telecommunications Sector". A computer graduate with at least a 2(I) degree in computing or allied subject is sought for a two year post starting August 1st, 1997, or as soon as possible thereafter. The appointee will work within a leading telecommunications company, Nortel plc, on a day-to-day basis but will be an employee of the University of East Anglia. Opportunities will exist for registration for a part-time higher degree at the University. A successful applicant will be expected to have a high degree of numeracy and a strong computing background. 
Preference will be given to those who, in addition, have some knowledge (and expertise) in one or more of the following: evolutionary computation, operations research, artificial intelligence or telecommunications. The research is sponsored jointly by the Teaching Company Scheme and by Nortel plc and involves the development and application of various inference and heuristic techniques, including genetic algorithms, simulated annealing and tabu search, to elicit knowledge from large scale data sets generated within the telecommunications industry. Initial salary to be determined but expected to be around 16K UK pounds. Applicants are invited to telephone Dr George D Smith (+44 (0) 1603 593260) or email [email protected] for further information. Applications in the form of a covering letter plus three copies of a CV, including the names and addresses of three referees, should be sent to: Dr George D Smith School of Information Systems University of East Anglia Norwich NR4 7TJ, UK on or before Friday 6th June 1997. 
Tel: + 44 (0)1603 593260 FAX: + 44 (0)1603 593344 Email: [email protected] www: http://www.sys.uea.ac.uk/Teaching/Staff/gds.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Mon, 12 May 1997 16:14:32 +1000 From: Lipo Wang <[email protected]> Subject: CFP: Conference on Knowledge Discovery and Data Mining (PAKDD-98) ====================================================================== C A L L F O R P A P E R S ====================================================================== The Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98) ---------------------------------------------- Melbourne, Australia, 15-17 April 1998 ====================================== URL: http://www.sd.monash.edu.au/pakdd-98 The Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98) will provide an international forum for the sharing of original research results and practical development experiences among researchers and application developers from different KDD related areas such as machine learning, databases, statistics, knowledge acquisition, data visualization, software re-engineering, and knowledge-based systems. It will follow the success of PAKDD-97 held in Singapore in 1997 by bringing together participants from universities, industry and government. Papers on all aspects of knowledge discovery and data mining are welcome. 
Areas of interest include, but are not limited to: - Data and Dimensionality Reduction - Data Mining Algorithms and Tools - Data Mining and Data Warehousing - Data Mining on the Internet - Data Mining Metrics - Data Preprocessing and Postprocessing - Data and Knowledge Visualization - Deduction and Induction in KDD - Discretisation of Continuous Data - Distributed Data Mining - KDD Framework and Process - Knowledge Representation and Acquisition in KDD - Knowledge Reuse and Role of Domain Knowledge - Knowledge Acquisition in Software Re-Engineering and Software Information Systems - Induction of Rules and Decision Trees - Management Issues in KDD - Machine Learning, Statistical and Visualization Aspects of KDD (including Neural Networks and Inductive Logic Programming) - Mining in-the-large vs Mining in-the-small - Noise Handling - Security and Privacy Issues in KDD - Successful/Innovative KDD Applications in Science, Government, Business and Industry. Both research and applications papers are solicited. All submitted papers will be reviewed on the basis of technical quality, relevance to KDD, significance, and clarity. Accepted papers will be published in the conference proceedings by an international publisher. A selected number of the accepted papers will be expanded and revised for inclusion in a special issue of an international journal. All submissions should be limited to a maximum of 5,000 words. Four hardcopies should be forwarded to the following address. Professor Ramamohanarao Kotagiri (PAKDD '98) Department of Computer Science The University of Melbourne Parkville, VIC 3052 Australia Please include a cover page containing the title, authors (names, postal and email addresses), a 200-word abstract and up to 5 keywords. This cover page must accompany the paper. 
************* I m p o r t a n t D a t e s ************* * 4 copies of full papers received by: October 16, 1997 * * acceptance notices: December 22, 1997 * * final camera-readies due by: January 30, 1998 * ************************************************************* Conference Chairs: ================== Ross Quinlan Sydney University Bala Srinivasan Monash University Program Chairs: =============== Xindong Wu Monash University Ramamohanarao Kotagiri Melbourne University Organising Committee Co-Chairs: =============================== Kevin Korb Monash University Graham Williams CSIRO, Australia PAKDD-98 Publicity Chair: ========================= Lipo Wang Deakin University PAKDD-98 Tutorial Chair: ======================== Jon Oliver Monash University PAKDD-98 Treasurer: =================== Michelle Riseley Monash University Program Committee: ================== Grigoris Antoniou James Boyce Ivan Bratko Mike Cameron-Jones Arbee Chen David Cheung Vic Ciesielski Honghua Dai John Debenham Olivier de Vel Tharam Dillon Guozhu Dong Peter Eklund Usama Fayyad Matjaz Gams Yike Guo David Hand Evan Harris David Heckerman David Kemp Masaru Kitsuregawa Kevin Korb Hingyan Lee Jae-Kyu Lee Deyi Li Bing Liu Huan Liu Zhi-Qiang Liu Hongjun Lu Dickson Lukose Kia Makki Heikki Mannila Peter Milne Shinichi Morishita Hiroshi Motoda Hwee-Leng Ong Jon Oliver Maria Orlowska G. Piatetsky-Shapiro Niki Pissinou Peter Ross Claude Sammut S. Seshadri Hayri Sever Arun Sharma Heinz Schmidt Evangelos Simoudis Atsuhiro Takasu Takao Terano B. Thuraisingham Kai Ming Ting David Urpani R. 
Uthurusamy Lipo Wang Geoff Webb Graham Williams Beat Wuthrich Xin Yao John Zeleznikow Dian-cheng Zhang Ming Zhao Zijian Zheng Ning Zhong Justin Zobel Further Information =================== Dr Xindong Wu Department of Software Development Monash University 900 Dandenong Road Caulfield East, Melbourne 3145 Australia Phone: +61 3 9903 1025 Fax: +61 3 9903 1077 Email: [email protected] >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Tue, 6 May 1997 13:08:00 -0500 (EST) From: "David Leake" <[email protected]> Subject: ICCBR-97: First Call for Participation ICCBR-97 Second International Conference on Case-Based Reasoning Brown University Providence, Rhode Island, July 25-27, 1997 Note: The early registration deadline is May 28, 1997 (extended from May 20). Additional information is available from http://www.iccbr.org/iccbr-97.html Questions should be sent to [email protected]. --------------- Conference Overview --------------- In 1995, the first International Conference on Case-Based Reasoning (ICCBR-95) was held in Sesimbra, Portugal, as the start of a biennial series. ICCBR-97, the Second International Conference on Case-Based Reasoning, will be held at Brown University in Providence, Rhode Island, on July 25-27, immediately prior to AAAI-97 and IAAI-97. The program of ICCBR-97 will include both research and applications. The three-day conference will feature invited talks, paper and poster sessions, and panels presenting both mature work and new ideas, selected from over 100 submissions to the conference. The conference aims to achieve a vibrant interchange between researchers and practitioners with different perspectives on fundamentally related issues, in order to examine and advance the state of the art in case-based reasoning and related fields. 
Topics to be addressed in conference presentations include: * Case representation, indexing and retrieval, similarity assessment, case adaptation, and analogical reasoning * Case-based and instance-based learning, index learning, and integrating CBR with other learning methods * Case-based reasoning and related approaches for task areas such as education, design, and medicine * Integration of CBR with other AI methods and comparisons to other approaches * Methods and systems for decision support, knowledge management, and intelligent information retrieval * Novel application areas for case-based techniques, deployed applications with significant impact, and lessons learned from application development (See http://www.iccbr.org/iccbr-97.html for details on registration, etc.) >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: 8 May 1997 10:17:04 -0500 From: "Erdogmus" <[email protected]> Subject: CASCON'97 CfP CASCON'97 web site: http://www.cas.ibm.ca/cascon/ -- CASCON'97: Meeting of Minds November 10-13, 1997 International Plaza Hotel Mississauga, Ontario, Canada Dear Colleague, CASCON '97, the seventh annual IBM Center for Advanced Studies Conference is upon us. CASCON provides an excellent opportunity for academic, governmental, and industrial research communities to share their work. We encourage you to submit papers. The deadline for paper submissions is June 27, however, we would like to know about your intention to submit a paper earlier (by May 16, if possible). If you are thinking about submitting a paper, please register as soon as possible on our web site at http://www.cser.ca:8001/ All you have to do is to fill out a simple online form specifying a tentative title and some keywords. This information can easily be changed any time using the automated system. 
This year, we are soliciting papers in a wide range of topics including but not limited to the following: - Distributed systems and applications: Internet and the WWW, electronic commerce, tele-learning, tele-medicine, CSCW, multimedia, distributed object technologies, Java, performance analysis, high-speed networks, and applications management - Database technology: data mining, knowledge recovery, digital libraries, and data warehousing - User technologies: human-computer interaction, navigation, and GUIs - Software engineering and practices: maintenance, design recovery, program understanding, visualization, reuse, frameworks and design patterns, development environments, reliability, testing and validation, metrics, and real-time systems - Compiler technology: new techniques, compiler development, optimization, parallelism, and architectures For more information about CASCON'97, please visit the web site http://www.cas.ibm.ca/cascon/ We are looking forward to your participation. Dr. Hakan Erdogmus CASCON'97 Program Co-chair [email protected] CASCON'97 web site: http://www.cas.ibm.ca/cascon/ >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Sat, 10 May 1997 13:09:26 -0700 (PDT) From: "John R. 
Koza" <[email protected]> Subject: GP-97 Revised Call for Participation CALL FOR PARTICIPATION Genetic Programming 1997 Conference (GP-97) July 13 - 16 (Sunday - Wednesday), 1997 Fairchild Auditorium - Stanford University - Stanford, California ----------------------------------------------------------------------- In cooperation with American Association for Artificial Intelligence (AAAI), Association for Computing Machinery (ACM), SIGART, and Society for Industrial and Applied Mathematics (SIAM) ----------------------------------------------------------------------- WWW FOR GP-97: http://www-cs-faculty.stanford.edu/~koza/gp97.html ----------------------------------------------------------------------- NOTE: You are urged to make your housing arrangements as early as possible since convenient hotel locations are limited. Also, if you are driving to the Stanford campus, please be aware of parking lot construction in the area of Fairchild Auditorium and allow a little extra time (particularly on the first Monday session) to find a parking place. ----------------------------------------------------------------------- Genetic programming is an automatic programming technique for evolving computer programs that solve (or approximately solve) problems. Starting with a primordial ooze of thousands of randomly created computer programs, a population of programs is progressively evolved over many generations using the Darwinian principle of survival of the fittest, a sexual recombination operation, and occasional mutation. The first annual genetic programming conference in 1996 featured 15 tutorials, 2 invited speakers, 3 parallel tracks, 73 papers, and 17 poster papers in proceedings book, and 27 late-breaking papers in a separate book distributed to conference attendees, and 288 attendees. A description of GP-96 appears in the October 1996 issue of Scientific American (http://www.sciam.com/WEB/1096issue/1096techbus3.html). 
This second annual conference in 1997 reflects the rapid growth of this field in which over 600 technical papers have been published since 1992. For August 5, 1996 article in E. E. Times on GP-96 conference and August 12, 1996 article in E. E Times on John Holland's invited speech at GP-96, go to http://www.techweb.com/search/search.html There will be 36 long, 33 short, and 15 poster papers at the Second Annual Genetic Programming Conference to be held on July 13-16 (Sunday - Wednesday), 1997 at Stanford University. In addition, there will be late-breaking papers (published in a separate book in mid June after the June 11 deadline for late-breaking papers). Topics include, but are not limited to, applications of genetic programming, theoretical foundations of genetic programming, implementation issues, technique extensions, cellular encoding, evolvable hardware, evolvable machine language programs, automated evolution of program architecture, evolution and use of mental models, automatic programming of multi-agent strategies, distributed artificial intelligence, auto-parallelization of algorithms, automated circuit synthesis, automatic programming of cellular automata, induction, system identification, control, automated design, data and image compression, image analysis, pattern recognition, molecular biology applications, grammar induction, and parallelization. Papers describing recent developments are also solicited in the following additional areas: genetic algorithms, classifier systems, evolutionary programming and evolution strategies, artificial life and evolutionary robotics, DNA computing, and evolvable hardware. ----------------------------------------------------------------------- full information at http://www-cs-faculty.stanford.edu/~koza/gp97.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
410.27	97:18	IJSAPL::OLTHOF	Spellchecked Henry Although	`Thu Jun 05 1997 21:54`	749
	Knowledge Discovery Nuggets 97:18, e-mailed 97-05-27 News: * Ronny Kohavi, Silicon Graphics' MineSet used in Incyte's LifeTools 3D http://www.incyte.com/press/1997/PR9712-LT3D.html * R. Zicari, COMDEX Internet Application Awards, http://www.ltt.de * Brij Masand, HPCwire: Robert Grossman discusses managing, mining large data sets Publications: * GPS, First Issue of DMKD journal is available on-line in PDF format, http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE * Andy Pryke, Bibliography of KDD and Data Mining Papers, http://www.cs.bham.ac.uk/~anp/papers.html Meetings: * D. Fischer, COLT/ICML Early Registration deadline June 2, http://cswww.vuse.vanderbilt.edu/~mlccolt/ * Jan Komorowski, PKDD'97 -- Call For Participation, http://www.idi.ntnu.no/pkdd97/ * David Heckerman, Summer School on PROBABILISTIC GRAPHICAL MODELS http://www.newton.cam.ac.uk/programs/nnm.html * Vasant Honavar, CFP: Workshop on Automata Induction Grammatical Inference, and Language Acquisition at ICML-97 http://www.cs.iastate.edu/~honavar/mlworkshop.html * Honghua Dai, KDEX-97: IEEE Knowledge and Data Engineering Exchange Workshop, http://www.sd.monash.edu.au/kdex-97 * Gordon, CFP: ICML-97 workshop on Reinforcement Learning http://www.cs.cmu.edu/~ggordon/ml97ws -- Knowledge Discovery Nuggets is a free electronic newsletter for the Data Mining and Knowledge Discovery community, focusing on the latest research and applications. Submissions are most welcome and should be emailed, with a DESCRIPTIVE subject line (and a URL) to [email protected]. Submissions may be edited for length. Please keep CFP and meetings announcements short and provide a URL for details. To subscribe, see http://www.kdnuggets.com/subscribe.html KD Nuggets frequency is 3-4 times a month. 
Back issues of KD Nuggets, a catalog of data mining tools ("Siftware"), pointers to Data Mining Companies, Relevant Websites, Meetings, and more is available at Knowledge Discovery Mine site at http://www.kdnuggets.com/ -- Gregory Piatetsky-Shapiro (editor) [email protected] ******************* Official disclaimer *********************** All opinions expressed herein are those of the contributors and not necessarily of their respective employers (or of KD Nuggets) ***************************************************************** ~~~~~~~~~~~~ Quotable Quote ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "When you come to a fork in the road, take it." - Yogi Berra - >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 15 May 1997 22:22:53 -0700 From: Ronny Kohavi <[email protected]> Subject: Silicon Graphics' MineSet used in Incyte's LifeTools 3D A recent press release by Incyte Pharmaceuticals Inc. announces LifeTools 3D, a powerful data mining and visualization software based on Silicon Graphics' MineSet(tm) software suite of data analysis and visualization tools. In collaboration with Silicon Graphics, Incyte created customized functions that are specifically designed to help researchers view, explore, and identify novel genes within LifeSeq. See http://www.incyte.com/press/1997/PR9712-LT3D.html for details. -- Ronny Kohavi ([email protected], http://robotics.stanford.edu/~ronnyk) Engineering Manager, Analytical Data Mining. >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: "Prof. Zicari" <[email protected]> Date: Fri, 9 May 1997 23:39:14 +0200 (METDST) Subject: COMDEX Internet Application Awards. News Release First COMDEX Internet Application Awards IBM, Microsoft and SUN to sponsor Awards Program for the new generation of Internet applications Frankfurt -- April 1997. 
The three leading IT companies IBM, MICROSOFT and SUN Microsystems will jointly support an international Awards Program designed for the new generation of Internet-based applications for business. The first COMDEX Internet Application Awards will be given out in the following three categories: Best Intranet-based application for enterprise usage Focus: Use of an Intranet for Institutional/Corporate knowledge for competitive advantage. Most Innovative Web Site Focus: Best or most innovative Web Site with respect to user interface, easy to use, innovative content. Best Transactional Internet Application Focus: Database, interactive applications. The Award winners will be selected among the submittals by a jury of international experts. The Awards ceremony will take place on October 8, 1997 at the trade show COMDEX Internet & Object World Frankfurt'97 (October 7-10,1997, Sheraton Conference Center, Frankfurt/Main Airport). "Successful Internet technologies like Java confirm us in considering the Internet as the future base for enterprise computing. The COMDEX Internet Application Awards program provides an excellent forum for honoring and supporting outstanding Internet applications. We are looking forward to an exciting contest", says Gert Haas, Marketing Director, SUN Microsystems, Germany. Microsoft's commitment to the Awards Program is explained by Karl-Heinz Breitenbach, Customer Unit Manager Internet & Developer Customer Unit, Microsoft Germany: "The availability of all relevant information at work is the base for a fast and successful decision in a company. We therefore have taken the challenge of providing 'information at your fingertips' very early and this is reflected by our current product line. Internet technology today allows to rapidly and reliably represent information distributed in all branches of the company via a so called Intranet solution. 
With the sponsorship of the COMDEX Internet Application Awards, Microsoft confirms its commitment to innovative Internet technologies which perfectly match our company goals." Sanyaya Addanki, General Manager of Network Computing Solutions, IBM EMEA, explains IBM's motivation for a sponsorship: "IBM is committed to providing companies with solutions that link business critical applications and data with the global reach and easy access of the web. We are proud to sponsor the COMDEX Internet Application Awards Program, which fosters the development of electronic business applications. Electronic business is the cornerstone of IBM's network computing vision." To obtain the entry kit: download it from the web at: http://www.ltt.de send an e-mail to: [email protected] call LogOn at: +49-6173-9558-51 COMDEX Internet and Object World Frankfurt '97 are produced by SOFTBANK COMDEX Inc. and LogOn Technology Transfer GmbH. The show is sponsored by: Object Management Group (OMG), A1-Solutions, Business Online, Computer Associates, Computer Zeitung, MID and redmond's. Internet and Wireless are sponsored by Omnilink Internet Service Center and ARtem. Information on Conferences and Exhibition: Christiane Sattler LogOn Technology Transfer GmbH Burgweg 14, D-61476 Kronberg/Ts., Germany phone: +49-6173-9558-53 fax: +49-6173-9404-20 e-mail: [email protected] Web: http://www.ltt.de >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [the following article is included with the permission of HPCwire. GPS] Date: Fri, 23 May 1997 14:00:49 -0400 From: Brij Masand <[email protected]> Subject: ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS [From H P C w i r e * May 23, 1997: Vol. 6, No. 20 *] ROBERT GROSSMAN DISCUSSES MANAGING, MINING LARGE DATA SETS by Alan Beck, editor in chief, HPCwire 05.23.97 ============================================================================= Chicago, Ill. 
-- Issues raised in the effective archiving, managing and mining of very large data sets have significant pragmatic repercussions throughout both commercial and scientific computing. To learn more about the state of the art in this area, HPCwire interviewed Robert Grossman, professor of mathematics, statistics and computer science at the University of Illinois at Chicago, president of Magnify, and principal researcher in the Terabyte Challenge. ------------------- HPCwire: Please give an overview of the current status of the Terabyte Challenge, including funding sources and participants. GROSSMAN: "The Terabyte Challenge is an open, distributed test bed for managing and mining massive data sets. The infrastructure for the Terabyte Challenge is provided by the NSF sponsored National Scalable Cluster Project (NSCP) and its industrial partners. The NSCP philosophy is to use commodity components with high performance networking to build virtual platforms with supercomputing power. The software tools developed for the Terabyte Challenge seek to balance high performance computing with the high performance input/output required by data intensive and data mining applications. "Currently, the NSCP consists of approximately 25 nodes and 500 Gigabytes of disk at both UIC and UPenn, together with smaller clusters at the participating partners. The infrastructure will be more than doubling over the next few months to over 100 nodes and 2 Terabytes of disk. Unlike other centers, the NSCP is configured for managing and mining large data sets, ranging in size from 100 to 500 Gigabytes. "We are currently planning the third Annual Terabyte Challenge, which will take place at SC 97. The first two took place at Supercomputing 95 and 96 (both won High Performance Computing Challenge Awards). "Currently, the University of Illinois at Chicago, the University of Pennsylvania, and the University of Maryland form the core academic team. 
Two industrial partners, HUBS (Philadelphia) and Magnify, Inc. (Chicago), will also be working closely on this year's Terabyte Challenge. Funding is provided by NSF to the NSCP Consortium, by DOE to UIC and UPenn, and by DOD to Magnify. We expect additional partners to join us. If interested, please contact RLG. "Current applications include mining scientific data (UIC and UPenn), mining medical data (UIC and UPenn), detecting network intrusions with data mining (Magnify, Inc), and data intensive computing in support of virtual reality (HUBS). "The web site http://www.lac.uic.edu will contain additional information shortly." HPCwire: What progress has been made in scaling algorithms for very large data sets? GROSSMAN: "I use the 10x rule: one can expect to archive 10-100x more data than one can manage, and manage 10-100x more data than one can mine. This makes sense since archiving requires a simple retrieval of files or objects, managing requires the ability to perform simple queries, and mining requires statistically and numerically intensive queries. At SC 96, we mined data sets that were roughly 100-250 Gigabytes in size using 10-25 nodes. At SC 97, we hope to mine 500-1000 Gigabytes of data on 50-100 nodes. I want to emphasize that one can manage and perform simple queries of much larger data sets (up to tens of Terabytes), but the detailed data mining of even a few hundred gigabytes of data is a challenge today." "Parallelizing data mining algorithms can be done in several ways. Most data mining algorithms are sufficiently compute-intensive that they work best when the data and the working space required for the algorithm fit into memory. For large data sets this is clearly not possible and the challenge is to balance the i/o requirements of the algorithm with the cpu requirements. 
Several approaches are possible: "For the purposes here, we assume that the data mining process consists of several steps, including 1) extracting patterns, 2) using these patterns automatically to build predictive models, and 3) selecting or combining multiple predictive models to produce a single decision. In each of the four methods described next, one or more subsets of the data are chosen and mined. The methods differ in how the subsets are chosen: the subsets may be created by random draws, by a partition of the data, by a cover of the data, or by a range-based query of the data. "In sample-based data mining, one samples a large data set and then extracts patterns or builds a model. This is the most common approach. It works well for patterns that are still easily found after down sampling. It has the advantage that the compute time is vastly reduced (since the data to be mined is vastly smaller) and the disadvantage that the patterns obtained are often not indicative of the whole data set -- this is closely related to the problem of over-fitting. This approach is most often not parallelized, although sometimes sampling can be done in parallel and the results combined into one model using model averaging techniques. "In partition-based data mining, the data set is partitioned into distinct subsets which fit into memory, each partition is separately mined to produce a collection of predictive models, and then the predictive models are combined using model selection and model averaging techniques. This type of data mining is easily parallelized, since one (or more) processors can be assigned to each partition. "Cover-based data mining is similar to partition-based data mining, but the different subsets to be mined can be overlapping. This is closely related to what is called local mining, in which the patterns extracted use data which is localized in some fashion, say based on the N closest data points to a fixed reference point. 
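[Editorial illustration, not part of the HPCwire interview.] The partition-based approach described above (split the data into disjoint subsets, mine each one separately, then combine the resulting models by averaging) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the least-squares line used as the "predictive model", the round-robin split, and every function name here are hypothetical choices made for illustration and are not taken from the NSCP tools.

```python
# Hypothetical sketch of partition-based mining with model averaging.
# The linear model and all names below are illustrative assumptions.

def partition(data, k):
    """Split data into k disjoint subsets (round-robin), each small enough to mine alone."""
    return [data[i::k] for i in range(k)]

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b on one partition."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def averaged_predict(models, x):
    """Combine the per-partition models by simple (unweighted) model averaging."""
    return sum(a * x + b for a, b in models) / len(models)

# Toy data lying exactly on y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
parts = partition(data, 4)

# Each partition is mined independently; in a parallel setting, one processor
# (or more) could be assigned to each partition.
models = [fit_line(part) for part in parts]
prediction = averaged_predict(models, 10.0)
```

Because each call to fit_line touches only its own partition, the per-partition step has no shared state, which is what makes this style easy to parallelize. A cover-based variant would differ only in letting the subsets overlap.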
"Attribute-based data mining creates different subsets to be mined by using an attribute based query of the underlying data set. For example, all objects whose first attribute is less than 1.1 and whose second attribute is equal to "A", etc. are selected and then mined. "For more information, see R. L. Grossman, Scaling Data Mining Algorithms Using Cover-based Learning with Model Selection and Model Averaging, http://www.magnify.com " HPCwire: How is the TC approaching the mining of highly distributed data? GROSSMAN: "On the systems side, we have made good progress in this area. The NSCP clusters at UIC and UPenn have been connected for several weeks now by the vBNS at OC-3 (155 Mbps) speeds. Using this infrastructure we have experimented with wide area data mining of scientific and medical data. We are currently using this experience to develop new algorithms for wide area data mining and to develop new generations of our data management and data mining tools. The challenge is to develop a new class of algorithms for extracting patterns from widely distributed data without the necessity of first warehousing the data." HPCwire: What progress has been made in better understanding dynamical systems via data mining? GROSSMAN: "Not as much as we would have liked. Data mining algorithms today, by and large, work with data which is flat and static. The core dynamical system concepts of a state vector and its evolution in time are missing in most data mining algorithms. Hybrid systems is an emerging field which combines dynamical systems with discrete structures such as rule systems and automata. The latter can express the patterns discovered in data mining. Researchers working in the NSCP are actively investigating exploiting hybrid systems and related techniques to develop next generation data mining algorithms which can utilize state information and work with time varying data." HPCwire: How is TC research being made available to the commercial sector? 
Have any new products or partnerships resulted from TC-generated technology? GROSSMAN: "The NSCP and the Terabyte Challenge have 1) published the core ideas they have developed for data mining and data intensive computing, 2) developed reference architectures and implementations for software tools to support data mining (the UIC software tools PTool, JTool, and DMTool), and 3) encouraged companies to exploit this technology for data intensive computing and data mining. "To date, HUBS in Philadelphia and Magnify, Inc. in Chicago have begun to employ some of these ideas in the products and services they offer. Currently, regional data mining centers are in the planning process in both Chicago and Philadelphia." HPCwire: How do you see the TC evolving over the next five years? GROSSMAN: "The most exciting development is the expected transformation of the NSCP into two regional data mining centers with very strong industrial ties: one in Chicago and one in Philadelphia. This has three important consequences: 1) First the compute, i/o, and networking infrastructure which we can dedicate to data mining projects is expected to double this year and hopefully to double again in about two years. 2) With our industrial partners, we are actively working to demonstrate the practical feasibility of mining massive data sets and to establish open standards for managing, mining, and modeling massive data sets. 3) Using the vBNS network connecting the centers in Chicago and Philadelphia, we are finding it easy to experiment with the type of wide area data mining issues which we expect to take on an increasingly important role for scientific, engineering, medical, and business data mining applications. 
"To summarize, during the next five years, we expect the Terabyte Challenge not only to continue to push the boundaries of massive data mining through an annual competition, but also, together with its industrial partners, to be actively involved with establishing data mining standards and reference implementations of software tools for managing, mining, and modeling massive data sets. "Additional participants for 1997 competition are welcome. Please contact one of the organizers if interested. Additional information can be found at http://www.nscp.uic.edu " -------------------- Alan Beck is editor in chief of HPCwire. Comments are always welcome and should be directed to [email protected] Copyright 1997 HPCwire. Redistribution of this article is forbidden by law without the expressed written consent of the publisher. For a free trial subscription to HPCwire, send e-mail to [email protected]. H P C w i r e The Text-on-Demand E-zine for High Performance Computing ************************************************************************* >~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Thu, 22 May 1997 15:05:54 -0400 From: Gregory Piatetsky-Shapiro <[email protected]> Subject: First Issue of DMKD journal is available on-line in PDF format The premiere issue of Data Mining and Knowledge Discovery journal is available on-line, in PDF format, at http://www.wkap.nl/kapis/CGI-BIN/WORLD/kaphtml.htm?DAMISAMPLE To read this very good (in my biased opinion) issue you need an Acrobat reader, which you can download from http://www.adobe.com/acrobat/ Only the first issue will be freely available on-line, but you can subscribe to the journal for $50 individual rate, more for institutional rate -- see http://www.wkap.nl/kapis/CGI-BIN/WORLD/journalhome.htm?1384-5810 for subscription information. Please support this journal ! 
>~~~Publications:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: [email protected] Date: Fri, 23 May 97 22:12:09 BST Subject: Nuggets: Bibliography of KDD and Data Mining Papers The Master Bibliography of KDD and Data Mining Papers is a bibliography of over 400 papers on the topics of Data Mining and Knowledge Discovery in Databases (this includes closely related papers on visualisation and machine learning). More than 70 of the papers are online. It is available in either bibtex, or html annotated bibtex formats from: http://www.cs.bham.ac.uk/~anp/papers.html A search interface is also available at: http://www.cs.bham.ac.uk/~anp/bibtex/search.html Any additional references or corrections are gratefully received. Please email them to me, Andy Pryke, at [email protected] Only references in machine readable format (e.g. refer or preferably Bibtex) can be added, due to time constraints. Note that all the information I have about the papers is in the bibliography, and many (330ish) of the papers are not available online. Please read the _collection_ copyright statement at (http://www.cs.bham.ac.uk/~anp/bibtex/copyright.html). If you find the bibliography useful, you may wish to send me a postcard (details in the copyright statement). Andy Pryke -- Andy Pryke, Research Student, Computer Science, Birmingham University Data Mining Information - http://www.cs.bham.ac.uk/~anp/TheDataMine.html >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 16 May 1997 19:09:05 -0500 From: [email protected] (Douglas H. Fisher) Subject: COLT/ICML Early Registration Early registration for the Tenth Annual Conference on Computational Learning Theory (COLT-97) and/or the Fourteenth International Conference on Machine Learning (ICML-97) concludes June 2, 1997. Room blocks at area hotels and on campus are also "released" June 2 (though rooms will likely still be available after that date). See http://cswww.vuse.vanderbilt.edu/~mlccolt/ for more information. 
>~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Fri, 16 May 1997 16:44:56 +0200 (MET DST) From: Jan Komorowski <[email protected]> Subject: PKDD'97 -- Call For Participation 1st European Symposium on Principles of Data Mining and Knowledge Discovery in Databases Trondheim, Norway June 24-27, 1997 Tutorials: June 24-25 Symposium: June 26-27 This is an invitation to the 1st European Symposium on Principles of Data Mining and Knowledge Discovery in Databases. PKDD'97 is the first symposium in an intended series of meetings of the data mining and knowledge discovery from databases (KDD) community in Europe. The goal of the PKDD series is to provide a European-based forum for interaction among all theoreticians and practitioners interested in data mining and knowledge discovery. Fostering an interdisciplinary collaboration is one desired outcome, but the main long-term focus is on theoretical principles for the emerging discipline of KDD, especially those new principles that go beyond each of the contributing areas. There were 50 papers submitted to PKDD'97. After the selection by the program committee, the papers were assigned into three categories: 14 plenary papers, 13 parallel session papers and 11 poster papers that include spot-light presentations in the plenary sessions. In addition, four tutorials were selected: Rough Sets for Data Mining and Knowledge Discovery, Techniques and Applications of KDD, High Performance Data Mining, and Data Mining in the Telecommunications Industry. The proceedings are published by Springer Verlag. The invited speakers include Evangelos Simoudis, USA, and Bjarne Foss, Norway. They will provide their different perspectives on the field: one is data mining for businesses and the other data mining seen from the point of view of control theory. Panel discussions on the present situation and the future development of the field are planned. 
There will be software exhibitions of both commercial and academic software. Please look at the PKDD'97 Homepage (http://www.idi.ntnu.no/pkdd97/) for detailed information and news about the symposium. >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: David Heckerman <[email protected]> Subject: Summer School on PROBABILISTIC GRAPHICAL MODELS Date: Fri, 16 May 1997 08:08:00 -0700 A Newton Institute EC Summer School PROBABILISTIC GRAPHICAL MODELS 1 - 5 September 1997 Isaac Newton Institute, Cambridge, U.K. Organisers: C M Bishop (Aston) and J Whittaker (Lancaster) Probabilistic graphical models provide a very general framework for representing complex probability distributions over sets of variables. A powerful feature of the graphical model viewpoint is that it unifies many of the common techniques used in pattern recognition and machine learning including neural networks, latent variable models, probabilistic expert systems, Boltzmann machines and Bayesian belief networks. Indeed, the increasing interactions between the neural computing and graphical modelling communities have resulted in a number of powerful new ideas and techniques. The conference will include several tutorial presentations on key topics as well as advanced research talks. Provisional themes: Conditional independence; Bayesian belief networks; message propagation; latent variable models; variational techniques; mean field theory; learning and estimation; model search; EM and MCMC algorithms; axiomatic approaches; causality; decision theory; neural networks; information and coding theory; scientific applications and examples. 
Provisional list of speakers: C M Bishop (Aston) D J C MacKay (Cambridge) R Cowell (City) J Pearl (UCLA) A P Dawid (UCL) M D Perlman (Washington) D Geiger (Technion) M Piccioni (Aquila) E George (Texas) R Shachter (Stanford) W Gilks (Cambridge) J Q Smith (Warwick) D Heckerman (Microsoft) M Studeny (Prague) G E Hinton (Toronto) M Titterington (Glasgow) T Jaakkola (UCSC) J Whittaker (Lancaster) M I Jordan (MIT) S Lauritzen (Aalborg) B Kappen (Nijmegen) D Spiegelhalter (Cambridge) M Kearns (AT&T) S Russell (Berkeley) This instructional conference will form a component of the Newton Institute programme on Neural Networks and Machine Learning, organised by C M Bishop, D Haussler, G E Hinton, M Niranjan and L G Valiant. Further information about the programme is available via the WWW at http://www.newton.cam.ac.uk/programs/nnm.html Location and Costs: The conference will take place in the Isaac Newton Institute and accommodation for participants will be provided at Wolfson Court, adjacent to the Institute. The conference package costs 270 UK pounds which includes accommodation from Sunday 31 August to Friday 5 September, together with breakfast, lunch during the days that the lectures take place and evening meals. Applications: To participate in the conference, please complete and return an application form and, for students and postdoctoral fellows, arrange for a letter of reference from a senior scientist. Limited financial support is available for participants from appropriate countries. 
Application forms are available from the conference Web Page at http://www.newton.cam.ac.uk/programs/nnmec.html Completed forms and letters of recommendation should be sent to Heather Dawson at the Newton Institute, or by e-mail to [email protected] Closing Date for the receipt of applications and letters of recommendation is 16 June 1997 >~~~Meetings:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From: Vasant Honavar <[email protected]> Subject: Call for Participation: Workshop on Automata Induction, Grammatical Inference, and Language Acquisition Date: Thu, 8 May 1997 10:53:48 -0500 (CDT) Workshop on Automata Induction, Grammatical Inference, and Language Acquisition The Fourteenth International Conference on Machine Learning (ICML-97) July 12, 1997, Nashville, Tennessee The Automata Induction, Grammatical Inference, and Language Acquisition Workshop will be held on Saturday, July 12, 1997 during the Fourteenth International Conference on Machine Learning (ICML-97) which will be co-located with the Tenth Annual Conference on Computational Learning Theory (COLT-97) at Nashville, Tennessee from July 8 through July 12, 1997. Additional information on ICML-97 and COLT-97 can be found at http://www.cs.iastate.edu/~honavar/mlworkshop.html >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Date: Wed, 21 May 1997 12:23:13 +1000 From: Honghua Dai <[email protected]> Subject: KDEX-97 Final Call for Papers 1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97) -------------------------------------------------------------------- Sponsored by the IEEE Computer Society and Co-located with the 9th IEEE Tools with Artificial Intelligence Conference November 4, 1997, Newport Beach, California, U.S.A. 
=================================================== Call for Papers The 1997 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX-97) will provide an international forum for researchers, educators and practitioners to exchange and evaluate information and experiences related to state-of-the-art issues and trends in the areas of artificial intelligence and databases. The goal of this workshop is to expedite technology transfer from researchers to practitioners, to assess the impact of emerging technologies on current research directions, and to identify emerging research opportunities. Educators will present material and techniques for effectively transferring state-of-the-art knowledge and data engineering technologies to students and professionals. The workshop is currently scheduled for a one-day duration, but depending on the final program it might be extended to a second day. Submissions can be in the form of survey papers, experience reports, and educational material to facilitate technology transfer. Accepted papers will be published in the workshop proceedings by the IEEE Computer Society. A selected number of the accepted papers will possibly be expanded and revised for publication in the IEEE Transactions on Knowledge and Data Engineering (IEEE-TKDE) and the International Journal of Artificial Intelligence Tools. Educational material related to papers published in the IEEE-TKDE will be posted on the IEEE-TKDE home page. The theme of the workshop is "AI MEETS DATABASES". 
Topics of interest include, but are not limited to:

- Computer supported cooperative processing and interoperable systems
- Data sharing, data warehousing and meta-data management
- Distributed intelligent mediators and agents
- Distributed object management
- Dynamic knowledge
- Evaluation and measurement of knowledge and database systems
- High-performance issues (including architectures, knowledge representation techniques, inference mechanisms, algorithms and integration methods)
- Information structures and interaction
- Intelligent search, data mining and content-based retrieval
- Knowledge and data engineering systems
- Quality assurance for knowledge and data engineering systems (correctness, reliability, security, survivability and performance)
- Software re-engineering and intelligent software information systems
- Spatio-temporal, active, mobile and multimedia data
- Emerging applications (biomedical systems, decision support, geographical databases, Internet technologies and applications, digital libraries, etc.)

All submissions should be limited to a maximum of 5,000 words. Six hardcopies should be forwarded to the following address.

Xindong Wu (KDEX-97)
Department of Software Development
Monash University
900 Dandenong Road
Caulfield East, Melbourne 3145
Australia

Phone: +61 3 9903 1025
Fax: +61 3 9903 1077
E-mail: [email protected]

Please include a cover page containing the title, authors (names, postal and email addresses, telephone and fax numbers), and an abstract. This cover page must accompany the paper.
************ I m p o r t a n t   D a t e s *****************
* 6 copies of full papers received by:  June 15, 1997      *
* acceptance/rejection notices:         July 31, 1997      *
* final camera-readies due by:          August 31, 1997    *
* workshop:                             November 4, 1997   *
************************************************************

Further Information
===================
WWW: http://www.sd.monash.edu.au/kdex-97

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: [email protected]
Date: Tue, 20 May 97 10:30:38 EDT
Subject: CFP: ICML-97 workshop on REINFORCEMENT LEARNING: TO MODEL OR NOT TO MODEL, THAT IS THE QUESTION

Workshop at the Fourteenth International Conference on Machine Learning (ICML-97)
Vanderbilt University, Nashville, TN
July 12, 1997

www.cs.cmu.edu/~ggordon/ml97ws

Recently there has been some disagreement in the reinforcement learning community about whether finding a good control policy is helped or hindered by learning a model of the system to be controlled. Recent reinforcement learning successes (Tesauro's TD-gammon, Crites' elevator control, Zhang and Dietterich's space-shuttle scheduling) have all been in domains where a human-specified model of the target system was known in advance, and have all made substantial use of the model. On the other hand, there have been real robot systems which learned tasks either by model-free methods or via learned models. The debate has been exacerbated by the lack of fully-satisfactory algorithms on either side for comparison.

Topics for discussion include (but are not limited to)

o Case studies in which a learned model either contributed to or detracted from the solution of a control problem. In particular, does one method have better data efficiency? Time efficiency? Space requirements? Final control performance? Scaling behavior?

o Computational techniques for finding a good policy, given a model from a particular class -- that is, what are good planning algorithms for each class of models?
o Approximation results of the form: if the real system is in class A, and we approximate it by a model from class B, we are guaranteed to get "good" results as long as we have "sufficient" data.

o Equivalences between techniques of the two sorts: for example, if we learn a policy of type A by direct method B, it is equivalent to learning a model of type C and computing its optimal controller.

o How to take advantage of uncertainty estimates in a learned model.

o Direct algorithms combine their knowledge of the dynamics and the goals into a single object, the policy. Thus, they may have more difficulty than indirect methods if the goals change (the "lifelong learning" question). Is this an essential difficulty?

o Does the need for an online or incremental algorithm interact with the choice of direct or indirect methods?

Full information at www.cs.cmu.edu/~ggordon/ml97ws

Contact: Geoff Gordon ([email protected])

>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~