The Dynamic Data Cube
Amr El Abbadi, Department of Computer Science, UC Santa Barbara
Friday, December 3, 1999
Abstract

E-Speak: An Operating Environment for the Internet
Alan H. Karp, Chief Scientist, Open Services Operation, Hewlett-Packard
Friday, November 19, 1999
Abstract

Sorting in Space
Dr. Hanan Samet, Department of Computer Science, University of Maryland
Friday, October 29, 1999
Abstract

Gene Expression Data Modeling
Dr. Eric Mjolsness, Jet Propulsion Laboratory, California Institute of Technology
Wednesday, October 20, 1999
Abstract

High Performance Instruction Fetch
Mateo Valero, Polytechnic University of Catalonia (UPC), Barcelona, Spain
Monday, October 11, 1999
Abstract

Unsupervised Learning of Data Structures. An Application to Document Classification.
Paolo Frasconi, Department of Computer Science, University of Florence
Wednesday, September 8, 1999
Abstract

WebCluster, a Tool for Mediated Information Access
Gheorghe Muresan, Robert Gordon University, Aberdeen, Scotland
Monday, August 23, 1999

Designing A Coalesced-Hybrid Load Value Predictor
Martin Burtscher, University of Colorado at Boulder
Thursday, July 29, 1999
Abstract

Designing Mobile Agent-Based User Interfaces to Online Public Spaces
Robert Nideffer, School of the Arts, UCI
Monday, June 14, 1999
Abstract

Entropy minimization for learning and video understanding
Matthew Brand, Research Scientist, MERL; Lecturer, MIT
Thursday, March 18, 1999
Abstract

Adaptive QoS Framework and its Application to Visual Tracking
Klara Nahrstedt, Dept of Computer Science, University of Illinois at Urbana-Champaign
Thursday, January 28, 1999
Abstract

Design Patterns for Data Structures
Dung ("Zung") Nguyen, Pepperdine University
Wednesday, December 2, 1998
Abstract

Compiling Inference: A Structure-Based Approach
Adnan Darwiche, Senior Scientist, Rockwell, Thousand Oaks, CA
Thursday, November 19, 1998
Abstract

Abstracting Actor Interactions: Supporting Complex Distributed Software Systems
Prof. Gul A. Agha, Director, Open Systems Laboratory, University of Illinois, Urbana
Tuesday, November 17, 1998
Abstract

Representation of Process Mode Correlation for Scheduling
Dirk Ziegenbein, Ph.D. student in the research group of Prof. Rolf Ernst at Technical University (TU), Braunschweig, Germany
Monday, November 16, 1998
Abstract

LOT: Logic Optimization with Testability - New Transformations for Logic Synthesis and
VERILAT: Verification Using Logic Augmentation and Transformation
Dr. Dhiraj K. Pradhan, Visiting Professor, Dept. of EE, Stanford University, and COE Endowed Chair Professor, Dept. of Computer Science, Texas A&M University
Thursday, September 24, 1998
Abstract

Web-Based Question Answering in the FAQ Finder Project
Robin Burke

Thursday, September 10, 1998
Abstract

TROMLAB: An environment for a rigorous development of real-time reactive systems
Professor V. S. Alagar, Department of Computer Science, Concordia University, Montreal, Quebec, Canada
Wednesday, August 19, 1998
Abstract

On the Fitting of Finite Mixture Models and an Application Concerning Competing Risks in Survival Analysis
Geoff McLachlan, University of Queensland
Friday, August 14, 1998
Abstract

The NUMAchine Multiprocessor: A Cache Coherence Protocol
Sinisa Srbljic, School of Electrical Engineering and Computing, Univ. of Zagreb, Zagreb, Croatia (sinisa@zemris.fer.hr); AT&T, San Jose, CA (sinisa@ipo.att.com); Dept. of Elec & Comp Engr, Univ. of Toronto, Ontario, Canada (sinisa@eecg.toronto.edu)
Monday, July 27, 1998
Abstract

Latency Hiding Techniques and Interconnect Choice in Shared Memory Multiprocessors
Alex Veidenbaum, EECS Dept, University of Illinois-Chicago
Wednesday, July 23, 1998
Abstract

Machine Learning for Adaptive User Interfaces
Pat Langley, Adaptive Systems Group, Daimler-Benz Research and Technology Center, and Institute for the Study of Learning and Expertise, Palo Alto, California
Thursday, June 25, 1998
Abstract

EM Algorithms for PCA and SPCA
Sam Roweis, Princeton University
Friday, June 5, 1998
Abstract

The XML Revolution
Robert J. Glushko, Director, Information Engineering, Veo Systems; Program Manager for eCoNet
Thursday, May 28, 1998
Abstract

Protocol Compiler
Dr. Barry M. Pangrle, Synopsys Inc.
Wednesday, May 20th, 1998
Abstract

On the Power of Barely Random Online Algorithms
Steve Seiden, Technische Universitaet Graz, Institut fuer Mathematik B
Thursday, April 30, 1998
Abstract

The Globus Project: building a computational grid
Carl Kesselman, Research Associate Professor, Department of Computer Science
Project Leader, USC/Information Sciences Institute
Monday, April 27, 1998
Abstract

Integrating Multiple Images for Segmentation
Dr. Michael Turmon, Jet Propulsion Laboratory, Pasadena, CA
Friday, April 24, 1998
Abstract

Cooperative Buildings: Extending the scope of CSCW beyond desktops
Dr. Dr. Norbert A. Streitz
GMD-German National Research Center for Information Technology
IPSI - Integrated Publication and Information Systems Institute, Darmstadt, Germany
Friday, April 24, 1998
Abstract

A Process-Oriented Heuristic for Model Selection
Pedro M. Domingos, Professor, Technical University of Lisbon
Friday, April 3, 1998
Abstract

Instruction Cache Prefetching Using Multilevel Branch Prediction
Alexander V. Veidenbaum, Dept. of Elec. Engr. & Comp Science, Univ of Illinois at Chicago
Monday, March 30, 1998
Abstract

The Design and Verification of a High-performance Low-control-overhead
Asynchronous Differential Equation Solver
Kenneth Y. Yun, Assistant Professor, University of California, San Diego
Friday, March 20, 1998
Abstract

Production Based Synthesis Package for Finite State Machines
Forrest Brewer, University of California, Santa Barbara
Friday, March 13, 1998
Abstract

Multimedia Analysis and Retrieval System (MARS) Project
Sharad Mehrotra, University of Illinois at Urbana-Champaign
Tuesday, March 10, 1998
Abstract

An Adaptive Resource Management Architecture for Global Distributed Computing
Nalini Venkatasubramanian, University of Illinois at Urbana-Champaign
Monday, March 9, 1998
Abstract

Fast Algorithms for Spatial and Multidimensional Joins
Nick Koudas, University of Toronto
Friday, March 6, 1998
Abstract

Managing Change in Autonomous Databases
Sudarshan S. Chawathe, Stanford University
Thursday, March 5, 1998
Abstract

Hardware Compilation: The Translation of Programs into Circuits
Niklaus Wirth, ETH Zurich, Switzerland
Wednesday, March 4, 1998
Abstract

The Geneva Messengers
Christian F. Tschudin, University of Zurich
Abstract

Protein Structure and Function Prediction through Evolutionary Sequence Analysis -- Results and Proposed Applications in the Post-Genomics Era
Dietlind L. Gerloff, UC San Francisco
Monday, February 23, 1998
Abstract

VLSI Interconnect Layout Optimization in Deep Submicron Designs
Jason Cong, UCLA
Friday, February 20, 1998
Abstract

Navigating Digital Libraries and Multimedia Archives
Robert B. Allen, Bellcore
Monday, February 16, 1998
Abstract

Dynamic Categorization: A Method for Decreasing Information Overload
Wanda Pratt, Stanford University
Thursday, February 12, 1998
Abstract

Trailblazing Path to Semantic Interoperability in Digital Library Research
Hsinchun Chen, University of Arizona
Monday, February 2, 1998
Abstract

WebCite: Finding and Visualizing Structure in Hyper-Linked Collections
Loren Terveen
Friday, January 30, 1998
Abstract

An Introduction to AT&T Labs
Loren Terveen
Thursday, January 29, 1998
Abstract

Computational Analysis of DNA Structural Transitions
Dr. Craig Benham, Mount Sinai School of Medicine
Thursday, January 29, 1998
Abstract

NSF-CISE 98: The Reorganization -- Programs, Structure, People and Thrusts
Dr. Robert P. Grafton, Program Director, National Science Foundation
Tuesday, January 27, 1998
Abstract

 


The Dynamic Data Cube
Amr El Abbadi

Data Cubes have become a powerful tool for on-line analytical processing (OLAP). Typically, range sum queries on data cubes are used for analysis purposes. A range sum query applies an aggregation operation (e.g., SUM, AVERAGE) over all selected cells in a data cube, where the selection is specified by providing ranges of values for numeric dimensions. Current approaches to data cube maintenance assume that the data contained in the cube is static and hence concentrate on providing efficient query functionality. Existing techniques for range sum queries on data cubes, however, can incur update costs on the order of the size of the data cube. Since the size of a data cube is exponential in the number of dimensions, rebuilding the entire data cube can be very costly. Emerging applications in the context of OLAP require balanced update and query performance. We begin by presenting an approach that achieves constant time queries while reducing update costs. We then present the Dynamic Data Cube, a new approach to range sum queries which provides efficient performance for both queries and updates, which handles clustered and sparse data gracefully, and which allows for the dynamic expansion of the data cube in any direction.

This is work done jointly with Divy Agrawal and Steve Geffner.
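
To make the baseline concrete: the classic prefix-sum cube (the standard starting point for constant-time range sums, not the Dynamic Data Cube itself) answers any range sum with a handful of lookups via inclusion-exclusion, at the price of updates that may touch a large part of the cube. A minimal 2-D sketch in Python, with sizes chosen purely for illustration:

    import numpy as np

    data = np.random.randint(0, 100, size=(8, 8))      # a tiny 2-D "data cube"
    prefix = data.cumsum(axis=0).cumsum(axis=1)        # prefix[i,j] = sum of data[:i+1, :j+1]

    def range_sum(lo_r, hi_r, lo_c, hi_c):
        """Sum of data[lo_r:hi_r+1, lo_c:hi_c+1] via inclusion-exclusion (O(1) per query)."""
        total = prefix[hi_r, hi_c]
        if lo_r > 0:
            total -= prefix[lo_r - 1, hi_c]
        if lo_c > 0:
            total -= prefix[hi_r, lo_c - 1]
        if lo_r > 0 and lo_c > 0:
            total += prefix[lo_r - 1, lo_c - 1]
        return total

    # one cell update, however, forces recomputation of a large part of `prefix`
    assert range_sum(2, 5, 1, 4) == data[2:6, 1:5].sum()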


E-Speak: An Operating Environment for the Internet
Alan H. Karp

E-speak started as a research project at HP Labs with a very modest goal - to change the way the world thinks about computing. Today, we think about computing as the hardware we buy and the software we install on it. E-speak is about thinking of computing as a set of services that you enlist as you need them.

This dramatic shift is enabled by e-speak, an operating environment for the Internet. Almost, but not quite, an operating system, e-speak provides the access control and virtualization of an operating system while preserving the independent control of machines in the environment.

This talk will describe the requirements that led to the various features of the e-speak architecture as well as its key abstractions.


Sorting in Space
Hanan Samet

The representation of spatial data is an important issue in computer vision, geographic information systems, computer graphics, and robotics. A wide variety of representations is currently in use. Recently, there has been much interest in hierarchical data structures such as quadtrees and octrees. The key advantage of these representations is that they provide a way to index into space. In fact, they are little more than multidimensional sorts. They are compact and, depending on the nature of the spatial data, they save space as well as time and also facilitate operations such as search. In this talk we give a brief overview of hierarchical spatial data structures and related research results. In addition, a demonstration of a spatial database system and of the VASCO Java applet illustrating these methods (found at http://www.cs.umd.edu/~hjs/quadtree/index.html) will be presented.
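
As a toy illustration of the "indexing into space by sorting" idea, the hypothetical Python sketch below builds a point quadtree: each node stores one point and splits the plane into four quadrants at that point. It is illustrative only and is not the VASCO applets or any code from the talk.

    class QuadNode:
        def __init__(self, x, y):
            self.x, self.y = x, y
            self.children = {}          # keys: 'NW', 'NE', 'SW', 'SE'

        def quadrant(self, x, y):
            return ('N' if y >= self.y else 'S') + ('E' if x >= self.x else 'W')

        def insert(self, x, y):
            q = self.quadrant(x, y)
            if q in self.children:
                self.children[q].insert(x, y)   # descend into the matching quadrant
            else:
                self.children[q] = QuadNode(x, y)

    root = QuadNode(50, 50)
    for px, py in [(10, 80), (70, 20), (60, 60), (30, 30)]:
        root.insert(px, py)              # points end up "sorted" by their position in space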


Gene Expression Data Modeling
Eric Mjolsness

New biological instrumentation enables large-scale measurement of the time- and space-dependent activity of genes in creating their mRNA and protein products. To assimilate this data and use it to answer scientific questions, new algorithms and software systems are required. I will discuss relevant computational methods including the adaptation and comparison of various data clustering algorithms, the application of gene circuit models and gene regulation network inference algorithms, and suitable database and visualization software. These methods are applied to gene expression data from eukaryotic organisms.

Joint work with Rebecca Castano, Christopher Hart, Tobias Mann, George Marnellos, John Reinitz, David Sharp, and Barbara Wold.


High Performance Instruction Fetch
Mateo Valero

Instruction fetch can be a limiting factor for wide issue superscalar processors. This work shows how the fetch problem is better approached using a combination of compile-time and run-time techniques. The trace cache is a well known hardware mechanism which captures the dynamic instruction stream and stores the resulting instruction sequences (traces) in a special-purpose cache, expecting that the same instruction trace will be encountered again in the future. We propose to construct these instruction traces at compile-time using profile information, increasing the role of the instruction cache. We call this the Software Trace Cache; it increases the fetch performance of the instruction cache to near trace cache levels, allowing a significant reduction in trace storage.

We also show how building traces at compile time exposes a significant redundancy between the instruction cache and the trace cache, as both caches are storing the same traces. We propose a small modification of the trace cache fill unit to implement Selective Trace Storage, which eliminates this redundancy, and allows further hardware savings in the fetch unit implementation.

Overall, using the Software Trace Cache and Selective Trace Storage we can provide the same fetch performance with a tiny 4KB trace cache as with a 128KB trace cache that uses none of our optimizations.


Unsupervised Learning of Data Structures. An Application to Document Classification.
Paolo Frasconi

In several interesting machine learning domains instances cannot be accurately described using the simple attribute-value representation. This happens, for example, in chemistry, in structural pattern recognition, and in learning heuristic evaluation functions for theorem provers. In these domains each instance is a collection of related atomic entities, and data structures (labeled graphs) are a more appropriate representation.

Many connectionist models, on the other hand, implicitly assume the attribute-value representation. For example in the case of multilayered perceptrons, the input is a fixed size real vector that essentially encodes a fixed size tuple of attributes. Recursive networks have been explicitly introduced to address this limitation. In particular, recursive neural networks can effectively solve the supervised learning problem when each instance is a labeled directed acyclic graph. Unsupervised learning of data structures, however, is still largely unexplored. One interesting possibility is to build models for unsupervised learning by combining two graphical formalisms: recursive networks and Bayesian networks. This idea yields a generalization of hidden Markov models in which the linear concept of "time" is replaced by the more general concept of graphical basis.

In the first part of this talk I will review the recursive network formalism, which is the theoretical background for connectionist data structure learning. In the second part I will specialize the formalism for constructing probabilistic generative models of labeled trees and I will present some preliminary results on the problem of scanned document classification.


Designing A Coalesced-Hybrid Load Value Predictor
Martin Burtscher

Load instructions significantly slow processor performance due to their occasional long latency and frequent execution. Load value predictors alleviate this problem by allowing the CPU to speculatively continue processing without having to wait for the memory access to complete, thus speeding up program execution.

To exploit as much of the existing load value locality as possible, hybrid predictors have been proposed that combine several different predictors in one. Unfortunately, such hybrids are often quite large. In this talk I present the evolution of a very effective yet small hybrid load value predictor whose components coalesce by sharing large amounts of state.

A detailed performance analysis using a cycle-accurate pipeline-level simulator of an aggressive 4-way superscalar microprocessor running SPECint95 shows that the coalesced hybrid outperforms other predictors by close to 40% over a wide range of sizes. With 11kB of state, the smallest configuration I examined, it surpasses the speedups of other predictors eight times its size, with both re-fetch and re-execute recovery mechanisms.

An explicit study of the hybrid's individual components reveals that they contribute independently to the overall performance, indicating that predictors can be combined to exploit a larger percentage of the load value locality.
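
For readers unfamiliar with load value prediction, the sketch below shows the simplest possible component, a last-value table indexed by load PC. It is a hypothetical toy in Python, not the coalesced hybrid itself, which combines several such predictors and shares state among them.

    class LastValuePredictor:
        def __init__(self, entries=1024):
            self.entries = entries
            self.table = [None] * entries          # last value seen per table slot

        def predict(self, pc):
            return self.table[pc % self.entries]   # None means "no prediction"

        def update(self, pc, actual_value):
            self.table[pc % self.entries] = actual_value

    pred = LastValuePredictor()
    pred.update(0x400120, 42)
    assert pred.predict(0x400120) == 42            # repeated loads of the same value hit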


Designing Mobile Agent-Based User Interfaces to Online Public Spaces
Robert Nideffer

Dr. Nideffer researches, teaches, and publishes in the areas of virtual environments and behavior, interface theory and design, technology and culture, and contemporary social theory. He holds an MFA in Computer Arts, and a Ph.D. in Sociology. Currently he is an Assistant Professor in Studio Art at UC Irvine, where he heads the digital media area. In his talk, Nideffer will discuss his recent work co-directing an interdisciplinary multi-campus project organized around creating a Java-based Mobile Agent Management (MAM) system, and its integration into a testbed project he is developing.


Adaptive QoS Framework and its Application to Visual Tracking
Klara Nahrstedt

In a heterogeneous distributed environment where multiple applications compete and share a limited amount of system resources, applications tend to suffer from variations in resource availability. It is desirable that they adapt their behavior to the resource variations of the system above a minimum Quality of Service (QoS) guarantee. On one hand, current adaptation mechanisms built within an application have the disadvantage of lacking global information to preserve fairness among all sharing applications. On the other hand, the adaptive resource management built within the operating system ignores the data semantics of the application. Hence, we believe that proper adaptive QoS behavior can be achieved in a middleware framework that has both the data semantics of the application and an understanding of the underlying resource management dynamics. In this talk, we present a control-theoretical middleware framework to enhance QoS adaptations by dynamic control and reconfiguration of the internal functionalities of a distributed multimedia application. We will present analytical and experimental validations. We have developed the Task Control Model and implemented it within an experimental client-server visual tracking system to evaluate our middleware framework. To maintain tracking precision in a specified range, the middleware framework controls the tracking application according to the current system dynamics.


Entropy minimization for learning and video understanding
Matthew Brand

Consider the scientist's intuition that the best hypothesis should give the simplest and clearest account of the data. This can be neatly formalized in terms of entropy. I'll introduce a data-modeling framework in which one learns by minimizing all entropies associated with a hypothesis. This framework contains several powerful methods as special cases: deterministic annealing; maximum likelihood; a variant of MaxEnt; and a new method I'll call maximum structure, which blends description length minimization and free energy minimization. Maximum structure estimators seek an optimal separation between the essential and accidental structure of a dataset, typically yielding a highly predictive model whose internal structure can be interpreted as a theory of the data-generating mechanism. The maximum structure case also has interesting connections to Kolmogorov complexity and to the theory of random graphs. When we replace the M step of expectation-maximization with the maximum-structure estimator, we obtain a considerably more efficient and powerful deterministic annealing algorithm that simultaneously estimates the structure and parameters of hidden-variable models--yielding quasi-global optima with no speculative or wasted computation. Using simple probabilistic models trained in this unsupervised learning framework, we are able to tackle applications that are difficult even with supervised learning. I'll demonstrate a variety of audio and video applications in which entropically estimated models support annotation, low bit-rate coding, anomaly detection, and even reconstruction of 3D structure and motion. Time permitting, I'll also show some results in animation.

 



Design Patterns for Data Structures
Dung ("Zung") Nguyen

Design patterns provide ways to structure software components into systems that are flexible, extensible, and have a high degree of reusability. Design patterns are often misconstrued as applicable only to programming in the large. In reality, they can be applied to solving problems in programming in the small such as implementing data structures. Appropriate applications of object-oriented patterns, such as the state pattern, the null object pattern, and the singleton pattern, help narrow the gap between the abstract views of the data structures and their concrete implementations. The smaller the gap, the higher the level of abstraction. The more abstract, the less complex the coding structure. As a result, algorithms are easier to understand and more manageable.
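
As a small, hypothetical illustration (not code from the talk), the null object pattern applied to an immutable linked list removes the usual special-casing of the empty list, bringing the code closer to the abstract view of the structure:

    class EmptyList:
        # the null object: the empty list is a full-fledged object, so clients need no None checks
        def is_empty(self):
            return True

        def length(self):
            return 0

        def prepend(self, value):
            return Cons(value, self)

    class Cons:
        def __init__(self, head, tail):
            self.head, self.tail = head, tail

        def is_empty(self):
            return False

        def length(self):
            return 1 + self.tail.length()   # recursion bottoms out at EmptyList, no branching needed

        def prepend(self, value):
            return Cons(value, self)

    xs = EmptyList().prepend(3).prepend(2).prepend(1)
    assert xs.length() == 3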


Compiling Inference: A Structure-Based Approach
Adnan Darwiche

Early AI systems operated mostly in stand-alone modes and required sophisticated software and hardware resources to function properly. In today's world, however, there is an increased need to embed AI systems into platforms which are characterized by their constrained resources and stringent operational requirements. To address this need, I have been developing a class of inference compilation techniques for synthesizing embeddable reasoning systems. The compiled systems are marked by their extreme simplicity, which renders their embeddability a reality, yet come with strong formal guarantees on their storage requirements and processing time. In this talk, I will discuss these inference compilation techniques for both probabilistic and logical reasoning and focus on their application to synthesizing embeddable diagnostic systems.


Abstracting Actor Interactions: Supporting Complex Distributed Software Systems
Prof. Gul A. Agha

The Actor model provides flexible mechanisms for modeling and building open distributed systems. Considerable research has been done on the Actor model and its application to open distributed systems. Useful results on the equivalence of configurations, their composition, and encapsulation have been derived, and powerful proof techniques developed. Work on compilers and run-time systems has provided efficient implementations. However, it is clear that further development is necessary to address the challenge of building and reasoning about complex distributed software systems. In order to model complex systems, it is necessary to abstract over patterns of interaction between groups of actors. I will describe a number of programming language constructs such as activators, actorspaces, synchronizers, real-time synchronizers, protocols, and agents, which abstract over interaction patterns. I will then discuss current research directions.

Professor Gul Agha is Editor-in-Chief of IEEE Concurrency: Parallel, Distributed and Mobile Computing, and Associate Editor of the journals ACM Computing Surveys, and Theory and Practice of Object Systems. Prof. Agha is an ACM International Lecturer and has given invited lectures at fifteen international conferences in different areas including formal methods, parallel languages, parallel and distributed software engineering, parallel and distributed databases, and distributed AI. He is a past recipient of the Incentives for Excellence Award from Digital Equipment Corporation, the Young Investigator Award from the US Office of Naval Research, and a Fellowship in the University of Illinois Center for Advanced Study.



Representation of Process Mode Correlation for Scheduling
Dirk Ziegenbein

The specification of embedded systems very often contains a mixture of different models of computation. In particular, the data flow and control flow associated to the transformative and reactive domains, respectively, are tightly coupled. The paper considers classes of applications that feature communicating processes which have alternative behaviors that are modeled by process modes.

The process mode can be selected at system startup and then remain fixed throughout the run-time. In this case, only the fixed mode has to be considered for scheduling. Much more interesting, however, is the case of dynamically changing modes. A process changes its behavior in reaction to input data. Therefore, instead of introducing a complicated global state model, we chose to associate the change of mode with the data which triggers the change. This way, the flow of modes through the system is modeled.

In this paper, an approach is presented to model the correlation of process modes and to fully utilize this information for scheduling. A modeling example shows the optimization potential of the new approach.


LOT: Logic Optimization with Testability - New Transformations for Logic Synthesis
and
VERILAT: Verification Using Logic Augmentation and Transformation
Dr. Dhiraj K. Pradhan

This talk consists of two parts. The first part describes a new formal verification tool for equivalence checking. This tool has been licensed to various industries including Motorola and Mentor Graphics, among others. Next, a new approach to optimizing multi-level logic circuits will be introduced. Given a multi-level circuit, the synthesis method optimizes its area while simultaneously enhancing its random pattern testability. The method is based on structural transformations at the gate level. New transformations involving EX-OR gates as well as Reed-Muller expansions have been introduced in the synthesis of multi-level circuits. This method is augmented with transformations that specifically enhance random-pattern testability while reducing the area. Testability enhancement is an integral part of our synthesis methodology. Experimental results show that the proposed methodology not only achieves lower area than other similar tools, but also achieves better testability compared to available testability enhancement tools such as tstfx. Specifically, for ISCAS-85 benchmark circuits, it was observed that EX-OR gate-based transformations successfully contributed toward generating smaller circuits compared to other state-of-the-art logic optimization tools.

Dhiraj K. Pradhan is currently a Visiting Professor of Electrical Engineering at Stanford University as well as the holder of the COE Endowed Chair Professorship in Computer Science at Texas A&M University, College Station, Texas. Prior to joining Texas A&M, he served as Professor and Coordinator of Computer Engineering at the University of Massachusetts, Amherst. He has also worked at Oakland University, Michigan; the University of Regina, Canada; and Stanford University, California. Dr. Pradhan has contributed to VLSI CAD and test, fault-tolerant computing, computer architecture and parallel processing research, with major publications in journals and conferences over the last twenty-five years. He has served as guest editor of special issues in prestigious journals such as IEEE Transactions. Currently he is also an editor for several journals, including the IEEE Transactions and JETTA. Dr. Pradhan has also served as General Chair and Program Chair for various major conferences. Dr. Pradhan has received several Best Paper Awards, including the 1996 IEEE Transactions on CAD Best Paper Award for W. Kunz and D. K. Pradhan, "Recursive Learning: A New Implication Technique for Efficient Solutions to CAD Problems - Test, Verification, and Optimization". A Fellow of the IEEE, Dr. Pradhan is the recipient of the Humboldt Distinguished Senior Scientist Award, Germany. He is also the recipient of the 1997-98 Fulbright-Flad Chair in Computer Science. Dr. Pradhan is co-author and editor of various books, including Fault-Tolerant Computing: Theory and Techniques, Vols. I and II (Prentice Hall, 1986), Fault-Tolerant Computer Systems Design (Prentice Hall, 1996), and IC Manufacturability: The Art of Process and Design Integration (IEEE Press, 1996).


Web-Based Question Answering in the FAQ Finder Project
Robin Burke

FAQ Finder is a natural language question-answering system that uses files of frequently-asked questions as its knowledge base. Unlike AI question-answering systems that focus on the generation of new answers or on detailed understanding of natural language, FAQ Finder uses indexing and matching techniques to retrieve existing answers from frequently-asked question files. Rather than rely on a purely lexical metric of similarity between query and document as in IR, FAQ Finder uses semantic and structural knowledge to improve its ability to match question and answer. I will give an overview of the system and some of the results we've achieved, and will welcome discussion of future research directions that could follow.

 


TROMLAB: An environment for a rigorous development of real-time reactive systems
Professor V. S. Alagar

The TROMLAB environment is based on a process model committed to integrating formal approaches to the development of real-time reactive systems. This talk will introduce a formal generic reactive object, the basic building block of reactive systems, and give an overview of the architecture of TROMLAB. A formal verification method, which can be mechanized using PVS, will be presented.

 


On the Fitting of Finite Mixture Models and an Application Concerning Competing Risks in Survival Analysis
Geoff McLachlan

 The first part of this talk will consider the fitting of mixture models in a general context, focussing on normal mixtures. We describe an algorithm called MIXFIT that automatically undertakes this fitting, including the specification of suitable initial values if not supplied by the user. The MIXFIT algorithm has several options, including the provision to carry out a resampling-based test for the number of components in the mixture model. In the second part, we consider the use of mixtures of nonnormal distributions to model failure-time distributions in the presence of competing risks. As an example, we consider the problem of providing a suitable model for the distribution of the time to a re-replacement operation for patients who have their native aortic valve replaced by a xenograft prosthesis. Distributional results based on the model provide a useful guide to heart surgeons on the type of valve to be used in view of the patient's age.
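
For orientation, a bare-bones EM iteration for a two-component univariate normal mixture looks like the Python sketch below. It is illustrative only; MIXFIT additionally automates starting values, handles more general component structures, and offers a resampling-based test for the number of components.

    import numpy as np

    def em_normal_mixture(x, n_iter=200):
        # crude starting values; MIXFIT would choose these automatically
        pi, mu1, mu2 = 0.5, x.min(), x.max()
        var1 = var2 = x.var()
        for _ in range(n_iter):
            # E-step: posterior probability that each point belongs to component 1
            d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * var1)) / np.sqrt(2 * np.pi * var1)
            d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * var2)) / np.sqrt(2 * np.pi * var2)
            tau = d1 / (d1 + d2)
            # M-step: weighted maximum-likelihood updates
            pi = tau.mean()
            mu1, mu2 = np.average(x, weights=tau), np.average(x, weights=1 - tau)
            var1 = np.average((x - mu1) ** 2, weights=tau)
            var2 = np.average((x - mu2) ** 2, weights=1 - tau)
        return pi, (mu1, var1), (mu2, var2)

    x = np.concatenate([np.random.normal(0, 1, 300), np.random.normal(5, 1, 700)])
    print(em_normal_mixture(x))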


The NUMAchine Multiprocessor: A Cache Coherence Protocol
Sinisa Srbljic

In this talk, the design and hardware implementation of the cache coherence protocol for the NUMAchine multiprocessor will be presented. NUMAchine is a 64-processor, cache-coherent, shared-memory multiprocessor in the final stage of construction in the Department of Electrical and Computer Engineering at the University of Toronto (http://www.eecg.toronto.edu/parallel/NUMA). Processors, caches, and memory are distributed across a number of stations (stations are bus-based multiprocessors) connected by a hierarchy of unidirectional, bit-parallel rings. The ring hierarchy provides a number of useful features which are exploited by the cache coherence protocol, such as efficient multicasting and order-preserving message transfers. The NUMAchine cache coherence protocol employs a write-back/invalidate, directory-based scheme at two levels: network level and station level. Network-level coherence is enforced across stations, and station-level coherence is enforced within each station. Although the logic required to enforce cache coherence is complex, we show that it is possible to obtain an efficient hardware implementation by using high-capacity field-programmable devices (FPDs).


Latency Hiding Techniques and Interconnect Choice in Shared Memory Multiprocessors
Alex Veidenbaum

Various latency hiding techniques have been proposed to minimize the effect of long memory and interconnect delays in multiprocessors. These techniques require additional memory and network bandwidth, and system performance depends on how well the interconnect can support their use. This paper explores the use of prefetching and weak consistency in a 128-processor shared-memory system for three different network organizations: a 2-D torus, a multistage shuffle-exchange, and a single-stage shuffle-exchange. The performance impact of varying network topology and link bandwidth of the three networks is studied with and without the use of latency hiding. The multistage network is the most robust, with the best performance under all conditions. The single-stage network comes very close in performance when sufficient channel bandwidth is available. The single-stage and torus networks have poor performance when channel bandwidth is small. With prefetching, performance can improve significantly provided sufficient network bandwidth is available. However, the relative performance of the three networks with prefetching remains qualitatively the same, with the torus gaining the most. Benchmark execution time decreased by as much as 25% due to prefetching. Further gains depend on reducing the effect of write traffic. Overall, torus-based systems with prefetching show about a 10% performance degradation relative to the other two networks even with the highest channel bandwidth.


Machine Learning for Adaptive User Interfaces
Pat Langley

In this talk I examine adaptive user interfaces -- advisory systems that personalize themselves to individual users. First I review the issues that arise in developing systems that learn from experience, then draw a strong analogy with software that adapts to its users. After this, I consider some examples of adaptive interfaces, focusing on systems that we are developing. These include the Adaptive Route Advisor, which suggests routes based on a user's driving history, and INCA, an interactive aid for crisis management that proposes revisions to plans and schedules. In closing, I consider the challenges that arise in developing adaptive user interfaces and some general lessons that have emerged from work in this area. For more information about these projects, see the ISLE web pages:
http://www.isle.org/~langley/papers/route.sss98.ps
http://www.isle.org/~gervasio/pub/cogsci98.ps
http://www.isle.org/~gervasio/pub/aaai98.ps
This talk describes work done jointly with Seth Rogers, Wayne Iba, and Melinda Gervasio.


EM Algorithms for PCA and SPCA
Sam Roweis

I will discuss an expectation-maximization (EM) algorithm for principal component analysis (PCA). The algorithm allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data. It is computationally very efficient in space and time. It also naturally accommodates missing information. Results on synthetic and real data show that this EM algorithm correctly and efficiently finds the leading eigenvectors of the covariance of datasets in a few iterations using up to hundreds of thousands of datapoints in thousands of dimensions.
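
A rough Python sketch of the zero-noise-limit EM iteration for PCA is shown below, assuming a data matrix whose columns are zero-mean observations; initialization and the noisy (SPCA) case follow the paper and are simplified away here.

    import numpy as np

    def em_pca(Y, k, n_iter=100):
        """Y: d x n data matrix (each column one zero-mean datapoint); k: number of components."""
        d, n = Y.shape
        C = np.random.randn(d, k)                          # random initial basis
        for _ in range(n_iter):
            # E-step: project the data onto the current basis
            X = np.linalg.solve(C.T @ C, C.T @ Y)          # k x n hidden states
            # M-step: re-estimate the basis from those projections
            C = Y @ X.T @ np.linalg.inv(X @ X.T)           # d x k
        # orthonormalize the learned basis; its columns span the principal subspace
        Q, _ = np.linalg.qr(C)
        return Q

    Y = np.random.randn(50, 2000)
    Y -= Y.mean(axis=1, keepdims=True)                     # center the data
    components = em_pca(Y, k=3)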


The XML Revolution
Robert J. Glushko

XML, the Extensible Markup Language recently developed by the World Wide Web Consortium, is rapidly taking hold as the foundation of "document-based computing", through which the Web will become "self-descriptive" or "smart enough" to support applications that are infeasible given the limitations of HTML. XML will enable client-side processing, applications of intelligent agents, and automated integration of Internet services -- which taken together will revolutionize the Internet, especially for commerce.

Dr. Robert J. Glushko is a technology expert and entrepreneur in information management and online publishing with two decades of experience in R&D, consulting, and product management. He is currently the Director of Information Engineering at Veo Systems, a Silicon Valley start-up building software for "the next generation of open Internet commerce", and is the Program Manager for eCoNet, a four-company multi-million dollar joint venture partly funded by an Advanced Technology Program award from the U.S. Department of Commerce. Prior to joining Veo, he was a co-founder and the Chief Scientist at Passage Systems, a consulting and systems integration firm specializing in SGML-based publishing. He previously worked at Bell Laboratories and at the Software Engineering Institute of Carnegie-Mellon University. He has a B.A. from Stanford, an M.S. (software engineering) from the Wang Institute, and a Ph.D. (cognitive psychology) from the University of California, San Diego.

This presentation is jointly sponsored by Information and Computer Science, the UCI Libraries, and the Office of Academic Computing. 


Protocol Compiler
Dr.Barry M. Pangrle

This talk will describe a recently announced high level synthesis tool targeted towards synchronous digital interfaces. Protocol Compiler is based upon a language for describing the communication between components. The Protocol Compiler language allows designers to describe the communication interface at a higher level of abstraction thus keeping the designer working closer to the specification level. This reduces translation errors from specification to entry and enhances productivity. The output from Protocol Compiler is a register transfer level (RTL) VHDL or Verilog description of a finite state machine (FSM) controller for the interface. The output is fully synthesizable for implementation into an application specific integrated circuit (ASIC) or field programmable gate array (FPGA).


On the Power of Barely Random Online Algorithms
Steve Seiden

(This is joint work with John Noga.)

A barely random online algorithm is one which is a distribution over some constant number of deterministic algorithms. An oblivious barely random online algorithm is a barely random online algorithm which chooses its distribution prior to serving the first request. Although a number of barely random algorithms have been presented in the literature, there has been, up to this point, no formal study of the properties of this class. Most researchers would probably agree that such algorithms are desirable, in that they conserve a precious resource---random bits. Such algorithms are also desirable in that they are generally easier to analyze than algorithms which use an unbounded number of random choices. However, it is important to consider the possible tradeoffs. The formal study of barely random online algorithms is initiated here. It is shown that barely random online algorithms are, under certain restrictions, no more powerful than oblivious barely random online algorithms. A general technique for proving lower bounds for barely random online algorithms is presented. This technique is an application of the well-known von Neumann/Yao principle. For several simple online problems, lower and upper bounds for barely random online algorithms are given. The power of barely random algorithms for the ski rental problem is completely characterized. Other problems investigated include paging, total completion time scheduling, m-machine scheduling, interval scheduling and the cow path problem.
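
To fix ideas (this is a generic illustration, not Seiden's construction or his bounds), a barely random algorithm for ski rental can flip a single coin at the start to pick one of two deterministic buy-thresholds; its empirical competitive ratio can then be estimated by simulation:

    import random

    BUY_COST = 10                      # renting costs 1 per day

    def deterministic_cost(threshold, days):
        """Rent up to `threshold - 1` days, then buy on day `threshold`; total cost for a season of `days`."""
        return days if days < threshold else (threshold - 1) + BUY_COST

    def barely_random_cost(days, p=0.5, t1=5, t2=10):
        threshold = t1 if random.random() < p else t2     # the only random choice ever made
        return deterministic_cost(threshold, days)

    def optimal_cost(days):
        return min(days, BUY_COST)                        # offline optimum

    worst = 0.0
    for days in range(1, 50):
        avg = sum(barely_random_cost(days) for _ in range(20000)) / 20000
        worst = max(worst, avg / optimal_cost(days))
    print("empirical competitive ratio ~", round(worst, 2))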



The Globus Project: building a computational grid
Carl Kesselman

Computational grids are computing environments that combine aspects of both parallel and distributed systems to provide pervasive, dependable and consistent access to high-performance remote computational resources. Enabled by the increased availability of high-speed networks and high-performance computing environments, computational grids enable the development of fundamentally new types of applications, such as distributed supercomputing, tele-immersion and smart instruments. In one recent example, we ran the world's largest distributed interactive simulation by exploiting computational grid technology. In this talk, I will discuss the challenges that face the construction of a persistent computational grid infrastructure and Globus, a research project at the University of Southern California and Argonne National Laboratory that is attempting to address some of these challenges. I will describe the design of the Globus system and present some of our research results in areas such as security, resource management and communication. Globus has been deployed on a large-scale testbed called the Globus Ubiquitous Supercomputer Testbed, or GUSTO. GUSTO spans over twenty sites and has access to over 2.5 TFLOPS of computing power. I will describe our experiences in not only building GUSTO, but in developing applications that can exploit its capabilities. Background information about the Globus project can be found at http://www.globus.org



Integrating Multiple Images for Segmentation
Dr. Michael Turmon

To understand the relationship of spatially resolvable solar activity to observed solar irradiance, it is necessary to find activity patterns within sequences of high-resolution solar images. To do this, we have used a set of images and related irradiance measurements from the MDI and VIRGO instruments aboard the recently-launched SoHO satellite. A trial period of image data (roughly 1 GB of digital images) has been segmented into three classes (sunspots, faculae, and quiet sun) using a statistically-based Markov random field framework. Expert knowledge is used to select both the relation of the observables to the activity classes and the spatial patterns of the activity classes themselves. Segmentation of this one-month pilot period of magnetograms indicates active region features (areas and field strength) that correlate well with observed radiation emission, albeit with some scientifically unexpected and interesting aspects. To extend this work to better discriminate active regions and faculae, we add a second observable that is co-registered in space and time with the original magnetograms. We have determined statistical models relating this bivariate observable to the underlying classes, and incorporated these models into a GUI-driven software system which automatically handles image registration and preprocessing steps. Work on using such a derived model to analyze a longer (1-year) segment of such bivariate images, comprising 20 GB of data, is now underway. (Joint work with Judit Pap, Saleem Mukhtar, Richard Bogart, Philip Scherrer, and Claus Frolich)

 


Cooperative Buildings: Extending the scope of CSCW beyond desktops
Dr. Dr. Norbert A. Streitz

In this talk, I will report on our new ideas on extending the scope of collaboration not only from desktops to meeting room support but taking a more comprehensive perspective, resulting in highly flexible and dynamic work environments. This perspective is provided by the notion of "cooperative buildings", which addresses the issues of how to integrate information technology, new work practices resulting from organisational innovation, and the physical environment, the architectural structures and facility management. It also incorporates ideas from augmented reality and ubiquitous computing.

In order to illustrate this, we have developed i-LAND: an interactive landscape for creativity and innovation. i-LAND integrates several so-called "roomware" components into a combination of real, architectural as well as virtual, digital work environments for creative teams. By "roomware", we mean computer-augmented objects in rooms, like furniture, doors, walls, and others. The current realization of i-LAND covers an interactive electronic wall (DynaWall), an interactive table (InteracTable), and two versions of computer-enhanced chairs (CommChairs). More components are planned. i-LAND is an example application of our vision of the Workspaces of the Future. At the same time, it is a testbed for developing new forms of human-computer interaction and computer-supported cooperative work.

Reference:

Streitz, N., Geissler, J., Holmer, T. (1998). Roomware for cooperative buildings: Integrated design of architectural spaces and information spaces.

In: N. Streitz, S. Konomi, H. Burkhardt (Eds.), "Cooperative Buildings - Integrating Information, Organization, and Architecture". Lecture Notes in Computer Science, Vol. 1370. Springer: Heidelberg. pp. 4-21.

About the lecturer:

Dr. Dr. Norbert A. Streitz (Ph.D. in physics, Ph.D. in psychology) is currently the manager of the research division "AMBIENTE - Workspaces of the Future" of the Integrated Publication and Information Systems Institute (IPSI) in Darmstadt. He was also the Deputy Director of IPSI (1992-1998) and the manager of the research division "Cooperative Hypermedia Systems" (1991-1997), before he initiated the new research division in 1997, which is now under his direction. IPSI has about 100 employees and is one of the eight research institutes of the German National Research Center for Information Technology (GMD), which has a total of about 1100 employees.

Norbert Streitz also teaches at the Department of Computer Science of the Technical University Darmstadt. He has edited 11 books and published more than 60 technical publications. He is an associate editor of the journal ACM Transactions on Information Systems (TOIS) and a regular member of the program committees of the relevant national and international conferences in Hypermedia, Human-Computer Interaction, and CSCW. He is often asked to present seminars and tutorials, invited talks and keynote speeches in Europe, the US, and Japan. His general research activities include hypertext and hypermedia, human-computer interaction, computer-supported cooperative work (CSCW), and more recently augmented reality and ubiquitous computing. The latter results from his current activities, where he is concerned with the relationship between new ways of organizing work, characteristics of information and communication technology, and their implications for the design of innovative work environments including the architectural structures of "cooperative buildings".


A Presentation by Clifford Lynch

The UCI Libraries, Office of Academic Computing, and Information and Computer Science Department jointly invite you to attend a presentation by Dr. Clifford Lynch. Dr. Lynch will be giving an overview on current topics of special interest to the CNI community including scholarly publishing, authentication/authorization, and archiving/preservation of digital information. Bio: Clifford Lynch has been the Director of the Coalition for Networked Information (CNI) since July 1997. CNI, jointly sponsored by the Association for Research Libraries, CAUSE, and EDUCOM, includes about 200 member organizations concerned with the use of information technology and networked information to enhance scholarship and intellectual productivity. Prior to joining CNI, Lynch spent 18 years at the University of California Office of the President, the last 10 as Director of Library Automation, where he managed the MELVYL information system and the intercampus internet for the University. Lynch, who holds a Ph.D. in Computer Science from the University of California, Berkeley, is an adjunct professor at Berkeley's School of Information Management and Systems. He is a past president of the American Society for Information Science and is a fellow of the American Association for the Advancement of Science.


A Process-Oriented Heuristic for Model Selection
Pedro M. Domingos

Overfitting is often considered the central problem in machine learning. Current methods to avoid it are either data-oriented (using separate data for validation) or representation-oriented (penalizing complexity in the model). Both approaches have significant limitations. In this talk I will propose the logical next step: process-oriented evaluation, where a model's expected generalization error is computed as a function of the search process that led to it. I will develop the necessary theoretical framework, and describe its application to one type of learning: rule induction. A process-oriented version of the CN2 rule learner was empirically compared with the default CN2. The process-oriented version is more accurate in a large majority of the databases, with high significance, and also produces simpler models. Experiments with synthetic data suggest that process-oriented evaluation is particularly useful in high-dimensional datasets.


Instruction Cache Prefetching Using Multilevel Branch Prediction
Alexander V. Veidenbaum

Modern high-performance, multiple-issue processors have high clock rates and use small instruction caches to reduce access time. Such caches have significant miss rates, reducing overall performance. Prefetching from the secondary cache can hide the instruction cache miss penalty, but only if initiated sufficiently far ahead of the current program counter. Existing instruction cache prefetching methods are strictly sequential and cannot do that due to their inability to prefetch past branches. This paper presents an instruction cache prefetching mechanism capable of prefetching past branches. By keeping branch history and branch target addresses, a future branch and its most likely target are predicted several branches ahead of the current branch. A prefetching architecture to utilize the predicted information is described and its accuracy evaluated. The impact of the new instruction prefetching method on overall processor performance and its interaction with sequential prefetching are described. For a 4-issue processor and a cache architecture patterned after the DEC Alpha-21164, the new prefetching unit can be more effective than sequential prefetching. The two types of prefetching eliminate different types of cache misses and can be effectively combined to achieve better overall performance.


The Design and Verification of a High-performance Low-control-overhead Asynchronous Differential Equation Solver
Kenneth Y. Yun

This talk describes the design and verification of a high-performance asynchronous differential equation solver. The design has low control overhead, which allows its average-case performance (tested at 22 degrees C and 3.3 V) to be 48% faster than that of any comparable synchronous design (simulated at 100 degrees C and 3 V). The techniques to reduce completion sensing overhead and hide control overhead at the circuit, architectural, and protocol levels are discussed. In addition, the timed distributed control and symbolic model checking techniques that were used to gain higher confidence in the correctness of the protocol are discussed. The design has been fabricated in a 0.5 um triple-metal HP CMOS14TB process and tested extensively. The fabricated chips perform as simulated, albeit somewhat faster. The test results show that our design is very robust with respect to variations in operating temperature and power supply voltage.



Production Based Synthesis Package for Finite State Machines

Forrest Brewer

The specification of finite state machines is conventionally performed using state transition diagrams or state charts to enumerate the deterministic states, their transitions and outputs. Although useful for small machines (small numbers of states), such descriptions leave much to be desired for larger designs, especially for constructions with several communicating sub-machines. In contrast, the PBS (Production Based Synthesis) package for specification and synthesis of finite state machines uses non-deterministic automaton techniques and provides a feasible system to describe finite state machines of much larger size and complexity. PBS uses a compact representation of named sequences of events which allows the designer to construct macros and templates of appropriate events and hierarchically use them to construct even higher level abstraction sequences. Most importantly, the assignment of events to states need not be made by the designer, as event synchronization primitives are provided by the language. Synthesis of a finite state implementation from this format is surprisingly simple, and relatively efficient machines can be constructed in time proportional to the number of bits (not the number of states) in the design. Practical PBS machines have been constructed with more than 10^35 states while maintaining mapped logic depths of only seven 2-input gates. Current research involves generalization of the primitives, scheduling optimizations and synthesis of distributed controllers meeting protocol specifications. This talk will present an overview of the PBS language and traversal based synthesis techniques, current results and future research directions.



Multimedia Analysis and Retrieval System (MARS) Project

Sharad Mehrotra

The goal of the MARS project is the design and development of next-generation information systems that provide seamless access to multimedia information based on its rich internal content. Due to many fundamental limitations of retrieving multimedia information based solely on textual annotations, we have adopted a vision-centric approach in which objects are represented and retrieved based on low-level visual features (e.g., color, texture, layout, etc.). These visual properties may be extracted automatically from images/video, making the approach scalable to large as well as heterogeneous multimedia collections. Supporting content-based queries over visual feature representations poses many significant challenges to existing practice in database management (DBMS) and information retrieval (IR). Existing IR techniques that deal primarily with textual information need to be generalized to support content-based retrieval over multimedia. Furthermore, since visual feature representations define complex non-euclidean vector spaces, techniques need to be developed to support such complex multidimensional information in DBMSs. Another challenge is to integrate multimedia IR techniques with DBMSs. Problems arise since existing DBMSs do not have any native support for storage and processing of imprecise information, while content-based retrieval is inherently imprecise. In this talk, I will provide an overview of the progress we have made in addressing the above problems in supporting multimedia information in DBMSs and our research directions. If time permits, I will also show the applicability of our research in DBMSs and IR to other applications dealing with imprecision and spatio-temporal information.
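
As a toy example of the kind of low-level visual feature such systems index (purely illustrative; MARS uses richer features and integrates them with DBMS/IR machinery), a normalized color histogram compared by histogram intersection already supports simple content-based ranking:

    import numpy as np

    def color_histogram(image, bins=8):
        """image: H x W x 3 uint8 array -> flattened, L1-normalized 3-D color histogram."""
        hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins,) * 3,
                                 range=[(0, 256)] * 3)
        return hist.ravel() / hist.sum()

    def histogram_intersection(h1, h2):
        return np.minimum(h1, h2).sum()                    # 1.0 means identical histograms

    query = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
    database = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
    qh = color_histogram(query)
    # rank database images by decreasing similarity to the query image
    ranked = sorted(database, key=lambda im: -histogram_intersection(qh, color_histogram(im)))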



An Adaptive Resource Management Architecture for Global Distributed Computing

Nalini Venkatasubramanian

Global information systems consist of asynchronous, autonomous components that are open and distributed. These components will need to handle multiple media types and satisfy varying Quality-of-Service(QoS) requirements. Adapting to changing system conditions and application requirements is often difficult and error-prone. Resource management mechanisms can potentially interfere with each other in non-compatible ways, especially in the presence of QoS requirements.

I will present a framework for providing adaptive, QoS-enabled resource management for global distributed applications. The framework supports safe composability of resource management mechanisms required to provide cost-effective QoS. Specifically, I will define a meta-architecture that permits customization of policies for placement, scheduling, synchronization and management of components. The meta-architecture allows for the separate specification of system policies from application requirements and hence permits either to be customized independently to adapt to changing system conditions and application requirements. To ensure non-interference, I have identified core resource management services -- remote creation, distributed snapshot and naming services and will show how they can be used as a basis for creating more complex activities. I will discuss the specification of services at different levels of abstraction, and reason about how these services can be safely composed to execute simultaneously. I will discuss the application of the meta-architectural framework to formulate policies for QoS based load management in distributed multimedia applications. Finally, I will describe my research plans to develop systems support for wide-area network-centric and mobile computing environments.



Fast Algorithms for Spatial and Multidimensional Joins

Nick Koudas

Since the introduction of the relational model of data, the join operation has received much attention due to its unique feature of combining data from different relations. As Database Management Systems become richer in data types, there is increasing interest in extending the join operation to new data types, such as geographical or spatial data and multimedia data.

In this talk, we first present an overview of work in the area of spatial and multimedia queries. We then introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them. Size Separation Spatial Join (S^{3}J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets. Thus its performance is a function of the original size of the joined data sets.

We describe S^{3}J and present an analytical and experimental evaluation of its I/O and processor requirements, comparing them with those of previously proposed algorithms for the same problem. In addition, we introduce Dynamic Spatial Bitmaps (DSB), a new technique that enables S^{3}J to exploit bitmap query processing techniques either dynamically or statically. We conclude by discussing generalizations of S^{3}J that allow it to be applied to other data types.
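
As an editorial illustration only (not the implementation evaluated in the talk, which targets disk-resident data), the following toy Python sketch captures the size-separation idea: every rectangle is assigned, without replication, to the smallest quadtree cell that fully contains it, and join candidates are restricted to pairs of cells that lie on a common root-to-leaf path.

    # Toy in-memory sketch of a size-separation spatial join; all names are
    # illustrative.  Rectangles are (xmin, ymin, xmax, ymax) in the unit square.
    from collections import defaultdict

    def cell_of(rect, max_level=8):
        """Smallest quadtree cell (level, cx, cy) fully containing rect."""
        xmin, ymin, xmax, ymax = rect
        for level in range(max_level, -1, -1):
            size = 1.0 / (1 << level)
            cx, cy = int(xmin / size), int(ymin / size)
            if xmax <= (cx + 1) * size and ymax <= (cy + 1) * size:
                return level, cx, cy
        return 0, 0, 0

    def on_same_path(a, b):
        """True if one cell is an ancestor of (or equal to) the other."""
        (la, xa, ya), (lb, xb, yb) = sorted([a, b])
        shift = lb - la
        return (xb >> shift) == xa and (yb >> shift) == ya

    def overlaps(r, s):
        return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]

    def size_separation_join(R, S):
        buckets = defaultdict(list)
        for s in S:
            buckets[cell_of(s)].append(s)
        pairs = []
        for r in R:
            cr = cell_of(r)
            for cs, rects in buckets.items():
                if on_same_path(cr, cs):
                    pairs.extend((r, s) for s in rects if overlaps(r, s))
        return pairs

Because no rectangle is replicated across cells, the work grows with the original input sizes, which is the property the abstract emphasizes.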



Managing Change in Autonomous Databases

Sudarshan S. Chawathe

We are witnessing a rapid growth in the number and size of heterogeneous collections of autonomous databases. Individual databases in such collections are owned and managed by independent, and often competing, entities that cooperate to only a limited extent. For example, the collection of databases used in the construction of a building includes databases owned by the architect, the construction company, the electrical contractor, and so on. Such autonomous database collections are also common on the Internet. (For example, the collection of Web databases with information about San Francisco consists of databases operated by several competing entities.) Making effective use of such collections of autonomous databases presents several challenges due to the absence of traditional database facilities such as locks, transactions, and standard query languages. In particular, understanding and controlling how such databases evolve is an important problem that traditional database techniques are ill-equipped to address.

In this presentation, I first motivate the need for managing change in autonomous databases and discuss the main challenges it presents. I then describe a method for detecting changes in autonomous databases by comparing snapshots of data. This method is based on novel algorithms for computing a minimum-cost edit script between two trees. I also briefly present a data model for storing changes in autonomous databases, and a query language over data and history stored in this model. A key feature of this model and language is that they model and query changes directly, instead of as the difference between two states. I conclude by describing the implementation of a change management system that incorporates these ideas. This work is part of The C3 Project at Stanford. Further information, including a system overview and recent publications, is available at http://www-db.stanford.edu/c3/.
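
As a hedged, simplified illustration of the edit-script idea, the sketch below computes a minimum-cost edit script for flat sequences with insert, delete, and update operations; the algorithms referred to in the abstract generalize this dynamic-programming scheme to ordered labeled trees and to richer operations.

    # Minimum-cost edit script between two flat sequences -- a simplified
    # analogue of the tree problem, for illustration only.
    def edit_script(old, new, c_ins=1, c_del=1, c_upd=1):
        n, m = len(old), len(new)
        dist = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dist[i][0] = i * c_del
        for j in range(1, m + 1):
            dist[0][j] = j * c_ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                same = old[i - 1] == new[j - 1]
                dist[i][j] = min(dist[i - 1][j] + c_del,
                                 dist[i][j - 1] + c_ins,
                                 dist[i - 1][j - 1] + (0 if same else c_upd))
        # Backtrack to recover the operations themselves.
        script, i, j = [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + \
                    (0 if old[i - 1] == new[j - 1] else c_upd):
                if old[i - 1] != new[j - 1]:
                    script.append(("update", i - 1, new[j - 1]))
                i, j = i - 1, j - 1
            elif i > 0 and dist[i][j] == dist[i - 1][j] + c_del:
                script.append(("delete", i - 1))
                i -= 1
            else:
                script.append(("insert", j - 1, new[j - 1]))
                j -= 1
        return list(reversed(script))

    print(edit_script("abcde", "axcfe"))   # [('update', 1, 'x'), ('update', 3, 'f')]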



Hardware Compilation: The Translation of Programs into Circuits

Niklaus Wirth

We explain how programs specified in a sequential programming language can be translated automatically into a digital circuit. The possibility of specifying which parts of a program are to be compiled into instruction sequences for a conventional processor, and which are to be translated into customized circuits, has gained relevance with the advent of large field-programmable gate arrays (FPGAs), which open the door to massive, fine-grained parallelism. To demonstrate the feasibility of this approach, we present a tiny programming language featuring the basic programming and circuit design facilities.


Protein Structure and Function Prediction through Evolutionary Sequence Analysis -- Results and Proposed Applications in the Post-Genomics Era

Dietlind L. Gerloff

The key to the function of proteins lies in their folded structures in three dimensions. The effort to determine a new protein structure experimentally, however, still involves several years of work, and often fails to answer the central questions as to the biological role (function) of the particular protein. Attempts to predict the folded structure of proteins from studying the amino acid sequences of evolutionarily related proteins can yield important information for addressing both these problems.

The ultimate goal of protein structure prediction, a general method for predicting the tertiary (three-dimensional) structure of any protein from its sequence alone, is clearly still far from being reached today. Nevertheless, there has been significant progress in recent years, for example through the consideration of multiple sequence information in "de novo" prediction cases (those without detectable sequence similarity between the target protein and proteins of known structure). In most approaches, however, the inclusion of homologous protein sequences is regarded as a merely statistical enhancement of the data set for analysis. The secondary structure prediction method developed with Steven A. Benner at the ETH Zurich reflects a different viewpoint in that we take into account the underlying evolutionary relationships between the proteins in the set to extract additional structural, and functional, information from the multiple sequence alignment. Among the key steps in the approach is the deduction of tertiary structural information for each position in the protein from the amino acid substitution patterns at the corresponding alignment positions, i.e., the identification and structural interpretation of the natural constraints that ensure the production of a folded, functional protein and consequently the survival of the respective host organisms.

Our approach has yielded good results in the past few years and ranks among the best secondary structure prediction methods in the field. Much more important in the genomics era, however, is the potential of the underlying idea to also yield valuable clues for tertiary structure and protein function prediction from sequence data. The lecture will illustrate how we proceed towards tertiary structure assembly at low resolution by discussing some of our submissions to the CASP2 prediction contest in December 1996. Importantly, the prediction of functional aspects has played a crucial role in our most successful predictions over the years.

Combining the prediction of functional and structural aspects in a more systematic way is central to my research plans. The data available through "post-genomic" efforts like the "Cancer Genome Anatomy Projects" (coordinated through the NCI) open up new opportunities for applying this structural-functional concept to extract valuable information from the vast amount of generated data. I will outline my proposed scheme for CGAP data analysis, which is the core part of my research plans for the near future.



VLSI Interconnect Layout Optimization in Deep Submicron Designs

Jason Cong

In the first part of my talk, I shall present the theory and general technique of the local refinement (LR) based optimization, which was developed from our research on interconnect layout optimization. It has been successfully applied to optimal wire sizing, simultaneous device and wire sizing under the general table-lookup based device delay models, and global interconnect sizing and spacing under the general coupling capacitance models. We believe that the LR-based optimization will find applications in many other layout and synthesis optimization problems.

In the second part of my talk, I shall discuss the trend of interconnect performance as implied by the newly revised National Technology Roadmap for Semiconductors (NTRS'97, draft), and show the impact of various interconnect optimization techniques on interconnect performance in each technology generation (from the 0.25um to 0.07um technology generations).


Navigating Digital Libraries and Multimedia Archives

Robert B. Allen

Research Scientist, Bellcore

Traditional libraries use several techniques for facilitating access to large amounts of information. In particular, libraries structure their collections with metadata and hierarchical classifications. Five systems developed over the past few years by the author, along with various collaborators, facilitate access to digital collections with structures similar to those used by traditional libraries. These systems include a) the Dewey GUI, b) the Facet Browser, c) hierarchical clustering of search hits, d) IBrowse/RAVE for structured video, and e) the Timeline Toolkit.

Browsing facets from a thesaurus with the Facet Browser is particularly challenging because the facets can be multiply inherited. The commercial development of IBrowse/RAVE and future directions for work on structured multimedia will be discussed.


Dynamic Categorization: A Method for Decreasing Information Overload

Wanda Pratt

Searching for information from free-text sources such as the web or repositories of the published literature can be a frustrating and overwhelming experience. Often search tools present users with a daunting list of search results, yet give them no hint of how those documents relate to their information need. Many search tools address this problem by helping users to make their searches more specific. However, when dozens or hundreds of documents are relevant to their question, users need tools that help them to explore and to understand their search results, rather than ones that eliminate a portion of those results.

To address this problem, I developed dynamic categorization, an approach that automatically organizes search results into meaningful categories by using knowledge of the user's query and a model of the domain terminology. In this talk, I describe DynaCat, an implementation that organizes documents returned from searches of the medical literature. Using an example from the domain of breast cancer, I demonstrate how DynaCat's categories provide a query-sensitive summary of the kinds of information found and help users to explore and understand their search results. I discuss DynaCat's preliminary evaluation, the opportunities for extending the system, and the final evaluation strategy. I conclude by describing my future research plans for other knowledge-rich, interaction-based solutions to the problems encountered when people search textual information sources.


Trailblazing Path to Semantic Interoperability in Digital Library Research

Hsinchun Chen

In this era of the Internet and the World Wide Web, the long-standing topic of digital libraries has become white hot. As the Internet expands and new multimedia and database technologies mature, the goal of creating large-scale, distributed, ubiquitously accessible, user-centered digital libraries to support learning, education, and knowledge sharing has wide support from researchers, practitioners, and the general public.

In this talk I will describe my research program which aims at achieving "deep semantic interoperability" in digital library research. Based on research findings obtained from several NSF-funded Digital Library Initiative projects and the DARPA-funded Information Management program, I will review our research in noun phrasing, semantic retrieval, neural network-based information analysis and visualization. These techniques will be described and demonstrated in the context of several ongoing digital library, GIS, medical informatics, and Internet/Intranet projects.


The Geneva Messengers

Christian F. Tschudin
University of Zurich

Messengers are autonomous flows of control that can spawn themselves dynamically across the network. I will present the Geneva Messenger Project in three parts. The first part concentrates on our specific approach to autonomous mobile code (instructional communication, local services only) and briefly describes the M0 (m-zero) messenger system. The second part of the talk is on resource management, more specifically on the use of economic models for coordinating resource allocation in a distributed but otherwise uncoordinated messenger space. Letting mobile programs make buy/sell decisions leads to interesting security questions. In the third part I report on recent theoretical results (joint work with Tomas Sander, ICSI) on protecting mobile code from being tampered with by the underlying execution environment, based on executable encrypted functions.


WebCite: Finding and Visualizing Structure in Hyper-Linked Collections

Loren Terveen

For many purposes, the Web page is too small a unit of interaction. Users often want to interact with larger-scale entities, particularly collections of topically related items. We report three innovations that address this user need. We replace the web page with the web site as the basic unit of interaction and analysis. We define a new information structure, the clan graph, that groups together sets of related sites. We illustrate a new graph visualization, the auditorium visualization, that reveals important structural and content properties of sites within a clan graph. In this talk, I will define, motivate, and give examples of the clan graph information structure and the auditorium visualization. I will also discuss preliminary evaluation, plans for more rigorous evaluation, and several possible usage scenarios.


An Introduction to AT & T Labs

Loren Terveen

AT&T Labs is one of the premier research and development organizations in the world. We have over 1,400 R&D staff members working on leading-edge technologies in computer and communications-related fields. Our mission is to be the innovation engine for AT&T and a world leader in science and technology. AT&T Labs is recruiting outstanding candidates in all areas of computer science and related fields. In this talk I will give a brief overview of research topics being pursued at AT&T Labs, mention some recruiting needs, talk about what it is like to work at the Labs, and leave ample time for questions.


Computational Analysis of DNA Structural Transitions

Dr. Craig Benham
Mount Sinai School of Medicine

DNA within living organisms occurs in a superhelical condition, which imposes stresses on the molecule. These stresses can destabilize the B-form duplex, causing local strand separations to occur at specific positions within the DNA sequence where the thermodynamic stability is least. Three computational methods have been developed to predict the locations and extents of destabilization in kilo-base length, superhelical DNA sequences. The results of these analyses agree precisely with experimental determinations of the extents and locations of denatured regions observed in vitro and in vivo. This allows their use to predict the destabilization properties of other sequences, on which experiments have not been performed.

This talk will describe the techniques used to analyze duplex destabilization in superhelical DNA. A selection of predictions regarding the destabilization properties of three classes of regulatory regions will be presented. The implications of these results concerning possible regulatory mechanisms will be considered. The close associations documented here suggest that methods to evaluate the destabilization properties of putative regulatory regions may be useful in discriminating which sites are active. The incorporation of these methods into strategies to search genomic sequences for regulatory regions will be discussed.


NSF-CISE 98: The Reorganization -- Programs, Structure, People and Thrusts

Dr. Robert P. Grafton
Program Director, National Science Foundation

The reorganization of Computer Information Sciences and Engineering (CISE) Directorate will impact considerably the way existing research programs do business with the NSF. Dr. Grafton will discuss the reorganization and its impact on the programs, particularly related to Microelectronic Information Processing Systems.


Making Sound with Visual Computer Tools

David Zicarelli

David Zicarelli will discuss the history of scientific and engineering methods for visualizing sound and our perception of it. He will demonstrate software that displays sound signals at various stages of processing in a computer environment, and explore how it affects the user's experience in the environment.

David Zicarelli is the programmer of computer music applications that are widely acknowledged to be among the most innovative and imaginative tools for experimental music. In the 1980s, he worked on the interactive algorithmic composition programs M, Jam Factory, and OvalTune for Intelligent Music. For the past seven years he has been working on MAX, a graphical programming environment for music originally developed by Miller Puckette. Zicarelli's latest collaboration with Puckette is a signal processing extension to MAX called MSP, distributed by the software company Cycling '74. He holds a Doctorate in Hearing and Speech Sciences from Stanford University.

For more information about the Gassmann Electronic Music Series, visit http://www.arts.uci.edu/dobrian/gemseries97-98.htm


1997 Colloquia

*************

Cognitive Modeling in Human-Computer Interaction
Associate Prof. Bonnie John, CMU
Wednesday, December 10, 1997
Abstract

Design of Embedded Systems
Prof. Bill Lin, University of California, San Diego
Friday, December 5, 1997
Abstract

The Activity Checklist: A Tool for Representing the "Space" of Context
Bonnie Nardi
Monday, November 24, 1997
Abstract

Constraint Programming for Coordination for Constraint Programming.
Ugo Montanari, SRI International
Wednesday, November 19, 1997
Abstract

Parallel, Persistent and Polymorphic Object-Sets
Prof. Dr. Laszlo Boszormenyi
, University of Klagenfurt (Austria)
Tuesday, November 18, 1997
Abstract

The Eyes Have It: Understanding Information Visualization
Ben Shneiderman
, University of Maryland
Tuesday, October 28, 1997
Abstract

Internet-Telephony based on the H.323 Standard
Martin Gitsels
, Siemens Corporate Research
Monday, September 22, 1997
Abstract

Making Networked Information Useful and Usable
Frank M. Shipman, III
Monday, July 7, 1997
Abstract

DNA MicroChip Technology: A Novel Approach to Fundamental Questions in Molecular Genetics
Teresa Webster, PhD
Monday, May 5, 1997
Abstract

Code Generation for Core Processors
Peter Marwedel
, University of Dortmund, Germany
Friday, June 6, 1997
Abstract

The Importance of Disordered Regions in Proteins:
Detection, Prevalence, Prediction, and Biological Significance
Prof. A. K. Dunker
, Washington State University
Monday, May 12, 1997
Abstract

 


 

Cognitive Modeling in Human-Computer Interaction

Associate Prof. Bonnie John

The field of Human-Computer Interaction (HCI), whose goal is to make computers support human activity in much more satisfying ways than they currently do, has three main uses for cognitive modeling. A cognitive model can substitute for a human user to predict how users will perform on a system before it is implemented or even prototyped. A system can generate a cognitive model of the user currently interacting with the system in order to modify the interaction to better serve that user. Finally, cognitive models can substitute directly for people so groups of individuals can be simulated in situations that require many participants, e.g., for training or entertainment.

The first use, as a predictor of human performance, has had substantial success both in the laboratory and in industry (John, 1995). A particular family of modeling techniques, called GOMS (Card, Moran & Newell, 1983; John & Kieras, 1996a), has been credited with saving millions of dollars for a telephone company by predicting the poor efficiency of a system before it was purchased and deployed (Gray, John & Atwood, 1993), and has had other successes in industry and government (John & Kieras, 1996b). This talk will present some instances of GOMS models, open research issues, and the barriers to its widespread use.

The second use, creating a model of the current user with which to modify the interaction, is evident in "cognitive tutors" in education. I'll discuss some examples of cognitive tutors built within the ACT-R theory of cognition and in use in the Pittsburgh (USA) public schools (Anderson, Corbett, Koedinger & Pelletier, 1995). Again, open research issues and barriers to widespread use will be discussed.

Finally, I will present an example of the third use, models substituting directly for people in real-time interaction. This example, built within the Soar theory of cognition, simulates aircraft crew in a "synthetic theater of war" where thousands of entities (actual people and armaments, people in simulators, "dumb" artificial agents, and cognitively-plausible intelligent agents) all interact in real-time (Tambe et al., 1995). I will discuss the level of cognitive plausibility currently employed in these models and some future directions for that work.


Effectiveness of Register Preloading on CP-PACS Node Processor

Prof. Hiroshi Nakamura

CP-PACS is an MPP (massively parallel processor) for large-scale scientific calculations. Since September 1996, CP-PACS with 2048 node processors has been operational at the University of Tsukuba. It is well known that an ordinary data cache is not effective for such applications. Cache prefetching is a well-known technique for hiding memory latency. The CP-PACS node processor is equipped with a new mechanism called register preloading. This mechanism enables the processor to transfer the required floating-point data directly (not via the data cache) between main memory and the floating-point registers in a pipelined way. This talk focuses on the comparison between register preloading and cache prefetching and shows the effectiveness of register preloading.


Design of Embedded Systems
Prof. Bill Lin

In this talk, I will present an overview of our ongoing research on design methodology and tools for hardware/software embedded systems. I will focus in particular on our recent work on statically compiling concurrent programs, specified as a set of communicating processes, into an efficient embedded software implementation without the need for run-time scheduling. I will discuss our experience with these tools on several industrial applications.


The Activity Checklist: A Tool for Representing the "Space" of Context

Bonnie Nardi

I will discuss the Activity Checklist, a practical tool to provide guidance and structure for empirical work that takes account of context in the design and evaluation of computer-based technologies. The Checklist was developed jointly with Victor Kaptelinin of Umea University in Sweden. It is based on activity theory, which provides concepts and vocabulary for analyzing the context of computer-supported work. I will briefly discuss activity theory, present the Checklist as a tool for orienting field studies of computer-supported work, and show its use via an example of a specific technology, Apple Data Detectors.

BIO: Bonnie Nardi is an anthropologist who has worked in the high tech industry for several years. She has studied the use of technology in offices, hospitals, schools and libraries. Her theoretical orientation is activity theory, a philosophical framework developed by the Russian psychologists Vygotsky, Luria, Leont'ev and their students. She is the author of A Small Matter of Programming: Perspectives on End User Computing, MIT Press, 1993, and the editor of Context and Consciousness: Activity Theory and Human-Computer Interaction, MIT Press, 1996. Her Christian Science Monitor Op-Ed piece (with Vicki O'Day and Ed Valauskas) "Put A Good Librarian, not Software, in Driver's Seat" won the Special Libraries Association 1996 Media Award.


Constraint Programming for Coordination for Constraint Programming.

Ugo Montanari

Distributed systems are usually asynchronous, but some level of synchronization is needed to make basic activities such as message reception possible. In practice, a key role is often played by more complex kinds of coordination, such as transactions, distributed problem solving, or computer-supported collaborative work.

Grammars for Distributed Systems (Degano and Montanari, 1987) are a simple model of computation for synchronized graph rewriting. The problem of determining a computation step can be very complex (it is NP-complete) and can be effectively represented and solved as a constraint problem.

A more general notion of coordination is captured by the tile model. Tiles (see a bibliography at http://www.csl.sri.com/~ugo/tiles.html) are squares representing rewriting rules with side effects, similar to SOS inference rules. Tiles rely for synchronization on a concept, equality of side effects, which can also be modeled as a constraint satisfaction problem (work in collaboration with Francesca Rossi).


Parallel, Persistent and Polymorphic Object-Sets

Prof. Dr. Laszlo Boszormenyi

General-purpose programming languages do not provide the proper abstractions for handling large amounts of persistent and distributed data. They still suggest the paradigm of loading computations into an empty computer. Nevertheless, most computations are done in the context of huge amounts of existing data that is persistent and distributed. Neither arrays nor references, and not even objects, are the proper tools for abstracting such contexts.

Applying sets to provide such an abstraction has a number of advantages; consequently, sets provide a unified abstraction for describing both persistent and distributed data.

M3Set

The programming language M3Set is an extension of Modula-3 with polymorphic sets of objects. The compiler can statically check the safe use of polymorphic classes (a class is a container of object sets). Besides the usual set operations, a very powerful and general select expression is provided. With the help of this expression, all the usual (SQL-like) queries can be expressed in a declarative manner. This enables the compiler to optimize and parallelize them in a relatively easy way (as opposed to nested-loop parallelization). A similar extension for Java is under development.
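
For readers unfamiliar with such declarative set expressions, the following rough Python analogue (illustrative names only; M3Set itself extends Modula-3) shows why they are easy to parallelize: the query is a pure predicate over whole elements, so the set can be partitioned and the chunks filtered independently.

    # Declarative select over an object set; chunks are filtered independently.
    from concurrent.futures import ThreadPoolExecutor
    from dataclasses import dataclass

    @dataclass
    class Employee:
        name: str
        dept: str
        salary: int

    def select(object_set, predicate, workers=4):
        """Return all members of object_set satisfying predicate."""
        items = list(object_set)
        chunk = max(1, len(items) // workers)
        parts = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            filtered = pool.map(lambda part: [o for o in part if predicate(o)], parts)
        return [o for part in filtered for o in part]

    staff = [Employee("ada", "db", 90), Employee("bob", "os", 70),
             Employee("eve", "db", 50)]
    print(select(staff, lambda e: e.dept == "db" and e.salary > 60))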

PPOST

PPOST (Parallel and Persistent Object Store) is a memory-resident object store. The advantages of a memory-resident store are obvious: it is fast and can store arbitrarily complex data structures. It has two obvious disadvantages as well: it does not scale and it is volatile. Both disadvantages can be eliminated by using parallelism. PPOST uses horizontal parallelism to ensure nearly linear scalability while storing a very large number of objects, and it applies vertical parallelism to provide persistence almost entirely in the background, allowing transaction processing at essentially memory speed. PPOST can be used both stand-alone and as the run-time system for M3Set.


The Eyes Have It: Understanding Information Visualization

Ben Shneiderman

Human perceptual skills are remarkable, but largely underutilized by current graphical user interfaces. The next generation of animated GUIs and visual data mining tools can provide users with remarkable capabilities if designers follow the Visual Information-Seeking Mantra:

Overview first, zoom and filter, then details-on-demand.

But this is only a starting point in the path to understanding the rich set of information visualizations that have been proposed. Two other landmarks are:

Direct manipulation: visual representation of the objects and actions of interest and rapid, incremental, and reversible operations

Dynamic queries: user-controlled query widgets, such as sliders and buttons, that update the result set within 100 msec.

These ideas are demonstrated in the FilmFinder, the Visible Human Explorer (for the National Library of Medicine's anatomical data), NASA EOSDIS (for environmental data), and LifeLines (for medical records and personal histories).

As a guide to research, information visualizations can be categorized into 7 datatypes (1-, 2-, 3-dimensional data, temporal and multi-dimensional data, and tree and network data) and 7 tasks (overview, zoom, filter, details-on-demand, relate, history, and extract). Research directions include algorithms for rapid display update with millions of data points, strategies to explore vast multi-dimensional spaces of linked data, and design of advanced user controls.
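
As an editorial sketch of the dynamic-query idea mentioned above (the class and attribute names below are assumptions, not taken from the talk), sub-second slider updates are plausible because each slider move becomes a range lookup in a pre-sorted per-attribute index instead of a rescan of every record:

    # Dynamic-query filtering via pre-sorted per-attribute indexes.
    from bisect import bisect_left, bisect_right

    class DynamicQueryIndex:
        def __init__(self, records, attributes):
            # One sorted (value, record_id) list per filterable attribute.
            self.index = {a: sorted((rec[a], i) for i, rec in enumerate(records))
                          for a in attributes}

        def ids_in_range(self, attr, lo, hi):
            """Record ids whose attr value lies in [lo, hi] (slider endpoints)."""
            idx = self.index[attr]
            start = bisect_left(idx, (lo, -1))
            stop = bisect_right(idx, (hi, float("inf")))
            return {i for _, i in idx[start:stop]}

        def query(self, ranges):
            """ranges: {attr: (lo, hi)}; ids satisfying every slider."""
            sets = [self.ids_in_range(a, lo, hi) for a, (lo, hi) in ranges.items()]
            return set.intersection(*sets) if sets else set()

    films = [{"year": 1975, "length": 124}, {"year": 1990, "length": 98},
             {"year": 1982, "length": 141}]
    dq = DynamicQueryIndex(films, ["year", "length"])
    print(dq.query({"year": (1980, 1995), "length": (90, 130)}))   # {1}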

Ben Shneiderman is a Professor in the Department of Computer Science, Head of the Human-Computer Interaction Laboratory, and Member of the Institute for Systems Research, all at the University of Maryland at College Park.

Dr. Shneiderman is the author of Software Psychology: Human Factors in Computer and Information Systems (1980) and Designing the User Interface: Strategies for Effective Human-Computer Interaction (1987, second edition 1992, third edition 1997), Addison-Wesley Publishers, Reading, MA.

Dr. Shneiderman has co-authored two textbooks, edited three technical books, and published more than 180 technical papers and book chapters. His 1993 edited book Sparks of Innovation in Human-Computer Interaction collects 25 papers from ten years of research at the University of Maryland. This collection includes Dr. Shneiderman's seminal paper on direct manipulation, a term he coined in 1981 to describe the graphical user interface design principles: visual presentation of objects and actions combined with pointing techniques to accomplish rapid incremental and reversible operations.

Ben Shneiderman received his BS from City College of New York in 1968, his PhD from State University of New York at Stony Brook in 1973. He received an Honorary Doctorate of Science from the University of Guelph, Ontario, Canada in 1996 and was elected as a Fellow of the Association for Computing (ACM) in 1997.


Internet-Telephony based on the H.323 Standard

Martin Gitsels

Telephony and video communication via the Internet are currently being investigated with growing interest. A large number of products implementing Internet telephony and multimedia collaboration are already available, but they typically use proprietary codecs and protocols. The H.323 standard by the ITU, entitled "Packet Based Multimedia Communications Systems", is viewed by many developers as a well-suited basis for enabling interoperability of systems in a heterogeneous environment. This talk gives an overview of Internet telephony based on the H.323 standard; in particular, its properties, restrictions, and extensions are discussed. Finally, the structure of an H.323-compatible Internet phone currently being developed at Siemens Corporate Research is presented.


Making Networked Information Useful and Usable

Frank M. Shipman, III

The number of networked information sources and the community of users accessing these resources are growing rapidly. With this growth has come an increasing demand for tools to help people performing real tasks make use of this information. This presentation will describe two projects looking at new interfaces to networked information. The first project, Walden's Paths, provides teachers a way to collect and structure information on the Web for their students. The second project, VIKI, provides information analysts a visual and spatial workspace in which to collect and interpret information.

Frank M. Shipman III is an Assistant Research Scientist in the Department of Computer Science and Center for the Study of Digital Libraries at Texas A&M University. He has been pursuing research in the areas of hypermedia, computer-supported cooperative work, and intelligent user interfaces since 1987. Dr. Shipman's doctoral work at the University of Colorado and subsequent work at Xerox PARC and Texas A&M has investigated combining informal and formal representations in interfaces, methods for supporting incremental formalization, and interfaces for networked information sources.


DNA MicroChip Technology: A Novel Approach to Fundamental Questions in Molecular Genetics

Teresa Webster

The advent of the DNA microchip, introduced by Affymetrix in Santa Clara, CA just last year, may represent the next paradigmatic advance in genetic analysis. The technology joins photolithography methods that are routinely used in the semiconductor industry with standard oligonucleotide chemistry. The result is a silica chip the size of a thumbnail, potentially containing hundreds of thousands of oligonucleotide probes of predetermined sequence. DNA microchips can be used to address fundamental questions in molecular genetics, including genetic variability, gene expression, and genetic linkage.


Code Generation for Core Processors

Peter Marwedel

This talk responds to the rapidly increasing use of processor cores for implementing systems-on-a-chip. Such cores are available both from vendors and within system companies. Applications can be found in most segments of the embedded-systems market, such as automotive electronics and telecommunications. These applications demand extremely efficient processor architectures, optimized for a certain application domain or even a certain application.

Current compiler technology supports these architectures very poorly and has recently been recognized as a major bottleneck for designing systems quickly, efficiently and reliably. A number of recent research projects aim at removing this bottleneck.

In this talk, we will briefly discuss the trend towards cores in general and towards core processors in particular. We will show market trends for processor cores, give some examples of cores, and briefly touch on the issues concerning application-specific instruction-set processors (ASIPs), application-specific signal processors (ASSPs), soft cores, and hard cores.

We will then present new code optimization approaches that take the special characteristics of core processor architectures into account. In particular, we will present new memory allocation and code compaction algorithms.

In the final section of the talk, we will present techniques for easily retargeting compilers to new architectures. These techniques are motivated by the need for domain- or application-dependent optimizations of processor architectures. The scope of such optimizations should not be restricted to hardware architectures but has to include the corresponding work on compilers as well. We will show how compilers can be generated from descriptions of processor architectures. The presented techniques aim at bridging the gap between electronic CAD and compiler generation.


The Importance of Disordered Regions in Proteins:
Detection, Prevalence, Prediction, and Biological Significance

Prof. A. K. Dunker

The central dogma of molecular biology is that DNA sequence determines messenger RNA sequence, which in turn determines amino acid sequence. A given amino acid sequence then determines the one specific, unique 3-dimensional structure of the given protein. The final 3D structure of the protein enables it to carry out its function. One of the central, unsolved problems in molecular biology is the code by which a given amino acid sequence determines a given 3D fold. This is called the "protein folding problem."

We recently noticed that many functional regions of proteins apparently don't fold into specific 3D structures, but rather remain in an unfolded or disordered state. Since amino acid sequence is supposed to determine protein folding, we reasoned that amino acid sequence should determine non-folding as well. We used simple data analyses and neural networks to test whether correlations can indeed be found between amino acid sequence and non-folding. Our results, which may ultimately require a significant restructuring of the central dogma of molecular biology, suggest that there are indeed understandable relationships between lack of folding and amino acid sequence, and further, contrary to current views, that nature is evidently very rich in non-folding sequences. These computer studies encouraged us to consider possible roles of unfolded protein states in molecular biology. This trail led to a new classification scheme for molecular recognition and a proposal for a critical role of disordered regions in the evolution of complex biological networks.
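
Purely as an editorial illustration of sequence-based screening (the residue set, window length, and threshold below are assumptions chosen for the example, not the published predictor, which relied on trained neural networks and richer features), a sliding-window scan for compositional bias might look like this:

    # Toy sliding-window scan for compositionally biased (putatively
    # disordered) regions; all parameters are illustrative assumptions.
    DISORDER_PRONE = set("PESQKGA")   # assumed residue set for illustration
    WINDOW = 21
    THRESHOLD = 0.6                   # assumed fraction triggering a call

    def flag_windows(sequence):
        calls = []
        for i in range(len(sequence) - WINDOW + 1):
            window = sequence[i:i + WINDOW]
            frac = sum(aa in DISORDER_PRONE for aa in window) / WINDOW
            calls.append(frac >= THRESHOLD)
        return calls

    seq = "MKSGSGGGSPTSLWGLLFLSAALSLWPTSG" * 2
    flags = flag_windows(seq)
    print(sum(flags), "of", len(flags), "windows flagged")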


Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
