Coffee Talk Speakers
From Events to Situations: Opher Etzion
Yezreel Valley College, Israel
UC Irvine, US
Keynote 1: How Event-Based Systems Took Over the World
Imperial College London
Distributed event-based systems are centred around the notion of event messages, which are generated by data producers, processed by intermediate entities, and received by data consumers. They exhibit a loosely-coupled architectural style, which results in scalable and robust system designs. This makes event-based systems extremely well-suited for the processing of event messages with low latency. In this talk, I will argue that event-based systems have been very influential within the distributed systems and databases communities over the past two decades. In particular, they have led to the design of modern low-latency data processing systems, both in industry and academia. Based on examples from my own research and from others, I will provide an overview of the evolution of event-based systems, starting from complex event processing (CEP) systems and distributed stream processing engines. More recently, these systems have given rise to distributed dataflow platforms such as Storm, Spark, Flink and SEEP, which can process large amounts of event data in near real-time by exploiting data parallelism. I will conclude the talk with open research challenges in event-based systems, which I believe will keep this community engaged for years to come.
Speaker Bio:
Peter Pietzuch is an Associate Professor (Reader) at Imperial College London, where he leads the Large-scale Distributed Systems (LSDS) group in the Department of Computing. His research focuses on the design and engineering of scalable, reliable and secure large-scale software systems, with a particular interest in data management and networking issues. He has published over seventy research papers in international venues, including SIGMOD, VLDB, ICDE, USENIX ATC, ICDCS, NSDI, CoNEXT, CCS, Middleware and DEBS. He has co-authored a book on Distributed Event-based Systems published by Springer. Before joining Imperial College London, he was a post-doctoral research fellow at Harvard University, working on global stream processing systems. He holds PhD and MA degrees from the University of Cambridge for his work on scalable publish/subscribe systems.
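The decoupling of data producers, intermediate processing entities, and data consumers described in the keynote abstract can be illustrated with a small sketch. The `Broker`, `Event`, and `threshold_filter` names are hypothetical and do not come from any of the systems mentioned; the sketch assumes a single-process, synchronous, topic-based publish/subscribe model, whereas real distributed brokers route messages across a network.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass
class Event:
    topic: str
    payload: Any


Handler = Callable[[Event], None]


class Broker:
    """Hypothetical in-process, topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Consumers register interest in a topic, not in a specific producer.
        self._subs[topic].append(handler)

    def publish(self, event: Event) -> int:
        # Producers never see who receives the event; returns the delivery count.
        handlers = self._subs.get(event.topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
alerts: List[float] = []

# Data consumer: only knows about the derived "alert" topic.
broker.subscribe("alert", lambda e: alerts.append(e.payload))


# Intermediate entity: consumes raw readings and republishes derived events.
def threshold_filter(e: Event) -> None:
    if e.payload > 30.0:
        broker.publish(Event("alert", e.payload))


broker.subscribe("temperature", threshold_filter)

# Data producer: emits readings without knowing any subscriber.
for reading in [25.0, 31.5, 28.0, 35.0]:
    broker.publish(Event("temperature", reading))

print(alerts)  # [31.5, 35.0]
```

Because the producer loop references only topic names, another consumer (say, a logger on "temperature") could be attached without touching producer code, which is the loose coupling the abstract credits for scalable and robust designs.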
Carnegie Mellon University, US
Self-driving vehicles seem to have become quite the rage in popular culture over just the past few years, triggered in good part by the DARPA Grand Challenges. Self-driving vehicles indeed hold the potential to revolutionize modern transportation. This talk will provide some insights into many basic questions that need to be addressed for the revolution to take place in practice. What are the technological barriers that currently prevent vehicles from being driverless? What can or cannot be sensed or recognized? Can vehicles recognize and comprehend as well as, if not better than, humans? Does connectivity play a role? Will the technology be affordable only for the few? How do issues like liability, insurance, regulations and societal acceptance impact adoption? The talk will be based on road experiences and will add some speculation.
Speaker Bio:
Prof. Raj Rajkumar is the George Westinghouse Professor of Electrical & Computer Engineering and Robotics Institute at Carnegie Mellon University. At Carnegie Mellon, he directs the National University Transportation Center for Safety, which is sponsored by the US Department of Transportation. He also directs the Real-Time and Multimedia Systems Laboratory (RTML), and co-directs the General Motors-Carnegie Mellon Connected and Autonomous Driving Collaborative Research Laboratory (CAD-CRL). Raj has served as the Program Chair and General Chair of six international ACM/IEEE conferences on real-time systems, wireless sensor networks, cyber-physical systems and multimedia computing/networking. He has authored one book, edited another book, holds three US patents, and has more than 160 publications in peer-reviewed forums. Eight of these publications have received Best Paper Awards. He has given many keynotes and distinguished lectures at international conferences and universities. He is an IEEE Fellow, an ACM Distinguished Engineer and a co-recipient of the IEEE Simon Ramo Medal. He has been given an Outstanding Technical Achievement and Leadership Award by the IEEE Technical Committee on Real-Time Systems. Prof. Rajkumar’s work has influenced many commercial operating systems. He was also the primary founder of Ottomatika Inc., a company that focused on delivering the core software intelligence for self-driving vehicles. Ottomatika was recently acquired by Delphi. His research interests include all aspects of cyber-physical systems.
Twitter Inc., US
With the increasing availability of real-time data, there has been a shift from reactive to proactive analytics, e.g., predictive maintenance of equipment, demand prediction, fraud detection and dynamic pricing. Even for traditional applications of predictive analytics such as churn management, cross-selling, propensity to purchase and customer lifetime prediction, there is a critical need to provide insights in real time. In addition, several new applications involving IoT are driving the need for real-time processing. In light of this, several real-time processing platforms, both open source and commercial, have been developed in recent years. In this talk, I will delve into what real-time analytics means, why it is needed and how it is being deployed across organizations. Finally, I will give some ideas on where it is headed in the future.
Speaker Bio:
Karthik is the engineering manager for Real Time Analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure and networking. He cofounded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases and high availability solutions for network routers that are widely deployed in the Internet. Before joining Juniper, he worked extensively at the University of Wisconsin on parallel database systems, query processing, scale-out technologies, storage engines and online analytical systems. Some of this research was spun out into a company later acquired by Teradata.
He is the author of several publications and patents, and a co-author of the best-selling book "Network Routing: Algorithms, Protocols and Architectures." He has a Ph.D. in Computer Science from UW Madison with a focus on large-scale databases and big data.
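A common building block behind the real-time fraud-detection use case mentioned in the abstract is a per-key sliding-window count that is updated as each event arrives rather than recomputed in a batch job. The sketch below is a hypothetical illustration, not code from Twitter or any particular platform; `VelocityDetector`, the 60-second window, and the threshold of three events are assumptions, and timestamps are assumed to arrive in non-decreasing order for each key.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class VelocityDetector:
    """Flags a key that produces more than max_events within window_s seconds."""

    def __init__(self, window_s: float, max_events: int) -> None:
        self.window_s = window_s
        self.max_events = max_events
        self._times: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def observe(self, key: str, ts: float) -> bool:
        q = self._times[key]
        q.append(ts)
        # Evict timestamps that fall outside the window (ts - window_s, ts].
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) > self.max_events


detector = VelocityDetector(window_s=60.0, max_events=3)
card_events = [("card-1", 0.0), ("card-1", 10.0), ("card-1", 20.0),
               ("card-1", 30.0), ("card-1", 100.0)]
flags = [detector.observe(key, ts) for key, ts in card_events]
print(flags)  # [False, False, False, True, False]
```

Each call does amortized constant work, so the decision is available immediately after the event; memory grows with the number of distinct keys and the events each one holds in its window.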
Invited talk 1: From Trill to Quill: Pushing the Envelope of Functionality and Scale
Microsoft Research, US
With the increasing volumes of data being acquired, stored, and processed in the cloud, online and offline analytics that derive value from such data have become very important. At Microsoft Research, we have been exploring new system designs that cover a diverse range of analytics scenarios at high performance. In this talk, I will briefly describe Trill (short for "a trillion events per day"), a streaming engine library used across Microsoft that can execute relational, progressive, and real-time temporal queries at unprecedented levels of performance. I will then give an overview of several new research directions in the Trill project that expand its functionality for various application scenarios.
Finally, I will introduce Quill (short for "a quadrillion tuples per day"), a distributed platform to scale relational and temporal analytics over large datasets in the cloud. Quill exposes an abstraction for parallel datasets and computation, called ShardedStreamable. This abstraction provides the ability to express efficient distributed physical query plans that are transferable, i.e., movable from offline to real-time and vice versa. Quill targets offline datasets, uses Trill as a library layer for query processing, and employs a master-less cloud design. Experiments on up to 40 high-end cloud machines, on benchmark and real datasets of up to 1TB, show Quill outperforming SparkSQL by up to orders of magnitude for temporal queries and up to 6X for relational queries, while supporting a rich space of transferable, programmable, and expressive distributed physical query plans.
Speaker Bio:
Badrish Chandramouli is a senior researcher in the database group at Microsoft Research. He is broadly interested in creating technologies to perform near-real-time and offline big data analytics for Cloud applications. Since 2008, Badrish has been working on the streams project; this work shipped commercially in 2010 as the Microsoft StreamInsight engine. More recently, Badrish and his colleagues created Trill, a new analytics engine that is being widely used at Microsoft, for example, in the Bing advertising platform and as part of the public-facing Azure Stream Analytics service (see MSR's blog post for more information). In the context of these projects, Badrish has worked on diverse research areas such as progressive analytics, sorting, pattern detection, query processing, and distributed systems. His research has won best paper awards at ICDE 2012 and DBTest 2010. Visit his website at http://badrish.net/ to learn more about his research and publications.
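The idea of writing a distributed physical plan as "shard the data, run a temporal query on each shard, merge the results", which the abstract attributes to Quill's ShardedStreamable abstraction, can be sketched in a few lines. This is not the Trill or Quill API (Trill is a .NET library); `shard_by_key`, `tumbling_count`, and `sharded_tumbling_count` are hypothetical names, and the sketch assumes a simple tumbling-window count per string key.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key)


def shard_by_key(events: List[Event], num_shards: int) -> List[List[Event]]:
    """Hash-partition events so that all events for one key land in one shard."""
    shards: List[List[Event]] = [[] for _ in range(num_shards)]
    for ts, key in events:
        shards[zlib.crc32(key.encode("utf-8")) % num_shards].append((ts, key))
    return shards


def tumbling_count(events: List[Event], window: int) -> Counter:
    """Per-shard temporal query: count events per (window start, key)."""
    return Counter(((ts // window) * window, key) for ts, key in events)


def sharded_tumbling_count(events: List[Event], window: int,
                           num_shards: int) -> Dict[Tuple[int, str], int]:
    total: Counter = Counter()
    for shard in shard_by_key(events, num_shards):
        # Shards hold disjoint keys, so merging per-shard results is a union.
        total.update(tumbling_count(shard, window))
    return dict(total)


events = [(1, "a"), (3, "b"), (7, "a"), (12, "a"), (14, "b")]
print(sharded_tumbling_count(events, window=10, num_shards=3))
```

Nothing in the per-shard logic depends on whether `events` came from a stored dataset or a live feed, which loosely mirrors the abstract's point about plans that transfer between offline and real-time execution. A query that aggregated across keys would need a re-shard or a second aggregation step after the merge.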
Invited talk 2: Living on the Edge – Stream Data Processing at Scale
Amazon Web Services, US
Streaming analytics is about identifying and responding to events happening in your business, in your service or application, and with your customers in near real-time. Sensors, mobile and IoT devices, social networks, and online transactions are all generating data that can be monitored constantly to enable a business to detect and then act on events and insights before they lose their value. The need for large-scale, real-time stream processing of big data in motion is more evident than ever before, but the potential remains largely untapped by most firms. It’s not the size but rather the speed at which this data must be processed that presents the greatest technical challenges. In this talk I will draw upon our experience with Amazon Kinesis data streaming services to highlight use cases, discuss technical challenges in scaling a data streaming service to massive scale, and look ahead to the future of stream data processing.
Speaker Bio:
Roger Barga is General Manager and Director of Development at Amazon Web Services, responsible for Kinesis data streaming services including Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. Before joining Amazon, Roger was in the Cloud Machine Learning group at Microsoft, responsible for product management of the Azure Machine Learning service. His experience and research interests include data storage and management, data analytics and machine learning, distributed systems and building scalable cloud services, with emphasis on stream data processing and predictive analytics. Roger is also an Affiliate Professor at the University of Washington, where he is a lecturer in the Data Science and Machine Learning programs. Roger holds a PhD in Computer Science, an M.Sc. in Computer Science with an emphasis on Machine Learning, and a B.Sc. in Mathematics and Computer Science. Roger holds over 30 patents, has published over 100 peer-reviewed technical papers and book chapters, and has authored a book on predictive analytics.
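One scaling mechanism behind Kinesis Streams is hash partitioning: each record's partition key is mapped through MD5 to a 128-bit integer, and each shard owns a contiguous range of that hash-key space. The sketch below models only that mapping in memory. It is not the Kinesis API; `even_shard_starts`, `shard_for`, and `split_shard` are hypothetical helpers, and the even initial split of the key space is an assumption.

```python
import bisect
import hashlib
from typing import List

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits


def even_shard_starts(num_shards: int) -> List[int]:
    """Starting hash key of each shard, assuming an even split of the key space."""
    return [i * HASH_SPACE // num_shards for i in range(num_shards)]


def shard_for(partition_key: str, starts: List[int]) -> int:
    """Index of the shard whose hash-key range contains MD5(partition_key)."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return bisect.bisect_right(starts, h) - 1


def split_shard(starts: List[int], index: int) -> List[int]:
    """Split one (e.g. hot) shard at the midpoint of its hash-key range."""
    end = starts[index + 1] if index + 1 < len(starts) else HASH_SPACE
    mid = (starts[index] + end) // 2
    return starts[: index + 1] + [mid] + starts[index + 1:]


starts = even_shard_starts(2)
split = split_shard(starts, 0)
keys = [f"device-{i}" for i in range(1000)]
before = [shard_for(k, starts) for k in keys]
after = [shard_for(k, split) for k in keys]
# Keys from the untouched shard keep their range (its index shifts from 1 to 2);
# only keys from the split shard 0 are divided between indices 0 and 1.
print(sum(b == 1 for b in before), sum(a == 2 for a in after))
```

Because the mapping depends only on the partition key, all records with the same key go to the same shard, which is what preserves per-key ordering; the flip side is that splitting shards cannot spread the load of a single very hot key.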
Invited talk 3: Theory and Implementation of a Distributed Event Based Platform
This paper presents theory and an implementation of a Distributed Event Based System (DEBS) platform. The theory is based on a simple model that, despite its simplicity, informs the implementation. Many software libraries operate on “data at rest”, i.e., fixed data structures such as arrays and graphs. By contrast, DEBS systems operate on “data in motion”, i.e., data structures that change, in increments, over time. Many software libraries are designed for sequential execution or synchronous parallel execution. By contrast, DEBS systems have multiple agents executing asynchronously. This paper presents theorems that show how programs that operate on data in motion can be constructed from programs that operate on data at rest. The paper presents sufficient conditions that enable programs operating on data at rest to be reconfigured as networks of asynchronous agents operating on data structures that change incrementally as time progresses. The paper provides a brief description of a DEBS platform, called StreamPy, implemented in Python. StreamPy is a partial reification of the theory presented here. StreamPy supports asynchronous computation and the use of libraries designed to operate on data at rest for data in motion. Platforms that support real-time business intelligence have to be integrated with machine learning and AI tools; StreamPy supports integration of several software libraries for data analytics implemented in Python.
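The central idea of reusing a function written for data at rest on data that arrives in increments can be illustrated with asynchronous agents connected by queues. The sketch below is not StreamPy's API; `double_all`, `source`, `map_agent`, `sink`, and `run` are hypothetical, and it relies on one simple sufficient condition chosen for illustration, namely that the function distributes over concatenation, f(a + b) = f(a) + f(b), as element-wise maps do. The paper's own theorems and conditions may be different or more general.

```python
import asyncio
from typing import Callable, List

Chunk = List[float]
DONE = None  # sentinel marking the end of the stream


def double_all(xs: Chunk) -> Chunk:
    """A 'data at rest' function: takes a complete list, returns a list."""
    return [2 * x for x in xs]


async def source(out_q: asyncio.Queue, chunks: List[Chunk]) -> None:
    """Emit the data as a sequence of increments, then the sentinel."""
    for chunk in chunks:
        await out_q.put(chunk)
        await asyncio.sleep(0)  # yield control so other agents interleave
    await out_q.put(DONE)


async def map_agent(f: Callable[[Chunk], Chunk],
                    in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    """Apply f to each increment as it arrives (valid when f(a+b) == f(a)+f(b))."""
    while (chunk := await in_q.get()) is not DONE:
        await out_q.put(f(chunk))
    await out_q.put(DONE)


async def sink(in_q: asyncio.Queue, result: Chunk) -> None:
    """Accumulate the output increments into one list."""
    while (chunk := await in_q.get()) is not DONE:
        result.extend(chunk)


async def run(chunks: List[Chunk]) -> Chunk:
    q1: asyncio.Queue = asyncio.Queue()
    q2: asyncio.Queue = asyncio.Queue()
    result: Chunk = []
    await asyncio.gather(source(q1, chunks),
                         map_agent(double_all, q1, q2),
                         sink(q2, result))
    return result


if __name__ == "__main__":
    chunks = [[1.0, 2.0], [3.0], [4.0, 5.0]]
    streamed = asyncio.run(run(chunks))
    at_rest = double_all([x for c in chunks for x in c])
    print(streamed == at_rest, streamed)  # True [2.0, 4.0, 6.0, 8.0, 10.0]
```

A function such as a running maximum over the whole list does not satisfy this particular condition, so lifting it to data in motion would require an agent that carries state from one increment to the next.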
Coffee talk: From Events to Situations
Yezreel Valley College, Israel
UC Irvine, US
Dr. Opher Etzion is a scholar, author and innovator. A professor of Information Systems and Head of the Technological Empowerment Institute at Yezreel Valley College, Israel, Opher served in various prominent roles during almost two decades at IBM, most recently as Chief Scientist of Event Processing at the IBM Haifa Research Lab. A former Lead Architect of Event Processing Technology in IBM WebSphere, Opher led pioneering projects that shaped the field of event processing in IBM's research division. Opher is also the founding chair of the Event Processing Technical Society (EPTS) and an adjunct professor at the Technion, Israel Institute of Technology, where he was the founding head of the Information Systems Engineering department. He has authored many papers and is also the co-author, with Peter Niblett, of the book Event Processing in Action. Opher has also won several prestigious awards over the years, including the Israel Air-Force Award for Introduction of New Technologies, the IBM Outstanding Innovation Award, and the IBM Corporate Award for groundbreaking work on event processing. He is based in Haifa, Israel.
Ramesh Jain is an entrepreneur, researcher, and educator. He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine, where he is doing research in Social Life Networks, including EventShop and Objective Self. He has been an active member of the professional community, serving in various positions, contributing more than 400 research papers, and coauthoring several books, including textbooks in Machine Vision and Multimedia Computing. He is a Fellow of ACM, IEEE, AAAI, IAPR, and SPIE. Ramesh co-founded several companies, managed them in their initial stages, and then turned them over to professional management. He has also advised major companies in technology areas. Currently he is helping Krumbs build a micro-reporting company for community wellbeing. His research and entrepreneurial interests have been in computer vision, AI, and multimedia. He is now interested in understanding and utilizing heterogeneous streams of data for building smart social systems. Situation recognition and objective self are his current passions.