Note that this file was originally made available at:
http://www.cs.caltech.edu/~adam/isen/wacc/

The Evolution of Internet-Scale Event Notification Services: Past, Present, and Future

Adam Rifkin and Rohit Khare

adam@cs.caltech.edu and rohit@uci.edu

August 10, 1998

 

Abstract

Decomposing distributed work activities into a sequence of events affords greater dynamism than holding each participant to a script. "Loose coupling," an oft-praised quality of the Event-Based Integration (EBI) style of software architecture, emerges because event notifications can decouple sources and sinks in space, time, and topology, and provide a defined interface for the separation of concerns between different handlers. Reflecting on our survey of over a hundred "event-oriented" collaborative systems, this paper outlines an evolutionary explanation of the style’s emergence and offers a characterization of current Event Notification Services (ENS). We speculate that the novel challenges of Internet scale – crossing trust boundaries and leveraging economic network effects – motivate a common infrastructure for event distribution and notification.

1. Introduction

The range of collaborative areas collectively considered "work activities" spans an impressively broad spectrum of human endeavors: from the second-by-second coordination of a rocket liftoff, to the day-by-day deliberations of Congress, to a lifetime accumulation of personal medical records. Computers assist and automate these digital, personal, and organizational processes by modeling reality using inside-the-box metaphors and mechanisms. In general, the sequence of coordination for work activities is determined internally -- as with a Rubik's Cube puzzle, whose solution process is completely encoded within its physical configuration -- or externally, by interfacing with the world outside the model. In the latter case, the grand Cartesian promise of rationalization encourages designers to subdivide discrete quantities of the world's progress into events, ranging from the click of a keystroke during paper-writing, to a paper submission deadline, to the actual weeks of paper-writing themselves. Events occur in the physical world, in other computers, and in other processes on the same computer, and none of these events make a difference until a notification is delivered to interested parties.

As in the koan of a tree falling in a forest without a witness, it is not sufficient for a printer to run out of paper or a user to sit down at her terminal for an event to elicit a reaction. Interested participants must be listening, a notification must be formed, and it must be delivered (at either party's behest -- the key difference between "real" events in the physical world and symbolic ones in the logical world). In shadowing work activities within software structures, Event-Based Integration (EBI) emerged as a popular control integration style precisely because it directly maps onto this view of reality [Barrett et al., 1996]. Events occur, which are transmuted into messages transferred among peers, which are then acted upon by handlers that manage the synchronization and semantic constraints of notification patterns in well-defined ways (for a more complete description of event lifecycles, see [Rosenblum and Wolf, 1997]).

Events are not just a convenient data structure for representing the outside world -- for example, is it any easier to represent a symphony as a set of state transitions than simply choreographing instrument players against a common clock? -- but they are an essential technique for loosening the constraints between the "moving parts" of a distributed work activity. Event notifications and their associated handlers afford a loosening of space and time considerations (for example, "pick up a piece of luggage when it is seen" accommodates a wide variety of conveyor belt speeds and bag sizes); a loosening of semantic meanings (for example, "buy this stock under any of a set of best-buy criteria" permits the decentralized definition of a set of action triggers such as price-to-earnings ratio or performance compared with an index fund); a loosening of participant constraints (for example, "any available authorized broker can purchase" allows a pool of providers to enhance the recipients' qualities of service); and, a loosening of the coordination between subsystems (for example, a decoupled billing module could know to deduct a calculated commission whenever it is notified of a stock-trade against an account). Such mechanisms reduce the cost of system evolution, because with loosely coupled components, "it is possible to integrate new components without affecting the components that implicitly invoke the new components" [Garlan and Notkin, 1991].

Architectural styles based on event notification have been successful in several collaborative-application domains over the years [Carzaniga et al., 1998]. EBI approaches became more persuasive as systems scaled from single hosts to local-area and wide-area networks, and from closed to open sets of participants. The advent of Internet-Scale, though, raises qualitatively different issues than "global scale" alone [Chandy et al., 1998] – specifically those of bridging trust domains across organizational boundaries [Khare and Rifkin, 1997b]. In this new environment, there are reasons to believe a common event-notification infrastructure could emerge for the Internet-Scale coordination of work activities and for the selection of design choices to accommodate.

In the remainder of this paper, we explore our belief that Internet-Scale Event Notification Services (ISENS) are evolving toward a single standard. In Section 2, we sketch the continuous trend of existing collaborative applications toward increasing scales, culminating in the recent emergence of EBI solutions. These event-based systems for software integration offer lessons for constructing event notification services for collaborative applications at the Internet-Scale, the convergence of which we discuss in Section 3. The design tradeoff considerations for the communication services of this future event notification protocol are outlined in Section 4, after which we summarize our recommendations in Section 5.

2. Past: Niche Applications and Widening Range

We began our study of event-oriented systems by assembling a bibliography enumerating over one hundred coordination and collaboration systems that build on the notion of events [Rifkin and Khare, 1998]. Not all of these systems are explicitly event-oriented, nor do they all employ "event notification services" per se, but taken together, they reveal an evolutionary family tree. Within each of five general application themes we see coordination systems develop more explicit event support as they scale up from single hosts to larger networks. It is illustrative to consider each of these application themes in detail since they cover different design regimes. As shown in the table below, their events differ in frequency, distribution, and content; and have different naming models, event transformation hooks, and security concerns.

 

 

 

| | Messaging | Presence | Conferencing | Simulation & Graphics | Software Integration |
|---|---|---|---|---|---|
| Rate of event occurrence | Minutes to days | Minutes | Seconds | Milliseconds | Milliseconds to hours |
| Topology of Distribution | 1-Many (news), 1-Known (mail) | 1-K (buddies) | 1-1, 1-K (lecture), K-K (forum) | 1-1, small K-K groups | 1-K (often broadcasts anonymous) |
| Notification content | Text to multimedia | Lightweight (text) | Text to multimedia | Small and stateful updates | Machine-readable streams |
| Naming | Mailboxes, newsgroups, topics | Users, groups | Users, handles, channels | Participants, simulated elements | Processes, hosts, tools |
| Event transformations | Compression, batch delivery | Batch update, state timeout | Rendering | Aggregation, filtering, dead reckoning | Data type adaptors |
| Security requirements | Authentication, Confidentiality | Privacy | Authentication, Confidentiality | Closed system | Access control |

 

2.1 Messaging

Messaging applications deliver news to humans. The arrival or dispatch of information is one of the most primitive events, and since these notifications have no fixed semantics, their respective handlers are often as minimal as merely announcing "You have new mail." These systems, from interprocess data exchange to network news and Web "push," deal with events that happen on the order of minutes to days – and permit the same order of magnitude latency in event notification. Notices are distributed from a single source to a fixed group of subscribers, or even an unbounded set of listeners. Their contents, while only human-readable, can come in multiple media formats.

Naming strategies for messaging systems must subsume the identification and location of mailboxes, newsgroups, topics, publishers, and subscribers. Message transformation functions include facilities for compressing information on the wire, and making provisions for the batch delivery of messages in streams. Security considerations include authentication of senders, access control for recipients, content integrity, and confidentiality of participants.

The history of messaging architectures began with strategies for information delivery within a single host, ranging from the use of files for data integration to interprocess communication for simple control integration. Often at the very low end, a select or wait packet could be used for coordination. Messaging applications in local and wide area networks took on new topologies based on their potential uses: email delivery person-to-person, mailing lists ranging upward to a known set of recipients, and then net news emerging to distribute content to a large, predominantly unknown set of recipients. This flood of information led to the desire for filtered, high-quality information channels, leading to publish-subscribe pull systems and Web channels for content dispersal regardless of push or pull delivery.

Developers of these systems dealt with many idiosyncratic design issues. Email, for example, opted for reliable delivery at the cost of potentially 5-day delays. Through delivery status notifications, mailing lists learned to check for subscription verification loopbacks. Mailing list digest deliveries changed the content push model from immediate to buffered-and-deferred; applications like PointCast and Web browsers with page content subscriptions similarly changed the content pull model from on-demand to deferred at a particular polling interval. Web "push" also allowed the bundling of content, compression, and caching. Net news offered geographic delivery limits to deal with issues of geographic scale; expiry of content as customized at the information sinks; and a naming model differentiating individual article identifiers from the multiple newsgroups in which an article might appear. Distribution lists offered secrecy of publishers and subscribers for privacy. Multicast publish-subscribe systems such as Tibco’s Information Bus learned the lesson of not moving subscription management to the multicast layer. The dissemination of information for human consumption has led to the emergence of hybrid systems that use satellite paging, metropolitan radio sidebands, and TV vertical blanking intervals for distribution.

2.2 Presence

Presence applications maintain awareness of people and devices. Status updates are perhaps the archetypal example of event notification – a telemetry stream of notification messages that encode state changes (deltas). These systems, monitoring hearts, logins, co-authors, coffeepots, or printers, deal with events that happen on the order of seconds to minutes – and permit the same order of magnitude latency in event notification. Updates are typically distributed from a single source to a fixed set of sinks, or from multiple sources to a single sink. Their contents can be human- or machine-readable, but are typically quite small.

The infrastructure for communicating presence information is typically designed for many-to-many distribution, although in practice this is often used for one-to-a-known-set dissemination. Naming must deal with issues such as extracting users from directories and offering symbolic names for groups of similarly minded individuals. Transformation functions include batch updates and state timeouts. Privacy requires access control techniques that understand who is permitted to watch.

The history of presence systems starts within a host, in which multiuser operating systems logged and reported current logins that users could track using commands like who(1) and utmp. As hosts were networked, commands such as finger were added to return last-seen, last-read, and plan information for a given user at a given host. Now, in the wider area, instant messaging applications such as WhoDP use switchboards to maintain a directory of clients, with the actual presence traffic redirected peer-to-peer. Such solutions scale well, as does the AOL Instant Messenger, which uses a centralized state server to which clients around the globe can hold open connections.

2.3 Conferencing

Conferencing applications mediate conversations between individuals and groups. The primitive events of a dialogue are the exchange of speech fragments, but like messaging applications, without well-defined semantics or event handlers. These systems, from annunciators to auditoriums to videoconferencing, deal with events that happen on the order of seconds -- and require the same order of magnitude latency in event notification. Fragments are typically delivered between a single source and sink, to a known group or to a common "channel" with known participants (K-K). Their contents, while only human-readable, could be delivered in one of many media formats.

Conferencing systems usually build on a presence system for initiation, and have additional naming needs for user addresses, their corresponding email addresses, their handles in conference rooms, and the channels of operation that they inhabit. Security requirements include speaker authentication, confidentiality through content encryption, and privacy of audience enumeration.

The history of conferencing applications begins with shared memory within a single host, so users could write(1) to the terminal screens of other logged-in users. Next came the write system command for system administrators to broadcast system messages, and private write facilities for talking to a particular user. The original talk(1) used write in both directions, introducing the concept of a session requiring active, mutual consent. Berkeley Unix widened this functionality with networked talk, which used local write on either endpoint; later, ntalk was more portable but still relied on a binary request header, and advanced clients such as ytalk allowed echoing among three or more parties. In all of these command systems, security was provided by per-terminal allow/deny settings with minimal authentication.

Zephyr [DellaFera et al., 1988] added the security model of Kerberos, but more importantly it also added a one-way messagegram facility that could be used in low-latency installations as a pseudochat. Although for conferencing Zephyr worked well up to the wide area network scale, its reliance on Kerberos limited it to single trust domains and manual interdomain authorization. Still, it qualified as a full event notification service, allowing one-to-one and one-to-a-known-set distribution, despite a lack of session semantics and content support only for text. iFlame added forums to this application by employing a forum location server.

In contrast to these point-to-point systems, Internet Relay Chat (IRC) uses a fan-out spanning tree, with complete network information about the set of servers. IRC has a weak level of confidentiality of forum content, using passwords and channel keys. Robots can automatically trigger events; for example, invitations can be delivered as private messages or public notices.

For full streaming videoconferencing, Netshow and the MBONE vc application handle group multimedia using lossy content channels. These applications work well for the scale of hundreds of viewers in physically networked proximity to minimize lossiness.

2.4 Simulation/Graphics

Simulation and graphical interface applications maintain strict correspondence to external reality through use of a state change model based on observed phenomena. Physical state changes are the classical definition of an event, identifiable in an instant in time. These systems, simulating battles, animations, or user interactions, deal with events that happen on the order of milliseconds – and require the same order of magnitude latency in event notification. Updates are typically distributed between source-sink pairs or small fixed groups. Unlike presence notifications, though, these are always machine-readable deltas with strongly defined semantics for the coalescence of event streams through aggregating, filtering, batching, and even replacing lost notifications through predictions.

Modeling reality began with hosts using graphical user interface event queues, with hardware devices both delivering interrupts and being scanned, foreshadowing options of polling and invoking in event-oriented communications. Event queues can select events by criteria such as bounds and type, and events could be updated and coalesced in the queue. Later, discrete event simulations made use of these queuing and coalescing principles to model reality. As these discrete event simulations became distributed among many hosts, the lack of a shared clock led to the development of logical clocks for synchronization among processes. Developers of these distributed simulations also employed conservative transactional approaches and optimistic approaches with rollbacks to prevent event orderings from creating spurious outcomes. Distributed Interactive Simulation (DIS) [IEEE, 1995] allows the modeling of physical and synthetic players in a wargame, with myriad specific update message formats. These players use the dead reckoning technique to predict any lost updates not received by other players in the system presumed to be out of communication.
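As an illustration of dead reckoning only, the following Python sketch extrapolates a player's position from its last received update under an assumed constant velocity. The StateUpdate fields and the dead_reckon function are hypothetical names chosen for this example; they are not taken from the DIS message formats.

```python
from dataclasses import dataclass


@dataclass
class StateUpdate:
    """Last state a player reported (field names are illustrative assumptions)."""
    timestamp: float  # seconds
    position: tuple   # (x, y) in metres
    velocity: tuple   # (vx, vy) in metres per second


def dead_reckon(last: StateUpdate, now: float) -> tuple:
    """Extrapolate a position from the last update, assuming constant velocity."""
    dt = now - last.timestamp
    return tuple(p + v * dt for p, v in zip(last.position, last.velocity))


# A player last reported at t=10s; no update has arrived by t=12s.
last_seen = StateUpdate(timestamp=10.0, position=(0.0, 100.0), velocity=(5.0, -2.0))
estimate = dead_reckon(last_seen, now=12.0)
```

A receiver would use such an estimate in place of the missing notification and correct it when the next real update arrives.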

2.5 Tool Integration

Tool integration applications compose software components by orchestrating tool invocations and data flow between participants. Encapsulating tools yields process events at a higher level of abstraction. These systems, passing control and data between compilers, editors, stock-traders and other tools on transition events, deal with events ranging from seconds to hours – a suspiciously broad range which betrays the diversity of systems under this theme. Process events are typically distributed to a set of interested consumers by the system; the source is unaware of that set. Their contents, while machine-readable, span an equally broad range of message sizes and modes.

Solutions began with local-host window integration using AppleEvents or REXX as event-oriented user interface scripting languages. Later, many other high level languages incorporated event notifications as assembly language interrupts, unstructured language if-then-goto statements, and structured language exceptions. Language toolkits such as ISIS added functionality across a network such as data replication and node fault tolerance using virtual synchrony [Birman and Joseph, 1987]. Middleware mechanisms like the Tibco Information Bus provided message routing to groups based on content [Skeen, 1992]. But perhaps the most-sought goal of software integration has been the networked load balancing of tools for improved performance.

In the 1980s, distributed filesystems were not well suited to the performance needs of integrated software engineering tool environments. For example, consider this observation about NFS [Kazar, 1988]:

"A program such as a distributed make program (a Unix system building tool) would be very difficult to implement under NFS, as the system would have to wait 3 second (30 seconds if a directory change is involved), for results to propagate from one machine to another. Such delays could easily make up for any improvement gained from concurrent compilation."

As networked filesystem performance improved, a number of efforts arose in industry and academia to facilitate the management of loosely coupled CASE tools. Such "control integration" [Wasserman, 1990] of software decoupled the concerns for data, platform, and process integration, to focus on the issue of interprocess communication among different local-area applications. Two approaches emerged to enable software integration capabilities such as composition, cooperation, collaboration, and coordination: message-based and object-based [Arnold and Memmi, 1992].

In the FIELD system [Reiss, 1990], client tools register interest in message-expressions, and a central message server as a separate process receives messages from tools and anonymously broadcasts the updates to other interested tools, forwarded in the order received. Although FIELD has no exception handling for access control and delivery constraint violations, it has a Policy tool that can intercept and replace messages. The metaphor of "tools advertising operations" informed the broadcast message server in products such as Hewlett-Packard’s SoftBench [Cagan, 1990] and Sun Microsystems’ ToolTalk [Julienne and Holtz, 1994]. Yeast provided an event-action specification tool for systems that used this kind of central message repository to trigger software tools [Krishnamurthy and Rosenblum, 1995].
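The register-and-broadcast pattern described for FIELD can be sketched in a few lines of Python. This is a toy model under stated assumptions, not FIELD's actual interface: regular expressions stand in for message-expressions, and the MessageServer class, tool names, and message strings are all hypothetical.

```python
import re
from collections import defaultdict


class MessageServer:
    """Central broadcaster: tools register patterns, senders stay anonymous."""

    def __init__(self):
        self._registrations = []          # (compiled pattern, tool) in registration order
        self.inboxes = defaultdict(list)  # tool name -> messages received, in order

    def register(self, tool, pattern):
        # Assumption: a regular expression approximates a FIELD message-expression.
        self._registrations.append((re.compile(pattern), tool))

    def send(self, message):
        # Forward to each interested tool once; the sender's identity is not passed on.
        delivered = set()
        for pattern, tool in self._registrations:
            if tool not in delivered and pattern.match(message):
                self.inboxes[tool].append(message)
                delivered.add(tool)


server = MessageServer()
server.register("editor", r"COMPILE_ERROR .*")
server.register("debugger", r"BREAKPOINT .*")
server.register("logger", r".*")

server.send("COMPILE_ERROR main.c 42")
server.send("BREAKPOINT main.c 17")
```

Because every message passes through one server, each tool sees the messages it matches in the order the server received them.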

Another approach to event-oriented integration promoted a central connector to the function of broadcasting messages. The POLYLITH Software Bus [Purtilo, 1994] redirects and repackages intermodule calls per bus wiring, and tools bind their input and output ports through the bus using these wirings. Ports can be identified by name, allowing retargeting, although there is no explicit support for grouping of recipients. Simple filtering of simple, structured, and pointer message types allowed some congestion control, but the lack of support for ill-formed messages and incompatible connections confined POLYLITH’s use, like the use of FIELD-based tools, to closed, local-area systems. Both POLYLITH and the FIELD-based tools allowed the wrapping of existing tools, through a module interconnection language or encapsulating components, respectively; by the mid 1990s, companies looked to CORBA for object-based enterprise integration as well.

As the variety of approaches to Event-Based Integration (EBI) proliferated, a framework was developed to understand the tradeoffs between these different systems [Barrett et al., 1996]. This framework identified the components of EBI: registrars that keep track of participants, routers that transmit data, message transforming functions that modify data and routing, and delivery constraints that control transmission arrivals.
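One way to picture how these four components fit together is the Python sketch below. It is our own simplification, not the framework's notation: the EventBus class and its attribute names are hypothetical, and delivery constraints are reduced to simple allow/deny predicates, which is an assumption made for brevity.

```python
class EventBus:
    """Toy composition of the four EBI components (all names are illustrative)."""

    def __init__(self):
        self.registrar = {}    # registrar: participant name -> handler callable
        self.transforms = []   # message transforming functions, applied in order
        self.constraints = []  # delivery constraints: (name, message) -> bool

    def register(self, name, handler):
        self.registrar[name] = handler

    def route(self, message):
        # Router: transform the message, then deliver wherever every constraint allows.
        for transform in self.transforms:
            message = transform(message)
        for name, handler in self.registrar.items():
            if all(allowed(name, message) for allowed in self.constraints):
                handler(message)


received = {"billing": [], "display": []}
bus = EventBus()
bus.register("billing", received["billing"].append)
bus.register("display", received["display"].append)
bus.transforms.append(str.upper)
# Hypothetical constraint: only the billing module may see trade notifications.
bus.constraints.append(lambda name, msg: name == "billing" or not msg.startswith("TRADE"))

bus.route("trade acct-7 100 shares")
bus.route("quote xyz 12.50")
```

The source of each message never names its recipients; the registrar, the transforms, and the constraints decide who receives what.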

According to the framework, existing approaches to event-based software integration fall under five axes. First is the method of communication: either the point-to-point of remote procedure call systems (ISIS), the broadcast of implicit invocation systems (FIELD), or the software bus in which input and output are bound to the channels of an abstract bus (POLYLITH). Second is the expressiveness of module interaction descriptions: nothing but source code (REXX), procedure signatures (CORBA interface definition language), interfaces plus a programming language for reacting to messages (SoftBench), specification of connectors (POLYLITH), or the configuration specification of containers (the EBI framework). Third is the intrusiveness of module interaction descriptions: inserted into modules themselves (CORBA skeletons and stubs), wrapping instead of modifying module source (SoftBench), or hybrid using wrappers to modify source (ToolTalk). Fourth is behavior dynamism: static specification of interactions (POLYLITH), dynamic specification of interactions (SoftBench), or both (ToolTalk). Fifth is the issue of naming: abstract (no one knows anyone else’s name or location, such as FIELD), aliasing (participants have pointers but no knowledge of names or locations, such as ISIS), naming (participants know names but not necessarily locations, such as CORBA), or primitive (participants know names and locations, such as the Internet’s Domain Name System).

In this section, we explored five application niches EBI has occupied – all of them adapted to loosely-coupled systems – and saw how Event Notification Services (ENS) expanded their range, from hosts to local and wide area networks.

3. Present: Wide-Area Event Notification Services

To effectively shadow work activities within software models according to an event-based architectural style, application designers must make informed choices between ENS. Unfortunately, architectures, applications, and services have been conflated throughout the literature, and further obscured by marketing in industry. In this section, we present some fundamental ENS design decisions in an attempt to refactor the definition of composite descriptions like "push" and "pull", or "synchronous" and "asynchronous." We also discuss related work characterizing event-based architectures.

3.1 Latency

The frequency of event occurrence is a direct constraint on an application’s choice of ENS. In terms of the event lifecycle in [Rosenblum and Wolf, 1997], designers need to characterize what we call sampling latency, bounding the delay from occurrence until observation, and delivery latency, bounding the delay from observation until notification. Both of those parameters are typically the same order of magnitude: it neither makes sense to simulate a dogfight over email, nor to deliver the daily comic as a flood of instant messages. At one end of the spectrum, Simple Network Management Protocol (SNMP) trap messages provide immediate sampling and low latency delivery of infrequent alarms, while Network News Transfer Protocol (NNTP) postings could take days or weeks to propagate across the Internet.

Comparing sampling latency to the frequency of event occurrence determines how many events might be reported per notification message. Sampling latency interacts closely with who initiates communication, the notification source or sink (Section 3.2). For sink-initiated transfers, this appears as a "poll-interval," but source-initiated transfers may also limit the notification rate. For example, rather than sending a TCP segment for each keystroke in an interactive session, the Nagle algorithm holds small segments for up to one network round-trip time so that additional keystrokes can be coalesced, reducing congestion.

Achieving low sampling latencies – less than delivery latency – can require source-initiation. Low event frequencies also rule out sink-initiation as inefficient. Buffering can easily increase sampling latency in either case, though, by reporting more events per notification. Source-initiated mailing lists, for example, offer "digest" modes that batch daily or weekly traffic. Conversely, event logs can store notifications for sinks to poll later.
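The effect of buffering on events per notification can be made concrete with a small Python sketch of a "digest" buffer. The DigestBuffer class, its interval parameter, and the event labels are hypothetical, and timestamps are passed in explicitly rather than read from a clock so that the behavior is easy to trace.

```python
class DigestBuffer:
    """Batch events and release them as one notification per sampling interval."""

    def __init__(self, interval):
        self.interval = interval  # assumed sampling latency bound, in seconds
        self._pending = []
        self._last_flush = 0.0

    def observe(self, event, now):
        """Record an event; return a batch if the interval has elapsed, else None."""
        self._pending.append(event)
        if now - self._last_flush >= self.interval:
            batch, self._pending = self._pending, []
            self._last_flush = now
            return batch
        return None


digest = DigestBuffer(interval=60.0)
notifications = []
for t, event in [(10.0, "a"), (30.0, "b"), (65.0, "c"), (70.0, "d"), (130.0, "e")]:
    batch = digest.observe(event, now=t)
    if batch is not None:
        notifications.append(batch)
```

In this run five events are reported in two notifications; widening the interval reports more events per notification at the cost of a longer wait before each is seen.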

Delivery latency governs how long a notification might remain "in transit." It could be explicitly specified in a message (an Expiry: header), implicitly specified by an event distributor (an email delivery status notification timeout), or entirely dependent on system and network load (as with Internet Relay Chat). Network transport performance puts a lower bound on delivery latency, and the time-to-live field allows a single IP packet up to 255 seconds or hops in transit. Low delivery latencies generally restrict designers to end-to-end distribution rather than store-and-forward ENS (Section 3.3).

Latency is not the only source of delay in event observation and notification. Bandwidth definitely affects those rates, but remains secondary at global scale and especially in the Internet for small messages. Subscription management is another bottleneck, governing the setup and teardown delays for expressing interest in event occurrences.

3.2 Initiation

One of the greatest sources of confusion in describing ENS is which end initiates notification delivery. Either the source interrupts or the sink polls, but these terms are conflated with "push" (often implemented by polling), "asynchronous" (assuming an interrupt can be sent at any time), and "real time" (assuming that an interrupt is sent with low sampling latency). Initiation is also conflated with connection-establishment; in fact, some protocols can open a TCP connection and then swap roles (for example, the SMTP TURN command). Some systems use depots to bridge both initiation modes within a single application. For example, email notification typically proceeds in two phases, SMTP delivery to a mailbox with POP/IMAP polling from there to the client.
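A depot that bridges the two initiation modes can be sketched as a simple mailbox in Python. The Depot class and its deliver and poll methods are hypothetical names; they model only the two-phase pattern, not the SMTP, POP, or IMAP protocols themselves.

```python
from collections import deque


class Depot:
    """Mailbox that accepts source-initiated deliveries and serves sink-initiated polls."""

    def __init__(self):
        self._queue = deque()

    def deliver(self, notification):
        # Source-initiated leg: the sender pushes whenever an event occurs.
        self._queue.append(notification)

    def poll(self):
        # Sink-initiated leg: the client drains whatever has accumulated since its last poll.
        drained = list(self._queue)
        self._queue.clear()
        return drained


mailbox = Depot()
mailbox.deliver("message 1")
mailbox.deliver("message 2")
first_poll = mailbox.poll()
second_poll = mailbox.poll()
```

The source never waits for the sink, and the sink chooses its own polling interval, which is what lets the two halves run at different rates.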

Interrupt-driven delivery is also often assumed not to require acknowledgment; correspondingly, transports that do not require acknowledgment are often assumed to suit only source-initiation. In fact, DNS requests regular naming updates over UDP, and one-way Zephyr messagegrams are sent over TCP. Multicast, however, does not permit acknowledgment scalably, and is thus only suited for source-initiated announcements.

As mentioned in Section 3.1, the polling interval establishes sampling latency, but ENS definitions vary on whether to use it as an upper or lower bound.

An efficient initiation decision depends on the relative number of sources and sinks (the distribution topology of Section 3.3). Namely, the smaller set should initiate. If one or both sets are arbitrarily large, however, it depends whether there is a simplifying assumption at hand. To post to N newsreaders, there are smaller, enumerable sets of news servers in the network, or perhaps a single multicast channel -- and source-initiation remains viable. For N people to become aware of one’s presence, though, protocols like finger must be sink-initiated since there are no blackboards to rendezvous at for that application.

3.3 Distribution

Tracking all the point As and point Bs in the network is the fundamental job of an ENS; this is also known as routing or distribution. Lists of interested parties must be maintained at either endpoint, or in a separate queue, channel, or relay. This implies end-to-end delivery, because the initiator knows the set of recipients, or store-and-forward delivery, because only the store knows to whom to forward, respectively.

If there is an intermediator, there is a tradeoff between single-hop – such as FIELD’s Msg server, the POLYLITH Software Bus, or AOL Instant Messenger’s centralized presence database – and creating a spanning tree of mail, news, or chat servers. A mesh topology suggests the choice between several algorithms: shortest-path, flood-fill, and with or without reliable handoff.

The naming model touches on the issue of scale; namely, can we enumerate the sets of named endpoints? Are they from a closed or open group? Is there an inherently-group address available, a la wireless range, multicast hops, or NNTP geographic scoping?

Developers should be very careful using transport as a group address, as in the first two cases listed above. First, there is the question of whether these group addresses will scale. Further, there is a good chance that the hard problems of distribution get pushed to another layer at which the developer may have no fine-grained control.

3.4 Delivery Constraints

The suite of strategies for delivery constraints all deal with the question: upon receiving (or not receiving) a notification, what steps can/must the ENS take before passing it up?

The ENS is responsible for ensuring certain guarantees before delivering a notification to the handler above. The question to ask is when the protocol blocks (or appears to block) at the protocol layer, which is a different layer than blocking or nonblocking within the application – for example, when notifications are specialized into request and response commands.

Intermediation and topology fall out as consequences of the choices for delivery constraints. Issues such as event notification subscriptions – for example, who actually maintains the list – should also be considered; with first-class connectors and subscription-maintainers as components, the system is more resilient.

Over the Internet, the channels themselves are nonblocking (modulo the practicalities of buffer size), so the protocol should only withhold delivery to ensure:

If a packet is lost (for example, after a timer expires), we can

In addition, we may also want to enforce security guarantees:

Consider the performance tradeoffs of such security checks; different choices seem based on the number of channels and the number of principals. The revocation timing model also needs to be considered.

3.5 Security

In the conventional security world, generations of eunuchs worked to define the security perimeter of an application as narrowly as possible and to make mechanical trust decisions. Security of events needs to be implemented on a per-event basis as opposed to a per-channel basis. We need to build confidence that the client and server computers are sending events with integrity and privacy, and that the event generator was authenticated to generate and propagate events, so that the event handlers can trust the meaning and intent of those events. As we automate event observation and notification, we can only automate trust decisions that we can ultimately audit and/or roll back. Security should not be an afterthought, but it is orthogonal to other issues of design.
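One way to picture per-event (rather than per-channel) security is to attach a message authentication code to every notification, so that a handler can check each event regardless of the channels and relays it crossed. The sketch below assumes a secret key already shared between generator and handler and uses HMAC-SHA256 from the Python standard library; the envelope layout and function names are hypothetical. It addresses integrity and origin only, not privacy, key distribution, or revocation.

```python
import hashlib
import hmac
import json
from typing import Dict


def sign_event(event: Dict[str, str], key: bytes) -> Dict[str, str]:
    """Wrap an event in an envelope carrying its own authentication code."""
    body = json.dumps(event, sort_keys=True)
    mac = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}


def verify_event(envelope: Dict[str, str], key: bytes) -> bool:
    """Check one envelope on its own, independent of the channel it used."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, envelope["mac"])
```

Because the check travels with the event, an intermediary can store and forward the envelope without being trusted to preserve it.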

3.6 Transport

Transport is often a confused issue among application developers. Typical errors include confusing transport with latency ("we can use UDP for efficiency without worrying about reliability") and using transport to do group membership ("multicast will by its very nature get us an efficient, reliable broadcast"). We believe transport decisions should not alter the underlying semantics of a system’s behaviors. Trying to implement reliability atop UDP, one often reinvents TCP; multicast only makes sense on a shared physical medium such as a dedicated local area or wide area network. Retry policies should be based on a combination of latency choice and transport choice.

Furthermore, issues such as bandwidth are only subsidiary at the Internet scale. In fact, ENS at the Internet scale are less about adapting to ever-widening geographic, temporal, and numeric scales than about adapting to the trust boundaries uniquely defining "Internet-Scale." As we will explore in Section 4, new tensions are surfacing in bridging dissimilar ontology, security, mobility, and transport models, and these tensions will affect event-oriented software architecture at the Internet scale.

3.7 Architectural Impact

As we’ve seen in Section 3, upon separating out Event Notification Services, we can characterize them with the goal of helping developers design and evaluate ENS. To that end, architecture trumps methodology: especially at the Internet scale, communication patterns govern success more than particular techniques or algorithms.

As a result, we need principles for selecting among ENS. We are in the process of developing an event-oriented software architecture model, inspired by C2 and Representational State Transfer. C2’s model of connectors, components, and notifications can bridge a range of current proposals, hint at design rules for verifying event-based development, reuse a common ENS at varying levels of abstraction, and perhaps even offer a lattice of available ENSs. Representational State Transfer’s messages separate the artifact (wire) and ideal (remote) form and allow dynamism and scale through statelessness.

The C2 Style [Taylor et al., 1996] in particular assumes a reliable network. Components respond to notifications and emit requests asynchronously. Connectors coordinate all communication, and are first-class objects that function as routers, broadcasters, filters, and prioritizers. Messages are names with typed parameters, and messages come in two flavors: notifications of state changes flow "down," and requests for action flow "up".
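The shape of this style can be sketched in a few lines. The class names below are ours, the connector is reduced to a plain broadcaster, and calls are synchronous for brevity, whereas C2 components communicate asynchronously; the sketch shows only the two message directions.

```python
from typing import Dict, List, Tuple

# A message is a name plus named parameters.
Message = Tuple[str, Dict[str, object]]


class Component:
    """Records what it receives; a real component would react to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Tuple[str, str]] = []

    def handle_notification(self, message: Message) -> None:
        self.log.append(("notification", message[0]))

    def handle_request(self, message: Message) -> None:
        self.log.append(("request", message[0]))


class Connector:
    """First-class connector acting as a simple broadcaster."""

    def __init__(self) -> None:
        self.above: List[Component] = []
        self.below: List[Component] = []

    def notify_down(self, message: Message) -> None:
        # Notifications of state changes flow "down".
        for component in self.below:
            component.handle_notification(message)

    def request_up(self, message: Message) -> None:
        # Requests for action flow "up".
        for component in self.above:
            component.handle_request(message)
```

Routing, filtering, or prioritizing would be added inside the connector, leaving the components unchanged.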

Upon separating out ENS, we can characterize them, not by commercial terms like "push" and "real-time," but by casting them in terms of sampling interval; delivery latency; reliability requirements; blocking semantics; whether a polling sink or a notifying source initiates communication; and whether delivery is direct or intermediated. These issues consequently suggest event-based application development issues such as explicit support for event queues, choice of the transport, and the naming and content models, as offered by the Rosenblum-Wolf framework for event observation and notification [Rosenblum and Wolf, 1997].
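These dimensions can be collected into a small record, which makes it possible to compare services without resorting to marketing labels. The field names, units, and helper function below are assumptions of this sketch, not terminology from the Rosenblum-Wolf framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Initiator(Enum):
    SOURCE = "notifying source"
    SINK = "polling sink"


class Delivery(Enum):
    DIRECT = "direct"
    INTERMEDIATED = "intermediated"


@dataclass(frozen=True)
class ENSProfile:
    name: str
    initiator: Initiator
    delivery: Delivery
    reliable: bool
    blocking: bool
    sampling_interval_s: Optional[float] = None  # None: not sampled
    max_latency_s: Optional[float] = None        # None: no stated bound


def same_coupling(a: ENSProfile, b: ENSProfile) -> bool:
    """True when two services agree on who initiates and how delivery is routed."""
    return (a.initiator, a.delivery) == (b.initiator, b.delivery)
```

Two services that both advertise "push" may still differ on every field of such a profile.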

This framework defines the lifecycle of an event:

  1. Determination of which events shall be observable
  2. Expression of interest in an event or pattern
  3. Occurrence of an event
  4. Observation
  5. Relation of an event to a pattern of interest
  6. Notification to an application
  7. Receipt by the application
  8. Response of the application
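The eight steps above can be traced through a toy notification service. This is a minimal in-process sketch: the class name, the dictionary representation of events, and the use of predicates as patterns are all assumptions made for illustration, and steps 6 through 8 collapse into a single function call.

```python
from typing import Callable, Dict, List, Set, Tuple

Event = Dict[str, str]
Pattern = Callable[[Event], bool]
Receiver = Callable[[Event], None]


class TinyENS:
    def __init__(self, observable_kinds: Set[str]) -> None:
        # Step 1: determine which events shall be observable.
        self.observable_kinds = observable_kinds
        self.interests: List[Tuple[Pattern, Receiver]] = []

    def express_interest(self, pattern: Pattern, receiver: Receiver) -> None:
        # Step 2: expression of interest in an event or pattern.
        self.interests.append((pattern, receiver))

    def occur(self, event: Event) -> int:
        # Step 3: an event occurs. Step 4: it is observed only if its
        # kind was declared observable.
        if event["kind"] not in self.observable_kinds:
            return 0
        delivered = 0
        for pattern, receiver in self.interests:
            # Step 5: relate the event to a pattern of interest.
            if pattern(event):
                # Steps 6-8: notify the application, which receives
                # the event and responds inside the receiver callback.
                receiver(event)
                delivered += 1
        return delivered
```

An event whose kind was never declared observable is dropped at step 4, before any pattern is consulted.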

Given that lifecycle, the framework has seven components:

  1. Object model of senders and receivers
  2. Event model characterizes event phenomena
  3. Naming model of references to items of interest
  4. Observation model of identifiable patterns
  5. Time model of events causing notifications
  6. Notification model of mechanisms to express interest and receive them
  7. Resource model for allocation and accounting

A model such as Rosenblum and Wolf’s could lead to the development of a generic Internet notification interface, as we discuss in Section 4.

4. Future: Toward a Generic Internet Notification Interface

The rise of explicitly event-oriented application architectures speaks to the value of loose coupling as systems scaled to larger networks. Wide Area Networks increase the geographic reach and number of participants to the point where a separate event notification infrastructure could best handle the concerns of latency and scale – by decoupling the delivery and dispatch processes, respectively. Internet-Scale, though, is precisely about adding on the unique concerns of interorganizational internetworking.

4.1 Defining "Internet-Scale"

Scaling information systems across trust boundaries requires more than common security technology and authorization models. These are also ontological boundaries, where opposite participants may mean different things with different words – or the same one [Khare and Rifkin, 1997a]. This applies not only to the semantics of an event notification, but the notification service as well, requiring interoperability. Finally, cooperative technology-adoption games between organizations illuminate the network effects accruing to a common notification service – unification only an evolvable notification service can leverage.

Evolvable systems have three features: resilience in the presence of partial information and/or partial failure, simplicity in being weak enough to subsume extant protocols, and extensibility in a decentralized fashion. These features suggest requirements for application scalability at the Internet scale: evolvability as just described; decentralization, meaning no central registry or router like FIELD’s central message server or a CORBA object request broker; reliability with respect to both latency variations and exceptional program behavior; security that is not session-oriented; naming with hooks for using URIs for components and connectors, with ramifications for location, mobility, and persistence; and concurrency control over real-time constraints and remote synchronization.

We note that these issues are not new to the networking culture. VMTP [Cheriton and Williamson, 1989] pointed out the deficiencies in transport protocols as naming, performance, and functionality. Performance needs included efficient connection setup and tear down for streaming, overflows for windows greater than a maximum burst size, selective transmission, rate-based flow, and an economic model for message passing. Functionality needs included reliable broadcast, a connectionless datagram service, message-level security, priority constraints, and mobility through multi-homed hosts. These were important to the wide-area network then, and they are important to the Internet now, in dealing with the distribution of event notifications across time, space, and organizations.

Security, protocol style, reliability, and economics may seem like Zen issues, like the spectrum in software engineering from white box to grey box to black box. After all, EBI has been successful recently because loose coupling is a hallmark of Internet-scale development, by permitting a dynamic communication topology, and by separating the engineering tradeoffs for latency and efficiency. EBI’s success at the wide area network scale may not translate to success at the Internet Scale, because of the trust boundaries that must be crossed when information is passed between different organizations [quote Vint Cerf?]. The myriad applications generating and ending event notification content across myriad organizations require political decentralization so that applications may evolve without the explicit tweaking performed by today’s system administrators.

Furthermore, crossing trust domains raises the consequences to ontologies as communities need to communicate using federated directories [Khare and Rifkin, 1997a]. ISENS must accommodate diversity at this level, so federation services will be necessary: for example, Internet Scale email or chat must cope with lots of ways to identify humans. Ontologies also drive the choice of document format [Khare and Rifkin, 1998]; for example, Keryx developed its own transfer syntax to accommodate data representation and filtering [Brandt and Kristensen, 1997].

Internet Scale also implies understanding and dealing with the economic ramifications of ENSs. As we have seen with the battle between closed and open systems, network effects drive development decisions and subsequent application successes and failures [cite Whitehead?]. This dovetails with the effects of the culture of Internet Scale, which prefers that formats be human readable and hackable, and that support be provided for whichever programming styles, languages, and memes the developers desire in a melting pot of decentralized interoperability.

Ultimately, at the Internet Scale, Event Notification Services have greater consequence to interoperability and evolvability than do Event-Based Integrations, because EBI are methodology and ENS are architecture. Therefore, it would be useful to define an EN bearer service for use by developers to guide their decisions in choosing coordination tradeoffs among a bundle of EN bearer services. A generic Internet Scale event notification protocol will accommodate this mosaic of design services.

The future appears to favor a generic interface for notifications at the Internet-Scale, with built-in evolvability features to expand and adapt to new collaborative systems' demands. However, the true challenges of the future are trust, protocol style, and economics, with ramifications for issues such as ontology, security, mobility, and reliability. A common, interoperable standard gains value largely in proportion to the other developers and applications using it.

4.2 Evaluation of Contenders

Presently there are several protocol proposals, any of which might evolve into a standard for Internet-scale event notifications. In the area of presence and chat, there are RendezVous Protocol (RVP), Simple General Awareness Protocol (SGAP), and WhoDP. In the area of tool integration, there are Simple Workflow Access Protocol (SWAP), Internet Printing Protocol (IPP), and Web Distributed Authoring and Versioning (WebDAV). In the area of generic notification services, there are Basic Lightweight Internet Protocol (BLIP) and Generic Event Notification Architecture (GENA). In addition, efforts making use of Java include Keryx and the Jini distributed events effort.

Many of these proposals are missing from their design several of the issues we identified in Section 3. For example, we believe that connectors should be first-class objects, so they do not hide the details of their subscription and queueing policies and implementations. We also believe that transport is an engineering decision, not a semantic one; we are wary of efforts that promote the use of "reliable" datagrams and multicast, because such efforts are prone to reinvent TCP and risk ACK implosion, respectively. And we believe that security at the message level should not be an afterthought for a globally deployed infrastructure. Too often protocol designers punt the issue of security to session-oriented semantics; this will not scale to the Internet.

4.3 Using HTTP/1.1 as a Generic Internet Notification Interface

Most current contenders extend existing messaging protocols or propose wholly new ones. We recommend a layered Internet-Scale ENS wire protocol for notifications, perhaps as an asynchronous version of HTTP with hooks for notification management, interfaces for advertising and subscribing, policies for information queue management, generic notification typing, notification trapping based on methods and resources, link maintenance, and capabilities for new-content push.

HTTP/1.1 has served well as a performant protocol for globally distributing hypermedia [Fielding et al., 1997]. We believe as is, HTTP/1.1 can accommodate many of the presence, directory, authoring, versioning, and printing event notification scenarios for the Web. These include scenarios for partially connected clients, partially connected servers, order of magnitude of clients or servers, order of magnitude of topics or messages or message sizes, lossiness, and message delivery constraints.

We believe that all that is needed is an agreed-on client and server interpretation of the existing HTTP/1.1 syntax and the addition of methods for subscribing and unsubscribing. This minimal approach can accommodate all of the aforementioned event notification scenarios with the Web’s existing infrastructure, eliminating the need for a specialized "Generic Real-time Internet Protocol for Events" for event-triggered polling in immediate-time and time-triggered interrupting in deferred-time.
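To suggest how small such an addition could be, the sketch below extends Python's standard http.server with two extra request methods. The method names SUBSCRIBE and UNSUBSCRIBE, the Callback header, and the in-memory subscription table are assumptions chosen for illustration; no agreed-on interpretation is implied, and delivery of the notifications themselves is left out.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional, Set

# Topic (the request path) -> callback URLs; in-memory only, for illustration.
SUBSCRIPTIONS: Dict[str, Set[str]] = {}


def apply_method(method: str, topic: str, callback: Optional[str]) -> int:
    """Update the subscription table and return an HTTP status code."""
    if not callback:
        return 400  # no Callback header: nothing to (un)register
    if method == "SUBSCRIBE":
        SUBSCRIPTIONS.setdefault(topic, set()).add(callback)
        return 200
    if method == "UNSUBSCRIBE":
        SUBSCRIPTIONS.get(topic, set()).discard(callback)
        return 200
    return 501  # method not implemented by this sketch


class NotificationHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def _handle(self) -> None:
        status = apply_method(self.command, self.path,
                              self.headers.get("Callback"))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.send_header("Connection", "close")
        self.end_headers()

    # http.server dispatches each request to the method named do_<METHOD>.
    do_SUBSCRIBE = _handle
    do_UNSUBSCRIBE = _handle
```

The handler could be served with HTTPServer(("127.0.0.1", 8080), NotificationHandler).serve_forever(), where the address and port are arbitrary; existing methods such as GET would continue to be handled by whatever the server already provides.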

5. Conclusions

This paper sought an evolutionary explanation of the popularity of collaborative applications coordinating through event-based implicit invocation, its current challenges, and likely next steps. From the past, we discerned a common trend of widening range -- from single hosts to local-area networks to wide-area networks -- within each of five application themes employing events as notification messages that trigger commands: information distribution, presence, conferencing and instant messaging, simulation and graphics, and interapplication integration. Academic literature in the area can be reinterpreted to separate applications using Event Based Integration from underlying Event Notification Services.

Upon separating out ENS, we can characterize them, not by commercial terms like "push" and "real-time," but by casting them in terms of sampling interval; delivery latency; reliability requirements; blocking semantics; whether a polling sink or a notifying source initiates communication; and whether delivery is direct or intermediated. These issues consequently suggest event-based application development issues such as explicit support for event queues, choice of the transport, and the naming and content models, as offered by the Rosenblum-Wolf framework for event observation and notification.

ENS at the present moment are less about adapting to ever-widening geographic, temporal, and numeric scales than about adapting to the trust boundaries uniquely defining "Internet-Scale." New tensions are surfacing in bridging dissimilar ontology, security, and mobility models. Furthermore, the politics and economics of Internet scale are dominated by the "network effect" -- the tendency for a common, interoperable standard to become more valuable precisely in proportion to the numbers of developers and applications using it.

The future appears to favor a generic interface for notifications at the Internet-Scale, with built-in evolvability features to expand and adapt to new collaborative systems' demands. However, the true challenges of the future are trust, protocol style, and economics, with ramifications for issues such as ontology, security, mobility, and reliability. A common, interoperable standard gains value largely in proportion to the other developers and applications using it.

We believe that the C2 and Representational State Transfer architectural styles show promise for enabling software engineers to understand and decide between these tradeoffs. C2’s model of connectors, components, and notifications can bridge a range of current proposals, hint at design rules for verifying event-based development, reuse a common ENS at varying levels of abstraction, and perhaps even offer a lattice of available ENSs. Representational State Transfer’s messages separate the artifact (wire) and ideal (remote) form and allow dynamism and scale through statelessness.

Most current contenders extend existing messaging protocols or propose wholly new ones. We recommend a layered Internet-Scale ENS wire protocol for notifications, perhaps as an asynchronous version of HTTP with hooks for notification management, interfaces for advertising and subscribing, policies for information queue management, generic notification typing, notification trapping based on methods and resources, link maintenance, and capabilities for new-content push.

Acknowledgements

Mr. Khare's work was sponsored by the Defense Advanced Research Projects Agency and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-97-2-0021.

Mr. Rifkin's work was supported under the Caltech Infospheres Project, sponsored by the CISE directorate of the National Science Foundation under Problem Solving Environments grant CCR-9527130 and by the NSF Center for Research on Parallel Computation under Cooperative Agreement Number CCR-9120008.

He would also like to thank Microsoft’s Systems and Networking Research Group for their support in this research.

Both authors would like to thank the program committee and participants of the Workshop on Internet-Scale Event Notification for their inspiration and helpful comments.

References

John E. Arnold and Gerard Memmi. "Control Integration and its Role in Software Integration," Proceedings of the Fifth International Conference on Software Engineering and its Applications, Toulouse, December 1992.

Daniel J. Barrett, Lori A. Clarke, Peri L. Tarr, and Alexander E. Wise. "An Event-Based Software Integration Framework," ACM Transactions on Software Engineering and Methodology, Volume 5, Number 4, Pages 378-421, October 1996.

Kenneth P. Birman and Thomas A. Joseph. "Exploiting Virtual Synchrony in Distributed Systems," Proceedings of the Eleventh Symposium on Operating Systems Principles, Austin, Texas, 1987.

Søren Brandt and Anders Kristensen. "Web Push as an Internet Notification Service," W3C Workshop on Push Technology, Boston, Massachusetts, September 1997.

Martin R. Cagan. "The HP SoftBench Environment: An Architecture for a New Generation of Software Tools," Hewlett-Packard Journal, Pages 36-47, June 1990.

Antonio Carzaniga, Elisabetta Di Nitto, David S. Rosenblum, and Alexander L. Wolf. "Issues in Supporting Event-based Architectural Styles," July 1998.

K. Mani Chandy, Adam Rifkin, and Eve Schooler. "Using Announce-Listen with Global Events to Develop Distributed Control Systems," ACM 1998 Workshop on Java for High-Performance Network Computing, February 1998.

David R. Cheriton and Carey L. Williamson. "VMTP as the Transport Layer for High Performance Distributed Systems," IEEE Communications Magazine, Volume 27, Number 6, Pages 37-44, June 1989.

C.A. DellaFera, M.W. Eichin, R.S. French, D.C. Jedlinsky, J.T. Kohl, and W.E. Somerfeld. "Zephyr Notification Service," USENIX Conference Proceedings, Dallas, Texas, Winter 1988.

Roy Fielding, Jim Gettys, Jeff Mogul, Henrik Frystyk, and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, RFC 2068, January 1997. Available at http://www.w3.org/Protocols/rfc2068/rfc2068

David Garlan and David Notkin. "Formalizing Design Spaces: Implicit Invocation Mechanisms," Proceedings of VDM '91: Formal Software Development Methods, October 1991.

IEEE. 1278.1 IEEE Standard for Distributed Interactive Simulation -- Application Protocols (ANSI), 1995.

Astrid M. Julienne and Brian Holtz. ToolTalk and Open Protocols: Inter-Application Communication, Prentice Hall, 1994.

Michael L. Kazar. "Synchronization and Caching Issues in the Andrew File System," Proceedings of the Winter USENIX Conference, Pages 27-36, Dallas, Texas, February 1988.

Rohit Khare and Adam Rifkin. "Capturing the State of Distributed Systems with XML," World Wide Web Journal, Volume 2, Number 4, Pages 207-218, Autumn 1997. (1997a)

Rohit Khare and Adam Rifkin. "Weaving a Web of Trust," World Wide Web Journal, Volume 2, Number 3, Pages 77-112, Summer 1997. (1997b)

Rohit Khare and Adam Rifkin. "The Origin of (Document) Species," WWW7 Conference, Brisbane, Australia, April 1998, printed in Computer Networks and ISDN Systems, Volume 30, Pages 389-397, 1998.

Balachander Krishnamurthy and David S. Rosenblum. "Yeast: A General Purpose Event-Action System," IEEE Transactions on Software Engineering, Volume 21, Number 10, Pages 845-857, October 1995.

James M. Purtilo. "The POLYLITH Software Bus," ACM Transactions on Programming Languages and Systems, Volume 16, Number 1, Pages 151-174, January 1994.

Steven P. Reiss. "Connecting Tools Using Message Passage in the FIELD Environment," IEEE Software, Volume 7, Number 4, Pages 57-66, July 1990.

Adam Rifkin and Rohit Khare. "A Bibliography of Event Papers," July 1998.

David S. Rosenblum and Alexander L. Wolf. "A Design Framework for Internet-Scale Event Observation and Notification," Proceedings of the Sixth European Software Engineering Conference / ACM SIGSOFT Fifth Symposium on the Foundations of Software Engineering, Pages 344-360, September 1997.

Dale Skeen. "An Information Bus Architecture for Large-Scale, Decision-Support Environments," Proceedings of USENIX, 1992.

Richard N. Taylor, Nenad Medvidovic, Kenneth M. Anderson, E. James Whitehead Jr., Jason E. Robbins, Kari A. Nies, Peyman Oreizy, and Deborah L. Dubrow. "A Component- and Message-Based Architectural Style for GUI Software," IEEE Transactions on Software Engineering, Volume 22, Number 6, Pages 390-406, June 1996.

Jim Waldo. "A Minimalist Approach to Distributed Event Notifications," Workshop on Internet Scale Event Notification, Irvine, California, July 1998.

A. Wasserman. "Tool Integration in Software Engineering Environments," in Software Engineering Environments, Lecture Notes in Computer Science 467, Springer-Verlag, F. Long (Editor), 1990.