Speechifying

[This is the kind of fuzzy crap that I just have to get out of my system. It's the opposite of the clarity & conciseness needed for the thesis. Instead, this is a hypothetical address for the talk I'm giving at the O'Reilly conference in May.]

I'd like to speak to you today about a new way to integrate software across the Internet. If we choose to view software modules as hosts on a network, then we can apply the same principles networking researchers used to internetwork decentralized LANs to "internetwork" applications. Specifically, I want to introduce a new product that makes this style of integration easy: an application-layer router. I believe that such a "router" is the missing element that completes the software industry's current rush to address Internet-scale integration under the moniker of "Web Services."

Now, you may ask yourselves, "Don't we already have ways -- too many ways! -- to build distributed software?" Remote Procedure Calls, dataflow graphs, mobile code, and so many other architectural styles come readily to mind as tools to connect software systems across the Internet. However, almost all of these architectural styles assume that the Internet is merely an extension of the LAN. After all, an IP packet is an IP packet, whether inside or outside the firewall, whether across a LAN or a dialup modem.

But there's a reason why, thirty years later, we are in the happy position of using IP "all the way down"; it wasn't always the case. Originally, there was a slew of competing network protocols: AppleTalk, DECnet, Novell IPX, and so on. IP was invented anew to address the new challenge of networking networks. No single existing network protocol was appropriate for unifying all the others: until IP, network protocols were tightly coupled to link-layer hardware and operating systems. IP, by contrast, had to address three new challenges: scaling across time, space, and organizational boundaries.

First, IP had to be stable for a long time. To let new computers, network adapters, and operating systems invented years apart -- now decades! -- "speak" IP compatibly, a neutral team had to hammer out simple, concrete specifications that could stand the test of time.

Second, IP had to work across larger spaces: continents, not just campuses or cities. To accommodate the widely varying latency, bandwidth, and jitter of all sorts of communications links, IP was designed as an asynchronous "store-and-forward" network. Signaling techniques appropriate for a few meters of copper cable running Ethernet simply wouldn't work across the Atlantic.

Finally, IP had to work across organizational boundaries. Different organizations' networks had very different ways to identify hosts, users, terminals, and files -- you couldn't even assume everyone used 8-bit bytes back then! That's why internetworking required yet another new namespace: IP addresses, and later, DNS names and so on. Conversely, IP did not include security; it left concepts such as users, passwords, and encryption to applications running "on top."

And yet, thirty years later, you're likely to get fired if you deploy anything besides IP. The experiment that was merely intended to integrate separate LANs took over as the LAN format, too, while so many of the once-dominant LAN protocols IP struggled to accommodate are now nearly-extinct curiosities. So what does this tale about networks have to do with software architectures?
I claimed that today's state of the art for developing distributed software merely treats the Internet as a slower kind of LAN, since "it's all IP all the way down." We're still vainly trying to conjure the illusion of a single, large-scale von Neumann computer out of all these distributed parts. Instead, I claim there are brand-new concerns that arise at Internet scale. We need decentralized software that can cope with vastly larger scales across time, space, and organizations. To that end, I want to tell you why traditional middleware doesn't measure up to these challenges -- and how networking concepts can.

To begin with, I'd like to illustrate the weaknesses of current software integration technology and compare it to SOAP.

When I say software components are separated by time, I mean the challenge of interoperability when components are written years apart, by separate teams. With technologies such as CORBA IIOP, Microsoft DCOM, Java RMI, or TIBCO's information bus, the messages sent are in a fragile, binary, and often proprietary format. This implies tight coupling of components in terms of vendor, language, and interface versions. Using SOAP and WSDL loosens these couplings, since these standards leverage XML to allow Web Services to be called from any platform and to allow interfaces & data formats to evolve gracefully.

When software components are separated by distance, this translates back into time, or latency. Traditional RPC and event-based integration systems assume that the network is reliable and low-latency. For example, TIBCO relies on IP multicast to announce events to all nodes, and multicast is only efficient at LAN scale. By contrast, a nomadic laptop may go days without connecting to the Internet. SOAP acknowledges this challenge by allowing many different kinds of transport, such as SMTP (email). While the calling application may still block, as with a traditional RPC, SOAP at least allows developers to loosen coupling in time, and hence account for wider geographical range.

Finally, consider what happens when software components are separated by organizational boundaries. In the travel industry, a "day" means any 24-hour period for car rentals, but only a single evening in a hotel. A reservation service has to handle these variances explicitly; doing so is the bulk of the multi-billion-dollar EAI (enterprise application integration) industry. The Web Services vision, in contrast, is for an intelligent actor to look up the relevant schemas for both services in a UDDI directory and at least translate miles into kilometers, if not also the contractually distinct definitions of "day" in the car rental and hotel industries.

So what's the problem, then? It would seem that, yes, there are limitations to using today's integration technology at Internet scale, but that SOAP, WSDL, UDDI, and the rest of the menagerie of Web Services technologies are sufficiently evolved successors that we will be able to integrate software across the Internet successfully. I believe these technologies are only one half of the solution. On the surface, the analogy would seem to hold: the IP packet format and the TCP protocol were all we needed to network networks; shouldn't SOAP messages be all we need to network software? But there was a complementary concept implied by the very nature of IP packets: the IP router.
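Before turning to that router, here is a minimal sketch of what such a loosely coupled call looks like on the wire. The hotel endpoint, the checkAvailability operation, and the namespace are all hypothetical; the point is simply that the request is a plain XML envelope any platform can produce, and that the same envelope could just as easily be handed to an SMTP library as posted over HTTP.

# Hypothetical hotel-reservation call; only the envelope shape and the HTTP
# binding come from the SOAP specs. Everything else is made up for illustration.
import urllib.request

ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <checkAvailability xmlns="urn:example:hotel">
      <nights>3</nights>
    </checkAvailability>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    "http://hotel.example.com/reservations",          # hypothetical endpoint
    data=ENVELOPE.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": '"urn:example:hotel#checkAvailability"'},
)
with urllib.request.urlopen(request) as response:     # still blocks, like an RPC
    print(response.read().decode("utf-8"))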
Strictly speaking, the IP specifications don't define or require a construct called a "router," but the router was the device that unleashed the full potential of IP to actually interconnect LANs. Similarly, I claim SOAP routers are necessary to unleash the full potential of Web Services. But claiming we need SOAP routers is skipping steps; let me begin by explaining what they are and how they work. To step back from the industry hype and of-the-moment buzzwords, I'll also put aside any mention of SOAP and other specific technologies for the time being.

Theoretically, most integration models abstract software components as miniature machines. Machines have control levers and input feeds; we know how to chain them together in sequence, or nest them, as in the very word "subroutine." These machines are tightly coupled, in that the output of one must be directly and immediately fed to the next. Furthermore, "next" is itself well defined, unfortunately allowing machines to rely on the exact implementation of the others.

In the industrial world, "brokers" emerged to decouple such factories by buying, warehousing, and selling intermediate goods. Integration models adopted the same abstraction. Object Brokers, as a generic category of middleware, allowed the invoking machine to dynamically bind to the "next." This deferred a range of choices to run-time, such as directing which actual computer to invoke the command on; enforcing that the caller had the appropriate security credentials; and queuing invocations to mask transient connectivity failures. Even more capable variants of Object Brokers incorporated Transaction Monitors, so that even distributed invocations could be modeled as atomic transactions.

My colleague Roy Fielding continued in this vein to catalog a wide range of architectural styles for distributed software integration. Ultimately, he synthesized a new style he suggests best represents the power of the Web: REST (Representational State Transfer). To quote Dr. Fielding: "The central feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components." His conclusion is where we'll begin.

Just as the HTTP-based Web provides a uniform interface for accessing and transmitting any hypermedia resource, I claim SOAP-based Web Services provide a uniform interface for invoking and responding to any software component. The key addition we're making this time is that, unlike hypermedia transfer, software components require asynchronous messaging, since we need to encompass both RPC and event-based integration styles. [OK, so I broke my rule about buzzwords. It's a draft! ;-] Assume I'll come back later and defend why I believe SOAP is REST applied to software integration.

What new powers do we gain by stipulating this? I'm proposing that even third parties can add "ilities" -- reliability, availability, scalability, security, extensibility, and visibility -- to REST services without modifying the services or their callers. One of the most powerful, and underappreciated, implications of HTTP's proxy support is the potential to compose active proxies to extend the Web. Content can be tailored to various devices; advertising can be stripped out (or inserted); identities can be anonymized; protocols can be gatewayed; and so on. The key is that third parties can assemble custom proxy chains of fourth-party services, all without modifying the origin server or user-agent. Similar implications hold for SOAP intermediary support.
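To make that composition concrete, here is a toy sketch of two such intermediaries being chained by a third party. The message shape and the function names are invented for illustration; the point is only that neither the origin service nor the caller has to change for the chain to be inserted.

def anonymize(message):
    """Strip the caller's identity before the message travels onward."""
    message = dict(message)
    message.pop("From", None)
    return message

def strip_advertising(message):
    """Remove ad markers from the body (an inserter could just as well add them)."""
    message = dict(message)
    message["Body"] = message["Body"].replace("[ad]", "")
    return message

def compose(*intermediaries):
    """Turn a sequence of intermediaries into a single forwarding path."""
    def route(message):
        for hop in intermediaries:
            message = hop(message)
        return message
    return route

# Assembled by a third party out of fourth-party pieces, with no change to
# the origin server or the user-agent.
pipeline = compose(anonymize, strip_advertising)
print(pipeline({"From": "rohit@example.org", "Body": "party tonight! [ad]"}))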
To date, such intermediaries have been little used in the early phases of RPC and distributed-object applications. However, the lessons learned in HTTP have made SOAP's intermediary support even more powerful, most notably by adding the mustUnderstand attribute, which affords forward-compatibility for coping with future actors and headers.

A subtler lesson is in the social construction of proxies. For the hypermedia Web, user-agents presumed there was a single, permanent proxy, typically for caching or content-filtering. We simply didn't envision dynamically calculating purpose-built chains of proxies for given transactions. I can testify to that personally: a proposal I made at W3C called PEP, which called for just that, was a long-running failure. But for the services Web, SOAP has laid the groundwork for per-transaction intermediation. Microsoft's Henrik Frystyk Nielsen has gone so far as to propose Web Services routing and referral standards along these lines.

Indeed, the new challenge is systematizing our ability to string together intermediaries at will. If we can call a multi-hop, multi-path composition of active proxies a route, it sounds reasonable that a device that automatically calculates and enacts such routes should be called a router. So, having named it, what does it actually do, and how does it work?

A router is a device that, given a symbolic name, resolves it into the address(es) of communication paths one layer below for onward delivery. An IP router maps IP addresses into the MAC address(es) of a LAN adapter. An application-layer router maps resource names into application-protocol messages. For example, an application-layer router presented with the document "party tonight!" at the URL /Rohit/announcements might resolve it into specific onward URLs such as mailto:adam@knownow.com, ftp://fred@mit.edu/inbox, and http://roy-s-webserver/logger.cgi, if there were three such routing rules, or "subscriptions," one for each listening service.

How would such a router be implemented? Once again, I appeal to Layer 3 precedents. The key to internetworking many different LAN protocols is that rather than translating them directly into one another, each one is mapped to IP as an intermediate form. So a Layer-3 IP router would have several kinds of LAN adapter cards; upon receipt of any packet, it would internally convert it to IP format, store away its own copy, and then indicate to the input LAN that the data had been consumed. Then, at some later time, if the router hadn't been forced to throw the packet away due to memory exhaustion or aging, its destination addresses would be calculated and it would be transmitted onward, translated back out to a foreign LAN format if necessary.

That is exactly how I propose a Layer-7 router, or what I term a SOAP router, ought to work. Just as IP provided a metaformat for encoding many different addressing schemes, signaling messages, and payloads, I posit that a SOAP message is similarly flexible enough to intermediate all the other major Layer-7 application protocols. Specifically, rather than using IP addresses to identify computers, we use WebDAV collections (directories) to identify topics. This way, FTP directories, mailboxes, newsgroups, and SNMP devices can all be mapped into WebDAV collections. Then, new or modified resources within the router's topic space can be delivered onward using the same range of supported protocols.

"OK," you might agree, warily. "So what?"

[... well from here on out, you had to be at the talk!]
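[A toy sketch of the routing-rule lookup described above. The three subscription URLs are the ones from the /Rohit/announcements example; the Python and the per-scheme adapter stubs are purely illustrative, standing in for real SMTP, FTP, and HTTP delivery code.]

from urllib.parse import urlparse

# Routing rules: a topic (a WebDAV-collection-style path) maps to the
# onward-delivery URLs of every subscribed listener.
SUBSCRIPTIONS = {
    "/Rohit/announcements": [
        "mailto:adam@knownow.com",
        "ftp://fred@mit.edu/inbox",
        "http://roy-s-webserver/logger.cgi",
    ],
}

stored = []  # the router keeps its own copy first: store-and-forward

def route(topic, document):
    """Accept a document on a topic, store it, then resolve and deliver onward."""
    stored.append((topic, document))
    for url in SUBSCRIPTIONS.get(topic, []):
        scheme = urlparse(url).scheme
        # A real router would hand each URL to a protocol-specific adapter:
        # SMTP for mailto:, an FTP client for ftp:, an HTTP POST for http:.
        print(f"deliver via {scheme}: {url} <- {document!r}")

route("/Rohit/announcements", "party tonight!")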