Research Overview
My research topics are situated in the general domains of programming languages
and compilers, mobile code, and security. The common theme that underlies most
of my research emphases is virtualization, a topic that cross-cuts
many different traditional research areas. It is only very recently that
researchers in virtualization are starting to behave as a distinct research
community, and partly due to my own initiative, there is now a new ACM/USENIX
co-sponsored conference on Virtual
Execution Environments that
hopefully will serve as a crystallization point for a strong self-sufficient
virtualization research community.
Over time, the primary focus of my work has shifted away from pure performance aspects to encompass an ever larger share of security aspects. I believe that some of the most interesting research topics for years to come will lie at the intersection of performance and security, as many otherwise virtuous security schemes have no chance of ever being deployed unless they can also be made practical.
In the remainder of this web page, I first briefly
introduce the major research areas in which I have been working. Following
this overview,
I then describe each of the major themes within these areas in
chronological order in additional detail in its own section, with pointers
to selected publications.
Research in the Broader Area of Modular Programming Languages
My research career originated in the Modular Languages community centered
around my former advisor, Turing Award Winner Niklaus
Wirth. As part of my doctoral dissertation
research, I developed a new mobile-code representation based on compression
of abstract syntax trees, along with an implementation of a just-in-time compiler
for this format. My work preceded the arrival of Java by several years.
My work on component-oriented programming languages continues
the theme and spirit of Wirth's languages, but has taken a back seat in comparison
to the other research topics I have been working on. The reality of life at
the beginning of the 21st century is that new programming languages originating
in academia have almost no chance of gaining widespread adoption. Hence, while
I continue to serve on the Steering Committee
of the Joint Modular Languages conference series,
the primary focus of my research is no longer in this domain. In my other research,
I am fairly language agnostic, and consider both Java and C# adequate for the
tasks at hand.
Compiler-Oriented Research with Emphasis on Performance
The first major research project that I started in Irvine, and for which I was able to obtain one of the prestigious CAREER grants from the National Science Foundation (NSF), focused on continuous program optimization. The central insight here is that perfect profiling information is available at runtime (the real behavior of a program processing actual input data). This precise information can be used to optimize a program far more accurately than any off-line compiler could possibly do. Along with my student Thomas Kistler, who is now applying the same ideas at Transmeta, Inc. in their Code Morphing Software, I developed a prototype system in which the running software is continuously re-optimized in the background to adapt it to changing user behavior. The two main articles describing this work have both been published in ACM Transactions on Programming Languages and Systems (TOPLAS).
Another major research project has focused on in-network compilation of Java, with emphasis on resource-constrained devices. In order for a handheld device (such as a mobile phone or PDA) to run Java code, it needs to contain either a Java interpreter or a just-in-time compiler. Both options are costly: an interpreter needs to run at a high clock frequency to overcome the overhead of interpretation, while a just-in-time compiler consumes extra memory and processor resources for the compilation step. My solution has been to integrate the just-in-time compiler into the network itself, in what I call a code-generating router. The code-generating router translates mobile code into the native instruction set of the handheld device while it is passing through. For example, in a mobile telephony system, such a component could be co-located with the base station. This project has been funded by a $1M grant from the Office of Naval Research (ONR), on which I am the lead PI.
A related more recent project focuses on providing "virtual power" to handheld devices. While handheld, battery-powered devices such as personal digital assistants and Internet-enabled mobile phones are emerging as new access points to the world's digital infrastructure, their short battery lives are a factor that is holding back their enormous potential. The aim of this project is to leverage our code-generating network infrastructure to better support these devices. To this effect, we are developing adaptive just-in-time compiler technology (in the network) that continually monitors execution profiles and power consumption in the mobile device and then provides dynamically optimized code that enables the mobile device to make optimal use of the available resources. As part of this project, we are also studying approaches to automatically partition the computation in such a way that only one part executes on the mobile device, while the rest executes on a computing node inside the network. This project is funded by a $2M Information Technology Research (ITR) grant from the National Science Foundation; I am the lead PI.
A third project is targeting high-performance computing applications. These are hitting performance roadblocks on several fronts. First, it is becoming increasingly difficult to further increase clock frequencies because on-chip power densities are rapidly approaching hard physical limits. Second, even though there is substantial enthusiasm about Grid Computing, there are actually few real-world problems that exhibit coarse-grained parallelism to a sufficient degree that the computation can be distributed beneficially to hundreds of thousands of computing nodes. Future improvements in hardware are going to be mainly architectural, in the direction of chip multiprocessing. In this context, avoiding yet another roadblock, memory bandwidth, will be a major challenge.
This is where my work on dynamic binary parallelization comes in: My students and I have been building a virtual execution environment that takes existing PowerPC binaries and attempts to statically extract the residual parallelism from this code. At run-time, the code is then dynamically scheduled for speculative superblock execution on a chip multiprocessor. This work is currently not funded by any grant; the students working in this area have been supporting themselves through Teaching Assistantships.
My final research theme in the "performance" category concerns itself with native code for off-the shelf microprocessors, but with a twist involving virtual machines once again. In my vision of the computing future, eventually all code will execute in managed runtime environments such as Microsoft's "Dot Net" platform. This leads to the interesting question of how to deal with the vast amounts of legacy software that are available only as native code. My students and I have been studying how to execute legacy code on top of managed runtimes. In a project jointly sponsored by a gift from Microsoft Research and the State of California through the UC Micro program, we are investigating strategies for "upcompiling" existing low-level binaries into efficiently executing higher-level managed code.
Research with Simultaneous Emphasis on Performance and Security
The second major research project that I started at Irvine revisited the central
topic of my doctoral dissertation: what would be the ideal
transport format for mobile code if backward compatibility with existing
formats weren't an issue. It is now widely acknowledged that virtual machines,
such as the Java Virtual Machine or the Common Language Runtime, are not the
optimal approach. In the past 6 years, two of my Ph.D. students (who will both
graduate in 2005) have explored alternative ends of the solution space, funded
by a major (approximately $1M) grant that I obtained as sole PI from the Defense
Advanced Projects Research Agency (DARPA).
One student, Christian Stork, has focused on compressing "high-level" abstract syntax trees to obtain a mobile-code format that is safe by construction and that can transport verifiable program annotations that are useful to a just-in-time code generator. The central idea for making a format safe "by construction" is to enumerate all the possible legal programs, and transmit only the index that corresponds to the particular program to be transmitted. Such an enumeration is possible because programs are derived from a grammar; we found several interesting ways for making this practical. The second student, Jeffery von Ronne, has concentrated on the "low level", turning the compiler-friendly Static Single Assignment Form into a mobile-code representation. Based on my original idea (on which a patent has been filed that is pending at this time), we came up with a variant of SSA that can provide verifiable control safety and reference safety.
In a second major thrust that also considers performance and security simultaneously, I have been investigating systems that provide what I have termed foundational security. A major problem of current "secure mobile code" approaches has been that we are building these systems on top of completely unreliable operating systems: Hardly a day goes by without the discovery of some "critical flaw" that would enable a remote attacker to gain complete control of any machine on the Internet exhibiting that flaw. A determined adversary who wants to undermine my system today would be unlikely to attack it at the mobile-code layer; instead, he or she would very probably circumvent the mobile-code system altogether by exploiting a vulnerability the operating system.
To counter the problem of unsafe operating systems, I have been working on pushing the techniques that are now used in the "mobile code" domain much further down in the software stack. Ideally, one would run almost the entire operating system on top of a virtualization layer that provides code verification along with a typed hardware abstraction. The virtualization layer would be sealed along with the processor into a tamper-proof hardware device. In the resulting system, one would not even need to trust the boot block of the hard drive—all software would be verified ahead of every use. Unfortunately, however, current mobile-code systems such as Java or the Dot Net Common Runtime are hardly up to the task of supporting a complete operating system running on top. My research endeavors to find solutions that achieve this goal, based on a novel combination of virtual machines and proof-carrying code methods. This work is funded by the National Science Foundation's Trusted Computing Program and by an unrestricted gift from Intel Corporation.
Research with a Primary Focus on Security
An essential feature of currently proposed "Trusted Computing" platforms is remote attestation, by which a secure computing platform can vouch to a remote service provider (e.g. a video streaming site) that certain claimed properties about the local system configuration are true (e.g., that it is running a certain video player that won't make an illicit copy of the video). The current state of the art in this domain is rather primitive and involves digital signing of executables—i.e., it is platform-dependent and is not scalable to an ever-growing number of versions and patches.
My research on semantic remote attestation takes
the general idea of remote attestation and moves it to the level of objects
of a programming language. To this effect, my students and I have built
a trusted virtual machine that can vouch to a remote party that certain
properties of objects are true. This work has gained a lot of attention recently;
it won the Best Paper Award at this year's USENIX VM conference and
was one of 4 papers of the (in itself highly competitive) New Security Paradigms
Workshop (NSPW) that was selected for a further presentation at the "Best
of NSPW" session
in the 2004 Annual Computer Security Applications Conference (ACSAC) conference.
This research is funded primarily by an unrestricted gift from Sun Microsystems,
Inc.
In my work on Java bytecode verification,
my students and I have introduced a novel bytecode verification algorithm
that verifies bytecode via Static Single Assignment (SSA) form construction.
The
resulting SSA representation can immediately be used for optimization and
code generation. Our prototype implementation requires less time to transform
Java
bytecode into SSA form and verify it than it takes Sun's verifier to merely
confirm the validity of the bytecode, with the added benefit that SSA is
available "for
free" to later compilation stages.
Continuous Program Optimization (1996-2000)
In my first major research project after becoming an Assistant Professor, funded by an NSF CAREER grant, I investigated feedback-directed dynamic optimization. A working prototype system was built that continuously optimizes the already running software in the background during idle cycles.
The implemented system supports dynamic loading of additional libraries. Normally, separate compilation and dynamic loading are associated with a "modularization penalty", because at compile-time there is no information about what other libraries will eventually be used concurrently. In a system such as ours, optimization occurs at run-time, with perfect information about the participating code pieces, so that cross-optimization is possible—for example, in-lining across code distribution unit boundaries (from one module into another, or even from the operating system into application code).
Our system uses live profiling information from the current execution to guide its optimization decisions, and was one of the first systems to do so. Previous optimizers used a "training run" to collect profiles that were then used to steer an off-line optimization, which would in turn yield the final object program. This results in good performance only if the eventual input data is similar to that used earlier during training. In contrast, our system bases its optimizations on observations made only seconds earlier, using the actual input data.
The availability of accurate profiling information that relates to the actual input data opens the door to completely new optimizations. In this project, we concentrated on optimizations for the data-cache hierarchy, in particular on object-oriented programs creating dynamic data structures. By optimizing the internal storage layout of dynamic data objects, we were able to reduce cache misses by up to 35% and increase performance by up to 24%.
Our algorithm proceeds in two phases: first, it identifies data members of an object that are frequently accessed closely after one another. If such a cluster of data members can all be mapped onto the same cache line, then any cache miss will bring all of the related data items into the cache simultaneously, reducing future cache misses. Second, we compute the optimal ordering of the individual data members on the same cache line. Since cache lines are always filled in ascending order by the memory controller, the predominant access order in the execution order should reflect the storage order in memory. Additionally, our algorithm exploits the fact that there is a "preferred" bank in banked memory systems and assigns data elements to it that are more likely to cause the initial cache miss.
Our system profiles a running program in the background, and then creates a new version incorporating the new data layout. When the new version is ready, execution is briefly interrupted and a special version of the garbage collector is run. This not only reclaims storage, but it also transforms all reachable data items to the new internal layout. Finally, execution resumes using the new program code and the transformed data. Complex as it sounds, we were able to implement and fully debug this system, and demonstrate its benefits in a series of benchmarks.
The work is summarized in two major journal papers:
- T. Kistler and M. Franz; "Automated Data-Member Layout of Heap Objects to Improve Memory-Hierarchy Performance"; ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 22, No. 3, pp. 490-505; May 2000.
- T. Kistler and M. Franz; "Continuous Program Optimization: A Case Study"; ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 25, No. 4, pp. 500-548; July 2003.
This project has also resulted in highly effective technology transfer: Thomas Kistler, the graduate student who was the co-author on all three of these papers is now working at Transmeta, Inc., a company literally founded on dynamic runtime optimization. Their product is partly hardware (a custom VLIW processor) and partly software (a runtime binary translator) that work together to implement the x86 architecture.
Ideal Transport Format for Mobile Code (1999-present)
Mobile code platforms, such as the Java Virtual Machine and the "Dot Net" Common Language Runtime, are a huge achievement, allowing for target-machine independent code whose safety can be verified by the code consumer. Unfortunately, however, the designs of their virtual-machine based code representations don't optimally support the needs of just-in-time optimizing compilers and dynamic optimizers. With these two current mobile-code transportation formats, almost all the analyses performed by optimizing compilers need to occur at the code consumer, repeatedly, rather than being performed only once at the code producer and transported along with the mobile code itself. Moreover, the verification process itself is costly as it requires iterative data-flow analysis.
In the course of the past 6 years, my graduate students and I have been studying
alternatives to the virtual machine approach. In particular, I invented a class
of representations for target-machine independent mobile programs that can provably
encode only legal programs. Hence, there is no way in which an adversary
could even construct a malicious mobile program: Every well-formed mobile program
that is expressible in such an encoding is guaranteed to map back to a source
program that is deemed legal in the original source context, and mobile programs
that are not well-formed can be rejected trivially. Further, such an encoding
can be designed to guarantee not only referential integrity and type safety
within a single distribution module, but also to enforce these properties across
compilation-unit boundaries.
A problem of previous approaches to mobile code has been that the additional provisions for security usually lead to a loss of efficiency, often to the extent of making an otherwise virtuous security scheme unusable for all but trivial programs. To avoid this trap, my research from the onset has focused not only on security, but also on performance. To avoid coming up with yet another ad-hoc solution, I decided to explore the solution space from opposite ends: Christian Stork's dissertation research examines the problem from the high-end, basing the mobile code format on compressed abstract syntax trees that are semantically close to the source language. In order to make efficient compilation possible at the target, these syntax trees are augmented by verifiable annotations. Jeffery von Ronne's dissertation, on the other hand, starts at the low end, using a variant of Static Single Assignment Form as the basis of the mobile code format.
The following three publications give an overview of this research theme. The first is a broad summary paper that describes the two contrasting approaches of "high-level" vs. "low-level" encoding. The second paper describes a new escape analysis algorithm that is efficiently verifiable at the code consumer—it thereby makes it possible to shift most of the effort required for this analysis from the code consumer to the code producer. The third is the conference paper that introduced our SSA-based mobile code representation; my pending patent application is based on this paper.
- M. Franz, W. Amme, M. Beers, N. Dalton, P. H. Fröhlich, V. Haldar, A. Hartmann, P. S. Housel, F. Reig, J. von Ronne, Ch. H. Stork, and S. Zhenochin; "Making Mobile Code Both Safe And Efficient"; in J. Lala (Ed.), Foundations of Intrusion Tolerant Systems; IEEE Computer Society Press; ISBN 0-7695-2057-X; pp. 337-356; December 2003.
- M. Beers, Ch. Stork, and M. Franz; "Efficiently Verifiable Escape Analysis"; in M. Odersky (Ed.), Proceedings of the 18th European Conference on Object-Oriented Programming (ECOOP 2004), Oslo, Norway, ISBN 3-540-22159-X, pp. 75-95; June 2004.
- W. Amme, N. Dalton, J. von Ronne, and M. Franz; "SafeTSA: A Type Safe and Referentially Secure Mobile-Code Representation Based on Static Single Assignment Form"; in Proceedings of the ACM Sigplan Conference on Programming Language Design and Implementation (PLDI 2001), Snowbird, Utah, pp. 137-147; June 2001.
Component-Based Software Languages and Systems (2001-2004)
Hierarchical decomposition is a technique that is central to most engineering disciplines: A problem is successively divided "top-down" into sub-problems until it is small enough to be solved directly, yielding a self-contained sub-system that will become part of the eventual solution. The individual sub-systems that have been created in this manner are then combined in a "bottom up" fashion. Since the various sub-systems have been designed in isolation to each fulfil a particular task, they can later be put together in combinations that differ from the original configuration, i.e., they can be reused. Hierarchical decomposition has the important structural property that when several smaller parts are combined into a larger part, the larger part "shadows" the smaller ones. One no longer needs to understand the smaller parts in order to use the larger part that has been constructed out of them. Just as importantly, one can even change the interfaces between the smaller parts without affecting the interface of the large part to the outside world.
Curiously, this structural property is lost in most object-oriented systems. Object-oriented design often differs from hierarchical decomposition because inheritance can be used not only to express specialization, but also generalization. As a consequence, object-oriented design does not automatically create tree structures. For example, in languages that support multiple inheritance, each of several base classes can be used in the specification of several dependent ones, creating a web of dependencies. And even without multiple inheritance, object-oriented design destroys locality. Objects are backward-compatible with objects of their superclasses, while method overriding has the effect that the meaning of a certain piece of code can be retroactively changed in a future extension—a situation that is exacerbated in the presence of dynamic class loading.
The goal of my research in component-based systems has been to overcome these old problems of object-orientation and return to the simplicity of hierarchical decomposition. I have been investigating new ways for structuring large, extensible software systems, using direct programming-language support where possible. The vehicle for this research has been an experimental programming language called Lagoona, in which the main code-reuse paradigm is message forwarding, rather than method inheritance. A Lagoona object receiving a message for which it has no explicitly associated method implementation can forward the received message to another object, even without being aware of what the message actually signifies. This approach results in system architectures that are strictly hierarchical black-box compositions of reuse components, rather than the intricate self-referential patterns often found in object-oriented designs. Nevertheless, Lagoona¹s message forwarding mechanism is powerful enough to emulate both the class-based as well as the prototype-object-based inheritance mechanisms offered by more conventional object-oriented programming languages.
We are currently in the midst of a paradigm shift toward component-oriented software development, and significant progress has been made in understanding and harnessing this new paradigm. Somewhat strangely then, the new paradigm does not currently extend to the level at which the components themselves are constructed. While we have composition architectures and languages that describe how systems are put together out of such atomic program parts, the parts themselves are still constructed based on the previous paradigm of object-oriented programming. This is a mismatch that is holding back compositional software design: many of the assumptions that underly object-oriented systems simply do not apply in the open and dynamic contexts of component software environments. I believe that using a language that supports component-oriented programming at the smallest granularity is the answer, and Lagoona is a prototype of such a language.
An overview of this research theme can be found at:
- M. Franz, P. H. Fröhlich, and A. Gal; "Supporting Software Composition at the Programming-Language Level"; Science of Computer Programming, Special Issue on New Software Composition Concepts; accepted for publication.
In-Network Compilation for Resource-Constrained Devices (2001-present)
Given the acknowledged importance of existing and emerging mobile code technologies, remarkably little attention has so far been devoted to the manner in which mobile programs are actually deployed. The predominant model by far, which for example underlies the distribution of Java "applets" over the Internet, identifies dynamically linkable parts of mobile programs by URL strings. The model further assumes that the constituent parts that make up a mobile program will be downloaded to a single location, and then verified, linked, possibly dynamically compiled, and finally executed at that very location. This model is unnecessarily simplistic, as it precludes many useful alternative deployment strategies, particularly in the realm of network-connected embedded devices.
With the ongoing expansion of wireless coverage, once autonomous devices such Personal Digital Assistants (PDAs), Internet-capable mobile phones, and mobile sensors have evolved into nodes of large distributed systems. Their increased connectivity and flexibility has opened up an enormous potential for the execution of mobile code. Astonishingly, at the same time there has been no change to the role of the network: The execution model for mobile code is the same for small devices as it is for powerful workstations. Current approaches for bringing the benefits of mobile code to PDAs and other embedded devices rely on node-local just-in-time compilation or, even worse, on pure interpretation by virtual machines. In a resource-constrained setting, these approaches often produce unsatisfactory results.
My research goal in this area has been to effect a paradigm shift toward a network-centric mobile code execution model. Instead of the traditional device-centric approach that embeds a just-in-time compiler component into every mobile device, we shift the mobile code optimization and compilation effort into the networking infrastructure itself, into what I have termed code generating routers. A code generating router intercepts mobile-code Hypertext Transfer Protocol (HTTP) requests originating from mobile devices, reissues them to the original target server, retrieves the Java class files sent from the server in response, compiles them on the fly, and then returns a native binary to the mobile device instead of the Java class file originally requested.
This architecture turns the execution model for mobile code from being decentralized and device oriented into being centralized and network oriented, thereby saving resources on the mobile device in terms of processor time, memory and battery consumption. This is achieved without changing the current client-server architecture for mobile code deployment. The latter is particularly important, since it provides for backward compatibility with existing mobile code solutions.
An interesting side-effect of placing the code generator on a powerful stationary computer rather than into the resource-constrained device is that it thereby becomes possible to perform elaborate optimizations. One such highly beneficial optimization is rapid type analysis, which allows to pinpoint all library routines that will actually be used by a program. For example, a trivial "Hello World" program imports over 500 Kilobytes of libraries, but actually uses only about 15 Kilobytes of this. The size of the standard Java libraries is so large that handheld devices typically include only a small subset. Our technique allows to identify the routines that are actually called, which can then be sent to the handheld device along with the program being executed. Hence, our approach is a viable way of supporting the complete Java language standard on such small devices, rather than merely subsets as is the current practice.
A more detailed overview of this work has appeared in:
- Ch. W. Probst, A. Gal, and M. Franz; "Code Generating Routers: A Network-Centric Approach to Mobile Code"; in Proceedings of the 2003 IEEE 18th Annual Workshop on Computer Communications (CCW'2003), Dana Point, California, IEEE Press, ISBN 0-7803-8239-0, pp. 179-186; October 2003.
Virtual Power for Handheld Devices (2002-present)
A more recent project, closely related to the previous research theme on in-network compilation, focuses specifically on battery-operated handheld devices. As the Internet keeps growing in size and capability, a vision of wireless ubiquitous computing is taking shape that will enable "anywhere" access to the Internet's vast offerings and create wholly new applications based on a user's physical location.
But while handheld, battery-powered devices such as personal digital assistants and web-enabled mobile phones are emerging as new access points to the world's digital infrastructure, their relatively high cost and short battery life are factors that are holding back their enormous potential.
The aim of this particular project is to leverage our code-generating network infrastructure to simultaneously address the cost and battery-life issues affecting handheld devices, thereby getting one step closer to a broad vision of ubiquitous computing: not just "anywhere" access to the infrastructure, but also by "anyone" (implying a low cost of mobile devices) and "anytime" (implying long-term battery autonomy). Our specific focus is the digital university campus with wireless Internet coverage.
In this setting, our aim is to increase the utility and battery-life and decrease the cost of handheld wireless computers by enabling the use of relatively simple hardware for the mobile devices. Somewhat surprisingly, our research is entirely in the software domain. Our solution is to mask the limited computational prowess of a handheld device by seamlessly coupling it to a relatively high-powered stationary computational infrastructure via an "always on" wireless connection. By off-loading power-intensive operations to the stationary infrastructure, we provide the battery-powered mobile device with what we term virtual power. Our project studies the software techniques that are necessary to implement virtual power, constructs a prototype, and then validates our findings empirically.
In particular, we are developing adaptive just-in-time compiler technology for minimizing power use on mobile devices running mobile code, and adaptive scheduling methods using results from the Computational Grid research community. Depending on the algorithm to be run on the mobile device and its current distance from the nearest base station, the computation to be performed is automatically partitioned between a part to be executed in the stationary infrastructure and another to be run on the mobile device. This partitioning is transparent to any outside server communicating with the mobile program in a client-server setting; the illusion created is that the mobile program is executing in its entirety on the mobile device. Hence, for example, one could execute a downloadable version of a spreadsheet package even though the mobile device itself didn't have the minimum resources normally required for this task.
My ultimate aim in this research theme is to create a blueprint for a low-cost ubiquitous computing infrastructure based on existing technology and open standards. I believe that this research is complementary to commercial interests addressing mainly the corporate-customer end of the mobile computing spectrum. The prototype system that my students and I are building will be deployed in an educational setting, as a technology to enable student learning and interaction. As such, I anticipate that the most rigorous verification of our results will be demonstrated by the degree to which students working on the project take advantage of the new capabilities.
There are not yet any publications that describe any actual results from this research theme. However, an early paper that describes the vision behind this theme (and, in part, also the previous theme) is the following:
- M. Franz; "A Fresh Look At Low-Power Mobile Computing"; in L. Benini, M. Kandemir, J. Ramanujam (Eds.), Compilers and Operating Systems for Low Power; Kluwer Academic Publishers, Boston, ISBN 1-4020-7573-1, pp. 209-220; September 2003.
Dynamic Binary Parallelization (2003-present)
As long as there has been computing machinery, there has been a demand for ever more performance—and improvements in hardware have more or less consistently delivered such improvements at a steady state. But now, we are approaching an almost perfect storm at which the technologies that have been providing these steady improvements are coming up against insurmountable boundaries.
First, it is becoming increasingly difficult to further increase clock frequencies as on-chip power densities are rapidly approaching hard physical limits that exhaust the range of available cooling options. Already, the first water-cooled personal computers are on the market. Intel's recent announcement that in the future they will no longer be pursuing pure increases in clock frequency as a development goal was an ominous sign that the "End of the Age of Megahertz" is near.
Second, the memory hierarchy has not kept up with the improvements in central processing units, and the gap is widening. We now have multi-level caches with enormous bandwidth differences between the different layers, and latencies that are so large that they start dominating total execution times.
Third, in spite of major research efforts, we have been generally unable to discover the degree of parallelism in most real-world problems that would be necessary to profitably distribute these computations to large-scale computing grids with hundreds of thousands of nodes.
The solution to these problems is still likely to be architectural, in the form of chip multiprocessors and similar designs. However, exploiting such parallel hardware effectively will be difficult. It is not yet clear whether this difficulty will be reflected in increasing design complexity of the hardware, through the addition of features for exploiting coarser grain parallelism such as loop and thread-level speculation, or whether the burden will be placed on the programmer and/or compiler in the form of an explicitly parallel programming model.
My students and I have begun addressing this question by designing a software extended architecture (SEA) that permits to reach beyond instruction-level parallelism while preserving a strictly sequential programming model. As virtual-machine technology is maturing, this approach seems to be the most cost-effective path to performance (both in terms of programmer productivity as well as hardware complexity). Software extensions can duplicate hardware mechanisms without incurring the same degree of complexity costs.
Whereas previous SEAs have targeted instruction-level parallelism for performance enhancements, we are interested in coarser-grained parallelization, between the lower threshold of instruction-level parallelism and the upper threshold of (Grid-level) distributed computing. More specifically, this entails implementing speculative and dynamic multi-threading and vectorization mechanisms in software.
At this coarser-grained level of parallelism, we believe that, due to its wider program view, a SEA can provide equivalent or better performance than complex hardware-only implementations. We believe that SEAs can address both the hardware and software aspects of the challenges of improving program performance on explicitly parallel processors, while preserving the illusion of sequential execution.
Our prototype takes existing PowerPC binaries and attempts to statically extract the residual parallelism from this code. At run-time, the code is then dynamically scheduled for speculative superblock execution on a chip multiprocessor. An introduction into this work can be found in:
- E. Yardimci, N. Dalton, Ch. Fensch, and M. Franz; "Azure: A Virtual Machine for Improving Execution of Sequential Programs on Throughput-Oriented Explicitly-Parallel Processors"; in Proceedings of the 11th International Workshop on Compilers for Parallel Computers (CPC 2004), Seeon, Germany, Shaker Verlag, pp. 61-174; July 2004.
Security "From The Ground Up" (2002-present)
Despite numerous "trusted systems" initiatives, current systems software remains riddled with errors. The recent "Slammer" incident shows that an adversary exploiting just one such error (a "buffer overrun") can cause tens of thousands of hosts to fail around the world in just minutes. Even more suprisingly, for this particular vulnerability there had been a patch available more than six months earlier, and yet even many of the OS manufacturer's (Microsoft's) own computers succumbed to the attack.
Two facts are becoming increasingly evident: First, operating systems are becoming so large and evolving so quickly that it is getting prohibitively expensive to manually inspect every line of code for the absence of errors. Second, the current approach of "patching" errors as they are discovered is failing us—on one hand, we now have attacks that can spread worldwide in minutes. On the other hand, if not even Microsoft can keep its own computers current, then how can one expect that other organizations will apply all patches immediately as they are released, and in the correct order?
Worse yet, critical software is increasingly developed outside of the United States and/or using "open-source" processes. Some of the developers might actually be agents of foreign nation states. Although one of the tenets of the open-source movement is that the resulting code is safer because users can inspect programs for the possible existence of back-doors, this is mostly wishful thinking. It is virtually impossible to manually audit millions of lines of code—and many open source projects have no strong audit controls, so that it may be impossible to attribute individual code fragments to the programmers who inserted them. As a consequence, we may be basing trustworthy application programs on completely untrustworthy operating-system foundations.
My solution uses critical advances from the realm of mobile code to make systems safe "from the ground up". In many mobile-code systems, incoming programs from untrusted hosts are verified prior to execution. Verification means that the code itself is automatically examined before it is executed—this is very different from (cryptographic) code signing, in which only the transport envelope of the code is checked and the code is trusted blindly if the check succeeds. Using the verification idea from the mobile-code domain, we are currently building a complete system in which almost all of the code is verified prior to every execution, including the operating system itself. The only thing that needs to be trusted is a minimal safe-code platform core (encompassing a verifier and a small dynamic code generator), which can be made small enough to make it feasible to manually verifiy this line by line, using techniques appropriate for mission-critical software, such as fly-by-wire control systems. The core can then be sealed along with the processing unit into a tamper-proof hardware implement. Everything above this layer is verified, i.e., even the code in the root directory of the local hard drive no longer needs to be trusted.
Our solution combines the best features of previous Virtual Machine (VM) and Proof-Carrying Code (PCC) approaches. Our hybrid safe-code solution uses a virtual machine that has been designed specifically to support proof-carrying code, while simultaneously providing efficient just-in-time compilation and target-machine independence. In particular, our approach reduces the complexity of the required proofs, resulting in fewer proof obligations that need to be discharged at the target machine.
A first paper has appeared at the highly selective (acceptance rate below 20%) 2003 ACM Sigplan Workshop on Interpreters, Virtual Machines, and Emulators (IVME); an extended version of this paper was selected for a special issue of the journal Science of Computer Programming.
- D. Chandra, A. Gal, V. Haldar, F. Reig, and N. Wang; "A Portable Virtual Machine Target For Proof-Carrying Code"; Science of Computer Programming, Special Issue on Interpreters, Virtual Machines, and Emulators; accepted for publication.
Semantic Remote Attestation (2003-present)
Trusted Computing is an effort to bring some of the security properties of closed, proprietary systems to open, commodity systems. Trusted computing introduces mechanisms and components in both hardware and software that check and enforce the integrity of a system, and allow it to authenticate itself to remote systems. For example, a trusted start-up procedure makes sure that the operating system has not been tampered with. Using a chain of reasoning that starts from a trusted hardware module, we can arrive at a conclusion about the state of a system after boot-up. Similarly, we can deduce for sure what particular program is running on a system.
Remote attestation is the process by which an application authenticates itself to a remote party. When asked to authenticate itself, an application asks the operating system for an endorsement. The operating system signs a hash of the executable of the application. The entire certificate chain, starting from the trusted module all the way up to the application, is sent to the remote party. The remote party verifies each certificate of this chain, and also checks that the corresponding hashes are of software it approves. Note that the remote party needs to know exactly what it is looking for.
This method of remote attestation suffers from several critical drawbacks: (1) It says nothing about program behavior. (2) It is static, inflexible and inexpressive. (3) Upgrades and patches to programs are hard to deal with, and (4) It is fundamentally incompatible with a widely varying, heterogeneous computing environment.
We have been striving to make remote attestation more flexible and expressive by leveraging language-based techniques and virtual machines. The goal is to attest program behavior, not a particular binary. Our domain is platform-independent mobile code that runs on a virtual machine. Instead of a trusted operating system, we use a Trusted Virtual Machine (TVM) to attest to remote parties various properties of code running within it. In our model, requests for remote attestation are made to the TVM. These requests ask for the enforcement or checking of specific security policies on code that is being run by the TVM. Thus, what is enforced is not the execution of a particular binary, but security policies and other constraints specified by a remote party. Note that this sort of attestation is a much more fine-grained and semantically richer operation than signing the hash of an executable image—we call this semantic remote attestation
Some examples of properties that a trusted virtual machine can attest to are the dynamic runtime state of the program currently executing, its input, the classes it consists of, and the interfaces it observes. A TVM also has the ability to run arbitrary analysis code (within the limits set by the security policy of the local host) on the program being attested on behalf of the remote party. Thus the remote party can test for a wide variety of properties by sending across code that does the appropriate analysis. Finally, a TVM can test host system properties: a remote party can send across code that tests certain relevant system and virtual machine properties, and the TVM can attest its results. For example, before running a distributed computing program (such as SETI@Home, or Folding@Home), the server may want to test the floating point behavior of target platform by having the TVM run a test suite of floating point programs.
Our first paper on this research won the Best Paper Award at the 2004 USENIX VM conference. A second paper got accepted to the highly selective 2004 New Security Paradigms Workshop (NSPW) and is one of 4 papers selected for a second presentation in a "Best of NSPW" session at ACSAC 2004.
- V. Haldar, D. Chandra, and M. Franz; "Semantic Remote Attestation: A Virtual Machine Directed Approach to Trusted Computing"; in Proceedings of the 3rd USENIX Virtual Machine Research & Technology Symposium (VM¹04), San Jose, California, ISBN 1-931971-20-X, pp. 29-41; May 2004. Winner of Best Paper Award.
- V. Haldar and M. Franz; "Symmetric Behavior-Based Trust: A New Paradigm for Internet Computing"; to appear in Carla Marceau, Simon Foley (Eds.), New Security Paradigms Workshop (NSPW 2004), White Point, Nova Scotia; September 2004. Selected for ACSAC 2004 Presentation.
Java Bytecode Verification (2003-present)
Java Virtual Machine instructions ("bytecodes") can read and store intermediate values in two different locations: the operand stack and local variables. These locations are ad-hoc polymorphic in that the same stack location or local variable can hold values of different types during program execution. Verification ensures that these locations are used consistently and intermediate values are always read back with the same types that they were originally written as.
Verification also ensures control-flow safety, but this is a comparatively trivial task. Conversely, verifying that the data flow is well-typed is rather complex. The JVM bytecode verifier uses an iterative data-flow analysis algorithm for this task, and unfortunately, this algorithm has an execution complexity that rises quadratically with program length. This gives rise to my prediction that we will soon witness denial-of-service attacks on mobile-code systems that will exploit the algorithmic complexity of the verifier itself: By sending a legal, but difficult-to-verify program to a server virtual machine, one can keep that server occupied for an inordinate amount of time, effectively making it unavailable for useful work. In my experiments, a 4kByte JAR file containing Java code with worst-case verification complexity required more than 15 minutes of verification time on a 2.4 Gigahertz Pentium IV processor.
Java bytecode verification has other problems. If the verification task determines that the bytecode is indeed safe, then this code is forwarded in its original form (as received from the code producer) to the JVM's execution component, which may be an interpreter or a just-in-time compiler. Beyond the result denoting whether or not verification was successful, all other information computed by the verifier is discarded and is not passed onwards. In many cases, this results in a duplication of work when a just-in-time compiler subsequently performs a very similar data-flow analysis all over again.
My students and I have found an alternative verification mechanism that avoids such duplication of work. Instead of verifying Java Virtual Machine bytecode directly, we first annotate it in such a way that the flow of values between instructions becomes explicit rather than going through the operand stack. In a second step, we then transform further into a Static Single Assignment (SSA) variant of JVM bytecode. Verifying programs in this format can then be performed in near linear-time and without requiring an iterative data-flow analysis. Moreover, since Dominator Tree construction and SSA generation are already performed during the verification phase, and can be re-used "for free" during subsequent code generation and optimization, our method can speed up a subsequent just-in-time compilation phase significantly.
Our benchmarks indicate that the aggregate time required for first transforming JVM bytecode into the SSA-based form and then verifying in that form is still less than the time needed for performing the standard verification algorithm directly on JVM bytecode. Our approach imposes no overhead for methods that will be interpreted without just-in-time compilation, because SSA-based verification is still overall faster than the traditional verifier. Moreover, by providing an SSA representation "for free", the cost of just-in-time compilation is lowered. This makes it possible to compile larger parts of the program on-the-fly without exhausting the patience of end-users.
The work so far has led to two technical reports; one of them is currently under review by a conference:
Managed Code Environments for Legacy Software (2004-present)
The current practice of "patching" software vulnerabilities as soon as they are discovered is not sustainable. First, the sheer number of such threats is growing rapidly—the latest Symantec Security Threat Report records 4,496 new viruses and worms for the Windows platform for the first six months of 2004 alone, an increase of more than 300% over the previous six months, and 1,237 newly discovered distinct vulnerabilities, an average of almost 7 per day. Second, and more seriously, the time window available between discovery of a vulnerability and the first emergence of an exploit has diminished to 5.8 days on average, and is expected to decline further. As lead times keep shrinking, it is becoming increasingly difficult to develop, distribute, and apply such "patches" before an actual exploit arrives.
In the long term, type-safe languages such as Java will hopefully prevent buffer overflow and memory corruption errors from occuring in the first place. Unfortunately, however, the overwhelming majority of application programs in service today and for many years to come are legacy applications written in unsafe languages such as C and C++. In this research theme, I am addressing such existing legacy programs, including programs for which no source code is available.
Previous software-based solutions for hardening existing applications written in unsafe languages against memory usage errors are focusing mainly on static analysis of source code and require some form of human input to perform the final disambiguation between actual errors and false positives. Even if adoption of static-analysis techniques would pick up, it would not only take decades to retrofit all existing legacy systems, but it would still not address the many systems for which no source code is available in the first place.
Instead of such automated analysis of program source code guided by manual annotations, my students and I have been working on building an environment for safe legacy software execution that uses virtual machine technology to enforce strict memory-safety properties during execution, by checking each pointer operation and control flow transfer.
Our virtual machine executes binary code that could in principle be executed directly on a hardware processor. In contrast to a physical CPU, however, our virtual machine assumes all pointers to be wild and strictly enforces memory safety using extensive pointer checking. While such a virtual machine in general will execute code significantly slower than a physical CPU, previous work by others has shown that the overall performance degradation introduced by software interpretation of an actual processor's instruction set can be made negligible as long as "hot" traces are executed at full speed. For this, our virtual machine monitors the execution frequency of code traces and selects candidate traces for further optimization.
For frequently executed code traces, a generalized form of software trace scheduling is used to eliminate redundant safety checks from frequently executed code blocks (loops). For such a loop trace, the data-flow analysis required to hoist redundant checks out of the loop, to optimize them, or to eliminate them altogether is trivial and very fast, because only a linear sequence of instructions has to be considered.
This work has only just started; a technical report is in preparation.
|