Runtime resource management for many-core systems is increasingly complex. The complexity can be due to diverse workload characteristics with conflicting demands, or limited shared resources such as memory bandwidth and power. Resource management strategies for many-core systems must distribute shared resources appropriately across workloads, while coordinating the high-level system goals at runtime in a scalable and robust manner. In this chapter, the concept of reflection is used to explore adaptive resource management techniques that provide two key properties: the ability to adapt to (1) changing goals at runtime (i.e., self-adaptivity) and (2) changing dynamics of the modeled system (i.e., self-optimization). By supporting these self-awareness properties, the system can reason about the actions it takes by considering the significance of competing objectives, user requirements, and operating conditions while executing unpredictable workloads.
We are seeing an increasing number of complex cyber-physical systems (CPS) deployed for various applications, such as road-traffic control involving communicating autonomous cars and infrastructure, or smart grids controlling energy delivery down to the individual device. These distributed applications follow common design objectives, such as energy efficiency, and require guarantees of high availability, real-time behavior, or safety. In this context, autonomy is crucial: multiple system goals varying over time need to be adaptively managed and objectives holistically coordinated. By empowering future CPS with self-awareness, these systems promise to dynamically adapt, learn, and manage unforeseen changes.
In order to provide performance increases despite the end of Moore's law and Dennard scaling, architectures aggressively exploit data- and thread-level parallelism using billions of transistors on a single chip, enabled by extreme geometry miniaturization. A resulting challenge is the control, optimization, and reliable operation of such complex multiprocessing architectures. Modern and future systems will be required to operate under multi-dimensional variability: from varying workload, quality-of-service (QoS) goals, and non-functional requirements to varying environmental and operating conditions. A trend has recently emerged to abstract such complex multiprocessing architectures as self-aware factories whose resources are monitored, configured, and their use planned during runtime. In this paper, we present the Information Processing Factory (IPF) paradigm for mixed-criticality systems. We introduce its 5-layer hierarchical organization and a system configuration framework that ensures that the strict requirements of the safety-critical functions are always met while dynamically managing and optimizing the mixed-critical system at runtime. We illustrate the application of IPF in heterogeneous domains with two representative use-cases (healthcare and automotive), investigate the use of IPF to achieve long-term dependability, and highlight the open challenges. Experimental results report the reliability levels achievable with the proposed paradigm.
Users of embedded and cyber-physical systems expect dependable operation for an increasingly diverse set of applications and environments. Reactive self-diagnosis techniques either use unnecessarily conservative guardbands, or do not prevent catastrophic failures. In this letter, we use machine-learning techniques to design an on-device prediction engine that forecasts failures in embedded systems. We evaluate our prediction engine's effectiveness for predicting temperature behavior on a mobile system-on-chip, and propose a realizable hardware implementation for the use-case.
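The prediction engine described above can be illustrated with a minimal sketch: an online least-mean-squares (LMS) autoregressive model that predicts the next temperature sample from the last few readings and flags imminent threshold violations. All class names, the model order, the learning rate, and the threshold below are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of an on-device temperature predictor: an online
# LMS autoregressive model trained incrementally from sensor samples.
# Names and parameters here are illustrative assumptions.

class TempPredictor:
    def __init__(self, order=3, lr=1e-4):
        self.order = order          # number of past samples used
        self.lr = lr                # LMS learning rate
        self.w = [0.0] * order      # model weights, learned online
        self.hist = []              # recent temperature readings

    def predict(self):
        """Predict the next sample from the most recent window."""
        if len(self.hist) < self.order:
            return self.hist[-1] if self.hist else 0.0
        window = self.hist[-self.order:]
        return sum(w * x for w, x in zip(self.w, window))

    def update(self, sample):
        # Train on the error between the previous prediction and the
        # actual reading, then record the new sample.
        if len(self.hist) >= self.order:
            err = sample - self.predict()
            window = self.hist[-self.order:]
            self.w = [w + self.lr * err * x
                      for w, x in zip(self.w, window)]
        self.hist.append(sample)

def exceeds_threshold(predicted_temp, threshold=80.0):
    """Flag an imminent thermal violation before it happens."""
    return predicted_temp > threshold
```

A proactive manager would call `update` on every sensor sample and trigger mitigation (e.g., throttling) whenever `exceeds_threshold(predictor.predict())` fires, avoiding both conservative static guardbands and purely reactive response.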
MPSoCs increasingly depend on adaptive resource management strategies at runtime for efficient utilization of resources when executing complex application workloads. In particular, conflicting demands for adequate computation performance and power-/energy-efficiency constraints make desired application goals hard to achieve. We present a hierarchical, cross-layer hardware/software resource manager capable of adapting to changing workloads and system dynamics with zero initial knowledge. The manager uses rule-based reinforcement learning classifier tables (LCTs) with an archive-based backup policy as leaf controllers. The LCTs directly manipulate and enforce MPSoC building block operation parameters in order to explore and optimize potentially conflicting system requirements (e.g., meeting a performance target while staying within the power constraint). A supervisor translates system requirements and application goals into per-LCT objective functions (e.g., core instructions-per-second (IPS)). Thus, the supervisor manages the possibly emergent behavior of the low-level LCT controllers in response to (1) switching between operation strategies (e.g., maximize performance vs. minimize power); and (2) changing application requirements. This hierarchical manager leverages the dual benefits of a software supervisor (enabling flexibility), together with hardware learners (allowing quick and efficient optimization). Experiments on an FPGA prototype confirmed the ability of our approach to identify optimized MPSoC operation parameters at runtime while strictly obeying given power constraints.
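The LCT leaf controllers above can be sketched as a small condition-action rule table whose fitness values are updated by reinforcement. The epsilon-greedy exploration, the fitness update rule, and the reward shaping below are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of a learning classifier table (LCT) leaf
# controller: condition->action rules whose fitness is updated by
# reinforcement. Exploration and reward shaping are assumptions.
import random

class LCT:
    def __init__(self, actions, epsilon=0.1, lr=0.2, seed=0):
        self.actions = actions              # e.g., DVFS levels
        self.fitness = {}                   # (state, action) -> fitness
        self.epsilon = epsilon              # exploration rate
        self.lr = lr                        # fitness learning rate
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)    # explore
        return max(self.actions,
                   key=lambda a: self.fitness.get((state, a), 0.0))

    def reward(self, state, action, r):
        # Move the matched rule's fitness toward the observed reward.
        f = self.fitness.get((state, action), 0.0)
        self.fitness[(state, action)] = f + self.lr * (r - f)

def objective_reward(ips, target_ips, power, power_cap):
    """Per-LCT objective: reward meeting the IPS target while
    staying under the power cap (illustrative shaping)."""
    if power > power_cap:
        return -1.0                         # hard constraint violated
    return 1.0 - min(abs(ips - target_ips) / target_ips, 1.0)
```

The supervisor would hand each LCT an objective function like `objective_reward` (with per-subsystem targets); the LCT then explores knob settings and reinforces the rules that best satisfy it.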
The number and complexity of embedded system platforms used in mixed-criticality applications are rapidly growing. They run large and evolving applications on heterogeneous multi- or manycore processing platforms requiring dependable operation and long lifetime. Examples include automated and autonomous driving, smart buildings, industry 4.0, and personal medical devices. The Information Processing Factory (IPF) applies principles inspired by factory management to master the complexity of future, highly-integrated embedded systems and to provide continuous operation and optimization at runtime. A general objective is to identify a sweet spot between a maximum of autonomy among IPF constituent components and a minimum of centralized control in order to ensure guaranteed service even under strict safety and availability requirements. This paper addresses the challenges of IPF and how to tackle them with a set of techniques: self-diagnosis for early detection of degradation and imminent failures combined with unsupervised platform self-adaptation to meet performance and safety targets.
Resource management strategies for many-core systems dictate the sharing of resources among applications such as power, processing cores, and memory bandwidth in order to achieve system goals. System goals require consideration of both system constraints (e.g., power envelope) and user demands (e.g., response time, energy-efficiency). Existing approaches use heuristics, control theory, and machine learning for resource management. They all depend on static system models, requiring a priori knowledge of system dynamics, and are therefore too rigid to adapt to emerging workloads or changing system dynamics. We present SOSA, a cross-layer hardware/software hierarchical resource manager. Low-level controllers optimize knob configurations to meet potentially conflicting objectives (e.g., maximize throughput and minimize energy). SOSA accomplishes this for many-core systems and unpredictable dynamic workloads by using rule-based reinforcement learning to build subsystem models from scratch at runtime. SOSA employs a high-level supervisor to respond to changing system goals due to operating conditions, e.g., switching from maximizing performance to minimizing power due to a thermal event. SOSA's supervisor translates the system goal into low-level objectives (e.g., core instructions-per-second (IPS)) in order to control subsystems by coordinating numerous knobs (e.g., core operating frequency, task distribution) towards achieving the goal. The software supervisor allows for flexibility, while the hardware learners allow quick and efficient optimization. We evaluate a simulation-based implementation of SOSA and demonstrate SOSA's ability to manage multiple interacting resources in the presence of conflicting objectives, its efficiency in configuring knobs, and adaptability in the face of unpredictable workloads.
Executing a combination of machine-learning kernels and microbenchmarks on a multicore system-on-a-chip, SOSA achieves target performance with less than 1% error starting with an untrained model, maintains the performance in the face of workload disturbance, and automatically adapts to changing constraints at runtime. We also demonstrate the resource manager with a hardware implementation on an FPGA.
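The goal translation that SOSA's supervisor performs can be sketched as a simple mapping from a system-level goal to per-subsystem objectives handed to the low-level learners. The goal names, even partitioning scheme, and targets below are illustrative assumptions.

```python
# Sketch of supervisor-style goal translation: a system-level goal
# becomes one (metric, target) objective per subsystem, which the
# low-level learners then optimize. Goal names and the even
# partitioning below are illustrative assumptions.

def translate_goal(goal, n_subsystems, total_ips=None, power_cap=None):
    """Return one (metric, target) objective per subsystem."""
    if goal == "max_performance":
        # Split the chip-level IPS target evenly across subsystems.
        return [("ips", total_ips / n_subsystems)] * n_subsystems
    if goal == "min_power":
        # Give each subsystem an equal share of the power budget.
        return [("power", power_cap / n_subsystems)] * n_subsystems
    raise ValueError(f"unknown goal: {goal}")
```

When a thermal event forces a switch from `"max_performance"` to `"min_power"`, only the supervisor's translation changes; the leaf controllers keep learning against whatever objective they are handed.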
Studies have shown memory needs vary significantly across applications. Recent work has explored using hybrid memory technology (SRAM+NVM) in on-chip memories of multicore processors (CMPs) to support the varied needs of diverse workloads. Such works suggest architectural modifications that require supplemental management in the memory hierarchy. Instead, we propose to deploy hybrid memory in a manner that integrates seamlessly with the existing heterogeneous multicore (HMP) architectural model, and therefore does not require any supplemental management, simply the integration of different memory technologies on-chip. We evaluate platforms with a combination of fast (SRAM cache) and slow (STT-MRAM cache) core-types for mobile workloads.
Resource management strategies for many-core systems need to enable sharing of resources such as power, processing cores, and memory bandwidth while coordinating the priority and significance of system- and application-level objectives at runtime in a scalable and robust manner. State-of-the-art approaches use heuristics or machine learning for resource management, but unfortunately lack formalism in providing robustness against unexpected corner cases. While recent efforts deploy classical control-theoretic approaches with some guarantees and formalism, they lack scalability and autonomy to meet changing runtime goals. We present SPECTR, a new resource management approach for many-core systems that leverages formal supervisory control theory (SCT) to combine the strengths of classical control theory with state-of-the-art heuristic approaches to efficiently meet changing runtime goals. SPECTR is a scalable and robust control architecture and a systematic design flow for hierarchical control of many-core systems. SPECTR leverages SCT techniques such as gain scheduling to allow autonomy for individual controllers. It facilitates automatic synthesis of the high-level supervisory controller and its property verification. We implement SPECTR on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that SPECTR's use of SCT is key to managing multiple interacting resources (e.g., chip power and processing cores) in the presence of competing objectives (e.g., satisfying QoS vs. power capping). The principles of SPECTR are easily applicable to any resource type and objective as long as the management problem can be modeled using dynamical systems theory (e.g., difference equations), discrete-event dynamic systems, or fuzzy dynamics.
This paper deals with challenges and possible solutions for incorporating self-awareness principles in EDA design flows for autonomous systems. We present a holistic approach that enables self-awareness across the software/hardware stack, from systems-on-chip to systems-of-systems (autonomous car) contexts. We use the Information Processing Factory (IPF) metaphor as an exemplar to show how self-awareness can be achieved across multiple abstraction levels, and discuss new research challenges. The IPF approach represents a paradigm shift in platform design by envisioning the move towards a consequent platform-centric design in which the combination of self-organizing learning and formal reactive methods guarantee the applicability of such cyber-physical systems in safety-critical and high-availability applications.
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal- or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore under-optimize DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the accuracy of the controller over a static linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power improves by 2x while tracking accuracy improves by up to 12%.
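The gain-scheduling idea above can be sketched as a set of linear PI controllers, each tuned for one frequency region, with a scheduler selecting gains based on the current operating point. The region boundaries, gain values, and controller structure below are made-up illustrative values, not the paper's tuned design.

```python
# Hedged sketch of a gain-scheduled power controller: several linear
# PI gain pairs, each covering one frequency region, selected at
# runtime from the current operating point. Regions and gains are
# illustrative assumptions.

class GainScheduledPI:
    def __init__(self, schedule):
        # schedule: list of (freq_upper_bound_MHz, kp, ki) entries,
        # sorted by bound; the last entry covers everything above.
        self.schedule = schedule
        self.integral = 0.0

    def gains(self, freq_mhz):
        """Pick the gain pair for the current frequency region."""
        for bound, kp, ki in self.schedule:
            if freq_mhz <= bound:
                return kp, ki
        return self.schedule[-1][1], self.schedule[-1][2]

    def step(self, target_power, measured_power, freq_mhz):
        """Return a frequency adjustment (MHz) toward target power."""
        err = target_power - measured_power
        kp, ki = self.gains(freq_mhz)
        self.integral += err
        return kp * err + ki * self.integral
```

Because each region uses gains tuned to the local frequency-power slope, the scheduler approximates the nonlinear plant with piecewise-linear control while keeping each individual controller simple and static.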
Traditional approaches for managing software-programmable memories (SPMs) do not support sharing of distributed on-chip memory resources and, consequently, miss the opportunity to better utilize those memory resources. Managing on-chip memory resources in many-core embedded systems with distributed SPMs requires runtime support to share memory resources between various threads with different memory demands running concurrently. Runtime SPM managers cannot rely on prior knowledge about the dynamically changing mix of threads that will execute and therefore should be designed in a way that enables SPM allocations for any unpredictable mix of threads contending for on-chip memory space. This article proposes ShaVe-ICE, an operating-system-level solution, along with hardware support, to virtualize and ultimately share SPM resources across a many-core embedded system to reduce the average memory latency. We present a number of simple allocation policies to improve performance and energy. Experimental results show that sharing SPMs could reduce the average execution time of the workload up to 19.5% and reduce the dynamic energy consumed in the memory subsystem up to 14%.
Heterogeneous Multiprocessors (HMPs) are becoming pervasive in current modern embedded platforms (e.g., mobile devices). These platforms often provide better power-performance tradeoffs than their homogeneous predecessors; however, novel and intelligent resource management policies are required to manage the added complexity of heterogeneous platforms and exploit their power-performance benefits. In this paper we propose PoliCym, a framework for prototyping, validating, and deploying resource management policies for heterogeneous platforms. PoliCym provides two main benefits to resource management policy developers and to the research community: 1) a trace-based offline simulator allows policies to be quickly prototyped, debugged, and validated on top of arbitrary platform configurations; and 2) a light-weight sensing-actuation interface allows the same policies to be efficiently deployed on top of Linux-based systems without the need for implementation changes or additional development cycles. We evaluate our light-weight interface in terms of overhead and validate the PoliCym offline simulator for an ARM big.LITTLE based HMP platform running Linux.
Distributed Scratchpad Memories (SPMs) in embedded many-core systems require careful selection of data placement to achieve good performance. Applications mapped to these platforms have varying memory requirements based on their runtime behavior, resulting in under- or overutilization of the local SPMs. We propose SPMPool to share the available on-chip SPMs on many-cores among concurrently executing applications in order to reduce the overall memory access latency. By pooling SPM resources, we can assign underutilized memory resources, due to idle cores or low memory usage, to applications dynamically. SPMPool is the first workload-aware SPM mapping solution for many-cores that dynamically allocates data at runtime—using profiled data—to address the unpredictable set of concurrently executing applications. Our experiments on workloads with varying interapplication memory intensity show that SPMPool can achieve up to 76% reduction in memory access latency for configurations ranging from 16 to 256 cores, compared to the traditional approach that limits executing cores to use their local SPMs.
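The pooling idea behind SPMPool can be sketched as a greedy allocator: free SPM pages anywhere on chip are granted to the most memory-intensive applications first, so the pages that save the most off-chip accesses end up on-chip. The request format and intensity score below are illustrative assumptions, not SPMPool's actual cost model.

```python
# Illustrative sketch of pooled SPM allocation: free pages anywhere
# on chip go to the applications with the highest memory intensity
# first. Request format and intensity score are assumptions.

def allocate_spm(free_pages, requests):
    """
    free_pages: total SPM pages available in the shared pool.
    requests: dict app -> (pages_wanted, accesses_per_page), with
              accesses_per_page acting as a memory-intensity score.
    Returns dict app -> pages granted.
    """
    grants = {app: 0 for app in requests}
    # Serve the most memory-intensive applications first so that the
    # pages saving the most off-chip accesses are placed on-chip.
    order = sorted(requests, key=lambda a: requests[a][1], reverse=True)
    for app in order:
        wanted, _intensity = requests[app]
        grant = min(wanted, free_pages)
        grants[app] = grant
        free_pages -= grant
        if free_pages == 0:
            break
    return grants
```

In contrast, the traditional baseline pins each core to its local SPM, leaving pages of idle or low-intensity cores stranded; pooling lets those pages absorb demand from memory-hungry neighbors.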
To meet the performance and energy efficiency demands of emerging complex and variable workloads, heterogeneous manycore architectures are increasingly being deployed, necessitating operating systems support for adaptive task allocation to efficiently exploit this heterogeneity in the face of unpredictable workloads. We present SPARTA, a throughput-aware runtime task allocation approach that achieves energy efficiency on heterogeneous manycore platforms (HMPs). SPARTA collects sensor data to characterize tasks at runtime and uses this information to prioritize tasks when performing allocation in order to maximize energy-efficiency (instructions-per-Joule) without sacrificing performance. Our experimental results on heterogeneous manycore architectures executing mixes of MiBench and PARSEC benchmarks demonstrate energy reductions of up to 23% when compared to state-of-the-art alternatives. SPARTA is also scalable with low overhead, enabling energy savings in large-scale architectures with up to hundreds of cores.
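The throughput-aware prioritization above can be sketched as follows: tasks are ranked by their runtime-sensed IPC, and the highest-IPC tasks claim big cores first, since they convert big-core power into the most instructions per Joule. The sensing granularity and two-core-type model are illustrative assumptions, not SPARTA's full policy.

```python
# Sketch of throughput-aware allocation in the spirit of SPARTA:
# runtime-sensed task IPC determines which core type each task gets.
# The two-core-type model and ranking heuristic are assumptions.

def allocate_tasks(tasks, big_slots, little_slots):
    """
    tasks: dict task_name -> sensed IPC (higher = benefits more
    from a big core). Returns dict task_name -> "big" or "little".
    """
    placement = {}
    # High-IPC tasks turn big-core power into the most instructions
    # per Joule, so they claim big cores first.
    for name in sorted(tasks, key=tasks.get, reverse=True):
        if big_slots > 0:
            placement[name] = "big"
            big_slots -= 1
        elif little_slots > 0:
            placement[name] = "little"
            little_slots -= 1
    return placement
```

Re-running this allocation periodically with fresh sensor readings lets the placement track phase changes in the workload without any offline profiling.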
Many multimedia applications exhibit phasic behavior. Studies of application phasic behavior have focused primarily on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases to enable more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases, and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases is used to prioritize different memory pages in a multi-core platform without having any prior knowledge about running applications. By exploiting memory-phasic behavior, we achieved up to 45% memory access latency improvement on a set of multimedia applications.
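One light-weight way to detect memory phases, as a hedged sketch of the idea above, is to compare the page-access histogram of the current sampling window against the previous one and declare a new phase when their difference crosses a threshold. The distance metric, window representation, and threshold are illustrative assumptions, not the paper's exact mechanism.

```python
# Sketch of online memory-phase detection: flag a new phase when the
# page-access histogram of the current window diverges from the
# previous window. Metric and threshold are illustrative assumptions.

def histogram_distance(prev, curr):
    """Normalized Manhattan distance between two access histograms
    (dicts mapping page id -> access count); 0 = identical,
    1 = completely disjoint access patterns."""
    pages = set(prev) | set(curr)
    total = sum(prev.values()) + sum(curr.values())
    if total == 0:
        return 0.0
    diff = sum(abs(prev.get(p, 0) - curr.get(p, 0)) for p in pages)
    return diff / total

def is_new_phase(prev_window, curr_window, threshold=0.5):
    """Detect a memory-phase boundary between sampling windows."""
    return histogram_distance(prev_window, curr_window) > threshold
```

On a phase boundary, the SPM manager would re-rank pages by their access counts in the new window, promoting the now-hot pages on-chip without needing any prior knowledge of the application.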