Studies have shown memory needs vary significantly across applications. Recent work has explored using hybrid memory technology (SRAM+NVM) in on-chip memories of multicore processors (CMPs) to support the varied needs of diverse workloads. Such works suggest architectural modifications that require supplemental management in the memory hierarchy. Instead, we propose to deploy hybrid memory in a manner that integrates seamlessly with the existing heterogeneous multicore (HMP) architectural model, and therefore does not require any supplemental management, simply the integration of different memory technologies on-chip. We evaluate platforms with a combination of/ast (SRAM cache) and slow (STT-MRAM cache) core-types for mobile workloads.
Resource management strategies for many-core systems need to enable sharing of resources such as power, processing cores, and memory bandwidth while coordinating the priority and significance of system- and application-level objectives at runtime in a scalable and robust manner. State-of-the-art approaches use heuristics or machine learning for resource management, but unfortunately lack formalism in providing robustness against unexpected corner cases. While recent efforts deploy classical control-theoretic approaches with some guarantees and formalism, they lack scalability and autonomy to meet changing runtime goals. We present SPECTR, a new resource management approach for many-core systems that leverages formal supervisory control theory (SCT) to combine the strengths of classical control theory with state-of-the-art heuristic approaches to efficiently meet changing runtime goals. SPECTR is a scalable and robust control architecture and a systematic design flow for hierarchical control of many-core systems. SPECTR leverages SCT techniques such as gain scheduling to allow autonomy for individual controllers. It facilitates automatic synthesis of the high-level supervisory controller and its property verification. We implement SPECTR on an Exynos platform containing ARM»s big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that SPECTR»s use of SCT is key to managing multiple interacting resources (e.g., chip power and processing cores) in the presence of competing objectives (e.g., satisfying QoS vs. power capping). The principles of SPECTR are easily applicable to any resource type and objective as long as the management problem can be modeled using dynamical systems theory (e.g., difference equations), discrete-event dynamic systems, or fuzzy dynamics.
This paper deals with challenges and possible solutions for incorporating self-awareness principles in EDA design flows for autonomous systems. We present a holistic approach that enables self-awareness across the software/hardware stack, from systems-on-chip to systems-of-systems (autonomous car) contexts. We use the Information Processing Factory (IPF) metaphor as an exemplar to show how self-awareness can be achieved across multiple abstraction levels, and discuss new research challenges. The IPF approach represents a paradigm shift in platform design by envisioning the move towards a consequent platform-centric design in which the combination of self-organizing learning and formal reactive methods guarantee the applicability of such cyber-physical systems in safety-critical and high-availability applications.
Dynamic voltage and frequency scaling (DVFS) is a well-established technique for power management of thermal-or energy-sensitive chip multiprocessors (CMPs). In this context, linear control theoretic solutions have been successfully implemented to control the voltage-frequency knobs. However, modern CMPs with a large range of operating frequencies and multiple voltage levels display nonlinear behavior in the relationship between frequency and power. State-of-the-art linear controllers therefore under-optimize DVFS operation. We propose a Gain Scheduled Controller (GSC) for nonlinear runtime power management of CMPs that simplifies the controller implementation of systems with varying dynamic properties by utilizing an adaptive control theoretic approach in conjunction with static linear controllers. Our design improves the accuracy of the controller over a static linear controller with minimal overhead. We implement our approach on an Exynos platform containing ARM's big.LITTLE-based heterogeneous multi-processor (HMP) and demonstrate that the system's response to changes in target power is improved by 2x while operating up to 12% more efficiently for tracking accuracy.
Traditional approaches for managing software-programmable memories (SPMs) do not support sharing of distributed on-chip memory resources and, consequently, miss the opportunity to better utilize those memory resources. Managing on-chip memory resources in many-core embedded systems with distributed SPMs requires runtime support to share memory resources between various threads with different memory demands running concurrently. Runtime SPM managers cannot rely on prior knowledge about the dynamically changing mix of threads that will execute and therefore should be designed in a way that enables SPM allocations for any unpredictable mix of threads contending for on-chip memory space. This article proposes ShaVe-ICE, an operating-system-level solution, along with hardware support, to virtualize and ultimately share SPM resources across a many-core embedded system to reduce the average memory latency. We present a number of simple allocation policies to improve performance and energy. Experimental results show that sharing SPMs could reduce the average execution time of the workload up to 19.5% and reduce the dynamic energy consumed in the memory subsystem up to 14%.
Heterogeneous Multiprocessors (HMPs) are becoming pervasive in current modern embedded platforms (e.g. mobile devices). These platforms often provide better power-performance tradeoffs than their homogeneous predecessors; however, novel and intelligent resource management policies are required to manage the added complexity of heterogeneous platforms and exploit their power-performance benefits. In this paper we propose PoliCym, a framework for the prototyping, validating, and deploying resource management policies for heterogeneous platforms. PoliCym provides two main benefits to resource management policy developers and to the research community: 1) a trace-based offline simulator allows policies to be quickly prototyped, debugged, and validated on top of arbitrary platform configurations; and 2) a light-weight sensing-actuation interface allows the same policies to be efficiently deployed on top of Linux-based systems without the need for implementation changes or additional development cycles. We evaluate our light-weight interface in terms of overhead and validate the PoliCym offline simulator for an ARM big.LITTLE based HMP platform running Linux.
Distributed Scratchpad Memories (SPMs) in embedded many-core systems require careful selection of data placement to achieve good performance. Applications mapped to these platforms have varying memory requirements based on their runtime behavior, resulting in under- or overutilization of the local SPMs. We propose SPMPool to share the available on-chip SPMs on many-cores among concurrently executing applications in order to reduce the overall memory access latency. By pooling SPM resources, we can assign underutilized memory resources, due to idle cores or low memory usage, to applications dynamically. SPMPool is the first workload-aware SPM mapping solution for many-cores that dynamically allocates data at runtime—using profiled data—to address the unpredictable set of concurrently executing applications. Our experiments on workloads with varying interapplication memory intensity show that SPMPool can achieve up to 76% reduction in memory access latency for configurations ranging from 16 to 256 cores, compared to the traditional approach that limits executing cores to use their local SPMs.
To meet the performance and energy efficiency demands of emerging complex and variable workloads, heterogeneous manycore architectures are increasingly being deployed, necessitating operating systems support for adaptive task allocation to efficiently exploit this heterogeneity in the face of unpredictable workloads. We present SPARTA, a throughput-aware runtime task allocation approach for Heterogeneous manycore Platforms (HMPs) to achieve energy efficiency. SPARTA collects sensor data to characterize tasks at runtime and uses this information to prioritize tasks when performing allocation in order to maximize energy-efficiency (instructions-per-Joule) without sacrificing performance. Our experimental results on heterogeneous manycore architectures executing mixes of MiBench and PARSEC benchmarks demonstrate energy reductions of up to 23% when compared to state-of-the-art alternatives. SPARTA is also scalable with low overhead, enabling energy savings in large-scale architectures with up to hundreds of cores.
Many multimedia applications exhibit a phasic behavior. Phasic behavior of applications has been studied primarily focused on code execution. However, temporal variation in an application's memory usage can deviate from its program behavior, providing opportunities to exploit these memory phases to enable more efficient use of on-chip memory resources. In this work, we define memory phases as opposed to program phases, and illustrate the potential disparity between them. We propose mechanisms for light-weight online memory-phase detection. Additionally, we demonstrate their utility by deploying these techniques for sharing distributed on-chip Scratchpad Memories (SPMs) in multi-core platforms. The information gathered during memory phases are used to prioritize different memory pages in a multi-core platform without having any prior knowledge about running applications. By exploiting memory-phasic behavior, we achieved up to 45% memory access latency improvement on a set of multimedia applications.