Static and Dynamic Instruction Mapping for Spatial Architectures [abstract] (PDF)
Feng Liu
Ph.D. Thesis, Department of Electrical Engineering, Princeton University, 2018.

In response to the technology scaling trends, spatial architectures have emerged as a new style of processors for executing programs more efficiently. Unlike traditional out-of-order (OoO) processors, which time-share a small set of functional units, a spatial computer is composed of hundreds or even thousands of simple and replicated functional units. Spatial architectures avoid the overheads of time-sharing and of generating schedules repeatedly, by mapping instruction sequences onto the functional units explicitly and reusing the map- ping across multiple invocations.

Currently, spatial architectures mainly use static methods to map and schedule instructions onto the arrays of functional units. The existing methods have several limitations: First, for programs with irregular memory accesses and control flows, they yield poor performance because the functional units need to be invoked sequentially to respect data and control dependences. Second, static methods cannot fully exploit speculation techniques, which are the dominant performance sources in OoO processors. Finally, static methods cannot adapt to changing workloads and are not compatible across hardware generations.

To address these issues and improve the applicability of spatial architectures, this dissertation proposes two techniques. The first, Coarse-Grained Pipelined Accelerators (CGPA), is a static compiling framework that exploits the hidden parallelism within irregular C/C++ loops and translates them into spatial hardware modules. The proposed technique has been implemented as a compiler pass and the experiment shows 3.3x speedup over the performance achieved by an open-source tool baseline.

The second technique, Dynamic Spatial Architecture Mapping (DYNASPAM), reuses the speculation system in the OoO processors to dynamically produce high performance scheduling and execution on a dedicated spatial fabric. The proposed technique is modeled by a cycle accurate simulator and the experiment shows the new technique can achieve 1.4x geomean performance improvement and 23.9% energy consumption reduction, compared to an aggressive OoO processor baseline.