A System for Flexible Parallel Execution [abstract] (PDF)
Arun Raman
Ph.D. Thesis, Department of Electrical Engineering, Princeton University, December 2011.

Exponential growth in transistor density combined with diminishing returns from uniprocessor improvements has compelled the industry to transition to multicore architectures. To realize the performance potential of multicore architectures, programs must be parallelized effectively. The efficiency of parallel program execution depends on the execution environment comprised of workload, platform, and performance goal. In writing parallel programs, most programmers and compilers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmer's or compiler's development-time or compile-time choices suboptimal.

This dissertation presents Parcae, a generally applicable holistic system for platform-wide dynamic parallelism tuning. Parcae includes:
1. the Nona compiler, which applies a variety of auto-parallelization techniques to create flexible parallel programs whose tasks can be efficiently paused, reconfigured, and resumed during execution;
2. the Decima monitor, which measures resource availability and system performance to detect change in the environment; and
3. the Morta executor, which cuts short the life of executing tasks, replacing them with other functionally equivalent tasks better suited to the current environment.

Parallel programs made flexible by Parcae outperform original parallel implementations in a variety of interesting scenarios.