A System for Flexible Parallel Execution
Arun Raman
Ph.D. Thesis, Department of Electrical Engineering,
Princeton University, December 2011.
Exponential growth in transistor density combined with diminishing returns from uniprocessor improvements has compelled the industry to transition to multicore architectures. To realize the performance potential of multicore architectures, programs must be parallelized effectively. The efficiency of parallel program execution depends on the execution environment, which comprises the workload, the platform, and the performance goal. In writing parallel programs, most programmers and compilers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmer's development-time choices or the compiler's compile-time choices suboptimal.
This dissertation presents Parcae, a generally applicable holistic system for platform-wide dynamic parallelism tuning. Parcae includes:
1. the Nona compiler, which applies a variety of auto-parallelization techniques to create flexible parallel programs whose tasks can be efficiently paused, reconfigured, and resumed during execution;
2. the Decima monitor, which measures resource availability and system performance to detect change in the environment; and
3. the Morta executor, which cuts short the life of executing tasks, replacing them with other functionally equivalent tasks better suited to the current environment.
Parallel programs made flexible by Parcae outperform original parallel implementations in a variety of interesting scenarios.
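
The pause, reconfigure, and resume cycle described above can be illustrated with a toy sketch. This is not Parcae's implementation or API: the names `run_flexible`, `choose_workers`, and `toy_policy` are hypothetical, and the sketch assumes a simple loop whose iterations are independent. Chunk boundaries stand in for the points at which tasks can be paused, the `choose_workers` callback stands in for a monitor's decision, and swapping the thread pool stands in for an executor replacing tasks with an equivalent configuration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_flexible(
    items: Sequence[int],
    work: Callable[[int], int],
    choose_workers: Callable[[int, int], int],
    chunk_size: int = 4,
    initial_workers: int = 1,
) -> Tuple[List[int], List[int]]:
    """Process items chunk by chunk, re-deciding the worker count between chunks.

    Returns the results (in input order) and the worker count used per chunk.
    """
    results: List[int] = []
    history: List[int] = []
    workers = initial_workers
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        history.append(workers)
        # "Resume": run the next chunk under the current configuration.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(work, chunk))
        # "Pause" point: the chunk is finished and its pool has shut down.
        # "Reconfigure": ask the (hypothetical) policy for a new worker count.
        workers = max(1, choose_workers(start + len(chunk), workers))
    return results, history


def toy_policy(done: int, current: int) -> int:
    # Hypothetical stand-in for a monitor: pretend that more cores
    # become available once 8 items have been processed.
    return 4 if done >= 8 else 1
```

Calling `run_flexible(list(range(12)), lambda x: x * x, toy_policy)` processes three chunks and switches from one worker to four before the last chunk. Only the degree of parallelism changes in this sketch; the system described in the abstract also replaces tasks with functionally equivalent ones, which this example does not attempt to model.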