Scalable Speculative Parallelization on Commodity Clusters
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, and David I. August
Proceedings of the 43rd IEEE/ACM International Symposium on
Microarchitecture (MICRO), December 2010.
Highest ranked paper in double-blind review process.
While clusters of commodity servers and switches are the most
popular form of large-scale parallel computers, many programs are
not easily parallelized for execution upon them. In particular,
high inter-node communication cost and lack of globally shared
memory appear to make clusters suitable only for server applications
with abundant task-level parallelism and scientific applications
with regular and independent units of work. Clever use of pipeline
parallelism (DSWP), thread-level speculation (TLS), and speculative
pipeline parallelism (Spec-DSWP) can mitigate the costs of
inter-thread communication on shared memory multicore machines.
This paper presents Distributed Software Multi-threaded
Transactional memory (DSMTX), a runtime system which makes these
techniques applicable to non-shared memory clusters, allowing them
to efficiently address inter-node communication costs. Initial
results suggest that DSMTX enables efficient cluster execution of a
wider set of application types. For 11 sequential C programs
parallelized for a 32-node cluster with 4 cores per node (128 cores
in total) and no shared memory, DSMTX achieves a geomean speedup of
49x. This
compares favorably to the 15x speedup achieved by our
implementation of TLS-only support for clusters.
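
To put the reported numbers in perspective, the short sketch below converts the two geomean speedups into parallel efficiency (speedup divided by core count) on the 128-core cluster. It uses only figures stated in the abstract; the function and variable names are illustrative and do not come from the paper, and dividing a geomean speedup by the core count gives only a rough aggregate, not a per-benchmark efficiency.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return the fraction of ideal linear speedup achieved on `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Cluster configuration as stated in the abstract.
NODES = 32
CORES_PER_NODE = 4
TOTAL_CORES = NODES * CORES_PER_NODE

# Geomean speedups over sequential execution, as stated in the abstract.
DSMTX_SPEEDUP = 49.0
TLS_ONLY_SPEEDUP = 15.0

dsmtx_eff = parallel_efficiency(DSMTX_SPEEDUP, TOTAL_CORES)
tls_eff = parallel_efficiency(TLS_ONLY_SPEEDUP, TOTAL_CORES)
relative_gain = DSMTX_SPEEDUP / TLS_ONLY_SPEEDUP

print(f"Total cores:          {TOTAL_CORES}")
print(f"DSMTX efficiency:     {dsmtx_eff:.1%}")
print(f"TLS-only efficiency:  {tls_eff:.1%}")
print(f"DSMTX vs. TLS-only:   {relative_gain:.2f}x")
```

Under these assumptions, DSMTX reaches roughly 38% of ideal linear scaling versus roughly 12% for the TLS-only implementation, about a 3.3x difference.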