Projects

Cellgen

The main software artifact of my Ph.D. dissertation work was a source-to-source compiler for shared-memory abstractions on the Cell processor. Cellgen accepted OpenMP-like source code and produced high-performance code that executed on the SPEs (special vector processors that have their own local stores and reach main memory only through explicit data transfers). This required significant static analysis of the user code to determine how data was used, so that Cellgen could generate the transfers to and from the SPEs needed for both performance and correctness.

source: github.com/scotts/cellgen

Papers

Streamflow

A scalable memory allocator that is a drop-in replacement for standard memory allocation in C and C++ programs; it works by overriding malloc() and free(). Streamflow avoids synchronization across threads as much as possible, and when synchronization is unavoidable it uses only lock-free techniques. The common allocation path avoids synchronization by combining per-thread private heaps with a lock-free remote free list. If a thread needs to allocate memory and has enough free memory to satisfy the request, it only touches data structures that it owns. If a thread frees memory that it allocated itself, it likewise only touches data structures that it owns. The tricky case is when a thread frees memory that a different thread allocated. That memory is added to the allocating thread’s remote free list in a lock-free manner, but the allocating thread only touches that list when it does not have enough free memory of its own.

Hence, even under high allocation pressure with threads using memory allocated by other threads, threads are unlikely to interfere with each other.
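The split between a private free list and a remote free list can be sketched in a few lines of C++. This is only an illustration of the idea described above, not Streamflow's actual code: the names Block, ThreadHeap, free_local and free_remote are hypothetical, and the sketch ignores size classes, page management and everything else a real allocator needs.

```cpp
#include <atomic>

// Hypothetical free-block header; a real allocator stores this inside the freed memory.
struct Block {
    Block* next = nullptr;
};

// Hypothetical per-thread heap: one owner thread, any number of remote freeing threads.
class ThreadHeap {
public:
    // Owner thread only. Serves requests from the private list and looks at
    // the remote list only when the private list is empty.
    Block* allocate() {
        if (local_ == nullptr) {
            // Claim the entire remote list with a single atomic exchange.
            local_ = remote_.exchange(nullptr, std::memory_order_acquire);
        }
        Block* block = local_;
        if (block != nullptr) {
            local_ = block->next;
            block->next = nullptr;
        }
        return block;  // nullptr: no free memory, fresh memory must come from elsewhere
    }

    // Owner thread only. No atomics are needed for memory the owner frees itself.
    void free_local(Block* block) {
        block->next = local_;
        local_ = block;
    }

    // Any other thread. Lock-free push onto the owner's remote list.
    void free_remote(Block* block) {
        Block* head = remote_.load(std::memory_order_relaxed);
        do {
            block->next = head;
        } while (!remote_.compare_exchange_weak(head, block,
                                                std::memory_order_release,
                                                std::memory_order_relaxed));
    }

private:
    Block* local_ = nullptr;               // touched only by the owner
    std::atomic<Block*> remote_{nullptr};  // shared, lock-free
};
```

In this sketch remote threads only ever push single blocks and the owner only ever takes the whole list at once, so the compare-and-swap loop never has to pop an individual node. That is one simple way to sidestep the ABA problem; whether Streamflow does the same is not stated here.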

source: github.com/scotts/streamflow

As part of our evaluation of Streamflow, we implemented Maged Michael’s lock-free memory allocator as described in his PLDI 2004 paper, “Scalable Lock-Free Dynamic Memory Allocation”. Our implementation has been tested on Linux with x86 and PowerPC architectures.

source: github.com/scotts/michael

Papers

Factory

The main software artifact of my Master’s thesis work was a C++ framework for task and data parallelism. The semantics were heavily inspired by Cilk, although our runtime was implemented purely as a library. Intel implemented a similar idea with Threading Building Blocks, although their semantics are richer and reach a higher level of abstraction.
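For readers unfamiliar with the Cilk model, the following is a minimal sketch of spawn/sync-style task parallelism written with nothing but the C++ standard library. It does not use or reproduce Factory's API; fib_serial, fib_parallel and the cutoff of 16 are hypothetical choices for illustration, and std::async stands in for a real task scheduler.

```cpp
#include <future>

// Plain recursive Fibonacci, used below the cutoff where spawning is not worth it.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version: one branch runs as a separate task ("spawn"),
// the other runs in the current thread, and get() waits for the task ("sync").
long fib_parallel(int n) {
    if (n < 16) {  // assumed cutoff to keep the number of threads small
        return fib_serial(n);
    }
    std::future<long> left = std::async(std::launch::async, fib_parallel, n - 1);
    long right = fib_parallel(n - 2);
    return left.get() + right;
}
```

Each std::async call here may start a new thread, which is far more expensive than a task in a work-stealing runtime such as Cilk's; the sketch shows the programming model only, not how an efficient library runtime schedules the tasks.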

source: factory.tar.gz

Papers