Projects

Software that I have made significant contributions to:

IBM Streams Product Development

Most of my research on streaming systems uses IBM Streams. But not all of my work on Streams ends up being published research; some ends up as product features. In addition to many smaller features and bug fixes, I was a lead on the following features for IBM Streams:

  • User-Defined Parallelism: Adapted from our work on auto-parallelism (automatically parallelizing operators in a stream graph as a compiler optimization), user-defined parallelism is so called because users opt in to it through an annotation in the language. As in our research work, parallel expansion happens at submission time. This introduced, for the first time in the product, a distinction between the logical graph (what the programmer wrote, pre-expansion) and the physical graph (what actually executes, post-expansion).
  • Dynamic UDP: Allows users to change the degree of parallelism of a parallel region at runtime. This was the first time the product allowed runtime changes to a graph. Computing the sets of added, removed, and changed operators was a challenge that took several false starts; a conceptual sketch of that graph-diff step appears after this list.
  • Threading models: Allowing users to choose the threading model for sections of their application was a direct consequence of our research on dynamic and elastic scheduling. That research went directly into the product, but it also required us to implement a full language annotation that lets programmers choose among the available thread-scheduling options.
  • Exception ports: Users could already use a catch annotation to specify what should happen when one of their operators encounters an exception. Exception ports go further: users can request that, when an exception occurs, a tuple carrying the text of that exception is sent to a specially generated output port. This capability is general and can be applied to any kind of operator. (Note that this feature is in the product but not yet documented.)
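
To make the graph-diff problem behind Dynamic UDP concrete, here is a small C++ sketch under stated assumptions: it is not the product's implementation, and the names OperatorDesc, Graph, and diff are hypothetical. Given the physical graph before and after a change in parallel width, it computes which operators were added, removed, or changed.

    // Conceptual sketch (not the product implementation) of the graph-diff
    // step behind dynamic UDP: compare the physical graph before and after a
    // change in parallel width.
    #include <map>
    #include <set>
    #include <string>

    struct OperatorDesc {
        std::string kind;     // operator type
        int parallelChannel;  // which channel of the parallel region it is in
        bool operator==(const OperatorDesc& o) const {
            return kind == o.kind && parallelChannel == o.parallelChannel;
        }
    };

    using Graph = std::map<std::string, OperatorDesc>;  // name -> description

    struct GraphDiff {
        std::set<std::string> added, removed, changed;
    };

    GraphDiff diff(const Graph& before, const Graph& after) {
        GraphDiff d;
        for (const auto& [name, desc] : after) {
            auto it = before.find(name);
            if (it == before.end())         d.added.insert(name);
            else if (!(it->second == desc)) d.changed.insert(name);
        }
        for (const auto& [name, desc] : before) {
            if (after.find(name) == after.end()) d.removed.insert(name);
        }
        return d;
    }

A real physical graph carries far more state per operator (ports, connections, placement), but the added/removed/changed partition is the same idea.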

Sliding Window Aggregators

An aggregation is a computation that takes a set of data and produces a summary; an average is a common example. In a streaming context, aggregations are usually performed over constantly changing windows. A window is a finite set of data from an infinite stream. A sliding window incrementally admits new data as it arrives and ages out the oldest data. As new data arrives and old data ages out, the aggregation must be recomputed. Doing these recomputations efficiently is an area of active research. This repo contains reference implementations of sliding window aggregators from papers I co-authored, as well as from other researchers.
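
To give a flavor of what efficient recomputation looks like, here is a minimal C++ sketch of the classic two-stacks technique, specialized to max (an aggregation that, unlike sum, cannot be maintained by simply subtracting evicted values). This is a simplified illustration, not code from the repo, which implements these algorithms in a more general and more careful form.

    // Minimal sketch of the "two stacks" sliding-window aggregation technique
    // for an associative combine function, shown here with max.
    #include <algorithm>
    #include <cassert>
    #include <stack>

    struct TwoStacksMax {
        // Each entry stores the inserted value and the running aggregate of
        // everything at or below it in that stack.
        struct Entry { int value; int agg; };
        std::stack<Entry> front, back;

        void insert(int v) {
            int agg = back.empty() ? v : std::max(back.top().agg, v);
            back.push({v, agg});
        }

        void evict() {
            if (front.empty()) {
                // Flip the back stack into the front stack, recomputing
                // aggregates from the opposite direction.
                while (!back.empty()) {
                    int v = back.top().value;
                    back.pop();
                    int agg = front.empty() ? v : std::max(front.top().agg, v);
                    front.push({v, agg});
                }
            }
            assert(!front.empty());  // caller must not evict from an empty window
            front.pop();             // removes the oldest element in the window
        }

        int query() const {
            assert(!front.empty() || !back.empty());
            if (front.empty()) return back.top().agg;
            if (back.empty())  return front.top().agg;
            return std::max(front.top().agg, back.top().agg);
        }
    };

Both insert and evict are amortized constant time: each element is pushed and popped at most twice, so the window max never has to be recomputed from scratch.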

source: github.com/ibm/sliding-window-aggregators

Papers

Cellgen

The main software artifact of my Ph.D. dissertation work was a source-to-source compiler for shared-memory abstractions on the Cell processor. Cellgen accepted OpenMP-like source code and produced high-performance code that executed on the SPEs (specialized vector processors that sit outside the main memory hierarchy). Doing this required significant static analysis of the user code to determine how data was used, so that the compiler could generate the data transfers to and from the SPEs needed for both performance and correctness.
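
For a sense of the abstraction level involved, here is a standard OpenMP loop of the kind this programming model targeted. This is plain OpenMP for illustration only; Cellgen's actual directives and clauses differed. On the Cell, the compiler had to infer from such loops which arrays to stream into each SPE's local store via DMA, and when to transfer results back.

    // Plain OpenMP, shown only to illustrate the shared-memory abstraction;
    // Cellgen's own directives are not reproduced here. The compiler's job was
    // to turn array accesses like these into SPE local-store DMA transfers.
    #include <cstddef>

    void scale(float* out, const float* in, float factor, std::size_t n) {
        #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = in[i] * factor;
        }
    }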

source: github.com/scotts/cellgen

Papers

Streamflow

Streamflow is a scalable memory allocator that works as a drop-in replacement for standard memory allocation in C and C++ programs by overriding malloc() and free(). Streamflow avoids synchronization across threads as much as possible, and when synchronization is unavoidable, it uses only lock-free operations. The common allocation path avoids synchronization through per-thread private heaps; remote frees are handled with a lock-free remote free list. If a thread needs to allocate memory and has sufficient free memory to satisfy the request, it only touches data structures that it owns. If a thread frees memory that it allocated, it likewise only touches data structures that it owns. The tricky case is when a thread frees memory that a different thread allocated. That memory is added to the allocating thread's remote free list with lock-free operations, but the allocating thread only touches that list when it no longer has sufficient free memory of its own.

Hence, even under high allocation pressure, with threads using memory allocated by other threads, threads rarely interfere with each other.
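
As a conceptual illustration, and not Streamflow's actual data structures, here is a minimal C++ sketch of a lock-free remote free list built on an atomic push: any thread can push a remotely freed block with a compare-and-swap, and only the owning thread drains the list, all at once, when its private heap runs out of free memory.

    // Conceptual sketch of a lock-free remote free list (not Streamflow's
    // actual implementation). Remote frees push onto the owner's list with a
    // CAS loop; the owner drains the whole list in one atomic exchange.
    #include <atomic>

    struct Block {
        Block* next;
        // ... payload follows in a real allocator
    };

    struct RemoteFreeList {
        std::atomic<Block*> head{nullptr};

        // Called by any thread: lock-free push of a remotely freed block.
        void push(Block* b) {
            Block* old = head.load(std::memory_order_relaxed);
            do {
                b->next = old;
            } while (!head.compare_exchange_weak(old, b,
                                                 std::memory_order_release,
                                                 std::memory_order_relaxed));
        }

        // Called only by the owning thread: take everything at once.
        Block* drain() {
            return head.exchange(nullptr, std::memory_order_acquire);
        }
    };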

source: github.com/scotts/streamflow

As part of our evaluation of Streamflow, we implemented Maged Michael's lock-free memory allocator as described in his PLDI 2004 paper Scalable Lock-Free Dynamic Memory Allocation. Our implementation has been tested on Linux on both the x86 and PowerPC architectures.

source: github.com/scotts/michael

Papers

Factory

The main software artifact of my Master’s thesis work was a C++ framework for task and data parallelism. The semantics were heavily inspired by Cilk, although our runtime was implemented purely as a library. Intel implemented a similar idea with Threading Building Blocks, although their semantics are richer and sit at a higher level of abstraction.
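
Factory's own API is not shown here; as a rough, hypothetical analogue of Cilk-style spawn/sync semantics expressed purely as a library, here is the classic recursive example written with std::async from the C++ standard library.

    // Illustrative only: Cilk-style spawn/sync expressed as library calls,
    // using std::async rather than Factory's actual task types and scheduler.
    #include <future>

    long fib(long n) {
        if (n < 2) return n;
        // "Spawn" the first recursive call as a task; compute the second inline.
        auto left = std::async(std::launch::async, fib, n - 1);
        long right = fib(n - 2);
        return left.get() + right;  // "sync" before combining the results
    }

A real task-parallel runtime like Factory's schedules tasks onto a fixed pool of worker threads rather than potentially launching a thread per task as std::launch::async may do, so this snippet only conveys the programming model, not the performance.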

source: factory.tar.gz

Papers