EC2 on Tuesday, July 20th

09:00‑10:00 Session 1
Location: IF G.07A
09:00 Keshav Pingali (UT Austin)
Invited talk: Parallel Programming Needs New Foundations

When parallel programming started in the 1970s and 1980s, it was mostly an art: languages such as functional and logic programming languages were designed and appreciated mainly for their elegance and beauty. More recently, parallel programming has become engineering: imperative languages like FORTRAN and C++ have been extended with parallel constructs, and we now spend our time benchmarking and tweaking large programs that no one understands to obtain performance improvements of 5-10%. In spite of all this activity, we have few insights into how to write parallel programs that exploit the performance potential of multicore processors.

In this talk, I will argue that these problems arise largely from the limitations of the program-centric abstractions, such as dependence graphs, that we currently use to think about parallelism. I will then propose a novel data-centric abstraction called the operator formulation of algorithms, which reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in diverse applications, ranging from mesh generation/refinement/partitioning to SAT solvers, maxflow algorithms, stencil computations, and event-driven simulation. I will also show that the operator formulation supports a structural analysis of algorithms that can be exploited for efficient implementations. These considerations suggest that the operator formulation of algorithms might provide a useful foundation for a new theory of parallel programming.
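As a rough illustration of the operator formulation (a sketch, not code from the talk; Graph, Node, and applyOperator are hypothetical placeholders), the pattern is a worklist loop in which an operator is applied to each active element and may activate new ones:

```java
import java.util.Deque;

// Sketch of the operator formulation: "active" elements sit on a worklist,
// and an operator is applied to each one, reading and writing only that
// element's neighborhood in the graph.
final class AmorphousLoopSketch {
    interface Node {}

    interface Graph {
        // Apply the operator to one active element (e.g., re-triangulate a
        // bad mesh element); returns any newly activated elements.
        Iterable<Node> applyOperator(Node active);
    }

    // A sequential schedule of the loop. Iterations whose neighborhoods do
    // not overlap are independent, so a runtime can execute them in parallel
    // (speculatively, when overlaps are not known statically); that is the
    // amorphous data-parallelism referred to above.
    static void run(Graph g, Deque<Node> worklist) {
        while (!worklist.isEmpty()) {
            Node active = worklist.pop();
            for (Node activated : g.applyOperator(active)) {
                worklist.push(activated);   // operators may create new work
            }
        }
    }
}
```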

10:30‑11:30 Session 2
Location: IF G.07A
10:30 Thomas Henzinger, Anmol Tomar, Vasu Singh, Thomas Wies and Damien Zufferey
(EC)^2 in EC2

Cloud computing aims to give users virtually unlimited pay-per-use computing resources without the burden of managing the underlying infrastructure. We claim that, in order to realize the full potential of cloud computing, the user must be presented with a pricing model that offers flexibility at the requirements level, such as a choice between different degrees of execution speed, and the cloud provider must be presented with a programming model that offers flexibility at the execution level, such as a choice between different scheduling policies. In this paper, we present a framework called FlexPRICE (Flexible Provisioning of Resources in a Cloud Environment) in which, for each job, the user purchases a virtual computer with the desired speed and cost characteristics, and the cloud provider optimizes the utilization of resources across a stream of jobs from different users.
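As a minimal sketch of that requirements-level flexibility (illustrative only; the Offer type below is not the paper's interface), a user could simply pick the cheapest offered duration/price point that meets a deadline, leaving the provider free to realize it with whatever schedule it likes:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical user-side choice: pick among (duration, price) offers rather
// than among machines.
final class FlexPriceChoiceSketch {
    record Offer(double durationHours, double price) {}

    // The requirement is the deadline; the preference is the price. The
    // provider may schedule the job on any resources that honor the offer.
    static Optional<Offer> cheapestMeeting(List<Offer> offers, double deadlineHours) {
        return offers.stream()
                .filter(o -> o.durationHours() <= deadlineHours)
                .min(Comparator.comparingDouble(Offer::price));
    }
}
```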

10:50 Azadeh Farzan, Madhusudan Parthasarathy and Francesco Sorrentino
PENELOPE: Weaving Threads to Violate Atomicity

Testing concurrent programs is challenged by the interleaving explosion problem: the problem of exploring the large number of interleavings of a program, even under a single test input. Rather than trying all interleavings, we propose to test wisely: to exercise only those schedules that lead to interleavings that are typical error patterns. In particular, in this paper we select schedules that exercise patterns of interaction corresponding to atomicity violations. Given an execution of a program under a test harness, our technique algorithmically mines from the execution a small set of alternate schedules that cause atomicity violations. The program is then re-executed under these predicted atomicity-violating schedules and verified by the test harness. The salient feature of our tool is the efficient algorithmic prediction and synthesis of alternate schedules that cover all possible atomicity violations at program locations. We implement this testing framework in the tool Penelope and show that the monitoring, prediction, and rescheduling (with precise reproduction) are efficient and effective in finding bugs related to atomicity violations.
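For a flavor of the bug pattern such schedules target (an illustrative example, not code from the paper): each access below is individually synchronized, so the program is race-free, yet the check and the update form one logical step that another thread can split.

```java
// Two concurrent calls to withdraw(100) on a balance of 100 can interleave as
// T1.read -> T2.read -> T2.write -> T1.write, overdrawing the account. Only
// that specific schedule exposes the bug, which is why predicting
// atomicity-violating schedules is useful.
final class Account {
    private int balance = 100;

    private synchronized int read() { return balance; }
    private synchronized void write(int v) { balance = v; }

    void withdraw(int amount) {
        if (read() >= amount) {       // check
            write(read() - amount);   // act: not atomic with the check
        }
    }
}
```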

11:10 Khilan Gudka and Susan Eisenbach
Fast Multi-Level Locks for Java

Atomic sections guarantee atomic and isolated execution of a block of code. Transactional Memory can be used to implement them, but it cannot support system calls and has high overhead. Lock inference is a pessimistic alternative that infers the locks necessary to prevent thread interference. Our research looks at lock inference techniques for Java programs.

An important aspect of the performance of a lock inference approach is the efficiency of its runtime locks. In this paper, we describe an implementation of the multi-level locks of Gray et al. using Java's Synchronizer framework and present preliminary performance results for a number of workloads that perform varying proportions of fine-grained and coarse-grained operations. We compare our lock implementation against Java's ReentrantReadWriteLock and the STM algorithms TL2 and LSA.

For fine-grained workloads, we show that multi-level locks perform similarly to ReentrantReadWriteLock, but in workloads that mix fine-grained and coarse-grained data operations they achieve better performance (up to 11x against ReentrantReadWriteLock and 3x against STM).
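To illustrate the idea behind multi-level (multi-granularity) locking in the style of Gray et al. (a sketch of the classical scheme, not the paper's Synchronizer-based implementation, and omitting the SIX mode for brevity): before locking a fine-grained node, a thread acquires an intention mode on its ancestors, so a coarse-grained operation can lock a whole subtree in one step.

```java
// Classical multi-granularity lock modes: IS/IX are "intention" modes taken
// on ancestors; S/X are the actual shared/exclusive modes.
enum Mode { IS, IX, S, X }

final class MultiLevelLockSketch {
    // Compatibility matrix (Gray et al.), indexed by Mode ordinal:
    //            IS     IX     S      X
    static final boolean[][] COMPATIBLE = {
        /* IS */ { true,  true,  true,  false },
        /* IX */ { true,  true,  false, false },
        /* S  */ { true,  false, true,  false },
        /* X  */ { false, false, false, false },
    };

    static boolean compatible(Mode held, Mode requested) {
        return COMPATIBLE[held.ordinal()][requested.ordinal()];
    }
}
```

A fine-grained write takes IX on every ancestor and X on the leaf; a coarse-grained scan takes a single S at the root, and the matrix above is what lets both coexist without the scan enumerating every leaf.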

11:30‑12:30 Session 3
Location: IF G.07A
11:30 Challenge Problems for Concurrency Verification

14:00‑15:00 Session 4
Location: IF G.07A
14:00 Tim Harris (Microsoft Research Cambridge)
Invited talk: Programming models for the Barrelfish multi-kernel operating system

Barrelfish is a new research operating system being built from scratch in a collaboration between ETH Zurich in Switzerland and Microsoft Research Cambridge in the UK. We are exploring how to structure an OS for future multi-core systems. We are motivated by two closely related trends in hardware design: first, the rapidly growing number of cores, which leads to a scalability challenge, and, second, the increasing diversity in computer hardware, requiring the OS to manage and exploit heterogeneous hardware resources.

As part of the project, we are revisiting the interface between applications and the operating system: how applications invoke system services, how they express their resource requirements to the system, and how the system decides how to allocate cores and resources to the diverse mix of software running on a multi-core desktop system.

In this talk I'll introduce the system architecture that we're exploring with Barrelfish, and I'll discuss some of the challenges and opportunities it offers in terms of how programmers write efficient, correct code.

15:30‑16:30 Session 5
Location: IF G.07A
15:30 Konrad Siek and Paweł Wojciechowski
Statically Computing Upper Bounds on Object Calls for Pessimistic Concurrency Control

Atomic RMI is a library for pessimistic concurrency control in distributed systems that uses versioning algorithms. To achieve maximum efficiency, these algorithms require a priori knowledge about the execution of transactions, namely the maximum number of times specific objects will be accessed. This paper presents an algorithm and a tool that establish these upper bounds on object accesses through static program analysis.
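To show why such upper bounds help (a conceptual sketch; the Handle and RemoteRef names are illustrative and not Atomic RMI's API): if a transaction is known to call a remote object at most k times, the object can be released to the next transaction right after the k-th call rather than at commit.

```java
// Early release driven by a statically computed access bound.
final class UpperBoundSketch {
    interface RemoteRef { void call(); }

    static final class Handle {
        private final RemoteRef ref;
        private int remaining;   // upper bound from static analysis

        Handle(RemoteRef ref, int upperBound) {
            this.ref = ref;
            this.remaining = upperBound;
        }

        void call() {
            ref.call();
            if (--remaining == 0) {
                release();       // no further accesses possible: hand the
            }                    // object's version to the next transaction
        }

        private void release() { /* versioning-algorithm hand-off */ }
    }
}
```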

15:50 Chao Wang, Malay Ganai and Aarti Gupta
Symbolic Predictive Analysis to Expose Concurrency Errors at Runtime

Due to the inherent nondeterminism in thread scheduling, executing the same concurrent program may lead to different behaviors. This poses a significant challenge in testing: merely running the same test multiple times does not always increase coverage, and due to the often astronomically large number of interleavings, it is practically infeasible to inspect all of them. In this position paper, we advocate a selective search (rather than a random or exhaustive search) in which a small subset of high-risk interleavings is selected for error checking. The selection leverages our recent work on symbolic predictive analysis, considering an interleaving high-risk if it breaks some implicit assumption regarding concurrency control.
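One illustrative pattern of such an implicit assumption (an example of ours, not from the paper): the programmer assumes init() always runs before use(), but no synchronization enforces that order, so only the schedules that run use() first expose the crash; those are exactly the high-risk interleavings a selective search would try first.

```java
// An order violation: nothing forces init() to happen before use().
final class OrderViolation {
    private int[] buffer;    // set by the initializing thread

    void init() { buffer = new int[16]; }

    void use() {
        buffer[0] = 1;       // NullPointerException when scheduled before init()
    }
}
```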

16:10 Andras Salamon and Vashti Galpin
Performance loss between concept and keyboard

Standards bodies and commercial software vendors have defined parallel constructs to harness the parallelism in computations. Using the task graph model of parallel program execution, we show how common programming constructs that impose series-parallel task dependencies can lead to unbounded slowdown compared to the inherent parallelism in the algorithm. We describe various ways in which this slowdown can be avoided.
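As a small illustration of such a slowdown (our example, not the paper's): suppose the true dependencies form an "N", where b1 needs only a1 while b2 needs a1 and a2. A series-parallel (fork-join) program must put a barrier after {a1, a2}, so a slow a2 needlessly delays b1; chaining several such stages lets the delay grow with the number of stages, whereas a program that expresses only the true dependencies never over-waits.

```java
import java.util.concurrent.CompletableFuture;

final class SeriesParallelLossSketch {
    // Fork-join style: the barrier makes b1 wait for a2, a dependence the
    // algorithm itself does not have.
    static void barrierVersion(Runnable a1, Runnable a2, Runnable b1, Runnable b2) {
        CompletableFuture.allOf(
                CompletableFuture.runAsync(a1),
                CompletableFuture.runAsync(a2)).join();   // series-parallel barrier
        CompletableFuture.allOf(
                CompletableFuture.runAsync(b1),
                CompletableFuture.runAsync(b2)).join();
    }

    // DAG style: only the true dependencies are enforced, so b1 starts as
    // soon as a1 finishes, regardless of a2.
    static void dagVersion(Runnable a1, Runnable a2, Runnable b1, Runnable b2) {
        CompletableFuture<Void> fa1 = CompletableFuture.runAsync(a1);
        CompletableFuture<Void> fa2 = CompletableFuture.runAsync(a2);
        CompletableFuture<Void> fb1 = fa1.thenRunAsync(b1);
        CompletableFuture<Void> fb2 = CompletableFuture.allOf(fa1, fa2).thenRunAsync(b2);
        CompletableFuture.allOf(fb1, fb2).join();
    }
}
```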

16:30‑17:00 Session 6
Location: IF G.07A
16:30 Q-and-A session with all the speakers