Calendar

Sat, Feb 7
HPDSLs: Scala, LMS and Delite for High-Performance DSLs and Program Generators @ D
Feb 7 @ 8:30 am – 12:00 pm

This tutorial is targeted at researchers and practitioners interested in building efficient domain-specific languages (DSLs) and program generators. Lightweight Modular Staging (LMS) is a pragmatic approach to runtime code generation in Scala, and Delite is a compiler framework for embedded DSLs that simplifies the process of implementing DSLs for parallel computation and heterogeneous targets. This tutorial provides an overview of the technology stack, demonstrates use cases where it has been successfully applied, and guides attendees step by step through the creation of simple generators and DSLs.
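As a rough illustration of what a program generator does, the sketch below specializes a power function at generation time by unrolling the multiplication into straight-line code. It is written in plain Python rather than Scala, does not use LMS or Delite, and the name `gen_power` is hypothetical; it only shows the general idea of generating code at runtime and then running it.

```python
from typing import Callable, Tuple


def gen_power(n: int) -> Tuple[Callable[[float], float], str]:
    """Generate a function computing x**n with the loop unrolled.

    The exponent n is known when the code is generated, so the
    generated function contains no loop and no reference to n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # Build the unrolled expression, e.g. "x * x * x" for n == 3.
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace: dict = {}
    exec(source, namespace)  # compile the generated source at runtime
    return namespace[f"power_{n}"], source


if __name__ == "__main__":
    cube, src = gen_power(3)
    print(src)
    print(cube(2))
```

The generator runs once per exponent; the function it returns can then be called many times without paying for the loop or the exponent check.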

LLVM: An Intro to LLVM: IR, optimizations, backends and more @ San Ramon
Feb 7 @ 8:30 am – 5:30 pm

Topic Overview

  • High-level overview of LLVM & Clang
    • Will include how to get started coding on LLVM & Clang
    • Overview of core design elements, data structures, APIs, and patterns used in the codebase
    • High-level testing strategy for LLVM & Clang using tools like Clang’s ‘-verify’, opt, llc, FileCheck, and GoogleTest
    • Process of submitting a patch, code review, and community interactions
  • How to add an optimization pass to LLVM
    • Tutorial on the LLVM IR both in the abstract and at the level of internal APIs
    • Basic APIs and data structures needed to implement, test, and wire a new pass into the compiler.
    • Overview of the relationship between transform and analysis passes.
    • Overview of the different kinds of transformation passes, how they interact, and what they can and can’t do
    • Actually add a transformation pass and an analysis pass to the compiler that depend on each other and exercise this machinery.
      • Includes authoring relevant tests for each component
  • High-level overview of the architecture of an LLVM backend, with an emphasis on modifying or enhancing existing backends rather than adding a new one
    • Detailed review of where things are: from SelectionDAG to FastISel to the register allocator
    • Detailed review of exactly how a backend’s tablegen works, and how to make changes there and debug things
  • Add a target-independent SelectionDAG combine to the code generator
    • Includes a detailed walk-through of the relevant DAG combine interfaces.
  • Add a target-specific DAG combine with special consideration of legalization
  • Add support for a new instruction pattern to a backend
  • Every bit of performance matters, and how the LLVM coding standard helps here
Lunch @ New World Cafe
Feb 7 @ 12:00 pm – 2:00 pm

Salad

Assorted Mixed Greens with Poached Pear, Sweet Onion Mustard Dressing on the side.

Entrées

Chicken Breast with Mushrooms topped with Cream Sauce.

Salmon with Capers topped with Lemon-Butter Sauce.

Quinoa Confit

Veggie Moussaka

Dessert

Strawberry or Chocolate Mousse

Halide: Code generation for image processing and stencil computation in Halide @ A
Feb 7 @ 2:00 pm – 5:30 pm

This workshop will cover design and implementation of Halide, a domain-specific language and compiler for image processing and stencil computation, for people interested in using and building on it as a highly configurable code generator. As a language now in widespread production use, Halide is an interesting and high-impact platform for research on program transformation and code generation; as a language with explicit algebraic control over a wide range of loop synthesis and code generation strategies, it is a powerful backend for other languages and systems, especially those including stencil computation.

Topics:

  • The Halide programming model
  • Halide’s model of scheduling for loop synthesis
  • Examples of program transformation and synthesis via Halide schedules
  • Code generation in Halide
  • Mapping to the GPU and heterogeneous parallel execution via Halide schedules
  • Hands-on session with Halide, focussed on scheduling and code generation
Periscope: Code Auto-Tuning with the Periscope Tuning Framework @ B
Feb 7 @ 2:00 pm – 5:30 pm

In this tutorial, attendees will have the opportunity to delve into the topic of application auto-tuning, presented by developers and performance engineers from the Auto-Tune project. The tutorial will provide a practical perspective on auto-tuning, using use cases to show how best to harness and tailor performance analysers to tune real applications.

Sun, Feb 8
Altera: Compiling OpenCL to a streaming dataflow architecture on FPGAs @ Irvine
Feb 8 @ 8:30 am – 12:00 pm

In recent years, Field-Programmable Gate Arrays have become extremely powerful computational platforms that can efficiently solve many complex problems. Modern FPGAs comprise effectively millions of programmable elements, signal processing elements and high-speed interfaces, all of which are necessary to deliver a complete solution. The power of FPGAs is unlocked via low-level programming languages such as VHDL and Verilog, which allow designers to explicitly specify the behavior of each programmable element. While these languages provide a means to create highly efficient logic circuits, they are akin to “assembly language” programming for modern processors. This is a serious limiting factor for both productivity and the adoption of FPGAs on a wider scale.

In this tutorial, we use the OpenCL language to explore techniques that allow us to program FPGAs at a level of abstraction closer to traditional software-centric approaches. OpenCL is an industry-standard parallel language based on ‘C’ that lets designers exploit the full capabilities of FPGAs while providing a high-level design entry language that is familiar to a wide range of programmers.

The challenge of mapping a ‘C’-based language to FPGAs is that such languages implicitly assume that the underlying architecture executing the program is processor-based. Processors are characterized by a sequence of instructions that control a datapath that manipulates data values stored in a memory. Conversely, FPGA architectures are more suited to implementing spatial computing circuits where data flows in a pipelined fashion from one functional unit to the next until computations are complete. Data can be transferred efficiently by wires, registers or FIFOs without always resorting to external storage. This tutorial will explore compiler optimizations and code generation techniques that can transform sequential programs into efficient streaming dataflow circuits for FPGAs. We will examine specific case studies of DSP filters, image processing and mathematical computations to demonstrate how these techniques can be applied to real-world examples.

OpenTuner: Autotuning programs with OpenTuner @ G
Feb 8 @ 8:30 am – 12:00 pm

This tutorial will cover the usage of OpenTuner, an open-source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy-to-use interface for communicating with the tuned program. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously. Techniques that perform well receive larger testing budgets, and techniques that perform poorly are disabled. OpenTuner has been used by a number of different projects to build domain-specific autotuners.
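The budget-allocation idea can be sketched in a few lines. The example below is a toy model in plain Python and does not use OpenTuner's API: the `Technique` class, the `tune` function, the scoring rule (fraction of trials that improved the best result), and the disabling threshold are all assumptions made for illustration. Each round splits a fixed number of trials among the enabled techniques in proportion to their scores and disables any technique that has had at least `min_uses` trials without a single improvement.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def cost(x: int) -> int:
    """Toy objective standing in for a measured run time; minimum at x = 37."""
    return (x - 37) ** 2


@dataclass
class Technique:
    """Hypothetical search technique: maps the best-known value to a candidate."""
    name: str
    propose: Callable[[int], int]
    uses: int = 0
    wins: int = 0
    enabled: bool = True

    def score(self) -> float:
        # Untried techniques get an optimistic score so each one is sampled.
        return 1.0 if self.uses == 0 else self.wins / self.uses


def tune(techniques: List[Technique], start: int, rounds: int,
         trials_per_round: int = 8, min_uses: int = 3) -> Tuple[int, int]:
    """Run the ensemble and return the best value found and its cost."""
    best_x, best_cost = start, cost(start)
    for _ in range(rounds):
        active = [t for t in techniques if t.enabled]
        if not active:
            break
        scores = [t.score() for t in active]
        total = sum(scores)
        for tech, s in zip(active, scores):
            # Budget share is proportional to score, with at least one trial.
            share = max(1, round(trials_per_round * s / total)) if total > 0 else 1
            for _ in range(share):
                candidate = tech.propose(best_x)
                tech.uses += 1
                c = cost(candidate)
                if c < best_cost:
                    best_x, best_cost = candidate, c
                    tech.wins += 1
            # Disable a technique that has never improved the result.
            if tech.uses >= min_uses and tech.wins == 0:
                tech.enabled = False
    return best_x, best_cost


def make_ensemble() -> List[Technique]:
    """Three deliberately simple techniques, two of which are unhelpful here."""
    return [
        Technique("step_up", lambda x: x + 1),
        Technique("step_down", lambda x: x - 1),
        Technique("restart", lambda x: 90),
    ]


if __name__ == "__main__":
    ensemble = make_ensemble()
    print(tune(ensemble, start=90, rounds=10))
    for t in ensemble:
        print(t.name, t.uses, t.wins, t.enabled)
```

Starting from 90, only `step_down` ever improves the result, so the other two techniques are disabled after the first round and the remaining rounds spend their whole budget on the technique that works.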

The topics covered in the workshop will be:

  • Overview of autotuning: including a history of past autotuning projects and how autotuning is used today
  • Machine learning primer: empirical search, model based techniques, and which technique is right for you
  • OpenTuner framework: how it is designed and how you should use it
  • Examples of using OpenTuner: presentations by current users of OpenTuner
  • What makes a good search space representation: the secret sauce of autotuning
  • How to go about autotuning your system with OpenTuner
  • Hands-on session with OpenTuner
Using Pin++ To Author Highly Configurable Pintools for the Pin @ A
Feb 8 @ 8:30 am – 12:00 pm

This tutorial will discuss Pin++, an open-source framework for creating Pintools, which are analysis tools for the dynamic binary instrumentation tool Pin. Pin++ is an object-oriented framework that uses template meta-programming to implement Pintools. The goal of Pin++ is to simplify programming a Pintool and promote reuse of its components across different Pintools. Our results show that Pintools implemented using Pin++ can have a 54% reduction in complexity, increased modularity, and up to a 60% reduction in instrumentation overhead.

This tutorial will focus on the following key concepts in Pin++:

  • It will discuss the challenges of implementing a Pintool using the traditional approach.
  • It will discuss how Pin++ addresses existing challenges when authoring Pintools.
  • Using hands-on examples, it will discuss how to implement basic Pintools using Pin++ so the audience can begin exploring how to apply Pin++ to their existing problems.
Lunch @ New World Cafe
Feb 8 @ 12:00 pm – 2:00 pm

Salad

Classic Caesar Salad with Dressing on the side.

Entrées

Chicken Piccata

Seafood Kebab with Salsa Fresca

Tomatoes alla Parmigiana

Veggie Lasagna

Steamed Rice

Dessert

Tiramisu

DynamoRIO: Building Dynamic Tools with DynamoRIO on x86 and ARM @ A
Feb 8 @ 2:00 pm – 5:30 pm

This tutorial will present the DynamoRIO tool platform and describe how to use its API to build custom tools that utilize dynamic code manipulation for instrumentation, profiling, analysis, optimization, introspection, security, and more. The DynamoRIO tool platform was first released to the public in June 2002 and has since been used by many researchers to develop systems ranging from taint tracking to prefetch optimization. DynamoRIO is publicly available in open source form and targets Windows, Linux, and Mac on x86 and Linux on ARM.

The tutorial will cover the following topics:

  • DynamoRIO API: an overview of the full range of DynamoRIO’s powerful API, which abstracts away the details of the underlying infrastructure and allows the tool builder to concentrate on analyzing or modifying the application’s runtime code stream. It includes both high-level features for quick prototyping and low-level features for full control over instrumentation.
  • DynamoRIO system overview: a brief description of how DynamoRIO works under the covers.
  • Description of tools provided with the DynamoRIO package, including the Dr. Memory memory debugging tool, the DrCov code coverage tool, and the DrStrace Windows system call tracing tool.
  • Sample tool starting points for building new tools
  • Advanced topics when building sophisticated tools
Graal: A research platform for dynamic compilation and managed languages @ B
Feb 8 @ 2:00 pm – 5:30 pm

The tutorial will cover the following topics:

  • Graal: a new high-performance dynamic compiler for Java written in Java
  • Introduction to the Graal intermediate representation, and how it simplifies speculative optimizations
  • Graal API: Separation of the compiler from the VM
  • Snippets: expressing high-level semantics in low-level Java code
  • Integration of the compiler with an application/library – and how that can help your research project.
  • Using Graal for static analysis
  • Graal as a compiler for dynamic programming languages
  • Project Sumatra: Compiling for the GPU
Welcome Reception and ACM Student Research Competition Posters
Feb 8 @ 6:00 pm – 9:00 pm

Graduate Category

Event-Flow Graphs for Efficient Path-Sensitive Analyses
Ahmed Tamrawi (Iowa State University)

Intelligent Heuristic Construction with Active Learning
William Ogilvie (University of Edinburgh)

An Intermediate Language for DSLs Providing Support for Automatic Optimization and OpenCL Code Generation
Riyadh Baghdadi (Inria and KU Leuven)

Employing Code Generators as De-code Generators: A Novel Approach for Assembly to IR Translation
Niranjan Hasabnis (Stony Brook University)

Reducing Memory Buffering Overhead in Software Thread-Level Speculation
Zhen Cao (McGill)

Bitwidth Analysis and Optimization Using Dynamic Compilation Strategies
Kirshanthan Sundararajah (University of Moratuwa, Sri Lanka)

Undergraduate Category

Auto-tuning the HotSpot JVM
Tharindu Rusira, Milinda Fernando, Chalitha Perera, and Chamara Philips (University of Moratuwa, Sri Lanka)

Mon, Feb 9
Conference Opening
Feb 9 @ 8:30 am – 8:50 am
Session 1: GPU Optimization
Feb 9 @ 10:20 am – 12:00 pm

Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS
Qing Jiao (National University of Singapore), Mian Lu and Huynh Phung Huynh (Institute of High Performance Computing, A*STAR, Singapore), and Tulika Mitra (National University of Singapore)

Characterizing and Enhancing Global Memory Data Coalescing on GPUs
Naznin Fauzia, Louis-Noel Pouchet, and P Sadayappan (The Ohio State University, Columbus)

Automatic Data Placement into GPU On-chip Memory Resources
Chao Li (North Carolina State University), Yi Yang (NEC labs), and Zhen Lin and Huiyang Zhou (North Carolina State University)

Session 2: Tools, Debugging, and Techniques
Feb 9 @ 1:30 pm – 2:45 pm

A Parallel Abstract Interpreter for JavaScript
Kyle Dewey, Vineeth Kashyap, and Ben Hardekopf (University of California, Santa Barbara)

On Performance Debugging of Unnecessary Lock Contentions on Multicore Processors: A Replay-based Approach
Long Zheng and Xiaofei Liao (Huazhong University of Science and Technology, China), Bingsheng He (Nanyang Technological University, Singapore), and Song Wu and Hai Jin (Huazhong University of Science and Technology, China)

Reactive Tiling
Jithendra Srinivas (Intel), Wei Ding, and Mahmut Kandemir (Penn State)

Session 3: Best Paper Session
Feb 9 @ 3:10 pm – 4:50 pm

Approximating Flow-Sensitive Pointer Analysis Using Frequent Itemset Mining
Vaivaswatha Nagaraj and R. Govindarajan (Indian Institute of Science, Bangalore)

HELIX-UP: Relaxing Program Semantics to Unleash Parallelization
Simone Campanoni, Glenn Holloway, Gu-Yeon Wei, and David Brooks (Harvard University)

HERMES: A Fast Cross-ISA Binary Translator with Post-Optimization
Xiaochun Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Carnegie Mellon University), and Yunji Chen, Tianshi Chen, and Weiwu Hu (Institute of Computing Technology, Chinese Academy of Sciences)

Locality-Centric Thread Scheduling for Bulk-synchronous Programming Models on CPU Architectures
Hee-Seok Kim and Izzat El Hajj (University of Illinois at Urbana-Champaign), John Stratton (MulticoreWare Inc.), and Steven Lumetta and Wen-mei Hwu (University of Illinois at Urbana-Champaign)

Session 4a: Artifact Evaluation Discussion (Joint with PPoPP)
Feb 9 @ 5:15 pm – 5:45 pm
Session 4b: ACM Student Research Competition Presentations
Feb 9 @ 5:15 pm – 6:15 pm
Business Meeting
Feb 9 @ 7:00 pm – 8:00 pm
Tue, Feb 10
Session 5: Microarchitecture
Feb 10 @ 8:25 am – 9:40 am

Branch Prediction and the Performance of Interpreters – Don’t Trust Folklore
Erven Rohou, Bharath Narasimha Swamy, and André Seznec (Inria, France)

Optimizing the flash-RAM energy trade-off in deeply embedded systems
James Pallister, Kerstin Eder, and Simon J. Hollis (University of Bristol)

EMEURO: A Framework for Generating Multi-Purpose Accelerators via Deep Learning
Lawrence McAfee and Kunle Olukotun (Stanford University)

Session 6: Parallelism and Concurrency
Feb 10 @ 10:05 am – 11:20 am

Optimizing and Auto-Tuning Scale-Free Sparse Matrix-Vector Multiplication on Intel Xeon Phi
Wai Teng Tang (Institute of High Performance Computing, A*STAR, Singapore), Ruizhe Zhao (Peking University, China), Mian Lu (Institute of High Performance Computing, A*STAR, Singapore), Yun Liang (Peking University, China), Huynh Phung Huynh (Institute of High Performance Computing, A*STAR, Singapore), Xibai Li (Peking University, China), and Rick Siow Mong Goh (Institute of High Performance Computing, A*STAR, Singapore)

Data Provenance Tracking for Concurrent Programs
Brandon Lucia (Carnegie Mellon University) and Luis Ceze (University of Washington)

Locality Aware Concurrent Start for Stencil Applications
Sunil Shrestha (University of Delaware), Joseph Manzano, Andres Marquez, and John Feo (Pacific Northwest National Laboratory), and Guang R. Gao (University of Delaware)

Session 7: Code Generation and Optimization
Feb 10 @ 2:45 pm – 4:00 pm

Getting in Control of Your Control Flow with Control-Data Isolation
William Arthur (University of Michigan), Ben Mehne (University of California – Berkeley), and Reetuparna Das and Todd Austin (University of Michigan)

Checking Correctness of Code Generator Architecture Specifications
Niranjan Hasabnis, R. Sekar, and Rui Qiao (Stony Brook University)

Snapshot-based Loading-Time Acceleration for Web Applications
JinSeok Oh and Soo-Mook Moon (Seoul National University)

Excursion: Beach Blanket Babylon @ Club Fugazi
Feb 10 @ 4:00 pm – 10:15 pm

We will be attending a private showing of Beach Blanket Babylon from 5:45 pm – 7:15 pm along with PPoPP.

After the show you will have time for dinner on your own with colleagues and new friends.

Transportation

Buses will leave the Marriott at 4:10 pm and return at 7:15 pm and 10:15 pm.

If you wish to return via public transportation, you can do so via a combination of walking, trolley, and BART in around one and a half hours.

Dining

The North Beach area of San Francisco is known for its Italian heritage.

Here is a link to great pizza places on Yelp and a list of restaurants close to the theatre.

Bocce Café
478 Green @ Grant
(415) 981-2044
www.boccecafe.com
$$ ITALIAN
Until 10:30 pm
Distance to theatre: 2 blocks
Calzone’s
430 Columbus near Green
(415) 397-3600
www.calzonesf.com
$$ ITALIAN
Until 1 am
Distance to theatre: 1 1/2 blocks
Capp’s Corner
1600 Powell St. @ Green
(415) 989-2589
www.cappscorner.com
$$ ITALIAN
Until 10:30 pm
Distance to theatre: 1/4 block
Park Tavern
1652 Stockton St. near Filbert
(415) 989-7300
www.parktavernsf.com
$$$ NEW AMERICAN
Until 10 pm
Distance to theatre: 2 1/2 blocks
Piazza Pellegrini
659 Columbus @ Powell
(415) 397-7355
www.piazzapellegrini.com
$$ ITALIAN
Until 10 pm
Distance to theatre: 2 1/2 blocks
Trattoria Pinocchio
401 Columbus Ave. @ Vallejo
(415) 392-1472
www.trattoriapinocchio.com
$$ ITALIAN
Until 11 pm
Distance to theatre: 2 blocks
Antologia Vinoteca
515 Broadway @ Columbus
(415) 274-8423
www.antologiasf.com
LATIN AMERICAN WINE BAR
Tapas (no full meals)
Until midnight
Distance to theatre: 3 1/2 blocks
Wed, Feb 11
Session 8: Static Program Analysis and Optimization
Feb 11 @ 9:40 am – 10:55 am

PSLP: Padded SLP Automatic Vectorization
Vasileios Porpodas (University of Cambridge), Alberto Magni (University of Edinburgh), and Timothy M. Jones (University of Cambridge)

A Graph-Based Higher-Order Intermediate Representation
Roland Leißa, Marcel Köster, and Sebastian Hack (Saarland University)

Scalable Conditional Induction Variable (CIV) Analysis
Cosmin E. Oancea (University of Copenhagen) and Lawrence Rauchwerger (Texas A&M University)

Session 9: Runtime Optimization and Techniques
Feb 11 @ 11:15 am – 12:05 pm

Optimizing Binary Translation for Dynamically Generated Code
Byron Hawkins and Brian Demsky (University of California, Irvine) and Derek Bruening and Qin Zhao (Google, Inc.)

MemorySanitizer: fast detector of uninitialized memory use in C++
Evgeniy Stepanov and Konstantin Serebryany (Google)

Awards and Closing
Feb 11 @ 12:05 pm – 12:20 pm