Programme

Sat, Mar 12, 2016
Real World Domain Specific Languages (RWDSL) @ BNC A
Mar 12 @ 9:00 am – 5:30 pm

As the use of computers proliferates, the complexity and variety of systems continue to grow. As a result, it is becoming increasingly impractical to “hard-wire” behaviours into software. Software developers can gain more control over how their software is configured by exploiting Domain Specific Languages (DSLs). DSLs provide a systematic way to structure the underlying computational components: to coin a phrase, a DSL is a library with syntax. There is an enormous variety of DSLs for a very wide range of domains. Most DSLs are highly idiosyncratic, reflecting both the specific nature of their application domains and their designers’ own preferences. This workshop will bring together builders of DSLs for “real world” domains, that is, DSLs intended primarily to help build software that solves real-world problems rather than to explore the more theoretical aspects of language design and implementation. We are looking for submissions that present the motivation, design, implementation, use and evaluation of such DSLs.

ACM have accepted our application to publish the workshop proceedings. Submissions will be published in the ACM Digital Library as part of its International Conference Proceedings Series.

Program Transformation for Programmability in Heterogeneous Architectures (PROHA) @ BNC B
Mar 12 @ 9:00 am – 12:00 pm

Adapting code initially written in a “neutral” algorithmic style so that it runs on heterogeneous architectures (featuring, e.g., GPGPUs or FPGAs), and then maintaining it, is a difficult and error-prone task. It requires knowledge of the programming model of the target architecture, of what the original code does, and of the execution environment. The situation is even worse when the same code needs to run on different platforms, or when different sections of the same application ought to run on different architectures (e.g., for time or resource optimization). Assistance with (and, if possible, automation of) code adaptation is clearly advantageous, and it requires knowledge and reasoning capabilities similar to those of human programmers. This workshop will focus on the techniques and foundations needed to perform source-to-source code transformations that preserve the intended semantics of the original code while producing code better suited to execution on different target architectures.

Architectures and Systems for Real-time Mobile Vision applications (ASR-MOV) @ Montjuïc
Mar 12 @ 2:00 pm – 5:30 pm

The increased processing capability of mobile and embedded platforms is enabling ever more ambitious machine vision applications. Industry players are actively pushing embedded vision in the entertainment, automotive and robotics domains. Mobile vision couples high computational requirements with heterogeneous, power-constrained systems. This makes it an ideal platform on which to evaluate, among other things, processor architectures, memory efficiency, resource scheduling, mapping, and energy-efficient techniques. The ASR-MOV workshop intends to bring together systems researchers to discuss how the requirements of real-time mobile vision applications affect tools, architectures and systems.

Keynote: Calin Cascaval, Qualcomm Symphony: Orchestrating Heterogeneity for Power Aware Computing

International Workshop on Dynamic Code Auto-Tuning (DCAT) @ BNC B
Mar 12 @ 2:00 pm – 5:30 pm

This half-day workshop will cover current developments in auto-tuning, with an emphasis not only on performance but also on energy efficiency. Invited speakers will cover a variety of auto-tuning projects, ranging from embedded systems to high-performance computing, and will present and discuss their approaches, challenges, and latest results. The workshop will serve as a communication forum for developers and researchers, as well as an opportunity for end users to learn about ongoing activities in this field.
 
Sun, Mar 13, 2016
Building Dynamic Tools with DynamoRIO on x86 and ARM (DynamoRIO) @ Tibidabo
Mar 13 @ 9:00 am – 12:30 pm

This tutorial will present the DynamoRIO tool platform and describe how to use its API to build custom tools that utilize dynamic code manipulation for instrumentation, profiling, analysis, optimization, introspection, security, and more. The DynamoRIO tool platform was first released to the public in June 2002 and has since been used by many researchers to develop systems ranging from taint tracking to prefetch optimization.  DynamoRIO is publicly available in open source form and operates on Linux and Windows on IA-32, AMD64, and ARM platforms.

International Workshop on Dynamic Compilation Everywhere (DCE) @ BNC B
Mar 13 @ 9:00 am – 12:30 pm

General-purpose as well as integrated processors nowadays have to run programs written in a wide variety of languages, with isolation concerns. Dynamic compilation, i.e. generating binary code at run time, is becoming a viable solution for many usage scenarios. The goal of this workshop is to present current research and to look ahead at what will happen in this field of growing interest over the coming years.

The scientific challenges are many and closely interrelated: program representation (source code, intermediate representation, data sets), fast binary code generation, patches, hardware abstraction, garbage collection, performance observation, performance trade-offs, polymorphism, and operating systems.

An Open-Source GPGPU Compiler (GPUCC) @ BNC A
Mar 13 @ 2:00 pm – 5:30 pm

This tutorial will present gpucc, an open-source compiler built by Google that targets CUDA and NVIDIA GPUs. gpucc performs various general and CUDA-specific optimizations to generate high-performance code. It outperforms NVIDIA’s toolchain (nvcc) by up to 51% on internal large-scale end-to-end benchmarks, and is on par with nvcc on several open-source benchmark suites (Rodinia, SHOC and Tensor). It supports modern language features such as those in C++11 and C++14, and compiles code 8% faster than nvcc, and up to 2.4x faster for pathological compiles.

This tutorial will cover the following topics:

  • Using gpucc
    • gpucc system overview: a brief description of how gpucc works under the hood
    • Detailed performance results of gpucc vs nvcc
    • Compiling CUDA programs with gpucc: a demo on how to install gpucc and compile some sample CUDA programs
  • Contributing to gpucc
    • Performance debugging: how to debug the performance of the generated binary using nvprof and by inspecting the device code
    • Writing new optimizations for gpucc

International Workshop on Architectural and Micro-Architectural Support for Dynamic Optimization (AMAS-DO) @ BNC B
Mar 13 @ 2:00 pm – 5:30 pm

Long employed in industry, binary translation and on-the-fly code generation and optimization are becoming pervasive, both as enablers of virtualization and processor migration and as a processor implementation technology. The emergence and expected growth of just-in-time compilation, virtualization and Web 2.0 scripting languages highlight the need for efficient execution of this class of applications. The availability of multiple execution threads brings new challenges and opportunities, as existing binaries need to be transformed to benefit from multiple processors, and extra processing resources enable continuous optimization and translation.

The main goal of this half-day workshop is to bring together researchers and practitioners to stimulate the exchange of ideas and experiences on the potential and limits of Architectural and MicroArchitectural Support for Binary Translation and Dynamic Optimization (hence the acronym AMAS-DO, which reflects a change from previous editions). The key focus is on the challenges and opportunities of such support and on opening new avenues of research. A secondary goal is to enable the dissemination of hitherto unpublished techniques from commercial projects.

 

Mon, Mar 14, 2016
Opening (Joint Session)
Mar 14 @ 8:00 am – 8:30 am
Keynote – Madan Musuvathi
Mar 14 @ 8:30 am – 9:30 am

Beyond the embarrassingly parallel – New languages, compilers, and runtimes for big-data processing

Large-scale data processing requires large-scale parallelism. Data-processing systems from traditional databases to Hadoop and Spark rely on embarrassingly-parallel relational primitives (e.g. map, reduce, filter, and join) to extract parallelism from input programs. But many important applications, such as machine learning and log processing, iterate over large data sets with true loop-carried dependences across iterations. Consequently, these applications are not readily parallelizable in current data-processing systems.

In this talk, I will challenge the premise that parallelism requires independent computations. In particular, I will describe a general methodology for extracting parallelism from dependent computations. The basic idea is to replace dependences with symbolic unknowns and execute the dependent computations symbolically in parallel. Parallelization then becomes the (hopefully mechanizable) task of performing the resulting symbolic execution efficiently. This methodology opens up the possibility of designing new languages for data-processing computations, compilers that automatically parallelize such computations, and runtimes that exploit the additional parallelism. I will describe our initial successes with this approach and the research challenges that lie ahead.
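The idea of replacing a dependence with a symbolic unknown can be illustrated on a toy loop. The sketch below is not the speaker's system; it assumes the simplest possible case, a loop whose carried value is updated affinely (`x = a*x + b`), and all function names are hypothetical. Each chunk of iterations is executed with its incoming `x` treated as an unknown, so instead of a number it produces a summary `(A, B)` meaning `x_out = A*x_in + B`. The chunk summaries no longer depend on one another and can be computed concurrently; a short sequential pass then plugs in concrete values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical loop body: the loop-carried value x is updated as x = a*x + b,
# so every iteration depends on the result of the previous one.
Step = Tuple[int, int]  # (a, b) coefficients of one iteration


def run_sequential(x0: int, steps: List[Step]) -> int:
    """Reference: execute the dependent loop one iteration at a time."""
    x = x0
    for a, b in steps:
        x = a * x + b
    return x


def summarize_chunk(steps: List[Step]) -> Step:
    """Execute a chunk with its input x treated as a symbolic unknown.

    Every iteration is affine in x, so the chunk's output is affine in the
    unknown input: x_out = A * x_in + B. We track (A, B), not a value.
    """
    A, B = 1, 0  # identity summary: x_out = x_in
    for a, b in steps:
        # a * (A*x + B) + b = (a*A)*x + (a*B + b)
        A, B = a * A, a * B + b
    return A, B


def run_symbolic_parallel(x0: int, steps: List[Step], chunks: int = 4) -> int:
    """Split the loop into chunks, summarize them concurrently, then resolve."""
    size = max(1, -(-len(steps) // chunks))  # ceiling division
    parts = [steps[i:i + size] for i in range(0, len(steps), size)]
    # The chunk summaries are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize_chunk, parts))
    # Cheap sequential pass: feed each chunk's concrete input into its summary.
    x = x0
    for A, B in summaries:
        x = A * x + B
    return x


if __name__ == "__main__":
    steps = [(2, 1), (3, -1), (1, 5), (2, 0), (-1, 4), (3, 3)]
    # Both strategies agree: prints "-279 -279".
    print(run_sequential(7, steps), run_symbolic_parallel(7, steps, chunks=3))
```

The thread pool only shows the structure; in CPython, a real speedup would need processes or a different runtime. Because composing affine summaries is associative, the final resolve pass could itself be done as a parallel prefix scan. Loop bodies that are not affine need richer symbolic representations, which is where the efficiency challenge mentioned in the abstract arises.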

Biography

Madan Musuvathi is a Principal Researcher at Microsoft Research working at the intersection of programming languages and systems, with a specific focus on concurrency and parallelism. His interests span program analysis, systems, model checking, verification, and theorem proving. His research has led to several tools that improve the lives of software developers both at Microsoft and at other companies. He received his Ph.D. from Stanford University in 2004.

Break
Mar 14 @ 9:30 am – 10:00 am
Session 1: Profiling Feedback (Mary Lou Soffa)
Mar 14 @ 10:00 am – 11:15 am

Chair: Mary Lou Soffa (University of Virginia)

#4: Tongping Liu and Xu Liu. Cheetah: Detecting False Sharing Efficiently and Effectively

#27: Dehao Chen, Xinliang David Li and Tipp Moseley. AutoFDO: Automatic Feedback-directed Optimization for Warehouse-scale Applications

#32: Ivan Jibaja, Ting Cao, Steve Blackburn and Kathryn McKinley. Portable Performance on Asymmetric Multicore Processors

Break
Mar 14 @ 11:15 am – 11:35 am
Session 2: Data Layout and Vectorization (Dorit Nuzman)
Mar 14 @ 11:35 am – 12:50 pm

Chair: Dorit Nuzman (Intel)

#53: Probir Roy and Xu Liu. MemTool: A Lightweight Profiler to Guide Structure Splitting

#29: Linchuan Chen, Peng Jiang and Gagan Agrawal. Exploiting Recent SIMD Architectural Advances for Irregular Applications

#59: Hao Zhou and Jingling Xue. Exploiting Mixed SIMD Parallelism by Reducing Data Reorganization Overhead

Lunch
Mar 14 @ 12:50 pm – 2:20 pm
Session 3: GPU (Vijay Janapa Reddi)
Mar 14 @ 2:20 pm – 4:00 pm

Chair: Vijay Janapa Reddi (University of Texas)

#52: Raj Barik, Naila Farooqui, Brian Lewis, Chunling Hu and Tatiana Shpeisman. A Black-box Approach to Energy-Aware Scheduling on Integrated CPU-GPU Systems

#5: Christos Margiolas and Michael F.P. O’Boyle. Portable and Transparent Software Managed Scheduling on Accelerators for Fair Resource Sharing

#62: Dong Nguyen and Jongeun Lee. Communication-Aware Mapping of Stream Graphs for Multi-GPU Platforms

#8: Jingyue Wu, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng and Robert Hundt. gpucc: An Open-Source GPGPU Compiler

Break
Mar 14 @ 4:00 pm – 4:20 pm
Session 4: ACM Student Research Competition Presentations
Mar 14 @ 4:20 pm – 6:00 pm
Break
Mar 14 @ 6:00 pm – 6:30 pm
CGO Business Meeting
Mar 14 @ 6:30 pm – 7:30 pm

Tue, Mar 15, 2016
Keynote – Keshav Pingali
Mar 15 @ 8:30 am – 9:30 am

50 Years of Parallel Programming: Ieri, Oggi, Domani*

Parallel programming started in the mid-1960s with the pioneering work of Karp and Miller, David Kuck, Jack Dennis and others, and as a discipline, it is now 50 years old. What have we learned in the past 50 years about parallel programming? What problems have we solved and what problems remain to be solved? What can young researchers learn from the successes and failures of our discipline? This talk is a personal point of view on these and other questions regarding the state of parallel programming.

* The subtitle of the talk is borrowed from the title of a screenplay by Alberto Moravia, and it is Italian for “Yesterday, Today, Tomorrow.”

Biography

Keshav Pingali is a Professor in the Department of Computer Science at the University of Texas at Austin, and he holds the W. A. “Tex” Moncrief Chair of Computing in the Institute for Computational Engineering and Sciences (ICES) at UT Austin. Pingali is a Fellow of the IEEE, ACM and AAAS. He was co-Editor-in-Chief of the ACM Transactions on Programming Languages and Systems, and currently serves on the editorial boards of the ACM Transactions on Parallel Computing, the International Journal of Parallel Programming, and Distributed Computing. He has also served on the NSF CISE Advisory Committee (2009-2012).

Wed, Mar 16, 2016
CGO Best Paper Award and Keynote – Avinash Sodani
Mar 16 @ 8:30 am – 9:30 am

Knights Landing Intel Xeon Phi CPU: Path to Parallelism with General Purpose Programming

The demand for high performance will continue to skyrocket, fueled by the drive to solve challenging problems in the scientific world and to provide the horsepower needed for the compute-hungry use cases that continue to emerge in the commercial and consumer spaces, such as machine learning and deep data analytics. Exploiting parallelism will be crucial to achieving the huge performance gains required to solve these problems. This talk will present the new Xeon Phi processor, called Knights Landing, which is architected to provide massive amounts of parallelism in a manner that is accessible with general purpose programming. The talk will provide insights into 1) the important architectural features of the processor and 2) the software technology to exploit them. It will give the inside story on the various architectural decisions made on Knights Landing (why we architected the processor the way we did) and share some programming experiences showing how the general purpose programming model makes it easy to exploit parallelism on Xeon Phi. It will show measured performance numbers from Knights Landing silicon on a range of workloads. The talk will conclude by showing historical trends in architecture and what they mean for software as we extend those trends into the future.

Biography

Avinash Sodani is a Senior Principal Engineer at Intel Corporation and the chief architect of the Xeon Phi processor called Knights Landing. He specializes in the field of High Performance Computing (HPC). Previously, he was one of the architects of the first-generation Core processor, called Nehalem, which has served as a foundation for today’s line of Intel Core processors. Avinash is a recognized expert in computer architecture and has been invited to deliver several keynotes and public talks on topics related to HPC and the future of computing. Avinash holds over 20 US patents and is known for seminal work on the concept of “Dynamic Instruction Reuse”. He has a PhD and an MS in Computer Science from the University of Wisconsin-Madison and a B.Tech. (Hons.) in Computer Science from the Indian Institute of Technology, Kharagpur, India.