Workshops and Tutorials

Tutorials

Saturday 2/15/2014, AM:
  • CGO: Programming Models and Compiler Optimizations for GPUs and Multicores (continues in PM)
  • PPoPP: Advanced MPI
  • PPoPP: Galois system
  • PPoPP: Analytics workloads (Manoj) (continues in PM)
  • PPoPP: Programming models for vector processing (continues in PM)
  • PPoPP: Programming models and applications for multicores and many-cores (PMAM) (continues in PM)

Saturday 2/15/2014, PM:
  • CGO: Programming Models and Compiler Optimizations for GPUs and Multicores (continued)
  • PPoPP: Habanero-Java
  • PPoPP: PGAS + hybrid MPI programming
  • PPoPP: Analytics workloads (Manoj) (continued)
  • PPoPP: Programming models for vector processing (continued)
  • PPoPP: Programming models and applications for multicores and many-cores (PMAM) (continued)

Sunday 2/16/2014, AM:
  • CGO: SPIR – A Standard Portable IR for OpenCL Kernel Language
  • CGO: Inside X10: Implementing a High-Level Language for Large-Scale Distributed and Heterogeneous Platforms
  • PPoPP: Run-time verification for parallel programs
  • PPoPP: Distributed algorithms
  • PPoPP: Jan Treibig (continues in PM)
  • PPoPP: Parallel programming for analytics applications (Manoj) (continues in PM)

Sunday 2/16/2014, PM:
  • CGO: LR(1) Parser Generator Hyacc
  • CGO: One VM to Rule Them All
  • PPoPP: SnuCL
  • PPoPP: Analyzing analytics for parallelism (Rajesh)
  • PPoPP: Jan Treibig (continued)
  • PPoPP: Parallel programming for analytics applications (Manoj) (continued)


Saturday AM & PM

1. Title: Programming Models and Compiler Optimizations for GPUs and Multicores

Organizers: J. (Ram) Ramanujam, School of Electrical Engr. & Computer Science, Louisiana State University, USA

J. (Ram) Ramanujam received the B. Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1983, and his M.S. and Ph. D. degrees in Computer Science from The Ohio State University, USA in 1987 and 1990 respectively. He is currently the John E. and Beatrice L. Ritter Distinguished Professor in the Department of Electrical and Computer Engineering at Louisiana State University (LSU), USA. In addition, he holds a joint faculty appointment with the LSU Center for Computation and Technology. His research interests are in compilers and runtime systems for high-performance computing, domain-specific languages and compilers for parallel computing, embedded systems, and high-level hardware synthesis. He has participated in several US NSF-funded projects including the Tensor Contraction Engine (TCE) and the Pluto project for automatic parallelization. Additional details can be found at

P. (Saday) Sadayappan, Dept. of Computer Science & Engineering, The Ohio State University, USA

P. (Saday) Sadayappan received the B. Tech. degree from the Indian Institute of Technology, Madras, India, and an M.S. and Ph. D. from the State University of New York at Stony Brook, USA, all in Electrical Engineering. He is currently a Professor in the Department of Computer Science and Engineering at The Ohio State University, USA. His research interests include compiler/runtime optimization for parallel computing, and domain-specific languages for high-performance scientific computing. He has led several US NSF-funded projects including the Tensor Contraction Engine and the Pluto project for automatic parallelization. Additional details can be found at

Abstract: On-chip parallelism with multiple cores is now ubiquitous. Because of power and cooling constraints, recent performance improvements in both general-purpose and special-purpose processors have come primarily from increased on-chip parallelism from multiple cores rather than increased clock rates. Parallelism is therefore of considerable interest to a much broader group than developers of parallel applications for high-end supercomputers. Several programming environments have recently emerged in response to the need to develop applications for graphics processing units (GPUs) and multicore processors. This tutorial will address the following topics:

  • What are the currently available programming models and APIs for explicit parallel programming of multi-core CPUs and GPUs?
  • What are the fundamental issues in achieving a significant fraction of peak performance with multicore CPUs and GPUs?
  • What are some of the current efforts at providing more convenient high-level frameworks for programming GPUs? What are the compiler optimization challenges that these frameworks address?

Sunday AM

2. Title: SPIR – A Standard Portable IR for OpenCL Kernel Language

Organizers: Boaz Ouriel and Guy Benyei, Intel Corporation, Haifa, Israel

Abstract: OpenCL is one of the most ubiquitous programming environments for GPGPUs and other heterogeneous platforms. SPIR is a new Standard Portable Intermediate Representation based on LLVM IR, designed to enable efficient distribution of OpenCL kernel programs in a device-independent intermediate format, somewhat akin to Java bytecode. A preview of the SPIR 1.2 specification is available on the Khronos website, and the latest Intel® SDK for OpenCL* Applications XE supports SPIR as a preview feature for Xeon and Xeon Phi platforms. This tutorial will cover SPIR in depth and demonstrate how SPIR can (a) be generated by an offline compiler tool and (b) be consumed using this recently released SDK. These capabilities open the door for other languages and programming models to target many heterogeneous platforms in a standard, portable and efficient way.

3. Title: Inside X10: Implementing a High‐Level Language for Large‐Scale Distributed and Heterogeneous Platforms

Organizers: Olivier Tardieu, IBM T.J. Watson Research Center

Olivier Tardieu joined IBM Research in 2007. He received his Ph.D. in Computer Science from Ecole des Mines de Paris in 2004. His research interests include programming language design and implementation, software safety, concurrency, and hardware synthesis. He is one of the core designers and implementers of the X10 programming language and leads its runtime development. Prior to joining IBM, he conducted research at Bell Labs, Columbia University, and INRIA. He has presented multiple tutorials, invited and conference talks, and seminars on X10.

David Grove, IBM T.J. Watson Research Center

David Grove joined IBM Research in 1998 after completing his Ph.D. at the University of Washington. His primary research interests include the analysis and optimization of object-oriented languages, virtual machine design and implementation, JIT compilation, online feedback-directed optimization, and garbage collection. He has worked on a number of projects in the general area of programming language and optimization including the Cecil project at UW, and the Jikes RVM and Metronome projects at IBM. In 2008 he joined the X10 project and is leading the development of the X10 compiler. David is an ACM Fellow and has given 8 tutorials, several keynote talks, and numerous conference talks and university seminars.

Vijay Saraswat, IBM T.J. Watson Research Center

Vijay Saraswat joined IBM Research in 2003, after a year as a Professor at Penn State, a couple of years at startups and 13 years at PARC and AT&T Research. His main interests are in programming languages, constraints, logic and concurrency. At IBM, he leads the work on the design of X10, a modern object-oriented programming language intended for scalable concurrent computing. Over the last twenty years he has lectured at most major universities and research labs in USA and Europe. For the last five years, he has co-taught with Dr. Martha Kim the Principles and Practice of Parallel Programming course at Columbia University.

Benjamin Herta, IBM T.J. Watson Research Center

Benjamin Herta joined IBM in 2000 and IBM Research in 2008. For the last three years, as a member of the X10 team, Ben has been working on high performance runtimes and interconnects, and on embedding X10 within middleware data grids for resilient computation. His interests are in high performance and high availability software, computer vision, programming language implementations and runtimes. He received a M.S. in Computer Science from Rensselaer Polytechnic Institute in 2006. He holds several patents.

Abstract: Implementing a high‐level language like X10 on a variety of platforms and achieving high performance presents a number of challenges. In this tutorial we will briefly cover the core features of the X10 language and some current empirical results, but will mainly focus on presenting the key implementation technology that we developed for X10. Topics covered will include:

  • Overview: an overview of the X10 compiler and runtime systems
  • Scheduling: fork-join style and Cilk-style work-stealing schedulers
  • Scale out: scalable distributed termination detection
  • Accelerators: compilation of X10 kernels to CUDA‐enabled GPUs
  • Resilience: enabling X10 programs and libraries to handle node failures

The tutorial is intended both for people who are generally interested in the implementation technology used by X10 and other similar PGAS languages, and for researchers interested in using the X10 implementation (available open source at x10‐) as the basis for their own research projects.

Sunday PM

4. Title: One VM to Rule Them All

Organizers: VM Research Group, Oracle Labs

Abstract: We present Truffle, a novel framework for implementing managed languages in Java. The language implementer writes an AST interpreter, which is integrated in our framework that allows tree rewriting during AST interpretation. Tree rewrites incorporate type feedback and other profiling information into the tree, thus specializing the tree and augmenting it with run-time information. When the tree reaches a stable state, partial evaluation compiles the tree into optimized machine code. The partial evaluation is done by Graal, the just-in-time compiler of our Java VM (a variation of the Java HotSpot VM).

Oracle Labs as well as external research groups have implemented a variety of programming languages on top of Truffle, including JavaScript, Ruby, R, Python, and Smalltalk. Several of these already outperform the best previously existing implementations of those languages.

Topics to be covered:

  • Motivation for a modular language framework and re-use of VM components.
  • System architecture: Truffle runs on the Graal VM, a modified version of the Java HotSpot VM with the Graal JIT compiler.
  • Overview of Truffle: core classes that form the AST and the framework for AST rewriting.
  • AST interpreter cookbook: implementation of an AST interpreter for a simple language. We will implement a language and integrate it with the Truffle framework to get a high-performance implementation of this language.
  • Overview of the existing language implementations and how they achieve excellent peak performance

5. Title: LR(1) Parser Generator Hyacc

Organizers: Xin Chen, University of Hawaii

Xin Chen is currently with the PanSTARRS research project at the Institute for Astronomy at the University of Hawaii. PanSTARRS, the largest astronomical survey and database project to date, aims to find asteroids that could potentially collide with the Earth, and is a global research effort of ten universities and research institutes from Germany, Taiwan, the United Kingdom and the United States. Xin received his Ph.D. in Computer Science from the University of Hawaii in 2009 with a concentration in compiler theory; Hyacc is software released from his Ph.D. research. Before that, he received his M.S. in Computer Science from the University of Hawaii in 2003 and his B.S. from Tsinghua University in Beijing in 1998. His interests include compiler and compiler-generator theory and implementation, bioinformatics, AI, algorithms, software engineering, databases and web programming, big data, data mining, and machine learning.

Abstract: Since Knuth introduced canonical LR(1) parser generation in 1965, its space and time cost has been widely regarded as prohibitive, and robust, efficient LR(1) parser generators remain rare. Starting in 2006, we combined Knuth's canonical algorithm, Pager's practical general method, the lane-tracing algorithm, and other relevant algorithms to implement Hyacc, an efficient, practical, open-source parser generator that supports full LR(0)/LALR(1)/LR(1) and partial LR(k). Hyacc is compatible with Yacc and Bison in its command-line interface and input/output formats, but is more powerful in that it employs the full LR(1) algorithm rather than LALR(1). Implemented in ANSI C, it is highly efficient, ports easily to Unix/Linux, Mac, Windows and many other platforms, and can be picked up quickly by anyone familiar with Yacc or Bison. We believe Hyacc is a useful tool for the parser-generation community and hope more people will learn about and use it. This session will introduce Hyacc and cover its theoretical background, the algorithms it employs, its architecture, implementation, and efficiency, and its usage by example.
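A small textbook grammar (nonterminal names are ours) illustrates why full LR(1) matters:

```
S → a A d | b B d | a B e | b A e
A → c
B → c
```

After reading "a c", a canonical LR(1) parser knows to reduce c to A on lookahead d and to B on lookahead e (and symmetrically after "b c"). LALR(1) merges the two states containing the items A → c· and B → c·, so both reductions acquire the lookahead set {d, e} and a reduce-reduce conflict appears: Yacc and Bison report a conflict on this grammar, while a full LR(1) generator such as Hyacc accepts it.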




ODES:

Optimizations are crucial to meeting the performance, power, and cost requirements of DSP and embedded systems. The aim of the ODES workshop is to give researchers and practitioners working in this area the opportunity to share their findings and get feedback. We believe that interaction with the community is crucial to relevant research; ODES therefore maximizes interaction both by carefully selecting the program committee members who review the submissions and by reserving ample time for discussion at the workshop itself.

APPLC: Cancelled

Programming languages and compiler techniques are gaining importance in the era of multi-cores and cloud/mobile computing. The Asia-Pacific Programming Languages and Compilers Workshop (APPLC, pronounced "Apple-see(d)") is a response to this new reality and to the growing and active research communities in this part of the world. It aims to provide a forum for researchers and practitioners in the Asia-Pacific region, as well as other regions, to exchange innovative ideas and research experiences in programming language design, implementation, and compiler techniques. The Asia-Pacific region has a large and growing number of research groups in universities, national labs, and industry focusing on these areas, and their influence is bound to increase in the coming years. The workshop spans the spectrum of all issues related to programming languages and compilers.