- Optimizing and Auto-Tuning Scale-Free Sparse Matrix-Vector Multiplication on Intel Xeon Phi
Wai Teng Tang (Institute of High Performance Computing, A*STAR, Singapore), Ruizhe Zhao (Peking University, China), Mian Lu (Institute of High Performance Computing, A*STAR, Singapore), Yun Liang (Peking University, China), Huynh Phung Huynh (Institute of High Performance Computing, A*STAR, Singapore), Xibai Li (Peking University, China), and Rick Siow Mong Goh (Institute of High Performance Computing, A*STAR, Singapore)
- Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS
Qing Jiao (School of Computing, National University of Singapore, Singapore), Mian Lu and Huynh Phung Huynh (A*STAR Institute of High Performance Computing, Singapore), and Tulika Mitra (School of Computing, National University of Singapore, Singapore)
- Branch Prediction and the Performance of Interpreters – Don’t Trust Folklore
Erven Rohou, Bharath Narasimha Swamy, and André Seznec (Inria)
- A Parallel Abstract Interpreter for JavaScript
Kyle Dewey, Vineeth Kashyap, and Ben Hardekopf (University of California, Santa Barbara)
- Approximating Flow-Sensitive Pointer Analysis Using Frequent Itemset Mining
Vaivaswatha Nagaraj and R. Govindarajan (Indian Institute of Science, Bangalore)
- EMEURO: A Framework for Generating Multi-Purpose Accelerators via Deep Learning
Lawrence McAfee and Kunle Olukotun (Stanford University)
- Optimizing Binary Translation for Dynamically Generated Code
Byron Hawkins and Brian Demsky (University of California, Irvine), and Derek Bruening and Qin Zhao (Google, Inc.)
- Relaxing Program Semantics to Unleash Parallelization
Simone Campanoni, Glenn Holloway, Gu-Yeon Wei, and David Brooks (Harvard University)
- Snapshot-based Loading-Time Acceleration for Web Applications
JinSeok Oh and Soo-Mook Moon (Seoul National University)
- Getting in Control of Your Control Flow with Control-Data Isolation
William Arthur (University of Michigan), Ben Mehne (University of California, Berkeley), and Reetuparna Das and Todd Austin (University of Michigan)
- Checking Correctness of Code Generator Architecture Specifications
Niranjan Hasabnis, R. Sekar, and Rui Qiao (Stony Brook University)
- HERMES: A Fast Cross-ISA Binary Translator with Post-Optimization
Xiaochun Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Qi Guo (Carnegie Mellon University), and Yunji Chen, Tianshi Chen, and Weiwu Hu (Institute of Computing Technology, Chinese Academy of Sciences)
- Characterizing and Enhancing Global Memory Data Coalescing on GPU
Naznin Fauzia, Louis-Noel Pouchet, and P. Sadayappan (The Ohio State University)
- MemorySanitizer: fast detector of uninitialized memory use in C++
Evgeniy Stepanov and Konstantin Serebryany (Google)
- Data Provenance Tracking for Concurrent Programs
Brandon Lucia (Microsoft Research) and Luis Ceze (University of Washington)
- PSLP: Padded SLP automatic vectorization
Vasileios Porpodas (University of Cambridge), Alberto Magni (University of Edinburgh), and Timothy M. Jones (University of Cambridge)
- Scalable Conditional Induction Variable (CIV) Analysis
Cosmin E. Oancea (University of Copenhagen) and Lawrence Rauchwerger (Texas A&M University)
- Resource Aware Concurrent Start for Stencil Applications
Sunil Shrestha (University of Delaware), Joseph Manzano, Andres Marquez, and John Feo (Pacific Northwest National Laboratory), and Guang R. Gao (University of Delaware)
- A Graph-Based Higher-Order Intermediate Representation
Roland Leißa, Marcel Köster, and Sebastian Hack (Saarland University)
- Optimizing the flash-RAM energy trade-off in deeply embedded systems
James Pallister, Kerstin Eder, and Simon J. Hollis (University of Bristol)
- On Performance Debugging of Unnecessary Lock Contentions on Multicore Processors: A Replay-based Approach
Long Zheng and Xiaofei Liao (Huazhong University of Science and Technology), Bingsheng He (Nanyang Technological University), and Song Wu and Hai Jin (Huazhong University of Science and Technology)
- Reactive Tiling
Jithendra Srinivas, Wei Ding, and Mahmut Kandemir (Penn State)
- Locality-Centric Thread Scheduling for Bulk-synchronous Programming Models on CPU Architectures
Hee-Seok Kim and Izzat El Hajj (University of Illinois at Urbana-Champaign), John Stratton (MulticoreWare Inc.), and Steven Lumetta and Wen-mei Hwu (University of Illinois at Urbana-Champaign)
- Automatic Data Placement into GPU On-chip Memory Resources
Chao Li (North Carolina State University), Yi Yang (NEC Labs), and Huiyang Zhou (North Carolina State University)