This tutorial will present the DynamoRIO tool platform and describe how to use its API to build custom tools that utilize dynamic code manipulation for instrumentation, profiling, analysis, optimization, introspection, security, and more. The DynamoRIO tool platform was first released to the public in June 2002 and has since been used by many researchers to develop systems ranging from taint tracking to prefetch optimization. DynamoRIO is publicly available in open source form and operates on Linux and Windows on IA-32, AMD64, and ARM platforms.
This tutorial will present gpucc, an open-source compiler built by Google that targets CUDA and NVIDIA GPUs. gpucc performs a range of general and CUDA-specific optimizations to generate high-performance code. On internal large-scale end-to-end benchmarks it outperforms NVIDIA’s toolchain (nvcc) by up to 51%, and it is on par with nvcc on several open-source benchmarks (Rodinia, SHOC, and Tensor). It supports modern language features such as those in C++11 and C++14, and it compiles code 8% faster than nvcc, and up to 2.4x faster for pathological compilations.
This tutorial will cover the following topics:
- Using gpucc
  - gpucc system overview: a brief description of how gpucc works under the hood
  - Detailed performance results of gpucc vs. nvcc
  - Compiling CUDA programs with gpucc: a demo of how to install gpucc and compile some sample CUDA programs
- Contributing to gpucc
  - Performance debugging: how to debug the performance of the generated binary by using nvprof and inspecting the device code
  - Writing new optimizations for gpucc
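
To give a flavor of the compile-and-profile workflow listed among the topics, here is a minimal sketch of a sample CUDA program. It is not taken from the tutorial material. It assumes that gpucc's CUDA support is used through an upstream clang++ build, that the CUDA SDK is installed under /usr/local/cuda, and that the GPU is an sm_35 device; the kernel `axpy` and the host helper `run_axpy` are hypothetical names chosen for illustration.

```cuda
// axpy.cu: computes y[i] = a * x[i] on the GPU.
//
// Build sketch (assumed setup: upstream clang++ with CUDA support, CUDA SDK in
// /usr/local/cuda, sm_35 GPU; adjust the architecture and paths as needed):
//   clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 \
//       -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_));  \
      std::exit(1);                                                        \
    }                                                                      \
  } while (0)

// Device kernel: one thread per element, guarded against running past n.
__global__ void axpy(float a, const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i];
}

// Host helper (hypothetical): copies x to the device, launches the kernel,
// and returns the result as a host vector.
std::vector<float> run_axpy(float a, const std::vector<float>& x) {
  const int n = static_cast<int>(x.size());
  std::vector<float> y(n, 0.0f);
  if (n == 0) return y;

  const size_t bytes = n * sizeof(float);
  float* dx = nullptr;
  float* dy = nullptr;
  CUDA_CHECK(cudaMalloc(&dx, bytes));
  CUDA_CHECK(cudaMalloc(&dy, bytes));
  CUDA_CHECK(cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice));

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;  // round up
  axpy<<<blocks, threads>>>(a, dx, dy, n);
  CUDA_CHECK(cudaGetLastError());  // catches launch-configuration errors

  // The blocking copy also waits for the kernel to finish.
  CUDA_CHECK(cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(dx));
  CUDA_CHECK(cudaFree(dy));
  return y;
}

int main() {
  const std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};
  const std::vector<float> y = run_axpy(2.0f, x);
  for (float v : y) std::printf("%g ", v);
  std::printf("\n");
  return 0;
}
```

Once the binary is built, running it under the profiler (for example `nvprof ./axpy`) reports time spent per kernel and per memory copy, which is one possible starting point for the performance-debugging topic.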