As the use of computers proliferates, the complexity and variety of systems continue to grow. As a result, it is becoming increasingly inflexible to “hard-wire” behaviours into software. Software developers can gain more control over their software configurations by exploiting Domain-Specific Languages (DSLs). Such DSLs provide a systematic way to structure the underlying computational components: to coin a phrase, a DSL is a library with syntax. There is an enormous variety of DSLs for a very wide range of domains. Most DSLs are highly idiosyncratic, reflecting both the specific natures of their application domains and their designers’ own preferences. This workshop will bring together constructors of DSLs for “real-world” domains; that is, DSLs intended primarily to aid in building software that solves real-world problems rather than to explore the more theoretical aspects of language design and implementation. We are looking for submissions that present the motivation, design, implementation, use, and evaluation of such DSLs.
ACM has accepted our application to publish the proceedings of the workshop. Accepted submissions will be published in the ACM Digital Library within the ACM International Conference Proceeding Series.
Adapting code initially written in a “neutral” algorithmic style to execute on heterogeneous architectures (featuring, e.g., GPGPUs or FPGAs), and later maintaining it, is a difficult and error-prone task. It requires knowledge of the programming model of the target architecture, of what the original code does, and of the execution environment. The situation is even worse when the same code needs to run on different platforms, or when different sections of the same application ought to run on different architectures (e.g., for time or resource optimization). Assistance in (and, where possible, automation of) the process of code adaptation is therefore advantageous, and it demands knowledge and reasoning capabilities similar to those of human programmers. This workshop will focus on techniques and foundations for source-to-source code transformations that preserve the intended semantics of the original code while producing code better suited for execution on different target architectures.
The increased processing capability of mobile and embedded platforms is enabling ever more ambitious machine vision applications. Industry players are actively pushing embedded vision in the entertainment, automotive, and robotics domains. Mobile vision couples high computational requirements with heterogeneous, power-constrained systems. This makes it an ideal platform on which to evaluate, amongst other things, processor architectures, memory efficiency, resource scheduling, mapping, and energy-efficient techniques. The ASR-MOV workshop intends to bring together systems researchers to discuss how the requirements of real-time mobile vision applications impact tools, architectures, and systems.
Keynote: Calin Cascaval (Qualcomm), “Symphony: Orchestrating Heterogeneity for Power Aware Computing”
This tutorial will present the DynamoRIO tool platform and describe how to use its API to build custom tools that utilize dynamic code manipulation for instrumentation, profiling, analysis, optimization, introspection, security, and more. The DynamoRIO tool platform was first released to the public in June 2002 and has since been used by many researchers to develop systems ranging from taint tracking to prefetch optimization. DynamoRIO is publicly available in open source form and operates on Linux and Windows on IA-32, AMD64, and ARM platforms.
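To give a flavour of the API, the following is a minimal client sketch that counts how many basic blocks DynamoRIO builds. It uses entry points from the public dr_api.h header (dr_client_main, dr_register_bb_event, dr_register_exit_event); the client name and the counter are illustrative choices, not code from the tutorial.

```c
/* Minimal DynamoRIO client sketch (illustrative; check signatures
 * against the DynamoRIO API documentation). */
#include "dr_api.h"

static uint64 num_bbs; /* hypothetical counter of basic blocks built */

static dr_emit_flags_t
event_bb(void *drcontext, void *tag, instrlist_t *bb,
         bool for_trace, bool translating)
{
    /* Count each block only when it is first built, not when it is
     * rebuilt for traces or for address translation. */
    if (!for_trace && !translating)
        num_bbs++;
    return DR_EMIT_DEFAULT;
}

static void
event_exit(void)
{
    dr_printf("basic blocks built: %llu\n", (unsigned long long)num_bbs);
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    dr_set_client_name("bbcount-sketch", "http://dynamorio.org");
    dr_register_bb_event(event_bb);
    dr_register_exit_event(event_exit);
}
```

A client like this is compiled into a shared library and launched under DynamoRIO’s drrun front end, e.g. `drrun -c libbbcount-sketch.so -- ls` (the library name here is illustrative).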
Both general-purpose and integrated processors nowadays have to run programs written in a wide variety of languages, with isolation concerns. Dynamic compilation, i.e., generating binary code at run time, is becoming a viable solution for many usage scenarios. The goal of this workshop is to present current research and to look ahead at where this field of growing interest is heading in the coming years.
The scientific challenges are numerous and highly interrelated: program representation (source code, intermediate representations, data sets), fast binary code generation, code patching, hardware abstraction, garbage collection, performance observation, performance trade-offs, polymorphism, and operating systems.
This tutorial will present gpucc, an open-source compiler built by Google that targets CUDA and NVIDIA GPUs. gpucc performs various general and CUDA-specific optimizations to generate high-performance code. It outperforms NVIDIA’s toolchain (nvcc) on internal large-scale end-to-end benchmarks by up to 51%, and is on par with it on several open-source benchmark suites (Rodinia, SHOC, and Tensor). It supports modern language features such as those in C++11 and C++14, compiles code 8% faster than nvcc, and is up to 2.4x faster for pathological compilations.
This tutorial will cover the following topics:
- Using gpucc
  - gpucc system overview: a brief description of how gpucc works under the hood
  - Detailed performance results of gpucc vs. nvcc
  - Compiling CUDA programs with gpucc: a demo of how to install gpucc and compile some sample CUDA programs (a toy example follows this list)
- Contributing to gpucc
  - Performance debugging: how to debug the performance of the generated binary using nvprof and inspecting the device code
  - Writing new optimizations for gpucc
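As a taste of the compilation demo, here is a toy CUDA program of the kind the tutorial compiles. The program and the invocation shown in the comment are illustrative sketches, not material from the tutorial; since gpucc is built on Clang/LLVM, a typical invocation resembles a clang++ command with a --cuda-gpu-arch flag, but the exact flags depend on your installation.

```cuda
// Toy CUDA program (illustrative). A hypothetical gpucc/clang
// invocation might look like:
//   clang++ axpy.cu --cuda-gpu-arch=sm_35 -L<cuda>/lib64 -lcudart -o axpy
#include <cstdio>
#include <cuda_runtime.h>

// Compute y <- a*x + y across n elements, one thread per element.
__global__ void axpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    float hx[n], hy[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch with 256 threads per block, enough blocks to cover n.
    axpy<<<(n + 255) / 256, 256>>>(3.0f, dx, dy, n);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expect 5.0)\n", hy[0]);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```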