Haidl M, Steuwer M, Humernbrum T, Gorlatch S
Research article in edited proceedings (conference) | Peer reviewed
Writing and optimizing programs for high performance on systems with Graphics Processing Units (GPUs) remains a challenging task even for expert programmers. A promising optimization technique is multi-stage programming -- evaluating parts of the program upfront on the CPU and embedding the computed values in the GPU code, thus allowing for more aggressive compiler optimizations. Unfortunately, such optimizations are not possible in CUDA, whereas to apply them in OpenCL, programmers are forced to manipulate the GPU source code as plain strings, which is error-prone and type-unsafe. In this paper, we describe PACXX -- our approach to GPU programming in C++, with the convenient features of the modern C++14 standard: type deduction, lambda expressions, and algorithms from the standard template library (STL). Using PACXX, a GPU program is written as a single C++ program, rather than as two distinct host and kernel programs. We extend PACXX with an easy-to-use and type-safe API for multi-stage programming that avoids the pitfalls of string manipulation. Using just-in-time compilation techniques, PACXX generates efficient GPU code at runtime. Our evaluation shows that PACXX makes writing multi-stage code easier and safer than is currently possible in CUDA or OpenCL. With two application studies, we demonstrate that multi-stage programs can significantly outperform equivalent non-staged versions. Furthermore, we show that PACXX generates code with high performance, comparable to that of industrial-strength OpenCL compilers.
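To make the string-manipulation problem concrete, here is a minimal sketch of the OpenCL-style staging that the abstract criticizes. It is not PACXX's API, which this record does not show. The helper `make_staged_kernel_source` and the kernel `sum_rows` are hypothetical names invented for illustration. The host computes `width` at runtime and pastes it into the kernel source as text, so the GPU compiler sees a constant loop bound it could unroll.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the paper): builds OpenCL C source for a
// kernel in which each work-item sums `width` consecutive input elements.
// The runtime value of `width` is embedded as a literal in the source text,
// which is the "stage" evaluated upfront on the CPU.
std::string make_staged_kernel_source(int width) {
    assert(width > 0);
    const std::string w = std::to_string(width);
    // The kernel body is an opaque string: the C++ compiler cannot check
    // its syntax or types, so a typo here only surfaces when the OpenCL
    // runtime compiles the string.
    return
        "__kernel void sum_rows(__global const float* in,\n"
        "                       __global float* out) {\n"
        "  const int row = get_global_id(0);\n"
        "  float acc = 0.0f;\n"
        "  for (int i = 0; i < " + w + "; ++i)\n"
        "    acc += in[row * " + w + " + i];\n"
        "  out[row] = acc;\n"
        "}\n";
}
```

In a real OpenCL host program, the returned string would be handed to the runtime for compilation. Nothing in this sketch ties the type or meaning of `width` to how it is used inside the kernel text, which is the kind of error-prone, type-unsafe step the abstract describes.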
Gorlatch, Sergei | Professur für Praktische Informatik (Prof. Gorlatch)
Haidl, Michael | Professur für Praktische Informatik (Prof. Gorlatch)
Humernbrum, Tim | Professur für Praktische Informatik (Prof. Gorlatch)
Steuwer, Michel | Institute of Computer Science |