VectorBench [GitHub]
VectorBench is a suite of C and C++ benchmark functions that you can use to evaluate the efficacy of your compiler transformations, particularly those that auto-vectorize scalar code or revectorize SIMD code. VectorBench includes a unique suite of more than 200 hand-vectorized functions, most of which have scalar equivalents.
Included hand-vectorized benchmarks:
- Simd library - image processing and stencil benchmarks (link). We have separated the testing code for each kernel into its own file. The kernels remain unmodified from the original benchmark suite.
- x265 - contains kernels extracted from x265 encoding and decoding library (link), such as DCT, IDCT, and dequantization computations.
- FastPFor - contains kernels from the popular integer packing library (link).
If you use VectorBench in your work, please cite Revec: Program Rejuvenation through Revectorization (BibTeX).
Revec [GitHub]
Reinstating performance portability in hand-vectorized code
Revec: Program Rejuvenation through Revectorization (Compiler Construction ‘19)
Charith Mendis*, Ajay Jain*, Paras Jain, Saman Amarasinghe
[arXiv] [CC slides] [CC video] (* denotes equal contribution)
Revec is a compiler optimization pass, implemented in LLVM, which revectorizes hand-vectorized code. Revec retargets code at compile time to use instructions of newer processor generations.
Modern microprocessors are equipped with Single Instruction Multiple Data (SIMD) or vector instructions, which expose data-level parallelism at a fine granularity. Programmers exploit this parallelism by using low-level vector intrinsics in their code. However, once a program is written using the vector intrinsics of a specific instruction set, the code becomes non-portable: modern compilers are unable to analyze and retarget it to newer vector instruction sets. Hence, programmers have to manually rewrite the same code using the vector intrinsics of a newer generation to exploit the higher data widths and capabilities of new instruction sets. This process is tedious and error-prone, and it requires maintaining multiple code bases.
We propose Revec, a compiler optimization pass that revectorizes already-vectorized code by retargeting it to use vector instructions of newer generations. The transformation is transparent, happening at the compiler intermediate representation level, and enables performance portability of hand-vectorized code.
Revec can achieve performance improvements in real-world performance-critical kernels. In particular, Revec achieves geometric mean speedups of 1.160× and 1.430× on fast integer unpacking kernels proposed in SIMD-Scan and implemented in the FastPFor integer compression library, and speedups of 1.145× and 1.195× on hand-vectorized x265 media codec kernels when retargeting their SSE-series implementations to use AVX2 and AVX-512 vector instructions respectively. In our paper, we also extensively test Revec’s impact on 216 intrinsic-rich implementations of image processing and stencil kernels relative to hand-retargeting and observe speedups of 1.102× and 1.116×.
BibTeX:
@inproceedings{Mendis:2019:RPR:3302516.3307357,
author = {Mendis, Charith and Jain, Ajay and Jain, Paras and Amarasinghe, Saman},
title = {Revec: Program Rejuvenation Through Revectorization},
booktitle = {Proceedings of the 28th International Conference on Compiler Construction},
series = {CC 2019},
year = {2019},
isbn = {978-1-4503-6277-1},
location = {Washington, DC, USA},
pages = {29--41},
numpages = {13},
url = {http://doi.acm.org/10.1145/3302516.3307357},
doi = {10.1145/3302516.3307357},
acmid = {3307357},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Single Instruction Multiple Data (SIMD), optimizing compilation, program rejuvenation, vectorization},
}
Intransitive [GitHub]
Auto-discovery of equivalent intrinsic instruction sequences
Intransitive generates rules to collapse sequences of LLVM vector IR intrinsics into shorter sequences of equivalent (wider) operations. This is done via testbed generation, randomized testing of input bit sequences, and testing with combinatorially generated corner-case bit sequences. See Revec: Program Rejuvenation through Revectorization for technical details.
goSLP
Solver-aided auto-vectorization to supersede heuristics
goSLP: Globally Optimized Superword Level Parallelism Framework (OOPSLA ‘18)
Charith Mendis, Saman Amarasinghe
[arXiv] [LLVM poster] [OOPSLA talk]
Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit superword level parallelism (SLP), a type of fine-grained parallelism. Current SLP auto-vectorization techniques use heuristics to discover vectorization opportunities in high-level language code. These heuristics are fragile, local and typically only present one vectorization strategy that is either accepted or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization framework which solves the statement packing problem in a pairwise optimal manner. Using an integer linear programming (ILP) solver, goSLP searches the entire space of statement packing opportunities for a whole function at a time, while limiting total compilation time to a few minutes. Furthermore, goSLP optimally solves the vector permutation selection problem using dynamic programming. We implemented goSLP in the LLVM compiler infrastructure, achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp and 4.07% on NAS benchmarks compared to LLVM’s existing SLP auto-vectorizer.
BibTeX:
@article{Mendis:2018:goSLP,
author = {Mendis, Charith and Amarasinghe, Saman},
title = {goSLP: Globally Optimized Superword Level Parallelism Framework},
journal = {Proc. ACM Program. Lang.},
issue_date = {November 2018},
volume = {2},
number = {OOPSLA},
month = oct,
year = {2018},
issn = {2475-1421},
pages = {110:1--110:28},
articleno = {110},
numpages = {28},
url = {http://doi.acm.org/10.1145/3276480},
doi = {10.1145/3276480},
acmid = {3276480},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {Auto-vectorization, Dynamic Programming, Integer Linear Programming, Statement Packing, Superword Level Parallelism, Vector Permutation},
}
People
Vectorization for next-gen hardware is an initiative led by the following people:

- Ajay Jain (MIT)
- Charith Mendis (MIT)
- Paras Jain (UC Berkeley)
- Saman Amarasinghe (MIT)