Introduction
While working on my previous blog post, I tried to use likwid-perfctr instead of perf-stat, and likwid-perfctr -e segfaulted on my machine. This article covers how I triaged the issue and reported it to the LIKWID devs. It was “fun” in the way that having to open your CPU’s processor programming reference manual is fun.
The Crash
Much as perf list does for perf, likwid-perfctr prints all the PMU events available on your platform when run with the -e flag. My CPU is a Ryzen 7 255, a (somewhat odd) Zen 4 part. likwid-perfctr -e appeared to run fine on the first invocation after a cold reboot, but segfaulted on subsequent invocations. ...
Accelerating copy_if using SIMD
Introduction
I have a Zen 4 CPU with a bunch of AVX512 feature flags, so I thought: let’s try to use them to implement something, even if it’s in the realm of wheel-reinvention. I started with the following goals.
- Implement an algorithm that my optimizing compiler cannot vectorize, even with a polyhedral loop model.
- Systematically analyze its performance and answer the questions: Is it as fast as it can be? If not, why? And how can we fix it?
- Start simple, make it work.
These goals rule out dead simple algorithms like map/transform, reduce, adjacent_difference etc., as they are very autovectorizable. Even 2D stencils are out because look at this. So, I settled on std::copy_if. ...
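For reference, here is a minimal scalar sketch of what std::copy_if does, written as a plain loop. It is not the code from the post: copy_if_scalar and self_check are hypothetical names, and the int32_t element type is an assumption made for illustration. The loop shows the likely reason this algorithm is harder to autovectorize than a map or a reduce: the position each output is written to depends on how many earlier elements passed the predicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: a hand-written scalar equivalent of std::copy_if.
// Copies every element of in[0..n) that satisfies pred into out, preserving
// order, and returns the number of elements written.
// The caller must provide room for up to n elements in out.
template <typename Pred>
std::size_t copy_if_scalar(const std::int32_t* in, std::size_t n,
                           std::int32_t* out, Pred pred) {
    std::size_t written = 0;  // output cursor, carried across iterations
    for (std::size_t i = 0; i < n; ++i) {
        if (pred(in[i])) {
            // The store address depends on all earlier predicate results.
            out[written++] = in[i];
        }
    }
    return written;
}

// Hypothetical self-check: compares the loop above against std::copy_if.
inline bool self_check() {
    const std::vector<std::int32_t> in{3, -1, 4, -1, 5, -9, 2, 6};
    auto is_positive = [](std::int32_t x) { return x > 0; };

    std::vector<std::int32_t> expected;
    std::copy_if(in.begin(), in.end(), std::back_inserter(expected), is_positive);

    std::vector<std::int32_t> got(in.size());
    got.resize(copy_if_scalar(in.data(), in.size(), got.data(), is_positive));

    return got == expected;
}
```

A loop like this is a reasonable baseline to measure any SIMD version against, since it has the same semantics as the standard algorithm for contiguous input.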