Introduction

While working on my previous blog post, I tried to use likwid-perfctr instead of perf stat, but likwid-perfctr -e segfaulted on my machine. This article goes into how I triaged the issue and reported it to the LIKWID devs. It was “fun” in the way that having to open your CPU’s processor programming reference manual is fun.

The Crash

Much like perf list, likwid-perfctr -e prints all the PMU events available on your platform. My CPU is a Ryzen 7 255, a (somewhat odd) Zen 4 part. likwid-perfctr -e appeared to run fine on the first invocation after a cold reboot, but segfaulted on every subsequent invocation.

I opened an issue on the project’s GitHub, and the issue template very helpfully pointed me towards the -V3 command line option, which prints very verbose debug information. Here’s the relevant snippet from the first invocation, the one that does not segfault.

...
DEBUG - [perfmon_check_counter_map:819] Counter DFC0 at pos 15 with dev (MSR_DEV) (0) 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC1 at pos 16 with dev (MSR_DEV) (0) 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC2 at pos 17 with dev (MSR_DEV) (0) 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC3 at pos 18 with dev (MSR_DEV) (0) 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC4 at pos 19 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010248 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC5 at pos 20 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001024A at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC6 at pos 21 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001024C at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC7 at pos 22 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001024E at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC8 at pos 23 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010250 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC9 at pos 24 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010252 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC10 at pos 25 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010254 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC11 at pos 26 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010256 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC12 at pos 27 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC0010258 at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC13 at pos 28 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001025A at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC14 at pos 29 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001025C at CPU 0
DEBUG - [perfmon_check_counter_map:819] Counter DFC15 at pos 30 with dev (MSR_DEV) (0) 0
DEBUG - [access_client_read:517] Got error 'failed to read/write register' from access daemon reading reg 0xC001025E at CPU 0
...

DFC is short for data fabric counter. It looks like likwid-perfctr -e builds some data structure on the first invocation by querying various counters. The checks for DFC0 through DFC3 succeed, but the register reads for the other 12 (DFC4 through DFC15) all fail with 'failed to read/write register'.
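The failing register addresses in the log follow a regular pattern: each one is 2 higher than the previous one. Here is a minimal sketch of that mapping. It assumes, as an extrapolation from the log rather than anything taken from LIKWID's source, that DFC0 sits at 0xC0010240 and that the stride of 2 holds for all 16 entries; dfc_msr_addr and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: base address extrapolated backwards from the log,
 * where DFC4 is paired with the failing read of 0xC0010248. */
#define DFC_BASE_MSR   0xC0010240u
#define DFC_MSR_STRIDE 2u   /* addresses in the log advance by 2 per counter */
#define DFC_TABLE_SIZE 16u  /* DFC0..DFC15 appear in the debug output */

/* Hypothetical helper: register address probed for data fabric counter idx. */
static uint32_t dfc_msr_addr(uint32_t idx) {
    assert(idx < DFC_TABLE_SIZE);
    return DFC_BASE_MSR + DFC_MSR_STRIDE * idx;
}
```

With these assumptions, dfc_msr_addr(4) is 0xC0010248 and dfc_msr_addr(15) is 0xC001025E, the first and last failing reads in the log. Indices 0 through 3 are exactly the counters with no error line after them.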

Hmmm…

RTFM

The maintainer responding to my issue said that he would have to look at the AMD docs to verify the DFC information for Zen 4. I decided to go looking on my own. The first step, of course, is to determine which manual my CPU corresponds to. The kind of documentation we are looking for is called a Processor Programming Reference (PPR) in AMD docs. Unfortunately, there is no document titled “Ryzen 7 255 Processor Programming Reference”. Instead, they are named something like “Processor Programming Reference (PPR) for AMD Family 1Ah Model 70h”. What is this family and model?

Let’s look at the info provided by lscpu (or cat /proc/cpuinfo):

Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             48 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      16
  On-line CPU(s) list:       0-15
Vendor ID:                   AuthenticAMD
  Model name:                AMD Ryzen 7 255 w/ Radeon 780M Graphics
    CPU family:              25
    Model:                   117
    Thread(s) per core:      2
    Core(s) per socket:      8
    Socket(s):               1
    Stepping:                2
    Microcode version:       0xa705208
...

There we have it: model 117, family 25. Converting those to hexadecimal, we get:

julia> UInt8(117)
0x75

julia> UInt8(25)
0x19
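The family and model that lscpu prints ultimately come from the CPU itself: they are packed into eax of CPUID leaf 1, the processor signature. The sketch below decodes such a value using the AMD rule that the extended fields only apply when the base family is 0xF, then formats the result the way PPR titles are written. The function names are hypothetical, and the raw value 0x00A70F52 mentioned afterwards is one I constructed to be consistent with the lscpu output above, not something read from the machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded fields of CPUID leaf 1, EAX (the processor signature). */
struct cpu_sig { uint32_t family, model, stepping; };

/* Hypothetical helper: apply the AMD family/model decoding rule. */
static struct cpu_sig decode_signature(uint32_t eax) {
    uint32_t base_model = (eax >> 4) & 0xF;    /* bits 7:4   */
    uint32_t base_fam   = (eax >> 8) & 0xF;    /* bits 11:8  */
    uint32_t ext_model  = (eax >> 16) & 0xF;   /* bits 19:16 */
    uint32_t ext_fam    = (eax >> 20) & 0xFF;  /* bits 27:20 */
    struct cpu_sig s;

    s.stepping = eax & 0xF;                    /* bits 3:0   */
    if (base_fam == 0xF) {
        /* Extended fields are only meaningful when the base family is 0xF. */
        s.family = base_fam + ext_fam;
        s.model  = (ext_model << 4) | base_model;
    } else {
        s.family = base_fam;
        s.model  = base_model;
    }
    return s;
}

/* Hypothetical helper: write "Family XXh Model YYh" into buf.
 * Returns what snprintf returns (the length that was needed). */
static int ppr_title(struct cpu_sig s, char *buf, size_t len) {
    assert(buf != NULL);
    return snprintf(buf, len, "Family %02Xh Model %02Xh",
                    (unsigned)s.family, (unsigned)s.model);
}
```

Decoding 0x00A70F52 this way gives family 25, model 117, stepping 2, and ppr_title formats that as “Family 19h Model 75h”.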

The “h” suffix in the PPR name means hexadecimal. So, we are looking for a document that looks like “Processor Programming Reference (PPR) for AMD Family 19h Model 75h”. Looking it up on docs.amd.com, I found the document “Processor Programming Reference (PPR) for AMD Family 19h Model 70h”, which actually covers models 70h-77h. Per this manual, there should be 16 independent counters for DF events that can be configured simultaneously.

Figure 1.

Note that all 16 of these are shared across all cores.

Checking using cpuid and inline asm

The manual very helpfully shows how to verify the DFC count.

Figure 2. CPUID functions pertaining to DFCs and UMCs

Figure 2 shows a few cpuid leaf functions. In x86, cpuid is an instruction that returns processor identification and feature information in the eax, ebx, ecx, and edx registers. Which information is returned depends on the input value in eax (the leaf or function number), and sometimes on ecx as well (the subleaf).

The second table (CPUID_Fn80000022_EBX) shows that when the cpuid instruction is invoked with 0x80000022 in the eax register (the number in the function name), bits [15:10] of the ebx register contain the number of available data fabric counters. It also very helpfully states that this value is fixed at 0x4, which explains the output of likwid-perfctr -e and the subsequent segfault: LIKWID probes 16 DFCs, but only 4 are available. For the same function, the eax register reports support for LBR V2 and Performance Monitoring V2, and the ecx register reports which UMCs are active.

We can use this information to figure things out for ourselves using a simple program.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t eax, ebx, ecx, edx;

    __asm__ volatile (
        "cpuid"
        : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
        : "a"(0x80000022), "c"(0)
    );

    printf("CPUID leaf 0x80000022 (AMD Performance Monitoring and Debug)\n");
    printf("  EAX = 0x%08X\n", eax);
    printf("  EBX = 0x%08X\n", ebx);
    printf("  ECX = 0x%08X\n", ecx);
    printf("  EDX = 0x%08X\n", edx);
    printf("\n");

    /* EAX bits */
    printf("EAX breakdown:\n");
    printf("  PerfMonV2 supported : %s\n", (eax & (1 << 0)) ? "yes" : "no");

    /* EBX bits */
    uint32_t num_core_ctrs = ebx & 0xF;           /* bits 3:0  */
    uint32_t num_df_ctrs   = (ebx >> 10) & 0x3F;  /* bits 15:10 */
    uint32_t num_umc_ctrs  = (ebx >> 16) & 0x3F;  /* bits 21:16 */

    printf("EBX breakdown:\n");
    printf("  Core PMC count      : %u\n", num_core_ctrs);
    printf("  DF (NB) PMC count   : %u\n", num_df_ctrs);
    printf("  UMC PMC count       : %u\n", num_umc_ctrs);

    /* ECX bits */
    printf("ECX breakdown:\n");
    printf("  Active UMC mask     : 0x%08X\n", ecx);
    if (ecx != 0) {
        /* popcount to get number of active UMCs */
        uint32_t active = ecx;
        int n = 0;
        while (active) { n += active & 1; active >>= 1; }
        printf("  Active UMC count    : %d\n", n);
        if (num_umc_ctrs > 0 && n > 0)
            printf("  PMCs per UMC        : %u\n", num_umc_ctrs / n);
    }

    return 0;
}

This is the stdout (emphasis mine):

CPUID leaf 0x80000022 (AMD Performance Monitoring and Debug)
  EAX = 0x00000003
  EBX = 0x00101106
  ECX = 0x0000000F
  EDX = 0x00000000

EAX breakdown:
  PerfMonV2 supported : yes
EBX breakdown:
  Core PMC count      : 6
  DF (NB) PMC count   : 4    <=============================== 4 DFCs!
  UMC PMC count       : 16
ECX breakdown:
  Active UMC mask     : 0x0000000F
  Active UMC count    : 4
  PMCs per UMC        : 4

This reports that there are 4 DF PMCs, which matches the debug output of likwid-perfctr -e. Hmmmmmmmmm…
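One caveat: the program above executes cpuid with leaf 0x80000022 unconditionally, so on a CPU that does not implement that leaf the printed values would be meaningless. A more defensive sketch can use __get_cpuid_count from the <cpuid.h> header shipped with GCC and Clang (x86 only), which checks the maximum supported leaf first. The function names df_count_from_ebx and query_df_count are hypothetical.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#include <stddef.h>
#include <stdint.h>

/* Pure helper: DF counter count from CPUID_Fn80000022_EBX, bits 15:10. */
static uint32_t df_count_from_ebx(uint32_t ebx) {
    return (ebx >> 10) & 0x3F;
}

/* Returns 1 and stores the count if leaf 0x80000022 is available,
 * otherwise returns 0 and leaves *count untouched. */
static int query_df_count(uint32_t *count) {
    unsigned int eax, ebx, ecx, edx;

    assert(count != NULL);
    /* __get_cpuid_count returns 0 when the requested leaf is above the
     * maximum (extended) leaf the CPU reports, without reading it. */
    if (!__get_cpuid_count(0x80000022u, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    *count = df_count_from_ebx(ebx);
    return 1;
}
```

Feeding the EBX value from my machine, 0x00101106, to df_count_from_ebx gives 4, the same number the hand-written program printed.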

Conclusion and Follow Up

I reported all this information to the maintainer, who has since fixed this for Zen4/4c/5: the DFC count is now determined dynamically using CPUID, as was already being done for UMCs. likwid-perfctr now correctly reports 4 DFCs on my machine.
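To illustrate the idea behind the fix (this is not LIKWID's actual patch, which I have not reproduced here), a tool with a static table of 16 DFC entries only needs to limit how many entries it probes to the count that CPUID reports. The names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of DFC entries in a tool's static counter table. */
#define DFC_TABLE_SIZE 16u

/* Number of table entries that should actually be probed, given the
 * count read from CPUID_Fn80000022_EBX[15:10]. */
static uint32_t usable_dfc_count(uint32_t cpuid_reported) {
    assert(cpuid_reported <= 0x3F);  /* the CPUID field is 6 bits wide */
    return cpuid_reported < DFC_TABLE_SIZE ? cpuid_reported : DFC_TABLE_SIZE;
}
```

On my machine CPUID reports 4, so only DFC0 through DFC3 would be touched and the 12 failing register reads from the debug log would never be attempted.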

However, there is still a lot that I do not understand. My goal was just to get likwid-perfctr working, but how exactly do I reconcile the apparent discrepancy between Figure 1 (16 DF counters) and Figure 2 (a count fixed at 4)? I’ll have to dig into section 2.1.8 on register sharing. I also need to better understand registers that are not general-purpose registers (GPRs), such as model-specific registers (MSRs), and how to interact with them directly at the lowest level.