Challenge 4: Page Fault Stats

In this challenge, you’ll dive into eBPF to monitor page faults: the moments when a process accesses memory that isn’t currently mapped in its virtual address space, forcing the kernel to step in with a potentially costly operation.

Your goal? Track and display the page faults of every process that generates them. To avoid flooding the user with messages, you will only display a message every X page faults per process, where X is the configurable log_step.

Description

Your goal is to develop an eBPF program that tracks the number of page faults generated by each process running on the system and prints the process ID, the name and the number of page faults of the process.

"Page fault" is the generic term for the situation in which a program accesses a page of memory that is not currently mapped in its address space, or accesses it in a way the current mapping does not allow. For instance, this can happen when a program touches memory that has been allocated but not yet mapped, or when it tries to write to a read-only page. In both cases, the kernel must intervene to resolve the situation, which can be costly in terms of performance.
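Both cases can be observed from user space on Linux. The sketch below is only an illustration, assuming a Linux system where getrusage() reports minor faults in ru_minflt: it maps one fresh anonymous page, reads it, then writes it, and counts the faults caused by each access. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults charged to the calling process so far (-1 on error). */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Maps one fresh anonymous page, reads it, then writes it. Stores the minor
 * faults observed around the read in *read_faults and around the write in
 * *write_faults. Returns 0 on success, -1 if mmap fails. */
static int read_then_write_faults(long *read_faults, long *write_faults)
{
    assert(read_faults != NULL && write_faults != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile char *p = mem;

    long before = minor_faults();
    /* Case 1: nothing is mapped at this address yet. On Linux, a read
     * typically maps the shared, read-only zero page. */
    char c = p[0];
    long after_read = minor_faults();
    /* Case 2: the page is now mapped read-only, so writing to it faults
     * again and the kernel gives the process a private copy. */
    p[0] = (char)(c + 1);
    long after_write = minor_faults();

    *read_faults = after_read - before;
    *write_faults = after_write - after_read;
    munmap(mem, page);
    return 0;
}
```

Both counts are normally at least 1; they can be higher because the surrounding code may fault as well.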

In order to help you test your eBPF program, you are provided with a user-space program that generates a number of page faults.

Setup

Download the files for this challenge using:

$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/page_faults.tar.gz
$ tar -xzvf page_faults.tar.gz

The page_fault_gen program is located in page_faults/page_fault_gen.

You can compile the user-space program using the Makefile provided (simply run make within the page_fault_gen directory). Then you can run it:

$ ./page_fault_gen <num_page_faults>

This program generates approximately num_page_faults page faults. It is a helper meant to make testing your eBPF program easier.
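The source of page_fault_gen is not reproduced here, but a generator of this kind can be as simple as touching fresh anonymous pages, since the first write to each page causes a minor fault. The sketch below is a hypothetical stand-in (touch_fresh_pages is not part of the provided files), assuming Linux and glibc.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults charged to the calling process so far (-1 on error). */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Hypothetical helper: writes one byte to each of n_pages fresh anonymous
 * pages and returns the number of minor faults recorded meanwhile,
 * or -1 if the mapping cannot be created (e.g. n_pages == 0). */
static long touch_fresh_pages(size_t n_pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = n_pages * page;
    char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
#ifdef MADV_NOHUGEPAGE
    /* Best effort: keep regular pages so that each page touched costs one
     * fault rather than one fault per transparent huge page. */
    madvise(mem, len, MADV_NOHUGEPAGE);
#endif
    long before = minor_faults();
    for (size_t i = 0; i < n_pages; i++)
        ((volatile char *)mem)[i * page] = 1; /* first write: page fault */
    long delta = minor_faults() - before;
    munmap(mem, len);
    return delta;
}
```

For example, touch_fresh_pages(256) should report at least 256 minor faults. Running such a loop while your eBPF program is loaded is a quick way to check that the counter for that PID grows as expected.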

Inside page_faults/src, you will find the same template as in tutorial 3. Use it to implement the eBPF program that tracks the number of page faults generated by all the processes running on the system.

What you need to do

Your task is to develop an eBPF program that monitors and tracks the number of page faults of each process and generates a message whenever the number of page faults of a particular process is a multiple of log_step. To achieve this, you will need to efficiently store and update fault statistics using eBPF maps. Your program should also send the relevant data to user space using perf buffers.

All in all, the eBPF program should be able to:
  1. Track the number of page faults generated by each process running on the system.

  2. Print the process ID, name and number of page faults whenever the number of page faults of a process is a multiple of log_step. By default, log_step is 50. You must use a perf buffer to print this information.
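The bookkeeping behind these two requirements can be prototyped in plain C before moving it into the eBPF program. The sketch below is a user-space simulation, not eBPF code: the fixed-size table stands in for a BPF_MAP_TYPE_HASH keyed by PID, and filling struct event stands in for sending it through a perf buffer (bpf_perf_event_output() on a BPF_MAP_TYPE_PERF_EVENT_ARRAY map). The names record_fault and counter_table, the capacity and the event fields are hypothetical, chosen to mirror the PID, COMM and NB_PAGE_FAULT columns of the example output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PROCS 1024     /* hypothetical capacity, like a map's max_entries */
#define TASK_COMM_LEN 16   /* size of the kernel's task name buffer */

/* Hypothetical event layout mirroring the PID/COMM/NB_PAGE_FAULT columns. */
struct event {
    uint32_t pid;
    char comm[TASK_COMM_LEN];
    uint64_t nb_page_fault;
};

/* Stand-in for a hash map keyed by PID (open addressing, linear probing). */
struct counter_table {
    uint32_t pids[MAX_PROCS];
    uint64_t counts[MAX_PROCS];
    int used[MAX_PROCS];
};

/* Records one page fault for pid. Returns 1 and fills *ev when the new count
 * is a multiple of log_step (a message should be sent), 0 otherwise, and -1
 * if the table is full. */
static int record_fault(struct counter_table *t, uint32_t pid, const char *comm,
                        uint64_t log_step, struct event *ev)
{
    assert(log_step > 0);
    size_t slot = pid % MAX_PROCS;
    for (size_t probes = 0; probes < MAX_PROCS; probes++) {
        if (!t->used[slot]) {           /* first fault of this PID */
            t->used[slot] = 1;
            t->pids[slot] = pid;
            t->counts[slot] = 0;
        }
        if (t->pids[slot] == pid) {
            uint64_t n = ++t->counts[slot];
            if (n % log_step != 0)
                return 0;               /* not a multiple: stay quiet */
            ev->pid = pid;
            strncpy(ev->comm, comm, TASK_COMM_LEN - 1);
            ev->comm[TASK_COMM_LEN - 1] = '\0';
            ev->nb_page_fault = n;
            return 1;
        }
        slot = (slot + 1) % MAX_PROCS;  /* slot owned by another PID */
    }
    return -1;
}
```

Because several CPUs can take faults for the same process at the same time, an eBPF version would typically increment the map value atomically (for example with __sync_fetch_and_add()) rather than with a plain ++.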

As indicated, log_step must be configurable. Therefore, when loading the eBPF program with the ecli command, the following argument can be provided:

  • --log_step: The log_step that will be used to determine when to print the message. Default should be 50.

Here is an example of the output your program should generate:

$ sudo ecli src/package.json --log_step 70
INFO [faerie::elf] strtab: 0xa58 symtab 0xa90 relocs 0xad8 sh_offset 0xad8
INFO [bpf_loader_lib::skeleton::preload::section_loader] load runtime arg (user specified the value through cli, or predefined in the skeleton) for log_step: Number(70), real_type=<INT> 'int' bits:32 off:0 enc:signed, btf_type=BtfVar { name: "log_step", type_id: 29, kind: GlobalAlloc }
[...] # Other information provided by ecli
INFO [bpf_loader_lib::skeleton] Running ebpf program...
TIME     PID    COMM   NB_PAGE_FAULT
10:45:16  2134  gcc    70
10:45:16  2134  gcc    140
10:45:16  2134  gcc    210
10:45:16  1531  bash   70       # <-
10:45:16  2135  bash   70       # <- As you can see here, multiple different
10:45:16  1531  bash   140      # <- bash processes are running. Their page
10:45:16  2136  bash   70       # <- faults are tracked separately!
10:45:16  1531  bash   210      # <-
10:45:16  1531  bash   280      # <-
10:45:18  2137  gcc    70
10:45:18  2138  cc1    70
10:45:18  2138  cc1    140
10:45:18  2138  cc1    210
10:45:18  2138  cc1    280
10:45:18  2138  cc1    350
10:45:18  2138  cc1    420
10:45:18  2138  cc1    490
10:45:18  2138  cc1    560
10:45:18  2139  as     70
10:45:18  2139  as     140
10:45:18  2138  cc1    630
10:45:18  2138  cc1    700
10:45:18  2138  cc1    770
10:45:18  2138  cc1    840
10:45:18  2139  as     210
10:45:18  2139  as     280
10:45:18  2140  collect2 70

Tip

Note that much of the code related to page faults is architecture-dependent, and some functions cannot be hooked on all architectures. In addition, the actual page fault handling is complex, but there is a key function where the handling begins that can be hooked on every architecture. The following resources can help you identify it: mmu-tlb-and-page-faults and chapter 4.6.1 Handling a Page Fault.

Important

  • The eBPF program is not specific to one process, but should track the page faults generated by all the processes running on the system.

  • When several processes with the same name are running, the eBPF program must not aggregate their page faults. Page faults must be tracked separately for each process ID (PID).
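In eBPF, the usual source of the current task's identity is bpf_get_current_pid_tgid(), which returns the thread-group ID (what user-space tools call the PID) in the upper 32 bits and the thread ID in the lower 32 bits. Keying the counter map on the upper half keeps two bash processes apart while counting all threads of one process together; keying on the name returned by bpf_get_current_comm() would merge them, which is exactly what this rule forbids. The helpers below are hypothetical and only show the bit manipulation, assuming per-process (not per-thread) counting is intended.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of bpf_get_current_pid_tgid(): the thread-group ID,
 * i.e. the PID reported by ps or getpid(). */
static uint32_t process_id(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the ID of the individual thread (as returned by gettid()). */
static uint32_t thread_id(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

With this split, a process with PID 2138 running a second thread 2141 would be counted under 2138 only.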