Challenge 4: Page Fault Stats
In this challenge, you’ll dive into eBPF to monitor page faults: those moments when a process accesses memory that isn’t currently mapped into its virtual address space, triggering a costly operation.
Your goal? Track and display the page faults of every process that generates them. To avoid flooding the user with messages, you will only display a message every X page faults per process, where X is a configurable log_step.
Description
Your goal is to develop an eBPF program that tracks the number of page faults generated by each process running on the system and prints the process ID, the name, and the number of page faults of each process.
Here, page fault is the generic term for the situation in which a program accesses memory in a way that its address space does not currently allow. For instance, this can happen when a program tries to access a page that is not currently mapped, or when it tries to write to a read-only page. In both cases, the kernel must intervene to resolve the situation, which can be costly in terms of performance.
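To make the definition concrete, here is a minimal user-space sketch of one common way to trigger page faults on Linux. It is not the page_fault_gen helper provided with the challenge; the function names minor_faults and touch_pages are hypothetical. It assumes a Linux system where anonymous memory obtained from mmap is backed by physical pages only on first access, so the first write to each page is expected to fault.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults (resolved without disk I/O) taken by the calling
 * process so far, or -1 if the counter cannot be read. */
long minor_faults(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_minflt;
}

/* Map n_pages of anonymous memory and write one byte into each page.
 * Returns the number of pages touched, or -1 if the mapping failed
 * (which includes n_pages == 0, since mmap rejects a zero length). */
long touch_pages(size_t n_pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    size_t len = n_pages * (size_t)page_size;
    volatile char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    long touched = 0;
    for (size_t i = 0; i < n_pages; i++) {
        mem[i * (size_t)page_size] = 1;   /* first access to this page */
        touched++;
    }

    munmap((void *)mem, len);
    return touched;
}
```

Calling minor_faults() before and after touch_pages(64) should usually show the counter rising by roughly the number of pages touched, although the exact difference depends on the kernel and the environment.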
In order to help you test your eBPF program, you are provided with a user-space program that generates a number of page faults.
Setup
Download the files for this challenge using:
$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/page_faults.tar.gz
$ tar -xzvf page_faults.tar.gz
The page_fault_gen program is located in page_faults/page_fault_gen.
You can compile the user-space program using the Makefile provided (simply run make within the page_fault_gen directory). Then you can run it:
$ ./page_fault_gen <num_page_faults>
This program generates a number of page faults approximately equal to num_page_faults. It is a helper that generates page faults so that you can test your eBPF program more easily.
Inside page_faults/src, you will find the same template as in tutorial 3. Use it to implement the eBPF program that tracks the number of page faults generated by all the processes running on the system.
What you need to do
Your task is to develop an eBPF program that tracks the number of page faults of each process and generates a message whenever the count for a particular process reaches a multiple of log_step. To achieve this, you will need to efficiently store and update fault statistics using eBPF maps. Your program should also send the relevant data to user space through a perf buffer.
All in all, the eBPF program should be able to:
- Track the number of page faults generated by each process running on the system.
- Print the process ID, the name, and the number of page faults of a process whenever that number is a multiple of log_step. By default, log_step is 50. You must use a perf buffer to print this information.
As indicated, the log_step needs to be configurable. Therefore, when loading the eBPF program (using the ecli command), the following argument can be provided:
--log_step: The log_step that will be used to determine when to print the message. Default should be 50.
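The counting rule described above can be modeled in plain user-space C before writing any eBPF code. The sketch below is only such a model, not the expected solution: a fixed-size array stands in for the eBPF map, and the names fault_table, record_fault, and MAX_TRACKED are hypothetical. It keeps one counter per PID and signals that a message is due each time a counter reaches a multiple of log_step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACKED 1024   /* hypothetical capacity of the model table */

struct fault_entry {
    bool     used;
    uint32_t pid;
    uint64_t count;
};

/* Stand-in for an eBPF map keyed by PID. */
struct fault_table {
    struct fault_entry slots[MAX_TRACKED];
};

void fault_table_init(struct fault_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Record one page fault for pid.
 * Returns the new count when it is a multiple of log_step (a message
 * is due), and 0 otherwise. Also returns 0 if the table is full, in
 * which case the fault is simply not counted. */
uint64_t record_fault(struct fault_table *t, uint32_t pid, uint32_t log_step)
{
    assert(log_step > 0);

    /* Open addressing with linear probing; entries are never removed. */
    for (uint32_t probe = 0; probe < MAX_TRACKED; probe++) {
        uint32_t idx = (pid % MAX_TRACKED + probe) % MAX_TRACKED;
        struct fault_entry *e = &t->slots[idx];

        if (!e->used) {            /* first fault seen for this PID */
            e->used = true;
            e->pid = pid;
            e->count = 0;
        }
        if (e->pid == pid) {
            e->count++;
            return (e->count % log_step == 0) ? e->count : 0;
        }
    }
    return 0;
}
```

With log_step set to 70, the 70th and 140th calls for the same PID return 70 and 140, and every other call returns 0. Because the key is the PID rather than the process name, two bash processes get two independent counters.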
This is an example of output that your program should generate:
$ sudo ecli src/package.json --log_step 70
INFO [faerie::elf] strtab: 0xa58 symtab 0xa90 relocs 0xad8 sh_offset 0xad8
INFO [bpf_loader_lib::skeleton::preload::section_loader] load runtime arg (user specified the value through cli, or predefined in the skeleton) for log_step: Number(70), real_type=<INT> 'int' bits:32 off:0 enc:signed, btf_type=BtfVar { name: "log_step", type_id: 29, kind: GlobalAlloc }
[...] # Other information provided by ecli
INFO [bpf_loader_lib::skeleton] Running ebpf program...
TIME PID COMM NB_PAGE_FAULT
10:45:16 2134 gcc 70
10:45:16 2134 gcc 140
10:45:16 2134 gcc 210
10:45:16 1531 bash 70 # <-
10:45:16 2135 bash 70 # <- As you can see here, multiple different
10:45:16 1531 bash 140 # <- bash processes are running. Their page
10:45:16 2136 bash 70 # <- faults are tracked separately!
10:45:16 1531 bash 210 # <-
10:45:16 1531 bash 280 # <-
10:45:18 2137 gcc 70
10:45:18 2138 cc1 70
10:45:18 2138 cc1 140
10:45:18 2138 cc1 210
10:45:18 2138 cc1 280
10:45:18 2138 cc1 350
10:45:18 2138 cc1 420
10:45:18 2138 cc1 490
10:45:18 2138 cc1 560
10:45:18 2139 as 70
10:45:18 2139 as 140
10:45:18 2138 cc1 630
10:45:18 2138 cc1 700
10:45:18 2138 cc1 770
10:45:18 2138 cc1 840
10:45:18 2139 as 210
10:45:18 2139 as 280
10:45:18 2140 collect2 70
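Each row of the output above carries three values besides the timestamp: a PID, a command name, and a fault count. The sketch below shows one possible shape for such a record and how a row could be rendered from it. The struct and function names are hypothetical, the field layout is an assumption, and how rows are actually printed depends on the template and on ecli; this is only meant to make the columns concrete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define COMM_LEN 16   /* same size as the kernel's TASK_COMM_LEN */

/* Hypothetical record: one message sent to user space. */
struct fault_event {
    uint32_t pid;
    char     comm[COMM_LEN];
    uint64_t nb_page_fault;
};

/* Render one row as "HH:MM:SS PID COMM NB_PAGE_FAULT" into buf.
 * Returns the number of characters the full row needs (as snprintf
 * does), or -1 if the time could not be formatted. */
int format_row(char *buf, size_t size, const struct tm *when,
               const struct fault_event *ev)
{
    assert(buf != NULL && size > 0);

    char clock[16];
    if (strftime(clock, sizeof(clock), "%H:%M:%S", when) == 0)
        return -1;

    /* %.*s caps the name at COMM_LEN bytes in case it is not
     * NUL-terminated. */
    return snprintf(buf, size, "%s %u %.*s %llu",
                    clock, (unsigned)ev->pid, COMM_LEN, ev->comm,
                    (unsigned long long)ev->nb_page_fault);
}
```

For a record holding PID 2134, name gcc, and count 70 at 10:45:16, the buffer receives the same text as the first data row of the sample output.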
Tip
Note that much of the code related to page faults is architecture-dependent, and some functions cannot be hooked on all architectures. The actual implementation of page fault handling is also complex, but there is one key function, where the handling process begins, that can be hooked on every architecture. The following resources can help you identify it: mmu-tlb-and-page-faults and chapter 4.6.1 Handling a Page Fault.
Important
The eBPF program is not specific to one process, but should track the page faults generated by all the processes running on the system.
When several processes with the same name are running, the eBPF program should not aggregate their page faults. Page faults must be counted separately for each process ID (PID).