eBPF Basics (Part 2)
In this tutorial, you will continue learning the basics of eBPF. Please read tutorial 3 before starting this tutorial.
The second project of this course will be to write a series of eBPF programs, and this tutorial continues to build the background you need to write them.
Communication between eBPF programs and user-space programs
With what you learned in the previous tutorial, you can now create simple eBPF programs that hook certain events and perform some basic actions, for example by calling helper functions. However, the code you can write so far is very limited: it executes a series of instructions each time an event is triggered, but it cannot store any data or state between events.
This is where maps come in. Maps are a way to store data in the kernel that can be accessed by (other) eBPF programs and even by user-space programs.
In this section, you will learn:
How to communicate between eBPF programs using maps.
How to communicate between user space and eBPF programs using maps.
How to communicate from eBPF programs to user space using perf buffers.
Communicating between eBPF programs using maps
Maps are data structures that live in the kernel and can be accessed by eBPF programs and user-space programs. They allow you to store state that persists across multiple events. In this section, you will learn how to manipulate them from the eBPF program, and in the next section how to manipulate them from user space.
To define maps, the SEC macro will be handy once again. You will need
to define a C structure that will represent the kind of map you want to
create. For example:
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1024);
__type(key, __u32);
__type(value, struct value);
} values SEC(".maps");
Note
__uint(), __type() and SEC() are macros provided by libbpf (in
bpf/bpf_helpers.h), while BPF_MAP_TYPE_HASH is an enum value defined by
the kernel headers (available here through vmlinux.h).
This structure defines a map that can store a maximum of 1024 entries. Each
entry is a key-value pair where the key is a 32-bit unsigned integer and the
value is a struct value (that you must define).
Finally, the type of the map is BPF_MAP_TYPE_HASH. This is one of the
simplest types of maps. It is a hash table where you can store key-value pairs.
You can find other types of maps here.
To interact with the map from within an eBPF program, you need to use helper
functions. For example, to insert a value in the map, you can use the
bpf_map_update_elem() helper function. In the link provided above, you can
find the helper functions you can use for each map type (for example, for
BPF_MAP_TYPE_HASH: https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_HASH/).
Be careful: the fields you define in the map structure and the associated helper functions are not always the same per map type. Always check the documentation when using a new map type.
Try completing the template so that each time the execve syscall is called, the eBPF program will increment a counter in a map and print the number of times execve has been called so far.
Tip
The BPF_MAP_TYPE_ARRAY type of map initializes all its values to
zero, so you don’t need to initialize the counter before using it. (source)
See a possible implementation
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
char LICENSE[] SEC("license") = "Dual BSD/GPL";
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__uint(max_entries, 1);
__type(key, __u32);
__type(value, __u64);
} counter SEC(".maps");
SEC("tracepoint/syscalls/sys_enter_execve")
int handle_execve(struct trace_event_raw_sys_enter *ctx)
{
__u32 key = 0;
__u64 *count = bpf_map_lookup_elem(&counter, &key);
if (!count)
return 0;
(*count)++;
bpf_printk("execve called %llu times\n", *count); // %llu: the counter is an unsigned 64-bit value
return 0;
}
To test your code, run make, then sudo ./prog and in another terminal,
run sudo cat /sys/kernel/debug/tracing/trace_pipe to see the output of the
bpf_printk() helper function. You can open a third terminal and run any
external command to trigger the execve syscall (ls for example). Note that
shell builtins such as echo run inside the shell itself and do not trigger execve.
Communicating between user space and eBPF programs using maps
Now, we are going to see how to interact with maps from user-space programs. The same maps that are used in eBPF programs can be read from and written to at any time by the user-space program while the eBPF program is running. This makes maps the standard mechanism for passing configuration or data in either direction between user space and the kernel.
To interact with these maps easily, libbpf provides high-level wrapper
functions that take a pointer to the map instead of its file descriptor and
come with added checks for key and value sizes. These are the recommended
functions to use when possible. Those generally have the same name as their
kernel counterparts but with an additional underscore after bpf_map.
For instance, to update an element in a map from user space, you would use:
bpf_map__update_elem(map, &key, sizeof(key), &val, sizeof(val), BPF_ANY);
Note
These higher-level functions require the bpf/libbpf.h header,
which should already be included in your user-space program.
Important
Note that in the previous section of the tutorial, you used the
bpf_map_update_elem() helper function inside the kernel (documentation
here)
while in user-space you are using the libbpf functions. They have similar
names but they are different functions operating in entirely different
contexts.
Under the hood: How does user space actually talk to the map?
Note
While libbpf makes it easy with high-level functions, the actual mechanism used to interact with the kernel is syscalls.
For each kind of map, the documentation lists the “Syscall commands” that are available. For BPF_MAP_TYPE_HASH, for example, there is
a syscall command named BPF_MAP_UPDATE_ELEM. This means that, from the
user-space program, you could in theory use the bpf() syscall directly with the command
BPF_MAP_UPDATE_ELEM as a parameter:
syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &(union bpf_attr){
.map_fd = bpf_map__fd(map),
.key = (__u64)(uintptr_t)&key,
.value = (__u64)(uintptr_t)&val,
.flags = BPF_ANY,
}, sizeof(union bpf_attr));
However, directly using the bpf() syscall is a bit cumbersome. To help, libbpf
provides some lower-level wrapper functions
that make it slightly easier:
bpf_map_update_elem(bpf_map__fd(map), &key, &val, BPF_ANY);
But as mentioned earlier, you should skip both the raw syscalls and the
low-level wrappers and rely on the high-level bpf_map__* functions instead.
Now, try modifying your prog.c so that it sets the counter of the previous
example to 100 before running the eBPF program. You should see that the first
time you trigger the execve syscall, it will print “execve called 101 times”
instead of “execve called 1 times”.
See answer
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
static volatile sig_atomic_t running = 1; // sig_atomic_t is safe to write from a signal handler
static void handle_sig(int sig) {
running = 0;
}
int main(void) {
struct bpf_object *obj;
struct bpf_program *prog;
struct bpf_link *link;
int err;
// Open the BPF object file (kernel compiled BPF program)
obj = bpf_object__open_file("prog.bpf.o", NULL);
if (!obj) {
perror("bpf_object__open_file");
return 1;
}
// Load the BPF object file into the kernel
err = bpf_object__load(obj);
if (err) {
perror("bpf_object__load");
bpf_object__close(obj); // clean up on error
return 1;
}
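// Look up the map and set the initial counter value before attaching
struct bpf_map *map = bpf_object__find_map_by_name(obj, "counter");
if (!map) {
fprintf(stderr, "map not found\n");
bpf_object__close(obj); // clean up on error
return 1;
}
__u32 key = 0;
__u64 val = 100;
err = bpf_map__update_elem(map, &key, sizeof(key), &val, sizeof(val), BPF_ANY);
if (err) {
fprintf(stderr, "failed to update map: %d\n", err);
bpf_object__close(obj); // clean up on error
return 1;
}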
// Find the BPF program by name
prog = bpf_object__find_program_by_name(obj, "handle_execve");
if (!prog) {
fprintf(stderr, "program not found\n");
bpf_object__close(obj); // clean up on error
return 1;
}
// Attach the BPF program to the appropriate hook (e.g., tracepoint, kprobe)
link = bpf_program__attach(prog);
if (!link) {
perror("bpf_program__attach");
bpf_object__close(obj); // clean up on error
return 1;
}
signal(SIGINT, handle_sig);
signal(SIGTERM, handle_sig);
printf("Program loaded. Press Ctrl+C to exit.\n");
// Keep the program running until interrupted
while (running)
sleep(1);
// Cleanup
bpf_link__destroy(link); // detaches the program from the hook
bpf_object__close(obj); // unloads and frees the BPF object
return 0;
}
Important
In these examples, the values of the maps are simple integers,
but remember the very first example of this tutorial where the value of the
map was a structure. To be able to access values of such a map both in user
space and in kernel space, you need to define the structure in a shared
header file that is included both in the .bpf.c file and in the
user-space .c file (for example prog.h).
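As a minimal sketch of such a shared header, assume a hypothetical struct value holding a counter and a PID (the fields and the PROG_H guard are illustrative and not part of the template). The __VMLINUX_H__ test relies on the include guard that bpftool-generated vmlinux.h files normally define, so that the user-space side pulls the fixed-width types from linux/types.h while the eBPF side keeps the ones from vmlinux.h:

```c
/* prog.h: hypothetical header shared by prog.bpf.c and prog.c */
#ifndef PROG_H
#define PROG_H

/*
 * The .bpf.c file includes vmlinux.h first, which already defines __u32 and
 * __u64 (assumption: it was generated by bpftool and defines __VMLINUX_H__).
 * The user-space file gets the same types from <linux/types.h>.
 */
#ifndef __VMLINUX_H__
#include <linux/types.h>
#endif

/* Illustrative value type for a map such as the `values` hash map. */
struct value {
    __u64 count; /* number of events seen for this key */
    __u32 pid;   /* process that last touched the entry */
    __u32 pad;   /* explicit padding so both sides see the same layout */
};

/* Both sides must agree on the size the kernel stores for each entry. */
_Static_assert(sizeof(struct value) == 16, "unexpected struct value size");

#endif /* PROG_H */
```

Using fixed-width types and explicit padding keeps the layout identical whether the header is compiled for the BPF target or for the host.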
Communicating from eBPF programs to user space using perf buffers
With maps, you can share state between eBPF programs and read or write it from user space. However, maps are not well suited for streaming a continuous flow of events from the kernel to user space, for example notifying user space every time a specific system call occurs. This is the job of perf buffers.
Note
The perf buffer is not specific to eBPF. It is a high-performance
ring buffer mechanism provided by the Linux perf_event subsystem,
designed for efficient event logging, performance monitoring, and tracing.
It enables the kernel to stream data to user-space applications with minimal
overhead.
Using a perf buffer involves both a kernel-space and a user-space side.
Kernel side (in prog.bpf.c):
Define a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY (it will always look the same; only the name needs to change):
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");
In a header file, define a structure representing the data you want to send to user space (include this header file in both the .bpf.c file and the .c file):
struct struct_to_give_to_perf {
    // fields of the structure
};
Use the bpf_perf_event_output() helper function to send data to the perf buffer:
struct struct_to_give_to_perf struct_perf = ... // fill the structure with the data you want to send
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &struct_perf, sizeof(struct_perf));
User-space side (in prog.c):
In the template’s prog.c, replace the while loop with the following code. It
creates a perf buffer from the (perf event array) map defined in the eBPF
program, then polls it for events with a timeout of 100 ms:
// After attaching the eBPF program, look up the map and set up polling
// (strerror() and errno below need <string.h> and <errno.h> at the top of prog.c)
int perf_map_fd = bpf_object__find_map_fd_by_name(obj, "events");
if (perf_map_fd < 0) {
fprintf(stderr, "Failed to find perf BPF map: %s\n", strerror(errno));
bpf_link__destroy(link);
bpf_object__close(obj);
return 1;
}
struct perf_buffer *pb = perf_buffer__new(
perf_map_fd,
8, // number of pages for the buffer (you can keep it at 8 for the project)
handle_event, // This function will be called for each event received
handle_lost, // This function will be called if events are lost
NULL,
NULL);
if (!pb) {
fprintf(stderr, "Failed to create perf buffer: %s\n", strerror(errno));
bpf_link__destroy(link);
bpf_object__close(obj);
return 1;
}
while (running) {
err = perf_buffer__poll(pb, 100 /* timeout in ms */);
if (err < 0 && err != -EINTR) {
printf("Polling error %d\n", err);
break;
}
}
perf_buffer__free(pb);
Understanding perf_buffer__poll
The function perf_buffer__poll(pb, timeout_ms) tells the user-space program
to wait for new events from the perf buffer. The timeout_ms argument
specifies the maximum amount of time (in milliseconds) the function will block
waiting for an event. It does not mean the function automatically runs every
100ms in the background. Instead, it polls the buffer and returns either when
an event is processed or when the timeout expires.
Because it only polls once per call and blocks execution while doing so, we
place it inside a while (running) loop. When the poll call times out and
returns control to the loop, it gives your user-space program a chance to
execute other logic before going back to waiting on the perf buffer.
When an event is received, the handle_event() function will be called, and
if an event is lost, the handle_lost() function will be called. You thus
have to define these two functions in your user-space program.
Here are the prototypes of these two functions:
void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz);
void handle_lost(void *ctx, int cpu, __u64 lost_cnt);
Try defining the handle_event() function so that it prints the value of the
count_value field of the structure sent from the kernel when an event is
received. Also define the handle_lost() function to print a warning
message. Then, modify the eBPF program so that it sends the value of the
counter to user space every time it is updated instead of printing it in the
kernel with bpf_printk(). You should see that every time you trigger the
execve syscall, the value of the counter is printed in the console of the
user-space program.
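If you get stuck, here is one possible shape for the two callbacks. It is a sketch, not the reference solution: struct event stands in for whatever structure you put in your shared header, and struct stats is a hypothetical bookkeeping structure that is only used if you pass a pointer to it as the ctx argument of perf_buffer__new() (the setup code above passes NULL).

```c
#include <stdio.h>
#include <string.h>
#include <linux/types.h>

/* Hypothetical event layout; in practice it comes from the shared header. */
struct event {
    __u64 count_value;
};

/* Optional bookkeeping reachable through the ctx argument (hypothetical). */
struct stats {
    __u64 received;
    __u64 lost;
};

/* Called by perf_buffer__poll() for each sample read from the buffer. */
void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz)
{
    struct stats *st = ctx; /* NULL when perf_buffer__new() was given NULL */
    struct event e;

    /* Ignore samples too small to hold the structure we expect. */
    if (data_sz < sizeof(e)) {
        fprintf(stderr, "CPU %d: short sample (%u bytes)\n", cpu, data_sz);
        return;
    }
    /* Copy instead of casting: data may not be aligned for a __u64. */
    memcpy(&e, data, sizeof(e));
    if (st)
        st->received++;
    printf("CPU %d: count_value = %llu\n", cpu,
           (unsigned long long)e.count_value);
}

/* Called when samples were dropped because the buffer was full. */
void handle_lost(void *ctx, int cpu, __u64 lost_cnt)
{
    struct stats *st = ctx;

    if (st)
        st->lost += lost_cnt;
    fprintf(stderr, "CPU %d: lost %llu events\n", cpu,
            (unsigned long long)lost_cnt);
}
```

The size check uses "less than" rather than "not equal" because the sample handed to the callback can be slightly larger than the structure that was sent.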
You can find a full working example here. To download it on your VM:
$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/perf_example.tar.gz
$ tar -xzvf perf_example.tar.gz
Hooks (part 2)
Last week, you learned how to use syscall hooks and uprobes. This week, you will learn how to use kprobes.
kprobes
Similar to uprobes which allow you to hook a function from a user-space application, kprobes allow you to hook a function from the kernel.
The syntax to use kprobes is almost the same as for uprobes: the only
difference is that you do not provide a path to the executable where you want
to hook the function. For example, to hook the blk_mq_start_request(struct
request *rq) function, which is the function that is called when a block
device is about to start a request, you can write the following code:
SEC("kprobe/blk_mq_start_request")
int BPF_KPROBE(handle_blk_mq_start_request, struct request *rq)
{
u64 start_time_ns = BPF_CORE_READ(rq, start_time_ns);
bpf_printk("Timestamp (in nanoseconds) that this request was allocated for this IO: %llu (current time: %llu)\n", start_time_ns, bpf_ktime_get_ns());
return 0;
}
Note
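BPF_CORE_READ is a macro (from bpf/bpf_core_read.h) that reads a field of a
kernel structure on your behalf. In this case, it reads the start_time_ns
field of the rq structure. It is needed here because a kprobe program cannot
dereference a kernel pointer such as rq directly.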
Similarly to uprobes, you provide the name of the function you want to hook in
the SEC macro and you use the BPF_KPROBE() macro: its first argument is the
name of the eBPF function that will be called when the event is triggered, and
the remaining arguments are the arguments of the function you want to hook.
To know which function to hook and what its arguments are, you need to be able to read the kernel source code and/or documentation. To this end, you can check the next page of this tutorial: Kernel Code Overview.
A final note about kprobes is that you should avoid using inline functions as probe points. This is because kprobes may not be able to guarantee that probe points are registered for all instances of that function. To know whether a function is inlined or not, you can check the prototype of the function in the kernel source code.
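Reading the prototype is not always conclusive, because the compiler may also inline a static function that is not marked inline. A complementary check, sketched below under the assumption that your kernel exposes /proc/kallsyms, is to look for the function there: a function that has been inlined everywhere has no symbol of its own. symbol_listed() is a hypothetical helper written for this illustration, not a libbpf function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a kallsyms-style stream ("<address> <type> <name> [module]" per line)
 * and return 1 if a symbol named exactly `name` is listed, 0 otherwise.
 */
int symbol_listed(FILE *f, const char *name)
{
    char line[512], sym[256], type;
    unsigned long long addr;

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%llx %c %255s", &addr, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return 1;
    }
    return 0;
}
```

To check a real function, open /proc/kallsyms with fopen() and pass the stream together with the function name, for example blk_mq_start_request. A listed symbol only tells you the function exists as a separate entity; some of its call sites may still have been inlined.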
Note
Instead of using kprobes, you can look into fentry, which offers advantages over kprobes. It is not covered in this tutorial for brevity (and you can choose which one you want to use for the project).
Download the full code of this example using:
$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/kprobe_example.tar.gz
$ tar -xzvf kprobe_example.tar.gz
A Note on eBPF Program Types
Throughout these tutorials, we have focused entirely on tracing programs
(like uprobes, kprobes, and tracepoints). However, eBPF is incredibly
versatile, and the tracing program types (such as BPF_PROG_TYPE_KPROBE and
BPF_PROG_TYPE_TRACEPOINT) are just one category.
There are many other types of eBPF programs, such as those used for
networking, security, and more. It is important to know that some
features in eBPF (like the bpf_timer API) are only available to
specific program types and cannot be used in the tracing programs we
write here.
You do not need to learn these other program types for your project or challenges. However, if you’re curious about what else eBPF can do, you can check out the official documentation on program types here: https://docs.ebpf.io/linux/program-type/