eBPF Basics (Part 2)

In this tutorial, you will continue learning the basics of eBPF. Please read tutorial 3 before starting this tutorial.

The second project of this course will be to write a series of eBPF programs. This tutorial will continue helping you to understand the basics of eBPF and how to write eBPF programs.

Communication between eBPF programs and user-space programs

From the previous tutorial, you can now create simple eBPF programs that hook certain events and perform basic actions, for example by calling helper functions. However, the code you can write so far is very limited: it executes a series of instructions each time an event is triggered, but it cannot store any data or state between events.

This is where maps come in. Maps are a way to store data in the kernel that can be accessed by (other) eBPF programs and even by user-space programs.

In this section, you will learn:

  • How to communicate between eBPF programs using maps.

  • How to communicate between user space and eBPF programs using maps.

  • How to communicate from eBPF programs to user space using perf buffers.

Communicating between eBPF programs using maps

Maps are data structures that live in the kernel and can be accessed by eBPF programs and user-space programs. They allow you to store state that persists across multiple events. In this section, you will learn how to manipulate them from an eBPF program; the next section covers how to manipulate them from user space.

To define maps, the SEC macro will be handy once again. You will need to define a C structure that will represent the kind of map you want to create. For example:

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, struct value);
} values SEC(".maps");

Note

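__uint(), __type() and SEC() are macros provided by libbpf (in bpf/bpf_helpers.h). BPF_MAP_TYPE_HASH is not a macro but a value of the kernel's enum bpf_map_type, which vmlinux.h makes available.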

This structure defines a map that can store a maximum of 1024 entries. Each entry is a key-value pair where the key is a 32-bit unsigned integer and the value is a struct value (that you must define).

Finally, the type of the map is BPF_MAP_TYPE_HASH. This is one of the simplest types of maps. It is a hash table where you can store key-value pairs. You can find other types of maps here.

To interact with the map from within an eBPF program, you need to use helper functions. For example, to insert a value in the map, you can use the bpf_map_update_elem() helper function. In the link provided above, you can find the helper functions you can use for each map type (for example, for BPF_MAP_TYPE_HASH: https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_HASH/).

Be careful: the fields you define in the map structure and the associated helper functions are not always the same per map type. Always check the documentation when using a new map type.

Try completing the template so that each time the execve syscall is called, the eBPF program will increment a counter in a map and print the number of times execve has been called so far.

Tip

The BPF_MAP_TYPE_ARRAY type of map initializes all its values to zero, so you don’t need to initialize the counter before using it. (source)

See a possible implementation
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} counter SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int handle_execve(struct trace_event_raw_sys_enter *ctx)
{
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&counter, &key);
    if (!count)
        return 0;

    (*count)++;
    bpf_printk("execve called %llu times\n", *count);

    return 0;
}

To test your code, run make, then sudo ./prog. In another terminal, run sudo cat /sys/kernel/debug/tracing/trace_pipe to see the output of the bpf_printk() helper function. In a third terminal, run any command that starts a new program to trigger the execve syscall (ls, for example). Shell builtins such as echo run inside the shell itself and do not call execve.
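One caveat about the implementation above: the handler can run on several CPUs at once, and (*count)++ is a plain read-modify-write, so two concurrent execve calls may occasionally produce a single increment. A common remedy in BPF C is the compiler builtin __sync_fetch_and_add(), which performs the addition atomically. The sketch below is ordinary C so that it can be compiled and tried outside the kernel; the typedef demo_u64 and the function name bump_counter are made up for this illustration (in a real eBPF program, you would use __u64 from vmlinux.h).

```c
/* Plain C stand-in for __u64, which vmlinux.h provides in a real eBPF program. */
typedef unsigned long long demo_u64;

/*
 * Atomically add 1 to a counter, for example the pointer returned by
 * bpf_map_lookup_elem(). The return value of the builtin is deliberately
 * ignored: only the atomic addition itself is needed here.
 */
static void bump_counter(demo_u64 *count)
{
    __sync_fetch_and_add(count, 1);
}
```

In handle_execve, the statement __sync_fetch_and_add(count, 1); would take the place of (*count)++. The value read afterwards with *count may then already include increments made by other CPUs.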

Communicating between user space and eBPF programs using maps

Now, we are going to see how to interact with maps from user-space programs. The same maps that are used in eBPF programs can be read from and written to at any time by the user-space program while the eBPF program is running. This makes maps the standard mechanism for passing configuration or data in either direction between user space and the kernel.

To interact with these maps easily, libbpf provides high-level wrapper functions that take a pointer to the map instead of its file descriptor and check the key and value sizes. These are the recommended functions to use when possible. They generally have the same names as their kernel counterparts, but with an additional underscore after bpf_map (for example, bpf_map__update_elem() instead of bpf_map_update_elem()).

For instance, to update an element in a map from user space, you would use:

bpf_map__update_elem(map, &key, sizeof(key), &val, sizeof(val), BPF_ANY);

Note

These higher-level functions require the bpf/libbpf.h header, which should already be included in your user-space program.

Important

Note that in the previous section of the tutorial, you used the bpf_map_update_elem() helper function inside the kernel (documentation here) while in user-space you are using the libbpf functions. They have similar names but they are different functions operating in entirely different contexts.

Under the hood: How does user space actually talk to the map?

Note

While libbpf makes it easy with high-level functions, the actual mechanism used to interact with the kernel is syscalls.

For each kind of map, there are different “Syscall commands” available. For BPF_MAP_TYPE_HASH for example, you see a syscall command named BPF_MAP_UPDATE_ELEM. What this means is that from the user-space program, you could theoretically use the bpf() syscall directly with the command BPF_MAP_UPDATE_ELEM as a parameter:

syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &(union bpf_attr){
     .map_fd = bpf_map__fd(map),
     .key    = (__u64)(uintptr_t)&key,
     .value  = (__u64)(uintptr_t)&val,
     .flags  = BPF_ANY,
 }, sizeof(union bpf_attr));

However, directly using the bpf() syscall is a bit cumbersome. To help, libbpf provides some lower-level wrapper functions that make it slightly easier:

bpf_map_update_elem(bpf_map__fd(map), &key, &val, BPF_ANY);

But as mentioned earlier, you should skip both the raw syscalls and the low-level wrappers and rely on the high-level bpf_map__* functions instead.
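As an aside on the raw syscall form shown earlier: the double cast (__u64)(uintptr_t)&key is there because union bpf_attr declares fields such as key and value as 64-bit integers rather than pointers, which keeps the layout of the structure identical for 32-bit and 64-bit processes. A minimal sketch of that conversion, using standard C types and an illustrative helper name (ptr_to_u64 is not something your project is required to define):

```c
#include <stdint.h>

/*
 * Convert a user-space pointer to the 64-bit integer form expected by
 * union bpf_attr. Going through uintptr_t first avoids a compiler warning
 * about casting a pointer to an integer of a different size on 32-bit systems.
 */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}
```

With such a helper, the initializer could be written as .key = ptr_to_u64(&key). The high-level bpf_map__* functions hide this detail entirely.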

Now, try modifying your prog.c so that it sets the counter of the previous example to 100 before running the eBPF program. You should see that the first time you trigger the execve syscall, it will print “execve called 101 times” instead of “execve called 1 times”.

See answer
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

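// sig_atomic_t is the type that can safely be written from a signal handler
static volatile sig_atomic_t running = 1;

static void handle_sig(int sig) {
    (void)sig;   // unused
    running = 0;
}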

int main(void) {
    struct bpf_object *obj;
    struct bpf_program *prog;
    struct bpf_link *link;
    int err;

    // Open the BPF object file (kernel compiled BPF program)
    obj = bpf_object__open_file("prog.bpf.o", NULL);
    if (!obj) {
        perror("bpf_object__open_file");
        return 1;
    }
    // Load the BPF object file into the kernel
    err = bpf_object__load(obj);
    if (err) {
        perror("bpf_object__load");
        bpf_object__close(obj);   // clean up on error
        return 1;
    }

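    // Set the counter to 100 before attaching the program
    struct bpf_map *map = bpf_object__find_map_by_name(obj, "counter");
    if (!map) {
        fprintf(stderr, "map not found\n");
        bpf_object__close(obj);   // clean up on error
        return 1;
    }
    __u32 key = 0;
    __u64 val = 100;
    err = bpf_map__update_elem(map, &key, sizeof(key), &val, sizeof(val), BPF_ANY);
    if (err) {
        fprintf(stderr, "failed to update map: %d\n", err);
        bpf_object__close(obj);   // clean up on error
        return 1;
    }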

    // Find the BPF program by name
    prog = bpf_object__find_program_by_name(obj, "handle_execve");
    if (!prog) {
        fprintf(stderr, "program not found\n");
        bpf_object__close(obj);   // clean up on error
        return 1;
    }

    // Attach the BPF program to the appropriate hook (e.g., tracepoint, kprobe)
    link = bpf_program__attach(prog);
    if (!link) {
        perror("bpf_program__attach");
        bpf_object__close(obj);   // clean up on error
        return 1;
    }

    signal(SIGINT, handle_sig);
    signal(SIGTERM, handle_sig);

    printf("Program loaded. Press Ctrl+C to exit.\n");
    // Keep the program running until interrupted
    while (running)
        sleep(1);

    // Cleanup
    bpf_link__destroy(link);   // detaches the program from the hook
    bpf_object__close(obj);    // unloads and frees the BPF object

    return 0;
}

Important

In these examples, the values of the maps are simple integers, but remember the very first example of this tutorial, where the value of the map was a structure. To access the values of such a map both in user space and in kernel space, you need to define the structure in a shared header file (for example prog.h) that is included both in the .bpf.c file and in the user-space .c file.
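Such a shared header could look like the sketch below. The field names (count, last_pid, pad) are hypothetical and only serve as an illustration; struct value is the name used in the first map example. The header sticks to plain C types because vmlinux.h generally cannot be combined with other kernel headers such as <linux/types.h>, so types like __u64 may not be defined the same way on both sides.

```c
/* prog.h: hypothetical shared header, included by both prog.bpf.c and prog.c */
#ifndef PROG_H
#define PROG_H

/* Plain C types only, so the header compiles on both sides without extra includes. */
struct value {
    unsigned long long count;    /* 8 bytes */
    unsigned int last_pid;       /* 4 bytes */
    unsigned int pad;            /* explicit padding, so there is no hidden gap */
};

/* Compilation fails if the layout ever changes unexpectedly. */
_Static_assert(sizeof(struct value) == 16, "struct value must be 16 bytes");

#endif /* PROG_H */
```

Both files then start with #include "prog.h", which guarantees that the kernel side and the user-space side agree on the size and layout of the value.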

Communicating from eBPF programs to user space using perf buffers

With maps, you can share state between eBPF programs and read or write it from user space. However, maps are not well suited for streaming a continuous flow of events from the kernel to user space, for example notifying user space every time a specific system call occurs. Perf buffers are designed for this use case.

Note

The perf buffer is not specific to eBPF. It is a high-performance ring buffer mechanism provided by the Linux perf_event subsystem, designed for efficient event logging, performance monitoring, and tracing. It enables the kernel to stream data to user-space applications with minimal overhead.

Using a perf buffer involves both a kernel-space and a user-space side.

Kernel side (in prog.bpf.c):

  1. Define a map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY (it will always look the same, no need to modify it, except for the name):

    struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(u32));
        __uint(value_size, sizeof(u32));
    } events SEC(".maps");
    
  2. In a header file, define a structure representing the data you want to send to user space (include this header file both in the .bpf.c file and in the .c file):

    struct struct_to_give_to_perf {
        // fields of the structure
    };
    
  3. Use the bpf_perf_event_output() helper function to send data to the perf buffer.

    struct struct_to_give_to_perf struct_perf = ... // fill the structure with the data you want to send
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &struct_perf, sizeof(struct_perf));
    

User-space side (in prog.c):

In the template’s prog.c, replace the while loop with the following code. It creates a perf buffer from the perf event array map defined in the eBPF program, then polls it for events with a timeout of 100 ms:

// Requires <errno.h> and <string.h> at the top of prog.c (for errno and strerror())
// After attaching the eBPF program, look up the map and set up polling
int perf_map_fd = bpf_object__find_map_fd_by_name(obj, "events");
if (perf_map_fd < 0) {
    fprintf(stderr, "Failed to find perf BPF map: %s\n", strerror(errno));
    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 1;
}

struct perf_buffer *pb = perf_buffer__new(
    perf_map_fd,
    8,            // number of pages for the buffer (you can keep it at 8 for the project)
    handle_event, // This function will be called for each event received
    handle_lost,  // This function will be called if events are lost
    NULL,
    NULL);

if (!pb) {
    fprintf(stderr, "Failed to create perf buffer: %s\n", strerror(errno));
    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 1;
}

while (running) {
    err = perf_buffer__poll(pb, 100 /* timeout in ms */);
    if (err < 0 && err != -EINTR) {
        fprintf(stderr, "Polling error %d\n", err);
        break;
    }
}

perf_buffer__free(pb);

Understanding perf_buffer__poll

The function perf_buffer__poll(pb, timeout_ms) tells the user-space program to wait for new events from the perf buffer. The timeout_ms argument specifies the maximum amount of time (in milliseconds) the function will block waiting for an event. It does not mean the function automatically runs every 100ms in the background. Instead, it polls the buffer and returns either when an event is processed or when the timeout expires.

Because it only polls once per call and blocks execution while doing so, we place it inside a while (running) loop. When the poll call times out and returns control to the loop, it gives your user-space program a chance to execute other logic before going back to waiting on the perf buffer.
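The timeout behavior can be illustrated outside of libbpf. The following is a minimal sketch and not the perf buffer API: it uses the standard poll() call on an ordinary pipe to show the same "block until data arrives or the timeout expires" pattern. The function names wait_readable and demo_timeout_then_data are made up for this illustration.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if data is ready, 0 if the timeout expired, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/*
 * Demonstration on a pipe: the first wait finds nothing and times out,
 * the second one returns immediately because one byte is pending.
 * Returns 1 if both calls behaved that way, 0 otherwise, -1 on setup error.
 */
static int demo_timeout_then_data(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0)
        return -1;

    before = wait_readable(fds[0], 50);   /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    after = wait_readable(fds[0], 50);    /* one byte is waiting */

    close(fds[0]);
    close(fds[1]);
    return before == 0 && after == 1;
}
```

perf_buffer__poll follows the same pattern, except that it also invokes your callbacks for the events it finds before returning.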

When an event is received, the handle_event() function will be called, and if an event is lost, the handle_lost() function will be called. You thus have to define these two functions in your user-space program.

Here are the prototypes of these two functions:

void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz);
void handle_lost(void *ctx, int cpu, __u64 lost_cnt);

Try defining the handle_event() function so that it prints the value of the count_value field of the structure sent from the kernel when an event is received. Also define the handle_lost() function to print a warning message. Then, modify the eBPF program so that it sends the value of the counter to user space every time it is updated instead of printing it in the kernel with bpf_printk(). You should see that every time you trigger the execve syscall, the value of the counter is printed in the console of the user-space program.
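As a starting point, the two callbacks could look roughly like the sketch below. It is written as plain C so that it can be compiled on its own: struct demo_event is a hypothetical stand-in for the structure from your shared header, and the counters events_seen and total_lost are illustrative bookkeeping, not something libbpf requires. The parameter types unsigned int and unsigned long long correspond to __u32 and __u64 on Linux.

```c
#include <stdio.h>

/* Hypothetical event layout; in a real project it lives in the shared header. */
struct demo_event {
    unsigned long long count_value;
};

static unsigned long long events_seen;   /* valid events handled so far */
static unsigned long long total_lost;    /* events reported as lost so far */

/* Called once per event; data points to the bytes sent from the kernel side. */
void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz)
{
    const struct demo_event *e = data;

    (void)ctx;   /* the sketch does not use the user-supplied context */

    /* The reported size may be larger than the struct, but never smaller. */
    if (data_sz < sizeof(*e)) {
        fprintf(stderr, "short event on CPU %d (%u bytes)\n", cpu, data_sz);
        return;
    }

    events_seen++;
    printf("execve called %llu times (CPU %d)\n", e->count_value, cpu);
}

/* Called when the buffer overflowed and lost_cnt events were dropped. */
void handle_lost(void *ctx, int cpu, unsigned long long lost_cnt)
{
    (void)ctx;
    total_lost += lost_cnt;
    fprintf(stderr, "warning: lost %llu events on CPU %d\n", lost_cnt, cpu);
}
```

The size check uses < rather than != on the assumption that the size passed to the callback can include trailing padding added by the perf subsystem.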

You can find a full working example here. To download it on your VM:

$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/perf_example.tar.gz
$ tar -xzvf perf_example.tar.gz

Hooks (part 2)

Last week, you learned how to use syscall hooks and uprobes. This week, you will learn how to use kprobes.

kprobes

Just as uprobes let you hook a function in a user-space application, kprobes let you hook a function in the kernel.

The syntax to use kprobes is almost the same as for uprobes: the only difference is that you do not provide a path to the executable where you want to hook the function. For example, to hook the blk_mq_start_request(struct request *rq) function, which is the function that is called when a block device is about to start a request, you can write the following code:

SEC("kprobe/blk_mq_start_request")
int BPF_KPROBE(handle_blk_mq_start_request, struct request *rq)
{
    u64 start_time_ns = BPF_CORE_READ(rq, start_time_ns);
    bpf_printk("Timestamp (in nanoseconds) that this request was allocated for this IO: %llu (current time: %llu)\n", start_time_ns, bpf_ktime_get_ns());
    return 0;
}

Note

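BPF_CORE_READ is a libbpf macro (from bpf/bpf_core_read.h) that safely reads a field of a kernel structure. In this case, it reads the start_time_ns field of the request pointed to by rq. A direct dereference such as rq->start_time_ns is generally rejected by the verifier in a kprobe program.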

As with uprobes, you provide the name of the function you want to hook in the SEC macro. You then use the BPF_KPROBE() macro: its first argument is the name of your handler (the function that will be called when the event is triggered), and the remaining arguments are the arguments of the hooked function.

To know which function to hook and what its arguments are, you need to be able to read the kernel source code and/or documentation. To this end, you can check the next page of this tutorial: Kernel Code Overview.

A final note about kprobes: avoid using inline functions as probe points, because kprobes cannot guarantee that a probe is registered for every inlined instance of the function. To find out whether a function is declared inline, check its prototype in the kernel source code. Keep in mind that the compiler may also inline small static functions that are not marked inline; a function that does not appear in /proc/kallsyms cannot be hooked by name.

Note

Instead of using kprobes, you can look into fentry, which offers advantages over kprobes. It is not covered in this tutorial for brevity (you can choose which one you want to use for the project).

Download the full code of this example using:

$ wget --no-check-certificate https://people.montefiore.uliege.be/~gain/courses/info0940/asset/kprobe_example.tar.gz
$ tar -xzvf kprobe_example.tar.gz

A Note on eBPF Program Types

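Throughout these tutorials, we have focused entirely on tracing programs (uprobes, kprobes, and tracepoints). However, eBPF is very versatile, and tracing is just one category of program types. Note that, despite its name, BPF_PROG_TYPE_TRACING is not the type used here: kprobes and uprobes are BPF_PROG_TYPE_KPROBE programs, tracepoints are BPF_PROG_TYPE_TRACEPOINT programs, and BPF_PROG_TYPE_TRACING is used for fentry/fexit-style programs.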

There are many other types of eBPF programs, such as those used for networking, security, and more. It is important to know that some features in eBPF (like the bpf_timer API) are only available to specific program types and cannot be used in the tracing programs we write here.

You do not need to learn these other program types for your project or challenges. However, if you’re curious about what else eBPF can do, you can check out the official documentation on program types here: https://docs.ebpf.io/linux/program-type/