Kernel

O_DIRECT - The Problem That Grew Up With Multi-Threading

Introduction: A Problem Hiding in Plain Sight

Direct I/O (O_DIRECT) has been a contentious feature in Linux since its introduction. Linus Torvalds famously called it a design “by a deranged monkey on some serious mind-controlling substances” back in 2002. Yet for years, it continued to work—mostly. Applications used it, databases relied on it, and virtual machines benefited from its zero-copy performance.

But something fundamental has changed. As modern software has embraced multi-threading at every level—from applications to filesystems within the kernel itself—a problem that was once manageable has become critical. The truth is stark: with O_DIRECT, there is no way to guarantee that nobody will touch your I/O buffers during the operation.
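
To see the hazard concretely, consider a minimal sketch (illustrative only, not taken from any particular application): one thread issues a direct write while a second thread keeps touching the same buffer. Because the kernel hands the user pages straight to the device for DMA, the bytes that end up on disk depend on when the device happens to read each page.

    /* Sketch of the O_DIRECT buffer race (file name and sizes are arbitrary). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static char *buf;                  /* must be block-aligned for O_DIRECT */
    static volatile int writing = 1;

    static void *scribbler(void *arg)  /* keeps modifying the buffer mid-I/O */
    {
        (void)arg;
        while (writing)
            buf[0] ^= 1;               /* races with the in-flight DMA */
        return NULL;
    }

    int main(void)
    {
        int fd = open("data.bin", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        posix_memalign((void **)&buf, 4096, 4096);
        memset(buf, 'A', 4096);

        pthread_t t;
        pthread_create(&t, NULL, scribbler, NULL);
        if (write(fd, buf, 4096) < 0)  /* what lands on disk is undefined */
            perror("write");
        writing = 0;
        pthread_join(t, NULL);
        close(fd);
        return 0;
    }

With buffered I/O the data is copied into the page cache during the write call, so later modifications cannot affect what was written; O_DIRECT offers no such snapshot, which is exactly the guarantee multi-threaded callers keep assuming they have.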

eBPF UDP Load Balancer with Weighted Round-Robin

Introduction

I’ve been working on a new project that required high-performance UDP load balancing with dynamic weight adjustment. Traditional userspace load balancers introduce latency that’s unacceptable for our use case, so I decided to implement a kernel-level solution using eBPF (extended Berkeley Packet Filter).

The result is ebpflb_udp_wrr, an eBPF-based UDP load balancer that distributes incoming UDP traffic to local listeners using a weighted round-robin algorithm.
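
The selection logic itself is straightforward. As a rough sketch (an illustrative simplification, not the actual ebpflb_udp_wrr source; the port numbers and field names are made up, and in the real program the equivalent state lives in BPF maps), each backend receives a number of consecutive packets equal to its weight before the cursor advances to the next one:

    /* Weighted round-robin selection, simplified. */
    #include <stdio.h>

    struct backend { unsigned short port; unsigned int weight; };
    struct wrr_state { unsigned int idx; unsigned int sent; };

    static unsigned short pick_backend(const struct backend *b, unsigned int n,
                                       struct wrr_state *st)
    {
        if (st->sent >= b[st->idx].weight) {   /* this backend's turn is used up */
            st->idx = (st->idx + 1) % n;
            st->sent = 0;
        }
        st->sent++;
        return b[st->idx].port;
    }

    int main(void)
    {
        struct backend backends[] = { { 5001, 3 }, { 5002, 1 } }; /* 3:1 split */
        struct wrr_state st = { 0, 0 };

        for (int i = 0; i < 8; i++)     /* prints 5001 5001 5001 5002 repeated */
            printf("%u ", pick_backend(backends, 2, &st));
        printf("\n");
        return 0;
    }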

Why eBPF and XDP?

eBPF has revolutionized how we can extend kernel functionality without writing kernel modules or modifying the kernel source. Combined with XDP (eXpress Data Path), we can process packets at the earliest possible point in the networking stack—right when they arrive at the network interface—minimizing latency.
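
For readers who have not written an XDP program before, this is the shape of the hook (a bare-bones skeleton, not the project's code; the program name is a placeholder): the function runs once per packet as soon as the driver receives it, before the kernel even allocates an skb, and its return value decides the packet's fate.

    /* Minimal XDP skeleton; a real load balancer would parse the UDP
     * header between ctx->data and ctx->data_end and rewrite the
     * destination port before returning. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int udp_lb(struct xdp_md *ctx)
    {
        /* raw packet bytes live in [ctx->data, ctx->data_end) */
        return XDP_PASS;    /* hand the packet to the normal network stack */
    }

    char LICENSE[] SEC("license") = "GPL";

Compiled with clang for the bpf target and attached to an interface (for example with ip link set dev eth0 xdp obj prog.o sec xdp, where eth0 stands in for the real NIC), the program sees every packet before any socket processing happens.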

Compilers, CPUs, Memory, Cache Coherency, Atomicity, Synchronization and Ordering Are Not Black Magic but the Mix Is Close Enough

Introduction: The Illusion of Sequential Execution

When you write code, you naturally think in terms of sequential execution: instruction A happens, then instruction B, then instruction C. This mental model works perfectly—until you start writing concurrent code or working with hardware. Then you discover that modern CPUs, compilers, and memory systems conspire to execute your code in ways you never imagined.

The truth is that sequential consistency is largely an illusion maintained by your compiler and CPU to make programming tractable. But when multiple cores or threads are involved, that illusion breaks down in spectacular and subtle ways.
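
The classic store-buffer litmus test makes the breakdown tangible. In the sketch below (using C11 relaxed atomics so the program itself is well defined), each thread stores to one variable and then loads the other; under sequential consistency at least one of the loads must observe a 1, yet on a real x86 machine both can return 0, because each core's store is still sitting in its private store buffer when the other core's load executes.

    /* Store-buffer litmus test: can both r1 and r2 end up 0? */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    atomic_int x, y;
    int r1, r2;

    static void *writer_x(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&y, memory_order_relaxed);
        return NULL;
    }

    static void *writer_y(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; i++) {
            atomic_store(&x, 0);
            atomic_store(&y, 0);

            pthread_t a, b;
            pthread_create(&a, NULL, writer_x, NULL);
            pthread_create(&b, NULL, writer_y, NULL);
            pthread_join(a, NULL);
            pthread_join(b, NULL);

            if (r1 == 0 && r2 == 0)  /* "impossible" under sequential consistency */
                printf("both loads saw 0 at iteration %d\n", i);
        }
        return 0;
    }

It may take many iterations to catch, but it does happen; switching both operations to memory_order_seq_cst rules the outcome out, at the cost of stronger (and slower) ordering on the hardware.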