CVE-2026-46242 Explained: Bad Epoll Linux Root Exploit

Some vulnerabilities announce themselves with a crash, a panic, or a loud exploit chain that sets the security community on fire. Bad Epoll didn’t. It sat quietly in the mainline Linux kernel for almost three years. It survived kernel audits thorough enough to catch its next-door neighbor, and it left behind almost no trace of its own existence at runtime. It took a PhD student deliberately going looking for it to prove it was there at all.

Now that it’s public, tracked as CVE-2026-46242, it is forcing a hard, uncomfortable conversation about what “the code has been reviewed” actually means when the reviewer – human or machine – is looking for evidence that a bug doesn’t exist.

I’ve spent the past few days pulling apart the disclosure, the researcher’s root-cause writeup, the kernel commit history, and the wider run of Linux privilege-escalation bugs we’ve seen this year. What follows is the full, unvarnished picture of Bad Epoll, why it’s nearly invisible to standard sanitizers, how an AI model missed it, and the absolute masterclass in exploit engineering required to turn a 10-instruction race window into a reliable root shell.

The Ubiquity of Epoll (and Why You Can’t Just Turn It Off)

To understand the severity of this bug, you have to understand the footprint of the subsystem it lives in.

epoll is the mechanism Linux provides for programs to watch a massive number of file descriptors – sockets, pipes, timers, other epoll instances – for activity, without burning CPU cycles polling each one individually. It is the absolute bedrock of high-performance networking on Linux. Nginx, almost every major database engine, container runtimes, and the event loops behind Node.js and Python’s asyncio all ultimately call down into epoll_wait(). Android uses it. Chrome’s internals use it.
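For readers who haven’t used the interface directly, here is a minimal sketch of the pattern in Python, whose select.epoll object wraps the same kernel facility. The helper name wait_for_readable is my own invention, and the example assumes a Linux host.

```python
import os
import select


def wait_for_readable(fds, timeout=1.0):
    """Register every fd with one epoll instance and return those that are readable."""
    ep = select.epoll()
    try:
        for fd in fds:
            ep.register(fd, select.EPOLLIN)
        # poll() blocks until at least one fd is ready or the timeout expires.
        return sorted(fd for fd, events in ep.poll(timeout) if events & select.EPOLLIN)
    finally:
        ep.close()
```

Given two pipes where only one has data written to it, the helper returns just that pipe’s read end. The program never loops over idle descriptors itself; the kernel does the bookkeeping.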

This ubiquity is exactly why a bug in epoll’s teardown path is far more dangerous than a bug in an obscure filesystem driver. There is no module to blacklist. There is no kernel config flag that disables epoll on a production system without breaking half of userspace. The only mitigation is the fix itself: there is no workaround that doesn’t involve patching the kernel.

Anatomy of the Bug: A Masterclass in Race Conditions

Bad Epoll lives in fs/eventpoll.c, specifically in the path that runs when you close an epoll file descriptor that is watching another epoll file descriptor. Nesting one epoll instance inside another is a legitimate, fully supported configuration, and it is exactly the setup this race condition requires.
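To make the nested configuration concrete, here is a small, benign sketch (again in Python, again assuming Linux). The function name is hypothetical. It is single-threaded, so it only demonstrates that one epoll instance can watch another and never comes near the race itself.

```python
import os
import select


def outer_sees_inner_ready(timeout=1.0):
    """Build outer -> inner -> pipe and report whether outer flags inner as readable."""
    read_end, write_end = os.pipe()
    inner = select.epoll()
    outer = select.epoll()
    try:
        inner.register(read_end, select.EPOLLIN)
        # An epoll fd is itself pollable, so it can be registered like any other fd.
        outer.register(inner.fileno(), select.EPOLLIN)
        os.write(write_end, b"x")
        events = outer.poll(timeout)
        return any(fd == inner.fileno() and ev & select.EPOLLIN for fd, ev in events)
    finally:
        outer.close()
        inner.close()
        os.close(read_end)
        os.close(write_end)
```

Data arriving on the pipe makes the inner instance ready, and that readiness propagates to the outer instance. Here, outer plays the role of the watching epoll and inner the role of the watched one.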

When a program closes an epoll file descriptor, the kernel walks that fd’s list of watched items and detaches each one. Partway through detaching a watched item, the code clears a field on the target file object (specifically, file->f_ep), effectively announcing, “this file is no longer being monitored,” and then keeps working with that target object.

Here is where the world falls apart.

If the target file is being closed by a different thread on a different CPU at almost exactly that moment, that second thread’s cleanup path checks the exact same field. If it observes the field is already cleared, it assumes cleanup has nothing left to do, skips a critical step it’s supposed to run, and proceeds straight to freeing the target object entirely – both the eventpoll structure behind it and the file object itself.

The first thread doesn’t know any of this happened. It keeps executing under the assumption that the object it just “unmarked” is still alive, and writes through a pointer into memory that the second thread has already handed back to the allocator.

This results in two distinct Use-After-Frees (UAFs):

The Eventpoll UAF: A write of 0 at offset 160 of a freed kmalloc-192 struct eventpoll.
The Silent File UAF: A read and subsequent invalid free of the struct file object, whose reference count was never raised during the teardown.
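The interleaving is easier to see when every ordering is enumerated. The following is a deliberately tiny model, not kernel code: each thread is reduced to two steps, the step names are made up, and the rule that a proper cleanup path would wait for thread A is my simplifying assumption.

```python
from itertools import combinations

A_STEPS = ["A_clear_f_ep", "A_write_target"]   # thread closing the watching epoll
B_STEPS = ["B_check_f_ep", "B_free_target"]    # thread closing the watched file


def interleavings():
    """Yield every ordering of the four steps that keeps each thread's own order."""
    total = len(A_STEPS) + len(B_STEPS)
    for a_slots in combinations(range(total), len(A_STEPS)):
        a, b = iter(A_STEPS), iter(B_STEPS)
        yield [next(a) if i in a_slots else next(b) for i in range(total)]


def causes_uaf(schedule):
    """Replay one schedule; return True if thread A writes to a freed target."""
    f_ep_set, freed, a_done = True, False, False
    b_skips_cleanup, free_deferred = False, False
    for step in schedule:
        if step == "A_clear_f_ep":
            f_ep_set = False
        elif step == "B_check_f_ep":
            # A cleared field tells B there is nothing left to detach.
            b_skips_cleanup = not f_ep_set
        elif step == "B_free_target":
            if b_skips_cleanup or a_done:
                freed = True
            else:
                # Model assumption: the cleanup path serializes with A, so the free waits.
                free_deferred = True
        elif step == "A_write_target":
            if freed:
                return True
            a_done = True
            freed = free_deferred
    return False
```

Of the six possible schedules in this model, only one is dangerous: A clears the field, B checks it and frees the target, and only then does A write. That single ordering is the race window.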

The kernel’s security team has been explicit: this requires no special capabilities, no user namespaces, and no configuration beyond CONFIG_EPOLL being enabled (which it always is on any general-purpose Linux distribution). An attacker needs exactly one thing: the ability to run code as an ordinary, logged-in local user. That’s it.

This is the detail that keeps getting flattened in coverage, so let’s be precise. Both Bad Epoll and a separate, already-patched bug, CVE-2026-43074, trace back to a single 2023 kernel commit that reworked how epoll manages reference counts to reduce contention on a shared mutex. That one change quietly introduced two distinct race conditions into the same roughly 2,500-line stretch of code.

Anthropic’s Mythos AI model, reviewing that code, found and correctly flagged the first one (CVE-2026-43074). The fix for that bug landed in mainline on April 2. As a side effect of how it was fixed, it deferred the freeing of the eventpoll structure until after a grace period. This happened to close off half of what would become Bad Epoll’s race window.

But it didn’t touch the other half: the struct file object that was still being torn down without its reference count ever having been raised. That second, narrower problem is what PhD student Jaeyoung Chung reported. It’s why a kernel that had already received “the epoll fix” in April was still vulnerable through late April. The AI model cleared the bar for the sibling bug, but it didn’t clear it twice.

Why is this specific bug nearly invisible at runtime? The forensic detail that explains the AI-auditing angle is that the struct file objects come from a slab allocator cache marked SLAB_TYPESAFE_BY_RCU. This designation exists so the kernel can reuse “type-stable” memory quickly after it’s freed, on the assumption that any code touching it afterward will verify it’s still the object it expects.

Pair that with the fact that the vulnerable read happens while the offending thread is holding a spinlock (file->f_lock), and you land in a code path where KASAN (the kernel’s standard memory-error sanitizer) structurally cannot fire. It isn’t a coincidence or an oversight in the sanitizer’s coverage; it’s a known blind spot for this exact category of allocation. To make matters worse, the 6.12 long-term-support branch never backported a companion protection for this specific check. Even a kernel built with every relevant debug option turned on won’t flag anything.
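A toy model helps show why type-stable reuse is so quiet. Nothing below is kernel code: the pool class, the ident field, and lookup_checked are all invented for illustration. The point is only that a stale reference into such a pool still lands on a live, correctly typed object, so the only thing that can notice the swap is an explicit identity re-check.

```python
class TypeStablePool:
    """Fixed set of same-typed slots that are reused immediately after being freed."""

    def __init__(self, size):
        self.slots = [{"in_use": False, "ident": None} for _ in range(size)]

    def alloc(self, ident):
        for slot in self.slots:
            if not slot["in_use"]:
                slot["in_use"] = True
                slot["ident"] = ident
                return slot
        raise MemoryError("pool exhausted")

    def free(self, slot):
        # The slot keeps its type and its place in the pool; only the flag changes.
        slot["in_use"] = False


def lookup_checked(slot, expected_ident):
    """The re-validation a careful reader is expected to perform."""
    return slot["in_use"] and slot["ident"] == expected_ident
```

If a caller keeps a reference to a slot, frees it, and another allocation reuses it, the stale reference and the new object are the same slot. An "is this memory freed?" check passes, and only lookup_checked with the old identity fails.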

A model reviewing this file for correctness has to reason about it the same way a human auditor does when no dynamic testing signal is available: by mentally simulating concurrent execution across a window measured in single-digit machine instructions. That is a fundamentally harder task than finding a bug that trips a sanitizer during a fuzzing run.

Forging a Weapon: The kernelCTF Exploit Deep Dive

To understand the sheer engineering required to weaponize this, we have to look at the exploit written for Google’s kernelCTF program, which targets lts-6.12.67 and cos-121-18867.294.100. The exploit turns a microscopic race condition into a reliable 8-byte write, leverages it to create a dangling struct file, crosses caches to control its memory, leaks kernel addresses, and finally hijacks control flow to pop a root shell.

Here is the technical breakdown of how Chung turned a ghost into a weapon.

1. Leaking KASLR via Prefetch Side-Channels

Before you can exploit the kernel, you need to know where it is. The exploit leaks the kernel base using a prefetch side-channel (the prefetch-timing attack class introduced by Gruss et al. and applied to Linux KASLR by EntryBleed). It uses libxdk to scan the memory space, bounding the range using GetKernelPageCount(), and confirms the base before proceeding.

2. Widening the 10-Instruction Race Window

The race window between clearing f_ep and the subsequent hlist_del_rcu() is about 10 instructions wide. On LTS, it’s even narrower because __ep_remove() takes a spinlock right before the window, and the lock shares a 64-byte cache-line with f_op, making the subsequent read of f_op usually a cache hit.

To widen this, the exploit employs two brilliant techniques:

False Sharing: f_count and f_op of the target file sit on the same 64-byte cache-line. CPU 1 spins close(dup(ep_race_target)) to continuously write f_count. This invalidates CPU 0’s copy of the cache-line, forcing the racer’s is_file_epoll() read of f_op to stall on a cache miss exactly inside the race window.
Timerfd Interrupt: A shared timerfd has about 3,000 async epoll waiters attached. When it expires, the wakeup walks the whole waiter queue, stalling the racer’s close() and stretching the window.
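Whether two fields contend for the same cache line is plain integer arithmetic. The sketch below uses hypothetical offsets that I picked for illustration; they are not the real struct file layout, which depends on the kernel build.

```python
CACHE_LINE_BYTES = 64


def same_cache_line(offset_a, offset_b, object_base=0, line=CACHE_LINE_BYTES):
    """Return True if two fields of one object fall in the same cache line."""
    return (object_base + offset_a) // line == (object_base + offset_b) // line


# Hypothetical offsets, chosen only for illustration (not taken from struct file).
F_OP_OFFSET = 16
F_COUNT_OFFSET = 48
```

With a line-aligned object, offsets 16 and 48 share line 0, so a write to one field invalidates the cached copy of the other. Shift the object base by 32 bytes and the same two fields land on different lines, which is why the trick depends on the exact layout of the target build.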

Because the IRQ still has to fire inside that tiny window, the exploit uses an adaptive launch-ahead timing search. The main thread measures its own false-sharing burst, and the racer arms an absolute-time timerfd to fire in the middle of that burst. It then searches for the exact launch_ahead nanosecond offset that ensures the IRQ lands perfectly inside the close() execution.

3. The Race-Win Oracle (A Stroke of Genius)

The window is so narrow that most attempts miss. If you miss and try to proceed, you crash the kernel. You need a way to detect a win without triggering a side-effect like a panic.

Chung built a “race-win oracle” using a depth-3 nested epoll chain. When the race succeeds, the UAF write zeroes out refs.first of the target eventpoll. The exploit then calls epoll_ctl() to add the nested chain to the corrupted target.

The kernel’s ep_loop_check() function calculates the nesting depth.

If the race won: refs.first is 0. The kernel thinks the target has no waiters. The depth calculation passes, and epoll_ctl() returns 0.
If the race missed: refs.first still links the waiter. The depth calculation exceeds EP_MAX_NESTS, and epoll_ctl() returns -ELOOP.

By simply checking the result of epoll_ctl(), the exploit knows whether it won the race. If the call fails with ELOOP (which userspace sees as a return value of -1 with errno set to ELOOP), it cleanly retries. No crashes, no panics, just a silent retry loop.
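The logic of the oracle can be sketched as a toy graph model. This is not the kernel’s ep_loop_check(): the Ep class, the helper names, the "+ 1" in the depth sum, and the value 4 for EP_MAX_NESTS are my modeling assumptions, and the exact arithmetic differs between kernel versions. What the model preserves is the idea that the check counts both the epolls below the thing being added and the watchers above the target.

```python
import errno

EP_MAX_NESTS = 4  # assumed nesting limit for this model


class Ep:
    def __init__(self):
        self.watching = []  # epolls this instance watches
        self.refs = []      # epolls that watch this instance


def watch(outer, inner):
    outer.watching.append(inner)
    inner.refs.append(outer)


def down_depth(ep):
    return 0 if not ep.watching else 1 + max(down_depth(c) for c in ep.watching)


def up_depth(ep):
    return 0 if not ep.refs else 1 + max(up_depth(p) for p in ep.refs)


def epoll_ctl_add(target, added):
    """Model of EPOLL_CTL_ADD: refuse the link if the combined nesting is too deep."""
    if down_depth(added) + 1 + up_depth(target) > EP_MAX_NESTS:
        return -errno.ELOOP
    watch(target, added)
    return 0


def chain(length):
    """Return the head of a chain of `length` epolls, each watching the next."""
    nodes = [Ep() for _ in range(length)]
    for outer, inner in zip(nodes, nodes[1:]):
        watch(outer, inner)
    return nodes[0]
```

In this model, a target with one waiter rejects a depth-3 chain with -ELOOP. Clearing the target’s refs list, which stands in for the UAF write zeroing refs.first, makes the same call return 0.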

4. Cross-Cache Spraying and the Pipe Reclaim

Winning the race gives us a UAF write, but we need a target that gives us arbitrary read/write. The exploit zeroes the refs field of an eventpoll that is being watched by another eventpoll. This results in a dangling pointer to a struct file.

But that struct file sits in the filp slab cache, and we don’t control its bytes yet. Enter the Cross-Cache Attack.

The exploit allocates a bunch of struct file objects (via open("/dev/null")) to fill the per-CPU partial slabs. It then frees the target file and its neighbors, turning the slab page into an empty page. Because of SLAB_TYPESAFE_BY_RCU, the page is deferred to RCU and then pushed to the buddy allocator.

The exploit then allocates 256 pages of pipe buffers. With high probability, one of those pipe pages reclaims the exact physical page that previously held our victim struct file. From that moment on, writing to the pipe gives us full control over the bytes of the dangling struct file. We can now forge a fake file object.

(Note: On Google’s COS target, struct file is 256 bytes instead of 192, and is freed via individual call_rcu callbacks rather than a single slab callback. The exploit adapts by spreading the reclaim writes over ~1 second to ensure they land after the page reaches the buddy allocator.)

5. From Constrained to Arbitrary Read

With a fake struct file in place, the exploit triggers a read by calling cat /proc/self/fdinfo/<ep_uaf_waiter>. This invokes ep_show_fdinfo(), which follows our fake file’s f_inode pointer and prints inode->i_ino and inode->i_sb->s_dev.

By setting f_inode to target_address - offset(i_ino), we can read 8 bytes from any address. But there is a catch: the kernel also dereferences inode->i_sb to read s_dev. If i_sb doesn’t point to a valid kernel memory address, the kernel panics. This makes it a constrained arbitrary address read (AAR).

To bypass this, the exploit walks the kernel’s task_struct tree starting from init_task until it finds its own process. Once it has its own task_struct, it upgrades to an Unconstrained AAR.

It uses the sigaltstack() system call to set its own current->sas_ss_sp (signal alternate stack pointer) to target_address - offset(s_dev). It then forges the fake file’s f_inode so that the inode’s i_sb field overlaps current->sas_ss_sp. Now, when the kernel reads i_sb->s_dev, it is actually reading the 4 bytes at our exact target address. The constraint is completely dropped.
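The pointer arithmetic behind "set the base to target minus the field offset" is worth seeing once on a toy buffer. Everything here is illustrative: the offset constant is hypothetical, and the bytearray stands in for memory in an ordinary Python process.

```python
import struct

# Hypothetical field offset, for illustration only (real values depend on the kernel build).
OFF_I_INO = 0x40


def fake_base_for(target_address, field_offset):
    """Base address that makes base + field_offset equal target_address."""
    return target_address - field_offset


def read_field_u64(memory, memory_base, struct_base, field_offset):
    """Read the 8-byte little-endian field at struct_base + field_offset."""
    start = struct_base + field_offset - memory_base
    return struct.unpack_from("<Q", memory, start)[0]
```

If a toy buffer is mapped at 0x1000 and a value is stored at 0x1080, then fake_base_for(0x1080, OFF_I_INO) gives 0x1040, and reading the field at that base returns the stored value. The code doing the read believes it is fetching a field of a structure; it is actually fetching whatever sits at the chosen address.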

6. RIP Control and the Stack Pivot

With an unconstrained read, the exploit walks the task_struct to find the kernel virtual address of a pre-allocated pipe page (rop_pipe). It then forges the fake struct file so that its f_op (file operations) pointer points to this controlled page.

When epoll_wait() is called on the dangling waiter, it eventually calls vfs_poll(), which executes the indirect call file->f_op->poll(file, pt). Because we control f_op, we control the instruction pointer (RIP).

However, we don’t control the stack. We need a stack pivot to move the execution flow to our ROP chain parked in the pipe page.

On LTS, the exploit chains four gadgets:

PIVOT1: Loads a value from our controlled page into rax and calls it.
PIVOT2: Sets up rbx and loads the next gadget.
PIVOT3: Sets up rcx and rdx.
PIVOT4: Executes push [rcx] ; pop rsp ; ret. This pops our controlled address into the stack pointer (rsp), effectively pivoting the stack to our ROP chain.

(On COS, the layout is different, and the exploit uses a simpler 2-gadget pivot combined with a ret-slide.)

Once the stack is pivoted, the ROP chain executes commit_creds(&init_cred) and switch_task_namespaces(), returning to userspace to spawn a root shell via execve.

Stability

The exploit retries the race/cross-cache loop until it wins or the 5-minute kernelCTF time budget expires. On the CI runners, it achieves a 99% success rate on LTS and 98% on COS. The only failures occurred on a specific AMD EPYC CPU model where the prefetch-based KASLR leak occasionally returned an incorrect base, causing a panic. But against the race itself? It is terrifyingly reliable.

The 2026 Kernel Landscape: Bad Epoll in Context

Bad Epoll joins a running series of named Android-rooting kernel bugs – Bad Binder, Bad IO_uring, Bad Spin. They share the same basic shape: a race or memory-safety flaw in core kernel code that happens to be reachable on Android’s stripped-down configuration. Of the roughly 130 vulnerabilities exploited through kernelCTF to date, only about ten are viable for rooting Android. Bad Epoll is one of them precisely because epoll isn’t an optional module.

It is also landing in an unusually crowded year for Linux privilege-escalation disclosures:

CVE-2026-31431 (Copy Fail): Surfaced in April and added to CISA’s Known Exploited Vulnerabilities (KEV) catalog.
DirtyFrag (Fragnesia, DirtyClone, pedit COW): These are deterministic page-cache-write primitives, the same class that made 2022’s Dirty Pipe so widely weaponized. There is no race to win; you either have the primitive or you don’t.
CVE-2026-31694: A flaw in the FUSE filesystem driver credited to the AI-driven security firm Bynario.

Put side by side, these bugs sketch out two different trajectories. The deterministic bugs are getting easier to find and easier to industrialize. The race conditions, however, remain hard at every single stage: hard to find even for an AI model that found a sibling bug in the exact same file, hard to patch correctly on the first attempt (as the gap between the April fix and the Bad Epoll fix shows), and hard to turn into a reliable exploit without serious, independent engineering.

Defensive Takeaways: What to Do Right Now

There is no configuration workaround for Bad Epoll. The only real remediation is patching. That means pulling in the upstream fix (commit a6dc643c6931) or your distribution’s backport, and actually rebooting into the patched kernel. Confirming that the package was updated on disk while the old, vulnerable kernel is still the one running is a common and serious mistake.
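One rough way to catch the "updated on disk, never rebooted" case is to compare the running kernel release with the kernels installed under /lib/modules. The sketch below is a heuristic under stated assumptions: the function names are mine, the version parsing only understands the leading numeric part of a release string, and a newer installed kernel is not proof that it contains the fix. Check your distribution’s advisory for the exact fixed version.

```python
import os
import re

_VERSION = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?")


def version_key(release):
    """Turn '6.12.67-1-amd64' into a comparable tuple such as (6, 12, 67, 1)."""
    match = _VERSION.match(release)
    if not match:
        return ()
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def reboot_pending(running, installed):
    """True if any installed kernel release is newer than the running one."""
    return any(version_key(r) > version_key(running) for r in installed)


def check_host(modules_dir="/lib/modules"):
    """Return (running_release, reboot_pending) for the local machine."""
    running = os.uname().release
    try:
        installed = os.listdir(modules_dir)
    except FileNotFoundError:
        installed = []
    return running, reboot_pending(running, installed)
```

Calling check_host() on a machine returns the running release and a flag. A True flag means a newer kernel is sitting on disk and the machine is still running the old one.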

If you cannot patch immediately, you can raise the bar:

Enable KASLR and SLUB randomization where they aren’t already the default. This raises the difficulty for exploits relying on predictable memory layouts.
Tighten local shell access. Review who has accounts on shared build servers and CI runners. Limit what untrusted code is allowed to execute under a low-privileged account.
Hunt for post-exploitation artifacts. Because the race leaves essentially nothing behind at runtime, incident response teams should look for the downstream signs of a successful local-root event. Audit for new setuid binaries, unexpected changes to sudoers files, new privileged user accounts, altered boot configurations, and authentication events that immediately follow a known low-privilege compromise (like a web app vulnerability).
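As a starting point for the setuid audit, here is a small sketch that walks a directory tree and lists regular files carrying the setuid bit, optionally limited to files whose metadata changed recently. The function names and the changed_within_days parameter are my own; a real hunt should compare results against a known-good baseline, since ctime can be altered by an attacker who already has root.

```python
import os
import stat
import time


def is_setuid_file(mode):
    """True for regular files that carry the setuid bit."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID)


def find_setuid_files(root, changed_within_days=None, now=None):
    """Return sorted paths of setuid regular files under root."""
    now = time.time() if now is None else now
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: never follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not is_setuid_file(st.st_mode):
                continue
            if changed_within_days is not None:
                if now - st.st_ctime > changed_within_days * 86400:
                    continue
            hits.append(path)
    return sorted(hits)
```

Running find_setuid_files("/usr", changed_within_days=7) and comparing the output against what your package manager owns is a cheap first pass before deeper forensics.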

As of this writing, CVE-2026-46242 doesn’t appear on CISA’s KEV list, and there’s no confirmed evidence of it being used in the wild. But given how complete and public the technical disclosure is, that status is temporary.

Bad Epoll is a stark reminder that in the kernel, the absence of evidence is not evidence of absence. Sometimes, the most dangerous bugs are the ones that leave no trace at all – until someone specifically builds a weapon to prove they are there. Patch your kernels.

PoC (courtesy of Jaeyoung Chung): security-research/pocs/linux/kernelctf/CVE-2026-46242_lts_cos/exploit/lts-6.12.67/exploit.cpp, on the submit-cve-2026-46242 branch of J-jaeyoung/security-research.