[NDSS'20] HFL: Hybrid Fuzzing on the Linux Kernel
Abstract
Hybrid fuzzing, combining symbolic execution and fuzzing, is a promising approach for vulnerability discovery because each approach can complement the other. However, we observe that applying hybrid fuzzing to kernel testing is challenging because the following unique characteristics of the kernel make a naive adoption of hybrid fuzzing inefficient: 1) having indirect control transfers determined by system call arguments, 2) controlling and matching internal system state via system calls, and 3) inferring nested argument type for invoking system calls. Failure to handling such challenges will render both fuzzing and symbolic execution inefficient, and thereby, will result in an inefficient hybrid fuzzing. Although these challenges are essential to both fuzzing and symbolic execution, to the best of our knowledge, existing kernel testing approaches either naively use each technique separately without handling such challenges or imprecisely handle a part of challenges only by static analysis.
To this end, this paper proposes HFL, which not only combines fuzzing with symbolic execution for hybrid fuzzing but also addresses kernel-specific fuzzing challenges via three distinct features: 1) converting indirect control transfers to direct transfers, 2) inferring system call sequence to build a consistent system state, and 3) identifying nested arguments types of system calls. As a result, HFL found 24 previously unknown vulnera- bilities in recent Linux kernels. Additionally, HFL achieves 15% and 26% higher code coverage than Moonshine and Syzkaller, respectively, and over kAFL/S2E/TriforceAFL, achieving even four times better coverage, using the same amount of resources (CPU, time, etc.). Regarding vulnerability discovery performance, HFL found 13 known vulnerabilities more than three times faster than Syzkaller.
Outline
Difficulties of performing hybrid fuzzing on kernel
P1: Having indirect control transfers determined by system call arguments. Kernel prefers to use function pointer table to support polymorphism.
1 | size_t (*ucma_table[])(struct ucma_file *file, |
P2: Controlling and matching internal system state via system calls. Correct system call execution order is necessary to touch the deep logic of syscall.
P3: Inferring nested argument type for invoking system calls. A field number in one structure points to another structure.
Drawbacks of previous work
either naively use fuzzing/symbolic execution separately by not handling such challenges - Razzer: Fuzzing - Digtool - kAFL: Fuzzing - Periscope: Fuzzing - perf fuzzer - CAB-Fuzz: Hybrid Fuzzer without handling P1, P2, P3
or imprecisely handle a part of challenges only by static analysis - Difuze (infer the structure of arguments of ioctl) - Handle P3 but only perform static analysis - Cannot infer abstract pointer which is only avaiable at runtime - IMF (analyze trace of macOS) - Handle P2, without handling P1, P3 - Moonshine (analyze trace of linux) - Handle P2, without handling P1, P3 - Analyze explicit dependencies by performing static analysis on kernel - HFL further perform symbolic checking to get precise constants between system state varibles
Why this paper is accepted
Novelty: first kernel hybrid fuzzer
- First work to handle hybrid kernel fuzzing :open_mouth:
Practical
- Apply Hybird fuzzing approach to test the linux kernel
- Achieve significant coverage improvement
- Uncover multiple previously unknown vulnerabilities
How it works
Kernel Fuzzing + Symbolic Execution
IV-B: converting indirect control transfers to direct transfers at compile time for symbolic analyze
- resolve function pointer, unfold indirect transfer to direct one offline :+1:
- only interest in those index variable originates from syscall parameters by data flow analysis
1 | // before translation |
IV-C: inferring system call sequence to build a consistent system state
- Step 1: point-to analysis to capture candidate read/write dependency pairs (offline) (the same as the static analysis step of Razzer)
- Step 2: runtime validation to identify true dependencies
(infer execution order)
- check if they access the same address during symbolically execution
- distinguish true dependency pairs & write must before read
- Step 3: detect parameter dependencies (infer parameter dependencies)
IV-D: identifying nested arguments types of system calls at runtime
- monitor invocation of copy_from_user
- using symbolic state to track offset and length of nested data structure
- not the same as S&P'20: EASIER :sweat_smile:
Technique details
IV-A: Fuzzing: Syzkaller
- :sparkles: Fully automated kernel-specific features instead of using manually written template of Syzkaller
IV-C & IV-D: Symbolic Execution: S2E
All syscall parameters is symbolized
s2e_make_symbolic
:sparkles: Identify always-false or always-true branch as hard branch by maintaining a frequency table
Pass user program that executes hard branch to symbolic analyzer
Symbolize all syscall parameters and perform concolic execution to reach another branch
IV-B: Unfold indirect control flow (Kernel translator)
- GIMPLE representation in gcc
IV-C: Infer syscall order and syscall paramter dependency
- Static analysis: SVF using LLVM Linux
Code https://bitbucket.org/anonyk/hfl-release/src/master/
Evaluation
24 previously unknown vulnerabilities in recent Linux kernels
HFL achieves 15% and 26% higher code coverage than Moonshine and Syzkaller
HFL measures the upper bound number of coverage for kernel
- HFL found 13 known vulnerabilities more than three times faster than Syzkaller
- Contribution of each feature in HFL by comparing with Syzkaller
Examples
Three example to overcome proposed difficulties
Difficult 1: Indirect control flow in RDS network
1 |
|
Difficult 2: calling dependency across PPP ioctl
1 | long ppp_ioctl(struct file *file, int cmd, long arg) { |
Difficult 3: Nested argument in PPP driver
1 | struct ppp_option_data { |
Related Work
Kernel Testing | Kernel | Target | Coverage guided | Call Sequence Inference | Nested Syscall Argument | Indirect Control Flow | Handle Branch Condition | Remark |
---|---|---|---|---|---|---|---|---|
perf_fuzzer | Linux | perf_event set | ||||||
Digtool | Win | |||||||
kAFL | Win & Linux & MacOS | file system & single syscall | :star: | Fuzzer | ||||
Razzer | Linux | syscall traces | :star: | Race Bug | ||||
PeriScope | Linux | device drivers | ||||||
FIRM-AFL | Firmware | |||||||
CAB-Fuzz | Wind | device drivers | :star: | Hybrid | ||||
IMF | Linux | syscall traces | :star: | Fuzzer | ||||
Syzkaller | Linux | syscall traces | :star: | :star: | :star: | Fuzzer | ||
Moonshine | MacOS | syscall traces | :star: | :star: | Fuzzer | |||
DIFUZE | Linux (Android) | device drivers | :star: | |||||
JANUS | Linux | file system | Fuzzer | |||||
Krace | Linux | file system | :star: | Race Bug | ||||
HFL | Linux | syscall traces | :star: | :star: | :star: | :star: | :star: | Hybrid |
Kminer | ||||||||
Dr. checker | ||||||||
Reflection
- HFL虽然也进行了static point-to analysis,但HFL在runtime时利用符号执行验证识别true dependency,来修正false positive。
- kernel的子系统分开坐可能会降低难度
Category | Analysis Target | Size (.bc) | Syscalls used |
---|---|---|---|
File System | fs/ | 75 MB | open, read, ioctl, write, lseek |
Network | net/ | 255 MB | socket, accept, bind, listen, ioctl,
getsocket, setsocket, sendto, recvfrom, sendmsg, recvmsg |
Drivers | drivers/ | 322 MB | open, ioctl, write, read |
- 其实implicit dependencies的话,不是函数A在函数B前面就可以,他们的参数可能还有依赖关系,可能两个field字端要保持一致,也就是说,参数间的依赖关系,不止存在于explict dependencies例如open和read这种关系,还存在于如下这种情况。
1 | ioctl(fd, DRM_ALLOC, {*_1}); |