[FSE'19] Cerebro: Context-Aware Adaptive Fuzzing for Effective Vulnerability Detection

Key idea: use the complexity of uncovered code to foresee the benefit of fuzzing a seed (input potential).

Abstract

Existing greybox fuzzers mainly utilize program coverage as the goal to guide the fuzzing process. To maximize their outputs, coverage-based greybox fuzzers need to evaluate the quality of seeds properly, which involves making two decisions: 1) which is the most promising seed to fuzz next (seed prioritization), and 2) how much effort to spend on the current seed (power scheduling). In this paper, we present our fuzzer, Cerebro, to address the above challenges. For the seed prioritization problem, we propose an online multi-objective based algorithm to balance various metrics such as code complexity, coverage, execution time, etc. To address the power scheduling problem, we introduce the concept of input potential to measure the complexity of uncovered code and propose a cost-effective algorithm to update it dynamically. Unlike previous approaches where the fuzzer evaluates an input solely based on the execution traces that it has covered, Cerebro is able to foresee the benefits of fuzzing the input by adaptively evaluating its input potential. We perform a thorough evaluation of Cerebro on 8 different real-world programs. The experiments show that Cerebro can find more vulnerabilities and achieve better coverage than state-of-the-art fuzzers such as AFL and AFLFast.

Problem

  1. Seed Prioritization
  2. Power Scheduling

Existing work

  1. Seed Prioritization: path rarity (AFLFast) or code complexity
  2. Power Scheduling: a single objective, or several objectives mixed via a weighted sum

Drawbacks of existing work

  1. Seed Prioritization: not aware of the uncovered code close to the execution trace
  2. Power Scheduling
    1. a single objective is biased;
    2. the weights of a weighted sum are chosen empirically

Novelty

  1. Input potential for power scheduling: the complexity of not-yet-covered code near the input's execution trace
  2. Multi-objective optimization for Seed prioritization: balance various metrics such as code complexity, coverage, execution time, etc.

Overview

Static Analyzer

Method: scan the source code and calculate a complexity score for each function.

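  1. Structural complexity: McCabe's Cyclomatic Complexity (based on the edges and nodes of the control-flow graph)
  2. Operator complexity: Halstead Complexity Measures (based on the operators and operands in the code)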

Static complexity = (structural complexity + operator complexity) * 100 / 2
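A minimal Python sketch of the static score, under several assumptions. The McCabe number is computed from a function's CFG as M = E - N + 2P. Halstead volume stands in for "the" Halstead measure. Each metric is normalized by its program-wide maximum before the two are averaged and scaled to 0..100. The notes don't say which Halstead measure or normalization Cerebro actually uses, so `halstead_volume` and the normalization inside `static_complexity` are illustrative choices, and the per-function data is made up.

```python
import math


def cyclomatic_complexity(num_edges: int, num_nodes: int, num_components: int = 1) -> int:
    """McCabe's cyclomatic complexity of a CFG: M = E - N + 2P."""
    return num_edges - num_nodes + 2 * num_components


def halstead_volume(operators: list[str], operands: list[str]) -> float:
    """Halstead volume V = N * log2(n), with N total tokens and n distinct tokens."""
    n = len(set(operators)) + len(set(operands))  # vocabulary
    N = len(operators) + len(operands)            # program length
    return N * math.log2(n) if n > 1 else 0.0


def static_complexity(cc: float, hv: float, max_cc: float, max_hv: float) -> float:
    """Assumed normalization: divide each metric by its program-wide max, then (a + b) * 100 / 2."""
    norm_cc = cc / max_cc if max_cc else 0.0
    norm_hv = hv / max_hv if max_hv else 0.0
    return (norm_cc + norm_hv) * 100 / 2


# Toy per-function data: (CFG edges, CFG nodes, operator tokens, operand tokens).
functions = {
    "parse_header": (12, 9, ["=", "+", "if", "=="], ["buf", "len", "i", "0"]),
    "checksum": (4, 4, ["^", "="], ["acc", "b"]),
}
cc = {f: cyclomatic_complexity(e, n) for f, (e, n, _, _) in functions.items()}
hv = {f: halstead_volume(ops, opnds) for f, (_, _, ops, opnds) in functions.items()}
max_cc, max_hv = max(cc.values()), max(hv.values())
scores = {f: static_complexity(cc[f], hv[f], max_cc, max_hv) for f in functions}
print(scores)  # parse_header scores 100.0; checksum scores about 36.67
```

Averaging two normalized metrics keeps either one from dominating just because it lives on a larger numeric scale.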

The static complexity is used to initialize the dynamic potential score.

Dynamic Scorer

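Dynamic potential score: the sum of the potential rewards that all successor functions can bring; a function that is already covered does not count.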

Combined score = static score + dynamic score
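A sketch of the dynamic and combined scores, under a few assumptions. "Successor functions" are taken to be the direct callees, in a static call graph, of the functions on the seed's trace. Each function's reward is its static complexity, matching the initialization above, and it stops counting once the function is covered. A seed's static score is assumed to be the sum over the functions on its trace. The function names, the call-graph dict, and the toy scores are all hypothetical.

```python
def dynamic_potential(trace, call_graph, reward, covered):
    """Sum the rewards of not-yet-covered direct callees of functions on the trace (assumed reading)."""
    frontier = {
        callee
        for fn in trace
        for callee in call_graph.get(fn, ())
        if callee not in covered  # covered functions contribute nothing
    }
    return sum(reward[fn] for fn in frontier)


def combined_score(trace, call_graph, static_score, covered):
    """Static score of the trace (assumed: a sum) plus its dynamic potential."""
    trace_static = sum(static_score[fn] for fn in set(trace))
    # Rewards are initialized from static complexity, as in the notes.
    return trace_static + dynamic_potential(trace, call_graph, static_score, covered)


call_graph = {"main": ["parse", "log"], "parse": ["decode", "checksum"],
              "log": [], "decode": [], "checksum": []}
static_score = {"main": 10, "parse": 40, "log": 5, "decode": 80, "checksum": 30}
covered = {"main", "parse", "log", "checksum"}  # global coverage so far
trace = ["main", "parse"]                       # functions this seed executed

before = combined_score(trace, call_graph, static_score, covered)  # 50 + 80 (decode) = 130
covered.add("decode")                                              # another seed reaches decode
after = combined_score(trace, call_graph, static_score, covered)   # potential drops: 50
print(before, after)
```

This version recomputes the score from scratch for every query. The paper describes a cost-effective way to update the potential dynamically, and this sketch does not reproduce that.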

Multi-objective Scorer

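The multi-objective criteria for each seed are:

  1. File size: smaller is better
  2. Execution time: shorter is better
  3. Number of edges covered: more is better
  4. Whether it brings new code coverage: a seed with new coverage is better
  5. Static complexity of the execution trace: higher is better, since more complex code is more likely to contain bugs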
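One way to balance these five criteria without hand-picked weights is Pareto dominance. A seed stays on the front if no other seed is at least as good on every objective and strictly better on at least one. This is a plain Pareto-front check for illustration only. Cerebro's online multi-objective algorithm may differ in how it maintains and ranks the front. `SeedStats` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedStats:
    name: str
    file_size: int           # bytes; smaller is better
    exec_time_us: float      # microseconds; shorter is better
    edges_covered: int       # more is better
    new_coverage: bool       # True is better
    trace_complexity: float  # static complexity of the trace; higher is better

    def objectives(self) -> tuple:
        # Negate the minimized objectives so every component is "larger is better".
        return (-self.file_size, -self.exec_time_us, self.edges_covered,
                int(self.new_coverage), self.trace_complexity)


def dominates(a: SeedStats, b: SeedStats) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    oa, ob = a.objectives(), b.objectives()
    return all(x >= y for x, y in zip(oa, ob)) and any(x > y for x, y in zip(oa, ob))


def pareto_front(seeds: list[SeedStats]) -> list[SeedStats]:
    """Seeds not dominated by any other seed (simple O(n^2) check)."""
    return [s for s in seeds if not any(dominates(o, s) for o in seeds)]


queue = [
    SeedStats("A", 100, 50.0, 200, True, 300.0),
    SeedStats("B", 200, 80.0, 150, False, 200.0),  # worse than A on every objective
    SeedStats("C", 50, 30.0, 100, False, 100.0),   # smaller and faster than A, but covers less
]
print([s.name for s in pareto_front(queue)])  # ['A', 'C']
```

Seed C survives even though it covers fewer edges, because no other seed beats it on size and speed. A single weighted sum could bury C, depending on how the weights were picked.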

Power Scheduler

Energy is assigned according to each seed's combined score.
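A minimal sketch of turning combined scores into energy, assuming each seed's energy scales with its score relative to the queue mean and the scaling factor is clipped to a fixed range. `base_energy`, `min_factor`, and `max_factor` are hypothetical parameters, not values from the paper.

```python
def assign_energy(combined_scores: dict[str, float], base_energy: int = 100,
                  min_factor: float = 0.25, max_factor: float = 4.0) -> dict[str, int]:
    """Scale a base energy by score / mean score, clipped to [min_factor, max_factor]."""
    if not combined_scores:
        return {}
    mean = sum(combined_scores.values()) / len(combined_scores)
    energy = {}
    for seed, score in combined_scores.items():
        factor = score / mean if mean > 0 else 1.0
        factor = min(max(factor, min_factor), max_factor)  # keep energy within bounds
        energy[seed] = round(base_energy * factor)
    return energy


print(assign_energy({"a": 50.0, "b": 150.0}))  # {'a': 50, 'b': 150}
```

Clipping stops one very high-potential seed from starving the rest of the queue, and it guarantees that low-scoring seeds still get some mutations. The exact bounds here are arbitrary.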