1. Joern Background

https://docs.joern.io/cpgql/reference-card/

2. Joern Generate DDG

Method 1: generate in joern console

joern> importCode.c.fromString("""
     | static void virtio_pci_remove(struct pci_dev *pci_dev)
     | {
     |     struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
     |     struct device *dev = get_device(&vp_dev->vdev.dev);
     |
     |     pci_disable_sriov(pci_dev);
     |
     |     unregister_virtio_device(&vp_dev->vdev);
     |
     |     if (vp_dev->ioaddr)
     |         virtio_pci_legacy_remove(vp_dev);
     |     else
     |         virtio_pci_modern_remove(vp_dev);
     |
     |     pci_disable_device(pci_dev);
     |     put_device(dev);
     | }
     | """)
Read more »

Abstract

Peripheral hardware in modern computers is typically assumed to be secure and not malicious, and device drivers are implemented in a way that trusts inputs from hardware. However, recent vulnerabilities such as Broadpwn have demonstrated that attackers can exploit hosts through vulnerable peripherals, highlighting the importance of securing the OS-peripheral boundary. In this paper, we propose a hardware-free concolic-augmented fuzzer targeting WiFi and Ethernet drivers, and a technique for generating high-quality initial seeds, which we call golden seeds, that allow fuzzing to bypass difficult code constructs during driver initialization. Compared to prior work using symbolic execution or greybox fuzzing, Drifuzz is more successful at automatically finding inputs that allow network interfaces to be fully initialized, and improves fuzzing coverage by 214% (3.1×) in WiFi drivers and 60% (1.6×) for Ethernet drivers. During our experiments with fourteen PCI and USB network drivers, we find eleven previously unknown bugs, two of which were assigned CVEs.

Read more »

Key: uncover new high-risk impacts given a bug with seemingly low-risk impacts

Abstract

Fuzzing has become one of the most effective bug finding approaches for software. In recent years, 24/7 continuous fuzzing platforms have emerged to test critical pieces of software, e.g., the Linux kernel. Though capable of discovering many bugs and providing reproducers (e.g., proof-of-concepts), a major problem is that they neglect a critical function that should have been built-in, i.e., evaluation of a bug’s security impact. It is well-known that the lack of understanding of security impact can lead to delayed bug fixes as well as delayed patch propagation. In this paper, we develop SyzScope, a system that can automatically uncover new “high-risk” impacts given a bug with seemingly “low-risk” impacts. From analyzing over a thousand low-risk bugs on syzbot, SyzScope successfully determined that 183 low-risk bugs (more than 15%) in fact contain high-risk impacts, e.g., control flow hijack and arbitrary memory write, some of which still do not have patches available yet.

Read more »

Key: use complexity of uncovered code to foresee the benefits of fuzzing a seed (Input Potential)

Abstract

Existing greybox fuzzers mainly utilize program coverage as the goal to guide the fuzzing process. To maximize their outputs, coverage-based greybox fuzzers need to evaluate the quality of seeds properly, which involves making two decisions: 1) which is the most promising seed to fuzz next (seed prioritization), and 2) how much effort should be spent on the current seed (power scheduling). In this paper, we present our fuzzer, Cerebro, to address the above challenges. For the seed prioritization problem, we propose an online multi-objective based algorithm to balance various metrics such as code complexity, coverage, execution time, etc. To address the power scheduling problem, we introduce the concept of input potential to measure the complexity of uncovered code and propose a cost-effective algorithm to update it dynamically. Unlike previous approaches where the fuzzer evaluates an input solely based on the execution traces that it has covered, Cerebro is able to foresee the benefits of fuzzing the input by adaptively evaluating its input potential. We perform a thorough evaluation of Cerebro on 8 different real-world programs. The experiments show that Cerebro can find more vulnerabilities and achieve better coverage than state-of-the-art fuzzers such as AFL and AFLFast.

Read more »

1. Clang Sanitizer Coverage

Given the following program test.c, which has three edges:

#include <stdio.h>

void foo(int *a) {
    if (a) {
        *a = 0;
    }
}

int main(int argc, char const *argv[]) {
    int a;

    printf("Please enter one number:");
    scanf("%d", &a);

    foo(&a);
    printf("%d\n", a);
    return 0;
}
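
To observe those edges, the program can be built with SanitizerCoverage instrumentation and run once; a sketch using flags from Clang's SanitizerCoverage documentation (the .sancov file name contains the process ID, shown here as a placeholder):

```
$ clang -g -fsanitize=address -fsanitize-coverage=trace-pc-guard test.c -o test
$ echo 1 | ASAN_OPTIONS=coverage=1 ./test
...
SanitizerCoverage: ./test.<pid>.sancov: <n> PCs written
```

`sancov -print test.<pid>.sancov` then lists the covered PCs, which can be symbolized to map them back to the edges above.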
Read more »

Abstract

Finding and exploiting vulnerabilities in binary code is a challenging task. The lack of high-level, semantically rich information about data structures and control constructs makes the analysis of program properties harder to scale. However, the importance of binary analysis is on the rise. In many situations binary analysis is the only possible way to prove (or disprove) properties about the code that is actually executed.

Read more »

Static Value-Flow Analysis Framework

A scalable, precise and on-demand interprocedural program dependence analysis framework for both sequential and multithreaded programs.

Value-Flow Analysis
  • resolves both control and data dependence.
    • Does the information generated at program point A flow to another program point B along some execution paths?
    • Can function F be called either directly or indirectly from some other function F′?
    • Is there an unsafe memory access that may trigger a bug or security risk?

Key features of SVF

  • Sparse: compute and maintain the data-flow facts where necessary
  • Selective: support mixed analyses for precision and efficiency trade-offs
  • On-demand: reason about program parts based on user queries.
Read more »

Abstract

Hybrid fuzzing which combines fuzzing and concolic execution has become an advanced technique for software vulnerability detection. Based on the observation that fuzzing and concolic execution are complementary in nature, the state-of-the-art hybrid fuzzing systems deploy “demand launch” and “optimal switch” strategies. Although these ideas sound intriguing, we point out several fundamental limitations in them, due to oversimplified assumptions. We then propose a novel “discriminative dispatch” strategy to better utilize the capability of concolic execution. We design a Monte Carlo based probabilistic path prioritization model to quantify each path’s difficulty and prioritize them for concolic execution. This model treats fuzzing as a random sampling process. It calculates each path’s probability based on the sampling information. Finally, our model prioritizes and assigns the most difficult paths to concolic execution. We implement a prototype system DigFuzz and evaluate our system with two representative datasets. Results show that the concolic execution in DigFuzz outperforms those in state-of-the-art hybrid fuzzing systems in every major aspect. In particular, the concolic execution in DigFuzz contributes to discovering more vulnerabilities (12 vs. 5) and producing more code coverage (18.9% vs. 3.8%) on the CQE dataset than the concolic execution in Driller.

Read more »

Workflow

  • Binary Lifting (libVEX): translate binary code to VEX IR
  • Binary Loading (CLE): load binaries in different formats
    • Resolve dynamic symbols
    • Perform relocation
    • Initialize program state
  • Program State Representation: SimuVEX
    • Program state (SimState) is a snapshot of values in registers and memory, open files, etc.

Abstract

We introduce equivalence modulo inputs (EMI), a simple, widely applicable methodology for validating optimizing compilers. Our key insight is to exploit the close interplay between (1) dynamically executing a program on some test inputs and (2) statically compiling the program to work on all possible inputs. Indeed, the test inputs induce a natural collection of the original program’s EMI variants, which can help differentially test any compiler and specifically target the difficult-to-find miscompilations.

Read more »