[Tech] Taint Analysis
Taint Analysis Classification
Explicit Analysis: How taint propagates based on the data dependencies between variables.
Implicit Analysis: How taint propagates through conditional instructions based on the control dependencies between variables.
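The difference between the two classes can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Tainted` wrapper and the three helper functions are hypothetical names invented here and do not come from any of the tools listed below.

```python
class Tainted:
    """An integer value paired with a taint flag (hypothetical helper)."""

    def __init__(self, value, tainted=False):
        self.value = value
        self.tainted = tainted

    def __add__(self, other):
        # Explicit flow: the result is data-dependent on both operands,
        # so it is tainted if either operand is tainted.
        if not isinstance(other, Tainted):
            other = Tainted(other)
        return Tainted(self.value + other.value, self.tainted or other.tainted)


def explicit_flow(secret):
    # The result is computed directly from `secret` (data dependency).
    return secret + 1


def implicit_flow(secret):
    # `y` is only ever assigned constants, yet its value reveals whether
    # secret == 1. A tracker that follows data dependencies alone misses this.
    if secret.value == 1:
        y = Tainted(1)
    else:
        y = Tainted(0)
    return y


def implicit_flow_tracked(secret):
    # Control-dependency tracking: a value assigned under a branch inherits
    # the taint of the branch condition.
    branch_taint = secret.tainted
    return Tainted(1 if secret.value == 1 else 0, tainted=branch_taint)


secret = Tainted(1, tainted=True)
explicit_result = explicit_flow(secret)
implicit_result = implicit_flow(secret)
tracked_result = implicit_flow_tracked(secret)
```

In this sketch `explicit_result` is tainted, `implicit_result` is not, even though it leaks one bit of `secret`, and `tracked_result` is tainted because the branch condition is taken into account.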
Taint Analysis Tools
1. Taintgrind
github: a dynamic analysis tool
Valgrind is a dynamic instrumentation framework for building dynamic analysis tools, just like Pin. Because Taintgrind is built on top of Valgrind, we first need to build and install Valgrind.
# build, install valgrind
Capstone is a disassembly framework that aims to become the ultimate disassembly engine for binary analysis and reversing in the security community. Like many taint analysis tools, Taintgrind depends on Capstone.
wget https://github.com/aquynh/capstone/archive/3.0.4.tar.gz -O capstone.tar.gz
For the simple example tests/sign32.c, run Taintgrind through Valgrind:
../build/bin/valgrind --tool=taintgrind tests/sign32
The output has the form:
Address/Location | Assembly instruction | Instruction type | Runtime value(s) | Information flow
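For post-processing, each output line can be split into the five fields named above. The sketch below assumes that the fields are separated by `|` and that no field contains that character itself; the `TraceLine` type, the `parse_trace_line` helper, and the placeholder sample line are hypothetical and are not real Taintgrind output.

```python
from typing import NamedTuple, Optional


class TraceLine(NamedTuple):
    location: str     # Address/Location
    instruction: str  # Assembly instruction
    kind: str         # Instruction type
    values: str       # Runtime value(s)
    flow: str         # Information flow


def parse_trace_line(line: str) -> Optional[TraceLine]:
    """Split one output line into its five fields; return None on a mismatch."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 5:
        return None
    return TraceLine(*fields)


# Placeholder text only: this is not captured from a real run.
sample = "LOCATION | INSTRUCTION | TYPE | VALUE | FLOW"
record = parse_trace_line(sample)
```

Lines that do not have exactly five fields, such as ordinary program output mixed into the log, are skipped by returning `None`.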
Taintgrind borrows the bit-precise shadow memory from MemCheck and propagates only explicit data flow. It therefore does not propagate taint through control structures such as if-else, for loops, and while loops, nor through dereferences of tainted pointers.
For control-flow and pointer dependencies, see https://github.com/wmkhoo/taintgrind/wiki/Control-flow-and-Pointer-tainting.
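The pointer case can be illustrated with a table lookup. This is a minimal sketch, assuming a simplified shadow-memory model with one taint flag per cell; the `load` function and its `track_pointers` switch are hypothetical and do not mirror Taintgrind's implementation or options.

```python
def load(memory, shadow, index, index_tainted, track_pointers=False):
    """Read memory[index] and return (value, taint of the loaded value)."""
    value = memory[index]
    tainted = shadow[index]  # explicit flow: taint of the stored data
    if track_pointers:
        # Pointer dependency: also inherit the taint of the address used.
        tainted = tainted or index_tainted
    return value, tainted


table = [10, 20, 30, 40]       # constant, untainted lookup table
shadow = [False] * len(table)  # one taint flag per table cell
user_index, user_index_tainted = 2, True  # index derived from tainted input

explicit_only = load(table, shadow, user_index, user_index_tainted)
with_pointers = load(table, shadow, user_index, user_index_tainted,
                     track_pointers=True)
```

With explicit propagation only, the loaded value is untainted although the input selected it; tracking the pointer dependency marks it as tainted, at the cost of tainting more values overall.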
Taint file input example
./build/bin/taintgrind --file-filter=/home/wchenbt/Projects/Taint/valgrind/FAQ.txt ~/Projects/Taint/benchmark/gzip-1.6/build/bin/gzip -c FAQ.txt
2. Triton
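github documentation (this tool seems somewhat costly to use, since you have to set the taint sources yourself)
It can perform static analysis through emulated execution, and it can also run analyses dynamically through its Python bindings.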
Triton is a Pin-based concolic execution framework that provides components such as a taint engine, a dynamic symbolic execution engine, a snapshot engine, translation of x64 instructions into SMT2-LIB, a Z3 interface to solve constraints, and Python bindings. Based on these components, you can build tools for automated reverse engineering or vulnerability research.
The Triton project itself generates a .so file to be used from Python; an example is shown below (tutorial).
#!/usr/bin/env python
Pin is a dynamic binary instrumentation framework for the IA-32, x86-64 and MIC instruction-set architectures that enables the creation of dynamic program analysis tools. A Pin tutorial is available for reference (tutorial).
The project also ships with a Pintool tracer, which can be compiled with the following commands:
# get and install the latest z3 release
There are several tutorials on using the Python bindings. The interface shown in tutorial1 is quite old; the high-level ideas still apply, but refer to src/examples and tutorial2 for the correct usage. tutorial3 is also attached here for reference.
Here is an example that counts the number of executed instructions using the Pintool.
#!/usr/bin/env python2
Run the above program and check the output.
➜ Triton git:(master) ✗ ./build/triton src/examples/pin/count_inst.py src/samples/crackmes/crackme_xor
An example of performing taint analysis on a specific program is shown below.
#!/usr/bin/env python2
The inserted callbacks are executed in the following order:
BEFORE_SYMPROC
3. Pyre
github: a type checker and static analysis tool for Python
mkdir my_project && cd my_project
Pyre ships with Pysa, a security focused static analysis tool we've built on top of Pyre that reasons about data flows in Python applications. Please refer to the documentation to get started with security analysis.
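To make "reasoning about data flows" concrete, here is a toy source-to-sink checker built on Python's standard `ast` module. It is a sketch under assumptions and not Pysa's algorithm or API: the `SOURCES` and `SINKS` sets, the helper names, and the restriction to straight-line top-level assignments are all choices made for this illustration.

```python
import ast

SOURCES = {"input"}             # calls whose result is attacker-controlled
SINKS = {"eval", "os.system"}   # calls that must not receive tainted data


def call_name(node):
    """Return a dotted name such as 'os.system' for a call target, else None."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return None


def is_tainted(expr, tainted_vars):
    """True if the expression contains a source call or a tainted variable."""
    for node in ast.walk(expr):
        if isinstance(node, ast.Name) and node.id in tainted_vars:
            return True
        if isinstance(node, ast.Call) and call_name(node) in SOURCES:
            return True
    return False


def find_flows(source_code):
    """Scan top-level statements in order; return lines of tainted sink calls."""
    tainted_vars = set()
    findings = []
    for stmt in ast.parse(source_code).body:
        # Report sink calls that receive a tainted argument.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and call_name(node) in SINKS:
                if any(is_tainted(arg, tainted_vars) for arg in node.args):
                    findings.append(node.lineno)
        # Propagate taint through simple assignments to plain names.
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value, tainted_vars)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted_vars.add(target.id)
                    else:
                        tainted_vars.discard(target.id)
    return findings


SAMPLE = '''
import os
name = input()
cmd = "echo " + name
os.system(cmd)
safe = "ls"
os.system(safe)
'''

findings = find_flows(SAMPLE)
```

The sample is only parsed, never executed. The checker flags the `os.system(cmd)` call because `cmd` is derived from `input()`, and leaves `os.system(safe)` alone. A real analyzer also has to handle functions, branches, loops, and object fields, which this sketch ignores.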
4. Psalm
github: a static analysis tool for PHP
# install composer
5. LibDFT
A Pin-based dynamic analysis tool. source for x86
wget https://www.cs.columbia.edu/~vpk/research/libdft/libdft-3.1415alpha.tar.gz
source for x86_64, used by Angora
git clone https://github.com/AngoraFuzzer/libdft64.git
6. Bincat or Ponce
IDA plugin, static analysis tool. github
BinCAT is a static Binary Code Analysis Toolkit, designed to help reverse engineers, directly from IDA or using Python for automation.
https://airbus-seclab.github.io/bincat/RECON-MTL-2017-bincat-biondi_rigo_zennou_mehrenberger.pdf
7. PolyTracker
github: dynamic analysis; binary tree (binary forest?)
Polytracker is an LLVM pass that instruments the programs it compiles to track which bytes of an input file are operated on by which functions. It outputs a JSON file containing the function-to-input-bytes mapping.
This blog post introduces PolyTracker and PolyFile: https://blog.trailofbits.com/2019/11/01/two-new-tools-that-tame-the-treachery-of-files/.
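The idea of a function-to-input-bytes mapping can be mimicked in pure Python. This is a conceptual sketch only: PolyTracker does this through LLVM instrumentation, and the `TrackedBytes` class, the `tracked` decorator, the two toy parser functions, and the JSON layout below are hypothetical stand-ins rather than its real output format.

```python
import functools
import json
from collections import defaultdict


class TrackedBytes:
    """Bytes in which every byte remembers its offset in the input file."""

    def __init__(self, data, offsets):
        self.data = data
        self.offsets = offsets

    @classmethod
    def from_input(cls, data):
        return cls(data, list(range(len(data))))

    def __getitem__(self, piece):
        # Only slices are supported in this sketch.
        return TrackedBytes(self.data[piece], self.offsets[piece])


function_to_bytes = defaultdict(set)


def tracked(func):
    """Record which input offsets reach the decorated function."""
    @functools.wraps(func)
    def wrapper(chunk):
        function_to_bytes[func.__name__].update(chunk.offsets)
        return func(chunk)
    return wrapper


@tracked
def parse_magic(chunk):
    return chunk.data == b"GZ"


@tracked
def parse_length(chunk):
    return int.from_bytes(chunk.data, "little")


blob = TrackedBytes.from_input(b"GZ\x03\x00abc")
magic_ok = parse_magic(blob[0:2])
length = parse_length(blob[2:4])

mapping = {name: sorted(offsets) for name, offsets in function_to_bytes.items()}
report = json.dumps(mapping, sort_keys=True)
```

Here `mapping` records that `parse_magic` touched offsets 0 and 1 and `parse_length` touched offsets 2 and 3; the trailing payload bytes are attributed to no function because nothing operated on them.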
8. Tigress
https://github.com/JonathanSalwan/Tigress_protection
9. DECAF/DECAF++
10. TaintInduce
https://taintinduce.github.io/
11. DFSan
12. Doop
Android https://bitbucket.org/yanniss/doop/src/master/
Pyt https://github.com/python-security/pyt
Pixy https://github.com/oliverklee/pixy
Phosphor github taint analysis for JVM
Dytan: could not find an official GitHub repository; backup mirrors: github, github2. Based on Pin.
DataTracker github
DataTracker-EWAH github used by Vuzzer
DFSan github
Taintdroid
FlowDroid https://blogs.uni-paderborn.de/sse/tools/flowdroid/
Apposcopy
Scandroid
[ISSTA'11] Saving the World Wide Web from Vulnerable JavaScript https://dl.acm.org/doi/pdf/10.1145/2001420.2001442
[PLDI'09] TAJ: effective taint analysis of web applications https://dl.acm.org/doi/10.1145/1542476.1542486
[FSE'19] Nodest: feedback-driven static analysis of Node.js applications
Bap https://github.com/BinaryAnalysisPlatform/bap