[Tech] Joern Usage & MVP & APHP
1. Joern Background
https://docs.joern.io/cpgql/reference-card/
2. Joern Generate DDG
Method 1: generate in joern console
joern> importCode.c.fromString("""
Method 2: run the following command
# source files are located under current directory
The above generates a file named cpg.bin. We then generate a dot file to visualize the DDG.
# generate ddg
The generated dot files are under the directory out-ddg. After checking them, we found that out-ddg/1-ddg.dot is the one we want.
dot -Tpng -o test.png out-ddg/1-ddg.dot
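Besides rendering an image, the dot file can be read programmatically. The following is a minimal sketch in Python, assuming that edges appear as lines of the form `"src" -> "dst"  [ label = "DDG: var"]`. That line format, the sample text, and the helper name `parse_dot_edges` are assumptions for illustration, not taken from a real export.

```python
import re

# Assumed edge format:  "7" -> "9"  [ label = "DDG: match"]
# The label part is optional in this sketch.
EDGE_RE = re.compile(
    r'"(\d+)"\s*->\s*"(\d+)"(?:\s*\[\s*label\s*=\s*"([^"]*)"\s*\])?'
)

def parse_dot_edges(dot_text):
    """Return (source_id, target_id, label) tuples found in DOT text."""
    edges = []
    for line in dot_text.splitlines():
        m = EDGE_RE.search(line)
        if m:
            edges.append((m.group(1), m.group(2), m.group(3) or ""))
    return edges

# Hypothetical sample text, not copied from a real out-ddg/1-ddg.dot.
SAMPLE_DOT = '''digraph "mvebu_uart_probe" {
"7" [label = <(of_match_device,...)> ]
"9" [label = <(match->data,...)> ]
"7" -> "9"  [ label = "DDG: match"]
"9" -> "12"
}'''

edges = parse_dot_edges(SAMPLE_DOT)
for src, dst, label in edges:
    print(src, "->", dst, label)
```

For a real file, the text would come from reading out-ddg/1-ddg.dot. If the exported format differs, the regular expression needs adjusting.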
2. Analyze DDG
There is a null pointer dereference (NPD) inside the function to be analyzed.
1 | static int mvebu_uart_probe(struct platform_device *pdev) { |
In line 2, match is returned from the function of_match_device. The code does not verify whether match is NULL and dereferences it directly in line 5, which would cause an NPD.
2.1 MVP Algorithm
The input of MVP is a security patch. For instance, to fix the above NPD, the patch adds a check in lines 14 and 15.
1 | static int mvebu_uart_probe(struct platform_device *pdev) |
Step 1: MVP first analyzes the patch and extracts the four sets shown below, where \(f_{v}\) and \(p_v\) denote the vulnerable and patched functions, respectively. \[ \begin{aligned} S_{del} &: \text{set of deleted statements} \\ S_{add} &: \text{set of added statements} \\ S_{vul} &: \text{all statements in the vulnerable function } f_v \\ S_{pat} &: \text{all statements in the patched function } p_v \end{aligned} \]
Thus, for the patch above, \(S_{add} = \{s_{14}, s_{15}\}\), \(S_{del} = \{\}\).
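As a rough illustration of Step 1, the sketch below derives the four sets from two versions of a function with Python's `difflib`. The helper name `extract_patch_sets` and the shortened C statements are hypothetical stand-ins. MVP itself works on the patch, and this sketch only assumes that a line-level diff is a reasonable approximation.

```python
import difflib

def extract_patch_sets(vul_lines, pat_lines):
    """Return (S_del, S_add, S_vul, S_pat) as lists of stripped statements."""
    s_del, s_add = [], []
    for entry in difflib.ndiff(vul_lines, pat_lines):
        tag, text = entry[:2], entry[2:].strip()
        if tag == "- ":      # only present in the vulnerable version
            s_del.append(text)
        elif tag == "+ ":    # only present in the patched version
            s_add.append(text)
    s_vul = [line.strip() for line in vul_lines]
    s_pat = [line.strip() for line in pat_lines]
    return s_del, s_add, s_vul, s_pat

# Hypothetical, shortened versions of the vulnerable and patched function.
vul = [
    "match = of_match_device(ids, dev);",
    "port->private_data = match->data;",
]
pat = [
    "match = of_match_device(ids, dev);",
    "if (!match)",
    "return -ENODEV;",
    "port->private_data = match->data;",
]

S_del, S_add, S_vul, S_pat = extract_patch_sets(vul, pat)
print(S_del)  # []
print(S_add)  # ['if (!match)', 'return -ENODEV;']
```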
Step 2: MVP takes the statements in \(S_{add}\) and \(S_{del}\) as slicing criteria and performs forward and backward slicing on the PDG (program dependence graph) of the patched function \(p_{v}\) and the vulnerable function \(f_v\) to capture data and control dependencies.
- Backward slicing is standard: it includes all data dependencies and control dependencies.
Example. Taking \(s_{15}\) as the slicing criterion, backward slicing yields {\(s_{3}\) (ctrl), \(s_4\) (data), \(s_{10}\) (ctrl)}.
- Forward slicing is customized; otherwise too many statements would be involved when the added statements are if conditions.
  - Assignments are included in forward slicing.
  - Return statements are not forward sliced.
  - For function calls and conditional statements, first slice backward on data dependence, then slice forward from the result.
Example. First, \(s_{15}\) is a return statement, so we do not perform forward slicing on it. Second, to handle the conditional statement \(s_{14}\), we first slice backward on data dependencies and obtain \(s_4\). We then take \(s_4\) as the slicing criterion and slice forward on data dependencies. The result is \(\{s_{63}, s_{66}, s_{67}, s_{75}, s_{118}, s_{120}, s_{122}, s_{123}\}\).
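The slicing rules above can be sketched on a toy PDG. The edge list below is hypothetical and much smaller than the real PDG of mvebu_uart_probe. The function names, the statement kinds ("assign", "return", "cond"), and the choice to follow both data and control edges for assignments are assumptions made for illustration.

```python
from collections import deque

# Toy PDG with hypothetical edges: (source, target, dependence type).
PDG = [
    (4, 14, "data"),
    (14, 15, "ctrl"),
    (4, 63, "data"),
    (4, 66, "data"),
    (63, 67, "data"),
]

def reach(edges, start, kinds, forward):
    """Nodes reachable from the set `start` along edges whose type is in `kinds`."""
    adj = {}
    for src, dst, kind in edges:
        if kind in kinds:
            a, b = (src, dst) if forward else (dst, src)
            adj.setdefault(a, set()).add(b)
    seen, queue = set(), deque(start)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen and nxt not in start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def backward_slice(edges, criterion):
    """Standard backward slice over data and control dependencies."""
    return reach(edges, {criterion}, {"data", "ctrl"}, forward=False)

def forward_slice(edges, criterion, kind):
    """Customized forward slice; `kind` is the statement kind of the criterion."""
    if kind == "return":
        return set()  # return statements are not forward sliced
    if kind == "assign":
        # Assumption: assignments follow both data and control edges.
        return reach(edges, {criterion}, {"data", "ctrl"}, forward=True)
    # Calls and conditions: backward on data first, then forward on data.
    sources = reach(edges, {criterion}, {"data"}, forward=False)
    return reach(edges, sources, {"data"}, forward=True) - {criterion}

print(sorted(backward_slice(PDG, 15)))         # [4, 14]
print(sorted(forward_slice(PDG, 15, "return")))  # []
print(sorted(forward_slice(PDG, 14, "cond")))    # [63, 66, 67]
```

On this toy graph, the condition at statement 14 pulls in everything that depends on statement 4, which mirrors how the forward slice in the example grows.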
[Figure: selected PDG of the function after the patch]
[Figure: selected PDG of the function before the patch]
[!IMPORTANT]
- Joern does not distinguish a and a->b; any statement that may be affected is treated as data-dependence related.
Step 3: MVP puts the slicing results into \(S^{sem}_{del}\) and \(S_{add}^{sem}\), which hold the statements semantically related to the changed statements in the function modified by the security patch.
Example. Since the current patch deletes no statements, \(S_{del}^{sem} = \{\}\), while \(S^{sem}_{add} = \{s_3, s_4, s_{10}, s_{14}, s_{63}, s_{66}, s_{67}, s_{75}, s_{118}, s_{120}, s_{122}, s_{123}\}\) contains a lot of noise introduced by forward slicing.
Step 4: MVP computes the vulnerability signature and the patch signature as follows:
\(V_{syn} = S^{sem}_{del} \cup (S_{vul} \cap S_{add}^{sem})\): vulnerability syntax signature
\(V_{sem} = \{(s_1, s_2, \texttt{type}) | s_1, s_2 \in V_{syn}\}\)
\(P_{syn} = S^{sem}_{add} \setminus S_{vul}\): statements that only exist in the patched function \(p_{v}\)
\(P_{sem} = \{(s_1, s_2, \texttt{type}) | s_1, s_2 \in S^{sem}_{add}\} \setminus \{(s_1, s_2, \texttt{type}) | s_1, s_2 \in S_{vul}\}\): data or control dependencies between two statements that only exist in the patched function
Although the formulas above look complicated, note that
- \(V_{syn} \cup P_{syn} = S^{sem}_{del} \cup S_{add}^{sem}\)
To reduce noise when \(V_{syn}\) is too large, MVP iteratively removes from \(V_{syn}\) the statements that are farthest from the slicing criterion on the PDG.
Example.
- \(V_{syn} = \{s_3, s_4, s_{10}, s_{63}, s_{66}, s_{67}, s_{75}, s_{118}, s_{120}, s_{122}, s_{123}\}\)
- \(V_{sem} = \{(s_3, s_{10}, \texttt{data}), (s_{10}, s_{63}, \texttt{ctrl}), (s_4, s_{63}, \texttt{data}), ...\}\)
- \(P_{syn} = \{s_{14}\}\)
- \(P_{sem} = \{\}\)
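The syntax signatures in this example can be reproduced with plain set operations. The sketch below identifies statements by the line numbers used in the example. It assumes, purely for illustration, that the vulnerable function contains every statement from 1 to 130 except the added lines 14 and 15, and it only uses the three dependency edges listed for \(V_{sem}\) above.

```python
# Semantically related statements from Step 3, identified by line number.
S_add_sem = {3, 4, 10, 14, 63, 66, 67, 75, 118, 120, 122, 123}
S_del_sem = set()

# Assumption: the vulnerable function holds statements 1..130 except the
# two added lines (the upper bound 130 is hypothetical).
S_vul = set(range(1, 131)) - {14, 15}

# Dependency edges taken from the V_sem example: (s1, s2, type).
edges = {(3, 10, "data"), (10, 63, "ctrl"), (4, 63, "data")}

V_syn = S_del_sem | (S_vul & S_add_sem)   # vulnerability syntax signature
P_syn = S_add_sem - S_vul                 # statements only in the patch
V_sem = {(a, b, t) for (a, b, t) in edges if a in V_syn and b in V_syn}

print(sorted(V_syn))  # [3, 4, 10, 63, 66, 67, 75, 118, 120, 122, 123]
print(sorted(P_syn))  # [14]

# The identity noted above: together the two signatures cover both slices.
print(V_syn | P_syn == S_del_sem | S_add_sem)  # True
```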
Step 5: Apply the abstraction, normalization, and hashing procedure:
- Formal parameters => PARAM
- Local variables => VARIABLE
- String => STRING
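A minimal sketch of Step 5 follows. The three replacements come from the list above. The regular expressions, the whitespace-stripping normalization, the choice of SHA-256, and the helper names are assumptions, and the C statements are hypothetical examples.

```python
import hashlib
import re

STRING_RE = re.compile(r'"(?:\\.|[^"\\])*"')   # C string literal
IDENT_RE = re.compile(r"[A-Za-z_]\w*")          # C identifier

def abstract_statement(stmt, params, local_vars):
    """Abstract names and strings, then normalize by dropping whitespace."""
    # Replace string literals first so their contents are not renamed.
    stmt = STRING_RE.sub("STRING", stmt)

    def rename(match):
        name = match.group(0)
        if name in params:
            return "PARAM"
        if name in local_vars:
            return "VARIABLE"
        return name

    stmt = IDENT_RE.sub(rename, stmt)
    return re.sub(r"\s+", "", stmt)

def statement_hash(stmt, params, local_vars):
    """Hash the abstracted statement (hash function is an assumption)."""
    text = abstract_statement(stmt, params, local_vars)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical statements and name sets.
s1 = "match = of_match_device(mvebu_uart_of_match, &pdev->dev);"
s2 = 'dev_err(dev, "no match %s", name);'

print(abstract_statement(s1, {"pdev"}, {"match"}))
# VARIABLE=of_match_device(mvebu_uart_of_match,&PARAM->dev);
print(abstract_statement(s2, set(), {"dev", "name"}))
# dev_err(VARIABLE,STRING,VARIABLE);
```

Two statements that differ only in variable names, string contents, or spacing end up with the same hash, which is what lets the signature match renamed clones.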
Step 6: Determine whether a target function is vulnerable, based on the principle that its signature matches the vulnerability signature but does not match the patch signature.
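The matching principle can be sketched as a set-containment check over statement hashes. This is a deliberately strict simplification: the function name `is_vulnerable`, the use of full containment as the meaning of "matches", and the treatment of an empty patch signature as "no match" are all assumptions, and a real implementation would likely tolerate partial matches.

```python
def is_vulnerable(target_hashes, vul_sig, patch_sig):
    """Flag a target function that contains the vulnerability signature
    but not the patch signature (strict containment, an assumption)."""
    matches_vul = vul_sig <= target_hashes
    # Assumption: an empty patch signature never counts as a match.
    matches_patch = bool(patch_sig) and patch_sig <= target_hashes
    return matches_vul and not matches_patch

# Short strings stand in for the statement hashes from Step 5.
vul_sig = {"h3", "h4", "h10"}
patch_sig = {"h14"}

unpatched_clone = {"h1", "h3", "h4", "h10", "h20"}
patched_clone = {"h1", "h3", "h4", "h10", "h14"}
unrelated = {"h1", "h2"}

print(is_vulnerable(unpatched_clone, vul_sig, patch_sig))  # True
print(is_vulnerable(patched_clone, vul_sig, patch_sig))    # False
print(is_vulnerable(unrelated, vul_sig, patch_sig))        # False
```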
Reference
- https://docs.joern.io/export/