[Tech] Taint-Specifications
Several Cases
The following cases are concluded by the inspection of summaries/specifications generated by several tools.
- Android and Java Summaries: StubDroid, Modelgen, Apposcopy
- Javascript Summaries: Taser
How to decide Input and Output
Basically, the inputs and outputs of a method are not only the parameters and return values, especially for those methods that take function callback as parameters.
The taint source can be parameters,
this
reference for instance methods, and all visiable static fields.The taint sink can be return value, parameters, also the
this
reference for instance methods.The following summary are generated by
StubDroid
for methodwriteObject
in classjava.io.ObjectOutputStream
. The data flows from the first parameter to the fieldinnerStream
with typejava.io.OutputStream
ofthis
reference.1
2
3
4
5
6
7
8
9<method id="void writeObject(java.lang.Object)">
<flows>
<flow isAlias="false">
<from sourceSinkType="Parameter" ParameterIndex="0" /> <!--the first parameter object-->
<to sourceSinkType="Field" AccessPath="[java.io.ObjectOutputStream: java.io.OutputStream innerStream]"
AccessPathTypes="[java.io.OutputStream]" /> <!--the impact to outputstream-->
</flow>
</flows>
</method>To test the correctness of this summary, we need to detect the field and check its value. A test example is shown as below. The object
oos
takes output streamout
as parameter of constructor, then invokeswriteObject
to write data toout
via an internal field reference. To detect such implicit information flow, the summaries need to specify that the call tooos.writeObject
has an implicit side-effect onout
. So we treatout
as the output of the summary and the parameter ofwriteObject
as the input.1
2
3
4
5OutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(out);
oos.writeObject("1234");
oos.close();
System.out.println(out.toString());Similarly, Apposcopy manually writes the following specification to simulate the behavior of
writeObject
related to the data flow.1
2
3
4
public final void writeObject(Object object) throws IOException {
ObjectInputStream.object = object;
}When testing such summaries, the output is no longer restricted to the method itself. We need to deal with the output case by case and further monitor the if the output changes with different inputs.
Further, due to the dynamic feature, it's even more complex to determine the input and output for method in Javascript. Please find the examples in Section 5.
Taint source in summary may not be sound
The taint source specified by the generated summaries may not be sound. So it's not enough to only focus on the mutation of the source in summary, other potential source should also be taken into consideration.
Implicit Effect on the Output
Something like index, offset, will effect the output, but will not be counted as taint source. The following method
<java.io.BufferedReader: int read(char[],int,int)>
reads characters into a portion of an array. The second parameter offset is used to specify at which to start storing characters. Obvoiusly, offset decides the final content of arraya
.1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23/*
* @param cbuf Destination buffer
* @param off Offset at which to start storing characters
* @param len Maximum number of characters to read
*
* @return The number of characters read, or -1 if the end of the
* stream has been reached
*
* @exception IOException If an I/O error occurs
*/
public int read(char cbuf[], int off, int len) throws IOException {
...
}
/* Below shows one usage of read method */
public static void main(String[] args) {
try {
char []a = new char[50];
Arrays.fill(a, '1');
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
bufferedReader.read(a, 1, 5);
} catch (Exception e){ }
}The specification/summary generated by several work are listed as below. All of them don't mark arg#2 offset as taint source. The summaries are correct because the
index
oroffset
are just data position, not data itself. But it's not consistent with our idea.- Modelgen: arg#3->this; this->return; this->arg#1; arg#3->return; arg#3->arg#1;
- Mannually written summay of Apposcopy
1
2
3
4
5
6
7
// arg#3->return;
public int read(char[] buffer, int offset, int length) throws IOException {
return 0;
}- StubDroid: This summary means data flows from field
innerReader
to the first parameterbuffer
.
1
2
3
4
5
6
7
8
9<method id="int read(char[],int,int)">
<flows>
<flow isAlias="false">
<from sourceSinkType="Field" AccessPath="[java.io.BufferedReader: java.io.Reader innerReader]"
AccessPathTypes="[java.io.Reader]" /> <!-- java.io.Reader is the private field of bufferedReader -->
<to sourceSinkType="Parameter" ParameterIndex="0" /> <!-- parameter 0 is the first parameter buffer -->
</flow>
</flows>
</method>Different Granularity
Different summaries generated by different tools have different granularities. The summaries generated by StubDroid and Taser are field-sensitive, but modelgen only consider
this
reference instead of distinguishing different fields. StubDriod even defines the access path to deal with the case likea.b.c
.CallBack
JavaScript libraries interact with their application code in complex ways, e.g., through callbacks, or by allowing plugins to be configured inside the library. Therefore, it is non-trivial to determine where the library code starts and where the client code ends.
1
2
3
4
5
6
7
8
9
10
11
12
13
14let userInput = {
tempDir: "./path/to/dir",
cacheDir: "./path/to/cache"
}
const _ = require("lodash");
const rimraf = require("rimraf");
// _.forIn(object, [iteratee=_.identity])
// iteratee has three paras (value, key, object).
let obj = _.forIn(userInput, function(value) { // iteratee
rimraf(value, function(err) {
if (err)
console.log(err)
})
})This code fragment uses two of the most popular npm packages: lodash, a general-purpose utility library, and rimraf, a simple library for recursively deleting directories on the disk. In this example, the forIn method from lodash is used to iterate through the values of each property on the
userInput
object. Each of these values is then passed to therimraf
module. The tool Taser generates several summaries for the forIn method, also marks the first parameter ofrimraf
method as additional sink, due torimraf
may be controlled by attracker to delete arbitrary directory of server.ID In Out 1 (parameter 0 (member forIn (root lodash))) [userInput] (parameter 1 (parameter 1 (member forIn (root lodash)))) [iteratee.key] 2 (parameter 0 (member forIn (root lodash))) [userInput] (parameter 2 (parameter 1 (member forIn (root lodash)))) [ieteratee.iterable(userInput)] 3 (parameter 0 (member forIn (root lodash))) [userInput] (return (member forIn (root lodash))) [obj] 4 (member * (parameter 0 (member forIn (root lodash)))) [userInput.*] (parameter 0 (parameter 1 (member forIn (root lodash)))) [iteratee.value] 5 (parameter 0 (root rimraf))) [value] Sink The data flows from user code to library code (via parameter 0 of forIn), then back to the user code due to the exists of callback (via parameter 1 of forIn), next enter library code (via parameter 0 of rimraf) and finally back to the user code (via return value obj). To test these summaries, we need to write test case like below.
1
2
3
4
5
6
7
8
9
10
11
12
13const _ = require("lodash");
function Foo() {
this.a = 1;
this.b = 2;
}
retObj = _.forIn(new Foo, function(value, key, object) {
console.log(key); // test summary 1
diff(object, Foo); // test summary 2
console.log(value); // test summary 4
});
diff(retObj, Foo) // test summary 3In this case, we need to customize our callback to check if the output of the library method changes.