Taint-Specifications

Posted on 2020-09-30 Edited on 2024-03-27 In Taint

Several Cases

The following cases are concluded by the inspection of summaries/specifications generated by several tools.

Android and Java Summaries: StubDroid, Modelgen, Apposcopy
Javascript Summaries: Taser

How to decide Input and Output

Basically, the inputs and outputs of a method are not only the parameters and return values, especially for those methods that take function callback as parameters.

The taint source can be parameters, this reference for instance methods, and all visiable static fields.

The taint sink can be return value, parameters, also the this reference for instance methods.

The following summary are generated by StubDroid for method writeObject in class java.io.ObjectOutputStream. The data flows from the first parameter to the field innerStream with type java.io.OutputStreamof this reference.

<method id="void writeObject(java.lang.Object)">
  <flows>
    <flow isAlias="false">
      <from sourceSinkType="Parameter" ParameterIndex="0" />  <!--the first parameter object-->
      <to sourceSinkType="Field" AccessPath="[java.io.ObjectOutputStream: java.io.OutputStream innerStream]"
          AccessPathTypes="[java.io.OutputStream]" />  <!--the impact to outputstream-->
    </flow>
  </flows>
</method>

To test the correctness of this summary, we need to detect the field and check its value. A test example is shown as below. The object oos takes output stream out as parameter of constructor, then invokes writeObject to write data to out via an internal field reference. To detect such implicit information flow, the summaries need to specify that the call to oos.writeObject has an implicit side-effect on out. So we treat out as the output of the summary and the parameter of writeObject as the input.

OutputStream out = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(out);
oos.writeObject("1234");
oos.close();
System.out.println(out.toString());

Similarly, Apposcopy manually writes the following specification to simulate the behavior of writeObject related to the data flow.

@STAMP(flows = {@Flow(from = "object", to = "!this")})
public final void writeObject(Object object) throws IOException {
    ObjectInputStream.object = object;
}

When testing such summaries, the output is no longer restricted to the method itself. We need to deal with the output case by case and further monitor the if the output changes with different inputs.

Further, due to the dynamic feature, it's even more complex to determine the input and output for method in Javascript. Please find the examples in Section 5.

Taint source in summary may not be sound

The taint source specified by the generated summaries may not be sound. So it's not enough to only focus on the mutation of the source in summary, other potential source should also be taken into consideration.

Implicit Effect on the Output

Something like index, offset, will effect the output, but will not be counted as taint source. The following method <java.io.BufferedReader: int read(char[],int,int)> reads characters into a portion of an array. The second parameter offset is used to specify at which to start storing characters. Obvoiusly, offset decides the final content of array a.

/*
 * @param      cbuf  Destination buffer
 * @param      off   Offset at which to start storing characters
 * @param      len   Maximum number of characters to read
 *
 * @return     The number of characters read, or -1 if the end of the
 *             stream has been reached
 *
 * @exception  IOException  If an I/O error occurs
 */
public int read(char cbuf[], int off, int len) throws IOException {
    ...
}

/* Below shows one usage of read method */
public static void main(String[] args) {
    try {
        char []a = new char[50];
        Arrays.fill(a, '1');
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
        bufferedReader.read(a, 1, 5);
    } catch (Exception e){ }
}

The specification/summary generated by several work are listed as below. All of them don't mark arg#2 offset as taint source. The summaries are correct because the index or offset are just data position, not data itself. But it's not consistent with our idea.

Modelgen: arg#3->this; this->return; this->arg#1; arg#3->return; arg#3->arg#1;
Mannually written summay of Apposcopy

@STAMP(flows = {@Flow(from = "this", to = "buffer"),     // this->arg#1;
                @Flow(from = "length", to = "buffer"),   // arg#3->arg#1;
                @Flow(from = "length", to = "this"),     // arg#3->this;
                @Flow(from = "length", to = "@return")}) // arg#3->return; 
public int read(char[] buffer, int offset, int length) throws IOException {
  	return 0;
}

StubDroid: This summary means data flows from field innerReader to the first parameter buffer.

<method id="int read(char[],int,int)">
  <flows>
    <flow isAlias="false">
      <from sourceSinkType="Field" AccessPath="[java.io.BufferedReader: java.io.Reader innerReader]"
            AccessPathTypes="[java.io.Reader]" />  <!-- java.io.Reader is the private field of bufferedReader -->
      <to sourceSinkType="Parameter" ParameterIndex="0" />  <!-- parameter 0 is the first parameter buffer -->
    </flow>
  </flows>
</method>

Different Granularity

Different summaries generated by different tools have different granularities. The summaries generated by StubDroid and Taser are field-sensitive, but modelgen only consider this reference instead of distinguishing different fields. StubDriod even defines the access path to deal with the case like a.b.c.

CallBack

JavaScript libraries interact with their application code in complex ways, e.g., through callbacks, or by allowing plugins to be configured inside the library. Therefore, it is non-trivial to determine where the library code starts and where the client code ends.

let userInput = {
    tempDir: "./path/to/dir",
    cacheDir: "./path/to/cache"
}
const _ = require("lodash");
const rimraf = require("rimraf");
// _.forIn(object, [iteratee=_.identity])
// iteratee has three paras (value, key, object).
let obj = _.forIn(userInput, function(value) {  // iteratee
    rimraf(value, function(err) {
        if (err)
            console.log(err)
    })
})

This code fragment uses two of the most popular npm packages: lodash, a general-purpose utility library, and rimraf, a simple library for recursively deleting directories on the disk. In this example, the forIn method from lodash is used to iterate through the values of each property on the userInput object. Each of these values is then passed to the rimraf module. The tool Taser generates several summaries for the forIn method, also marks the first parameter of rimraf method as additional sink, due to rimraf may be controlled by attracker to delete arbitrary directory of server.

ID	In	Out
1	(parameter 0 (member forIn (root lodash))) [userInput]	(parameter 1 (parameter 1 (member forIn (root lodash)))) [iteratee.key]
2	(parameter 0 (member forIn (root lodash))) [userInput]	(parameter 2 (parameter 1 (member forIn (root lodash)))) [ieteratee.iterable(userInput)]
3	(parameter 0 (member forIn (root lodash))) [userInput]	(return (member forIn (root lodash))) [obj]
4	(member * (parameter 0 (member forIn (root lodash)))) [userInput.*]	(parameter 0 (parameter 1 (member forIn (root lodash)))) [iteratee.value]
5	(parameter 0 (root rimraf))) [value]	Sink

The data flows from user code to library code (via parameter 0 of forIn), then back to the user code due to the exists of callback (via parameter 1 of forIn), next enter library code (via parameter 0 of rimraf) and finally back to the user code (via return value obj). To test these summaries, we need to write test case like below.

const _ = require("lodash");
  
function Foo() {
    this.a = 1;
    this.b = 2;
}

retObj = _.forIn(new Foo, function(value, key, object) {
    console.log(key);   // test summary 1
    diff(object, Foo);  // test summary 2
    console.log(value); // test summary 4
});
diff(retObj, Foo)  // test summary 3

In this case, we need to customize our callback to check if the output of the library method changes.