⚠️ 🧪 This post is work-in-progress. Changes are expected as this post is part of my research series.

Symbolic-model Guided Fuzzing


Traditionally, fuzzing works on the bit level, as “to fuzz” means to “generate a stream of random characters to be consumed by a target program”1. The notion of characters is used interchangeably with bytes here. Fuzzers like AFL start with some seed, which could be a JPEG file for example. AFL then executes the JPEG parser and randomly mutates bytes of the input to generate a seed pool. This is done by flipping bits or applying arithmetic operations.
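To make this concrete, here is a minimal sketch of such byte-level mutations in Rust. The mutation choices and the tiny PRNG are simplifications for illustration, not AFL's actual implementation:

fn xorshift(state: &mut u64) -> u64 {
    // Tiny PRNG so the sketch has no dependencies.
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn mutate(input: &mut [u8], state: &mut u64) {
    let pos = (xorshift(state) as usize) % input.len();
    if xorshift(state) % 2 == 0 {
        // Flip a random bit of a random byte.
        input[pos] ^= 1 << (xorshift(state) % 8);
    } else {
        // Apply a small arithmetic operation to a random byte.
        input[pos] = input[pos].wrapping_add((xorshift(state) % 35) as u8);
    }
}

fn main() {
    // A seed: the first bytes of a JPEG file.
    let mut seed = vec![0xFF, 0xD8, 0xFF, 0xE0];
    let mut state = 42u64;
    mutate(&mut seed, &mut state);
    println!("{:02x?}", seed);
}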

This approach has major shortcomings when used to fuzz protocols. The messages sent by a protocol often depend on previous messages. This means that by randomly flipping bits you may not reach deep states within the protocol. Even if you reach good coverage of executed lines, this does not mean that interesting traces of the protocol have been executed.

This is also mentioned in the AFL README: “Unfortunately, fuzzing is also relatively shallow; blind, random mutations make it very unlikely to reach certain code paths in the tested code, leaving some vulnerabilities firmly outside the reach of this technique”. The AFL approach uses branch (edge) coverage 2. That means it records which edges between two code blocks in the control flow graph have been visited by the PUT (program under test). AFL uses a clever method to instrument programs efficiently and to distinguish the two execution traces A -> B and B -> A, where A and B are code blocks. It uses three variables: cur_location, shared_mem and prev_location. Firstly, cur_location is set to a random value which is generated when instrumenting the program:

cur_location = <COMPILE_TIME_RANDOM>;

Next we XOR the current location and the previous location and increment a counter at that position in the shared memory map:

shared_mem[cur_location ^ prev_location]++;

Finally, we shift cur_location once to the right and store the result in prev_location; without this shift, the commutativity of XOR would map A -> B and B -> A to the same counter:

prev_location = cur_location >> 1;
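Putting the three snippets together, a small simulation of this bookkeeping could look as follows; the block IDs are made up, in AFL they are assigned at instrumentation time:

// A small simulation of the AFL edge-coverage bookkeeping for an execution
// that visits the basic blocks A -> B -> A. The modulo keeps indices inside
// the map for arbitrary IDs.
const MAP_SIZE: usize = 1 << 16;

fn main() {
    let mut shared_mem = vec![0u8; MAP_SIZE];
    let mut prev_location = 0usize;
    let (a, b) = (0x1234usize, 0x5678usize);

    for cur_location in [a, b, a] {
        // shared_mem[cur_location ^ prev_location]++;
        shared_mem[(cur_location ^ prev_location) % MAP_SIZE] += 1;
        prev_location = cur_location >> 1;
    }

    // Thanks to the shift, the edges A -> B and B -> A land in different
    // buckets even though XOR itself is commutative.
    assert_ne!(((a >> 1) ^ b) % MAP_SIZE, ((b >> 1) ^ a) % MAP_SIZE);
    println!("buckets hit: {}", shared_mem.iter().filter(|&&c| c > 0).count());
}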

There are a lot of different ways to express coverage 3. The following figure shows their relative power. One could also say that path coverage subsumes all other coverage methods; MC/DC is not comparable to boundary-interior coverage.

Figure: Structural testing methods in comparison [A. Knapp]

Generally, it is not possible to test all paths as there can be infinitely many. Take for example a loop which runs as long as b is true: as it is not known how long b will stay true, the program could loop endlessly, and every additional iteration is a new path. The sketch below illustrates both the subsumption and the path explosion.
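The following sketch illustrates both points: two independent branches are fully branch-covered by two inputs while path coverage needs all four combinations, and a loop whose condition is not statically known has an unbounded number of paths:

fn two_branches(x: bool, y: bool) -> u32 {
    let mut n = 0;
    if x { n += 1 } // branch on x
    if y { n += 2 } // branch on y
    n
}

fn unbounded(mut b: impl FnMut() -> bool) {
    // Each additional iteration yields a new path, so path coverage can
    // never be achieved if the number of iterations is unknown.
    while b() {}
}

fn main() {
    // Two inputs already cover all branch edges of two_branches ...
    assert_eq!(two_branches(true, true), 3);
    assert_eq!(two_branches(false, false), 0);
    // ... but path coverage would require all four input combinations.
    let mut countdown = 3;
    unbounded(|| { countdown -= 1; countdown > 0 });
}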

Therefore, our goal is to cover all “interesting” paths of our protocol. In my thesis I want to tackle the problem of determining what is “interesting” by using symbolic models.

Big Picture

Below you can see a sketch of the big picture.

We derive symbolic traces from a symbolic model. These traces are put into a seed pool, from which we randomly draw a symbolic trace. We then execute that trace using some implementation like OpenSSL; the resulting trace through the implementation is called a concrete trace. We now want to gather information about the concrete execution. Firstly, we use this information to create a security context which can be checked by a bug oracle. Secondly, we use information from the execution, like symbolic coverage, to mutate the symbolic trace and evaluate its performance.
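As a rough sketch in Rust, with placeholder types and functions for everything that is not yet defined, the loop could look like this:

// A high-level sketch of the symbolic-model-guided fuzzing loop described
// above. All types and functions are hypothetical placeholders.
struct SymbolicTrace;
struct ConcreteTrace;
struct SecurityContext;
#[derive(Debug)]
struct Violation;

fn derive_traces_from_model() -> Vec<SymbolicTrace> { vec![SymbolicTrace] }
fn execute_with_implementation(_t: &SymbolicTrace) -> (ConcreteTrace, SecurityContext) {
    (ConcreteTrace, SecurityContext) // would drive e.g. OpenSSL here
}
fn ask_oracle(_ctx: &SecurityContext) -> Vec<Violation> { vec![] }
fn symbolic_coverage(_c: &ConcreteTrace) -> usize { 0 }
fn mutate(_t: &SymbolicTrace) -> SymbolicTrace { SymbolicTrace }

fn main() {
    // Traces derived from the symbolic model form the initial seed pool.
    let mut seed_pool = derive_traces_from_model();

    for i in 0..100 {
        // Randomly draw a symbolic trace (simplified here: round-robin).
        let trace = &seed_pool[i % seed_pool.len()];
        // Execute it, yielding a concrete trace and a security context.
        let (concrete, ctx) = execute_with_implementation(trace);
        // The bug oracle checks the security context for violations.
        for v in ask_oracle(&ctx) {
            println!("violation: {:?}", v);
        }
        // Feedback (e.g. symbolic coverage) decides whether a mutant is kept.
        let mutant = mutate(trace);
        if symbolic_coverage(&concrete) > 0 {
            seed_pool.push(mutant);
        }
    }
}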

Symbolic Model Example

As a first example, let’s use the naïve handshake protocol from the ProVerif manual, p. 12.

\begin{align} A \rightarrow B:& \; pk(sk_A) \\ B \rightarrow A:& \; aenc(sign((pk(sk_B),k),sk_B),pk(sk_A)) \\ A \rightarrow B:& \; senc(s, k) \end{align}

The model of this protocol looks like this:

{1}new skA: skey;
{2}new skB: skey;
{3}let pkA: pkey = pk(skA) in
{4}out(c, pkA);
{5}let pkB: pkey = pk(skB) in
{6}out(c, pkB);
(
    {7}in(c, kEnc: bitstring);
    {8}let signedK: bitstring = adec(kEnc,skA) in
    {9}let (rpkB: pkey,k: key) = checksign(signedK,pkB) in
    {10}out(c, senc(message,k))
) | (
    {11}in(c, pkX: pkey);
    {12}new k_1: key;
    {13}out(c, aenc(sign((pkB,k_1),skB),pkX));
    {14}in(c, mEnc: bitstring);
    {15}let z: bitstring = sdec(mEnc,k_1) in
    0
)

Note that this model contains a hidden attack! This means the query attacker(message) evaluates to true.

We now want to create symbolic traces from this model.

Happy Path Trace

⚠️ The following notation is very ambiguous right now; there is no clearly defined syntax yet. These traces should be taken with a grain of salt.

An example for a happy symbolic trace would be:

\begin{align} A := spawn\_session(); B := spawn\_session();\\ \color{green}sk_A := gen\_sk(); out(pk(sk_A));\\ \color{blue}sk_B := gen\_sk(); out(pk(sk_B)); \\ \color{blue}pk_X := in();\\ \color{blue}out(aenc(sign((pk_B,k),sk_B),pk_X));\\ \color{green}k^{signed} := in();\\ \color{green}k := checksign(k^{signed},pk_B);\\ \color{green}out(senc(message, k));\\ \color{blue}encMessage := in();\\ \color{blue}sdec(encMessage, k). \end{align}

Messages in green are triggered by $A$ and those in blue by $B$.
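One possible encoding of such traces as data, with constructors mirroring the function symbols of the model, could look like this (again, the exact syntax is still open):

// A sketch (not a fixed syntax) of how the happy-path trace above could be
// represented as data.
#[derive(Debug)]
enum Agent { A, B }

#[derive(Debug)]
enum Term {
    SkA, SkB, K,                // atomic keys of this session
    Pk(Box<Term>),              // pk(sk)
    Sign(Vec<Term>, Box<Term>), // sign((t, ...), sk)
    Aenc(Box<Term>, Box<Term>), // aenc(t, pk)
    Input(usize),               // the value bound by the n-th in()
}

#[derive(Debug)]
enum Step {
    Out(Agent, Term), // an agent sends a term to the channel
    In(Agent),        // an agent reads a term from the channel
}

fn main() {
    use crate::{Agent::*, Step::*, Term::*};
    // First steps of the happy path; checksign/senc/sdec are elided.
    let trace = vec![
        Out(A, Pk(Box::new(SkA))), // out(pk(sk_A))
        Out(B, Pk(Box::new(SkB))), // out(pk(sk_B))
        In(B),                     // pk_X := in()
        Out(B, Aenc(
            Box::new(Sign(vec![Pk(Box::new(SkB)), K], Box::new(SkB))),
            Box::new(Input(0)),    // pk_X is whatever in() bound
        )),                        // out(aenc(sign((pk_B, k), sk_B), pk_X))
        In(A),                     // k_signed := in()
    ];
    println!("{:?}", trace);
}

Mutating a symbolic trace then becomes an operation on this data structure, e.g. swapping, removing, or duplicating steps.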

Attack Trace

\begin{align} A := spawn\_session(); B := spawn\_session();\\ \color{green}sk_A := gen\_sk(); out(pk(sk_A));\\ \color{blue}sk_B := gen\_sk(); out(pk(sk_B)); \\ \color{red}sk_E := gen\_sk(E); out(pk(sk_E));\\ \color{blue}\overbrace{pk_X}^{\text{pk of Eve}} := in();\\ \color{blue}out(aenc(sign((pk_B,k_1),sk_B),pk_X));\\ \color{red}out(aenc(sign((pk_B,k_1),sk_B),\overbrace{pk_A}^{\rightarrow \text{Attacking A}}));\\ \color{green}k^{signed} := in();\\ \color{green}k := checksign(k^{signed},pk_B);\\ \color{green}out(senc(message, k));\\ \color{red}encMessage := in();\\ \color{red}sdec(encMessage, k). \end{align}

Messages in red are from the attacker $E$. The attacker gets access to the $message$ by reusing the signature created by $B$. The attack trace can be inspected in the visualization below.

Figure: Visualization of the attack trace.

Linking to Implementations

Now there is some manual work to do to link this to an implementation: we need to transform the symbolic trace into actual function calls of e.g. OpenSSL. Within the library we expose internals through the security context. There we add, for example, the secrets and the used nonce values.
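For illustration, the security context could be a simple record that the harness fills during execution; all field names here are made up:

// An illustrative sketch of the security context: internals of the library
// become visible to the bug oracle through it.
struct SecurityContext {
    secrets: Vec<Vec<u8>>, // e.g. the symmetric key k, once it is derived
    nonces: Vec<Vec<u8>>,  // nonce values drawn during the run
}

fn main() {
    let mut ctx = SecurityContext { secrets: vec![], nonces: vec![] };
    // The instrumented library would push values here as they are created.
    ctx.secrets.push(b"session key bytes".to_vec());
    println!("{} secret(s) recorded", ctx.secrets.len());
}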

Now let’s jump back to our simple protocol. One could imagine that a library implementing this protocol has more or less the following interface:

type Context = {
    channel: Channel,
    sk: skey,
    k: Key
}

skey gen_sk()
pkey pk(sk: skey)

ByteArray HelloClient(ctx, sk_A: skey)                          // eq. (5)
ByteArray HelloServer(ctx, sk_B: skey)                          // eq. (6)

pkey ReceiveHello(ctx)                                          // eq. (7)

ByteArray SendNewKey(ctx, pk_X: pkey, sk_B: skey)               // eq. (8), k is generated
Key ReceiveKey(ctx, pk_B: pkey, sk_A: skey)                     // eq. (9-10)
            throws SignatureError

ByteArray SendEncMessage(ctx, message: ByteArray, key: Key)     // eq. (11)
ByteArray ReceiveEncMessage(ctx, key: Key)                      // eq. (12-13)

An example concrete trace would be:

let c = new Channel()
let ctxClient = c.newContext()
let ctxServer = c.newContext()

let skA: skey = ctxClient.gen_sk()
let skB: skey = ctxServer.gen_sk()

// We suppose in our model that keys are preshared.
let pkA: pkey = pk(skA)
let pkB: pkey = pk(skB)

HelloClient(ctxClient, skA)
HelloServer(ctxServer, skB)
let pkX: pkey = ReceiveHello(ctxServer)
SendNewKey(ctxServer, pkX, skB)
let k: Key = ReceiveKey(ctxClient, pkB, skA)
SendEncMessage(ctxClient, "Hello World!", k)
let message = ReceiveEncMessage(ctxServer, k)

The goal of the driver/test harness is now to call the correct entry functions depending on the trace above.
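A sketch of such a driver in Rust, with placeholder types and stub entry functions standing in for the interface above, could look like this:

// A sketch of a test harness that walks a symbolic trace and dispatches to
// the entry functions of the interface above. All types are placeholders.
struct Context; // stands in for the library's connection state

enum Step { HelloClient, HelloServer, ReceiveHello, SendNewKey, ReceiveKey }

// Stub entry points; a real harness would call into the library here.
fn hello_client(_ctx: &mut Context) { println!("HelloClient"); }
fn hello_server(_ctx: &mut Context) { println!("HelloServer"); }
fn receive_hello(_ctx: &mut Context) { println!("ReceiveHello"); }
fn send_new_key(_ctx: &mut Context) { println!("SendNewKey"); }
fn receive_key(_ctx: &mut Context) { println!("ReceiveKey"); }

fn drive(trace: &[Step], client: &mut Context, server: &mut Context) {
    for step in trace {
        // The driver's only job: pick the right entry function per step.
        match step {
            Step::HelloClient => hello_client(client),
            Step::HelloServer => hello_server(server),
            Step::ReceiveHello => receive_hello(server),
            Step::SendNewKey => send_new_key(server),
            Step::ReceiveKey => receive_key(client),
        }
    }
}

fn main() {
    let (mut client, mut server) = (Context, Context);
    drive(
        &[Step::HelloClient, Step::HelloServer, Step::ReceiveHello,
          Step::SendNewKey, Step::ReceiveKey],
        &mut client, &mut server,
    );
}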

The security context is also filled with, for example, the symmetric key $k$. Furthermore, when running SendNewKey we store the identities of the sender and the receiver; that could be B and E for example. When running ReceiveKey we also store the sender and receiver. If the pair of sender and receiver does not match, but the implementation did not throw the SignatureError, then the bug oracle has detected an authentication violation.

Admittedly, this is guesswork that is very specific to this example. ProVerif already provides queries to detect violations of security properties, for example secrecy of a message (query attacker(message)) or authentication (query x:key,y:pkey; event(termClient(x,y))==>event(acceptsServer(x,y))). These queries define connections between events. Events can be triggered and recorded in the security context. After adding an event, the security context can be checked by the bug oracle. The oracle then decides whether the context contains violations. For example, for a given symmetric key x and public key y, if termClient(x,y) happens but acceptsServer(x,y) has not been recorded yet, then we have an authentication violation.

Therefore, the bug oracle is a function of type Violations[] ask_oracle(securityCtx: SecurityContext): you provide it a security context and the oracle decides which violations it contains.
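Under the assumptions above, a minimal sketch of this oracle for the authentication query could look as follows (the event and violation types are illustrative, not an actual API):

// A minimal sketch of the bug oracle for the authentication query
// event(termClient(x, y)) ==> event(acceptsServer(x, y)).
enum Event {
    TermClient { k: Vec<u8>, pk: Vec<u8> },
    AcceptsServer { k: Vec<u8>, pk: Vec<u8> },
}

struct SecurityContext { events: Vec<Event> }

#[derive(Debug)]
enum Violation { Authentication }

fn ask_oracle(ctx: &SecurityContext) -> Vec<Violation> {
    let mut violations = vec![];
    for event in &ctx.events {
        if let Event::TermClient { k, pk } = event {
            // The query demands a matching acceptsServer(k, pk) event.
            let matched = ctx.events.iter().any(|e| {
                matches!(e, Event::AcceptsServer { k: k2, pk: pk2 } if k2 == k && pk2 == pk)
            });
            if !matched {
                violations.push(Violation::Authentication);
            }
        }
    }
    violations
}

fn main() {
    let ctx = SecurityContext {
        events: vec![Event::TermClient { k: vec![1], pk: vec![2] }],
    };
    // No matching acceptsServer event was recorded -> authentication violation.
    println!("{:?}", ask_oracle(&ctx));
}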


  1. Fuzzing Terminology ↩︎

  2. afl-fuzz whitepaper ↩︎

  3. Code Coverage ↩︎
