Automated vulnerability auditing in machine code Tyler Durden Phrack Magazine #64 Version of May 22 2007 I. Introduction a/ On the need of auditing automatically b/ What are exploitation frameworks c/ Why this is not an exploitation framework d/ Why this is not fuzzing e/ Machine code auditing : really harder than sources ? II. Preparation a/ A first intuition b/ Static analysis vs dynamic analysis c/ Dependences & predicates - Controlflow analysis - Dataflow analysis d/ Translation to intermediate forms e/ Induction variables (variables in loops) III. Analysis a/ What is a vulnerability ? b/ Buffer overflows and numerical intervals - Flow-insensitive - Flow-sensitive - Accelerating the analysis by widening c/ Type-state checking - Memory leaks - Heap corruptions d/ More problems - Predicate analysis - Alias analysis and naive solutions - Hints on detecting race conditions IV. Chevarista: an analyzer of binary programs a/ Project modelization b/ Program transformation c/ Vulnerability checking d/ Vulnerable paths extraction e/ Future work : Refinement V. Related Work a/ Model Checking b/ Abstract Interpretation VI. Conclusion VII. Greetings VIII. References IX. The code .::###########################################################::. Software have bugs. That is quite a known fact. ----------------------[ I. Introduction In this article, we will discuss the design of an engine for automated vulnerability analysis of binary programs. The source code of the Chevarista static analyzer is given at the end of this document. The purpose of this paper is not to disclose 0day vulnerability, but to understand how it is possible to find them without (or with restricted) human intervention. However, we will not friendly provide the result of our automated auditing on predefined binaries : instead we will always take generic examples of the most common difficulties encountered when auditing such programs. Our goal is to enlight the underground community about writing your own static analyzer and not to be profitful for security companies or any profit oriented organization. Instead of going straight to the results of the proposed implementation, we may introduce the domain of program analysis, without going deeply in the theory (which can go very formal), but taking the perspective of a hacker who is tired of focusing on a specific exploit problem and want to investigate until which automatic extend it is possible to find vulnerabilities and generate an exploit code for it without human intervention. Chevarista hasnt reached its goal of being this completely automated tool, however it shows the path to implement incrementally such tool with a genericity that makes it capable of finding any definable kind of vulnerability. Detecting all the vulnerabilities of a given program can be untractable, and this for many reasons. The first reason is that we cannot predict that a program running forever will ever have a bug or not. The second reason is that if this program ever stop, the number of states (as in "memory contexts") it reached and passed through before stopping is very big, and testing all of of possible concrete program paths would either take your whole life, or a dedicated big cluster of machine working on this for you during ages. As we need more automated systems to find bugs for us, and we do not have such computational power, we need to be clever on what has to be analysed, how generic can we reason about programs, so a single small analyzer can reason about a lot of different kinds of bugs. After all, if the effort is not worth the genericity, its probably better to audit code manually which would be more productive. However, automated systems are not limited to vulnerability findings, but because of their tight relation with the analyzed program, they can find the exact conditions in which that bug happens, and what is the context to reach for triggering it. But someone could interject me : "But is not Fuzzing supposed to do that already ?". My answer would be : Yes. But static analysis is the intelligence inside Fuzzing. Fuzzy testing programs give very good results but any good fuzzer need to be designed with major static analysis orientations. This article also applies somewhat to fuzzing but the proposed implementation of the Chevarista analyzer is not a fuzzer. The first reason is that Chevarista does not execute the program for analyzing it. Instead, it acts like a (de)compiler but perform analysis instead of translating (back) to assembly (or source) code. It is thus much more performant than fuzzing but require a lot of development and litterature review for managing to have a complete automatic tool that every hacker dream to maintain. Another lost guy will support : "Your stuff looks more or less like an exploitation framework, its not so new". Exploitation frameworks are indeed not very new stuffs. None of them analyze for vulnerabilities, and actually only works if the builtin exploits are good enough. When the framework aims at letting you trigger exploits manually, then it is not an automated framework anymore. This is why Chevarista is not CORE-Impact or Metasploit : its an analyzer that find bugs in programs and tell you where they are. One more fat guy in the end of the room will be threatening: "It is simply not possible to find vulnerabilities in code without the source .." and then a lot of people will stand up and declare this as a prophety, because its already sufficiently hard to do it on source code anyway. I would simply measure this judgement by several remarks: for some peoples, assembly code -is- source code, thus having the assembly is like having the source, without a certain number of information. That is this amount of lost information that we need to recover when writing a decompiler. First, we do not have the name of variables, but naming variables in a different way does not affect the result of a vulnerability analysis. Second, we do not have the types, but data types in compiled C programs do not really enforce properties about the variables values (because of C casts or a compiler lacking strong type checking). The only real information that is enforced is about variable size in memory, which is recoverable from an assembly program most of the time. This is not as true for C++ programs (or other programs written in higher level objects-oriented or functional languages), but in this article we will mostly focuss on compiled C programs. A widely spread opinion about program analysis is that its harder to acheive on a low-level (imperative) language rather than a high-level (imperative) language. This is true and false, we need to bring more precision about this statement. Specifically, we want to compare the analysis of C code and the analysis of assembly code: --------------------------------------------------------------------- | Available information | C code | Assembly code | |---------------------------------------------------------------------| | Original variables names| Yes (explicit) | No | |---------------------------------------------------------------------| | Original types names | Yes (explicit) | No | |---------------------------------------------------------------------| | Control Sequentiality | Yes (explicit) | Yes (explicit) | |---------------------------------------------------------------------| | Structured control | Yes (explicit) | Yes (recoverable)| |---------------------------------------------------------------------| | Data dependencies | Yes (implicit) | Yes (implicit) | |---------------------------------------------------------------------| | Data Types | Yes (explicit) | Yes (recoverable)| |---------------------------------------------------------------------| | Register transfers | No | Yes (explicit) | |---------------------------------------------------------------------| | Selected instructions | No | Yes (explicit) | --------------------------------------------------------------------- Lets discuss those points more in details: - The control sequentiality is obviously kept in the assembly, else the processor would not know how to execute the binary program. However the binary program does not contain a clearly structured tree of execution. Conditionals, but especially, Loops, do not appear as such in the executable code. We need a preliminary analysis for structuring the control flow graph. This was done already on source and binary code using different algorithms that we do not present in this article. - Data dependencies are not explicit even in the source program, however we can compute it precisely both in the source code and the binary code. The dataflow analysis in the binary code however is slightly different, because it contains every single load and store between registers and the memory, not only at the level of variables, as done in the source program. Because of this, the assembly programs contains more instructions than source programs contain statements. This is an advantage and a disadvantage at the same time. It is an advantage because we can track the flow in a much more fine-grained fashion at the machine level, and that is what is necessary especially for all kind of optimizations, or machine-specific bugs that relies on a certain variable being either in the memory or in a register, etc. This is a disadvantage because we need more memory to analyse such bigger program listings. - Data types are explicit in the source program. Probably the recovery of types is the hardest information to recover from a binary code. However this has been done already and the approach we present in this paper is definitely compatible with existing work on type-based decompilation. Data types are much harder to recover when dealing with real objects (like classes in compiled C++ programs). We will not deal with the problem of recovering object classes in this article, as we focuss on memory related vulnerabilities. - Register level anomalies can happen [DLB], which can be useful for a hacker to determine how to create a context of registers or memory when writing exploits. Binary-level code analysis has this advantage that it provides a tighter approach to exploit generation on real world existing targets. - Instruction level information is interested again to make sure we dont miss bugs from the compiler itself. Its very academically well respected to code a certified compiler which prove the semantic equivalence between source code and compiled code but for the hacker point of view, it does not mean so much. Concrete use in the wild means concrete code, means assembly. Additionally, it is rarer but it has been witnessed already some irregularities in the processor's execution of specific patterns of instructions, so an instruction level analyzer can deal with those, but a source level analyzer cannot. A last reason I would mention is that the source code of a project is very verbose. If a code analyzer is embedded into some important device, either the source code of the software inside the device will not be available, or the device will lack storage or communication bandwidth to keep an accessible copy of the source code. Binary code analyzer do not have this dependencie on source code and can thus be used in a wider scope. To sum-up, there is a lot of information recovery work before starting to perform the source-like level analysis. However, the only information that is not available after recovery is not mandatory for analysing code : the name of types and variables is not affecting the execution of a program. We will abstract those away from our analysis and use our own naming scheme, as presented in the next chapter of this article. -------------[ II. Preparation We have to go on the first wishes and try to understand better what vulnerabilities are, how we can detect them automatically, are we really capable to generate exploits from analyzing a program that we do not even execute ? The answer is yes and no and we need to make things clear about this. The answer is yes, because if you know exactly how to caracterize a bug, and if this bug is detectable by any algorithm, then we can code a program that will reason only about those known-in-advance vulnerability specificities and convert the raw assembly (or source) code into an intermediate form that will make clear where the specificities happens, so that the "signature" of the vulnerability can be found if it is present in the program. The answer is no, because giving an unknown vulnerability, we do not know in advance about its specificities that caracterize its signature. It means that we somewhat have to take an approximative signature and check the program, but the result might be an over-approximation (a lot of false positives) or an under-approximation (finds nothing or few but vulnerabilities exist without being detected). As fuzzing and black-box testing are dynamic analysis, the core of our analyzer is not as such, but it can find an interest to run the program for a different purpose than a fuzzer. Those try their chance on a randomly crafted input. Fuzzer does not have a *inner* knowledge of the program they analyze. This is a major issue because the dynamic analyzer that is a fuzzer cannot optimize or refine its inputs depending on what are unobservable events for him. A fuzzer can as well be coupled with a tracer [AD] or a debugger, so that fuzzing is guided by the debugger knowledge about internal memory states and variable values during the execution of the program. Nevertheless, the real concept of a code analysis tool must be an integrated solution, to avoid losing even more performance when using an external debugger (like gdb which is awfully slow when using ptrace). Our technique of analysis is capable of taking decisions depending on internal states of a program even without executing them. However, our representation of a state is abstract : we do not compute the whole content of the real memory state at each step of execution, but consider only the meaningful information about the behavior of the program by automatically letting the analyzer to annotate the code with qualifiers such as : "The next instruction of the will perform a memory allocation" or "Register R or memory cell M will contain a pointer on a dynamically allocated memory region". We will explain in more details heap related properties checking in the type-state analysis paragraph of Part III. In this part of the paper, we will describe a family of intermediate forms which bridge the gap between code analysis on a structured code, and code analysis on an unstructured (assembly) code. Conversion to those intermediate forms can be done from binary code (like in an analyzing decompiler) or from source code (like in an analyzing compiler). In this article, we will transform binary code into a program written in an intermediate form, and then perform all the analysis on this intermediate form. All the studies properties will be related to dataflow analysis. No structured control flow is necessary to perform those, a simple control flow graph (or even list of basic blocks with xrefs) can be the starting point of such analysis. Lets be more concrete a illustrate how we can analyze the internal states of a program without executing it. We start with a very basic piece of code: Stub 1: ------- o o : internal state if (a) / \ b++; -> o o /\ : control-flow splitting else \ / \/ : control-flow merging c--; o ------- In this simplistic example, we represent the program as a graph whoose nodes are states and edges are control flow dependencies. What is an internal state ? If we want to use all the information of each line of code, we need to make it an object remembering which variables are used and modified (including status flags of the processors). Then, each of those control state perform certains operations before jumping on another part of the code (represented by the internal state for the if() or else() code stubs). Once the if/else code is finished, both paths merge into a unique state, which is the state after having executed the conditional statement. Depending how abstract is the analysis, the internal program states will track more or less requested information at each computation step. For example, once must differentiate a control-flow analysis (like in the previous example), and a dataflow analysis. Imagine this piece of code: Stub 2: ------- Code Control-flow Data-flow with predicates a ---o--- / \ \ / \ \ / c \ \ c = 21; o | o b o \ b = a; | | / \ / \ a = 42; o \/ ------ / if (b != c) / \ /\ |b != c| / a++; o o / \ ------ / else \ / / \ / \ / a--; o | a o a o c += a; | \ | / ------- o \ | / \ | / \ | / c o | (...) In a dataflow graph, the nodes are the variables, and the arrow are the dependences between variables. The control-flow and data-flow graphs are actually complementary informations. One only cares about the sequentiality in the graph, the other one care about the dependences between the variables without apparently enforcing any order of evaluation. Adding predicates to a dataflow graph helps at determining which nodes are involved in a condition and which instance of the successors data nodes (in our case, variable a in the if() or the else()) should be considered for our analysis. As you can see, even a simple data-flow graph with only few variables starts to get messy already. To clarify the reprensentation of the program we are working on, we need some kind of intermediate representation that keep the sequentiality of the control-flow graph, but also provide the dependences of the data-flow graph, so we can reason on both of them using a single structure. We can use some kind of "program dependence graph" that would sum it up both in a single graph. That is the graph we will consider for the next examples of the article. Some intermediate forms introduces special nodes in the data-flow graph, and give a well-recognizable types to those nodes. This is the case of Phi() and Sigma() nodes in the Static Single Assignment [SSA] and Static Single Information [SSI] intermediate forms and that facilitates indeed the reasoning on the data-flow graph. Additionally, decomposing a single variable into multiple "single assignments" (and multiple single use too, in the SSI form), that is naming uniquely each apparition of a given variable, help at desambiguizing which instance of the variable we are talking about at a given point of the program: Stub 2 in SSA form Stub 2 in SSI form Data-flow graph in SSI form ------------------ ------------------ -------------------------- c1 = 21; c1 = 21; o a1 b1 = a1; b1 = a1; / \ if (b1 != c1) (a3, a4) = Sigma(a2); (a3, a4) = Sigma(a2) o o b1 a2 = a1 + 1; if (b1 != c1) /| else a3 = a2 + 1; / | / | / | / | o c1 a3 = a1 - 1; else | | | a4 = Phi(a2, a3) a4 = a2 - 1; a3 o o a4 | c2 = c1 + a4; a5 = Phi(a3, a4); \ | | c2 = c1 + a5; \ | | ---------------- ------------------- \ | | \| | a5 = Phi(a3, a4) o | \ / o c2 . . . Note that we have not put the predicates (condition test) in that graph. In practice, its more convenient to have additional links in the graph, for predicates (that ease the testing of the predicate when walking on the graph), but we have removed it just for clarifying what is SSA/SSI about. Those "symbolic-choice functions" Phi() and Sigma() might sound a little bit abstract. Indeed, they dont change the meaning of a program, but they capture the information that a given data node has multiple successors (Sigma) or ancestors (Phi). The curious reader is invited to look at the references for more details about how to perform the intermediate translation. We will here focuss on the use of such representation, especially when analyzing code with loops, like this one: Stub 3 C code Stub 3 in Labelled SSI form ------------- --------------------------- int a = 42; int a1 = 42; int i = 0; int i1 = 0; P1 = [i1 < a1] (, ) = Sigma(P1,i2); (, ) = Sigma(P1,a2); while (i < a) { => Loop: a3 = Phi(, ); i3 = Phi(, ); a--; a5 = a4 - 1; i++; i5 = i4 + 1; P2 = [i5 < a5] (, ) = Sigma(P2,a6); (, ) = Sigma(P2,i6); } End: a8 = Phi(, ); i8 = Phi(, ); a += i; a10 = a9 + i9; ----------- --------------------------------- By trying to synthetize this form a bit more (grouping the variables under a unique Phi() or Sigma() at merge or split points of the control flow graph), we obtain a smaller but identical program. This time, the Sigma and Phi functions do not take a single variable list in parameter, but a vector of list (one list per variable): Stub 3 in Factored & Labelled SSI form -------------------------------------- int a1 = 42; int i1 = 0; P1 = [i1 < a1] (, ) (i2) ( ) = Sigma(P1,( )); (, ) (a2) Loop: (a3) (, ) ( ) = Phi( ); (i3) (, ) a5 = a4 - 1; i5 = i4 + 1; P2 = [i5 < a5] (, ) (a6) ( ) = Sigma(P2, ( )); (, ) (i6) End: (a8) (, ) ( ) = Phi( ); (i8) (, ) a10 = a9 + i9; ---------------------------------------- How can we add information to this intermediate form ? Now the Phi() and Sigma() functions allows us to reason about forward dataflow (in the normal execution order, using Sigma) and backward dataflow analysis (in the reverse order, using Phi). We can easily find the inductive variables (variables that depends on themselves, like the index or incrementing pointers in a loop), just using a simple analysis: Lets consider the Sigma() before each Label, and try to iterate its arguments: (, ) (a6) ( ) = Sigma(P2, ( )); (, ) (i6) -> (,) ( ) (, _|_ ) -> (, _|_ ) ( ) (, _|_ ) We take _|_ ("bottom") as a notation to say that a variable does not have any more successors after a certain iteration of the Sigma() function. After some iterations (in that example, 2), we notice that the left-hand side and the right-hand side are identical for variables a and i. Indeed, both side are written given a6 and i6. In the mathematical jargon, that is what is called a fixpoint (of a function F) : F(X) = X or in this precise example: a6 = Sigma(a6) By doing that simple iteration-based analysis over our symbolic functions, we are capable to deduce in an automated way which variables are inductives in loops. In our example, both a and i are inductive. This is very useful as you can imagine, since those variables become of special interest for us, especially when looking for buffer overflows that might happen on buffers in looping code. We will now somewhat specialize this analysis in the following part of this article, by showing how this representation can apply to -------------------[ III. Analysis The previous part of the article introduced various notions in program analysis. We might not use all the formalism in the future of this article, and focuss on concrete examples. However, keep in mind that we reason from now for analysis on the intermediate form programs. This intermediate form is suitable for both source code and binary code, but we will keep on staying at binary level for our examples, proposing the translation to C only for understanding purposes. Until now, we have shown our to understand data-flow analysis and finding inductive variables from the (source or binary) code of the program. So what are the steps to find vulnerabilities now ? A first intuition is that there is no generic definition for a vulnerability. But if we can describes them as behavior that violates a certain precise property, we are able to state if a program has a vulnerability or not. Generally, the property depends on the class of bugs you want to analyse. For instance, properties that express buffer overflow safety or property that express a heap corruption (say, a double free) are different ones. In the first case, we talk about the indexation of a certain memory zone which has to never go further the limit of the allocated memory. Additionally, for having an overflow, this must be a write access. In case we have a read access, we could refer this as an info-leak bug, which may be blindly or unblindly used by an attacker, depending if the result of the memory read can be inspected from outside the process or not. Sometimes a read-only out of bound access can also be used to access a part of the code that is not supposed to be executed in such context (if the out-of-bound access is used in a predicate). In all cases, its interesting anyway to get the information by our analyzer of this unsupposed behavior, because this might lead to a wrong behavior, and thus, a bug. In this part of the article, we will look at different class of bugs, and understand how we can caracterize them, by running very simple and repetitive, easy to implement, algorithm. This algorithm is simple only because we act on an intermediate form that already indicates the meaningful dataflow and controlflow facts of the program. Additionally, we will reason either forward or backward, depending on what is the most adapted to the vulnerability. We will start by an example of numerical interval analysis and show how it can be useful to detect buffer overflows. We will then show how the dataflow graph without any value information can be useful for finding problems happening on the heap. We will enrich our presentation by describing a very classic problem in program analysis, which is the discovery of equivalence between pointers (do they point always on the same variable ? sometimes only ? never ?), also known as alias analysis. We will explain why this analysis is mandatory for any serious analyzer that acts on real-world programs. Finally, we will give some more hints about analyzing concurrency properties inside multithread code, trying to caracterize what is a race condition. ------------[ A. Numerical intervals When looking for buffer overflows or integer overflows, the mattering information is about the values that can be taken by memory indexes or integer variables, which is a numerical value. Obviously, it would not be serious to compute every single possible value for all variables of the program, at each program path : this would take too much time to compute and/or too much memory for the values graph to get mapped entirely. By using certain abstractions like intervals, we can represent the set of all possible values of a program a certain point of the program. We will illustrate this by an example right now. The example itself is meaningless, but the interesting point is to understand the mecanized way of deducing information using the dataflow information of the program graph. We need to start by a very introductionary example, which consists of finding Stub 4 Interval analysis of stub 4 ------- --------------------------- int a, b; b = 0; b = [0 to 0] if (rand()) b--; b = [-1 to -1] else b++; b = [1 to 1] After if/else: b = [-1 to 1] a = 1000000 / b; a = [1000000 / -1 to 1000000 / 1] [Reported Error: b can be 0] In this example, a flow-insensitive analyzer will merge the interval of values at each program control flow merge. This is a seducing approach as you need to pass a single time on the whole program to compute all intervals. However, this approach is untractable most of the time. Why ? In this simple example, the flow-insensitive analyzer will report a bug of potential division by 0, whereas it is untrue that b can reach the value 0 at the division program point. This is because 0 is in the interval [-1 to 1] that this false positive is reported by the analyzer. How can we avoid this kind of over-conservative analysis ? We need to introduce some flow-sensitiveness to the analysis, and differentiate the interval for different program path of the program. If we do a complete flow sensitive analysis of this example, we have: Stub 4 Interval analysis of stub 4 ------- --------------------------- int a, b; b = 0; b = [0 to 0] if (rand()) b--; b = [-1 to -1] else b++; b = [1 to 1] After if/else: b = [-1 to -1 OR 1 to 1] a = 1000000 / b; a = [1000000 / -1 to 1000000 / -1] or [1000000 / 1 to 1000000 / 1] = {-1000000 or 1000000} Then the false positive disapears. We may take care of avoiding to be flow sensitive from the beginning. Indeed, if the flow-insensitive analysis gives no bug, then no bugs will be reported by the flow-sensitive analysis either (at least for this example). Additionally, computing the whole flow sensitive sets of intervals at some program point will grow exponentially in the number of data flow merging point (that is, Phi() function of the SSA form). For this reason, the best approach seems to start with a completely flow insensitive, and refine the analysis on demand. If the program is transforted into SSI form, then it becomes pretty easy to know which source intervals we need to use to compute the destination variable interval of values. We will use the same kind of analysis for detecting buffer overflows, in that case the interval analysis will be used on the index variables that are used for accessing memory at a certain offset from a given base address. Before doing this, we might want to do a remark on the choice of an interval abstraction itself. This abstraction does not work well when bit swapping is involved into the operations. Indeed, the intervals will generally have meaningless values when bits are moved inside the variable. If a cryptographic operation used bit shift that introduces 0 for replacing shifted bits, that would not be a a problem, but swapping bits inside a given word is a problem, since the output interval is then meaningless. ex: c = a | b (with A, B, and C integers) c = a ^ b c = not(c) Giving the interval of A and B, what can we deduce for the intervals of C ? Its less trivial than a simple numerical change in the variable. Interval analysis is not very well adapted for analyzing this kind of code, mostly found in cryptographic routines. We will now analyze an example that involves a buffer overflow on the heap. Before doing the interval analysis, we will do a first pass to inform us about the statement related to memory allocation and disallocation. Knowing where memory is allocated and disallocated is a pre-requirement for any further bound checking analysis. Stub 5 Interval analysis with alloc annotations ------ ---------------------------------------- char *buf; buf = _|_ (uninitialized) int n = rand(); n = [-Inf, +Inf] buf = malloc(n) buf = initialized of size [-Inf to Inf] i = 0; i = [0,0], [0,1] ... [0,N] while (i <= n) { assert(i < N) buf[i] = 0x00; i++; i = [0,1], [0,2] ... [0,N] (iter1 iter2 ... iterN) } return (i); Lets first explain that the assert() is a logical representation in the intermediate form, and is not an assert() like in C program. Again, we never do any dynamic analysis but only static analysis without any execution. In the static analysis of the intermediate form program, a some point the control flow will reach a node containing the assert statement. In the intermediate (abstract) word, reaching an assert() means performing a check on the abstract value of the predicate inside the assert (i < N). In other words, the analyzer will check if the assert can be false using interval analysis of variables, and will print a bug report if it can. We can also let the assert() implicits, but representing them explicitely make the analysis more generic, modular, and adaptable to the user. As you can see, there is a one-byte-overflow in this example. It is pretty trivial to spot it manually, however we want to develop an automatic routine for doing it. If we deploy the analysis that we have done in the previous example, the assert() that was automatically inserted by the analyzer after each memory access of the program will fail after N iterations. This is because arrays in the C language start with index 0 and finish with an index inferior of 1 to their allocated size. Whatever kind of code will be inserted between those lines (except, of course, bit swapping as previously mentioned), we will always be able to propagate the intervals and find that memory access are done beyond the allocated limit, then finding a clear memory leak or memory overwrite vulnerability in the program. However, this specific example brings 2 more questions: - We do not know the actual value of N. Is it a problem ? If we manage to see that the constraint over the index of buf is actually the same variable (or have the same value than) the size of the allocated buffer, then it is not a problem. We will develop this in the alias analysis part of this article when this appears to be a difficulty. - Whatever the value of N, and provided we managed to identify N all definitions and use of the variable N, the analyzer will require N iteration over the loop to detect the vulnerability. This is not acceptable, especially if N is very big, which in that case many minuts will be necessary for analysing this loop, when we actually want an answer in the next seconds. The answer for this optimization problem is a technique called Widening, gathered from the theory of abstract interpretation. Instead of executing the loop N times until the loop condition is false, we will directly in 1 iteration go to the last possible value in a certain interval, and this as soon as we detect a monotonic increase of the interval. The previous example would then compute like in: Stub 5 Interval analysis with Widening ------ ------------------------------- char *buf; buf = _|_ (uninitialized) int n = rand(); n = [-Inf, +Inf] buf = malloc(n) buf = initialized of size [-Inf to Inf] i = 0; i = [0,0] while (i <= n) { assert(i < N); iter1 iter2 iter3 iter4 ASSERT! buf[i] = 0x00; i = [0,0], [0,1] [0,2] [0,N] i++; i = [0,1], [0,2] [0,3] [0,N] } return (i); Using this test, we can directly go to the biggest possible interval in only a few iterations, thus reducing drastically the requested time for finding the vulnerability. However this optimization might introduce additional difficulties when conditional statement is inside the loop: Stub 6 Interval analysis with Widening ------ ------------------------------- char *buf; buf = _|_ (uninitialized) int n = rand() + 2; n = [-Inf, +Inf] buf = malloc(n) buf = initialized of size [-Inf to Inf] i = 0; i = [0,0] while (i <= n) i = [0,0] [0,1] [0,2] [0,N] [0,N+1] { if (i < n - 2) i = { assert(i < N - 1) [Never triggered !] buf[i] = 0x00; i = [0,0] [0,1] [0,2] [0,N] } i++; i = [0,1] [0,2] [0,3] [0,N] [0,N+1] } return (i); In this example, we cannot assume that the interval of i will be the same everywhere in the loop (as we might be tempted to do as a first hint for handling intervals in a loop). Indeed, in the middle of the loop stands a condition (with predicate being i < n - 2) which forbids the interval to grow in some part of the code. This is problematic especially if we decide to use widening until the loop breaking condition. We will miss this more subtle repartition of values in the variables of the loop. The solution for this is to use widening with thresholds. Instead of applying widening in a single time over the entire loop, we will define a sequel of values which corresponds to "strategic points" of the code, so that we can decide to increase precisely using a small-step values iteration. The strategic points can be the list of values on which a condition is applied. In our case we would apply widening until n = N - 2 and not until n = N. This way, we will not trigger a false positive anymore because of an overapproximation of the intervals over the entire loop. When each step is realized, that allows to annotate which program location is the subject of the widening in the future (in our case: the loop code before and after the "if" statement). Note that, when we reach a threshold during widening, we might need to apply a small-step iteration more than once before widening again until the next threshold. For instance, when predicates such as (a != immed_value) are met, they will forbid the inner code of the condition to have their interval propagated. However, they will forbid this just one iteration (provided a is an inductive variable, so its state will change at next iteration) or multiple iterations (if a is not an inductive variable and will be modified only at another moment in the loop iterative abstract execution). In the first case, we need only 2 small-step abstract iterations to find out that the interval continues to grow after a certain iteration. In the second case, we will need multiple iteration until some condition inside the loop is reached. We then simply needs to make sure that the threshold list includes the variable value used at this predicate (which heads the code where the variable a will change). This way, we can apply only 2 small-step iterations between those "bounded widening" steps, and avoid generating false positives using a very optimized but precise abstract evaluation sequence. In our example, we took only an easy example: the threshold list is only made of 2 elements (n and (n - 2)). But what if a condition is realized using 2 variables and not a variable and an immediate value ? in that case we have 3 cases: CASE1 - The 2 variables are inductive variables: in that case, the threshold list of the two variables must be fused, so widening do not step over a condition that would make it lose precision. This seem to be a reasonable condition when one variable is the subject of a constraint that involve a constant and the second variable is the subject of a constraint that involve the first variable: Stub 7: Threshold discovery ------- ------------------- int a = MIN_LOWERBOUND; int b = MAX_UPPERBOUND; int i = 0; int n = MAXSIZE; while (i < n) Found threshold n { if (a < i < b) Found predicate involving a and b (...) if (a > sizeof(something)) Found threshold for a i = b; else if (b + 1 < sizeof(buffer)) Found threshold for b i = a; } In that case, we can define the threshold of this loop being a list of 2 values, one being sizeof(something), the other one being sizeof(buffer) or sizeof(buffer) - 1 in case the analyzer is a bit more clever (and if the assembly code makes it clear that the condition applyes on sizeof(buffer) - 1). CASE2 - One of the variable is inductive and the other one is not. So we have 2 subcases: - The inductive variable is involved in a predicate that leads to modification of the non-inductive variable. It is not possible without the 2 variables being inductives !Thus we fall into the case 1 again. - The non-inductive variable is involved in a predicate that leads to modification of the inductive variable. In that case, the non-inductive variable would be invariant over the loop, which mean that a test between its domain of values (its interval) and the domain of the inductive variable is required as a condition to enter the code stubs headed by the analyzed predicate. Again, we have 2 sub-subcases: * Either the predicate is a test == or !=. In that case, we must compute the intesection of both variables intervals. If the intersection is void, the test will never true, so its dead code. If the intersection is itself an interval (which will be the case most of the time), it means that the test will be true over this inductive variable intervals of value, and false over the remaining domain of values. In that case, we need to put the bounds of the non-inductive variable interval into the threshold list for the widening of inductive variables that depends on this non-inductive variable. * Or the predicate is a comparison : a < b (where a or b is an inductive variable). Same remarks holds : we compute the intersection interval between a and b. If it is void, the test will always be true or false and we know this before entering the loop. If the interval is not void, we need to put the bounds of the intersection interval in the widening threshold of the inductive variable. CASE3 - None of the variables are inductive variables In that case, the predicate that they define has a single value over the entire loop, and can be computed before the loop takes place. We then can turn the conditional code into an unconditional one and apply widening like if the condition was not existing. Or if the condition is always false, we would simply remove this code from the loop as the content of the conditional statement will never be reached. As you can see, we need to be very careful in how we perform the widening. If the widening is done without thresholds, the abstract numerical values will be overapproximative, and our analysis will generate a lot of false positives. By introducing thresholds, we sacrify very few performance and gain a lot of precision over the looping code analysis. Widening is a convergence accelerator for detecting problems like buffer overflow. Some overflow problem can happen after millions of loop iteration and widening brings a nice solution for getting immediate answers even on those constructs. I have not detailed how to find the size of buffers in this paragraph. Wether the buffers are stack or heap allocated, they need to have a fixed size at some point and the stack pointer must be substracted somewhere (or malloc needs to be called, etc) which gives us the information of allocation alltogether with its size, from which we can apply our analysis. We will now switch to the last big part of this article, by explaining how to check for another class of vulnerability. ------------[ B. Type state checking (aka double free, memory leaks, etc) There are some other types of vulnerabilities that are slightly different to check. In the previous part we explained how to reason about intervals of values to find buffer overflows in program. We presented an optimization technique called Widening and we have studied how to weaken it for gaining precision, by generating a threshold list from a set of predicates. Note that we havent explicitely used what is called the "predicate abstraction", which may lead to improving the efficiency of the analysis again. The interested reader will for sure find resources about predicate abstraction on any good research oriented search engine. Again, this article is not intended to give all solutions of the problem of the world, but introduce the novice hacker to the concrete problematic of program analysis. In this part of the article, we will study how to detect memory leaks and heap corruptions. The basic technique to find them is not linked with interval analysis, but interval analysis can be used to make type state checking more accurate (reducing the number of false positives). Lets take an example of memory leak to be concrete: Stub 8: ------- 1. u_int off = 0; 2. u_int ret = MAXBUF; 3. char *buf = malloc(ret); 4. do { 5. off += read(sock, buf + off, ret - off); 6. if (off == 0) 7. return (-ERR); 8. else if (ret == off) 9. buf = realloc(buf, ret * 2); 10.} while (ret); 11. printf("Received %s \n", buf); 12. free(buf); 13. return; In that case, there is no overflow but if some condition appears after the read, an error is returned without freeing the buffer. This is not a vulnerability as it, but it can help a lot for managing the memory layout of the heap while trying to exploit a heap overflow vulnerability. Thus, we are also interested in detecting memory leak that turns some particular exploits into powerful weapons. Using the graphical representation of control flow and data flow, we can easily find out that the code is wrong: Graph analysis of Stub 8 ------------------------ o A A: Allocation | | o<---- | \ o \ / \ \ / \ \ R: Return R o o REA / REA: Realloc \ / / \ / / o / | / | / | / | / |/ o | F: Free F o | R o R: Return Note that this representation is not a data flow graph but a control-flow graph annotated with data allocation information for the BUF variable. This allows us to reason about existing control paths and sequence of memory related events. Another way of doing this would have been to reason about data dependences together with the predicates, as done in the first part of this article with the Labelled SSI form. We are not dogmatic towards one or another intermediate form, and the reader is invited to ponder by himself which representation fits better to his understanding. I invite you to think twice about the SSI form which is really a condensed view of lots of different information. For pedagogical purpose, we switch here to a more intuitive intermediate form that express a similar class of problems. Stub 8: ------- 0. #define PACKET_HEADER_SIZE 20 1. int off = 0; 2. u_int ret = 10; 3. char *buf = malloc(ret); M 4. do { 5. off += read(sock, buf + off, ret - off); 6. if (off <= 0) 7. return (-ERR); R 8. else if (ret == off) 9. buf = realloc(buf, (ret = ret * 2)); REA 10.} while (off != PACKET_HEADER_SIZE); 11. printf("Received %s \n", buf); 12. free(buf); F 13. return; R Using simple DFS (Depth-First Search) over the graph representing Stub 8, we are capable of extracting sequences like: 1,2,(3 M),4,5,6,8,10,11,(12 F),(12 R) M...F...R -noleak- 1,2,(3 M),4,(5,6,8,10)*,11,(12 F),(12 R) M(...)*F...R -noleak- 1,2,(3 M),4,5,6,8,10,5,6,(7 R) M...R -leak- 1,2,(3 M),(4,5,6,8,10)*,5,6,(7 R) M(...)*R -leak- 1,2,(3 M),4,5,6,8,(9 REA),10,5,6,(7 R) M...REA...R -leak- 1,2,(3 M),4,5,6,(7 R) M...R -leak- etc More generally, we can represent the set of all possible traces for this example : 1,2,3,(5,6,(7 | 8(9 | Nop)) 10)*,(11,12,13)* with | meaning choice and * meaning potential looping over the events placed between (). As the program might loop more than once or twice, a lot of different traces are potentially vulnerable to the memory leak (not only the few we have given), but all can be expressed using this global generic regular expression over events of the loop, with respect to this regular expression: .*(M)[^F]*(R) that represent traces containing a malloc followed by a return without an intermediate free, which corresponds in our program to: .*(3)[^12]*(7) = .*(3).*(7) # because 12 is not between 3 and 7 in any cycle In other words, if we can extract a trace that leads to a return after passing by an allocation not followed by a free (with an undetermined number of states between those 2 steps), we found a memory leak bug. We can then compute the intersection of the global regular expression trace and the vulnerable traces regular expression to extract all potential vulnerable path from a language of traces. In practice, we will not generate all vulnerable traces but simply emit a few of them, until we find one that we can indeed trigger. Clearly, the first two trace have a void intersection (they dont contain 7). So those traces are not vulnerable. However, the next traces expressions match the pattern, thus are potential vulnerable paths for this vulnerability. We could use the exact same system for detecting double free, except that our trace pattern would be : .*(F)[^A]*(F) that is : a free followed by a second free on the same dataflow, not passing through an allocation between those. A simple trace-based analyzer can detect many cases of vulnerabilities using a single engine ! That superclass of vulnerability is made of so called type-state vulnerabilities, following the idea that if the type of a variable does not change during the program, its state does, thus the standard type checking approach is not sufficient to detect this kind of vulnerabilities. As the careful reader might have noticed, this algorithm does not take predicates in account, which means that if such a vulnerable trace is emitted, we have no garantee if the real conditions of the program will ever execute it. Indeed, we might extract a path of the program that "cross" on multiple predicates, some being incompatible with others, thus generating infeasible paths using our technique. For example in our Stub 8 translated to assembly code, a predicate-insensitive analysis might generate the trace: 1,2,3,4,5,6,8,9,10,11,12,13 which is impossible to execute because predicates holding at states 8 and 10 cannot be respectively true and false after just one iteration of the loop. Thus such a trace cannot exist in the real world. We will not go further this topic for this article, but in the next part, we will discuss various improvements of what should be a good analysis engine to avoid generating too much false positives. ------------[ C. How to improve In this part, we will review various methods quickly to determine how exactly it is possible to make the analysis more accurate and efficient. Current researchers in program analysis used to call this a "counter-example guided" verification. Various techniques taken from the world of Model Checking or Abstract Interpretation can then be used, but we will not enter such theoretical concerns. Simply, we will discuss the ideas of those techniques without entering details. The proposed chevarista analyzer in appendix of this article only perform basic alias analysis, no predicate analysis, and no thread scheduling analysis (as would be useful for detecting race conditions). I will give the name of few analyzer that implement this analysis and quote which techniques they are using. ----------------------[ a. Predicate analysis and the predicate lattice Predicate abstraction [PA] is about collecting all the predicates in a program, and constructing a mathematic object from this list called a lattice [LAT]. A lattice is a set of objects on which a certain (partial) order is defined between elements of this set. A lattice has various theoretical properties that makes it different than a partial order, but we will not give such details in this article. We will discuss about the order itself and the types of objects we are talking about: - The order can be defined as the union of objects (P < Q iif P is included in Q) - The objects can be predicates - The conjunction (AND) of predicate can be the least upper bound of N predicates. Predicates (a > 42) and (b < 2) have as upper bound: (a > 42) && (b < 2) - The disjunction (OR) of predicates can be the greatest lower bound of N predicates. Predicates (a > 42) and (b < 2) would have as lower bound: (a > 42) || (b < 2) So the lattice would look like: (a > 42) && (b < 2) / \ / \ / \ (a > 42) (b < 2) \ / \ / \ / (a > 42) || (b < 2) Now imagine we have a program that have N predicates. If all predicates can be true at the same time, the number of combinations between predicates will be 2 at the power of N. THis is without counting the lattice elements which are disjunctions between predicates. The total number of combinations will then be then 2*2pow(N) - N : We have to substract N because the predicates made of a single atomic predicates are shared between the set of conjunctives and the set of disjunctive predicates, which both have 2pow(N) number of elements including the atomic predicates, which is the base case for a conjunction (pred && true) or a disjunction (pred || false). We may also need to consider the other values of predicates : false, and unknown. False would simply be the negation of a predicate, and unknown would inform about the unknown truth value for a predicate (either false or true, but we dont know). In that case, the number of possible combinations between predicates is to count on the number of possible combinations of N predicates, each of them being potentially true, false, or unknown. That makes up to 3pow(N) possibilities. This approach is called three-valued logic [TVLA]. In other words, we have a exponential worse case space complexity for constructing the lattice of predicates that correspond to an analyzed program. Very often, the lattice will be smaller, as many predicates cannot be true at the same time. However, there is a big limitation in such a lattice: it is not capable to analyze predicates that mix AND and OR. It means that if we analyze a program that can be reached using many different set of predicates (say, by executing many different possible paths, which is the case for reusable functions), this lattice will not be capable to give the most precise "full" abstract representation for it, as it may introduce some flow-insensitivity in the analysis (e.g. a single predicate combinations will represent multiple different paths). As this might generate false positives, it looks like a good trade-off between precision and complexity. Of course, this lattice is just provided as an example and the reader should feel free to adapt it to its precise needs and depending on the size of the code to be verified. It is a good hint for a given abstraction but we will see that other information than predicates are important for program analysis. ---------------------[ b. Alias analysis is hard A problem that arises in both source code but even more in binary code automated auditing is the alias analysis between pointers. When do pointers points on the same variables ? This is important in order to propagate the infered allocation size (when talking about a buffer), and to share a type-state (such as when a pointer is freed or allocated : you could miss double free or double-something bugs if you dont know that 2 variables are actually the same). There are multiple techniques to achieve alias analysis. Some of them works inside a single function (so-called intraprocedural [DDA]). Other works across the boundaries of a function. Generally, the more precise is your alias analysis, the smaller program you will be capable to analyze. It seems quite difficult to scale to millions of lines of code if tracking every single location for all possible pointers in a naive way. In addition to the problem that each variable might have a very big amount of aliases (especially when involving aliases over arrays), a program translated to a single-assignment or single-information form has a very big amount of variables too. However the live range of those variables is very limited, so their number of aliases too. It is necessary to define aliasing relations between variables so that we can proceed our analysis using some extra checks: - no_alias(a,b) : Pointers a and b definitely points on different sets of variables - must_alias(a,b) : Pointers a and b definitely points on the same set of variables - may_alias(a,b) : The "point-to" sets for variables a and b share some elements (non-null intersection) but are not equal. NoAliasing and MustAliasing are quite intuitive. The big job is definitely the MayAliasing. For instance, 2 pointers might point on the same variable when executing some program path, but on different variables when executing from another path. An analysis that is capable to make those differences is called a path-sensitive analysis. Also, for a single program location manipulating a given variable, the point-to set of the variable can be different depending on the context (for example : the set of predicates that are true at this moment of abstract program interpretation). An analysis that can reason on those differences is called context-sensitive. Its an open problem in research to find better alias analysis algorithms that scale to big programs (e.g. few computation cost) and that are capable to keep sufficiently precision to prove security properties. Generally, you can have one, but not the other. Some analysis are very precise but only works in the boundaries of a function. Others work in a pure flow-insensitive manner, thus scale to big programs but are very imprecise. My example analyzer Chevarista implements only a simple alias analysis, that is very precise but does not scale well to big programs. For each pointer, it will try to compute its point-to set in the concrete world by somewhat simulating the computation of pointer arithmetics and looking at its results from within the analyzer. It is just provided as an example but is in no way a definitive answer to this problem. --------------------[ c. Hints on detecting race conditions Another class of vulnerability that we are interested to detect automatically are race conditions. Those vulnerability requires a different analysis to be discovered, as they relates to a scheduling property : is it possible that 2 thread get interleaved (a,b,a,b) executions over their critical sections where they share some variables ? If the variables are all well locked, interleaved execution wont be a problem anyway. But if locking is badly handled (as it can happens in very big programs such as Operating Systems), then a scheduling analysis might uncover the problem. Which data structure can we use to perform such analysis ? The approach of JavaPathFinder [JPF] that is developed at NASA is to use a scheduling graph. The scheduling graph is a non-cyclic (without loop) graph, where nodes represents states of the program and and edges represents scheduling events that preempt the execution of one thread for executing another. As this approach seems interesting to detect any potential scheduling path (using again a Depth First Search over the scheduling graph) that fails to lock properly a variable that is used in multiple different threads, it seems to be more delicate to apply it when we deal with more than 2 threads. Each potential node will have as much edges as there are threads, thus the scheduling graph will grow exponentially at each scheduling step. We could use a technique called partial order reduction to represent by a single node a big piece of code for which all instructions share the same scheduling property (like: it cannot be interrupted) or a same dataflow property (like: it uses the same set of variables) thus reducing the scheduling graph to make it more abstract. Again, the chevarista analyzer does not deal with race conditions, but other analyzers do and techniques exist to make it possible. Consider reading the references for more about this topic. -----------[ IV. Chevarista: an analyzer of binary programs Chevarista is a project for analyzing binary code. In this article, most of the examples have been given in C or assembly, but Chevarista only analyze the binary code without any information from the source. Everything it needs is an entry point to start the analysis, which you can always get without troubles, for any (working ? ;) binary format like ELF, PE, etc. Chevarista is a simplier analyzer than everything that was presented in this article, however it aims at following this model, driven by the succesful results that were obtained using the current tool. In particular, the intermediate form of Chevarista at the moment is a graph that contains both data-flow and control-flow information, but with sigma and phi functions let implicit. For simplicity, we have chosen to work on SPARC [SRM] binary code, but after reading that article, you might understand that the representations used are sufficiently abstract to be used on any architecture. One could argue that SPARC instruction set is RISC, and supporting CISC architecture like INTEL or ARM where most of the instruction are conditional, would be a problem. You are right to object on this because these architectures requires specific features of the architecture-dependant backend of the decompiler-analyzer. Currently, only the SPARc backend is coded and there is an empty skeleton for the INTEL architecture [IRM]. What are, in the detail, the difference between such architectures ? They are essentially grouped into a single architecture-dependant component : The Backend On INTEL 32bits processors, each instruction can perform multiple operations. It is also the case for SPARC, but only when conditional flags are affected by the result of the operation executed by the instruction. For instance, a push instruction write in memory, modify the stack pointer, and potentially modify the status flags (eflags register on INTEL), which make it very hard to analyze. Many instructions do more than a single operation, thus we need to translate into intermediate forms that make those operations more explicit. If we limit the number of syntactic constructs in that intermediate form, we are capable of performing architecture independant analysis much easier with all operations made explicit. The low-level intermediate form of Chevarista has around 10 "abstract operations" in its IR : Branch, Call, Ternop (that has an additional field in the structure indicating which arithmetic or logic operation is performed), Cmp, Ret, Test, Interrupt, and Stop. Additionally you have purely abstract operations (FMI: Flag Modifying Instruction), CFI (Control Flow Instruction), and Invoke (external functions calls) which allow to make the analysis further even more generic. Invoke is a kind of statement that inform the analyzer that it should not try to analyze inside the function being invoked, but consider those internals as an abstraction. For instance, types Alloc, Free, Close are child classes of the Invoke abstract class, which model the fact that malloc(), free(), or close() are called and the analyzer should not try to handle the called code, but consider it as a blackbox. Indeed, finding allocation bugs does not require to go analyzing inside malloc() or free(). This would be necessary for automated exploit generation tho, but we do not cover this here. We make use the Visitor Design Pattern for architecturing the analysis, as presented in the following paragraph. --------------------[ B. Program transformation & modeling The project is organized using the Visitor Design Pattern [DP]. To sum-up, the Visitor Design Pattern allows to walk on a graph (that is: the intermediate form representation inside the analyzer) and transform the nodes (that contains either basic blocs for control flow analysis, or operands for dataflow analysis: indeed the control or data flow links in the graph represents the ancestors / successors relations between (control flow) blocs or (data flow) variables. The project is furnished as it: visitor: The default visitor. When the graph contains node which type are not handled by the current visitor, its this visitor that perform the operation. THe default visitor is the root class of the Visitor classes hierarchy. arch : the architecture backend. Currently SPARC32/64 is fully provided and the INTEL backend is just a skeleton. The whole proof of concept was written on SPARC for simplicity. This part also includes the generic code for dataflow and control flow computations. graph : It contains all the API for constructing graphs directly into into the intermediate language. It also defines all the abstract instructions (and the "more" abstract instruction as presented previously) gate : This is the interprocedural analysis visitor. Dataflow and Control flow links are propagated interprocedurally in that visitor. Additionally, a new type "Continuation" abstracts different kind of control transfer (Branch, Call, Ret, etc) which make the analysis even easier to perform after this transformation. alias : Perform a basic point-to analysis to determine obvious aliases between variables before checking for vulnerabilities. THis analysis is exact and thus does not scale to big programs. There are many hours of good reading and hacking to improve this visitor that would make the whole analyzer much more interesting in practice on big programs. heap : This visitor does not perform a real transformation, but simplistic graph walking to detect anomalies on the data flow graph. Double frees, Memory leaks, and such, are implemented in that Visitor. print : The Print Visitor, simply prints the intermediate forms after each transformation in a text file. printdot : Print in a visual manner (dot/graphviz) the internal representation. This can also be called after each transformation but we currently calls it just at this end of the analysis. Additionally, another transformation have been started but is still work in progress: symbolic : Perform translation towards a more symbolic intermediate forms (such as SSA and SSI) and (fails to) structure the control flow graphs into a graph of zones. This visitor is work in progress but it is made part of this release as Chevarista will be discontinued in its current work, for being implemented in the ERESI [RSI] language instead of C++. --------------- ----------- ----------- ---------- | | | | | | | | RAW | Architecture | | Gate | | Alias | | Heap | ----> | | -> | | -> | | -> | | -> Results ASM | Backend | | Visitor | | Visitor | | Visitor | | | | | | | | | --------------- ----------- ----------- ---------- --------------------[ C. Vulnerability checking Chevarista is used as follow in this demo framework. A certain big testsuits of binary files is provided in the package and the analysis is performed. In only a couple of seconds, all the analysis is finished: # We execute chevarista on testsuite binary 34 $ autonomous/chevarista ../testsuite/34.elf .:/\ Chevarista standalone version /\:. [...] => chevarista Detected SPARC Chevarista IS STARTING Calling sparc64_IDG Created IDG SPARC IDG : New bloc at addr 0000000000100A34 SPARC IDG : New bloc at addr 00000000002010A0 [!] Reached Invoke at addr 00000000002010A4 SPARC IDG : New bloc at addr 0000000000100A44 Cflow reference to : 00100A50 Cflow reference from : 00100A48 Cflow reference from : 00100C20 SPARC IDG : New bloc at addr 0000000000100A4C SPARC IDG : New bloc at addr 0000000000100A58 SPARC IDG : New bloc at addr 0000000000201080 [!] Reached Invoke at addr 0000000000201084 SPARC IDG : New bloc at addr 0000000000100A80 SPARC IDG : New bloc at addr 0000000000100AA4 SPARC IDG : New bloc at addr 0000000000100AD0 SPARC IDG : New bloc at addr 0000000000100AF4 SPARC IDG : New bloc at addr 0000000000100B10 SPARC IDG : New bloc at addr 0000000000100B70 SPARC IDG : New bloc at addr 0000000000100954 Cflow reference to : 00100970 Cflow reference from : 00100968 Cflow reference from : 00100A1C SPARC IDG : New bloc at addr 000000000010096C SPARC IDG : New bloc at addr 0000000000100A24 Cflow reference to : 00100A2C Cflow reference from : 00100A24 Cflow reference from : 00100A08 SPARC IDG : New bloc at addr 0000000000100A28 SPARC IDG : New bloc at addr 0000000000100980 SPARC IDG : New bloc at addr 0000000000100A10 SPARC IDG : New bloc at addr 00000000001009C4 SPARC IDG : New bloc at addr 0000000000100B88 SPARC IDG : New bloc at addr 0000000000100BA8 SPARC IDG : New bloc at addr 0000000000100BC0 SPARC IDG : New bloc at addr 0000000000100BE0 SPARC IDG : New bloc at addr 0000000000100BF8 SPARC IDG : New bloc at addr 0000000000100C14 SPARC IDG : New bloc at addr 00000000002010C0 [!] Reached Invoke at addr 00000000002010C4 SPARC IDG : New bloc at addr 0000000000100C20 SPARC IDG : New bloc at addr 0000000000100C04 SPARC IDG : New bloc at addr 0000000000100910 SPARC IDG : New bloc at addr 0000000000201100 [!] Reached Invoke at addr 0000000000201104 SPARC IDG : New bloc at addr 0000000000100928 SPARC IDG : New bloc at addr 000000000010093C SPARC IDG : New bloc at addr 0000000000100BCC SPARC IDG : New bloc at addr 00000000001008E0 SPARC IDG : New bloc at addr 00000000001008F4 SPARC IDG : New bloc at addr 0000000000100900 SPARC IDG : New bloc at addr 0000000000100BD8 SPARC IDG : New bloc at addr 0000000000100B94 SPARC IDG : New bloc at addr 00000000001008BC SPARC IDG : New bloc at addr 00000000001008D0 SPARC IDG : New bloc at addr 0000000000100BA0 SPARC IDG : New bloc at addr 0000000000100B34 SPARC IDG : New bloc at addr 0000000000100B58 Cflow reference to : 00100B74 Cflow reference from : 00100B6C Cflow reference from : 00100B2C Cflow reference from : 00100B50 SPARC IDG : New bloc at addr 0000000000100B04 SPARC IDG : New bloc at addr 00000000002010E0 SPARC IDG : New bloc at addr 0000000000100AE8 SPARC IDG : New bloc at addr 0000000000100A98 Intraprocedural Dependance Graph has been built succesfully! A number of 47 blocs has been statically traced for flow-types [+] IDG built Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4) Backward dataflow analysis VAR %o0, instr addr 00000000002010A4 Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4) Backward dataflow analysis VAR %o0, instr addr 00000000002010A4 Scalar parameter REPLACED with name = %o0 (addr= 00000000002010A4) Backward dataflow analysis VAR %o0, instr addr 00000000002010A4 Backward dataflow analysis VAR %fp, instr addr 0000000000100A48 Return-Value REPLACED with name = %i0 (addr= 0000000000100A44) Backward dataflow analysis VAR %i0, instr addr 0000000000100A44 Backward dataflow analysis VAR %fp, instr addr 0000000000100A5C Return-Value REPLACED with name = %i0 (addr= 0000000000100A58) Backward dataflow analysis VAR %i0, instr addr 0000000000100A58 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100A6C Scalar parameter REPLACED with name = %o0 (addr= 0000000000201084) Backward dataflow analysis VAR %o0, instr addr 0000000000201084 Scalar parameter REPLACED with name = %o0 (addr= 0000000000201084) Backward dataflow analysis VAR %o0, instr addr 0000000000201084 Scalar parameter REPLACED with name = %o1 (addr= 0000000000201084) Backward dataflow analysis VAR %o1, instr addr 0000000000201084 Scalar parameter REPLACED with name = %o1 (addr= 0000000000201084) Backward dataflow analysis VAR %o1, instr addr 0000000000201084 Scalar parameter REPLACED with name = %o2 (addr= 0000000000201084) Backward dataflow analysis VAR %o2, instr addr 0000000000201084 Scalar parameter REPLACED with name = %o2 (addr= 0000000000201084) Backward dataflow analysis VAR %o2, instr addr 0000000000201084 Backward dataflow analysis VAR %fp, instr addr 0000000000100A84 Return-Value REPLACED with name = %i0 (addr= 0000000000100A80) Backward dataflow analysis VAR %i0, instr addr 0000000000100A80 Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100AA4 Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100ABC Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100AAC Backward dataflow analysis VAR %fp, instr addr 0000000000100AD4 Return-Value REPLACED with name = %i0 (addr= 0000000000100AD0) Backward dataflow analysis VAR %i0, instr addr 0000000000100AD0 Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100AF4 Backward dataflow analysis VAR [%fp + 7d3], instr addr 0000000000100B24 Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100B18 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B70 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B38 Backward dataflow analysis VAR %fp, instr addr 0000000000100964 Backward dataflow analysis VAR %fp, instr addr 0000000000100964 Backward dataflow analysis VAR %fp, instr addr 0000000000100964 Scalar parameter REPLACED with name = %o0 (addr= 0000000000100958) Backward dataflow analysis VAR %o0, instr addr 0000000000100958 Scalar parameter REPLACED with name = %o0 (addr= 0000000000100958) [....] Backward dataflow analysis VAR %fp, instr addr 0000000000100B6C Backward dataflow analysis VAR [%fp + 7df], instr addr 0000000000100B60 Backward dataflow analysis VAR [%fp + 7e7], instr addr 0000000000100B58 [+] GateVisitor finished [+] AliasVisitor finished + Entered Node Splitting for Node id 24 + Entered Node Splitting for Node id 194 + Entered Node Splitting for Node id 722 + Entered Node Splitting for Node id 794 + Entered Node Splitting for Node id 1514 + Entered Node Splitting for Node id 1536 + Entered Node Splitting for Node id 1642 [+] SymbolicVisitor finished Entering DotVisitor + SESE visited + SESE visited * SESE already visited * SESE already visited + SESE visited + SESE visited * SESE already visited * SESE already visited * SESE already visited ! Node pointed by (nil) is NOT a SESE + SESE visited * SESE already visited * SESE already visited * SESE already visited [+] Print*Visitors finished Starting HeapVisitor Double Free found Double Free found Double malloc [+] Heap visitor finished [+] Chevarista has finished The run was performed in less than 2 seconds and multiple vulnerabilities have been found in the binary file (2 double free and one memory leak as indicated by the latest output). Its pretty useless without more information, which brings us to the results. -------------------------[ D. Vulnerable paths extraction Once the analysis has been performed, we can simply check what the vulnerable paths were: ~/IDA/sdk/plugins/chevarista/src $ ls tmp/ cflow.png chevarista.alias chevarista.buchi chevarista.dflow.dot \ chevarista.dot chevarista.gate chevarista.heap chevarista.lir \ chevarista.symbolic dflow.png Each visitor (transformation) outputs the complete program in each intermediate form. The most interesting thing is the output of the heap visitor that give us exactly the vulnerable paths: ~/IDA/sdk/plugins/chevarista/src $ cat tmp/chevarista.heap [%fp + 7e7] [%fp + 7df] [%l0] *********************************** * * * Multiple free of same variables * * * *********************************** ****************** path to free : 1 ****************** @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0) @0x100a44 (4) {S} 46: %g1 = outparam_%o0 @0x100a48 (8) {S} 60: local_%fp$0x7e7 = %g1 @0x100bcc (8) {S} 1770: outparam_%o0 = local_%fp$0x7e7 @0x1008e4 (8) {S} 1792: local_%fp$0x87f = inparam_%i0 @0x1008f4 (8) {S} 1828: outparam_%o0 = local_%fp$0x87f @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0) ****************** path to free : 2 ****************** @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0) @0x100a44 (4) {S} 46: %g1 = outparam_%o0 @0x100a48 (8) {S} 60: local_%fp$0x7e7 = %g1 @0x100b58 (8) {S} 2090: %g1 = local_%fp$0x7e7 @0x100b5c (8) {S} 2104: local_%fp$0x7d7 = %g1 @0x100b68 (8) {S} 2146: %g1 = local_%fp$0x7d7 @0x100b6c (8) {S} 2160: local_%fp$0x7df = %g1 @0x100c14 (8) {S} 1524: outparam_%o0 = local_%fp$0x7df @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0) ****************** path to free : 3 ****************** @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0) @0x100a58 (4) {S} 96: %g1 = outparam_%o0 @0x100a5c (8) {S} 110: local_%fp$0x7df = %g1 @0x100c14 (8) {S} 1524: outparam_%o0 = local_%fp$0x7df @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0) ****************** path to free : 4 ****************** @0x2010a4 (0) {S} 32: inparam_%i0 = Alloc(inparam_%i0) @0x100a58 (4) {S} 96: %g1 = outparam_%o0 @0x100a5c (8) {S} 110: local_%fp$0x7df = %g1 @0x100b60 (8) {S} 2118: %g1 = local_%fp$0x7df @0x100b64 (8) {S} 2132: local_%fp$0x7e7 = %g1 @0x100bcc (8) {S} 1770: outparam_%o0 = local_%fp$0x7e7 @0x1008e4 (8) {S} 1792: local_%fp$0x87f = inparam_%i0 @0x1008f4 (8) {S} 1828: outparam_%o0 = local_%fp$0x87f @0x2010c4 (0) {S} 1544: inparam_%i0 = Free(inparam_%i0) ~/IDA/sdk/plugins/chevarista/src $ As you can see, we now have the complete vulnerable paths where multiple frees are done in sequence over the same variables. In this example, 2 double frees were found and one memory leak, for which the path to free is not given, since there is no (its a memory leak :). A very useful trick was also to give more refined types to operands. For instance, local variables can be identified pretty easily if they are accessed throught the stack pointer. Function parameters and results can also be found easily by inspecting the use of %i and %o registers (for the SPARC architecture only). ----------------[ E. Future work : Refinement The final step of the analysis is refinement [CEGF]. Once you have analyzed a program for vulnerabilities and we have extracted the path of the program that looks like leading to a corruption, we need to recreate the real conditions of triggering the bug in the reality, and not in an abstract description of the program, as we did in that article. For this, we need to execute for real (this time) the program, and try to feed it with data that are deduced from the conditional predicates that are on the abstract path of the program that leads to the potential vulnerability. The input values that we would give to the program must pass all the tests that are on the way of reaching the bug in the real world. Not a lot of projects use this technique. It is quite recent research to determine exactly how to be the most precise and still scaling to very big programs. The answer is that the precision can be requested on demand, using an iterative procedure as done in the BLAST [BMC] model checker. Even advanced abstract interpretation framework [ASA] do not have refinement in their framework yet : some would argue its too computationally expensive to refine abstractions and its better to couple weaker abstractions together than tring to refine a single "perfect" one. ---------------[ V. Related Work Almost no project about this topic has been initiated by the underground. The work of Nergal on finding integer overflow into Win32 binaries is the first notable attempt to mix research knowledge and reverse engineering knowledge, using a decompiler and a model checker. The work from Halvar Flake in the framework of BinDiff/BinNavi [BN] is interesting but serves until now a different purpose than finding vulnerabilities in binary code. On a more theoretical point of view, the interested reader is invited to look at the reference for findings a lot of major readings in the field of program analysis. Automated reverse engineering, or decompiling, has been studied in the last 10 years only and the gap is still not completely filled between those 2 worlds. This article tried to go into that direction by introducing formal techniques using a completely informal view. Mostly 2 different theories can be studied : Model Checking [MC] and Abstract Interpretation [AI] . Model Checking generally involves temporal logic properties expressed in languages such as LTL, CTL, or CTL* or [TL]. Those properties are then translated to automata. Traces are then used as words and having the automata not recognizing a given trace will mean breaking a property. In practice, the formula is negated, so that the resulting automata will only recognize the trace leading to vulnerabilities, which sounds a more natural approach for detecting vulnerabilities. Abstract interpretation [ASA] is about finding the most adequate system representation for allowing the checking to be computable in a reasonable time (else we might end up doing an "exhaustive bruteforce checking" if we try to check all the potential behavior of the program, which can btw be infinite). By reasoning into an abstract domain, we make the state-space to be finite (or at least reduced, compared to the real state space) which turn our analysis to be tractable. The strongest the abstractions are, the fastest and imprecise our analysis will be. All the job consist in finding the best (when possible) or an approximative abstraction that is precise enough and strong enough to give results in seconds or minuts. In this article, we have presented some abstractions without quoting them explicitely (interval abstraction, trace abstraction, predicate abstraction ..). You can also design product domains, where multiple abstractions are considered at the same time, which gives the best results, but for which automated procedures requires more work to be defined. ------[ VI. Conclusion I Hope to have encouraged the underground community to think about using more formal techniques for the discovery of bugs in programs. I do not include this dream automated tool, but a simplier one that shows this approach as rewarding, and I look forward seing more automated tools from the reverse engineering community in the future. The chevarista analyzer will not be continued as it, but is being reimplemented into a different analysis environment, on top of a dedicated language for reverse engineering and decompilation of machine code. Feel free to hack inside the code, you dont have to send me patches as I do not use this tool anymore for my own vulnerability auditing. I do not wish to encourage script kiddies into using such tools, as they will not know how to exploit the results anyway (no, this does not give you a root shell). ------[ VII. Greetings Why should every single Phrack article have greetings ? The persons who enjoyed Chevarista know who they are. ------[ VIII. References [TVLA] Three-Valued Logic http://en.wikipedia.org/wiki/Ternary_logic [AI] Abstract Interpretation http://www.di.ens.fr/~cousot/ [MC] Model Checking http://en.wikipedia.org/wiki/Model_checking [CEGF] Counterexample-guided abstraction refinement E Clarke - Temporal Representation and Reasoning [BN] Sabre-security BinDiff & BinNavi http://www.sabre-security.com/ [JPF] NASA JavaPathFinder http://javapathfinder.sourceforge.net/ [UNG] UQBT-ng : a tool that finds integer overflow in Win32 binaries events.ccc.de [SSA] Efficiently computing static single assignment form R Cytron, J Ferrante, BK Rosen, MN Wegman ACM Transactions on Programming Languages and SystemsFK [SSI] Static Single Information (SSI) CS Ananian - 1999 - lcs.mit.edu [MCI] Modern Compiler Implementation (Book) Andrew Appel [BMC] The BLAST Model Checker http://mtc.epfl.ch/software-tools/blast/ [AD] 22C3 - Autodafe : an act of software torture events.ccc.de/congress/2005/fahrplan/events/606.en.html [TL] Linear Temporal logic http://en.wikipedia.org/wiki/Linear_Temporal_Logic [ASA] The ASTREE static analyzer www.astree.ens.fr [DLB] Dvorak LKML select bug Somewhere lost on lkml.org [RSI] ERESI (Reverse Engineering Software Interface) http://eresi.asgardlabs.org [PA] Automatic Predicate Abstraction of C Programs T Ball, R Majumdar, T Millstein, SK Rajamani ACM SIGPLAN Notices 2001 [IRM] INTEL reference manual http://www.intel.com/design/pentium4/documentation.htm [SRM] SPARC reference manual http://www.sparc.org/standards/ [LAT] Wikipedia : lattice http://en.wikipedia.org/wiki/Lattice_%28order%29 [DDA] Data Dependence Analysis of Assembly Code ftp://ftp.inria.fr/INRIA/publication/publi-pdf/RR/RR-3764.pdf [DP] Design Patterns : Elements of Reusable Object-Oriented Software Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides ------[ IX. The code Feel free to contact me for getting the code. It is not included in that article but I will provide it on request if you show an interest.