Fenris is a multipurpose tracer, debugger, and code analysis tool.
Version: 0.07-m2 build 3245Fenris is a suite of tools suitable for code analysis, debugging, protocol analysis, reverse engineering, forensics, diagnostics, security audits, vulnerability research and many other purposes.
Operating System: Linux
The main logical components are:
· Fenris: high-level tracer, a tool that detects the logic used in C programs to find and classify functions, logic program structure, calls, buffers, interaction with system and libraries, I/O and many other structures. Fenris is mostly a "what's inside" tracer, as opposed to ltrace or strace, tracers intended to inspect external "symptoms" of the internal program structure. Fenris does not depend on libbfd for accessing ELF structures, and thus is much more robust when dealing with "anti-debugging" code.
· libfnprints and dress: fingerprinting code that can be used to detect library functions embedded inside a static application, even without symbols, to make code analysis simplier; this functionality is both embedded in other components and available as a standalone tool that adds symtab to ELF binaries and can be used with any debugger or disassembler.
· Aegir: an interactive gdb-alike debugger with modular capabilities, instruction by instruction and breakpoint to breakpoint execution, and real-time access to all the goods offered by Fenris, such as high-level information about memory objects or logical code structure.
· nc-aegir: a SoftICE-alike GUI for Aegir, with automatic register, memory and code views, integrated Fenris output, and automatic Fenris control (now under development).
· Ragnarok: a visualisation tool for Fenris that delivers browsable information about many different aspects of program execution - code flow, function calls, memory object life, I/O, etc (to be redesigned using OpenDX or a similar data exploration interface).
· ...and some other companion utilities.
Code analysis is not limited to debugging, quality assurance or security audits. Understanding and handling file formats or communication protocols used by proprietary solutions, a problem that many corporations face when they decide to change their base software platform from one, obsolete or insufficient solution to another, perhaps more suitable, is a task that can consume long months and millions of dollars, especially when any misjudgment or misinterpretation is made.
Because of that, accurate and complete information about existing solutions has to be obtained and evaluated in a timely manner. This project is an attempt to fill the gap between currently used tools by providing a freely available program analysis utility, suitable for black-box code audits, algorithm analysis,
rapid reconnaissance in open-source projects, tracking down bugs, evaluating security subsystems, performing computer forensics, etc.
This program does not automate the process of auditing, and does not favor any particular use. Instead of that, it is intended to be a flexible and universal application that will be a valuable solution for many advanced users. While functional, it is probably not tested sufficiently, there are many issues to fix, several known bugs, some portability problems.
It is being released primarily to get user feedback, comments, and, most important, to request development support, as my resources are very limited, both in terms of available time and development platforms. This project is and will be distributed as a free software, regardless of projected use, accompanied by complete sources, under the terms and
conditions of GPL. Why do you might need this code? Well, there are few reasons...
Human beings are, so far, the best code analysts. Unlike computer programs, they have imagination, ability to build synthetic abstract models, and yet to observe and analyze smallest details at the same time. Functionality is often being described as "doing what the program is supposed to do", security as "doing what the program is supposed to do and
nothing more". While it might sound funny, that is the most general and complete definition we have. In most real-life scenarios only humans really know what are their expectations. Building strict formal models of our expectations does not necessarily mean that models themselves are flawless, and is very time-consuming. Then, even with such models,
validating the code is not always possible, due to its computational complexity. That is why real, live programs (not including some critical developments) do not have such models, do not follow any particular coding guidelines, and cannot be formally examined without human judgment.
Unfortunately, humans are also highly inaccurate and very expensive. They work slowly, and better results can be achieved by hiring better specialists and performing more careful audit. And after all, even the best expert can overlook something in complex, hard to read code. It is almost impossible for human to perform an accurate audit of a large, complex, heterogeneous project written e.g. in C - like Sendmail, BIND, Apache - and provide results in reasonable time.
Things get even worse when humans try to understand algorithms and protocols used by complex closed-source black box solutions. They are simply too slow, and not always able to make accurate guesses about dozens of complicated, conditional parameter passes and function calls before final action is taken.
While it might sound surprising, human-driven code audit is very similar to playing chess - it is a general analysis of possible states, way too many to be implicitly projected by our conscience, a result of experience, knowledge, some unparalleled capabilities of human brain, and luck. It is also a subject to false moves and misjudgment. And there are maybe just a few hundred excellent players.
As for today, freely and commercially available audit tools both use two opposite approaches. First approach tends to minimize human role by automating the review of source code. Source code analysis methods are good in spotting known, repeatable static errors in the code - such as format string vulnerabilities. On the other hand, static tools are not able to trace and analyze all possible execution paths of complex application by
simply looking at its source.
The reason for inability to follow all execution paths lies deeply in the foundations of modern computation theory, and one of its aspects is known as "the halting problem". Speaking in more general terms, in many cases (such as complex software, or even underlying operating system), the amount of medium needed to store all possible states of a complex program exceeds significantly the number of particles in the
universe; and the amount of time needed to generate and process them sequentially is greater than the lifetime of our universe, even having a machine that works with the speed of light.
This might be changed by the development of new computation models, such as quantum computing, or by creating mathematical models that allow us to make such problems non-polynomial - but for now, we are far from this point, and static analysis is restrained in many very serious ways, even though many software suppliers tend to market their products as the ultimate, 100% solutions. Subtle, complex, conditional dynamic errors, such as privilege dropping problems, input-dependent table overflows in C and many other issues usually cannot be detected without generating a completely unacceptable number of false positives.
This kind of software is highly dependent on coding style, and specific notation or development practices might render them less efficient - for example, automated audit utilities can usually detect problems like insecure call to strcpy() function, but will very likely not notice insecure manual copy in do-while
loop. The truth is, for programs that do not have previously built formal models, static auditing utilities look for known, common problems in known, common types of code in a very limited scope.
Another issue is the applicability of this approach to algorithm analysis tasks. In the domain of automated audit tools, this problem is "reduced" to building a formal model of program behavior, or, more appropriately, generating certain predictive statements about the code. While there are very interesting developments in this direction, such as the work of professor Patrick Cousot, it is very difficult to make any detailed, accurate and abstract enough run-time predictions for complex source code that has any immediate value in the analysis of unknown algorithm.
Last but not least, static analysis of sources can be deployed only when the source code is available, which does not have to be the case. This approach is a subject to many shortcomings, tricky assertions, and is a technique of strictly limited capabilities. This is, of course, not to dismiss this method - but to demonstrate that this much favored approach is not flawless and how much it needs to be accompanied with auxiliary methods.
The second approach to be discussed here is based on a dynamic run-time program analysis. This method is usually used to provide the user with information about actual program execution path, letting him make decisions on which path to follow and giving him free will to draw any conclusions and perform all the synthetic reasoning.
This method is
applied to a live binary executed in real-time and is based on monitoring syscalls (strace), libcalls (ltrace) or functions (xtrace); in certain cases, breakpoint debuggers, such as gdb, can be used, however it is usually not feasible to use them to perform anything more than in-depth analysis of a very small portion of program functionality. Usually, such analysis provides a very useful information on what is happening, and this information is provided in uniform, reduced-output form.
A careful auditor can analyze program behavior and find interesting or potentially dangerous run-time conditions. By monitoring how a given application interacts with external world, he (or she) can determine whether some other
conditions can be triggered and eventually explore them by examining sources or re-running the program. Advantages are enormous, as such software enables the auditor to spot very subtle errors in code that "looked good", to observe actual execution, not to try to figure it out, and to find or trace down not obvious or non-schematic vulnerabilities. Run-time trace tools are primarily used for fast reconnaissance tasks and for tracing down notorious errors that are not clearly visible in the source, significantly reducing the time of such operations.
There are, however, serious drawbacks related to this method. First of all, known tracing tools do not provide the complete information. They will detect strcpy() call, but won't report if exactly the same functionality has been implemented from scratch by the author of given program. And, in some cases, the amount of produced data
can be enormous, and because of its completely unstructured character, it makes the observation of overall execution vector almost impossible. Two most important problems are: correlating trace data with actual code, and determining what occurred in the "dark matter" between two lines of trace output.
There are some attempts to combine both approaches - run-time evaluation and source code analysis - such as Purify or many other commercial development support products. Unfortunately, they all feature a limited set of capabilities that need development-side or compilation-time support and are not really suitable for comprehending black box solutions or performing a general analysis. Most of them are targeted for dynamic memory debugging and code / memory profiling.
While not mentioned above, there is also another approach to black-box code - high-level decompiler. However, the complexity of modern compilers makes it very difficult to develop an effective C decompiler or similar utility, and there are only a few (two?) projects available to accomplish it, all of them not able to deal with too complex or optimized code. Finally, there is no guarantee that generated output code will be any help in comprehending the program. For now, this approach remains almost purely theoretical,
and I am not aware of any auditors using it extensively. Why? Well, here's an example of decompiled, mildly optimized code *with* some symbolic information: http://www.backerstreet.com/rec/ex386/hdgO.rec . One may argue it is less readable than cross-referenced disassembly.
This project, Fenris, is named after the monstrous wolf, son of the Norse god Loki. It is not the ultimate answer to all questions, not a solution for all problems, and under no circumstances is intended to replace other tools and techniques. On the other hand, it makes one step forward compared to other tools, trying to support the auditor and to make his work much more effective. This is accomplished by combining a number of techniques, including partial run-time decompiler, stateful analysis, code fingerprinting, I/O analysis, high-level visualization layer, traditional interactive debugger features and run-time code modification capabilities. The goal is to provide a very detailed trace information, and, at the same time, to provide data suitable to build a model of program behavior more quickly and in more convenient way.
Fenris is not supposed to find vulnerabilities or bugs, or to guess algorithms or describe protocols. It is supposed to report and analyze the execution path - detect and describe functional blocks, monitor data flow in the program, marking its lifetime, source, migration and destination, analyze how functions work and what conditions are evaluated.
At the end, it can deliver you an execution model of traced program (or arbitrarily chosen portion of it, if complete trace results in too much noise or irrelevant information), and hint you how this model can change in different conditions. Fenris does not need source codes of analyzed application, but obviously does not keep the auditor from using them.
For many users, Fenris might be a new tool or tools, for others - just a command-line replacement or addition to strace, ltrace, gdb or similar applications (there's a brief list of other nice tools in doc/other.txt). And that's the idea - to build a tool that is simple, reusable, but also precise and smart. It is supposed to have advantages over other tools, but not to be an ultimate replacement or the final solution. Some users can just use very specific features, such as automated function fingerprinting, and use companion tools instead of the main program.