Codehawk

Making Software Safer


Sound

CodeHawk applies abstract interpretation techniques to achieve sound results, applying over-approximation to guarantee never missing a property violation.

Scalable

Using divide-and-conquer methods CodeHawk can scale easily to large code targets, and can reuse previous results on subsequent runs to minimize rework.

Flexible

One analysis engine works on verification of multiple source languages, and can be used in binary reverse engineering tasks. Using multiple abstract domains the engine supports a variety of analysis problems and properties.

Presentations

Coming Soon

CodeHawk Technology

Our core technology is based on abstract interpretation, a mathematical theory developed by Cousot and Cousot for sound static static analysis. Although very powerful, the theory has been found difficult to put into practice, resulting in few practical tools available. Over the past fifteen years we have succeeded in creating an abstract interpretation engine that supports sound static analysis of programs of hundreds of thousands of lines of code.

Our CodeHawk abstract interpretation engine includes very efficient iterators, and several abstract domains, including

  • constants
  • intervals
  • strided intervals
  • linear equalities
  • polyhedra
  • symbolic sets
  • value sets, and
  • taint

The CodeHawk Abstract Interpretation Engine has its own internal language, called CHIF (CodeHawk Internal Form) that provides an expressive, general representation of programs. This design enables CodeHawk to accommodate a number of different front-ends from different input languages. We currently offer front-ends for C, Java byte code, and X86, and MIPS executables.

The CodeHawk Abstract Interpretation Engine is implemented in OCaml. Its source code is available on under MIT License, in the CodeHawk/CH/chlib directory.

codehawk
codehawk

CodeHawk-C

CodeHawk-C is our C-language analyzer, a sound static analys tool targeted specifically at me mory safety properties. The analyzer has a built-in specification that precisely encodes absence of so-called undefined behavior, which icludes all memory safety vulnerabilities, integer overflow vulnerabilities, use-after-free, etc. (covering more than 50 CWEs).

The analyzer uses CIL, developed by George Necula and his students at UC Berkeley, as its front-end parser to transform C applications into structured form. From there the analyzer generates proof obligations for all constructs in the language that may lead to undefined behavior; the proof obligations effectively state that undefined behavior is not reachable. Invariants generated by the CodeHawk Abstract Interpretation Engine are then used to (attempt to) prove all of these proof obligations valid. If successful the invariants provide easily checkable evidence for the absence of undefined behavior.

The approach is compositional: proof obligations may be lifted to the function API if their validity depends on arguments being constrained, e.g., contained within a certain range, or being not NULL. These conditions are then imposed on the callers of the function, thus propagating the responsibility for safety to the most relevant locations. Proof obligations may also be deferred to return values of other functions or values of global variables or heap-allocated data structures.

Output from the analyzer is saved in XML, in highly indexed form for conciseness. Results are accessible via a collection of python scripts that can easily be customized for integration in a larger tool chain.

The C analyzer OCaml source code is available on under MIT License in the CodeHawk/CHC directory.

A prebuilt CodeHawk C Analyzer with python scripts to run the analyzer and access the results is also available on under the MIT License. It provides instructions that allow you to quickly get started with the analyzer.

CodeHawk-Binary

CodeHawk-Binary is our analyzer for x86 and mips executables. The x86 analyzer is useful in general reverse engineering and also provides several facilities specifically tailored to malware analysis for Windows executables. It provides automated deep semantic analysis of the program, uncovering details hidden by obfuscation efforts. It recovers variables and indirect calls, and reports indicators of compromise based on data flow analysis results, that are not available from the more shallow analyses performed by other tools.

The binary analyzer OCaml source code is available on under MIT License in the CodeHawk/CHB directory

A prebuilt CodeHawk Binary Analyzer with python scripts to run the analyzer and access the results is also available on under the MIT License. It provides instructions that allow you to get started with the analyzer within a few minutes.

codehawk

codehawk

CodeHawk-Java

Coming soon!

Try Our Analyzers

Prebuilt, ready to use, versions of our analyzers are available on GitHub:

We also provide some benchmark and demonstration repositories:

trybuy

About Us

Learn more about the company's technology and motivating pursuits.

More...