There’s an interesting quote that says the ability to take finer measurements allows leaps in the advancement of science. Looking back at when I started reverse engineering as a kid, it’s cool to see how my approach to problems has changed as I learned new techniques.
Uncovering Structure of Target Binary
Detecting Encryption vs Compression
If you’re not sure whether data is encrypted or compressed, try running an entropy check with ent. Encrypted data should generally be indistinguishable from pure randomness (of course, this isn’t always the case if a weak scheme is used, such as a block cipher in ECB mode).
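The entropy check itself is simple enough to sketch by hand. The snippet below is a minimal illustration, not a replacement for ent: the function name shannon_entropy and the sample inputs are made up for this example. It computes Shannon entropy in bits per byte, where values close to 8.0 suggest encrypted or well-compressed data and lower values suggest structure.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen only to compare three kinds of data.
    text = b"the quick brown fox jumps over the lazy dog " * 2000
    print(f"plain text : {shannon_entropy(text):.3f}")
    print(f"compressed : {shannon_entropy(zlib.compress(text)):.3f}")
    print(f"random     : {shannon_entropy(os.urandom(len(text))):.3f}")
```

Entropy alone often cannot separate strong compression from encryption, since both score high. In practice it helps to compute it over sliding windows of a file and to combine it with the signature checks described in the next section.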
GNU File Utility & Binwalk
Sometimes you have a binary blob and you may not be sure what’s in it. GNU’s file utility can help uncover the format. If it’s an unknown format, binwalk can help check whether the blob contains multiple embedded files.
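To give a feel for what signature scanning does, here is a rough sketch that searches a blob for a handful of well-known magic numbers. It is not how file or binwalk are implemented; the SIGNATURES table and the scan_signatures helper are assumptions made for this example, and real tools use far larger signature databases plus validation of the data that follows each hit.

```python
# A few well-known magic numbers. This table is illustrative, not exhaustive.
SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"PK\x03\x04": "ZIP archive (local file header)",
    b"\x1f\x8b\x08": "gzip stream (deflate)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"hsqs": "SquashFS filesystem (little endian)",
}


def scan_signatures(blob: bytes):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, description))
            start = blob.find(magic, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical blob: padding, an ELF header stub, padding, a ZIP header stub.
    blob = b"\x00" * 16 + b"\x7fELF" + b"\x00" * 12 + b"PK\x03\x04"
    for offset, description in scan_signatures(blob):
        print(f"0x{offset:08x}  {description}")
```

Short signatures produce false positives on large inputs, so treat each hit as a lead to verify rather than a confirmed file boundary.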
Reading raw assembly can be tedious, especially when you start working with multiple instruction sets.
Ghidra and IDA can take assembly and produce C-like code, which is a lot easier to read. Disassemblers such as Binary Ninja can also lift assembly to a common intermediate language, which makes code much easier to read.
The downside of decompiled/lifted code is that the results can be incorrect, so be mindful of this. Looking for symbol/library names can be useful; a quick web search may lead you to the original code.
Note that decompilers exist for bytecode languages too, such as Java and .NET. These are a lot more accurate in terms of correctness. Symbols may be mangled though.
Reviewing Static References To Data
If you are trying to locate a piece of code, see if there is a reference to data related to the operation. For instance, if the program prints a message, search for a reference to the message in a disassembler. This often leads you directly to the code in question.
Graphing Control Flow/Data Flow of Programs
Many disassemblers allow graphing of code as nodes, showing branches between basic blocks & calls to functions. Functions that have a bunch of branching within them may be related to parsing of complex data structures. Loops and other constructs can quickly be spotted to get the general idea of what is being performed.
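One way to put a number on "a bunch of branching" is cyclomatic complexity, computed from the exported graph as edges minus nodes plus two. The sketch below assumes each function's control flow graph has already been exported as a dictionary mapping a basic block name to its successor blocks; that format, the function names, and the sample graphs are all hypothetical.

```python
def cyclomatic_complexity(cfg):
    """Compute E - N + 2 for one function's control flow graph.

    cfg maps each basic block to the list of blocks it can branch to.
    """
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


def rank_functions(functions):
    """Return function names sorted from most to least branchy."""
    return sorted(
        functions,
        key=lambda name: cyclomatic_complexity(functions[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical graphs: one straight-line function, one with a loop and a branch.
    functions = {
        "straight": {"entry": ["exit"], "exit": []},
        "parser": {
            "entry": ["check"],
            "check": ["a", "b"],
            "a": ["loop"],
            "b": ["loop"],
            "loop": ["check", "exit"],
            "exit": [],
        },
    }
    for name in rank_functions(functions):
        print(name, cyclomatic_complexity(functions[name]))
```

Sorting a whole binary this way gives a short list of candidate parsers and state machines to look at first.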
Fingerprinting is useful for finding inlined code, or for binaries with stripped symbols. For instance, crypto algorithms (md5, aes, etc.) can be fingerprinted by initial values/instructions to determine if they are embedded in the binary. The way in which code/data is structured may help with attribution too, to help uncover the origin of where it came from.
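As a small sketch of constant-based fingerprinting, the snippet below looks for the MD5 initial state words (the first four SHA-1 words are identical), the first two SHA-256 initial state words, and the first 16 bytes of the AES S-box. The constants come from the public algorithm specifications; the helper names and the WORD_SIGNATURES table are made up for this example. Words are searched for individually and in both byte orders, since compilers often embed them as separate instruction immediates rather than as one contiguous table.

```python
import struct

# 32-bit constants from the public specs. Table layout is an assumption of this sketch.
WORD_SIGNATURES = {
    "MD5/SHA-1 initial state": (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476),
    "SHA-256 initial state": (0x6A09E667, 0xBB67AE85),
}
# First 16 bytes of the AES forward S-box.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def find_word(blob: bytes, word: int):
    """Return sorted offsets where a 32-bit constant appears in either byte order."""
    offsets = []
    for fmt in ("<I", ">I"):
        needle = struct.pack(fmt, word)
        pos = blob.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = blob.find(needle, pos + 1)
    return sorted(offsets)


def fingerprint(blob: bytes):
    """Return the names of algorithms whose constants all appear in blob."""
    found = []
    for name, words in WORD_SIGNATURES.items():
        if all(find_word(blob, word) for word in words):
            found.append(name)
    if AES_SBOX_PREFIX in blob:
        found.append("AES S-box")
    return found
```

A match only says the constants are present, so it is worth following the cross-references in a disassembler to confirm the surrounding code really implements the algorithm.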
Breaking on Instruction/Memory Access
Perhaps there isn’t a simple reference to a string, or maybe a value is fetched dynamically from a socket. In this case, breaking on API calls related to the functionality can help (recv, MessageBoxExA, etc). If a binary has symbols, all the better. Symbol names can be descriptive and hint at what they are used for.
Note that some processors support hardware breakpoints, which raise a signal when a condition is met. This has the benefit of not having to modify the in-memory executable.
Conditional/Code Executing Breakpoints
This is a slightly more advanced version of simple breakpoints. The idea is that when an interesting API call is reached, you run some code to perform an action. This could be dumping registers/memory addresses, pausing the debugger when a condition is met, etc. Most debuggers have this functionality. ltrace/strace can be seen as a crude form of this.
Some code may be polymorphic (i.e., self-modifying at runtime). Taking a snapshot of the program in memory after it decrypts/unpacks itself can be useful. Scripts can be used to unpack programs, sometimes without the target binary even running. Snapshotting memory can also be useful for finding memory addresses which contain a value of interest. Game trainer tools often support this feature, such as for “freezing” your health points at the same value.
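The value-hunting side of snapshotting can be sketched as repeated filtering: scan one snapshot for the known value, change the value in the target, then keep only the offsets that hold the new value in the next snapshot. The helper below is a minimal illustration that assumes the value is stored as a 32-bit little-endian integer and that snapshots are plain bytes dumped from the same region; the name scan_for_value is hypothetical.

```python
import struct


def scan_for_value(snapshot: bytes, value: int, candidates=None):
    """Return the set of offsets holding value as a 32-bit little-endian integer.

    With candidates=None the whole snapshot is searched. Otherwise only the
    given offsets are rechecked, which narrows an earlier result set.
    """
    needle = struct.pack("<I", value)
    if candidates is None:
        hits = set()
        pos = snapshot.find(needle)
        while pos != -1:
            hits.add(pos)
            pos = snapshot.find(needle, pos + 1)
        return hits
    return {off for off in candidates if snapshot[off:off + 4] == needle}


if __name__ == "__main__":
    # Hypothetical snapshots: health is 100 at two offsets, then one drops to 75.
    before = bytearray(64)
    struct.pack_into("<I", before, 8, 100)
    struct.pack_into("<I", before, 40, 100)
    after = bytearray(before)
    struct.pack_into("<I", after, 8, 75)

    first_pass = scan_for_value(bytes(before), 100)
    second_pass = scan_for_value(bytes(after), 75, first_pass)
    print(sorted(first_pass), sorted(second_pass))
```

Each extra pass usually shrinks the candidate set quickly, and the surviving offset is a good place for a memory access breakpoint.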
Leveraging Scripting Languages for Easy Breakpoints
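When the target is an interpreted script, a built-in function can act as a convenient marker. In JavaScript, for example, there is the console.log() function. If you add the function call to the interpreted script, and then place a breakpoint on the function, it’s an easy way to debug what has changed between calls. Ex: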
console.log();
var words = 'hello world'.split(' ');
console.log();
Time Travel Debugging
This one is pretty cool. Debuggers such as WinDbg and GDB (although slow) allow for continually recording the state of a program while it executes. You can then cause the debugger to break, and then step through the program forward and backwards in time.
Programs such as AFL Unicorn and QEMU allow running cross-architecture code. This can be useful for fuzzing, or running code in an environment similar to that of the target device.
Recording & Reviewing Execution Traces
Recording Program Execution Traces
Taking this further, we can record program execution traces between two points. This could be done with Pintool, IDA, etc. Tools often allow you to specify the granularity at which traces should be recorded (functions, basic blocks, individual instructions, etc.). Lighthouse is useful for visualizing the traces.
Diffing Program Execution Traces
This one is really useful when you are unsure where code is, but you are pretty certain you can control when the code runs. The methodology is pretty simple: execute the program in such a way that the code of interest is executed. Next, perform a similar execution of the program, but don’t cause the interesting code to be executed. Diffing the execution traces can lead you to the code which ran in only one of the traces. Write or use a tool to do the diffing.
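If you end up writing the diffing tool yourself, the core is a set difference. The sketch below assumes each trace has been exported as text with one hexadecimal address per line (for instance, basic block start addresses); that format and the function names are assumptions for this example, not the output format of any particular tracer.

```python
def load_trace(lines):
    """Parse one hexadecimal address per line into an ordered list of ints."""
    return [int(line.strip(), 16) for line in lines if line.strip()]


def unique_to_first(trace_a, trace_b):
    """Return addresses executed in trace_a but never in trace_b.

    Results keep the order in which each address was first seen in trace_a.
    """
    seen_b = set(trace_b)
    emitted = set()
    unique = []
    for addr in trace_a:
        if addr not in seen_b and addr not in emitted:
            emitted.add(addr)
            unique.append(addr)
    return unique


if __name__ == "__main__":
    # Hypothetical traces: the first run triggers the feature, the second does not.
    with_feature = load_trace(["0x401000", "0x401010", "0x402000", "0x402008", "0x401010"])
    without_feature = load_trace(["0x401000", "0x401010"])
    for addr in unique_to_first(with_feature, without_feature):
        print(hex(addr))
```

If the target is loaded at a different base address on each run, the addresses need to be converted to module-relative offsets before diffing, otherwise every block looks unique.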
Automated Diffing of Program Execution Traces
I actually haven’t tested this out before, but it would be interesting. Say you are interested in finding related code across different versions of the same program. The idea would be to automate interactions with the program, so the diffing can be done automatically too. This could help if basic fingerprinting fails for whatever reason. Perhaps this could be leveraged to understand programs which implement custom bytecode interpreters, especially if they change instruction sets between releases.
Some disassemblers support data flow analysis on binaries. I find this to be often limited, but it can be helpful for finding low-hanging fruit (e.g., is snprintf called with a length argument that comes from a strlen() call, while writing to a stack-based buffer?). Constraint solvers such as Z3 can also be useful when you need to satisfy some set of conditions.