Empirical Analysis of Binary Obfuscations [Working]
Empirical Analysis of Binary Obfuscations [Working]
This is just my personal thought on binary obfuscation techniques. Typically, I want to understand these aspects:
- Effectiveness of each obfuscation technique: How well does it prevent reverse engineering? Can they be trivially de-obfuscated?
- Will compiler optimizations break the obfuscation, or enhance its obfuscation effect?
This is a quite ad-hoc analysis, not quite a paper-ready work (I won’t post this here if it is).
What I do NOT care:
- Performance overhead
Obfuscation Frameworks
Here, I use Polaris Obfuscator for the analysis.
De-obfuscation Tools
- https://github.com/cdong1012/ollvm-unflattener
- https://github.com/cq674350529/deflat
Metrics
- CFG Similarity: Typically, we use graph edit distance to measure the similarity between the CFG of an original binary function and the CFG of the obfuscated one.
Workflow
It is intended to be a combination of qualitative and quantitative analysis. For the quantitative part:
- Prepare dataset. Two types: real-world projects (e.g., coreutils, ffmpeg, libxml2, etc.) and synthetic code snippets (e.g., Csmith generated code).
- Apply obfuscation techniques to the dataset, and compile with different optimization levels (e.g., -O0, -O1, -O2, -O3).
- Use de-obfuscation tools to try to recover the original CFG.
- Use metrics proposed above for evaluation.
References
- https://plzin.github.io/posts/mba
- https://github.com/mazeworks-security/MSiMBA
- https://arxiv.org/pdf/2406.10016
This post is licensed under CC BY 4.0 by the author.