New Article in MDPI Machine Learning and Knowledge Extraction
Our colleague Sebastian Raubitzek, a researcher at SBA Research and member of the Security and Privacy Research Group at the University of Vienna, has published the journal article "Classification of Obfuscation Techniques in LLVM IR: Machine Learning on Vector Representations" in the MDPI journal Machine Learning and Knowledge Extraction, in collaboration with the CD lab AsTra.
Abstract
We present a novel methodology for classifying code obfuscation techniques in LLVM IR program embeddings. We apply isolated and layered code obfuscations to C source code using the Tigress obfuscator, compile the obfuscated programs to LLVM IR, and convert each IR code representation into a numerical embedding (vector representation) that captures intrinsic characteristics of the applied obfuscations. We then use two modern boosting classifiers to identify, from the vector representation, which obfuscation or layering of obfuscations was applied to the source code.
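To give a feel for what "converting IR into a vector" can mean, here is a minimal sketch that maps LLVM IR text to a normalized opcode-frequency vector. This is a deliberately simple stand-in and not the embedding used in the article; the opcode vocabulary `OPCODES`, the function name `opcode_vector`, and the sample IR are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, fixed opcode vocabulary (an assumption for this sketch).
OPCODES = ["alloca", "load", "store", "add", "sub", "mul",
           "xor", "icmp", "br", "switch", "call", "ret"]

# Optional "%name = " result assignment, then the first lowercase word.
_INSTR = re.compile(r"^\s*(?:%[\w.]+\s*=\s*)?([a-z_]+)\b")


def opcode_vector(ir_text):
    """Return relative opcode frequencies in the order given by OPCODES."""
    counts = Counter()
    for line in ir_text.splitlines():
        line = line.split(";", 1)[0]  # drop trailing IR comments
        match = _INSTR.match(line)
        if match and match.group(1) in OPCODES:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    return [counts[op] / total if total else 0.0 for op in OPCODES]


SAMPLE_IR = """
define i32 @f(i32 %x) {
entry:
  %a = add nsw i32 %x, 1
  %b = xor i32 %a, 7
  ret i32 %b
}
"""

vector = opcode_vector(SAMPLE_IR)
```

A toy vector like this only counts instructions; it ignores operands, types, and control flow, which a learned program embedding would typically capture.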
To better analyze classifier behavior and error propagation, we employ a staged, cascading experimental design that separates the task into multiple decision levels, including obfuscation detection, single-versus-layered discrimination, and detailed technique classification. This structured evaluation allows a fine-grained view of classification uncertainty and model robustness across the inference stages. We achieve an overall accuracy of more than 90% in identifying the types of obfuscations.
Our experiments show high classification accuracy for most obfuscations, including layered obfuscations, and even perfect scores for certain transformations, indicating that a vector representation of IR code preserves distinguishing features of the protections. In this article, we detail the workflow for applying obfuscations, generating embeddings, and training the model, and we discuss challenges such as obfuscation patterns covered by other obfuscations in layered protection scenarios.
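The staged, cascading design described in the abstract can be sketched roughly as follows. This is an illustration of the three decision levels only, not the authors' implementation: the stage functions are trivial rule-based stand-ins for trained classifiers, and the labels `technique_A`, `technique_B`, and `technique_C` are hypothetical placeholders.

```python
def cascade_predict(x, is_obfuscated, is_layered, single_clf, layered_clf):
    """Route one embedding through three decision levels.

    Stage 1: obfuscation detection.
    Stage 2: single-versus-layered discrimination.
    Stage 3: detailed technique classification.
    Returns a (coarse_label, detailed_label) pair.
    """
    if not is_obfuscated(x):
        return ("unobfuscated", None)
    if is_layered(x):
        return ("layered", layered_clf(x))
    return ("single", single_clf(x))


# Stand-in stage models (assumptions): each vector position marks one
# hypothetical technique, and a positive value means "signal present".
NAMES = ["technique_A", "technique_B", "technique_C"]


def stage1(v):
    return any(t > 0 for t in v)


def stage2(v):
    return sum(1 for t in v if t > 0) > 1


def stage3_single(v):
    return NAMES[v.index(max(v))]


def stage3_layered(v):
    return "+".join(name for name, t in zip(NAMES, v) if t > 0)


def predict(v):
    return cascade_predict(v, stage1, stage2, stage3_single, stage3_layered)
```

Because each stage only sees samples passed on by the previous one, a mistake at an early level propagates downward. Evaluating the levels separately is what makes that error propagation visible.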
Authors
Sebastian Raubitzek, Patrick Felbauer, Kevin Mallinger, and Sebastian Schrittwieser
