The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, etc. To determine such behaviors, a security analyst can significantly benefit from identifying the family to which an Android malware belongs, rather than only detecting if an app is malicious. Techniques for detecting Android malware, and determining their families, lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner.
To address these challenges, we present a novel machine learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% for detection of malware and an accuracy of 95% for determination of their families. We further demonstrate RevealDroid’s superiority against state-of-the-art approaches.
To access RevealDroid source code, you'll need two projects RevealDroid legacy code—which contains the package API extractor, native extraction code, and legacy code for handling Weka-based functionality—and the android-reflection-analysis code—which mostly handles reflection analyses and sklearn-based machine learning functionality.
To access the RevealDroid dataset (approximately 10GB in size), please follow this link.
To evaluate RevealDroid, we also compared it against state-of-the-practice commercial anti-virus (AV) products available on VirusTotal. We met or exceeded the accuracy values of 60 commercial AVs for our evaluation. Given that our technique utilizes machine learning, our technique learns to detect malware automatically, unlike many existing state-of-the-practice tools. Detailed results are available here.