The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, etc. To determine such behaviors, a security analyst can significantly benefit from identifying the family to which an Android malware belongs, rather than only detecting if an app is malicious. Techniques for detecting Android malware, and determining their families, lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner.
To address these challenges, we present a novel machine learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid’s superiority against state-of-the-art approaches.
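As a rough illustration of what a "categorized Android API usage" feature could look like, the sketch below counts how often an app invokes APIs from each category and emits a fixed-order count vector that a classifier could consume. It is not RevealDroid's implementation: the category map, the names `API_CATEGORIES`, `categorize`, and `feature_vector`, and the sample call list are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical category map (package prefix -> category label).
# RevealDroid's actual categorization is not described here.
API_CATEGORIES = {
    "android.telephony": "telephony",
    "android.location": "location",
    "android.net": "network",
    "java.lang.reflect": "reflection",
}

# Fixed feature order so every app yields a vector of the same shape.
CATEGORY_ORDER = sorted(set(API_CATEGORIES.values())) + ["other"]


def categorize(api_call):
    """Map a fully qualified API call to a category, or 'other'."""
    matches = [p for p in API_CATEGORIES if api_call.startswith(p + ".")]
    if not matches:
        return "other"
    # Prefer the longest (most specific) matching package prefix.
    return API_CATEGORIES[max(matches, key=len)]


def feature_vector(api_calls, categories=CATEGORY_ORDER):
    """Count API invocations per category, in a fixed order."""
    counts = Counter(categorize(call) for call in api_calls)
    return [counts.get(cat, 0) for cat in categories]


# Made-up list of API calls, as if extracted from one app.
calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "android.telephony.TelephonyManager.getDeviceId",
    "java.lang.reflect.Method.invoke",
    "android.location.LocationManager.getLastKnownLocation",
    "java.lang.String.length",
]
vector = feature_vector(calls)
print(dict(zip(CATEGORY_ORDER, vector)))
```

Vectors of this kind, one per app, could then be passed to any standard classifier. Counting per category rather than per individual method keeps the feature set small, which is in the spirit of avoiding large feature sets.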
To access the RevealDroid source code, you'll need two projects. The first is the RevealDroid legacy code, which contains the package API extractor, native extraction code, and legacy code for handling Weka-based functionality. The second is the android-reflection-analysis code, which mostly handles reflection analyses and sklearn-based machine learning functionality.
To access the RevealDroid dataset (approximately 10GB in size), please follow this link.
To evaluate RevealDroid, we also compared it against state-of-the-practice commercial anti-virus (AV) products available on VirusTotal. In our evaluation, RevealDroid met or exceeded the accuracy values of 60 commercial AVs. Because our technique uses machine learning, it learns to detect malware automatically, unlike many existing state-of-the-practice tools. Detailed results are available here.