Mobile Spyware Identification and Categorization: A Systematic Review



Introduction
The popularity of mobile phones has been increasing steadily. There were 7.26 billion (bn) mobile phone users in 2022, and this number is expected to reach about 7.49 bn by 2025 [1]. Of these, almost six bn are smartphone users, of whom 5.07 bn have access to the Internet [2]. Most of these users own smartphones running the Android operating system [3], while fewer use other operating systems such as iOS [4], [5]. Smartphones with Internet access are constantly exposed to cyber security threats. The most common of these threats are Trojans, worms, ransomware, and spyware, as shown in Figure 1 [6].
Figure 1: Common threats to a smartphone.
As its name suggests, spyware is a malicious program that spies on a user's device (in most cases, a smartphone). It is like a ghost in the machine [7]: residing on the smartphone without authorization, it can harm the device and its owner. These harms range from covertly using the victim's camera and microphone, recording usage patterns, and logging keystrokes to stealing banking and other credentials and even mining cryptocurrency on the victim's phone. Almost 26,000 cyber-attacks are carried out daily, and spyware accounts for a dominant share of them [8]. Most of these intrusions are economically motivated. In 2012, the US Senate Committee on Commerce, Science, and Transportation reported that one in every eleven smartphones was affected by spyware [9]. This share has very likely grown since then: the total damage is now anticipated to reach $10 trillion (tr) by 2025 [10], and a survey shows that 85% of phones are affected by spyware [11].

Past works
Researchers are making considerable progress in detecting, classifying, and combating these threats to smartphones, and some have surveyed and summarized the findings of others. Several of these works are discussed here. B. Amro et al. reviewed the existing malware detection techniques for mobile phones, considering the two major mobile operating systems (OS), Android and iOS, and summarized the advantages and disadvantages of each technique [12].
In another study, Y. S. I. Hamed et al. discussed cloud-based intrusion detection systems (IDS) for mobile malware. The authors concluded that, because a mobile phone has limited processing capacity while an IDS is processing-intensive, cloud-based solutions are more viable [13].
Z. Wang et al. surveyed developments in deep learning for malware detection. The authors covered all types of Android malware and delineated different aspects of applying deep learning to malware detection [14]. R. Vinayakumar et al. surveyed classical Machine Learning (ML) algorithms for malware detection [15]. In this research, classical ML algorithms for malware detection, classification, and categorization were evaluated on two datasets, one public and one private. Different timescales were used in the experimental analysis to remove dataset bias. The authors then proposed a model that combines an image processing technique with optimally tuned ML parameters, addressing the time-consuming nature of current malware detection algorithms and providing an effective zero-day malware detection solution.
M. Ashawa et al. reviewed the malware detection techniques used for Android phones and concluded that most of them are ineffective at detecting obfuscated malware. The authors closed with a critical review of each technique [16]. E. M. Karanja et al. surveyed the literature on malware attacks and their detection in the Internet of Things (IoT), discussing the characterization, propagation, and analysis tools of IoT malware [17].
The literature available so far has focused on general malware detection. Moreover, the existing overview papers are not specifically aimed at mobile phone malware, and they consider every type of attack, such as Denial of Service (DoS), phishing, and spoofing, rather than spyware alone. Despite the severity of the threat that spyware poses, very little work has reviewed its literature.

Rationale of this research
Neutralizing spyware is therefore a critically important task. Scientists regularly investigate new and more effective methods of combating it, while spyware, in turn, keeps changing its signatures [18]. Nevertheless, spyware detection falls into a few broad categories. These are largely the same as those used for other types of malware, although the details differ: (i) static methods, (ii) dynamic methods, (iii) hybrid methods, and (iv) machine learning methods.
Researchers have regularly investigated spyware attack vulnerabilities in mobile phones, but these techniques have yet to be combined and analyzed so that they can serve as a reference for further research. This paper comprehensively surveys the techniques used to detect and identify spyware in smartphones. Existing survey and overview papers (some of which are discussed above) cover every type of malware; the rationale of this paper is that it focuses specifically on the techniques used for detecting and combating spyware. It presents a literature survey of modern methods for spyware detection in mobile phones and identifies their strengths and drawbacks.

Research questions
The rest of the paper is organized as follows: Section 2 discusses the background of spyware variants, dataset acquisition, the security architecture of Android phones, and methods of detecting spyware; Section 3 describes the methodology used for this research; Section 4 reviews the related literature in detail; and Sections 5 and 6 provide the discussion and conclusions, respectively.

Background
Before turning to the spyware detection methods in the literature, some background is needed on spyware types, the Android security architecture, dataset acquisition, and the methods used to detect and identify spyware.

A. Variants of spyware
Spyware is a type of malware used for spying and espionage. Because it changes its signatures through obfuscation, the exact number of spyware variants is difficult to determine. Known examples include 'SW.SecurePhone', 'SW.Qieting', 'mSpy', 'Flexispy', 'GnatSpy', and 'Android APT Spy' [19], [20]. All of these can nevertheless be grouped into broad categories. Some notorious categories of Android spyware are as follows.
• Spybots: This type of spyware monitors user patterns, gathers information about the user's activities, and transmits it to third parties without the user's consent. It can reside on the user's device, inside a seemingly useful application, or in a browser extension [21].
• Cookies: When cookies act as spyware, they transmit the user's web-browsing behavior to unauthorized parties. This is passive spyware that relies on existing web browser functions [22].
• System monitors: System monitors are generally used with good intentions, recording user actions for later system diagnostics. When acting as spyware, however, they can expose these recorded activities to others. Keyloggers are examples of system monitors that steal user information [23].
• Browser hijackers: This type of spyware changes the user's browser settings and preferences and then alters website content according to the spyware author's wishes [24].
• Miners: This emerging type of spyware uses the host phone to mine cryptocurrency. It runs constantly in the background and can drain the phone's resources.
• Code for malware: Spyware can also carry covert code for installing other malware, such as Trojans and viruses.
• Legitimate spyware: These applications are used for spying on intimate partners or children, and some serve dual purposes, e.g., 'find-my-phone' and 'Hello spy' [25].

B. Android security architecture/features
Since Android dominates the smartphone market and most of the literature investigates spyware in Android, it is necessary to discuss the Android security architecture. Some of its key features and components are as follows:
• The Linux kernel, which is the most important part of its security.
• Secure communication among different processes.
• Signing of every application.
• Two types of permissions: those granted by the user and those defined by the application [26].
• Sandboxing of all applications [27].
• Defense in depth.
• Security built into the design [28].

C. Dataset acquisition
Acquiring a dataset is a prerequisite for spyware detection experiments. Some researchers prefer datasets acquired from real mobile phones, while others use datasets generated in virtual environments [29]. Some of the popular datasets are highlighted below.
• Derbin4000: This is a publicly available dataset of 4000 benign and the same number of malicious samples.It contains samples of many types of malware and can be filtered for spyware samples [30].
• AMD project: The Android Malware Dataset (AMD) consists of 24,553 malicious samples. Since it covers 71 malware families, it too can be filtered to obtain spyware samples [31].
• AAGM Dataset: This dataset was collected by installing 1,900 applications on smartphones. It is among the most viable datasets for spyware detection in smartphones [32].
• M0Droid Dataset: This dataset was obtained using the M0Droid tool and consists of data recorded at the kernel level of smartphones. It contains signatures of different malware families, one of which is spyware [33].
• Self-Recorded Datasets: Some researchers prefer to capture their own data using sniffing tools such as Wireshark [34], TCPdump [35], NetworkMiner [36], and Kismet [37]. With these tools, researchers can capture traffic from scenarios of their choice. For spyware specifically, such captures can help track down new spyware variants.
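Self-recorded captures are usually summarized into flow-level features before a classifier sees them. The following Python sketch assumes packets have already been exported from a tool such as Wireshark or TCPdump into simple (timestamp, source IP, destination IP, size) tuples; that record layout, the grouping by remote host, and the five features are illustrative assumptions rather than the feature set of any dataset listed above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet record exported from a capture tool:
# (timestamp in seconds, source IP, destination IP, size in bytes).
Packet = tuple[float, str, str, int]


def flow_features(packets: list[Packet], device_ip: str) -> dict[str, dict[str, float]]:
    """Group packets by the remote host the device talks to and summarize each flow."""
    flows = defaultdict(list)
    for packet in packets:
        _, src, dst, _ = packet
        remote = dst if src == device_ip else src
        flows[remote].append(packet)

    features = {}
    for remote, pkts in flows.items():
        times = [ts for ts, _, _, _ in pkts]
        sizes = [size for _, _, _, size in pkts]
        total = sum(sizes)
        # Bytes sent from the device; upload-heavy flows may hint at exfiltration.
        uplink = sum(size for _, src, _, size in pkts if src == device_ip)
        features[remote] = {
            "packets": len(pkts),
            "total_bytes": total,
            "mean_size": mean(sizes),
            "duration_s": max(times) - min(times),
            "uplink_ratio": uplink / total if total else 0.0,
        }
    return features
```

For example, three packets exchanged between a device at 10.0.0.2 and a server at 203.0.113.5 (1,200 and 1,500 bytes sent, 300 bytes received, over two seconds) give one flow with 3 packets, 3,000 bytes, a mean size of 1,000 bytes, and an uplink ratio of 0.9.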

D. Detection and identification methods of spyware
A variety of methods are used for spyware detection. The four prevalent methods for detecting and identifying spyware are presented in Figure 2 and briefly discussed here:
• Static method: This method analyzes a program's code, without executing it, to locate its malicious parts. Reverse engineering is used to extract these parts, which are then used to identify future spyware intrusions: programs resembling them are classified as spyware. Tools such as IDA Pro and OllyDbg are used to identify the malicious code [38]. Popular techniques for static spyware analysis include fingerprinting, file format inspection, and disassembly, as shown in Figure 2.

• Dynamic method: This method detects spyware from its function and behavior while it runs. A model trained on past data recognizes spyware behavior, so spyware can be identified at runtime. The techniques employed mainly trace functions, their parameters, and control flow [39]. The major tools for dynamic analysis include Sandbox [40], RegShot [41], and Process Explorer [42].
• Hybrid method: As the name suggests, the hybrid method combines the static and dynamic methods to benefit from both. It first runs static analysis and then assesses whether the program shows any signs of spyware behavior at runtime [38].
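The difference between the static, dynamic, and hybrid methods can be illustrated with a minimal Python sketch. The signature set, the list of suspicious runtime events, and the threshold of two events are hypothetical stand-ins; real static analysis works on disassembled code and real dynamic analysis on full execution traces, not on a whole-file hash and a list of event names.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of samples already confirmed
# as spyware (static fingerprinting). A real system would load a curated feed.
KNOWN_SPYWARE_SHA256 = {
    hashlib.sha256(b"example-spyware-payload").hexdigest(),
}

# Hypothetical runtime behaviours treated as suspicious by the dynamic step.
SUSPICIOUS_EVENTS = {"record_audio", "capture_screen", "read_sms", "hide_icon"}


def static_check(apk_bytes: bytes) -> bool:
    """Static step: flag the sample if its fingerprint is already known."""
    return hashlib.sha256(apk_bytes).hexdigest() in KNOWN_SPYWARE_SHA256


def dynamic_check(trace: list[str], threshold: int = 2) -> bool:
    """Dynamic step: flag the sample if its runtime trace contains at least
    `threshold` distinct suspicious behaviours."""
    return len(SUSPICIOUS_EVENTS.intersection(trace)) >= threshold


def hybrid_check(apk_bytes: bytes, trace: list[str]) -> str:
    """Hybrid: cheap static lookup first, behavioural evidence if that fails."""
    if static_check(apk_bytes):
        return "spyware (static match)"
    if dynamic_check(trace):
        return "spyware (suspicious behaviour)"
    return "no evidence of spyware"
```

Because the static step compares exact digests, changing a single byte of a sample, as obfuscation does, defeats it; the sketch then falls back on behavioural evidence, which is the motivation for hybrid analysis.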

Related work
The boom in mobile phones has attracted researchers' attention, and a rich literature is available on them, such as [49], [50], [51], [52], [53], [54], [55], [56], [57], [58]. As far as generic malware detection is concerned, there is also ample material, e.g., [59], [60], [61], [62], [6]. Mobile spyware detection in particular, however, has received very limited research. The primary reason is that research has addressed general mobile malware detection rather than spyware detection. In addition, since Android dominates the mobile customer base, it also dominates the research arena. This survey follows the same pattern and is dominated by research on detecting spyware in Android phones; nonetheless, it also includes research on other mobile OSs such as iOS. The most recent works are discussed below.
M. Naser et al. adopted a novel approach to identifying spyware in Android phones. They applied three ML models, SVM, NB, and Fine Decision Trees (FDT), to a novel spyware dataset containing 168,501 spyware instances. FDT was the most accurate classifier, with an accuracy of 98.2% [63].
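Of the three classifiers compared by M. Naser et al., Naive Bayes is the shortest to write out. The sketch below is a from-scratch Gaussian Naive Bayes in Python, not the authors' implementation; the two features (number of sensitive permissions requested and kilobytes uploaded per hour) and the toy training values are hypothetical and do not come from the cited dataset. In practice a library implementation would normally be used.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are assumed independent and
    normally distributed within each class."""

    def fit(self, X: list[list[float]], y: list[str]) -> "GaussianNaiveBayes":
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.log_priors = {c: math.log(len(rows) / len(X)) for c, rows in by_class.items()}
        # Per class and feature: (mean, variance); a tiny constant avoids zero variance.
        self.params = {
            c: [(fmean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
            for c, rows in by_class.items()
        }
        return self

    def _log_joint(self, row: list[float], c: str) -> float:
        """Unnormalized log posterior: log prior plus Gaussian log likelihoods."""
        score = self.log_priors[c]
        for x, (mu, var) in zip(row, self.params[c]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return score

    def predict(self, X: list[list[float]]) -> list[str]:
        return [max(self.params, key=lambda c: self._log_joint(row, c)) for row in X]
```

Trained on three benign rows such as `[3, 10]` and three spyware rows such as `[15, 300]`, the model labels `[3, 11]` as benign and `[15, 310]` as spyware.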
E. Liu et al. published comprehensive research in this direction. The study traces three main mechanisms of spyware: how it abuses Application Programming Interfaces (APIs), how it steals users' personal information through these APIs, and how it evades detection by hiding the application's presence. The authors examined 14 popular consumer spyware applications and described eight malicious capabilities through which spyware steals information, evades detection, and persists on a phone, proposing a mitigation for each. These capabilities include covert use of the camera, invisible access to the microphone, screen recording, and hiding the malicious app's icon or replacing it with the icon of a popular, useful app. JADX was used to decompile the source code [64]. The results section of the study presents the threats revealed in the experiments, their threat models, and how vulnerable each app is to each threat [65].
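After decompilation with a tool such as JADX, the recovered Java source can be searched for API calls associated with the capabilities described above. The Python sketch below is a deliberately simple, pattern-based illustration and not the analysis performed by E. Liu et al.; the capability-to-API mapping is an assumption, although the names it matches (`Camera.open`, `CameraManager.openCamera`, `MediaRecorder.setAudioSource`, `MediaProjectionManager.createScreenCaptureIntent`, `MediaProjection.createVirtualDisplay`, and `PackageManager.setComponentEnabledSetting` with `COMPONENT_ENABLED_STATE_DISABLED`) are existing Android framework APIs.

```python
import re

# Illustrative mapping (an assumption for this sketch) from spyware capabilities
# to Android framework calls that commonly implement them.
CAPABILITY_PATTERNS = {
    "covert camera use": r"\bCamera\.open\(|\.openCamera\(",
    "microphone access": r"\bsetAudioSource\(\s*MediaRecorder\.AudioSource\.MIC\b",
    "screen recording": r"\bcreateScreenCaptureIntent\(|\bcreateVirtualDisplay\(",
    # [^;]* lets the call span several lines but stops at the end of the statement.
    "icon hiding": r"\bsetComponentEnabledSetting\([^;]*COMPONENT_ENABLED_STATE_DISABLED\b",
}


def scan_decompiled_source(java_source: str) -> set[str]:
    """Return the capabilities whose API patterns occur in decompiled Java code."""
    return {
        capability
        for capability, pattern in CAPABILITY_PATTERNS.items()
        if re.search(pattern, java_source)
    }
```

Textual matching of this kind is easily defeated by reflection or string encryption, which is one reason the literature reviewed here also relies on dynamic and traffic-based methods.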
M. K. Qabalin et al. adopted a two-pronged approach to investigating mobile spyware: acquiring a novel spyware dataset and detecting Android spyware. The dataset was acquired from five commercially available spyware applications: mSPY [66], UMobix [67], MobileSPY [68], FlexiSPY [69], and TheWiSPY [70]. All spyware applications were run with their full functionality, and the traffic they generated was recorded with the packet sniffer PCAPdroid. The dataset is available in both PCAP and CSV formats, with class A for normal traffic, class B for spyware installation traffic, and class C for typical spyware traffic. ML models were then trained to detect spyware traffic, with different models applied independently to each spyware variant; RF performed best among them. Overall, the binary classifier achieved an accuracy of 79% and the multi-class classifier 77% [71].
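The relation between the multi-class labels (A, B, C) and the binary task can be made concrete: collapsing B and C into a single spyware class means that confusing installation traffic with typical spyware traffic no longer counts as an error. The Python sketch below scores one hypothetical set of predictions both ways; it only illustrates how the two labelings relate and is not how M. K. Qabalin et al. trained or evaluated their separate classifiers.

```python
# Class labels as described for the dataset above:
# A = normal traffic, B = spyware installation traffic, C = typical spyware traffic.
TO_BINARY = {"A": "normal", "B": "spyware", "C": "spyware"}


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of positions where prediction and ground truth agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_and_multiclass_accuracy(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Score multi-class predictions directly and after collapsing B and C."""
    multi = accuracy(y_true, y_pred)
    binary = accuracy([TO_BINARY[t] for t in y_true], [TO_BINARY[p] for p in y_pred])
    return binary, multi
```

With true labels A, B, C, C and predictions A, C, C, A, the multi-class accuracy is 0.5 while the binary accuracy is 0.75, because the B-versus-C confusion disappears after collapsing.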
F. Pierazzi et al. contributed another study on identifying spyware in Android phones, with three objectives: first, distinguishing spyware from goodware; second, attributing spyware to its specific family; and finally, automatically selecting the key features that indicate whether a malicious program is spyware or another type of malware. Static analysis was combined with ML/DL in the experiments. The authors proposed a novel model, Ensemble Late Fusion (ELF), that incorporates many ML classifiers into one. VirusTotal [72] was used to create a dataset of 15,000 malware samples in total. Predictions are first made by six commonly used traditional classifiers and four DL classifiers, and ELF then makes its final prediction based on the predictions of the traditional classifiers. ELF achieved an F1 score of 0.982 and an Area Under the Curve (AUC) of 0.982 for spyware vs. goodware prediction, and an F1 score of 0.960 and an AUC of 0.963 for spyware vs. other malware prediction [73].
D. Harkin et al. carried out interesting research comparing whether Android or iOS is more susceptible to compromise by spyware. The authors maintain that Android users are more prone to spyware victimization than iOS users, arguing that Android offers more 'openness' while iOS is more 'closed'. Nine general spyware applications, such as mSpy and Trackview, were considered. Based on five reasons, the authors concluded that Android phones are more vulnerable to spyware attacks than iOS phones. They further concluded that the main source of vulnerability in both operating systems is their design philosophy: Android is permissive, whereas iOS is more reserved [74].
H. M. Salih et al. further enriched the literature. In this research, a fake game application was developed and installed on an Android phone for spying purposes. The spyware has three components: an Android application that performs the spying, a desktop application for controlling the victim's phone, and a database that stores the victim's information. While the Android application steals information from the user, the desktop application can take control of many phone features. The authors drew the following lessons: the names of spyware applications should be stored in a database against which every newly installed application is matched; Google and other major vendors should take serious action against attackers; memory encryption should be ensured; and anti-virus applications should be used [75].
M. Conti et al. proposed an approach to identifying spyware from the network traffic it generates, called ASAINT (A Spy App Identification System based on Network Traffic), and tested it on both Android and iOS. For the experiment, the researchers first set up a network with a gateway, an Access Point (AP), and several nodes. Traffic was captured with Wireshark for 73,33 hours, and the final dataset contained 3,365 instances. Identification was performed with three ML algorithms: RF, Logistic Regression (LR), and K-Nearest Neighbors (KNN). RF achieved an F1 score of up to 0.92, while LR was the most time-efficient classifier. The overall accuracy was about 85% [76].
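Of the three algorithms used in ASAINT, k-nearest neighbours is short enough to show in full. This Python sketch assumes each captured flow has already been reduced to a numeric feature vector scaled to a comparable range; the two features, their values, the app labels, and k = 3 are hypothetical. Scaling matters because Euclidean distance would otherwise be dominated by the feature with the largest numeric range, such as byte counts.

```python
import math
from collections import Counter


def knn_predict(
    train_X: list[list[float]],
    train_y: list[str],
    query: list[float],
    k: int = 3,
) -> str:
    """Classify `query` by majority vote among its k nearest training flows."""
    # Pair each training flow's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With three flows labelled "other app" clustered near (0.1, 0.2) and three labelled "spy app" near (0.85, 0.9), a new flow at (0.82, 0.88) is classified as "spy app".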
F. Fasano et al. proposed a novel method of detecting Android spyware based on temporal logic. Their framework relies on the formal method of model checking. The model checker takes two inputs, a Labelled Transition System (LTS) and a temporal logic formula, and returns true if the formula is satisfied and false otherwise. The mu-calculus [77] was used for model checking. For the implementation, a dataset of 80 applications from 26 categories was collected, and malicious copies of these apps were generated using the Android Framework for Exploitation (AFE) [78] together with DroidChameleon [79]. Applying temporal-logic-based spyware detection to this dataset yielded an accuracy of 0.98 (98%) [80].
S. Hutchinson et al. experimented with forensic analysis of spyware. The researchers took a spyware application of the Android.spy.277.origin [81] family, obtained from GitHub, and installed it on an emulator to analyze its permissions, code, and traffic; the generated traffic was captured with Wireshark. First, its code was analyzed, in particular the AndroidManifest.xml file, to determine the permissions it requires. Then, its manipulation of information such as email and messages was monitored. Finally, the application was installed on a real phone to test whether Play Protect would stop it. The first installation succeeded without any problem; on the second attempt, the app was flagged as malicious by Play Protect; and on the third attempt, it could not be installed. The authors concluded by proposing a framework for future forensic investigations that combines static, dynamic, and network analysis of the application under investigation to give a clear picture of its behavior [82].
R. Zhang et al. applied reverse engineering to exploit a vulnerability in Android phones. The authors used AI to carry out a stealthy voice-based attack, called Vaspy, in which the spyware imitates the activation voice of voice assistants and uses ML to select a suitable time for the attack. The spyware was tested against VirusTotal and against three prevalent Android malware detectors: Drebin, DroidAPIMiner, and MaMaDroid. The disguised spyware proved resilient against these detectors [83].
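The model-checking approach of F. Fasano et al. described above decides whether a labelled transition system satisfies a temporal formula. As a toy illustration in Python, the sketch below checks only one fixed property shape on a hypothetical app model: whether some run performs an action a and later an action b (for example, reading contacts and then sending an HTTP request). In the modal mu-calculus this property can be written μX.(⟨a⟩μY.(⟨b⟩tt ∨ ⟨−⟩Y) ∨ ⟨−⟩X), where ⟨−⟩ means "after some action". A full model checker evaluates arbitrary formulas; the LTS, action names, and property here are assumptions for illustration.

```python
from collections import deque

# A labelled transition system (LTS): state -> list of (action, next_state).
LTS = dict[str, list[tuple[str, str]]]


def eventually_followed_by(lts: LTS, initial: str, first: str, then: str) -> bool:
    """Check whether some run from `initial` performs `first` and later `then`.

    Breadth-first search over (state, phase) pairs, where phase 0 means
    `first` has not been performed yet and phase 1 means it has.
    """
    start = (initial, 0)
    seen, queue = {start}, deque([start])
    while queue:
        state, phase = queue.popleft()
        for action, nxt in lts.get(state, []):
            if phase == 1 and action == then:
                return True
            nxt_phase = 1 if (phase == 1 or action == first) else 0
            if (nxt, nxt_phase) not in seen:
                seen.add((nxt, nxt_phase))
                queue.append((nxt, nxt_phase))
    return False
```

On a model where state s1 can read contacts and move to s2, from which an HTTP post is possible, the check for "read_contacts then http_post" returns True, while the reversed order returns False.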
R. Chatterjee et al. unearthed another interesting aspect of mobile spyware: how some applications are used, overtly or covertly, for intimate partner violence. The research has two facets: how many undetected applications are available for spousal surveillance, and how some surveillance applications turn out to be dual-purpose, i.e., both legitimate and covert. To answer these questions, the authors searched for such applications with many keywords, such as "track my wife", which returned more than 27,000 URLs (Uniform Resource Locators). Among these, more than 10,000 applications were found, and an ML algorithm trained to filter out irrelevant applications achieved 93% accuracy. Finally, 61 on-store and nine off-store applications were selected for in-depth analysis. Existing anti-spyware products from major vendors such as AVG [84] and McAfee [85] could not correctly classify the applications that had been manually labelled as spyware, with a detection accuracy of a mere 3%. According to the authors, Google has since started improving its security as a result of this research [86].
M. H. Saad et al. conducted another promising experiment. The authors developed a traffic-intercepting malicious application, which they call a 'chameleon'. When installed on an Android system, it acts as a man in the middle: it intercepts incoming SMS messages and incoming and outgoing calls, and transmits the recorded information to a cloud database. To detect this spyware, the authors proposed a dynamic fuzzing-based detection model named 'DroidSmartFuzzer'. They also built a real environment for observing the behavior of the spyware and empirically compared the obtained results with real results. DroidSmartFuzzer was tested against 20 spyware applications, some free and others proprietary [87].
H. Abualola et al. targeted an attention-grabbing aspect of spyware: how a 'notification listener' can be used to exploit an Android phone's security. The applications targeted were WhatsApp, Facebook Messenger, BBM, and SMS. An 'SMS Backup' application was installed and granted notification-listening permission, and the notifications it captured were forwarded to the attacker's email. The authors successfully exploited Android's notification-listening capability: in Android 4.3, it could be exploited for all four services, whereas in Android 5.0 it could be exploited only for SMS and BBM notifications. The authors suggested that BBM should change its notification structure and that Android should revisit its permission mechanism [19].
P. Kaur et al. proposed a novel hybrid approach for detecting spyware in Android phones. In their methodology, a broadcast receiver listens for any new application installation or any update of an existing application. Upon receiving a new or updated application, the broadcast receiver locates its .apk file and reverse-engineers it. The researchers considered various applications in their experiment, which were scanned with existing antivirus software and validated with the proposed solution. The solution analyzes three aspects of an application, namely its description, its interface, and its source code, to classify it as spyware or not. Each aspect carries a certain weight in the decision, with source code analysis weighted highest at 70%. The results show that the proposed solution performed better than existing antivirus software in some cases [88].
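The weighted decision described above can be sketched in a few lines of Python. Only the 70% weight for source code analysis comes from the study; splitting the remaining 30% equally between description and interface analysis, scoring each aspect in [0, 1], and the 0.5 decision threshold are assumptions made for illustration.

```python
# Only the 0.70 weight for source code comes from the study described above;
# the equal 0.15/0.15 split of the remainder is an assumption.
WEIGHTS = {"description": 0.15, "interface": 0.15, "source_code": 0.70}


def spyware_score(aspect_scores: dict[str, float]) -> float:
    """Combine per-aspect suspicion scores (each in [0, 1]) into one score."""
    return sum(WEIGHTS[aspect] * aspect_scores[aspect] for aspect in WEIGHTS)


def classify(aspect_scores: dict[str, float], threshold: float = 0.5) -> str:
    """Label the app as spyware when the weighted score reaches the threshold."""
    return "spyware" if spyware_score(aspect_scores) >= threshold else "benign"
```

With these weights, an app whose source code alone scores 0.8 is flagged (score 0.56), while an app with a fully suspicious description and interface but a source-code score of 0.2 is not (score 0.44), showing how the source-code weight dominates the decision.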
Z. Zhang et al. attempted to enhance the security of cameras on Android phones. To this end, the authors developed an application that spies on the user through the phone's camera while evading the three traditional ways of detecting camera spying: API auditing, anti-spyware, and Mobile Device Management (MDM). Such an attack is called a transplantation attack. To mount it, the authors repackaged an existing application that held camera permission. The app was tested on 69 phones from 8 vendors running different Android versions and achieved a success rate of almost 46%, suggesting that nearly half of phones may be susceptible to transplantation attacks. To defend against such attacks, the authors proposed a two-step solution: separating permissions from group IDs and implementing an SEAndroid policy [89].

Permission analysis. Advantage: spyware requests specific permissions that can be traced. Disadvantage: some benign apps also need the same permissions.
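Permission analysis usually starts from the permissions an app declares. The Python sketch below assumes the binary AndroidManifest.xml has already been decoded to plain XML (for example with a decompiler such as JADX); the watch-list of sensitive permissions is an illustrative assumption, and benign apps may request the same permissions. It also checks whether a service is protected by `BIND_NOTIFICATION_LISTENER_SERVICE`, the mechanism behind the notification-listener attack discussed earlier.

```python
import xml.etree.ElementTree as ET

# Attribute names in a decoded manifest live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Illustrative watch-list (an assumption): permissions frequently requested by
# spyware, but also by many benign apps, so a hit is a lead, not a verdict.
SENSITIVE_PERMISSIONS = {
    "android.permission.READ_SMS",
    "android.permission.RECORD_AUDIO",
    "android.permission.CAMERA",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_CALL_LOG",
}


def requested_permissions(manifest_xml: str) -> set[str]:
    """Names of all <uses-permission> entries in a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    names = (elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission"))
    return {name for name in names if name}


def sensitive_permissions(manifest_xml: str) -> set[str]:
    """Requested permissions that appear on the watch-list."""
    return requested_permissions(manifest_xml) & SENSITIVE_PERMISSIONS


def declares_notification_listener(manifest_xml: str) -> bool:
    """True if a <service> is protected by BIND_NOTIFICATION_LISTENER_SERVICE,
    i.e. the app can read other apps' notifications once the user enables it."""
    root = ET.fromstring(manifest_xml)
    return any(
        service.get(ANDROID_NS + "permission")
        == "android.permission.BIND_NOTIFICATION_LISTENER_SERVICE"
        for service in root.iter("service")
    )
```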

Discussions
This paper introduced the latest techniques used to detect spyware in mobile phones and presented their pros and cons. These techniques include ML algorithms, behavior-based techniques, traffic analysis, and permission analysis. ML algorithms are precise and accurate, but they suffer from false positives and false negatives. Behavior-based techniques can readily recognize applications that exhibit spyware behavior; however, some spyware applications behave surreptitiously enough to escape them. Traffic analysis can detect spyware from the characteristics of the traffic it generates; nevertheless, legitimate spyware applications generate the same kind of traffic. Permission analysis can trace the specific permissions that spyware requests, but some benign apps require the same permissions. The analyzed papers also show that most experiments were performed on datasets of premeditated spyware and in virtual environments, and only in very few instances was data collected from real environments. This is problematic because spyware in real environments may vary much more than in virtual ones.
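The trade-off between accuracy and false positives or negatives is easiest to see from a confusion matrix. The Python sketch below derives the usual detection metrics from true/false positive and negative counts; the example counts in the note that follows are hypothetical.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive standard detection metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called detection rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```

For instance, with 90 true positives, 5 false positives, 895 true negatives, and 10 false negatives, accuracy is 98.5%, yet the false negative rate is 10%: one in ten spyware samples goes undetected. Because benign apps vastly outnumber spyware in practice, a high accuracy figure alone can hide such misses.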
Moreover, most of the literature focuses on mobile malware in general, and spyware in particular is the least studied. Spyware deserves the most attention because of its stealthy nature and covert mode of operation. Furthermore, spyware issues are investigated mostly in Android phones, since Android is the platform most targeted by spyware perpetrators owing to its popularity and open nature. Researchers should recognize that spyware also needs to be investigated on other platforms, such as iOS and IoT devices.
In the literature so far, most identification methods rely on behavioral detection, which has proved very effective. Other methods need to be explored as well.
Cloud-based IDS have received very little attention from researchers, and they deserve more focus. Such systems relieve mobile phones of intensive processing, and processing power is the scarcest resource on a mobile phone [13].
H. Al Bazar et al.
Researchers must also consider fast detection solutions, because slow detection may give an attacker enough time to accomplish the mission before the system recognizes it as a threat.

Conclusions
The increasing popularity of mobile phones brings many security challenges, among which spyware is the most prevalent. Spyware can harm the victim's device directly by stealing information or indirectly by opening the way for other malicious software. Researchers have been trying to curb this menace, and some research has been carried out on the issue so far. This paper surveyed many techniques for detecting spyware in mobile phones, analyzing the most recently proposed methods and the results each achieved, and discussed the most relevant ones. It is intended to serve as a reference point for researchers in the mobile spyware domain.

• What is the state of the art of spyware detection in mobile phones?
• What are the advantages and disadvantages of the state of the art for spyware detection in mobile phones?

Table 1: Search results of different keywords.
Detection of spyware in mobile phones: D. Harkin et al.; M. H. Saad et al.; H. Abualola et al.; M. K. Qabalin et al.; R. Zhang et al.
• Machine learning method: This method uses ML algorithms to classify an encountered intrusion as spyware or not. A dataset of real spyware instances or virtually generated spyware traffic is recorded, and an ML classifier is trained on certain features of this data. After training, the model is used to predict future spyware. Popular techniques include Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) [43].
Figure 2: Overview of spyware detection methods.

Methodology
While conducting this research, IEEE Xplore [44], MDPI [45], ScienceDirect [46], and ACM [47] were consulted as the main sources. The keywords searched were 'mobile spyware', 'Smartphone spyware', 'Android Spyware', and 'Spyware detection'. Most of the results obtained for these keywords overlapped, as shown in Table 1. Only papers published after 2015 were considered, for two reasons. First, mobile phones have greatly improved their security, so research from before 2015 may be of little use now (for instance, N. Xu et al. investigated spyware issues in 3G [48], whereas we are now in the era of 5G). Second, very little research on mobile spyware can be found from before 2015, as the smartphone boom largely came later. This paper arranges the cited works chronologically, as shown in Table 2. A summary of the techniques used is also provided.

Table 2: A summary of spyware detection methods.

Table 1: Advantages and disadvantages of different techniques used for spyware detection.