
On Malware, Computer Security & Digital Life

This is a paper for our CS 253 class, Computer Security. It’s a lengthy, fairly formal paper about computer security as it relates to malware. Before the full text, here are some striking quotes from it.

Truly, computer security has become an arms race where data is both the target and the weapon. [Okay, this one is actually mine.]

This [malware] modularity also means a new era for malware developers, with a new business model of distributed labor where malware developers can work simultaneously on different parts of the system, and modules can be sold on underground markets.

Some covert channels on the Android platform are the vibration settings, which all apps can modify. Soundcomber encodes the data as a sequence of vibration setting changes, and the paired Trojan detects these changes and decodes them back into data.

International sports organizations were also compromised, which is rather strange. However, once one notes that these include the International Olympic Committee and the World Anti-Doping Agency, compromised in the lead-up to and aftermath of the 2008 Olympics, one can argue that the intruder is a state actor, given the lack of obvious commercial benefit from such an act.

Here’s the paper:

On Malware, Computer Security & Digital Life

The danger of malware is pervasive. The days when malware was easily recognizable and openly bragged about by computer hobbyists are gone. Nowadays, malware is a tool for corporate espionage, state-sponsored cyberattacks, activism, fraud, and even revenge attacks by ex-employees [1]. Malware is also becoming more complex: we see examples of malware moving laterally across a network on its own, changing its own structure to evade detection, and even self-destructing so that no trace can be found [2]. Perhaps equally alarming, reports indicate a thriving underground market for malware modules, sold much like add-on features for legitimate software. These modules can make malware resistant to a particular type of detection, make it viable on new computer architectures, provide it with dubious digital signatures, or add malicious functionality such as keylogging, remote access tools (RATs), and botnet capabilities.

Cybersecurity has become increasingly difficult to maintain with traditional reactive methods such as file signatures or labor-intensive reverse engineering. In fact, Symantec, one of the largest antivirus vendors, has admitted that its products detect only approximately 45% of new threats [3]. This is largely because malicious actors have a repertoire of tricks for hiding malware. One such technique is steganography, the practice of concealing secret data in inconspicuous carriers such as images, audio files, office documents, and even network traffic.

One might say offhand that I am not a target of these attacks, since I am neither privy to national secrets nor the head of a multinational company. However, in this paper, I take myself as an example and discuss the many facets of my digital life that are vulnerable to cybercriminals. In particular, I take the point of view of computer security to examine how malware hides its intent and how it could significantly compromise my life. I also discuss how we can mitigate, if not totally prevent, the risks to our digital safety.

I’m a software engineer

I recall my friend’s story from a business trip in Munich. While he and two of his colleagues were in a beer hall, they shared a table with a friendly couple, a common situation in such places. The two parties got chatty over beer, but when the conversation shifted to everyone’s jobs, the couple’s mood changed upon hearing my friend’s answer. They politely excused themselves and left the establishment. My friend was working for Trend Micro, a multinational computer security company.

My friend and I were teammates at the time; we are both software engineers. I write code that ships in our products, and my work gives me a sizeable administrative role over our network. This makes me a prime target for malicious actors. If my accounts were compromised, attackers could take information from my workstation and move laterally across the company network, not only exfiltrating intellectual assets but potentially damaging operational processes. Fortunately, I know some of their attack vectors, that is, the routes by which malicious actors break into a system.

Spear-phishing is a common way of compromising someone on the Internet. An email with a malicious attachment is sent to an individual with the right level of access at a company. The attachment contains exploits that, when opened, trigger the download or installation of malware. Since the technique hinges on inconspicuousness, cybercriminals have turned to social engineering: they send emails designed specifically for the victim. For example, in the recently discovered Operation Tropic Trooper, emails were sent to Taiwanese government units and Philippine military divisions. The emails had contextually significant titles such as 關於104年中央政府總預算.doc (About the central government general budget for ROC year 104.doc) or ASG Plan Bombing in Zamboanga City. The attackers also seemed familiar with their victims’ software environment: they exploited the old Microsoft Office vulnerabilities CVE-2010-3333 and CVE-2012-0158, which are very commonly exploited and were patched by Microsoft long ago [4]. The next phase involves a Trojan that downloads JPEG files disguised as Windows XP wallpapers. Embedded in the image file is another piece of malware, which performs data exfiltration and system sabotage. Operation Tropic Trooper may sound too complicated for what it is supposed to do, but each step adds a layer of inconspicuousness needed to maintain cover.

Other operations by cybercriminals are more troubling. In 2011, the Laboratory of Cryptography and System Security (CrySyS) in Budapest, Hungary, discovered the malware Duqu in the wild. The researchers noted that it bears many resemblances to Stuxnet, another piece of malware that had been analyzed by security experts and was picked up by the media as the first targeted attack against a high-profile, real-life industrial target: Iran’s nuclear infrastructure. Both feature a modular architecture, which allows attackers to build a targeted attack from various pieces of code. This modularity also means a new era for malware developers, with a new business model of distributed labor where malware developers can work simultaneously on different parts of the system, and modules can be sold on underground markets. It closely mirrors how I engineer code: I make parts of it reusable as a library so that my colleagues do not have to reinvent the wheel. In this case, other cybercriminals do not have to rediscover known security flaws (CVEs) or write a module that performs steganography on images. They just have to find a seller.

Among its features, Duqu contains a keylogger. Its other components use the registry during the Windows boot process and inject themselves into a system process that provides the keylogger functionality; Duqu also includes a signed driver that can be used on Windows 7 computers. This technique of using the registry to load a driver is very similar to Stuxnet’s. The researchers found further similarities with Stuxnet: Duqu carries a digital signature of the Taiwanese manufacturer C-Media, whose parent in the trust chain is Verisign. This suggests that C-Media’s private key was compromised, which is particularly intriguing, since trusted third parties (TTPs) are supposed to maintain revocation lists to quickly contain such breaches. To avoid detection in the data exfiltration stage, Duqu hides the stolen information in innocent-looking digital images, which are then sent over the Internet to command-and-control servers (C&C servers) [5].

I use the Internet daily

The Internet has become indispensable to me. I use it for online banking, social networking, and gaming. However, phishing and malicious websites demand my vigilance. A phishing website reproduces a popular site in an attempt to fool users into typing in their credentials. Commonly imitated sites include Amazon, Google, and Yahoo, where users often reuse the same credentials. Phishing websites can also deliver Trojans like Regin, which not only contains RAT features but also captures network traffic to enable lateral movement within a network. Regin is especially interesting, since different versions of it are found in the wild, tailored to specific victims such as government units, infrastructure operators, businesses, and private individuals [6].

Browsers, too, have become targets for cybercriminals. With modern browsers offering plugins and extensions, cybercriminals have looked for ways to exploit this functionality for phishing. One example is a Facebook attack that leverages Google Chrome’s extension environment. Through spam messages sent from infected accounts, cybercriminals distribute shortened malicious links that lead users to a phishing site mimicking Facebook. The site offers an automatic download of a Google Chrome extension, and since the website looks legitimate, the user can be fooled into accepting the installation. Once installed, the extension in turn spams Facebook messages and installs other harmful applications such as adware and spyware [7].

Threats can also come from the mobile platform, as demonstrated by the developers of the proof-of-concept Soundcomber. This is a Trojan that extracts a small amount of private information through the phone’s audio sensor: it can pick up credit card numbers and PINs from tone- and speech-based interaction with phone menus. Although the Android platform is strict about permissions, Soundcomber sidesteps this by requesting only minimal permissions. It cannot even connect to the network, because an app that both records audio and accesses the network is a red flag for most mobile antivirus systems. To send its data to the cybercriminals, it needs a second, paired Trojan. Android can prevent direct communication between applications, so the researchers proposed covert, steganographic channels to avoid detection. One such channel is the vibration setting, which all apps can modify: Soundcomber encodes the data as a sequence of vibration setting changes, and the paired Trojan detects these changes and decodes them back into data. The same can be done with the sound volume setting. Another covert channel is the screen setting. Changing it is more visible, but the Android hardware has a quirk: if a wake lock is held for a short enough time, latency in the device’s electronics prevents the screen from actually turning on. Lastly, Soundcomber can signal the paired Trojan through file lock changes. With these novel approaches, the data reaches the paired Trojan, which can then transmit it over the Internet [8].
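To make the covert-channel idea concrete, here is a minimal Python simulation of a toggle-based encoding. It is a sketch under stated assumptions, not Soundcomber’s actual protocol: the `SharedSetting` class stands in for a system-wide setting such as vibration that any app can change, both apps are assumed to agree on fixed time slots, and all function names are hypothetical. On a real phone the apps would go through Android’s settings APIs and would have to cope with timing jitter.

```python
from typing import List


class SharedSetting:
    """Stand-in for a system-wide setting (e.g. vibrate on/off) that any app may change."""

    def __init__(self) -> None:
        self.value = 0


def bytes_to_bits(data: bytes) -> List[int]:
    """Split bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits: List[int]) -> bytes:
    """Pack bits (most significant first) back into bytes, ignoring a trailing partial byte."""
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def transmit(data: bytes, setting: SharedSetting) -> List[int]:
    """Sender: in each time slot, toggle the setting for a 1 bit and leave it alone for a 0 bit.

    The returned list simulates the receiver sampling the setting once per slot.
    """
    samples = []
    for bit in bytes_to_bits(data):
        if bit:
            setting.value ^= 1  # a visible change encodes a 1
        samples.append(setting.value)
    return samples


def receive(samples: List[int], initial: int = 0) -> bytes:
    """Receiver: a change between consecutive samples is a 1, no change is a 0."""
    bits, previous = [], initial
    for value in samples:
        bits.append(1 if value != previous else 0)
        previous = value
    return bits_to_bytes(bits)


# The paired app recovers the secret from the setting's history alone.
print(receive(transmit(b"1234", SharedSetting())))  # b'1234'
```

Encoding bits as changes rather than absolute values means the receiver does not need to know the setting’s state beforehand, only its starting value.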

I may be vigilant, but what happens when the Internet services I use are the ones compromised? Linux, the operating system of choice for web application backends, is commonly perceived as secure. However, sophisticated attackers have turned to steganography to hide attacks in network traffic. Linux.Fokirtor, as Symantec named it, gave an attacker the usual back-door functionality, such as executing remote commands. However, the back door did not open a network socket or attempt to connect to a command-and-control (C&C) server, since that would be immediately spotted by network security products. Instead, the back door code was injected into the SSH process, where it monitored network traffic for the character sequence colon, exclamation mark, semicolon, period (“:!;.”). In this way, the attacker could avoid detection by standard security products, even at well-defended IT companies [9].
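The trigger mechanism can be illustrated with a short Python sketch that scans a byte stream for the `:!;.` marker and treats what follows as a hidden command. Only the marker comes from Symantec’s description; the newline terminator and the function name `extract_hidden_commands` are hypothetical, and the sketch assumes it sees the decrypted SSH stream, as code injected into the SSH process presumably would. Run over decrypted session data, the same scan could serve as a simple defensive signature.

```python
from typing import List

MARKER = b":!;."    # trigger sequence reported by Symantec
TERMINATOR = b"\n"  # hypothetical: assume each hidden command ends at a newline


def extract_hidden_commands(stream: bytes) -> List[bytes]:
    """Return every payload that follows the marker, up to the (assumed) terminator."""
    commands = []
    start = stream.find(MARKER)
    while start != -1:
        payload_start = start + len(MARKER)
        end = stream.find(TERMINATOR, payload_start)
        if end == -1:
            end = len(stream)  # payload runs to the end of the captured data
        commands.append(stream[payload_start:end])
        start = stream.find(MARKER, end)
    return commands


# Ordinary SSH input with two hidden commands mixed in.
session = b"ls -la\n:!;.whoami\nexit\n:!;.id"
print(extract_hidden_commands(session))  # [b'whoami', b'id']
```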

McAfee has revealed Operation Shady RAT, an ongoing investigation of targeted intrusions into more than 70 global companies, governments, and non-profit organizations. The report describes how national secrets, source code, bug databases, email archives, negotiation plans, confidential business documents, and schematics were exfiltrated by cybercriminals. McAfee gained access to one specific C&C server used by the intruders and analyzed its log files. It then traced how the intrusions started, which it described as standard spear-phishing. When the attachment is opened, malware is downloaded and initiates communication with a C&C server. The malware interprets instructions encoded in hidden comments in the web page code. Afterwards, live intruders can log in to the machine and move laterally within the organization to find target data [10].
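A rough Python sketch of the comment-based channel follows, written from the defender’s side: it collects every HTML comment on a page and keeps the ones that decode as base64. McAfee’s description only says the instructions were encoded in hidden comments; treating the encoding as base64 is my assumption, and `hidden_instructions` is a hypothetical name. It uses only the standard library’s `html.parser` and `base64` modules.

```python
import base64
import binascii
from html.parser import HTMLParser
from typing import List


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: List[str] = []

    def handle_comment(self, data: str) -> None:
        self.comments.append(data.strip())


def hidden_instructions(page: str) -> List[bytes]:
    """Return comments that decode cleanly as base64 (the encoding is an assumption)."""
    collector = CommentCollector()
    collector.feed(page)
    collector.close()
    decoded = []
    for comment in collector.comments:
        try:
            decoded.append(base64.b64decode(comment, validate=True))
        except (binascii.Error, ValueError):
            continue  # ordinary comment, not an encoded instruction
    return decoded


page = "<html><!-- main menu --><body>Hi<!-- c2xlZXAgNjA= --></body></html>"
print(hidden_instructions(page))  # [b'sleep 60']
```

Any short comment made only of base64 characters (for example `TODO`) would also decode, so a practical detector would need further checks such as length or entropy.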

The main focus of the report is the intruders’ possible motivations. McAfee found that the nature of the victims sheds light on the intruders’ intent. The report lists 71 compromised companies and organizations. Alarmingly, the largest group consists of governments, with the US the most compromised. Other notable groups are defense contractors and IT companies, which hints at corporate espionage. International sports organizations were also compromised, which is rather strange. However, once one notes that these include the International Olympic Committee and the World Anti-Doping Agency, compromised in the lead-up to and aftermath of the 2008 Olympics, one can argue that the intruder is a state actor, given the lack of obvious commercial benefit from such an act [10].

In the previous sections, we have seen how cybercriminals operate. They use social engineering such as spear-phishing and phishing websites to trick victims into running malware. This malware can be as simple as a downloader for other malware, or as complicated as a tool that exfiltrates data stealthily. Malware can avoid detection through steganography, hiding itself in innocent JPEG images, or by exploiting structural weaknesses of software, such as the ubiquitous but vulnerable Portable Executable (PE) file format, particularly when files are packed. In the next section, we discuss modern ways in which computer security fights back.

Internet Safety Vocabulary

We are getting our first glimpses of the Internet of Things (IoT). Smartphones were only the beginning, as our physical and digital lives become more intertwined. We will have more wearable technologies and automation systems spanning our homes and offices. In the future, devices will have an identity on the Internet, which means everything (note: everything, not necessarily everyone) can communicate over the Internet. This exposes everything to massive security issues.

It turns out that simply knowing a basic computer security vocabulary helps a great deal in mitigating risk. Computer security vendors have launched consumer campaigns to educate the public [11, 12, 13], and such campaigns become extremely important in the age of IoT. They focus on the dangers of phishing sites, suspicious emails, and the motivations of cybercriminals. They highlight the importance of strong passphrases in place of passwords, two-factor authentication, SSL and digital certificates, and antivirus products. Among these products is next-generation consumer security software such as mobile antivirus products and browser security features. For enterprises, security companies offer intrusion detection systems and network traffic monitoring, containerization architectures to minimize lateral movement in targeted attacks, and repositories of known attack vectors and infection chains to defend against future threats.
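Two-factor authentication is easier to appreciate once you see how a typical authenticator app produces its codes. The sketch below implements the time-based one-time password (TOTP) algorithm of RFC 6238, built on HOTP from RFC 4226, with only Python’s standard library. TOTP is one common second factor; I am not claiming it is the specific scheme any of the cited campaigns recommend.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1 and dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890", T = 59 s, 8 digits.
print(totp(b"12345678901234567890", now=59, digits=8))  # 94287082
```

Because the server and the phone share the secret and derive the code from the current 30-second window, a stolen password alone is not enough to log in.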

Malware Detection as part of Computer Security

At the heart of all the computer security products and services mentioned, malware detection remains the main battleground between cybercriminals and security vendors. Cybercriminals constantly write new forms of malware and develop new features to evade detection. According to AV-Test, an independent organization that benchmarks security products, 2014 saw approximately 130 million new forms of malware, compared to 80 million in 2013 and 30 million in 2012 [14].

Information hiding techniques play an important role in malware’s notoriety. Mazurczyk and Caviglione published an article that classifies these techniques into three categories [3]. Understanding these categories gives security experts a structured approach to dealing with each threat. The first category hides information by modulating the status of hardware resources, the second uses network traffic to evade detection, and the third uses other files as hosts for the embedded malware. An example of the first category is Soundcomber, which modulates system settings for evasion and transmission. An example of the second is the aforementioned Linux.Fokirtor, which hides its commands inside legitimate SSH traffic. The third is the most prevalent and includes Duqu, Regin, and the malware used in Operation Tropic Trooper. These hide their information in a host file, either a PE or a JPEG file. For images, it is common to embed hidden information in the least significant bits (LSBs) of either the spatial domain (pixel values) or the frequency domain (for JPEG, the DCT coefficients). Both methods should be invisible to the human visual system to avoid detection.
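A minimal sketch of spatial-domain LSB embedding in Python follows, assuming the image is already decoded into a flat list of 8-bit grayscale pixel values (real tools work on image files, and JPEG tools usually modify DCT coefficients instead). The function names are hypothetical.

```python
from typing import List


def embed_lsb(pixels: List[int], message: bytes) -> List[int]:
    """Overwrite the least significant bit of successive pixels with the message bits."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = list(pixels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit  # changes the pixel by at most 1
    return stego


def extract_lsb(pixels: List[int], length: int) -> bytes:
    """Read `length` bytes back from the least significant bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        byte = 0
        for pixel in pixels[start:start + 8]:
            byte = (byte << 1) | (pixel & 1)
        out.append(byte)
    return bytes(out)


cover = list(range(100, 164))  # 64 toy gray levels
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))  # b'hi'
```

Each pixel changes by at most one gray level, which the eye cannot see, but the embedding does leave statistical traces in the histogram of values.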

As early as 2001, there was research interest in detecting steganographic content on the Internet. Before the September 11 attacks, US media reported that terrorists were using steganography to hide their communication, which motivated Provos and Honeyman’s paper. They developed Stegdetect, which finds steganographic content hidden in the LSBs of JPEG images. Their approach leverages the JPEG format, which applies a discrete cosine transform (DCT) to 8×8 pixel blocks, yielding 64 coefficients per block. By analyzing the histogram of coefficient values and computing statistics such as the chi-square test, they found that images with hidden content can be detected. For each popular steganographic system known at the time (JSteg, JSteg-Shell, and JPHide), they computed a signature, that is, the statistical distribution produced by the system’s embedding algorithm. Each algorithm uses the LSBs in specific portions of the file, which alters the distribution of the coefficients and can thus be detected by chi-square tests. However, one of the authors also created a system called OutGuess, which preserves statistical properties and can evade such tests. Stegdetect produced no false positives, but its false-negative rates were around 2% for JSteg, 15% to 60% for JPHide, and 60% for OutGuess [15].
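The pairs-of-values chi-square statistic behind this kind of detection can be sketched as follows. This is the classic formulation by Westfeld and Pfitzmann, applied to any histogram of integer values such as DCT coefficients; Stegdetect’s own implementation differs in detail, so treat the function (a hypothetical name) as an illustration. LSB replacement tends to equalize the counts of each pair of values 2i and 2i+1, so an unusually small statistic relative to its degrees of freedom points to embedding.

```python
from collections import Counter
from typing import Iterable, Tuple


def pairs_of_values_chi_square(values: Iterable[int]) -> Tuple[float, int]:
    """Chi-square statistic comparing each even value's count with the mean of its pair.

    Returns (statistic, degrees_of_freedom). A small statistic means the counts of
    2i and 2i+1 are suspiciously close, as LSB replacement tends to make them.
    """
    counts = Counter(values)
    evens = sorted({v - (v % 2) for v in counts})  # first member of each observed pair
    statistic, categories = 0.0, 0
    for even in evens:
        expected = (counts[even] + counts[even + 1]) / 2
        if expected == 0:
            continue
        statistic += (counts[even] - expected) ** 2 / expected
        categories += 1
    return statistic, max(categories - 1, 0)


print(pairs_of_values_chi_square([10, 11, 10, 11, 20, 21]))  # (0.0, 1)
```

Turning the statistic into a probability requires the chi-square distribution’s survival function (for example `scipy.stats.chi2.sf(statistic, dof)`); in practice, pairs with very small counts are usually merged or skipped.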

A weakness of Stegdetect is that it is reactive: to perform acceptably, it needs to know how the DCT coefficients are modified and, to some extent, where. Dabeer and Chandrasekaran describe a related weakness, noting that the host probability mass function usually varies substantially across image databases. They improved on the earlier success of Stegdetect by using hypothesis testing based on likelihood ratios. Their framework uses a parameter R that estimates the rate of hidden data, and they discuss their findings for low to high hiding rates [16].
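For intuition about what a hiding-rate parameter does, it helps to write down the usual model of LSB replacement. Assume a fraction R of host samples carry message bits and that message bits are equally likely to be 0 or 1; then a sample with value 2i or 2i+1 is flipped to its partner with probability R/2. This is the standard textbook model, offered as an illustration rather than as Dabeer and Chandrasekaran’s exact formulation:

```latex
\begin{aligned}
q_R(2i)   &= \left(1 - \tfrac{R}{2}\right) p(2i) + \tfrac{R}{2}\, p(2i+1),\\
q_R(2i+1) &= \tfrac{R}{2}\, p(2i) + \left(1 - \tfrac{R}{2}\right) p(2i+1),\\
\Lambda(x_1,\dots,x_n) &= \sum_{k=1}^{n} \log \frac{q_R(x_k)}{p(x_k)}
  \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \tau .
\end{aligned}
```

Here p is the host probability mass function, q_R the pmf after embedding at rate R, H_0 the hypothesis that nothing is hidden, and τ a decision threshold. The test needs p, which is exactly what varies across image databases. At R = 1, q_R(2i) = q_R(2i+1) for every pair, which is the equalizing effect that chi-square attacks exploit.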

Embedding data in the LSBs of the host image is only one steganographic technique. Others include DCT-domain embedding and bit-plane complexity segmentation (BPCS) steganography, both of which are more resilient to statistical analysis. On the steganalysis side, techniques include difference image histogram methods, closest color pair methods, and feature extraction methods. The difference image histogram method relies on a family of difference images, their histograms, and the relationships among them. The closest color pair method relies on the observation that the number of close color pairs increases considerably when an image has secret information in its LSB plane. The feature extraction method, inspired by successes in the data mining community, relies on a number of hand-crafted features, such as characteristics of wavelet-transformed images [17].
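The closest color pair idea can be sketched in a few lines, assuming pixels given as (R, G, B) tuples and the common definition that two distinct colors are close when every channel differs by at most 1 (as in Fridrich’s raw quick pairs method). The function name is hypothetical, and the brute-force pair loop is only suitable for small palettes.

```python
from itertools import combinations
from typing import Iterable, Tuple

Color = Tuple[int, int, int]


def close_pair_ratio(pixels: Iterable[Color]) -> float:
    """Fraction of unique-color pairs that are 'close' (each channel differs by at most 1)."""
    unique = sorted(set(pixels))
    total = len(unique) * (len(unique) - 1) // 2
    if total == 0:
        return 0.0
    close = sum(
        1
        for a, b in combinations(unique, 2)
        if all(abs(x - y) <= 1 for x, y in zip(a, b))
    )
    return close / total


print(close_pair_ratio([(10, 10, 10), (11, 10, 10), (50, 50, 50)]))  # 0.3333333333333333
```

Roughly, the method compares this ratio before and after embedding a test message of its own: a clean image shows a clear jump, while an image that already carries LSB data changes much less.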

Aside from images, PE files can also serve as host carriers through the use of packers. Packing executables is standard industry practice among legitimate software vendors for hiding installation procedures and binaries, and the same functionality is what cybercriminals exploit to embed malware in legitimate applications. Although Microsoft has published a specification for the PE format, both legitimate vendors and cybercriminals occasionally disregard it, and the result is an ecosystem vulnerable to malware. Detecting packed executables often relies on entropy analysis. Entropy, in information theory, measures the average information content of a source or, in other words, the randomness of a sample. The average entropy of packed executables is generally higher than that of native executables [18]; Table 1 gives the complete measurements. In addition, the entropy computed over a sliding window across the file can change radically in the presence of hidden data [19].

| Dataset | Average Entropy (bits/byte) | 99.99% Confidence Interval | Highest Average Entropy (bits/byte) | 99.99% Confidence Interval |
|---|---|---|---|---|
| Plain Text | 4.347 | 4.066–4.629 | 4.715 | 4.401–5.030 |
| Native Executables | 5.099 | 4.941–5.258 | 6.227 | 6.084–6.369 |
| Packed Executables | 6.801 | 6.677–6.926 | 7.233 | 7.199–7.267 |
| Encrypted Executables | 7.175 | 7.174–7.177 | 7.303 | 7.295–7.312 |

Table 1. Computed entropy statistics for plain text and for native, packed, and encrypted executables [18]
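Entropy analysis of this kind is easy to sketch. The Python below computes Shannon entropy in bits per byte, H = -Σ p(b) log2 p(b) over the 256 possible byte values, for a whole buffer and for a sliding window across it. The 256-byte window and 128-byte step are my assumptions, not parameters taken from [18] or [19].

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def sliding_entropy(data: bytes, window: int = 256, step: int = 128) -> List[float]:
    """Entropy of each window; sudden jumps can mark packed or encrypted regions."""
    return [
        shannon_entropy(data[start:start + window])
        for start in range(0, max(len(data) - window, 0) + 1, step)
    ]


print(shannon_entropy(b"ab"))              # 1.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```

The per-window values can then be compared against ranges like those in Table 1: plain text averages about 4.3 bits per byte, while packed and encrypted executables average about 6.8 and 7.2.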

Another way to detect packed executables is feature extraction. Structural properties such as the number of imported function calls, entropy values, section names, number of sections, and access rights of each section can be collected. Packed PEs have higher entropy values and fewer imported function calls relative to the file’s size or complexity. Section access rights are also suspicious when several sections are executable, since a normal executable usually has only one executable section, with the others being readable or read/write. Packers also use non-standard section names, whereas normal executables use the predefined section names produced by their vendors’ toolchains [20].
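The feature checks above can be sketched as a simple rule-based scorer in Python. The section flag values are the real `IMAGE_SCN_*` constants from Microsoft’s PE/COFF specification, but the `Section` record, the allowlist of standard names, and both numeric thresholds are hypothetical choices for illustration; [20] does not prescribe them, and the write-plus-execute check is a commonly used extra heuristic rather than one described above. Parsing sections out of a real file would need a PE parser, which is omitted here.

```python
from dataclasses import dataclass
from typing import List

# Section characteristic flags from the Microsoft PE/COFF specification.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Hypothetical allowlist: names commonly emitted by mainstream compilers and linkers.
STANDARD_NAMES = {".text", ".rdata", ".data", ".rsrc", ".reloc", ".idata",
                  ".edata", ".pdata", ".tls", ".bss", ".CRT"}


@dataclass
class Section:
    name: str
    characteristics: int
    entropy: float  # bits per byte


def packing_indicators(sections: List[Section], import_count: int) -> List[str]:
    """Return human-readable red flags; the thresholds are illustrative assumptions."""
    flags = []
    executable = [s for s in sections if s.characteristics & IMAGE_SCN_MEM_EXECUTE]
    if len(executable) > 1:
        flags.append("more than one executable section")
    for s in sections:
        if s.characteristics & IMAGE_SCN_MEM_EXECUTE and s.characteristics & IMAGE_SCN_MEM_WRITE:
            flags.append(f"{s.name} is both writable and executable")
        if s.name not in STANDARD_NAMES:
            flags.append(f"non-standard section name {s.name!r}")
        if s.entropy > 7.0:  # assumed cut-off
            flags.append(f"{s.name} has high entropy ({s.entropy:.2f})")
    if import_count < 10:  # assumed: very few imports suggests a stub that resolves the rest at runtime
        flags.append(f"only {import_count} imported functions")
    return flags


normal = [Section(".text", 0x60000020, 6.2), Section(".data", 0xC0000040, 3.1)]
packed = [Section("UPX0", 0xE0000080, 0.0), Section("UPX1", 0xE0000040, 7.8)]
print(packing_indicators(normal, 120))  # []
print(len(packing_indicators(packed, 5)))  # 7
```

A real detector would more likely feed such features to a learned classifier than rely on fixed thresholds.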

Recently, an innovative family of techniques based on image processing has emerged for malware detection. The motivation is that attackers reuse old code and apply obfuscation techniques to generate new variants of malware. Researchers render executables with an image visualization technique called byteplots, and in this domain the texture and structural differences between packed malware and normal executables are significant. The researchers extract wavelet-based features, intensity features, and texture features. Using support vector machine (SVM) classifiers with radial basis function kernels, they achieved an accuracy of 95% on their dataset of 25,000 malware samples and 12,000 normal executables [21].
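The byteplot transformation itself is simple: each byte becomes one grayscale pixel, and the bytes are laid out row by row. The Python sketch below shows that step plus two basic intensity features. The 256-pixel row width and the choice of features are my assumptions; the wavelet and texture features and the SVM classifier used in [21] are not reproduced here.

```python
from typing import List, Tuple


def byteplot(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay the file's bytes out row by row as 8-bit grayscale pixels (0 = black, 255 = white).

    The last row is zero-padded so every row has `width` pixels.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def intensity_features(image: List[List[int]]) -> Tuple[float, float]:
    """Mean and variance of pixel intensity, two simple global features."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return 0.0, 0.0
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, variance


print(byteplot(bytes(range(10)), width=4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

Packed or encrypted regions tend to look like uniform noise in such an image, while code and data sections show visible structure, which is what texture features try to capture.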

Cylance has built its first commercial product on machine learning. The company argues that its product, which focuses on big data and machine learning, can detect and prevent malware better than the current industry incumbents. Research certainly holds great promise in this direction, and time will tell how cybercriminals, in turn, will try to evade detection [22].

Final Thoughts

Computer security experts have their own arsenal of defenses against malware writers, and both groups are in a continuous arms race, with the stakes getting higher year after year. The media hype centers on security breaches and stolen data, although there have also been reports of Interpol apprehending botnet operators and cybercriminal groups. What can we expect in the future?

There is an adage that it is easier to break into a system than to defend it, and perhaps there is some truth to it. Even with HTTPS, antivirus products, and two-factor authentication, cybercriminals still hack their way into high-profile companies and organizations. At present, security experts are still largely reactive in responding to these threats: successful attacks have to happen before flaws can be patched.

I argue that these weaknesses can be addressed proactively in two ways: computer security education and data mining. Education can cover the basics of security so that individuals and companies take the right approach to it, developing competency in securing digital resources as well as risk models that compartmentalize vital data resources within the infrastructure. Data mining can cover the multitude of malicious techniques cybercriminals use, which we, as humans, cannot keep in mind all of the time. Imagine building a progressively larger repository of known attacks over the years: as it grows, it becomes progressively more difficult to break into platforms, giving companies some breathing room to think of the next countermeasure.

Truly, computer security has become an arms race where data is both the target and the weapon.

References

[1] G. Smith, “Matthew Keys Case Shows Rogue Employees Can Be Just As Dangerous As Hackers,” Huffington Post, 19 March 2013. [Online]. Available: http://www.huffingtonpost.com/2013/03/19/matthew-keys-rogue-employee-hackers_n_2903021.html. [Accessed 30 May 2015].
[2] W. Broad, J. Markoff and D. Sanger, “Israeli Test on Worm Called Crucial in Iran Nuclear Delay,” The New York Times, 15 January 2011. [Online]. Available: http://www.nytimes.com/2011/01/16/world/middleeast/16stuxnet.html?_r=0. [Accessed 30 May 2015].
[3] W. Mazurczyk and L. Caviglione, “Information Hiding as a Challenge for Malware Detection,” IEEE Security and Privacy, vol. 13, no. 2, pp. 89-93, 2015.
[4] Trend Micro, “How Operation Tropic Trooper Infiltrates Secret Keepers,” 14 May 2015. [Online]. Available: http://www.trendmicro.com/vinfo/us/security/news/cyber-attacks/operation-tropic-trooper-infiltrates-secret-keepers.
[5] Laboratory of Cryptography and System Security, “crysys.hu,” Budapest University of Technology and Economics, Budapest, Hungary, 2011.
[6] Symantec, “Regin: Top-tier espionage tool enables stealthy surveillance,” 24 November 2014.
[7] Trend Micro, “Chrome Lure Used in Facebook Attack despite Google’s New Policy,” 26 May 2015. [Online]. Available: http://blog.trendmicro.com/trendlabs-security-intelligence/chrome-lure-used-in-facebook-attack-despite-googles-new-policy/. [Accessed 28 May 2015].
[8] R. Schlegel, A. Kapadia and X. Wang, “Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones,” in Network and Distributed System Security Symposium (NDSS), 2011.
[9] Symantec, “Linux Back Door Uses Covert Communication Protocol,” 13 November 2013. [Online]. Available: http://www.symantec.com/connect/blogs/linux-back-door-uses-covert-communication-protocol. [Accessed 27 May 2015].
[10] McAfee, “Revealed: Operation Shady RAT,” McAfee, 2011.
[11] Trend Micro, “Internet Safety for Kids,” Trend Micro, [Online]. Available: http://www.trendmicro.com/us/home/internet-safety/.
[12] Kaspersky, “Internet Security Center,” Kaspersky, [Online]. Available: http://www.kaspersky.com/internet-security-center/internet-safety.
[13] Symantec, “Family Resources,” Symantec, [Online]. Available: http://ph.norton.com/family-resources/.
[14] AV-Test, “Malware Statistics,” AV-Test, [Online]. Available: http://www.av-test.org/en/statistics/malware/. [Accessed 29 May 2015].
[15] N. Provos and P. Honeyman, “Detecting Steganographic Content on the Internet,” CITI Technical Report, 2001.
[16] O. Dabeer and S. Chandrasekaran, “Detection of Hiding in the Least Significant Bit,” IEEE Transactions on Signal Processing, 2004.
[17] A. Hernandez-Chamorro, A. Espejel-Trujillo and J. Lopez-Hernandez, “A Methodology of Steganalysis for Images,” in International Conference on Electrical, Communications and Computers, 2009.
[18] R. Lyda and J. Hamrock, “Using Entropy Analysis to Find Encrypted and Packed Malware,” IEEE Security & Privacy, vol. 5, no. 2, pp. 40-45, 2007.
[19] S. Han, K. Lee and S. Lee, “Packed PE File Detection for Malware Forensics,” IEEE, 2009.
[20] M. Baig, P. Zavarsky and R. Ruhl, “The study of evasion of packed PE from static detection,” in World Congress on Internet Security, Guelph, Ontario, 2012.
[21] K. Kancherla and S. Mukkamala, “Image Visualization Based Malware Detection,” in IEEE Symposium on Computational Intelligence in Cyber Security, Singapore, 2013.
[22] Cylance, “Math vs Malware,” Cylance, California, 2014.
[23] The Wall Street Journal, “Symantec Develops New Attack on Cyberhacking,” The Wall Street Journal, 4 May 2014. [Online]. Available: http://www.wsj.com/articles/SB10001424052702303417104579542140235850578. [Accessed 29 May 2015].

By krsnewwave

I'm a software engineer and a data science guy on recommender systems, natural language processing, and computer vision.
