
Unicode-proof Code Injection Attack on Windows CE

- A Novel Approach of Evading Intrusion Detection System for Mobile Network


Yang Song 1,2, Yuqing Zhang 1,2,*, Yingfei Sun 2
1 National Computer Network Intrusion Protection Center
2 School of Information Science and Engineering
Graduate University of Chinese Academy of Sciences, Beijing, China
songy@nipc.org.cn, {zhangyq, yfsun}@gucas.ac.cn

JingBo Yan
Key Lab of Computer Networks and Information Security of Ministry of Education, Xidian University, Xi'an, China
yanjb@nipc.org.cn

* Corresponding author

Abstract. Code injection attacks are a major way of spreading malware on networks. The key part of a code injection attack is a small piece of code, called shellcode, which performs unauthorized operations when it is injected into software as part of valid data. On Windows CE, input data are often encoded using Unicode before being processed. In such cases, shellcode should be built in a way that survives such encoding; that is, it should be Unicode-proof. Unicode-proof shellcode also has the great advantage of evading intrusion detection systems. However, it is quite difficult to build Unicode-proof shellcode for the ARM architecture, on which most embedded devices are developed, because the subset of instructions that can be used to write Unicode-proof shellcode is very limited. Moreover, the instruction cache in the ARM processor restricts the application of self-modifying code, which is frequently used in shellcode writing. This research proposes an approach to building ARM Unicode-proof shellcode on Windows CE under these constraints. The approach applies to all versions of ARM processors and Windows CE, including systems evolved from Windows CE, such as Windows Mobile and Windows Phone. The shellcode is tested on three currently available devices.

Keywords: Unicode-proof; code injection; Windows CE

I. INTRODUCTION

With the rapid development of the mobile market, the ARM processor has become one of the most widely used processors in recent years. The ARM architecture evolved from the basic Reduced Instruction Set Computer (RISC) architecture. Enhancements to the RISC architecture allow ARM processors to achieve a good balance between performance and power consumption, making them ideally suited to embedded equipment. However, the increased processing power and the use of desktop-like operating systems on these devices have made them vulnerable to malicious software (malware) [1]. With the wide deployment of 3G mobile networks, the security of embedded equipment, such as smartphones and personal digital assistants, is becoming increasingly important.

The code injection attack [2], one of the most severe classes of attacks on desktop PCs, is also a major means of spreading malware [3] on embedded devices, e.g., Mulliner's attack [4] against Windows Mobile. Specifically, in a code injection attack, the attacker sends malicious data to the target device to trigger a bug in a certain program. This action consequently diverts the control flow of the program to a specific part of the data, written in such a way that its representation in memory is also a sequence of valid instruction codes for the processor. This section of the data, called shellcode, implements the malicious intent on the target device.

Writing shellcode is prone to many difficulties. Because many programs filter or restrict their input data, shellcode often has to be written to allow for such restrictions. In practice, many programs only accept data that is encoded in Unicode, and therefore non-Unicode strings are converted to Unicode before being processed by the program. Such conversion is widespread on Windows CE because the API functions use UTF-16, a character encoding for Unicode that encodes most characters as a 16-bit half-word, for string representation. Instruction encoding, however, often results in invalid characters that are defined in neither ASCII nor the ANSI code pages. In such cases, the character is converted to Unicode as a question mark, which destroys the instruction.

All ARM instructions are four bytes long (one 32-bit word). For English and other Western European languages, characters that are defined in either ASCII or the ANSI code pages account for 1.24% of the UTF-16 characters. Because each instruction spans two half-words, only 0.0153% of 32-bit words can be used to construct Unicode-proof instructions. Specifically, 26 instructions are Unicode-proof, with 8 of these limited to particular versions of the ARM processor, while the instruction operands are restricted to a small set of values. It is difficult to build powerful shellcode for many important operations with so few instructions; e.g., calling a system API cannot be carried out by these instructions. Moreover, because the ARM processor does not invalidate the instruction cache when memory is modified, the application of self-modifying code, which is widely used in shellcode writing, is hampered.

This paper presents a novel approach to building Unicode-proof shellcode under these constraints, so that attackers can launch Unicode-proof code injection attacks on Windows CE devices. As a result, certain bugs that previously only led to a device crash should now be considered highly dangerous vulnerabilities. Because the Unicode-proof shellcode is composed of printable characters, it also has the great advantage of evading intrusion detection systems that try to detect the existence of shellcode in input coming from the network. Our approach

is applicable to all versions of Windows CE and ARM processors, and is also applicable to those systems evolved from Windows CE, such as Windows Mobile and Windows Phone.

The main contributions of this paper are as follows. First, we present an analysis of all Unicode-proof instructions. Moreover, for those instructions that are not Unicode-proof, we discuss whether they can be emulated by Unicode-proof instructions and, if so, how to emulate them. Second, we propose a novel method to flush the instruction cache on Windows CE. As far as we know, this is the only way to bypass the effect of the instruction cache on self-modifying code with Unicode-proof instructions on Windows CE.

The rest of this paper is organized as follows. Section II presents an analysis of Unicode-proof instructions, while Section III describes our approach to creating Unicode-proof shellcode, including the emulation of many non-Unicode-proof instructions and a discussion of the instruction cache. In Section IV we evaluate our approach on three existing devices. Related work is discussed in Section V, and Section VI concludes the paper.

II. ANALYSIS OF UNICODE-PROOF INSTRUCTIONS

The ARM architecture has uniform fixed-length (32-bit) instruction fields. The ARM processor has 16 visible 32-bit registers, generally called Rn (0<=n<=15). Three of these registers have special roles: R13/Stack pointer (SP), R14/Link register (LR), and R15/Program counter (PC). The remaining registers have no specific hardware purpose, but the first four registers are generally used to pass function parameters in software. In addition, the CPSR (Current Program Status Register) holds the current processor status and contains four condition code flags: Negative (N flag), Zero (Z flag), Carry (C flag) and oVerflow (V flag). All these registers, including the CPSR, are accessible by ARM instructions under specific conditions.
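To make the notion of a Unicode-proof word concrete, the following is a minimal sketch rather than the authors' tool. It assumes a single ANSI code page (Windows-1252, as implemented by Python's `cp1252` codec, which may differ in detail from the conversion performed on a device) and treats a 32-bit word as Unicode-proof when both of its little-endian half-words are UTF-16 code units that some single input byte converts to. The function names are hypothetical. The encodings 00200000 and 00232013 are the ones quoted for the EOR instructions later in this section; E12FFF1C is the standard encoding of BX R12, used here as an ordinary instruction that does not pass.

```python
import struct


def reachable_code_units(codepage="cp1252"):
    """Collect the UTF-16 code units that single bytes convert to in `codepage`."""
    units = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(codepage)
        except UnicodeDecodeError:
            continue  # byte has no mapping in Python's table for this code page
        units.add(ord(ch))
    return units


def is_unicode_proof(word, units):
    """True if both little-endian half-words of a 32-bit word are reachable."""
    lo, hi = struct.unpack("<HH", struct.pack("<I", word))
    return lo in units and hi in units


UNITS = reachable_code_units()

for word in (0x00200000, 0x00232013, 0xE12FFF1C):
    print(f"{word:08X}", is_unicode_proof(word, UNITS))
```

Under these assumptions the two EOR encodings pass and the BX R12 encoding does not. The paper's 1.24% figure covers ASCII and the ANSI code pages for Western European languages together, so a single code page gives a smaller character set than the one the authors count.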
Table I lists all Unicode-proof instructions on the ARM architecture together with the number of values that are available as operands for each instruction. Some of the instructions that require specific ARM architecture versions, as highlighted in Table I, are excluded in the following discussion, to ensure that our approach is compatible with all versions of the ARM processor.

TABLE I. ARM UNICODE-PROOF INSTRUCTIONS

Instruction class    Instruction opcode (the number of values available for the operands)
Arithmetic/logic     ADC(6732) ADD(1020) AND(8160) EOR(7956) ORR(204) QDSUB(36) QSUB(16) RSB(6528) RSC(6528) SBC(7040) SUB(6528)
Multiply             MLA(117) MUL(120) SMLAL(200) SMULL(116) SMULW(32) UMLAL(147) UMULL(15)
Comparison           CMN(561) CMP(374)
Load and store       LDRD(1122) LDRH(2096) LDRSB(2227) LDRSH(2096) STRD(1968) STRH(1968)

We propose five essential operations that are frequently used in shellcode writing. Each operation is implemented by a sequence of Unicode-proof instructions, as detailed below.

Set register to zero

Assume that we know nothing about the state of memory, the value of any register, or anything else when the shellcode starts up. We would probably want to initialize the program state before carrying out the real functionality of the shellcode. The following instructions provide a good start by setting R0 and R2 to zero.

    EOR {EQ / CS/HS} {S=0} R0, R0, R0        00200000
    EOREQ {S=0} R2, R3, R3 LSL R0            00232013

This is done by performing an Exclusive-OR operation on two operands with the same value. In the second instruction the shift amount is taken from R0, which is already zero, so both operands equal R3. Notice that the EOR instruction is Unicode-proof only if the destination register is R0 or R2. Therefore, we have to initialize other registers by copying the value of R0 to them.

Copy the value of R0 to other registers, or the value of other registers to R0

We can copy the value of R0 to other registers using the MLA instruction, because only the multiply instructions are Unicode-proof irrespective of the destination register.

    MLAEQ {S=0} [R0-R14] R2, R1, R0

The instruction above multiplies the value of R2 by the value of R1 and adds the value of R0 to the product. The result equals the value of R0 when R2 is zero, and it can be written to any register. Because we can set R2 to zero, we are able to copy the value of R0 to any other register. To copy the value of another register to R0, we perform an addition on R0 and that register. The result of the addition is equal to the other register if R0 is set to zero.

Produce arbitrary 32-bit immediate values

It is complex to produce a new value in a register because we are restricted to manipulating a few 32-bit immediate values with the SBC instruction. By multiplying the results of the SBC instruction no more than three times (2.45 times on average), we can produce 28.97% of all 16-bit values. The rest of the 16-bit values can be produced by adding a value less than 0x33 to the result of the multiplication. Therefore, a 32-bit immediate value can be constructed by concatenating two 16-bit values, that is, by adding one 16-bit value to the other 16-bit value shifted left by 16 bits.

Write/Read register to/from memory

Instructions LDRH and STRH are Unicode-proof if their destination register is limited to R0 or R2. By using these instructions twice, we can write/read a 32-bit register to/from the memory address specified in any register. Also, because we are able to copy the value of any register to R0 at will, we can write/read any register to/from memory.

III. UNICODE-PROOF SHELLCODE

A. Overview

As discussed above, only a few ARM instructions are Unicode-proof, and the operands of these instructions are restricted to a small set of values. To build powerful Unicode-proof shellcode, we provide two methods to achieve the functionality of those instructions that are not Unicode-proof. The first method emulates some of the

ARM instructions using a sequence of Unicode-proof instructions, while the other creates the ARM instruction using self-modifying code.

Emulating ARM instructions with Unicode-proof instructions has two advantages. First, it typically takes fewer instructions to emulate an ARM instruction than to construct it with self-modifying code, especially if the ARM instruction is Unicode-proof when its operands are assigned particular values. Second, because this method does not require flushing the instruction cache, as is the case with self-modifying code, it is portable across various operating systems. However, not all ARM instructions can be emulated using Unicode-proof instructions. For example, branching instructions cannot be emulated, because writing to the PC register causes unpredictable results in many cases. More details are discussed in Subsection B.

Typical self-modifying code contains two parts: first, it produces a value representing a valid instruction in a certain register; then, it writes the value of the register to a specific memory location. By using Unicode-proof instructions, we are able to produce arbitrary values and write them to memory locations, as discussed in Section II. However, the ARM architecture does not directly support self-modifying code, because the instruction cache prevents the execution of modified code. A novel approach to flushing the instruction cache on Windows CE is proposed in Subsection C.

In summary, building Unicode-proof shellcode includes three steps:
1) build a classic shellcode that consists of both Unicode-proof and non-Unicode-proof instructions;
2) if an instruction in the shellcode is not Unicode-proof but can be emulated by Unicode-proof instructions, replace it with a sequence of Unicode-proof instructions that achieves the same functionality;
3) insert self-modifying code at the beginning of the shellcode to overwrite, when the shellcode is executed, all instructions that are still not Unicode-proof.
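The way the Section II primitives combine into the "produce a value, then write it" pattern of self-modifying code can be sketched on a toy register model. This is an illustration under stated assumptions, not the authors' shellcode: `ToyARM` and `demo` are hypothetical names, the model reproduces only the arithmetic effect of EOR, MLA, ADD and STRH without checking whether a given operand combination is Unicode-proof, and the two 16-bit halves are simply assumed to be present in R4 and R5 (the paper does not list the SBC seed values used to build them). The target word E12FFF1C, the encoding of BX R12, is chosen purely as an example.

```python
import struct

MASK = 0xFFFFFFFF


class ToyARM:
    """Models the register/memory effect of a few ARM instructions."""

    def __init__(self, regs, mem_size=32):
        self.r = list(regs)              # R0..R15 with arbitrary start values
        self.mem = bytearray(mem_size)   # flat little-endian memory

    def eor(self, rd, rn, rm, rs=None):
        # EOR Rd, Rn, Rm [LSL Rs]: shift amount is the low byte of Rs
        amount = (self.r[rs] & 0xFF) if rs is not None else 0
        shifted = (self.r[rm] << amount) & MASK if amount < 32 else 0
        self.r[rd] = self.r[rn] ^ shifted

    def mla(self, rd, rm, rs, rn):
        # MLA Rd, Rm, Rs, Rn: Rd = Rm * Rs + Rn (mod 2^32)
        self.r[rd] = (self.r[rm] * self.r[rs] + self.r[rn]) & MASK

    def add(self, rd, rn, rm):
        self.r[rd] = (self.r[rn] + self.r[rm]) & MASK

    def strh(self, rd, addr):
        # STRH: store the low half-word of Rd at addr
        struct.pack_into("<H", self.mem, addr, self.r[rd] & 0xFFFF)


def demo():
    cpu = ToyARM([0xDEADBEEF] * 16)      # unknown state at start-up
    cpu.eor(0, 0, 0)                     # R0 = 0
    cpu.eor(2, 3, 3, rs=0)               # R2 = R3 ^ (R3 << 0) = 0
    cpu.r[4], cpu.r[5] = 0xFF1C, 0xE12F  # assumption: halves already built
    cpu.add(0, 0, 4)                     # R0 = low half (R0 was zero)
    cpu.strh(0, 8)                       # write low half-word
    cpu.eor(0, 0, 0)                     # R0 = 0 again
    cpu.add(0, 0, 5)                     # R0 = high half
    cpu.strh(0, 10)                      # write high half-word
    cpu.mla(7, 2, 1, 0)                  # R7 = R2 * R1 + R0 = R0, since R2 = 0
    word = struct.unpack_from("<I", cpu.mem, 8)[0]
    return cpu, word


cpu, word = demo()
print(f"{word:08X}")
```

The two half-word stores leave the 32-bit word E12FFF1C at offset 8, which is the same as adding the low half to the high half shifted left by 16 bits.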
B. Emulate ARM instructions

There are a total of 147 instructions on the ARM architecture, which can be categorized into 13 groups. Many instructions are only available on specific versions of the ARM processor, such as SEL and ADD16. These instructions are ignored in our discussion because they are designed to improve the efficiency of the processor rather than to provide indispensable functionality. Four groups of instructions can be emulated by sequences of Unicode-proof instructions, as discussed below.

Arithmetic/logic instructions

All instructions in this group except BIC can be emulated. These instructions have three operands: Rd, Rn and shifter_operand. For the instructions to be Unicode-proof, Rd is limited to R0 and R2. Assuming that Rd is Rx, other than R0 or R2, the instruction can be broken down into the following instructions:

    <opcode> R0, Rn, Rm shift Rs
    MLA Rx, R2, R1, R0    (0 < x <= 15)

According to the syntax of ARM instructions, the shifter_operand can always be constructed from two operands, Rm and Rs, as the first instruction indicates. The second instruction copies the value of R0 to Rx if either R1 or R2 is assigned to zero. Since we are able to manipulate the content of arbitrary registers as discussed above, we can always find a way to make the result of these two instructions the same as that of the original instruction. Instruction BIC cannot be emulated because 1) it is not Unicode-proof, irrespective of the values of the operands, and 2) the functionality of BIC requires performing a NOT operation on a certain register, which is not supported by Unicode-proof instructions.

Comparison instructions

Comparison instructions update the CPSR register with the result of certain operations without storing the actual result in any register. There are four instructions in this class. The emulation of CMN and CMP is the same as the emulation of arithmetic/logic instructions. Instructions TST and TEQ are not Unicode-proof irrespective of what the operands are. However, these instructions perform the same operations as instructions AND and EOR, respectively. The only difference is that TST and TEQ do not record the result of the operation. Therefore, these two instructions can be emulated by AND and EOR simply by ignoring the result.

Multiply instructions

There are six types of multiply instructions that are available on all versions of ARM processors, and all of these, except SMLAL, are Unicode-proof if the operands are assigned certain values. These instructions have four operands: 1) Rd/RdHi, 2) Rn/RdLo, 3) Rm, and 4) Rs. We can expand the scope of these operands by copying the values to registers that make the instruction Unicode-proof, similar to what we did with the operands of the arithmetic/logic instructions. Instruction SMLAL is not Unicode-proof irrespective of the operands. However, it can be emulated by instructions SMULL and ADC.

Load and Store instructions

Instructions STRH and LDRH have four operands: U, Rd, Rn, and Rm/offset_8.
The scope of Rd can be expanded by copying the value of the register, in the same way as for the operands of the arithmetic instructions. Operands U, Rn, and Rm/offset_8 are used to calculate the memory address that the instruction writes to or reads from. To expand the scope of these operands, we can calculate the address in advance and store the result in registers that make the instruction Unicode-proof. In this way, we are able to emulate all LDRH and STRH instructions. Other instructions in this group, except LDRT and STRT, can be emulated by repeatedly using LDRH or STRH instructions.

C. Instruction cache

Modern processors generally have an instruction cache to improve processing efficiency. However, the ARM processor does not invalidate the instruction cache when the corresponding memory is modified. Therefore, the processor may keep executing stale cached instructions instead of the modified ones. Moreover, published methods of flushing the instruction cache on ARM are not available with Unicode-proof instructions.


The key to flushing the instruction cache in our method is to divert the control flow to somewhere that is currently not cached, so that the processor has to refill the cache with instructions at the new address. Although we do not have any branching instructions and cannot directly modify the PC register, we can alter the control flow by triggering a fault. The exception mechanism of Windows CE, called Structured Exception Handling (SEH), will then catch the fault and divert the program to a proper exception handler. For example, the following instruction raises a Data Abort exception if both R0 and R2 are set to zero, because it attempts to write to address 0x00000000. The system is then informed of the exception and searches for a proper exception handler.

    STRH R0, [R0-R2]        000000B2

The OS then dispatches the exception to a proper exception handler. That is, the program is diverted to the starting point of the exception handler. Provided that the instructions in the exception handler are not cached, the processor will refill its instruction cache. If we form a branching instruction in the exception handler before the exception arises, the program will be diverted back to our shellcode, and the processor will have to refill its instruction cache again. Therefore, the modifications to the shellcode will be read into the instruction cache and executed. The whole method is arranged as shown in Fig. 1.

If the OS cannot find a proper handler in the user context for the exception, it will terminate the program or abort the current operation immediately rather than pass control back to the user code. We solve this problem by modifying the calling chain of the program. On Windows CE, the system gives each function on the calling chain an opportunity to handle the exception. The calling chain is built from the call stack, which therefore gives us an opportunity to modify the calling chain.
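The cache behaviour this method relies on can be illustrated with a toy model. The sketch below is not a model of any real ARM core: it assumes a tiny direct-mapped cache holding one instruction word per line, and it assumes that the detour through the exception handler touches every cache index, so the stale line is evicted. `ToyICache`, the addresses and the instruction words are all hypothetical.

```python
class ToyICache:
    """Direct-mapped instruction cache that is never updated by stores."""

    def __init__(self, memory, lines=4):
        self.memory = memory      # dict: address -> instruction word
        self.lines = lines
        self.cache = {}           # index -> (address, word)

    def fetch(self, addr):
        idx = (addr // 4) % self.lines
        hit = self.cache.get(idx)
        if hit is not None and hit[0] == addr:
            return hit[1]         # served from the cache, memory not consulted
        word = self.memory[addr]  # miss: refill the line from memory
        self.cache[idx] = (addr, word)
        return word


def demo():
    mem = {0x100: 0x11111111}                                     # original instruction
    mem.update({0x1000 + 4 * i: 0x22222222 for i in range(4)})    # "handler" code
    icache = ToyICache(mem, lines=4)

    first = icache.fetch(0x100)   # original instruction gets cached
    mem[0x100] = 0x33333333       # self-modifying store goes to memory only
    stale = icache.fetch(0x100)   # still the old word

    for i in range(4):            # detour: execute code that is not cached
        icache.fetch(0x1000 + 4 * i)

    fresh = icache.fetch(0x100)   # line was evicted, so memory is read again
    return first, stale, fresh


print([f"{w:08X}" for w in demo()])
```

In this model the second fetch still returns the original word, and only after the detour does the fetch return the modified one. Real caches are set-associative with multi-word lines, so whether a given detour evicts the stale lines depends on the addresses involved.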
On the ARM architecture, the call stack consists of a block of memory pointed to by the SP register and the contents of the LR register. Since we are able to manipulate the content of the LR register as discussed in Section II, it is not a problem to modify the call stack. Moreover, every Win32 program on Windows CE has at least one exception handler that is associated with the default entry function [5] of the executable file. This exception handler calls an imported function, called XcptFilter, with the following instructions:

    LDR R12, [PC, #4]
    LDR R12, [R12]
    BX R12

The first instruction loads into R12 a four-byte address that is stored immediately after the third instruction, while the second instruction gets the address of XcptFilter accordingly. Because the address of XcptFilter is considered as data and will not be cached in the instruction cache, we can modify it to be the address of our shellcode, and therefore the third instruction will divert the program back to our shellcode. Consequently, the instruction cache is forced to be flushed during this process.

[Figure 1 diagram. Blocks: Self-modifying code; Raise an Exception; Modified codes; Exception Handler. Steps: 1. Modify codes; 2. Form a branch instruction; 3. Jump to the Exception; 4. Jump back to Modified codes.]

Figure 1. Method to flush the instruction cache

IV. EVALUATION

We present an example in this section to demonstrate the effectiveness of our method in practice. The shellcode in the example displays a message box by calling the system API MessageBox, as shown in Fig. 2. The example is tested on three openly available devices:
1) Dopod 818 pro; runs on Windows Mobile 5.0 (evolved from Windows CE 5.0), released in 2005;
2) Samsung i8000; runs on Windows Mobile 6.5 (evolved from Windows CE 5.2), released in 2009;
3) Meizu M8; runs on Meizu OS 0.9.3.8 (evolved from Windows CE 6.0), released in 2008.

The target application is simply modified from a default program generated by Visual Studio 2008. It emulates the classic buffer overflow vulnerability when the program starts up. Our shellcode is then loaded by a code injection attack to display a popup message box. The general shellcode that sets the parameters and calls MessageBox is simple, as shown in Fig. 3.

The Unicode-proof shellcode consists of a total of 744 bytes and can be divided into three sections. The first section consists of 636 bytes, including the self-modifying code and two dummy instructions. The purpose of this section is to create a branching instruction to call MessageBox at the location of the first dummy instruction. To ensure that the shellcode can regain control of the program after the instruction cache is flushed by raising an exception, we modify the call stack to insert the Entry function into the calling chain, and replace the address of XcptFilter with the address of the subsequent section of shellcode, as discussed in Section III. Moreover, in case the CPSR register is modified when the exception arises, we create an instruction at the address of the second dummy instruction to set the Z flag. The second section consists of 56 bytes. It sets the parameters of MessageBox with several Unicode-proof instructions. Finally, the third section is the same as in the general shellcode.
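The sizes quoted above can be cross-checked with a little arithmetic. The following minimal sketch assumes that each dummy instruction occupies one 4-byte ARM word, which follows from the fixed instruction length but is not stated explicitly for the dummies; the constant names are hypothetical.

```python
TOTAL = 744    # whole Unicode-proof shellcode, in bytes
FIRST = 636    # self-modifying code plus two dummy instructions
SECOND = 56    # Unicode-proof parameter setup for MessageBox
ARM_INSN = 4   # assumption: each dummy instruction is one 32-bit word

third = TOTAL - FIRST - SECOND            # bytes left for the third section
self_modifying = FIRST - 2 * ARM_INSN     # first section without the dummies

print(third, self_modifying)  # 52 628
```

Under this reading the third section occupies 52 bytes and the self-modifying code proper occupies 628 bytes.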
From this example, we can see that creating Unicode-proof shellcode is not an easy task: we spent 632 bytes of instructions merely to create a branching instruction (and to make it executable, of course). In practice the length of the Unicode-proof shellcode may be smaller, but not significantly so, as it depends on the contents of the registers and memory when the shellcode takes control. However, to keep the example objective, we chose not to take advantage of a particular program.

[Figure 2 panels: (a) Dopod 818 pro; (b) Samsung i8000; (c) Meizu M8.]

Figure 2. Example execution on real devices

[Figure 3 diagram. Unicode-proof shellcode: self-modifying code (modify LR register (call stack); modify the address of XcptFilter; create an instruction to set Z flag to 1; create a branching instruction to call MessageBox; raise an exception), 628 bytes; dummy instruction, 4 bytes; set parameters of MessageBox, 56 bytes; dummy instruction, 4 bytes; data, 52 bytes. General shellcode: set parameters of MessageBox, 16 bytes; call MessageBox, 4 bytes; data, 52 bytes.]

Figure 3. Structure of Unicode-proof shellcode and General shellcode

V. RELATED WORKS

Building general shellcode on the ARM architecture is possible for Linux [6] and Windows CE [7]. In these papers, self-modifying code is discussed as a way to make the shellcode null-free. However, no cache flush is needed in [6] because only the arguments of the instruction SWI are modified. Hurman [7] provided a method to bypass the instruction cache by using the MRC instruction, which is not Unicode-proof and is privileged after ARMv6.

Building Unicode-proof shellcode is possible on the IA32 architecture [8]. Because the length of instructions is variable, many more instructions can be used compared with the ARM architecture. As such, it is much easier to build Unicode-proof shellcode on the IA32 architecture. Moreover, self-modifying code works correctly according to the Intel architecture processor requirements, so there is no need to flush the cache on the IA32 architecture.

Another well-known application filter accepts only alphanumeric characters as valid input. An approach to building alphanumeric shellcode is available for both the IA32 [9] and ARM [10] architectures. Although instructions consisting only of alphanumeric characters are also limited, many more of them are available than Unicode-proof instructions. According to Younan [10], 0.34% of all 32-bit words can be used for alphanumeric instructions on the ARM architecture, which is 22 times more than what is available for Unicode-proof instructions, and there are even more alphanumeric instructions available on the IA32 architecture. Moreover, the instruction cache can be flushed in alphanumeric shellcode using a single SWI instruction, which is neither Unicode-proof nor available on Windows CE.

VI. CONCLUSION

This paper discussed how to build ARM Unicode-proof shellcode on Windows CE. We demonstrated that attackers can use Unicode-proof instructions to implement code injection attacks on Windows CE, and therefore certain bugs that previously only led to a device crash should now be considered highly dangerous vulnerabilities. Unicode-proof shellcode also helps an attacker to evade intrusion detection systems that try to detect the existence of shellcode in input from the network. This is especially important because the security of mobile devices has already become a target of scrutiny in today's networks. Building such shellcode is a non-trivial task because 1) only 26 instructions, with operands restricted to a small set of values, are Unicode-proof; and 2) the instruction cache in the ARM processor prevents the execution of self-modifying code. Our approach is applicable not only to all versions of Windows CE, but also to the systems evolved from Windows CE, including Windows Mobile and even Windows Phone.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (60773135, 60970140, 90718007) and the High Technology Research and Development Program of China (863 Program) (2007AA01Z427, 2007AA01Z450).

REFERENCES

[1] M. Becher and R. Hund. Kernel-Level Interception and Applications on Mobile Devices. http://pi1.informatik.uni-mannheim.de/filepool/publications/TR-2008-003.pdf, May 2008.
[2] J. Mason, S. Small, F. Monrose, and G. MacManus. English shellcode. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), 2009.
[3] M. Becher, F. C. Freiling, and B. Leidner. On the Effort to Create Smartphone Worms in Windows Mobile. In Information Assurance and Security Workshop, 2007, pages 199-206, 20-22 June 2007.
[4] C. Mulliner and G. Vigna. Vulnerability Analysis of MMS User Agents. In Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC '06), 2006.
[5] Microsoft Corp. Linking to the CRT. http://msdn.microsoft.com/en-us/library/ms859584.aspx
[6] funkysh. Into my ARMs: Developing StrongARM/Linux shellcode. Phrack, 58, Dec. 2001.
[7] T. Hurman. Exploring Windows CE shellcode, June 2005. http://www.pentest.co.uk/documents/exploringwce/exploring_wce_shellcode.html
[8] Obscou. Building IA32 'Unicode-Proof' Shellcodes. Phrack, 61, Aug. 2003.
[9] rix. Writing IA32 alphanumeric shellcodes. Phrack, 57, Aug. 2001.
[10] Y. Younan, et al. Filter-resistant code injection on ARM. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS '09), 2009.
