Assembly – Basic introduction



Assembly is as low-level as a programming language gets: each instruction corresponds closely to a machine instruction that the processor executes. Because every executable ultimately consists of machine code, which can be disassembled back into assembly, knowing assembly is a powerful way to learn what a specific executable actually does.

For example, if a program encrypts certain types of files and you need to learn how its encryption algorithm works, you would disassemble the program. From there, assuming you know assembly, you may be able to work out what the program does and, more importantly, what the algorithm is, which would allow you to write a decryption routine.

Disassemblers and debuggers usually display addresses and values in hexadecimal, so it helps to know how the hexadecimal digits map to decimal values:

0 = 0, 1 = 1, 2 = 2, 3 = 3, 4 = 4, 5 = 5, 6 = 6, 7 = 7, 8 = 8, 9 = 9
A = 10
B = 11
C = 12
D = 13
E = 14
F = 15

The above maps each digit of base 16, the hexadecimal system, to its value in base 10, the standard decimal system. After F comes 10, which is 16 in decimal.
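The digit table above can be tried out with a few lines of Python. This is only a minimal sketch; the helper names hex_to_decimal and decimal_to_hex are made up for illustration and are not part of any assembler or debugger.

```python
# Convert between hexadecimal text and ordinary integers.
# The function names are hypothetical helpers for this tutorial.

def hex_to_decimal(text: str) -> int:
    """Interpret a string of hex digits (0-9, A-F) as a base-16 number."""
    return int(text, 16)


def decimal_to_hex(value: int, width: int = 8) -> str:
    """Format a non-negative integer as zero-padded, uppercase hex."""
    return format(value, "0{}X".format(width))


print(hex_to_decimal("F"))    # 15
print(hex_to_decimal("10"))   # 16
print(hex_to_decimal("1A"))   # 26
print(decimal_to_hex(5))      # 00000005
```

The eight-digit padding matches the way 32-bit values are usually displayed in a debugger.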

Firstly, assembly is entirely about data manipulation (in general, that's all programming is: manipulating data and directing the hardware to do what you want). Put simply, three things are usually being modified:

1) The stack
2) Registers/Flags
3) The memory of a program

Now, to explain each of the above:

1) The stack is a large stack of numbers, used for handing off parameters to functions, saving registers, and storing other miscellaneous data. The value added last is the first one to be removed.

2) Registers are small storage locations inside the processor, used for carrying out operations (comparing data, arithmetic, logical operations, etc). They hold numbers or addresses, from as small as 8 bits all the way up to 32 bits (wider registers exist, but the 32-bit ones are enough for this introduction). Flags are single bits in a special flags register that record the outcome of the last operation (e.g.: the overflow flag, or OF, changes from 0 to 1 if the signed result of an arithmetic operation is too large for the destination register to hold; adding 1 to the largest positive 32-bit value, 7FFFFFFF, sets OF to 1).

3) Data in the program's memory is constantly being modified. The stack and registers can hold only so much at once, so in many cases it is more efficient to keep data in the program's own memory and modify it there. Note that such changes exist only in memory: if you were to modify a running program so that it displayed a popup every 15 minutes, the change would be lost the moment the program exited, and the popup would no longer appear when you re-opened it later.
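To make the overflow flag from point 2 more concrete, here is a small Python model of a 32-bit addition. It is a sketch under stated assumptions: the function name add32 is hypothetical, and only the overflow flag is modelled, not the other flags a real processor updates.

```python
MASK32 = 0xFFFFFFFF   # keeps a value within 32 bits
SIGN32 = 0x80000000   # the sign bit of a 32-bit value


def add32(a: int, b: int):
    """Add two 32-bit values and return (result, overflow_flag)."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Signed overflow: both operands have the same sign,
    # but the sign of the result is different.
    overflow = 1 if ((a ^ result) & (b ^ result) & SIGN32) else 0
    return result, overflow


# Largest positive signed value plus one no longer fits: OF = 1.
print(add32(0x7FFFFFFF, 1))
# FFFFFFFF is -1 when read as signed, so -1 + 1 = 0 fits: OF = 0.
print(add32(0xFFFFFFFF, 1))
```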

Modifying the stack is done in a number of ways, the most common being the PUSH and POP instructions. In assembly, each line is an instruction, which usually takes between zero and two parameters.

The PUSH instruction accepts one parameter, which is added to the top of the stack. For example:

PUSH 5

The above would push the value 5 onto the stack, so that it would look like this:

00000005

Now, it should be mentioned that a function usually starts by pushing the stack base pointer (another type of register, which will be explained further later on) onto the stack, so that its old value is saved before the register is reused as a reference point for accessing the stack. Therefore, at the beginning of most functions/programs, you'll find the following line:

PUSH EBP

Assuming EBP holds 0 and the stack was empty, this causes the stack to look like this:

00000000

From there, we can push our data onto the stack with PUSH 5:

00000005
00000000

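Or, we can save one of the registers by pushing it:

PUSH EAX

NOTE: EAX is an example of a 32-bit register - a full list of available registers and what each one is used for will be covered later.

Assuming the value of EAX was 7C90FFDD, the stack will look like this, with the newest value on top:

7C90FFDD
00000005
00000000

A later POP EAX removes the top value from the stack and places it back into EAX, restoring the register.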

That covers standard modification of the stack - we'll cover more later, such as how functions access certain portions of the stack for parameters being handed off, etc.
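The PUSH and POP behaviour can be modelled with a short Python class. This is a minimal sketch, not how a processor implements the stack: the class name Stack and its methods are hypothetical, EBP is assumed to hold 0, and the listing shows the top of the stack first.

```python
class Stack:
    """A toy model of the stack holding 32-bit values."""

    def __init__(self):
        self._items = []          # last element is the top of the stack

    def push(self, value: int) -> None:
        """PUSH: place a value on top of the stack."""
        self._items.append(value & 0xFFFFFFFF)

    def pop(self) -> int:
        """POP: remove the top value and return it."""
        return self._items.pop()

    def dump(self):
        """List the stack contents, top of the stack first."""
        return [format(v, "08X") for v in reversed(self._items)]


stack = Stack()
ebp = 0x00000000        # assumed value of EBP
eax = 0x7C90FFDD

stack.push(ebp)         # PUSH EBP
stack.push(5)           # PUSH 5
stack.push(eax)         # PUSH EAX (saves the register)
print(stack.dump())     # ['7C90FFDD', '00000005', '00000000']

eax = 0                 # EAX gets overwritten by other work...
eax = stack.pop()       # POP EAX (restores the saved value)
print(format(eax, "08X"), stack.dump())
```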

There are many varying types of registers, but to explain the bare basics, we'll start with the general purpose registers. Note that the following are all prefixed with the letter E to show that they are extended (32-bit) registers. Dropping the E gives the 16-bit register, so the 16-bit register for EAX is AX:

EAX - Accumulator Register
EBX - Base Register
ECX - Counter Register (Used for looping)
EDX - Data Register (Used in multiplication and division)
ESI - Source (Used in memory operations)
EDI - Destination (Used in memory operations)

The first four registers each have two 8-bit sub-registers: for EAX, whose 16-bit form is AX, the 8-bit registers are AH (the high byte of AX) and AL (the low byte of AX); likewise, for (E)BX the 8-bit registers are BH and BL, etc. In 32-bit code, ESI and EDI only have the 16-bit forms SI and DI. When referencing pointers, it is important to keep the different register sizes in mind.

Processor Registers

There are ten 32-bit and six 16-bit processor registers in IA-32 architecture. The registers are grouped into three categories:

  • General registers
  • Control registers
  • Segment registers

The general registers are further divided into the following groups:

  • Data registers
  • Pointer registers
  • Index registers

Data Registers

Four 32-bit data registers are used for arithmetic, logical, and other operations. These 32-bit registers can be used in three ways:

  1. As complete 32-bit data registers: EAX, EBX, ECX, EDX.
  2. Lower halves of the 32-bit registers can be used as four 16-bit data registers: AX, BX, CX and DX.
  3. Lower and higher halves of the above-mentioned four 16-bit registers can be used as eight 8-bit data registers: AH, AL, BH, BL, CH, CL, DH, and DL.
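The three views of a data register listed above are just different slices of the same 32 bits. A minimal Python sketch, using hypothetical helper names split_register and write_al, shows how AX, AH and AL relate to EAX:

```python
def split_register(eax: int):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # upper 8 bits of AX
    al = ax & 0xFF           # lower 8 bits of AX
    return ax, ah, al


def write_al(eax: int, value: int) -> int:
    """Model writing to AL: only the lowest 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = split_register(0x7C90FFDD)
print(format(ax, "04X"), format(ah, "02X"), format(al, "02X"))  # FFDD FF DD
print(format(write_al(0x7C90FFDD, 0x12), "08X"))                # 7C90FF12
```

The same slicing applies to EBX, ECX and EDX with their BX/BH/BL, CX/CH/CL and DX/DH/DL parts.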

Some of these data registers have specific use in arithmetical operations.

  • AX is the primary accumulator; it is used in input/output and most arithmetic instructions. For example, in a multiplication operation, one operand is stored in the EAX, AX, or AL register according to the size of the operand.
  • BX is known as the base register, as it can be used in indexed addressing.
  • CX is known as the count register, as the ECX and CX registers store the loop count in iterative operations.
  • DX is known as the data register. It is also used in input/output operations, and together with the AX register for multiply and divide operations involving large values.
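As an illustration of the data register working together with the accumulator, the 32-bit unsigned MUL instruction leaves a 64-bit product split across EDX (upper half) and EAX (lower half). The Python function below is a sketch of that split only; the name mul32 is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, operand: int):
    """Model an unsigned 32-bit MUL: return (EDX, EAX) after EAX * operand."""
    product = (eax & MASK32) * (operand & MASK32)   # up to 64 bits wide
    edx = (product >> 32) & MASK32                  # upper 32 bits
    eax = product & MASK32                          # lower 32 bits
    return edx, eax


print(mul32(3, 4))                 # small product: EDX stays 0
edx, eax = mul32(0xFFFFFFFF, 2)    # product needs more than 32 bits
print(format(edx, "08X"), format(eax, "08X"))   # 00000001 FFFFFFFE
```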

Pointer Registers

The pointer registers are the 32-bit EIP, ESP, and EBP registers and their corresponding lower 16-bit portions IP, SP, and BP. There are three pointer registers −

Instruction Pointer (IP) − The 16-bit IP register stores the offset address of the next instruction to be executed. IP in association with the CS register (as CS:IP) gives the complete address of that instruction in the code segment.

Stack Pointer (SP) − The 16-bit SP register provides the offset value within the program stack. SP in association with the SS register (SS:SP) refers to the current position of data or address within the program stack.

Base Pointer (BP) − The 16-bit BP register mainly helps in referencing the parameter variables passed to a subroutine. The address in SS register is combined with the offset in BP to get the location of the parameter. BP can also be combined with DI and SI as base register for special addressing.

Modifying Registers

Modifying registers is essential for loading data from/to the stack or from/to data in the program memory. The most used instruction for loading data into a register is the MOV instruction.

To load what's stored at the address 01009000 into register EAX:

MOV EAX, DWORD PTR DS:[01009000]

One new thing was introduced on top of the MOV instruction and the EAX register:

DWORD PTR DS:[Address]

DWORD is a 32-bit value. PTR stands for "pointer", meaning that the data stored at address 01009000 is being loaded, not the number 01009000 itself. DS stands for "data segment", meaning the address is resolved through the DS segment register, which normally covers the program's data (such as the .data section).
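The difference between loading the data at an address and loading the address itself can be sketched in Python. Everything here is an assumption made for illustration: the memory is a small bytearray pretending to start at 01009000, the stored value 7C90FFDD is arbitrary, and load_dword is a hypothetical helper. The one x86 fact relied on is that multi-byte values are stored least significant byte first (little-endian).

```python
import struct

BASE = 0x01009000            # assumed start address of our toy memory
memory = bytearray(16)       # 16 bytes of pretend program memory

# Store the 32-bit value 7C90FFDD at address 01009000, little-endian.
struct.pack_into("<I", memory, 0, 0x7C90FFDD)


def load_dword(address: int) -> int:
    """Read the 32-bit value stored at the given address."""
    offset = address - BASE
    if not 0 <= offset <= len(memory) - 4:
        raise ValueError("address outside the toy memory")
    (value,) = struct.unpack_from("<I", memory, offset)
    return value


eax = load_dword(0x01009000)   # like MOV EAX, DWORD PTR DS:[01009000]
print(format(eax, "08X"))      # the data stored at the address

eax_immediate = 0x01009000     # like MOV EAX, 01009000
print(format(eax_immediate, "08X"))   # just the number itself
```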

To expand, there are six "segment registers", pointing to the segments in the executable. The four you will see most often are:

CS - Code Segment (References anything in the .code section)
DS - Data Segment (References anything in the .data section)
SS - Stack Segment (References the stack)
ES - Extra Segment (Rarely used)

The remaining two, FS and GS, are additional extra segments.

There are also three pointer registers (one of them, EBP, was already referenced earlier):

EBP - Base Pointer (Points to the base of the current function's portion of the stack)
ESP - Stack Pointer (Points to the top of the stack)
EIP - Instruction Pointer (Points to the address of the next instruction)

Now, apart from the MOV instruction, there is also the LEA instruction. LEA (Load Effective Address) calculates the address described by its second parameter and stores that address in a register, without reading the memory at that address. It's used for loading pointers into registers, and because the address calculation may include addition and multiplication, it can even be used for simple math (NOTE: whereas MOV can also store data into memory, LEA can only modify registers).

The syntax is the same as for MOV:

LEA EAX, DWORD PTR SS:[EBP-4]

Note that the stack is being referenced - [EBP-4] is the location 4 bytes below the base pointer, which is the line directly above it in the stack listing. LEA places the address of that line in EAX, not its contents. A better example of LEA would be:

LEA EAX, [EAX+EBX*4+256]
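The address calculation in the LEA instruction above is plain arithmetic, which a few lines of Python can mimic. This is a sketch with assumptions: the function name lea and the starting register values are invented, and 256 is treated as a decimal number here, although a disassembler would typically print such a constant in hexadecimal.

```python
def lea(base: int, index: int, scale: int, displacement: int) -> int:
    """Compute base + index*scale + displacement as a 32-bit address.

    No memory is read; only the address itself is calculated.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


eax = 0x1000   # assumed starting value of EAX
ebx = 3        # assumed starting value of EBX

eax = lea(eax, ebx, 4, 256)    # like LEA EAX, [EAX+EBX*4+256]
print(format(eax, "08X"))      # 0000110C
```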

Note the use of multiplication via the asterisk, and even addition between registers. Now, onto the easy math operations:

  • ADD destination, source - Adds the "source" to the "destination", leaving the result in the "destination".
  • SUB destination, source - Subtracts the "source" from the "destination", leaving the result in the "destination".
  • SAL destination, source - Shifts the bits of the destination to the left source times; each shift doubles the value (e.g.: 5 shifted once to the left would turn into 10, and shifted twice into 20).
  • SAR destination, source - Shifts the bits of the destination to the right source times, keeping the sign; each shift halves the value, rounding down (e.g.: 15 shifted once to the right would turn into 7, and shifted twice into 3).
  • INC destination - Increment the destination (Add one to the given value).
  • DEC destination - Decrement the destination (Subtract one from the given value).
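The math operations listed above can be modelled in Python on 32-bit values. This is a rough sketch: the function names simply mirror the instruction names, flags are not modelled, and results wrap around at 32 bits as they would in a register.

```python
MASK32 = 0xFFFFFFFF


def add(dest: int, src: int) -> int:
    return (dest + src) & MASK32


def sub(dest: int, src: int) -> int:
    return (dest - src) & MASK32


def sal(dest: int, count: int) -> int:
    """Shift left; bits pushed past bit 31 are lost."""
    return (dest << (count & 31)) & MASK32


def sar(dest: int, count: int) -> int:
    """Shift right while keeping the sign bit (arithmetic shift)."""
    signed = dest - 0x100000000 if dest & 0x80000000 else dest
    return (signed >> (count & 31)) & MASK32


def inc(dest: int) -> int:
    return add(dest, 1)


def dec(dest: int) -> int:
    return sub(dest, 1)


print(sal(5, 1))                         # 10
print(sar(15, 1))                        # 7
print(format(sar(0xFFFFFFF0, 2), "08X")) # FFFFFFFC (-16 becomes -4)
print(format(dec(0), "08X"))             # FFFFFFFF (wraps around)
```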

The final important pieces in the basics of assembly are conditional statements (if condition then statement, if not condition then statement, etc) and looping. For comparing data, the CMP instruction is used:

CMP EAX, 1

Now, CMP only records the outcome of the comparison in the flags; a conditional jump placed after it decides what happens next. Depending on whether EAX is greater than, less than, or equal to the number 1 (or the opposite of any of these), a jump to a specific address is made. If the condition is not met, nothing is done and execution continues with the next instruction.

CMP EAX, 1
JE 00401000

  • jge - Jump if they're greater or equal (signed comparison, so negative numbers are handled).
  • jg - Jump if they're greater than (signed).
  • jle - Jump if they're less or equal (signed).
  • jl - Jump if they're less (signed).
  • je - Jump if they're equal ; This jump and jne work for signed and unsigned numbers alike.
  • jne - Jump if they're not equal.
  • jae - Jump if they're above or equal (unsigned comparison, so every value is treated as non-negative).
  • ja - Jump if they're above (unsigned).
  • jbe - Jump if they're below or equal (unsigned).
  • jb - Jump if they're below (unsigned).
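The "greater/less" jumps and the "above/below" jumps differ in whether the compared values are read as signed or unsigned numbers. The Python sketch below models just that decision; the function names are hypothetical stand-ins for a CMP followed by the named jump, and each returns True when the jump would be taken.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Read a 32-bit pattern as a signed number."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def jg_taken(a: int, b: int) -> bool:
    """CMP a, b followed by JG: signed 'greater than'."""
    return to_signed32(a) > to_signed32(b)


def ja_taken(a: int, b: int) -> bool:
    """CMP a, b followed by JA: unsigned 'above'."""
    return (a & MASK32) > (b & MASK32)


eax = 0xFFFFFFFF   # -1 when read as signed, 4294967295 when unsigned

print(jg_taken(eax, 1))   # False: -1 is not greater than 1
print(ja_taken(eax, 1))   # True: 4294967295 is above 1
```

Choosing the wrong family of jumps is a common source of bugs when a value can have its highest bit set.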

The other instruction for comparing two numbers is TEST, which performs the same operation as AND, but rather than storing the result, it only lets the next instructions check whether the result of the AND was zero or not.

  • jz - Jump if the result was zero.
  • jnz - Jump if the result was not zero.
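What TEST and the two jumps decide can be modelled in a few lines of Python. This is a minimal sketch; zero_flag_after_test is a hypothetical name, and only the zero flag is modelled.

```python
def zero_flag_after_test(a: int, b: int) -> int:
    """Return the zero flag after TEST a, b: 1 if (a AND b) is zero."""
    return 1 if (a & b) & 0xFFFFFFFF == 0 else 0


def jz_taken(zero_flag: int) -> bool:
    return zero_flag == 1


def jnz_taken(zero_flag: int) -> bool:
    return zero_flag == 0


zf = zero_flag_after_test(1, 1)   # 1 AND 1 = 1, not zero
print(zf, jnz_taken(zf))          # 0 True

zf = zero_flag_after_test(2, 1)   # 2 AND 1 = 0
print(zf, jz_taken(zf))           # 1 True
```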

Assume EAX is 00000001:

TEST EAX, 1
JNZ 00401000

Since the value of EAX is 1 and the test value is 1, the AND gives 1, which is not zero, so the jump will occur. Now, these tactics can also be used to repeat steps, for example:

0100739D   MOV EAX,0
010073A2   CMP EAX,5
010073A5   JE 010073B1
010073AB   INC EAX
010073AC   JMP 010073A2
010073B1   RETN

The EAX register is set to zero, then EAX is compared to 5 - if EAX has the value 5, execution jumps to the RETN instruction, to exit the function. Otherwise, execution continues: INC EAX adds 1 to EAX, and the JMP goes back to the CMP. This repeats until eventually EAX is 5, and the JE jumps to the RETN.
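Written in Python, the counting loop described above looks roughly like this. It is a sketch of the control flow only; the function name run_loop and the iteration counter are additions for illustration.

```python
def run_loop():
    """Mirror the MOV / CMP / JE / INC / JMP / RETN sequence."""
    eax = 0                  # MOV EAX,0
    iterations = 0           # not in the assembly; counts loop passes
    while True:
        if eax == 5:         # CMP EAX,5 then JE to the RETN
            break
        eax += 1             # INC EAX
        iterations += 1      # JMP back to the CMP
    return eax, iterations   # RETN


print(run_loop())   # (5, 5)
```

A higher-level language would express the same thing as a simple counting loop; the assembly version spells out the compare and the two jumps explicitly.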