IZBR

Hi! My name is Isaac Basque-Rice. I'm a Security Engineer and former Abertay Ethical Hacker, and this website is a repository for all the cool stuff I've done. Enjoy!



Assembly Basics

NOTE: With a few exceptions this guide assumes you have some basic knowledge of computing, such as binary, hexadecimal, memory, and so on.

Background and Overview

Assembly language is essentially the lowest-level programming language that is still reasonable for humans to use. “Low-level”, in this instance, refers to the fact that the instructions the developer writes and what is actually run on the machine correspond extremely closely to one another, and hence you are “closer to the metal” of the computer itself.

The main thing that must be borne in mind for ASM, however, is that your mode of thinking has to be different. In order to read and understand ASM, you need to be able to logically execute instructions step by step. This is no easy task, and not something even I am very good at really, but I hope this guide will help you on the way to mastering this thing!

There are different assembly languages for different CPU architectures. The most common in the modern day are arguably x86 and ARM, with x86 running on most of what would be considered traditional computers (desktop PCs, laptops, and so on) and ARM running on nearly everything else. There are plenty of exceptions, to be sure, but these are the general rules.

Assembly, or asm, is generally converted into machine code by an assembler, such as NASM or MASM.

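Naturally, when working this close to the metal, allocating memory in very specific sizes is of great importance. Processors generally support a fixed set of data sizes; on x86 these are the byte (8 bits), the word (16 bits), the doubleword (32 bits), and the quadword (64 bits).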
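As a rough illustration, and assuming NASM syntax on x86, the data directives below each define one item of a common size. The label names are made up for this sketch.

```nasm
; Sketch only: NASM syntax, hypothetical label names.
section .data
    a_byte   db 0x12                ; byte:       8 bits
    a_word   dw 0x1234              ; word:       16 bits (2 bytes)
    a_dword  dd 0x12345678          ; doubleword: 32 bits (4 bytes)
    a_qword  dq 0x123456789ABCDEF0  ; quadword:   64 bits (8 bytes)
```

Here db, dw, dd, and dq stand for "define byte", "define word", "define doubleword", and "define quadword" respectively.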

Each of these CPUs also contains what are known as “registers”: a small set of data storage locations on the CPU itself. They exist to make storing and transferring data during the execution of a program extremely fast.

On 32-bit x86 there are eight general-purpose registers: four data registers (EAX, EBX, ECX, EDX), two pointer registers (ESP, EBP), and two index registers (ESI, EDI). Each register can be accessed at between two and four sizes, depending on the number of bits the processor supports and the type of register in question. The size is denoted by a prefix: “r-” for 64-bit (RAX; typically a long in C++ on 64-bit Linux), “e-” for 32-bit (EAX; an int), and no prefix for 16-bit (AX; a short). Each register in the data group is further subdivided into two 8-bit registers, “-h” for the high byte and “-l” for the low byte of the 16-bit register (AH, AL, BH, BL, etc.). The registers in the index and pointer groups have only a low 8-bit subdivision, denoted by the “-l” suffix (SIL, DIL, SPL, BPL), and it is available only in 64-bit mode.
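To make the subdivision concrete, here is a small sketch, assuming 32-bit Linux and NASM syntax, that writes to EAX through its 8-bit names and shows how the full register changes. The values are arbitrary and chosen only so the effect is easy to see.

```nasm
; Sketch only: 32-bit Linux, NASM syntax.
section .text
    global _start

_start:
    mov  eax, 0x12345678    ; EAX = 0x12345678
                            ; AX  = 0x5678 (low 16 bits of EAX)
                            ; AH  = 0x56, AL = 0x78 (the two bytes of AX)
    mov  al, 0xFF           ; only the low byte changes:  EAX = 0x123456FF
    mov  ah, 0x00           ; only the second byte changes: EAX = 0x123400FF

    xor  ebx, ebx           ; EBX = 0
    mov  bl, al             ; EBX = 0x000000FF
    mov  eax, 1             ; system call number (sys_exit)
    int  0x80               ; exit, using EBX as the status
```

Writing to AL or AH leaves the other bits of EAX untouched, which is why the upper half (0x1234) survives both writes.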

Syntax

Instructions

With this in mind we can move on to syntax. Each line of assembly can generally be separated into four fields, of which those in square brackets are optional, as can be seen below:

[label]    mnemonic    [operands]    [;comment]

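What do these mean? The label is an optional name for the address of the instruction, so that other code can jump to it or refer to it. The mnemonic is the name of the operation to perform, such as mov or add. The operands are the registers, memory locations, or constant values the operation works on; depending on the instruction there may be none, one, or several. The comment, introduced by a semicolon, is ignored by the assembler.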
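The snippet below is a minimal sketch, assuming 32-bit Linux and NASM syntax, that shows the format above on a few real lines. The label name finish is invented for the example.

```nasm
; Sketch only: 32-bit Linux, NASM syntax.
section .text
    global _start

_start:                     ; a label on a line of its own
    nop                     ; mnemonic only, no operands
    xor   ebx, ebx          ; mnemonic with two operands: EBX = 0
    inc   ebx               ; mnemonic with one operand:  EBX = 1
    dec   ebx               ; EBX = 0 again
finish: mov   eax, 1        ; all four fields: label, mnemonic, operands, comment
    int   0x80              ; sys_exit with status 0 (the value in EBX)
```

With two operands, NASM follows Intel syntax: the destination comes first and the source second.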

Sections

Assembly files can be separated into a number of “sections”, depending on the architecture and operating system on which the code is running. For standalone assembly files these are typically .text (the code), .data (initialised data), and .bss (uninitialised data).
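A minimal sketch of how sections appear in a NASM source file, assuming the conventional .data, .bss, and .text sections on 32-bit Linux. The names answer and scratch are invented for the example.

```nasm
; Sketch only: 32-bit Linux, NASM syntax, hypothetical names.
section .data               ; initialised data
    answer  db 42

section .bss                ; uninitialised data, reserved but not stored in the file
    scratch resb 1          ; reserve one byte

section .text               ; the code itself
    global _start

_start:
    mov  al, [answer]       ; read one byte from .data
    mov  [scratch], al      ; write it into .bss
    xor  ebx, ebx           ; EBX = 0
    mov  bl, [scratch]      ; EBX = 42
    mov  eax, 1             ; system call number (sys_exit)
    int  0x80               ; exit with status 42
```

Running the resulting program and then `echo $?` in the shell should show the value that was copied between the two sections.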

ELF (Executable and Linkable Format) files, the executable format used on Linux, also have the following user sections:

PE files, executables in Windows, contain the following sections:

Code Example (Hello World)

The following is a Hello World code example using what we have learnt so far. It is written in NASM syntax and intended for 32-bit x86 Linux.

section .text
   global _start        ;must be declared for linker (ld)

_start:                 ;tells linker entry point
   mov edx,len          ;message length
   mov ecx,msg          ;message to write
   mov ebx,1            ;file descriptor (stdout)
   mov eax,4            ;system call number (sys_write)
   int 0x80             ;call kernel

   xor ebx,ebx          ;exit status 0 (success)
   mov eax,1            ;system call number (sys_exit)
   int 0x80             ;call kernel

section .data
msg db 'Hello, world!', 0xa     ;string to be printed
len equ $ - msg                 ;length of the string

Compiling This Code

On Linux, this hello world program (saved as hello.asm) can be assembled, linked, and run on an x86 machine using the following commands:

nasm -f elf hello.asm
ld -m elf_i386 -s -o hello hello.o 
./hello

The first command uses the Netwide Assembler (NASM) to assemble the code into a 32-bit ELF object file (-f selects the output format), which is written to hello.o by default. The GNU linker then takes over: -m elf_i386 selects the 32-bit x86 emulation so that the output is a 32-bit executable, -s strips symbol information from the executable, and -o [name] specifies the name of the output file. The final command simply runs the program.

Mnemonic Table

Below is a non-exhaustive list of x86 assembly mnemonics that may be worth remembering, with a description of what each does and further notes where needed.

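| Mnemonic | Description | Further Notes |
| --- | --- | --- |
| ADD | Add | dest = dest + src |
| AND | Logical and | bitwise; consult the logical AND truth table to know what this does precisely |
| CALL | Call subroutine | pushes the return address onto the stack and transfers control of the program to another function |
| DEC | Decrement | synonymous with x-- or x-=1 |
| DIV | Divide (unsigned) | with a 32-bit operand, divides the 64-bit value in EDX:EAX by the operand; the dividend is implied; the quotient goes to EAX and the remainder to EDX |
| IDIV | Signed integer divide | same as DIV but signed |
| IMUL | Signed integer multiply | dest = dest * src (the operands are signed) |
| INC | Increment | synonymous with x++ or x+=1 |
| INT | Interrupt | generates a software interrupt, takes one operand representing said interrupt (e.g. INT 03h) |
| JA | Jump if above | unsigned comparison; same as JNBE |
| JAE | Jump if above or equal | unsigned comparison; same as JNB |
| JB | Jump if below | unsigned comparison; same as JNAE |
| JBE | Jump if below or equal | unsigned comparison; same as JNA |
| JC | Jump if carry | CF = 1 |
| JCXZ | Jump if CX zero | tests the CX register rather than the flags |
| JE | Jump if equal | same as JZ |
| JECXZ | Jump if ECX zero | tests the ECX register rather than the flags |
| JG | Jump if greater | signed comparison; same as JNLE |
| JGE | Jump if greater or equal | signed comparison; same as JNL |
| JL | Jump if less | signed comparison; same as JNGE |
| JLE | Jump if less or equal | signed comparison; same as JNG |
| JMP | Jump | unconditional |
| JNA | Jump if not above | same as JBE |
| JNAE | Jump if not above or equal | same as JB |
| JNB | Jump if not below | same as JAE |
| JNBE | Jump if not below or equal | same as JA |
| JNC | Jump if no carry | CF = 0 |
| JNE | Jump if not equal | same as JNZ |
| JNG | Jump if not greater | same as JLE |
| JNGE | Jump if not greater or equal | same as JL |
| JNL | Jump if not less | same as JGE |
| JNLE | Jump if not less or equal | same as JG |
| JNO | Jump if no overflow | OF = 0 |
| JNS | Jump if no sign (= non-negative) | SF = 0 |
| JNZ | Jump if not zero | ZF = 0; same as JNE |
| JO | Jump if overflow | OF = 1 |
| JS | Jump if sign (= negative) | SF = 1 |
| JZ | Jump if zero | ZF = 1; same as JE |
| MOV | Move (copy) | copies src into dest; used to read and write registers and memory |
| MUL | Multiply (unsigned) | with a 32-bit operand, EDX:EAX = EAX * operand |
| NOP | No operation (0x90) | do nothing |
| NOT | Invert each bit | 1 becomes 0 and vice versa |
| OR | Logical or | bitwise; consult the logical OR truth table to know what this does precisely |
| POP | Pop from stack | removes the value at the top of the stack and stores it in the operand |
| PUSH | Push onto stack | adds the operand to the top of the stack |
| RET | Return from subroutine | pops the return address off the stack and returns control to that location |
| ROL | Rotate left | like SHL but bits shifted out are rotated to the other end |
| ROR | Rotate right | like SHR but bits shifted out are rotated to the other end |
| SAL | Shift arithmetic left | shifts the bits of the destination operand to the left by the number of bits in the count operand; zeros are shifted in |
| SAR | Shift arithmetic right | shifts the bits of the destination operand to the right by the number of bits in the count operand; the sign bit is copied into the vacated bits, so the sign is preserved |
| SHL | Shift logical left | same as SAL |
| SHR | Shift logical right | like SAR, but the vacated bits are filled with zeros |
| SUB | Subtract | dest = dest - src |
| XCHG | Exchange | swaps the values of two operands (two registers, or a register and a memory location) |
| XOR | Logical exclusive or | bitwise; consult the logical XOR truth table to know what this does precisely |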
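A few of these are easier to follow in action. The sketch below, assuming 32-bit Linux and NASM syntax, exercises DIV, the difference between SHR and SAR, and a signed conditional jump. It also uses CMP, which is not in the table above: it subtracts its operands only to set the flags that the conditional jumps read. The label names are invented for the example.

```nasm
; Sketch only: 32-bit Linux, NASM syntax, hypothetical label names.
section .text
    global _start

_start:
    ; DIV: unsigned divide of EDX:EAX by the operand
    xor  edx, edx           ; high 32 bits of the dividend = 0
    mov  eax, 17            ; low 32 bits of the dividend
    mov  ecx, 5             ; divisor
    div  ecx                ; EAX = 3 (quotient), EDX = 2 (remainder)

    ; SHR vs SAR on a negative value
    mov  esi, -8            ; 0xFFFFFFF8
    mov  edi, esi
    shr  esi, 1             ; logical:    0x7FFFFFFC (a zero is shifted in)
    sar  edi, 1             ; arithmetic: 0xFFFFFFFC = -4 (sign preserved)

    ; Conditional jump after a signed comparison
    cmp  edi, 0             ; sets the flags from EDI - 0
    jl   negative           ; taken, because -4 < 0 as a signed value
    mov  ebx, 1             ; not reached in this sketch
    jmp  done
negative:
    mov  ebx, edx           ; exit status = remainder from DIV (2)
done:
    mov  eax, 1             ; system call number (sys_exit)
    int  0x80               ; call kernel
```

Had the jump been JB (unsigned "below") instead of JL, it would not have been taken, since 0xFFFFFFFC is a very large number when read as unsigned.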