Hi! My name is Isaac Basque-Rice. I'm a Security Engineer and former Abertay Ethical Hacker, and this website is a repository for all the cool stuff I've done. Enjoy!
NOTE: With a few exceptions, this guide assumes you have some basic knowledge of computing, such as binary, hexadecimal, memory, and so on.
Assembly language is essentially the lowest-level programming language that is at least somewhat reasonable for humans to use. “Low-level”, in this instance, refers to the fact that the instructions the developer writes and what is actually run on the machine correspond extremely closely to one another, and hence you are “closer to the metal” of the computer itself.
The main thing to bear in mind with ASM, however, is that your mode of thinking has to be different. To read and understand ASM, you need to be able to execute instructions logically, step by step, in your head. This is no easy task, and not something even I am very good at really, but I hope this guide will help you on the way to mastering it!
There are different assembly languages for different CPU architectures. Arguably the most common today are x86 and ARM: x86 processors run most of what would be considered traditional computers (desktop PCs, laptops, and so on), and ARM processors run nearly everything else. There are plenty of exceptions, to be sure, but these are the general rules.
Assembly, or asm, is converted into machine code by an assembler such as NASM or MASM; the resulting object file is then turned into an executable by a linker.
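Naturally, when working this close to the metal, using exactly the right size of memory is of great importance. Processors generally support the following data sizes:

Name | Size | NASM directive |
---|---|---|
Byte | 8 bits | db |
Word | 16 bits | dw |
Doubleword | 32 bits | dd |
Quadword | 64 bits | dq |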
Each of these CPUs also contains what are known as “registers”: small data storage locations on the CPU itself, present to allow extremely fast storage and transfer of data while a program executes.
On x86 processors there are eight general-purpose registers: four dedicated to storing data, two pointers, and a final two which are indexes (64-bit x86 adds eight more, R8 to R15). Each register can be accessed at between two and four sizes, depending on the number of bits the processor supports and the type of register being considered. The size is denoted by the prefix “r-” for 64 bits (comparable to `long long` in C++), “e-” for 32 bits (`int`), and no prefix for 16 bits (`short`). Each register in the data group is further subdivided into two 8-bit registers, “-h” for bits 8 to 15 and “-l” for bits 0 to 7 (AH, AL, BH, BL, etc.). The registers in the index and pointer groups have 8-bit subdivisions only in 64-bit mode, and only for the low byte (denoted by the “-l” suffix):

Group | 64-bit | 32-bit | 16-bit | 8-bit |
---|---|---|---|---|
Data | RAX, RBX, RCX, RDX | EAX, EBX, ECX, EDX | AX, BX, CX, DX | AH/AL, BH/BL, CH/CL, DH/DL |
Pointer | RSP, RBP | ESP, EBP | SP, BP | SPL, BPL (64-bit only) |
Index | RSI, RDI | ESI, EDI | SI, DI | SIL, DIL (64-bit only) |
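To make the overlap between register names concrete, here is a small Python model, a sketch rather than how any real tool represents registers, in which EAX, AX, AH and AL are computed as bit-masked views of a single 64-bit RAX value. The helper names `eax`, `ax`, `ah` and `al` are hypothetical, chosen for illustration.

```python
# Toy model: EAX, AX, AH and AL are not separate storage, just different-width
# views of the low bits of RAX. Helper names are hypothetical.

def eax(rax: int) -> int:
    return rax & 0xFFFF_FFFF      # low 32 bits

def ax(rax: int) -> int:
    return rax & 0xFFFF           # low 16 bits

def ah(rax: int) -> int:
    return (rax >> 8) & 0xFF      # bits 8 to 15

def al(rax: int) -> int:
    return rax & 0xFF             # bits 0 to 7

rax = 0x1122_3344_5566_7788
for name, view in [("EAX", eax), ("AX", ax), ("AH", ah), ("AL", al)]:
    print(f"{name:>3} = {view(rax):#x}")
# EAX = 0x55667788
#  AX = 0x7788
#  AH = 0x77
#  AL = 0x88
```

Because the narrower names are views onto the same bits, writing to AL on a real CPU also changes the low byte of AX, EAX and RAX.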
With this in mind, we can move on to syntax. Each assembly instruction can generally be separated into four fields, as shown below:

```
[label] mnemonic [operands] [;comment]
```
What do these mean?
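- `label:` a named label marks a position in the code. These are found in the symbol table.
- `8:` a numeric local label. These are used when a quick reference is needed purely locally, normally for only one or two operations.
- `mnemonic`: the instruction itself, such as `mov` or `add`.
- `operands`: the values the instruction works on. These come in two kinds, source and destination. If, for example, one were to add the values 1 and 2, where 2 was the source and 1 was the destination, the destination would be overwritten with (1+2), or 3, while the source would be left unchanged.
- `;comment`: everything after the semicolon is ignored by the assembler.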
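In Intel syntax, used by NASM and MASM, the destination operand comes first:

```nasm
add eax, ebx ; add contents of ebx to eax
```

In AT&T syntax, used by the GNU assembler, the order is reversed, registers take a `%` prefix, and comments start with `#`:

```asm
add %ebx, %eax # add contents of ebx to eax
```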
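The two instructions above perform the same addition; only the operand order and notation differ. The following toy Python interpreter is a sketch that understands nothing but a register-to-register `add`; the function `run_add` and its `syntax` parameter are hypothetical. It picks the destination according to the syntax, overwrites only the destination, and wraps the result at 32 bits as a 32-bit register would.

```python
MASK32 = 0xFFFF_FFFF  # results wrap at 32 bits, like a real 32-bit register

def run_add(line: str, regs: dict, syntax: str) -> None:
    """Toy model: execute `add reg, reg` written in Intel or AT&T syntax."""
    mnemonic, operands = line.split(maxsplit=1)
    if mnemonic != "add":
        raise ValueError(f"only add is modelled, got {mnemonic!r}")
    # Strip whitespace and the AT&T '%' register prefix.
    first, second = (op.strip().lstrip("%") for op in operands.split(","))
    if syntax == "intel":      # Intel (NASM/MASM): destination first
        dest, src = first, second
    elif syntax == "att":      # AT&T (GNU as): source first, destination last
        dest, src = second, first
    else:
        raise ValueError(f"unknown syntax {syntax!r}")
    # Only the destination is overwritten; the source keeps its value.
    regs[dest] = (regs[dest] + regs[src]) & MASK32

intel_regs = {"eax": 1, "ebx": 2}
run_add("add eax, ebx", intel_regs, "intel")
att_regs = {"eax": 1, "ebx": 2}
run_add("add %ebx, %eax", att_regs, "att")
print(intel_regs)   # {'eax': 3, 'ebx': 2}
print(att_regs)     # {'eax': 3, 'ebx': 2}
```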
Assembly files can be separated into numerous “sections”, depending on the architecture and operating system on which the code is running. For standalone assembly files, these sections are as follows:

- `section .data`: initialised data, that is, values known before the program runs. Strings, file names, buffer sizes, and so on are stored in here.
- `section .bss`: space reserved for uninitialised variables, which is zero-filled when the program loads and given values at runtime.
- `section .text`: the logic of the code. The program must also declare `global _start`, followed by the `_start:` label at the first instruction, so the linker can record it as the entry point where the kernel begins execution.

ELF (Executable and Linkable Format) files, the standard executable format on Linux, also have the following special sections:
- `.comment`: version control information and other comments, such as compiler information.
- `.data` and `.data1`: initialised read/write data.
- `.debug`: debugging information.
- `.fini`: runtime finalisation instructions.
- `.init`: runtime initialisation instructions.
- `.line`: line number information for symbolic debugging.
- `.note`: various notes for a whole slew of things (consult the elf(5) man page for further information).
- `.rodata` and `.rodata1`: read-only data for non-writable segments.

PE (Portable Executable) files, the executable format used by Windows, typically contain the following sections:
- `.edata`: the export directory for an application or DLL.
- `.idata`: data for imported functions.
- `.rdata`: a read-only version of `.data`.
- `.rsrc`: resources such as strings, icons, and so on.

The following Hello World example, intended for 32-bit x86 Linux, uses what we have learnt so far:
```nasm
section .text
    global _start       ;must be declared for linker (ld)
_start:                 ;tells linker entry point
    mov edx,len         ;message length
    mov ecx,msg         ;message to write
    mov ebx,1           ;file descriptor (stdout)
    mov eax,4           ;system call number (sys_write)
    int 0x80            ;call kernel

    mov ebx,0           ;exit status 0 (ebx still held 1 from the write)
    mov eax,1           ;system call number (sys_exit)
    int 0x80            ;call kernel

section .data
msg db 'Hello, world!', 0xa  ;string to be printed
len equ $ - msg              ;length of the string
```
On Linux, this Hello World program (saved as `hello.asm`) can be assembled, linked, and run on x86 using the following commands:
```shell
nasm -f elf hello.asm
ld -m elf_i386 -s -o hello hello.o
./hello
```
The first command uses the Netwide Assembler (NASM) to assemble the code into a 32-bit ELF object file (`-f` for format), which NASM names `hello.o` by default. After this, the GNU linker, `ld`, takes over: `-m elf_i386` selects i386 (32-bit x86) emulation, `-s` strips symbol information from the executable, and `-o [name]` specifies the name of the output file. The final command simply runs the program.
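The `len equ $ - msg` line works because `$` evaluates to the current assembly position, which on that line is just past the last byte of `msg`, so subtracting the start of `msg` leaves its length in bytes. The Python sketch below mirrors that arithmetic using a hypothetical load address, then performs a roughly equivalent write with `os.write`, which on Linux ends up in the same `write` system call to file descriptor 1.

```python
import os

# The bytes emitted by `msg db 'Hello, world!', 0xa`
msg = b"Hello, world!" + bytes([0x0A])

# Hypothetical address where msg is placed, for illustration only.
msg_start = 0x0804_A000
# `$` on the `len equ` line: the position just after the last byte of msg.
dollar = msg_start + len(msg)
length = dollar - msg_start          # `len equ $ - msg`
print(length)                        # 14

# Roughly what the sys_write call does: write `length` bytes of msg to fd 1 (stdout).
written = os.write(1, msg[:length])  # returns the number of bytes written
```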
Below is a non-exhaustive list of x86 mnemonics that may be worth remembering, what they do, and a few notes where needed.
Mnemonic | Description | Further Notes |
---|---|---|
ADD | Add | dest = dest + src |
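AND | Logical AND | bitwise AND of dest and src, stored in dest (see an AND truth table) |
CALL | Call subroutine | pushes the return address onto the stack, then transfers control of the program to another function |
DEC | Decrement | synonymous with x-- or x -= 1 |
DIV | Unsigned divide | with a 32-bit operand, divides the 64-bit value in EDX:EAX by the operand, leaving the quotient in EAX and the remainder in EDX; the dividend is implied, so only the divisor is written (other operand sizes use AX, DX:AX or RDX:RAX) |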
IDIV | Signed integer divide | same as DIV but signed |
IMUL | Signed integer multiply | dest = dest * src (the operands are signed) |
INC | Increment | synonymous with x++ or x+=1 |
INT | Interrupt | generates a software interrupt; takes one operand, the interrupt number (e.g. INT 03h, or INT 0x80 for 32-bit Linux system calls) |
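JA | Jump if above | unsigned comparison; same as JNBE |
JAE | Jump if above or equal | unsigned; same as JNB and JNC |
JB | Jump if below | unsigned; same as JNAE and JC |
JBE | Jump if below or equal | unsigned; same as JNA |
JC | Jump if carry | same as JB |
JCXZ | Jump if CX zero | tests CX directly rather than the flags |
JE | Jump if equal | same as JZ |
JECXZ | Jump if ECX zero | tests ECX directly rather than the flags |
JG | Jump if greater | signed comparison; same as JNLE |
JGE | Jump if greater or equal | signed; same as JNL |
JL | Jump if less | signed; same as JNGE |
JLE | Jump if less or equal | signed; same as JNG |
JMP | Jump | unconditional |
JNA | Jump if not above | same as JBE |
JNAE | Jump if not above or equal | same as JB |
JNB | Jump if not below | same as JAE |
JNBE | Jump if not below or equal | same as JA |
JNC | Jump if no carry | same as JAE |
JNE | Jump if not equal | same as JNZ |
JNG | Jump if not greater | same as JLE |
JNGE | Jump if not greater or equal | same as JL |
JNL | Jump if not less | same as JGE |
JNLE | Jump if not less or equal | same as JG |
JNO | Jump if no overflow | |
JNS | Jump if no sign (= non-negative) | |
JNZ | Jump if not zero | same as JNE |
JO | Jump if overflow | |
JS | Jump if sign (= negative) | |
JZ | Jump if zero | same as JE |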
MOV | Move (copy) | dest = src; copies data between registers and memory (but not directly from memory to memory) |
MUL | Unsigned multiply | with a 32-bit operand, EDX:EAX = EAX * operand |
NOP | No operation (0x90) | do nothing |
NOT | Invert each bit | 1 becomes 0 and vice versa |
OR | Logical OR | bitwise OR of dest and src, stored in dest (see an OR truth table) |
POP | Pop from stack | copies the value at the top of the stack into the operand, then increments the stack pointer (ESP) |
PUSH | Push onto stack | decrements the stack pointer (ESP), then stores the operand at the new top of the stack |
RET | Return from subroutine | pops the return address off the stack and returns control to that location |
ROL | Rotate left | like SHL, but bits shifted out of the top re-enter at the bottom |
ROR | Rotate right | like SHR, but bits shifted out of the bottom re-enter at the top |
SAL | Shift arithmetic left | shift bits of the destination operand to the left by the number of bits in the count operand; zeros fill the vacated low bits |
SAR | Shift arithmetic right | shift bits of the destination operand to the right by the number of bits in the count operand; copies of the sign bit fill the vacated high bits, preserving the sign |
SHL | Shift logical left | same as SAL |
SHR | Shift logical right | like SAR, but zeros fill the vacated high bits, so the operand is treated as unsigned |
SUB | Subtract | dest = dest - src |
XCHG | Exchange | swaps the values of its two operands (two registers, or a register and memory) |
XOR | Logical exclusive OR | bitwise XOR of dest and src, stored in dest; xor eax, eax is a common way to zero a register |
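The implied operands of DIV and IDIV are easy to get wrong, so here is a minimal Python model of the 32-bit forms, assuming EDX:EAX has already been set up (for IDIV, typically by sign-extending EAX with the CDQ instruction). The function names are hypothetical, and the CPU's divide error (#DE) is modelled with Python exceptions. Note that IDIV truncates the quotient toward zero, whereas Python's own `//` rounds toward negative infinity, which is why the signed case does not simply call `divmod`.

```python
MASK32 = 0xFFFF_FFFF

def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def div32(edx, eax, divisor):
    """Unsigned `div r/m32`: EDX:EAX / divisor -> (new EAX, new EDX)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error (#DE)")
    quotient, remainder = divmod((edx << 32) | eax, divisor)
    if quotient > MASK32:                      # the quotient must fit in EAX
        raise OverflowError("divide error (#DE)")
    return quotient, remainder

def idiv32(edx, eax, divisor):
    """Signed `idiv r/m32`: quotient truncated toward zero -> (new EAX, new EDX)."""
    dividend = to_signed((edx << 32) | eax, 64)
    d = to_signed(divisor, 32)
    if d == 0:
        raise ZeroDivisionError("divide error (#DE)")
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient                   # truncate toward zero
    remainder = dividend - quotient * d        # takes the sign of the dividend
    if not -(1 << 31) <= quotient < (1 << 31):
        raise OverflowError("divide error (#DE)")
    return quotient & MASK32, remainder & MASK32

print(div32(0, 17, 5))                         # (3, 2)
# -7 / 2, with EDX:EAX holding -7 sign-extended as CDQ would leave it
q, r = idiv32(0xFFFF_FFFF, 0xFFFF_FFF9, 2)
print(to_signed(q, 32), to_signed(r, 32))      # -3 -1
print(divmod(-7, 2))                           # (-4, 1): Python floors instead
```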
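SHR and SAR differ only in what fills the vacated high bits, which matters once the top (sign) bit is set. The sketch below models the 32-bit shift and rotate instructions in Python, with the count masked to five bits as the CPU does for 32-bit operands; the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def shl32(value, count):
    """SHL/SAL: shift left; zeros enter at the bottom, top bits fall off."""
    count &= 31                       # 32-bit shift counts are masked to 5 bits
    return (value << count) & MASK32

def shr32(value, count):
    """SHR: logical shift right; zeros fill the vacated high bits."""
    count &= 31
    return (value & MASK32) >> count

def sar32(value, count):
    """SAR: arithmetic shift right; copies of the sign bit fill the high bits."""
    count &= 31
    value &= MASK32
    signed = value - (1 << 32) if value & 0x8000_0000 else value
    return (signed >> count) & MASK32   # Python's >> is arithmetic on negatives

def rol32(value, count):
    """ROL: bits leaving the top re-enter at the bottom."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

def ror32(value, count):
    """ROR: bits leaving the bottom re-enter at the top."""
    count &= 31
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32

x = 0xF000_0001                       # top (sign) bit set
for name, op in [("SHL", shl32), ("SHR", shr32), ("SAR", sar32),
                 ("ROL", rol32), ("ROR", ror32)]:
    print(name, f"{op(x, 4):#010x}")
# SHL 0x00000010, SHR 0x0f000000, SAR 0xff000000, ROL 0x0000001f, ROR 0x1f000000
```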
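Whether to use JA/JB or JG/JL depends on whether the values being compared are meant as unsigned or signed: the same bit patterns can compare differently. Here is a minimal Python model, assuming the flags come from a `cmp a, b` instruction (which subtracts without storing the result), that computes the carry, zero, sign and overflow flags for 32-bit values and evaluates a few of the jump conditions. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def to_signed32(v):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    return v - (1 << 32) if v & 0x8000_0000 else v

def cmp_flags(a, b):
    """Model `cmp a, b`: compute a - b and keep only the flags it sets."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    return {
        "CF": a < b,                          # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & 0x8000_0000),     # top bit of the result
        # signed overflow: the true signed difference did not fit in 32 bits
        "OF": to_signed32(a) - to_signed32(b) != to_signed32(result),
    }

def ja(f):   # unsigned a > b
    return not f["CF"] and not f["ZF"]

def jb(f):   # unsigned a < b
    return f["CF"]

def jg(f):   # signed a > b
    return not f["ZF"] and f["SF"] == f["OF"]

def jl(f):   # signed a < b
    return f["SF"] != f["OF"]

flags = cmp_flags(0xFFFF_FFFF, 1)   # 4294967295 unsigned, but -1 signed
print(ja(flags), jg(flags))         # True False
print(jb(flags), jl(flags))         # False True
```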
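PUSH, POP, CALL and RET all revolve around the stack pointer. The following Python class is a toy model, with made-up addresses and a hypothetical `Stack32` name, of a 32-bit stack that grows toward lower addresses as the x86 stack does: PUSH moves ESP down before storing, POP reads before moving ESP up, CALL pushes a return address, and RET pops it back into the instruction pointer.

```python
class Stack32:
    """Toy model of the 32-bit x86 stack; addresses are made up for illustration."""

    def __init__(self, esp=0x1000):
        self.esp = esp        # stack pointer: address of the current top
        self.memory = {}      # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                          # the stack grows downwards
        self.memory[self.esp] = value & 0xFFFF_FFFF

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4   # the old value stays in memory but is no longer on the stack
        return value

    def call(self, return_address, target):
        """CALL: push the address of the next instruction, then jump."""
        self.push(return_address)
        return target                          # new instruction pointer (EIP)

    def ret(self):
        """RET: pop the return address into the instruction pointer."""
        return self.pop()


stack = Stack32()
stack.push(1)
stack.push(2)
print(stack.pop(), stack.pop(), hex(stack.esp))   # 2 1 0x1000

eip = stack.call(return_address=0x0040_1005, target=0x0040_2000)
print(hex(eip))                                   # 0x402000
eip = stack.ret()
print(hex(eip))                                   # 0x401005
```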