Introduction to Reverse Engineering - What is Assembly Code?
Welcome to our beginner’s series on reverse engineering, binary exploitation, web exploitation, and other security-related concepts.
This series will cover some basic concepts related to security, hacking, cybersecurity, or whatever you want to call it. The following are some upcoming topics you can check:
- What is Binary Exploitation
- Introduction to Registers
- The Stack explained
- Introduction to Calling Conventions
- Brief Introduction to the Global Offset Table
- Introduction to Buffers and Buffer Overflows
- Introduction to the heap and heap exploitation
- The Basics of Disassemblers, Debuggers, and Decompilers
And many more. If that sounds interesting, subscribe to our newsletter to get the post straight to your email.
What is Reverse Engineering?
Reverse engineering refers to taking already-compiled code, either machine code or bytecode, and converting it back into a human-readable format.
In most cases, reverse engineering allows us to understand the program’s functionality better and determine how it runs. This can then help us find flaws and attempt to exploit them and make the code work in a different way than its intended use.
An example use of reverse engineering is software cracks. These are tools developed to circumvent the licensing of a given piece of software and bypass the locked interface.
NOTE: This site does not in any way condone or encourage the use of pirated software :D
Anatomy of Reverse Engineering
It is good to understand that reverse engineering is an extensive field built on other disciplines. However, although it can be difficult to list exactly what you need, three main components are fundamental to reverse engineering.
- Assembly or Machine Code
- Disassemblers
- Decompilers
For this tutorial, we will introduce you to the world of RE by learning the fundamentals of Assembly or Machine code. Stay tuned for upcoming topics on Disassemblers and Decompilers.
Introduction to Assembly Code
Assembly code is the human-readable form of machine code, the binary instructions that the CPU decodes and executes. Each assembly instruction corresponds closely to one machine instruction, which is why the two terms are often used interchangeably. When we write a program in a compiled language such as C, C++, or Rust, it must be converted to machine code before the CPU can run it. This process is known as compilation.
Once the code has been compiled, it is hard to turn it back into the human-readable code you would write in your favorite language. Decompilers can do a good job of approximating it, but they cannot recover the original source exactly.
Source Code to Assembly Code
Let us now illustrate what assembly code looks like. For our illustration, we will write a simple hello world program in C and convert it to assembly code using Compiler Explorer.
For example, take a simple hello world program in C as shown:
#include <stdio.h>

int main() {
    printf("Hello World!");
    return 0;
}
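Head over to Compiler Explorer and paste the hello world program above. The resulting assembly code should appear in real time in the panel next to the source editor (on the right in the default layout).
An example of the resulting code is shown below. It matches what x86-64 GCC produces without optimizations, and your output may differ depending on the compiler and flags: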
.LC0:
        .string "Hello World!"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        pop     rbp
        ret
Ok, what is that?
Although it may look like gibberish at first glance, assembly code is easy to read and interpret with a little practice. This is because it’s made up of repeatable and logical instructions.
X86-64
x86-64, also known as AMD64, x64, or Intel 64, is a 64-bit Complex Instruction Set Computing (CISC) architecture. It extends Intel’s 32-bit x86 architecture, widening the general-purpose registers from 32 to 64 bits. CISC means that a single instruction can do several things at once, such as accessing memory, reading registers, and performing arithmetic.
It is also a variable-length instruction set, meaning different instructions can be of different sizes, ranging from 1 to 15 bytes long. And finally, x86-64 allows for multi-sized register access, which means you can access certain parts of a register at different sizes.
x86-64 Registers
x86-64 registers behave similarly to those of other architectures. A key feature of x86-64 registers is multi-sized access, meaning the register RAX can have its lower 32 bits accessed with EAX. The lower 16 bits can be accessed with AX, and the lowest 8 bits can be accessed with AL, so instructions can operate on 8-, 16-, 32-, or 64-bit values in the same register. One detail worth remembering: writing to a 32-bit register such as EAX clears the upper 32 bits of RAX.
x86-64 has plenty of registers, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve particular purposes.
The special registers include:
- RIP: the instruction pointer
- RSP: the stack pointer
- RBP: the base pointer
Assembly Instructions
Assembly code is comprised of a series of instructions that determine the operation performed by the CPU. You will find various instructions, such as:
- Data Movement Instructions - mov, pop, push, lea
- Arithmetic and Logic Instructions - add, sub, inc, dec, imul, and, or, etc.
- Control Flow Instructions - jmp, jcondition, cmp, call, ret
Execution
What should the CPU execute? This is determined by the RIP register, where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, and run it.
Examples
mov rax, 0xdeadbeef
Here the operation mov is moving the “immediate” 0xdeadbeef into the register RAX.
mov rax, [0xdeadbeef + rbx * 4]
Here the operation mov moves the data stored at the address 0xdeadbeef + RBX*4 into the register RAX. When brackets are used, you can think of the program as getting the content from that effective address.
Example Execution
-> 0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x0804000
   0x080400a: add rax, rbx           RAX = 0x0
   0x080400d: inc rbx                RBX = 0x0
   0x0804010: sub rax, rbx           RCX = 0x0
   0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
-> 0x0804005: mov ebx, 0x1234        RIP = 0x0804005
   0x080400a: add rax, rbx           RAX = 0xdeadbeef
   0x080400d: inc rbx                RBX = 0x0
   0x0804010: sub rax, rbx           RCX = 0x0
   0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x080400a
-> 0x080400a: add rax, rbx           RAX = 0xdeadbeef
   0x080400d: inc rbx                RBX = 0x1234
   0x0804010: sub rax, rbx           RCX = 0x0
   0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x080400d
   0x080400a: add rax, rbx           RAX = 0xdeadd123
-> 0x080400d: inc rbx                RBX = 0x1234
   0x0804010: sub rax, rbx           RCX = 0x0
   0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x0804010
   0x080400a: add rax, rbx           RAX = 0xdeadd123
   0x080400d: inc rbx                RBX = 0x1235
-> 0x0804010: sub rax, rbx           RCX = 0x0
   0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x0804013
   0x080400a: add rax, rbx           RAX = 0xdeadbeee
   0x080400d: inc rbx                RBX = 0x1235
   0x0804010: sub rax, rbx           RCX = 0x0
-> 0x0804013: mov rcx, rax           RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef    Register Values:
   0x0804005: mov ebx, 0x1234        RIP = 0x0804016
   0x080400a: add rax, rbx           RAX = 0xdeadbeee
   0x080400d: inc rbx                RBX = 0x1235
   0x0804010: sub rax, rbx           RCX = 0xdeadbeee
   0x0804013: mov rcx, rax           RDX = 0x0
Control Flow
How can we express conditionals in x86-64? We use conditional jumps such as:
- jnz <address>
- je <address>
- jge <address>
- jle <address>
- etc.
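They jump if their condition is true, and go to the next instruction otherwise. These conditionals check EFLAGS (called RFLAGS in 64-bit mode), a special register whose individual bits, the flags, are set by specific instructions. For example, add rax, rbx sets the carry flag (CF) if the unsigned sum is greater than a 64-bit register can hold and wraps around, and sets the overflow flag (OF) if the signed sum does not fit in 64 bits. You can jump based on those with a jc or jo instruction. The most important thing to remember is the cmp instruction, which subtracts its second operand from its first, sets the flags accordingly, and discards the result: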
cmp rax, rbx
jle error
This assembly jumps to error if RAX <= RBX, treating both values as signed integers.
Addresses
Memory acts similarly to an immense array where the indices of this “array” are memory addresses. Remember from earlier:
mov rax, [0xdeadbeef]
The square brackets mean “get the data at this address.” This is analogous to the C/C++ syntax: rax = *(uint64_t *)0xdeadbeef;
Conclusion
This was a simple introduction to reverse engineering through learning how to read assembly code. This article is produced in conjunction with OSIRIS Lab and CTF101.