x86 ASSEMBLY LANGUAGE

2025-04-23

x86 Assembly: A Beginner's Guide to Disassembly

This article is not just a tutorial. It's a story, a guided tour through the roads of memory addresses and instructions, narrated by someone who once thought EAX was a new Bollywood remix track. Whether you're a cybersecurity fresher, a curious engineer, or just someone who clicked this article while avoiding Jira tasks, this is for you.

TL;DR

x86 Assembly is foundational to understanding how software, including malware, truly behaves under the hood. It reveals the "what" and "how" behind every action taken at runtime.
Registers act as temporary storage and are critical for data movement, operations, and function logic. Knowing how general-purpose and special-purpose registers work unlocks your ability to trace the program's logic.
Memory addressing and stack operations (using PUSH, POP, CALL, and RET) are crucial to understanding program flow, especially in analyzing exploits, shellcode, or reverse-engineered binaries.
Function calls and the calling convention help trace arguments, returns, and stack frames. This knowledge is invaluable when debugging or analyzing control transfers within a binary.
Common instructions like MOV, ADD, SUB, and logical/bitwise operations form the building blocks of logic inside malware and benign software alike.
Control flow instructions such as JMP, CALL, and conditional jumps (JZ, JNZ, etc.) allow binaries to make decisions. Malware uses these heavily for logic branching and evasion.
Strings, APIs, and imports offer early behavioral indicators. Functions like VirtualAlloc, CreateFile, or connect often hint at intentions such as persistence, execution, or communication.
Dynamic API resolution and obfuscation techniques are commonly used to evade static detection.

Introduction

In malware analysis, x86 disassembly is your weapon, and you should know how to use it before stepping onto the battlefield. Modern threats are hard to detect in plain sight; they hide behind compiled binaries, layers of obfuscation, and packing techniques that strip away all visible traces of high-level logic.

All we have is raw machine code ... unreadable to the average eye, but a rich source of behavioral insight to those who understand it.

Disassembly is the process of converting binary instructions into human-readable assembly code. This is often the first step in reverse engineering, where analysts attempt to reconstruct the original logic of a program without access to its source code. Tools like IDA Pro, Ghidra, and OllyDbg help make sense of this low-level code by revealing instruction flow, memory access patterns, and API calls. (I know it already sounds like quantum mechanics, but bear with me, it's real fun.)

This article aims to break down the fundamentals of x86 disassembly in a structured, beginner-friendly way, with clarity, precision, and practical context. Whether you're just entering the world of reverse engineering or revisiting the basics to sharpen your skills, this guide will help you build a strong foundation in reading and understanding disassembled code.

1. What is Disassembly?

Every journey into malware analysis eventually lands you in front of a disassembler: a tool that doesn't speak the language of Python, Java, or C. It speaks assembly. And your job? To translate :)

"Disassembly is the act of converting machine code"

More precisely, it converts the raw binary instructions stored inside executable files into assembly language, a more readable, symbolic representation of what the CPU actually executes. Unlike high-level programming languages that offer loops, variables, and classes, assembly speaks in registers, opcodes, and addresses. It may look cryptic at first glance, but it holds the truth of what a program actually does.

1.1 What Exactly Is Happening When We "Disassemble"?

When a program is compiled (say, written in C or C++), all its human-readable code is translated into machine code: the sequences of bytes that a processor understands and executes. This machine code is packed into an executable format like .exe or .dll.

A disassembler takes these binary instructions and reverses them into their assembly counterparts. However, it's not a perfect reversal; the names of variables, functions, or comments are long gone, but the structure and logic are still intact, just in a much lower-level form.

Take this simple C code:

int sum = a + b;

After disassembly, it might look like:

mov eax, [ebp+8]   ; load variable a
add eax, [ebp+12]  ; add variable b

1.2 Why Does Disassembly Matter in Malware Analysis?

Most malware is distributed as compiled binaries. There is no source code to audit, no GitHub repo to dig into. What you're left with is a file that needs to be reversed, and disassembly is the very first lens through which its behavior becomes visible.

Here's what disassembly allows you to do:

Reveal control flow: You can see how functions are called and how decisions (like if/else) are made.
Detect malicious behavior: Identify file access, network connections, and registry modifications.
Understand obfuscation: Spot when the malware is trying to hide what it's doing using packing or encryption.
Find payloads: Navigate to hidden code blocks and understand how they are executed.

In short, it strips away the user-facing behavior and exposes the real intent of the code.

1.3 A Glimpse of What You Might See

Let's look at an example. Here's a snippet from a disassembled binary:

push ebp
mov ebp, esp
sub esp, 0x10
mov eax, [ebp+8]
call eax

To someone new, it may look like noise. But an analyst reads this like a paragraph:

Set up the stack frame.
Reserve space on the stack.
Fetch a function pointer passed as an argument.
Call it dynamically.

This is a classic signature of indirect function calls, something malware authors use to evade detection and resolve APIs dynamically at runtime.

Don't worry if it doesn't make sense to you now; it will make sense later. You're reading this article to get rid of those scary PUSH and MOV monsters, right? Just keep reading.

1.4 The First Step Toward Seeing Clearly

Disassembly is not about memorizing every instruction. It's about developing a mindset: learning to trace logic, recognize patterns, and ask the right questions. You begin to move from "What does this do?" to "Why is it doing this?", and eventually, "What is it trying to hide?"

The more you practice, the clearer things become. What once looked like hex and chaos starts to resemble method and design. Disassembly is where the veil begins to lift. It's where curiosity turns into investigation and investigation into insight.

2. Assembly Language Basics

Before diving deeper into disassembly, it's crucial to understand the language that disassemblers speak: assembly. Think of it as the middle ground between raw binary and human-readable code. It might look intimidating at first, but once you get the hang of it, you will start noticing patterns.

To help simplify this, we'll break down assembly into bite-sized concepts.

2.1 Registers: The CPU's Working Desk

Registers are like fast-access notepads inside the CPU. Instead of going to RAM for every single operation (which is slower), the CPU uses registers to store data temporarily while it's working.

Here are the common general-purpose registers in x86:

Register	Purpose
EAX	Accumulator – for arithmetic ops
EBX	Base – often used for addressing
ECX	Counter – used in loops
EDX	Data – used in I/O operations
ESI	Source Index – used in array ops
EDI	Destination Index – for string ops
ESP	Stack Pointer – points to top of the stack
EBP	Base Pointer – used to access function parameters

You'll see these over and over again, so consider this your vocabulary list.

2.2 Instructions — The Action Verbs

Assembly programs are made of instructions. Each instruction tells the CPU to perform a specific operation.

Here are some common ones:

Instruction	Meaning
MOV	Move data from one place to another
PUSH	Place data onto the stack
POP	Remove data from the stack
CALL	Call a function
RET	Return from a function
ADD/SUB	Arithmetic operations
JMP	Jump to another address
CMP	Compare two values
JZ/JNZ	Jump if zero / not zero

Example:

mov eax, 5
add eax, 3

This loads the value 5 into the EAX register, then adds 3 to it, resulting in EAX = 8.

Simple, right? It's like math with variables, just very close to the metal.
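
If it helps to see that as running code, here is a minimal Python sketch of the same two instructions. The run helper and the tuple format are invented for this illustration and are not part of any real tool; it understands only mov and add, and it wraps results to 32 bits the way a 32-bit register would.

```python
MASK32 = 0xFFFFFFFF  # the registers modelled here are 32 bits wide

def run(program):
    """Execute a list of (mnemonic, dest, src) tuples and return the registers."""
    regs = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
    for mnemonic, dest, src in program:
        # The source is either another register name or an immediate integer.
        value = regs[src] if isinstance(src, str) else src
        if mnemonic == "mov":
            regs[dest] = value & MASK32
        elif mnemonic == "add":
            regs[dest] = (regs[dest] + value) & MASK32
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return regs

result = run([("mov", "eax", 5), ("add", "eax", 3)])
print(result["eax"])  # 8
```

The masking matters more than it looks: adding 1 to 0xFFFFFFFF gives 0 in a 32-bit register, not a bigger number.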

2.3 The Stack — Function Calls and Local Variables

The stack is a Last-In-First-Out (LIFO) structure used to manage function calls and local data. Think of it like a stack of books piled on top of one another: the last book you put down is the first one you pick up.

Figure: The x86 Layout [Source: Practical Malware Analysis]

So here is what happens when a function is called:

The return address is pushed onto the stack.
Local variables are created by subtracting from the stack pointer (ESP).
Parameters are accessed relative to the base pointer (EBP).

Here's a classic function setup:

push ebp
mov ebp, esp
sub esp, 0x10

This is setting up the stack frame, a standard practice in almost every function in a program.
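
To make the three prologue instructions concrete, here is a small Python sketch that tracks only ESP and EBP. The Cpu class and the starting addresses are made up for illustration, and it assumes a typical 32-bit frame where ESP points at the return address when the function starts.

```python
class Cpu:
    def __init__(self, esp, ebp):
        self.esp = esp
        self.ebp = ebp
        self.stack = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                  # the stack grows toward lower addresses
        self.stack[self.esp] = value

# Hypothetical state right after CALL: ESP points at the return address.
cpu = Cpu(esp=0x0012FF00, ebp=0x0012FF40)
saved_ebp = cpu.ebp

cpu.push(cpu.ebp)   # push ebp       : save the caller's frame pointer
cpu.ebp = cpu.esp   # mov ebp, esp   : the new frame starts here
cpu.esp -= 0x10     # sub esp, 0x10  : reserve 16 bytes for local variables

print(hex(cpu.ebp), hex(cpu.esp))   # 0x12fefc 0x12feec
```

With this layout the saved EBP sits at [ebp], the return address at [ebp+4], and the first argument at [ebp+8], which is why [ebp+8] shows up so often in the snippets in this article. Locals live below EBP, at [ebp-4] down to [ebp-0x10].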

2.4 Memory Access: Addressing the Data

Assembly can work with memory using pointers and offsets.

Example:

mov eax, [ebp+8]

This means:

Get the value stored at the memory location ebp + 8 and store it in eax.

There are different ways assembly can access memory:

Type	Syntax	Description
Direct	mov eax, [0x401000]	Access a fixed memory location
Register indirect	mov eax, [ebx]	Access memory at address stored in ebx
Base + Offset	mov eax, [ebp+8]	Common in function stack frames
Indexed	mov eax, [ebx+ecx*4]	Used for arrays or looped data access

The base + offset form is how arguments passed to a function are commonly accessed.
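
All four forms in the table are special cases of one formula: base + index * scale + displacement. The Python sketch below computes the address each form would read from. The effective_address helper and the register values are assumptions chosen only for this example.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Hypothetical register values, chosen only for illustration.
ebp, ebx, ecx = 0x0012FF40, 0x00403000, 3

print(hex(effective_address(disp=0x401000)))                 # direct: 0x401000
print(hex(effective_address(base=ebx)))                      # register indirect: 0x403000
print(hex(effective_address(base=ebp, disp=8)))              # base + offset: 0x12ff48
print(hex(effective_address(base=ebx, index=ecx, scale=4)))  # indexed: 0x40300c
```

The indexed case is the array pattern: with 4-byte elements, ebx is the start of the array and ecx is the element number, so ecx = 3 lands 12 bytes in.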

2.5 Decision Making in Assembly

Unlike high-level languages that use if and else, assembly uses combinations of CMP (compare) and Jxx (jump) instructions.

Example:

cmp eax, 0
je  label

Reads like:

Compare eax with 0. If equal, jump to label

This is how conditions are implemented and how malware decides what code to run under what circumstances.

2.6 Understanding Function Calls in x86 Assembly

In x86 assembly, function calls may seem verbose compared to high-level languages, but the underlying logic is methodical and predictable. Function calls use the CALL and RET instructions. The call instruction is the primary way to invoke a function in x86.

It does two important things:

Pushes the address of the next instruction (i.e., return address) onto the stack.
Jumps to the target function's address.

Example:

call 0x401000

This jumps to the address 0x401000 and saves the return address on the stack. Once the function is done, it executes ret to go back to where it came from.

Malware often uses indirect calls:

call eax

This means the actual function address is stored in eax, making it harder to statically analyze.
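
The two steps of CALL, and the way RET undoes them, can be sketched in a few lines of Python. The Cpu class and the addresses here are hypothetical; the only hard facts baked in are that a direct near call is 5 bytes long and that call eax is 2 bytes, which decides what the return address will be.

```python
class Cpu:
    def __init__(self, eip, esp):
        self.eip = eip      # address of the instruction being executed
        self.esp = esp
        self.stack = {}     # address -> 32-bit value

    def call(self, target, length=5):
        # A direct near CALL is 5 bytes (opcode E8 + 32-bit offset);
        # pass length=2 to model `call eax`.
        return_address = self.eip + length
        self.esp -= 4
        self.stack[self.esp] = return_address   # 1. push the return address
        self.eip = target                       # 2. jump to the target

    def ret(self):
        self.eip = self.stack.pop(self.esp)     # pop the return address into EIP
        self.esp += 4

cpu = Cpu(eip=0x00401234, esp=0x0012FF00)   # hypothetical starting state
cpu.call(0x00401000)
print(hex(cpu.eip), hex(cpu.stack[cpu.esp]))   # 0x401000 0x401239
cpu.ret()
print(hex(cpu.eip), hex(cpu.esp))              # 0x401239 0x12ff00
```

For an indirect call the mechanics are identical; the only difference is that the target comes from a register at runtime, so a static tool cannot always tell you where it goes.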

3. Instructions: MOV, PUSH, POP

When you begin analyzing x86 assembly, some instructions appear so frequently that they become the natural starting point for learning. Among them, MOV, PUSH, and POP are essential. These instructions handle how data is moved around between registers, memory, and the stack.

3.1 MOV

The MOV instruction is used to copy data from one location to another. It doesn't move in the traditional sense (like deleting from source), but rather duplicates the value.

Examples:

mov eax, 5           ; Copy the value 5 into the eax register
mov ebx, eax         ; Copy the value from eax into ebx
mov [ebp-4], eax     ; Store eax's value into a local variable
mov eax, [ebp+8]     ; Load the value from memory (typically a function argument)

Key Points:

The source stays unchanged.
You can't directly copy from memory to memory.
It's used for data transfer between registers, memory, and constants.

3.2 PUSH

PUSH adds a value to the top of the stack: it decreases ESP by 4 and stores the value at the new top.

Example:

push eax

This saves the current value of eax onto the stack. It's useful when a function is about to change a register and needs to preserve its original value. In reverse engineering, spotting a PUSH before a CALL tells you that data (often a function argument) is being prepared.

3.3 POP

POP is the counterpart to PUSH. It takes the top value from the stack and places it into a register or memory location.

Example:

pop eax

This pulls the last pushed value from the stack into eax. After this, ESP is increased by 4, effectively removing the value from the stack.

POP is commonly seen:

At the end of a function to restore register values.
During cleanup after a function call.
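
Here is the save-and-restore pattern as a Python sketch, using a small bytearray as a stand-in for stack memory. The memory size, the register dictionary, and the push/pop helpers are all invented for the example; the ordering inside each helper (adjust ESP, then store, and load, then adjust ESP) mirrors what the real instructions do.

```python
import struct

memory = bytearray(64)                           # a tiny stand-in for the stack region
regs = {"eax": 0xDEADBEEF, "esp": len(memory)}   # ESP starts at the top (empty stack)

def push(reg):
    regs["esp"] -= 4                                        # make room first
    struct.pack_into("<I", memory, regs["esp"], regs[reg])  # then store, little-endian

def pop(reg):
    (regs[reg],) = struct.unpack_from("<I", memory, regs["esp"])  # load the top value
    regs["esp"] += 4                                              # then release the slot

push("eax")          # save eax
regs["eax"] = 0      # the function clobbers eax
pop("eax")           # restore it
print(hex(regs["eax"]), regs["esp"])   # 0xdeadbeef 64
```

Notice that pop does not wipe the bytes; it only moves ESP. The old value stays in memory until something overwrites it, which is worth remembering when you inspect a stack in a debugger.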

3.4 Arithmetic

The table below summarises the most common arithmetic instructions.

Instruction	Syntax	Effect / Description
ADD	add reg, value	Adds the value to the register.
SUB	sub reg, value	Subtracts the value from the register.
INC	inc reg	Increments the register by 1 (like add reg, 1, but the carry flag is left unchanged).
DEC	dec reg	Decrements the register by 1 (like sub reg, 1, but the carry flag is left unchanged).
NEG	neg reg	Changes the sign of the value (e.g., eax = -eax).
CMP	cmp reg, value	Subtracts value from register only to set flags, not store.
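
A few of these are easier to trust once you have seen the 32-bit wraparound for yourself. The Python helpers below are a rough model written for this article, not a full emulator: they assume unsigned 32-bit inputs and track only the carry and zero flags.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b):
    """Return (result, carry_flag) for a 32-bit ADD."""
    total = a + b
    return total & MASK32, int(total > MASK32)

def sub32(a, b):
    """Return (result, carry_flag) for a 32-bit SUB; CF is the borrow."""
    return (a - b) & MASK32, int(a < b)

def neg32(a):
    """Two's-complement negation, as NEG computes it."""
    return (-a) & MASK32

def cmp32(a, b):
    """CMP performs the subtraction but keeps only the flags (ZF and CF here)."""
    result, cf = sub32(a, b)
    return {"ZF": int(result == 0), "CF": cf}

print(add32(0xFFFFFFFF, 1))   # (0, 1)
print(hex(neg32(5)))          # 0xfffffffb
print(cmp32(5, 5))            # {'ZF': 1, 'CF': 0}
```

The value 0xfffffffb is how -5 looks in a disassembler or debugger, so it pays to recognise large 0xffffff.. constants as small negative numbers.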

4. Control Flow

In high-level languages, you're used to if, else, while, and for. In assembly, these friendly structures are replaced by jumps, flags, and comparisons. Control flow instructions define the logic and decisions of a program. When you're analyzing malware, these instructions tell you how it hides, loops, and executes conditionally.

4.1 Unconditional Jumps: JMP

Syntax: jmp destination

This instruction tells the CPU to jump to a new memory address without checking any condition. Think of it as a "go-to" without a second thought.

Example:

jmp 0x401050

This transfers control straight to the address 0x401050, skipping whatever comes next.

Malware often uses jmp to skip over detection code or to hop into a payload region.

4.2 Conditional Jumps: JE, JNE, JG, JL, etc.

These instructions check the status flags (set by previous operations like cmp) and jump only if conditions match.

Instruction	Meaning	Condition
je/jz	Jump if Equal/Zero	ZF=1 (Zero flag is set)
jne/jnz	Jump if Not Equal/Not Zero	ZF=0
jg/jnle	Jump if Greater (signed)	ZF=0 and SF=OF
jl/jnge	Jump if Less (signed)	SF≠OF
ja	Jump if Above (unsigned)	CF=0 and ZF=0
jb	Jump if Below (unsigned)	CF=1

These are how if-else and loops are formed at the assembly level.

Example:

cmp eax, 5
je 0x401080

If eax == 5, control jumps to 0x401080.
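
The signed versus unsigned split in the table is the part that trips people up, so here is a Python sketch that computes the flags for cmp a, b and then evaluates each jump condition exactly as the table states it. The cmp_flags and taken helpers are written for this example and assume 32-bit unsigned inputs.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

def cmp_flags(a, b):
    """Flags produced by `cmp a, b` (computes a - b) on 32-bit values."""
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN)),
        "CF": int(a < b),                                  # unsigned borrow
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN)),    # signed overflow
    }

CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,
    "jne": lambda f: f["ZF"] == 0,
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "jb":  lambda f: f["CF"] == 1,
}

def taken(jump, a, b):
    """Would `cmp a, b` followed by this jump transfer control?"""
    return CONDITIONS[jump](cmp_flags(a, b))

minus_one = 0xFFFFFFFF            # -1 when read as a signed value
print(taken("je", 5, 5))          # True
print(taken("jl", minus_one, 1))  # True: signed, -1 < 1
print(taken("jb", minus_one, 1))  # False: unsigned, 0xFFFFFFFF > 1
```

The same bit pattern gives opposite answers for jl and jb. When you see which family of jumps the compiler chose, you learn whether the original variable was signed or unsigned.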

4.3 Loop Instructions: LOOP, LOOPE, LOOPNE

These are specialized instructions for repeating code blocks:

LOOP: Decrements ecx and jumps if ecx != 0.
LOOPE / LOOPZ: Decrements ecx and jumps if ecx != 0 and Zero Flag = 1.
LOOPNE / LOOPNZ: Decrements ecx and jumps if ecx != 0 and Zero Flag = 0.

Rare in modern compilers, but malware sometimes uses them in obfuscated logic.
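
As a rough model of these three instructions, the Python sketch below runs a loop body and then applies the LOOP rule: decrement ECX, then decide whether to jump back. The run_loop function and its condition parameter are invented for this illustration; the body is expected to return the Zero Flag value it would leave behind.

```python
def run_loop(ecx, body, condition=None):
    """Model `label: <body>; loop/loope/loopne label`. Returns the iteration count.

    condition is None for LOOP, "e" for LOOPE/LOOPZ, "ne" for LOOPNE/LOOPNZ.
    """
    iterations = 0
    while True:
        zf = body()                      # the body sets ZF; LOOP itself does not
        iterations += 1
        ecx = (ecx - 1) & 0xFFFFFFFF     # every variant decrements ECX first
        if ecx == 0:
            return iterations            # counter exhausted: fall through
        if condition == "e" and zf != 1:
            return iterations            # LOOPE stops once ZF is cleared
        if condition == "ne" and zf != 0:
            return iterations            # LOOPNE stops once ZF is set

print(run_loop(3, lambda: 0))                        # 3

flags = iter([1, 1, 0, 1])
print(run_loop(10, lambda: next(flags), "e"))        # 3
```

One trap follows directly from the decrement-first rule: if ECX is 0 when the loop is entered, it wraps to 0xFFFFFFFF and the loop runs about four billion times.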

4.4 Function Return: RET

RET pops the top value from the stack into EIP, transferring control back to the calling function. It marks the end of a function and is crucial for tracing malware call trees.

Example:

ret

4.5 Call Instructions: CALL

Pushes the return address (the address of the instruction right after the call) onto the stack and jumps to the target address. Commonly used to invoke subroutines or APIs.

Example:

call 0x401000

This will:

Push the return address onto the stack.
Jump to 0x401000.

In malware, call is often used to invoke API functions dynamically or jump into shellcode.

Quick Tip

When analyzing control flow in malware:

Trace jmp and call instructions carefully.
Use tools like IDA or Ghidra to visualize flow graphs.
Beware of control flow obfuscation — multiple jumps may be used just to confuse you.

5. Strings, APIs & Imports

When you start reversing binaries, you'll quickly realize that the presence of readable strings, imported APIs, and function references can reveal a lot, often before you step through a single instruction.

Attackers try to obfuscate, encrypt, or dynamically resolve functions to make our lives harder, but this section is your toolkit for catching them early.

5.1 Understanding Strings in Binaries

Strings in disassembled code are human-readable sequences stored in the binary's data section. These can include:

File paths
Registry keys
Error messages
URLs and IP addresses
User-agent strings
Commands to be executed

Why Strings Matter in Malware Analysis

Because strings often tell you:

What the malware interacts with
Where it connects (network indicators)
What system resources or tools it targets
What encryption routines or libraries it might use

Example:

push offset aHttpGoogleCom ; "http://google.com" (lpszUrl)
push esi                   ; hInternet handle from an earlier InternetOpenA call
call ds:InternetOpenUrlA

The presence of that URL is a strong hint that this sample phones home.

NOTE: Use tools like strings, Ghidra, or IDA to extract all readable strings and correlate them with context.
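
If you are curious what a strings-style tool does under the hood, the core idea fits in a few lines of Python. This is a minimal sketch, not a replacement for the real tools: extract_strings is a name made up for this example, the blob is fabricated test data, and it only finds plain ASCII runs of at least four characters.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# A made-up blob standing in for a binary's data section.
blob = b"\x00\x01http://google.com\x00\xff\xfeMZ\x90\x00cmd.exe /c whoami\x00ab\x00"

for s in extract_strings(blob):
    print(s)
# http://google.com
# cmd.exe /c whoami
```

Short fragments like "MZ" and "ab" are dropped by the length filter, which is what keeps the output readable. Windows binaries also store many strings as UTF-16, which this ASCII-only sketch will miss.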

5.2 APIs and Their Role in Malware

API calls form the bridge between the malware and the operating system. Windows provides thousands of APIs, and malware samples typically rely on a selection of them for:

File operations (CreateFile, WriteFile)
Network communication (WinHttpOpen, send, recv)
Process and memory manipulation (CreateProcess, VirtualAlloc, WriteProcessMemory)
Persistence and registry edits (RegCreateKeyEx, RegSetValueEx)

If you understand the APIs used, you understand the behavior of the malware.

Commonly Abused API Categories

Category	API Examples	Purpose
File I/O	CreateFile, WriteFile, DeleteFile	Read/write/delete
Process Management	CreateProcess, OpenProcess, TerminateProcess	Run or control processes
Memory Manipulation	VirtualAlloc, ReadProcessMemory	Shellcode injection, process hollowing
Networking	WSAStartup, connect, send, recv	C2 communication, data exfiltration
Registry Access	RegOpenKey, RegSetValue	Establish persistence, configuration
Evasion & Obfuscation	IsDebuggerPresent, NtQueryInformationProcess	Anti-debugging, stealth
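
A table like this turns naturally into a first-pass triage script. The Python sketch below maps a list of imported function names onto the categories above. The categorize function and the sample import list are hypothetical, and the category sets contain only the APIs named in the table; a real triage list would be far longer.

```python
# Category map taken from the table above.
API_CATEGORIES = {
    "File I/O": {"CreateFile", "WriteFile", "DeleteFile"},
    "Process Management": {"CreateProcess", "OpenProcess", "TerminateProcess"},
    "Memory Manipulation": {"VirtualAlloc", "ReadProcessMemory"},
    "Networking": {"WSAStartup", "connect", "send", "recv"},
    "Registry Access": {"RegOpenKey", "RegSetValue"},
    "Evasion & Obfuscation": {"IsDebuggerPresent", "NtQueryInformationProcess"},
}

def categorize(imports):
    """Group imported API names by behaviour category."""
    hits = {}
    for name in imports:
        # Windows exports ANSI and Unicode variants (CreateFileA / CreateFileW),
        # so also try the name with that one-letter suffix removed.
        candidates = {name}
        if name[-1:] in ("A", "W"):
            candidates.add(name[:-1])
        for category, apis in API_CATEGORIES.items():
            if candidates & apis:
                hits.setdefault(category, []).append(name)
    return hits

# Hypothetical import list, as you might copy it out of a disassembler.
sample_imports = ["CreateFileA", "VirtualAlloc", "connect", "IsDebuggerPresent", "GetTickCount"]
for category, names in categorize(sample_imports).items():
    print(f"{category}: {', '.join(names)}")
```

Treat the output as a list of questions, not verdicts. Plenty of benign programs import CreateFile and connect; it is the combination and the context that matter.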

5.3 Import Address Table (IAT)

When a program uses an API, it doesn't store the whole function, just a reference to it in the Import Address Table (IAT). This table is populated at load time by the Windows loader, which points each imported function to its address in memory.

You'll often find:

call dword ptr [<&KERNEL32.CreateFileA>]

That means the binary is using CreateFileA, and it was resolved via the IAT.

5.4 Static vs. Dynamic API Resolution

Malware authors know that analysts rely heavily on imports. To evade static detection, many resolve API calls at runtime.

Dynamic Resolution Example:

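call LoadLibraryA     ; load the DLL, module handle comes back in eax
call GetProcAddress   ; look up the API by name, its address comes back in eax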

The malware may:

Load kernel32.dll dynamically.
Use GetProcAddress to retrieve VirtualAlloc.
Call it via a register or an indirect pointer.

You may see:

call eax

There is no clear reference to the target function, which makes the analysis harder.

Tip: Set breakpoints on LoadLibrary and GetProcAddress while debugging to track which APIs are being resolved dynamically.

Conclusion

As we draw the curtain on this exploration of x86 disassembly, remember that every binary holds a story, and now you have the keys to translate it. You've seen how registers whisper secrets, how the stack frames map function calls like a subterranean railway, and how control-flow branches carve paths through a program's intent. You've learned to spot the tell-tale strings and API imports that betray a malicious payload, and to unravel obfuscation with patience and precision. Mastering these skills won't happen overnight, but with each disassembled function you decode, you strengthen your ability to see through a malware author's smokescreen.

Reference:

Practical Malware Analysis: The hands-on guide to dissecting malicious software [Michael Sikorski, Andrew Honig]
