In the previous article, we discussed the origins of vulnerabilities and how an offensive mindset can be a great tool for defense. Now we are going to learn the fundamentals of assembly code, because it is the foundation of reverse engineering. In addition, most binary disassembly tools cannot translate bytes into C-like code (Ghidra is one exception, and there are a few others that we will talk about later).
As mentioned before, an assembly language can be thought of as a list of instructions that describes exactly what the computer is doing. In other words, it is almost a direct translation from bytes to human-readable instructions.
If you look at the example above, you can see that the bytes from the binary are on the left and the assembly representation of those bytes is on the right. We can often find patterns that show which bytes encode a specific instruction (e.g. 0xe8 is call) or access a specific place in memory. Understanding these patterns can help you construct shellcode (as we will see later, most exploits try to gain shell access or higher-level (root) privileges).
More important than understanding how bytes correlate with instructions is knowing how to read assembly code. There are many manuals and cheat sheets to help you learn how to read assembly (e.g. https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf). It will take time to become fluent in assembly; however, it is well worth it because most disassembled binaries will be presented as assembly.
In addition to learning instructions, you need to become familiar with the different assembly syntaxes. The two most widely used are Intel and AT&T (http://staffwww.fullcoll.edu/aclifton/courses/cs241/syntax.html). You could come across both during your journey in binary exploitation and reverse engineering. Intel is generally more common; however, we have noticed that academia uses AT&T syntax more frequently.
The most useful tool that we use when learning and teaching reverse engineering is GDB. It has been the most effective way for us to learn how binary exploitation works because it forces you to understand what the program is doing. It also allows you to step through the code and view memory in the middle of execution. Several important skills that you should learn before reverse engineering with GDB are disassembling a function (disass <function name>), reading data from a register or the stack (x/#xw $reg), and stepping through a program and setting breakpoints. It is also important to know how to find which functions are in the binary. A good cheat sheet that we recommend is from the University of Texas (https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf); however, we also recommend googling tutorials, as they will teach you tricks that make GDB more effective.
To get started, we created a list of tasks that you should complete before reading the next article:
1) Create a hello world program, compile it, then disassemble it using GDB. (gcc helloworld.c -o helloworld)
2) Write down a list of instructions that you do not know and google them.
3) Repeat steps one and two; however, this time have the program’s main function call a second function that uses variables and returns a value.
4) Use GDB to set breakpoints, step through the program, view the values in registers, and examine the stack.
Now it is your turn to start your reverse engineering course. Be patient, Google everything that you do not know, and be creative. If you have any questions, contact us at email@example.com. We look forward to teaching you, so stay tuned for the next blog post. The next post will be the first one covering buffer overflows.