Conquering x86 Assembly
Last semester, I took a course on 32-bit x86 MASM assembly language. It was objectively a difficult course, and about a third of the students dropped it. I had a strong approach from the beginning and was able to ace the course. This post describes my strategy for conquering x86 assembly.
Table of Contents
- What makes Assembly hard?
- What makes Assembly fun?
- Setting up the Development Environment
- Including the Irvine libraries
- Floor the gas on studying early - use my flashcards
- Thoughts on Code Style
- Additional Resources
- One of my projects
- Final Thoughts
What makes Assembly hard?
Higher-level languages share a lot of syntax. It’s not hard to grasp Python if you’ve already mastered Java, or approach Ruby if you understand JavaScript. You’ll see similarities in loops, variable declarations, and functions. You have to alter the phrasing a bit, but most of the code can be transcribed line-for-line. Assembly, however, deals with processor instructions on the most basic level. A for-loop only exists insofar as you repeatedly jump back to a label in the code. You can’t just operate on any pair of variables; most instructions accept at most one memory operand, so you have to manually move values into registers to operate on them and move them back to memory locations to store them. You can’t just compare two variables and branch; you have to get at least one of them into a register, execute a compare instruction, then branch with a conditional jump that checks which flags the compare set in the flags register. There are more steps for every given process, and every movement of information has to be managed by the programmer.
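To make that concrete, here is a minimal sketch of a counted loop written as a compare and a conditional jump. It assumes the Irvine32 library described later in this post is available for the include and for printing; the label and variable names are made up for illustration.

```asm
; Sketch: the equivalent of
;   for (i = 1; i <= 10; i++) total += i;
INCLUDE Irvine32.inc

.data
total DWORD 0                   ; hypothetical variable for the result

.code
main PROC
    mov  eax, 0                 ; the running sum lives in a register
    mov  ecx, 1                 ; loop counter i
SumLoop:
    add  eax, ecx               ; total += i
    inc  ecx                    ; i++
    cmp  ecx, 10                ; sets the flags from (i - 10)
    jle  SumLoop                ; jump back while i <= 10 (signed compare)
    mov  total, eax             ; store the result back to memory
    call WriteDec               ; Irvine32: print EAX as an unsigned decimal
    call Crlf
    exit
main ENDP
END main
```

The loop has no block structure of its own: the "body" is just whatever sits between the label and the jump, and the condition is whatever the flags say after `cmp`.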
Another major stumbling block is that resources for x86 MASM assembly are rather sparse. You can’t simply Google an error message and find a Stack Overflow answer. You need to look at textbooks and references and figure out the problem yourself.
What makes Assembly fun?
Assembly can be really satisfying to program once you overcome the initial learning curve. Getting a grasp of how a processor functions is valuable. At such a low level, there is huge flexibility in what you can do. You will improve your binary arithmetic and really own data sizes. You can write highly efficient routines. There were points while writing assembly that I truly felt in touch with the processor, in a way I haven’t with any other language. I became one with my computer.
Setting up the Development Environment
Except for the very end of the course, I had one groupmate, Luke, who was, fortunately, as interested in conquering Assembly as I was. We diverged from what the professor recommended and created what is, in my opinion, a better system for writing single-file Assembly programs.
The basic development environment is available here on my GitHub.
The professor recommended we use Visual Studio Community Edition to develop assembly. While VS will assemble, and you can find a plugin for syntax highlighting, it’s slow, and it’s annoying to create a whole new project when you just want to assemble a 50-line piece of code to test something. I found it way more straightforward to develop in VS Code.
However, you do need to install VS Community Edition to get the assembler. Luke worked out how to extract the actual assembler from the Visual Studio files. I can’t share the assembler itself in the linked development environment, due to Microsoft’s restrictions, but the instructions for where to find the MASM assembler and linker are in the GitHub readme for the development environment we created. That repo and readme should provide everything you need to get a VS Code assembly development environment up and running in less than half an hour.
Including the Irvine libraries
Our textbook was Assembly Language for x86 Processors by Kip Irvine. I highly recommend purchasing it. It is easy to follow and much better than any free resource I found online. You can find it on Irvine’s website.
You’ll also want the Irvine libraries, which are available on his GitHub, along with sample projects. Here’s the zip of the Irvine libraries. I did include them in the development environment, but please note that they are only licensed for educational purposes.
The libraries wrap Windows functions to, for example, change the color or move the cursor in the console, read input, or create a popup window. They are very convenient for creating actual functional console applications. And the Irvine textbook provides detailed instructions for using them.
This convenient website catalogs most of the functions, structures, and macros that Irvine provides.
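As a rough sketch of what using the library looks like, the program below clears the console, changes the text color, moves the cursor, and reads a line of input. The procedures (`Clrscr`, `SetTextColor`, `Gotoxy`, `WriteString`, `ReadString`, `Crlf`) and the color constants come from Irvine32; the strings and variable names are invented for illustration.

```asm
INCLUDE Irvine32.inc

.data
promptText  BYTE "Enter your name: ", 0
greetText   BYTE "Hello, ", 0
nameBuffer  BYTE 32 DUP(0)              ; room for 31 characters + null

.code
main PROC
    call Clrscr                         ; clear the console window

    mov  eax, yellow + (blue * 16)      ; foreground + (background * 16)
    call SetTextColor

    mov  dh, 5                          ; row
    mov  dl, 10                         ; column
    call Gotoxy                         ; move the cursor there

    mov  edx, OFFSET promptText         ; WriteString takes the string offset in EDX
    call WriteString

    mov  edx, OFFSET nameBuffer         ; where to store the input
    mov  ecx, SIZEOF nameBuffer         ; buffer size, including the null
    call ReadString                     ; returns the character count in EAX

    mov  edx, OFFSET greetText
    call WriteString
    mov  edx, OFFSET nameBuffer
    call WriteString
    call Crlf
    exit
main ENDP
END main
```

Every library procedure takes its arguments in specific registers, which is exactly why the register bookkeeping discussed under code style below matters so much.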
Floor the gas on studying early
I realized early that Assembly was tough and there was a lot of new information to understand. So for the first 2 or 3 weeks, I studied every day and I studied efficiently. I read the textbook and created hundreds of Anki cards from it. I reviewed all the cards daily. I wrote small programs to test things out. Anki is so efficient at slamming information into your memory; I can’t recommend it enough.
Here are my flashcard decks. Feel free to use them in your own studies. You will of course have to download Anki, the magnificent open-source flashcard program:
- x86assembly - ch1 - overview.apkg
- x86assembly - ch2 - processor architecture.apkg
- x86assembly - ch4 - addressing sections only
- x86assembly - ch5 - procedures and stack - book notes.apkg
- x86assembly - class2 - basics - video notes.apkg
- x86assembly - class3 - instructions and references video notes.apkg
- x86assembly - class4 - mult, div, i_o video notes.apkg
- x86assembly - selected Final topics.apkg
After the initial push, I didn’t have to review the cards much. I’d glance through them before a quiz or test, and I created one last deck to shore up some areas before the final exam, but that initial push set me up for success and made the rest of the class a lot easier.
The cards are flawed in numerous ways but they’re free, and, considering the lack of study materials out there, are probably some of the better study resources you’ll find for beginning MASM anywhere on the internet.
Thoughts on Code Style
The projects we created for this course ran several thousand lines. This quickly becomes very difficult to manage. Having good style is, in my opinion, critical for navigating the code file as it grows.
Over the course of the semester, I developed a system for code style that worked out really well for us.
Descriptions for Procedures
Give every procedure a description explaining which registers it uses for parameters and which registers it alters as ‘return’ values. This is critical for being able to re-use procedures easily. The most common source of bugs, in my experience, was unintentionally altering registers.
Here’s an example of a simple procedure that takes the al register as a parameter and returns in that same register.
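A minimal sketch of what such a description can look like. The procedure name and the uppercase-conversion body are illustrative stand-ins, not code from the course project; the Receives/Returns/Alters layout is one reasonable convention, and `WriteChar` and `Crlf` are Irvine32 procedures.

```asm
INCLUDE Irvine32.inc

.code
;---------------------------------------------------------
ConvertLetterToUppercase PROC
; Converts a lowercase ASCII letter to uppercase.
; Receives: AL = ASCII character
; Returns:  AL = uppercase letter if AL was 'a'..'z',
;                otherwise AL is unchanged
; Alters:   flags only; no other registers
;---------------------------------------------------------
    cmp  al, 'a'
    jb   NotLowercase               ; below 'a': leave as-is
    cmp  al, 'z'
    ja   NotLowercase               ; above 'z': leave as-is
    and  al, 11011111b              ; clear bit 5 to get the uppercase letter
NotLowercase:
    ret
ConvertLetterToUppercase ENDP

main PROC
    mov  al, 'q'
    call ConvertLetterToUppercase   ; AL = 'Q'
    call WriteChar                  ; Irvine32: print the character in AL
    call Crlf
    exit
main ENDP
END main
```

With the "Alters" line in place, a caller can tell at a glance that nothing it is holding in other registers will be clobbered by the call.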
Don’t miss an opportunity to add semantic information
Use every opportunity to add semantic information to code.
- Labels should be long and descriptive
- Identifiers should be long and descriptive
- Procedure names should be long and descriptive
Here’s an example of how long names and compartmentalizing into many procedures can create readable code, even in x86 assembly:
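As a hypothetical illustration (the program and every name in it are invented for this sketch, and it assumes Irvine32 for input and output), a main loop built from small, descriptively named procedures can read almost like an outline:

```asm
INCLUDE Irvine32.inc

.data
numberPromptText  BYTE "Enter a number (0 to quit): ", 0
runningTotalText  BYTE "Running total: ", 0
runningTotal      SDWORD 0

.code
main PROC
    ReadNextNumber:
        call PromptUserForNumber        ; EAX = number entered
        cmp  eax, 0
        je   UserRequestedQuit
        call AddNumberToRunningTotal
        call PrintRunningTotal
        jmp  ReadNextNumber
    UserRequestedQuit:
    exit
main ENDP

PromptUserForNumber PROC
; Returns: EAX = signed integer typed by the user
; Alters:  EDX
    mov  edx, OFFSET numberPromptText
    call WriteString
    call ReadInt                        ; Irvine32: read a signed integer into EAX
    ret
PromptUserForNumber ENDP

AddNumberToRunningTotal PROC
; Receives: EAX = number to add
; Alters:   runningTotal
    add  runningTotal, eax
    ret
AddNumberToRunningTotal ENDP

PrintRunningTotal PROC
; Alters: EAX, EDX
    mov  edx, OFFSET runningTotalText
    call WriteString
    mov  eax, runningTotal
    call WriteInt                       ; Irvine32: print EAX as a signed decimal
    call Crlf
    ret
PrintRunningTotal ENDP
END main
```

The body of `main` needs almost no comments, because the labels and procedure names carry the meaning.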
Use indent-based code folding.
Indent everything so that it can fold, and so you can read it.
- Instructions inside procedures should be tabbed over
- Tab over the areas that will be looped, as if they were code blocks
- Compartmentalize as much as possible into procedures
- Group like procedures and create visually obvious dividers between sections of procedures. Have the procedures tabbed more than the dividers to allow for folding on the dividers.
Here’s an example of how a 2743-line project can be displayed in just 35 lines due to folding on indentation levels. The Print Procedures section is unfolded, as is the Print Errors subsection. I made frequent use of the VS Code shortcut Ctrl+K Ctrl+0 to refold everything after making edits in a particular location.
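A small sketch of a layout that folds this way, assuming an editor that folds on indentation. The section names echo the ones mentioned above, but the procedures and strings are placeholders, not the project's actual contents. Each divider is a single line at a shallower indent than the procedures under it, so folding on the divider hides the whole group.

```asm
INCLUDE Irvine32.inc

.data
    jobAddedText        BYTE "Job added.", 0
    unknownCommandText  BYTE "Error: unknown command.", 0

.code
main PROC
    call PrintJobAddedMessage
    call PrintUnknownCommandError
    exit
main ENDP

;================== PRINT PROCEDURES =====================

    ;-------------- Print Messages -----------------------

        PrintJobAddedMessage PROC
        ; Alters: EDX
            mov  edx, OFFSET jobAddedText
            call WriteString
            call Crlf
            ret
        PrintJobAddedMessage ENDP

    ;-------------- Print Errors -------------------------

        PrintUnknownCommandError PROC
        ; Alters: EDX
            mov  edx, OFFSET unknownCommandText
            call WriteString
            call Crlf
            ret
        PrintUnknownCommandError ENDP

END main
```

MASM ignores leading whitespace, so the indentation exists purely for the reader and the editor.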
Use high-level pseudocode in comments
Especially when you’re debugging, use pseudocode similar to a high-level language in adjacent comments to clarify what you’re doing. This is how I tracked down most bugs. I also used it whenever I wanted my groupmate to be able to quickly understand what a procedure I wrote was doing.
I imitated C++ style code, with an addition. When describing a register in the comments, I add a double colon (::) and then a semantic name describing the data that register is holding on that particular line. That way, I could track what was supposed to be going on in registers as I worked.
Here’s an example of a heavily commented procedure.
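A minimal sketch of the convention, using a made-up procedure that sums an array of job run times (the data, names, and procedure are hypothetical; `WriteString`, `WriteDec`, and `Crlf` are Irvine32 procedures). Each comment is C++-like pseudocode, and each register is tagged with `::` and the name of what it holds on that line.

```asm
INCLUDE Irvine32.inc

.data
jobRunTimes       DWORD 4, 7, 2, 9
totalRunTimeText  BYTE "Total run time: ", 0

.code
SumRunTimes PROC
; Receives: ESI = address of a DWORD array
;           ECX = element count (must be > 0)
; Returns:  EAX = sum of the elements
; Alters:   ECX, ESI
    mov  eax, 0                     ; eax::total = 0;
    AddNextRunTime:                 ; do {
        add  eax, [esi]             ;     eax::total += *esi::currentJob;
        add  esi, TYPE DWORD        ;     esi::currentJob++;
        dec  ecx                    ;     ecx::jobsRemaining--;
        jnz  AddNextRunTime         ; } while (ecx::jobsRemaining != 0);
    ret                             ; return eax::total;
SumRunTimes ENDP

main PROC
    mov  edx, OFFSET totalRunTimeText
    call WriteString
    mov  esi, OFFSET jobRunTimes    ; esi::currentJob = &jobRunTimes[0];
    mov  ecx, LENGTHOF jobRunTimes  ; ecx::jobsRemaining = 4;
    call SumRunTimes                ; eax::total = SumRunTimes();
    call WriteDec                   ; print(eax::total);
    call Crlf
    exit
main ENDP
END main
```

When a register’s tag on one line does not match what the previous line left in it, that mismatch is usually where the bug is.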
Additional Resources
Here are the resources I found most useful while coding.
- California State University, Dominguez Hills - catalog of Irvine Procedures
- Felix Cloutier’s complete instruction reference
- A guide to MASM assembly, from Yale University
- PowerPoint slides covering Assembly basics
- Geoff Chappell’s list of everything in Kernel32.dll, for doing extra hacky stuff to Windows
One of my projects
I am including the third project I did for this class with this post. It was an “operating system simulator”. There are two lists of ‘jobs’, a run list and a hold list. Each job has a priority and a run time. When the step command is executed, jobs in the run list have their time decremented until they are completed. You can execute the command help in the program for a list of available commands. Jobs do not actually do anything real. The program was more about exploring input parsing and handling a variable number of input arguments, which was accomplished with a COMMAND and COMMAND_PARAM structure.
I think it is a good example of the code style I have described in this post, and I hope you find it interesting.
Download it here: OS_simulator_project.asm
Final Thoughts
I don’t know how many pure x86 MASM Assembly projects I’ll create in the future. But I definitely expect to read plenty of Assembly while debugging C and C++ code. C++ also allows for inline assembly, and I look forward to the first opportunity I have to make a procedure blazing fast with it.
I also believe that knowing MASM will translate into other Assembly languages. At some point, maybe I’ll program a microcontroller to do something nifty. Maybe I’ll try to learn ARM. Having these fundamentals from MASM will make such a transition much easier.
In general, I learned a lot about computer architecture from this course and I’m a much stronger programmer because of it.