Lab 5: Exploring Assembly Language on x86_64 and AArch64 Architectures

Introduction

In this blog post, I’ll share my experience working with assembly language on both x86_64 and AArch64 architectures as part of the SPO600 lab. This lab was an eye-opening journey into the low-level world of programming, where I got to write, debug, and optimize assembly code. I’ll walk you through the steps I took, the challenges I faced, and my thoughts on working with different assembly languages.


1. Getting Started with the Lab

The lab began with unpacking the provided archive containing example programs in both C and assembly language. The directory structure was well-organized, with separate folders for x86_64 and AArch64 assembly code, as well as portable C versions for comparison. Using the tar command, I extracted the archive and explored the files.

The first step was to build and run the C versions of the "Hello World" program on both architectures. This helped me understand the differences in the generated binaries and the underlying system calls. Using objdump -d, I disassembled the binaries to inspect the machine code and compare it with the assembly output generated by gcc -S. This was a great way to see how high-level C code translates into low-level assembly instructions.



2. AArch64 Assembly: Writing and Debugging

The next step was to work with the AArch64 assembly code. I reviewed the provided "Hello World" example and built it using make. The objdump -d command allowed me to disassemble the binary and compare it with the source code. This was particularly helpful in understanding how the assembler translates human-readable instructions into machine code.

Adding a Loop

I then modified the AArch64 code to include a loop that prints a message multiple times. Here’s the basic loop structure provided:

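.text
.globl _start

min = 0                         /* starting value for the loop index */
max = 6                         /* loop exits when the index reaches this value */

_start:
    mov     x19, min            /* x19 = loop index */

loop:
    /* ... body of the loop ... */

    add     x19, x19, 1         /* index++ */
    cmp     x19, max            /* reached max? */
    b.ne    loop                /* if not, run the body again */

    mov     x0, 0               /* exit status 0 */
    mov     x8, 93              /* exit is syscall 93 on AArch64 Linux */
    svc     0                   /* invoke the syscall */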

To make the loop more interesting, I combined it with the "Hello World" code to print "Loop" multiple times. Then, I modified the message to include the loop index, printing values from 0 to 5. Converting the integer index to a character required adding 48 (the ASCII value for '0') to the index value.
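The index-to-character step can be sketched in C. This is not the lab's assembly: the helper names `digit_to_ascii` and `patch_digit`, and the idea of overwriting a placeholder byte in a message template, are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: turn one decimal digit (0-9) into its ASCII character. */
char digit_to_ascii(int digit) {
    assert(digit >= 0 && digit <= 9);
    return (char)(digit + 48);      /* 48 is the ASCII code for '0' */
}

/* Hypothetical helper: overwrite the byte at position pos in a message
   template (for example the '#' in "Loop: #\n") with the digit's character. */
void patch_digit(char *msg, int pos, int digit) {
    msg[pos] = digit_to_ascii(digit);
}
```

With a template such as `char msg[] = "Loop: #\n";`, calling `patch_digit(msg, 6, 3)` leaves `"Loop: 3\n"` in the buffer, ready to be passed to the write syscall.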

Extending the Loop

Next, I extended the loop to print values from 00 to 32 as two-digit decimal numbers. This involved dividing the index by 10 to separate the tens and units digits, converting each digit to a character, and printing them. I also added logic to suppress the leading zero for single-digit numbers, making the output cleaner.
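As a rough C sketch of the two-digit logic: the helper name `two_digits` is hypothetical, and replacing the leading zero with a space (rather than dropping the character entirely) is an assumption, since the post does not say which approach was used.

```c
#include <assert.h>

/* Hypothetical helper: write a value in 0..99 into out[0] and out[1],
   using a space in place of a leading zero. */
void two_digits(int value, char out[2]) {
    assert(value >= 0 && value <= 99);
    int tens = value / 10;          /* quotient: tens digit */
    int units = value % 10;         /* remainder: units digit */
    out[0] = (tens == 0) ? ' ' : (char)('0' + tens);
    out[1] = (char)('0' + units);
}
```

For example, 7 becomes a space followed by `7`, while 32 becomes `3` followed by `2`.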

Hexadecimal Output

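Finally, I modified the code to print the loop index in hexadecimal format. The digits are now extracted with a divisor of 16 instead of 10, and the values 10 through 15 need a different ASCII offset than 48 so that they land on letters rather than on the punctuation characters that follow '9'.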
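A minimal C sketch of a hexadecimal conversion, assuming uppercase letters and a fixed width of two digits. The helper names are hypothetical, and the shift and mask are one option among several; dividing by 16 and taking the remainder gives the same digits.

```c
#include <assert.h>

/* Hypothetical helper: convert a value in 0..15 to one hex character. */
char hex_digit(int nibble) {
    assert(nibble >= 0 && nibble < 16);
    /* 0-9 sit next to '0' in ASCII; 10-15 sit next to 'A'. */
    return (char)(nibble < 10 ? '0' + nibble : 'A' + nibble - 10);
}

/* Hypothetical helper: write a value in 0..255 as two hex characters. */
void two_hex_digits(int value, char out[2]) {
    assert(value >= 0 && value < 256);
    out[0] = hex_digit(value >> 4);     /* high digit: same as value / 16 */
    out[1] = hex_digit(value & 0xF);    /* low digit: same as value % 16 */
}
```

For example, the index 32 comes out as `20`, and 10 comes out as `0A`.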


3. x86_64 Assembly: A Different Beast

After completing the AArch64 portion, I moved on to the x86_64 architecture. The loop structure in x86_64 assembly looked quite different:

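.text
.globl _start

min = 0                         /* starting value for the loop index */
max = 5                         /* loop exits when the index reaches this value */

_start:
    mov     $min, %r15          /* r15 = loop index */

loop:
    /* ... body of the loop ... */

    inc     %r15                /* index++ */
    cmp     $max, %r15          /* reached max? */
    jne     loop                /* if not, run the body again */

    mov     $0, %rdi            /* exit status 0 */
    mov     $60, %rax           /* exit is syscall 60 on x86_64 Linux */
    syscall                     /* invoke the syscall */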

I repeated the same steps as with AArch64: adding a loop, printing the index, and extending it to handle two-digit numbers and hexadecimal output. The x86_64 division instruction (div) was particularly tricky, as it requires setting up specific registers before performing the division.
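The register contract of the 64-bit `div` can be modelled in C. This is only a sketch: the `struct regs` type and `model_div` function are hypothetical, and the model covers only the common case where rdx has been zeroed beforehand. On real hardware the dividend is the 128-bit value rdx:rax, the quotient is written to rax, and the remainder to rdx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers that 64-bit `div` reads and writes. */
struct regs {
    uint64_t rax;   /* in: low half of the dividend, out: quotient */
    uint64_t rdx;   /* in: high half of the dividend, out: remainder */
};

/* Models `div divisor` for the case where rdx was cleared first,
   so the dividend rdx:rax is simply the value in rax. */
void model_div(struct regs *r, uint64_t divisor) {
    assert(divisor != 0);   /* real hardware raises a divide error here */
    assert(r->rdx == 0);    /* this sketch does not model a nonzero high half */
    uint64_t dividend = r->rax;
    r->rax = dividend / divisor;    /* quotient lands in rax */
    r->rdx = dividend % divisor;    /* remainder lands in rdx */
}
```

Starting from rax = 27 and rdx = 0, dividing by 10 leaves 2 in rax and 7 in rdx, which are exactly the tens and units digits needed for the two-digit output.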


4. Comparing Assembly Languages

Working with both AArch64 and x86_64 assembly languages was a fascinating experience. Here are some key differences I noticed:


  • AArch64: The instruction set is more modern and streamlined. The use of general-purpose registers (x0-x30) and the straightforward syntax made it relatively easy to work with. One complication is that udiv produces only the quotient, so the remainder has to be computed with a separate instruction (typically msub).
  • x86_64: The instruction set is more complex, with a mix of general-purpose and specialized registers. The division instruction (div) requires careful setup: the dividend goes in rdx:rax, so rdx must be cleared first when the dividend fits in rax, and the quotient and remainder come back in rax and rdx. The overall syntax feels more archaic compared to AArch64.
  • 6502: While not part of this lab, I’ve worked with 6502 assembly in the past. Compared to AArch64 and x86_64, the 6502 feels much simpler but also more limited due to its 8-bit architecture and small register set.
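The AArch64 side of the division difference can also be sketched in C. The function name is hypothetical; the body mirrors a `udiv` followed by an `msub`, which computes n - q * d and is a common way to recover a remainder on AArch64. Whether the lab solution used this exact pair is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Models `udiv q, n, d` followed by `msub r, q, d, n`. */
uint64_t remainder_via_msub(uint64_t n, uint64_t d) {
    assert(d != 0);         /* keep the sketch to well-defined C division */
    uint64_t q = n / d;     /* udiv: quotient only */
    return n - q * d;       /* msub: multiply, then subtract from n */
}
```

For example, `remainder_via_msub(27, 10)` returns 7, the same units digit the x86_64 `div` hands back in rdx.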


5. Challenges and Learnings

Writing and debugging assembly code was both challenging and rewarding. Here are some key takeaways:

  • Debugging: Debugging assembly code requires a clear picture of the CPU’s state at each step. objdump was invaluable for inspecting the generated machine code, and gdb for examining registers and stepping through instructions.
  • Performance: Writing efficient assembly code requires careful consideration of instruction selection and register usage. Even small optimizations can have a significant impact on performance.
  • Portability: Unlike high-level languages, assembly code is not portable across architectures. This makes it essential to understand the specific nuances of each platform.


6. Final Thoughts

This lab was an excellent introduction to assembly language programming on modern architectures. It reinforced my understanding of how high-level code is executed at the hardware level and gave me a newfound appreciation for the complexity of modern CPUs.

While assembly language is rarely used for everyday programming, it remains a powerful tool for performance-critical applications and low-level system programming. The experience of writing and debugging assembly code has made me a better programmer overall, as it deepened my understanding of how computers work under the hood.


Conclusion

This lab was challenging but rewarding, and it gave me hands-on practice with assembly language programming on two major architectures. I hope this blog post provides a clear overview of my journey and the insights I gained. If you’re interested in low-level programming, I highly recommend diving into assembly language: it’s a skill that will deepen your understanding of computing and make you a more versatile programmer.


Optional Challenge: Printing Times Tables in AArch64 Assembly

As an optional challenge, I decided to write an AArch64 assembly program to print the times tables from 1 to 12. The goal was to format the output neatly, suppress leading zeros, and add a spacer between each table. Here’s how I approached the problem and the code I used to achieve it.


The Problem

The program needed to print the times tables in the following format:

  1 x  1 =   1
  2 x  1 =   2
  3 x  1 =   3
  ...
 12 x 12 = 144

Each table (1x1 to 12x12) should be separated by a spacer line (-------------), and the numbers should be formatted with proper alignment.


The Solution

I wrote an AArch64 assembly program that uses nested loops to calculate and print the times tables. The outer loop iterates through the multipliers (1 to 12), while the inner loop iterates through the multiplicands (1 to 12). The product is calculated using the mul instruction, and the results are printed using a formatted string.


Here’s the code I used:

// AArch64 assembly program to print the times tables
// Assumed build (not stated in the post): gcc -nostartfiles, so that libc's printf is linked in.

// Data section: format strings
.data
msg:        .asciz  " %2d x %2d = %3d\n"    // Row format; the widths pad with spaces
spacer:     .asciz  " -------------\n"      // Spacer printed after each table

// Text section: program code
.text
.globl _start

_start:
    mov     x19, 1                  // Outer loop counter (multiplier)

outer_loop:
    mov     x20, 1                  // Inner loop counter (multiplicand)

inner_loop:
    mul     x21, x19, x20           // x21 = x19 * x20

    // Print one row. x19-x21 are callee-saved, so they survive the call.
    adrp    x0, msg                 // x0 = address of the format string
    add     x0, x0, :lo12:msg
    mov     x1, x20                 // First number printed: inner counter, so the
    mov     x2, x19                 // first column counts 1..12 as in the sample output
    mov     x3, x21                 // Third number printed: product
    bl      printf

    add     x20, x20, 1             // Next multiplicand
    cmp     x20, 12
    b.le    inner_loop              // Repeat while x20 <= 12

    // Print the spacer
    adrp    x0, spacer
    add     x0, x0, :lo12:spacer
    bl      printf

    add     x19, x19, 1             // Next multiplier
    cmp     x19, 12
    b.le    outer_loop              // Repeat while x19 <= 12

    // Flush stdio first: the raw exit syscall skips libc's cleanup, so
    // buffered output would be lost when stdout is not a terminal.
    mov     x0, 0                   // fflush(NULL) flushes every open stream
    bl      fflush

    // Exit the program
    mov     x0, 0                   // Exit status code (0 = success)
    mov     x8, 93                  // Syscall number for exit (93)
    svc     0                       // Invoke syscall


Key Features of the Code

1. Nested Loops:

  • The outer loop (outer_loop) iterates through the multipliers (1 to 12).
  • The inner loop (inner_loop) iterates through the multiplicands (1 to 12).

2. Formatted Output:

  • The msg format string uses field widths (%2d and %3d) to right-align the numbers, padding with spaces rather than leading zeros.
  • The printf function is used to print the formatted output.

3. Spacer Between Tables:

  • After each inner loop completes, a spacer line (-------------) is printed to separate the tables.

4. Exit Syscall:

  • The program exits gracefully using the exit syscall (number 93 in x8, invoked with svc 0).
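For comparison, the same nested-loop structure can be sketched in C. This is a reference sketch rather than a translation of the assembly: the function names are hypothetical, and it prints the inner counter first so that rows run 1 x 1, 2 x 1, 3 x 1 as in the sample output under The Problem.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format one row such as " 12 x 12 = 144\n" into buf.
   Returns the number of characters written, excluding the terminator. */
int format_row(char *buf, size_t size, int a, int b) {
    assert(a >= 0 && b >= 0);
    return snprintf(buf, size, " %2d x %2d = %3d\n", a, b, a * b);
}

/* Print all twelve tables, with a spacer line after each one. */
void print_tables(void) {
    char row[32];
    for (int outer = 1; outer <= 12; outer++) {
        for (int inner = 1; inner <= 12; inner++) {
            format_row(row, sizeof row, inner, outer);
            fputs(row, stdout);
        }
        fputs(" -------------\n", stdout);
    }
}
```

Calling `print_tables()` from `main` produces rows that are each 14 characters wide, the same width as the spacer line.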


Challenges and Learnings

  • Formatting: Ensuring proper alignment of the numbers required careful use of format specifiers (%2d and %3d).
  • Nested Loops: Managing the loop counters (x19 and x20) and ensuring they reset correctly was a key challenge.
  • Function Calls: Using printf in assembly required setting up the arguments in the correct registers (x0, x1, x2, x3).

Conclusion

This optional challenge was a great way to practice writing structured assembly code. It reinforced my understanding of loops, function calls, and formatted output in AArch64 assembly. While assembly language can be verbose and challenging, it’s incredibly rewarding to see the results of your work at such a low level.
