Assembly Programming and Debugging in Eclipse on Linux

This article is based on http://dorkasaurusrex.blogspot.com/2009/05/debugging-assembly-in-eclipse.html. That article contains good information, but it is outdated in 2020 because Eclipse has changed considerably since it was published.

This article uses the 2019-12 release of Eclipse for C/C++ developers and the Linux GCC toolchain to assemble and debug code within Eclipse.

In short: Eclipse 2019-12 is installed, a C++ project is created, a main.S file is added, the assembler and linker options are adjusted, a debug configuration is created, and the sample application is built and debugged.

Installing and running Eclipse on Linux

Download the file eclipse-cpp-2019-12-R-linux-gtk-x86_64.tar.gz and extract it:

tar -zxvf eclipse-cpp-2019-12-R-linux-gtk-x86_64.tar.gz

An eclipse folder is created. Eclipse can be run with:

./eclipse/eclipse

Create a Project

File > New > C++ Project > C++ Managed Build > Executable > Empty Project > Linux GCC > Name: test

Add an Assembly File

File > New > Source File > main.S

The case of the file extension matters: it has to be an uppercase .S. Eclipse uses the extension to select the tools from the toolchain that assemble and link your application. If you use a lowercase .s it will not work!

Add assembler code to the main.S file.

.section .data
message: .string "Hello World!\n"

.section .text

# this directive allows the linker to see the "main" label
# which is our entry point
.globl main

# this directive allows the eclipse gdb to see a function called "main"
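.func main
main:
    push %rbx           # %rbx is callee-saved, so main has to preserve it
    mov $4, %eax        # function 4 of int 0x80 (sys_write)
    mov $1, %ebx        # file descriptor 1 = stdout
    mov $message, %ecx  # absolute address of the string (needs -no-pie, see below)
    mov $13, %edx       # length of "Hello World!\n" without the terminating zero byte
    int $0x80
    mov $0, %eax        # return value of main
    pop %rbx
    ret                 # return to the C runtime, which exits the process
.endfunc

Adjust the Tool Settings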

Project > Properties > C/C++ Build > Settings > Tab: Tool Settings > GCC Assembler > Command line pattern: append -g --gstabs immediately after ${COMMAND}

PIE or NO-PIE

Many current Linux distributions configure GCC to build position-independent executables (PIE) by default, which changes how assembly code has to address its data. See these posts:

https://stackoverflow.com/questions/58106310/nasm-linux-shared-object-error-relocation-r-x86-64-32s-against-data

https://stackoverflow.com/questions/46123505/assembling-with-gcc-causes-weird-relocation-error-with-regards-to-data

Instead of using absolute addresses, modern 64-bit assembly is expected to be relocatable. The code listing above loads the absolute address of message, so it is not modern 64-bit relocatable assembly.

The way to get the Linux GCC toolchain to build the code anyway is to add the -no-pie flag to the linker:

Project > Properties > C/C++ Build > Settings > GCC C++ Linker > Command line pattern > add -no-pie after ${FLAGS}

${COMMAND} ${FLAGS} -no-pie ${OUTPUT_FLAG} ${OUTPUT_PREFIX}${OUTPUT} ${INPUTS}

Create a Debug Configuration

For Eclipse to start an application from within the IDE, you have to supply a Run Configuration for normal execution or a Debug Configuration for debugging. Run / Debug Configurations describe one way of running an application. They contain environment variables and command line parameters for the application, among other options.

First make sure your application builds properly. In the next step you have to select your binary, so the build has to succeed before the binary exists.

In Eclipse, create a new Debug Configuration. As a type select ‘C/C++ Application’. Click Browse and select your binary for the C/C++ Application input field.

Set a breakpoint in the assembler source code and start the Debug Configuration.

Read From stdin in Linux Assembler

Reading from stdin means letting the user type text and consuming that text in the application as soon as the user finishes the input by pressing Enter. On Linux, Enter adds a line feed character:

\n = 10 = 0x0A = line feed

The user input first goes into an input buffer maintained by Linux. You can call a Linux function to retrieve a number of bytes from that buffer. The bytes you retrieve are removed from the buffer, so it only contains the input that has not been consumed yet.

You should always consume the Linux buffer completely so that it is empty, because the buffer outlives the individual read calls. When you ask the user for input on a later occasion, the same input buffer is used. If it was not drained, the new input is appended behind the leftover data, and you will read the old data first although you expect the new data. So always drain the input buffer when asking the user for input, even if you are only interested in the first n characters.

The input is read into an array variable in your application (an array of consecutive bytes in the data section). The array variable has to be defined with a fixed length in assembler, e.g. a byte array of 100 bytes.

Two things can happen when the user types and sends the input via enter:

  1. The user input from the Linux buffer, including the newline, fits into the variable in its entirety.
  2. The user input from the Linux buffer, including the newline, is too large to fit into the variable.

If the input fits into the variable, you only have to call the Linux function once, which drains the entire input buffer. If the input is too large for your variable, you have to call the Linux function several times until the Linux input buffer is empty.
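
The second case can be sketched in C, whose read() function is a thin wrapper around the same Linux function. This is a minimal sketch and not part of the original article: the name consume_line and the chunk size of 8 bytes are assumptions chosen for illustration, and the code assumes that input arrives one line at a time, as it does from a terminal.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define CHUNK_SIZE 8  /* hypothetical length of the fixed-size array variable */

/*
 * Consume one line from fd in chunks of at most CHUNK_SIZE bytes.
 * Returns the total number of bytes consumed (including the line feed),
 * 0 if the input ended before any byte was read, or -1 on a read error.
 */
long consume_line(int fd)
{
    char chunk[CHUNK_SIZE];
    long total = 0;

    assert(fd >= 0);

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);  /* one call per chunk */
        if (n < 0)
            return -1;      /* read error */
        if (n == 0)
            return total;   /* end of input */
        total += n;
        /* a real program would process chunk[0..n-1] here */
        if (chunk[n - 1] == '\n')
            return total;   /* line feed reached: the line is fully consumed */
    }
}
```

With a 16 character line plus the line feed, the loop runs three times: two full chunks of 8 bytes and a final chunk that holds only the line feed.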

Implementation-wise, reading from stdin can be done via int 80h, the software interrupt through which an assembler application calls Linux. int 80h supports several functions, which are listed at https://www.tutorialspoint.com/assembly_programming/assembly_system_calls.htm. You select the function by putting its id into the eax register.

Reading (sys_read) has the id 3. ebx is set to 0, the file descriptor of stdin. ecx contains the address of the array variable to put the bytes into. edx contains the number of bytes to read, which is set to the length of the array variable.

Function 3 returns the number of bytes that were actually read in eax.
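
The return value can be observed from C with the generic syscall() wrapper of glibc, which takes the function id followed by the arguments, much like loading eax, ebx, ecx and edx. The following is a hedged sketch that assumes Linux with glibc; the helper names raw_read and raw_write are made up for this example. The ids 3 and 4 belong to the 32-bit int 80h interface. The 64-bit interface numbers its functions differently (read is 0, write is 1), so the sketch uses the SYS_read and SYS_write constants instead of literal numbers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* makes sure unistd.h declares syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Function id, file descriptor, buffer address and length correspond to
 * eax, ebx, ecx and edx. On success the result is the number of bytes
 * transferred, which is the value int 80h leaves in eax.
 * On failure glibc returns -1 and sets errno.
 */
long raw_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_read, fd, buf, len);
}

long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

Asking raw_read for 100 bytes when only 4 are available returns 4, which is why the assembly code has to look at eax after the call instead of assuming that the array variable was filled.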

The implementation here is taken from https://stackoverflow.com/questions/23468176/read-and-print-user-input-with-x86-assembly-gnu-linux

It reads the first 5 bytes into an array variable and then drains the Linux input buffer one byte at a time by reading into a dummy variable until it sees the newline character. The dummy byte is not processed further, so the rest of the input is simply ignored: this code is only interested in the first 5 bytes. The program then outputs the first 5 bytes and terminates.

BUFFER_SIZE equ 5
LINE_FEED equ 10

global _start           ; export the entry point so that ld can find it

section .data
    str: times BUFFER_SIZE db 0 ; Allocate buffer of BUFFER_SIZE bytes
    lf:  db LINE_FEED   ; LF line feed, has to follow str directly (see e1_len)

section .bss
    e1_len resd 1
    dummy resd 1

section .text

_start:                 ; entry point, ld starts execution here by default
    ; https://stackoverflow.com/questions/23468176/read-and-print-user-input-with-x86-assembly-gnu-linux

; read using function 3 (sys_read)
    mov eax, 3          ; Read user input into str
    mov ebx, 0          ; |
    mov ecx, str        ; | <- destination
    mov edx, BUFFER_SIZE        ; | <- length
    int 80h             ; \

    mov [e1_len], eax   ; Store number of inputted bytes
    cmp eax, edx        ; fewer bytes read than requested?
    jb .2               ; yes: the whole line was consumed
    mov bl, [ecx+eax-1] ; BL = last byte in buffer
    cmp bl, LINE_FEED   ; LF in buffer?
    je .2               ; yes: the whole line was consumed
    inc DWORD [e1_len]  ; no: length++ (also print the 'lf' byte stored after str)

; drain the linux input buffer
    .1:                 ; Loop
    mov eax, 3           ; SYS_READ
    mov ebx, 0          ; EBX=0: STDIN
    mov ecx, dummy      ; pointer to a temporary buffer
    mov edx, 1          ; read one byte
    int 0x80            ; syscall
    test eax, eax       ; EOF? eax contains the amount of bytes read
    jz .2               ; yes: ok
    mov al, [dummy]     ; AL = character
    cmp al, LINE_FEED   ; character = LF
    jne .1              ; no -> next character
    .2:                 ; end of loop

; output the array variable using function 4 from int 80h (sys_write)
    mov eax, 4          ; Print e1_len bytes starting from str
    mov ebx, 1          ; |
    mov ecx, str        ; | <- source
    mov edx, [e1_len]   ; | <- length
    int 80h             ; \

; return using function 1 from int 80h (sys_exit)
    mov eax, 1          ; Return
    mov ebx, 0          ; | <- return code
    int 80h             ; \
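
For comparison, the same read-then-drain logic can be sketched in C. This is an illustration rather than a translation of the listing: the function name read_prefix_and_drain is hypothetical, and like the assembly code it assumes that input arrives one line at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read at most cap bytes of one input line into buf, then discard the
 * rest of that line. Returns the number of bytes stored in buf
 * (0 at end of input), or -1 on a read error.
 */
long read_prefix_and_drain(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    ssize_t n = read(fd, buf, cap);
    if (n < 0)
        return -1;

    /* Fewer bytes than requested, or a trailing line feed: nothing to drain. */
    if ((size_t)n < cap || buf[n - 1] == '\n')
        return n;

    /* Drain one byte at a time until the line feed or the end of input. */
    char dummy;
    ssize_t r;
    while ((r = read(fd, &dummy, 1)) == 1 && dummy != '\n')
        ;
    return r < 0 ? -1 : n;
}
```

Calling the function twice on the input lines "HelloWorld" and "Next" with a 5 byte buffer stores "Hello" first and then "Next" plus its line feed, because the unread "World" was drained in between.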
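
The listing is assembled with nasm and linked with ld using the following Makefile, which expects the source in main.asm:

TARGET_DIR := target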
MKDIR_P = mkdir -p

all: directories main

main: main.o
	ld ${TARGET_DIR}/main.o -o ${TARGET_DIR}/main

main.o: main.asm
	nasm -f elf64 main.asm -o ${TARGET_DIR}/main.o

clean:
	rm -f ${TARGET_DIR}/main.o ${TARGET_DIR}/main

# https://www.gnu.org/software/make/manual/html_node/Phony-Targets.html
.PHONY: all clean directories
directories: ${TARGET_DIR}

${TARGET_DIR}:
	${MKDIR_P} ${TARGET_DIR}