Creating an ISO with GRUB and Your Kernel as an ELF File

Source Material and Usage

The following article explains very well how to create an ISO image that contains the GRUB bootloader, which then loads your own kernel in the ELF file format.

Just type make in the folder to run the Makefile, which creates an ISO image. For the kernel binary to be built correctly, you should create a cross compiler for 32-bit x86 code that uses the System V ABI. See this article.

You can then run that ISO image on Ubuntu with VirtualBox, Bochs or QEMU.

For QEMU, first install it. On Ubuntu, the qemu-system-i386 emulator is part of the qemu-system-x86 package:

sudo apt-get install qemu-system-x86

Then start qemu using the iso file:

qemu-system-i386 -m 2G -cdrom image.iso

For VirtualBox, you can install VirtualBox by searching for it in Ubuntu's Activities overview. Then, in its graphical user interface, create a virtual machine, configure it to have your ISO loaded in the CD-ROM drive, and start it.

Bootloader and Kernel

The GRUB bootloader implements the Multiboot specification. The multiboot specification describes how a bootloader and a kernel can interact with each other. Any bootloader implementing the Multiboot specification can load any operating system that also adheres to the Multiboot specification.

What does it mean for an operating system to be Multiboot compliant? Section “3.1 OS image format” of the specification states that the OS image must contain a Multiboot header, and that this header must be contained completely within the first 8192 bytes of the image and be 32-bit aligned. That is why the linker script later in this article places the .multiboot section at the very beginning of the text section. Only if a Multiboot bootloader finds a valid header (correct magic number and checksum) in that range will it recognize the binary as a Multiboot kernel and boot it.
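To make this rule concrete, here is a minimal sketch of the check a Multiboot (version 1) loader performs on a kernel image: at every 32-bit aligned offset within the first 8192 bytes, look for the magic value 0x1BADB002 and verify that magic, flags and checksum add up to zero modulo 2^32. The function name find_multiboot_header is hypothetical, and the sketch ignores the optional address fields that may follow the checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_SEARCH_LIMIT 8192u /* header must lie within the first 8 KiB */

/* Returns the byte offset of a valid Multiboot (v1) header in `image`,
 * or -1 if none is found. A header is valid if a 32-bit aligned magic
 * value is followed by flags and a checksum such that
 * magic + flags + checksum == 0 (mod 2^32). */
long find_multiboot_header(const unsigned char *image, size_t len) {
    size_t limit = len < MULTIBOOT_SEARCH_LIMIT ? len : MULTIBOOT_SEARCH_LIMIT;
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t w[3];
        for (int i = 0; i < 3; i++) {
            const unsigned char *p = image + off + 4 * (size_t)i;
            /* x86 is little-endian: assemble each word byte by byte */
            w[i] = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        }
        if (w[0] == MULTIBOOT_HEADER_MAGIC && (uint32_t)(w[0] + w[1] + w[2]) == 0)
            return (long)off;
    }
    return -1;
}
```

The grub-file --is-x86-multiboot call in the Makefile below performs an equivalent check on the real kernel binary for you.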

Do not confuse the Multiboot header structure with the Multiboot Information Data structure. The Multiboot header is part of the OS kernel binary and is a service the kernel provides to the bootloader. The Multiboot Information Data structure, in contrast, is built by the bootloader and handed to the kernel in the EBX register (the kernel's boot code then passes it on to the kernel's main function as a parameter); it is a service the bootloader provides to the operating system.

Multiboot Information Data Structure

The Multiboot specification under section “3.3 Boot information format” says:

Upon entry to the operating system, the EBX register contains the physical address of a Multiboot information data structure, through which the boot loader communicates vital information to the operating system. The operating system can use or ignore any parts of the structure as it chooses; all information passed by the boot loader is advisory only.

The Multiboot Information Data structure is a way to transfer information from the bootloader into the kernel. This way, the kernel can learn for example how much physical memory is available.
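As an illustration of how the kernel can learn how much physical memory is available, here is a sketch that walks the memory map provided by the bootloader (the mmap_addr and mmap_length fields, valid when bit 6 of flags is set) and adds up all regions of type 1 (available RAM). The entry layout follows the multiboot.h header from the specification, with 64-bit base and length fields; the names mmap_entry_t and total_available_ram are hypothetical. Note that size does not count itself, which is why the next entry starts size + 4 bytes later.

```c
#include <stdint.h>

/* One entry of the Multiboot (v1) memory map. `size` does not include
 * the size field itself. */
typedef struct __attribute__((packed)) {
    uint32_t size;
    uint64_t base_addr;
    uint64_t length;
    uint32_t type; /* 1 = available RAM, other values = not usable as RAM */
} mmap_entry_t;

/* Sums the lengths of all type-1 (available) regions of the memory map
 * that starts at mmap_addr and spans mmap_length bytes. */
uint64_t total_available_ram(uintptr_t mmap_addr, uint32_t mmap_length) {
    uint64_t total = 0;
    for (uintptr_t p = mmap_addr; p < mmap_addr + mmap_length;) {
        const mmap_entry_t *e = (const mmap_entry_t *)p;
        if (e->type == 1)
            total += e->length;
        p += e->size + sizeof(e->size); /* skip to the next entry */
    }
    return total;
}
```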

To use it, you must define the structure. The easiest way to define the structure is to insert the official header file into your codebase. The bottom of the multiboot specification contains the header file in raw text form along with a small example operating system that shows how to use that structure.
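If you are curious what that header defines, here is a sketch of only the first fields of the Multiboot information structure, together with a CHECK_FLAG macro like the one in the specification's example. A field may only be read when its bit in flags is set. The type name mb_info_prefix_t is hypothetical and the official multiboot.h should remain your reference; the offsets in the comments are those given in section 3.3 of the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Tests bit `bit` of `flags`, like CHECK_FLAG in the spec's example. */
#define CHECK_FLAG(flags, bit) ((flags) & (1u << (bit)))

/* Hypothetical partial definition of the Multiboot (v1) information
 * structure; all fields are 32-bit, so there is no padding. */
typedef struct {
    uint32_t flags;       /* offset  0: which of the fields below are valid */
    uint32_t mem_lower;   /* offset  4: KiB of lower memory (bit 0)        */
    uint32_t mem_upper;   /* offset  8: KiB of upper memory (bit 0)        */
    uint32_t boot_device; /* offset 12: boot device (bit 1)                */
    uint32_t cmdline;     /* offset 16: kernel command line (bit 2)        */
    uint32_t mods_count;  /* offset 20: number of modules (bit 3)          */
    uint32_t mods_addr;   /* offset 24: address of the module list (bit 3) */
    uint32_t syms[4];     /* offsets 28-40: a.out or ELF symbols (bit 4/5) */
    uint32_t mmap_length; /* offset 44: memory map length (bit 6)          */
    uint32_t mmap_addr;   /* offset 48: memory map address (bit 6)         */
} mb_info_prefix_t;
```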

Kernel Start Address – At which address will the kernel be loaded by GRUB?

There is no fixed value; it depends on what the ELF file instructs GRUB to do.

The kernel binary created in the article is an ELF file. An ELF file records the addresses at which its segments expect to be loaded. After loading, each segment must reside at its requested address in virtual/physical memory so that all absolute addresses in the program actually point to the correct locations.

GRUB is capable of loading ELF binaries. When it loads an ELF binary, it places it at the physical address that the ELF file requests. The linker script that creates the ELF file looks like this:

OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
 {
   . = 1M;
   .text BLOCK(4K) : ALIGN(4K)
   {
       *(.multiboot)
       *(.text)
   }
   .data : { *(.data) }
   .bss  : { *(.bss)  }
 }

First, it tells the linker to create a 32-bit x86 ELF binary with start as its entry point. Then, in the SECTIONS block, it sets the location counter to 1M, which is one megabyte = 0x00100000, so the text section starts there. The .multiboot input section is placed first, which keeps the Multiboot header at the very beginning of the image. The ELF file therefore requests that the text section, which contains the kernel’s executable code, be loaded at 1M, and GRUB loads the kernel to 1M in this example. You could also choose another physical address to load your kernel to.

Kernel End Address – How does one know how large the kernel is and where usable memory starts?

The kernel itself is a binary and is placed into memory. As the binary has a certain size in bytes, it will take up a certain amount of memory. So far we know where the kernel binary starts in memory (defined by the linker, contained in the elf binary).

How do we figure out where the kernel binary ends and where free memory starts after it? Keep in mind that the kernel binary keeps growing as you add features, and therefore code, to your kernel codebase, so it is inconvenient to assume a fixed upper bound for the kernel’s size.

Also, how does a bootloader know how many sectors of the ELF binary to copy into RAM? Copying too many sectors is not a problem apart from wasted space. Copying too few sectors means only part of the kernel is available in RAM, which has fatal consequences: your functions work just fine until, in the middle of execution, the code starts misbehaving, for example by not printing the log statements you expect, or fails completely. The reason is that the instruction pointer moves into memory that should contain more kernel code but was never loaded by the bootloader.

GRUB determines how large your kernel is from the metadata in the ELF file, so as long as you use GRUB you do not have to worry about this. If you write your own custom bootloader, it is a problem you have to solve. And if your kernel is a flat binary instead of an ELF binary with metadata, how do GRUB or your custom bootloader know how many sectors to load into RAM to get the entire kernel? (For Multiboot, the answer is the optional address fields of the Multiboot header, enabled by bit 16 of its flags, in which a non-ELF kernel states where it is loaded and where it ends.)
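To see the metadata GRUB relies on, here is a sketch that reads the program headers of a 32-bit little-endian ELF file and computes the physical range covered by its loadable (PT_LOAD) segments. The lower bound is the load address requested through the linker script (1M above), and the upper bound uses p_memsz, which, unlike p_filesz, also covers the zero-initialized .bss. The function elf32_load_range and its helpers are hypothetical, and the sketch skips most of the validation a real loader would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the code does not depend on struct layout. */
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Computes the physical range [*lo, *hi) covered by the PT_LOAD segments
 * of an ELF32 image. Returns 0 on success, -1 if the image is unusable. */
int elf32_load_range(const unsigned char *img, size_t len, uint32_t *lo, uint32_t *hi) {
    if (len < 52 || img[0] != 0x7F || img[1] != 'E' || img[2] != 'L' ||
        img[3] != 'F' || img[4] != 1)
        return -1; /* too short, bad magic, or not ELFCLASS32 */
    uint32_t phoff = rd32(img + 28);     /* e_phoff     */
    uint16_t phentsize = rd16(img + 42); /* e_phentsize */
    uint16_t phnum = rd16(img + 44);     /* e_phnum     */
    int found = 0;
    for (uint16_t i = 0; i < phnum; i++) {
        size_t off = (size_t)phoff + (size_t)i * phentsize;
        if (off + 32 > len)
            return -1; /* truncated program header table */
        const unsigned char *ph = img + off;
        if (rd32(ph) != 1)
            continue; /* p_type != PT_LOAD */
        uint32_t paddr = rd32(ph + 12), memsz = rd32(ph + 20);
        if (!found || paddr < *lo)
            *lo = paddr;
        if (!found || paddr + memsz > *hi)
            *hi = paddr + memsz;
        found = 1;
    }
    return found ? 0 : -1;
}
```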

A problem you have to solve in your kernel (not in the bootloader) is to figure out the first address in RAM that can be used to store data (the placement address).

After the kernel boots, paging is disabled and no heap is set up. In this phase you use placement memory: you keep a pointer to the start address (the placement address), and whenever you need n bytes of memory, you hand out the current address and advance the pointer by n. This approach is so simple that it lacks many features of a heap. For example, you cannot free memory with placement memory, because there is no metadata about where objects start and end. One way to deal with this is to accept that the kernel never frees memory it allocated before paging and a heap have been activated.
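A placement allocator of this kind fits in a few lines. The following sketch, modeled on the idea in James Molloy's tutorial but with hypothetical names (placement_init, kmalloc_placement), hands out memory by bumping a pointer and can optionally round up to a 4 KiB boundary, which becomes useful for page tables later. As described above, there is deliberately no free function.

```c
#include <stdint.h>

/* Next free byte. In a kernel this would be initialised from the
 * linker-provided end symbol (or from past the last module). */
static uintptr_t placement_address;

void placement_init(uintptr_t start) { placement_address = start; }

/* Bump allocator: returns `size` bytes, 4 KiB aligned if page_align is
 * non-zero. Memory handed out here can never be freed. */
void *kmalloc_placement(uint32_t size, int page_align) {
    if (page_align && (placement_address & 0xFFFu))
        placement_address = (placement_address + 0xFFFu) & ~(uintptr_t)0xFFFu;
    uintptr_t result = placement_address;
    placement_address += size;
    return (void *)result;
}
```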

How does the kernel learn the placement address? The kernel code can use a symbol whose address is set by the linker script. If the linker script places that symbol after the kernel binary and all kernel sections, the kernel code knows an address where placement memory can start. The linker can set this address correctly because it constructs the binary and its sections and hence knows their sizes. An example of such a linker script is James Molloy’s linker script. Check out the end label in the linker script: end is the address where placement memory can start.

/* Link.ld -- Linker script for the kernel - ensure everything goes in the */
/*            Correct place.  */
/*            Original file taken from Bran's Kernel Development */
/*            tutorials: http://www.osdever.net/bkerndev/index.php. */

ENTRY(start)
SECTIONS
{

    .text 0x100000 :
    {
        code = .; _code = .; __code = .;
        *(.text)
        . = ALIGN(4096);
    }

    .data :
    {
        data = .; _data = .; __data = .;
        *(.data)
        *(.rodata)
        . = ALIGN(4096);
    }

    .bss :
    {
        bss = .; _bss = .; __bss = .;
        *(.bss)
        . = ALIGN(4096);
    }

    end = .; _end = .; __end = .;
}

Now the kernel can use the end label from C code:

// end is defined in the linker script. u32int is the 32-bit unsigned
// integer typedef used in James Molloy's tutorial code.
extern u32int end;
u32int placement_address = (u32int)&end;

If you look closely, the end variable’s value is not used at all! It is the end variable’s address that is used to retrieve the end of the kernel! (See here)
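The same mechanism exists for ordinary programs: on Linux, the default GNU ld linker script provides an end symbol after the .bss section (see man 3 end), so you can try the idea on your development machine. This sketch assumes such a toolchain; kernel_end and bss_probe are hypothetical names. Declaring end as an array of char makes it explicit that only its address is meaningful, and is equivalent to taking &end of the u32int declaration above.

```c
#include <stdint.h>

/* Provided by the linker (here: the default Linux linker script, see
 * `man 3 end`; in the kernel: the `end = .;` line of the linker script).
 * There is no storage behind it; only its address carries information. */
extern char end[];

/* A zero-initialised global, which the linker places in .bss,
 * i.e. before `end`. */
uint32_t bss_probe;

/* Returns the first address after the program's .bss section. */
uintptr_t kernel_end(void) {
    return (uintptr_t)end;
}
```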

Working with GRUB Modules

Besides loading your kernel, GRUB can put so-called modules into memory. Modules are files (code binaries or any arbitrary file) that the kernel can use to provide additional functionality, to read configuration from, or for anything else. A module could be a binary program, such as a hello world test program in ELF format that you want to run to test your kernel's ELF loader. It could be a module that teaches the kernel how to read a FAT12 file system. It could also be a prepared filesystem image that the kernel uses during its early stages of operation.

If you use GRUB’s module loading feature, it is not enough to know where the kernel binary ends; you also need to know where in memory GRUB has put the modules and how much memory they occupy. GRUB chooses the addresses for the modules itself. There is no well-known memory location, so you have to retrieve the locations from GRUB. The module list in the Multiboot information data (the mods_count and mods_addr fields) contains the start and end address of every module (example code is at the bottom of the Multiboot specification).
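Putting the pieces together, here is a sketch of how the kernel could compute its placement address: start from the kernel end address taken from the linker symbol, move past the highest module end address in the module list, and round up to a 4 KiB boundary. The struct mirrors one entry of the Multiboot module list; mb_module_t and first_free_address are hypothetical names. The sketch assumes that the module list itself and the module command-line strings do not lie above the modules, which you should verify rather than rely on.

```c
#include <stdint.h>

/* One entry of the Multiboot (v1) module list found at mbi->mods_addr. */
typedef struct {
    uint32_t mod_start; /* start address of the module            */
    uint32_t mod_end;   /* end address of the module              */
    uint32_t string;    /* address of the module's command line   */
    uint32_t reserved;
} mb_module_t;

/* Returns the lowest 4 KiB aligned address above both the kernel image
 * and every module: a candidate placement address. */
uint32_t first_free_address(uint32_t kernel_end, const mb_module_t *mods, uint32_t count) {
    uint32_t highest = kernel_end;
    for (uint32_t i = 0; i < count; i++)
        if (mods[i].mod_end > highest)
            highest = mods[i].mod_end;
    return (highest + 0xFFFu) & ~0xFFFu;
}
```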

Knowing which parts of memory are occupied by your kernel’s binary, the placement memory and the modules is important, because you must not overwrite that memory if your OS is to run stably. The OS needs a way to mark such memory as occupied. As the example OS built throughout these articles uses paging, the most straightforward way is to maintain a bitmap of occupied physical frames, as outlined in James Molloy’s article about paging.
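A frame bitmap like the one in James Molloy's paging article can be sketched as follows: one bit per 4 KiB physical frame, set when the frame is occupied. The sketch assumes a 32-bit physical address space (4 GiB, so 2^20 frames and a 128 KiB bitmap) and uses a statically sized array instead of allocating the bitmap from placement memory. set_frame and test_frame follow the names in that article, while mark_range_used is hypothetical; you would call it for the kernel image, the placement memory and every module.

```c
#include <stdint.h>

#define FRAME_SIZE 4096u
#define MAX_FRAMES (1u << 20) /* 4 GiB / 4 KiB */

static uint32_t frame_bitmap[MAX_FRAMES / 32]; /* 1 bit per frame, 1 = used */

void set_frame(uint32_t phys_addr) {
    uint32_t frame = phys_addr / FRAME_SIZE;
    frame_bitmap[frame / 32] |= 1u << (frame % 32);
}

int test_frame(uint32_t phys_addr) {
    uint32_t frame = phys_addr / FRAME_SIZE;
    return (int)((frame_bitmap[frame / 32] >> (frame % 32)) & 1u);
}

/* Marks every frame touched by [start, end) as occupied. A 64-bit counter
 * prevents wrap-around for ranges ending near 4 GiB. */
void mark_range_used(uint32_t start, uint32_t end) {
    for (uint64_t a = start & ~(FRAME_SIZE - 1); a < end; a += FRAME_SIZE)
        set_frame((uint32_t)a);
}
```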

How does the kernel interact with the multiboot loader to learn about the end address of the modules? The information can be read from the multiboot information data. To retrieve that structure, the assembler boot code has to push the ebx register onto the stack because ebx contains the address of the multiboot information data which was put into ebx by the multiboot loader.

Let’s implement this idea using GRUB and a custom kernel! In this test, the module is a plain ASCII text file that contains the sentence “This is a plain text file module test.”. Create a file called “test” in the folder that contains the Makefile. Into the file “test” enter the following text:

This is a plain text file module test.

Update the Makefile to copy the “test” file into the boot folder. You can use another folder if you want.

CP := cp
RM := rm -rf
MKDIR := mkdir -pv

BIN = kernel
CFG = grub.cfg
ISO_PATH := iso
BOOT_PATH := $(ISO_PATH)/boot
GRUB_PATH := $(BOOT_PATH)/grub

#GCC := gcc
GCC := ~/dev/cross/install/bin/i386-elf-gcc

#LD := ld
LD := ~/dev/cross/install/bin/i386-elf-ld


# none of these targets name the file they produce, so mark them all phony
.PHONY: all bootloader kernel linker modules iso clean

all: bootloader kernel linker modules iso
	@echo Make has completed.

bootloader: boot.asm
	nasm -f elf32 boot.asm -o boot.o

kernel: kernel.c
	$(GCC) -m32 -c kernel.c -o kernel.o

# link only after both object files have been (re)built
linker: linker.ld bootloader kernel
	$(LD) -m elf_i386 -T linker.ld -o kernel boot.o kernel.o

# the ISO needs the linked kernel and the module in place
iso: linker modules
	$(MKDIR) $(GRUB_PATH)
	$(CP) $(BIN) $(BOOT_PATH)
	$(CP) $(CFG) $(GRUB_PATH)
	grub-file --is-x86-multiboot $(BOOT_PATH)/$(BIN)
	grub-mkrescue -o image.iso $(ISO_PATH)

modules:
	$(MKDIR) $(GRUB_PATH)
	$(CP) test $(BOOT_PATH)

clean:
	$(RM) *.o $(BIN) *iso

If you build using the command “make”, the “test” file will be part of the created iso image. (It is not part of the kernel itself but part of the iso image).

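Let GRUB know about the module so that it is loaded alongside your kernel. To configure GRUB, change the menuentry in the grub.cfg file: add a line with the “module” command followed by the path to the “test” file inside the ISO image (/boot/test in this example). Everything after that path is passed to the kernel as the module's command-line string; here it is simply the path again, which is what the kernel prints as string later on.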

# timeout in seconds, -1 waits indefinitely without timing out ever
set timeout=-1

# first entry is the default entry to boot after a timeout
set default=0

# custom kernel
# https://www.gnu.org/software/grub/manual/grub/grub.html#menuentry
menuentry "The worst kernel ever" {
        multiboot /boot/kernel
        module /boot/test /boot/test
}

An explanation of all parameters allowed by the menuentry is contained in https://www.gnu.org/software/grub/manual/grub/grub.html#menuentry.

In the assembler file that eventually calls the kernel’s main function, push ebx and then eax onto the stack right before calling main. The Multiboot bootloader writes the address of the Multiboot information data structure into ebx and the Multiboot magic number (0x2BADB002) into eax. Pushing these values onto the stack makes them available as parameters to the called C function. This behaviour is defined by the Application Binary Interface (ABI) in use: the 32-bit System V ABI (cdecl calling convention) passes arguments on the stack, pushed from right to left, so the value pushed last (eax, the magic number) becomes main's first parameter and ebx becomes its second.

bits 32

section .multiboot               ;according to multiboot spec
        dd 0x1BADB002            ;set magic number for bootloader
        dd 0x0                   ;set flags
        dd - (0x1BADB002 + 0x0)  ;set checksum

section .text
global start
extern main                      ;defined in the C file

start:
        cli                      ;block interrupts
        mov esp, stack_space     ;set stack pointer

        
        push   ebx             ;Push the pointer to the Multiboot information structure.
        push   eax             ;Push the magic value.

        call main                ;call main
.hang:  hlt                      ;halt the CPU
        jmp .hang                ;if an NMI wakes the CPU, halt again

section .bss
resb 8192                        ;8KB for stack
stack_space:

Now update your kernel’s main function. An example is contained at the bottom of the Multiboot specification. I only list excerpts from it here, because quite a bit of code is involved in printing strings to the VGA text buffer (in 32-bit protected mode, BIOS services are no longer available).

void main(unsigned long magic, unsigned long addr) {
    
  multiboot_info_t *mbi;

  terminal_buffer = (unsigned short *)VGA_ADDRESS;

  // clear_screen();
  cls();

  vga_index = 0;
  // print_string("Hello World!", WHITE_COLOR);
  printf("Hello World!\n\n");

  /* Am I booted by a Multiboot-compliant boot loader? */
  if (magic != MULTIBOOT_BOOTLOADER_MAGIC) {
    vga_index = 160;
    // print_string("Invalid magic number", RED);
    printf("Invalid magic number!\n");

    return;
  }

  /* Set MBI to the address of the Multiboot information structure. */
  mbi = (multiboot_info_t *)addr;

  /* Are mods_* valid? */
  if (CHECK_FLAG(mbi->flags, 3)) {

    module_t *mod;
    int i;
    int j;

    printf("mods_count = %d, mods_addr = 0x%x\n", mbi->mods_count,
           mbi->mods_addr);

    for (i = 0, mod = (module_t *)mbi->mods_addr; i < mbi->mods_count;
         i++, mod++) { // mod++ advances by exactly one module_t entry
      printf(" mod_start = 0x%x, mod_end = 0x%x, string = %s\n", mod->mod_start,
             mod->mod_end, (char *)mod->string);

      // output the first characters from the test module, but never read
      // past the end of the module
      char *character = (char *)mod->mod_start;
      for (j = 0; j < 37 && character < (char *)mod->mod_end; j++) {
        putchar(*character);
        character++;
      }

      printf("\n");
    }
  } else {
    printf("No mods found!\n");
  }

  /* Is the section header table of ELF valid? */
  if (CHECK_FLAG(mbi->flags, 5)) {
    elf_section_header_table_t *elf_sec = &(mbi->u.elf_sec);

    printf("elf_sec: num = %d, size = 0x%x,"
           " addr = 0x%x, shndx = 0x%x\n",
           elf_sec->num, elf_sec->size, elf_sec->addr, elf_sec->shndx);
  }

  /* Are mmap_* valid? */
  if (CHECK_FLAG(mbi->flags, 6)) {

    memory_map_t *mmap;

    printf("mmap_addr = 0x%x, mmap_length = 0x%x\n", mbi->mmap_addr,
           mbi->mmap_length);

    for (mmap = (memory_map_t *)mbi->mmap_addr;
         (unsigned long)mmap < mbi->mmap_addr + mbi->mmap_length;
         mmap = (memory_map_t *)((unsigned long)mmap + mmap->size +
                                 sizeof(mmap->size))) {

      printf(" size = 0x%x, base_addr = 0x%x%x,"
             " length = 0x%x%x, type = 0x%x",
             mmap->size, mmap->base_addr_high, mmap->base_addr_low,
             mmap->length_high, mmap->length_low, mmap->type);

      // https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Boot-modules
      //
      // ‘type’ is the variety of address range represented, where a value of 1
      // indicates available RAM, value of 3 indicates usable memory holding
      // ACPI information, value of 4 indicates reserved memory which needs to
      // be preserved on hibernation, value of 5 indicates a memory which is
      // occupied by defective RAM modules and all other values currently
      // indicated a reserved area.

      switch (mmap->type) {

      case 1:
        printf("Available RAM\n");
        break;

      case 3:
        printf("Usable memory holding ACPI information\n");
        break;

      case 4:
        printf("Reserved memory which needs to be preserved on hibernation\n");
        break;

      case 5:
        printf("Defective RAM\n");
        break;

      default:
        printf("Reserved Area\n");
        break;
      }
    }
  }

  vga_index = 80;
  // print_string("Goodbye World!", WHITE_COLOR);
  printf("Goodbye World!\n");

  return;
}

You can see that main no longer has an empty (void) parameter list. Instead, its first parameter is the magic number and its second parameter is the address of the Multiboot information data structure, which main casts to a multiboot_info_t pointer.

The kernel’s main function uses these parameters to output data about the modules and the memory map. Our goal was to use the custom module, the plain ASCII text file “test”. The main function above cheats quite a bit! It loops over all available modules and outputs the first 37 bytes of each. This assumes that there is a module containing our “test” file. Its sentence (“This is a plain text file module test.”) is 38 characters long, so everything except the final period is printed.
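Instead of hard-coding 37, the module's own bounds can be used. The following sketch copies a module's contents into a buffer, stopping at the module end or when the buffer is full. module_text is a hypothetical helper; in the kernel you would call it with (const char *)mod->mod_start and (const char *)mod->mod_end and then print the buffer.

```c
#include <stddef.h>

/* Copies the bytes in [mod_start, mod_end) into `out`, NUL-terminated,
 * truncating if `out` is too small. Returns the number of bytes copied. */
size_t module_text(const char *mod_start, const char *mod_end, char *out, size_t out_size) {
    size_t n = 0;
    if (out_size == 0)
        return 0;
    for (const char *c = mod_start; c < mod_end && n + 1 < out_size; c++)
        out[n++] = *c;
    out[n] = '\0';
    return n;
}
```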

When you make this project and start the iso using qemu, you will see the text contained in “test” be printed.

As a general reminder: This is not production ready code! This code is merely here to illustrate concepts! It is not good code by any stretch of the imagination! Do not use this code in any project you plan to use for production purposes! Learn the concepts, write your own code, write tests to verify your code and prevent it from regression errors, let your colleagues or peers review your code and tests and only then use your own code for any important purpose! Do yourself a favor and cover your own back.

The Stack

The Multiboot specification states in “3.2 Machine state” that the ESP register has an undefined value after booting, and then adds:

The OS image must create its own stack as soon as it needs one.

Maybe GRUB initializes a valid stack, but according to the specification it does not have to. The kernel should therefore always create a stack for itself. This matters because if your OS is loaded by a Multiboot bootloader other than GRUB, that bootloader might not set a stack pointer, and your OS has to be prepared for that situation.

Writing the ISO to a bootable USB Stick

In order to use your operating system on a real machine outside an emulator, you have to get the machine to boot your operating system. The most straightforward way is to boot from USB.

You need a USB stick that contains no relevant data, because the USB stick will be erased in the process. You also need your operating system packaged as an ISO. In the machine’s BIOS settings, make sure that USB is one of the entries in its boot order.

You cannot create a bootable USB stick by just copying the ISO file onto an existing stick, because the ISO file then just sits on the stick like a regular file in its filesystem. Instead, use a tool that correctly lays out the ISO image on the USB stick so that the machine recognizes the stick as a bootable medium.

On Ubuntu, you can use the Startup Disk Creator application, which comes preinstalled with the standard Ubuntu installation.

The Startup Disk Creator is a little bit picky in the sense that it will not accept your custom ISO image. I had to rename the image to give it the .img extension, so instead of image.iso, I had to rename it to image.img:

mv image.iso image.img

After you have your image.img file, load it in Startup Disk Creator, select the USB stick to write it to and start the process. The USB stick now can be used to boot your operating system.

If you prefer to create the USB stick from the command line only, have a look at this post; it may contain a command-line-only solution.