James Molloys Paging and Heap

Why is it not safe to call kmalloc between identity mapping and once the heap is active

James Molloys tutorials are the best source I can find about implementing paging including a heap.

According to osdev wiki the code has some flaws but there is no other write up which goes into the same detail as James Molloy so I read his tutorials quite a bit and I take away a lot from them.

I had a hard time understanding why James Molloy states that between identity mapping the frames and activating the heap, calls to his kmalloc() function are prohibited and the placement_address memory pointer should not be moved.

I think I finally figured out, why he organizes his heap setup code the way he did it and why he does not want to call kmalloc after identity mapping frames and before the heap is functional.

The reason is that in his initialize_paging() function, during identity mapping he iterates over all frames from 0x0000 up to the current value of placement_address. The area between 0x0000 up to placement_address contains everything that was loaded into memory by grub, that means the kernel code and all the data the kernel uses. The area also contains the Page Directory and Page Tables created so far. It also contains the heap’s Page Tables and Page Table Entries. This are should not be overridden ever otherwise the system will crash. To prevent the area from being overridden, the frames covering this area are allocated by calling alloc_frame(). Once a frame is allocated, it is marked as used and will never be handed out to any other program by the heap. Allocating frames from 0x0000 to placement_address is also called the identity mapping loop:

i = 0;
while (i < placement_address+0x1000)
{
    // Kernel code is readable but not writeable from userspace.
    alloc_frame( get_page(i, 1, kernel_directory), 0, 0 );
    i += 0x1000;
}

If at this point, kmalloc is called, kmalloc would move placement_address further and it would use memory that is not secured by a frame and hance could be overriden if a frame is created over that data and handed out to an application by the heap. That is why after identity mapping, James Molloy does not want kmalloc to be called.

In his tutorials he then changes the code of kmalloc to use the heap if the heap is finally activated. That means if the heap is initialized, kmalloc will not use placement_address any more but it will use the heap and it is safe to call kmalloc again.

That is why he says: kmalloc should not be called between identity_mapping and the point when the heap is ready.

Why is the heap paging code split in two parts?

If you look at initialise_paging() you can see that pages are created for the heap before identity mapping and frames are assigned to the pages after identity mapping().

By creating pages for the heap, I mean that a Page Table Entry is requested. Requesting a Page Table Entry can potentially trigger a call to kmalloc as if a Page Table is full while a new page is requested, a new block of memory has to be allocated to house a new Page Table. This basically means that in order to use paging, there is a overhead for management/meta data which is Page Directory, Page Directory Entry, Page Tables and Page Table Entries.

James Molloy first creates Page Tables and Page Table Entries for the heap, without allocating frames for the Page Tables and Page Table Entries. He then identity maps the area from 0x0000 up to placement_address and after that allocates frames for the heap entries.

The reason is that he wants the heap page tables and entries to be stored within the area 0x0000 up to placement_address. He wants that data to be located in the identity mapped frames. He does not assign frames to the heaps Page Table Entries because if he would assign frames, they would be placed at the address 0x0000 because no frames are used yet.

To understand why the first allocated frame goes to 0x0000, you have to understand how James Molloy decides which frames should be used next. To decide which frame to use, he maintains a bitmap that points to all frames from 0 to MAX_FRAME in that order and contains the information whether that frame is used or still available. Whenever a frame is needed, the next free frame in that order is used.

Because no frames have ever been used before the heap is created, the first free frame is frame 0x0000. The heap frames should not be located at 0x0000 because 0x0000 has to contain the kernels frames so they are identity mapped. That is why the heap frames are only assigned after the identity mapping has used all the frames it needs to cover the kernel. The heap frames are then taken from some part of the physical memory but not from the kernels identity mapped area.

The Memory Map

A diagram helps to understand the situation. The diagram shows the memory map used by James Molloy. The memory starts from 0x0000 at the bottom and goes up to where placement_address is currently located. The linker script puts the kernel code .text, .bss and all other sections into the memory between 0x0000 and placement_address.

The kernel will have the information where the linker has put the last section. The kernel will that initialize placement_address to a memory location after the last loaded section. placement_address will then grow upwards, whenever kmalloc is called before the heap is active. All kmalloc does without heap is just to move placement_address upwards so that the caller can use the memory without the memory being used twice by some other program. Once the heap is active, the next free frame in the bitmap of frames is used when someone requests memory and a free frame could be anywhere in RAM.

What happens during identity mapping can be visualized on the diagram pretty well. identity mapping is basically looping over the memory from 0x0000 up to placement_address, covering the entire RAM by frames along the way. The frames are marked as used. Once the iteration is done, all kernel code, kernel data and all files and modules loaded by grub are secured by frames with a set used flag and noone will override that data because the heap will not return those frames to any other process.

Leave a Reply