Nick Mudge Ignition Software Consulting & Development

I'm slowly working on an operating system, from machine code up. This doesn't mean I'll write everything in machine code; it means I'm starting with machine code and plan to bootstrap from there.

I've written the beginnings of a boot loader in machine code. I'm using a C program to write the machine code out to its own file. Here's the C program:

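#include <stdio.h>

#define NUM 5

/* One machine instruction: its raw bytes and how many there are.
   The size is stored separately because the bytes can contain \x00,
   which would end the string early if strlen() were used. */
typedef struct {
    const char *string;
    int size;
} Instruction;

/* Write the bytes of every instruction, in order, to ./output.
   Returns the number of bytes written, or -1 if the file can't be opened. */
int write_instructions(Instruction ins[]) {
    FILE *fp;
    int bwritten = 0;
    int i;
    fp = fopen("./output", "wb");   /* "b": raw binary, no newline translation */
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    for (i = 0; i < NUM; i++) {
        bwritten += (int)fwrite(ins[i].string, 1, ins[i].size, fp);
    }
    fclose(fp);
    return bwritten;
}

int main(void) {
    Instruction ins[NUM];
    int n;

    ins[0].string = "\xb8\x00\xb8";         /* mov $0xb800, %ax */
    ins[0].size = 3;
    ins[1].string = "\x8e\xd8";             /* mov %ax, %ds */
    ins[1].size = 2;
    ins[2].string = "\xc6\x06\x00\x00\x41"; /* movb $'A', 0 */
    ins[2].size = 5;
    ins[3].string = "\xc6\x06\x01\x00\x1e"; /* movb $0x1e, 1 */
    ins[3].size = 5;
    ins[4].string = "\xeb\xfe";             /* jmp . (rel8 = -2) */
    ins[4].size = 2;

    n = write_instructions(ins);
    if (n < 0)
        return 1;
    printf("%d bytes written\n", n);
    return 0;
}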

The equivalent AT&T syntax assembly is commented on the right. What this program does in terms of assembly code is described in this article. I'm just going to explain here how I got the machine instructions.

I found out what the hex codes would be for the instructions by looking them up in the Intel Instruction Set Reference, A-M book, and then verified them by looking at the machine code an assembler generated for the equivalent assembly language program.
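When the bytes come from an assembler listing or a disassembler, they still have to be typed into the C program as \x escapes. A small helper like the one below can do that conversion. It is only a sketch: the name to_c_escape is hypothetical, and it assumes the caller supplies an output buffer of at least 4*n + 1 characters.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format n bytes as the body of a C string literal,
   e.g. {0xb8, 0x00, 0xb8} becomes \xb8\x00\xb8.
   out must have room for 4*n + 1 characters (four per byte plus '\0'). */
void to_c_escape(const unsigned char *bytes, size_t n, char *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        /* backslash, 'x', then two lowercase hex digits */
        sprintf(out + 4 * i, "\\x%02x", bytes[i]);
    }
    out[4 * n] = '\0';  /* also terminates the string when n == 0 */
}
```

Passing the three bytes b8 00 b8 produces the text \xb8\x00\xb8, ready to paste between quotes.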

Let's look at the first instruction: \xb8\x00\xb8
This puts the value b800h in the ax register. Notice that the two bytes of the value appear in reverse order: 00 first, then b8. This is because Intel processors are little endian, which means that multi-byte values are stored with the least significant byte first.
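To make the byte order concrete, here is a sketch that splits a 16-bit value into bytes the way an x86 processor stores it. The function names store_le16 and host_is_little_endian are hypothetical. The first uses shifts, so its result doesn't depend on the machine it runs on; the second inspects how the host actually lays out a value in memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write value least significant byte first,
   the order x86 uses. Shifts make this host-independent. */
void store_le16(uint16_t value, unsigned char out[2])
{
    out[0] = (unsigned char)(value & 0xff);        /* low byte first   */
    out[1] = (unsigned char)((value >> 8) & 0xff); /* high byte second */
}

/* Hypothetical helper: returns 1 if this machine is little endian,
   by copying the value 1 into memory and checking the first byte. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0001;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

store_le16(0xb800, buf) leaves 00 in buf[0] and b8 in buf[1], matching the last two bytes of \xb8\x00\xb8.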
The part of a machine instruction that determines what it actually does is called the opcode. In this case the opcode is b8, which moves a specified 16-bit value (called an immediate value) into a register. The register is encoded in the opcode itself: you add a register number to the base value b8. Add 0 for the ax register, 1 for the cx register, 2 for the dx register, 3 for the bx register, and so on. In this case b8 + 0 equals b8, so the value goes into ax.
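The "base opcode plus register number" rule can be written as a tiny encoder. This sketch assumes 16-bit code (real mode, where a boot loader starts), in which b8 takes a 2-byte immediate; encode_mov_imm16 and the reg16 enum are hypothetical names.

```c
#include <stdint.h>

/* Register numbers added to the base opcode (the "+rw" forms). */
enum reg16 { AX = 0, CX = 1, DX = 2, BX = 3, SP = 4, BP = 5, SI = 6, DI = 7 };

/* Hypothetical encoder for "mov $imm, %reg" in 16-bit code:
   opcode b8 + register number, then the immediate, low byte first.
   Writes 3 bytes to out and returns the count. */
int encode_mov_imm16(enum reg16 reg, uint16_t imm, unsigned char out[3])
{
    out[0] = (unsigned char)(0xb8 + reg);  /* register chosen by addition */
    out[1] = (unsigned char)(imm & 0xff);  /* immediate, low byte         */
    out[2] = (unsigned char)(imm >> 8);    /* immediate, high byte        */
    return 3;
}
```

encode_mov_imm16(AX, 0xb800, buf) yields b8 00 b8, and the same value for cx would start with b9 instead.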

The instruction \x8e\xd8 moves the value in the ax register into the data segment register (ds). But how does this instruction specify these two registers? The first byte, 8e, is the opcode for moving a 16-bit value from a general purpose register or memory location into a segment register. The second byte, called the ModRM byte, specifies both operands. It is split into three fields: the top 2 bits (mod) say whether the source is a register or a memory location, the middle 3 bits (reg) select the segment register, and the bottom 3 bits (r/m) select the source register or addressing mode. d8 is 11 011 000 in binary: mod 11 means the source is a register, reg 011 is ds, and r/m 000 is ax. In the Intel Instruction Set Reference book A-M there are tables that map the registers and addressing modes to hex numbers, and vice-versa.
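Splitting a ModRM byte into its fields is just shifting and masking. The sketch below decodes d8 and rebuilds it; the struct, function, and table names are hypothetical, and the two name tables assume the 16-bit register numbering and the segment register numbering from the Intel reference (es, cs, ss, ds, fs, gs).

```c
/* The three fields packed into a ModRM byte. */
struct modrm {
    unsigned mod; /* bits 7-6: 3 (binary 11) means r/m names a register */
    unsigned reg; /* bits 5-3: for opcode 8e, the segment register      */
    unsigned rm;  /* bits 2-0: the source register when mod == 3        */
};

/* Hypothetical decoder: pull the three fields out of one byte. */
struct modrm decode_modrm(unsigned char byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* Hypothetical encoder: pack the fields back into one byte. */
unsigned char encode_modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (unsigned char)(((mod & 3) << 6) | ((reg & 7) << 3) | (rm & 7));
}

/* Segment register numbers used in the reg field ("?" = reserved). */
static const char *const sreg_names[8] = {
    "es", "cs", "ss", "ds", "fs", "gs", "?", "?"
};

/* 16-bit general purpose register numbers used in the r/m field. */
static const char *const reg16_names[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};
```

decode_modrm(0xd8) gives mod 3, reg 3 (sreg_names[3] is "ds") and rm 0 (reg16_names[0] is "ax"), and encode_modrm(3, 3, 0) returns 0xd8.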

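The instruction \xc6\x06\x00\x00\x41 puts the ASCII value for A in memory address 0 (an offset from the segment held in the data segment register). c6 is part of an opcode that indicates that an 8-bit immediate value will be placed in a register or memory location; the rest of the opcode is the middle 3 bits of the next byte, which must be 000. That next byte, 06, is a ModRM byte indicating that the immediate value will be placed in the memory location given by the following two bytes, which are again stored least significant byte first. The final byte, 41, is the hex value for A. The next instruction, \xc6\x06\x01\x00\x1e, has the same form: it stores 1eh at address 1, the color attribute byte for that character.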
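Both five-byte instructions follow one pattern, so they can be produced by one small function. This is a sketch under the assumption of 16-bit addressing, where ModRM 06 (mod 00, reg 000, r/m 110) means "a 16-bit address follows"; encode_movb_disp16 is a hypothetical name. Note that ASCII A is 41h.

```c
#include <stdint.h>

/* Hypothetical encoder for "movb $imm, addr" in 16-bit code, where addr
   is an offset in the data segment. c6 with reg field 000 is
   MOV r/m8, imm8; ModRM 06 selects a direct 16-bit address.
   Writes 5 bytes to out and returns the count. */
int encode_movb_disp16(uint16_t addr, uint8_t imm, unsigned char out[5])
{
    out[0] = 0xc6;                          /* opcode                 */
    out[1] = 0x06;                          /* ModRM: direct address  */
    out[2] = (unsigned char)(addr & 0xff);  /* address, low byte      */
    out[3] = (unsigned char)(addr >> 8);    /* address, high byte     */
    out[4] = imm;                           /* the byte to store      */
    return 5;
}
```

encode_movb_disp16(0, 'A', buf) gives c6 06 00 00 41, and encode_movb_disp16(1, 0x1e, buf) gives c6 06 01 00 1e.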

The \xeb\xfe instruction jumps back to itself, creating an infinite loop. eb is a short relative jump: the next byte is a signed offset that gets added to the address of the instruction that follows the jump. Because this jump is two bytes long, an offset of -2 lands back on the eb. In decimal the hex number fe is 254, which would seem to say the code should jump 254 bytes forward. However, the offset is an 8-bit number in two's complement form, so byte values from 128 to 255 represent -128 to -1, and fe (254) represents 254 - 256 = -2.
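The two's complement reading and the jump target calculation can be checked with a couple of lines of C. This sketch avoids casting to a signed char and subtracts 256 explicitly instead, so it doesn't rely on how the compiler converts out-of-range values; rel8_value and jmp_rel8_target are hypothetical names.

```c
/* Interpret a byte as a signed 8-bit two's complement number:
   0x00-0x7f are 0..127, 0x80-0xff are -128..-1. */
int rel8_value(unsigned char byte)
{
    return byte < 0x80 ? byte : byte - 256;
}

/* Target of a 2-byte "jmp rel8" (opcode eb) that starts at addr.
   The offset is added to the address of the NEXT instruction, addr + 2. */
long jmp_rel8_target(long addr, unsigned char rel)
{
    return addr + 2 + rel8_value(rel);
}
```

rel8_value(0xfe) is -2, so jmp_rel8_target(addr, 0xfe) returns addr itself: the jump lands on its own first byte.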

Comments

Marshall
6 April 2009

Hey Nick,

You should take a look at in-lining assembly code. It's the best way to go about it without having to dump the raw bytes. It looks like this:

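int
func(int arg)
{
    int ret;
    asm("movl %1, %%ebx;"   /* copy input operand %1 into ebx */
        "movl %%ebx, %0;"   /* copy ebx into output operand %0 */
        : "=r" (ret)        /* output operands */
        : "r" (arg)         /* input operands */
        : "ebx");           /* registers the asm overwrites */
    return ret;
}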

You can pass it arguments, get the return values and everything. Look here for more info: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

This is the way it's done.

Nick Mudge
10 April 2009

Thank you. This is good to know.