NetworkProgramming – wfbsoftware

Network Programming Cheat Sheet

IPv4 Addresses

in_addr_t from String

#include <arpa/inet.h>

in_addr_t ipAddr = inet_addr("127.0.0.1");

in_addr_t to String

#include <arpa/inet.h>

in_addr_t ipAddr = inet_addr("127.0.0.1");

char buffer[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &ipAddr, buffer, INET_ADDRSTRLEN);

printf("IP as String: %s\n", buffer);

IPv4 (AF_INET) From String to Binary Form

#include <arpa/inet.h>

uint32_t bits;
inet_pton(AF_INET, "127.0.0.1", &bits);
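The snippets above ignore return values. inet_addr() reports errors with INADDR_NONE, which cannot be told apart from the valid address 255.255.255.255, so inet_pton() is the easier call to check. The following is a minimal sketch; parse_ipv4 is a hypothetical helper name and not part of any library.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: converts dotted-quad text into a struct in_addr.
// Returns 0 on success, -1 if the text is not a valid IPv4 address.
int parse_ipv4(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);

    // inet_pton() returns 1 on success, 0 if the text is not a valid
    // address for the given family, and -1 (with errno set) if the
    // address family itself is not supported.
    return inet_pton(AF_INET, text, out) == 1 ? 0 : -1;
}
```

On success, out->s_addr holds the address in network byte order, the same representation that inet_addr() returns.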

Ethernet Address to String

#include <net/ethernet.h> // macOS/BSD; on Linux: <netinet/ether.h>

struct ether_addr eth_addr;

// find ethernet address and store it into eth_addr ...

printf("MAC: %s\n", ether_ntoa(&eth_addr));

String to Ethernet Address

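#include <net/ethernet.h> // macOS/BSD; on Linux: <netinet/ether.h>

// ether_aton() returns a pointer to a static buffer that is overwritten
// by the next call. Copy the result if you need to keep it and
// do not free() it.
struct ether_addr *eth_addr = ether_aton(param_mac);
if (eth_addr == NULL)
{
    return;
}

printf("MAC: %s\n", ether_ntoa(eth_addr));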
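ether_ntoa() also returns a pointer to a static buffer, so two calls in one printf() overwrite each other. If that gets in the way, the six bytes can be formatted by hand. This is a minimal sketch; format_mac is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: writes "aa:bb:cc:dd:ee:ff" into out.
// out needs room for at least 18 bytes (17 characters plus '\0').
// Returns 0 on success, -1 if the buffer is too small.
int format_mac(const uint8_t mac[6], char *out, size_t out_len)
{
    assert(mac != NULL && out != NULL);

    // snprintf() returns the length the full text would have had
    int n = snprintf(out, out_len, "%02x:%02x:%02x:%02x:%02x:%02x",
                     mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return (n == 17 && (size_t)n < out_len) ? 0 : -1;
}
```

The caller owns the buffer, so several MAC addresses can be formatted and printed side by side.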

This section is influenced by this post of https://www.saminiir.com. I will try to explain what is happening in his posts step by step. It was easy for me to follow his general ideas but hard for me to follow them up with running code on macOS. In his post on ARP, Sami describes how to set up a TAP interface and bind to that interface using a C userspace application that implements a subset of the ARP protocol. The C application will contain just enough ARP to answer an arping call to that TAP interface. The C application will return the MAC-Address of the TAP interface as part of the ARP answer.

I will explain TUN/TAP, ARP, MAC-Addresses and user space applications in the rest of this post.

General Thoughts

During my tests on macOS, I gained some experience and learned a lot. I try to sum up my findings in this section. Please take the points in this section with a grain of salt because I am a beginner with low-level network programming and I do not understand most of what is happening with the technology at hand. Be warned!

Applications that open() tuntap interfaces should be run with administrator privileges (sudo)!
You should install Wireshark! Wireshark allows you to select from a list of interfaces. After you select one of the interfaces by double clicking, it monitors all traffic (ethernet frames) coming in and going out on that interface. You even get a hex dump of the frames with a detailed explanation and a digest of the nested protocols inside the ethernet frame. As we will use virtual TAP interfaces, Wireshark can bind to the interface and you can check what data is sent to your application.
If you have a hex dump of a frame (e.g. copied from Wireshark), you can paste that hex dump into the Hex Packet Decoder (HPD) on gasmi.net https://hpd.gasmi.net/. The Hex Packet Decoder will digest the dump and show you a nicely colored view of which bytes pertain to which protocol layer of the OSI model. You can use your mouse pointer to hover over bytes to get more detailed information. Within each protocol layer, HPD will show you what type of information the bytes carry. HPD can also validate whether checksums and parity codes are correct and, if they are not, which values would be correct. This can help in debugging your checksum algorithms.
TAP interfaces can be written to and read from by a user space application. If there is no user space application reading from the device, the ethernet frames sent to the device are not forwarded to the internet and are simply lost. It is not possible to send the ethernet frames of an ICMP echo request to Google and expect an answer without a user space application that receives the packets and does the communication tasks.
Sending an ethernet frame to a TAP interface requires the name of the interface (e.g. tap0). The tools wireshark, arping and nping all take the name of the interface as a command line parameter or from the GUI. An ethernet frame has to be sent to the ethernet interface. Only then can it be retrieved by the user space application that has the TAP interface opened. In contrast to TCP/IP, where an application has to know a hostname or IP and a port to send data, raw ethernet needs the interface name and the MAC address.
It is not possible to call open() on a TAP interface in more than one process! The second and all following processes that try to open() will get a “Device is busy” error code.
If you want to send ethernet frames to a user space application that has a TAP interface opened, you can use raw sockets on macOS as described in this post: https://www.vankuik.nl/2012-02-09_Writing_ethernet_packets_on_OS_X_and_BSD. The example application outlined in the post is able to send data to the interface without a “Device is busy” error!
If you want to bind to a tcp/ip socket via hostname/IP and a portnumber using a tuntap interface, you have to???

TUN/TAP interfaces

Network traffic usually starts in an application in user space such as a web browser or an email client. That application uses sockets, for example, to talk to the operating system for sending packets into the network. The socket will send the packets to a network interface.

Usually network interfaces forward packets that are sent to them from an application to a driver that drives a hardware network card.

TUN/TAP devices are simulated, virtual network interfaces that forward all packets to a software application in user space instead of to a driver.

The user space software that gets the packets forwarded can answer directly, send messages into the network using raw sockets, or do something else.

The first of Sami’s ideas is to use a TAP device and let the user space application that he describes and writes in his blog posts answer ARP requests that are sent using the arping utility. In this case arping is the application in user space that starts the network traffic.

A TUN device is an OSI-Layer 3 element and works with IP-Packets. A TAP device is an OSI-Layer 2 element and works with ethernet frames.

TUN/TAP interfaces on macOS

Linux provides TUN/TAP devices without installing further software. Unlike Linux, macOS does not have TUN/TAP devices. The free software tuntaposx is a way to install TUN/TAP on macOS.

Download the application from http://tuntaposx.sourceforge.net/download.xhtml. Inside the .tar.gz file, there is a .pkg file that starts an installer after a double click. You have to allow the installation of this application using the macOS security dialogs that pop up during the installation procedure.

The TAP device will not be created by the installer! Instead, after the installation, you can create a TAP device manually from the command line or write an application that creates a TAP device in code.

This post describes the situation very well. tuntaposx provides /dev/tunX and /dev/tapX where X is a number starting at 0 and ranging up to a maximum (default 16) that is set at compile time.

The devices are created automatically when they are used for the first time by an open() call from an application or by a command from the terminal such as:

exec 5<>/dev/tap0

The command above will create a device tap0.

At this point, the device has no IP address and a call to read() will fail in this state! To assign an IP and bring the device up, use ifconfig or write code in an application to achieve the same effect programmatically.

Using ifconfig you can now configure this device and assign an IP:

sudo ifconfig tap0 10.1.2.3 up

The device tap0 is now ready to receive packets.

Another script that opens and closes a device is:

exec 4<>/dev/tap0  # opens device, creates interface tap0
ifconfig tap0 10.10.10.1 10.10.10.255
ifconfig tap0 up
hexdump -c <&4 # reads from device - a cheap etherdump
(...here, the tap0 interface is working, try ping 10.10.10.255 ...)
exec 4>&-  # closes device, destroys interface tap0

Working with ARP

The Address Resolution Protocol (ARP) is used to retrieve the Media-Access-Control-Address (MAC-Address) of a network interface given its IP address. (Reverse-ARP converts a MAC-Address into an IP-Address.)

A MAC-Address is required to send messages using the ethernet protocol. The ethernet protocol does not use IP-Addresses.

In order to transmit an ethernet frame between two interfaces, you have to write the source MAC-Address and destination MAC-Address into the ethernet frames you want to send. In order to determine those two MAC-Addresses, you can use arping. arping is an open source utility that implements ARP. Use arping on the remote IP-Address and use arping on the IP-Address that is assigned to the local network interface to retrieve both their MAC-Addresses. Now use those values as source and destination addresses in the ethernet frame.

How can ARP send a packet to the destination interface if the sender does not know its MAC-Address and the IP-Address is not used in the link layer? ARP broadcasts a request using the destination MAC-Address ff:ff:ff:ff:ff:ff, which is received by all interfaces on the local network. Every interface compares its own IP to the IP in the request. The interface whose IP matches answers the request with its MAC-Address, not by broadcasting but by sending a direct message to the sender.
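To make the broadcast request more concrete, here is a minimal sketch that fills a byte buffer with an ARP request for IPv4 over ethernet. The field order follows the ARP packet layout (hardware type, protocol type, address sizes, opcode, then sender and target addresses). build_arp_request and ARP_FRAME_LEN are hypothetical names that I made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARP_FRAME_LEN 42 // 14 byte ethernet header + 28 byte ARP payload

// Hypothetical helper: fills frame with a broadcast ARP request that asks
// "who has target_ip? tell sender_ip".
void build_arp_request(uint8_t frame[ARP_FRAME_LEN],
                       const uint8_t sender_mac[6],
                       const uint8_t sender_ip[4],
                       const uint8_t target_ip[4])
{
    assert(frame && sender_mac && sender_ip && target_ip);
    uint8_t *p = frame;

    // ethernet header
    memset(p, 0xff, 6);       p += 6; // destination MAC: broadcast
    memcpy(p, sender_mac, 6); p += 6; // source MAC
    *p++ = 0x08; *p++ = 0x06;         // ethertype 0x0806 = ARP

    // ARP payload, all multi-byte fields in network byte order
    *p++ = 0x00; *p++ = 0x01;         // hardware type 1 = ethernet
    *p++ = 0x08; *p++ = 0x00;         // protocol type 0x0800 = IPv4
    *p++ = 6;                         // hardware address size
    *p++ = 4;                         // protocol address size
    *p++ = 0x00; *p++ = 0x01;         // opcode 1 = request
    memcpy(p, sender_mac, 6); p += 6; // sender MAC
    memcpy(p, sender_ip, 4);  p += 4; // sender IP
    memset(p, 0x00, 6);       p += 6; // target MAC: not known yet
    memcpy(p, target_ip, 4);  p += 4; // target IP

    assert(p - frame == ARP_FRAME_LEN);
}
```

The target MAC inside the ARP payload stays zeroed because it is exactly the value the request is asking for.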

Write the User Space Application

Now it is finally time to show some code. The code is supposed to receive all packets sent to the tap0 interface, filter out the ARP requests and answer them.

Because the interface can receive any type of packet, the application actively looks for ARP packets and skips all other packets (e.g. IPv4 packets).

TODO

Once it detects an incoming ARP packet, the app has to answer.
Why does the application receive packets from the start? The answer can be found using wireshark and binding to the TAP interface. A service called airport keeps sending mDNS packets to the interface.
How can I make the app wait for input? A call to select() causes the application to wait until there is data ready for reading on the interface. select() can be called with a timeout object or with a NULL parameter, which causes the application to wait for input indefinitely. If select() returns without timing out (data is ready), you still have to consume the data via a call to read().

Excuse the messy code, I have to clean it up! If you are a beginner programmer, this code is not a good example right now; you should come back once I have cleaned it up!

The code basically calls open() on the tun/tap device tap0. In this state, it is not possible for the application to read from the device! To read, an IP address first has to be assigned to the device and the device has to be brought up. The device already has a MAC address: for physical cards the vendor burns in or assigns the MAC address when the hardware is produced, and for a virtual TAP device the driver provides one. The device does not have an IP yet!

To assign an IP address, ioctl() calls are used. This is the reason why you have to run this app with admin rights (sudo), as only the admin is allowed to change the interface configuration with these ioctl() calls. This first part of the application is similar to the command

sudo ifconfig tap0 10.10.10.1 10.10.10.255 up

You could theoretically put a sleep() into the application after opening the tap device and, in another console, enter the ifconfig command manually to assign an IP and a netmask and to bring the device up. After the sleep(), the application can then continue to read from the device as the device now has an IP address. It is more convenient if the application assigns an IP programmatically.

Instead of using ifconfig or setting a hardcoded IP in code, the correct way would be to implement the DHCP protocol and ask the router to lease a free IP address in the network. Maybe we implement a subset of DHCP in a later post.

After the device has an IP, the app starts to read from the device. The part I currently fail to understand is, why the app immediately receives Ethernet frames! I do not know who sends those frames! It is not the arping utility because I do not call the arping utility! Maybe it is the operating system or some firewall.

The application will now read from the device. It will first check the device using a call to select(). select() lets you check whether a device is in the desired state for a specific operation.

From the manpages:

select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become “ready” for some class of I/O operation (e.g., input possible).

So if select() returns a positive value, the application calls read() to read bytes from the network into a buffer in memory. The bytes are an ethernet frame which, in the case we are interested in, contains an ARP packet in its payload section.
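The select() step can be isolated into a small function. This is a minimal sketch that works on any file descriptor, not only on a TAP device; wait_readable is a hypothetical helper name. One detail that is easy to miss: select() overwrites the fd_set it is given (and may overwrite the timeout), so both are rebuilt on every call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: waits until fd is readable or timeout_ms has elapsed.
// Returns 1 if data is ready, 0 on timeout, -1 on error.
// A negative timeout_ms waits indefinitely.
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    // rebuild the set for every call because select() modifies it
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // a NULL timeout makes select() block until the descriptor is ready
    int rv = select(fd + 1, &readfds, NULL, NULL,
                    timeout_ms < 0 ? NULL : &tv);
    if (rv < 0)
    {
        return -1;
    }
    return (rv > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A return value of 1 only says that read() will not block; the data still has to be consumed with read().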

The ethernet header struct is packed, so we can just cast the byte buffer pointer to the ethernet header structure. An alternative to packing is to parse the byte buffer manually and copy values into a non-packed header field by field. That way the compiler can insert padding bytes to align the elements in the struct as it sees fit. I do not actually know what disadvantages packed structs have, but I imagine that performance can suffer if the compiler cannot apply its magic (for example, because fields end up at unaligned addresses). However, the manual work is not worth the compiler’s freedom for this example, so packing is used.
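For comparison, this is roughly what the manual, field-by-field alternative could look like. It is a sketch and not what the listing in this post does; struct eth_header_parsed and parse_eth_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14

// Hypothetical non-packed header; the compiler may lay it out freely.
struct eth_header_parsed
{
    uint8_t dmac[6];
    uint8_t smac[6];
    uint16_t ethertype; // stored in host byte order
};

// Copies the header fields out of a raw frame one by one.
// Returns 0 on success, -1 if the buffer is too short for a header.
int parse_eth_header(const uint8_t *frame, size_t len,
                     struct eth_header_parsed *out)
{
    assert(frame != NULL && out != NULL);
    if (len < ETH_HEADER_LEN)
    {
        return -1;
    }

    memcpy(out->dmac, frame, 6);
    memcpy(out->smac, frame + 6, 6);

    // The ethertype is big-endian on the wire. Assembling it from
    // single bytes gives the correct value on any CPU.
    out->ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

With this approach the ethertype can be compared against 0x0806 directly, without byte-swapped constants.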

Once an ethernet header pointer is available, the application can use the ethernet header’s ethertype field to check which protocol the ethernet frame’s payload belongs to. If the ethertype is the code that denotes ARP, the packet is investigated further. If the ethertype is something else, such as the code for IPv4, the application skips the ethernet frame for now. We only want to answer an ARP request for now.
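The ethertype is transmitted in network byte order, so a check that should work on both little-endian and big-endian machines can convert it with ntohs() before comparing. A minimal sketch, assuming the frame buffer holds at least the 14 byte header; ethertype_name and the two constants are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHERTYPE_IPV4_HOST 0x0800 // IPv4, in host byte order
#define ETHERTYPE_ARP_HOST 0x0806  // ARP, in host byte order

// Hypothetical helper: names the protocol carried by an ethernet frame.
// frame must point to at least 14 bytes.
const char *ethertype_name(const uint8_t *frame)
{
    assert(frame != NULL);

    // memcpy avoids reading a uint16_t from a possibly unaligned address
    uint16_t wire;
    memcpy(&wire, frame + 12, sizeof wire);

    switch (ntohs(wire))
    {
    case ETHERTYPE_ARP_HOST:
        return "ARP";
    case ETHERTYPE_IPV4_HOST:
        return "IPv4";
    default:
        return "other";
    }
}
```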

Also as a side note, the ethernet header contains the target MAC. The target MAC is set to ff:ff:ff:ff:ff:ff, which you can see in the application’s console output. The ff:ff:ff:ff:ff:ff MAC address is the broadcast address. You can see that the communication partner first broadcasts into the network to resolve an IP to a MAC address, so it can talk to the MAC address using ethernet (which will carry IPv4 as a payload). Our task in the future is to send the MAC address of the tun/tap device as an answer to the broadcaster, using a reply targeted solely at the communication partner (no broadcast).

Just like with the ethernet header, the payload is cast to a packed struct of the ARP header. For now, the application just outputs the ARP header’s fields. The ARP header contains a lot of information that we can use to finally answer the ARP request.

TODO: answer the request
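Until the real code is written, here is a sketch of how the answer could be constructed. It rewrites a received request in place: the asker becomes the target, our MAC and IP become the sender, and the opcode changes from 1 (request) to 2 (reply). arp_make_reply and the offset names are hypothetical, and the function only handles ARP for IPv4 over ethernet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Byte offsets inside an ethernet frame that carries ARP for IPv4.
enum
{
    OFF_ETH_DST = 0, OFF_ETH_SRC = 6, OFF_ETHERTYPE = 12,
    OFF_ARP_HWSIZE = 18, OFF_ARP_PROSIZE = 19, OFF_ARP_OPCODE = 20,
    OFF_ARP_SMAC = 22, OFF_ARP_SIP = 28,
    OFF_ARP_TMAC = 32, OFF_ARP_TIP = 38,
    ARP_FRAME_MIN_LEN = 42
};

// Hypothetical helper: turns an ARP request for our_ip into the reply.
// Returns 1 if frame now holds a reply, 0 if the frame was left untouched.
int arp_make_reply(uint8_t *frame, size_t len,
                   const uint8_t our_mac[6], const uint8_t our_ip[4])
{
    assert(frame != NULL && our_mac != NULL && our_ip != NULL);

    if (len < ARP_FRAME_MIN_LEN)
        return 0; // too short
    if (frame[OFF_ETHERTYPE] != 0x08 || frame[OFF_ETHERTYPE + 1] != 0x06)
        return 0; // not ARP
    if (frame[OFF_ARP_HWSIZE] != 6 || frame[OFF_ARP_PROSIZE] != 4)
        return 0; // not ethernet/IPv4 sized addresses
    if (frame[OFF_ARP_OPCODE] != 0x00 || frame[OFF_ARP_OPCODE + 1] != 0x01)
        return 0; // not a request
    if (memcmp(frame + OFF_ARP_TIP, our_ip, 4) != 0)
        return 0; // the request asks for somebody else

    uint8_t asker_mac[6], asker_ip[4];
    memcpy(asker_mac, frame + OFF_ARP_SMAC, 6);
    memcpy(asker_ip, frame + OFF_ARP_SIP, 4);

    // ethernet header: unicast back to the asker, no broadcast
    memcpy(frame + OFF_ETH_DST, asker_mac, 6);
    memcpy(frame + OFF_ETH_SRC, our_mac, 6);

    // ARP payload: opcode 2 = reply, sender is us, target is the asker
    frame[OFF_ARP_OPCODE + 1] = 0x02;
    memcpy(frame + OFF_ARP_SMAC, our_mac, 6);
    memcpy(frame + OFF_ARP_SIP, our_ip, 4);
    memcpy(frame + OFF_ARP_TMAC, asker_mac, 6);
    memcpy(frame + OFF_ARP_TIP, asker_ip, 4);
    return 1;
}
```

If the function returns 1, the 42 bytes could then be handed to write() on the TAP file descriptor. I have not verified this against arping yet, so treat it as a starting point.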

Something nobody ever tells you: You have to run this application with admin rights (sudo)! Otherwise the application will not be able to alter the tun/tap interface (calling ioctl(), assign an IP, bring it up into the up state, …)

Major parts of this code example are copied from https://github.com/LaKabane/libtuntap which is an excellent library that performs most of the setup code for you in a portable fashion! You should check it out!

#include <net/if.h>
#include <net/if_arp.h>
#include <net/if_utun.h>

#include <netinet/if_ether.h>
#include <netinet/in_systm.h>
#include <netinet/in.h>
#include <netinet/ip.h>

#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <errno.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <ifaddrs.h>
#include <errno.h>

#include <arpa/inet.h>

//#if defined(__APPLE__) && defined(HAVE_NET_UTUN_H)
#include <sys/kern_control.h>
#include <sys/sys_domain.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
//#endif

// from if_tun.h because it does not exist on mac. TUNSETIFF ifr flags
#define IFF_TUN 0x0001
#define IFF_TAP 0x0002
#define IFF_NO_PI 0x1000
#define IFF_ONE_QUEUE 0x2000
#define IFF_VNET_HDR 0x4000
#define IFF_TUN_EXCL 0x8000

static char *device_ptr;

#define BUFFER_LEN 2048

//#define ARP_ETHERNET_FRAME_TYPE 0x0806 // ARP ethertype in network byte order (2054 decimal)
#define ETHERTYPE_ARP_ENDIANNESS 0x0608 // 0x0806 (ARP) as read on a little-endian host (1544 decimal)

#define ETHERTYPE_IP_ENDIANNESS 0x0008 // 0x0800 (IPv4) as read on a little-endian host

#define ARP_802DOT2_FRAME_TYPE 0x0004 // 1024 is in fact 0x0004 = 802.2 frames

struct eth_hdr
{
    uint8_t dmac[6];
    uint8_t smac[6];
    uint16_t ethertype;
    uint8_t payload[];
} __attribute__((packed));

struct arp_hdr
{
    uint16_t hwtype;
    uint16_t protype;
    unsigned char hwsize;
    unsigned char prosize;
    uint16_t opcode;
    unsigned char data[];
} __attribute__((packed));

#if defined Windows
typedef IN_ADDR t_tun_in_addr;
typedef IN6_ADDR t_tun_in6_addr;
#else // Unix
typedef struct in_addr t_tun_in_addr;
typedef struct in6_addr t_tun_in6_addr;
#endif

typedef int t_tun;

struct device
{
    t_tun tun_fd;
    int ctrl_sock;
    int flags; // ifr.ifr_flags on Unix
    unsigned char hwaddr[ETHER_ADDR_LEN];
    char if_name[IF_NAMESIZE + 1];
};

int tuntap_sys_set_ipv4(struct device *dev, t_tun_in_addr *s4, uint32_t bits)
{
    struct ifaliasreq ifa;
    struct ifreq ifr;
    struct sockaddr_in addr;
    struct sockaddr_in mask;

    memset(&ifa, '\0', sizeof ifa);
    strlcpy(ifa.ifra_name, dev->if_name, sizeof(ifa.ifra_name));

    printf("A) %s\n", ifa.ifra_name);

    memset(&ifr, '\0', sizeof ifr);
    strlcpy(ifr.ifr_name, dev->if_name, sizeof(ifr.ifr_name));

    printf("B) %s\n", ifr.ifr_name);

    // Delete previously assigned address
    ioctl(dev->ctrl_sock, SIOCDIFADDR, &ifr);

    // Fill-in the destination address and netmask,
    // but don't care of the broadcast address
    (void)memset(&addr, '\0', sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = s4->s_addr;
    addr.sin_len = sizeof(addr);
    (void)memcpy(&ifa.ifra_addr, &addr, sizeof addr);

    (void)memset(&mask, '\0', sizeof mask);
    mask.sin_family = AF_INET;
    mask.sin_addr.s_addr = bits;
    mask.sin_len = sizeof(mask);
    (void)memcpy(&ifa.ifra_mask, &mask, sizeof ifa.ifra_mask);

    // Simpler than calling SIOCSIFADDR and/or SIOCSIFBRDADDR
    if (ioctl(dev->ctrl_sock, SIOCSIFADDR, &ifa) == -1)
    {
        //tuntap_log(TUNTAP_LOG_ERR, "Can't set IP/netmask");
        printf("Can't set IP/netmask\n");
        printf("ERRNO: (%d) %s\n", errno, strerror(errno));
        printf("If the error is 'operation not permitted' make sure you run this app with administrator rights (sudo)!\n");

        return -1;
    }

    return 0;
}

/*
http://tuntaposx.sourceforge.net/faq.xhtml

I'm a developer and I try to read() and write() to the character devices. However, 
all it gives me is an "Input/Output error". 
Why is that?

You can only read and write packets from and to the kernel while the corresponding network interface is up. 
The setup sequence is as follows (using tap0 as an example):

    open() the character device /dev/tap0.
    Configure the network interface tap0 and bring it up. 
    Typically, you'll also want to assign an IP address. 
    Here is an example using ifconfig (but you can also configure the device programatically using the usual IOCTLs):

    ifconfig tap0 10.1.2.3 up
    							
    Once the interface has been brought up, you can use the read() and write() functions on the character device's 
    file descriptor to receive or send a packet at a time.
    When you're done, close() the character device. This will remove the network interface from the system. 
     */

void print_hex_memory(void *mem, const int len)
{
    int i;
    unsigned char *p = (unsigned char *)mem;
    for (i = 0; i < len; i++)
    {

        // after 16 bytes, insert a newline
        if ((i % 16 == 0) && i > 0)
        {
            printf("\n");
        }

        printf("0x%02x ", p[i]);
    }
    printf("\n");
}

/*
 * Taken from Kernel Documentation/networking/tuntap.txt
 */
static int tun_alloc(char *dev)
{
    //struct ifreq ifr;
    struct ifaliasreq ifr;
    int fd, err;

    //if ((fd = open("/dev/net/tap", O_RDWR)) < 0)
    if ((fd = open("/dev/tap0", O_RDWR)) < 0)
    {
        perror("Cannot open TUN/TAP dev\n"
               "Make sure one exists with "
               "'$ mknod /dev/tap0 c 10 200'");

        return 1;
    }

    printf("device is opened %d!\n", fd);

    // before this times out, type
    // sudo ifconfig tap0 10.10.10.1 10.10.10.255
    // sudo ifconfig tap0 up
    //sleep(10);

    memset(&ifr, 0, sizeof(ifr));

    printf("device is cleared!\n");

    // Flags: IFF_TUN   - TUN device (no Ethernet headers)
    //        IFF_TAP   - TAP device
    //
    //        IFF_NO_PI - Do not provide packet information
    //
    //ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
    if (*dev)
    {
        strncpy(ifr.ifra_name, dev, IFNAMSIZ);
    }
    printf("device name '%s'\n", ifr.ifra_name);

    printf("Creating socket ...\n");
    int sock = -1;
    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
    {
        printf("ERRNO: (%d) %s\n", errno, strerror(errno));
        return -3;
    }
    printf("Creating socket done %d!\n", sock);

    printf("Setting ip ...\n");

    in_addr_t in_addr = inet_addr("10.10.10.1");

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_addr.s_addr = in_addr;

    struct device libDevice;
    memset(&libDevice, 0, sizeof(struct device));
    strncpy(libDevice.if_name, "tap0\0", IFNAMSIZ);
    libDevice.ctrl_sock = sock;
    libDevice.flags &= IFF_LINK0;

    printf("tuntap_sys_set_ipv4 ...!\n");

    uint32_t bits;
    inet_pton(AF_INET, "10.10.10.255", &bits);

    if (tuntap_sys_set_ipv4(&libDevice, &(addr.sin_addr), bits) != 0)
    {
        printf("tuntap_sys_set_ipv4 failed!\n");
        return -1;
    }
    printf("tuntap_sys_set_ipv4 done.!\n");

    //sleep(10);

    printf("Setting ip done.\n");

    //
    // read and output mac address
    //

    struct ifaddrs *ifa = 0;
    if (getifaddrs(&ifa) != 0)
    {
        printf("Could not retrieve if addresses!\n");
        goto cleanup;
    }
    if (ifa == NULL)
    {
        printf("Can't get link-layer address\n");
    }

    struct ether_addr eth_addr;

    struct ifaddrs *pifa = 0;
    for (pifa = ifa; pifa != NULL; pifa = pifa->ifa_next)
    {
        // only output the addresses of the tun/tap interface
        if (strcmp(pifa->ifa_name, ifr.ifra_name) != 0)
        {
            continue;
        }

        printf("addresses found for ifc!\n");

        // The MAC address is from 10 to 15.
        //
        // And yes, I know, the buffer is supposed
        // to have a size of 14 bytes.
        //(void)memcpy(dev->hwaddr,
        //             pifa->ifa_addr->sa_data + 10,
        //            ETHER_ADDR_LEN);

        // initialize with zeroes
        (void)memset(&eth_addr.ether_addr_octet, 0, ETHER_ADDR_LEN);

        // copy data in
        (void)memcpy(&eth_addr.ether_addr_octet, pifa->ifa_addr->sa_data + 10, ETHER_ADDR_LEN);
        break;
    }

    printf("MAC: %s\n", ether_ntoa(&eth_addr));

    freeifaddrs(ifa);
    ifa = 0;

    /*
     * ioctl() ==  input/output control == system call
     * 
     * http://man7.org/linux/man-pages/man2/ioctl.2.html
     * 
     * Sends request codes to drivers. The reaction to the code is up to the driver implementation.
     * 
     * Parameters:
     * int fd -  file descriptor
     * unsigned long request - request code
     * ... - variadic parameter list
     */

    char buffer[BUFFER_LEN];

    printf("Trying to read ...\n");

    for (int i = 0; i < 10; i++)
    {
        memset(buffer, 0, BUFFER_LEN);

        // select() modifies the fd_set (and possibly the timeout),
        // so both are re-initialized before every call
        struct timeval timeout;
        timeout.tv_sec = 0;
        timeout.tv_usec = 10000;

        fd_set set;
        FD_ZERO(&set);    // clear the set
        FD_SET(fd, &set); // add our file descriptor to the set

        printf("\n");
        printf("Selecting...\n");

        // select() and pselect() allow a program to monitor multiple file
        // descriptors, waiting until one or more of the file descriptors become
        // "ready" for some class of I/O operation (e.g., input possible).
        int rv = select(fd + 1, &set, NULL, NULL, &timeout);
        printf("rv: %d\n", rv);

        if (rv == -1)
        {
            // an error occurred
            perror("select");
        }
        else if (rv == 0)
        {
            // a timeout occurred, errno is not meaningful in this case
            printf("timeout\n");
        }
        else
        {

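            printf("Data is ready!\n");

            // read() returns the number of bytes read, 0 on end of file
            // and -1 on error
            ssize_t read_result = read(fd, buffer, BUFFER_LEN);
            if (read_result < 0)
            {
                printf("ERRNO: (%d) %s\n", errno, strerror(errno));
                continue;
            }

            printf("%zd bytes were read!\n", read_result);

            // only dump the bytes that were actually received
            print_hex_memory(buffer, (int)read_result);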

            struct eth_hdr *ethHeader = (struct eth_hdr *)buffer;

            // 6 byte destination MAC
            printf("Destination MAC: ");
            print_hex_memory(ethHeader->dmac, 6);

            // 6 byte source MAC:
            printf("Source MAC:      ");
            print_hex_memory(ethHeader->smac, 6);

            // 2 byte ethernet frame type
            // 0x0806 = ARP; read on a little-endian host it appears as 0x0608 = 1544
            if (ethHeader->ethertype == ETHERTYPE_ARP_ENDIANNESS)
            {
                printf("Ethertype: %d ARP\n", ethHeader->ethertype);

                // struct arp_hdr is already defined at file scope

                // payload is ARP
                struct arp_hdr *arpHeader = (struct arp_hdr *)ethHeader->payload;

                // https://de.wikipedia.org/wiki/Address_Resolution_Protocol

                // https://www.iana.org/assignments/arp-parameters/arp-parameters.xhtml
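                // all ARP fields are in network byte order; without ntohs() the
                // hardware type 1 (Ethernet) prints as 256 on a little-endian host
                printf("ARP hardware address type: %d \n", ntohs(arpHeader->hwtype));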
                printf("ARP protocol address type: %d ", arpHeader->protype);
                if (arpHeader->protype == ETHERTYPE_IP_ENDIANNESS)
                {
                    printf("ipv4");
                }
                else
                {
                    printf("unknown");
                }
                printf("\n");
                printf("ARP hardware address size: %d \n", arpHeader->hwsize);
                printf("ARP protocol address size: %d \n", arpHeader->prosize);
                printf("ARP opcode: %d \n", ntohs(arpHeader->opcode)); // 1 = request, 2 = reply

                unsigned char *tempPtr = arpHeader->data;

                printf("Source MAC: ");
                print_hex_memory(tempPtr, 6);
                tempPtr += 6;

                printf("Source IP:  ");
                print_hex_memory(tempPtr, 4);
                tempPtr += 4;

                printf("Dest MAC:   ");
                print_hex_memory(tempPtr, 6);
                tempPtr += 6;

                printf("Dest IP:    ");
                print_hex_memory(tempPtr, 4);
            }
            else if (ethHeader->ethertype == ETHERTYPE_IP_ENDIANNESS)
            {
                printf("Ethertype: %d IPv4\n", ethHeader->ethertype);
            }
            else
            {
                printf("UNKNOWN Ethertype: %d ???\n", ethHeader->ethertype);
            }
        }

        printf("Selecting done.\n");
    }

cleanup:
    printf("Closing device ...\n");
    close(fd);
    fd = 0;
    printf("Closing device done.\n");

    return fd;
}

int main(int argc, char **argv)
{
    printf("You have to run this app with administrator rights (sudo)!\n");
    printf("You have to run this app with administrator rights (sudo)!\n");
    printf("You have to run this app with administrator rights (sudo)!\n");

    device_ptr = calloc(16, 1);
    strncpy(device_ptr, "tap0", strlen("tap0"));

    if (tun_alloc(device_ptr) != 0)
    {
        printf("There was an error allocating the tun/tap device!\n");
    }

    free(device_ptr);
    device_ptr = 0;

    return 0;
}

Installing arping on macOS

arping is an open source application that sends Address Resolution Protocol (ARP) messages from the command line. It is not a standard utility and it is not installed on macOS by default. It can be installed using brew.

brew install arping
brew link arping

On my machine arping was not available after linking. It was installed to /usr/local/Cellar/arping/2.19/sbin/ and it can be used from there.

Send the arping to the tap0 interface using this command:

arping -I tap0 10.0.0.4

sudo /usr/local/Cellar/arping/2.19/sbin/arping -I tap0 10.10.10.1

An nping command that also outputs the frame in hex is:

sudo ./nping -vvv --dest-mac ff:ff:ff:ff:ff:ff --ether-type 0x0800 -e tap0 --send-eth --data ffffffff 10.10.10.1

Realtek rtl8139 Network Interface Card

Introduction

This post will explain all I know about developing a driver for the Realtek rtl8139 network adapter. It is a network interface card that is capable of 10 / 100 Mbit/s network speeds. It is emulated by qemu which makes it a prime target for learning about driver development.

To enable the rtl8139 in qemu, use

qemu-system-i386 -net nic,model=rtl8139 -fda <your_image>

The osdev wiki says:

If you find your driver suddenly freezes and stops receiving interrupts and you’re using kvm/qemu. Try the option -no-kvm-irqchip

Initialization Process

In order to initialize the network card, there are several settings to configure. There are two types of locations where settings have to be applied:

PCI Configuration Space
ioaddr

Finding the Device in the System

The rtl8139 is connected to the PCI bus. With PCI, every device is identified by a pair of IDs. The pair consists of a vendor ID and a device ID. The rtl8139 has a vendor ID of 0x10ec and a device ID of 0x8139. You can check any pair of vendor ID and device ID on https://www.pcilookup.com or http://pciengine.com/.

First, you have to check if the system has the rtl8139 built into it (or if qemu emulates the card) by listing all the PCI devices of the system and searching for the vendor and device ID. PCI device listing is described here.

PCI Configuration Space

PCI (Peripheral Component Interconnect) allows hardware to be discovered and configured purely in software. Extension cards that you plug into a PCI slot of your PC are part of the PCI system.

A PCI system consists of up to 256 busses, each bus can contain up to 32 devices, and every device can bundle up to 8 functions. That means a PCI extension card can act as up to 8 devices when plugged into the PC. Each of these devices gets its own function number via PCI.

A PC usually only contains a single PCI bus, so instead of using 256 busses it only contains 1.

One of these functions in one of the devices on one of the busses will be the RTL 8139, but it is not predefined which one it is. That means the tuple (bus, device, function) is unknown and the driver has to find the device.

There are several ways to find the tuple. The simplest way is to iterate over all busses, devices and functions. For each function, the current tuple (bus, device, function) can be used to read data from the device at those coordinates. If the tuple does not point to an existing device, continue with the next tuple. If there is a device at the tuple, it is possible to read the so-called PCI configuration space of that device. The configuration space contains several registers on the PCI hardware. Two important registers are the vendor and device registers.

Here is a general depiction of the PCI configuration space of a PCI card. You can see that the first four bytes contain the vendor ID and the device ID.

Reading and writing the PCI configuration space is done via ports. A port is an address in the x86 I/O address space, which is separate from main memory and is accessed with the in and out instructions. Ports are used to communicate with hardware instead of reading and writing memory. First you have to specify the address you want to manipulate by writing it to the configuration address port 0xCF8. Once that location is configured, you can read or write data by reading and writing the configuration data port 0xCFC.

The RTL 8139 (and every PCI card) has specific values for vendor and device. Knowing these values, the card can be identified and the tuple (bus, device, function) can be found.

Having the tuple (bus, device, function), the driver can start with the configuration.

The code to iterate and find the RTL 8139 is listed here:

const u32int PCI_ENABLE_BIT = 0x80000000;
const u32int PCI_CONFIG_ADDRESS = 0xCF8;
const u32int PCI_CONFIG_DATA = 0xCFC;

// func - 0-7
// slot - 0-31
// bus - 0-255
//
// described here: https://en.wikipedia.org/wiki/PCI_configuration_space under
// the section "software implementation"
// parameter pcireg: 0 will read the first 32bit dword of the pci control space
// which is DeviceID and Vendor ID
// pcireg = 1 will read the second 32bit dword which is status and command
// and so on...
u32int r_pci_32(u8int bus, u8int device, u8int func, u8int pcireg) {

  // compute the index
  //
  // pcireg is left shifted twice to multiply it by 4 because each register
  // is 4 byte long (32 bit registers)
  u32int index = PCI_ENABLE_BIT | (bus << 16) | (device << 11) | (func << 8) |
                 (pcireg << 2);

  // write the index value onto the index port
  outl(index, PCI_CONFIG_ADDRESS);

  // read a value from the data port
  return inl(PCI_CONFIG_DATA);
}

int realtek8319Found = 0;

unsigned char pci_bus = 0;
unsigned char pci_device = 0;
unsigned char pci_device_fn = 0;

// there are 256 busses allowed (0-255). The loop counters are wider than
// 8 bit so that the loop conditions can become false.
// The search stops at the first RTL 8139 that is found.
for (u32int bus = 0; bus < 256 && !realtek8319Found; bus++) {

  // per bus there can be at most 32 devices
  for (u32int device = 0; device < 32 && !realtek8319Found; device++) {

    // every device can be multi function device of up to 8 functions
    for (u32int func = 0; func < 8; func++) {

      // read the first dword (index 0) from the PCI configuration space
      // dword 0 contains the vendor and device values from the PCI
      // configuration space
      u32int data = r_pci_32(bus, device, func, 0);
      if (data == 0xffffffff) {

        // there is no function at these coordinates
        continue;
      }

      // parse the values
      u16int device_value = (data >> 16);
      u16int vendor = data & 0xFFFF;

      // check vendor and device against the values of the RTL 8139 PCI device
      if (vendor == 0x10ec && device_value == 0x8139) {

        realtek8319Found = 1;

        pci_bus = bus;
        pci_device = device;
        pci_device_fn = func;

        k_printf("RTL8139 found! bus: %d", pci_bus);
        k_printf(" device: %d", pci_device);
        k_printf(" func: %d \n", pci_device_fn);

        break;
      }
    }
  }
}

ioaddr

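If the Realtek 8139 is built into a PC, it gets an ioaddr assigned during system boot. The ioaddr is the base of a block of I/O ports through which the registers of the card are reachable. By writing or reading data at the ioaddr plus a register offset, the operating system can configure the card.

The ioaddr can be read from the PCI configuration space at dword index 4, which is byte offset 0x10. This is where the first base address register (BAR0) is stored. It is not the command register, which starts at byte offset 4. The lowest two bits of the value are flag bits and not part of the address, which is why the code below masks them out with ~3.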

// read the ioaddr/base_address
u32int pci_ioaddr = r_pci_32(pci_bus, pci_device, pci_device_fn, 4);
k_printf("pci_ioaddr: 0x%x \n", pci_ioaddr);

unsigned long ioaddr = pci_ioaddr & ~3;
k_printf("ioaddr: 0x%x \n", ioaddr);
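
The masking can be wrapped into two small helpers. This is a sketch under the assumption that the value read is an I/O base address register, where bit 0 is set to 1 and the two lowest bits are not part of the address. The function names and the example value 0xC001 are made up.

```c
#include <stdint.h>

// Hypothetical helpers, not part of the driver code.

// Bit 0 of a base address register tells whether the register describes
// I/O ports (1) or a memory region (0).
static int pci_bar_is_io(uint32_t bar) { return (bar & 0x1) != 0; }

// For an I/O base address register the two lowest bits are flags.
// Masking them out yields the first port of the card.
static uint32_t pci_bar_io_base(uint32_t bar) {
  return bar & ~(uint32_t)0x3;
}
```

For a register value of 0xC001, pci_bar_is_io() returns 1 and pci_bar_io_base() returns 0xC000.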

Using the ioaddr, the driver can power up the card.

Powering up the card

Write the value 0 into the Config1 register (offset 0x52 from the ioaddr).

// write a byte out to the specified port.
void outb(u8int value, u16int port) {

  __asm__ __volatile__("outb %1, %0" : : "dN"(port), "a"(value));
}

outb(0x00, ioaddr + Config1);
k_printf("starting chip done.\n");

Bus Mastering

The next step is to enable bus mastering. If you do not enable bus mastering, qemu will not transfer any data between the memory of the operating system and the memory on the RTL 8139 network card. It will transfer zeroes instead.

A transfer of data is necessary to send a packet or to receive packets. To send a packet, first the data is copied from the memory of the operating system into a buffer on the PCI card. From the buffer the card transfers the data onto the wire.

The transfer of data is performed via DMA (Direct Memory Access). If a PCI card is not assigned rights to be the bus master, it cannot perform DMA. Only the bus master is allowed to perform DMA. (Sidenote: It was reported that on some real hardware, enabling bus mastering is not needed. qemu was updated to make bus mastering mandatory. If you test on qemu, you need this step)

If bus mastering is turned off, qemu will not copy any data to the card but it will only copy zeroes.

The same goes for receiving. The PCI card receives data from the wire and writes that data into a buffer. The operating system will copy data from the buffer into the memory of the operating system via DMA. If bus mastering is turned off, qemu will only transfer zeroes instead of the real data.

To enable bus mastering, you have to set bit 2 (zero indexed, bit 2 is actually the third bit if you start counting from 1 instead of 0) inside the PCI command register. Bit 2 has the value 0x04.

The bit is set by reading the command register, setting bit 2 and writing the value back into the command register.

// https://wiki.osdev.org/RTL8139
// enable bus mastering in the command register
// Some BIOS may enable Bus Mastering at startup, but some versions
// of qemu don't. You should thus be careful about this step.
k_printf("BUS mastering ...\n");

u16int command_register =
    pci_read_word(pci_bus, pci_device, pci_device_fn, 0x04);

k_printf("BUS mastering command_register = %x\n", command_register);

command_register |= 0x04;

pci_write_word(pci_bus, pci_device, pci_device_fn, 0x04, command_register);

command_register = pci_read_word(pci_bus, pci_device, pci_device_fn, 0x04);

k_printf("BUS mastering command_register = %x\n", command_register);
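
The functions pci_read_word and pci_write_word are not listed in this article. One way to build them on top of the 32 bit access shown earlier is to read the aligned dword that contains the word and then pick or replace the right half. The following sketch only shows that bit arithmetic. The helper names are made up and this is not necessarily how the original functions work.

```c
#include <stdint.h>

// Hypothetical helpers, not part of the driver code.

// Pick the 16 bit word at byte offset `offset` out of the aligned dword
// that contains it. Offsets 0x04 and 0x06 share the same dword.
static uint16_t pci_word_from_dword(uint32_t dword, uint8_t offset) {
  unsigned shift = (offset & 2u) * 8u; // 0: lower word, 16: upper word
  return (uint16_t)(dword >> shift);
}

// Return a copy of `dword` in which the word at `offset` is replaced.
static uint32_t pci_dword_with_word(uint32_t dword, uint8_t offset,
                                    uint16_t value) {
  unsigned shift = (offset & 2u) * 8u;
  uint32_t mask = (uint32_t)0xFFFF << shift;
  return (dword & ~mask) | ((uint32_t)value << shift);
}
```

With a dword of 0x02900003 (status 0x0290 in the upper half, command 0x0003 in the lower half), setting the bus master bit means replacing the word at offset 0x04 by 0x0007, which gives 0x02900007. Be aware that the status half is written back as well in this approach, which may clear status bits on real hardware.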

Software Reset

Next is a software reset.

// software reset
// https://wiki.osdev.org/RTL8139
// Sending 0x10 to the Command register (0x37) will send the RTL8139 into a
// software reset. Once that byte is sent, the RST bit must be checked to
// make sure that the chip has finished the reset. If the RST bit is high
// (1), then the reset is still in operation.

// ChipCmd is the Command Register 0x37 = 55
// 0x10 == 0001 0000 == bit 4 (zero indexed)
// k_printf("Reset the chip %d ...\n", i);
outb(0x10, ioaddr + ChipCmd);
while ((inb(ioaddr + ChipCmd) & 0x10) != 0) {
  k_printf("waiting for reset!\n");
}
k_printf("Reset done.\n");
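
The loop above waits forever if the RST bit never clears. A variation with an upper bound on the number of polls is sketched here. To be able to try it without hardware, the register read is passed in as a function pointer. In the driver that function would wrap inb(ioaddr + ChipCmd). All names in this sketch are made up.

```c
#include <stdint.h>

#define CMD_RESET_BIT 0x10 // RST bit of the command register

// Reads the command register once. Hypothetical callback type.
typedef uint8_t (*read_cmd_fn)(void *ctx);

// Polls until the RST bit is low. Returns the number of reads that were
// needed or -1 if the bit was still high after max_polls reads.
static int wait_for_reset(read_cmd_fn read_cmd, void *ctx, int max_polls) {
  for (int polls = 1; polls <= max_polls; polls++) {
    if ((read_cmd(ctx) & CMD_RESET_BIT) == 0) {
      return polls;
    }
  }
  return -1;
}

// Fake register for tests: RST stays high for *ctx more reads.
static uint8_t fake_read_cmd(void *ctx) {
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return CMD_RESET_BIT;
  }
  return 0x00;
}
```

The driver can then print an error and give up when wait_for_reset returns -1 instead of hanging.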

Enable Receiver and Transmitter

// enable receiver and transmitter
// Sets the RE and TE bits high
// k_printf("Enable receiver and transmitter %d...\n", i);
// 0x0C = 1100 = bit 2 and bit 3
outb(0x0C, ioaddr + ChipCmd);
k_printf("Enable receiver and transmitter done.\n");

Set Transmit and Receive Configuration Registers

// https://www.lowlevel.eu/wiki/RTL8139
// Set the TCR (Transmit Configuration Register, 0x40, 4 bytes) and the RCR
// (Receive Configuration Register, 0x44, 4 bytes).
outl(0x03000700, ioaddr + TxConfig);
outl(0x0000070a, ioaddr + RxConfig);
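
The two magic numbers can be composed from named bits. The following sketch shows how I read the values, based on my understanding of the register layout in the RTL 8139 datasheet. Treat the bit names and their meaning as an assumption and check them against the datasheet. The constant and function names are made up.

```c
#include <stdint.h>

// Receive Configuration Register bits (assumed layout, see datasheet)
#define RCR_AAP (1u << 0)             // accept all packets
#define RCR_APM (1u << 1)             // accept packets for the card's MAC
#define RCR_AM (1u << 2)              // accept multicast packets
#define RCR_AB (1u << 3)              // accept broadcast packets
#define RCR_MXDMA_UNLIMITED (7u << 8) // max DMA burst size, bits 8-10

// Transmit Configuration Register bits (assumed layout, see datasheet)
#define TCR_MXDMA_2048 (7u << 8)    // max DMA burst size, bits 8-10
#define TCR_IFG_STANDARD (3u << 24) // interframe gap, bits 24-25

// Same value as the constant 0x0000070a written to RxConfig
static uint32_t rtl8139_rx_config(void) {
  return RCR_APM | RCR_AB | RCR_MXDMA_UNLIMITED;
}

// Same value as the constant 0x03000700 written to TxConfig
static uint32_t rtl8139_tx_config(void) {
  return TCR_IFG_STANDARD | TCR_MXDMA_2048;
}
```

Read this way, the card accepts packets addressed to its own MAC address and broadcasts, but no multicast packets and not all packets on the wire.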

Configuration Done

At this point the RTL 8139 is ready to send and receive data. Next, sending data is explained.

Sending Data

The data to send is written into a buffer (byte array) in operating system memory. Then the buffer is transferred over to the card via DMA (which is why the driver enables bus mastering). You have to specify the physical address for DMA! The PCI card does not understand paging! It only reads from memory at physical locations and does not go through the memory management unit.

My tip for you is to turn off paging during your initial tests with the RTL 8139 just to rule out that source of error.

TSAD and TSD

The way that the RTL 8139 accepts data for sending is explained in this section. On a more abstract level, the card has four hardware buffers for sending. Those buffers are also called descriptors. At any one point in time, there is only a single hardware buffer active. After the reset of the card during initialization, the buffer with index 0 is the active buffer.

The card will send the data stored in the currently active hardware buffer and then make the next hardware buffer in line the active buffer. Once data has been sent from buffer 3, the index is reset to 0 and buffer 0 is active again.

Each one of the four hardware buffers is implemented via registers which are available via two memory locations. There is a memory location called TSAD and one called TSD per hardware buffer.

TSAD is the transmission start register. It has to contain the physical address of the buffer that contains the data that the operating system wants to send. The data is transferred between the operating system and the card via DMA in the first step. Once the data is stored in the card’s internal memory, it is transferred onto the wire from there.

TSD is the transmission status or transmission control register. It has to contain the length of the data to send in bits 0 to 12, which is the length in bytes of the buffer that TSAD points to. Also, bit 13 (the OWN bit) has to be set to 0. If the OWN bit is zero (low), the hardware on the RTL 8139 card will start to transfer the data to the card and from the card onto the wire. If the DMA transfer between the operating system and the card was successful, the OWN bit is set to 1 (high) by the hardware. Once the OWN bit is high, the card will start to transfer the data from the card’s internal memory over the wire. I think that the name OWN was chosen to tell the user that the card now owns the data to transfer.

For each of the four buffers there is a pair of TSAD and TSD. The addresses are:

// TSAD = Transmit Start Registers = 32bit = Physical Address of data to
// be sent
u8int TSAD_array[4] = {0x20, 0x24, 0x28, 0x2C};

// TSD - Transmit Status / Command Registers = 32bit
u8int TSD_array[4] = {0x10, 0x14, 0x18, 0x1C};

The operating system has to remember which is the currently active buffer because it is not possible to ask the RTL 8139 card about which buffer is active at the moment. The variable tx_cur is used to store the index of the active buffer.

int tx_cur = 0;

The operating system prepares a buffer (array of byte / char) of data to send. For this example, let’s send 256 bytes containing the ASCII character ‘A’ which has the hex code 0x41 or decimal code 65.

int len = 256;
unsigned char tx_buffer[len];
for (int i = 0; i < len; i++) {
    tx_buffer[i] = 'A';
}

The variable len stores the size of the buffer.

Fill TSAD and TSD of the currently active buffer with the data to send.

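// Fill in the physical address of the data to TSAD. The cast assumes that
// the address of tx_buffer is a physical address (paging turned off or
// identity mapping).
outl((u32int)tx_buffer, ioaddr + TSAD_array[tx_cur]);

// Fill the length to TSD and start the transmission by setting the OWN
// bit to 0.
// See https://wiki.osdev.org/RTL8139#Transmitting_Packets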
u32int status = 0;
status |= len & 0x1FFF; // 0-12: Length
status |= 0 << 13;      // 13: OWN bit

outl(status, ioaddr + TSD_array[tx_cur]);

Wait until the OK bit (bit 15) is high. This signals that the transmission is completed. The OWN bit tells you when the data has been transferred between the operating system and the card. Once the data is stored on the card, it will start to transmit that data over the wire. Once the wire transfer is complete, the card will set the OK bit in the TSD to high, which means that the transfer is done and the next transfer buffer is active.

u32int transmit_ok = inl(ioaddr + TSD_array[tx_cur]);
while ((transmit_ok & (1 << 15)) == 0) {
    k_printf("Waiting for transmit_ok ...\n");
    transmit_ok = inl(ioaddr + TSD_array[tx_cur]);
}
k_printf("Waiting for transmit_ok done. transmit_ok = %d\n", transmit_ok);
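
The bits of the TSD register that are used in this article can be hidden behind a few small helpers. This is just a sketch with made up names. Note the parentheses around the & expressions: in C the == and != operators bind tighter than &, so leaving the parentheses out changes the meaning of the test.

```c
#include <stdint.h>

#define TSD_SIZE_MASK 0x1FFFu // bits 0-12: length of the data in bytes
#define TSD_OWN (1u << 13)    // set by the card when the DMA is done
#define TSD_TOK (1u << 15)    // set by the card when the transmit is OK

// Value to write to TSD to start sending len bytes. OWN is left at 0.
static uint32_t tsd_start_value(uint32_t len) { return len & TSD_SIZE_MASK; }

// 1 if the card has copied the data into its internal memory
static int tsd_dma_done(uint32_t tsd) { return (tsd & TSD_OWN) != 0; }

// 1 if the card has sent the data onto the wire
static int tsd_transmit_ok(uint32_t tsd) { return (tsd & TSD_TOK) != 0; }
```

For the 256 byte example, tsd_start_value(256) is 0x100. A TSD value of 0xA100 read back from the card has both the OWN bit and the OK bit set.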

Tell the operating system which buffer is active after the last buffer was used. In order to do that, increment tx_cur and wrap around back to zero if the last buffer was used in the prior send operation.

tx_cur++;
if (tx_cur > 3) {
    tx_cur = 0;
}
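
The wrap around can also be written with the modulo operator, and the register offsets of the four buffers follow a simple pattern (4 bytes per register). A small sketch with made up function names:

```c
#include <stdint.h>

#define TX_DESCRIPTOR_COUNT 4

// Index of the buffer that is active after buffer tx_cur was used
static int tx_next(int tx_cur) { return (tx_cur + 1) % TX_DESCRIPTOR_COUNT; }

// Register offsets, same values as in TSAD_array and TSD_array
static uint8_t tsad_offset(int index) { return (uint8_t)(0x20 + 4 * index); }

static uint8_t tsd_offset(int index) { return (uint8_t)(0x10 + 4 * index); }
```

tx_next(3) returns 0, which is the same wrap around as the if statement above.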

Now that you are able to send an arbitrary byte array into the network, you have to learn how to construct valid ethernet frames for a protocol such as ARP, ICMP, DHCP, TCP, IP, HTTP or anything else. This is not the RTL 8139 driver’s job, so the details are not explained in this article.

Constructing the frames for a specific protocol in the OSI model is the job of the so called IP-stack.

Retrieving the MAC Address

The RTL 8139 sends and receives data and is therefore a part of a network. As such it needs an address so packets can be sent point to point between the sender and the receiver.

On the lower levels of the OSI stack where Ethernet frames are sent, the MAC address is used for this purpose. A MAC address is a unique address assigned to an RTL 8139 during manufacturing.

When implementing ARP for example, you need to know the MAC address of your card. This section explains how to retrieve the NIC’s MAC address.

On qemu, you can specify the MAC address on the command line. Knowing the MAC address when testing code is a big advantage because as soon as you retrieve the expected MAC address, it is proven that the code works correctly.

The qemu command line parameter mac specifies the MAC address.

/home/<user>/dev/qemu/build/i386-softmmu/qemu-system-i386 \
-monitor stdio \
-cdrom image.iso \
-netdev user,id=network0 \
-device rtl8139,netdev=network0,mac=52:54:00:12:34:56 \
-object filter-dump,id=network_filter_object,netdev=network0,file=dump.dat

Here 52:54:00:12:34:56 is used as the MAC address.

The MAC address is stored in an EEPROM chip on the card. The following function reads one 16 bit word from the EEPROM.

// Delay between EEPROM clock transitions.
// No extra delay is needed with 33Mhz PCI, but 66Mhz may change this.
#define eeprom_delay() inl(ee_addr)

// The EEPROM commands include the always-set leading bit.
#define EE_WRITE_CMD (5 << 6)
#define EE_READ_CMD (6 << 6)
#define EE_ERASE_CMD (7 << 6)

static int read_eeprom(long ioaddr, int location) {

  unsigned retval = 0;
  long ee_addr = ioaddr + Cfg9346;
  int read_cmd = location | EE_READ_CMD;

  outb(EE_ENB & ~EE_CS, ee_addr);
  outb(EE_ENB, ee_addr);

  // Shift the read command bits out.
  for (int i = 10; i >= 0; i--) {

    int dataval = (read_cmd & (1 << i)) ? EE_DATA_WRITE : 0;

    outb(EE_ENB | dataval, ee_addr);
    eeprom_delay();

    outb(EE_ENB | dataval | EE_SHIFT_CLK, ee_addr);
    eeprom_delay();
  }

  outb(EE_ENB, ee_addr);
  eeprom_delay();

  for (int i = 16; i > 0; i--) {

    outb(EE_ENB | EE_SHIFT_CLK, ee_addr);
    eeprom_delay();

    retval = (retval << 1) | ((inb(ee_addr) & EE_DATA_READ) ? 1 : 0);

    outb(EE_ENB, ee_addr);
    eeprom_delay();
  }

  // Terminate the EEPROM access.
  outb(~EE_CS, ee_addr);

  return retval;
}
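
To see what the first loop of read_eeprom shifts out, the command bits can be computed without any hardware. The sketch below returns the bit that is sent in each of the eleven steps. The function name is made up. As far as I understand it, the pattern is a start bit of 1, the read opcode 10 and six address bits, preceded by two zero bits.

```c
#define EE_READ_CMD (6 << 6)

// Returns the bit (0 or 1) that read_eeprom sends in the given step of
// its first loop. step 0 is the first bit sent, step 10 the last one.
// Hypothetical helper for illustration.
static int eeprom_read_cmd_bit(int location, int step) {
  int read_cmd = location | EE_READ_CMD;
  return (read_cmd >> (10 - step)) & 1;
}
```

For location 7, which is the first word of the MAC address in the code further down, the bits sent are 0 0 1 1 0 0 0 0 1 1 1.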

Using this function, the MAC can be read and stored into an array. The array is then output to show that the correct MAC address is read.

// prepare mac address read
int mac_address_index = 0;
u32int mac_address[6];
for (int i = 0; i < 6; i++) {
  mac_address[i] = 0;
}

// Read EEPROM
//
// Read the MAC Addresses from the NIC's EEPROM memory chip
// k_printf("read_eeprom() ...\n");

int readEEPROMResult = read_eeprom(ioaddr, 0) != 0xffff;
if (readEEPROMResult) {

  // loop three times to read three 16 bit words (= 6 bytes)
  for (int i = 0; i < 3; i++) {

    u16int data = read_eeprom(ioaddr, i + 7);

    mac_address[mac_address_index] = data & 0xFF;
    mac_address[mac_address_index + 1] = data >> 8;

    mac_address_index += 2;
  }

} else {

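  // the EEPROM is not readable: loop six times and read the six bytes of
  // the MAC address from the registers at offset 0 to 5 instead
  for (int i = 0; i < 6; i++) {

    mac_address[mac_address_index] = inb(ioaddr + i);

    mac_address_index += 1;
  }
}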

// DEBUG: print MAC Address
k_printf("MAC: ");
for (int i = 0; i < 6; i++) {
  k_printf("%x:", mac_address[i]);
}
k_printf("\n");
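
The conversion from the three EEPROM words to the six bytes of the MAC address can be tested in isolation. Each word contains two bytes of the address, the lower byte first. The function name in this sketch is made up, and the example words are chosen so that they produce the MAC address from the qemu command line.

```c
#include <stdint.h>

// Splits three 16 bit EEPROM words into the six bytes of a MAC address.
// The low byte of each word comes first. Hypothetical helper.
static void mac_from_eeprom_words(const uint16_t words[3], uint8_t mac[6]) {
  for (int i = 0; i < 3; i++) {
    mac[2 * i] = (uint8_t)(words[i] & 0xFF);
    mac[2 * i + 1] = (uint8_t)(words[i] >> 8);
  }
}
```

The words 0x5452, 0x1200 and 0x5634 result in the bytes 52 54 00 12 34 56.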

Debugging

This section will introduce you to two ways of debugging the process of sending data using the RTL 8139.

The first method is telling qemu to dump all incoming and outgoing packets to a file. The file is in the pcap format, which makes it possible to open the file in wireshark. wireshark is a networking tool that can display all fields in ethernet frames and knows a large array of protocols for detailed display of all fields in packets.

If your RTL driver sends data, you can look at what data is sent by loading the dump file and looking at the sent packets using wireshark.

The second method is to compile qemu and enable the debug output in the emulation layer of the RTL 8139 card. Sadly there is no command line parameter to enable the RTL 8139 debug output. You can only enable the debug output by changing qemu’s code and compiling qemu. This sounds hard but it actually is pretty easy. If I managed to do it, you will easily be able to do it as well. This method was only tested on Ubuntu Linux. The steps to compile on Windows or macOS are unknown to me.

Dumping Network Traffic with qemu

qemu internally contains so called objects for diverse purposes. One of those objects is the filter-dump object. You can apply the filter-dump object to one of the network interface cards to dump all packets into a file.

/home/<user>/dev/qemu/build/i386-softmmu/qemu-system-i386 \
-monitor stdio \
-cdrom image.iso \
-netdev user,id=network0 \
-device rtl8139,netdev=network0,mac=52:54:00:12:34:56 \
-object filter-dump,id=network_filter_object,netdev=network0,file=dump.dat

The filter-dump object is pointed to the netdev. It will capture traffic on that netdev. The netdev is the RTL 8139 NIC. The output file is called dump.dat. It is written into the folder where you start qemu.

Open dump.dat in wireshark. You should see the packet you have sent! If the RTL 8139 only sends zeroes, check that you are specifying physical addresses and check the code that enables bus mastering.
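
If wireshark refuses to open the file, a quick sanity check is to look at the first four bytes of dump.dat. A classic pcap file starts with the magic number 0xa1b2c3d4, stored in the byte order of the machine that wrote the file. The following sketch checks a buffer that holds the beginning of the file. The function name is made up and reading the file into the buffer is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define PCAP_MAGIC 0xa1b2c3d4u

// Returns 1 if buf starts with the pcap magic number in little endian
// byte order, 2 for big endian byte order and 0 if it is not found.
// Hypothetical helper for illustration.
static int pcap_byte_order(const uint8_t *buf, size_t len) {
  if (len < 4) {
    return 0;
  }
  uint32_t le = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
  uint32_t be = (uint32_t)buf[3] | ((uint32_t)buf[2] << 8) |
                ((uint32_t)buf[1] << 16) | ((uint32_t)buf[0] << 24);
  if (le == PCAP_MAGIC) {
    return 1;
  }
  if (be == PCAP_MAGIC) {
    return 2;
  }
  return 0;
}
```

On an x86 host I would expect the file to start with the bytes d4 c3 b2 a1, for which the function returns 1.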

Compile qemu and Enable RTL Debug Output

https://forum.osdev.org/viewtopic.php?f=1&t=28285

In qemu/hw/net/rtl8139.c replace the line

//#define DEBUG_RTL8139 1

by

#define DEBUG_RTL8139 1

Build Qemu
https://en.wikibooks.org/wiki/QEMU/Installing_QEMU

0. sudo apt-get install libglib2.0-dev libpango1.0-dev libatk1.0-dev libsdl2-dev
1. git clone git://git.qemu-project.org/qemu.git
2. cd qemu
3. git submodule init
4. git submodule update --recursive
5. git submodule status --recursive
6. git checkout stable-4.1
7. mkdir build
8. cd build
9. ../configure --disable-kvm --prefix=PFX --target-list="i386-softmmu x86_64-softmmu" --enable-sdl
10. make

In step 6, replace the version number with the most current qemu release.
In step 9, the command specifies the targets and only lists i386 and x86_64. That way only the 32 bit and 64 bit x86 variants of qemu are built.
If you call ../configure without additional parameters, qemu will be built for all possible targets, which will take forever.

The qemu executable will be placed inside the build folder. For example in /home/<user>/dev/qemu/build/i386-softmmu/qemu-system-i386

Now qemu will output debug statements to the command line. You should see lines like these:

RTL8139: +++ transmitting from descriptor 0
RTL8139: +++ transmit reading 42 bytes from host memory at 0x0010504a
RTL8139: +++ transmitted 42 bytes from descriptor 0


TCP/IP Stack Ethernet

Sending a Ethernet Frame