Linux Kernel Programming

Not as scary as you think

Gavin Smyth

This is an article which appeared in EXE magazine in September 1999, under the title of "Cracking The Kernel."

(Since writing this, a number of articles have been published on the O'Reilly web site about porting 2.0 device drivers to 2.2, more or less extending the last chapter of the book I mention below: well worth a read.)


Writing kernel code is an arcane art, especially when a 'real' multi-tasking operating system is involved, isn't it? I want to dispel that myth and show you that the life of a Linux kernel programmer is fairly straightforward - but don't tell the people who pay me!

In the world of the PC, not very many people need to delve into the kernel, but there are times when you have to get close to the hardware. The most obvious case is where you or your team have produced some bit of hardware to plug into a PC and therefore need a device driver. However, occasionally you may want to perform some sort of custom control of existing hardware - effectively displacing the standard handling contained within your operating system. For example, you could be working on some networking application that requires less flexibility but more efficiency than a general ethernet stack, and replacing the existing general purpose stack or scheduling with your own might be essential. Another example where you might replace or augment existing code is if, say, you want to experiment with robotics by having a number of actuators and sensors attached to your PC's printer port: a standard printer driver is hardly appropriate here. Device driver code is very close to operating system kernel code, and is the most common starting point for kernel code writing: you can gain a better understanding of the internals of your operating system by writing a device driver.

Why the need for a device driver?

In times of yore, before Windows, life was easy: there was nothing between application code and the hardware, so you could do whatever you wanted. Under DOS, you could ignore the rudimentary device driver layer and access any memory or I/O address with impunity - while this did simplify hardware control, it was also rather limiting in what else you could do. Other forces were driving PC operating systems to become the complex pieces of software they currently are, and part of that complexity is exhibited in the layers placed between application code and the hardware - all in the interests of making the OS more reliable, robust and flexible. It is interesting to note that most small embedded systems, while they usually have facilities such as multi-tasking, still permit complete and direct access to any of the memory space, mainly because size and efficiency requirements make anything else impractical and too expensive.

Nowadays, if you have to control some hardware within a PC, you generally have to write a device driver which sits in kernel space to manage interactions between a program (in user space) and the hardware. In the last such project with which I was involved, we were fortunate in having the option of choosing our operating system, and picked Linux for reasons which included cost (Linux, without a GUI in the form of the X window system, runs in a lot less memory than Windows NT or 95/98) and ease of writing device drivers. Elaborating on that last point: a short period of investigation showed that to be able to write a Windows device driver, you need the Windows DDK (device driver kit) for whichever flavour of Windows you are targeting, and probably one or more fat books (such as the excellent Systems Programming for Windows 95 by Walter Oney, Microsoft Press, ISBN 1-55615-949-8), since the extensive DDK documentation is not easy to navigate. In addition, it looks almost impossible to avoid having to write at least some assembler (though toolkits such as Vireo's VtoolsD will help) - while not unexpected in device driver writing, I would still prefer to steer as clear of assembler as possible. An even shorter investigation into equivalent development under Linux showed that you need, er, no extra tools, and a slim book such as the very readable Linux Device Drivers or Linux Kernel Internals will suffice - you definitely cannot survive without a book in this case because there is no equivalent of the DDK documentation available for Linux (but see Further Reading). After a couple of re-reviews to check that I had not missed anything, I concluded that this really was all you need for Linux - I was almost disappointed at how easy it all appeared to be.

In the rest of this article, I will give an overview of the process of writing a Linux device driver for an Intel processor based PC clone, but the techniques apply equally well to the other supported Linux architectures. Rather than getting bogged down in details of real hardware, I will use the example of a hypothetical device with an onboard 'message' buffer appearing in the PC's memory space at a particular physical address into which the external hardware will write blocks of data. The hardware will indicate the availability of data by pulling an interrupt line when the buffer is full.

The driver outline requirements were:

I must emphasise that the fourth one of those is a driver requirement: the application reads the data as required, and this may be a few seconds after the device has written it. This suggests that the interrupt routine would append messages to an internal queue to be read by the application but, for the purposes of this article, I will forgo the queue and merely maintain a single kernel space buffer. While Linux is not a real time operating system, it is sufficient for the millisecond timings in this application. Finally, because polling makes little sense, we needed to make read block, but that leads to problems of its own; fortunately, it proved trivial to implement select, which makes the application programmer's life as easy as the device driver writer's!

Before starting, you need a capable build environment. The good news is that most Linux systems already have it - GCC and associated tools along with the kernel sources (just the header files are essential). Until kernel 2.2 appeared, it was safest to use GCC version 2.7.2 rather than the current 2.8.1 or EGCS - these apparently are a bit too clever with optimisation and break some (technically incorrect) earlier kernel code assumptions.

Implementation

Making a loadable module was my first task, just requiring the two routines init_module and cleanup_module, shown in Figure 1. The first registers the device, and initialises driver wide data structures. Registering the driver gives it a major number making it addressable by application code via this number, and associates it with a list of driver function pointers, stored in the testdriver_fops structure. (A major number implies the existence of a minor number - this distinguishes between several devices of the same type: here, since there is only one device, only one minor number, zero, is necessary.) The very first call, register_symtab, 'hides' all the code symbols so that we do not have to worry about polluting the kernel name space - without this, any non-static symbols in the code would be visible to all kernel code. We get a bunch of extras for free: rather bizarrely, the module load command, insmod, can access any static integer or string variable. This is typically used to let us configure the driver at load time by overriding default values. In this case, we can assign a different major number, interrupt or base address (the three variables at the top of Figure 1): for example, insmod testdriver irq=5 overrides the default interrupt level of 10. Rather lazily, I assume that the device is present at the correct address and do not attempt any probing: a 'real' driver is likely to involve probing and allocating I/O ports as well as space in the memory map. The cleanup code for this module merely unregisters the device: in a real driver, it would also release any resources allocated during initialisation.

Figure 1 - Module loading and unloading

int major = 0;                       /* Default dynamic allocation */
unsigned long baseaddress = 0xD0000; /* Default address */
unsigned int irq = 10;               /* Default interrupt line */

struct file_operations testdriver_fops =
{
  /* Populate only with routines that exist */
  NULL,          /* seek */
  testdriver_read,
  NULL,          /* write */
  NULL,          /* readdir */
  testdriver_select,
  NULL,          /* ioctl */
  NULL,          /* mmap */
  testdriver_open,
  testdriver_release,
                 /* nothing more, fill with NULLs */
};

typedef struct
{
  char* mem;                    /* Hardware appears here in memory map */
  int irq;                      /* Interrupt line */
  int ready;                    /* Is a message ready to read? */
  char buffer[ MESSAGE_SIZE ];  /* Kernel message buffer */
  struct wait_queue* readq;     /* To suspend on if no data ready */
} TestdriverDev;

int init_module( void )
{
  int res;
  
  /* Register *no* symbol table explicitly so that no functions other than
     those in the table will be externally visible - this can't possibly
     fail, so don't bother to check... */
  register_symtab( NULL );

  if( ( res = register_chrdev( major, "testdriver", &testdriver_fops ) ) < 0 )
  {
    printk( KERN_INFO "testdriver: can\'t get major number\n" );
    return res;
  }
  if( major == 0 )
    major = res;

  memset( &testdriverDev, 0, sizeof( testdriverDev ) );
  testdriverDev.mem = (char*)baseaddress;
  testdriverDev.irq = irq;

  return 0;
}

void cleanup_module( void )
{
  unregister_chrdev( major, "testdriver" );
}
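
As an aside for application writers: the major and minor numbers described above are packed together into a single device number, which user space can take apart again. The sketch below is only an illustration, assuming a Linux system whose C library provides <sys/sysmacros.h>; refers_to_device is a hypothetical helper that mirrors the minor number check a driver with a single device would make.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Hypothetical helper (not part of any library): returns 1 if the device
   number rdev carries the wanted major number and minor number zero,
   and 0 otherwise */
int refers_to_device( dev_t rdev, unsigned int wanted_major )
{
  /* A device number of zero means "no device" */
  assert( rdev != 0 );

  return major( rdev ) == wanted_major && minor( rdev ) == 0;
}
```

An application could pass in the st_rdev field filled in by stat on a device node; for experimenting, makedev( 60, 0 ) builds a device number by hand (60 being an arbitrary major number chosen for illustration).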

When a driver module has been loaded, it must be attached to some device node - an entry in /dev - for the application to be able to find it. The old way to do this was to hard code some fixed major number in the driver, but these days it is recommended to dynamically allocate one. The problem with that is determining what number has been allocated: the following two lines extract the major number by looking up the driver's name in /proc/devices, which lists every registered device, and then create a suitable node:

major=`cat /proc/devices | awk "\\$2==\"testdriver\" {print \\$1}"`
mknod /dev/testdriver c $major 0
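
The same lookup can be done from C if an installation program needs it. This is a minimal sketch rather than part of the driver: find_major is a hypothetical helper, and it assumes the "<major> <name>" line layout that /proc/devices uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan text in /proc/devices format and return the
   major number registered under name, or -1 if it is not listed.
   Character devices are listed first, so a block device with the same
   name would only be found if no character device matched. */
int find_major( FILE* devices, const char* name )
{
  char line[ 128 ];
  char entry[ 64 ];
  int number;

  assert( devices != NULL && name != NULL );

  while( fgets( line, sizeof( line ), devices ) != NULL )
  {
    /* Section headers such as "Character devices:" and blank lines do
       not parse as "<number> <name>" and are skipped */
    if( sscanf( line, "%d %63s", &number, entry ) == 2 &&
        strcmp( entry, name ) == 0 )
      return number;
  }

  return -1;
}
```

Typical use would be to open /proc/devices with fopen, call find_major( f, "testdriver" ) and, if the result is not -1, create the node with the mknod command shown above.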

The next two important functions, in Figure 2, open and close the driver. When an application executes fd = open("/dev/testdriver",...), the kernel will work out that it corresponds to our driver because of its major number and, via the file operations table in Figure 1, execute testdriver_open. This function validates that the open is legal - that nothing has already opened it, for example - and then attaches the interrupt handler. (A real driver would almost certainly include extra code to enable the external hardware's interrupt controller, but I'll ignore that here.) The final argument to request_irq is passed directly to the interrupt handler, and thus can enable one routine to distinguish between several interrupting sources - in this case, there is a single source, but I pass the address of the device structure anyway. The last thing the open routine does is increment the module's usage count, so that it cannot be removed while it is being used. The close routine, as you would expect, releases the interrupt line and decrements the module usage count.

Figure 2 - Opening and closing the device

int testdriver_opened = 0;        /* True iff the device is open */

int testdriver_open( struct inode* inode, struct file* filp )
{
  /* only one of these devices, so whinge if bad (non-0) minor number */
  if( MINOR( inode->i_rdev ) != 0 )
    return -ENODEV;

  /* The device can be opened only once */
  if( testdriver_opened )
    return -EBUSY;

  if( request_irq( testdriverDev.irq, interrupt_handler, 0, "testdriver", &testdriverDev ) )
  {
    printk( "testdriver: can't get irq %i\n", testdriverDev.irq );
    testdriverDev.irq = 0;
  }

  /* Access device via file pointer - more in keeping with device drivers
     than using the static structure directly, and easier to extend later */
  filp->private_data = &testdriverDev;

  /* Now lay claim to the device */
  testdriver_opened = 1;
  MOD_INC_USE_COUNT;

  return 0;
}

void testdriver_release( struct inode* inode, struct file* filp )
{
  TestdriverDev* dev = (TestdriverDev*)( filp->private_data );

  /* Kill the interrupt */
  if( dev->irq )
  {
    /* Pass the same device pointer that was given to request_irq */
    free_irq( dev->irq, dev );
    PRINTK( "interrupt released" );
  }

  testdriver_opened = 0;
  MOD_DEC_USE_COUNT;
}

Having got all that out of the way, it's time to read some data. The basic idea is that the hardware triggers an interrupt routine to read the data to an internal buffer, which is later read by the application. The interrupt routine is shown in Figure 3. One job the interrupt routine must do, omitted in this example, is reset the external hardware: sometimes this will occur automatically as a side-effect of the data read, but more often, there has to be an explicit write to some external location. The PC's interrupt controller itself must also be reset, but this is handled automatically by the operating system. After copying the external data into the driver buffer, any application that might be waiting for the data is made available to run by calling wake_up_interruptible. Such an application will have been suspended and placed on the queue by the testdriver_read routine, in Figure 4, invoked when the application calls read(fd, buff, MESSAGE_SIZE), once again found by indexing the testdriver_fops vector.

Figure 3 - Interrupt routine

static void interrupt_handler( int irq, void* devId, struct pt_regs* regs )
{
  TestdriverDev* dev = (TestdriverDev*)devId;

  memcpy( dev->buffer, dev->mem, MESSAGE_SIZE );
  dev->ready = 1;

  /* Now, tell the app... */
  wake_up_interruptible( &dev->readq );
}

The read routine is complicated by an odd looking loop to handle blocking reads and the disposal of unwanted signals. If no data are available and the read is non-blocking, as indicated by the O_NONBLOCK flag (which would have been specified when the device was opened), the read returns immediately with a failure code - error codes are always negative while a positive return value contains the number of bytes read into the supplied buffer. If the read is blocking and there are no data, the application is suspended by interruptible_sleep_on, to be woken up by the interrupt routine as explained above. It is possible for the read to be awakened by an unrelated signal, hence the following check: returning -ERESTARTSYS lets the kernel deal with the signal and then restart the read, suspending the task again if necessary. The loop can complete only when there are some data to read: note the memcpy_tofs call - this copies data from kernel space to user space - in Linux on Intel processors, the FS segment register points to the user data segment, hence the name of this routine and the associated copy from user space to kernel space, memcpy_fromfs.

Figure 4 - Read routine

int testdriver_read( struct inode* inode, struct file* filp,
                     char* buf, int count )
{
  TestdriverDev* dev = (TestdriverDev*)( filp->private_data );

  if( count < MESSAGE_SIZE )
  {
    printk( "testdriver: read - buffer too small" );
    return -EINVAL;
  }

  /* Loop while nothing to read */
  while( !dev->ready )
  {
    if( filp->f_flags & O_NONBLOCK )
    {
      printk( "testdriver: read - nothing (and non blocking)" );
      return -EAGAIN;
    }

    /* Blocking call, so go to sleep until something happens */
    interruptible_sleep_on( &dev->readq );

    /* Got a signal unrelated to reading... */
    if( current->signal & ~current->blocked )
      return -ERESTARTSYS;
  }

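  /* Copy the data into user space */
  memcpy_tofs( buf, dev->buffer, MESSAGE_SIZE );

  /* Mark the message as consumed so that the next read (or select)
     waits for fresh data instead of returning this message again */
  dev->ready = 0;

  return MESSAGE_SIZE;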
}
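
From the application's side, the non-blocking case described above shows up as read returning -1 with errno set to EAGAIN, since the C library converts the driver's negative return value. The following user space sketch is an illustration only: because the device is hypothetical, it is meant to be tried on any descriptor (a pipe works well), and set_nonblocking and try_read are assumed helper names, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch an already open descriptor to non-blocking mode; the same
   effect is obtained by passing O_NONBLOCK to open.
   Returns 0 on success, -1 on failure. */
int set_nonblocking( int fd )
{
  int flags = fcntl( fd, F_GETFL );

  if( flags < 0 )
    return -1;
  return fcntl( fd, F_SETFL, flags | O_NONBLOCK ) < 0 ? -1 : 0;
}

/* Attempt a read.  Returns the number of bytes read (0 means end of
   file), -EAGAIN if the descriptor is non-blocking and no data are
   ready yet, or another negated errno value on failure. */
long try_read( int fd, char* buf, size_t size )
{
  ssize_t n;

  assert( buf != NULL );

  n = read( fd, buf, size );
  if( n < 0 )
    return -(long)errno;
  return (long)n;
}
```

With the driver in this article, an application would call try_read with a buffer of at least MESSAGE_SIZE bytes and treat -EAGAIN as "nothing yet, try again later".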

There is one problem with the driver presented thus far: if the device has been opened in blocking mode and an application calls read, it can potentially suspend forever if the hardware does not make any data available. An obvious solution is to insert some timeout mechanism, but where? The first option is within the application - have a separate thread whose purpose is merely to suspend for a short interval and kill off the main thread (cancelling the read) if necessary. The disadvantage of this is that the application writer's job is complicated by a detail which should really be hidden from view. At the other extreme, the device driver could employ a kernel timer to effectively perform the same function within the driver itself. This has the disadvantage that such timers are not completely trivial to use. A third alternative, for a lazy programmer such as me, is to let the kernel handle the timeout itself. The I/O select mechanism, familiar to unix coders, can be used here: it is normally used to determine which of several channels is ready to read (or write), as shown in Figure 5. The select statement will exit either when there is something to read on file handles fd1 or fd2 (or both), or when the timeout of 10 seconds expires: as select returns the number of ready streams, it is easy to check for that latter case.

Figure 5 - Use of select

fd_set readSet;
struct timeval timeout;

FD_ZERO( &readSet );
FD_SET( fd1, &readSet );
FD_SET( fd2, &readSet );

timeout.tv_sec = 10;
timeout.tv_usec = 0;

if( select( FD_SETSIZE, &readSet, NULL, NULL, &timeout ) > 0 )
{
  if( FD_ISSET( fd1, &readSet ) )
  {
    /* Read from fd1 */
  }
  if( FD_ISSET( fd2, &readSet ) )
  {
    /* Read from fd2 */
  }
}
else
{
  /* Process timeout (or error) */
}

If only one file handle is pended on, it should be obvious that this is a simple mechanism for the application writer to use to apply a timeout to a particular device: if select indicates that the device is ready to read, the read can occur without any possibility of blocking. The device driver side of the select mechanism is not much more complex: see Figure 6. The testdriver_select function is invoked behind the application's call to select and (ultimately) returns a true value when a read is possible. If nothing is available, the application is suspended, once again on the read queue, but the operating system applies the select timeout without any other work by the programmer. The third and fourth parameters to select are sets of file handles to check for write readiness or exceptions: as neither of these is handled by this driver, testdriver_select always returns false for any test other than read.

Figure 6 - select within the driver

int testdriver_select( struct inode* inode, struct file* filp,
                       int mode, select_table* table )
{
  TestdriverDev* dev = (TestdriverDev*)( filp->private_data );

  if( mode == SEL_IN )
  {
    /* Is there anything to read? */
    if( dev->ready )
      return 1;

    /* No, so sleep */
    select_wait( &dev->readq, table );
    return 0;
  }

  /* Don't handle write or exceptions, so always return not available */
  return 0;
}
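
Returning to the application side, the single handle case described above can be wrapped in a small helper. This is a minimal sketch under stated assumptions: wait_readable is a hypothetical name, and the descriptor can be any readable file handle, so it can be tried on a pipe without the hypothetical hardware.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to the given number of seconds for fd to
   become readable.  Returns 1 if a read will not block, 0 if the
   timeout expired first, and -1 on error. */
int wait_readable( int fd, long seconds )
{
  fd_set readSet;
  struct timeval timeout;
  int res;

  assert( fd >= 0 && fd < FD_SETSIZE );

  FD_ZERO( &readSet );
  FD_SET( fd, &readSet );

  timeout.tv_sec = seconds;
  timeout.tv_usec = 0;

  /* The first argument is the highest descriptor in any set, plus one */
  res = select( fd + 1, &readSet, NULL, NULL, &timeout );
  if( res < 0 )
    return -1;

  return ( res > 0 && FD_ISSET( fd, &readSet ) ) ? 1 : 0;
}
```

An application would call wait_readable( fd, 10 ) and issue the read only when the result is 1; a timeout of zero seconds turns the call into a simple poll.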

Finally, for completeness, Figure 7 contains the make file used to build the driver. The first line extracts kernel version information for the installation process to be able to place the driver module in the correct version specific directory, a few lines later in the make file. The next line defines two extra compile time symbols necessary for virtually all device drivers, MODULE and __KERNEL__: these affect how some of the Linux header files are processed differently to the usual application mode. Apart from these, the script looks very similar to a typical unix make file.

Figure 7 - driver make file

VER = $(shell awk -F\" '/REL/ {print $$2}' /usr/include/linux/version.h)

CFLAGS = -Wall -O2 -DMODULE -D__KERNEL__

all: testdriver.o

# testdriver.o is compiled from testdriver.c by make's built-in rule,
# which picks up the CFLAGS defined above

install: testdriver.o
	install -d /lib/modules/$(VER)/misc /lib/modules/misc
	install -c testdriver.o /lib/modules/$(VER)/misc
	install -c testdriver.o /lib/modules/misc

clean:
	-/bin/rm -f *.o core

testdriver.o: testdriver.c testdriver.h testdriver_internal.h

Conclusion

I hope I've shown that starting to look into the Linux kernel and writing a device driver is not actually very difficult. While it is not trivial, it need not be a job for wizards with a mastery of operating system incantations - it is true that there are some almost boiler plate sections of code (such as the blocking read loop), but these are few in number, at least under Linux. You do not need extra tools in your programming toolkit and you have a vast wealth of knowledge in the available source code and books such as Linux Device Drivers.

Further Reading

Linux Device Drivers

Alessandro Rubini

O'Reilly

ISBN 1-56592-292-1

400 pages

This book starts off by explaining what a device driver is, and describes the interaction between user space and kernel space. It very quickly gets down to writing some code, showing how to use and create loadable kernel modules. After this, the complexity gradually increases with chapters on character devices supporting read and write, debugging, ioctls, timing and scheduler structures, kernel memory management, hardware (yes, it really only does put in an appearance half way through the book!), interrupt handling, kerneld, block devices and file I/O, DMA, network drivers, PC buses, an overview of the kernel source code structure, and a summary of the changes between 2.0.x kernels and 2.1.x. The bulk of the book is concerned with 2.0 structures and functions, but the author does take some care that code is portable to 2.1, and hence 2.2 since that has more or less the same internals as 2.1. The book is easy to read and the copious examples are well explained, though it is difficult at times to understand what the driver is doing without being able to attach some external hardware to view it. The book does not include complete code listings, but the full source is available at the O'Reilly web site.

In summary, this is an extremely good book bringing together virtually everything you need to get started with device driver writing - for more advanced work, you will have to examine the source, but the techniques explained here should help you on your way.

Linux Kernel Internals

Michael Beck, Harald Böhme, Mirko Dziadzka, Ulrich Kunitz, Robert Magnus and Dirk Verworner

Addison-Wesley

ISBN 0-201-33143-8

470 pages

As suggested by its title, this book is more a tour of the 2.0 kernel than the above one: there is a chapter on device drivers, but the authors really aim to present the internal workings of the operating system. The book starts with the layout of the source code files and how to build the kernel. After this, the authors dive straight into the kernel data structures and the main kernel algorithms. Chapters on memory management, inter process communication and file systems follow, and then there is a discussion of device drivers and a brief note on implementing one. The final chapters include network handling, modules, debugging and multi-processor systems. About a third of the book is devoted to appendices listing system calls and other kernel functions, commands related to kernel access, the proc filing system and Linux booting - these all to some degree read like copies of man pages, but do include examples and it is useful to have them in one place. The package also includes a CD with a (rather old) Slackware distribution.

This book is more detailed than Linux Device Drivers and, in my opinion, not quite as readable. However, if you are more interested in how the kernel works than writing a device driver yourself, Linux Kernel Internals is highly recommended.

Linux Documentation Project

The LDP contains gems such as The Linux Kernel, The Linux Kernel Hacker's Guide and The Linux Kernel Module Programmer's Guide. These cover a lot of ground but, in my opinion, are not as readable as the books mentioned above. Other useful online sources include the HOWTOs and, of course, the readily available kernel and driver source code.

Kernel building and modules

A typical Linux kernel can be viewed as having two types of components: a set of core code which always has to be accessible, and other units, typically device drivers, which do not need to be present on every Linux system. An example of the first class is the scheduler - an operating system is not going to be particularly useful without some notion of tasks and switching between them; an example of the second is a printer driver - if you have no printer, there is little point in having a driver consuming memory and other resources. Run-time loadable and unloadable components, known as modules, were introduced as a way to prevent a monolithic Linux kernel from growing by having to include all devices.

For standard drivers, you usually have the option of building them into the kernel or as separate modules (or not building them at all). The only advantage of building a driver into the kernel is that it is available as soon as the OS starts - you don't need insmod. For new drivers, you also have the option of modifying kernel build files to incorporate your driver into the kernel itself, or just building it as a module as I have done in this article.

Here's a brief summary of the steps you need to take to build a driver into the kernel:

For completeness, this is how you build a Linux kernel:

For more information on kernel building, I suggest having a look at the online Linux documentation mentioned above.



Last modified on 15th December 2000