

This is the second part of the series "Introducing LKM programming", started some time ago. In the previous article we covered some OS basics and architecture types, and we saw a small example that printed "Hello world" from inside the kernel.

In this article we look at some aspects to keep in mind when programming in kernel mode. We also present a new example kernel module to test on our machines, and we focus on system calls.

Basic rules

The kernel is a special part of your operating system, so it is not surprising that the rules under which kernel code is written differ from those of normal programming. In fact, if you remember my previous article you already know this: we used the printk function instead of printf, because printf belongs to the standard C library...

Let's see the special rules that we must know:

  • No library support
    The modules that we write are linked only against the kernel. In contrast, a normal program is linked against the C library and probably against other libraries, such as the GNOME ones. The reasons for this restriction are complex. One is that it would not be safe to let user-space libraries execute in privileged mode (inside the kernel). Moreover, since printf ends up calling the write system call, if the write system call called printf in turn we could have an infinite loop.
  • Fixed stack size
    When programming in user space, we have a really big stack (it can grow to several MB), but in kernel mode the stack has a fixed size that depends on the architecture. This fixed size implies some restrictions:
    • Do not implement deeply recursive functions
    • Do not define large arrays inside functions; allocate them dynamically instead
    One benefit of this limitation is that the task_struct of a process can be found with a few assembler instructions, just by applying a bit mask to the stack pointer.
  • No memory protection
    When we run user-mode programs, the kernel takes care that they do not corrupt memory. If a user program tries to access a protected memory address, the kernel reports an error to the process and kills it. But nobody supervises the kernel, so memory corruption inside it is likely to crash the whole system.
  • No Floating Point
    The kernel uses integer (fixed-point) arithmetic. The context of the FPU (Floating Point Unit) is not automatically saved. You can save and restore it by hand, but it is better not to use the FPU inside the kernel.
  • No MMX
    For the same reason as above: the MMX registers are not automatically saved.
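
Two of the rules above can be tried out safely in user space. The following is only a sketch under stated assumptions: the 8 KB stack size, the scale factor of 1000 and all the function names are made up for illustration and are not kernel APIs. The first function shows the kind of bit mask that turns a stack pointer into the base of a fixed-size, size-aligned stack; the other two show arithmetic on fractional values using integers only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 8 KB stacks. The real size depends on the
 * architecture and on the kernel configuration. */
#define SKETCH_THREAD_SIZE 8192u

/* Round a stack pointer down to the base of its fixed-size stack.
 * This only works because the stack is aligned to its own size. */
static uintptr_t stack_base(uintptr_t sp, uintptr_t thread_size)
{
    /* The mask trick requires a power-of-two size. */
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return sp & ~(thread_size - 1);
}

/* Fixed-point values: plain integers scaled by 1000 (hypothetical scale). */
#define FX_SCALE 1000L

/* Build the fixed-point value whole.thousandths, e.g. (1, 500) is 1.5. */
static long fx_from_parts(long whole, long thousandths)
{
    return whole * FX_SCALE + thousandths;
}

/* Multiply two fixed-point values without touching the FPU. */
static long fx_mul(long a, long b)
{
    return (a * b) / FX_SCALE;
}
```

For example, stack_base(sp, SKETCH_THREAD_SIZE) clears the low 13 bits of sp, and fx_mul(fx_from_parts(1, 500), fx_from_parts(2, 0)) multiplies 1.5 by 2 using integers only.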

Focusing on system calls

Syscalls (system calls) are functions inside the kernel that let normal processes request services from it. There are 294 syscalls defined in a 2.6 Linux kernel (#define NR_syscalls 294). You can see the list in the unistd.h file. Most of them, like read and write, are self-explanatory. These functions perform the real work that processes need in order to run. For example, when a process wants to write some data to a file, it is the kernel, through the write system call, that writes the data to disk. Syscalls are the entry point into kernel mode.

How to deal with syscalls

Let's take a look at the following code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
    int fd;

    fd = open("my_file.txt", O_RDONLY);
    if (fd == -1)
        printf("Error\n");
    return 0;
}

In the code above, we are not working directly with syscalls. There is an established procedure that leads us from the open C function to the actual open syscall (that is, from user-land to kernel-land). The process is as follows:

First, we call the "open" C function, which is actually the open wrapper from libc. The wrapper then invokes the real syscall by moving some values into the CPU registers and raising interrupt 0x80. After this, the sys_open syscall service routine is executed. The first two steps run in user mode, while the last ones run in kernel mode; the 0x80 interrupt is what requests the switch to kernel mode.

Of course, the kernel checks the parameter values to avoid the possibility of errors. Remember that once we are inside kernel mode, we can easily break or corrupt our system. Parameters are passed to the syscall in CPU registers, in a fixed order: eax, ebx, ecx, edx, esi, edi and ebp. The first of them, eax, always holds the syscall identification constant, i.e. the value of __NR_open, __NR_socketcall, etc.; the remaining registers carry the arguments.

There is a method to call syscall routines directly from user programs. This is done by means of the <sys/syscall.h> and <unistd.h> includes. Using them, it is possible to call a syscall service routine with the following syntax:

syscall(the_syscall,parameter, parameter...)

Let's see an example code:

#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main(void)
{
    int ret;

    ret = syscall(SYS_open, "my_file.txt", O_RDONLY);
    if (ret == -1)
        printf("Error\n");
    return 0;
}

This code performs the same operation as the previous example, but it invokes the system call directly.

The unistd.h file provides the identification numbers for every defined system call. It can be found at include/asm-$arch/ inside your kernel source directory.
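
The same idea works for any syscall. Below is a minimal sketch, assuming a Linux system with glibc; raw_getpid and raw_write are hypothetical helper names chosen for this example. It bypasses the libc wrappers for getpid and write and passes the parameters to syscall() in the same order the kernel expects them.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID without using the getpid() wrapper. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Write len bytes from buf to fd through the generic syscall() entry.
 * The three arguments follow the syscall number, in order. */
static ssize_t raw_write(int fd, const void *buf, size_t len)
{
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hello\n", 6) should behave like write(1, "hello\n", 6), and raw_getpid() should return the same value as getpid().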

   #define __NR_restart_syscall 0
   #define __NR_exit 1
   #define __NR_fork 2
   #define __NR_read 3
   #define __NR_write 4
   #define __NR_open 5
   #define __NR_close 6
   #define __NR_waitpid 7
   #define __NR_creat 8
   #define __NR_link 9
   #define __NR_unlink 10
   #define __NR_execve 11
   #define __NR_chdir 12
   #define __NR_time 13
   #define __NR_mknod 14
   #define __NR_chmod 15
  ...
  ...

Most of the syscalls have the same names as their user-land libc equivalents, so they will not be explained here.
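
To check which numbers your own system uses, the SYS_* constants from <sys/syscall.h> can be inspected from a user program. The sketch below is only an illustration: the table holds a handful of syscalls picked for the example, and syscall_number is a hypothetical helper, not part of any library. Keep in mind that the numbers differ between architectures, so they may not match the listing above.

```c
#include <string.h>
#include <sys/syscall.h>

/* One entry per syscall we want to look up by name. */
struct syscall_name {
    const char *name;
    long nr;
};

/* A small, hand-picked subset; the real list is much longer. */
static const struct syscall_name known_syscalls[] = {
    { "read",   SYS_read },
    { "write",  SYS_write },
    { "close",  SYS_close },
    { "getpid", SYS_getpid },
};

/* Return the number of the named syscall, or -1 if it is not in the table. */
static long syscall_number(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(known_syscalls) / sizeof(known_syscalls[0]); i++) {
        if (strcmp(known_syscalls[i].name, name) == 0)
            return known_syscalls[i].nr;
    }
    return -1;
}
```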

Playing with syscalls.

In the 2.4 kernel series, the syscall table was exported to modules. This means that we could declare the syscall table inside our module as follows:

extern void *sys_call_table[];

This gave modules a certain flexibility, since they could replace entries in this table to make them point to their own custom system calls. However, this is not a good thing, because syscalls are very critical. Imagine that one of these custom syscalls corrupts kernel memory, or does not perform the expected task safely. Because of this, the syscall table is no longer exported to modules, which prevents them from accessing it and changing the addresses it contains.

Having said this, I would like to present the final example of this article, but always keep this in mind: the method used here is for learning purposes only. There are millions of things that can go wrong with this code because of its nature, so this kind of hack should not be used in production software in any way. In particular, this code is not safe against module unloading and has race conditions. Some synchronization method should be used to ensure that unloading the module does not produce an Oops (in the worst case, the Oops can bring your system down). I deliberately omitted this additional code to keep the example clear.

Code example:

First of all, we need to define both the syscall table and the routine that we want to replace. This is done with the first two lines:

static unsigned long **sct;
int (* old_unlink)(void);

The syscall table is an array of pointers, and the routine that we will replace (unlink in this case) is referenced through a pointer to function.

The next step is to get the syscall table's address. Because the table is not exported, we need another way to get it: we use the address found in the System.map file, which contains the memory addresses of the symbols declared in the kernel. This file is used by some applications, such as the klogd daemon, which uses it to translate debug information from the kernel into a more human-readable format.

We can use grep to find this address:

fernape@localhost:/boot$ grep sys_call_table System.map-2.6.15.6
c02aa560 D sys_call_table
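
Each line of System.map has the form "address type symbol", so the lookup that grep performs here can also be done from a program. The following user-space sketch assumes that format; match_symbol_line is a hypothetical helper written for this example. It parses one line and extracts the address if the symbol name matches.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line ("address type symbol"). If the symbol
 * equals `wanted`, store its address in *addr and return 1.
 * Return 0 for malformed lines and for other symbols.
 */
static int match_symbol_line(const char *line, const char *wanted,
                             unsigned long *addr)
{
    unsigned long value;
    char type;
    char symbol[128];

    /* %lx reads the hexadecimal address, %c the one-letter type. */
    if (sscanf(line, "%lx %c %127s", &value, &type, symbol) != 3)
        return 0;
    if (strcmp(symbol, wanted) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

A caller would read the file line by line (for instance with fgets) and stop at the first line for which the function returns 1.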

So, once we have the two needed addresses (remember that unlink and others are exported so they can be freely used), we can make the swap:

sct=(void *)0xc02aa560;
old_unlink=(void *)sct[__NR_unlink];
sct[__NR_unlink]=(unsigned long *)&my_unlink;

First, we set the syscall table address, and then we replace the original unlink syscall with our own routine, which does nothing but print an "unlink unavailable" message from the kernel. We also save the original address so that we can restore the syscall table when unloading the module.
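
The save, replace and restore pattern itself can be tried without any risk in user space. The sketch below is only an analogy: the one-entry table and every name in it are invented for this example and have nothing to do with the real syscall table. It swaps an entry in an array of function pointers and puts the original back afterwards.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Stand-in for the original routine: pretends the work was done. */
static int real_unlink_handler(void)
{
    return 1;
}

/* Stand-in for our replacement: refuses to do anything. */
static int blocked_handler(void)
{
    return 0;
}

enum { SLOT_UNLINK = 0, SLOT_COUNT = 1 };

/* A toy dispatch table, playing the role of the syscall table. */
static handler_fn table[SLOT_COUNT] = { real_unlink_handler };

/* Saved original entry, so it can be restored later. */
static handler_fn saved;

/* Save the current entry and point the slot at the replacement. */
static void hook_install(void)
{
    saved = table[SLOT_UNLINK];
    table[SLOT_UNLINK] = blocked_handler;
}

/* Put the original entry back, if a hook was installed. */
static void hook_remove(void)
{
    if (saved != NULL) {
        table[SLOT_UNLINK] = saved;
        saved = NULL;
    }
}

/* Call whatever routine the slot currently points to. */
static int dispatch(int slot)
{
    return table[slot]();
}
```

Unlike the kernel version, nothing else runs through this table concurrently, which is exactly the part that makes the real hack unsafe.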

Here is the complete code:

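#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/types.h>
#include <asm/unistd.h>

MODULE_LICENSE("GPL");

/* Syscall table: an array of pointers to the service routines. */
static unsigned long **sct;

/* Saved address of the original unlink routine. */
int (* old_unlink)(void);

/* Replacement routine: only logs a message, deletes nothing. */
int my_unlink(void)
{
    printk(KERN_INFO "unlink unavailable\n");
    return 0;
}

static int enter(void)
{
    /* Address taken from System.map; it differs on every kernel build. */
    sct = (void *)0xc02aa560;
    old_unlink = (void *)sct[__NR_unlink];
    sct[__NR_unlink] = (unsigned long *)&my_unlink;
    return 0;
}

static void go_out(void)
{
    printk(KERN_INFO "Bye, restoring syscalls\n");
    sct[__NR_unlink] = (unsigned long *)old_unlink;
}

module_init(enter);
module_exit(go_out);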

Now let's test it. Create an empty file named "myfile" (touch myfile), then insert the module with insmod or modprobe. After this, try to delete "myfile". You will see the "unlink unavailable" message, and if you run ls you will see that your file is still intact. If you don't see the message, try tail /var/log/messages:

Jun 24 17:21:47 localhost kernel: unlink unavailable

Don't forget to unload the module and check that you can delete files again.

Conclusion

This second article about LKM programming has tried to offer an overview of the mechanism that makes system calls possible. Although the method shown above for getting the syscall table address was used by some programs, such as ancient OProfile versions and the (not so ancient) Intel VTune driver, I would like to stress again that this should not be done in real production modules. There are better ways to achieve these goals, such as the Linux Trace Toolkit or kprobes (and there are others, even uglier if possible, that scan part of the kernel address space to find the syscall table). I just hope that you enjoyed reading this and playing with the code as much as I did.

Comments about this article
Software Engineer
written by: Zeeshan on 2006-07-27 05:30:20
That's a sweet way of intercepting system calls. And this code works fine. Keep it up. We would also like to see some functionality of kprobes and Linux Trace Toolkit as you have mentioned.
good article
written by: ansz on 2007-05-18 04:43:00
Very nice article. Looking forward to your next article.
little problem
written by: ky on 2007-05-18 04:49:28
Hello, I have tested this code and I get this error: BUG: unable to handle kernel paging request at virtual address c0610508 (and the syscall table address is c06104e0)
RE: little problem
written by: gghose on 2012-03-21 21:46:34
Hi ky,
As I tried experimenting with similar code I hit the exact same problem.
I am running an Ubuntu distro, and upgraded the kernel to 3.2.2.
I am wondering, were you able to resolve this? I understand it has been a while, but I am asking in case you remember.
Thanks,
gghose
developer
written by: V on 2007-10-15 19:57:33
How can I write a kernel module with only the kernel headers and hook it into a function of the IP stack? I am currently having a problem where I need to create an LKM and have its functionality called from the Linux kernel IP stack. For this, do I need to have the source code of the kernel? Is there any other way I could implement this without having the kernel source? Any help appreciated. Thanks, V