Tuesday, July 29, 2008

Kernel Process Data-Structures

In the simplest words, a process is defined as "execution context of a running program". Execution context is the collection of all the data structures, registers, memory addresses and other resources that the process uses while executing a program. Therefore, a program itself is not a process. Rather, it is a collection of data structures that fully describes how far the execution of program has progressed. In other words, a process is an active program, i.e., an in-memory representation of a program stored in hard-disk. An important point to note is that two or more processes can co-exist which are executing the same program, and a single process can execute two or more programs in its life span.


Running and managing a process is a complex activity for kernel. In this post (and probably the next few) I try to make a note of how beautifully the Linux kernel performs this task. Specifically in this post I note how the Linux kernel keeps track of the process currently scheduled to run on CPU.


Data Structure to Hold Entire Information About a Process:

Kernel identifies each process with a unique process descriptor represented by struct task_struct, which contains all the information about a specific process. List of all process descriptors is maintained in a circular doubly linked list.

The following screenshot shows a (partial) definition of the task_struct structure, defined in linux/sched.h. I have removed many fields here, which are not directly relevant to our discussion here.

getpid() Does Not Return Process id ?!?

But I have been always using it to get the pid of calling process without knowing this fact!!! Indeed, getpid() returns tgid (thread group id), which is the pid of the leader of threads executing within the same group, (which is usually the first process created within that group). Since Linux doesn't support multithreading directly, each process executes within a single thread in its own thread group. Therefore, a single threaded process naturally becomes its thread group leader. The difference occurs only when some library (e.g. POSIX) is used to create multithreaded processes, in which case, getpid() returns the value of calling process' tgid field. By the way, in Linux, threads are called light weight processes and they also have pids, since as we can see, there is nothing called thread id in task_struct.

Food for Thought:

1. What if I need to determine the pid of a light weight process, which is not a group leader?


In a uniprocessor system, only a single process can run at a time. The pointer to the task_struct of the process currently running on CPU is returned by a macro current. Thus, the statement


printk(KERN_ALERT "Process Name: %s\n" "PID: %d\n", current->comm, current->pid);


prints the name and pid of the currently executing process.


How Does the current Macro Determine the Currently Running task_struct?

Other than task_struct, there is one more data structure associated with a process, thread_info, which contains the low level information about a process when it is running on CPU. Both the task_struct and thread_info structures contain a field pointing to each-other, named thread_info and task respectively (line #439 in the task_struct snapshot). But unlike task_struct, which is a dedicated structure for each process, there is one thread_info structure per processor in the system. The thread_info field of task_struct (and eventually, the task field of thread_info) is updated as soon as the process is scheduled onto the processor. When a process is not running, the value of thread_info field of task_struct does not mean anything.

For each process running in kernel mode, kernel uses the following union spanning 8KB:

union thread_union{

struct thread_info thread_info;

unsigned long stack[2048]; /* (long: 4 bytes) x 2K = 8KB*/

};


Food for Thought:

2. What happens if I reverse the order of the two declarations above in thread_union?

3. What is wrong in following code (courtesy of Kernel Trap mailing list http://kerneltrap.org/node/5835)

char *get_sp() { asm("movl %esp, %eax"); }


int main(void){

struct thread_info *current_thread_info;

struct task_struct *current_task;

current_thread_info = (struct thread_info *)((unsigned long)get_sp()&~0x1FFF));

current_task = (struct task_struct *)(current_thread_info->task);

printf("my pid is: %d\n",(unsigned int)current_task->pid);

}

As soon as a process switches to kernel mode, the (8 KB) thread_union is "created" in the memory area kernel data segment which is designated for kernel by kernel only (address to kernel data segment is returned by macro __KERNEL_DS). The user mode processes cannot access the kernel data segment.


Since at any given time, esp register points to a memory location lying somewhere between this 8 KB area (adrressed by 13 bits), the starting address of thread_union can be obtained by masking out the 13 least significant bits of the esp register. Furthermore, since the first field of thread_union is thread_info, obtained value is a pointer to thread_info. This masking operation is performed by kernel routine current_thread_info() which looks like following:


movl $0xffffe000,%ecx /* store the mask */

andl %esp,%ecx /* mask the esp, ecx points to thread_info*/


Having acquired the address of thread_info structure, getting the pointer to task_struct is trivial. Hence, the current macro is:


current = current_thread_info()->task


This is how the current macro points to the task_struct of the process currently running on CPU.

2 comments:

Unknown said...

Hey Jai, Shweta -- One of the most useful blogs i have read in the recent past. It is great to know that you folks have started writing about linux kernel. Whether it is correct (though as most of it will be accurate to the best of your knowledge) or incorrect is secondary but you have got the most important thing in place, WRITING.

I could understand most of it but did not find time to get some hands on. Well, actually I don't have any system to try as my laptop is in some trouble.

I suppose there should be some questions coming your way soon from me. Let's have a good discussion!!!

1 most DISAPPOINTMENT -- you people seem to have stopped writing. That would spell doom's day to people like me. Please try to take out some time from your busy schedules -- Dear didi and dearest darling!!!

NeverMind said...

Dear Rahul,

As this is the first written appreciation from a true critic, to us, receiving these words felt like receiving an award. We sincerely thank you for your time that you have put to this. We are very glad to know that you could understand most part of the posts because it was our primary aim to make it very basic for everyone. And about your questions, yes, we are eagerly waiting to get cornered (and even shot!!) with your darts.

Unfortunately, we couldn't post for a long time after "Kernel Process Data-Structures", because of Shweta settling with her new job, and me getting dead in a junk project here. But then, sure, your comments have re-motivated us, and here goes a new post, "Process Resource Limits". We have tried to make it even more fundamental than the previous ones, and we hope that readers will find it interesting, as much as we did while writing.

Read On Readers, Comment On Critics, and Inquisitors, please Ask On!!! All of you drive us to write better. Thank You so much to every one of you.