
CS 498 Lecture 4

An Overview of Linux Kernel Structure


Jennifer Hou
Department of Computer Science
University of Illinois at Urbana-Champaign

Reading: Chapters 1-2, The Linux Networking Architecture: Design
and Implementation of Network Protocols in the Linux Kernel
Outline
Overview of the Kernel Structure
Activities in the Linux Kernel
Locking
Kernel Modules
/proc File System
Memory Management
Timing
Kernel Structure

Structure of Linux Kernel


[Figure: layered block diagram of the Linux kernel. User space
(applications and tools) sits above the system call interface. The
kernel is divided into five components: process management, memory
management, file systems, device drivers, and network. Each component
provides a functionality (multitasking, virtual memory, files and
directories, device access, network functionality) and rests on
software support (scheduler and architecture-specific code, memory
manager, file system types and block devices, character devices,
network protocols and network drivers). Below the hardware support
layer is the hardware: CPU, RAM, hard disk/CD/floppy, terminals, and
network adapter.]
Overview of the Kernel Structure
Process management
- The scheduler handles all active, waiting, and blocked processes.
Memory management
- Allocates memory to each process and protects allocated memory
  against access by other processes.
File system
- In UNIX, almost everything is handled through the file system
  interface.
- Device drivers can be addressed as files.
- The /proc file system allows us to access data and parameters in
  the kernel.
Overview of the Kernel Structure
Device drivers
- Abstract from the underlying hardware and allow us to access the
  hardware through well-defined APIs.
Networks
- Incoming packets are asynchronous events and have to be collected
  and identified before a process can handle them.
- Most network operations cannot be attributed to a specific process.
Features of Linux Kernel
Linux is a monolithic kernel
- The entire functionality is contained in one kernel.
- In contrast, in microkernels (e.g., the Mach kernel and
  Windows NT), only memory management and IPC are contained in the
  kernel. The remaining functionality is moved to independent
  processes/threads running outside the kernel.
- Advantage: resources are accessed directly from within the kernel,
  avoiding expensive system calls and context switches.
- Disadvantage: the kernel becomes quite complex.
Features of Linux Kernel
A cure for the complexity of a monolithic kernel is the use of kernel modules
- Linux allows kernel modules to be dynamically loaded into (and
  removed from) the kernel at run time.
- This is achieved through well-defined interfaces, e.g.,
  register_netdev(), register_chrdev(), register_blkdev().
- Run-time performance is preserved because modules run in protected
  kernel mode, like the rest of the kernel.
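The registration idea can be tried out in user space. The following is a minimal sketch, not kernel code: the toy_* names, the table size, and the sample "null" driver are all hypothetical and only stand in for calls such as register_chrdev(). It shows a core that knows nothing about a driver until the driver registers its interface, and loses access again when the driver unregisters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DRIVERS 8   /* arbitrary table size (assumption) */

/* Interface a toy "module" hands to the core when it is loaded. */
struct toy_driver {
    const char *name;
    int (*open)(void);
};

struct toy_driver *toy_table[TOY_MAX_DRIVERS];

/* Find the slot holding the named driver, or -1 if it is not registered. */
int toy_find_driver(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < TOY_MAX_DRIVERS; i++)
        if (toy_table[i] != NULL && strcmp(toy_table[i]->name, name) == 0)
            return i;
    return -1;
}

/* Load time: returns the slot used, or -1 if the name is taken or the table is full. */
int toy_register_driver(struct toy_driver *drv)
{
    int i;

    assert(drv != NULL && drv->name != NULL && drv->open != NULL);
    if (toy_find_driver(drv->name) >= 0)
        return -1;
    for (i = 0; i < TOY_MAX_DRIVERS; i++) {
        if (toy_table[i] == NULL) {
            toy_table[i] = drv;
            return i;
        }
    }
    return -1;
}

/* Unload time: returns 0 on success, -1 if the driver was not registered. */
int toy_unregister_driver(const char *name)
{
    int slot = toy_find_driver(name);

    if (slot < 0)
        return -1;
    toy_table[slot] = NULL;
    return 0;
}

/* The core reaches a driver only through its registered interface. */
int toy_open(const char *name)
{
    int slot = toy_find_driver(name);

    return slot < 0 ? -1 : toy_table[slot]->open();
}

/* A sample "module" (hypothetical) that the core learns about only on registration. */
int toy_null_open(void) { return 0; }
struct toy_driver toy_null_driver = { "null", toy_null_open };
```

Before toy_register_driver(&toy_null_driver) is called, toy_open("null") fails with -1; afterwards it reaches toy_null_open(); after toy_unregister_driver("null") it fails again.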
Activities in the Linux Kernel
Activities – Processes and System Calls
Processes operate exclusively in the user address
space, and can only access the memory allocated to
them.
- A violation leads to an exception.
When a process wants to access devices or use a
functionality in the kernel, it issues a system call.
- Control is transferred to the kernel, which executes the
  system call on behalf of the user process.
Processes can be interrupted voluntarily (wait on a
semaphore or sleep) or involuntarily (interrupt).
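Seen from user space, a system call looks like an ordinary function call. The sketch below assumes a Linux system with glibc, which provides the generic syscall() wrapper and the SYS_getpid constant; raw_getpid() is a hypothetical helper name. It requests the process ID directly through the system call interface instead of through the getpid() library function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID through the generic syscall() entry point. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);   /* control passes to the kernel here */

    assert(pid > 0);                  /* getpid cannot fail */
    return pid;
}
```

The value returned by raw_getpid() is the same one that getpid() reports, because both end up in the same kernel routine.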
Other Forms of Activities
Hardware interrupts
Software interrupts
Tasklets
Interrupts – Hardware IRQs
Peripherals use hardware interrupts to inform the OS of events
(e.g., a packet has arrived at the network adapter); in response, an
interrupt handling routine is called.
The handling routine for a specific interrupt is registered with
request_irq() and de-registered with free_irq().
Fast interrupts
- Have a very short handling routine (that cannot be interrupted).
- Are specified by the flag SA_INTERRUPT in request_irq().
Slow interrupts
- Have a longer handling routine and can be interrupted by other
  interrupts during their execution.
in_irq() (include/asm/hardirq.h) can be used to check whether or
not the current activity is an interrupt-handling routine.
Interrupts
Not every operation that needs to be executed in an
interrupt can be completed in a few instructions (e.g.,
a packet that arrives at a network adapter).
To keep interrupt handling short, the routine is
usually divided into two parts:
- Top half: handles the most important tasks (e.g., copying the
  arrived packet to a kernel buffer queue, where it waits for detailed
  handling later).
- Bottom half: handles non-time-critical operations. It is
  scheduled for execution right after the top half has finished (e.g.,
  when a packet arrives, the bottom half is run as the software
  interrupt NET_RX_SOFTIRQ).
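The division of labour between the two halves can be modelled in user space. This is a simplified single-threaded sketch, not driver code: struct rx_queue, top_half(), bottom_half(), and the queue depth are hypothetical, and integer packet IDs stand in for real packet buffers.

```c
#include <assert.h>
#include <stddef.h>

#define RXQ_SIZE 4   /* arbitrary queue depth (assumption) */

struct rx_queue {
    int packets[RXQ_SIZE];     /* packet IDs stand in for real buffers */
    size_t head, tail, len;
    int bottom_half_pending;   /* set by the top half, cleared by the bottom half */
    unsigned long dropped;     /* packets lost because the queue was full */
};

/* Top half: do the minimum, i.e. queue the packet and request the bottom half. */
int top_half(struct rx_queue *q, int packet_id)
{
    assert(q != NULL);
    if (q->len == RXQ_SIZE) {
        q->dropped++;
        return -1;
    }
    q->packets[q->tail] = packet_id;
    q->tail = (q->tail + 1) % RXQ_SIZE;
    q->len++;
    q->bottom_half_pending = 1;
    return 0;
}

/* Bottom half: runs later and does the per-packet work (here: summing the IDs).
 * Returns the number of packets processed. */
int bottom_half(struct rx_queue *q, long *sum)
{
    int processed = 0;

    assert(q != NULL && sum != NULL);
    while (q->len > 0) {
        *sum += q->packets[q->head];
        q->head = (q->head + 1) % RXQ_SIZE;
        q->len--;
        processed++;
    }
    q->bottom_half_pending = 0;
    return processed;
}
```

If five packets arrive before the bottom half runs, the fifth is dropped because the queue holds only four; this is why the bottom half has to be scheduled soon after the top half.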
Software Interrupts
When a system call or a hardware interrupt
terminates, the scheduler calls do_softirq().
do_softirq() schedules software interrupts for
execution.
A maximum of 32 software interrupts can be defined
in Linux.
- NET_RX_SOFTIRQ and NET_TX_SOFTIRQ are two of them.
Multiple software interrupts can run concurrently, and
hence need to be reentrant.
in_softirq() (include/asm/softirq.h) can be used to
check whether or not the current activity is a software
interrupt.
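The limit of 32 software interrupts fits a design in which one bit of a 32-bit word marks each pending software interrupt. The following user-space sketch illustrates that idea under stated assumptions: all toy_* names are hypothetical, the slot numbers are illustrative rather than the kernel's actual values, and the real do_softirq() is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_SOFTIRQS 32

/* Illustrative slot numbers (assumption, not the kernel's values). */
enum { TOY_NET_TX = 1, TOY_NET_RX = 2 };

typedef void (*toy_softirq_fn)(void);

toy_softirq_fn toy_handlers[TOY_NR_SOFTIRQS];
uint32_t toy_pending;   /* bit nr set means softirq nr is waiting to run */

/* Install the handler for one software interrupt. */
void toy_open_softirq(unsigned nr, toy_softirq_fn fn)
{
    assert(nr < TOY_NR_SOFTIRQS && fn != NULL);
    toy_handlers[nr] = fn;
}

/* Mark a software interrupt as pending; raising it twice still runs it once. */
void toy_raise_softirq(unsigned nr)
{
    assert(nr < TOY_NR_SOFTIRQS);
    toy_pending |= (uint32_t) 1 << nr;
}

/* Run every pending handler once, lowest number first; returns how many ran. */
int toy_do_softirq(void)
{
    uint32_t pending = toy_pending;
    unsigned nr;
    int ran = 0;

    toy_pending = 0;   /* handlers may raise new softirqs for the next pass */
    for (nr = 0; pending != 0; nr++, pending >>= 1) {
        if ((pending & 1u) && toy_handlers[nr] != NULL) {
            toy_handlers[nr]();
            ran++;
        }
    }
    return ran;
}

/* Sample handlers (hypothetical) that only count their invocations. */
int toy_rx_runs, toy_tx_runs;
void toy_net_rx_action(void) { toy_rx_runs++; }
void toy_net_tx_action(void) { toy_tx_runs++; }
```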
Tasklets
A more formal mechanism for scheduling software
interrupts (and other tasks).
- The macro DECLARE_TASKLET(name, func, data) declares a tasklet:
  - name: a name for the tasklet_struct data structure.
  - func: the tasklet's handling routine.
  - data: a pointer to private data to be passed to func().
- tasklet_schedule() schedules a tasklet for execution.
- tasklet_disable() stops a tasklet from running, even
  if it has been scheduled for execution.
- tasklet_enable() reactivates a deactivated tasklet.
Tasklet Example
#include <linux/interrupt.h>

/* Handling routine of the new tasklet */
void test_func(unsigned long);

/* Data of the new tasklet */
char test_data[] = "Hello, I am a test tasklet";

DECLARE_TASKLET(test_tasklet, test_func, (unsigned long) &test_data);

void test_func(unsigned long data)
{
        /* KERN_DEBUG is a string prefix, so no comma follows it */
        printk(KERN_DEBUG "%s\n", (char *) data);
}
....
tasklet_schedule(&test_tasklet);
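How tasklet_disable() and tasklet_enable() interact with a tasklet that is already scheduled can be modelled in user space. This is a simplified sketch with hypothetical toy_* names; it assumes a disable counter and a "scheduled" flag, and it leaves out everything related to multiple CPUs.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tasklet {
    void (*func)(unsigned long);
    unsigned long data;
    int scheduled;       /* set by toy_tasklet_schedule() */
    int disable_count;   /* greater than 0 while the tasklet is disabled */
};

void toy_tasklet_schedule(struct toy_tasklet *t) { t->scheduled = 1; }
void toy_tasklet_disable(struct toy_tasklet *t)  { t->disable_count++; }

void toy_tasklet_enable(struct toy_tasklet *t)
{
    assert(t->disable_count > 0);   /* every enable must match a disable */
    t->disable_count--;
}

/* Called at the point where pending tasklets would be processed.
 * Returns 1 if the handler ran, 0 otherwise. */
int toy_tasklet_run(struct toy_tasklet *t)
{
    assert(t != NULL && t->func != NULL);
    if (!t->scheduled || t->disable_count > 0)
        return 0;                   /* a disabled tasklet stays scheduled */
    t->scheduled = 0;
    t->func(t->data);
    return 1;
}

/* Sample handler (hypothetical): data carries a pointer to a hit counter. */
void toy_count_hits(unsigned long data) { (*(int *) data)++; }
```

A tasklet that is scheduled while disabled does not run, but it is not forgotten either: it runs on the first pass after toy_tasklet_enable().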
Locking
Locking -- spinlock
A mechanism for busy-wait locks.
- spin_lock_init(&my_spinlock)
  - Initializes the spinlock to the unlocked state.
- spin_lock(spinlock_t *my_spinlock)
  - Tries to set the spinlock my_spinlock. If it is not free, the
    caller busy-waits (tests repeatedly) until the lock is released.
- spin_unlock(spinlock_t *my_spinlock)
  - Releases the lock.
- spin_is_locked(spinlock_t *my_lock) returns the current state of
  the lock (a non-zero value means the lock is set).
- spin_trylock(spinlock_t *my_lock) sets the spinlock if it is
  currently unlocked and returns a non-zero value; otherwise, it
  returns zero immediately without waiting.
Spinlock Example
#include <linux/spinlock.h>

spinlock_t my_spinlock;

spin_lock_init(&my_spinlock);

// One thread
spin_lock(&my_spinlock);
// Critical section
spin_unlock(&my_spinlock);
....
// Another thread
spin_lock(&my_spinlock);
// Critical section
spin_unlock(&my_spinlock);
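The busy-wait principle behind a spinlock can be reproduced in user space with C11 atomics. The sketch below is an analogy, not the kernel implementation: toy_spinlock_t and the toy_spin_* functions are hypothetical names, and a C11 compiler with <stdatomic.h> is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_flag held;   /* set while some thread owns the lock */
} toy_spinlock_t;

/* Put the lock into the unlocked state. */
void toy_spin_lock_init(toy_spinlock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->held);
}

/* Busy-wait until the flag could be set by this caller. */
void toy_spin_lock(toy_spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
        ;   /* spin: the previous value was "set", so someone else holds it */
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was already held. */
int toy_spin_trylock(toy_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire);
}

/* Release the lock so that a spinning caller can proceed. */
void toy_spin_unlock(toy_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The test-and-set operation is atomic, so two callers can never both see the flag as clear; exactly one of them takes the lock and the other keeps spinning.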
Read-Write Spinlocks
Some data structures, such as the list of registered
network devices (dev_base), do not change
frequently but are subject to many read accesses.
A read-write spinlock improves run-time
performance in such cases.
read_lock(): if there is no lock or only read locks, then
the critical section can be entered immediately. If
there is a write lock, then we have to wait.
read_unlock(): a read activity leaves the critical
section. If a write activity is waiting and there is
no other read activity, it gains access.
write_lock(): if there is a (read or write) lock, we have to
wait; otherwise, we set an exclusive lock.
write_unlock(): a write activity leaves the critical
section and releases the exclusive lock.
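The admission rules of a read-write lock can be written down as a small state model. This is a single-threaded sketch of the rules only, with hypothetical toy_* names; the try functions return 0 where a real read_lock() or write_lock() would have to wait, and no atomic operations are used.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rwlock {
    int readers;   /* number of activities holding a read lock */
    int writer;    /* 1 while the exclusive write lock is held */
};

/* Readers are admitted unless a writer holds the lock. */
int toy_read_trylock(struct toy_rwlock *l)
{
    assert(l != NULL);
    if (l->writer)
        return 0;
    l->readers++;
    return 1;
}

void toy_read_unlock(struct toy_rwlock *l)
{
    assert(l != NULL && l->readers > 0);
    l->readers--;
}

/* A writer is admitted only if there is no lock of either kind. */
int toy_write_trylock(struct toy_rwlock *l)
{
    assert(l != NULL);
    if (l->writer || l->readers > 0)
        return 0;
    l->writer = 1;
    return 1;
}

void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l != NULL && l->writer);
    l->writer = 0;
}
```

With two readers inside, a writer is refused until both have left; once the writer is inside, both further readers and further writers are refused.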
Kernel Modules
- Each kernel module implements init_module() and
  cleanup_module().
- To load a kernel module into the kernel space
  manually, use insmod modulename.o [arguments].
  In turn, the following system calls are invoked:
  - sys_create_module() allocates memory to
    accommodate the module in the kernel space.
  - sys_get_kernel_syms() returns the kernel's symbol
    table so that the module's unresolved references
    to kernel symbols can be resolved.
  - sys_init_module() copies the module's object code into
    the kernel address space and calls the module's
    init_module().
- Example: insmod wvlan_cs eth=1 network_name="mywavelan"
Kernel Modules
rmmod modulename
- Removes the specified module from the kernel
  address space. To do so, it invokes the system call
  sys_delete_module(), which in turn calls
  cleanup_module().
lsmod lists all currently loaded modules together with
their dependencies and reference counts.
modinfo displays information about a
module. This information is set by macros such as
MODULE_DESCRIPTION and
MODULE_AUTHOR in the module's source.
Passing Module Parameters
MODULE_PARM(var, type) designates the variable
var as a parameter of the module; a value can be
assigned to this parameter during loading. Possible
types are:
- b: byte
- h: short (two bytes)
- i: integer
- l: long
- s: string
MODULE_PARM_DESC(var, desc) adds a
description (desc) for the parameter var.
MODULE_DESCRIPTION(desc) contains a
description of the module.
EXPORT_SYMBOL(name) exports a function or variable
of the kernel by adding it to the symbol table.
The Proc File System
All files in /proc are virtual files; they are
generated to export kernel
information to user space.
The files and directories are based on the
proc_dir_entry structure.
proc_dir_entry Structure
struct proc_dir_entry {
        unsigned short low_ino;   /* inode number; automatically filled in by proc_register */
        unsigned short namelen;   /* length of the file or directory name */
        const char *name;         /* pointer to the name of the file */
        mode_t mode;              /* the file's mode */
        nlink_t nlink;            /* number of links to this file (default = 1) */
        uid_t uid;
        gid_t gid;
        unsigned long size;       /* length of the file as shown when the directory is displayed */
        struct inode_operations *proc_iops;
        struct file_operations *proc_fops;
        get_info_t *get_info;     /* called as get_info(buffer, start, off, count) */
        struct module *owner;
        struct proc_dir_entry *next, *parent, *subdir; /* pointers that link the proc directory structure */
        void *data;               /* pointer to private data */
        read_proc_t *read_proc;   /* called as read_proc(buffer, start, off, count, eof, data) */
        write_proc_t *write_proc; /* called as write_proc(file, buffer, count, data) */
        atomic_t count;           /* use count */
        int deleted;              /* delete flag */
        kdev_t rdev;
};
Handling of /proc Entries
create_proc_entry(name, mode, parent):
creates a file with the given name in the proc directory;
returns a pointer to the proc_dir_entry
structure.
- The name is relative to /proc/.
test_entry = create_proc_entry("test", 0600, proc_net);
test_entry->nlink = 1;
test_entry->data = (void *) &test_data;
test_entry->read_proc = test_read_proc;
test_entry->write_proc = test_write_proc;
remove_proc_entry(name, parent) removes
the proc file specified in name.
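A read handler with the argument list (buffer, start, off, count, eof, data) has to cope with a reader that fetches the file in several pieces. The following user-space mock sketches one common way to do this. It is not kernel code: toy_read_proc and TOY_PAGE_SIZE are hypothetical, the page size of 4096 bytes is an assumption, and data is assumed to point to a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>   /* off_t */

#define TOY_PAGE_SIZE 4096   /* assumed size of the buffer handed to the handler */

/* Generate the whole file content into page, then hand back the window
 * [off, off + count). Returns the number of bytes available at *start and
 * sets *eof once the end of the content has been reached. */
int toy_read_proc(char *page, char **start, off_t off, int count,
                  int *eof, void *data)
{
    int len;

    assert(page != NULL && start != NULL && eof != NULL && data != NULL);
    assert(off >= 0 && count >= 0);

    len = snprintf(page, TOY_PAGE_SIZE, "%s\n", (const char *) data);
    assert(len >= 0);
    if (len >= TOY_PAGE_SIZE)
        len = TOY_PAGE_SIZE - 1;   /* content was truncated to the page */

    if (off >= len) {              /* reader is already past the end */
        *eof = 1;
        return 0;
    }
    *start = page + off;
    len -= (int) off;
    if (len > count)
        len = count;               /* more data remains for the next call */
    else
        *eof = 1;                  /* this call delivers the rest */
    return len;
}
```

Reading the 12-byte content "Hello, proc\n" with off = 0 and count = 100 delivers everything in one call and sets eof; reading with off = 7 and count = 3 delivers only "pro" and leaves eof untouched.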
Handling of /proc Entries
proc_mkdir(name, parent) creates a
directory in the proc directory; returns a
pointer to the proc_dir_entry structure.
create_proc_read_entry(name, mode, base,
get_info) creates the proc file name and
uses the function get_info() to handle read
accesses.
test_entry = create_proc_read_entry("test", 0600,
proc_net, test_get_info);
Memory Management
Reserving/Releasing Memory In the Kernel

kmalloc(size, priority): attempts to reserve
contiguous memory space with a size of size
bytes in the kernel memory.
- GFP_KERNEL: is used when the requesting
  activity can be interrupted (may sleep) during the reservation.
- GFP_ATOMIC: is used when the memory request
  must be atomic, i.e., the caller must not sleep
  (for example, in an interrupt handler).
kfree(objp): releases the memory space
reserved at address objp.
Reserving/Releasing Memory In the Kernel
copy_from_user(to, from, count) copies count
bytes from the address from in the user
address space to the address to in the kernel
address space.
copy_to_user(to,from,count) copies count
bytes from the address from in the kernel
address space to the address to in the user
address space.
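access_ok() checks that a user-space address range
is valid, i.e., that it lies within the user address
space. It does not guarantee that the corresponding
virtual memory pages actually reside in
physical memory.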
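The kind of range check that has to precede a copy from user space can be sketched as follows. This is a user-space model under stated assumptions: a 64-byte array plays the role of the user address space, addresses are offsets into it, and toy_range_ok() and toy_copy_from_user() are hypothetical names. Like copy_from_user(), the copy routine returns the number of bytes that could not be copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_USER_LIMIT 64   /* size of the simulated user address space (assumption) */

unsigned char toy_user_mem[TOY_USER_LIMIT];

/* Returns 1 if [addr, addr + size) lies entirely below limit, 0 otherwise.
 * Written so that addr + size is never computed and cannot overflow. */
int toy_range_ok(size_t addr, size_t size, size_t limit)
{
    if (size > limit)
        return 0;
    return addr <= limit - size;
}

/* Copies count bytes from simulated user address from into the buffer to.
 * Returns the number of bytes NOT copied (0 on success). */
size_t toy_copy_from_user(void *to, size_t from, size_t count)
{
    assert(to != NULL);
    if (!toy_range_ok(from, count, TOY_USER_LIMIT))
        return count;   /* refuse ranges that leave the user space */
    memcpy(to, toy_user_mem + from, count);
    return 0;
}
```

The comparison addr <= limit - size is used instead of addr + size <= limit because the sum could wrap around for very large addresses and make an invalid range look valid.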
Memory Caches
Linux allows us to create a cache of memory
spaces of a specific size: a slab cache.
- kmem_cache_create(name, size, offset, flags, ctor,
  dtor) creates a slab cache of memory spaces, each
  size bytes long.
  - name points to a string containing the name of the slab
    cache; offset is usually set to 0.
  - flags specifies additional options, e.g.,
    SLAB_HWCACHE_ALIGN (aligns the memory spaces to the cache
    line size of the first-level cache in the CPU).
  - ctor, dtor: specify a constructor and a destructor that are
    used to initialize or clean up the reserved
    memory spaces.
- Example: skbuff_head_cache = kmem_cache_create
  ("skbuff_head_cache", sizeof(struct sk_buff), 0,
  SLAB_HWCACHE_ALIGN, skb_headerinit, NULL);
Memory Caches
kmem_cache_destroy(cachep): releases the slab
cache cachep.
kmem_cache_shrink(cachep): is called by the kernel
when the kernel itself requires memory space and
has to reduce the cache.
kmem_cache_alloc(cachep,flags): is used to request
a memory space from the slab cache, cachep. If the
slab cache is empty, then kmalloc() is used to
reserve new memory space.
kmem_cache_free(cachep, ptr): frees the memory
space that begins at ptr, and gives it back to the
cache, cachep.
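The benefit of such a cache is that freed memory spaces are kept for reuse instead of being returned to the general allocator. The following user-space sketch illustrates the idea under stated assumptions: all toy_* names are hypothetical, malloc() stands in for whatever the kernel uses to grow the cache, and the number of retained objects is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 16   /* maximum number of retained free objects (assumption) */

struct toy_cache {
    size_t size;                        /* size of each object in bytes */
    void (*ctor)(void *);               /* run once per newly created object */
    void *free_objs[TOY_CACHE_SLOTS];   /* stack of objects ready for reuse */
    int nfree;
};

void toy_cache_init(struct toy_cache *c, size_t size, void (*ctor)(void *))
{
    assert(c != NULL && size > 0);
    c->size = size;
    c->ctor = ctor;
    c->nfree = 0;
}

/* Reuse a cached object if one is available; otherwise create a new one. */
void *toy_cache_alloc(struct toy_cache *c)
{
    void *obj;

    assert(c != NULL);
    if (c->nfree > 0)
        return c->free_objs[--c->nfree];   /* no constructor call on reuse */
    obj = malloc(c->size);
    if (obj != NULL && c->ctor != NULL)
        c->ctor(obj);
    return obj;
}

/* Give an object back to the cache; it must be in its constructed state. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    assert(c != NULL && obj != NULL);
    if (c->nfree < TOY_CACHE_SLOTS)
        c->free_objs[c->nfree++] = obj;
    else
        free(obj);                         /* cache is full */
}

/* Release every object the cache still holds. */
void toy_cache_destroy(struct toy_cache *c)
{
    assert(c != NULL);
    while (c->nfree > 0)
        free(c->free_objs[--c->nfree]);
}

/* Sample constructor (hypothetical): zeroes an int and counts its calls. */
int toy_ctor_calls;
void toy_zero_int(void *obj) { *(int *) obj = 0; toy_ctor_calls++; }
```

Allocating, freeing, and allocating again returns the same object without a second constructor call, which is the saving a slab cache aims for when objects such as struct sk_buff are created and released at a high rate.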
