5.3. Semaphores and Mutexes

So let us look at how we can add locking to scull. Our goal is to make our operations on the scull data structure atomic, meaning that the entire operation happens at once as far as other threads of execution are concerned. For our memory leak example, we need to ensure that if one thread finds that a particular chunk of memory must be allocated, it has the opportunity to perform that allocation before any other thread can make that test. To this end, we must set up critical sections: code that can be executed by only one thread at any given time.

Not all critical sections are the same, so the kernel provides different primitives for different needs. In this case, every access to the scull data structure happens in process context as a result of a direct user request; no accesses will be made from interrupt handlers or other asynchronous contexts. There are no particular latency (response time) requirements; application programmers understand that I/O requests are not usually satisfied immediately. Furthermore, scull is not holding any other critical system resource while it is accessing its own data structures. What all this means is that if the scull driver goes to sleep while waiting for its turn to access the data structure, nobody is going to mind.

"Go to sleep" is a well-defined term in this context. When a Linux process reaches a point where it cannot make any further progress, it goes to sleep (or "blocks"), yielding the processor to somebody else until some future time when it can get work done again. Processes often sleep when waiting for I/O to complete. As we get deeper into the kernel, we will encounter a number of situations where we cannot sleep. The write method in scull is not one of those situations, however. So we can use a locking mechanism that might cause the process to sleep while waiting for access to the critical section.

Just as importantly, we will be performing an operation (memory allocation with kmalloc) that could sleep, so sleeps are a possibility in any case. If our critical sections are to work properly, we must use a locking primitive that works when a thread that owns the lock sleeps. Not all locking mechanisms can be used where sleeping is a possibility (later in this chapter we'll see some that cannot). For our present needs, however, the mechanism that fits best is a semaphore.

Semaphores are a well-understood concept in computer science. At its core, a semaphore is a single integer value combined with a pair of functions that are typically called P and V. A process wishing to enter a critical section will call P on the relevant semaphore; if the semaphore's value is greater than zero, that value is decremented by one and the process continues. If, instead, the semaphore's value is 0 (or less), the process must wait until somebody else releases the semaphore. Unlocking a semaphore is accomplished by calling V; this function increments the value of the semaphore and, if necessary, wakes up processes that are waiting.
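The P and V semantics just described can be sketched in user space. This is not the kernel's implementation; it is a minimal model built on POSIX threads, and the names toy_sem, toy_P, toy_V, and toy_sem_value are hypothetical, chosen only for illustration:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model of a semaphore: one integer plus
 * the machinery needed to make waiters sleep. */
struct toy_sem {
    pthread_mutex_t lock;     /* protects value */
    pthread_cond_t  nonzero;  /* signaled when value is incremented */
    int             value;
};

void toy_sem_init(struct toy_sem *s, int val)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = val;
}

/* P: sleep until the value is greater than zero, then decrement it. */
void toy_P(struct toy_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* V: increment the value and wake one waiter, if there is one. */
void toy_V(struct toy_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

/* Read the current value (for inspection only). */
int toy_sem_value(struct toy_sem *s)
{
    int v;

    pthread_mutex_lock(&s->lock);
    v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

With an initial value of 1, the first caller of toy_P proceeds immediately and leaves the value at 0; any other caller sleeps until the holder calls toy_V.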

When semaphores are used for mutual exclusion (keeping multiple processes from running within a critical section simultaneously), their value will be initially set to 1. Such a semaphore can be held only by a single process or thread at any given time. A semaphore used in this mode is sometimes called a mutex, which is, of course, an abbreviation for "mutual exclusion." Almost all semaphores found in the Linux kernel are used for mutual exclusion.

5.3.1. The Linux Semaphore Implementation

The Linux kernel provides an implementation of semaphores that conforms to the above semantics, although the terminology is a little different. To use semaphores, kernel code must include <asm/semaphore.h>. The relevant type is struct semaphore; actual semaphores can be declared and initialized in a few ways. One is to create a semaphore directly, then set it up with sema_init:

void sema_init(struct semaphore *sem, int val);

where val is the initial value to assign to a semaphore.

Usually, however, semaphores are used in a mutex mode. To make this common case a little easier, the kernel has provided a set of helper functions and macros. Thus, a mutex can be declared and initialized with one of the following:

DECLARE_MUTEX(name);
DECLARE_MUTEX_LOCKED(name);

Here, the result is a semaphore variable (called name) that is initialized to 1 (with DECLARE_MUTEX) or 0 (with DECLARE_MUTEX_LOCKED). In the latter case, the mutex starts out in a locked state; it will have to be explicitly unlocked before any thread will be allowed access.

If the mutex must be initialized at runtime (which is the case if it is allocated dynamically, for example), use one of the following:

void init_MUTEX(struct semaphore *sem);
void init_MUTEX_LOCKED(struct semaphore *sem);

In the Linux world, the P function is called down, or some variation of that name. Here, "down" refers to the fact that the function decrements the value of the semaphore and, perhaps after putting the caller to sleep for a while to wait for the semaphore to become available, grants access to the protected resources. There are three versions of down:

void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_trylock(struct semaphore *sem);

down decrements the value of the semaphore and waits as long as need be. down_interruptible does the same, but the operation is interruptible. The interruptible version is almost always the one you will want; it allows a user-space process that is waiting on a semaphore to be interrupted by the user. You do not, as a general rule, want to use noninterruptible operations unless there truly is no alternative. Noninterruptible operations are a good way to create unkillable processes (the dreaded "D state" seen in ps) and annoy your users. Using down_interruptible requires some extra care, however: if the operation is interrupted, the function returns a nonzero value, and the caller does not hold the semaphore. Proper use of down_interruptible requires always checking the return value and responding accordingly.

The final version (down_trylock) never sleeps; if the semaphore is not available at the time of the call, down_trylock returns immediately with a nonzero return value.

Once a thread has successfully called one of the versions of down, it is said to be "holding" the semaphore (or to have "taken out" or "acquired" the semaphore). That thread is now entitled to access the critical section protected by the semaphore. When the operations requiring mutual exclusion are complete, the semaphore must be returned. The Linux equivalent to V is up:

void up(struct semaphore *sem);

Once up has been called, the caller no longer holds the semaphore.

As you would expect, any thread that takes out a semaphore is required to release it with one (and only one) call to up. Special care is often required in error paths; if an error is encountered while a semaphore is held, that semaphore must be released before returning the error status to the caller. Failure to free a semaphore is an easy error to make; the result (processes hanging in seemingly unrelated places) can be hard to reproduce and track down.
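The return-value conventions of the down variants can be tried out in user space with POSIX semaphores. The sketch below is only an analogue, not kernel code: sem_wait, sem_trywait, and sem_post are real POSIX calls, while the wrapper names toy_down_interruptible, toy_down_trylock, and toy_up are hypothetical and exist only to mirror the kernel's conventions (0 on success, nonzero when the semaphore was not obtained):

```c
#include <semaphore.h>

/* Analogue of down_interruptible: sem_wait() can fail (for example
 * with errno set to EINTR when a signal arrives). On a nonzero
 * return the caller does NOT hold the semaphore. */
int toy_down_interruptible(sem_t *sem)
{
    return sem_wait(sem) == 0 ? 0 : -1;
}

/* Analogue of down_trylock: never sleeps; returns nonzero
 * immediately if the semaphore is unavailable. */
int toy_down_trylock(sem_t *sem)
{
    return sem_trywait(sem) == 0 ? 0 : -1;
}

/* Analogue of up: release the semaphore exactly once per
 * successful acquisition. */
void toy_up(sem_t *sem)
{
    sem_post(sem);
}
```

A caller would initialize the semaphore to 1 with sem_init(&sem, 0, 1) to get mutex behavior. In every case the return value decides whether the critical section may be entered and whether toy_up must be called afterward.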

5.3.2. Using Semaphores in scull

The semaphore mechanism gives scull a tool that can be used to avoid race conditions while accessing the scull_dev data structure. But it is up to us to use that tool correctly. The keys to proper use of locking primitives are to specify exactly which resources are to be protected and to make sure that every access to those resources uses the proper locking. In our example driver, everything of interest is contained within the scull_dev structure, so that is the logical scope for our locking regime.

Let's look again at that structure:

struct scull_dev {
    struct scull_qset *data;  /* Pointer to first quantum set */
    int quantum;              /* the current quantum size */
    int qset;                 /* the current array size */
    unsigned long size;       /* amount of data stored here */
    unsigned int access_key;  /* used by sculluid and scullpriv */
    struct semaphore sem;     /* mutual exclusion semaphore */
    struct cdev cdev;         /* Char device structure */
};

Toward the bottom of the structure is a member called sem which is, of course, our semaphore. We have chosen to use a separate semaphore for each virtual scull device. It would have been equally correct to use a single, global semaphore. The various scull devices share no resources in common, however, and there is no reason to make one process wait while another process is working with a different scull device. Using a separate semaphore for each device allows operations on different devices to proceed in parallel and, therefore, improves performance.

Semaphores must be initialized before use. scull performs this initialization at load time in this loop:

    for (i = 0; i < scull_nr_devs; i++) {
        scull_devices[i].quantum = scull_quantum;
        scull_devices[i].qset = scull_qset;
        init_MUTEX(&scull_devices[i].sem);
        scull_setup_cdev(&scull_devices[i], i);
    }

Note that the semaphore must be initialized before the scull device is made available to the rest of the system. Therefore, init_MUTEX is called before scull_setup_cdev. Performing these operations in the opposite order would create a race condition where the semaphore could be accessed before it is ready.

Next, we must go through the code and make sure that no accesses to the scull_dev data structure are made without holding the semaphore. Thus, for example, scull_write begins with this code:

    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;

Note the check on the return value of down_interruptible; if it returns nonzero, the operation was interrupted. The usual thing to do in this situation is to return -ERESTARTSYS. Upon seeing this return code, the higher layers of the kernel will either restart the call from the beginning or return the error to the user. If you return -ERESTARTSYS, you must first undo any user-visible changes that might have been made, so that the right thing happens when the system call is retried. If you cannot undo things in this manner, you should return -EINTR instead.

scull_write must release the semaphore whether or not it was able to carry out its other tasks successfully. If all goes well, execution falls into the final few lines of the function:

  out:
    up(&dev->sem);
    return retval;

This code frees the semaphore and returns whatever status is called for. There are several places in scull_write where things can go wrong; these include memory allocation failures or a fault while trying to copy data from user space. In those cases, the code performs a goto out, ensuring that the proper cleanup is done.
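The single-exit pattern described above can be sketched in user space. The following is not the scull_write source; it is a hypothetical analogue using a POSIX semaphore, in which struct toy_dev, toy_dev_init, and toy_write are invented names and a NULL buffer stands in for a fault while copying from user space:

```c
#include <errno.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device: a buffer protected by a semaphore used as a mutex. */
struct toy_dev {
    sem_t  sem;
    char  *data;
    size_t size;
};

void toy_dev_init(struct toy_dev *dev)
{
    dev->data = NULL;
    dev->size = 0;
    sem_init(&dev->sem, 0, 1);  /* initial value 1: mutex mode */
}

/* Replace the device contents with buf. Every path taken after the
 * semaphore is acquired leaves through the "out" label. */
long toy_write(struct toy_dev *dev, const char *buf, size_t count)
{
    long retval = -ENOMEM;  /* default, used by the allocation failure path */
    char *copy;

    if (sem_wait(&dev->sem) != 0)
        return -EINTR;      /* semaphore not held: nothing to release */

    if (buf == NULL) {      /* stand-in for a failed copy from user space */
        retval = -EFAULT;
        goto out;
    }
    if (count == 0) {       /* nothing to store */
        retval = 0;
        goto out;
    }
    copy = malloc(count);
    if (copy == NULL)
        goto out;           /* retval is still -ENOMEM */

    memcpy(copy, buf, count);
    free(dev->data);
    dev->data = copy;
    dev->size = count;
    retval = (long)count;

  out:
    sem_post(&dev->sem);    /* released on success and on error alike */
    return retval;
}
```

Because the release happens in exactly one place, adding a new error check later only requires setting retval and jumping to out; there is no separate unlock call to forget.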

5.3.3. Reader/Writer Semaphores

Semaphores perform mutual exclusion for all callers, regardless of what each thread may want to do. Many tasks break down into two distinct types of work, however: tasks that only need to read the protected data structures and those that must make changes. It is often possible to allow multiple concurrent readers, as long as nobody is trying to make any changes. Doing so can optimize performance significantly; read-only tasks can get their work done in parallel without having to wait for other readers to exit the critical section.

The Linux kernel provides a special type of semaphore called a rwsem (or "reader/writer semaphore") for this situation. The use of rwsems in drivers is relatively rare, but they are occasionally useful.

Code using rwsems must include <linux/rwsem.h>. The relevant data type for reader/writer semaphores is struct rw_semaphore; an rwsem must be explicitly initialized at runtime with:

void init_rwsem(struct rw_semaphore *sem);

A newly initialized rwsem is available for the next task (reader or writer) that comes along. The interface for code needing read-only access is:

void down_read(struct rw_semaphore *sem);
int down_read_trylock(struct rw_semaphore *sem);
void up_read(struct rw_semaphore *sem);

A call to down_read provides read-only access to the protected resources, possibly concurrently with other readers. Note that down_read may put the calling process into an uninterruptible sleep. down_read_trylock will not wait if read access is unavailable; it returns nonzero if access was granted, 0 otherwise. Note that the convention for down_read_trylock differs from that of most kernel functions, where success is indicated by a return value of 0. An rwsem obtained with down_read must eventually be freed with up_read.

The interface for writers is similar:

void down_write(struct rw_semaphore *sem);
int down_write_trylock(struct rw_semaphore *sem);
void up_write(struct rw_semaphore *sem);
void downgrade_write(struct rw_semaphore *sem);

down_write, down_write_trylock, and up_write all behave just like their reader counterparts, except, of course, that they provide write access. If you have a situation where a writer lock is needed for a quick change, followed by a longer period of read-only access, you can use downgrade_write to allow other readers in once you have finished making changes.

An rwsem allows either one writer or an unlimited number of readers to hold the semaphore. Writers get priority; as soon as a writer tries to enter the critical section, no readers will be allowed in until all writers have completed their work. This implementation can lead to reader starvation (where readers are denied access for a long time) if you have a large number of writers contending for the semaphore. For this reason, rwsems are best used when write access is required only rarely, and writer access is held for short periods of time.
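The "one writer or many readers" rule can be observed in user space with a POSIX reader/writer lock. This is an analogue rather than the kernel rwsem: pthread_rwlock_tryrdlock, pthread_rwlock_trywrlock, and pthread_rwlock_unlock are real POSIX calls, but POSIX offers no counterpart to downgrade_write and leaves writer priority up to the implementation. The toy_ wrapper names are hypothetical; they follow the rwsem trylock convention of returning nonzero when access was granted:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Analogue of down_read_trylock: nonzero if read access was granted. */
int toy_down_read_trylock(pthread_rwlock_t *lock)
{
    return pthread_rwlock_tryrdlock(lock) == 0;
}

/* Analogue of down_write_trylock: nonzero if write access was granted. */
int toy_down_write_trylock(pthread_rwlock_t *lock)
{
    return pthread_rwlock_trywrlock(lock) == 0;
}

/* Analogue of up_read and up_write: POSIX uses one unlock call for both. */
void toy_up(pthread_rwlock_t *lock)
{
    pthread_rwlock_unlock(lock);
}
```

After pthread_rwlock_init(&lock, NULL), two read trylocks in a row can both succeed, while a write trylock fails until every reader has called toy_up; once a writer holds the lock, read trylocks fail in turn.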
