
Thread Synchronization with Kernel Objects

The wonderful thing about user-mode synchronization is that it is very fast. If you are concerned about your thread’s performance, you should first determine whether a user-mode thread synchronization mechanism will work for you.

Although user-mode thread synchronization mechanisms offer great performance, they do have limitations, and for many applications they simply do not work. For example, the interlocked family of functions operates only on single values and never places a thread into a wait state. You can use critical sections to place a thread in a wait state, but you can use them only to synchronize threads contained within a single process. Also, you can easily get into deadlock situations with critical sections because you cannot specify a timeout value while waiting to enter the critical section.

As you’ll see, kernel objects are far more versatile than the user-mode mechanisms. In fact, the only bad side to kernel objects is their performance. When you call any of the new functions mentioned in this post, the calling thread must transition from user mode to kernel mode. This transition is costly: it takes about 200 CPU cycles on the x86 platform for an empty system call—and this, of course, does not include the execution of the kernel-mode code that actually implements the function your thread is calling. But what takes several orders of magnitude more is the overhead of scheduling a new thread with all the cache flushes/misses it entails. Here we’re talking about tens of thousands of cycles.

Wait functions cause a thread to voluntarily place itself into a wait state until a specific kernel object becomes signaled. Notice that the thread is not placed into a wait state if the kernel object is signaled when a Wait function is called. By far, the most common of these functions is WaitForSingleObject.

WaitForSingleObject’s return value indicates why the calling thread became schedulable again. If the object the thread is waiting on became signaled, the return value is WAIT_OBJECT_0; if the timeout expires, the return value is WAIT_TIMEOUT. If you pass a bad parameter (such as an invalid handle) to WaitForSingleObject, the return value is WAIT_FAILED (call GetLastError for more information).

The WaitForMultipleObjects function is similar to WaitForSingleObject except that it allows the calling thread to check the signaled state of several kernel objects simultaneously.

You can use WaitForMultipleObjects in two different ways—to allow a thread to enter a wait state until any one of the specified kernel objects becomes signaled, or to allow a thread to wait until all the specified kernel objects become signaled. The bWaitAll parameter tells the function which way you want it to work. If you pass TRUE for this parameter, the function does not allow the calling thread to execute until all the objects have become signaled.

The WaitForMultipleObjects function’s return value tells the caller why it got rescheduled. The possible return values include WAIT_FAILED and WAIT_TIMEOUT, which are self-explanatory. If you pass TRUE for bWaitAll and all the objects become signaled, the return value is WAIT_OBJECT_0. If you pass FALSE for bWaitAll, the function returns as soon as any of the objects becomes signaled. In this case, you probably want to know which object became signaled. The return value is a value between WAIT_OBJECT_0 and (WAIT_OBJECT_0 + dwCount - 1). In other words, if the return value is not WAIT_TIMEOUT and is not WAIT_FAILED, you should subtract WAIT_OBJECT_0 from the return value. The resulting number is an index into the array of handles that you passed as the second parameter to WaitForMultipleObjects. The index tells you which object became signaled.

If you pass FALSE for the bWaitAll parameter, WaitForMultipleObjects scans the handle array from index 0 on up, and the first object that is signaled satisfies the wait. This can have some undesirable ramifications. For example, your thread might be waiting for three child processes to terminate by passing three process handles to this function. If the process at index 0 in the array terminates, WaitForMultipleObjects returns. Now the thread can do whatever it needs to and then loop back around, waiting for another process to terminate. If the thread passes the same three handles, the function returns immediately with WAIT_OBJECT_0 again. Unless you remove the handles that you’ve already received notifications from, your code will not work correctly.

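When several threads wait on the same kernel object, the system generally wakes them in “first in, first out” order, but this order is not guaranteed. While you debug a process, all threads within that process are suspended when breakpoints are hit. So debugging a process makes the “first in, first out” algorithm highly unpredictable because threads are frequently suspended and resumed.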

Of all the kernel objects, events are by far the most primitive. They contain a usage count (as all kernel objects do), a Boolean value indicating whether the event is an auto-reset or manual-reset event, and another Boolean value indicating whether the event is signaled or nonsignaled.

Events signal that an operation has completed. There are two different types of event objects: manual-reset events and auto-reset events. When a manual-reset event is signaled, all threads waiting on the event become schedulable. When an auto-reset event is signaled, only one of the threads waiting on the event becomes schedulable.

Read about synchronization object security and access rights from this link.

PulseEvent makes an event signaled and then immediately nonsignaled; it’s just like calling SetEvent immediately followed by ResetEvent. If you call PulseEvent on a manual-reset event, any and all threads waiting on the event when it is pulsed become schedulable. If you call PulseEvent on an auto-reset event, only one waiting thread becomes schedulable. If no threads are waiting on the event when it is pulsed, there is no effect. Note that Microsoft’s documentation describes PulseEvent as unreliable and recommends against using it.

Waitable timers are kernel objects that signal themselves at a certain time or at regular intervals. They are most commonly used to have some operation performed at a certain time.

SetWaitableTimer expects a pointer to a LARGE_INTEGER, which must be aligned on a 64-bit boundary, whereas a FILETIME structure needs only 32-bit alignment. The x86 processors deal with unaligned data references silently, so passing the address of a FILETIME to SetWaitableTimer always works when your application is running on an x86 CPU. However, other processors do not handle unaligned references as silently. In fact, most other processors raise an EXCEPTION_DATATYPE_MISALIGNMENT exception that, if unhandled, causes your process to terminate. Alignment errors are the biggest cause of problems when you port code that works on x86 computers to other processors. If you pay attention to alignment issues now, you can save months of porting effort later!

SetWaitableTimer’s last parameter, bResume (named fResume in the Windows SDK documentation), is useful for computers that support suspend and resume. Usually, you pass FALSE for this argument. However, if you’re writing a meeting-planner type of application in which you want to set timers that remind the user of scheduled meetings, you should pass TRUE. When the timer goes off, it takes the machine out of suspend mode (if it’s in suspend mode) and wakes up the threads that are waiting on the timer. The application then plays a sound and presents a message box telling the user of the upcoming meeting. If you pass FALSE for the bResume parameter, the timer object becomes signaled but any threads that it wakes up do not get CPU time until the machine is somehow resumed (usually by the user waking it up).

 

Semaphore Kernel Object

Semaphore kernel objects are used for resource counting. They contain a usage count, as all kernel objects do, but they also contain two additional signed 32-bit values: a maximum resource count and a current resource count. The maximum resource count identifies the maximum number of resources that the semaphore can control; the current resource count indicates the number of these resources that are currently available.

 

Mutex Kernel Object

Mutex kernel objects ensure that a thread has mutually exclusive access to a single resource. In fact, this is how the mutex got its name. A mutex object contains a usage count, thread ID, and recursion counter. Mutexes behave identically to critical sections. However, mutexes are kernel objects, while critical sections are user-mode synchronization objects (except if contention is high). This means that mutexes are slower than critical sections. But it also means that threads in different processes can access a single mutex, and it means that a thread can specify a timeout value while waiting to gain access to a resource.

The thread ID identifies which thread in the system currently owns the mutex, and the recursion counter indicates the number of times that this thread owns the mutex. Mutexes have many uses and are among the most frequently used kernel objects. Typically, they are used to guard a block of memory that is accessed by multiple threads. If multiple threads were to update the memory block simultaneously, the data in the block would be corrupted. Mutexes ensure that any thread accessing the memory block has exclusive access to the block so that the integrity of the data is maintained.

For mutexes, there is one special exception to the normal kernel object signaled/nonsignaled rules. Let’s say that a thread attempts to wait on a nonsignaled mutex object. In this case, the thread is usually placed in a wait state. However, the system checks to see whether the thread attempting to acquire the mutex has the same thread ID as recorded inside the mutex object. If the thread IDs match, the system allows the thread to remain schedulable—even though the mutex was nonsignaled. We don’t see this “exceptional” behavior applied to any other kernel object anywhere in the system. Every time a thread successfully waits on a mutex, the object’s recursion counter is incremented. The only way the recursion counter can have a value greater than 1 is if the thread waits on the same mutex multiple times, taking advantage of this rule exception.

Device objects are synchronizable kernel objects, which means that you can call WaitForSingleObject, passing the handle of a file, socket, communication port, and so on. While the system performs the asynchronous I/O, the device object is in the nonsignaled state. As soon as the operation is complete, the system changes the state of the object to signaled so that the thread knows that the operation has completed. At this point, the thread continues execution.

Categories: Win APIs
  1. December 11, 2010 at 9:39 pm | #1

    Abdelrahman, you’re the best. Im a fan forever I just downloaded some of you’re research work.
