Reentrancy and Thread-Safety Concepts

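Source: Baidu Wenku  Editor: 偶看新闻  Time: 2024/05/02 19:35:24
In multithreaded code, or wherever exceptional control flow exists, a function may be interrupted partway through, and control (that is, the current instruction sequence) may transfer to another function. That "other function" may well be the same function.

If nothing goes wrong in that situation, meaning that no data or state is corrupted and the behavior stays deterministic, the function is called "reentrant". Put another way: a function is reentrant if, for the same valid arguments (including the no-argument case), repeated calls behave predictably, that is, the behavior is consistent or the result is the same. A function that cannot guarantee this is non-reentrant. Reentrancy and thread safety are two different concepts: a reentrant function is always thread-safe; a thread-safe function may or may not be reentrant; a function that is not thread-safe is never reentrant.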
Terms used below: reentrant function, thread-safe function.
Reentrancy and thread safety are not the same concept.
Reentrant => thread-safe

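A reentrant function solves its problem by not using static or global data internally, not returning static or global data, and not calling non-reentrant functions.

A thread-safe function solves the problem of resource conflicts when several threads call the function.

A function that uses static variables can be made thread-safe by adding a lock, but it may still not be reentrant; strtok is an example.

strtok is neither reentrant nor thread-safe.

strtok wrapped in a lock is thread-safe but still not reentrant.

strtok_r is both reentrant and thread-safe.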
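As an illustration of why the caller-owned state in strtok_r matters, here is a minimal sketch that tokenizes at two levels at once. The function name count_fields and the record format are assumptions made for this example; the same nesting written with strtok would lose the outer position as soon as the inner loop started.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the fields in text such as "a=1,b=2;c=3",
 * where ';' separates records and ',' separates fields. The nested loops
 * only work because each strtok_r call keeps its position in a save
 * pointer owned by the caller. The input is modified in place. */
int count_fields(char *text)
{
    int fields = 0;
    char *outer_state = NULL;
    char *inner_state = NULL;

    assert(text != NULL);
    for (char *record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        for (char *field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            fields++;
        }
    }
    return fields;
}
```

Because strtok_r writes terminators into its input, the argument has to be a writable array rather than a string literal.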
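3. The difference between reentrant and thread-safe functions

Whether a function is reentrant has nothing to do with multithreading. A reentrant function must keep working correctly even when the same process (or thread) enters it several times at once.
This requirement also implies that, in a multithreaded environment, the function works correctly when two different threads enter it at the same time.

Thread safety, by contrast, is about multithreading: it only requires that simultaneous calls to the function from two different threads be logically correct.

As the description above shows, reentrancy is a stricter requirement than thread safety. A reentrant function is necessarily thread-safe, but a thread-safe function is not necessarily reentrant.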
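Conclusions:
1. Reentrancy is a very strict requirement, and the vast majority of functions are not reentrant (APUE has a list of reentrant functions).
When do we need a reentrant function? Only when a function has to be entered two or more times within the same thread. The main cases are asynchronous signal handling, recursive functions, and so on. (A non-reentrant recursive function does not necessarily fail; whether it fails depends on how the function is defined and used.) Most of the time we do not need a function to be reentrant.

2. In a multithreaded environment, if all that is required is that several threads can call a function at the same time, it is enough for the function to be thread-safe.
Most of the functions we commonly use are thread-safe; if unsure, check the relevant documentation.

3. The essential difference between reentrant and thread-safe is that a reentrant function must execute correctly even when it is entered two or more times, at arbitrary points, within the same thread.

The commonly used malloc is a typical function that is non-reentrant but thread-safe. This means we can freely call malloc from several threads at once, but calling malloc inside a signal handler is very dangerous.

4. A reentrant function is certainly thread-safe; in other words, a function that is not thread-safe is certainly non-reentrant.
A non-reentrant function cannot be made reentrant simply by adding a lock.
"A function that uses global variables is not necessarily non-reentrant." This statement is correct, as long as the variables are used properly,
but avoiding global variables is the simple way to write a reentrant function.
"A function that calls a non-reentrant function is not necessarily non-reentrant." This statement is wrong,
because you cannot guarantee that the non-reentrant callee will not be reentered.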

=================================================================

In the early days of programming, non-reentrancy was not a threat to programmers; functions did not have concurrent access and there were no interrupts. In many older implementations of the C language, functions were expected to work in an environment of single-threaded processes.

Now, however, concurrent programming is common practice, and you need to be aware of the pitfalls. This article describes some potential problems due to non-reentrancy of the function in parallel and concurrent programming. Signal generation and handling in particular add extra complexity. Due to the asynchronous nature of signals, it is difficult to point out the bug caused when a signal-handling function triggers a non-reentrant function.

This article:

  • Defines reentrancy and points to the POSIX list of reentrant functions
  • Provides examples to show problems caused by non-reentrancy
  • Suggests ways to ensure reentrancy of the underlying function
  • Discusses dealing with reentrancy at the compiler level

What is reentrancy?

A reentrant function is one that can be used by more than one task concurrently without fear of data corruption. Conversely, a non-reentrant function is one that cannot be shared by more than one task unless mutual exclusion to the function is ensured either by using a semaphore or by disabling interrupts during critical sections of code. A reentrant function can be interrupted at any time and resumed at a later time without loss of data. Reentrant functions either use local variables or protect their data when global variables are used.

A reentrant function:

  • Does not hold static data over successive calls
  • Does not return a pointer to static data; all data is provided by the caller of the function
  • Uses local data or ensures protection of global data by making a local copy of it
  • Must not call any non-reentrant functions

Don't confuse reentrance with thread-safety. From the programmer's perspective, these two are separate concepts: a function can be reentrant, thread-safe, both, or neither. Non-reentrant functions cannot be used by multiple threads. Moreover, it may be impossible to make a non-reentrant function thread-safe.

IEEE Std 1003.1 lists 118 reentrant UNIX functions, which aren't duplicated here. See Resources for a link to the list at unix.org.

The rest of the functions are non-reentrant because of any of the following:

  • They call malloc or free
  • They are known to use static data structures
  • They are part of the standard I/O library

Signals and non-reentrant functions

A signal is a software interrupt. It empowers a programmer to handle an asynchronous event. To send a signal to a process, the kernel sets a bit in the signal field of the process table entry, corresponding to the type of signal received. The ANSI C prototype of a signal function is:

void (*signal (int sigNum, void (*sigHandler)(int))) (int);

Or, in another representation:

typedef void sigHandler(int);
sigHandler *signal(int, sigHandler *);

When a signal that is being caught is handled by a process, the normal sequence of instructions being executed by the process is temporarily interrupted by the signal handler. The process then continues executing, but the instructions in the signal handler are now executed. If the signal handler returns, the process continues executing the normal sequence of instructions it was executing when the signal was caught.

Now, in the signal handler you can't tell what the process was executing when the signal was caught. What if the process was in the middle of allocating additional memory on its heap using malloc, and you call malloc from the signal handler? Or, you call some function that was in the middle of the manipulation of the global data structure and you call the same function from the signal handler. In the case of malloc, havoc can result for the process, because malloc usually maintains a linked list of all its allocated area and it may have been in the middle of changing this list.

An interrupt can even be delivered between the beginning and end of a C operator that requires multiple instructions. At the programmer level, the instruction may appear atomic (that is, cannot be divided into smaller operations), but it might actually take more than one processor instruction to complete the operation. For example, take this piece of C code:

temp += 1;

On an x86 processor, that statement might compile to:

mov ax,[temp]
inc ax
mov [temp],ax

This is clearly not an atomic operation.

This example shows what can happen if a signal handler runs in the middle of modifying a variable:


Listing 1. Running a signal handler while modifying a variable
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

struct two_int { int a, b; } data;

void signal_handler(int signum){
    (void)signum;
    printf ("%d, %d\n", data.a, data.b);
    alarm (1);
}

int main (void){
    static struct two_int zeros = { 0, 0 }, ones = { 1, 1 };

    signal (SIGALRM, signal_handler);

    data = zeros;

    alarm (1);

    while (1)
        {data = zeros; data = ones;}
}

This program fills data with zeros, ones, zeros, ones, and so on, alternating forever. Meanwhile, once per second, the alarm signal handler prints the current contents. (Calling printf in the handler is safe in this program, because it is certainly not being called outside the handler when the signal happens.) What output do you expect from this program? It should print either 0, 0 or 1, 1. But the actual output is as follows:

0, 0
1, 1

(Skipping some output...)

0, 1
1, 1
1, 0
1, 0
...

On most machines, it takes several instructions to store a new value in data, and the value is stored one word at a time. If the signal is delivered between these instructions, the handler might find that data.a is 0 and data.b is 1, or vice versa. On the other hand, if we compile and run this code on a machine where it is possible to store an object's value in one instruction that cannot be interrupted, then the handler will always print 0, 0 or 1, 1.

Another complication with signals is that, just by running test cases you can't be sure that your code is signal-bug free. This complication is due to the asynchronous nature of signal generation.


Non-reentrant functions and static variables

Suppose that the signal handler uses gethostbyname, which is non-reentrant. This function returns its value in a static object:

static struct hostent host; /* result stored here*/

And it reuses the same object each time. In the following example, if the signal happens to arrive during a call to gethostbyname in main, or even after a call while the program is still using the value, it will clobber the value that the program asked for.


Listing 2. Risky use of gethostbyname
void sig_handler(int signum);

int main(void){
    struct hostent *hostPtr;
    ...
    signal(SIGALRM, sig_handler);
    ...
    hostPtr = gethostbyname(hostNameOne);
    ...
}

void sig_handler(int signum){
    struct hostent *hostPtr;
    ...
    /* call to gethostbyname may clobber the value stored during the call
       inside the main() */
    hostPtr = gethostbyname(hostNameTwo);
    ...
}

However, if the program does not use gethostbyname or any other function that returns information in the same object, or if it always blocks signals around each use, you're safe.

Many library functions return values in a fixed object, always reusing the same object, and they can all cause the same problem. If a function uses and modifies an object that you supply, it is potentially non-reentrant; two calls can interfere if they use the same object.

A similar case arises when you do I/O using streams. Suppose the signal handler prints a message with fprintf and the program was in the middle of an fprintf call using the same stream when the signal was delivered. Both the signal handler's message and the program's data could be corrupted, because both calls operate on the same data structure: the stream itself.
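A common way around the shared-stream problem is to bypass stdio in the handler and call write directly, since POSIX lists write among the async-signal-safe functions. The following is a minimal sketch under that assumption; install_reporter, on_alarm, and the message text are hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Descriptor the handler writes to; set before the handler is installed. */
static volatile sig_atomic_t report_fd = STDERR_FILENO;
static const char report_msg[] = "alarm fired\n";

static void on_alarm(int signum)
{
    (void)signum;
    /* write() touches no stdio stream state, so it cannot corrupt an
     * fprintf call that the signal happened to interrupt. */
    ssize_t ignored = write(report_fd, report_msg, sizeof report_msg - 1);
    (void)ignored;
}

/* Hypothetical helper: routes the handler's message to fd and installs
 * the SIGALRM handler. Returns 0 on success, -1 on failure. */
int install_reporter(int fd)
{
    struct sigaction sa;

    report_fd = fd;
    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}
```

The message is a fixed string prepared in advance; anything that needs formatting would have to be built without printf-family calls.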

Things become even more complicated when you're using a third-party library, because you never know which parts of the library are reentrant and which are not. As with the standard library, there can be many library functions that return values in fixed objects, always reusing the same objects, which causes the functions to be non-reentrant.

The good news is, these days many vendors have taken the initiative to provide reentrant versions of the standard C library. You'll need to go through the documentation provided with any given library to know if there is any change in the prototypes and therefore in the usage of the standard library functions.


Practices to ensure reentrancy

Sticking to these five best practices will help you maintain reentrancy in your programs.

Practice 1

Returning a pointer to static data may cause a function to be non-reentrant. For example, a strToUpper function, converting a string to uppercase, could be implemented as follows:


Listing 3. Non-reentrant version of strToUpper
#include <ctype.h>

/* STRING_SIZE_LIMIT is assumed to be defined elsewhere */
char *strToUpper(char *str)
{
    /* Returning pointer to static data makes it non-reentrant */
    static char buffer[STRING_SIZE_LIMIT];
    int index;

    for (index = 0; str[index]; index++)
        buffer[index] = toupper((unsigned char)str[index]);
    buffer[index] = '\0';
    return buffer;
}

You can implement the reentrant version of this function by changing the prototype of the function. In this listing, the caller provides storage for the output string:


Listing 4. Reentrant version of strToUpper
char *strToUpper_r(char *in_str, char *out_str)
{
    int index;

    for (index = 0; in_str[index] != '\0'; index++)
        out_str[index] = toupper((unsigned char)in_str[index]);
    out_str[index] = '\0';

    return out_str;
}

Providing output storage by the calling function ensures the reentrancy of the function. Note that this follows a standard convention for the naming of reentrant functions by suffixing the function name with "_r".
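Once the caller owns the output buffer, it is natural for the caller to state its size as well. The sketch below is a hypothetical bounded variant (the name strToUpper_rn and the out_size parameter are assumptions, not part of the article) that truncates rather than writing past the end of the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant of strToUpper_r: the caller supplies both
 * the output buffer and its size, so nothing is shared between calls.
 * Output is truncated to fit. Returns out_str, or NULL if out_size is 0. */
char *strToUpper_rn(const char *in_str, char *out_str, size_t out_size)
{
    size_t index;

    assert(in_str != NULL && out_str != NULL);
    if (out_size == 0)
        return NULL;

    for (index = 0; in_str[index] != '\0' && index + 1 < out_size; index++)
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    out_str[index] = '\0';
    return out_str;
}
```

Two callers, or a caller and a signal handler, can each pass their own buffer and never see each other's result, which is exactly what the static buffer in Listing 3 could not offer.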

Practice 2

Remembering the state of the data makes the function non-reentrant. Different threads can successively call the function and modify the data without informing the other threads that are using the data. If a function needs to maintain the state of some data over successive calls, such as a working buffer or a pointer, the caller should provide this data.

In the following example, a function returns the successive lowercase characters of a string. The string is provided only on the first call, as with the strtok subroutine. The function returns \0 when it reaches the end of the string. The function could be implemented as follows:


Listing 5. Non-reentrant version of getLowercaseChar
char getLowercaseChar(char *str)
{
    static char *buffer;
    static int index;
    char c = '\0';
    /* stores the working string on first call only */
    if (str != NULL) {
        buffer = str;
        index = 0;
    }

    /* searches a lowercase character */
    while ((c = buffer[index]) != '\0') {
        if (islower((unsigned char)c))
        {
            index++;
            break;
        }
        index++;
    }

    return c;
}

This function is not reentrant, because it stores the state of the variables. To make it reentrant, the static data, the index variable, needs to be maintained by the caller. The reentrant version of the function could be implemented like this:


Listing 6. Reentrant version of getLowercaseChar
char getLowercaseChar_r(char *str, int *pIndex)
{
    char c = '\0';

    /* no initialization - the caller should have done it */

    /* searches a lowercase character */
    while ((c = str[*pIndex]) != '\0') {
        if (islower((unsigned char)c))
        {
            (*pIndex)++;
            break;
        }
        (*pIndex)++;
    }
    return c;
}

Practice 3

On most systems, malloc and free are not reentrant, because they use a static data structure that records which memory blocks are free. As a result, no library functions that allocate or free memory are reentrant. This includes functions that allocate space to store a result.

The best way to avoid the need to allocate memory in a handler is to allocate, in advance, space for signal handlers to use. The best way to avoid freeing memory in a handler is to flag or record the objects to be freed and have the program check from time to time whether anything is waiting to be freed. But this must be done with care, because placing an object on a chain is not atomic, and if it is interrupted by another signal handler that does the same thing, you could "lose" one of the objects. However, if you know that the program cannot possibly use the stream that the handler uses at a time when signals can arrive, you are safe. There is no problem if the program uses some other stream.
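The "flag it now, free it later" idea can be sketched as follows. This is a simplified illustration with a single object and a single flag, so it sidesteps the chain-insertion race mentioned above; the names cache, release_requested, setup_cache, and poll_release are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

static char *cache;                               /* owned by normal code    */
static volatile sig_atomic_t release_requested;   /* set only by the handler */

static void on_usr1(int signum)
{
    (void)signum;
    release_requested = 1;   /* record the request; never call free() here */
}

/* Allocates the cache up front and installs the handler. */
int setup_cache(size_t size)
{
    struct sigaction sa;

    sa.sa_handler = on_usr1;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    cache = malloc(size);
    return cache != NULL ? 0 : -1;
}

/* Called by the program "from time to time". Returns 1 if the cache was
 * freed on this call, 0 if nothing was waiting. */
int poll_release(void)
{
    if (!release_requested)
        return 0;
    release_requested = 0;
    free(cache);
    cache = NULL;
    return 1;
}

int cache_is_allocated(void)
{
    return cache != NULL;
}
```

The handler's only side effect is a store to a volatile sig_atomic_t, which is the one kind of object the C standard guarantees a handler may write safely.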

Practice 4

To write bug-free code, practice care in handling process-wide global variables like errno and h_errno. Consider the following code:


Listing 7. Risky use of errno
if (close(fd) < 0) {
fprintf(stderr, "Error in close, errno: %d", errno);
exit(1);
}

Suppose a signal is generated during the very small time gap between setting the errno variable by the close system call and its return. The generated signal can change the value of errno, and the program behaves unexpectedly.

Saving and restoring the value of errno in the signal handler, as follows, can resolve the problem:


Listing 8. Saving and restoring the value of errno
void signalHandler(int signo){
int errno_saved;

/* Save the error no. */
errno_saved = errno;

/* Let the signal handler complete its job */
...
...

/* Restore the errno*/
errno = errno_saved;
}
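A filled-in version of that outline might look like the sketch below. The handler deliberately makes a failing call so that errno really is overwritten inside it; errno_after_signal is a hypothetical driver, and the example assumes that a successful raise leaves errno untouched, which holds on common implementations but is not something POSIX promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static void on_usr1(int signum)
{
    int errno_saved = errno;   /* save on entry */

    (void)signum;
    /* Deliberate failure: closing an invalid descriptor sets errno
     * (EBADF), standing in for whatever work a real handler does. */
    (void)close(-1);

    errno = errno_saved;       /* restore on exit */
}

/* Hypothetical driver: sets errno to `initial`, lets the handler run,
 * and returns the errno value the interrupted code sees afterwards. */
int errno_after_signal(int initial)
{
    struct sigaction sa;

    sa.sa_handler = on_usr1;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    errno = initial;
    raise(SIGUSR1);
    return errno;
}
```

Without the save and restore, the caller would read EBADF instead of the value it had set, which is the failure mode Listing 7 is exposed to.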

Practice 5

If the underlying function is in the middle of a critical section and a signal is generated and handled, this can cause the function to be non-reentrant. By using signal sets and a signal mask, the critical region of code can be protected from a specific set of signals, as follows:

  1. Save the current set of signals.
  2. Mask the signal set with the unwanted signals.
  3. Let the critical section of code complete its job.
  4. Finally, reset the signal set.

Here is an outline of this practice:


Listing 9. Using signal sets and signal masks
sigset_t newmask, oldmask, zeromask;
...
/* Register the signal handler */
signal(SIGALRM, sig_handler);

/* Initialize the signal sets */
sigemptyset(&newmask); sigemptyset(&zeromask);

/* Add the signal to the set */
sigaddset(&newmask, SIGALRM);

/* Block SIGALRM and save current signal mask in set variable 'oldmask'
*/
sigprocmask(SIG_BLOCK, &newmask, &oldmask);

/* The protected code goes here
...
...
*/

/* Now allow all signals and pause */
sigsuspend(&zeromask);

/* Resume to the original signal mask */
sigprocmask(SIG_SETMASK, &oldmask, NULL);

/* Continue with other parts of the code */

Skipping sigsuspend(&zeromask); can cause a problem. There has to be some gap of clock cycles between the unblocking of signals and the next instruction carried by the process, and any occurrence of a signal in this window of time is lost. The function call sigsuspend resolves this problem by resetting the signal mask and putting the process to sleep in a single atomic operation. If you are sure that signal generation in this window of time won't have any adverse effects, you can skip sigsuspend and go directly to resetting the signal.


Dealing with reentrancy at the compiler level

I would like to propose a model for dealing with reentrant functions at the compiler level. A new keyword, reentrant, can be introduced for the high-level language, and functions can be given a reentrant specifier that will ensure that the functions are reentrant, like so:

reentrant int foo();

This directive instructs the compiler to give special treatment to that particular function. The compiler can store this directive in its symbol table and use it during the intermediate code generation phase. To accomplish this, some design changes are required in the compiler's front end. This reentrant specifier follows these guidelines:

  1. Does not hold static data over successive calls
  2. Protects global data by making a local copy of it
  3. Must not call non-reentrant functions
  4. Does not return a reference to static data, and all data is provided by the caller of the function

Guideline 1 can be ensured by type checking and throwing an error message if there is any static storage declaration in the function. This can be done during the semantic analysis phase of the compilation.

Guideline 2, protection of global data, can be ensured in two ways. The primitive way is by throwing an error message if the function modifies global data. A more sophisticated technique is to generate intermediate code in such a way that the global data doesn't get mangled. An approach similar to Practice 4, above, can be implemented at the compiler level. On entering the function, the compiler can store the to-be-manipulated global data using a compiler-generated temporary name, then restore the data upon exiting the function. Storing data using a compiler-generated temporary name is normal practice for the compiler.

Ensuring guideline 3 requires the compiler to have prior knowledge of all the reentrant functions, including the libraries used by the application. This additional information about the function can be stored in the symbol table.

Finally, guideline 4 is already guaranteed by guideline 1. There is no question of returning a reference to static data if the function doesn't have any.

This proposed model would make the programmer's job easier in following the guidelines for reentrant functions, and by using this model, code would be protected against the unintentional reentrancy bug.


Resources

  • You can read or download IEEE Std 1003.1 from unix.org, a Web site of The Open Group (registration is required to view or download the document).

  • Starting with Synchronization is not the enemy (developerWorks, July 2001), this series of three articles covers issues of threading and concurrency when programming in the Java language.

  • PowerPC developers will appreciate the insights presented in Save your code from meltdown using PowerPC atomic instructions (developerWorks, November 2004); it describes techniques for safe concurrent programming in PowerPC assembly language.

  • Good background for UNIX programmers includes UNIX Network Programming by W. Richard Stevens and Design of the Unix Operating System by Maurice J. Bach.

  • Find more resources for Linux developers in the developerWorks Linux zone.

  • Get involved in the developerWorks community by participating in developerWorks blogs.

  • Browse for books on these and other technical topics.

About the author

Dipak provides Level 3 support for Distributed File System (DFS). His work involves kernel- and user-level debugging of dumps and crashes, as well as fixing the reported bugs on the AIX and Solaris platforms. Contact Dipak at dipakjha@in.ibm.com.