One more round on the parallel programming theme. There has been some recent discussion on the gcc and LKML mailing lists about an interesting case.
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int acquires_count = 0;int
trylock()
{
int res;res = pthread_mutex_trylock(&mutex);
if (res == 0)
++acquires_count;return res;
}
On x86 processors, current gcc will optimize this to use a comparison, an add with carry flag, and an unconditional store to acquires_count
. This eliminates a branch, but it means that the variable is written to even if the lock is not held. That introduces a race condition.
This is clearly permitted by the C language standard, which in general describes a single threaded model. The standard says nothing about precisely when values are written to memory. It simply says that if you assign a value to a global variable, then when you read the global variable you should see that value.
It may seem that this should be an invalid optimization: the compiler should not move stores out of a conditionally executed basic block. Several people have reacted to this test case in this way. However, note that the first function call could be anything–it could even be a test of a global variable. And note that the store could be a load–we could load a global variable twice, protected by some conditional both times. If we require that this code be safe in a multi-threaded environment, then we can not coalesce the loads. That would be OK with people who write C as a fancy assembly language. But it would significantly hurt performance for complex programs, especially in C++, in which the conditionals would be in different functions and then inlined together.
So what should the compiler do, given a choice between optimizing complex C++ code and helping to generate correct complex multi-threaded code? Complex C++ code is much more common than complex multi-threaded code. I think the compiler is making the right choice. And it conforms to the language standard.
For this code to be correct in standard C, the variable needs to be marked as volatile
, or it needs to use an explicit memory barrier (which requires compiler specific magic–in the case of gcc, a volatile asm
with an explicit memory clobber). But many people don’t see that–including Linus Torvalds.
I think this is a nice example of my earlier point, which is that our current models of parallel programming are generally too hard for people to use.
Leave a Reply
You must be logged in to post a comment.