The demo project (Visual Studio 2015 solution) demonstrating the behavior in this article can be downloaded here.

Introduction

Using atexit() to register functions that are called when an application terminates is common practice. This is especially true for libraries: atexit() is specified by the C standard and lets a library register its cleanup logic without relying on the third-party application to call a specific cleanup function.

This is also what the library the author was working with did. Since using atexit() is nothing unusual, it was quite surprising to observe that the cleanup handler registered via atexit() ran after some resources had already been freed when the code was built against Microsoft’s Universal C Runtime. In this particular case, the cleanup function got stuck in an endless loop, so the application never terminated.
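The registration pattern described above can be sketched roughly as follows. This is a minimal, portable illustration and not the code of the library in question; the namespace demo_lib and the functions ensureInitialized(), isInitialized() and cleanup() are hypothetical names chosen for this example.

```cpp
#include <cstdio>
#include <cstdlib>

namespace demo_lib {  // hypothetical library name, not from the article

static bool g_initialized = false;

// Cleanup routine the library wants to run when the process exits.
static void cleanup()
{
  if (!g_initialized) { return; }  // nothing to do, keeps the call idempotent
  g_initialized = false;
  std::puts("demo_lib: cleanup");
}

// Lazily initializes the library and registers cleanup() exactly once
// per initialization. Returns true on success.
bool ensureInitialized()
{
  if (g_initialized) { return true; }
  // std::atexit() returns 0 if the registration succeeded.
  if (std::atexit(&cleanup) != 0) { return false; }
  g_initialized = true;
  return true;
}

bool isInitialized() { return g_initialized; }

}  // namespace demo_lib
```

An application would only call demo_lib::ensureInitialized() (directly or indirectly through other library functions) and never has to know that a cleanup handler exists.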

Well-known behavior of atexit()

To understand the root cause of the problem, let’s first take a look at a simple case: an atexit()-registered function stops a thread and waits until the thread has terminated before the hosting application closes cleanly:

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdlib>   // atexit()
#include <iostream>

static int threadCounter = 0;
static HANDLE handle = nullptr;
static bool running = true;

static void terminateThread(void)
{
  running = false;
  WaitForSingleObject(handle, INFINITE);
  std::cout << "done waiting - counter is: " << threadCounter << "\n";
}

static DWORD WINAPI dummy_worker(void*)
{
  threadCounter++;

  while (running) { Sleep(1000); }

  threadCounter--;
  return 0;
}

int main(void)
{
  atexit(&terminateThread);

  handle = CreateThread(NULL, 0, &dummy_worker, nullptr, 0, nullptr);
  Sleep(100);
  return 0;
}

(Sidenote on this code: The code is kept as simple as possible to demonstrate the actual problem. The fact that it’s not really thread-safe is not relevant for this topic.)

As we can see, the test case is quite simple:
main() spawns a worker thread (dummy_worker()) which increments threadCounter when it starts, waits until running is set to false, and then decrements threadCounter again.
In main() we register terminateThread() via atexit() to make sure that the running thread is shut down cleanly.
To do that, terminateThread() sets running to false, waits via WaitForSingleObject() until the thread handle is signaled (i.e. the thread has terminated), and then prints the current value of threadCounter (which we expect to be 0 at this point).
Right before returning from main() we give the thread some time to make sure it has started.

Running this app, we see it behaves as we expected and get the output:
done waiting - counter is: 0

No big surprise here.

atexit() and DLLs

Now let’s make things a bit more interesting and move that code inside a DLL (into the startThread()-function) and call that from the application’s main()-function.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include "testdll.h"

int main(void)
{
  startThread();
  Sleep(100);
  return 0;
}

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdlib>   // atexit()
#include <iostream>

static int threadCounter = 0;
static HANDLE handle = nullptr;
static bool running = true;

static void terminateThread(void)
{
  running = false;
  WaitForSingleObject(handle, INFINITE);
  std::cout << "done waiting - counter is: " << threadCounter << "\n";
}

static DWORD WINAPI dummy_worker(void*)
{
  threadCounter++;

  while (running) { Sleep(1000); }

  threadCounter--;
  return 0;
}

void startThread(void)
{
  atexit(&terminateThread);

  handle = CreateThread(NULL, 0, &dummy_worker, nullptr, 0, nullptr);
}

Naturally, we expect to see the same behavior as before. Let’s look at the console output:
done waiting - counter is: 1

This is not quite what we expected to see. After all, we did terminate the thread cleanly… or did we?

Understanding what’s going on

To get a better feeling of what’s going on here, let’s add some debug output.

We add another atexit()-registered function (in the application’s main()-function).
We add some output to DllMain() to see how attaching and detaching of threads/processes works.
We print a message right before main() returns.
We add some output at the start of the terminateThread()-function.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdlib>   // atexit()
#include <iostream>
#include "testdll.h"

void atExitMainProcess(void)
{
  std::cout << "atExitMainProcess\n";
}

int main(void)
{
  atexit(&atExitMainProcess);

  startThread();
  Sleep(100);

  std::cout << "returning from process main\n";
  return 0;
}

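#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdlib>   // atexit()
#include <iostream>

static int threadCounter = 0;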
static HANDLE handle = nullptr;
static bool running = true;

static void terminateThread(void)
{
  std::cout << "terminating thread\n";
  running = false;
  WaitForSingleObject(handle, INFINITE);
  std::cout << "done waiting - counter is: " << threadCounter << "\n";
}

static DWORD WINAPI dummy_worker(void*)
{
  threadCounter++;

  while (running) { Sleep(1000); }

  threadCounter--;
  return 0;
}

void startThread(void)
{
  atexit(&terminateThread);

  handle = CreateThread(NULL, 0, &dummy_worker, nullptr, 0, nullptr);
}

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
  switch (ul_reason_for_call)
  {
  case DLL_PROCESS_ATTACH:
    std::cout << "process attach\n";
    break;
  case DLL_THREAD_ATTACH:
    std::cout << "thread attach\n";
    break;
  case DLL_THREAD_DETACH:
    std::cout << "thread detach\n";
    break;
  case DLL_PROCESS_DETACH:
    std::cout << "process detach\n";
    break;
  }
  return TRUE;
}

Running that code, we get the following output (numbers represent line numbers for reference):
1: process attach
2: thread attach
3: returning from process main
4: atExitMainProcess
5: process detach
6: terminating thread
7: done waiting - counter is: 1

We see that atExitMainProcess() gets called after main() returns, followed by the process-detach notification received by the DLL, followed by the call to terminateThread() which we registered in the DLL via atexit().

This gives us two interesting hints:

there is no output for the detaching of the thread
the atexit()-registered function of the DLL is called after the atexit()-registered function from the main process

Digging into the depths

To understand the first part, we have to know that when a process returns from main(), the VS runtime terminates it by calling ExitProcess(). [1]
The first thing ExitProcess() does is terminate all threads of the process (except the calling thread) WITHOUT any DLL receiving a DLL_THREAD_DETACH notification. [2]
That explains why we do not see the thread detach output.
Keep the following additional facts in mind to understand the conclusion further down:

after the threads have been terminated, their handles become signaled
the process-detach notification is then sent to all DLLs (this corresponds to line 5 in the output)
Note that the atexit()-registered function from main() had already been called before ExitProcess() reached this step (output: line 4)

Let’s keep these facts in mind and take a look at the second part now:

We got the output from the process’ atexit()-registered function BEFORE the output of the function we registered via the atexit()-call in startThread(), even though atexit() is defined to run the registered functions in LIFO order [3]. So why did we not get the call to terminateThread() before atExitMainProcess() was called?

The explanation is that in the VC runtime each module (i.e. each DLL and each executable) has its own separate atexit stack (as Doug Harrison explains in these threads [4/8]). This minor detail makes a fundamental difference in this scenario: the order in which registered atexit() functions run depends not only on the order of the atexit() calls, but also on the context (i.e. the module) in which they were made.
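The effect of per-module stacks can be illustrated with a small toy model. The sketch below does not use the real runtime at all: the Module struct and simulateShutdown() are hypothetical helpers that merely replay the registration order of the demo (executable first, DLL second) and process the executable's stack before the DLL's stack, as observed in the output above.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: each "module" owns its own LIFO list of exit handlers.
struct Module {
  std::string name;
  std::vector<std::function<void()>> exitHandlers;

  // Stand-in for atexit() called from code living in this module.
  void atExit(std::function<void()> fn) { exitHandlers.push_back(std::move(fn)); }

  // Runs this module's handlers in reverse order of registration.
  void runExitHandlers()
  {
    while (!exitHandlers.empty()) {
      auto fn = std::move(exitHandlers.back());
      exitHandlers.pop_back();
      fn();
    }
  }
};

// Replays the registrations of the demo and returns the resulting call order.
std::vector<std::string> simulateShutdown()
{
  std::vector<std::string> log;
  Module exe{"exe", {}};
  Module dll{"dll", {}};

  exe.atExit([&log] { log.push_back("atExitMainProcess"); });  // registered first
  dll.atExit([&log] { log.push_back("terminateThread"); });    // registered second

  exe.runExitHandlers();  // the executable's stack is processed first
  dll.runExitHandlers();  // the DLL's stack is processed at process detach
  return log;
}
```

With a single global LIFO stack, terminateThread would have been logged first because it was registered last. In this model it comes second, which matches lines 4 and 6 of the output.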

Understanding the behavior

We have now reached the point where we can understand what is going on here.

Upon process termination, the process’ atexit()-function stack is processed (output: line 4).
ExitProcess() is called and terminates our thread without the thread-detach notification.
The thread handle becomes signaled.
The process-detach notification is sent to the DLL (output: line 5).
The DLL is unloaded and processes its own atexit()-function stack, which calls our terminateThread() function (output: line 6).
The call to WaitForSingleObject() returns immediately (since the thread handle is already signaled).

Hence, we end up with threadCounter still being set to 1.
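The difference between the two scenarios can be replayed with a deliberately simplified state model. No real threads are involved here; WorkerModel, terminateThreadModel(), exeScenario() and dllScenario() are hypothetical names introduced for this sketch, and the model only tracks the pieces of state the article talks about.

```cpp
// Toy model, no real threads: tracks only the state relevant to the demo.
struct WorkerModel {
  int  counter  = 0;      // mirrors threadCounter
  bool alive    = false;  // the thread is still executing
  bool signaled = false;  // the thread handle is signaled (thread has ended)

  // The worker starts and increments the counter.
  void start() { ++counter; alive = true; }

  // Cooperative stop: the loop sees running == false and decrements the counter.
  void stopCooperatively()
  {
    if (alive) { --counter; alive = false; signaled = true; }
  }

  // ExitProcess(): the thread is terminated, none of its remaining code runs.
  void killedByExitProcess()
  {
    if (alive) { alive = false; signaled = true; }
  }
};

// Models terminateThread(): request the stop, wait, report the counter.
int terminateThreadModel(WorkerModel& w)
{
  w.stopCooperatively();  // only has an effect if the thread is still alive
  // WaitForSingleObject() would return here because w.signaled is true.
  return w.counter;
}

// Handler runs from the executable's atexit stack: thread is still alive.
int exeScenario()
{
  WorkerModel w;
  w.start();
  return terminateThreadModel(w);
}

// Handler runs from the DLL's atexit stack: ExitProcess() killed the thread first.
int dllScenario()
{
  WorkerModel w;
  w.start();
  w.killedByExitProcess();
  return terminateThreadModel(w);
}
```

In this model exeScenario() yields 0 and dllScenario() yields 1, the same two counter values the real programs printed.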

What the standard says

The question arises whether this behavior violates the C or C++ standard.
As far as the author can determine, it does not. In fact, the C++ standard itself [5] explicitly states that threads can be terminated prior to the execution of std::atexit-registered functions in order to prevent undefined behavior. This is noted in particular to allow thread managers to be implemented as static-storage-duration objects.

On the other hand, the specification of atexit() [6/7] does not rule out separate atexit()-function stacks per module. So again, there is no standard violation here.

That said, both the existence of multiple atexit stacks and the point at which atexit functions are called relative to thread termination are implementation details.

How developers can deal with the facts

For library developers the options to cope with this situation seem limited. Here is a list of possible approaches to compensate for the difference in when atexit()-registered functions are called:

ensure your cleanup code actually handles the scenario where resources were already freed before the cleanup function was called
do not use atexit() at all (or at least not in the context of DLLs) but rather provide your own cleanup function, and document that third-party applications using your library must call it to ensure proper resource cleanup
do not provide means to do explicit cleanup, but rather leave that task to the OS (which will implicitly clean up resources eventually)
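As an illustration of the second approach, a library could expose an explicit, documented init/shutdown pair instead of registering anything via atexit(). The following is a minimal sketch under assumptions: demo_lib, initialize(), shutdown(), activeUsers() and the Session helper are hypothetical names, and the sketch is not thread-safe.

```cpp
#include <cstdio>

namespace demo_lib {  // hypothetical library name, not from the article

static int g_refCount = 0;

// Must be called before using the library; may be called multiple times.
void initialize()
{
  if (g_refCount++ == 0) {
    std::puts("demo_lib: acquiring resources");
  }
}

// Must be called by the application before it returns from main().
// Extra calls are harmless, so the cleanup stays idempotent.
void shutdown()
{
  if (g_refCount == 0) { return; }  // already shut down, nothing to do
  if (--g_refCount == 0) {
    std::puts("demo_lib: releasing resources");
  }
}

int activeUsers() { return g_refCount; }

// Optional RAII helper so C++ callers cannot forget the shutdown() call.
class Session {
public:
  Session() { initialize(); }
  ~Session() { shutdown(); }
  Session(const Session&) = delete;
  Session& operator=(const Session&) = delete;
};

}  // namespace demo_lib
```

An application would create a demo_lib::Session at the top of main() (or call initialize() and shutdown() explicitly), so the cleanup runs while all threads are still alive rather than during process detach.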

Conclusion

The combination of separate per-module atexit stacks and the fact that threads started from a module are killed (without notification) before the module’s atexit()-registered functions are called makes atexit()-registered functions rather unsuitable in situations where you do not have complete control over how the code is used (i.e. in libraries).

Unfortunately, the lack of explicit requirements in the C/C++ standards in this regard does not help, even though it may well be intentional and based on sound reasons (which would be beyond the author’s knowledge). It also raises the question whether this behavior makes sense from a design point of view, or whether it defeats the purpose of atexit() (and could therefore be argued to be a defect in the standard).

In the author’s opinion, the use of per-module exit stacks is at least questionable: for platform- and compiler-independent library development, the lack of an explicit requirement in the standard adds complexity to the design of functions that are registered via atexit().

Acknowledgments

The author would like to thank Branko Čibej and Bert Huijben for their contributions in investigating the topic and sharing their own opinions on this matter.

References

[1] = Windows Kits 10.0.10240.0 source code: ucrt/startup/exit.cpp: exit_or_terminate_process()
[2] = https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658(v=vs.85).aspx
[3] = https://msdn.microsoft.com/en-us/library/tze57ck3.aspx
[4] = https://groups.google.com/d/msg/microsoft.public.vc.language/Hyyaz2Jpx-Q/t1ADCsPTikoJ
[5] = C++ Working Draft N3242=11-0012 – 3.6.3 paragraph 4
[6] = C++ Working Draft N3242=11-0012 – 18.5 paragraph 5-8
[7] = WG14/N1256 Committee Draft, September 7, 2007, ISO/IEC 9899:TC3 – 7.20.4.2
[8] = https://groups.google.com/forum/?hl=en#!msg/microsoft.public.vc.mfc/iRo37usY3vU/4Txo3KHfi0MJ

Author: luke1410

Having started programming in 1989 (back then with GW-Basic and QBasic), Stefan studied Computer Science at the HTW Aalen (Germany). Following his studies he worked in the games industry for 14 years in different areas of game engines (especially focusing on the languages C++ and Lua) before switching industries; he now focuses primarily on Java development.

2 thoughts on “The trouble of separate module atexit-stacks”

yeolual says:

05/20/2019 at 16:57

Thanks for the information!!!!!

stsp says:

02/08/2017 at 11:13

http://man.openbsd.org/atexit contains some relevant notes and warnings:

atexit() is very difficult to use correctly without creating exit(3)-time races. Unless absolutely necessary, please avoid using it.

The behavior when a shared object is unloaded is an extension to that standard.
