C++ Beyond the Syllabus #11: Address Sanitizers
Protecting against undefined behavior.
This article is a part of the C++ Beyond the Syllabus series. Subscribe here to receive each new issue directly in your inbox.
Generally speaking, catching undefined behavior and ensuring memory safety are crucial aspects of software engineering.
In C++, pointers, references, and non-owning objects like std::string_view and std::span all provide ample opportunity for programmers to introduce these unwanted behaviors into their applications.
Address sanitizers provide a robust defense against many of these bugs by catching invalid memory accesses at runtime and reporting detailed error messages.
Address Sanitizers
What can it do?
Address sanitizers can generally catch a few kinds of bugs:
- Use-After-Free
- Out of Bounds Accesses
- Heap Overflow
- Stack Buffer Overflow (depending on the sanitizer being used)
Just because an address sanitizer can catch certain types of errors doesn't mean it always will. It's a powerful tool, but it's not foolproof.
Let’s talk about how this tool works, and then get into why there might be false negatives.
How does it work?
The address sanitizer adds shadow memory to track the allocation state of each byte in your program.
For learning purposes, imagine a sanitizer object maintaining a boolean for each byte in the program's address space, representing whether that byte is valid to access. Any call that allocates memory on the heap (e.g., via new or malloc) sets the boolean for each allocated byte to true. Any call that frees memory (e.g., via delete or free) sets the corresponding booleans to false. Then, any code accessing dynamic memory first checks with the sanitizer object that the bytes being accessed are currently valid.
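Here is that mental model as a minimal C++ sketch. ToyShadowMemory and its methods are hypothetical names, the "addresses" are plain integers rather than real pointers, and on_free takes a size even though free() does not. Real AddressSanitizer uses compact shadow bytes and compiler-inserted checks, not a hash map.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Toy model of "one boolean per byte" (hypothetical; not how ASan is built).
class ToyShadowMemory {
public:
    // Mark [addr, addr + size) as allocated and valid to access.
    void on_allocate(std::uintptr_t addr, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) valid_[addr + i] = true;
    }

    // Mark [addr, addr + size) as freed; later accesses count as errors.
    void on_free(std::uintptr_t addr, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) valid_[addr + i] = false;
    }

    // The check instrumented code would perform before every load or store.
    // Bytes never allocated, or already freed, are reported as invalid.
    bool is_valid(std::uintptr_t addr, std::size_t size) const {
        for (std::size_t i = 0; i < size; ++i) {
            auto it = valid_.find(addr + i);
            if (it == valid_.end() || !it->second) return false;
        }
        return true;
    }

private:
    std::unordered_map<std::uintptr_t, bool> valid_;  // byte address -> valid?
};
```

Allocating 16 bytes at a fake address and then checking 17 bytes models an out-of-bounds access, and checking the same range after on_free models a use-after-free.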
The actual implementation is more complex, but this still illustrates a few important observations:
- An address sanitizer can only catch bugs in code that is compiled with the sanitizer enabled. Memory accesses in library code compiled separately, without the sanitizer, are not checked.
- The address sanitizer adds time and memory overhead that scales with your program's memory activity, because every allocation, access, and deallocation requires extra work. You should expect an address sanitizer to slow down your program 2–3x.
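To make the real implementation a little less mysterious: according to the AddressSanitizer algorithm description, each shadow byte covers an 8-byte granule of application memory, and an address maps to its shadow byte as (addr >> 3) + offset, where the offset is platform-specific. A shadow value of 0 means all 8 bytes are addressable, k in 1..7 means only the first k are, and a negative value means the whole granule is poisoned. The sketch below models that check in plain C++. The function names are hypothetical, real ASan emits equivalent checks as compiler instrumentation, and the sketch assumes an access never crosses an 8-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>

// 8 application bytes per shadow byte, i.e. a right shift by 3.
constexpr std::uintptr_t kShadowScale = 3;

// Map an application address to its shadow address. In real ASan the offset
// is a fixed, platform-specific constant; here it is just a parameter.
constexpr std::uintptr_t mem_to_shadow(std::uintptr_t addr, std::uintptr_t shadow_offset) {
    return (addr >> kShadowScale) + shadow_offset;
}

// Is an access of access_size bytes (1, 2, 4, or 8) at addr allowed, given
// the shadow byte for addr's granule?
constexpr bool access_ok(std::int8_t shadow_value, std::uintptr_t addr, std::size_t access_size) {
    if (shadow_value == 0) return true;   // fast path: whole granule addressable
    if (shadow_value < 0) return false;   // poisoned: redzone, freed memory, ...
    // Partially addressable granule: the last byte touched must lie within
    // the first shadow_value bytes of the granule.
    const std::intptr_t last_byte =
        static_cast<std::intptr_t>(addr & 7) + static_cast<std::intptr_t>(access_size) - 1;
    return last_byte < shadow_value;
}
```

Because one shadow byte describes eight application bytes, the shadow itself costs roughly one eighth of the memory it tracks, on top of the redzones ASan places around allocations.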
Are there other shortcomings?
There are a few:
- Performance Overhead… As mentioned above, an address sanitizer will make your code 2–3x slower. Moreover, in order to get very informative stack traces in sanitizer-generated error messages, you’ll need to manually disable a handful of compiler optimizations. (See more on this in the “How do I use it?” section below.)
- Increased Memory Usage… By keeping shadow memory to track whether each byte is addressable, up to O(n) extra memory may be used, where n is the amount of memory your application allocates. This can be problematic for large applications with strict memory requirements.
- False Negatives… While the documentation states there will rarely be false positives, false negatives can be quite frequent. This is especially the case for memory access errors on the stack and when certain compiler optimizations are enabled.
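Some of those stack false negatives can be reduced with runtime options. AddressSanitizer reads options from the ASAN_OPTIONS environment variable, and a program can also embed defaults by defining __asan_default_options(), which the ASan runtime calls at startup. A minimal sketch, assuming a Clang or GCC toolchain whose ASan runtime supports these flags (support and defaults vary by version, so check the sanitizer flags documentation for yours):

```cpp
// Link this into a sanitized test binary. Under ASan, the runtime uses the
// returned string as default options; ASAN_OPTIONS can still override them.
// In a build without ASan, this is just an ordinary, unused function.
extern "C" const char* __asan_default_options() {
    // detect_stack_use_after_return=1: also flag stack memory used after its
    //   function returned, a common source of false negatives otherwise.
    // strict_string_checks=1: check that strings passed to libc string
    //   functions are properly null-terminated.
    // detect_leaks=1: run LeakSanitizer at exit, where supported.
    return "detect_stack_use_after_return=1:strict_string_checks=1:detect_leaks=1";
}
```

Enabling stack-use-after-return detection makes stack-heavy programs slower and hungrier for memory, so it is usually worth turning on for test builds rather than everywhere.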
How do I use it?
Address sanitizers come built into modern versions of both LLVM/Clang and GCC compilers.
Using the sanitizer is quite simple: just add the compiler flag -fsanitize=address (the same for both compilers, and needed at both compile and link time). This gets you bare-bones sanitizing, but a few additional flags can help avoid some of the aforementioned shortcomings and generate more readable error messages:
- -g compiles debug information into executables, allowing error messages to include line numbers, variable names, and function names.
- -fno-omit-frame-pointer prevents the compiler from optimizing away the frame pointer, which could otherwise hinder the accuracy of stack traces.
- -O1 keeps optimization modest. With more aggressive optimization levels (e.g., -O2 or -O3), the compiler may rearrange or eliminate memory accesses, which can mask real errors from the sanitizer's memory checks.
- -fno-optimize-sibling-calls prevents the compiler from performing tail call optimization (TCO), which helps the sanitizer accurately display the call stack leading to an error.
With these compiler options, you’ll easily be able to compile and build executables with address sanitizing enabled.
Let's take another look at this example code from C++ Beyond the Syllabus #9: std::string_view:
```cpp
#include <iostream>
#include <string>
#include <string_view>

std::string_view generate_string()
{
    const std::string temp = "Temporary String";  // local: destroyed when the function returns
    std::string_view temp_view_1(temp.data(), temp.size());  // non-owning view into temp
    std::cout << "temp_view_1: " << temp_view_1 << "\n";
    return temp_view_1;  // BUG (intentional): the view outlives temp
}

int main()
{
    std::string_view temp_view_2 = generate_string();
    std::cout << "temp_view_2: " << temp_view_2 << "\n";  // undefined behavior: dangling view
}
```
I noted in that article how the printed value of temp_view_2 was undefined:
temp_view_1: Temporary String
temp_view_2: C��Oaq�
This is an example of a common error that occurs when a pointer, reference, std::string_view, or std::span outlives the underlying data.
An address sanitizer could have saved us here!
Simply plugging this example into Compiler Explorer with the compiler options above catches the bug. Without the address sanitizer, the program prints the same garbled output. With it, the sanitizer identifies a heap-use-after-free error and aborts with a runtime report containing three detailed stack traces: one of the bad access, one of the original allocation of the accessed memory, and one of the freeing of that memory.
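Once the sanitizer points at the dangling view, one straightforward fix (a sketch, not the only option) is to return an owning std::string so the characters outlive the function call:

```cpp
#include <iostream>
#include <string>

// Return an owning std::string instead of a std::string_view. The caller's
// object owns the characters, so nothing dangles after the function returns.
std::string generate_string()
{
    std::string temp = "Temporary String";
    std::cout << "temp: " << temp << "\n";
    return temp;  // not const, so the copy can be elided or the string moved
}
```

A caller can still take a std::string_view of the result, as long as the std::string it views stays alive. Alternatively, if the text is a compile-time constant, returning a std::string_view built from a string literal is safe, because string literals have static storage duration.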
Address Sanitizers in Practice
Slowing down your code 2–3x is pretty drastic… and C++ is supposed to be fast, right?
The good news is address sanitizers can be useful without slowing down your production releases.
The simplest way to start using address sanitizers today is to build your unit and integration tests with the necessary compiler options. (Just be sure not to use a sanitizer for any benchmarking tests!)
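One way to keep sanitized builds out of your benchmark numbers is to detect AddressSanitizer at compile time. GCC defines __SANITIZE_ADDRESS__ under -fsanitize=address, and Clang exposes __has_feature(address_sanitizer), so checking both covers either compiler. A minimal sketch; the BUILT_WITH_ASAN macro and should_run_benchmarks() helper are hypothetical names:

```cpp
// Detect whether this translation unit was compiled with -fsanitize=address.
#if defined(__SANITIZE_ADDRESS__)
#define BUILT_WITH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define BUILT_WITH_ASAN 1
#endif
#endif

#ifndef BUILT_WITH_ASAN
#define BUILT_WITH_ASAN 0
#endif

constexpr bool kBuiltWithAsan = BUILT_WITH_ASAN != 0;

// Hypothetical helper: a benchmark driver can check this and exit early so a
// sanitized build never produces misleading timing numbers.
inline bool should_run_benchmarks() { return !kBuiltWithAsan; }
```

A benchmark main could print a short "skipping: built with ASan" message and return when should_run_benchmarks() is false, while the same sources still build normally for unit tests.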
Beyond that, adding an address sanitizer to your continuous integration (CI) pipeline can be a good way to enforce team-wide adoption.