Heap Heap Hurray: Proper C++ Heap Debugging
by sirus on Apr.03, 2009, under Code
In this post I will show you how to make a C++ heap debugger with the following capabilities:
- Detect all memory leaks on program termination with exact file and line of allocation.
- Immediately break the debugger on buffer overflow / underflow.
- Immediately break the debugger on read/writes to freed memory.
- Immediately break the debugger on delete/vector-delete mismatch and double delete.
- Integrate into a project by just compiling in 1 cpp file.
The most common way that people debug the MSVC heap is by redefining new to their down new function in a manner such as this:
#ifdef __DEBUG #define new DEBUG_NEW(__FILE__,__LINE__) #endif
While this seems to be one of the easiest ways to achieve the desired functionality, it is one of the worst ways to do it since the DEBUG_NEW above completely hoses the placement new ( foo*ptr = new (buffer) foo(); ) which is used all over the STL and in any memory manager.
instead what one must do is overload the following 8 functions:
void * operator new(size_t size) throw (std::bad_alloc); void * operator new(size_t size, const std::nothrow_t&) throw (); void * operator new[](size_t size) throw (std::bad_alloc); void * operator new[](size_t size, const std::nothrow_t&); void operator delete(void * ptr) throw (); void operator delete(void * ptr, const std::nothrow_t&) throw (); void operator delete[](void * ptr) throw (); void operator delete[](void * ptr, const std::nothrow_t&) throw ();
Note the existence of nothrow deletes even though delete under no circumstances will throw.
To achieve the functionality of breaking immediately on buffer overflow/underflow will be using virtual pages. Virtual pages have the advantage of having customizable access rights. Meaning that we can mark certain pages PAGE_READWRITE and certain pages PAGE_NOACCESS. When bad code tries to read or write an address that is within a page that is marked PAGE_NOACCESS a structured exception is throw which we will catch.
First we need to align our allocated buffer on the edge of a page marked PAGE_READWRITE next to a page marked PAGE_NOACCESS.
Since page sizes are fixed we can’t get our buffer to have PAGE_NOACCESS pages on both sides of it unless the buffer size is an exact multiple of the size of a page.
For example in the case of overflow protection we’d be setting up the following scenario:

Our internal Heap classes will keep track of a unique heap size. Meaning that we will have a Heap for objects of size 4, a heap for objects of size 8, etc.
Lets define a structure to keep track of individual allocations:
struct HeapListEntry
{
HeapListAllocType Type;
ProtectionLevel Protection;
unsigned __int64 AllocatingAddress;
byte* Data;
};
- The Type field is defines if its a Single or a Vector allocation.
- The Protection field keeps track of the page protection level between None, Partial, and full.
- The AllocatingAddress field is the address of the code where the allocation was initially made. (More on this later)
- the Data pointer is the data we will be handing out to the user.
Now to manage these allocations we will need to keep track of three lists:
//Define out collection. typedef std::vector<HeapListEntry,Mallocator<HeapListEntry> > HLEVECTOR; HLEVECTOR m_Allocated; HLEVECTOR m_Distributed; HLEVECTOR m_Free;
- The Allocated list is the memory we received from the operating system but have not yet distributed to the program.
- The Distributed list is what the program currently has ownership of.
- The Free list is what we gave to the program but it then freed and gave it back.
The reason we have we define a std::allocator for the vectors is because otherwise the vector will use new to allocate that memory and hit our memory debugger again causing an infinite recursion. In this instance the Mallocator simply calls malloc and free for the given size of HeapListEntry.
We want to have the ability to toggle whether we want to detect overflows, underflows, or not use virtual pages at all. Also we want to specify some variables.
//0 off, 1 overflow, 2 underflow #define ACCESSVIOLATIONS 1 #define PAGESIZE 0x4000 #define ALLOCD 0xA0 #define FREED 0xDF #define NOMAND 0xFE #define NOMANSIZE 0x10
- PAGESIZE is the size of the operating system’s pages.
- ALLOCD is the fill byte we use when we have allocated memory but not used it.
- FREED is the fill byte we use when we get memory back from the program.
- NOMAND is the fill byte we use to pad the available data array on either side of our buffer allowing us to detect both underflow and overflow at some point.
- NOMANSIZE is how big the no-man’s land is.
Globally we need to define a New and a Delete function for our operator overloads to call.
typedef std::map<size_t,Heap*,std::less<size_t> ,Mallocator<std::pair<size_t,Heap*> > > HEAPMAP;
HEAPMAP g_Heaps;
void* Alloc(size_t s, bool Vector = false)
{
HEAPMAP::iterator itr = g_Heaps.find(s);
if(itr == g_Heaps.end())
{
Heap* pHeap = new (malloc(sizeof(Heap))) Heap(s);
g_Heaps.insert(std::make_pair(s,pHeap));
}
return g_Heaps.find(s)->second->Alloc(Vector);
}
void Delete(void *ptr,bool Vector = false)
{
if(!ptr)
return;
for (HEAPMAP::iterator i=g_Heaps.begin();i!=g_Heaps.end();++i)
{
if(i->second->IsOwned((byte*)ptr))
{
i->second->Delete(ptr,Vector);
return;
}
}
_ASSERTE(0 && "POINTER NOT OWNED BY MEMORY MANAGER.");
}
This code maps allocation sizes to there individual heaps. If a heap is not found it is created for that size and the code calls their Alloc and Delete functions as necessary.
Our Alloc function looks as follows:
void* Heap::Alloc(bool Vector, unsigned stacklookup /* = 5 */)
{
HeapListEntry hle = GetAvailable();
if(hle.Data)
{
Protect(&hle,PL_PARTIAL);
hle.AllocatingAddress = GetCallingFunction(stacklookup);
hle.Type = Vector?HLAT_VECTOR:HLAT_SINGLE;
m_Distributed.push_back(hle);
}
return hle.Data;
}
We call GetAvailable which first tries to exhaust its free list, then its distributed list and if it still can’t find any allocated regions it will call the operating system to get more memory. Then we initialize our structure, add it to our distributed list and return the data pointer.
The Delete function looks as follows:
void Heap::Delete(void* ptr,bool Vector)
{
if(IsInList(&m_Free,(const byte*)ptr))
{
_ASSERTE(0 && "DOUBLE DELETE DETECTED");
}
else if(IsInList(&m_Distributed,(const byte*)ptr))
{
HeapListEntry* pEntry;
FindHLE(ptr,NULL,&pEntry);
if(pEntry->Type != (Vector?HLAT_VECTOR:HLAT_SINGLE))
{
_ASSERTE(0 && "delete/delete[] missmatch");
}
if(!Validate(pEntry))
{
_ASSERTE(0 && "BUFFER OVERFLOW / UNDERFLOW DETECTED! Turn on ACCESSVIOLATIONS to break immediately!");
}
HeapListEntry entry;
for(HLEVECTOR::iterator i = m_Distributed.begin();i!=m_Distributed.end();++i)
if(i->Data == pEntry->Data)
{
entry = *i;
m_Distributed.erase(i);
break;
}
memset(entry.Data,FREED,m_Size);
Protect(&entry,PL_FULL);
m_Free.push_back(entry);
}
else
{
_ASSERTE(0 && "ATTEMPTING TO DELETE UNOWNED POINTER.");
}
}
Here what we do is check for double frees, make sure that the proper delete is being called and check validate the boundaries before finally adding it to our Free list.
Now lets look at how we actually allocate the virtual pages from the operating system. When we exhaust our Free and Allocated lists we call the Grow() function bellow:
#define TOTALSIZE (NOMANSIZE + m_Size + NOMANSIZE)
#define PAGESPAN ((TOTALSIZE / PAGESIZE) + 1)
//...
void Heap::Grow()
{
unsigned newblocks = (m_Distributed.size() + 1)*2;
for(unsigned i=0;i<newblocks;++i)
{
#if ACCESSVIOLATIONS == 0
byte* ptr = (byte*)malloc(TOTALSIZE);
#else
byte* ptr = (byte*)VirtualAlloc(NULL,(PAGESPAN+1)*PAGESIZE,MEM_COMMIT,PAGE_READWRITE);
#endif
_ASSERTE(ptr && "COULD NOT ALLOCATE MORE MEMORY");
if(ptr)
{
m_SystemAllocated.push_back(ptr);
#if ACCESSVIOLATIONS == 0
// dont need to do anything
#endif
#if ACCESSVIOLATIONS == 1
//detect overflows adjust so we straddle edge of last page
ptr += (PAGESPAN)*PAGESIZE;
ptr -= m_Size + NOMANSIZE;
#endif
#if ACCESSVIOLATIONS == 2
//detect underflows adjust so we straddle edge of first page
ptr += PAGESIZE - NOMANSIZE;
#endif
memset(ptr,NOMAND,NOMANSIZE);
memset(ptr+NOMANSIZE,ALLOCD,m_Size);
memset(ptr+NOMANSIZE+m_Size,NOMAND,NOMANSIZE);
HeapListEntry hle;
hle.Data = ptr+NOMANSIZE;
Protect(&hle,PL_PARTIAL);
m_Allocated.push_back(hle);
}
}
}
Here what we are doing, is allocating the memory from the operating system either via VirtualAlloc or via malloc if ACCESSVIOLATIONS are off. If we are using virtual pages we need to allocate enough pages to span all our data plus one page that will be marked PAGE_NOACCESS.
We then memset our allocation with the proper fill bytes and adjust our data pointer accordingly. Next we Protect the region so that our edge page would be marked PAGE_NOACCESS.
This function SHOULD be expanded to support releasing the free lists on other heaps when allocations fail.
Lets look at the Protect function now:
void Heap::Protect(HeapListEntry* hle, ProtectionLevel level)
{
hle->Protection = level;
#if ACCESSVIOLATIONS == 0
return;
#else
DWORD protlevel;
void* start = NULL;
size_t size;
switch(level)
{
case PL_NONE:
protlevel = PAGE_READWRITE;
start = hle->Data - NOMANSIZE;
size = TOTALSIZE;
break;
case PL_PARTIAL:
{
Protect(hle,PL_NONE);
ProtectedRegion pr = GetProtectedRegion(hle);
protlevel = PAGE_NOACCESS;
start = pr.base;
size = pr.len;
break;
}
case PL_FULL:
protlevel = PAGE_NOACCESS;
start = hle->Data - NOMANSIZE;
size = TOTALSIZE;
break;
}
DWORD oldProt = 0;
if(!VirtualProtect(start,size,protlevel,&oldProt))
{
_ASSERTE(0 && "FAILED TO PROTECT PAGE.");
}
#endif
}
Here we simply find out the regions we need to protect and call virtual protect, pretty straight forward. Theres a trick that happens in the call to GetProtectedRegion though:
//...
Heap::ProtectedRegion Heap::GetProtectedRegion(const HeapListEntry* hle) const
{
ProtectedRegion ret(NULL,0);
#if ACCESSVIOLATIONS == 0
ret.base = hle->Data;
ret.len = 0;
#endif
#if ACCESSVIOLATIONS == 1
ret.base = hle->Data + m_Size ;
ret.len = 1;
#endif
#if ACCESSVIOLATIONS == 2
ret.base = hle->Data - 1;
ret.len = 1;
#endif
return ret;
}
You can’t partially protect pages so its enough to just pass a length of 1 to alter an entire page!
We also want to find out where the allocations happened. Since we can’t use the __FILE__ and __LINE__ macros we have to do something a little more sneaky:
unsigned __int64 Heap::GetCallingFunction(unsigned depth)
{
CONTEXT c;
#ifdef _WIN64
RtlCaptureContext(&c);
#else
c.ContextFlags = CONTEXT_CONTROL;
__asm
{
LABEL: mov eax, [LABEL];
mov c.Eip, eax;
mov c.Ebp, ebp;
mov c.Esp, esp;
}
#endif
STACKFRAME64 stack_frame = {0};
stack_frame.AddrPC.Mode = AddrModeFlat;
stack_frame.AddrPC.Offset = c.Eip;
stack_frame.AddrStack.Mode = AddrModeFlat;
stack_frame.AddrStack.Offset = c.Esp;
stack_frame.AddrFrame.Mode = AddrModeFlat;
stack_frame.AddrFrame.Offset = c.Ebp;
for(unsigned i=0;i<depth;++i)
StackWalk64(IMAGE_FILE_MACHINE_I386,GetCurrentProcess(),GetCurrentThread(),&stack_frame,&c,NULL,SymFunctionTableAccess64,SymGetModuleBase64,NULL);
return stack_frame.AddrPC.Offset;
}
What this code lets us do is find the exact address that made the allocation. This works by walking the actual stack. We capture the current context in one of two ways depending on whether we are compiling in 32bit or 64bit. Using this data we can call StackWalk64 to walk up the stack and find the exact address of the call. We will use this later to find the file and line number of the call.
There is an obvious benefit to doing this as opposed to using the macros. You only need to store a pointer as opposed an entire character array for each allocation and you can ship this code (not that you would want to) without any information disclosure to any one trying to reverse engineer your application.
This function could be expanded to also store the full stack for more intensive debugging.
To make the stack walking work as well as to catch the access violations we are going to throw we will need to do a little bit of initialization. The best way to do this is with global constructor that is guaranteed to be called before any other global. This is achieved with the init_seg(user) pragma.
class SEHInit
{
public:
SEHInit()
{
SymInitialize(GetCurrentProcess(),NULL,true);
SymSetOptions(SymGetOptions() | SYMOPT_LOAD_LINES);
SetUnhandledExceptionFilter(SEHHandler);
}
~SEHInit()
{
for(HEAPMAP::iterator i=g_Heaps.begin();i!=g_Heaps.end();++i)
{
i->second->~Heap();
free(i->second);
}
g_Heaps.clear();
}
};
#pragma init_seg(user)
SEHInit SEH;
In the constructor we call SymInitialize to enable symbol resolution and SymSetOptions to allow us to get the actual file names and line numbers. There is also a call SetUnhandledExceptionFilter to catch our access violations and print a useful message.
I will cover SetUnhandledExceptionFilter and its uses in a future post.
In our destructor we destroy our heaps and free the memory we allocated for them and dump associated memory leaks in the destructors.
Here is the code for our structured exception handler:
LONG WINAPI SEHHandler(EXCEPTION_POINTERS* pException)
{
#if ACCESSVIOLATIONS != 0
EXCEPTION_RECORD* walker = pException->ExceptionRecord;
//get to original exception.
while(walker->ExceptionRecord)
walker = walker->ExceptionRecord;
if(walker->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
{
for(HEAPMAP::iterator i=g_Heaps.begin();i!=g_Heaps.end();++i)
{
if(i->second->IsProtected((byte*)walker->ExceptionAddress))
{
_ASSERTE(0 && "BUFFER OVERFLOW / UNDERFLOW / WRITE AFTER DELETE DETECTED.");
}
}
}
#endif
return EXCEPTION_CONTINUE_SEARCH;
}
We walk the exceptionrecord until we reach the base exception. Then we make sure that the exception itself is an access violation and proceed to print out an error message if the address is on one of our heaps.
On program termination we will dump information about any items that have been allocated but not yet freed as well as validate our memory in case access violations were disabled. This is accomplished via the heap class destructor:
Heap::~Heap()
{
for(HLEVECTOR::iterator i=m_Free.begin();i!= m_Free.end();++i)
if(!Validate(&(*i)))
_ASSERTE(0 && "WRITE AFTER DELETE DETECTED! Turn on ACCESSVIOLATIONS to break imediately!");
for(HLEVECTOR::iterator i=m_Distributed.begin();i!= m_Distributed.end();++i)
{
if(!Validate(&(*i)))
_ASSERTE(0 && "BUFFER OVERFLOW / UNDERFLOW DETECTED! Turn on ACCESSVIOLATIONS to break immediately!");
PrintLeak(i->AllocatingAddress,m_Size);
}
for(VOIDVECTOR::iterator i=m_SystemAllocated.begin();i!=m_SystemAllocated.end();++i)
{
#if ACCESSVIOLATIONS == 0
free(*i);
#else
VirtualFree(*i,0,MEM_RELEASE);
#endif
}
}
Here we walk through our free list and check to make sure that the FREED fill byte is still there. Then we walk through our distributed list and print out every item that is still there for it is now a leak. Then we finally return our memory to the operating system.
Code for this can be downloaded here: heapdebugger.zip
To include it into your existing project just add the MemoryDebugger.cpp file to your project and rebuild.
April 3rd, 2009 on 3:16 PM
Excellent post and information. Thanks for sharing!
April 3rd, 2009 on 6:40 PM
That’s an awful lot of work for something that already exists in two places – the debug CRT and Windows itself. Good job implementing this – it’s a good exercise, but if you ever need this functionality in production, please don’t roll your own and use what you already get with MSVC++ and Windows – they’ve been tested and used for decades. Your code is mostly fine, but it has a number of stylistic and functional issues that make it less reliable for production. Cheers!
April 3rd, 2009 on 9:01 PM
Umm… no memory debugger has been tested on Windows for decades.
Furthermore, this post indicates why the MSVC memory debugger is flawed.
April 4th, 2009 on 7:54 AM
A quick question, how does this work in a multi-threaded environment, as far as I can tell, there is no protection around the heap resolution which can result in several heaps of size n being allocated.
I am a luddite when it comes to C++ so please feel free to disabuse me of my mistaken notions.
April 4th, 2009 on 8:56 AM
@Justin You are right. This code above is not thread safe but can be made with just a few locks.
April 4th, 2009 on 9:03 AM
@Bob The main reason for using this over the built-in CRT heap debugger is that the CRT one can only detect overflow/underflow on delete which makes it extremely hard to track down exactly where the overflow happened and the CRT one does not play well with STL.
May 15th, 2009 on 2:19 PM
Thanks a lot for this one, very useful indeed.
Uninitialized IMAGEHLP_LINE64 structs prevented me from getting results from SymGetLineFromAddr64(…) however. I fixed this by setting
hlp.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
To also make this work on amd64, you need to read STACKFRAME64 struct members from correct registers and use the appropriate machine type on StackWalk64(…):
#ifdef _WIN64
stack_frame.AddrPC.Offset = c.Rip;
stack_frame.AddrStack.Offset = c.Rsp;
stack_frame.AddrFrame.Offset = c.Rbp;
#else
stack_frame.AddrPC.Offset = c.Eip;
stack_frame.AddrStack.Offset = c.Esp;
stack_frame.AddrFrame.Offset = c.Ebp;
#endif
for(unsigned i=0;i<depth;++i)
#ifdef _WIN64
StackWalk64(IMAGE_FILE_MACHINE_AMD64,GetCurrentProcess(),GetCurrentThread(),&stack_frame,&c,NULL,SymFunctionTableAccess64,SymGetModuleBase64,NULL);
#else
StackWalk64(IMAGE_FILE_MACHINE_I386,GetCurrentProcess(),GetCurrentThread(),&stack_frame,&c,NULL,SymFunctionTableAccess64,SymGetModuleBase64,NULL);
#endif
November 19th, 2009 on 10:02 AM
Thanks for this info. I’ve been trying to use it on a .NET assembly that is C++ / CLI code, that incorporates a static C++ native library — but it’s failing (hard) the first time new(size_t size) is called, from __DLLMainCRTStartup(). Any ideas? I wasn’t sure whether the managed / native heaps might be in conflict. If you want to write me at gmail, rather than in-thread, that’s fine too.
January 23rd, 2010 on 10:34 AM
This is what Microsoft’s Application Verifier Tool supposedly does when Full Heap Checking (“debugheap”) is enabled.
Microsoft Application Verifier can be downloaded for free from MS, just Google for it.
March 11th, 2010 on 8:53 PM
Substantially similar functionality is present in Windows, but is not well known. The builtin Windows implementation is called PageHeap and is completely separate from the debug heap provided within the CRT (which, as has been pointed out, is not very effective at uncovering the root causes of heap corruption).
To enable PageHeap for an application, install Microsoft’s Debugging Tools for Windows product. PageHeap can then be enabled for a specific executable by running a command such as:
gflags -p /enable prog.exe /full /protect
Any programs named ‘prog.exe’ that run after this will run with PageHeap enabled. gflags supports additional options to tweak alignment constraints, or to detect underflows instead of overflows. To disable PageHeap, run:
gflags -p /disable prog.exe
Run the following command to get usage information:
gflags -p /?
January 27th, 2011 on 8:19 PM
~”" I am really thankful to this topic because it really gives useful information ‘”:
September 12th, 2011 on 5:46 PM
One other potential problem is:
- delete object at address 0xbaddf00d
- new enough objects to eventually get one that has the address 0xbaddf00d
- some dangling pointer then deletes 0xbaddf00d thinking it is a completely different object
This could manifest itself in 2 ways:
1) we get a crash trying to follow some virtual dtor call off to the middle of nowhere
2) we successfully destroy the object and have a random crash later from double deleting or a similar issue to 1).
So to check for this issue you could just never repopulate your free list.