Coding Labs

Tutorial - Memory Pool



Memory management is a complex and broad topic. In this tutorial we'll work out a good way to keep track of our memory usage and to keep any potential issue under control.
This topic should be considered advanced and the tutorial assumes that you have a good knowledge of low-level computer architecture.
Let's introduce our memory management device: the memory pool.

The main reason to use a memory pool is to keep your memory under tight control, paying particular attention to how much memory a system is using and how fragmented the memory is getting.
Another important feature we want in our memory pool is the ability to track any memory stomp that may happen while running our application.

The memory pool we implement now is a simple one. The idea is to get all the memory we need from the OS and to subdivide this memory into chunks. For example, if we want to create a pool of 5Mb, we will perform the following steps:

  • Allocate the whole 5Mb from the operating system using new
  • Create a new free chunk that owns the 5Mb, so that the pool is now composed of a single chunk: [5Mb Chunk]
  • When the user (any client of the memory pool) requests some memory (say 10Kb) the pool will:
    • Create a new occupied chunk of 10Kb
    • Create a new free chunk of 5Mb - 10Kb = ~4.99Mb
    • Link the two chunks so that the pool is now [10Kb Chunk][4.99Mb Chunk]

Every allocation will generate an instance of a new occupied chunk, while every deallocation will remove an occupied chunk and will generate an instance of a free one. If we extend the previous example and imagine that the client requested 10Kb, 200Kb and 20 bytes, and then freed the 200Kb, the pool will look as follows: [10Kb][200Kb Free][20 bytes][~4.79Mb Free]
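To make the walkthrough concrete, here is a rough sketch of that same sequence driven through the StandardMemoryPool interface built later in this tutorial (the class name, constructor arguments and methods are the ones introduced below; the layout comments are approximate):

    // Sketch only: the example sequence above, using the pool built later on
    StandardMemoryPool pool(5 * 1024 * 1024, false); // 5Mb pool, no bounds check

    void* a = pool.allocate(10 * 1024);  // [10Kb][~4.99Mb Free]
    void* b = pool.allocate(200 * 1024); // [10Kb][200Kb][~4.79Mb Free]
    void* c = pool.allocate(20);         // [10Kb][200Kb][20 bytes][~4.79Mb Free]

    pool.free(b);                        // [10Kb][200Kb Free][20 bytes][~4.79Mb Free]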

With this technique we can therefore keep track of where all our memory is going, how fragmented the pools are getting, whether we are forgetting to free what we have allocated, and so forth.

Now that we know what benefits we get from the pool, let's create one.


1

The first thing we need to do is to define the memory layout. As mentioned before, we will split our memory into chunks. Every chunk has a small header that contains a pointer to the next chunk, a pointer to the previous chunk, the user data size and a flag that says whether the chunk is free or not.

The header is 16 bytes (we assume pointers to be 4 bytes, so this won't work on a 64-bit architecture).
The first 8 bytes store where the next and the previous chunks are in the pool. If the current chunk is the first or the last one, the prev/next pointer will be NULL.
Another 4 bytes store the user data size; this is needed because we want to know how big the block of data after the header is.
The last thing we store in the header is a boolean flag that states whether the chunk is free or used. This takes 1 more byte which, with 3 bytes of padding, means another 4 bytes and brings our grand total to 16 bytes.
Chunk Layout Diagram

The Chunk class looks as follows:

class Chunk
{
public:
    Chunk(DWORD userDataSize) : m_next(NULL), 
                                m_prev(NULL), 
                                m_userdataSize(userDataSize), 
                                m_free(true) {};
    void write(void* dest)
    { 
        memcpy(dest, this, sizeof(Chunk) ); 
    }
    void read(void* src)
    { 
        memcpy(this, src, sizeof(Chunk) ); 
    }

    Chunk*  m_next;
    Chunk*  m_prev;
    DWORD   m_userdataSize;
    bool    m_free;
};
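Since all the pool arithmetic relies on the header being exactly 16 bytes, it is worth verifying the assumption at compile time. The check below is not part of the original source, just a small addition you may want on a 32-bit build:

// Optional sanity check (not in the original source): the pool code assumes
// the header occupies exactly 16 bytes, which only holds with 4-byte pointers.
static_assert(sizeof(Chunk) == 16, "Chunk header is expected to be 16 bytes");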
                                    

2

Now that we have defined the smallest unit in our memory management system we can write the memory pool itself.
We want our memory pool object to allow us to allocate a new block and free an existing block. We also require any pool to be able to check its own integrity and to dump the whole pool to a text file.
We call our heap-like pool StandardMemoryPool, which inherits from MemoryPool. This will allow us to add new memory pools with different behaviours, like fixed-size chunk pools or virtual memory pools.

First of all, let's take a good look at the abstract memory pool class. Our header file starts with some trashing options.

#ifdef _DEBUG
#define TRASH_POOLS 1
#else
#define TRASH_POOLS 0
#endif

Trashing the pool is a useful feature that we want to have in our pools. The idea is to write a known pattern into the user data block on allocation and on deallocation. This way we can easily see whether a chunk has been used and in what state it was left.
The trashing feature is useful in a debug environment, but since it requires filling memory on every allocation and deallocation we don't really want it active in release builds.
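As a quick illustration (not taken from the original source) of why the patterns help, imagine a dangling pointer into a debug build of the pool we write in the following sections, using the signatures defined just below:

    // Illustration only: with trashing on, stale memory is easy to recognise
    StandardMemoryPool pool(1024, false);           // hypothetical small pool
    int* value = (int*)pool.allocate(sizeof(int));  // data filled with 0xAB on alloc
    *value = 42;
    pool.free(value);                               // block filled with 0xFE on free
    // In the debugger *value now reads 0xFEFEFEFE: an obvious use-after-free
    // instead of a plausible-looking stale 42.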

/**
*   Abstract memory pool class
*/
class MemoryPool
{
public:
    // Methods
    //-----------------------------------------    
    inline virtual void*    allocate(DWORD size) = 0;
    inline virtual void     free(void* ptr) = 0;
    inline virtual bool     integrityCheck() const = 0;
    inline virtual void     dumpToFile(const std::string& fileName, 
                                    const DWORD itemsPerLine) const = 0;
                                    

These four methods are the core of every memory pool and they are all pure virtual. They are the most important bits of this class; the rest is pretty straightforward:

    inline DWORD            getFreePoolSize() const { return m_freePoolSize; }
    inline DWORD            getTotalPoolSize() const { return m_totalPoolSize; }
    inline bool             hasBoundsCheckOn() const { return m_boundsCheck; }

    // Static
    //-----------------------------------------    
    static const BYTE   s_trashOnCreation = 0xCC;
    static const BYTE   s_trashOnAllocSignature = 0xAB;
    static const BYTE   s_trashOnFreeSignature  = 0xFE;
    static const BYTE   s_boundsCheckSize = 16;
    static const BYTE   s_startBound[s_boundsCheckSize];
    static const BYTE   s_endBound[s_boundsCheckSize];

protected:
    // Ctor/Dtor
    //-----------------------------------------
    MemoryPool() 
            : m_totalPoolSize(0)
            , m_freePoolSize(0)
            , m_trashOnCreation(TRASH_POOLS)
            , m_trashOnAlloc(TRASH_POOLS)
            , m_trashOnFree(TRASH_POOLS)
            , m_boundsCheck(0) 
            {};
    virtual ~MemoryPool(){};

    // Variables
    //-----------------------------------------    
    DWORD       m_totalPoolSize;
    DWORD       m_freePoolSize;

    // Bitfield
    unsigned    m_trashOnCreation : 1;
    unsigned    m_trashOnAlloc : 1;
    unsigned    m_trashOnFree : 1;
    unsigned    m_boundsCheck : 1;

};

You have probably noticed the m_boundsCheck flag and the two s_startBound and s_endBound constants; these are used by any pool to enforce a bounds-check mechanism that will allow us to trace any memory stomp.
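The two guard patterns are defined in the source file; judging from the memory dumps further down they are 16-byte human-readable markers, roughly like this (a sketch, the exact definitions ship with the downloadable source):

// Sketch of the guard patterns: readable markers make a stomp easy to spot
// in a hex view, as the dumps below show.
const BYTE MemoryPool::s_startBound[s_boundsCheckSize] =
    { '[','B','l','o','c','k','.','.','.','.','S','t','a','r','t',']' };
const BYTE MemoryPool::s_endBound[s_boundsCheckSize] =
    { '[','B','l','o','c','k','.','.','.','.','.','.','E','n','d',']' };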


3


Finally we can create our first pool. We start by writing the constructor that sets up the pool for us:

// Ctors/Dtor    
//-----------------------------------------    
StandardMemoryPool(DWORD sizeInBytes, bool boundCheck)
{
    //Set the bound check
    if(boundCheck) m_boundsCheck = 1;

    //Allocate the total memory
    m_poolMemory = ::new unsigned char[sizeInBytes];

The first important thing: we allocate exactly as many bytes as requested. This is the only allocation we will perform for this pool; once we get the memory from the OS we handle it manually.

    m_freePoolSize = sizeInBytes - sizeof(Chunk);
    m_totalPoolSize = sizeInBytes;

Since we are writing the Chunk header information straight into the pool, the headers themselves consume memory. This means that our actual free space is always smaller than the total space. The more chunks we have, the more memory we need to "waste".

    // Trash it if required        
    if(m_trashOnCreation)
        memset(m_poolMemory, s_trashOnCreation, sizeInBytes);

    //Allocate the first free block        
    if(m_boundsCheck)
    {
        m_freePoolSize -= s_boundsCheckSize * 2;

        Chunk freeChunk(sizeInBytes - sizeof(Chunk) - 2 * s_boundsCheckSize);
        freeChunk.write(m_poolMemory + s_boundsCheckSize);
        memcpy( m_poolMemory, s_startBound, s_boundsCheckSize );
        memcpy( m_poolMemory + sizeInBytes - s_boundsCheckSize , 
                s_endBound, s_boundsCheckSize );
    }
    else
    {
        Chunk freeChunk(sizeInBytes - sizeof(Chunk));
        freeChunk.write(m_poolMemory);
    }
}
~StandardMemoryPool()
{
    //Deallocate  
    ::delete[] m_poolMemory;
}

We finally create the first big block. Obviously this block is free since nothing has been allocated yet.
It's important to notice the two memcpy calls in the bounds-check version of the code. They put the two guards in place. Having these guards in place will also cost extra memory for every chunk.

Let's now have a look at what a 64-byte pool looks like after creation. Notice how different the free size is in the version with bounds check on. The header fields (next, prev, size, and flag + padding) occupy the first 16 bytes at the chunk's address in the dumps below.

Memory pool ----------------------------------
Type: Standard Memory
Total Size: 64
Free Size: 48


Free:	0x001e8680 [Bytes:48]

Memory Dump:
Start: 0x001e8680

0x001e8680: 00:00:00:00:00:00:00:00:30:00:00:00:01:cc:cc:cc          0   ÌÌÌ
0x001e8690: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x001e86a0: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x001e86b0: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
Memory pool ----------------------------------
Type: Standard Memory
Total Size: 64
Free Size: 16

Free:	0x001e8aa8 [Bytes:16]


Memory Dump:
Start: 0x001e8a98

0x001e8a98: 5b:42:6c:6f:63:6b:2e:2e:2e:2e:53:74:61:72:74:5d  [Block....Start]
0x001e8aa8: 00:00:00:00:00:00:00:00:10:00:00:00:01:cc:cc:cc             ÌÌÌ
0x001e8ab8: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x001e8ac8: 5b:42:6c:6f:63:6b:2e:2e:2e:2e:2e:2e:45:6e:64:5d  [Block......End]
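For reference, dumps like the two above can be produced along these lines (a sketch; the file names and the 16 items per line are arbitrary choices):

    // Sketch: create a 64-byte pool with and without guards and dump both
    StandardMemoryPool plainPool(64, false);   // no bounds check -> 48 bytes free
    StandardMemoryPool guardedPool(64, true);  // bounds check on -> 16 bytes free
    plainPool.dumpToFile("plain_pool.txt", 16);
    guardedPool.dumpToFile("guarded_pool.txt", 16);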

4

Now the interesting part: how to implement those four core functions in our heap-like memory pool. Let's go straight to the code!
First of all we write our allocation function. The allocation function must find a free block big enough for the required memory. Once found, it updates the neighbour blocks to point to the newcomer and then returns a pointer to the user data.

inline void*    StandardMemoryPool::allocate(DWORD _size)
{
    DWORD requiredSize = _size + sizeof(Chunk);

    // If guards are required, add their size
    if(m_boundsCheck)
        requiredSize += s_boundsCheckSize * 2;

    // Now search for a block big enough
    Chunk* block = (Chunk*)( m_boundsCheck == 1 ? m_poolMemory + 
            s_boundsCheckSize : m_poolMemory);
    while(block)
    {
        if(block->m_free && block->m_userdataSize >= requiredSize ) break;
        block = block->m_next;
    }

    BYTE* blockData = (BYTE*)block;

    // If no block is found, return NULL
    if(!block) return NULL;

Straightforward so far. Notice how the required block size increases when bounds check is on. Once we have found a free block we create a new used block with part of the free memory and we update the neighbour blocks' pointers:

    // If the block is valid, create a new free block with 
   // what remains of the block memory
    DWORD freeUserDataSize = block->m_userdataSize - requiredSize;
    if( freeUserDataSize > s_minFreeBlockSize)
    {
        Chunk freeBlock(freeUserDataSize);
        freeBlock.m_next = block->m_next;
        freeBlock.m_prev = block;
        freeBlock.write( blockData + requiredSize );
        if(freeBlock.m_next)
            freeBlock.m_next->m_prev = (Chunk*)(blockData + requiredSize);
        if(m_boundsCheck)
            memcpy( blockData + requiredSize - s_boundsCheckSize, s_startBound, 
                    s_boundsCheckSize );
        block->m_next = (Chunk*)(blockData + requiredSize);
        block->m_userdataSize = _size;
    }

    // If a block is found, update the pool size
    m_freePoolSize -= block->m_userdataSize;

    // Set the memory block
    block->m_free = false;

So, if the leftover memory is bigger than s_minFreeBlockSize (a constant from the full source that stops us from creating uselessly tiny blocks) we split: the free block we found is marked as used and its size is set to the requested size, while a new free block inherits the remaining memory. Otherwise we simply hand out the whole block.
This is a 128-byte pool before allocation:

0x00268ed0: 00:00:00:00:00:00:00:00:70:00:00:00:01:cc:cc:cc          p   ÌÌÌ
0x00268ee0: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268ef0: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f00: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f10: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f20: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f30: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f40: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ

Same pool after allocation:

0x00268ed0: 00:8f:26:00:00:00:00:00:20:00:00:00:00:cc:cc:cc   &          ÌÌÌ
0x00268ee0: ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab  ««««««««««««««««
0x00268ef0: ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab:ab  ««««««««««««««««
0x00268f00: 00:00:00:00:d0:8e:26:00:40:00:00:00:01:cc:cc:cc      ÐŽ& @   ÌÌÌ
0x00268f10: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f20: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f30: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ
0x00268f40: cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc:cc  ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ

The two headers sit at 0x00268ed0 and 0x00268f00. Notice the next pointer and the prev pointer, which are no longer NULL. Also check the flag of the first block: it's now zero because the block is in use. The first block's user data has been trashed with the 0xab pattern, so we can easily see where the user data is and tell that it has just been allocated.

The last part of the allocation function deals with bounds check and trashing.

    // Move the memory around if guards are needed
    if(m_boundsCheck)
    {
        memcpy( blockData - s_boundsCheckSize, s_startBound, s_boundsCheckSize );
        memcpy( blockData + sizeof(Chunk) + block->m_userdataSize, s_endBound, 
                s_boundsCheckSize );
    }

    //Trash on alloc if required
    if(m_trashOnAlloc)
        memset(blockData + sizeof(Chunk), s_trashOnAllocSignature, 
                block->m_userdataSize);
    
    return (blockData + sizeof(Chunk));
}

5

Now the free function. The thing worth noticing in this function is that it tries to merge any free block before and after the block we are freeing. This guarantees that we will never have a free block followed (or preceded) by another free block.

inline void    StandardMemoryPool::free(void* ptr)
{
    // is a valid node?
    if(!ptr) return;
    Chunk* block = (Chunk*)( (BYTE*)ptr - sizeof(Chunk) );
    ASSERT(block->m_free == false, "This block is already free");
    if(block->m_free) return;

The pointer we are given points to the user data. We need to move it back by the size of the header to get to the Chunk structure.

    DWORD fullBlockSize = block->m_userdataSize + sizeof(Chunk) + 
                        (m_boundsCheck == 1 ? s_boundsCheckSize * 2 : 0);
    m_freePoolSize += block->m_userdataSize;

    Chunk* headBlock = block;
    Chunk* prev = block->m_prev;
    Chunk* next = block->m_next;
    

headBlock will be the head of the merged free block. prev and next are pointers relative to headBlock. Initially headBlock is the block we want to free, but if the previous block is also free then that one becomes the headBlock and we increase the size to cover both blocks.

    // If the node before is free I merge it with this one
    if(block->m_prev && block->m_prev->m_free)
    {
        headBlock = block->m_prev;
        prev = block->m_prev->m_prev;
        next = block->m_next;

        // Include the prev node in the block size so we trash it as well
        fullBlockSize += m_boundsCheck == 1 ? 
          block->m_prev->m_userdataSize + 
          sizeof(Chunk) + s_boundsCheckSize * 2 : block->m_prev->m_userdataSize + 
          sizeof(Chunk);

        // If there is a next one, we need to update its pointer
        if(block->m_next)
        {
            // We will re point the next
            block->m_next->m_prev = headBlock;

            // Include the next node in the block size if it is 
           // free so we trash it as well
            if( block->m_next->m_free )
            {
                // We will point to next's next
                next = block->m_next->m_next;
                if(block->m_next->m_next)
                    block->m_next->m_next->m_prev = headBlock;

                fullBlockSize += m_boundsCheck == 1 ? 
                    block->m_next->m_userdataSize + 
                    sizeof(Chunk) + s_boundsCheckSize * 2 : 
                    block->m_next->m_userdataSize + 
                    sizeof(Chunk);
            }
        }
    }

If the previous block is not free but the next one is, we merge the next block into the current headBlock.

    else
    // If the next node is free let's merge it with the current one
    if(block->m_next && block->m_next->m_free)
    {
        headBlock = block;
        prev = block->m_prev;
        next = block->m_next->m_next;

        // Re-point the block that follows the merged region to the new head
        if(next)
            next->m_prev = headBlock;

        // Include the next node in the block size so we trash it as well
        fullBlockSize += m_boundsCheck == 1 ? 
            block->m_next->m_userdataSize + 
            sizeof(Chunk) + s_boundsCheckSize * 2 : 
            block->m_next->m_userdataSize + 
            sizeof(Chunk);
    }
    

Now that we have modified all the pointers we create the free block itself.

    // Create the free block
    BYTE* freeBlockStart = (BYTE*)headBlock;
    if(m_trashOnFree)
        memset( m_boundsCheck == 1 ? freeBlockStart - 
                s_boundsCheckSize : freeBlockStart, 
                s_trashOnFreeSignature, fullBlockSize );

    DWORD freeUserDataSize = fullBlockSize - sizeof(Chunk);
    freeUserDataSize = (m_boundsCheck == 1) ? freeUserDataSize - 
                        s_boundsCheckSize * 2 : freeUserDataSize;    

    Chunk freeBlock(freeUserDataSize);
    freeBlock.m_prev = prev;
    freeBlock.m_next = next;
    freeBlock.write(freeBlockStart);

    // Move the memory around if guards are needed
    if(m_boundsCheck)
    {
        memcpy( freeBlockStart - s_boundsCheckSize, s_startBound, 
                s_boundsCheckSize );
        memcpy( freeBlockStart + sizeof(Chunk) + freeUserDataSize, 
                s_endBound, s_boundsCheckSize );
    }
}

That's it. The hard part is done. 
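Of the four core methods, integrityCheck and dumpToFile are left to the downloadable source. As a rough idea of what the integrity walk could look like, here is a sketch (not the author's implementation) that follows the chunk list and verifies the links and the guard patterns:

// Sketch only: walk the chunk list and verify links and guards.
inline bool StandardMemoryPool::integrityCheck() const
{
    // The first header sits after the start guard when bounds check is on
    Chunk* block = (Chunk*)( m_boundsCheck == 1 ? 
            m_poolMemory + s_boundsCheckSize : m_poolMemory );

    while(block)
    {
        // The doubly linked list must be consistent
        if(block->m_next && block->m_next->m_prev != block)
            return false;

        // With guards on, both patterns must be intact around every chunk
        if(m_boundsCheck)
        {
            BYTE* data = (BYTE*)block;
            if( memcmp(data - s_boundsCheckSize, s_startBound, 
                       s_boundsCheckSize) != 0 ) return false;
            if( memcmp(data + sizeof(Chunk) + block->m_userdataSize, 
                       s_endBound, s_boundsCheckSize) != 0 ) return false;
        }
        block = block->m_next;
    }
    return true;
}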
Now, since we want to handle several pools, we may want an XML file that stores the configuration of every pool. This means that we need a manager class and some XML parsing code. To parse the XML I've used a small open-source parser called TinyXML; it's included in the compressed file.

The memory pool manager class is simply a singleton that contains a hashtable with all the pools. When constructed, the manager searches for "pool.xml" and generates all the pools specified in that file. We can then get a pool by name as follows:

    MemoryPool* pool = MemoryPoolManager::it().getPool(name);

To make it easy to use we probably want to overload new and delete. This is done in Allocation.h. This file also contains some useful defines that allow us to allocate and deallocate by simply invoking a macro, as follows:

    MyClass* object = NEW(POOL("MyPool")) MyClass(param1, param2);
    DELETE(POOL("MyPool"), object);
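Allocation.h ships with the source; the idea behind those macros is roughly an operator new overload that takes the pool as a placement argument. The definitions below are guesses inferred from the usage above (the poolDelete helper in particular is hypothetical), not the actual file:

// Sketch of the idea behind Allocation.h; names and details are assumptions.
inline void* operator new(size_t size, MemoryPool* pool)
{
    return pool->allocate((DWORD)size);  // route the allocation to the pool
}
inline void operator delete(void* ptr, MemoryPool* pool)
{
    pool->free(ptr);                     // only called if a constructor throws
}

template <typename T>
void poolDelete(MemoryPool* pool, T* object)  // hypothetical helper
{
    if(object)
    {
        object->~T();        // run the destructor manually
        pool->free(object);  // hand the raw memory back to the pool
    }
}

#define POOL(name)           MemoryPoolManager::it().getPool(name)
#define NEW(pool)            new (pool)
#define DELETE(pool, object) poolDelete(pool, object)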

That's it for now. I'm considering whether it's worth splitting this tutorial into two parts to go into more detail, but I'm sure that if you are interested in memory management you are good enough to dig any extra information out of the source! :)

 





Only the Visual Studio 2010 project file (and source code) is provided
Download sources (rar ~48 Kb)