Cuda allocator class that uses the buddy system.
Public Member Functions
CudaPtr<void> | alloc(std::size_t sz, std::int16_t device) | Returns a usable block of memory of size sz on the device given by device if possible, otherwise on any device.
void | free(CudaPtr<void> ptr) | Frees the CudaPtr ptr by returning it to the free list.
std::uint16_t | get_max_size() const | Returns the value of the max_size variable.
std::uint16_t | get_min_size() const | Returns the value of the min_size variable.
void | initialise() | Initialises the allocator; called from Cuda::setup().
void | print_status() | Prints the current allocator status.
std::uint16_t | set_max_size(std::uint16_t sz) | Sets the max_size variable.
std::uint16_t | set_min_size(std::uint16_t sz) | Sets the min_size variable.
| ~CudaAllocator() | Destructs the allocator and checks that all memory was freed correctly.
Private Types
using | DeviceFreeList = Vec<SizeFreeList> | Vector of sized free lists.
using | GlobalFreeList = std::map<std::int16_t, DeviceFreeList> | Map of devices to DeviceFreeList objects.
using | GlobalMutexes = std::map<std::int16_t, std::mutex> | Map of devices to mutexes.
using | SizeFreeList = Pair<std::uint64_t, Vec<void *>> | Pair of [block size, vector of free regions]: Pair::first holds the block size and Pair::second holds the pointers to the free regions of that size.
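The private types can be pictured with standard containers. The sketch below assumes that the project's Vec and Pair are aliases for std::vector and std::pair, and that index idx of a DeviceFreeList holds blocks of 2**(min_size + idx) bytes; make_device_free_list and setup_free_lists are hypothetical helpers, not members of CudaAllocator.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Assumption: Vec and Pair behave like std::vector and std::pair.
using SizeFreeList   = std::pair<std::uint64_t, std::vector<void*>>;  // [block size, free regions]
using DeviceFreeList = std::vector<SizeFreeList>;                     // one entry per size class
using GlobalFreeList = std::map<std::int16_t, DeviceFreeList>;        // device -> free lists
using GlobalMutexes  = std::map<std::int16_t, std::mutex>;            // device -> lock

// Hypothetical helper: empty free lists for block sizes
// 2**min_exp, 2**(min_exp + 1), ..., 2**max_exp (index idx -> 2**(min_exp + idx)).
DeviceFreeList make_device_free_list(std::uint16_t min_exp, std::uint16_t max_exp) {
    assert(min_exp <= max_exp && max_exp < 64);
    DeviceFreeList lists;
    for (unsigned k = min_exp; k <= max_exp; ++k) {
        lists.emplace_back(std::uint64_t{1} << k, std::vector<void*>{});
    }
    return lists;
}

// Hypothetical setup: creates every device's entries once, before any
// concurrent use, because inserting into a std::map while other threads
// read it is a data race. std::mutex is neither copyable nor movable,
// so it is constructed in place with try_emplace.
void setup_free_lists(GlobalFreeList& freelist, GlobalMutexes& mutexes,
                      int device_count, std::uint16_t min_exp, std::uint16_t max_exp) {
    for (int d = 0; d < device_count; ++d) {
        const auto dev = static_cast<std::int16_t>(d);
        freelist[dev] = make_device_free_list(min_exp, max_exp);
        mutexes.try_emplace(dev);
    }
}
```

Populating both maps once during initialisation means that later threads only look up existing keys and then lock the per-device mutex, instead of inserting into a shared std::map concurrently.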
Private Member Functions
std::uint64_t | alloc_exp2(std::uint64_t x) | Helper function that calculates 2**x.
void | free_memory(void *ptr) | Helper function that returns a memory allocation to the OS.
int | get_from_upper(std::int16_t device, std::int16_t idx, bool allocate_new_memory) | Ensures that the sized free list of device device at index idx contains at least one element, either by splitting a block from the upper level or, if allocate_new_memory is true, by allocating new memory from the OS.
void * | get_memory(std::size_t sz, std::int16_t device) | Helper function that obtains a block of memory of size sz from CUDA on device device.
void | handle_or_insert(std::int16_t device, void *ptr, std::int16_t idx) | Inserts the pointer ptr into the sized free list of device device at size index idx and handles any resulting merges.
Private Attributes
Vec<void *> | block_pointers = {} | The blocks used for buddy allocations.
GlobalFreeList | freelist = {} | The free lists, one DeviceFreeList per device.
GlobalMutexes | freelistmtx = {} | The mutexes associated with the free lists, one per device.
std::uint16_t | max_size = 32 | Base-2 logarithm of the largest block given out, which is also the size of the blocks allocated from CUDA; 2**32 bytes is 4 GiB.
std::uint16_t | min_size = 8 | Base-2 logarithm of the smallest block given out; 2**8 bytes is 256 bytes.
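Because max_size and min_size are exponents, a request of sz bytes has to be rounded up to the next power of two before it maps onto a free list. The following is a minimal sketch of that arithmetic: exp2_u64 plays the role of alloc_exp2, while size_index, along with its convention of counting indices from min_size, is a hypothetical helper rather than the allocator's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 2**x via a shift (cf. alloc_exp2); only defined for x < 64.
constexpr std::uint64_t exp2_u64(std::uint64_t x) {
    return std::uint64_t{1} << x;
}

// Hypothetical helper: smallest exponent k in [min_exp, max_exp] with
// 2**k >= sz, returned as an offset from min_exp (assumed free-list index).
// Returns -1 if sz does not fit into a block of 2**max_exp bytes.
int size_index(std::size_t sz, std::uint16_t min_exp, std::uint16_t max_exp) {
    assert(min_exp <= max_exp && max_exp < 64);
    for (unsigned k = min_exp; k <= max_exp; ++k) {
        if (exp2_u64(k) >= sz) {
            return static_cast<int>(k - min_exp);
        }
    }
    return -1;
}
```

With the defaults, requests of 1 to 256 bytes map to index 0, a 1000-byte request is rounded up to a 1024-byte block at index 2, and there are 32 - 8 + 1 = 25 size classes in total.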
Detailed Description
The allocator obtains blocks of 2**max_size bytes via cudaMalloc() and splits them up as requests come in, down to blocks of no less than 2**min_size bytes. Both max_size and min_size can be set before Cuda is initialised, using Cuda::allocator_set_max_size() and Cuda::allocator_set_min_size().
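The splitting and merging of a buddy system can be sketched on the host without CUDA. This is a minimal, single-threaded illustration under assumed simplifications: one block of 2**max_exp bytes, byte offsets in place of device pointers, and a hypothetical BuddyBlock class whose alloc and free loosely correspond to get_from_upper and handle_or_insert. A block's buddy is found by flipping the bit equal to its size in its offset, which works because every block is aligned to its own size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical host-only model of ONE buddy block; level i holds free
// blocks of 2**(min_exp + i) bytes, identified by their offset.
class BuddyBlock {
public:
    BuddyBlock(unsigned min_exp, unsigned max_exp) : min_exp_(min_exp) {
        assert(min_exp <= max_exp && max_exp < 64);
        free_.resize(max_exp - min_exp + 1);
        free_.back().push_back(0);  // initially the whole block is free
    }

    // Offset of a free block at level idx, splitting larger blocks as
    // needed, or nullopt if no block is left (a new block would be needed).
    std::optional<std::uint64_t> alloc(std::size_t idx) {
        std::size_t level = idx;
        while (level < free_.size() && free_[level].empty()) ++level;
        if (level == free_.size()) return std::nullopt;
        std::uint64_t off = free_[level].back();
        free_[level].pop_back();
        // Split down: keep the lower half, put the upper half (the buddy) on its free list.
        while (level > idx) {
            --level;
            free_[level].push_back(off + size_of(level));
        }
        return off;
    }

    // Returns a block to level idx, merging with its buddy while the buddy is free.
    void free(std::uint64_t off, std::size_t idx) {
        while (idx + 1 < free_.size()) {
            const std::uint64_t buddy = off ^ size_of(idx);
            auto& list = free_[idx];
            auto it = std::find(list.begin(), list.end(), buddy);
            if (it == list.end()) break;  // buddy still in use: stop merging
            list.erase(it);
            off = std::min(off, buddy);   // merged block starts at the lower offset
            ++idx;
        }
        free_[idx].push_back(off);
    }

    std::size_t free_count(std::size_t idx) const { return free_[idx].size(); }

private:
    std::uint64_t size_of(std::size_t idx) const {
        return std::uint64_t{1} << (min_exp_ + idx);
    }

    unsigned min_exp_;
    std::vector<std::vector<std::uint64_t>> free_;
};
```

The linear std::find keeps the sketch short; a faster lookup structure per level, such as a hash set, would avoid scanning the free list on every merge.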
The allocator is a static object that checks on destruction that all CUDA memory was freed correctly.
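One way such a check can work follows from the merging done by handle_or_insert: once every allocation carved out of a block has been freed, the block has coalesced back into a single entry of the largest size class. The sketch below assumes that Vec and Pair are aliases for std::vector and std::pair and that the largest size class is the last entry of each DeviceFreeList; all_memory_returned is a hypothetical helper, and the real destructor may perform its check differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

// Assumption: Vec and Pair behave like std::vector and std::pair.
using SizeFreeList   = std::pair<std::uint64_t, std::vector<void*>>;
using DeviceFreeList = std::vector<SizeFreeList>;
using GlobalFreeList = std::map<std::int16_t, DeviceFreeList>;

// Hypothetical check: with merging on free, everything was returned iff all
// smaller size classes are empty and the largest ones together hold exactly
// block_count whole blocks.
bool all_memory_returned(const GlobalFreeList& freelist, std::size_t block_count) {
    std::size_t whole_blocks = 0;
    for (const auto& [device, lists] : freelist) {
        for (std::size_t i = 0; i < lists.size(); ++i) {
            const auto& regions = lists[i].second;
            const bool largest = (i + 1 == lists.size());
            if (!largest && !regions.empty()) {
                // An unmerged free region means its buddy is still allocated.
                std::cerr << "device " << device << ": " << regions.size()
                          << " unmerged free region(s) of " << lists[i].first
                          << " bytes, memory still in use\n";
                return false;
            }
            if (largest) whole_blocks += regions.size();
        }
    }
    return whole_blocks == block_count;
}
```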