Cuda allocator class, uses the buddy system.
Collaboration diagram for syten::Cuda::CudaAllocator:
Public Member Functions | |
| CudaPtr< void > | alloc (std::size_t sz, std::int16_t device) |
| Returns a usable block of memory of size sz on device device if possible or on any device otherwise. | |
| void | free (CudaPtr< void > ptr) |
| Handles freeing the CudaPtr ptr by returning it to the free list. | |
| std::uint16_t | get_max_size () const |
| Returns the value of the max_size variable. | |
| std::uint16_t | get_min_size () const |
| Returns the value of the min_size variable. | |
| void | initialise () |
| Initialises the allocator, called from Cuda::setup(). | |
| void | print_status () |
| Prints the current allocator status. | |
| std::uint16_t | set_max_size (std::uint16_t sz) |
| Sets the max_size variable. | |
| std::uint16_t | set_min_size (std::uint16_t sz) |
| Sets the min_size variable. | |
| ~CudaAllocator () | |
| Destructs the allocator and checks that all memory was freed correctly. | |
Private Types | |
| using | DeviceFreeList = Vec< SizeFreeList > |
| Vector of sized free lists. | |
| using | GlobalFreeList = std::map< std::int16_t, DeviceFreeList > |
| Map of devices to DeviceFreeList objects. | |
| using | GlobalMutexes = std::map< std::int16_t, std::mutex > |
| Map of devices to mutexes. | |
| using | SizeFreeList = Pair< std::uint64_t, Vec< void * > > |
| Pair of [size - vector of free regions] with each free region storing its own pointer in Pair::first and its buddy pointer in Pair::second. | |
Private Member Functions | |
| std::uint64_t | alloc_exp2 (std::uint64_t x) |
| Helper function to calculate 2**x. | |
| void | free_memory (void *ptr) |
| Helper function, frees a memory allocation to the OS. | |
| int | get_from_upper (std::int16_t device, std::int16_t idx, bool allocate_new_memory) |
| Ensures that the sized free list of device device at index idx contains at least one element by splitting a block from the upper level or allocating from the OS (if allocate_new_memory is true). | |
| void * | get_memory (std::size_t sz, std::int16_t device) |
| Helper function, obtains a block of memory of size sz from CUDA on device device. | |
| void | handle_or_insert (std::int16_t device, void *ptr, std::int16_t idx) |
| Inserts a pointer ptr into the sized free list of device device at size index idx and handles any resulting merges. | |
Private Attributes | |
| Vec< void * > | block_pointers = {} |
| Our collection of blocks we use for buddy allocations. | |
| GlobalFreeList | freelist = {} |
| Our collection of free lists. | |
| GlobalMutexes | freelistmtx = {} |
| Our collection of associated mutexes. | |
| std::uint16_t | max_size = 32 |
| Exponent of the maximal size of a block given out, which is also the block size allocated from CUDA; 2**32 bytes is 4 GiB. | |
| std::uint16_t | min_size = 8 |
| Exponent of the minimal size of a block given out; 2**8 is 256 bytes. | |
The allocator obtains blocks of 2**max_size bytes via cudaMalloc() and splits them on demand into smaller power-of-two blocks, none smaller than 2**min_size bytes. Both max_size and min_size can be set before Cuda is initialised using Cuda::allocator_set_max_size() and Cuda::allocator_set_min_size().
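The size arithmetic behind a buddy system can be sketched on the host without any CUDA calls. The helpers below are hypothetical (size_class_exponent and buddy_offset are illustrative names, not members of CudaAllocator) and assume the default exponents min_size = 8 and max_size = 32; the real allocator may index its free lists differently.

```cpp
#include <cstdint>

// Hypothetical helper: smallest exponent e in [min_exp, max_exp] with
// 2**e >= sz, i.e. the size class a request of sz bytes would fall into.
// Returns -1 if the request is larger than 2**max_exp.
// Assumes 0 <= min_exp <= max_exp < 64.
int size_class_exponent(std::uint64_t sz, int min_exp = 8, int max_exp = 32) {
    for (int e = min_exp; e <= max_exp; ++e) {
        if ((std::uint64_t{1} << e) >= sz) return e;
    }
    return -1;
}

// Hypothetical helper: a block of size 2**exp at byte offset `offset`
// inside its parent block has its buddy at the offset with bit `exp`
// flipped. Two free buddies can be merged into one block of size 2**(exp+1).
std::uint64_t buddy_offset(std::uint64_t offset, int exp) {
    return offset ^ (std::uint64_t{1} << exp);
}
```

Under these assumptions a 1000-byte request lands in the 2**10 class, and any request of at most 256 bytes is served from the 2**8 class.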
The allocator is a static object, which checks on destruction that all CUDA memory was freed correctly.
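A destruction-time check of this kind might look roughly like the following host-only sketch. TrackingAllocator is a hypothetical stand-in that uses ::operator new instead of cudaMalloc(); it only illustrates the idea of reporting blocks that were never returned and does not reproduce the actual CudaAllocator destructor.

```cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <set>

// Hypothetical stand-in: records every block handed out and reports on
// destruction any block that was never passed back to free().
class TrackingAllocator {
public:
    void* alloc(std::size_t sz) {
        void* p = ::operator new(sz);
        live_.insert(p);
        return p;
    }

    void free(void* p) {
        // Only release pointers that this allocator actually handed out.
        if (live_.erase(p) == 1) ::operator delete(p);
    }

    std::size_t outstanding() const { return live_.size(); }

    ~TrackingAllocator() {
        if (!live_.empty()) {
            std::cerr << "leak: " << live_.size() << " block(s) never freed\n";
            for (void* p : live_) ::operator delete(p);
        }
    }

private:
    std::set<void*> live_;  // blocks currently handed out
};
```

If such an object has static storage duration, its destructor runs during program shutdown, so the report appears after main() has returned.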