A custom, naive CUDA dot product for complex scalars.
#include <cstdint>
#include <complex>
Namespaces | |
namespace | syten |
Syten namespace. | |
namespace | syten::Cuda |
Support functions (memory allocation etc.) for CUDA-based GPUs. | |
Functions | |
void | syten::Cuda::cuda_dot_conj_kernel_impl (std::size_t sz, const std::complex< double > *to_be_conj_a, const std::complex< double > *b, std::complex< double > *result, void *cuda_stream) |
Calculates the scalar product of two CUDA arrays. | |
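For checking results on the host, a plain C++ reference can be handy. The sketch below assumes, based only on the parameter name `to_be_conj_a`, that the kernel computes the sum over `i` of `conj(to_be_conj_a[i]) * b[i]`. The function `reference_dot_conj` is a hypothetical helper that is not part of syten, and it works on host memory rather than on CUDA arrays.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical host-side reference, not part of syten.
// Assumption: the CUDA kernel conjugates its first argument, as the
// parameter name `to_be_conj_a` suggests, and sums conj(a[i]) * b[i].
std::complex<double> reference_dot_conj(std::size_t sz,
                                        const std::complex<double>* to_be_conj_a,
                                        const std::complex<double>* b) {
    std::complex<double> result(0.0, 0.0);
    for (std::size_t i = 0; i < sz; ++i) {
        // Conjugate the first operand, then accumulate the product.
        result += std::conj(to_be_conj_a[i]) * b[i];
    }
    return result;
}
```

Comparing this value against the `result` copied back from the device, with a tolerance that allows for a different summation order, is one way to sanity-check the kernel.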
Variables | |
constexpr std::size_t | cukrn_dot_elems_per_worker = 4 |
Number of elements each thread adds up in the dot kernel; 4 seems to be the optimum for a Tesla P100 in a real-world test. | |
constexpr std::size_t | cukrn_dot_threads = 16 |
Number of threads per thread block for the dot kernel; 16 seems to be the optimum for a Tesla P100 in a real-world test. | |
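To illustrate how these two constants could determine the launch geometry, here is a minimal sketch. It assumes that every thread in a block sums `cukrn_dot_elems_per_worker` elements, so one block covers 4 * 16 = 64 elements. The helpers `elems_per_block` and `dot_blocks_needed` are hypothetical and may not match how the actual kernel computes its grid size.

```cpp
#include <cstddef>

// Values copied from the variable listing above.
constexpr std::size_t cukrn_dot_elems_per_worker = 4;
constexpr std::size_t cukrn_dot_threads = 16;

// Hypothetical helper: elements covered by one thread block, assuming each
// of its threads adds up cukrn_dot_elems_per_worker elements.
constexpr std::size_t elems_per_block() {
    return cukrn_dot_elems_per_worker * cukrn_dot_threads;
}

// Hypothetical helper: number of thread blocks needed to cover sz elements.
// Rounds up so that a partially filled last block is still counted.
constexpr std::size_t dot_blocks_needed(std::size_t sz) {
    return (sz + elems_per_block() - 1) / elems_per_block();
}
```

Under these assumptions, an input of 1000 elements would need 16 blocks, with the last block only partly filled.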