A custom, naive CUDA dot product for complex scalars.
#include <cstdint>
#include <complex>
[Figure: include dependency graph for cuda_dot_cukrn.h]
[Figure: graph of the files that directly or indirectly include this file]
Namespaces | |
| namespace | syten |
| Syten namespace. | |
| namespace | syten::Cuda |
| Support functions (memory allocation etc.) for CUDA-based GPUs. | |
Functions | |
| void | syten::Cuda::cuda_dot_conj_kernel_impl (std::size_t sz, const std::complex< double > *to_be_conj_a, const std::complex< double > *b, std::complex< double > *result, void *cuda_stream) |
| Calculates the scalar product of two CUDA arrays; as the parameter name indicates, the first array (to_be_conj_a) is the one to be conjugated. | |
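As a point of comparison for the device routine, the following is a minimal host-side sketch of the quantity it appears to compute. It assumes that `to_be_conj_a` is conjugated element-wise, as its name suggests, and that the result is the plain sum over all `sz` products. The function `reference_dot_conj` is hypothetical and not part of the documented header; it runs on ordinary host memory, not on CUDA arrays.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical host-side reference (NOT part of cuda_dot_cukrn.h).
// Assumption: the kernel computes sum_i conj(to_be_conj_a[i]) * b[i].
std::complex<double> reference_dot_conj(std::size_t sz,
                                        const std::complex<double>* to_be_conj_a,
                                        const std::complex<double>* b) {
    std::complex<double> sum{0.0, 0.0};
    for (std::size_t i = 0; i < sz; ++i) {
        // Conjugate the first operand only, then accumulate the product.
        sum += std::conj(to_be_conj_a[i]) * b[i];
    }
    return sum;
}
```

A reference like this can be handy for checking a GPU result copied back to the host, bearing in mind that a parallel reduction may sum in a different order and therefore differ in the last few bits.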
Variables | |
| constexpr std::size_t | cukrn_dot_elems_per_worker = 4 |
Number of elements each thread adds up in the dot kernel; 4 seems to be the optimum for a Tesla P100 in a real-world test. | |
| constexpr std::size_t | cukrn_dot_threads = 16 |
Number of threads per thread block for the dot kernel; 16 seems to be the optimum for a Tesla P100 in a real-world test. | |
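To illustrate how the two constants might combine, here is a small sketch of the launch arithmetic. It assumes that every thread sums `cukrn_dot_elems_per_worker` consecutive elements and that a block therefore covers `cukrn_dot_threads * cukrn_dot_elems_per_worker` elements; the actual kernel may partition the work differently. The helpers `elems_per_block` and `blocks_needed` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>

// Values as documented above.
constexpr std::size_t cukrn_dot_elems_per_worker = 4;
constexpr std::size_t cukrn_dot_threads = 16;

// Hypothetical helper: elements covered by one thread block,
// assuming each thread handles cukrn_dot_elems_per_worker elements.
constexpr std::size_t elems_per_block() {
    return cukrn_dot_threads * cukrn_dot_elems_per_worker;
}

// Hypothetical helper: number of blocks needed to cover sz elements
// (ceiling division, so a partially filled last block is counted).
constexpr std::size_t blocks_needed(std::size_t sz) {
    return (sz + elems_per_block() - 1) / elems_per_block();
}

static_assert(elems_per_block() == 64, "16 threads x 4 elements each");
```

Under these assumptions, a vector of 1000 elements would need 16 blocks of 64 elements, with the last block only partly filled.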