cuda_transpose_kernel_impl() [4/4]

void syten::CudaDenseTensorImpl::cuda_transpose_kernel_impl (
    std::uint32_t  rank,
    std::size_t  sz,
    const std::complex< float > *  inp,
    std::complex< float > *  out,
    cukrn_transpose_array const &  old_dim,
    cukrn_transpose_array const &  new_dim,
    cukrn_transpose_array const &  ar_perm,
    void *  str,
    bool  do_conj
)

Launcher for the CUDA tensor transposition kernel, std::complex<float> version.

Parameters
    rank      number of tensor indices
    sz        number of elements in the tensor
    inp       input tensor array, pointer to device memory
    out       output tensor array, pointer to device memory
    old_dim   dimensions of the input tensor as a simple POD
    new_dim   dimensions of the output tensor as a simple POD
    ar_perm   permutation, perm[i] = j puts the old leg j at position i+1
    str       CUDA stream in which the computation will take place
    do_conj   whether to conjugate every entry
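
The permutation convention above can be illustrated with a small host-side reference. This is only a sketch and not part of the library: `reference_transpose` is a hypothetical helper, `std::vector` stands in for `cukrn_transpose_array` (whose layout is not described here), and row-major storage (last index varies fastest) is an assumption that may not match the actual device kernel. It follows the documented rule that `perm[i] = j` moves old leg `j` to new position `i`, counting from zero, so that `new_dim[i] = old_dim[perm[i]]`.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the transposition semantics; not the library's code.
// ASSUMPTION: row-major storage (the last index varies fastest).
std::vector<std::complex<float>> reference_transpose(
    std::vector<std::complex<float>> const& inp,
    std::vector<std::size_t> const& old_dim,
    std::vector<std::uint32_t> const& perm,
    bool do_conj)
{
    std::size_t const rank = old_dim.size();

    // perm[i] = j puts old leg j at new position i, so new_dim[i] = old_dim[perm[i]].
    std::vector<std::size_t> new_dim(rank);
    for (std::size_t i = 0; i < rank; ++i) new_dim[i] = old_dim[perm[i]];

    // Row-major strides of the input tensor.
    std::vector<std::size_t> old_stride(rank, 1);
    for (std::size_t i = rank; i-- > 1;)
        old_stride[i - 1] = old_stride[i] * old_dim[i];

    std::vector<std::complex<float>> out(inp.size());
    std::vector<std::size_t> idx(rank, 0);  // multi-index into the output tensor

    for (std::size_t o = 0; o < out.size(); ++o) {
        // Output index i runs over old leg perm[i], so use that leg's stride.
        std::size_t in_off = 0;
        for (std::size_t i = 0; i < rank; ++i)
            in_off += idx[i] * old_stride[perm[i]];

        out[o] = do_conj ? std::conj(inp[in_off]) : inp[in_off];

        // Advance the output multi-index, last index fastest.
        for (std::size_t i = rank; i-- > 0;) {
            if (++idx[i] < new_dim[i]) break;
            idx[i] = 0;
        }
    }
    return out;
}
```

A reference like this could be used to check the device result on small tensors: copy `out` back to the host and compare it element by element. For a 2x3 input with `perm = {1, 0}` it produces the 3x2 matrix transpose, conjugated when `do_conj` is true.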