SyTen

cuda_transpose_kernel_impl() [2/4]

void syten::CudaDenseTensorImpl::cuda_transpose_kernel_impl(
    std::uint32_t rank,
    std::size_t sz,
    const float* inp,
    float* out,
    cukrn_transpose_array const& old_dim,
    cukrn_transpose_array const& new_dim,
    cukrn_transpose_array const& ar_perm,
    void* str,
    bool do_conj
)

Launcher for the CUDA tensor transposition kernel, float version.

Parameters
rank      number of tensor indices
sz        number of elements in the tensor
inp       input tensor array, pointer to device memory
out       output tensor array, pointer to device memory
old_dim   dimensions of the input tensor as a simple POD
new_dim   dimensions of the output tensor as a simple POD
ar_perm   permutation, perm[i] = j puts the old leg j at position i+1
str       CUDA stream in which the computation will take place
do_conj   whether to conjugate every entry (ignored)
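
The permutation convention for ar_perm can be illustrated with a small host-side reference. This is only a sketch and not part of SyTen: the function name transpose_reference is hypothetical, plain std::vector stands in for cukrn_transpose_array and for device memory, and two things the documentation above does not state are assumed, namely row-major storage (last index fastest) and one-based leg labels j in perm[i] = j. If the real layout or labelling differs, the stride computation and the "- 1" offsets would need to change accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference for the permutation semantics; not SyTen code.
// ASSUMPTION: row-major storage (last index runs fastest).
// ASSUMPTION: perm holds one-based leg labels, so perm[i] = j means
//             "old leg j ends up at (one-based) position i+1".
std::vector<float> transpose_reference(std::vector<float> const& inp,
                                       std::vector<std::size_t> const& old_dim,
                                       std::vector<std::uint32_t> const& perm) {
    std::size_t const rank = old_dim.size();
    assert(perm.size() == rank);

    // The extent of new leg i is the extent of old leg perm[i].
    std::vector<std::size_t> new_dim(rank);
    for (std::size_t i = 0; i < rank; ++i) new_dim[i] = old_dim[perm[i] - 1];

    // Row-major strides of the input tensor.
    std::vector<std::size_t> old_stride(rank, 1);
    for (std::size_t i = rank; i-- > 1;)
        old_stride[i - 1] = old_stride[i] * old_dim[i];

    std::vector<float> out(inp.size());
    std::vector<std::size_t> idx(rank, 0);  // multi-index into the output
    for (std::size_t lin = 0; lin < out.size(); ++lin) {
        // Output index i addresses old leg perm[i], so use that leg's stride.
        std::size_t src = 0;
        for (std::size_t i = 0; i < rank; ++i)
            src += idx[i] * old_stride[perm[i] - 1];
        out[lin] = inp[src];

        // Advance the output multi-index in row-major order.
        for (std::size_t i = rank; i-- > 0;) {
            if (++idx[i] < new_dim[i]) break;
            idx[i] = 0;
        }
    }
    return out;
}
```

Under these assumptions, a 2x3 input with perm = {2, 1} yields the 3x2 matrix transpose, and perm = {1, 2} returns the input unchanged. A GPU kernel would typically compute the same source offset once per output element, with one thread per element instead of the outer loop.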