grain.cuda

CUDA wrapper module

Members

Aliases

CudaElementType
alias CudaElementType(M : CuPtr!T, T) = T
alias CudaElementType(M : CuArray!T, T) = T
alias CudaElementType(M : RefCounted!(CuPtr!T), T) = T
alias CudaElementType(M : RefCounted!(CuArray!T), T) = T

alias to the element type of CUDA storage
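
The overloads above all follow one pattern: an alias template specialized per container type, with the element type T deduced from the argument. A minimal self-contained sketch of that pattern follows. `FakePtr`, `FakeArray`, and `ElementTypeOf` are hypothetical stand-ins invented for illustration so the snippet compiles without a GPU or grain itself; they are not part of grain.cuda.

```d
// Hypothetical stand-ins for device containers (NOT the real CuPtr/CuArray).
struct FakePtr(T)
{
    T* ptr;
    size_t length;
}

struct FakeArray(T)
{
    FakePtr!T storage;
}

// One alias overload per container template; T is deduced from M.
alias ElementTypeOf(M : FakePtr!T, T) = T;
alias ElementTypeOf(M : FakeArray!T, T) = T;

// The element type is recovered at compile time.
static assert(is(ElementTypeOf!(FakePtr!float) == float));
static assert(is(ElementTypeOf!(FakeArray!double) == double));

void main()
{
}
```

Generic code can then declare host buffers or scalars of the matching type without naming T explicitly.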

Classes

Global
class Global

global accessor for the cuda module in grain

Functions

axpy
void axpy(const ref CuArray!T x, ref CuArray!T y, T alpha = 1, int incx = 1, int incy = 1)

high-level axpy (y = alpha * x + y) wrapper for CuArray
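
To make the arithmetic concrete, here is a host-side reference sketch of the same update, y = alpha * x + y, with BLAS-style strides. `axpyReference` is a hypothetical helper written for illustration; it runs on ordinary D arrays, handles only positive strides, and is not how grain.cuda performs the operation on the device.

```d
/// Host-side reference for y = alpha * x + y (hypothetical helper, not grain's API).
/// incx and incy are the strides between consecutive logical elements.
void axpyReference(T)(const(T)[] x, T[] y, T alpha = 1, int incx = 1, int incy = 1)
{
    assert(incx > 0 && incy > 0, "this sketch only handles positive strides");
    // Number of logical elements each array holds at its stride.
    size_t nx = (x.length + incx - 1) / incx;
    size_t ny = (y.length + incy - 1) / incy;
    size_t n = nx < ny ? nx : ny;
    foreach (size_t i; 0 .. n)
        y[i * incy] += alpha * x[i * incx];
}

void main()
{
    float[] x = [1, 2, 3];
    float[] y = [10, 20, 30];
    axpyReference(x, y, 2.0f);
    assert(y == [12f, 24f, 36f]);

    // With incx = 2 only every second element of x is read.
    float[] xs = [1, 0, 2, 0];
    float[] ys = [0, 0];
    axpyReference(xs, ys, 1.0f, 2, 1);
    assert(ys == [1f, 2f]);
}
```

With the default alpha = 1 the call reduces to an in-place vector addition, y += x.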

checkCUDNN
void checkCUDNN(cudnnStatus_t err)

cuDNN error checker

checkCublasErrors
void checkCublasErrors(cublasStatus_t err)

cuBLAS error checker

checkCudaErrors
void checkCudaErrors(CUresult err)

CUDA error checker

copy
void copy(ref CuPtr!T src, ref CuPtr!T dst)

deep copy between device memory regions without allocation

dup
auto dup(ref M m)

duplicate cuda memory (deep copy)

empty
bool empty(M m)

true if length == 0

fill_
ref fill_(ref S storage, V v, size_t N)

fill the first N elements with value v (TODO: use cudnnSetTensor)

fill_
ref fill_(ref S storage, V value)

fill all elements of the device array with value

global
auto global()

global accessor for the cuda module in grain

sum
float sum(ref S a)

test sum

toHost
ref toHost(ref M m, scope ref T[] host)

copy device memory to host (may reallocate the host array)

toHost
auto toHost(ref M m, T* host)

copy device memory to host (CAUTION: the host buffer is not reallocated)

toHost
auto toHost(ref M m)

allocate host memory and copy device memory content

zero_
ref zero_(ref S storage)

fill all elements of the device array with zero

zeros
auto zeros(size_t N)

create a zero-filled array of N elements

Structs

CuArray
struct CuArray(T)

sub-region on CuPtr!T

CuModule
struct CuModule

cuda module compiled from ptx string

CuPtr
struct CuPtr(T)

fat pointer in CUDA

Kernel
struct Kernel(alias F)

cuda function object called by mangled name of C++/D device function F

Launcher
struct Launcher(Args...)

CUDA kernel function launcher with the number of blocks/threads chosen at runtime

Variables

isDeviceMemory
enum bool isDeviceMemory(T);

trait to identify cuda storage
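
This page does not spell out the criteria the trait checks. As a rough illustration of how a compile-time trait of this shape can be written in D, the sketch below tests for a pointer-like `ptr` field and a `length` field. `FakeDevicePtr` and `looksLikeDeviceMemory` are hypothetical names, and the checks are assumptions rather than grain's actual definition.

```d
// Hypothetical stand-in for a device container (NOT grain's CuPtr).
struct FakeDevicePtr(T)
{
    size_t ptr;    // assumed: raw device address stored as an integer
    size_t length; // number of elements
}

// Assumed criteria: an integer `ptr` field and a `length` convertible to size_t.
enum bool looksLikeDeviceMemory(T) =
    is(typeof(T.init.ptr) == size_t) && is(typeof(T.init.length) : size_t);

static assert(looksLikeDeviceMemory!(FakeDevicePtr!float));
// A host slice has a typed pointer, so it is rejected.
static assert(!looksLikeDeviceMemory!(int[]));

void main()
{
}
```

A trait like this is typically used in template constraints, for example `if (looksLikeDeviceMemory!S)`, so that device-only functions reject host arrays at compile time.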
