CUDA Programming/DeviceQuery
Both CUDA APIs, the driver API and the runtime API, allow us to gather information such as the driver version, the available devices, and detailed device properties like total available memory, bandwidth and compute capability. The functions used on this page belong to the runtime API.
Implementation
First of all, to keep the example simple, we did not include any error handling and ignored the error values returned by the functions.
We need to know the number of CUDA-capable devices on the system to begin with:
cudaError_t cudaGetDeviceCount(int *count)
Stores in count the number of devices that have a compute capability greater than or equal to 1.0 and are available for execution.
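The walkthrough on this page ignores return values, so it may help to see what checking one could look like. The following is only a minimal sketch, assuming the program is built with nvcc or linked against the CUDA runtime library. CHECK_CUDA is a hypothetical helper macro introduced here for illustration; it is not part of the CUDA API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of the CUDA API): print a readable
   message and exit if a runtime call does not return cudaSuccess. */
#define CHECK_CUDA(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s failed: %s\n",           \
                    __FILE__, __LINE__, #call,                  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

int main(void)
{
    int count = 0;

    /* Ask the runtime how many CUDA-capable devices are present. */
    CHECK_CUDA(cudaGetDeviceCount(&count));
    printf("Found %d CUDA-capable device(s)\n", count);
    return 0;
}
```

Exiting on the first failure is just one possible policy; a larger program might prefer to return the cudaError_t value to its caller instead.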
Devices are numbered consecutively, starting from 0. For each device we then need to query its properties. The properties are encapsulated in the following structure:
struct cudaDeviceProp {
    /* Identifies the device */
    char name[256];
    /* Total amount of global memory available on the device, in bytes */
    size_t totalGlobalMem;
    /* Maximum amount of shared memory available to a thread block, in bytes */
    size_t sharedMemPerBlock;
    /* Maximum number of 32-bit registers available to a thread block */
    int regsPerBlock;
    /* Warp size, in threads */
    int warpSize;
    /* Maximum pitch, in bytes, allowed by the memory copy functions that
       involve memory regions allocated through cudaMallocPitch() */
    size_t memPitch;
    /* Maximum number of threads per block */
    int maxThreadsPerBlock;
    /* Maximum size of each dimension of a block */
    int maxThreadsDim[3];
    /* Maximum size of each dimension of a grid */
    int maxGridSize[3];
    /* Total amount of constant memory available on the device, in bytes */
    size_t totalConstMem;
    /* Major revision number of the device's compute capability */
    int major;
    /* Minor revision number of the device's compute capability */
    int minor;
    /* Clock frequency, in kilohertz */
    int clockRate;
    /* Alignment requirement: texture base addresses that are aligned to
       textureAlignment bytes do not need an offset applied to texture fetches */
    size_t textureAlignment;
    /* Whether the device can copy memory between host and device while
       executing a kernel (1 = yes) */
    int deviceOverlap;
    /* Number of multiprocessors on the device */
    int multiProcessorCount;
    /* Whether there is a run time limit for kernels on the device (1 = yes) */
    int kernelExecTimeoutEnabled;
    /* Whether the device is an integrated (motherboard) GPU or a discrete
       (card) component (0 = discrete, 1 = integrated) */
    int integrated;
    /* Whether the device can map host memory into the CUDA address space
       (1 = yes) */
    int canMapHostMemory;
    /* Compute mode that the device is currently in:
       cudaComputeModeDefault:    multiple threads can call cudaSetDevice()
       cudaComputeModeExclusive:  only one thread can call cudaSetDevice()
       cudaComputeModeProhibited: no threads can call cudaSetDevice() */
    int computeMode;
    /* Whether the device supports executing multiple kernels within the
       same context simultaneously (1 = yes) */
    int concurrentKernels;
    /* Whether the device has ECC support turned on (1 = yes) */
    int ECCEnabled;
    /* PCI bus identifier of the device */
    int pciBusID;
    /* PCI device, or slot, identifier of the device */
    int pciDeviceID;
    /* Whether the device is using a TCC driver (1 = yes) */
    int tccDriver;
};
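Several of these fields are reported in raw units (bytes, kilohertz) or as separate major and minor numbers, which can be awkward to read or compare directly. The helpers below are a small sketch of how such values might be interpreted. All three function names are hypothetical and are not part of the CUDA API; they take plain values, so they can be used with fields copied out of the structure.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of the CUDA API. */

/* Convert a byte count, such as totalGlobalMem, to mebibytes. */
double bytes_to_mib(size_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Convert a frequency in kilohertz, such as clockRate, to megahertz. */
double khz_to_mhz(int khz)
{
    return khz / 1000.0;
}

/* Return 1 if compute capability major.minor is at least
   req_major.req_minor, and 0 otherwise. The major number is compared
   first; the minor number only matters when the majors are equal. */
int has_compute_capability(int major, int minor, int req_major, int req_minor)
{
    if (major != req_major)
        return major > req_major;
    return minor >= req_minor;
}
```

For instance, has_compute_capability(2, 0, 1, 3) returns 1, because a higher major number outweighs the lower minor number, whereas has_compute_capability(1, 1, 1, 3) returns 0.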
We can obtain the device properties using the following call:
cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp *prop, int device)
Stores the properties of the device whose ID is device in the structure pointed to by prop.
So, assuming a function with the following declaration:
void printDeviceProperties(const struct cudaDeviceProp *prop)
Prints out the device properties stored in prop
the main section of the program looks like this:
struct cudaDeviceProp **cudaDevices;
int count, i;

cudaGetDeviceCount(&count);
cudaDevices = (struct cudaDeviceProp **)malloc(sizeof(struct cudaDeviceProp *) * count);
for (i = 0; i < count; ++i)
{
    cudaDevices[i] = (struct cudaDeviceProp *)malloc(sizeof(struct cudaDeviceProp));
    cudaGetDeviceProperties(cudaDevices[i], i);
    printDeviceProperties(cudaDevices[i]);
    free(cudaDevices[i]);
}
free(cudaDevices);
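The text leaves printDeviceProperties undefined. One possible body is sketched below, assuming the program is built with nvcc or linked against the CUDA runtime library. The selection of fields and the output layout are choices made here for illustration only; the page does not prescribe them, and a fuller version could print every member of the structure.

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* A possible implementation of the function assumed in the text.
   It prints only a handful of the fields of struct cudaDeviceProp. */
void printDeviceProperties(const struct cudaDeviceProp *prop)
{
    printf("Name:                  %s\n", prop->name);
    printf("Compute capability:    %d.%d\n", prop->major, prop->minor);
    /* totalGlobalMem is in bytes; show it in mebibytes. */
    printf("Global memory:         %.0f MiB\n",
           (double)prop->totalGlobalMem / (1024.0 * 1024.0));
    printf("Shared memory / block: %zu bytes\n", prop->sharedMemPerBlock);
    printf("Registers / block:     %d\n", prop->regsPerBlock);
    printf("Warp size:             %d threads\n", prop->warpSize);
    printf("Max threads / block:   %d\n", prop->maxThreadsPerBlock);
    printf("Max block dimensions:  %d x %d x %d\n",
           prop->maxThreadsDim[0], prop->maxThreadsDim[1],
           prop->maxThreadsDim[2]);
    printf("Max grid dimensions:   %d x %d x %d\n",
           prop->maxGridSize[0], prop->maxGridSize[1],
           prop->maxGridSize[2]);
    printf("Multiprocessors:       %d\n", prop->multiProcessorCount);
    printf("Integrated GPU:        %s\n", prop->integrated ? "yes" : "no");
    printf("\n");
}
```

With this definition in place, the loop in the main section prints one block of lines per device, in enumeration order.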