CUDA Programming/BenchmarkingTools
Benchmarking is the process of measuring the relative performance of an object by running a number of standard tests and trials against it, usually with the help of a set of software tools.[0] One property common to the programs being benchmarked, which are written with different APIs such as OpenCL and CUDA and are expected to run on different architectures, is execution time. However, the total execution time of a program is not a reliable measure on its own, because it includes the time spent on code initialization, memory allocation, and so on. We are only interested in the time the GPU, or the CPU, needs to carry out the calculations. Moreover, programs that run on the GPU have further metrics of their own, such as kernel metrics. For these reasons, we used two benchmarking tools:
Benchmarking Tools
Simple Stopwatch
This is a simple stopwatch implementation written in C. It uses the following structure:
typedef struct stopwatch_s {
    double start;
    double stop;
    double time_elapsed;
} stopwatch_t;
and defines the following functions:
void start_stopwatch(stopwatch_t *);
void stop_stopwatch(stopwatch_t *);
It allows us to measure the time spent in specific parts of the code.
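The original implementation of these functions is not shown here, so the following is only a minimal sketch of how such a stopwatch could be written. It assumes a POSIX system and uses `clock_gettime` with `CLOCK_MONOTONIC` as the time source. The helper `now_seconds`, the example function `measure_dummy_work`, and the choice of seconds as the unit are illustrative assumptions and not part of the original code.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef struct stopwatch_s {
    double start;         /* time at which the stopwatch was started */
    double stop;          /* time at which the stopwatch was stopped */
    double time_elapsed;  /* stop - start */
} stopwatch_t;

/* Hypothetical helper: current monotonic time in seconds (assumed time source). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Record the start time. */
void start_stopwatch(stopwatch_t *sw)
{
    assert(sw != NULL);
    sw->start = now_seconds();
}

/* Record the stop time and compute the elapsed time in seconds. */
void stop_stopwatch(stopwatch_t *sw)
{
    assert(sw != NULL);
    sw->stop = now_seconds();
    sw->time_elapsed = sw->stop - sw->start;
}

/* Hypothetical usage: time only the computation, not the setup around it. */
double measure_dummy_work(long iterations)
{
    stopwatch_t sw;
    volatile double sum = 0.0;  /* volatile keeps the loop from being optimized away */

    start_stopwatch(&sw);
    for (long i = 0; i < iterations; i++) {
        sum += (double)i * 0.5;
    }
    stop_stopwatch(&sw);

    return sw.time_elapsed;
}
```

Placing the start and stop calls directly around the loop keeps initialization and memory allocation out of the measurement. For a CUDA kernel, which launches asynchronously, the host would also have to wait for the kernel to finish (for example with `cudaDeviceSynchronize`) before stopping the stopwatch.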
Nvidia Compute Visual Profiler
This is a cross-platform profiling tool whose features include, but are not limited to, the following: [1]
- Create a profile based on:
  - Kernel occupancy
  - Instruction throughput
  - Memory access characteristics
- Generate charts and graphs based on results
- Compare results across multiple sessions