Revision as of 22:06, 19 February 2011
Evaluating the Performance of GPGPUs and Their Use in Scientific Computing
Introduction
Computational science (or scientific computing) is the field of study concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyse and solve scientific problems. Scientists and engineers develop computer programs (application software) that model the systems being studied, and they run these programs with various sets of input parameters [0].
Applications from scientific computing often require a large amount of execution time due to large system sizes or a large number of iteration steps. The execution time can be significantly reduced by parallel execution on a suitable parallel or distributed platform [1]. Historically, scientists used supercomputers or computer grids to carry out these computations.
However, with the advances in computer graphics, graphics processing units became much more efficient and powerful. Because of the nature of graphical data, GPUs became specialized in handling complex matrix calculations and performing massive numbers of mathematical computations. As the processing power of GPUs has increased, so has their demand for electrical power. This problem has led researchers to look for alternative solutions, and parallel programming has been adopted by many scientists to further optimize performance.
Nowadays, GPUs are especially well suited to problems that can be expressed as data-parallel computations with high arithmetic intensity. Many applications that process large data sets, such as arrays or volumes, can use a data-parallel programming model to speed up computations. These applications include, for example [2]:
- Seismic simulations
- Computational biology
- Option risk calculations in finance
- Medical imaging
- Pattern recognition
- Signal processing
- Physical simulation
Ackermann et al. [3] have developed a computational approach for massively parallel simulation of biological molecular networks that leverages the computing power of modern graphics cards. They demonstrated that the GPU parallelization achieves a speedup of about 59x compared with a CPU implementation running on a standard PC.
Davis et al. [4] carried out water simulations on GPUs and compared the GPU performance with that of the same simulation on a single CPU and on multiple CPUs. According to their results, their GPU implementation runs about 7x faster than on a single CPU.
Another study on data normalization, by Rodríguez et al. [5], reports that their GPU implementation of a quantile-based normalization method for high-density oligonucleotide array data, based on variance and bias, achieves speedups exceeding 7x over the counterpart methods implemented on CPUs.
Research Description
Purpose
The purpose of this research project is to demonstrate the performance gain of using GPUs for general-purpose computing compared with CPUs.
Problem
The problem I will be working on to test the hardware is “cluster analysis of gene expression data”. Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense [6].
A gene is a segment of DNA that contains the formula for the chemical composition of one particular protein. The large majority of abundantly expressed genes are associated with common functions, such as metabolism, and hence are expressed in all cells. However, expression profiles differ between cells, and even within a single cell expression varies with time, in a manner dictated by external and internal signals that reflect the state of the organism and of the cell itself [7].
A natural basis for organizing gene expression data is to group together genes with similar patterns of expression. For any series of measurements, a number of sensible measures of similarity in the behavior of two genes can be used [8]. Experts in the biological sciences can then use the resulting groups to gain further knowledge in the area. This makes cluster analysis a well-suited approach for extracting information from gene expression data.
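The similarity measures mentioned above can be made concrete with a small sketch. Pearson correlation is used here only as one common choice; the text does not commit to a particular measure, and the function name and expression values are hypothetical. Genes whose levels rise and fall together score close to 1 regardless of amplitude, which matches the idea of grouping genes with similar patterns of expression.

```python
import math
from typing import Sequence


def pearson_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two gene expression profiles.

    Returns a value in [-1, 1]: 1 means the two genes rise and fall together
    across the measurements, -1 means they move in opposite directions.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, all missing the same 1/n factor, which cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes over five time points.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # same pattern, larger amplitude
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # opposite pattern

print(pearson_similarity(gene_a, gene_b))  # 1.0
print(pearson_similarity(gene_a, gene_c))  # -1.0
```

A clustering algorithm would typically turn this into a distance (for example 1 - r) before grouping genes; other measures, such as Euclidean distance, weight amplitude differences instead of ignoring them.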
Methodology
Although the exact methodology is not yet fully defined, it will involve implementing one or more cluster analysis algorithms, applying them to gene expression data, and evaluating their performance on several hardware architectures. Different scenarios can be designed to evaluate and illustrate the work. Some of these comparison scenarios are:
- An implementation of an algorithm on a single-core CPU versus the parallelized form of the same algorithm on a GPU.
- A parallel implementation of an algorithm on a multi-core CPU versus the same algorithm on a GPU.
- Different implementations of the same clustering approach on the same GPU. These implementations are expected to use memory differently and to use different numbers of threads and thread blocks.
- The same implementation of an algorithm on GPUs with different specifications.
The candidate APIs for programming the GPUs are CUDA [9] and OpenCL [10].
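As a minimal sketch, assume k-means is one of the clustering algorithms chosen (the proposal does not name one). Its assignment step labels each gene with its nearest centroid, and every gene can be processed independently, which is what makes a one-thread-per-gene GPU mapping natural. The Python reference below (all function names hypothetical) is meant for checking that GPU versions produce the same labels, not as the timed single-core baseline. `blocks_needed` shows the ceiling division that decides how many thread blocks a grid needs for a chosen number of threads per block.

```python
from typing import List, Sequence


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def assign_to_nearest(genes: Sequence[Sequence[float]],
                      centroids: Sequence[Sequence[float]]) -> List[int]:
    """k-means assignment step: index of the nearest centroid for each gene.

    Iterations of the outer loop do not depend on each other, so a GPU
    version could give every gene its own thread.
    """
    labels = []
    for g in genes:
        dists = [squared_distance(g, c) for c in centroids]
        labels.append(dists.index(min(dists)))  # ties go to the lower index
    return labels


def blocks_needed(n_items: int, threads_per_block: int) -> int:
    """Thread blocks needed so that one thread per item covers every item."""
    return (n_items + threads_per_block - 1) // threads_per_block  # ceiling division


# Hypothetical two-measurement profiles forming two obvious groups.
genes = [[0.1, 0.2], [0.0, 0.1], [5.0, 5.2], [4.9, 5.1]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
print(assign_to_nearest(genes, centroids))  # [0, 0, 1, 1]

# The same 10,000 genes with different block sizes give different grid sizes.
for tpb in (64, 128, 256):
    print(tpb, blocks_needed(10_000, tpb))  # 64 157, 128 79, 256 40
```

When the gene count is not a multiple of the block size, the last block contains threads with no gene to process, so a kernel would need a bounds check before doing any work. Varying the threads per block, as in the loop above, is one way the "different numbers of threads and thread blocks" scenario could be explored.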
Evaluation
Evaluation of the work is based on the performance metrics used to evaluate processing units (CPUs and GPUs). These metrics include total execution time, speedup, and the number of threads running concurrently. Software tools such as the Visual Profiler [11] provided by NVIDIA can also be used for evaluation.
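The two timing-based metrics can be computed as in this sketch. `time_call` and `speedup` are hypothetical helper names, and the numbers in the example are invented inputs, not measured results. Speedup is taken as the baseline time divided by the time of the implementation being evaluated.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Best wall-clock time of fn() over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup of a candidate over a baseline: T_baseline / T_candidate."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


# Invented timings for illustration: 12 s on one CPU core, 1.5 s on a GPU.
print(speedup(12.0, 1.5))  # 8.0

# Timing an arbitrary workload; the result depends on the machine.
print(time_call(lambda: sum(range(100_000))))
```

One caveat when timing the GPU side from the host: CUDA kernel launches are asynchronous, so the host must synchronize with the device (or use CUDA events) before reading the clock, otherwise only the launch overhead is measured. Whether host-to-device data transfers count toward total execution time should also be stated explicitly, since it can change the reported speedup considerably.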
Research Paper
A research paper will be written that explains the process in detail, including the methods and parameters used, and reports the performance results of the tests.
Conclusions
This paper aims to provide an overview of the senior project by explaining the problem at hand, different approaches to the solution, the methods that can be used, and the metrics for evaluating the work. It also briefly summarizes the information gathered throughout the semester.
References
- [0] http://en.wikipedia.org/wiki/Computational_science
- [1] Rauber, T., Rünger, G., “Exploiting Multiple Levels of Parallelism in Scientific Computing”. IFIP International Federation for Information Processing, 2005, Volume 172/2005, 3-19, DOI: 10.1007/0-387-24049-7_1
- [2] NVIDIA Tesla GPU Computing Technical Brief. Version 1.0.0, 5/24/2007
- [3] Ackermann, J., Baecher, P., Franzel, T., Goesele, M., Hamacher, K., “Massively-Parallel Simulation of Biochemical Systems”
- [4] Davis, J., Ozsoy, A., Patel, S., Taufer, M., “Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors”
- [5] Rodríguez, A., Trelles, O., Ujaldón, M., “Using Graphics Processors for a High Performance Normalization of Gene Expressions”
- [6] http://en.wikipedia.org/wiki/Cluster_analysis
- [7] Domany, Eytan. “Cluster Analysis of Gene Expression Data”
- [8] Eisen, M., Spellman, P., Brown, P., Botstein, D., “Cluster Analysis and Display of Genome-Wide Expression Patterns”. PNAS December 8, 1998 vol. 95 no. 25 14863-14868
- [9] http://www.nvidia.com/object/what_is_cuda_new.html
- [10] http://www.khronos.org/opencl/
- [11] http://developer.nvidia.com/object/visual-profiler.html