GPU Function in C++

In this note, I cover the basic usage of CUDA (C++) for accelerating computation (these are my notes from learning CUDA). All code can be found in the official NVIDIA lab course at learn.nvidia.com.
 

📝Heterogeneous Systems

In modern accelerated systems, the CPU dispatches computing tasks to the GPU, the GPU then runs those tasks (the CPU can keep working while the GPU runs), and finally the CPU collects all the results and outputs them.
## I will cite a book here one day ##

Difference between code on CPU and GPU

Here is a code segment from a .cu file:
Then we run a simple .cu file.
We can use nvcc to compile it:
-arch specifies the GPU architecture to compile for (sm_70 is the value used in the NVIDIA learning lab).
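As a minimal sketch of the difference (not the course's exact code), a .cu file can hold an ordinary CPU function next to a GPU function marked with `__global__`. The function names and the file name `hello.cu` are assumptions made for illustration.

```cuda
#include <cstdio>

// Ordinary function: runs on the CPU (host).
void cpuFunction() {
    printf("This function runs on the CPU.\n");
}

// __global__ marks a kernel: it runs on the GPU (device)
// and is launched from host code.
__global__ void gpuFunction() {
    printf("This function runs on the GPU.\n");
}

int main() {
    cpuFunction();

    // <<<1, 1>>> launches the kernel with 1 block of 1 thread.
    gpuFunction<<<1, 1>>>();

    // Kernel launches are asynchronous: the CPU continues immediately,
    // so we wait here for the GPU to finish before exiting.
    cudaDeviceSynchronize();
    return 0;
}

// Build and run, assuming the file is saved as hello.cu (hypothetical name):
//   nvcc -arch=sm_70 -o hello hello.cu -run
```

Without the `cudaDeviceSynchronize()` call, the program may exit before the GPU output is flushed.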

Parallel running kernel function

[Figure from an official NVIDIA slide: a grid of 2 blocks, each containing 4 threads.]
Each block has the same number of threads; in the picture above, there are 2 blocks with 4 threads each.
All threads execute the kernel function (which we called a "GPU function" earlier) in parallel.
This comes with some problems caused by the physical implementation of the GPU (for now, the order of the output cannot be controlled; I may talk about this in future notes).
Note that for the condition (threadIdx.x == 1023 && blockIdx.x == 255) to be satisfied, we choose <<<256, 1024>>>, because block and thread indices start from 0, just like array elements.
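A minimal sketch of that launch configuration follows. The kernel name `printIfLast` is a hypothetical one chosen here; only the condition and the `<<<256, 1024>>>` configuration come from the note.

```cuda
#include <cstdio>

// Only the last thread of the last block satisfies the condition.
__global__ void printIfLast() {
    if (threadIdx.x == 1023 && blockIdx.x == 255) {
        printf("Success!\n");
    }
}

int main() {
    // 256 blocks    -> blockIdx.x  ranges over 0..255
    // 1024 threads  -> threadIdx.x ranges over 0..1023
    printIfLast<<<256, 1024>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

With fewer blocks or fewer threads per block, no thread would have both indices at those values and nothing would be printed.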

Accelerating 'for' loops

In the code above, we achieve parallel acceleration by replacing the loop iteration variable with threadIdx.x.
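The idea can be sketched as follows, assuming a loop of 10 iterations (the function names and the iteration count are illustrative, not taken from the course).

```cuda
#include <cstdio>

// CPU version: a single thread walks through every iteration in order.
void cpuLoop(int n) {
    for (int i = 0; i < n; ++i) {
        printf("CPU iteration %d\n", i);
    }
}

// GPU version: there is no loop. Each thread handles the one
// iteration identified by its own threadIdx.x.
__global__ void gpuLoop() {
    int i = threadIdx.x;
    printf("GPU iteration %d\n", i);
}

int main() {
    const int n = 10;
    cpuLoop(n);

    // One block of n threads: threadIdx.x takes the values 0..n-1.
    gpuLoop<<<1, n>>>();
    cudaDeviceSynchronize();
    return 0;
}
```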
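What if we want to map a vector (such as the integers 0~7) onto several blocks (such as 2 blocks with 4 threads each)?
Each thread computes its global index as threadIdx.x + blockIdx.x * blockDim.x. In our example, we have blockDim.x = 4.
For instance, integer 6 belongs to thread 2 of block 1: 6 = 2 + 1*4.
As we can see, the order of the output is a mess.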
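A small sketch of this mapping, assuming a helper named `globalIndex` (a hypothetical name) so the same formula can be checked on both the CPU and the GPU:

```cuda
#include <cstdio>

// Global index of a thread: offset inside its block plus the
// number of threads in all preceding blocks.
__host__ __device__ int globalIndex(int blockIndex, int blockSize, int threadIndex) {
    return threadIndex + blockIndex * blockSize;
}

__global__ void printIndex() {
    int b = blockIdx.x;
    int t = threadIdx.x;
    int i = globalIndex(b, blockDim.x, t);
    printf("block %d, thread %d -> element %d\n", b, t, i);
}

int main() {
    // 2 blocks of 4 threads cover the integers 0..7.
    printIndex<<<2, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Every integer from 0 to 7 is printed exactly once, but which block prints first is not guaranteed and may change between runs.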

Memory Allocation and Deallocation

To get a pointer that can be used on both the CPU and the GPU, we just replace malloc and free with cudaMallocManaged and cudaFree.
Example: double each integer in an int array.
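A minimal sketch of this example, assuming an array of 8 integers processed by 2 blocks of 4 threads so that the thread count matches the array length exactly (sizes and names are chosen here for illustration):

```cuda
#include <cstdio>

// Each thread doubles the element at its global index.
__global__ void doubleElements(int *a) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    a[i] *= 2;
}

int main() {
    const int N = 8;
    int *a;

    // Managed (unified) memory: the same pointer is valid on CPU and GPU.
    cudaMallocManaged(&a, N * sizeof(int));

    for (int i = 0; i < N; ++i) a[i] = i;   // initialize on the CPU

    doubleElements<<<2, 4>>>(a);            // 2 * 4 = 8 threads, one per element
    cudaDeviceSynchronize();                // wait before reading on the CPU

    for (int i = 0; i < N; ++i) printf("%d ", a[i]);
    printf("\n");                           // prints: 0 2 4 6 8 10 12 14

    cudaFree(a);
    return 0;
}
```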
What if the number of elements in the vector is smaller than the total number of threads?
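One common way to handle this, sketched here under assumed sizes (1000 elements, 256 threads per block), is to launch enough blocks to cover the array and let each thread check that its index is in range before touching memory. The helper `blocksFor` is a hypothetical name.

```cuda
#include <cstdio>

// Number of blocks needed to cover n elements (rounds up).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

__global__ void doubleElements(int *a, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    // Threads whose index falls past the end of the array do nothing.
    if (i < n) {
        a[i] *= 2;
    }
}

int main() {
    const int N = 1000;
    const int threadsPerBlock = 256;
    const int blocks = blocksFor(N, threadsPerBlock);  // 4 blocks = 1024 threads

    int *a;
    cudaMallocManaged(&a, N * sizeof(int));
    for (int i = 0; i < N; ++i) a[i] = i;

    doubleElements<<<blocks, threadsPerBlock>>>(a, N);
    cudaDeviceSynchronize();

    bool ok = true;
    for (int i = 0; i < N; ++i) {
        if (a[i] != 2 * i) ok = false;
    }
    printf("All elements doubled? %s\n", ok ? "TRUE" : "FALSE");

    cudaFree(a);
    return 0;
}
```

Here 24 of the 1024 launched threads have no element to work on; without the `if (i < n)` check they would write past the end of the array.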