Shared Memory in CUDA using Julia and CUDAnative

This series continues to outline what I am learning while building a machine learning algorithm (Neuroevolution) in Julia with CUDAnative.

As the title suggests, I am using CUDAnative to program GPUs from Julia.

My setup is a late-2016 MacBook Pro with an external GeForce GTX 1080 Ti. How I connect a GeForce GTX 1080 Ti to a MacBook Pro may be worth another article.

Shared memory is typically used to share data and coordinate across the GPU threads of a block. It also has the added benefit of being much faster than global memory. However, only a few dozen kilobytes of shared memory are available to all the threads in a block, so one cannot simply dump everything into shared memory to get the speed increase.
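To see the actual limit on your card, you can query the device. Here is a minimal sketch, assuming CUDAdrv exposes the per-block shared memory limit under the attribute name MAX_SHARED_MEMORY_PER_BLOCK (the exact spelling has shifted between versions):

using CUDAdrv

dev = CuDevice(0)
# Per-block shared memory limit in bytes; 49152 (48 KB) on a GTX 1080 Ti.
println(attribute(dev, CUDAdrv.MAX_SHARED_MEMORY_PER_BLOCK))

With that budget in mind, here is a small kernel that demonstrates the mechanics: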

using CUDAdrv, CUDAnative

function kernel(x)
    i = threadIdx().x
    # Statically allocate a single Int64 in shared memory,
    # visible to every thread in the block.
    shared = @cuStaticSharedMem(Int64, 1)
    # Only the first thread writes the value.
    if i == 1
        shared[1] = 255
    end
    # Barrier: wait until the write is visible to all threads in the block.
    sync_threads()
    # Every thread reads the same shared value.
    x[i] = shared[1]
    return nothing
end

d_x = CuArray{Int64,1}(10)   # uninitialized device array of 10 Int64s
@cuda (1, 10) kernel(d_x)    # launch 1 block of 10 threads
x = Array(d_x)               # copy the result back to the host
println(x)

OUTPUT: [255, 255, 255, 255, 255, 255, 255, 255, 255, 255]
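The kernel above only shows the mechanics. A more typical use of shared memory is a block-level reduction, where threads stage data in fast shared memory and cooperate through repeated barriers. Below is a minimal sketch using the same CUDAnative API as above; the kernel name and sizes (reduce_kernel, 256 threads) are my own choices for illustration, not something from this series' algorithm.

using CUDAdrv, CUDAnative

function reduce_kernel(x, out)
    i = threadIdx().x
    shared = @cuStaticSharedMem(Int64, 256)   # one slot per thread
    shared[i] = x[i]                          # stage data in shared memory
    sync_threads()
    # Tree reduction: halve the number of active threads each step.
    s = blockDim().x ÷ 2
    while s > 0
        if i <= s
            shared[i] += shared[i + s]
        end
        sync_threads()
        s ÷= 2
    end
    # The first thread writes the block's total back to global memory.
    if i == 1
        out[1] = shared[1]
    end
    return nothing
end

d_x = CuArray(collect(Int64, 1:256))
d_out = CuArray{Int64,1}(1)
@cuda (1, 256) reduce_kernel(d_x, d_out)
println(Array(d_out))   # prints [32896], i.e. sum(1:256)

Each pass of the loop reads values written by other threads in the previous pass, which is exactly why the sync_threads() barrier sits inside the loop: without it, a thread could read a slot before its partner has finished writing it.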