Difference between revisions of "Cuda"

From Ghoulwiki
Jump to: navigation, search
(old cards, 256 mb)
Line 28: Line 28:
 
   gpu/cpu=    0.5  ok  err<=0.5% N=263250 size=9MB IO=15 i3:0,0 tgpu=43.3s tcpu=93.0
 
   gpu/cpu=    0.5  ok  err<=0.5% N=263250 size=9MB IO=15 i3:0,0 tgpu=43.3s tcpu=93.0
 
   gpu/cpu=    0.6  ok  err<=0.5% N=2621440 size=90MB IO=32 i3:1,1 tgpu=595.5s tcpu=1074.1
 
   gpu/cpu=    0.6  ok  err<=0.5% N=2621440 size=90MB IO=32 i3:1,1 tgpu=595.5s tcpu=1074.1
 +
 +
== new card laptop ==
 +
 +
<pre>
 +
name : GeForce 9500M GS
 +
    523584k totalGlobalMem
 +
        16k sharedMemPerBlock
 +
        8k regsPerBlock
 +
        32 warpSize
 +
      256k memPitch
 +
      512 maxThreadsPerBlock
 +
      512 maxThreadsDim[0]
 +
      512 maxThreadsDim[1]
 +
        64 maxThreadsDim[2]
 +
        63k maxGridSize[0]
 +
        63k maxGridSize[1]
 +
        1 maxGridSize[2]
 +
        64k totalConstMem
 +
        1 major
 +
        1 minor
 +
      927k clockRate
 +
      256 textureAlignment
 +
 +
</pre>
 +
  
 
== old cards, 256 mb ==
 
== old cards, 256 mb ==

Revision as of 16:37, 25 August 2008




  • $(CUDA_BIN_PATH)\nvcc.exe -arch sm_11 -ccbin "$(VCInstallDir)bin" -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/O2,/Zi,/MT -I"$(CUDA_INC_PATH)" -I./ -I../../common/inc -o $(ConfigurationName)\myproj.obj myproj.cu

success 11.10.2007

 gpu/cpu=     0.5  ok  err<=0.5% N=263250 size=9MB IO=15 i3:0,0 tgpu=43.3s tcpu=93.0
 gpu/cpu=     0.6  ok  err<=0.5% N=2621440 size=90MB IO=32 i3:1,1 tgpu=595.5s tcpu=1074.1

new card laptop

name : GeForce 9500M GS
    523584k totalGlobalMem
        16k sharedMemPerBlock
         8k regsPerBlock
        32 warpSize
       256k memPitch
       512 maxThreadsPerBlock
       512 maxThreadsDim[0]
       512 maxThreadsDim[1]
        64 maxThreadsDim[2]
        63k maxGridSize[0]
        63k maxGridSize[1]
         1 maxGridSize[2]
        64k totalConstMem
         1 major
         1 minor
       927k clockRate
       256 textureAlignment


old cards, 256 mb

##### ##### ##### ##### #####device 0
name : GeForce 8500 GT
    261888k totalGlobalMem
        16k sharedMemPerBlock
         8k regsPerBlock
        32 warpSize
       256k memPitch
       512 maxThreadsPerBlock
       512 maxThreadsDim[0]
       512 maxThreadsDim[1]
        64 maxThreadsDim[2]
        63k maxGridSize[0]
        63k maxGridSize[1]
         1 maxGridSize[2]
        64k totalConstMem
         1 major
         1 minor
      1371k clockRate
       256 textureAlignment
##### ##### ##### ##### #####device 1
name : GeForce 8500 GT
    261824k totalGlobalMem
        16k sharedMemPerBlock
         8k regsPerBlock
        32 warpSize
       256k memPitch
       512 maxThreadsPerBlock
       512 maxThreadsDim[0]
       512 maxThreadsDim[1]
        64 maxThreadsDim[2]
        63k maxGridSize[0]
        63k maxGridSize[1]
         1 maxGridSize[2]
        64k totalConstMem
         1 major
         1 minor
      1371k clockRate
       256 textureAlignment

new cards, 512 mb

##### ##### ##### ##### #####device 0
name : GeForce 8500 GT
    524032k totalGlobalMem
        16k sharedMemPerBlock
         8k regsPerBlock
        32 warpSize
       256k memPitch
       512 maxThreadsPerBlock
       512 maxThreadsDim[0]
       512 maxThreadsDim[1]
        64 maxThreadsDim[2]
        63k maxGridSize[0]
        63k maxGridSize[1]
         1 maxGridSize[2]
        64k totalConstMem
         1 major
         1 minor
       896k clockRate
       256 textureAlignment
##### ##### ##### ##### #####device 1
name : GeForce 8500 GT
    523968k totalGlobalMem
        16k sharedMemPerBlock
         8k regsPerBlock
        32 warpSize
       256k memPitch
       512 maxThreadsPerBlock
       512 maxThreadsDim[0]
       512 maxThreadsDim[1]
        64 maxThreadsDim[2]
        63k maxGridSize[0]
        63k maxGridSize[1]
         1 maxGridSize[2]
        64k totalConstMem
         1 major
         1 minor
       896k clockRate
       256 textureAlignment

sample output

N=65536
SX=2048
SY=64
SZ=2
I0=32
DATASIZE_IN_RAW=2304kb
DATASIZE_IN_STATE=256kb
DATASIZE_IN_INDEX=136kb
DATASIZE_IN_TOTAL=2696kb
DATASIZE_OUT_TOTAL=16384kb
0.74 sec : reading data from file
assert passed : (INDEXPOS_0(I0) == INDEXSTART_1-1)
assert passed : (INDEXPOS_1(I0-1,I0) == INDEXSTART_2-1)
assert passed : (INDEXPOS_2(I0-1,I0-1,I0) == INDEX_END-1)
assert passed : (sz < 255)
-2.478893,-2.459100,-2.459100,...,4.097229
0.09 sec : generating index data
0.02 sec : allocate and init device mem
9.55 sec : exec kernel on device
0.00 sec : receive results from device
atom[0]=57100 atom[1]=0 iNumResults=57100 kMaxResults=2097152
check : with index on cpu...
check : with index on cpu: iNumResults=57100
17.29 sec : check : with index on cpu

Press ENTER to exit...

coarse-strong-ultra-100-96_10000.txt

big file : 750 mb

N=65536
SX=2048
SY=64
SZ=2
I0=32
DATASIZE_IN_RAW=2304kb
DATASIZE_IN_STATE=256kb
DATASIZE_IN_INDEX=136kb
DATASIZE_IN_TOTAL=2696kb
DATASIZE_OUT_TOTAL=16384kb
line 1:coarse-strong-ultra-000000 000 0.04820 0.05043 0.05505 0.04887 0.05384 0.
05995 0.06215 0.06603 0.06615 0.06679 0.06202 0.06686 0.06654 0.06499 0.06705 0.
05835 0.05513 0.05442 0.05398 0.04658 0.05066 0.05162 0.04627 0.04324 0.04433 0.
04061 0.03673 0.04281 0.04078 0.03822 0.03821 0.03952 0.03648 0.03312 0.03402 0.
03277 0.03029 0.04310 0.04202 0.04168 0.03964 0.04356 0.04193 0.03870 0.04995 0.
04837 0.04460 0.04857 0.04831 0.04714 0.04734 0.05127 0.05319 0.05641 0.05886 0.
05365 0.05752 0.05877 0.05435 0.05526
line 2:0.05103 0.05033 0.04347 0.04750 0.04574 0.03781 0.03926 0.03612 0.03374 0
.03207 0.03060 0.02828 0.02507 0.02721 0.02558 0.02493 0.03374 0.02920 0.03031 0
.02906 0.03283 0.03207 0.03207 0.04170 0.04342 0.04020 0.04298 0.04131 0.04298 0
.04404 0.04810 0.04727 0.04491 0.04764 0.04301 0.04383 0.04317 0.04511 0.04545 0
.04818 0.04908 0.04463 0.04620 0.04699 0.04360 0.04529 0.04346 0.04122 0.04011 0
.03737 0.03293 0.02865 0.02832 0.02520 0.02255 0.02125 0.02035 0.01192 0.01107 0
.00685 0.01059 0.01442 0.01595 0.02073
line 3: 0.01839 0.01905 0.02360 0.02169 0.02749 0.02954 0.03428 0.03788 0.03793
0.04286 0.03990 0.04313 0.04426 0.04382 0.04529 0.04475 0.04587 0.04372 0.04377
0.04492 0.04141 0.04456 0.04552 0.04159 0.04223 0.03591 0.03698 0.03614 0.03736
0.03934 0.03516 0.03583 0.02827 0.02601 0.02256 0.02111 0.02014 0.02016 0.02150
0.01603 0.01632 0.01486 0.01818 0.01843 0.02040 0.02465 0.02337 0.02405 0.02452
0.02435 0.02745 0.02855 0.03582 0.03742 0.04081 0.04198 0.03672 0.03736 0.03749
0.03917 0.04121 0.04257 0.04485 0.0417
line 4:5 0.04121 0.04167 0.03555 0.03806 0.03863 0.03748 0.03451 0.03039 0.02957
 0.02493 0.02768 0.03054 0.02767 0.02438 0.02099 0.01713 0.01462 0.01508 0.01577
 0.01737 0.01791 0.02091 0.02171 0.02182 0.02522 0.02480 0.02493 0.02913 0.02996
 0.03269 0.03296 0.03631 0.03866 0.03451 0.04187 0.03918 0.04450 0.04598 0.04060
 0.04482 0.03857 0.04485 0.04499 0.04898 0.05087 0.04374 0.04700 0.04478 0.04084
 0.04081 0.04103 0.04135 0.03853 0.03664 0.03382 0.02755 0.02830 0.02619 0.02489
 0.02267 0.01911 0.01841 0.01437 0.015
line 5:69 0.01486 0.01429 0.01689 0.02116 0.02161 0.02497 0.02357 0.02480 0.0248
6 0.02717 0.03292 0.03393 0.03558 0.03961 0.04140 0.03921 0.03968 0.03907 0.0418
6 0.04094 0.04488 0.04213 0.04043 0.04132 0.04242 0.04413 0.04329 0.04607 0.0437
9 0.04371 0.04105 0.03754 0.03824 0.03826 0.03761 0.03649 0.03190 0.03074 0.0221
8 0.02409 0.02173 0.01948 0.01822 0.01906 0.02054 0.01712 0.01795 0.01876 0.0171
6 0.02161 0.02460 0.02727 0.02947 0.02950 0.03441 0.03185 0.03891 0.03999 0.0409
8 0.04559 0.04661 0.04359 0.04212 0.04
total lines : 1507200
Drücken Sie eine beliebige Taste . . .


big test 1 (without last index on device)

N=1048576
SX=32768
SY=1024
SZ=32
I0=32
DATASIZE_IN_RAW=36864kb
DATASIZE_IN_STATE=4096kb
DATASIZE_IN_INDEX=136kb
DATASIZE_IN_TOTAL=41096kb
DATASIZE_OUT_TOTAL=16384kb
ReadTextData data/Corel_ColorMoments_9d.ascii : 68040 lines of real data, added
980536 lines of random data
WARNING ! iRealNumLines=68040 does not match the hardcoded N=1048576
1.08 sec : reading data from file
assert passed : (INDEXPOS_0(I0) == INDEXSTART_1-1)
assert passed : (INDEXPOS_1(I0-1,I0) == INDEXSTART_2-1)
assert passed : (INDEXPOS_2(I0-1,I0-1,I0) == INDEX_END-1)
assert passed : (sz < 255)
-2.478893,-2.478893,-2.478893,...,4.097229
1.86 sec : generating index data
0.06 sec : allocate and init device mem
764.99 sec : exec kernel on device
0.00 sec : receive results from device
atom[0]=60051 atom[1]=0 iNumResults=60051 kMaxResults=2097152
check : with index on cpu...
check : with index on cpu: iNumResults=60051
357.94 sec : check : with index on cpu

Press ENTER to exit...

big data with all index levels used in device

N=1048576
SX=32768
SY=1024
SZ=32
I0=32
DATASIZE_IN_RAW=36864kb
DATASIZE_IN_STATE=4096kb
DATASIZE_IN_INDEX=136kb
DATASIZE_IN_TOTAL=41096kb
DATASIZE_OUT_TOTAL=16384kb
ReadTextData data/Corel_ColorMoments_9d.ascii : 68040 lines of real data, added
980536 lines of random data
WARNING ! iRealNumLines=68040 does not match the hardcoded N=1048576
0.96 sec : reading data from file
assert passed : (INDEXPOS_0(I0) == INDEXSTART_1-1)
assert passed : (INDEXPOS_1(I0-1,I0) == INDEXSTART_2-1)
assert passed : (INDEXPOS_2(I0-1,I0-1,I0) == INDEX_END-1)
assert passed : (sz < 255)
-2.478893,-2.478893,-2.478893,...,4.097229
1.86 sec : generating index data
0.06 sec : allocate and init device mem
354.73 sec : exec kernel on device
0.00 sec : receive results from device
atom[0]=60051 atom[1]=0 iNumResults=60051 kMaxResults=2097152
check : with index on cpu...
check : with index on cpu: iNumResults=60051
357.97 sec : check : with index on cpu

Press ENTER to exit...

big data with I0:16 : device is faster than cpu

N=1048576
SX=65536
SY=4096
SZ=256
I0=16
DATASIZE_IN_RAW=36864kb
DATASIZE_IN_STATE=4096kb
DATASIZE_IN_INDEX=18kb
DATASIZE_IN_TOTAL=40978kb
DATASIZE_OUT_TOTAL=16384kb
ReadTextData data/Corel_ColorMoments_9d.ascii : 68040 lines of real data, added
980536 lines of random data
WARNING ! iRealNumLines=68040 does not match the hardcoded N=1048576
0.95 sec : reading data from file
assert passed : (INDEXPOS_0(I0) == INDEXSTART_1-1)
assert passed : (INDEXPOS_1(I0-1,I0) == INDEXSTART_2-1)
assert passed : (INDEXPOS_2(I0-1,I0-1,I0) == INDEX_END-1)
assert FAILED : (sz < 255)
-2.478893,-2.478893,-2.478893,...,4.097229
1.91 sec : generating index data
0.06 sec : allocate and init device mem
393.60 sec : exec kernel on device
0.00 sec : receive results from device
atom[0]=60051 atom[1]=0 iNumResults=60051 kMaxResults=2097152
check : with index on cpu...
check : with index on cpu: iNumResults=60051
520.65 sec : check : with index on cpu

Press ENTER to exit...

gpu/cpu=     0.4  N=65536 size=2MB IO=16 gpu_idx3=0 cpu_idx3=0 time_gpu=10.6s time_cpu=24.8
gpu/cpu=     0.4  N=65536 size=2MB IO=16 gpu_idx3=0 cpu_idx3=0 time_gpu=10.6s time_cpu=25.3
gpu/cpu=     0.9  N=32768 size=1MB IO=32 gpu_idx3=0 cpu_idx3=0 time_gpu=3.1s time_cpu=3.4
gpu/cpu=     0.7  N=32768 size=1MB IO=32 gpu_idx3=0 cpu_idx3=0 time_gpu=2.9s time_cpu=4.3