OpenCL GPU+CPU GALAXY Bench


mitchde
09-01-2009, 11:31 AM
Mac OS X 10.6 only.


http://www.insanelymac.com/forum/index.php?act=attach&type=post&id=54949


Usage:
Start Galaxy
Key S = switch compute mode, cycling through:
CPU (single/multi-core) > CPU vector/SSE (single/multi-core) > GPU > GPU+CPU > (bold = start mode)
Key SPACE = pause/resume
Key 6 = reset scene
Key Q = quit

Download (6 MB):
http://rapidshare.com/files/273552904/OpenCL_Galaxis.zip

Compiled with the LLVM GCC 4.2 compiler = highly optimized code

Some results (posted):

mitch (C2D 3 GHz, NV 9600 GT, 1440x1140)
24 Gigaflops / around 70 U/sec : CPU (SIM: Vector Multi-Core CPU mode)
73 Gigaflops / around 220 U/sec : NV 9600 GT

1600x1200
21 Gigaflops / around 65 U/sec : CPU (SIM: Vector Multi-Core CPU mode)
60 Gigaflops / around 190 U/sec : NV 9600 GT

Users:
170 Gigaflops / around 505 U/sec : NV 9800 GTX+

Mac Pro (Early 2008)
CPU: 95 Gigaflops
GPU: 60 Gigaflops (NV 8800 GT)
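
For a rough comparison, the figures above can be turned into GPU-to-CPU ratios. This is only a sketch of the arithmetic: the helper name `speedup` and the labels are made up for illustration, and the numbers are copied from the posted results.

```python
# GPU vs. CPU throughput ratios from the results posted above.
# Values are (CPU Gigaflops, GPU Gigaflops); the labels are illustrative only.
results = {
    "C2D 3 GHz + 9600 GT @ 1440x1140": (24, 73),
    "C2D 3 GHz + 9600 GT @ 1600x1200": (21, 60),
    "Mac Pro Early 2008 + 8800 GT": (95, 60),
}


def speedup(cpu_gflops, gpu_gflops):
    """Return how many times faster the GPU run is than the CPU run."""
    return gpu_gflops / cpu_gflops


for label, (cpu, gpu) in results.items():
    print(f"{label}: GPU/CPU = {speedup(cpu, gpu):.2f}x")
```

A ratio below 1 means the CPU came out ahead, which is the case for the Mac Pro entry.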

iyohmamma
10-10-2009, 06:49 AM
The links don't work! Any instructions? :(

mitchde
10-10-2009, 01:21 PM
Hi, someone posted the wrong links (very old Version 1, since removed).

Here are mine (I compiled them myself); all need 10.6.x.
ATI has problems with OpenCL, so only Displacement works on ATI cards. Galaxies can't run there (it needs 10.6.2 or later, once Apple/ATI fix it).

OpenCLBench_Displacement_Version 2
http://rapidshare.com/files/287474292/OpenCLBench_Displacement_V2.zip
Galaxies_8K_Version 2 - for slow OpenCL GPUs
http://rapidshare.com/files/286235157/Galaxies_8K_V2.zip
Galaxies_32K_Version 2 - for fast GPUs
http://rapidshare.com/files/286234291/Galaxies_32K_V2.zip

Galaxies runs a star simulation. If you have a slow OpenCL GPU and/or a slow CPU, it is best to try the 8K version first.

Displacement does some rendering/shading.


Very soon an OpenCL Smoke Particles demo will be available:

http://freenet-homepage.de/amichalak/9600GT_65K_particles.jpg

x986123
10-10-2009, 04:48 PM
SMOKE! Where'd you get that? I want to try it =D

thorazine74
10-10-2009, 05:40 PM
Intel C2D 2.66 @ 3.20 GHz + GeForce 8600 GTS 512 MB

1 OpenCL platform found!

[Platform 0]
Name: Apple
Vendor: Apple
Version: OpenCL 1.0 (Jul 15 2009 23:07:32)
Profile: FULL_PROFILE


[OpenCL-only Context]
2 OpenCL devices found!

[Device 0]
Name: GeForce 8600 GTS
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.0
Driver Version: CLH 1.0
Compute Units: 32
Work Group Size: 512
Clock: 1450 MHz
Global Memory: 512 MB
Local Memory: 16 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions:
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions

[Device 1]
Name: Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.0
Driver Version: 1.0
Compute Units: 2
Work Group Size: 1
Clock: 3228 MHz
Global Memory (Total): 2048 MB
Global Memory (Host): 1536 MB
Global Memory (PCIe): 512 MB
Local Memory: 16 KB
Cache Size: 6144 KB
Cache Line Size: 64 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions

[shared OpenCL+OpenGL Context]
2 OpenCL devices found!

[Device 0]
Name: GeForce 8600 GTS
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.0
Driver Version: CLH 1.0
Compute Units: 32
Work Group Size: 512
Clock: 1450 MHz
Global Memory: 512 MB
Local Memory: 16 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions:
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions

[Device 1]
Name: Intel(R) Core(TM)2 Duo CPU E8200 @ 2.66GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.0
Driver Version: 1.0
Compute Units: 2
Work Group Size: 1
Clock: 3228 MHz
Global Memory (Total): 2048 MB
Global Memory (Host): 1536 MB
Global Memory (PCIe): 512 MB
Local Memory: 16 KB
Cache Size: 6144 KB
Cache Line Size: 64 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions

Display Res: 1680x1050

Displacement V2:
[GPU]: Shader 1/Shader 2
Compute: 24 ms. / 19 ms.
Display: 41 fps. / 52 fps.

Galaxy 8K V2:
Vector Single Core: 12
Vector MultiCore: 22
GeForce 8600GTS: 52
Hybrid MultiCore CPU+GPU: 29

I didn't try the 32K; I suppose it's too much for this old card...

mitchde
10-11-2009, 02:27 PM
Smoke Particles and all the others are from the Apple developer sources. I just compiled them.
For those who only want to look, here is the Smoke Particles video:
http://www.youtube.com/watch?v=-7yTRxJhVps

On that page you will also find the download link (10 MB) and instructions on how to run it (easy ;) ).
Only the CUDA driver must be installed; links to it are also at the video link above.

CUDA has the advantage that it also runs on OS X 10.5, whereas OpenCL requires 10.6!
The disadvantage (as with ATI Stream) is that it is vendor specific: a CUDA app runs only on NVIDIA GPUs, and an ATI Stream app (not available on OS X) only on ATI GPUs. OpenCL should be universal. So far it is not, because ATI GPUs have trouble with OpenCL.
The main difference between OpenCL and CUDA or ATI Stream is that the OpenCL part of the app is compiled at runtime! So the developer does not have to
compile it for a specific GPU. The OpenCL source is compiled for whichever GPU the OpenCL framework finds at runtime.
But OpenCL code still needs some tuning to run well on GPUs that differ completely in features and speed.
So OpenCL is a great feature, but it takes more work (brain work too) to cover a wide range of GPUs with really universal GPU computing.
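
To make the runtime-compilation point concrete, here is a minimal sketch using the third-party pyopencl bindings. That is an assumption on my part: pyopencl and numpy must be installed, and the demos in this thread were not built this way. The kernel `scale` and the helper `run_on_first_device` are made-up names. The kernel ships as plain source text and is only compiled by `build()` for whatever device is found when the program runs.

```python
# The kernel is shipped as source text; it is compiled at runtime for the device found.
KERNEL_SOURCE = """
__kernel void scale(__global const float *src, __global float *dst, const float factor)
{
    int i = get_global_id(0);
    dst[i] = src[i] * factor;
}
"""


def run_on_first_device(values, factor):
    """Build KERNEL_SOURCE for the first OpenCL device found and run it."""
    # Imported here so this file still loads where pyopencl is not installed.
    import numpy as np
    import pyopencl as cl

    device = cl.get_platforms()[0].get_devices()[0]
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx)

    # The runtime compile step: same source text, whichever vendor's device.
    program = cl.Program(ctx, KERNEL_SOURCE).build()

    src = np.asarray(values, dtype=np.float32)
    dst = np.empty_like(src)
    mf = cl.mem_flags
    src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=src)
    dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, dst.nbytes)

    # One work-item per array element.
    program.scale(queue, src.shape, None, src_buf, dst_buf, np.float32(factor))
    cl.enqueue_copy(queue, dst, dst_buf)
    return device.name, dst
```

On a machine with a working OpenCL driver, calling `run_on_first_device([1, 2, 3], 2.0)` should return the name of the device the kernel was built for together with the scaled values. Tuning for different GPUs (work-group sizes, vector widths) is the extra work mentioned above and is not shown here.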

hys17
10-17-2009, 12:24 AM
I passed all the tests except VolumeRender. Here's the content of the txt file:


/Users/xxx/Downloads/OpenCL_Bench_SET_V2/from Nvidia Sources/VolumeRender_OpenCL/oclVolumeRender Starting...

Press '=' and '-' to change density
']' and '[' to change brightness
';' and ''' to modify transfer function offset
'.' and ',' to modify transfer function scale

CL_DEVICE_VENDOR: NVIDIA
CL_DEVICE_NAME: GeForce GTX 280
CL_DRIVER_VERSION: CLH 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 240
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1350 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 8192, 2d height 8192, 3d width 2048, 3d height 2048, 3d depth 2048
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 1, short 1, int 1, long 1, float 1, double 0


!!! Error # 0 at line 161 , in file oclVolumeRender.cpp !!!


Starting Cleanup...

TEST FAILED !!!...

oclVolumeRender.exe Exiting...
Press <Enter> to Quit


Is it because it's [from Nvidia Sources]? So the test doesn't work with an unofficially supported GPU?
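
One detail in the log may help narrow this down. In the standard OpenCL header (cl.h), status code 0 is CL_SUCCESS, and real API errors are negative. So if the number in "Error # 0" is an OpenCL status code (an assumption, since the source of oclVolumeRender.cpp is not shown here), the check at line 161 may be failing on something other than an OpenCL call. A small lookup sketch, with the hypothetical helper `describe_status`:

```python
# A few status codes as defined in the OpenCL 1.0 header cl.h (not a complete list).
CL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
}


def describe_status(code):
    """Map a numeric OpenCL status code to its symbolic name, if listed above."""
    return CL_STATUS_NAMES.get(code, f"unknown status {code}")


# The code reported in the log above.
print(describe_status(0))
```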