| Device Name | cpu-cascadelake-Intel(R) Core™ i9-10980XE CPU @ 3.00GHz | NVIDIA GeForce RTX 3060 |
|---|---|---|
| Device Vendor | GenuineIntel | NVIDIA Corporation |
| Device Vendor ID | 0x10006 | 0x10de |
| Device Version | OpenCL 3.0 PoCL HSTR: cpu-x86_64-pc-linux-gnu-cascadelake | OpenCL 3.0 PoCL HSTR: CUDA-sm_75 |
| Device Numeric Version | 0xc00000 (3.0.0) | 0xc00000 (3.0.0) |
| Driver Version | 6.0 | 6.0 |
| Device OpenCL C Version | OpenCL C 1.2 PoCL | OpenCL C 1.2 PoCL |
| Device OpenCL C all versions | OpenCL C 0x400000 (1.0.0), OpenCL C 0x401000 (1.1.0), OpenCL C 0x402000 (1.2.0), OpenCL C 0xc00000 (3.0.0) | OpenCL C 0x400000 (1.0.0), OpenCL C 0x401000 (1.1.0), OpenCL C 0x402000 (1.2.0), OpenCL C 0xc00000 (3.0.0) |
| Device OpenCL C features | __opencl_c_3d_image_writes 0xc00000 (3.0.0), __opencl_c_images 0xc00000 (3.0.0), __opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0), __opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0), __opencl_c_atomic_scope_device 0xc00000 (3.0.0), __opencl_c_program_scope_global_variables 0xc00000 (3.0.0), __opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0), __opencl_c_generic_address_space 0xc00000 (3.0.0), __opencl_c_work_group_collective_functions 0xc00000 (3.0.0), __opencl_c_read_write_images 0xc00000 (3.0.0), __opencl_c_subgroups 0xc00000 (3.0.0), __opencl_c_fp16 0xc00000 (3.0.0), __opencl_c_fp64 0xc00000 (3.0.0), __opencl_c_ext_fp32_global_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp32_local_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp32_global_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp32_local_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp64_global_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp64_local_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp64_global_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp64_local_atomic_min_max 0xc00000 (3.0.0), __opencl_c_int64 0xc00000 (3.0.0) | __opencl_c_images 0xc00000 (3.0.0), __opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0), __opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0), __opencl_c_atomic_scope_device 0xc00000 (3.0.0), __opencl_c_program_scope_global_variables 0xc00000 (3.0.0), __opencl_c_generic_address_space 0xc00000 (3.0.0), __opencl_c_ext_fp32_global_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp32_local_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp32_global_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp32_local_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp64_global_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp64_local_atomic_add 0xc00000 (3.0.0), __opencl_c_ext_fp64_global_atomic_min_max 0xc00000 (3.0.0), __opencl_c_ext_fp64_local_atomic_min_max 0xc00000 (3.0.0), __opencl_c_fp16 0xc00000 (3.0.0), __opencl_c_fp64 0xc00000 (3.0.0) |
| Latest conformance test passed | v2022-04-19-01 | (n/a) |
| Device Type | CPU | GPU |
| Device Profile | FULL_PROFILE | FULL_PROFILE |
| Device Available | Yes | Yes |
| Compiler Available | Yes | Yes |
| Linker Available | Yes | Yes |
| Max compute units | 36 | 28 |
| Max clock frequency | 2999MHz | 1777MHz |
| Device Partition | (core) Max number of sub-devices 36, Supported partition types equally, by counts, Supported affinity domains (n/a) | (core) Max number of sub-devices 1, Supported partition types None, Supported affinity domains (n/a) |
| Max work item dimensions | 3 | 3 |
| Max work item sizes | 4096x4096x4096 | 1024x1024x64 |
| Max work group size | 4096 | 1024 |
| Preferred work group size multiple (device) | 8 | 32 |
| Preferred work group size multiple (kernel) | 8 | 32 |
| Max sub-groups per work group | 128 | 32 |
| Sub-group sizes (Intel) | 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 | (n/a) |
| Preferred / native vector sizes | char 16 / 16, short 16 / 16, int 16 / 16, long 8 / 8, half 16 / 16 (cl_khr_fp16), float 16 / 16, double 8 / 8 (cl_khr_fp64) | char 1 / 1, short 1 / 1, int 1 / 1, long 1 / 1, half 0 / 0 (cl_khr_fp16), float 1 / 1, double 1 / 1 (cl_khr_fp64) |
| Half-precision Floating-point support | (cl_khr_fp16) Denormals No, Infinity and NANs Yes, Round to nearest Yes, Round to zero No, Round to infinity No, IEEE754-2008 fused multiply-add No, Support is emulated in software No | (cl_khr_fp16) Denormals No, Infinity and NANs Yes, Round to nearest Yes, Round to zero No, Round to infinity No, IEEE754-2008 fused multiply-add No, Support is emulated in software No |
| Single-precision Floating-point support | (core) Denormals Yes, Infinity and NANs Yes, Round to nearest Yes, Round to zero Yes, Round to infinity Yes, IEEE754-2008 fused multiply-add Yes, Support is emulated in software No, Correctly-rounded divide and sqrt operations Yes | (core) Denormals Yes, Infinity and NANs Yes, Round to nearest Yes, Round to zero Yes, Round to infinity Yes, IEEE754-2008 fused multiply-add Yes, Support is emulated in software No, Correctly-rounded divide and sqrt operations No |
| Double-precision Floating-point support | (cl_khr_fp64) Denormals Yes, Infinity and NANs Yes, Round to nearest Yes, Round to zero Yes, Round to infinity Yes, IEEE754-2008 fused multiply-add Yes, Support is emulated in software No | (cl_khr_fp64) Denormals Yes, Infinity and NANs Yes, Round to nearest Yes, Round to zero Yes, Round to infinity Yes, IEEE754-2008 fused multiply-add Yes, Support is emulated in software No |
| Address bits | 64, Little-Endian | 64, Little-Endian |
| Global memory size | 65114288128 (60.64GiB) | 12884377600 (12GiB) |
| Error Correction support | No | No |
| Max memory allocation | 17179869184 (16GiB) | 11793334272 (10.98GiB) |
| Unified memory for Host and Device | Yes | No |
| Shared Virtual Memory (SVM) capabilities | (core) Coarse-grained buffer sharing Yes, Fine-grained buffer sharing Yes, Fine-grained system sharing Yes, Atomics Yes | (core) Coarse-grained buffer sharing Yes, Fine-grained buffer sharing Yes, Fine-grained system sharing No, Atomics No |
| Unified Shared Memory (USM) | (cl_intel_unified_shared_memory) | (n/a) |
| Host USM capabilities (Intel) | USM access, USM atomic access | (n/a) |
| Device USM capabilities (Intel) | USM access, USM atomic access | (n/a) |
| Single-Device USM caps (Intel) | USM access, USM atomic access | (n/a) |
| Cross-Device USM caps (Intel) | (n/a) | (n/a) |
| Shared System USM caps (Intel) | (n/a) | (n/a) |
| Minimum alignment for any data type | 128 bytes | 128 bytes |
| Alignment of base address | 1024 bits (128 bytes) | 4096 bits (512 bytes) |
| Preferred alignment for atomics | SVM 64 bytes, Global 64 bytes, Local 64 bytes | SVM 64 bytes, Global 64 bytes, Local 64 bytes |
| Atomic memory capabilities | relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope | relaxed, work-group scope |
| Atomic fence capabilities | relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope | relaxed, acquire/release, work-group scope |
| Max size for global variable | 64000 (62.5KiB) | 0 |
| Preferred total size of global vars | 1048576 (1024KiB) | 0 |
| Global Memory cache type | Read/Write | None |
| Global Memory cache size | 25952256 (24.75MiB) | (n/a) |
| Global Memory cache line size | 64 bytes | (n/a) |
| Image support | Yes | No |
| Max number of samplers per kernel | 16 | (n/a) |
| Max size for 1D images from buffer | 1073741824 pixels | (n/a) |
| Max 1D or 2D image array size | 2048 images | (n/a) |
| Base address alignment for 2D image buffers | 0 bytes | (n/a) |
| Pitch alignment for 2D image buffers | 0 pixels | (n/a) |
| Max 2D image size | 32768x32768 pixels | (n/a) |
| Max 3D image size | 2048x2048x2048 pixels | (n/a) |
| Max number of read image args | 128 | (n/a) |
| Max number of write image args | 128 | (n/a) |
| Max number of read/write image args | 128 | (n/a) |
| Pipe support | No | No |
| Max number of pipe args | 0 | 0 |
| Max active pipe reservations | 0 | 0 |
| Max pipe packet size | 0 | 0 |
| Local memory type | Global | Local |
| Local memory size | 1048576 (1024KiB) | 49152 (48KiB) |
| Max number of constant args | 8 | 8 |
| Max constant buffer size | 1048576 (1024KiB) | 65536 (64KiB) |
| Generic address space support | Yes | Yes |
| Max size of kernel argument | 1024 | 4352 (4.25KiB) |
| Queue properties (on host) | Out-of-order execution Yes, Profiling Yes | Out-of-order execution No, Profiling Yes |
| Device enqueue capabilities | (n/a) | (n/a) |
| Queue properties (on device) | Out-of-order execution No, Profiling No, Preferred size 0, Max size 0 | Out-of-order execution No, Profiling No, Preferred size 0, Max size 0 |
| Max queues on device | 0 | 0 |
| Max events on device | 0 | 0 |
| Command buffer capabilities | kernel printf, simultaneous use, out of order, 0x10 | kernel printf, simultaneous use, out of order, 0x10 |
| Required queue properties for command buffer | Out-of-order execution No, Profiling No | Out-of-order execution No, Profiling No |
| Prefer user sync for interop | Yes | Yes |
| Profiling timer resolution | 1ns | 1ns |
| Execution capabilities | Run OpenCL kernels Yes, Run native kernels Yes, Non-uniform work-groups No, Work-group collective functions Yes, Sub-group independent forward progress Yes | Run OpenCL kernels Yes, Run native kernels No, Non-uniform work-groups No, Work-group collective functions No, Sub-group independent forward progress Yes |
| IL version | (n/a) | (n/a) |
| ILs with version | (n/a) | (n/a) |
| printf() buffer size | 16777216 (16MiB) | 16777216 (16MiB) |
| Built-in kernels | pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32 | pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos |
| Built-in kernels with version | pocl.add.i8 0x402000 (1.2.0), org.khronos.openvx.scale_image.nn.u8 0x402000 (1.2.0), org.khronos.openvx.scale_image.bl.u8 0x402000 (1.2.0), org.khronos.openvx.tensor_convert_depth.wrap.u8.f32 0x402000 (1.2.0) | pocl.mul.i32 0x402000 (1.2.0), pocl.add.i32 0x402000 (1.2.0), pocl.dnn.conv2d_int8_relu 0x402000 (1.2.0), pocl.sgemm.local.f32 0x402000 (1.2.0), pocl.sgemm.tensor.f16f16f32 0x402000 (1.2.0), pocl.sgemm_ab.tensor.f16f16f32 0x402000 (1.2.0), pocl.abs.f32 0x402000 (1.2.0), pocl.add.i8 0x402000 (1.2.0), org.khronos.openvx.scale_image.nn.u8 0x402000 (1.2.0), org.khronos.openvx.scale_image.bl.u8 0x402000 (1.2.0), org.khronos.openvx.tensor_convert_depth.wrap.u8.f32 0x402000 (1.2.0) |
| Device Extensions | cl_khr_byte_addressable_store, cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_local_int32_extended_atomics, cl_khr_3d_image_writes, cl_khr_command_buffer, cl_khr_command_buffer_multi_device, cl_khr_subgroups, cl_intel_unified_shared_memory, cl_ext_buffer_device_address, cl_pocl_svm_rect, cl_pocl_command_buffer_svm, cl_pocl_command_buffer_host_buffer, cl_khr_subgroup_ballot, cl_khr_subgroup_shuffle, cl_intel_subgroups, cl_intel_subgroups_short, cl_ext_float_atomics, cl_intel_required_subgroup_size, cl_khr_fp16, cl_khr_fp64, cl_khr_int64_base_atomics, cl_khr_int64_extended_atomics | cl_khr_byte_addressable_store, cl_khr_global_int32_base_atomics, cl_khr_global_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_local_int32_extended_atomics, cl_khr_int64_base_atomics, cl_khr_int64_extended_atomics, cl_nv_device_attribute_query, cl_ext_float_atomics, cl_khr_fp16, cl_khr_fp64, cl_ext_buffer_device_address, cl_khr_subgroup_ballot, cl_khr_subgroup_shuffle |
| Device Extensions with Version | cl_khr_byte_addressable_store 0x400000 (1.0.0), cl_khr_global_int32_base_atomics 0x400000 (1.0.0), cl_khr_global_int32_extended_atomics 0x400000 (1.0.0), cl_khr_local_int32_base_atomics 0x400000 (1.0.0), cl_khr_local_int32_extended_atomics 0x400000 (1.0.0), cl_khr_3d_image_writes 0x400000 (1.0.0), cl_khr_command_buffer 0x9004 (0.9.4), cl_khr_command_buffer_multi_device 0x9001 (0.9.1), cl_khr_subgroups 0x400000 (1.0.0), cl_intel_unified_shared_memory 0x400000 (1.0.0), cl_ext_buffer_device_address 0x1000 (0.1.0), cl_pocl_svm_rect 0x9000 (0.9.0), cl_pocl_command_buffer_svm 0x9000 (0.9.0), cl_pocl_command_buffer_host_buffer 0x9000 (0.9.0), cl_khr_subgroup_ballot 0x400000 (1.0.0), cl_khr_subgroup_shuffle 0x400000 (1.0.0), cl_intel_subgroups 0x400000 (1.0.0), cl_intel_subgroups_short 0x400000 (1.0.0), cl_ext_float_atomics 0x400000 (1.0.0), cl_intel_required_subgroup_size 0x400000 (1.0.0), cl_khr_fp16 0x400000 (1.0.0), cl_khr_fp64 0x400000 (1.0.0), cl_khr_int64_base_atomics 0x400000 (1.0.0), cl_khr_int64_extended_atomics 0x400000 (1.0.0) | cl_khr_byte_addressable_store 0x400000 (1.0.0), cl_khr_global_int32_base_atomics 0x400000 (1.0.0), cl_khr_global_int32_extended_atomics 0x400000 (1.0.0), cl_khr_local_int32_base_atomics 0x400000 (1.0.0), cl_khr_local_int32_extended_atomics 0x400000 (1.0.0), cl_khr_int64_base_atomics 0x400000 (1.0.0), cl_khr_int64_extended_atomics 0x400000 (1.0.0), cl_nv_device_attribute_query 0x400000 (1.0.0), cl_ext_float_atomics 0x400000 (1.0.0), cl_khr_fp16 0x400000 (1.0.0), cl_khr_fp64 0x400000 (1.0.0), cl_ext_buffer_device_address 0x1000 (0.1.0), cl_khr_subgroup_ballot 0x400000 (1.0.0), cl_khr_subgroup_shuffle 0x400000 (1.0.0) |
| Device Topology (NV) | (n/a) | PCI-E, 0000:65:00.0 |
| Compute Capability (NV) | (n/a) | 8.6 |
| Registers per block (NV) | (n/a) | 65536 |
| Warp size (NV) | (n/a) | 32 |
| Integrated memory (NV) | (n/a) | No |
| Kernel execution timeout (NV) | (n/a) | Yes |
| Concurrent copy and kernel execution (NV) | (n/a) | Yes |
| Number of async copy engines (NV) | (n/a) | 5 |
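
The hexadecimal values in the version entries above, such as `0xc00000 (3.0.0)`, `0x402000 (1.2.0)` and `0x9004 (0.9.4)`, are consistent with the packed `cl_version` layout defined in the OpenCL 3.0 headers: 10 bits for the major number, 10 bits for the minor number, and 12 bits for the patch number. The following is a minimal Python sketch of that decoding, assuming that layout applies to every versioned entry in this listing. The helper name `decode_cl_version` is hypothetical and is not part of any OpenCL binding.

```python
# Bit widths assumed from the OpenCL 3.0 cl_version layout:
# major (10 bits) | minor (10 bits) | patch (12 bits).
MINOR_BITS = 10
PATCH_BITS = 12


def decode_cl_version(value):
    """Split a packed cl_version integer into (major, minor, patch).

    Hypothetical helper for illustration only.
    """
    major = value >> (MINOR_BITS + PATCH_BITS)
    minor = (value >> PATCH_BITS) & ((1 << MINOR_BITS) - 1)
    patch = value & ((1 << PATCH_BITS) - 1)
    return major, minor, patch


if __name__ == "__main__":
    # Values copied from the listing above.
    for raw in (0xC00000, 0x402000, 0x9004, 0x1000):
        print(hex(raw), "->", "%d.%d.%d" % decode_cl_version(raw))
```

Applied to the values in the listing, this reproduces the parenthesised versions printed next to each hexadecimal number, which suggests the two notations carry the same information.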