When libmklccgdll works correctly, it is extremely fast. However, to get optimal performance, keep these points in mind:
Once data is local, libmklccgdll hands off the actual arithmetic to the underlying MKL kernels (e.g., AVX2- or AVX-512-optimized code) running on each node’s CPU. It orchestrates parallelism at two levels.
| Step | Action |
|------|--------|
| ✅ | Installed Intel oneAPI Base Toolkit (or standalone MKL) |
| ✅ | Set environment with setvars.bat |
| ✅ | In Visual Studio: Project Properties → Intel Performance Libraries → Use MKL |
| ✅ | Add MKL DLL path to PATH at runtime |
| ✅ | If using debug mode, try release DLLs or install debug redistributables |
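
The PATH step in the checklist is the easiest one to get wrong, so a quick check can help. The following is a minimal sketch rather than an official Intel tool. The helper names `find_on_path` and `prepend_to_path` are made up for illustration, and the DLL file name is a placeholder, because the exact file name on your system depends on the MKL version installed.

```python
import os
from pathlib import Path


def find_on_path(file_name, path_value=None):
    """Return every directory in a PATH-style string that contains file_name."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # tolerate quoted PATH entries
        if not entry:
            continue
        if (Path(entry) / file_name).is_file():
            hits.append(Path(entry))
    return hits


def prepend_to_path(directory, env=None):
    """Put directory at the front of PATH in env, unless it is already listed."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if str(directory) not in parts:
        parts.insert(0, str(directory))
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


if __name__ == "__main__":
    # Placeholder name: substitute the DLL that the loader reports as missing.
    dll_name = "your_mkl_library.dll"
    found = find_on_path(dll_name)
    if found:
        print(f"{dll_name} found in: {', '.join(str(d) for d in found)}")
    else:
        print(f"{dll_name} is not in any PATH directory")
```

If the file is not found, `prepend_to_path` can add the MKL redistributable directory before launching the application as a child process. The change applies only to the current process and the processes it starts, not to the system-wide PATH.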