Fix building with CUDA toolkit 13.2 #3273

Merged
zcbenz merged 4 commits into ml-explore:main from zcbenz:cuda-13-2
Mar 18, 2026

zcbenz (Collaborator) commented Mar 18, 2026

angeloskath (Member) left a comment

Awesome!

Out of curiosity, why the need for launch bounds?

zcbenz (Collaborator, Author) commented Mar 18, 2026

When __launch_bounds__ is not specified, CUDA uses heuristics to determine the resources used by the kernel. CUDA 13.2 seems to have a bug where it requests too many resources, so the kernel fails to launch with "too many resources requested for launch".
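
For readers unfamiliar with the qualifier, here is a minimal sketch of what adding launch bounds to a kernel looks like. It is not one of the kernels changed in this PR: `scale_kernel`, `kBlockSize`, and `run_scale_example` are hypothetical names, and the block size of 256 is an assumption chosen for illustration. The idea is that `__launch_bounds__` tells the compiler the largest block size the kernel will be launched with, so it can limit per-thread register use accordingly instead of relying on heuristics.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch; not taken from the PR.
constexpr int kBlockSize = 256;

// __launch_bounds__(kBlockSize) promises the compiler that this kernel is
// never launched with more than kBlockSize threads per block, so it has to
// keep per-thread resource use low enough for such a block to fit.
__global__ void __launch_bounds__(kBlockSize)
scale_kernel(const float* in, float* out, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i] * factor;
  }
}

// Runs the kernel once. Returns 0 on success, 1 on a CUDA error or a
// wrong result.
int run_scale_example() {
  const int n = 1 << 16;
  const size_t bytes = n * sizeof(float);
  std::vector<float> host_in(n, 1.5f), host_out(n, 0.0f);

  float* dev_in = nullptr;
  float* dev_out = nullptr;
  if (cudaMalloc(&dev_in, bytes) != cudaSuccess) return 1;
  if (cudaMalloc(&dev_out, bytes) != cudaSuccess) {
    cudaFree(dev_in);
    return 1;
  }

  cudaError_t err =
      cudaMemcpy(dev_in, host_in.data(), bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    // The launch must respect the bound declared on the kernel.
    int grid = (n + kBlockSize - 1) / kBlockSize;
    scale_kernel<<<grid, kBlockSize>>>(dev_in, dev_out, 2.0f, n);
    // A launch that exceeds device resources is reported here, for example
    // as cudaErrorLaunchOutOfResources.
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  if (err == cudaSuccess) {
    err = cudaMemcpy(host_out.data(), dev_out, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(dev_in);
  cudaFree(dev_out);

  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  for (float v : host_out) {
    if (v != 3.0f) return 1;  // 1.5f * 2.0f is exactly 3.0f
  }
  return 0;
}

int main() { return run_scale_example(); }
```

Checking `cudaGetLastError()` right after the launch is what surfaces this class of failure; without that check, a kernel that never ran can go unnoticed until its output is read.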

zcbenz merged commit 75f74ea into ml-explore:main Mar 18, 2026
16 checks passed
zcbenz deleted the cuda-13-2 branch March 18, 2026 23:31

Development

Successfully merging this pull request may close these issues.

Illegal memory access in reduce kernel when built with CUDA Toolkit 13.1

2 participants