Pull requests: deepseek-ai/DeepGEMM
Fix a race condition in contiguous k-grouped GEMM where in-flight tensormaps are updated in-place
#343
opened May 29, 2026 by
dfyz
Fix TMEM lane address for debug mode across all SM100 kernels
#341
opened May 28, 2026 by
yejunjin
Add SM90 FP8 paged MQA logits support for next_n=3 (Fix sm90 nextn3 paged mqa logits)
#340
opened May 28, 2026 by
yangsiqt
Fix ue8m0 packing: mask mantissa bits when extracting fp32 exponents
#337
opened May 19, 2026 by
yhyang201
fix: replace numeric_limits::infinity() with literal to fix CUDA 12.8 NVRTC compilation
#327
opened May 5, 2026 by
kuishou68
fix: correct operator precedence in pack_ue8m0_to_int assertion
#310
opened Apr 19, 2026 by
kuishou68
change sm100_fp8_mqa_logits to 2cta, and change mma acc output to f16
#307
opened Apr 17, 2026 by
benzh-2025
Fix JIT cache race condition with multi-process compilation
#302
opened Apr 11, 2026 by
Gregory-Pereira
feat: support bf16 output and plain TMA writes in k_grouped_gemm on SM90;
#298
opened Mar 26, 2026 by
fedorovgv
add caller location for util functions for better error message
#282
opened Jan 20, 2026 by
YouJiacheng
Draft
Add pyproject.toml for PEP 518 build system compliance
#271
opened Dec 31, 2025 by
yurekami
Contributor
3 tasks