Enable GPU-accelerated nucleotide ungapped prefilter #1081

Open
KimBioInfoStudio wants to merge 1 commit into soedinglab:master from KimBioInfoStudio:gpu-nucleotide-search

@KimBioInfoStudio

Summary

  • Map nucleotide PSSM rows from the NucleotideMatrix encoding (A=0, C=1, T=2, G=3, X=4) to their ConvertAA_20 positions (A=0, C=1, G=5, T=16, X=20), so that the existing 21-row GPU kernels score nucleotide sequences correctly
  • Always allocate a 21-row PSSM for the GPU path, regardless of alphabet size
  • Allow GPU nucleotide search in ungapped prefilter mode; only gapped rescore mode is still rejected
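
As a rough illustration of the row remapping described in the first bullet, the sketch below expands a 5-row nucleotide PSSM into a 21-row one. It is not the code from this PR: the helper name `remapNucleotidePssm`, the row-major `int8_t` layout, and the zero fill for unused rows are all assumptions made for the example. Only the two index sets come from the summary above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kNucRows = 5;   // A, C, T, G, X in the nucleotide encoding
constexpr int kGpuRows = 21;  // row count the GPU kernels expect

// Source row r (A=0, C=1, T=2, G=3, X=4) goes to destination row
// kNucToAA20[r] (A=0, C=1, T=16, G=5, X=20).
constexpr std::array<int, kNucRows> kNucToAA20 = {0, 1, 16, 5, 20};

// Hypothetical helper: copy each nucleotide row of a row-major
// (kNucRows x queryLen) PSSM into its slot of a (kGpuRows x queryLen) PSSM.
// Rows that no nucleotide maps to keep the fill value (assumed to be 0 here).
std::vector<int8_t> remapNucleotidePssm(const std::vector<int8_t>& nucPssm,
                                        std::size_t queryLen,
                                        int8_t fill = 0) {
    assert(nucPssm.size() == kNucRows * queryLen);
    std::vector<int8_t> out(kGpuRows * queryLen, fill);
    for (int r = 0; r < kNucRows; ++r) {
        const std::size_t dst = static_cast<std::size_t>(kNucToAA20[r]);
        for (std::size_t i = 0; i < queryLen; ++i) {
            out[dst * queryLen + i] = nucPssm[r * queryLen + i];
        }
    }
    return out;
}
```

With this layout, a target residue encoded as 16 (T) reads the scores that originally lived in nucleotide row 2, which is the property the GPU kernels rely on.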

Approach

The GPU database (makepaddedseqdb) encodes sequences using SubstitutionMatrix::aa2num, which follows the alphabetical ConvertAA_20 order. The GPU PSSM kernels use a hardcoded 21-row shared-memory layout. Rather than modifying the CUDA kernels or the database encoding, we remap the nucleotide PSSM rows in ungappedprefilter.cpp to match the ConvertAA_20 positions. This keeps full compatibility with getUnpadded(), gpuserver, and the existing protein GPU path.
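
The target positions quoted in the summary follow directly from the alphabetical ordering of the 20 standard amino acid letters plus X. The sketch below reproduces that lookup with a stand-in function; `letterToIndex` and `encode` are illustrative names, not the MMseqs2 API, and the real `aa2num` table may treat other characters differently.

```cpp
#include <string>
#include <vector>

// The 20 standard amino acid letters in alphabetical order, followed by X.
constexpr char kAA20[] = "ACDEFGHIKLMNPQRSTVWYX";
constexpr int kAlphabetSize = 21;

// Illustrative stand-in for an aa2num-style lookup (an assumption for this
// example): letters outside the alphabet fall back to the X slot.
constexpr int letterToIndex(char c) {
    for (int i = 0; i < kAlphabetSize; ++i) {
        if (kAA20[i] == c) {
            return i;
        }
    }
    return kAlphabetSize - 1;
}

// The four nucleotide letters land on exactly the rows named in the summary.
static_assert(letterToIndex('A') == 0, "A maps to row 0");
static_assert(letterToIndex('C') == 1, "C maps to row 1");
static_assert(letterToIndex('G') == 5, "G maps to row 5");
static_assert(letterToIndex('T') == 16, "T maps to row 16");
static_assert(letterToIndex('X') == 20, "X maps to row 20");

// Encode a target sequence the way a 21-letter database would store it.
std::vector<int> encode(const std::string& seq) {
    std::vector<int> out;
    out.reserve(seq.size());
    for (char c : seq) {
        out.push_back(letterToIndex(c));
    }
    return out;
}
```

Because the database side already stores nucleotides at these indices, moving the PSSM rows is enough to line the two sides up, which is why no kernel change is required.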

2 files changed, 29 insertions, 11 deletions. Zero CUDA kernel or libmarv modifications.

Test plan

Tested on DGX Spark (Blackwell B200, CUDA 13.0, aarch64):

  • Nucleotide GPU vs CPU: 30,000 scores identical (1000 queries × 10K targets, --comp-bias-corr 0 --mask 0)
  • Nucleotide small dataset: perfect match scores verified (e.g. 16bp exact match = 32)
  • Protein GPU regression: no change (patched vs unpatched GPU output identical)
  • Performance: 6.1× speedup over 20-core CPU (1000 queries × 10K targets)

| Benchmark (1000q × 10K targets) | GPU | CPU (20 cores) | Speedup |
|---|---|---|---|
| Nucleotide ungapped prefilter | 940 ms | 5780 ms | 6.1× |
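
For readers who want to sanity-check the "16bp exact match = 32" figure from the test plan, here is a minimal CPU reference for ungapped diagonal scoring. It is a simplified sketch, not the MMseqs2 implementation: it scores raw strings instead of a PSSM, and the match score of 2 and mismatch score of -3 are assumptions chosen so that 16 matches give 32.

```cpp
#include <string>

// Best-scoring ungapped segment over all diagonals of query vs. target.
// match and mismatch are assumed values for this example only.
int bestUngappedScore(const std::string& query, const std::string& target,
                      int match = 2, int mismatch = -3) {
    const int qLen = static_cast<int>(query.size());
    const int tLen = static_cast<int>(target.size());
    int best = 0;
    // Diagonal d pairs query position i with target position i + d.
    for (int d = -(qLen - 1); d < tLen; ++d) {
        int running = 0;
        int i = d < 0 ? -d : 0;
        int j = d < 0 ? 0 : d;
        for (; i < qLen && j < tLen; ++i, ++j) {
            running += (query[i] == target[j]) ? match : mismatch;
            if (running < 0) {
                running = 0;  // restart the segment once it goes negative
            }
            if (running > best) {
                best = running;
            }
        }
    }
    return best;
}
```

Under these assumed scores, calling `bestUngappedScore` with the same 16-base sequence as query and target returns 16 × 2 = 32, matching the value reported in the test plan.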

🤖 Generated with Claude Code

Map nucleotide PSSM rows to ConvertAA_20 positions so the existing
GPU kernels (which expect 21-row amino acid encoding) can score
nucleotide sequences without any CUDA kernel changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>