fix: handle compound ontology ids in translate() #33

Merged
jkobject merged 1 commit into jkobject:main from LiudengZhang:fix-translate-compound-ontology-ids on May 20, 2026

@LiudengZhang
Contributor

Problem

translate() resolves each ontology id with a single .filter(ontology_id=...).one() call. CellxGene records multi-ethnic donors with comma-joined ids, e.g. self_reported_ethnicity_ontology_term_id = "HANCESTRO:0005,HANCESTRO:0008". That string matches no single ontology term, so .one() raises ObjectDoesNotExist and the whole run aborts.
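
The failure mode can be reproduced without lamindb. The sketch below is an illustration only: `FakeRegistry`, `FakeQuery`, `DoesNotExist`, and `naive_lookup` are hypothetical stand-ins that mimic the `.filter(ontology_id=...).one()` call pattern described above, and the two term names are taken from the example in this description.

```python
from types import SimpleNamespace


class DoesNotExist(Exception):
    """Hypothetical stand-in for the ObjectDoesNotExist error."""


class FakeQuery:
    """Mimics a query result that supports .one()."""

    def __init__(self, rows):
        self._rows = rows

    def one(self):
        # Like a strict single-row lookup: anything but exactly one row raises.
        if len(self._rows) != 1:
            raise DoesNotExist(f"expected 1 match, got {len(self._rows)}")
        return self._rows[0]


class FakeRegistry:
    """Mimics an ontology registry keyed by single ontology ids."""

    _TERMS = {"HANCESTRO:0005": "European", "HANCESTRO:0008": "African"}

    def filter(self, ontology_id):
        name = self._TERMS.get(ontology_id)
        rows = [] if name is None else [SimpleNamespace(name=name)]
        return FakeQuery(rows)


def naive_lookup(registry, ontology_id):
    """Pre-fix behaviour: one strict lookup per id string."""
    return registry.filter(ontology_id=ontology_id).one().name


registry = FakeRegistry()
print(naive_lookup(registry, "HANCESTRO:0005"))  # a single id resolves

try:
    naive_lookup(registry, "HANCESTRO:0005,HANCESTRO:0008")
except DoesNotExist as err:
    # The comma-joined string is not a key in the registry, so .one() raises.
    print("lookup failed:", err)
```

The compound string is treated as one opaque id, so no row matches and the strict lookup raises.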

This is what scPRINT users hit in jkobject/scPRINT#49 — a model that predicts ethnicity emits compound labels (they were in the training vocabulary), and translate() crashes the entire embedding run on them.

Fix

Add a _lookup_name(obj, ontology_id) helper that:

  • splits comma-joined ids, resolves each part, and rejoins the names ("HANCESTRO:0005,HANCESTRO:0008" → "European,African");
  • uses .one_or_none() so a genuinely unknown id degrades to None instead of raising.

All four branches of translate() (str / dict / set / list) now go through the helper. Return types and shapes are unchanged. There are only two behaviour changes, both strict improvements: compound ids resolve instead of crashing, and unknown ids return None instead of raising.
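
For readers who want the shape of the helper without opening the diff, here is a minimal sketch of the split, resolve, rejoin idea. It is not the merged code: `lookup_name_sketch`, `MockRegistry`, and `MockQuery` are hypothetical names, and the handling of a compound id with only some resolvable parts (unresolved parts are skipped here) is an assumption, since this description does not state that policy.

```python
from types import SimpleNamespace


class MockQuery:
    """Hypothetical query result exposing one_or_none()."""

    def __init__(self, rows):
        self._rows = rows

    def one_or_none(self):
        # Simplified: return the first match, or None when nothing matched.
        return self._rows[0] if self._rows else None


class MockRegistry:
    """Hypothetical registry; names follow the example in this description."""

    _TERMS = {"HANCESTRO:0005": "European", "HANCESTRO:0008": "African"}

    def filter(self, ontology_id):
        name = self._TERMS.get(ontology_id)
        return MockQuery([] if name is None else [SimpleNamespace(name=name)])


def lookup_name_sketch(obj, ontology_id):
    """Resolve a single or comma-joined ontology id to a name string.

    Returns None when no part resolves. Skipping unresolved parts of a
    compound id is an assumption of this sketch.
    """
    names = []
    for part in ontology_id.split(","):
        record = obj.filter(ontology_id=part.strip()).one_or_none()
        if record is not None:
            names.append(record.name)
    return ",".join(names) if names else None


registry = MockRegistry()
print(lookup_name_sketch(registry, "HANCESTRO:0005"))
print(lookup_name_sketch(registry, "HANCESTRO:0005,HANCESTRO:0008"))
print(lookup_name_sketch(registry, "HANCESTRO:9999"))
```

Because a single id is just a compound id with one part, the same code path covers both cases, which is why every branch of translate() can share one helper.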

Tests

Adds tests/test_translate.py — covers single, compound, unknown, mixed, and whitespace cases using a mock registry, so it runs without a populated ontology database. (.one_or_none() is confirmed present in the pinned lamindb==2.1.1.)
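
As a rough idea of how a mock registry keeps such tests independent of an ontology database, the sketch below builds one with `unittest.mock.MagicMock`. It is not the contents of tests/test_translate.py: `make_mock_registry`, `lookup_name_sketch`, and the test functions are hypothetical, and the helper under test is a local stand-in rather than the real one. The mixed case is left out because its expected result is not stated here.

```python
from unittest.mock import MagicMock

# Term names follow the example in this description.
TERMS = {"HANCESTRO:0005": "European", "HANCESTRO:0008": "African"}


def make_mock_registry():
    """Build a registry whose filter(...).one_or_none() reads from TERMS."""
    registry = MagicMock()

    def fake_filter(ontology_id):
        query = MagicMock()
        name = TERMS.get(ontology_id)
        if name is None:
            query.one_or_none.return_value = None
        else:
            record = MagicMock()
            record.name = name  # set after creation so .name is a plain str
            query.one_or_none.return_value = record
        return query

    registry.filter.side_effect = fake_filter
    return registry


def lookup_name_sketch(obj, ontology_id):
    """Local stand-in for the helper under test (an assumption)."""
    names = []
    for part in ontology_id.split(","):
        record = obj.filter(ontology_id=part.strip()).one_or_none()
        if record is not None:
            names.append(record.name)
    return ",".join(names) if names else None


def test_single_id():
    assert lookup_name_sketch(make_mock_registry(), "HANCESTRO:0005") == "European"


def test_compound_id():
    result = lookup_name_sketch(make_mock_registry(), "HANCESTRO:0005,HANCESTRO:0008")
    assert result == "European,African"


def test_unknown_id():
    assert lookup_name_sketch(make_mock_registry(), "HANCESTRO:9999") is None


def test_whitespace_around_parts():
    result = lookup_name_sketch(make_mock_registry(), "HANCESTRO:0005, HANCESTRO:0008")
    assert result == "European,African"
```

The functions follow the `test_*` naming that pytest discovers, but they are plain functions and can also be called directly.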

Refs jkobject/scPRINT#49

translate() resolved each ontology id with a single
`.filter(ontology_id=...).one()` call. CellxGene records multi-ethnic
donors with comma-joined ids (e.g. "HANCESTRO:0005,HANCESTRO:0008"),
which match no single term, so `.one()` raised ObjectDoesNotExist and
aborted the whole run.

Add a `_lookup_name` helper that splits comma-joined ids, resolves each
part, and rejoins the names. It uses `.one_or_none()` so a genuinely
unknown id degrades to None instead of raising. All four branches of
translate() now go through the helper; return types are unchanged.

Adds tests/test_translate.py covering single, compound, unknown, and
mixed ids with a mock registry (no ontology DB needed).

Refs jkobject/scPRINT#49

jkobject pushed a commit to cantinilab/scPRINT that referenced this pull request on May 20, 2026:

make_adata calls translate() to fill the cosmetic conv_* columns with
human-readable names. translate() can raise on ids it cannot resolve
(e.g. comma-joined CellxGene ethnicity terms a model may predict),
which aborts the entire prediction/embedding run.

Wrap the translate() calls so an unresolved id warns and skips the
conv_* column instead of crashing. The actual predictions (pred_*) are
unaffected.

The root cause is also fixed upstream in scdataloader's translate()
(jkobject/scDataLoader#33); this keeps scPRINT robust regardless of the
installed scdataloader version.

Closes #49
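
The warn-and-skip pattern described in this commit message could look roughly like the sketch below. It is not the scPRINT code: `fill_conv_column` is a hypothetical name, a plain dict of lists stands in for the table that make_adata fills, and the sketch assumes the translate function returns an id-to-name mapping when given a set of ids.

```python
import warnings


def fill_conv_column(columns, key, translate_fn):
    """Add a readable 'conv_<key>' column, or warn and skip it on failure.

    `columns` is a dict of lists standing in for a table (an assumption).
    `translate_fn` is assumed to map a set of ids to {id: name}.
    """
    try:
        mapping = translate_fn(set(columns[key]))
        columns["conv_" + key] = [mapping[value] for value in columns[key]]
    except Exception as err:
        # The conv_* column is cosmetic, so a failure must not abort the run.
        warnings.warn(f"could not translate {key!r} ({err!r}); skipping conv_{key}")
    return columns


def failing_translate(ids):
    # Simulates a translate() that cannot resolve a compound id.
    raise LookupError("no match for 'HANCESTRO:0005,HANCESTRO:0008'")


table = {"pred_ethnicity": ["HANCESTRO:0005", "HANCESTRO:0005,HANCESTRO:0008"]}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fill_conv_column(table, "pred_ethnicity", failing_translate)

print(sorted(table))   # the prediction column is still the only column
print(len(caught))     # exactly one warning was recorded
```

Catching broadly is deliberate in this sketch because the column being filled is optional; a narrower exception type would be preferable if the set of possible errors were known.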
@codecov-commenter

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.91%. Comparing base (c2db375) to head (be3a928).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| scdataloader/utils.py | 75.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   48.09%   48.91%   +0.81%     
==========================================
  Files          10       10              
  Lines        1969     1977       +8     
==========================================
+ Hits          947      967      +20     
+ Misses       1022     1010      -12     
