Problem
The variant data CSV endpoint (/score-sets/{urn}/variants/data) supports several namespaces (scores, counts, vep, gnomad, clingen) that can be requested individually or in combination. However, namespaces that depend on MappedVariant data (clingen, vep) only return values when include_post_mapped_hgvs=true or gnomad is also requested. Otherwise, all their values resolve to NA.
The root cause is in get_score_set_variants_as_csv() in src/mavedb/lib/score_sets.py. The function has four hardcoded query branches that decide whether to join MappedVariant and/or GnomADVariant:
- gnomad in namespaces and include_post_mapped_hgvs → joins both
- include_post_mapped_hgvs only → joins MappedVariant
- gnomad in namespaces only → joins GnomADVariant (via MappedVariant)
- else → selects only Variant, no joins
The clingen and vep namespaces both read from MappedVariant (clingen_allele_id and vep_functional_consequence respectively), but neither is checked when deciding whether to join MappedVariant. When one or both are requested without gnomad or include_post_mapped_hgvs, the query falls into branch 4 (the else branch), mappings stays None, and all values come back as NA.
The existing test (test_download_clingen_file_in_variant_data_path) masks this by always including include_post_mapped_hgvs=true.
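The branch selection described above can be modeled in a few lines. This is a simplified sketch of the decision only, not the code in score_sets.py; the function name current_branch is hypothetical and no database is involved.

```python
from typing import Iterable


def current_branch(namespaces: Iterable[str], include_post_mapped_hgvs: bool) -> int:
    """Return which of the four described branches (1-4) a request falls into."""
    wants_gnomad = "gnomad" in set(namespaces)
    if wants_gnomad and include_post_mapped_hgvs:
        return 1  # joins MappedVariant and GnomADVariant
    if include_post_mapped_hgvs:
        return 2  # joins MappedVariant
    if wants_gnomad:
        return 3  # joins GnomADVariant (via MappedVariant)
    return 4  # selects only Variant; mappings stays None


# clingen and vep are never consulted, so requesting them alone hits branch 4.
print(current_branch(["clingen"], include_post_mapped_hgvs=False))
```

Note that clingen and vep never influence the result, which is why their values come back as NA unless another input happens to trigger a join.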
Expected behavior
Every namespace should work independently. These should all return populated data:
GET /score-sets/{urn}/variants/data?namespaces=clingen
GET /score-sets/{urn}/variants/data?namespaces=vep
GET /score-sets/{urn}/variants/data?namespaces=clingen&namespaces=vep
GET /score-sets/{urn}/variants/data?namespaces=scores&namespaces=clingen
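For anyone scripting these requests, the repeated namespaces parameter can be built with the standard library. This is a small sketch; variant_data_url is a hypothetical helper, and the URN passed in is whatever score set the caller wants to query.

```python
from typing import Sequence
from urllib.parse import urlencode


def variant_data_url(urn: str, namespaces: Sequence[str]) -> str:
    """Build the request path with one namespaces=... pair per namespace."""
    # doseq=True expands the list into repeated query parameters.
    query = urlencode({"namespaces": list(namespaces)}, doseq=True)
    return f"/score-sets/{urn}/variants/data?{query}"
```

Calling it with ["clingen", "vep"] yields a query string of the form namespaces=clingen&namespaces=vep, matching the third request above.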
Proposed fix
Replace the four hardcoded query branches with a single composable query that determines which joins are needed based on the full set of requested namespaces:
- Needs MappedVariant: clingen in namespaces, vep in namespaces, or include_post_mapped_hgvs is True
- Needs GnomADVariant: gnomad in namespaces
This reduces the branching from four cases to a single query that conditionally adds joins, making it straightforward to add future namespaces (e.g. ClinVar) without further combinatorial explosion.
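A minimal sketch of the join decision, assuming the rules listed above. The names plan_joins, JoinPlan, joined_tables, and MAPPED_VARIANT_NAMESPACES are hypothetical, and the table names are plain strings standing in for the real models. The actual refactor would add the joins to a query rather than return a list.

```python
from typing import Iterable, List, NamedTuple

# Namespaces whose values are read from MappedVariant, per the description above.
MAPPED_VARIANT_NAMESPACES = frozenset({"clingen", "vep"})


class JoinPlan(NamedTuple):
    needs_mapping: bool
    needs_gnomad: bool


def plan_joins(namespaces: Iterable[str], include_post_mapped_hgvs: bool = False) -> JoinPlan:
    """Derive both flags from the full set of requested namespaces."""
    requested = set(namespaces)
    needs_mapping = include_post_mapped_hgvs or bool(requested & MAPPED_VARIANT_NAMESPACES)
    needs_gnomad = "gnomad" in requested
    return JoinPlan(needs_mapping, needs_gnomad)


def joined_tables(plan: JoinPlan) -> List[str]:
    """List the tables a single composable query would touch, in join order."""
    tables = ["Variant"]
    # GnomADVariant is reached via MappedVariant, so either flag requires that join.
    if plan.needs_mapping or plan.needs_gnomad:
        tables.append("MappedVariant")
    if plan.needs_gnomad:
        tables.append("GnomADVariant")
    return tables
```

Under this scheme, supporting a future namespace that reads from MappedVariant would mean adding one entry to the set rather than another query branch.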
Changes needed
- Refactor query logic in get_score_set_variants_as_csv(): compute needs_mapping and needs_gnomad booleans from the inputs, build one query with conditional joins, and extract results into variants, mappings, and gnomad_data lists uniformly.
- Add tests for independent namespace requests: test ?namespaces=clingen and ?namespaces=vep without include_post_mapped_hgvs=true or gnomad, asserting populated (non-NA) values.
- Update existing test: test_download_clingen_file_in_variant_data_path should drop the include_post_mapped_hgvs=true flag to verify standalone behavior.
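The new tests need a way to assert that a column is populated rather than all NA. One possible helper is sketched below; non_na_values is a hypothetical name, and the column header used in the usage note is an assumption, since the real header depends on how the endpoint names namespace columns.

```python
import csv
import io
from typing import List


def non_na_values(csv_text: str, column: str) -> List[str]:
    """Return the values in `column` that are neither empty nor the NA placeholder."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row[column] not in ("", "NA")]
```

A test could then fetch ?namespaces=clingen, pass the response text to this helper with the ClinGen column header, and assert that the returned list is not empty.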
Relevant files
- src/mavedb/lib/score_sets.py: get_score_set_variants_as_csv() query logic and variant_to_csv_row()
- src/mavedb/routers/score_sets.py: /variants/data endpoint
- src/mavedb/models/mapped_variant.py: MappedVariant model
- tests/routers/test_score_set.py: existing ClinGen/gnomAD CSV tests