Consistent Supersession Lineage Mode Across Data Discovery Endpoints #705

@bencap

Problem

Supersession filtering is currently handled inconsistently across our data discovery endpoints:

| Endpoint | Current Behavior |
| --- | --- |
| POST /score-sets/search | Auto-excludes score sets with published superseding versions via build_search_score_sets_query_filter() |
| POST /me/score-sets/search | Same query-level filter as public search |
| GET /score-sets/recently-published | Blanks superseding_score_set based on permission, but no chain filtering |
| GET /experiments/{urn}/score-sets | Filters via ~ScoreSet.superseding_score_set.has() + find_superseded_score_set_tail() dedup |
| POST /target-genes/search, GET /target-genes | Uses find_superseded_score_set_tail() to keep only chain heads |
| POST /me/target-genes/search | Simple filter: superseding_score_set is None |
| Variant endpoints (/variants/*) | No supersession filtering at all; returns data from superseded score sets |

There are at least three different filtering strategies in use (query-level join filter, tail-walk dedup, simple is None check), and some endpoints expose superseded data with no opt-out. API consumers have no way to control this behavior.

Proposal

Introduce a lineage mode query parameter across all data discovery endpoints that controls how superseded score set chains are represented in results.

Query Parameter

| Parameter | Values | Default |
| --- | --- | --- |
| lineage | collapsed, expanded | collapsed |

Behavior

  • collapsed (default): Return only data associated with the head (the latest accessible version) of each supersession chain. This is the "clean" view: consumers see one representative per lineage.
  • expanded: Return all data regardless of supersession status. Every score set and its associated variants/targets are included, even if the score set has been superseded by a published replacement.
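The two modes can be sketched on in-memory data as follows. This is a minimal illustration, not the proposed implementation: LineageMode, ScoreSetStub, apply_lineage, and the example URNs are hypothetical names, and the sketch ignores permissions and publication state, treating any score set with no superseding version as a chain head.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LineageMode(str, Enum):
    """Hypothetical enum for the proposed `lineage` query parameter."""

    COLLAPSED = "collapsed"
    EXPANDED = "expanded"


@dataclass(frozen=True)
class ScoreSetStub:
    """Illustrative stand-in for a score set row; not the real ORM model."""

    urn: str
    # URN of the score set that supersedes this one, if any.
    superseding_urn: Optional[str] = None


def apply_lineage(score_sets: list[ScoreSetStub], mode: LineageMode) -> list[ScoreSetStub]:
    """Return every record in expanded mode, only chain heads in collapsed mode."""
    if mode is LineageMode.EXPANDED:
        return list(score_sets)
    # Simplifying assumption: a head is any score set that nothing supersedes.
    return [s for s in score_sets if s.superseding_urn is None]


# A three-version chain plus one score set that was never superseded.
chain = [
    ScoreSetStub("urn:example:v1", superseding_urn="urn:example:v2"),
    ScoreSetStub("urn:example:v2", superseding_urn="urn:example:v3"),
    ScoreSetStub("urn:example:v3"),
    ScoreSetStub("urn:example:standalone"),
]

collapsed_urns = [s.urn for s in apply_lineage(chain, LineageMode.COLLAPSED)]
expanded_urns = [s.urn for s in apply_lineage(chain, LineageMode.EXPANDED)]
```

Mixing in str lets the enum members compare equal to the raw parameter values ("collapsed", "expanded"), which should make it straightforward to parse the parameter from a query string or request body.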

Affected Endpoints

All data discovery / listing / search endpoints should respect this parameter:

  • POST /score-sets/search
  • POST /me/score-sets/search
  • POST /score-sets/search/filter-options
  • GET /score-sets/recently-published
  • GET /experiments/{urn}/score-sets
  • POST /target-genes/search
  • GET /target-genes
  • POST /me/target-genes/search
  • Variant search/list endpoints that return data across score sets

Single-resource fetch endpoints (GET /score-sets/{urn}, GET /target-genes/{id}, etc.) are not affected — they should continue to return the requested resource regardless of supersession status, with the superseding_score_set / superseded_score_set fields populated as they are today.

Implementation Considerations

  1. Centralize the filtering logic. Today we have build_search_score_sets_query_filter(), find_superseded_score_set_tail(), find_publish_or_private_superseded_score_set_tail(), and fetch_superseding_score_set_in_search_result() in src/mavedb/lib/score_sets.py, plus inline filters in routers. The new parameter should flow through a single, shared mechanism.

  2. Public endpoints should resolve chain heads deterministically. Today the "head" of a chain is resolved per-user based on permissions — a contributor with an unpublished superseding draft sees it as the head, while an anonymous user sees the latest published version. This means two users hitting the same public search endpoint can see different results. Public discovery endpoints should always resolve the chain head as the latest published version, regardless of who is making the request. Authenticated /me/ endpoints continue to show the user's full picture including drafts. This simplifies the query-level vs. post-query tradeoff: public endpoints can do pure query-level filtering (no per-user permission checks needed), while /me/ endpoints handle permission-aware post-query resolution.

  3. Search request models. ScoreSetsSearch and other Pydantic search models will need a lineage field. Consider an enum (LineageMode) in a shared location.

  4. Variant endpoints. Currently, variant search has zero supersession awareness. In collapsed mode, variants belonging to superseded score sets should be excluded (or mapped to their head score set, depending on desired semantics). This is the biggest behavioral change.

  5. Backwards compatibility. Since collapsed is the proposed default and most endpoints already exhibit collapsed-like behavior, the default experience should remain largely unchanged. The expanded mode is the new capability.
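Considerations 2 and 4 can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: ScoreSetNode, its fields, resolve_head, public_head, user_head, and in_collapsed_public_view are hypothetical names, and ownership stands in for the real permission checks. The sketch contrasts a deterministic public head (latest published version) with a permission-aware head for /me/ endpoints, and shows the "exclude" option for variants in collapsed mode.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScoreSetNode:
    """Illustrative stand-in for a score set; field names are assumptions."""

    urn: str
    published: bool
    owner: Optional[str] = None
    # The newer version that supersedes this one, if any.
    superseding: Optional["ScoreSetNode"] = None


def resolve_head(
    start: ScoreSetNode, is_visible: Callable[[ScoreSetNode], bool]
) -> Optional[ScoreSetNode]:
    """Walk toward newer versions and return the newest visible node, or None."""
    head = None
    node: Optional[ScoreSetNode] = start
    seen: set[str] = set()
    # The `seen` set guards against accidental cycles in the chain.
    while node is not None and node.urn not in seen:
        seen.add(node.urn)
        if is_visible(node):
            head = node
        node = node.superseding
    return head


def public_head(start: ScoreSetNode) -> Optional[ScoreSetNode]:
    """Head for public endpoints: latest published version, same for every caller."""
    return resolve_head(start, lambda n: n.published)


def user_head(start: ScoreSetNode, username: str) -> Optional[ScoreSetNode]:
    """Head for /me/ endpoints: also counts the caller's own unpublished drafts."""
    return resolve_head(start, lambda n: n.published or n.owner == username)


def in_collapsed_public_view(score_set: ScoreSetNode) -> bool:
    """Variants would be kept only if their score set is its own public head."""
    return public_head(score_set) is score_set


# v1 (published) -> v2 (published) -> v3 (unpublished draft owned by alice)
v3 = ScoreSetNode("urn:example:v3", published=False, owner="alice")
v2 = ScoreSetNode("urn:example:v2", published=True, superseding=v3)
v1 = ScoreSetNode("urn:example:v1", published=True, superseding=v2)
```

With this chain, public_head(v1) resolves to v2 for any caller, while user_head(v1, "alice") resolves to the v3 draft. In the actual codebase the public case would presumably be expressed as a query-level filter rather than a Python walk; the walk is used here only to keep the example self-contained.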

Current Code References

  • Supersession model fields: src/mavedb/models/score_set.py (superseded_score_set_id, superseded_score_set, superseding_score_set)
  • Core filtering utilities: src/mavedb/lib/score_sets.py (build_search_score_sets_query_filter(), find_superseded_score_set_tail(), fetch_superseding_score_set_in_search_result())
  • Score set search router: src/mavedb/routers/score_sets.py
  • Experiment score sets: src/mavedb/routers/experiments.py
  • Target gene search: src/mavedb/routers/target_genes.py
  • View models: src/mavedb/view_models/score_set.py

Labels: "app: backend" (Task implementation touches the backend), "type: enhancement" (Enhancement to an existing feature)
