Decide on preferred form (and/or formalize current handling) for reference-identical HGVS variants #93

### Background

Recent work in #53 added support for positionless reference-identical variants (`ACC:p.=`, `ACC:c.=`, `ACC:g.=`) to the VRS mapping pipeline. These are represented as VRS `Allele` objects with a `ReferenceLengthExpression` spanning the full sequence length, since `ga4gh/hgvs-tools` cannot translate `.=` expressions.

The pipeline correctly distinguishes the two valid HGVS forms:

| Form | Example | Current routing | VRS output |
|------|---------|-----------------|------------|
| Positionless | `NP_001234.1:p.=` | `endswith(".=")` → RLE path | Full-sequence RLE `Allele` |
| Positional | `NP_001234.1:p.Ala13=` | Normal `translate_hgvs_to_vrs` path | Position-specific LSE `Allele` |

These are genuinely different VRS alleles, even though both express "this variant matches the reference."
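The routing in the table can be sketched roughly as follows. This is a minimal illustration, not the code in `vrs_map.py`: the function name `classify_ref_identical` is hypothetical, and the sketch assumes unparenthesized HGVS strings (it does not handle predicted forms such as `p.(Ala13=)`).

```python
def classify_ref_identical(hgvs: str) -> str:
    """Classify an HGVS string as 'positionless', 'positional', or 'other'.

    Hypothetical helper: mirrors the `endswith(".=")` check described in the
    table, then treats any other string ending in "=" as positional.
    """
    if hgvs.endswith(".="):
        # e.g. NP_001234.1:p.=  -> full-sequence RLE path
        return "positionless"
    if hgvs.endswith("="):
        # e.g. NP_001234.1:p.Ala13=  -> normal translation path
        return "positional"
    return "other"
```

Under this sketch, `NP_001234.1:p.=` and `NP_001234.1:p.Ala13=` land in different branches, which is why they end up as different alleles.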

### The Problem

MaveDB scoreset submitters can use either form, and the choice is often a matter of convention rather than biological intent:

- A scoreset with a single wildtype control row might submit `p.=` (no specific position).
- A saturating mutagenesis experiment might submit one row per tested position — `p.Ala1=`, `p.Gly2=`, etc. — using the positional form.

Both are biologically valid, but they imply different things to a VRS consumer: the positionless form says "the whole sequence is reference-identical," while the positional form says "this one residue specifically is reference-identical." A consumer performing variant deduplication or lookup will not recognize these as the same variant.

Furthermore, when the two forms are used to describe the same reference-identical state, their VRS digests differ, which undercuts the expectation in the specification that equivalent variants share a single computed identifier.

We should decide whether to have a preference and normalize one form to the other, or to accept both and document the distinction clearly.

### Options

**Option A — Prefer positionless (`p.=`), normalize positional ref-identical to RLE**
- Positional `p.Ala13=` rows are normalized to positionless RLE with a warning logged.
- Consumers always get a single canonical "reference" allele per transcript.
- Downside: lossy — discards position information that the submitter explicitly provided.
  A scoreset with 200 `p.AlaN=` rows collapses to one repeated allele.
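A rough sketch of what Option A's normalization could look like for protein variants. The function name `normalize_to_positionless` and the regular expression are assumptions made for illustration; they are not part of the existing pipeline, and the pattern only covers single-residue forms such as `p.Ala13=`.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed pattern: accession, then a single three-letter residue, a position, and "=".
_POSITIONAL_P_EQ = re.compile(r"^(?P<acc>[^:]+):p\.[A-Z][a-z]{2}\d+=$")


def normalize_to_positionless(hgvs: str) -> str:
    """Rewrite `ACC:p.Xaa123=` as `ACC:p.=`; return any other input unchanged."""
    match = _POSITIONAL_P_EQ.match(hgvs)
    if match is None:
        return hgvs
    normalized = f"{match.group('acc')}:p.="
    # Option A is lossy, so record what was discarded.
    logger.warning("Normalizing %s to %s; position information is discarded", hgvs, normalized)
    return normalized
```

Applied to rows like `NP_001234.1:p.Ala1=` and `NP_001234.1:p.Gly2=`, this yields the same string for each, which is the collapse described in the downside above.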

**Option B — Prefer positional, reject or warn on positionless**
- `p.=` is flagged as ambiguous (no position anchor) and mapped with an `error_message`,
  or normalized to a per-codon set of ref-identical alleles using the transcript sequence.
- Preserves per-position specificity throughout the pipeline.
- Downside: generating per-codon alleles for a positionless `p.=` is expensive and
  may not match submitter intent; rejecting it drops a valid representation.
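For a sense of the cost of the per-position expansion in Option B, here is a minimal sketch that expands a positionless `p.=` into one positional string per residue. The function name `expand_positionless` is hypothetical, and the sketch assumes the protein sequence is already available as a one-letter string; fetching it from the transcript is out of scope here.

```python
from typing import List

# Standard one-letter to three-letter amino acid codes, plus "*" for a stop.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def expand_positionless(accession: str, protein_seq: str) -> List[str]:
    """Return one `ACC:p.Xaa<pos>=` string per residue, using 1-based positions.

    Raises KeyError if the sequence contains a character not in AA3.
    """
    return [
        f"{accession}:p.{AA3[aa]}{pos}="
        for pos, aa in enumerate(protein_seq, start=1)
    ]
```

The output grows linearly with sequence length, and each string would still need to go through the normal translation path, so a single `p.=` row turns into as many alleles as the protein has residues.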

**Option C — Accept both as-is, document the distinction**
- No normalization. Both forms pass through to distinct VRS alleles as they do today.
- Consumers must handle both RLE and LSE ref-identical alleles.
- Downside: inconsistent output across scoresets that express the same thing differently.

### Suggested Starting Point

Option C reflects current behavior, is the lowest-risk path, and avoids altering submitter intent. At minimum, the current behavior should be documented clearly.

Any decision towards options A or B will inform whether we add normalization logic to `_create_post_mapped_hgvs_strings` / `_construct_vrs_allele` and whether we update the MaveDB submission guidelines to recommend one form.

### Affected Code

- `vrs_map.py`: `_construct_vrs_allele` (`.endswith(".=")` RLE branch), `_create_post_mapped_hgvs_strings` (short-circuit block), `_hgvs_variant_is_valid`
- `lookup.py`: `translate_ref_identical_to_vrs`
- `annotate.py`: `_get_hgvs_string`, `_get_vrs_ref_allele_seq`, `_annotate_allele_mapping`, `_annotate_haplotype_mapping`
