
fix: resolve KeyError and fragile array slicing in haplotypes_frequencies_advanced() #982

Open
blankirigaya wants to merge 3 commits into
malariagen:master from
blankirigaya:fixed-haq_frq


@blankirigaya
Contributor

Summary

Fixes KeyError: 'label' that makes haplotypes_frequencies_advanced()
completely unusable, and replaces fragile positional .to_numpy() slicing
with explicit named column selection.

Closes #<ISSUE_NUMBER>

Changes

In malariagen_data/anoph/hap_frq.py:

  1. Remove the set_index("label") call. With drop=True it moved the
    label column into the index before the column was read, so the
    later df_haps_sorted["label"] lookup failed. The call is not needed
    at all, because the return value is ds_out, not df_haps_sorted.
  2. Replace df.to_numpy()[:, :n] positional slices with
    df[freq_cols].to_numpy() etc., using explicit column name lists
    built from the frq_, count_ and nobs_ prefixes.

# Before
df_haps_sorted["label"] = [...]
df_haps_sorted.set_index(keys="label", drop=True, inplace=True)
ds_out["variant_label"] = "variants", df_haps_sorted["label"]   # KeyError
ds_out["event_frequency"] = ..., df_haps_sorted.to_numpy()[:, :n]
ds_out["event_count"]     = ..., df_haps_sorted.to_numpy()[:, n:2*n]
ds_out["event_nobs"]      = ..., df_haps_sorted.to_numpy()[:, 2*n:-2]

# After
labels = ["H" + str(i) for i in range(len(df_haps_sorted))]
df_haps_sorted["label"] = labels
freq_cols  = [c for c in df_haps_sorted.columns if c.startswith("frq_")]
count_cols = [c for c in df_haps_sorted.columns if c.startswith("count_")]
nobs_cols  = [c for c in df_haps_sorted.columns if c.startswith("nobs_")]
ds_out["variant_label"]   = "variants", df_haps_sorted["label"].values
ds_out["event_frequency"] = ("variants", "cohorts"), df_haps_sorted[freq_cols].to_numpy()
ds_out["event_count"]     = ("variants", "cohorts"), df_haps_sorted[count_cols].to_numpy()
ds_out["event_nobs"]      = ("variants", "cohorts"), df_haps_sorted[nobs_cols].to_numpy()
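
The pandas behaviour behind both changes can be checked in isolation. The sketch below is not the code from hap_frq.py: the frame `df`, its two cohorts "a" and "b", the `max_af` column, and the helper `select` are hypothetical stand-ins chosen only to mirror the column naming in the snippet above. It assumes the "Before" snippet reflects the current code path, and shows (1) that set_index(..., drop=True) removes the column so a later lookup by name raises KeyError, and (2) that positional slicing depends on column order while prefix-based selection does not.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_haps_sorted: 2 cohorts ("a", "b"), 3 haplotypes.
df = pd.DataFrame(
    {
        "frq_a": [0.5, 0.3, 0.2],
        "frq_b": [0.6, 0.4, 0.0],
        "count_a": [5, 3, 2],
        "count_b": [6, 4, 0],
        "nobs_a": [10, 10, 10],
        "nobs_b": [10, 10, 10],
        "max_af": [0.6, 0.4, 0.2],
    }
)
df["label"] = ["H" + str(i) for i in range(len(df))]

# 1. set_index(..., drop=True) moves "label" from the columns into the index,
#    so reading it back as a column raises KeyError.
indexed = df.set_index(keys="label", drop=True)
try:
    indexed["label"]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True


def select(frame, prefix):
    """Return the columns whose names start with `prefix` as a NumPy array."""
    cols = [c for c in frame.columns if c.startswith(prefix)]
    return frame[cols].to_numpy()


# 2. With the expected column order, positional and named selection agree.
n = 2  # number of cohorts
positional_freq = df.to_numpy()[:, :n]
named_freq = select(df, "frq_")

# If the column order changes (here "label" and "max_af" are moved to the
# front), the positional slice silently picks up the wrong columns, while
# the named selection is unaffected.
front = ["label", "max_af"]
shuffled = df[front + [c for c in df.columns if c not in front]]
shuffled_positional = shuffled.to_numpy()[:, :n]
shuffled_named = select(shuffled, "frq_")
```

One side effect worth noting: selecting by name converts only the chosen columns, so the counts keep an integer dtype, whereas calling to_numpy() on the whole frame first coerces every column to a single common dtype.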

Files Changed

  • malariagen_data/anoph/hap_frq.py

@jonbrenas
Collaborator

Thanks @blankirigaya. Your comment makes it sound like the function doesn't work, which is empirically untrue. I am not saying that you are wrong, just that the changes you propose seem more cosmetic than anything else. Can you explain your choices in more detail?

@codecov

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.02%. Comparing base (06563cf) to head (e5e5c42).
⚠️ Report is 88 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #982      +/-   ##
==========================================
+ Coverage   88.88%   89.02%   +0.14%     
==========================================
  Files          56       57       +1     
  Lines        6439     6508      +69     
==========================================
+ Hits         5723     5794      +71     
+ Misses        716      714       -2     
