Skip to content

feat: add generic Manhattan/QQ plotting helpers for association results#1017

Open
Sharon-codes wants to merge 10 commits into
malariagen:masterfrom
Sharon-codes:issue-772-manhattan-qq
Open

feat: add generic Manhattan/QQ plotting helpers for association results#1017
Sharon-codes wants to merge 10 commits into
malariagen:masterfrom
Sharon-codes:issue-772-manhattan-qq

Conversation

@Sharon-codes
Copy link
Copy Markdown
Contributor

Summary

Adds Manhattan and QQ plotting helpers to the API for association-style results, designed to work directly in Python workflows (including GWAS notebooks) without requiring R or Dash-specific tooling.

Design choices (addressing issue discussion)

  • Not tied to GWAS only: the functions operate on generic association result tables containing contig/position/p-values.
  • Python-native API support: no R dependency.
  • No Dash requirement: uses Plotly, already used in this project.
  • Supports multiple data backends:
    • pandas.DataFrame (e.g. PLINK-exported tables loaded in pandas)
    • xarray.Dataset (aligned with sgkit-style workflows)

Changes

  • malariagen_data/anoph/phenotypes.py
    • Added plot_manhattan(...)
    • Added plot_qq(...)
    • Added internal normalization helper to accept both DataFrame and xarray Dataset inputs.
    • Added Manhattan controls for contig_order, contig_spacing, and threshold line.
  • tests/anoph/test_phenotypes_plotting.py
    • Added tests for DataFrame and xarray inputs
    • Added validation test for missing columns

Validation

  • poetry run pytest tests/anoph/test_phenotypes_plotting.py -q (5 passed)
  • poetry run mypy malariagen_data/anoph/phenotypes.py tests/anoph/test_phenotypes_plotting.py --ignore-missing-imports
  • poetry run ruff check malariagen_data/anoph/phenotypes.py tests/anoph/test_phenotypes_plotting.py

Closes #772

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

@jonbrenas Hey please look into this , i hope this addresses your question also which was there in the comment of the issue , if you need me to add anything do let me or else please check and allow it , thanks alot !!!

@jonbrenas
Copy link
Copy Markdown
Collaborator

Thanks @Sharon-codes! I don't think calling the module containing the code for the Manhattan and QQ plots 'phenotypes' makes a lot of sense (but feel free to prove me wrong). I don't think there is (currently) any function that would generate the type of input that these functions need, which reduces their usefulness quite significantly. It would also be great if the tests followed the same general approach that is used for other modules.

- move generic plotting helpers out of anoph/phenotypes.py into anoph/association.py\n- add snp_phenotype_association() and association_results() to generate plot-ready inputs\n- wire new mixin into AnophelesDataResource\n- replace tests/anoph/test_phenotypes_plotting.py with tests/anoph/test_association.py
@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Thanks @jonbrenas — I’ve reworked this based on your comments. (totally agreed on your comment, lol wont be able to prove you wrong)

Changes made:

  • Moved generic Manhattan/QQ plotting code out of anoph/phenotypes.py into a dedicated module: anoph/association.py.
  • Added snp_phenotype_association() / association_results() so there is now a built-in way to generate plot input (contig, position, pvalue) from phenotypes_with_snps().
  • Wired the new association mixin into AnophelesDataResource.
  • Reworked tests and replaced test_phenotypes_plotting.py with tests/anoph/test_association.py.

Please review commit 148c84c.

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

hey @jonbrenas please accept this pr , thanks !!!

@jonbrenas
Copy link
Copy Markdown
Collaborator

Thanks @Sharon-codes. This looks great! Could you create a short notebook (with dummy phenotypic data as we are short on real data) showing how the functions are used, what the plots look like, ...

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

@jonbrenas Heyyyy added , please check !! Thanks for the input and close the PR if happy with the new file !!

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Heyy @jonbrenas If everything is as per your liking , please close this PR thanksss

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Heyyy @jonbrenas added all the changes as you asked , if all good , please accept and close the PR , thanksss !!

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Heyyy @jonbrenas I have created the notebook as requested , please check and if all good , close the PR , thanksss !!

@jonbrenas
Copy link
Copy Markdown
Collaborator

Thanks @Sharon-codes. Would it be possible to create the phenotypic data as part of conftest.py (even if it is for only one of the resources (probably Ag3) and for a single sample set) so that the tests can follow the normal pattern?

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Heyyyy @jonbrenas

Implemented this in commit f72b8c6.

  • Added simulated phenotype data generation in tests/anoph/conftest.py for Ag3 (single sample set: AG1000G-AO).
  • Updated tests/anoph/test_association.py to use the normal fixture-driven API pattern (Ag3 simulator fixture) instead of a standalone dummy API.

This keeps the association tests aligned with the existing anoph test structure.

If this sounds good , please merge and close the PR !!

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

heyyy @jonbrenas i have added the changes you asked for , if everything seems good , please close the PR , thanksss !!

@Sharon-codes
Copy link
Copy Markdown
Contributor Author

Heyy @jonbrenas if all good , please accept it , thanks

self,
data: xr.Dataset,
*,
genotype_col: str = "call_genotype",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few of these should be defined as params in a separate file for consistency, maybe base_params.field is enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding Manhattan and QQ plots in the API

2 participants