Skip to content

Support all-nodes matching in LSHMM#2861

Draft
jeromekelleher wants to merge 6 commits into
tskit-dev:mainfrom
jeromekelleher:lshmm-ancestors
Draft

Support all-nodes matching in LSHMM#2861
jeromekelleher wants to merge 6 commits into
tskit-dev:mainfrom
jeromekelleher:lshmm-ancestors

Conversation

@jeromekelleher
Copy link
Copy Markdown
Member

Currently a WIP, but the idea is to support all-nodes matching a-la tsinfer in the LS HMM.

It turns out that this is significantly different to the "samples-only" approach that is currently implemented here, because the interpretation of the likelihoods per-tree node is different. For samples only (and especially the "leaf samples only" version we have currently implemented) the internal node likelihood values are really just for compression, and there's no actual semantics to their values. Then, using standard parsimony algorithms works well because the arbitrary choices made to do compression are perfectly OK.

However, when each node of the tree has a well-defined likelihood that we want to preserve, parsimony is much more difficult.

I think the main difference then is how we compress the likelihoods, and so for now I've just put in a quick hack to do both approaches by having different compression methods for each. We could probably just use the "tsinfer-like" compression approach for samples only, as well, but I wanted to keep the parsimony version around for a while in case it's much more performant.

Other than that, things seem to be working OK, but there are definitely some tricky issues to resolve along the way, which I'll keep plugging at. The plan is to provide an option to the API at the top-level though, as match_all_nodes or match_samples. I think this is flexible enough, given that you can simplify down to specific subsets of nodes if you want.

cc @szhan @astheeggeggs

@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 30, 2023

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.68%. Comparing base (813f4ed) to head (b3366cf).
⚠️ Report is 435 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2861      +/-   ##
==========================================
- Coverage   89.69%   89.68%   -0.02%     
==========================================
  Files          30       30              
  Lines       30159    30159              
  Branches     5860     5856       -4     
==========================================
- Hits        27052    27048       -4     
- Misses       1778     1780       +2     
- Partials     1329     1331       +2     
Flag Coverage Δ
c-tests 86.09% <ø> (ø)
lwt-tests 80.78% <ø> (ø)
python-c-tests 67.89% <ø> (ø)
python-tests 98.90% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Python API 98.95% <ø> (-0.06%) ⬇️
Python C interface 87.86% <ø> (ø)
C library 86.09% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant