~5x speed-up and consolidated feature generators by ArthurDeclercq · Pull Request #245 · CompOmics/ms2rescore

ArthurDeclercq · 2026-01-09T15:39:10Z

Breaking changes

Drop support for Python 3.10 (now requires >=3.11)
Remove ionmob feature generator (replaced by IM2Deep)
Remove maxquant feature generator (replaced by new ms2 generator)
Remove lower_score_is_better parameter from DeepLC feature generator
Rename ms2pip API call from process_MS2_spectra to correlate_preloaded
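For users carrying an older configuration forward, the removals above could be handled with a small migration helper. The following is only a sketch: it assumes feature generators are configured as a name-to-options mapping under `config["ms2rescore"]["feature_generators"]`, and `migrate_feature_generators` is a hypothetical name, not part of ms2rescore.

```python
from copy import deepcopy

# Removed generator -> replacement, taken from the breaking changes listed above.
REPLACEMENTS = {"maxquant": "ms2", "ionmob": "im2deep"}


def migrate_feature_generators(config: dict) -> dict:
    """Return a migrated copy of `config` (hypothetical helper).

    Assumes generators are stored as a name -> options mapping under
    config["ms2rescore"]["feature_generators"].
    """
    migrated = deepcopy(config)  # leave the caller's config untouched
    generators = migrated["ms2rescore"]["feature_generators"]
    for old, new in REPLACEMENTS.items():
        if old in generators:
            # Old options are dropped; they may not apply to the replacement.
            generators.pop(old)
            generators.setdefault(new, {})
    # The DeepLC generator no longer accepts lower_score_is_better.
    generators.get("deeplc", {}).pop("lower_score_is_better", None)
    return migrated
```

Options of a removed generator are deliberately discarded rather than copied, since nothing in this PR suggests the replacement generators accept the same parameters.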

New features

Crash recovery: Automatically writes an intermediate .intermediate.psms.tsv file when an error or KeyboardInterrupt occurs during feature generation or rescoring. A later run can resume from this file, skipping feature generators that already completed.
New ms2 feature generator: Replaces MaxQuant-derived features with general MS2 spectrum-level features, backed by ms2rescore-rs (Rust).
Mumble integration: Conditional import and PSM filtering for mumble-based workflows (filter_mumble_psms with matched ion percentage threshold).
DeepLC transfer learning: New _best_run_by_shared_proteoforms method selects the optimal run for fine-tuning based on proteoform overlap across runs.
Report overhaul: New charts.py module with expanded plotting capabilities including DeepLC-specific plots. Major expansion of generate.py and utils.py.
Basic features expanded: Added theoretical_mass, experimental_mass, mass_error, and pep_len features. Charge one-hot encoding now uses fixed range 1–6.
Profiling improvements: Profile output filenames now include timestamps to avoid overwrites.
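To make the expanded basic features concrete, here is a minimal sketch of how they might be computed. Several details are assumptions rather than the PR's implementation: the `charge_<z>` column names, the mass error being the plain difference in Da, charges outside 1–6 yielding an all-zero encoding, and the function name `basic_features`.

```python
CHARGE_RANGE = range(1, 7)  # fixed one-hot range 1-6, as stated above
PROTON_MASS = 1.007276466  # Da

# Names are defined statically, independent of the data (column names assumed).
FEATURE_NAMES = ["theoretical_mass", "experimental_mass", "mass_error", "pep_len"] + [
    f"charge_{z}" for z in CHARGE_RANGE
]


def basic_features(peptide, charge, precursor_mz, theoretical_mass):
    """Return all basic features, zero-filled where data is unavailable."""
    features = dict.fromkeys(FEATURE_NAMES, 0.0)
    features["pep_len"] = float(len(peptide))
    if theoretical_mass is not None:
        features["theoretical_mass"] = float(theoretical_mass)
    if charge in CHARGE_RANGE:
        # Assumption: charges outside 1-6 leave the encoding all zeros.
        features[f"charge_{charge}"] = 1.0
    if charge and precursor_mz:
        # Neutral mass from precursor m/z and charge.
        experimental = (precursor_mz - PROTON_MASS) * charge
        features["experimental_mass"] = experimental
        if theoretical_mass is not None:
            # Assumption: error expressed in Da, not ppm.
            features["mass_error"] = experimental - theoretical_mass
    return features
```

Because the feature names are fixed up front, every PSM produces the same set of columns regardless of which charges or precursor values occur in a given run.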

Refactoring

DeepLC: Rewritten to use new deeplc.core.predict/finetune API with SplineTransformerCalibration per run. Replaces the old DeepLC class-based approach.
IM2Deep: Rewritten to use im2deep.core.predict and LinearCCSCalibration APIs with per-run calibration against a default reference dataset. Adds multi and calibration_set_size parameters.
MS2PIP: Heavily simplified; feature calculation delegated to ms2rescore-rs.
Spectrum parsing: Refactored to parse spectra once and store spectrum objects directly, avoiding redundant re-acquisition.
Basic feature generator: Feature names statically defined instead of dynamically built during add_features.
Core pipeline: Handles overlapping features between psm_file and feature generators. Explicit list() conversion on scores to fix mokapot numba TypingError.
Thread kwargs: Renamed to num_threads for consistency across DeepLC and IM2Deep.
Report: Fixed importlib.resources usage with proper as_file() context manager. Removed unused imports.
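The `importlib.resources` fix mentioned for the report module follows a general standard-library pattern, sketched below. The helper name `read_packaged_text` is hypothetical, and the usage example reads from the stdlib `json` package purely so that the snippet is self-contained; it does not reflect which resources ms2rescore loads.

```python
from importlib.resources import as_file, files


def read_packaged_text(package: str, resource: str) -> str:
    """Read a text resource bundled with an installed package."""
    traversable = files(package).joinpath(resource)
    # as_file() yields a real filesystem path, extracting the resource to a
    # temporary file if the package is not stored as plain files (e.g. a zip).
    # The path is only guaranteed to exist inside the with-block.
    with as_file(traversable) as path:
        return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Illustrative only: read a file from a standard-library package.
    print(len(read_packaged_text("json", "__init__.py")))
```

Using the path outside the context manager is the usual source of bugs with this API, because a temporary extraction may already have been cleaned up.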

Dependencies

Version bump to 3.3.0-alpha.1
deeplc>=4.0.0a2 (was >=3.1)
im2deep>=2.0.0a2 (was >=1.1)
ms2pip>=4.2.0a0 (was >=4.0.0)
ms2rescore_rs>=0.5.0a0 (was >=0.4)
numpy>=1.25,<3.0 (was <2.0) — numpy 2 support
Added [tool.uv] prerelease = "allow"
Removed ionmob dependency (and its TensorFlow requirement)
deeplc_retrain enabled by default in config
Default config: maxquant replaced with ms2

Docs & CI

Updated input/output file documentation to describe intermediate crash-recovery files
ReadTheDocs build OS updated from ubuntu-22.04 to ubuntu-lts-latest
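The crash-recovery behaviour behind the intermediate files can be sketched as follows. This is not the PR's code: it assumes PSMs are plain dicts, generators are callables that add features in place, and completed generators are tracked in a set passed by the caller. `run_feature_generators` and `write_intermediate` are hypothetical names.

```python
import csv


def write_intermediate(psms, path):
    """Write the current PSM state to a tab-separated file."""
    fieldnames = sorted({key for psm in psms for key in psm})
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", restval=""
        )
        writer.writeheader()
        writer.writerows(psms)


def run_feature_generators(psms, generators, done, intermediate_path):
    """Run generators in order, skipping those already in `done`.

    On any error or KeyboardInterrupt, the current state is written to
    `intermediate_path` before the exception is re-raised.
    """
    try:
        for name, generate in generators.items():
            if name in done:
                continue  # resume: this generator already completed
            generate(psms)
            done.add(name)
    except (Exception, KeyboardInterrupt):
        write_intermediate(psms, intermediate_path)
        raise  # bare raise preserves the original traceback
    return psms
```

A resumed run would rebuild `psms` and `done` from the intermediate file before calling `run_feature_generators` again. How the real pipeline detects completed generators is not specified in this PR description.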

- Bump version to 3.3.0-alpha.1
- Update dependency pins: deeplc>=4.0.0a2, im2deep>=2.0.0a2, ms2pip>=4.2.0a0, ms2rescore_rs>=0.5.0a0
- Allow numpy <3.0 (numpy 2 support)
- Enable uv prerelease resolution
- Rename ms2pip API call to correlate_preloaded
- Rename DeepLC/IM2Deep thread kwargs to num_threads
- Replace maxquant with ms2 in default configs
- Enable deeplc_retrain by default
- Fix importlib.resources usage in report charts
- Remove unused imports in report and GUI
- Update ReadTheDocs to ubuntu-lts-latest

- Remove second psm_id_pattern block in parse_psms that re-applied the pattern to already-transformed IDs, causing match failures
- Fix inverted condition for precursor m/z in parse_spectra (was marking mz as found when all values are zero)
- Always treat ms2_spectra as missing until spectrum files are parsed
- Update tests to mock get_ms2_spectra instead of removed get_precursor_info
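The inverted precursor m/z condition can be illustrated with a tiny check. This sketch assumes that a value of zero or None denotes a missing precursor m/z; `precursor_mz_found` is a hypothetical name, not the function used in parse_spectra.

```python
def precursor_mz_found(mz_values):
    """Return True only if at least one precursor m/z is present and nonzero."""
    # The bug described above amounted to the opposite outcome:
    # reporting m/z as found when every value was zero.
    return any(mz is not None and mz != 0 for mz in mz_values)
```

With this check, PSMs whose precursor m/z values are all zero are treated as still needing values from the spectrum files.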

Fix typos (intermidiate, Mubmle), incorrect docstrings and type annotations, improve error handling (bare raise for traceback preservation, descriptive SpectrumParsingError on missing spectrum IDs), and ensure BasicFeatureGenerator always emits all features with zero-fill when data is unavailable.

- Fix percolator_kwargs shadowing: parameter was overwritten by local dict, then updated with itself (no-op), silently ignoring user kwargs
- Fix list/numpy array type mismatch in _log_id_psms_before causing TypeError on bitwise & operation
- Fix unreachable error handlers in percolator subprocess handling; add check=True and use CalledProcessError properly
- Fix hardcoded FDR threshold (0.01) ignoring the fdr parameter
- Fix None values in mumble original_hit array causing inconsistent numpy boolean masking; default to True with explicit dtype=bool
- Avoid mutating caller's config dict by tracking skipped feature generators in a separate set
- Add empty PSM list guard in filter_mumble_psms
- Extract duplicated charge-stripping regex to shared CHARGE_PATTERN constant
- Fix typo "ckeck" -> "Check"
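The percolator_kwargs shadowing bug and the hardcoded FDR threshold are both easy to reproduce in miniature. The sketch below shows one way to merge defaults with user-supplied options so that neither problem occurs. `build_percolator_kwargs` and the option names `trainFDR` and `testFDR` are illustrative assumptions, not necessarily what ms2rescore uses.

```python
from typing import Optional


def build_percolator_kwargs(fdr: float, percolator_kwargs: Optional[dict] = None) -> dict:
    """Merge default options with user options; user options win."""
    # Buggy pattern, for contrast:
    #     percolator_kwargs = {"trainFDR": 0.01}       # overwrites the parameter
    #     percolator_kwargs.update(percolator_kwargs)  # updates with itself: no-op
    # Defaults use the fdr argument instead of a hardcoded 0.01.
    defaults = {"trainFDR": fdr, "testFDR": fdr}
    # Build a new dict so the caller's mapping is never mutated.
    return {**defaults, **(percolator_kwargs or {})}
```

Keeping the defaults under a distinct local name is what prevents the parameter from being shadowed. The merge order then makes user-supplied values take precedence.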

ArthurDeclercq and others added 30 commits February 24, 2024 15:48

initial commit

fdceeba

finalize ms2 feature generation

5374ed8

add rustyms

60207a3

remove exit statement fixed IM required value

ae39844

change logger.info to debug

9b98c4d

added profile decorator to get timings for functions

5e45756

removed profile as standard rescore debug statement

304777c

added new basic features

95ee475

fixes for ms2 feature generator, removed multiprocessing

73f4573

return empty list on parsing error with rustyms, removed multiprocessing

947233e

add deeplc_calibration psm set

24ce565

Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

114b006

remove unused import

33c38b0

Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

40425c7

Merge branch 'timsRescore' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

b810b8c

Merge tag 'main' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

69b5d1a

Merge pull request #177 from compomics/main

6e2d102

pull main in spectrum-feature-generator

integrate mumble into ms2branch

11fdc51

Merge remote-tracking branch 'origin/main' into spectrum-feature-generator

3140c44

temp removal of sage features before rescoring

883169a

Merge branch 'main' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

97865e7

remove psm_file features when rescoring with mumble

da39ae8

linting

37fff28

add hyperscore calculation

e8b59f3

calibration fixes

c51cd34

changes for mumble implementation

295e37f

change openms peptide formatting

909860d

add mumble psm filtering functionality

c5902c2

Merge branch 'spectrum-feature-generator' of https://github.com/compomics/ms2rescore into spectrum-feature-generator

6eaceb2

remove pyopenms dependency for hyperscore calculation

5ce55f5

ArthurDeclercq added 14 commits January 14, 2026 15:43

minor changes

a3875da

conditional import of mumble

8e793a0

add tracking to spectrum file reading

07b96e7

change rust function names

496c0b8

minor changes to logging and other bugfixes

95e149e

add deeplc plot to plotting module

6f49935

making report generation funcitonal again

d66426e

change fg colors

c132684

remove ionmob from ms2rescore

e782616

update required python version

f011705

remove ionmob from gui

623b95f

update numpy versioning

704c22d

updata colors of report

4cce1f6

updated documentation on intermediate files

13a72b8

paretje reviewed Feb 27, 2026

ms2rescore/feature_generators/deeplc.py (resolved)

paretje reviewed Feb 27, 2026

ms2rescore/feature_generators/deeplc.py (outdated, resolved)

RalfG added 2 commits March 18, 2026 16:41

Address review comments

603ee50

Merge branch 'main' of https://github.com/CompOmics/ms2rescore into refactoring

8498de9

RalfG changed the base branch from main to release/3.3 March 31, 2026 20:50

RalfG mentioned this pull request Apr 8, 2026

Spectrum feature generator #178

Closed

RalfG added 6 commits April 8, 2026 17:53

fix: Remove redundant rustyms dependency and skip mumble install for Python 3.14

dbb6dad

Fix missing import on type annotation

41a1906

RalfG changed the title from "Refactoring" to "~5x speed-up and consolidated feature generators" Apr 8, 2026

RalfG merged commit c9f0483 into release/3.3 Apr 8, 2026
6 checks passed

RalfG deleted the refactoring branch April 8, 2026 22:15

RalfG mentioned this pull request Apr 8, 2026

WIP: Move inferring of spectrum path to parsing psm list? #99

Closed

RalfG merged 85 commits into release/3.3 from refactoring

ArthurDeclercq commented Jan 9, 2026 (edited by RalfG)

4 participants
