
~5x speed-up and consolidated feature generators #245

Merged: RalfG merged 85 commits into release/3.3 from refactoring on Apr 8, 2026


ArthurDeclercq commented on Jan 9, 2026

Breaking changes

  • Drop support for Python 3.10 (now requires >=3.11)
  • Remove ionmob feature generator (replaced by IM2Deep)
  • Remove maxquant feature generator (replaced by new ms2 generator)
  • Remove lower_score_is_better parameter from DeepLC feature generator
  • Rename ms2pip API call from process_MS2_spectra to correlate_preloaded

New features

  • Crash recovery: Automatically writes an intermediate .intermediate.psms.tsv file when an error or KeyboardInterrupt occurs during feature generation or rescoring. Supports resuming from intermediate files by skipping feature generators that already completed.
  • New ms2 feature generator: Replaces MaxQuant-derived features with general MS2 spectrum-level features, backed by ms2rescore-rs (Rust).
  • Mumble integration: Conditional import of mumble and PSM filtering for mumble-based workflows (filter_mumble_psms, which filters on a matched-ion percentage threshold).
  • DeepLC transfer learning: New _best_run_by_shared_proteoforms method selects the optimal run for fine-tuning based on proteoform overlap across runs.
  • Report overhaul: New charts.py module with expanded plotting capabilities including DeepLC-specific plots. Major expansion of generate.py and utils.py.
  • Basic features expanded: Added theoretical_mass, experimental_mass, mass_error, and pep_len features. Charge one-hot encoding now uses fixed range 1–6.
  • Profiling improvements: Profile output filenames now include timestamps to avoid overwrites.
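To illustrate the expanded basic features, here is a minimal sketch of a fixed-range charge one-hot encoding plus experimental_mass and mass_error. It is not the BasicFeatureGenerator code. Several details are assumptions: the charge_N column names, the all-zero vector for charges outside 1–6, experimental_mass being the neutral mass derived from precursor m/z, and mass_error being a signed difference in Da rather than ppm.

```python
from typing import Dict

# Proton mass (Da), used to convert precursor m/z into a neutral mass
PROTON_MASS = 1.007276466812
# Fixed charge range 1-6, so every dataset yields the same one-hot columns
CHARGE_RANGE = range(1, 7)


def charge_one_hot(charge: int) -> Dict[str, int]:
    """One-hot encode a precursor charge over the fixed range 1-6.

    Hypothetical column names; charges outside the range give all zeros (assumption).
    """
    return {f"charge_{z}": int(charge == z) for z in CHARGE_RANGE}


def experimental_mass(precursor_mz: float, charge: int) -> float:
    """Neutral precursor mass from observed m/z and charge (assumed definition)."""
    return (precursor_mz - PROTON_MASS) * charge


def mass_error(experimental: float, theoretical: float) -> float:
    """Signed mass error in Da (hypothetical definition; could also be ppm)."""
    return experimental - theoretical
```

With a fixed range, charge_one_hot(3) always returns the same six keys (charge_1 through charge_6) with only charge_3 set to 1. A model trained on one dataset therefore sees identical columns on another, even if some charge states never occur.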

Refactoring

  • DeepLC: Rewritten to use the new deeplc.core.predict/finetune API with a per-run SplineTransformerCalibration. Replaces the old DeepLC class-based approach.
  • IM2Deep: Rewritten to use im2deep.core.predict and LinearCCSCalibration APIs with per-run calibration against a default reference dataset. Adds multi and calibration_set_size parameters.
  • MS2PIP: Heavily simplified; feature calculation delegated to ms2rescore-rs.
  • Spectrum parsing: Refactored to parse spectra once and store spectrum objects directly, avoiding redundant re-acquisition.
  • Basic feature generator: Feature names statically defined instead of dynamically built during add_features.
  • Core pipeline: Handles features that overlap between the psm_file and the feature generators. Scores are explicitly converted with list() to fix a numba TypingError in mokapot.
  • Thread kwargs: Renamed to num_threads for consistency across DeepLC and IM2Deep.
  • Report: Fixed importlib.resources usage with proper as_file() context manager. Removed unused imports.
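The report fix above concerns the importlib.resources pattern in which a packaged file is only guaranteed to exist on disk inside an as_file() context. The following minimal sketch shows that pattern with the standard library. The report's actual resource names are not given here, so the stdlib json package serves as a stand-in, and read_packaged_resource is a hypothetical helper.

```python
from importlib.resources import as_file, files
from pathlib import Path


def read_packaged_resource(package: str, resource: str) -> str:
    """Read a file shipped inside a package (hypothetical helper)."""
    # files() returns a Traversable, which may not be a real filesystem path
    # (for example, when the package is installed as a zip)
    traversable = files(package).joinpath(resource)
    # as_file() yields a real Path, extracting to a temporary file if needed,
    # and cleans it up when the block exits, so only use the path inside it
    with as_file(traversable) as path:
        return Path(path).read_text(encoding="utf-8")
```

For example, read_packaged_resource("json", "__init__.py") returns the source of the stdlib json package. The important point is that the path is never used after the with block closes.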

Dependencies

  • Version bump to 3.3.0-alpha.1
  • deeplc>=4.0.0a2 (was >=3.1)
  • im2deep>=2.0.0a2 (was >=1.1)
  • ms2pip>=4.2.0a0 (was >=4.0.0)
  • ms2rescore_rs>=0.5.0a0 (was >=0.4)
  • numpy>=1.25,<3.0 (was <2.0), adding numpy 2 support
  • Added [tool.uv] prerelease = "allow"
  • Removed ionmob dependency (and its TensorFlow requirement)
  • deeplc_retrain enabled by default in config
  • Default config: maxquant replaced with ms2

Docs & CI

  • Updated input/output file documentation to describe intermediate crash-recovery files
  • ReadTheDocs build OS updated from ubuntu-22.04 to ubuntu-lts-latest
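The crash-recovery feature can be sketched as follows. Generators run in order. On any exception or KeyboardInterrupt, the PSMs and the features computed so far are written to an intermediate TSV, and the exception is re-raised. On a later run, generators whose feature columns already appear in that file are skipped. This is only an illustration under assumptions: PSMs as plain dicts, generators as callables that mutate them, and "completed" defined as all of a generator's feature columns being present. None of these helper names come from ms2rescore.

```python
import csv
from pathlib import Path
from typing import AbstractSet, Callable, Dict, List, Mapping, Sequence, Set

# Hypothetical PSM representation: one dict of column -> value per PSM
Row = Dict[str, object]
Generator = Callable[[List[Row]], None]


def write_intermediate(rows: List[Row], path: Path) -> None:
    """Dump PSMs plus any features computed so far as a TSV file."""
    fieldnames = sorted({key for row in rows for key in row})
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def completed_generators(
    path: Path, generator_features: Mapping[str, Sequence[str]]
) -> Set[str]:
    """Generators whose feature columns are all present in the intermediate file."""
    with path.open(newline="") as handle:
        header = set(next(csv.reader(handle, delimiter="\t"), []))
    return {
        name
        for name, features in generator_features.items()
        if features and set(features) <= header
    }


def run_generators(
    rows: List[Row],
    generators: Mapping[str, Generator],
    path: Path,
    skip: AbstractSet[str] = frozenset(),
) -> None:
    """Run generators in order; on failure or Ctrl+C, save progress and re-raise."""
    try:
        for name, generator in generators.items():
            if name in skip:
                continue  # already completed in a previous run
            generator(rows)
    except (Exception, KeyboardInterrupt):
        write_intermediate(rows, path)
        raise  # bare raise keeps the original traceback
```

Catching KeyboardInterrupt explicitly matters because it derives from BaseException, not Exception, so a plain `except Exception` would not save progress when a user presses Ctrl+C.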

ArthurDeclercq and others added 30 commits February 24, 2024 15:48
pull main in spectrum-feature-generator
RalfG changed the base branch from main to release/3.3 March 31, 2026 20:50
RalfG mentioned this pull request Apr 8, 2026
RalfG added 6 commits April 8, 2026 17:53
- Bump version to 3.3.0-alpha.1
- Update dependency pins: deeplc>=4.0.0a2, im2deep>=2.0.0a2,
  ms2pip>=4.2.0a0, ms2rescore_rs>=0.5.0a0
- Allow numpy <3.0 (numpy 2 support)
- Enable uv prerelease resolution
- Rename ms2pip API call to correlate_preloaded
- Rename DeepLC/IM2Deep thread kwargs to num_threads
- Replace maxquant with ms2 in default configs
- Enable deeplc_retrain by default
- Fix importlib.resources usage in report charts
- Remove unused imports in report and GUI
- Update ReadTheDocs to ubuntu-lts-latest
- Remove second psm_id_pattern block in parse_psms that re-applied the
  pattern to already-transformed IDs, causing match failures
- Fix inverted condition for precursor m/z in parse_spectra (was marking
  mz as found when all values are zero)
- Always treat ms2_spectra as missing until spectrum files are parsed
- Update tests to mock get_ms2_spectra instead of removed get_precursor_info
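Two of the parsing fixes in this commit can be illustrated with a short sketch. The psm_id_pattern is applied exactly once, because re-applying it to an already-extracted ID no longer matches. Precursor m/z counts as found only when at least one value is non-zero, which is the reverse of the inverted condition described above. The helper names, and the assumption that the ID is the pattern's first capture group, are hypothetical.

```python
import re
from typing import List, Sequence


def apply_psm_id_pattern(raw_ids: Sequence[str], pattern: str) -> List[str]:
    """Extract spectrum IDs with psm_id_pattern, applied exactly once per ID.

    Assumption: the pattern's first capture group holds the ID.
    """
    regex = re.compile(pattern)
    extracted = []
    for raw_id in raw_ids:
        match = regex.search(raw_id)
        if match is None:
            raise ValueError(f"psm_id_pattern {pattern!r} did not match {raw_id!r}")
        extracted.append(match.group(1))
    return extracted


def precursor_mz_available(precursor_mzs: Sequence[float]) -> bool:
    """Precursor m/z counts as found only if at least one value is non-zero."""
    return any(mz != 0 for mz in precursor_mzs)
```

For example, applying the pattern r"scan=(\d+)" to "controllerType=0 controllerNumber=1 scan=42" yields "42". Feeding "42" back through the same pattern raises ValueError, which is the kind of match failure the removed second block caused.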
Fix typos (intermidiate, Mubmle), incorrect docstrings and type
annotations, improve error handling (bare raise for traceback
preservation, descriptive SpectrumParsingError on missing spectrum IDs),
and ensure BasicFeatureGenerator always emits all features with zero-fill
when data is unavailable.
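The zero-fill behavior can be sketched as projecting whatever was computed onto a statically declared list of feature names, so the output columns never change. The FEATURE_NAMES tuple below is a hypothetical subset built from names mentioned in this PR. The helper is illustrative and not the BasicFeatureGenerator implementation.

```python
from typing import Dict, Mapping, Optional, Tuple

# Statically defined feature names (a hypothetical subset using names from this PR)
FEATURE_NAMES: Tuple[str, ...] = (
    "theoretical_mass",
    "experimental_mass",
    "mass_error",
    "pep_len",
)


def complete_features(computed: Mapping[str, Optional[float]]) -> Dict[str, float]:
    """Return every declared feature, zero-filling missing or None values.

    Keys not in FEATURE_NAMES are dropped so the output schema never changes.
    """
    return {
        name: 0.0 if computed.get(name) is None else float(computed[name])
        for name in FEATURE_NAMES
    }
```

A fixed schema means downstream rescoring always receives the same columns in the same order, even when some inputs (such as precursor m/z) are unavailable for a given file.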
- Fix percolator_kwargs shadowing: parameter was overwritten by local
  dict, then updated with itself (no-op), silently ignoring user kwargs
- Fix list/numpy array type mismatch in _log_id_psms_before causing
  TypeError on bitwise & operation
- Fix unreachable error handlers in percolator subprocess handling;
  add check=True and use CalledProcessError properly
- Fix hardcoded FDR threshold (0.01) ignoring the fdr parameter
- Fix None values in mumble original_hit array causing inconsistent
  numpy boolean masking; default to True with explicit dtype=bool
- Avoid mutating caller's config dict by tracking skipped feature
  generators in a separate set
- Add empty PSM list guard in filter_mumble_psms
- Extract duplicated charge-stripping regex to shared CHARGE_PATTERN
  constant
- Fix typo "ckeck" -> "Check"
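Several of these fixes share one theme: do not overwrite or mutate the caller's inputs. The minimal sketch below covers the percolator_kwargs and config cases. The merged options get their own local name, FDR thresholds come from the fdr argument, user options take precedence, and skipped generators are filtered out of a new dict instead of being deleted from the config. The trainFDR/testFDR keys mirror Percolator's --trainFDR/--testFDR options, but how ms2rescore names and passes them is an assumption, as are both helper names.

```python
from typing import AbstractSet, Any, Dict, Mapping, Optional


def build_percolator_kwargs(
    fdr: float, percolator_kwargs: Optional[Mapping[str, Any]] = None
) -> Dict[str, Any]:
    """Merge FDR-derived defaults with user-supplied Percolator options.

    The merged dict gets its own name, so the parameter is never shadowed.
    Thresholds come from the fdr argument rather than a hardcoded 0.01.
    """
    merged: Dict[str, Any] = {"trainFDR": fdr, "testFDR": fdr}
    if percolator_kwargs:
        merged.update(percolator_kwargs)  # user options win over defaults
    return merged


def active_generators(
    feature_generators: Mapping[str, Any], skipped: AbstractSet[str]
) -> Dict[str, Any]:
    """Select generators to run without mutating the caller's config mapping."""
    return {name: cfg for name, cfg in feature_generators.items() if name not in skipped}
```

In the original bug, assigning a fresh dict to the parameter name and then updating it with itself silently discarded the user's options. Using a distinct local name makes that mistake impossible.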
RalfG changed the title Refactoring to ~5x speed-up and consolidated feature generators Apr 8, 2026
RalfG merged commit c9f0483 into release/3.3 Apr 8, 2026
6 checks passed
RalfG deleted the refactoring branch April 8, 2026 22:15