~5x speed-up and consolidated feature generators #245
Merged
RalfG merged 85 commits into release/3.3 on Apr 8, 2026
paretje reviewed on Feb 27, 2026
- Bump version to 3.3.0-alpha.1
- Update dependency pins: deeplc>=4.0.0a2, im2deep>=2.0.0a2, ms2pip>=4.2.0a0, ms2rescore_rs>=0.5.0a0
- Allow numpy <3.0 (numpy 2 support)
- Enable uv prerelease resolution
- Rename ms2pip API call to correlate_preloaded
- Rename DeepLC/IM2Deep thread kwargs to num_threads
- Replace maxquant with ms2 in default configs
- Enable deeplc_retrain by default
- Fix importlib.resources usage in report charts
- Remove unused imports in report and GUI
- Update ReadTheDocs to ubuntu-lts-latest
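The importlib.resources fix mentioned in this commit can be illustrated with a minimal sketch. It assumes Python 3.9+ and uses the standard-library `json` package as a stand-in resource; the helper name `read_package_resource` is hypothetical and is not taken from the ms2rescore report code.

```python
from importlib.resources import as_file, files


def read_package_resource(package: str, name: str) -> str:
    """Read a text resource bundled inside an importable package.

    Hypothetical helper for illustration only.
    """
    resource = files(package) / name
    # as_file() yields a real filesystem path for the duration of the
    # block, even if the package is imported from a zip archive.
    with as_file(resource) as path:
        return path.read_text(encoding="utf-8")


# Stand-in resource: the source of the stdlib `json` package.
text = read_package_resource("json", "__init__.py")
```

Keeping all use of the path inside the `with` block matters because a temporary extracted copy, if one was needed, is cleaned up when the block exits.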
- Remove second psm_id_pattern block in parse_psms that re-applied the pattern to already-transformed IDs, causing match failures
- Fix inverted condition for precursor m/z in parse_spectra (was marking mz as found when all values are zero)
- Always treat ms2_spectra as missing until spectrum files are parsed
- Update tests to mock get_ms2_spectra instead of removed get_precursor_info
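The inverted precursor m/z condition can be pictured with a small sketch. The function below is hypothetical and only shows the intended logic (all-zero or absent values mean the m/z still has to be read from the spectrum files); it is not the actual parse_spectra code.

```python
def precursor_mz_missing(mz_values) -> bool:
    """Return True when precursor m/z still needs to be looked up.

    Hypothetical helper: values of None or 0 count as "not available".
    An empty input is also treated as missing.
    """
    return all(value is None or value == 0 for value in mz_values)


# All zeros: m/z is missing and must come from the spectrum files.
needs_lookup = precursor_mz_missing([0.0, 0.0, 0.0])
# At least one real value: m/z was already provided by the PSM file.
already_present = not precursor_mz_missing([0.0, 445.12, 512.3])
```

The bug described in the commit corresponds to using the negation of this check, which would report m/z as found exactly when every value is zero.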
Fix typos (intermidiate, Mubmle), incorrect docstrings and type annotations, improve error handling (bare raise for traceback preservation, descriptive SpectrumParsingError on missing spectrum IDs), and ensure BasicFeatureGenerator always emits all features with zero-fill when data is unavailable.
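The two error-handling patterns named here (a descriptive error on a missing spectrum ID, and a bare raise that preserves the traceback) can be sketched as follows. The exception class is a local stand-in with the same name as the one in the commit message, and `lookup_spectrum` and `run_step` are hypothetical helpers, not ms2rescore functions.

```python
class SpectrumParsingError(Exception):
    """Local stand-in for the exception named in the commit message."""


def lookup_spectrum(spectra: dict, spectrum_id: str):
    """Return a spectrum by ID, raising a descriptive error if absent."""
    try:
        return spectra[spectrum_id]
    except KeyError as e:
        # Chain the original KeyError so the root cause stays visible.
        raise SpectrumParsingError(
            f"Spectrum ID {spectrum_id!r} not found in parsed spectra"
        ) from e


def run_step(step, failures: list):
    """Run a step, record its name on failure, and re-raise unchanged."""
    try:
        return step()
    except Exception:
        failures.append(step.__name__)
        # Bare raise re-raises the active exception with its original
        # traceback instead of starting a new one at this line.
        raise
```

Using `raise` rather than `raise e` keeps the traceback pointing at the frame where the failure occurred, which is what makes the resulting logs useful.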
- Fix percolator_kwargs shadowing: parameter was overwritten by local dict, then updated with itself (no-op), silently ignoring user kwargs
- Fix list/numpy array type mismatch in _log_id_psms_before causing TypeError on bitwise & operation
- Fix unreachable error handlers in percolator subprocess handling; add check=True and use CalledProcessError properly
- Fix hardcoded FDR threshold (0.01) ignoring the fdr parameter
- Fix None values in mumble original_hit array causing inconsistent numpy boolean masking; default to True with explicit dtype=bool
- Avoid mutating caller's config dict by tracking skipped feature generators in a separate set
- Add empty PSM list guard in filter_mumble_psms
- Extract duplicated charge-stripping regex to shared CHARGE_PATTERN constant
- Fix typo "ckeck" -> "Check"
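The kwargs-shadowing and hardcoded-FDR fixes follow a common pattern that can be sketched as below. This is an illustration under assumptions: the function name `build_percolator_kwargs` and the option keys `trainFDR` and `testFDR` are chosen for the example and are not confirmed to match the ms2rescore implementation.

```python
from typing import Optional


def build_percolator_kwargs(
    fdr: float = 0.01, percolator_kwargs: Optional[dict] = None
) -> dict:
    """Merge user-supplied options over defaults derived from ``fdr``.

    Hypothetical helper; option keys are illustrative only.
    """
    # Defaults use the fdr argument rather than a hardcoded 0.01.
    defaults = {"trainFDR": fdr, "testFDR": fdr}
    # Build a new dict under a different name so the parameter is not
    # shadowed, and so the caller's dict is never mutated.
    merged = dict(defaults)
    merged.update(percolator_kwargs or {})
    return merged


user_options = {"testFDR": 0.01, "seed": 1}
merged = build_percolator_kwargs(fdr=0.05, percolator_kwargs=user_options)
```

The buggy shape described in the commit would assign the defaults to the same name as the parameter and then update that dict with itself, so user options were dropped without any error.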
Breaking changes
- Removed `ionmob` feature generator (replaced by IM2Deep)
- Removed `maxquant` feature generator (replaced by new `ms2` generator)
- Removed `lower_score_is_better` parameter from DeepLC feature generator
- Renamed ms2pip API call `process_MS2_spectra` to `correlate_preloaded`

New features
- Intermediate results are written to `.intermediate.psms.tsv` on errors or KeyboardInterrupt during feature generation or rescoring. Supports resuming from intermediate files by skipping already-completed feature generators.
- `ms2` feature generator: Replaces MaxQuant-derived features with general MS2 spectrum-level features, backed by `ms2rescore-rs` (Rust).
- … (`filter_mumble_psms` with matched ion percentage threshold).
- `_best_run_by_shared_proteoforms` method selects the optimal run for fine-tuning based on proteoform overlap across runs.
- `charts.py` module with expanded plotting capabilities including DeepLC-specific plots. Major expansion of `generate.py` and `utils.py`.
- `theoretical_mass`, `experimental_mass`, `mass_error`, and `pep_len` features. Charge one-hot encoding now uses fixed range 1–6.

Refactoring
- `deeplc.core.predict`/`finetune` API with `SplineTransformerCalibration` per run. Replaces the old `DeepLC` class-based approach.
- `im2deep.core.predict` and `LinearCCSCalibration` APIs with per-run calibration against a default reference dataset. Adds `multi` and `calibration_set_size` parameters.
- … `ms2rescore-rs`.
- … `add_features`.
- … `psm_file` and feature generators. Explicit `list()` conversion on scores to fix mokapot numba `TypingError`.
- Thread keyword arguments renamed to `num_threads` for consistency across DeepLC and IM2Deep.
- Fixed `importlib.resources` usage with proper `as_file()` context manager. Removed unused imports.

Dependencies
- Version bumped to `3.3.0-alpha.1`
- `deeplc>=4.0.0a2` (was `>=3.1`)
- `im2deep>=2.0.0a2` (was `>=1.1`)
- `ms2pip>=4.2.0a0` (was `>=4.0.0`)
- `ms2rescore_rs>=0.5.0a0` (was `>=0.4`)
- `numpy>=1.25,<3.0` (was `<2.0`): numpy 2 support
- uv prerelease resolution enabled: `[tool.uv] prerelease = "allow"`
- Removed `ionmob` dependency (and its TensorFlow requirement)
- `deeplc_retrain` enabled by default in config
- `maxquant` replaced with `ms2` in default configs

Docs & CI
- ReadTheDocs updated from `ubuntu-22.04` to `ubuntu-lts-latest`
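The fixed-range charge one-hot encoding listed under New features can be sketched as follows. The feature names (`charge_1` to `charge_6`) and the choice to emit all zeros for charges outside the range are assumptions made for this example, not details confirmed by the pull request.

```python
# Fixed charge range 1-6, independent of the charges present in the data.
CHARGE_RANGE = range(1, 7)


def one_hot_charge(charge: int) -> dict:
    """Encode a precursor charge as six 0/1 features.

    Hypothetical feature names; out-of-range charges yield all zeros.
    """
    return {f"charge_{n}": int(charge == n) for n in CHARGE_RANGE}


features = one_hot_charge(2)
```

A fixed range keeps the feature set identical across runs and files, so the same columns are always emitted even when a given file contains only a subset of charge states.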