Fast MPS parser for free-format MPS files#1429
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds an opt-in experimental fast MPS reader with SIMD tokenization and Eisel–Lemire FP64 conversion, mmap-backed input streams, multi-threaded LZ4 frame decoding, a phase-based section scanner, and extended test coverage with CMake wiring for reader selection and optional LZ4 support. ChangesExperimental Fast MPS Parser with LZ4 Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (2)
cpp/tests/linear_programming/parser_test.cpp (1)
2553-2675: ⚡ Quick winAdd direct coverage for the new fast-reader rejection branches.
These dispatch tests still only exercise success paths. They never assert the two new guards in
read(...): rejectingfast_experimentalwithfixed_mps_format=true, and rejecting.qps*when the fast reader is selected. A small pair ofEXPECT_THROWcases would lock down the new CLI/API contract.As per coding guidelines, "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/linear_programming/parser_test.cpp` around lines 2553 - 2675, Add two negative tests exercising the new fast-reader rejection branches: (1) call read<int,double> (or use dispatch_parse) with fast_experimental enabled while passing fixed_mps_format=true and EXPECT_THROW(std::logic_error) to cover the "reject fast_experimental with fixed_mps_format" guard (reference the read function signature and any CLI flag/parameter used to enable fast_experimental/fixed_mps_format in your API), and (2) attempt to parse a ".qps" (or ".qps.gz"/".qps.bz2") file while forcing the fast reader and EXPECT_THROW(std::logic_error) to cover the "reject .qps* when fast reader selected" branch; add these as new TEST cases next to the existing dispatch tests (e.g., alongside read, qps_extension_dispatches_to_mps_parser) so they run with the other parser dispatch tests.Source: Coding guidelines
cpp/tests/linear_programming/experimental_mps_fast/fast_fp64_parser_test.cpp (1)
117-149: ⚡ Quick winAdd explicit overflow/underflow and subnormal boundary cases.
The current suite never exercises the hardest FP64 paths: overflow, underflow-to-zero, and subnormal boundaries. Because the randomized generator caps exponents at
[-30, 30], it also won’t cover the fallback/equivalence edge cases this parser is most likely to regress on. Please add a few fixed cases like max-finite overflow, min-normal neighbors, and min-subnormal rounding boundaries.Based on learnings: "Tests must validate numerical correctness, not just run without error" and "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths."
Also applies to: 168-176
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/cuopt/linear_programming/io/parser.hpp`:
- Around line 20-25: The documentation for the experimental fast MPS reader is
incorrect: it claims LP/MIP/QP support but the fast path throws on any .qps*
input when mps_reader_type_t::fast_experimental is selected; update the docs to
accurately reflect supported formats (remove QP/.qps claim) wherever the enum
mps_reader_type_t and the dispatch comment block are defined (the enum
declaration and the doc comment above the MPS reader dispatcher), or
alternatively implement QP/.qps support in the fast reader; pick the
documentation fix unless you also add QP handling.
- Around line 155-180: The code unconditionally accepts and advertises “.lz4”
extensions even when LZ4 support may be compiled out; update the suffix checks
and the supported-extensions error message to be gated by the same compile-time
flag used in the build (e.g., MPS_PARSER_WITH_LZ4/CUOPT_PARSER_WITH_LZ4).
Concretely, wrap the checks that call lower.ends_with(".mps.lz4"), ".qps.lz4",
and ".lp.lz4" and the inclusion of ".mps.lz4/.qps.lz4/.lp.lz4" in the thrown
message with `#ifdef` MPS_PARSER_WITH_LZ4 (or the project’s appropriate macro),
and when the macro is not defined ensure those suffixes are not matched and not
listed in the supported extensions string referenced by
read_mps_fast_experimental, read_mps, and read_lp routing logic.
In `@cpp/src/io/experimental_mps_fast/fast_fp64_parser.hpp`:
- Around line 417-427: parse_fp64_advance currently accepts partially-parsed
numbers because it trusts parse_decimal_advance even when it stops early; change
it so that after a successful parse_decimal_advance(p, end, dec) you verify the
parser consumed the entire token by checking that p == end, and if not return
fallback_strtod(std::string_view(start, (size_t)(p - start))); only when p ==
end should you call assemble_fp64(dec) and return its value (with the existing
NaN check that falls back to fallback_strtod). This uses the existing symbols
parse_fp64_advance, parse_decimal_advance, assemble_fp64, and fallback_strtod.
In `@cpp/src/io/experimental_mps_fast/file_reader.cpp`:
- Around line 41-46: The case-sensitive suffix checks in path_has_suffix (and
the similar checks at 274-296) cause inconsistency with file_to_string.cpp which
lowercases filenames before detecting .lz4/.gz/.bz2; normalize the filename to a
lowercase copy once before calling effective_file_read_method()/before any
suffix checks and use that lowercase string for all suffix comparisons (or
update path_has_suffix to perform case-insensitive comparison), so files like
MODEL.MPS.LZ4 are detected as compressed by parse_mps_fast_file and the legacy
reader alike.
- Around line 97-108: get_file_size(const std::string& path) currently opens fd
then calls get_file_size(fd, path) and closes fd only on the success path,
leaking the descriptor if get_file_size(fd, path) throws; wrap the raw fd in a
small RAII/scoped guard (or use a unique_fd/ScopeExit) immediately after ::open
so the descriptor is closed on all exit paths, then call get_file_size(fd, path)
using the RAII handle and let the guard close the fd when it goes out of scope.
In `@cpp/src/io/experimental_mps_fast/file_reader.hpp`:
- Around line 156-183: The helper parallel_for_indexed currently accepts
thread_count==0 and silently does no work; validate thread_count at the start of
parallel_for_indexed (before reserving/spawning threads) and either clamp it to
at least 1 or throw an exception on zero. Update the function to check the
thread_count parameter (the one passed into parallel_for_indexed) immediately
and then proceed to use the validated value with scoped_thread_group workers,
next, and the existing worker lambda so no silent no-op occurs.
In `@cpp/src/io/experimental_mps_fast/lz4_file_reader.cpp`:
- Around line 536-557: Wrap the entire body of read_window in a try { ... }
catch(...) block so any exception (e.g., from new char[w.size], pread_full, or
other statements) is caught and forwarded by calling
fail_and_notify(std::current_exception()); retain the existing mps_parser_fail
usage for pread errors but remove any paths that let exceptions escape
read_window (ensure all exception flows end up invoking
fail_and_notify(std::current_exception()), using the function names read_window,
pread_full, mps_parser_fail, and fail_and_notify to locate the changes).
In `@cpp/src/io/experimental_mps_fast/mmap_region.hpp`:
- Around line 76-100: anonymous_aligned currently unmaps prefix/suffix using
byte counts which can produce non-page-aligned munmap calls (EINVAL) and
leak/steal pages; fix by making the helper either (A) retain the original raw
mapping and raw_size in mmap_region_t and unmap that entire raw mapping in the
destructor/reset(), or (B) compute prefix and suffix rounded up/down to the
system page size (use sysconf(_SC_PAGESIZE) or getpagesize()) so every munmap
boundary is page-aligned before calling ::munmap; implement RAII by adding
raw_ptr/raw_size members to mmap_region_t, unmapping in its destructor and
reset() and checking munmap return values, and update anonymous_aligned to
populate those members rather than attempting partial unmaps at arbitrary byte
offsets.
In `@cpp/src/io/experimental_mps_fast/mps_section_scanner.cpp`:
- Around line 125-130: mps_phase_registry_t::publish_endata currently overwrites
endata_begin_ and endata_present_ even after endata_ready_ was set, creating
races with readers that read the plain members after an acquire on
endata_ready_; change publish_endata so it performs a single-shot publication:
only update endata_begin_ and endata_present_ when endata_ready_ was not yet set
(use an atomic test-and-set or compare_exchange on endata_ready_ to detect first
publication) and return without mutating the payload if endata_ready_ is already
true; ensure this mirrors the behavior of publish() and apply the same fix to
the other occurrence mentioned (lines ~455-459) so readers of
endata_begin()/endata_present() see a single, immutable publication.
In `@cpp/src/io/utilities/error.hpp`:
- Around line 37-40: mps_parser_throw currently injects msg verbatim into a JSON
string which can produce invalid JSON when msg contains quotes, backslashes or
newlines; add a small JSON-escaping helper (e.g., json_escape(const
std::string&)) that replaces backslash, quote and control chars (e.g., \n, \r,
\t) with their JSON-escaped forms and use it when building the thrown
std::logic_error message (replace std::string(msg) with json_escape(msg));
reference mps_parser_throw and error_to_string when updating the construction so
the thrown payload remains {"MPS_PARSER_ERROR_TYPE": "...", "msg": "escaped
text"}.
In `@cpp/src/utilities/perf_counters.hpp`:
- Around line 11-15: This header is missing direct includes for utilities it
uses: add `#include` <utility> for std::pair, `#include` <cstring> for std::strlen
and std::strncmp, and `#include` <cstdlib> for std::strtol so perf_counters.hpp is
self-contained; update the include block (which currently has <array>, <cerrno>,
<cstdint>, <cstdio>, <vector>) to also include those three headers to satisfy
IWYU and the self-contained-header rule.
In
`@cpp/tests/linear_programming/experimental_mps_fast/fast_parser_edge_test.cpp`:
- Around line 94-129: The test helper check_models_match_reference_bitwise
currently uses EXPECT_EQ to compare floating-point vectors (A_, b_, c_,
variable_lower_bounds_, variable_upper_bounds_, constraint_lower_bounds_,
constraint_upper_bounds_), which compares values not IEEE-754 bit patterns;
change those comparisons to element-wise bit-wise comparisons (use the existing
bits() helper or std::bit_cast<uint64_t> on each double) so each corresponding
element in parser_model_t::A_, mps_data_model_t::A_, and the b_/c_/bound vectors
is compared by its bit representation; update the assertions for A_, b_, c_,
variable_lower_bounds_, variable_upper_bounds_, constraint_lower_bounds_, and
constraint_upper_bounds_ in check_models_match_reference_bitwise to iterate
elements and EXPECT_EQ(bits(ref[i]), bits(fast[i])) (with clear context strings)
instead of EXPECT_EQ on the whole vector.
---
Nitpick comments:
In `@cpp/tests/linear_programming/parser_test.cpp`:
- Around line 2553-2675: Add two negative tests exercising the new fast-reader
rejection branches: (1) call read<int,double> (or use dispatch_parse) with
fast_experimental enabled while passing fixed_mps_format=true and
EXPECT_THROW(std::logic_error) to cover the "reject fast_experimental with
fixed_mps_format" guard (reference the read function signature and any CLI
flag/parameter used to enable fast_experimental/fixed_mps_format in your API),
and (2) attempt to parse a ".qps" (or ".qps.gz"/".qps.bz2") file while forcing
the fast reader and EXPECT_THROW(std::logic_error) to cover the "reject .qps*
when fast reader selected" branch; add these as new TEST cases next to the
existing dispatch tests (e.g., alongside read,
qps_extension_dispatches_to_mps_parser) so they run with the other parser
dispatch tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ae230bf3-b5ba-42f6-a034-5f88021e23b3
📒 Files selected for processing (27)
cpp/CMakeLists.txtcpp/cuopt_cli.cppcpp/include/cuopt/linear_programming/io/parser.hppcpp/src/CMakeLists.txtcpp/src/io/CMakeLists.txtcpp/src/io/experimental_mps_fast/fast_fp64_parser.hppcpp/src/io/experimental_mps_fast/fast_parse_primitives.hppcpp/src/io/experimental_mps_fast/fast_parser.cppcpp/src/io/experimental_mps_fast/fast_parser.hppcpp/src/io/experimental_mps_fast/file_reader.cppcpp/src/io/experimental_mps_fast/file_reader.hppcpp/src/io/experimental_mps_fast/hash_table_smallstr.hppcpp/src/io/experimental_mps_fast/lz4_file_reader.cppcpp/src/io/experimental_mps_fast/mmap_region.hppcpp/src/io/experimental_mps_fast/mps_section_scanner.cppcpp/src/io/experimental_mps_fast/mps_section_scanner.hppcpp/src/io/experimental_mps_fast/nvtx_ranges.hppcpp/src/io/file_to_string.cppcpp/src/io/file_to_string.hppcpp/src/io/mps_parser.cppcpp/src/io/parser.cppcpp/src/io/utilities/error.hppcpp/src/utilities/perf_counters.hppcpp/tests/linear_programming/CMakeLists.txtcpp/tests/linear_programming/experimental_mps_fast/fast_fp64_parser_test.cppcpp/tests/linear_programming/experimental_mps_fast/fast_parser_edge_test.cppcpp/tests/linear_programming/parser_test.cpp
|
I can confirm I extensively used this code to parse several big mps instances (> 50GB) and the results were bitwise equal with the original parser. |
|
/ok to test 1990c06 |
|
/ok to test d7358f6 |
|
/ok to test b1abd6f |
There was a problem hiding this comment.
Thanks for the hard work, Alice! A fast parser is a great improvement.
I have a few general comments:
- Is there a reason for not using AVX512? Nowadays, hardware support is quite good in newer processors
- I never work with ARM SIMD, but NEOS seems to be the most common choice. Due to Grace/Vera, I think is important for us to support this architecture.
- Unless you see a large performance improvement for your custom implementation, I would use the STL methods when possible for simplicity (e.g., for
mallocor reading/parsing outside the hot path).
| constexpr uint32_t mul_u32(uint32_t m) | ||
| { | ||
| unsigned __int128 carry = 0; | ||
| for (uint64_t& v : limb) { |
There was a problem hiding this comment.
You can use pragma omp simd to give a hint to the compiler (although this should be easily auto-vectorizable)
There was a problem hiding this comment.
This is a constexpr-only path
|
|
||
| inline constexpr auto fast_fp64_parse_lut = make_power_table(); | ||
|
|
||
| inline constexpr std::array<double, 23> small_powers = { |
There was a problem hiding this comment.
If you want some a little fancier, you can use compile-time index sequences (https://cppreference.com/cpp/utility/integer_sequence)
There was a problem hiding this comment.
Template metaprogramming tricks are awesome but tend to tank compile times a lot more than what modern constexpr expressivity allows :)
| }); | ||
| if (lower.ends_with(".mps") || lower.ends_with(".mps.gz") || lower.ends_with(".mps.bz2") || | ||
| lower.ends_with(".qps") || lower.ends_with(".qps.gz") || lower.ends_with(".qps.bz2")) { | ||
| if (lower.ends_with(".mps.lz4") || lower.ends_with(".mps.bz2") || lower.ends_with(".mps.gz") || |
There was a problem hiding this comment.
Could you create a function that returns the filename without the compression (if it exists)? This will simplify the logic here
| }); | ||
| } | ||
|
|
||
| #pragma omp taskwait |
There was a problem hiding this comment.
You do not need the taskwait here. There is an implicit barrier at the end of the single and the parallel section that waits for all tasks to be completed before proceeding
|
|
||
| // Contract every input stream fed to parse_mps_fast_stream must satisfy. | ||
| template <typename Stream> | ||
| concept InputStream = requires(Stream stream) |
There was a problem hiding this comment.
I think the InputStream concept is not needed here since it is only used once and the templated method is enclosed in this file.
| return page_size; | ||
| } | ||
|
|
||
| bool pread_full(int fd, char* dst, std::size_t bytes, std::size_t offset) |
There was a problem hiding this comment.
Is there a function in the STL that does this? Seems like a common method
There was a problem hiding this comment.
Not really for this scenario :( The goal here is to submit parallel chunk reads to a single file descriptor. The C stdlib only really provides serial stateful options (FILE*, or std::ifstream in C++)
| // are named "<thread_name_prefix><worker-id>" when a prefix is supplied. | ||
| // OMP just doesn't really play well with blocking pread() | ||
| template <typename Body> | ||
| void parallel_for_indexed(std::size_t count, |
There was a problem hiding this comment.
In OpenMP, you already have dynamic scheduling policies that dynamically balance the load across the threads
There was a problem hiding this comment.
In this scenario (blocking i/o pread()s, the simplest option was just to stick to std::thread. These threads strictly do I/O, so it won't really lead to contention either way
| @@ -0,0 +1,194 @@ | |||
| // SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights | |||
There was a problem hiding this comment.
How this differ from running perf from outside?
There was a problem hiding this comment.
Convenience (directly logged, not having to tweak my existing scripts)
If need be this can be dropped, naturally
There was a problem hiding this comment.
It is fine to keep it. I just curious if there is a difference.
|
Regarding AVX512 - it is still not consistently present on modern consumer CPUs (either no support at all, or only a subset), and the AMD implementations are essentially still using 256bit ALUs in a trench coat; so I did not feel comfortable making this a requirement. I also don't expect the improvements to be huge without a significant refactor - most lines in MPS files are >16B and <32B wide, which fit AVX2 perfectly. AVX2 is also basically guaranteed to exist on any modern x86 CPU released in the last 10 years, consumer or server SVE2 support could be awesome :) Would be lovely to benchmark. As it stands, NEON is guaranteed to be present on all aarch64 CPUs and that's what SIMDe uses under the hood to translate the intel-style intrinsics. All custom implementations used here were motivated by benchmarking. std::from_chars came close to the fast fp64 parser used here on modern GCC libstdc++, but I did not feel too comfortable using it directly since if it falls back to a locale-based parser (or if compiled with any other libstdc+++), performance would tank significantly and seemingly unpredictably to the customer |
Fair enough. Thanks for explaining it to me! |
This PR introduces an experimental, opt-in fast parser for well-formed MPS files, which takes advantage of available parallelism and SIMD extensions on the target machine. This is mostly intended for huge (>1GB) MPS files, e.g. for distributed solves.
The parser relies on parallel I/O overlapping with MPS section parsing workers which start parsing as soon as read/decoded data is made available, in order to hide latency on slow NFS filesystems.
LZ4 is added as a supported decompression format, which is done in a parallel fashion. LZ4 tends to perform unusually well (~10-30% compression ratios) on MPS data due to their very regular nature and common prefixes in row/column names, and decompresses at a few GB/s.
This feature is exposed via a new CLI flag
--mps-reader, which can take eitherdefault(existing parser), orexperimental-fast(this PR) as its argument.LP, MIP, QP, and MPS SOCP formats are supported.
Results, on a corpus of 42 instances with sizes ranging from 1.55GB to 50.5GB:
Results for some >100GB files, NFS storage, cold cache (reference parser times out at 1 hour)
Description
Issue
Checklist