RFC: Strengthen /verify-reduction with algebraic/combinatorial and analysis techniques #1013
Motivation
PR #992 verified 34 reductions with dual Python scripts (constructor + adversary, 443M+ checks). PR #1010 then rejected two of those (#368, #905) as "unsound" — but re-running PR #992's scripts confirms they pass. The problem: PR #1010 implemented a different construction, got it wrong, and blamed the reduction.
This exposes a gap in the current verification methodology: brute-force enumeration + random PBT can miss structural bugs, and two independent implementations can share blind spots.
This issue proposes adding algebraic, combinatorial, and analytic verification techniques to /verify-reduction to catch bug classes that the current approach cannot.
Current methodology (7 sections)
- Symbolic overhead (SymPy)
- Exhaustive forward/backward (n ≤ 5)
- Solution extraction
- Overhead formula spot-check
- Structural properties
- YES example
- NO example
Plus an independent adversary script driven by Hypothesis property-based testing.
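For concreteness, the constructor/adversary pattern can be sketched as below: a brute-force oracle on each side of the reduction, plus Hypothesis generating random instances and asserting the forward/backward equivalence. This is a hypothetical minimal version, not the repository's actual scripts; the reduction shown (SubsetSum → Partition via two padding elements, cf. #973 in Section A) is the textbook construction, used purely as an illustration.

```python
# Minimal sketch of the constructor + adversary pattern (illustrative only).
from itertools import combinations
from hypothesis import given, settings, strategies as st

def subset_sum(items, target):
    # Brute-force oracle: does some subset of items sum to target?
    return any(sum(c) == target
               for r in range(len(items) + 1)
               for c in combinations(items, r))

def partition(items):
    # Brute-force oracle: can items be split into two equal-sum halves?
    total = sum(items)
    return total % 2 == 0 and subset_sum(items, total // 2)

def reduce_subset_sum_to_partition(items, target):
    # Textbook padding: the two new elements are forced onto opposite sides.
    s = sum(items)
    return items + [2 * s - target, s + target]

@settings(max_examples=100)
@given(st.lists(st.integers(min_value=1, max_value=20), min_size=1, max_size=6),
       st.integers(min_value=0, max_value=60))
def test_forward_backward(items, target):
    # Adversary: random instances, forward/backward equivalence must hold.
    mapped = reduce_subset_sum_to_partition(items, target)
    assert subset_sum(items, target) == partition(mapped)
```

The weakness this RFC targets is visible here: both oracles enumerate subsets, so a shared blind spot in that enumeration would pass both directions.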
Proposed enhancements
A. Algebraic/Combinatorial Techniques (discrete math)
| Technique | What it catches | Applicable to |
|---|---|---|
| Invariant-based induction — verify a preserved quantity (e.g., independence number = m for satisfiable instances) symbolically for general n, not just small n | Scaling bugs beyond n=5 | Gadget reductions (12 rules), graph transforms (18 rules) |
| Solution counting — enumerate ALL solutions on both sides, verifying counts match (bijection) or are related by a known factor | Shared blind spots between constructor/adversary; broken extraction that still passes ∃-checks | Complement/negation (5), identity (8), algebraic (16) |
| Parity checks — verify odd/even solution count is preserved; extremely sensitive to off-by-one gadget errors | Single misplaced edge in gadget construction | NAE-SAT reductions, set splitting, any reduction with complementary solution pairs |
| Rank/dimension — for GF(2) / linear equation reductions, verify constraint matrix rank matches proof claims | Accidentally linearly dependent rows that only matter at large n | X3C→GF2 (#859), X3C→MinWeightLinEq (#860), any linear algebra reduction |
| Boundary instance generation — systematically test at exact feasibility thresholds (S=2T, minimum degree, single-clause) | Off-by-one in case splits, edge cases in padding | SubsetSum→Partition (#973), all padding/embedding reductions (4 rules) |
| Extremal graph testing — test on K_n, empty, cycle, star, Petersen where answers are known analytically, scale to n=50+ | Construction bugs that only appear beyond brute-force range | All 18 graph transformation reductions |
| Monotonicity — verify adding a constraint to source predictably changes target | Coupling bugs between gadgets | Gadget reductions, scheduling encodings (8 rules) |
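As a sketch of the solution-counting row (which subsumes the parity row: equal counts imply equal parity), consider the complement-style Independent Set ↔ Clique correspondence: independent sets of size k in G map bijectively to cliques of size k in the complement graph, so the raw counts must agree exactly, not just the ∃-answers. The toy graph representation (edge sets over `range(n)`) and function names are illustrative, not the library's API.

```python
# Solution counting on a complement-style reduction (illustrative sketch).
from itertools import combinations

def complement(n, edges):
    # All vertex pairs (u < v) not present in the original edge set.
    return {(u, v) for u, v in combinations(range(n), 2)} - edges

def count_independent_sets(n, edges, k):
    return sum(1 for s in combinations(range(n), k)
               if all((u, v) not in edges for u, v in combinations(s, 2)))

def count_cliques(n, edges, k):
    return sum(1 for s in combinations(range(n), k)
               if all((u, v) in edges for u, v in combinations(s, 2)))

# Exhaustive count comparison over ALL graphs on 4 vertices and all k.
all_pairs = list(combinations(range(4), 2))
for bits in range(2 ** len(all_pairs)):
    edges = {p for i, p in enumerate(all_pairs) if bits >> i & 1}
    for k in range(5):
        assert (count_independent_sets(4, edges, k)
                == count_cliques(4, complement(4, edges), k))
```

A construction bug that deletes or flips a single edge would desynchronize the counts on some graph even when a satisfying witness still exists on both sides.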
B. Real Analysis Techniques (novel, experimental)
| Technique | What it catches | Applicable to |
|---|---|---|
| Continuity of reduction mapping — for weighted problems, perturb weights continuously and verify target changes continuously; discontinuities reveal bugs | Construction bugs in weighted reductions that discrete testing misses | MaxCut, TSP, weighted MIS, all weighted ILP reductions |
| IVT for objective value gaps — if source objective takes values {a,b}, target should take all corresponding values in [f(a),f(b)]; gaps indicate missing solution mappings | Incomplete solution space coverage | All Max/Min optimization reductions |
| Convergence of overhead ratios — verify overhead(n)/n converges as n→∞ using limit theorems | Overhead formulas that are correct for small n but diverge | All reductions with non-trivial overhead |
| Compactness arguments — LP relaxation of target ILP has solution by compactness; verify relaxation tightness | ILP reductions where integrality gap matters | 85 ILP reductions |
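The continuity row can be made concrete: the optimum of a weighted combinatorial problem is a pointwise maximum of finitely many linear functions of the weights, hence 1-Lipschitz in each individual weight. Perturbing one weight by ε must therefore move the optimum by at most |ε|; a larger jump signals a construction bug. The sketch below probes a brute-force MaxCut on a hypothetical random toy instance, standing in for the target side of a weighted reduction.

```python
# Continuity probe: the optimum must be 1-Lipschitz in each edge weight.
import itertools
import random

def max_cut_value(n, weights):
    # weights: dict mapping (u, v) with u < v to a real edge weight.
    best = 0.0
    for mask in range(2 ** n):
        side = [mask >> v & 1 for v in range(n)]
        best = max(best, sum(w for (u, v), w in weights.items()
                             if side[u] != side[v]))
    return best

random.seed(0)
n = 5
weights = {(u, v): random.uniform(0, 10)
           for u, v in itertools.combinations(range(n), 2)}
base = max_cut_value(n, weights)
for edge in weights:
    for eps in (1e-3, -1e-3):
        perturbed = dict(weights)
        perturbed[edge] += eps
        # A jump larger than |eps| would indicate a discontinuity, i.e. a bug.
        assert abs(max_cut_value(n, perturbed) - base) <= abs(eps) + 1e-12
```

In a real check the perturbation would be pushed through the reduction mapping first, so that a discontinuity localizes the bug to the construction rather than the solver.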
C. SMT/SAT Bounded Verification
Encode "∀ instances of size ≤ n: source feasible ⟺ target feasible" as a Z3 formula and machine-check it. This is stronger than per-instance enumeration because the SMT solver reasons symbolically over all instances of size ≤ n simultaneously.
What does NOT work
- Lean 4 formal proofs: tried; Lean does not integrate well with this library's Python/Rust verification pipeline. Mathlib's real analysis coverage is extensive, but the formalization overhead (~500 hrs per project) is prohibitive across 167 reductions.
Questions for discussion
- Which techniques have the best effort/impact ratio? The 85 ILP reductions are mechanical — should we focus algebraic techniques on the ~80 non-ILP reductions instead?
- Should analysis techniques (Section B) be pursued? Nobody is applying real analysis to reduction verification — it could be novel and publishable, but it's unclear how many reductions actually benefit. Most applicable to weighted/optimization reductions.
- Is SMT verification (Section C) worth the Z3 dependency? It's strictly stronger than brute force for bounded n, but adds complexity.
- Integration: should new techniques be additional script sections (8-11), a separate third verification layer, or upgrades to existing sections?
- Priority: given that the current pipeline already caught the #368 ([Rule] 3SAT to DIRECTED TWO-COMMODITY INTEGRAL FLOW) / #905 ([Rule] 3-SATISFIABILITY to FEASIBLE REGISTER ASSIGNMENT) issue (PR #992 was right, PR #1010 was wrong), is the main goal (a) preventing future false rejections, (b) catching real bugs at scale, or (c) building toward a publishable verification methodology?
Reduction landscape (for context)
| Category | Count | Most promising technique |
|---|---|---|
| ILP formulations | 85 | Compactness / LP relaxation |
| Graph transformation | 18 | Extremal graphs, invariants |
| Linear algebra / QUBO | 16 | Rank/dimension |
| Special structural | 14 | Case-by-case |
| Gadget construction | 12 | Parity, counting, monotonicity |
| Scheduling encoding | 8 | Boundary instances |
| Identity/relabeling | 8 | Counting (bijection proof) |
| Complement/negation | 5 | Parity, counting |
| Padding/embedding | 4 | Boundary instances |
References
- MathConstruct: Constructive Proof Benchmark
- LemmaBench: Research-Level LLM Evaluation — LLMs only 15% correct on research-level lemmas
- DeepSeek-Prover-V2
- CSLib: Lean CS Library — formalizing theoretical CS proofs
- Construction-Verification Benchmark for Lean 4
- AI for Mathematics: Progress, Challenges, Prospects