RFC: Strengthen /verify-reduction with algebraic/combinatorial and analysis techniques #1013
Motivation
PR #992 verified 34 reductions with dual Python scripts (constructor + adversary, 443M+ checks). PR #1010 then rejected two of those (#368, #905) as "unsound" — but re-running PR #992's scripts confirms they pass. The problem: PR #1010 implemented a different construction, got it wrong, and blamed the reduction.
This exposes a gap in the current verification methodology: brute-force enumeration + random PBT can miss structural bugs, and two independent implementations can share blind spots.
This issue proposes adding algebraic, combinatorial, and analytic verification techniques to /verify-reduction to catch bug classes that the current approach cannot.
Current methodology (7 sections)
- Symbolic overhead (SymPy)
- Exhaustive forward/backward (n ≤ 5)
- Solution extraction
- Overhead formula spot-check
- Structural properties
- YES example
- NO example
Plus an independent adversary script driven by Hypothesis property-based testing.
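For concreteness, the constructor/adversary pattern can be sketched as below: a brute-force oracle on each side of the reduction, plus Hypothesis generating random instances and asserting the forward/backward equivalence. This is a hypothetical minimal version, not the repository's actual scripts; the reduction shown (SubsetSum → Partition via two padding elements, cf. #973 in Section A) is the textbook construction, used purely as an illustration.

```python
# Minimal sketch of the constructor + adversary pattern (illustrative only).
from itertools import combinations
from hypothesis import given, settings, strategies as st

def subset_sum(items, target):
    # Brute-force oracle: does some subset of items sum to target?
    return any(sum(c) == target
               for r in range(len(items) + 1)
               for c in combinations(items, r))

def partition(items):
    # Brute-force oracle: can items be split into two equal-sum halves?
    total = sum(items)
    return total % 2 == 0 and subset_sum(items, total // 2)

def reduce_subset_sum_to_partition(items, target):
    # Textbook padding: the two new elements are forced onto opposite sides.
    s = sum(items)
    return items + [2 * s - target, s + target]

@settings(max_examples=100)
@given(st.lists(st.integers(min_value=1, max_value=20), min_size=1, max_size=6),
       st.integers(min_value=0, max_value=60))
def test_forward_backward(items, target):
    # Adversary: random instances, forward/backward equivalence must hold.
    mapped = reduce_subset_sum_to_partition(items, target)
    assert subset_sum(items, target) == partition(mapped)
```

The weakness this RFC targets is visible here: both oracles enumerate subsets, so a shared blind spot in that enumeration would pass both directions.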
Proposed enhancements
A. Algebraic/Combinatorial Techniques (discrete math)
| Technique | What it catches | Applicable to |
|---|---|---|
| Invariant-based induction — verify a preserved quantity (e.g., independence number = m for satisfiable instances) symbolically for general n, not just small n | Scaling bugs beyond n=5 | Gadget reductions (12 rules), graph transforms (18 rules) |
| Solution counting — enumerate ALL solutions on both sides, verifying counts match (bijection) or are related by a known factor | Shared blind spots between constructor/adversary; broken extraction that still passes ∃-checks | Complement/negation (5), identity (8), algebraic (16) |
| Parity checks — verify odd/even solution count is preserved; extremely sensitive to off-by-one gadget errors | Single misplaced edge in gadget construction | NAE-SAT reductions, set splitting, any reduction with complementary solution pairs |
| Rank/dimension — for GF(2) / linear equation reductions, verify constraint matrix rank matches proof claims | Accidentally linearly dependent rows that only matter at large n | X3C→GF2 (#859), X3C→MinWeightLinEq (#860), any linear algebra reduction |
| Boundary instance generation — systematically test at exact feasibility thresholds (S=2T, minimum degree, single-clause) | Off-by-one in case splits, edge cases in padding | SubsetSum→Partition (#973), all padding/embedding reductions (4 rules) |
| Extremal graph testing — test on K_n, empty, cycle, star, Petersen where answers are known analytically, scale to n=50+ | Construction bugs that only appear beyond brute-force range | All 18 graph transformation reductions |
| Monotonicity — verify adding a constraint to source predictably changes target | Coupling bugs between gadgets | Gadget reductions, scheduling encodings (8 rules) |
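As a sketch of the solution-counting row (which subsumes the parity row: equal counts imply equal parity), consider the complement-style Independent Set ↔ Clique correspondence: independent sets of size k in G map bijectively to cliques of size k in the complement graph, so the raw counts must agree exactly, not just the ∃-answers. The toy graph representation (edge sets over `range(n)`) and function names are illustrative, not the library's API.

```python
# Solution counting on a complement-style reduction (illustrative sketch).
from itertools import combinations

def complement(n, edges):
    # All vertex pairs (u < v) not present in the original edge set.
    return {(u, v) for u, v in combinations(range(n), 2)} - edges

def count_independent_sets(n, edges, k):
    return sum(1 for s in combinations(range(n), k)
               if all((u, v) not in edges for u, v in combinations(s, 2)))

def count_cliques(n, edges, k):
    return sum(1 for s in combinations(range(n), k)
               if all((u, v) in edges for u, v in combinations(s, 2)))

# Exhaustive count comparison over ALL graphs on 4 vertices and all k.
all_pairs = list(combinations(range(4), 2))
for bits in range(2 ** len(all_pairs)):
    edges = {p for i, p in enumerate(all_pairs) if bits >> i & 1}
    for k in range(5):
        assert (count_independent_sets(4, edges, k)
                == count_cliques(4, complement(4, edges), k))
```

A construction bug that deletes or flips a single edge would desynchronize the counts on some graph even when a satisfying witness still exists on both sides.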
B. Real Analysis Techniques (novel, experimental)
| Technique | What it catches | Applicable to |
|---|---|---|
| Continuity of reduction mapping — for weighted problems, perturb weights continuously and verify target changes continuously; discontinuities reveal bugs | Construction bugs in weighted reductions that discrete testing misses | MaxCut, TSP, weighted MIS, all weighted ILP reductions |
| IVT for objective value gaps — if source objective takes values {a,b}, target should take all corresponding values in [f(a),f(b)]; gaps indicate missing solution mappings | Incomplete solution space coverage | All Max/Min optimization reductions |
| Convergence of overhead ratios — verify overhead(n)/n converges as n→∞ using limit theorems | Overhead formulas that are correct for small n but diverge | All reductions with non-trivial overhead |
| Compactness arguments — LP relaxation of target ILP has solution by compactness; verify relaxation tightness | ILP reductions where integrality gap matters | 85 ILP reductions |
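The continuity row can be made concrete: the optimum of a weighted combinatorial problem is a pointwise maximum of finitely many linear functions of the weights, hence 1-Lipschitz in each individual weight. Perturbing one weight by ε must therefore move the optimum by at most |ε|; a larger jump signals a construction bug. The sketch below probes a brute-force MaxCut on a hypothetical random toy instance, standing in for the target side of a weighted reduction.

```python
# Continuity probe: the optimum must be 1-Lipschitz in each edge weight.
import itertools
import random

def max_cut_value(n, weights):
    # weights: dict mapping (u, v) with u < v to a real edge weight.
    best = 0.0
    for mask in range(2 ** n):
        side = [mask >> v & 1 for v in range(n)]
        best = max(best, sum(w for (u, v), w in weights.items()
                             if side[u] != side[v]))
    return best

random.seed(0)
n = 5
weights = {(u, v): random.uniform(0, 10)
           for u, v in itertools.combinations(range(n), 2)}
base = max_cut_value(n, weights)
for edge in weights:
    for eps in (1e-3, -1e-3):
        perturbed = dict(weights)
        perturbed[edge] += eps
        # A jump larger than |eps| would indicate a discontinuity, i.e. a bug.
        assert abs(max_cut_value(n, perturbed) - base) <= abs(eps) + 1e-12
```

In a real check the perturbation would be pushed through the reduction mapping first, so that a discontinuity localizes the bug to the construction rather than the solver.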
C. SMT/SAT Bounded Verification
Encode "∀ instances of size ≤ n: source feasible ⟺ target feasible" as a Z3 formula and machine-check it. This is stronger than per-instance enumeration because the SMT solver reasons symbolically over all instances of size ≤ n simultaneously.
What does NOT work
- Lean 4 formal proofs: tried; Lean does not integrate well with this library's Python/Rust verification pipeline. Mathlib's real analysis coverage is extensive, but the formalization overhead (~500 hrs per project) is prohibitive across 167 reductions.
Questions for discussion
- Which techniques have the best effort/impact ratio? The 85 ILP reductions are mechanical — should we focus algebraic techniques on the ~80 non-ILP reductions instead?
- Should analysis techniques (Section B) be pursued? Nobody is applying real analysis to reduction verification — it could be novel and publishable, but it's unclear how many reductions actually benefit. Most applicable to weighted/optimization reductions.
- Is SMT verification (Section C) worth the Z3 dependency? It's strictly stronger than brute force for bounded n, but adds complexity.
- Integration: should new techniques be additional script sections (8-11), a separate third verification layer, or upgrades to existing sections?
- Priority: given that the current pipeline already caught the #368 ([Rule] 3SAT to DIRECTED TWO-COMMODITY INTEGRAL FLOW) / #905 ([Rule] 3-SATISFIABILITY to FEASIBLE REGISTER ASSIGNMENT) issue (PR #992 was right, PR #1010 was wrong), is the main goal (a) preventing future false rejections, (b) catching real bugs at scale, or (c) building toward a publishable verification methodology?
Reduction landscape (for context)
| Category | Count | Most promising technique |
|---|---|---|
| ILP formulations | 85 | Compactness / LP relaxation |
| Graph transformation | 18 | Extremal graphs, invariants |
| Linear algebra / QUBO | 16 | Rank/dimension |
| Special structural | 14 | Case-by-case |
| Gadget construction | 12 | Parity, counting, monotonicity |
| Scheduling encoding | 8 | Boundary instances |
| Identity/relabeling | 8 | Counting (bijection proof) |
| Complement/negation | 5 | Parity, counting |
| Padding/embedding | 4 | Boundary instances |
References
- MathConstruct: Constructive Proof Benchmark
- LemmaBench: Research-Level LLM Evaluation — LLMs only 15% correct on research-level lemmas
- DeepSeek-Prover-V2
- CSLib: Lean CS Library — formalizing theoretical CS proofs
- Construction-Verification Benchmark for Lean 4
- AI for Mathematics: Progress, Challenges, Prospects