-
Notifications
You must be signed in to change notification settings - Fork 3
[Rule] Vertex Cover to Longest Common Subsequence #429
Description
Source: MinimumVertexCover
Target: LongestCommonSubsequence
Motivation: Establishes NP-completeness of LONGEST COMMON SUBSEQUENCE (for an arbitrary number of strings) via polynomial-time reduction from VERTEX COVER. While LCS for two strings is solvable in O(n²) time by dynamic programming, Maier (1978) showed the problem becomes NP-complete for an unbounded number of strings. The reduction uses a vertex-alphabet encoding: each vertex becomes a symbol, each edge yields a constraint string that forbids both endpoints from appearing together in any common subsequence.
Reference: Maier, D. (1978). "The complexity of some problems on subsequences and supersequences." Journal of the ACM 25(2), pp. 322–336. Also catalogued as SR10 in Garey & Johnson, Computers and Intractability, p. 228.
GJ Source Entry
[SR10] LONGEST COMMON SUBSEQUENCE
INSTANCE: Finite alphabet Σ, finite set R of strings from Σ*, and a positive integer K.
QUESTION: Is there a string w ∈ Σ* with |w| ≥ K such that w is a subsequence of each x ∈ R?
Reference: [Maier, 1978]. Transformation from VERTEX COVER.
Comment: Remains NP-complete even if |Σ| = 2. Solvable in polynomial time for any fixed K or for fixed |R| (by dynamic programming).
Reduction Algorithm
Given a MinimumVertexCover instance G = (V, E) with V = {0, 1, ..., n−1} and E = {e₁, ..., eₘ}, construct a LongestCommonSubsequence instance as follows:
-
Alphabet: Σ = {0, 1, ..., n−1}, one symbol per vertex. So
alphabet_size = n. -
Template string: S₀ = (0, 1, 2, ..., n−1), listing all vertices in sorted order. Length = n.
-
Edge strings: For each edge eⱼ = {u, v}, construct:
Sⱼ = (0, ..., n−1 with u removed) ++ (0, ..., n−1 with v removed)
Each half is in sorted order. Length = 2(n−1). -
String set: R = {S₀, S₁, ..., Sₘ}, giving m + 1 strings total.
-
LCS bound: K' = n − K, where K is the vertex cover size. The LCS length equals the maximum independent set size.
Correctness (forward): Let I ⊆ V be an independent set of size n − K. The sorted sequence of symbols in I is a common subsequence of all strings:
- It is trivially a subsequence of S₀ since S₀ lists all vertices in order.
- For each edge string Sⱼ corresponding to edge {u, v}: since I is independent, at most one of u, v is in I. The symbol not in the edge endpoint set appears in both halves of Sⱼ. The symbol of the one endpoint that might be in I appears in the half where the other endpoint was removed. Therefore the sorted symbols of I appear as a subsequence.
Correctness (backward): Let w be a common subsequence of length ≥ n − K. Since w is a subsequence of S₀ = (0, 1, ..., n−1) and S₀ has no repeated symbols, w consists of distinct vertex symbols. For any edge {u, v}, the edge string Sⱼ = (V{u})(V{v}) contains u only in the second half and v only in the first half. If both u and v appeared in w, then since w must be a subsequence of Sⱼ, v must be matched before u — but w is also a subsequence of S₀ where u < v or v < u in some fixed order. This forces a contradiction for at least one ordering. Therefore at most one endpoint of each edge appears in w, so the symbols of w form an independent set. The complement V \ w is a vertex cover of size ≤ K.
Solution extraction: Given the LCS witness (a subsequence of symbols), the symbols present form an independent set. The vertex cover is the complement: config[v] = 1 if v does NOT appear in the LCS, config[v] = 0 if v appears in the LCS.
Time complexity of reduction: O(n · m) to construct all strings.
Size Overhead
Symbols:
- n =
num_verticesof source MinimumVertexCover instance - m =
num_edgesof source MinimumVertexCover instance
| Target field | Expression | Derivation |
|---|---|---|
alphabet_size |
num_vertices |
One symbol per vertex |
num_strings |
num_edges + 1 |
One template + one per edge |
max_length |
num_vertices |
min string length = n (template S₀ has length n ≤ 2(n−1) for n ≥ 2) |
total_length |
num_vertices + num_edges * 2 * (num_vertices - 1) |
S₀ has length n; each edge string has length 2(n−1) |
Example
Source instance (MinimumVertexCover on path P₄):
- Vertices: V = {0, 1, 2, 3}, n = 4
- Edges: {0,1}, {1,2}, {2,3}, m = 3
- Minimum vertex cover: {1, 2} of size K = 2 (covers all edges)
- Maximum independent set: {0, 3} of size n − K = 2
Constructed target instance (LongestCommonSubsequence):
- Alphabet: Σ = {0, 1, 2, 3}, alphabet_size = 4
- Template string: S₀ = (0, 1, 2, 3), length 4
- Edge strings:
- S₁ for edge {0, 1}: (1, 2, 3) ++ (0, 2, 3) = (1, 2, 3, 0, 2, 3), length 6
- S₂ for edge {1, 2}: (0, 2, 3) ++ (0, 1, 3) = (0, 2, 3, 0, 1, 3), length 6
- S₃ for edge {2, 3}: (0, 1, 3) ++ (0, 1, 2) = (0, 1, 3, 0, 1, 2), length 6
- String set: R = {S₀, S₁, S₂, S₃}, num_strings = 4
- max_length = min(4, 6, 6, 6) = 4
- LCS bound: K' = 4 − 2 = 2
Verification that (0, 3) is a common subsequence of length 2:
- S₀ = (0, 1, 2, 3): subsequence at positions 0, 3 ✓
- S₁ = (1, 2, 3, 0, 2, 3): match 0 at position 3, then 3 at position 5 ✓
- S₂ = (0, 2, 3, 0, 1, 3): match 0 at position 0, then 3 at position 5 ✓
- S₃ = (0, 1, 3, 0, 1, 2): match 0 at position 0, then 3 at position 2 ✓
Solution extraction:
- LCS witness = (0, 3) → independent set = {0, 3}
- Vertex cover = complement = {1, 2}
- Config: config = [0, 1, 1, 0] (v in cover ↔ config[v] = 1)
- Check: edge {0,1} covered by v1 ✓, edge {1,2} covered by v1 and v2 ✓, edge {2,3} covered by v2 ✓
Validation Method
- Closed-loop test: reduce a MinimumVertexCover instance to LongestCommonSubsequence, solve the target with BruteForce, extract solution, verify it is a valid vertex cover on the source
- Test with path P₄ as above (MVC = 2, LCS = 2)
- Test with triangle K₃ (MVC = 2, LCS = 1, independent set = any single vertex)
- Test with empty graph (no edges): MVC = 0, LCS = n (all vertices form the independent set)
- Verify that every constructed string only uses symbols in {0, ..., n−1}
References
- [Maier, 1978] David Maier. "The complexity of some problems on subsequences and supersequences." Journal of the ACM 25(2), pp. 322–336, 1978.
- [Garey & Johnson, 1979] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. Problem SR10, p. 228.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status