Skip to content

[Rule] Vertex Cover to Longest Common Subsequence #429

@isPANN

Description

@isPANN

Source: MinimumVertexCover
Target: LongestCommonSubsequence
Motivation: Establishes NP-completeness of LONGEST COMMON SUBSEQUENCE (for an arbitrary number of strings) via polynomial-time reduction from VERTEX COVER. While LCS for two strings is solvable in O(n²) time by dynamic programming, Maier (1978) showed the problem becomes NP-complete for an unbounded number of strings. The reduction uses a vertex-alphabet encoding: each vertex becomes a symbol, each edge yields a constraint string that forbids both endpoints from appearing together in any common subsequence.
Reference: Maier, D. (1978). "The complexity of some problems on subsequences and supersequences." Journal of the ACM 25(2), pp. 322–336. Also catalogued as SR10 in Garey & Johnson, Computers and Intractability, p. 228.

GJ Source Entry

[SR10] LONGEST COMMON SUBSEQUENCE
INSTANCE: Finite alphabet Σ, finite set R of strings from Σ*, and a positive integer K.
QUESTION: Is there a string w ∈ Σ* with |w| ≥ K such that w is a subsequence of each x ∈ R?
Reference: [Maier, 1978]. Transformation from VERTEX COVER.
Comment: Remains NP-complete even if |Σ| = 2. Solvable in polynomial time for any fixed K or for fixed |R| (by dynamic programming).

Reduction Algorithm

Given a MinimumVertexCover instance G = (V, E) with V = {0, 1, ..., n−1} and E = {e₁, ..., eₘ}, construct a LongestCommonSubsequence instance as follows:

  1. Alphabet: Σ = {0, 1, ..., n−1}, one symbol per vertex. So alphabet_size = n.

  2. Template string: S₀ = (0, 1, 2, ..., n−1), listing all vertices in sorted order. Length = n.

  3. Edge strings: For each edge eⱼ = {u, v}, construct:
    Sⱼ = (0, ..., n−1 with u removed) ++ (0, ..., n−1 with v removed)
    Each half is in sorted order. Length = 2(n−1).

  4. String set: R = {S₀, S₁, ..., Sₘ}, giving m + 1 strings total.

  5. LCS bound: K' = n − K, where K is the vertex cover size. The LCS length equals the maximum independent set size.

Correctness (forward): Let I ⊆ V be an independent set of size n − K. The sorted sequence of symbols in I is a common subsequence of all strings:

  • It is trivially a subsequence of S₀ since S₀ lists all vertices in order.
  • For each edge string Sⱼ corresponding to edge {u, v}: since I is independent, at most one of u, v is in I. The symbol not in the edge endpoint set appears in both halves of Sⱼ. The symbol of the one endpoint that might be in I appears in the half where the other endpoint was removed. Therefore the sorted symbols of I appear as a subsequence.

Correctness (backward): Let w be a common subsequence of length ≥ n − K. Since w is a subsequence of S₀ = (0, 1, ..., n−1) and S₀ has no repeated symbols, w consists of distinct vertex symbols. For any edge {u, v}, the edge string Sⱼ = (V{u})(V{v}) contains u only in the second half and v only in the first half. If both u and v appeared in w, then since w must be a subsequence of Sⱼ, v must be matched before u — but w is also a subsequence of S₀ where u < v or v < u in some fixed order. This forces a contradiction for at least one ordering. Therefore at most one endpoint of each edge appears in w, so the symbols of w form an independent set. The complement V \ w is a vertex cover of size ≤ K.

Solution extraction: Given the LCS witness (a subsequence of symbols), the symbols present form an independent set. The vertex cover is the complement: config[v] = 1 if v does NOT appear in the LCS, config[v] = 0 if v appears in the LCS.

Time complexity of reduction: O(n · m) to construct all strings.

Size Overhead

Symbols:

  • n = num_vertices of source MinimumVertexCover instance
  • m = num_edges of source MinimumVertexCover instance
Target field Expression Derivation
alphabet_size num_vertices One symbol per vertex
num_strings num_edges + 1 One template + one per edge
max_length num_vertices min string length = n (template S₀ has length n ≤ 2(n−1) for n ≥ 2)
total_length num_vertices + num_edges * 2 * (num_vertices - 1) S₀ has length n; each edge string has length 2(n−1)

Example

Source instance (MinimumVertexCover on path P₄):

  • Vertices: V = {0, 1, 2, 3}, n = 4
  • Edges: {0,1}, {1,2}, {2,3}, m = 3
  • Minimum vertex cover: {1, 2} of size K = 2 (covers all edges)
  • Maximum independent set: {0, 3} of size n − K = 2

Constructed target instance (LongestCommonSubsequence):

  • Alphabet: Σ = {0, 1, 2, 3}, alphabet_size = 4
  • Template string: S₀ = (0, 1, 2, 3), length 4
  • Edge strings:
    • S₁ for edge {0, 1}: (1, 2, 3) ++ (0, 2, 3) = (1, 2, 3, 0, 2, 3), length 6
    • S₂ for edge {1, 2}: (0, 2, 3) ++ (0, 1, 3) = (0, 2, 3, 0, 1, 3), length 6
    • S₃ for edge {2, 3}: (0, 1, 3) ++ (0, 1, 2) = (0, 1, 3, 0, 1, 2), length 6
  • String set: R = {S₀, S₁, S₂, S₃}, num_strings = 4
  • max_length = min(4, 6, 6, 6) = 4
  • LCS bound: K' = 4 − 2 = 2

Verification that (0, 3) is a common subsequence of length 2:

  • S₀ = (0, 1, 2, 3): subsequence at positions 0, 3 ✓
  • S₁ = (1, 2, 3, 0, 2, 3): match 0 at position 3, then 3 at position 5 ✓
  • S₂ = (0, 2, 3, 0, 1, 3): match 0 at position 0, then 3 at position 5 ✓
  • S₃ = (0, 1, 3, 0, 1, 2): match 0 at position 0, then 3 at position 2 ✓

Solution extraction:

  • LCS witness = (0, 3) → independent set = {0, 3}
  • Vertex cover = complement = {1, 2}
  • Config: config = [0, 1, 1, 0] (v in cover ↔ config[v] = 1)
  • Check: edge {0,1} covered by v1 ✓, edge {1,2} covered by v1 and v2 ✓, edge {2,3} covered by v2 ✓

Validation Method

  • Closed-loop test: reduce a MinimumVertexCover instance to LongestCommonSubsequence, solve the target with BruteForce, extract solution, verify it is a valid vertex cover on the source
  • Test with path P₄ as above (MVC = 2, LCS = 2)
  • Test with triangle K₃ (MVC = 2, LCS = 1, independent set = any single vertex)
  • Test with empty graph (no edges): MVC = 0, LCS = n (all vertices form the independent set)
  • Verify that every constructed string only uses symbols in {0, ..., n−1}

References

  • [Maier, 1978] David Maier. "The complexity of some problems on subsequences and supersequences." Journal of the ACM 25(2), pp. 322–336, 1978.
  • [Garey & Johnson, 1979] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. Problem SR10, p. 228.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.ruleA new reduction rule to be added.

    Type

    No type

    Projects

    Status

    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions