Skip to content

fix: harden CDN script inlining with linkedom#352

Merged
miguel-heygen merged 2 commits intomainfrom
fix/linkedom-cdn-inline-rewrite
Apr 20, 2026
Merged

fix: harden CDN script inlining with linkedom#352
miguel-heygen merged 2 commits intomainfrom
fix/linkedom-cdn-inline-rewrite

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented Apr 20, 2026

Summary

This fixes a renderer failure in the Raycast Showcase composition caused by our CDN script inlining path corrupting third-party JavaScript during HTML compilation.

HyperFrames intentionally inlines external scripts so compiled compositions stay self-contained and deterministic for local renders, CI, Docker, and offline execution. That path therefore needs to preserve arbitrary library source exactly, not mostly work for a subset of bundles.

Problem

inlineExternalScripts() previously rewrote <script src="..."> tags with string replacement. That is unsafe for arbitrary JavaScript because replacement-string semantics treat sequences like $&, $', $\`` and $1` as interpolation tokens.

In practice, D3 contained those byte patterns in its distributed source. During inlining, the replacement engine injected surrounding HTML into the middle of the fetched script, which produced browser-side Invalid or unexpected token errors during frame capture. Once the D3 bundle failed to parse, downstream code then failed with d3 is not defined.

What changed

  • switched CDN script inlining from raw string replacement to a DOM-based rewrite using linkedom
  • preserve all non-src script attributes when replacing external tags with inline tags
  • keep the existing </script> escaping so embedded script text cannot terminate the wrapper early
  • preserve fragment behavior by round-tripping wrapped fragments back to document.body.innerHTML

Review follow-up

  • added an explicit fragment-input regression test for inlineExternalScripts() so the exported fragment-return branch is covered directly
  • added a producer regression fixture based on the Spanish Empire project: packages/producer/tests/spanish-empire-cdn-inline
  • trimmed that fixture to the first 8 seconds so it stays practical for harness runs while still exercising the D3 + TopoJSON + GSAP CDN initialization path that previously failed
  • vendored world-atlas JSON locally inside the fixture so the regression remains deterministic and does not depend on a render-time network fetch

Why this approach

This is a small, targeted hardening of the compiler path that already exists for deterministic rendering. It avoids introducing a new policy change around external scripts while removing a brittle implementation detail.

Using a DOM rewrite is also the right abstraction boundary here: the problem is not "D3 is special" but "we were performing HTML structure edits with replacement-string semantics." Parsing once and mutating actual script nodes makes the inliner safe for any third-party bundle content, including regex-heavy and minified libraries.

Verification

  • added a regression test proving literal replacement tokens ($&, $', `$``) are preserved in downloaded script content
  • added a regression test proving non-src script attributes are preserved during inlining
  • added a regression test proving fragment inputs round-trip back as fragments after inlining
  • bun test packages/producer/src/services/htmlCompiler.test.ts
  • bunx oxfmt packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts
  • bunx oxlint packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts
  • rebuilt the local CLI and retried the previously failing Raycast Showcase render with the local build
  • confirmed the render passed compile and entered sustained frame capture without the prior Invalid or unexpected token or d3 is not defined failures
  • generated and then re-ran the Spanish Empire producer regression fixture successfully through packages/producer/src/regression-harness.ts

Scope

This change is intentionally narrow. It only affects the producer HTML compiler's external script inlining path and adds regression coverage around it; it does not change script ordering, CDN fetch policy, or composition execution semantics.

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — nice fix

The diagnosis is right and the fix is the right shape. String.prototype.replace(regex, stringReplacement) interprets $&, $`, $', $$, and $n as back-reference tokens, so piping third-party script bytes through it as the replacement argument is exactly how d3 (and any regex-heavy lib) gets silently mangled. Moving to el.replaceWith(...) on an actual DOM node removes the replacement-string surface area entirely, rather than papering over it with $ escaping — that would have been the wrong fix.

Side benefits worth calling out:

  • Attribute preservation (type="module", nomodule, crossorigin, etc.) — the old regex silently dropped these, which is a real correctness win for ESM CDN bundles.
  • Fixes a latent bug in the old regex path where duplicate CDN URLs would only have the first occurrence replaced (el.replaceWith handles each node independently now).
  • Array.from(el.attributes) before iterating is the right defensive move against live-collection mutation.

Scope is tight, CI is fully green across all 24 checks including the render-compat regression shards, and the Raycast Showcase manual verification is the right gate for this class of change.

One ask before merge — please add a fragment-input test

inlineExternalScripts is exported and the new wrappedFragment ? document.body.innerHTML : document.toString() branch is only exercised when callers pass a fragment. The main compileForRender pipeline always hands it a full document (via ensureFullDocument upstream), so this new branch is currently only covered implicitly. Something like:

it("returns a fragment when the input has no <html>/<body> wrapper", async () => {
  const originalFetch = globalThis.fetch;
  globalThis.fetch = mock(async () => new Response("var d3 = {};", { status: 200 })) as any;
  try {
    const html = '<script src="https://cdn.example.com/d3.min.js"></script><div>tail</div>';
    const result = await inlineExternalScripts(html);
    expect(result).not.toMatch(/<!DOCTYPE|<html|<head|<body/i);
    expect(result).toContain("<div>tail</div>");
    expect(result).toContain("var d3 = {};");
  } finally {
    globalThis.fetch = originalFetch;
  }
});

would lock in the fragment-round-trip contract so a future refactor can't silently regress it.

Minor note (take or leave)

When external scripts exist but all downloads fail, the function now returns document.toString() rather than the original html — so attribute quote style and boolean-attr serialization could differ slightly from input for external consumers of this exported function. The producer's own pipeline round-trips through linkedom anyway, so this is a no-op in practice. If you wanted to be strict, a symmetric early-return html when all downloads reject would preserve byte-identity. Not blocking.

Approving now — add the fragment test and merge.

@miguel-heygen miguel-heygen merged commit a539266 into main Apr 20, 2026
24 checks passed
@miguel-heygen miguel-heygen deleted the fix/linkedom-cdn-inline-rewrite branch April 20, 2026 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants