Skip to content

fix(app): prevent crash when a source's database/table doesn't exist#2335

Draft
MikeShi42 wants to merge 2 commits into
mainfrom
cursor/fix-bad-source-render-loop-3e7e
Draft

fix(app): prevent crash when a source's database/table doesn't exist#2335
MikeShi42 wants to merge 2 commits into
mainfrom
cursor/fix-bad-source-render-loop-3e7e

Conversation

@MikeShi42
Copy link
Copy Markdown
Contributor

What

When a Source is configured against a ClickHouse database or table that doesn't exist on the current server (e.g. a stale source carried over from another deployment, or a freshly imported config before the data lands), navigating around the app crashed the page with Maximum update depth exceeded originating from a Mantine <input>. The console showed:

ClickHouseQueryError: Database otel_v2 does not exist.
    at t.aP.query (...)
    at async ... useMetadata.useJsonColumns

The crash was reproducible by creating a Trace source pointing at a database that doesn't exist (e.g. otel_v2), selecting it on the Search page, and navigating to/from Service Map (or other pages that observe the same source). See before_fix_crash_screen.png below.

Why

Three intertwined issues combined to surface as a hard render-loop crash:

  1. useTableMetadata's queryFn could resolve to undefined when the underlying table is missing in system.tables. React Query v5 treats undefined as an invalid result and throws Query data cannot be undefined. That synthetic error propagated through consumers that render schema-derived UI (filter sidebar, default order-by, etc.) and contributed to render loops.

  2. Schema-discovery hooks used the default 3-retry policy. "Database does not exist" is deterministic — retrying just amplifies the spurious re-renders that triggered the crash, and hammers ClickHouse with useless DESCRIBE requests.

  3. SearchInputV2's English-explanation effect re-ran every render. Many callers pass an inline tcFromSource(source) for tableConnection, which produces a fresh object reference each parent render. The effect depended on that reference, called genEnglishExplanation (which performs schema lookups), and updated state in a tight loop. When the underlying schema lookup rejected, the .then had no error handler, so the rejection bubbled as an unhandled promise rejection during a render storm. useFieldExpressionGenerator had a similar instability — it returned a fresh getFieldExpression closure on every render, destabilizing useMemo deps in callers.

What changed

  • useTableMetadata now normalizes the missing-table case to null (instead of undefined), so React Query no longer throws. Consumers already use tableMetadata?.…, so they see absence either way.
  • useColumns, useJsonColumns, and useTableMetadata now use retry: 0. These are deterministic schema queries — retrying is never useful.
  • useFieldExpressionGenerator memoizes its getFieldExpression closure with useCallback keyed on jsonColumns, so consumers' useMemos only recompute when the JSON-column set actually changes.
  • SearchInputV2 stabilizes the tableConnection dep via a content-derived key, swallows schema-lookup rejections to a console.warn, and adds a cancel flag so a stale resolution can't update state after the connection changed.

Walkthrough

Before — Search → Service Map with a source pointing at a non-existent database hard-crashed the page with Maximum update depth exceeded (image captured prior to the fix):

Before fix: Runtime Error overlay - Maximum update depth exceeded

After — same path now resolves cleanly: the Service Map page renders with a graceful inline error explaining that the database doesn't exist, and the rest of the navigation continues to work.

after_fix_search_to_service_map_no_crash.mp4
After fix: Service Map shows

Testing

  • make ci-lint (lint + TypeScript across all packages)
  • yarn nx run @hyperdx/app:ci:unit — 1679 tests passed, 0 failed
  • ✅ Manually reproduced the original crash on main, then verified the same path no longer crashes after these changes (see walkthrough above).

Notes / known follow-ups

  • One specific navigation order — Service Map → Search while lastSelectedSourceId in localStorage points at a "bad" source — still surfaces the dev-mode error overlay in some manual sessions. The original user-reported error chain (useMetadata.useJsonColumns → React Query throwing data is undefined → cascade) is fixed; the remaining edge case looks like a separate render-loop interaction inside the Search page's form/source-resolution effects and was not consistently reproducible from a headless harness in this task. Investigating it further may require additional instrumentation around useForm({ values }) syncing on the Search page.

To show artifacts inline, enable in settings.

Open in Web Open in Cursor 

cursoragent and others added 2 commits May 24, 2026 17:32
Sources can point at databases or tables that don't exist on the current
ClickHouse server (e.g. a stale config or freshly imported source).
Previously, navigating around the app while such a source was selected
caused a hard 'Maximum update depth exceeded' crash that originated from
schema-discovery hooks.

Two underlying bugs:

1. `useTableMetadata`'s queryFn could resolve to `undefined` when the
   table is missing in `system.tables`. React Query v5 treats `undefined`
   as an invalid query result and throws 'Query data cannot be undefined'.
   That synthetic error then propagated through the page and contributed
   to render loops in consumers that observed the query result.

2. The schema-discovery hooks (`useColumns`, `useJsonColumns`,
   `useTableMetadata`) used the default 3-retry policy. 'Database does
   not exist' is deterministic — retrying just amplifies the spurious
   re-renders that triggered the crash, and hammers the database with
   useless requests.

Additionally, `useFieldExpressionGenerator` returned a fresh
`getFieldExpression` closure on every render, which destabilized
downstream useMemo dependencies that themselves observe schema state.
Memoize the closure so consumers only recompute when `jsonColumns`
actually changes.

Co-authored-by: Mike Shi <mike@hyperdx.io>
… tableConnection refs

Callers like the dashboard / sessions / services pages pass an inline
`tcFromSource(source)` for `tableConnection`, producing a fresh object
reference every parent render. The lucene-explanation effect in
`SearchInputV2` previously depended on that reference and re-ran every
render, calling `setParsedEnglishQuery` in a tight loop. When the
underlying source's database doesn't exist, the schema lookups inside
`genEnglishExplanation` reject; the state update was wrapped in a
`.then` with no error handler, so the rejection bubbled as an unhandled
promise rejection and tripped the dev-mode error overlay during the
render-loop storm.

Stabilize the connection by memoizing on a content-derived string key,
swallow the schema-lookup error to a console warning, and add a cancel
flag so a stale resolution doesn't update state after the connection
changed.

Co-authored-by: Mike Shi <mike@hyperdx.io>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 24, 2026

⚠️ No Changeset found

Latest commit: 854b54b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 24, 2026 6:06pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

E2E Test Results

All tests passed • 177 passed • 3 skipped • 1192s

Status Count
✅ Passed 177
❌ Failed 0
⚠️ Flaky 5
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants