Skip to content

feat(mcp): add denoise option to hyperdx_search tool#2371

Open
brandon-pereira wants to merge 3 commits into
mainfrom
brandon/denoising
Open

feat(mcp): add denoise option to hyperdx_search tool#2371
brandon-pereira wants to merge 3 commits into
mainfrom
brandon/denoising

Conversation

@brandon-pereira
Copy link
Copy Markdown
Member

@brandon-pereira brandon-pereira commented May 29, 2026

What

Adds a denoise boolean parameter to the MCP hyperdx_search tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" checkbox.

Why

When investigating issues via MCP, LLM agents get back raw search results that are often dominated by repetitive log noise. This forces agents to either sift through noisy results or make a separate hyperdx_event_patterns call and then manually filter. The denoise option makes this a single-call workflow.

How it works

When denoise=true:

  1. Runs the normal search query to get result rows
  2. In parallel, samples 10k random events from the same source/time range and counts total events
  3. Mines patterns using the shared Drain algorithm (common-utils)
  4. Identifies "noisy" patterns — those accounting for >10% of sampled events (same threshold as the web app)
  5. Rebuilds a TemplateMiner, matches each search result row against learned patterns
  6. Filters out rows matching noisy patterns
  7. Returns filtered rows plus a denoised metadata block listing removed patterns, original row count, and filtered row count

Graceful degradation: if source/connection/body column can't be resolved, or if pattern sampling fails, the original results are returned unmodified.

Changes

File Change
packages/api/src/mcp/tools/query/denoise.ts New — Core denoiseSearchResults() function
packages/api/src/mcp/tools/query/search.ts Added denoise schema param + post-processing logic
packages/api/src/mcp/tools/query/helpers.ts Extracted shared resolveBodyExpression() + SAFE_BODY_EXPR_CHARS
packages/api/src/mcp/tools/query/runEventPatterns.ts Imports from helpers instead of local definitions

Example response (with denoise=true)

{
  "result": { "data": [...filtered rows...] },
  "denoised": {
    "removedPatterns": [
      { "pattern": "GET /health <*> <*>", "estimatedCount": 45000, "sampleCount": 4500 }
    ],
    "originalRowCount": 50,
    "filteredRowCount": 12
  }
}
Screenshot 2026-05-29 at 8 42 07 AM

Performance

Adds ~1-2s latency when denoise=true due to the pattern sampling queries. No impact when denoise=false (the default).

Closes HDX-4346

Add a `denoise` boolean parameter to the MCP `hyperdx_search` tool that
automatically filters out high-frequency repetitive event patterns from
search results, mirroring the web app's "Denoise Results" feature.

When `denoise=true`:
- Samples 10k random events from the same source/time range
- Mines patterns using the shared Drain algorithm (common-utils)
- Identifies "noisy" patterns (>10% of sampled events)
- Matches each search result row against learned patterns
- Filters out rows matching noisy patterns
- Returns filtered rows plus metadata listing removed patterns

Also extracts `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS`
from runEventPatterns.ts into helpers.ts for reuse.
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 29, 2026

🦋 Changeset detected

Latest commit: ee722c9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 29, 2026 5:48pm
hyperdx-storybook Ready Ready Preview, Comment May 29, 2026 5:48pm

Request Review

Resolve conflicts in helpers.ts and runEventPatterns.ts:
- helpers.ts: keep both our resolveBodyExpression/SAFE_BODY_EXPR_CHARS
  exports and main's mergeWhereIntoSelectItems/clickHouseErrorResult
- runEventPatterns.ts: import resolveBodyExpression, SAFE_BODY_EXPR_CHARS,
  and clickHouseErrorResult from helpers
- search.ts: update trimToolResponse usage to new { data, isTrimmed } API

Add changeset for the denoise feature (patch).
@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

  • Diff size: 460 production lines changed (Tier 2 max: < 250)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats
  • Production files changed: 4
  • Production lines changed: 460 (+ 104 in test files, excluded from tier calculation)
  • Branch: brandon/denoising
  • Author: brandon-pereira

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

Deep Review

✅ No critical issues found. The denoise feature degrades gracefully (returns the original result with a skipped reason) on every error path I traced, and bodyColumn is sourced from server-side source config — not per-call user input — so the unescaped interpolation into the sample query's SELECT is no new injection vector.

🟡 P2 -- recommended

  • packages/api/src/mcp/tools/query/search.ts:113 -- denoise consumes runConfigTile's output by re-parsing the stringified JSON envelope produced by formatQueryResult; any change to the wrapper shape (result.data, the outer result key) silently downgrades the call to a non-denoised response with no observable signal to the caller, even though denoise=true was requested.
    • Fix: Refactor runConfigTile to return a structured { data, meta } object internally and let the existing formatter wrap it at the edge, so denoise can consume rows without parsing serialized JSON.
  • packages/api/src/mcp/tools/query/search.ts:164 -- returnedRowCountBeforeDenoise uses rows.length and filteredRowCount uses denoised.rows.length, but a second trimToolResponse(denoisedResult) call on line 151 can drop more rows from data before serialization, so the reported filteredRowCount can overstate what the agent actually received.
    • Fix: Compute filteredRowCount from the post-trim row array, or include a separate returnedRowCount field that reflects what is actually in result.data.
  • packages/api/src/mcp/__tests__/queryTool.test.ts:391 -- the seeded-data tests exercise only the happy-path "filtered noisy pattern" branch; skipped: 'body_column_not_in_results', skipped: 'sampling_failed', the hint: 'All events matched noisy patterns…' branch, and the trim path are uncovered for new behavior introduced by this diff.
    • Fix: Add focused tests for at least the body_column_not_in_results skip path (e.g., a source whose body expression aliases to a name absent from default columns) and the all-rows-filtered hint branch.
🔵 P3 nitpicks (5)
  • packages/api/src/mcp/tools/query/denoise.ts:226 -- a second TemplateMiner is built and re-fed all 10k sample rows immediately after minePatterns ran its own internal miner over the same input, doubling Drain CPU and peak memory per denoised search.
    • Fix: Expose the trained miner from minePatterns (or accept an optional pre-trained miner) so the match step can reuse the first pass.
  • packages/api/src/mcp/tools/query/search.ts:144 -- the inline comment promises "Always emit a denoised block when denoise=true", but the early returns on lines 114, 121, and 128 (no resultText, parse failure, zero rows) all return the original result with no denoised block, contradicting the documented contract.
    • Fix: Either emit a minimal denoised block (with a skipped reason) on those early returns, or rewrite the comment to describe the actual best-effort behavior.
  • packages/api/src/mcp/__tests__/queryTool.test.ts:367 -- the test name reads "should emit denoised block when denoise=true on empty results" but its assertions and comment confirm the opposite (no denoised block is emitted on empty results).
    • Fix: Rename the test to reflect the actual contract (e.g., "should not emit denoised block when denoise=true on empty results").
  • packages/api/src/mcp/tools/query/denoise.ts:90 -- denoise constructs its own ClickhouseClient and getMetadata instances even though runConfigTile already created equivalents for the same source/connection in the same request; harmless but each denoised search opens an additional client.
    • Fix: Thread the existing client/metadata through, or factor a shared helper that lazily resolves them once per request.
  • packages/api/src/mcp/tools/query/denoise.ts:269 -- findBodyColumnKey only handles exact and case-insensitive matches, so any source whose body expression renders to a different column name in the result (common for trace sources using bracket-attribute expressions) silently lands in the body_column_not_in_results skip branch.
    • Fix: Document the limitation in the denoise schema description, or alias the body column in the main search select the same way runEventPatterns does for its sample query.

Reviewers (6): ce-correctness-reviewer, ce-testing-reviewer, ce-maintainability-reviewer, ce-project-standards-reviewer, ce-api-contract-reviewer, ce-performance-reviewer.

Testing gaps:

  • No coverage for the skipped: 'body_column_not_in_results' or skipped: 'sampling_failed' paths despite both being user-visible metadata.
  • No assertion that the hint field appears when all events get filtered as noise.
  • The metadata accuracy of filteredRowCount under the trim path is unverified.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

E2E Test Results

All tests passed • 192 passed • 3 skipped • 1261s

Status Count
✅ Passed 192
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

- Key noisy patterns by template string (p.pattern / match.getTemplate())
  instead of per-instance cluster ID, eliminating fragile coupling to
  minePatterns() internal auto-increment ordering

- Always emit a 'denoised' metadata block when denoise=true, including a
  'skipped' field with the reason when denoising cannot proceed
  (source_not_found, no_body_column, body_column_not_in_results,
  connection_not_found, sampling_failed, no_sample_data, no_rows)

- Rename originalRowCount to returnedRowCountBeforeDenoise to make the
  post-trim semantics explicit

- Fix misleading maxSamples:0 comment (minePatterns always keeps at
  least one sample per cluster); use maxSamples:1 instead

- Add integration tests for denoise=true: schema exposure, empty results
  handling, noisy pattern filtering with seeded data, denoised metadata
  shape assertions, and denoise=false control case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-3 Standard — full human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant