feat(mcp): add denoise option to hyperdx_search tool #2371
brandon-pereira wants to merge 3 commits
Add a `denoise` boolean parameter to the MCP `hyperdx_search` tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" feature.

When `denoise=true`:
- Samples 10k random events from the same source/time range
- Mines patterns using the shared Drain algorithm (common-utils)
- Identifies "noisy" patterns (>10% of sampled events)
- Matches each search result row against learned patterns
- Filters out rows matching noisy patterns
- Returns filtered rows plus metadata listing removed patterns

Also extracts `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS` from runEventPatterns.ts into helpers.ts for reuse.
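The "noisy pattern" step in the commit message above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SampledPattern`, `findNoisyPatterns`, and `NOISE_THRESHOLD` are hypothetical names, and the only detail taken from the description is the ">10% of sampled events" cutoff.

```typescript
// Hypothetical shape: one mined pattern and how many sampled events matched it.
interface SampledPattern {
  pattern: string; // template string, e.g. "GET /health <*> <*>"
  sampleCount: number; // number of matching events within the random sample
}

// Assumption: the cutoff is a strict "greater than 10%" of the sample.
const NOISE_THRESHOLD = 0.1;

// Return the patterns whose share of the sample exceeds the threshold.
function findNoisyPatterns(
  patterns: SampledPattern[],
  sampleSize: number,
  threshold: number = NOISE_THRESHOLD,
): SampledPattern[] {
  if (sampleSize <= 0) return []; // nothing sampled, so nothing can be noisy
  return patterns.filter(p => p.sampleCount / sampleSize > threshold);
}
```

With a 10k sample, a pattern seen 4,500 times would be flagged, while one seen exactly 1,000 times would not under the strict comparison assumed here.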
🦋 Changeset detected

Latest commit: ee722c9

The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
Resolve conflicts in helpers.ts and runEventPatterns.ts:
- helpers.ts: keep both our resolveBodyExpression/SAFE_BODY_EXPR_CHARS
exports and main's mergeWhereIntoSelectItems/clickHouseErrorResult
- runEventPatterns.ts: import resolveBodyExpression, SAFE_BODY_EXPR_CHARS,
and clickHouseErrorResult from helpers
- search.ts: update trimToolResponse usage to new { data, isTrimmed } API
Add changeset for the denoise feature (patch).
🟡 Tier 3: Standard. Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.
Review process: Full human review of logic, architecture, and edge cases.
Deep Review

✅ No critical issues found. The denoise feature degrades gracefully (returns the original result with a …

🟡 P2 -- recommended
🔵 P3 nitpicks (5)
Reviewers (6): ce-correctness-reviewer, ce-testing-reviewer, ce-maintainability-reviewer, ce-project-standards-reviewer, ce-api-contract-reviewer, ce-performance-reviewer. Testing gaps:
E2E Test Results

✅ All tests passed • 192 passed • 3 skipped • 1261s
Tests ran across 4 shards in parallel.
- Key noisy patterns by template string (p.pattern / match.getTemplate()) instead of per-instance cluster ID, eliminating fragile coupling to minePatterns() internal auto-increment ordering
- Always emit a 'denoised' metadata block when denoise=true, including a 'skipped' field with the reason when denoising cannot proceed (source_not_found, no_body_column, body_column_not_in_results, connection_not_found, sampling_failed, no_sample_data, no_rows)
- Rename originalRowCount to returnedRowCountBeforeDenoise to make the post-trim semantics explicit
- Fix misleading maxSamples:0 comment (minePatterns always keeps at least one sample per cluster); use maxSamples:1 instead
- Add integration tests for denoise=true: schema exposure, empty results handling, noisy pattern filtering with seeded data, denoised metadata shape assertions, and denoise=false control case
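As a rough picture of the metadata shape this commit describes, the types below are a sketch under stated assumptions. The skip reasons and the `returnedRowCountBeforeDenoise` name come from the commit message; the type names, the `skippedMetadata` helper, and the choice to report unchanged row counts when skipping are hypothetical and may not match the actual implementation.

```typescript
// Skip reasons listed in the commit message above.
type DenoiseSkipReason =
  | "source_not_found"
  | "no_body_column"
  | "body_column_not_in_results"
  | "connection_not_found"
  | "sampling_failed"
  | "no_sample_data"
  | "no_rows";

// Field names follow the PR's example response; the interface name is hypothetical.
interface RemovedPattern {
  pattern: string;
  estimatedCount: number;
  sampleCount: number;
}

// Hypothetical type for the 'denoised' block after the rename in this commit.
interface DenoisedMetadata {
  removedPatterns: RemovedPattern[];
  returnedRowCountBeforeDenoise: number; // row count after trimming, before filtering
  filteredRowCount: number; // assumed to keep its name from the example response
  skipped?: DenoiseSkipReason; // present only when denoising could not proceed
}

// Hypothetical helper: build the block for a skipped run, leaving rows untouched.
function skippedMetadata(
  reason: DenoiseSkipReason,
  rowCount: number,
): DenoisedMetadata {
  return {
    removedPatterns: [],
    returnedRowCountBeforeDenoise: rowCount,
    filteredRowCount: rowCount,
    skipped: reason,
  };
}
```

Modelling `skipped` as a string union would let a caller branch on the reason without parsing free-form text, which is one plausible motivation for always emitting the block.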
What
Adds a `denoise` boolean parameter to the MCP `hyperdx_search` tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" checkbox.

Why
When investigating issues via MCP, LLM agents get back raw search results that are often dominated by repetitive log noise. This forces agents to either sift through noisy results or make a separate `hyperdx_event_patterns` call and then manually filter. The `denoise` option makes this a single-call workflow.

How it works
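When `denoise=true`:
- Samples 10k random events from the same source/time range
- Mines patterns using the shared Drain algorithm (`common-utils`)
- Identifies "noisy" patterns (>10% of sampled events)
- Using `TemplateMiner`, matches each search result row against learned patterns
- Filters out rows matching noisy patterns
- Returns filtered rows plus a `denoised` metadata block listing removed patterns, original row count, and filtered row count

Graceful degradation: if source/connection/body column can't be resolved, or if pattern sampling fails, the original results are returned unmodified.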
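The matching and filtering steps might look roughly like the sketch below. It is not the PR's `denoiseSearchResults()` and does not use the real `TemplateMiner`: `matchesTemplate` is a simplified stand-in that treats `<*>` as a single-token wildcard, and `filterNoisyRows`, `Row`, and the `bodyKey` parameter are hypothetical names chosen for illustration.

```typescript
type Row = Record<string, unknown>;

// Simplified stand-in for template matching: "<*>" matches any single
// whitespace-delimited token; all other tokens must match exactly.
function matchesTemplate(body: string, template: string): boolean {
  const bodyTokens = body.trim().split(/\s+/);
  const templateTokens = template.trim().split(/\s+/);
  if (bodyTokens.length !== templateTokens.length) return false;
  return templateTokens.every((t, i) => t === "<*>" || t === bodyTokens[i]);
}

// Drop rows whose body matches any noisy template. Rows without a string
// body are kept, so a missing body column leaves the results unmodified.
function filterNoisyRows(
  rows: Row[],
  bodyKey: string,
  noisyTemplates: readonly string[],
): Row[] {
  return rows.filter(row => {
    const body = row[bodyKey];
    if (typeof body !== "string") return true;
    return !noisyTemplates.some(t => matchesTemplate(body, t));
  });
}
```

Keeping rows that cannot be matched errs on the side of returning too much rather than silently hiding events, which is in the same spirit as the graceful degradation described above.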
Changes

- `packages/api/src/mcp/tools/query/denoise.ts`: `denoiseSearchResults()` function
- `packages/api/src/mcp/tools/query/search.ts`: `denoise` schema param + post-processing logic
- `packages/api/src/mcp/tools/query/helpers.ts`: `resolveBodyExpression()` + `SAFE_BODY_EXPR_CHARS`
- `packages/api/src/mcp/tools/query/runEventPatterns.ts`

Example response (with denoise=true)
    {
      "result": { "data": [...filtered rows...] },
      "denoised": {
        "removedPatterns": [
          { "pattern": "GET /health <*> <*>", "estimatedCount": 45000, "sampleCount": 4500 }
        ],
        "originalRowCount": 50,
        "filteredRowCount": 12
      }
    }

Performance
Adds ~1-2s latency when `denoise=true` due to the pattern sampling queries. No impact when `denoise=false` (the default).

Closes HDX-4346