Add JsonIndexDistinctOperator for index-based SELECT DISTINCT on JSON columns #17820
Codecov Report
@@ Coverage Diff @@
## master #17820 +/- ##
============================================
- Coverage 63.22% 63.18% -0.05%
Complexity 1481 1481
============================================
Files 3190 3191 +1
Lines 192312 192551 +239
Branches 29475 29528 +53
============================================
+ Hits 121591 121657 +66
- Misses 61193 61365 +172
- Partials 9528 9529 +1
Pull request overview
This PR adds two new index-based distinct operators (JsonIndexDistinctOperator and InvertedIndexDistinctOperator) that avoid the scan-based projection pipeline for SELECT DISTINCT queries by reading values directly from JSON or inverted indexes. Both operators are disabled by default and opt-in via query options (useJsonIndexDistinct, useInvertedIndexDistinct, or the umbrella useIndexBasedDistinctOperator).
Changes:
- Two new operators (`JsonIndexDistinctOperator`, `InvertedIndexDistinctOperator`) and their integration into `DistinctPlanNode`'s operator selection logic, plus query option plumbing in `CommonConstants` and `QueryOptionsUtils`.
- Integration tests in `JsonPathTest` and `OfflineClusterIntegrationTest` validating correctness and index-only execution stats for both operators.
- A new `isPathIndexed` default method on the `JsonIndexReader` SPI interface, and an unrelated change to `MultiStageWithoutStatsIntegrationTest`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| CommonConstants.java | Adds three new query option keys for index-based distinct |
| QueryOptionsUtils.java | Adds parsing methods for the new query options with umbrella fallback |
| JsonIndexReader.java | Adds isPathIndexed default method for path indexing check |
| DistinctPlanNode.java | Integrates new operators into the plan selection logic |
| JsonIndexDistinctOperator.java | New operator using JSON index value→docId map for DISTINCT |
| InvertedIndexDistinctOperator.java | New operator using inverted index dictId→docIds for DISTINCT |
| JsonPathTest.java | Integration tests for JsonIndexDistinctOperator |
| OfflineClusterIntegrationTest.java | Integration tests for InvertedIndexDistinctOperator |
| MultiStageWithoutStatsIntegrationTest.java | Unrelated change replacing enum reference with string literal |
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
… columns

Introduce a new operator that resolves `SELECT DISTINCT jsonExtractIndex(...)` queries directly from the JSON index's value-to-docId map, bypassing the document scan and projection/transform pipeline entirely. This is opt-in via the query option `SET useIndexBasedDistinctOperator=true`.

Key changes:
- JsonIndexDistinctOperator reads distinct values from the JSON index with support for typed distinct tables (INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING), ORDER BY, LIMIT, filter pushdown, defaultValue, and null handling
- DistinctPlanNode routes to JsonIndexDistinctOperator when the query option is enabled and a single jsonExtractIndex expression has a backing JSON index
- JsonIndexReader.isPathIndexed() default method for path availability checks
- QueryOptionsUtils helpers and USE_INDEX_BASED_DISTINCT_OPERATOR constant
- Integration tests in JsonPathTest verifying correctness against baseline, filter support, defaultValue handling, and zero numEntriesScannedPostFilter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
```java
if (parsed._defaultValue != null) {
  addValueToDistinctTable(distinctTable, parsed._defaultValue, parsed._dataType, orderByExpression);
} else if (_queryContext.isNullHandlingEnabled()) {
  distinctTable.addNull();
} else {
  throw new RuntimeException(
      String.format("Illegal Json Path: [%s], for some docIds in segment [%s]",
          parsed._jsonPathString, _indexSegment.getSegmentName()));
}
```
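The three-way fallback in the fragment above can be read in isolation as the following sketch. It is not the operator's code: the class, the `Type` enum, and both method names are hypothetical, and it assumes the default value arrives as a string that is converted to the declared result type.

```java
import java.math.BigDecimal;

/** Hypothetical stand-in for the missing-path handling; not part of Pinot. */
class MissingPathFallbackSketch {
  /** The result types the PR lists as supported by the typed distinct table. */
  enum Type { INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, STRING }

  /** Converts a raw string (assumed representation) to the declared result type. */
  static Object convert(String raw, Type type) {
    switch (type) {
      case INT: return Integer.parseInt(raw);
      case LONG: return Long.parseLong(raw);
      case FLOAT: return Float.parseFloat(raw);
      case DOUBLE: return Double.parseDouble(raw);
      case BIG_DECIMAL: return new BigDecimal(raw);
      default: return raw;
    }
  }

  /**
   * Value to record for documents that lack the JSON path.
   * A null return means "add a SQL NULL to the distinct table".
   */
  static Object valueForMissingPath(String defaultValue, Type type,
      boolean nullHandlingEnabled, String jsonPath) {
    if (defaultValue != null) {
      return convert(defaultValue, type);   // 1. an explicit default wins
    }
    if (nullHandlingEnabled) {
      return null;                          // 2. otherwise NULL, if null handling is on
    }
    // 3. neither is available: the query fails, mirroring the fragment above
    throw new IllegalStateException("Illegal Json Path: [" + jsonPath + "]");
  }
}
```

The order matters: a caller-supplied default takes precedence over null handling, so a query that passes a default never produces NULL rows for absent paths.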
Summary
Add `JsonIndexDistinctOperator`, an index-only execution path for `SELECT DISTINCT jsonExtractIndex(...)` queries on columns with a JSON index. Instead of scanning documents through the projection/transform pipeline, the operator reads distinct values directly from the JSON index's value→docId map, avoiding per-doc evaluation entirely.

The operator is disabled by default and opt-in via a query option.
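As a rough illustration of the idea (not the operator's actual code), the sketch below walks a value→docId map and keeps every value whose doc set overlaps the filter. `java.util.BitSet` stands in for the bitmaps a real JSON index would return, and the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: BitSet replaces the index's real bitmap type. */
class IndexDistinctSketch {
  /**
   * Returns the distinct values that occur in at least one matching document.
   * A null filter is treated as "match all documents".
   */
  static SortedSet<String> distinctValues(Map<String, BitSet> valueToDocIds, BitSet filter) {
    SortedSet<String> result = new TreeSet<>();
    for (Map.Entry<String, BitSet> entry : valueToDocIds.entrySet()) {
      BitSet docIds = entry.getValue();
      // No per-document evaluation: one bitmap overlap test per distinct value.
      boolean matches = (filter == null) ? !docIds.isEmpty() : docIds.intersects(filter);
      if (matches) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  /** Convenience helper to build a BitSet from doc ids. */
  static BitSet bits(int... docIds) {
    BitSet bitSet = new BitSet();
    for (int docId : docIds) {
      bitSet.set(docId);
    }
    return bitSet;
  }

  public static void main(String[] args) {
    Map<String, BitSet> index = new LinkedHashMap<>();
    index.put("red", bits(0, 3));
    index.put("green", bits(1));
    index.put("blue", bits(2, 4));
    BitSet filter = bits(1, 2); // docs that passed the WHERE clause
    System.out.println(distinctValues(index, filter)); // values present in docs 1 and 2
  }
}
```

The work is proportional to the number of distinct values under the path rather than the number of matching documents, which is where the saving over the scan-based pipeline would come from.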
Key changes
- `JsonIndexDistinctOperator` (pinot-core): New operator that reads the JSON index value→docId map directly, intersects with the filter bitmap, and populates a typed `DistinctTable` (supports `INT`, `LONG`, `FLOAT`, `DOUBLE`, `BIG_DECIMAL`, `STRING`). Handles `defaultValue` semantics and `nullHandlingEnabled` for docs where the JSON path is absent.
- `DistinctPlanNode`: Routes to `JsonIndexDistinctOperator` when the query option is enabled and the expression is eligible.
- `QueryOptionsUtils` / `CommonConstants`: New query option `useIndexBasedDistinctOperator`.
- `JsonIndexReader.isPathIndexed()`: New default method so the operator can check whether a path is indexed (always `true` for OSS JSON index; selective for composite JSON index).
- Integration tests (`JsonPathTest`): Validates baseline vs optimized results match, with and without filters, for both SSE and MSE.

Usage
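Enable via query option by prefixing the query with `SET useIndexBasedDistinctOperator=true;`.
The same option can also be passed per-query via the REST API.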
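The original request examples did not survive on this page, so the following is only a sketch of what a client might send. It assumes the broker accepts a JSON body with an `sql` field and that the `SET` statement can be carried inside that string; the class and method names are hypothetical, and `myJsonCol` / `myTable` are the placeholder names used under Prerequisites.

```java
/** Hypothetical helper for building an opt-in query; not part of Pinot. */
class DistinctQueryOptionSketch {
  static final String OPTION = "useIndexBasedDistinctOperator";

  /** Prefixes the query with the SET statement named in the PR description. */
  static String withIndexDistinct(String sql) {
    return "SET " + OPTION + "=true; " + sql;
  }

  /**
   * Wraps the SQL in a JSON body with an "sql" field (assumed request shape).
   * Only backslashes and double quotes are escaped, which is enough for this sketch.
   */
  static String toJsonPayload(String sql) {
    String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
    return "{\"sql\": \"" + escaped + "\"}";
  }

  public static void main(String[] args) {
    String sql = withIndexDistinct(
        "SELECT DISTINCT jsonExtractIndex(myJsonCol, '$.path', 'STRING') FROM myTable");
    // The payload would then be POSTed to the broker's query endpoint.
    System.out.println(toJsonPayload(sql));
  }
}
```

Sending the request is left out on purpose: the endpoint URL and any authentication depend on the deployment.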
Prerequisites
```json
{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "fieldConfigList": [
    {
      "name": "myJsonCol",
      "encodingType": "RAW",
      "indexTypes": ["JSON"]
    }
  ]
}
```

Or via the legacy shorthand:

```json
{
  "tableIndexConfig": {
    "jsonIndexColumns": ["myJsonCol"]
  }
}
```

Supported query patterns
- `SELECT DISTINCT jsonExtractIndex(col, '$.path', 'STRING')`
- `SELECT DISTINCT jsonExtractIndex(col, '$.path', 'INT')`
- `SELECT DISTINCT jsonExtractIndex(col, '$.path', 'STRING', 'default')`
- `SELECT DISTINCT jsonExtractIndex(col, '$.path', 'STRING', 'default', '$.filter')`
- `WHERE` clause filters
- `ORDER BY`
- … `STRING_ARRAY`, etc.)
- … `SELECT DISTINCT`

Performance
For SSE queries, `numEntriesScannedPostFilter = 0`: the operator reads entirely from the index without scanning any documents.

Test plan
- … `numEntriesScannedPostFilter = 0` for SSE
- `JsonExtractIndexTransformFunctionTest` unit tests pass (31/31)
- … `SET enableNullHandling = true` and JSON path is absent

🤖 Generated with Claude Code