[server] Add server-side filter execution and client-side integration#2951
platinumhamburg wants to merge 6 commits into apache:main
Conversation
Force-pushed from 43fc542 to 3cd1f6d.
Force-pushed from 3cd1f6d to 9502649.
- Fix filter variable reset in Replica catch block to ensure graceful fallback on partial initialization failure
- Remove unused fullRowType field in LogFetcher
- Move RecordBatchFilterTest to fluss-server to use PredicateSchemaResolver
- Add Javadoc for LogTablet.read()
- Defer MultiBytesView.Builder allocation until first included batch
- Rate-limit filter evaluation warnings in LogSegment
- Fix Yoda condition style in LogFetcher
- Rename ambiguous identifiers (schemaId -> filterSchemaId, lastFilteredSkipOffset -> lastFilteredEndOffset, getSchemaId -> getTargetSchemaId)
- Extract sentinel -1 to NO_FILTERED_END_OFFSET constant
- Add null check for schema lookup in Replica filter init
- Remove redundant maxPosition guard in readWithFilter
- Narrow 7-param LogSegment.read() to package-private
- Add @NotThreadSafe on PredicateSchemaResolver
- Add debug log for filtered segment skip in LocalLog
- Strengthen test assertions to concrete values
- Remove redundant test, add Javadoc and proto comments
    projectedColumns,
    schemaGetter,
    recordBatchFilter);
Throw an exception in all the other createXxxScanner methods when recordBatchFilter is not null, because none of them supports the filter.
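The guard the reviewer asks for can be sketched as follows. The class and method names here are hypothetical stand-ins for Fluss's scanner factory methods, and `Object` stands in for the real filter and scanner types:

```java
// Hypothetical sketch: a scanner factory that does not support filter
// pushdown should reject a non-null recordBatchFilter up front.
final class ScannerFactorySketch {
    static Object createLogScanner(Object recordBatchFilter) {
        if (recordBatchFilter != null) {
            throw new UnsupportedOperationException(
                    "recordBatchFilter is not supported by this scanner type.");
        }
        return new Object(); // placeholder for the real scanner instance
    }
}
```

Failing fast here surfaces misuse at scanner creation time rather than producing silently unfiltered results.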
    default Scan filter(@Nullable Predicate predicate) {
        throw new UnsupportedOperationException("Filter pushdown is not supported by this Scan.");
We don't need a default implementation here, because this is not a user-defined interface. All implementations of this interface are provided by Fluss itself, so a plain interface without a default implementation is much cleaner.
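The suggested shape can be sketched like this, with simplified stand-in types (`ScanSketch` for `Scan`, `Object` for `Predicate`); the interface declares the method abstractly and each internal implementation decides how to behave:

```java
// Sketch of the reviewer's suggestion: no default body on the interface,
// since all implementations are internal to Fluss.
interface ScanSketch {
    ScanSketch filter(Object predicate); // abstract, no default implementation
}

// An implementation that does not support pushdown throws explicitly.
final class NonFilterableScan implements ScanSketch {
    @Override
    public ScanSketch filter(Object predicate) {
        throw new UnsupportedOperationException(
                "Filter pushdown is not supported by this Scan.");
    }
}
```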
    @ParameterizedTest
    @ValueSource(bytes = {LOG_MAGIC_VALUE_V0, LOG_MAGIC_VALUE_V1})
    void testProjectRecordBatchNoStatisticsClearing(byte recordBatchMagic) throws Exception {
        // Test that statistics clearing only happens for V2+ versions
Since we have changed the stats to V1, we should update the tests to reflect that. IIUC, this should test only V0 and we may also need to update other tests.
    MultiBytesView.Builder builder = null;
    int accumulatedSize = 0;
    FileChannelLogRecordBatch firstIncludedBatch = null;
    FileChannelLogRecordBatch lastIncludedBatch = null;
lastIncludedBatch is not used; remove it.
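The `builder = null` in the snippet above reflects the deferred-allocation pattern from the change log ("Defer MultiBytesView.Builder allocation until first included batch"). A minimal sketch of the same idea, using `List<byte[]>` as a stand-in for `MultiBytesView.Builder` and a plain predicate as the filter:

```java
// Illustrative sketch only: allocate the builder lazily so that a read
// where every batch is filtered out allocates nothing at all.
final class LazyBuilderSketch {
    // Returns the included batches, or null when no batch matched.
    static java.util.List<byte[]> collectIncluded(
            java.util.List<byte[]> batches,
            java.util.function.Predicate<byte[]> filter) {
        java.util.List<byte[]> builder = null; // deferred allocation
        for (byte[] batch : batches) {
            if (!filter.test(batch)) {
                continue; // filtered out, no allocation happens
            }
            if (builder == null) {
                builder = new java.util.ArrayList<>(); // first included batch
            }
            builder.add(batch);
        }
        return builder;
    }
}
```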
        accumulatedSize += batchSize;
    } else {
        // With projection: project first, then check size with projected size
        BytesView projectedBytesView = projection.projectRecordBatch(batch);
Throw an exception if logFormat != LogFormat.ARROW, as we do in readWithoutFilter.
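A minimal sketch of that guard, with a local stand-in enum instead of Fluss's real `LogFormat` (names are assumptions, not the actual API):

```java
// Hypothetical sketch: reject projection for non-Arrow log formats,
// mirroring the check the reviewer says exists in readWithoutFilter.
final class ProjectionGuardSketch {
    enum LogFormat { ARROW, INDEXED } // stand-in for the real enum

    static void checkProjectable(LogFormat logFormat) {
        if (logFormat != LogFormat.ARROW) {
            throw new IllegalArgumentException(
                    "Record batch projection is only supported for the ARROW "
                            + "log format, but got: " + logFormat);
        }
    }
}
```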
    if (result == null) {
        result = new HashMap<>();
    }
    int schemaId = tableReq.hasFilterSchemaId() ? tableReq.getFilterSchemaId() : -1;
Is -1 a valid value here? If not, we should fail fast by throwing an exception.
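The fail-fast variant could look like the following sketch (method and parameter names are hypothetical; it models `hasFilterSchemaId()`/`getFilterSchemaId()` as plain arguments):

```java
// Hypothetical sketch: validate the request's filter schema id instead of
// letting a -1 sentinel propagate into the read path.
final class FilterSchemaIdGuardSketch {
    static int requireFilterSchemaId(boolean hasFilterSchemaId, int filterSchemaId) {
        if (!hasFilterSchemaId || filterSchemaId < 0) {
            throw new IllegalArgumentException(
                    "filterSchemaId must be present and non-negative");
        }
        return filterSchemaId;
    }
}
```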
    if (respForBucket.hasFilteredEndOffset()
            && respForBucket.getFilteredEndOffset() >= 0) {
        fetchLogResultForBucket =
                new FetchLogResultForBucket(
                        tb,
When filteredEndOffset is set, any records in the response are currently silently ignored. Although records are typically unset when filteredEndOffset is present, a specific edge case leads to inefficiency: if the first batch in a log file matches the filter but subsequent batches do not, the system returns only the next offset of the matched batch without including the FilteredEndOffset. Consequently, these already-filtered batches must be re-evaluated in the next request, resulting in unnecessary resource consumption and performance degradation.
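One way to address the edge case described above is to advance the next fetch position to whichever is larger: the offset implied by the returned records, or the filtered-end offset when set. This is only a sketch of the idea, not the Fluss implementation; `-1` stands for an unset filtered-end offset, matching the `NO_FILTERED_END_OFFSET` sentinel mentioned in the change log:

```java
// Illustrative sketch: combine the record-based next offset with the
// filtered-end offset so already-filtered batches are never re-fetched.
final class FetchOffsetSketch {
    static final long NO_FILTERED_END_OFFSET = -1L;

    static long nextFetchOffset(long nextOffsetFromRecords, long filteredEndOffset) {
        // When filteredEndOffset is unset (-1), max() simply keeps the
        // record-based offset; when set, it skips past filtered batches.
        return Math.max(nextOffsetFromRecords, filteredEndOffset);
    }
}
```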
    }

    @Override
    public Long[] getNullCounts() {
I think we can optimize getNullCounts() to return a primitive int[]. An int count is enough, since each field's null count only needs 4 bytes; using primitive int rather than boxed Integer avoids boxing (roughly double the size), and -1 can still serve as the "not present" sentinel. This can greatly reduce the memory footprint of cachedNullCounts and statsNullCounts, especially cachedNullCounts when the total field count is very large (1000+).
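A sketch of the suggested layout (class and method names are hypothetical stand-ins for the cached null-count holder):

```java
// Hypothetical sketch: store null counts as a primitive int[] with -1 as
// the "no statistics for this field" sentinel, instead of boxed Long[].
final class NullCountsSketch {
    private final int[] nullCounts;

    NullCountsSketch(int fieldCount) {
        nullCounts = new int[fieldCount];
        java.util.Arrays.fill(nullCounts, -1); // -1 = not present
    }

    void set(int fieldIndex, int count) {
        nullCounts[fieldIndex] = count;
    }

    int[] getNullCounts() {
        return nullCounts;
    }
}
```

With 1000+ fields, an `int[]` costs roughly 4 bytes per field plus one array header, versus a `Long[]` of object references plus a boxed `Long` per populated slot.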
     * FileChannelLogRecordBatch and DefaultLogRecordBatch to read null counts from Arrow metadata
     * instead of the statistics binary format (V2).
     */
    class ArrowNullCountReader {
Could you revert the commit that reads null counts from Arrow metadata? This pull request is already very large, and I think we can review that change in a separate pull request. Besides, I have some concerns about the cache and deserialization performance.
    static final class FieldNodeMappingKey {
        private final int schemaId;
        private final int[] statsIndexMapping;
The cache key lacks tableId, so it has collision issues when there are multiple tables.
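A sketch of the fix, adding `tableId` to the key alongside the existing fields. The field names mirror the snippet above, but the class shown here is a simplified stand-in, not the actual Fluss code:

```java
// Hypothetical sketch: include tableId in the cache key so mappings from
// different tables with the same schemaId cannot collide.
final class FieldNodeMappingKeySketch {
    private final long tableId;
    private final int schemaId;
    private final int[] statsIndexMapping;

    FieldNodeMappingKeySketch(long tableId, int schemaId, int[] statsIndexMapping) {
        this.tableId = tableId;
        this.schemaId = schemaId;
        this.statsIndexMapping = statsIndexMapping;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FieldNodeMappingKeySketch)) {
            return false;
        }
        FieldNodeMappingKeySketch that = (FieldNodeMappingKeySketch) o;
        return tableId == that.tableId
                && schemaId == that.schemaId
                && java.util.Arrays.equals(statsIndexMapping, that.statsIndexMapping);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode, not Object identity, so equal mappings hash equally.
        return java.util.Objects.hash(tableId, schemaId) * 31
                + java.util.Arrays.hashCode(statsIndexMapping);
    }
}
```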
This reverts commit 63db1b5.
Purpose
Linked issue: close #2950
Brief change log
Tests
API and Format
Documentation