Skip to content

Skip map_expressions rebuild for Extension nodes with empty expressions#21701

Open
zhuqi-lucas wants to merge 1 commit intoapache:mainfrom
zhuqi-lucas:skip-map-expressions-empty-extension
Open

Skip map_expressions rebuild for Extension nodes with empty expressions#21701
zhuqi-lucas wants to merge 1 commit intoapache:mainfrom
zhuqi-lucas:skip-map-expressions-empty-extension

Conversation

@zhuqi-lucas
Copy link
Copy Markdown
Contributor

@zhuqi-lucas zhuqi-lucas commented Apr 17, 2026

Which issue does this PR close?

Closes #21700.

Rationale for this change

When an Extension node has no expressions, map_expressions was still cloning all inputs and calling with_exprs_and_inputs to reconstruct the node — wasted work since there are no expressions to transform. This is common for Extension nodes like view matching candidates that carry multiple children but no expressions.

What changes are included in this PR?

  1. Code change (datafusion/expr/src/logical_plan/tree_node.rs): Add early return when node.expressions() is empty, skipping the clone + rebuild path.

  2. Micro-benchmark (datafusion/expr/benches/map_expressions.rs): Criterion benchmark comparing map_expressions on Extension nodes with and without expressions, varying the number of children (1, 3, 5, 10).

Are these changes tested?

Yes — existing tests pass, and the new benchmark validates the optimization.

Benchmark results:

Children no_expr (optimized) with_expr (rebuild) Speedup
1 24 ns 167 ns 7x
3 23 ns 192 ns 8x
5 23 ns 181 ns 8x
10 24 ns 216 ns 9x

The no_expr path is constant time regardless of children count.

In a real optimizer pipeline (~15 rules × 5-child Extension), this saves ~2.4 us per optimization pass.

Next step

A more general optimization would be to change UserDefinedLogicalNode::expressions() to return references (&[Expr]) instead of cloned Vec<Expr>, and only clone + rebuild when the transform actually modifies an expression. This would avoid the clone + with_exprs_and_inputs rebuild even for non-empty expression lists when the transform is a no-op. Added a TODO comment in the code for this direction. This would be a larger API change, so the empty-expressions shortcut is a pragmatic first step.

Are there any user-facing changes?

No — purely internal optimization. No API changes.

Copilot AI review requested due to automatic review settings April 17, 2026 15:18
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Apr 17, 2026
@zhuqi-lucas zhuqi-lucas force-pushed the skip-map-expressions-empty-extension branch from 8157855 to 90d4f6e Compare April 17, 2026 15:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes LogicalPlan::map_expressions for Extension nodes that expose no expressions, avoiding unnecessary cloning/rebuild work, and adds a Criterion micro-benchmark to quantify the improvement.

Changes:

  • Add an early return in map_expressions for Extension nodes when node.expressions() is empty.
  • Add a Criterion benchmark measuring map_expressions behavior for Extension nodes with/without expressions across varying child counts.
  • Register the new benchmark in datafusion-expr’s dev-dependencies and bench configuration.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

File Description
datafusion/expr/src/logical_plan/tree_node.rs Skip rebuilding Extension nodes in map_expressions when there are no expressions to transform.
datafusion/expr/benches/map_expressions.rs Add micro-benchmark for map_expressions on Extension nodes with/without expressions.
datafusion/expr/Cargo.toml Add Criterion dev-dependency and register the new bench target.
Cargo.lock Lockfile updates due to added Criterion dependency/versioning.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/expr/Cargo.toml Outdated
Comment thread datafusion/expr/benches/map_expressions.rs Outdated
Comment thread datafusion/expr/benches/map_expressions.rs Outdated
Comment thread datafusion/expr/benches/map_expressions.rs
When an Extension node has no expressions, `map_expressions` was still
cloning all inputs and calling `with_exprs_and_inputs` to reconstruct
the node — wasted work since there are no expressions to transform.

This adds an early return that skips the expensive clone + rebuild when
`node.expressions()` is empty, which is common for Extension nodes like
view matching candidates (OneOf) that carry multiple children but no
expressions.

Benchmark results (criterion, `datafusion-expr` micro-benchmark):

| Children | no_expr (optimized) | with_expr (rebuild) | Speedup |
|----------|--------------------|--------------------|---------|
| 1        | 24 ns              | 167 ns             | 7x      |
| 3        | 23 ns              | 192 ns             | 8x      |
| 5        | 23 ns              | 181 ns             | 8x      |
| 10       | 24 ns              | 216 ns             | 9x      |

The `no_expr` path is constant time regardless of children count,
while `with_expr` scales with the number of children (clone cost).

In a real optimizer pipeline with ~15 rules traversing a plan with
OneOf(5 candidates), this saves ~2.4 us per optimization pass.
@zhuqi-lucas zhuqi-lucas force-pushed the skip-map-expressions-empty-extension branch from 90d4f6e to f24a39a Compare April 17, 2026 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skip map_expressions rebuild for Extension nodes with empty expressions

2 participants