Add lambda support and array_transform udf #21679
Conversation
```rust
fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>> {
    self.args.iter().collect()
}
```
The function itself should be included in the children, otherwise you can't access part of the expression tree, as discussed here:
Please also add a comment explaining why it is important.
I believe this is about the lambda functions, right? All lambda functions of a given higher-order function are stored in `self.args`.
If this is about the higher-order function itself, it shouldn't be included, in the same way a scalar function doesn't include itself in its children, right?
I see that in the other PR you reviewed, #17220, the lambda functions aren't stored in the higher-order function and are instead resolved in the function implementation. Should we do this here too?
Finally, physical expressions of a concrete higher-order function (not the case here), like the `array_exists` being done in comet datafusion-comet#3611, do store the arg and the lambda function in different properties [1], and thus its `children` method requires what I believe you are asking for [2].
Again, if we should do this here too, please let me know, thanks.
Oh, I missed that `func` is `Arc<dyn HigherOrderUDF>`
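To illustrate the comment requested above, here is a minimal sketch of `children` with the explanatory comment, using toy stand-in types rather than DataFusion's actual `PhysicalExpr` machinery:

```rust
use std::sync::Arc;

// Toy stand-ins for DataFusion's types, for illustration only.
trait PhysicalExpr {
    fn name(&self) -> &str;
}

struct Literal(String);
impl PhysicalExpr for Literal {
    fn name(&self) -> &str {
        &self.0
    }
}

struct HigherOrderFunctionExpr {
    args: Vec<Arc<dyn PhysicalExpr>>,
}

impl HigherOrderFunctionExpr {
    // Lambda bodies are stored in `args` alongside the value arguments,
    // so returning every arg exposes the full expression tree to
    // optimizer rewrites; the function itself is a UDF, not a
    // PhysicalExpr, and is therefore not a child.
    fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>> {
        self.args.iter().collect()
    }
}

fn main() {
    let args: Vec<Arc<dyn PhysicalExpr>> = vec![
        Arc::new(Literal("array".into())),
        Arc::new(Literal("lambda".into())),
    ];
    let expr = HigherOrderFunctionExpr { args };
    assert_eq!(expr.children().len(), 2);
    assert_eq!(expr.children()[0].name(), "array");
}
```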
```rust
Expr::Lambda(Lambda { params, body }) => {
    if body.any_column_refs() {
        return plan_err!("lambda doesn't support column capture");
    }
    // ...
}
```
Please add a link to the issue that talks about column capture in lambda support.
```rust
// LambdaVariable.field will be made optional as in Expr::Placeholder
// and only LambdaVariable.name used, and field.name ignored,
// so they're not enforced to match for logical expressions
if field.data_type() != schema_field.data_type()
    || field.is_nullable() != schema_field.is_nullable()
    || field.metadata() != schema_field.metadata()
    || field.dict_is_ordered() != schema_field.dict_is_ordered()
```
this can be a source of bugs when adding new properties to `Field`
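One way to make the comparison robust, sketched here with a toy `Field` (the real type is arrow's `Field`, which offers `with_name` for the same trick): rename a clone and compare the whole values, so any property added later is covered automatically by `PartialEq` instead of needing another `||` clause.

```rust
// Toy `Field`, for illustration; the real type is arrow's `Field`.
#[derive(Clone, PartialEq, Debug)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    // Any property added here is automatically covered by `PartialEq`.
}

// Instead of comparing each property one by one (easy to forget when a
// new property is added), rename a clone and compare whole values.
fn equal_ignoring_name(field: &Field, schema_field: &Field) -> bool {
    let mut renamed = field.clone();
    renamed.name = schema_field.name.clone();
    renamed == *schema_field
}

fn main() {
    let a = Field { name: "v".into(), data_type: "Int64".into(), nullable: true };
    let b = Field { name: "x".into(), data_type: "Int64".into(), nullable: true };
    assert!(equal_ignoring_name(&a, &b));

    let c = Field { nullable: false, ..b.clone() };
    assert!(!equal_ignoring_name(&a, &c));
}
```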
Given the PR size, I asked Claude to make an initial review; here are some of the findings:
…erOrderFunctionExpr::try_new
```rust
fn scalar_functions(&self) -> &HashMap<String, Arc<ScalarUDF>>;

/// Return reference to higher_order_functions
fn higher_order_functions(&self) -> &HashMap<String, Arc<dyn HigherOrderUDF>>;
```
I wonder whether it would be a good idea to return an empty HashMap by default; that would prevent some broken builds for third-party implementations.
Not against it, but #20312 also added a breaking method. If both PRs get released together, then I don't think this makes much of a difference, WDYT?
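For reference, a default body along the lines suggested could be sketched like this (toy trait names, not the PR's actual code), using a `OnceLock`-backed empty map so existing implementors keep compiling:

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock};

trait HigherOrderUDF: Send + Sync {}

trait FunctionRegistry {
    // A default body means existing third-party implementors keep
    // compiling; registries that do support higher-order functions
    // override it with their own map.
    fn higher_order_functions(&self) -> &HashMap<String, Arc<dyn HigherOrderUDF>> {
        static EMPTY: OnceLock<HashMap<String, Arc<dyn HigherOrderUDF>>> = OnceLock::new();
        EMPTY.get_or_init(HashMap::new)
    }
}

// A pre-existing registry that does not override the new method.
struct LegacyRegistry;
impl FunctionRegistry for LegacyRegistry {}

fn main() {
    let registry = LegacyRegistry;
    assert!(registry.higher_order_functions().is_empty());
}
```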
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…e projection/optimization Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…unparser Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
```rust
fun: Arc<dyn HigherOrderUDF>,
name: String,
args: Vec<Arc<dyn PhysicalExpr>>,
lambda_positions: Vec<usize>,
return_field: FieldRef,
```
Please add comments explaining what each property is, and give an example using `array_transform`.
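A sketch of the kind of documentation being asked for, using `array_transform(a, v -> v + 1)` as the running example; the types here are toy stand-ins, not DataFusion's:

```rust
use std::sync::Arc;

// Toy stand-ins, for illustration only.
#[derive(Debug, PartialEq)]
struct Field(String);
type FieldRef = Arc<Field>;
trait HigherOrderUDF {}
trait PhysicalExpr {}

struct ArrayTransformUdf;
impl HigherOrderUDF for ArrayTransformUdf {}
struct ColExpr(&'static str);
impl PhysicalExpr for ColExpr {}

struct HigherOrderFunctionExpr {
    /// The function implementation, e.g. the `array_transform` UDF.
    fun: Arc<dyn HigherOrderUDF>,
    /// Display name, e.g. `"array_transform(a, v -> v + 1)"`.
    name: String,
    /// All arguments, lambdas included: here the column `a` at
    /// position 0 and the lambda `v -> v + 1` at position 1.
    args: Vec<Arc<dyn PhysicalExpr>>,
    /// Indexes into `args` that hold lambdas: `[1]` in the example.
    lambda_positions: Vec<usize>,
    /// The field this expression evaluates to.
    return_field: FieldRef,
}

fn main() {
    let expr = HigherOrderFunctionExpr {
        fun: Arc::new(ArrayTransformUdf),
        name: "array_transform(a, v -> v + 1)".into(),
        args: vec![Arc::new(ColExpr("a")), Arc::new(ColExpr("v + 1"))],
        lambda_positions: vec![1],
        return_field: Arc::new(Field("List(Int64)".into())),
    };
    assert_eq!(expr.lambda_positions, vec![1]);
    assert_eq!(expr.args.len(), 2);
    assert_eq!(*expr.return_field, Field("List(Int64)".into()));
}
```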
```rust
/// [PhysicalExpr::evaluate] will not be called. The lambda *body* should be wrapped instead.
/// If any arg referenced by `lambda_positions` does not contain a lambda or contains a wrapper
/// with multiple children before finding the lambda, the function evaluation will error.
pub fn new(
```
I would rename this to `try_new` and return a `Result`, and rename the current `try_new` to something else,
so we could add validation later without breaking the API, like verification of the lambda positions.
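The shape being suggested might look like the following sketch (toy types; the bounds check on `lambda_positions` stands in for whatever validation gets added later):

```rust
use std::sync::Arc;

// Toy stand-ins, for illustration only.
trait PhysicalExpr {}
struct Col;
impl PhysicalExpr for Col {}

struct HigherOrderFunctionExpr {
    args: Vec<Arc<dyn PhysicalExpr>>,
    lambda_positions: Vec<usize>,
}

impl HigherOrderFunctionExpr {
    // Fallible constructor: validation (here just a bounds check on the
    // lambda positions) can grow later without breaking the API.
    fn try_new(
        args: Vec<Arc<dyn PhysicalExpr>>,
        lambda_positions: Vec<usize>,
    ) -> Result<Self, String> {
        if let Some(&bad) = lambda_positions.iter().find(|&&p| p >= args.len()) {
            return Err(format!(
                "lambda position {bad} out of bounds for {} args",
                args.len()
            ));
        }
        Ok(Self { args, lambda_positions })
    }
}

fn main() {
    let args: Vec<Arc<dyn PhysicalExpr>> = vec![Arc::new(Col), Arc::new(Col)];
    assert!(HigherOrderFunctionExpr::try_new(args, vec![1]).is_ok());
    assert!(HigherOrderFunctionExpr::try_new(vec![Arc::new(Col)], vec![1]).is_err());
}
```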
```rust
fn data_type(&self, _input_schema: &Schema) -> Result<DataType> {
    Ok(self.return_field.data_type().clone())
}

fn nullable(&self, _input_schema: &Schema) -> Result<bool> {
    Ok(self.return_field.is_nullable())
}
```
Because users can create `HigherOrderFunctionExpr` with a `return_field` using `new`, which is not marked as `unsafe`, this could lead to a datatype mismatch. Can you please validate that the type matches the function's return type? Same for `nullable`.
```rust
fn return_field(&self, _input_schema: &Schema) -> Result<FieldRef> {
    Ok(Arc::clone(&self.return_field))
}
```
If you have `return_field`, you don't need `data_type` and `nullable`. Also, please add the validation that the function outputs the same data type as what you return here.
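The cross-check being requested could be sketched like this (toy `Field` and trait, not the real arrow/DataFusion types): compare the caller-supplied return field against the one the UDF declares, so the `data_type`/`nullable` values derived from `return_field` can't silently disagree with what the function produces.

```rust
// Toy types, for illustration only.
#[derive(Clone, PartialEq, Debug)]
struct Field {
    data_type: String,
    nullable: bool,
}

trait HigherOrderUDF {
    fn return_field(&self) -> Field;
}

struct ArrayTransform;
impl HigherOrderUDF for ArrayTransform {
    fn return_field(&self) -> Field {
        Field { data_type: "List(Int64)".into(), nullable: true }
    }
}

// Validate the caller-supplied field against the UDF's declared one.
fn validate_return_field(fun: &dyn HigherOrderUDF, provided: &Field) -> Result<(), String> {
    let expected = fun.return_field();
    if *provided != expected {
        return Err(format!(
            "return field mismatch: got {provided:?}, expected {expected:?}"
        ));
    }
    Ok(())
}

fn main() {
    let good = Field { data_type: "List(Int64)".into(), nullable: true };
    assert!(validate_return_field(&ArrayTransform, &good).is_ok());

    let bad = Field { data_type: "Int64".into(), nullable: true };
    assert!(validate_return_field(&ArrayTransform, &bad).is_err());
}
```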
```rust
&self.name,
Arc::clone(&self.fun),
children,
self.lambda_positions.clone(),
```
I think we should verify that the lambda positions are still valid, no?
run benchmark tpcds tpch
🤖 Benchmark running (GKE): comparing lambda_and_array_transform (1b668db) to dc6142e (merge-base) using tpch
🤖 Benchmark running (GKE): comparing lambda_and_array_transform (1b668db) to dc6142e (merge-base) using tpcds
🤖 Benchmark completed (GKE). Resource usage: tpcds — base (merge-base); tpcds — branch
This is a clean version of #18921, to make it easier to review.
This is a breaking change due to adding variants to the `Expr` enum, new methods on the traits `Session`, `FunctionRegistry` and `ContextProvider`, and a new arg on `TaskContext::new`.

This PR adds support for lambdas with column capture and the `array_transform` function, used to test the lambda implementation.

Example usage:
Note: column capture has been removed for now and will be added in a follow-up PR, see #21172
Some comments on code snippets in this doc show what value each struct, variant or field would hold after planning the first example above. Some literals are simplified pseudocode.
3 new `Expr` variants are added: `HigherOrderFunction`, owning a new trait `HigherOrderUDF`, which is like a `ScalarFunction`/`ScalarUDFImpl` with support for lambdas; `Lambda`, for the lambda body and its parameter names; and `LambdaVariable`, which is like `Column` but for lambda parameters.

Their logical representations:
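The collapsed definitions aren't reproduced in this scrape; a rough sketch of what the three variants could look like, with field types simplified (the real `LambdaVariable` also carries a `Field`, per the discussion below):

```rust
use std::sync::Arc;

trait HigherOrderUDF {
    fn name(&self) -> &str;
}

// Simplified sketch; the real variants live in DataFusion's `Expr` enum
// and use its own argument and field types.
enum Expr {
    Literal(i64),
    HigherOrderFunction { func: Arc<dyn HigherOrderUDF>, args: Vec<Expr> },
    Lambda { params: Vec<String>, body: Box<Expr> },
    LambdaVariable { name: String },
}

struct ArrayTransform;
impl HigherOrderUDF for ArrayTransform {
    fn name(&self) -> &str {
        "array_transform"
    }
}

fn main() {
    // Roughly: array_transform([1], v -> v)
    let expr = Expr::HigherOrderFunction {
        func: Arc::new(ArrayTransform),
        args: vec![
            Expr::Literal(1),
            Expr::Lambda {
                params: vec!["v".into()],
                body: Box::new(Expr::LambdaVariable { name: "v".into() }),
            },
        ],
    };
    if let Expr::HigherOrderFunction { func, args } = &expr {
        assert_eq!(func.name(), "array_transform");
        assert_eq!(args.len(), 2);
    } else {
        panic!("expected HigherOrderFunction");
    }
}
```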
The example would be planned into a tree like this:
The physical counterparts definition:
Note: for those who primarily want to check whether this lambda implementation supports their use case and don't want to spend much time here, it's okay to skip most collapsed blocks, as those serve mostly to help code reviewers, with the exception of `HigherOrderUDF` and the `array_transform` implementation of `HigherOrderUDF`'s relevant methods, collapsed due to their size.

The added `HigherOrderUDF` trait is almost a clone of `ScalarUDFImpl`, with the exception of:

- `return_field_from_args` and `invoke_with_args`, where now `args.args` is a list of enums with two variants, `Value` or `Lambda`, instead of a list of values
- `lambda_parameters`, which returns a `Field` for each parameter supported for every lambda argument, based on the `Field`s of the non-lambda arguments
- the removal of `return_field` and of the deprecated `is_nullable` and `display_name`.

HigherOrderUDF
array_transform lambda_parameters implementation
array_transform return_field_from_args implementation
array_transform invoke_with_args implementation
How relevant HigherOrderUDF methods would be called and what they would return during planning and evaluation of the example
A pair `HigherOrderUDF`/`HigherOrderUDFImpl` like `ScalarFunction`'s was not used, because those exist only to maintain backwards compatibility with the older API, #8045.
Why `LambdaVariable` and not `Column`:

Existing tree traversals that operate on columns would break if some column nodes referenced a lambda parameter and not a real column. In the example query, projection pushdown would try to push the lambda parameter "v", which doesn't exist in table "t".
Example code of another traversal that would break:
Furthermore, the implementation of `ExprSchemable` and `PhysicalExpr::return_field` for `Column` expects that the schema it receives as an argument contains an entry for its name, which is not the case for lambda parameters.

By including a `FieldRef` on `LambdaVariable` that should be resolved at construction time in the SQL planner, `ExprSchemable` and `PhysicalExpr::return_field` simply return its own `Field`:

LambdaVariable ExprSchemable and PhysicalExpr::return_field implementation
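The collapsed implementation isn't shown in this scrape; the idea reduces to a sketch like this (toy `Field`/`FieldRef` types, not arrow's):

```rust
use std::sync::Arc;

// Toy Field/FieldRef, for illustration only.
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}
type FieldRef = Arc<Field>;

struct LambdaVariable {
    name: String,
    field: FieldRef,
}

impl LambdaVariable {
    // No schema lookup needed: the field was resolved at construction
    // time by the planner, so we just hand back our own copy.
    fn return_field(&self) -> FieldRef {
        Arc::clone(&self.field)
    }
}

fn main() {
    let v = LambdaVariable {
        name: "v".into(),
        field: Arc::new(Field { name: "v".into(), data_type: "Int64".into() }),
    };
    assert_eq!(v.name, "v");
    assert_eq!(v.return_field().data_type, "Int64");
}
```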
Possible alternatives, discarded due to complexity, required downstream changes, and implementation size:
How `minimize_join_filter` would look:
For any given `HigherOrderFunction` found during the traversal, a new schema is created for each lambda argument that contains its parameters, returned from `HigherOrderUDF::lambda_parameters`.
How it would look: