Improve RunLengthBitPackingHybridDecoder.readNext to avoid per-call buffer allocation and DataInputStream wrapping #3466
Description
Describe the enhancement requested
RunLengthBitPackingHybridDecoder.readNext() allocates a new int[] and a new byte[] on every PACKED-mode call. In workloads that decode many bit-packed runs (definition levels, repetition levels, RLE-encoded integers), these allocations dominate the read-side allocation profile. The upstream code acknowledges this with a "// TODO: reuse a buffer" comment.
Problem 1: per-call buffer allocation
Lines 94–95 allocate fresh arrays on every PACKED-mode call to readNext():
currentBuffer = new int[currentCount]; // TODO: reuse a buffer
byte[] bytes = new byte[numGroups * bitWidth];

currentCount is always numGroups * 8, and numGroups is typically small (1–16 groups, i.e. 8–128 values per run). These allocations are individually modest but occur thousands of times per column chunk, once per bit-packed run. In a 180M-row merge with multiple integer/boolean columns, the cumulative allocation is substantial.
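To make the per-run sizes concrete, here is a small sketch of the arithmetic. The class and method names are hypothetical; it only encodes the two relationships stated above (currentCount = numGroups * 8, and a byte buffer of numGroups * bitWidth bytes) and uses bitWidth = 1 as an example value.

```java
// Hypothetical helper: per-run buffer sizes for one PACKED run.
// Encodes currentCount = numGroups * 8 and bytes = numGroups * bitWidth.
public final class PackedRunSizes {

    /** Number of decoded int values in a PACKED run with the given group count. */
    public static int valueCount(int numGroups) {
        return numGroups * 8;
    }

    /** Bytes holding numGroups groups of 8 values at bitWidth bits each. */
    public static int byteCount(int numGroups, int bitWidth) {
        // 8 values * bitWidth bits = bitWidth bytes per group, so no rounding is needed.
        return numGroups * bitWidth;
    }

    public static void main(String[] args) {
        int bitWidth = 1; // example: definition levels of a flat optional column
        for (int numGroups : new int[] {1, 4, 16}) {
            System.out.printf("numGroups=%d -> int[%d] + byte[%d]%n",
                numGroups, valueCount(numGroups), byteCount(numGroups, bitWidth));
        }
    }
}
```

Each of these pairs is what the current code allocates per run, so the totals scale with the number of runs rather than the number of rows.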
Because currentCount varies between runs (different numGroups values), the fix keeps the existing field-level int[], adds a field-level byte[], and grows each only when the next run needs a larger buffer.
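A minimal sketch of grow-only reuse, assuming hypothetical names (PackedRunBuffers, valuesFor, bytesFor) rather than the decoder's actual fields. Two details matter once buffers are reused. First, the int[] may be longer than the current run, so reads must be bounded by the run's own value count rather than by the array length. Second, if the read is shorter than the run's full byte count, the unread tail of a reused byte[] would still hold bytes from an earlier run; the sketch zeroes that tail so it matches what a freshly allocated array would contain.

```java
import java.util.Arrays;

// Hypothetical sketch of grow-only scratch buffers for PACKED runs.
// Names are illustrative; they are not the actual parquet-java fields.
public final class PackedRunBuffers {
    private int[] currentBuffer = new int[0];
    private byte[] packedBytes = new byte[0];

    /**
     * Returns an int[] with room for at least valueCount values, reallocating only
     * when the current array is too small. The array may be longer than valueCount,
     * so callers must bound reads by valueCount, not by the array length.
     */
    public int[] valuesFor(int valueCount) {
        if (currentBuffer.length < valueCount) {
            currentBuffer = new int[valueCount];
        }
        return currentBuffer;
    }

    /**
     * Returns a byte[] with room for at least byteCount bytes. When an existing array
     * is reused, bytes in [bytesToRead, byteCount) are zeroed so a short read leaves
     * the same zero padding a freshly allocated array would have.
     */
    public byte[] bytesFor(int byteCount, int bytesToRead) {
        if (packedBytes.length < byteCount) {
            packedBytes = new byte[byteCount]; // new arrays are already zero-filled
        } else if (bytesToRead < byteCount) {
            Arrays.fill(packedBytes, bytesToRead, byteCount, (byte) 0);
        }
        return packedBytes;
    }

    public static void main(String[] args) {
        PackedRunBuffers buffers = new PackedRunBuffers();
        int[] first = buffers.valuesFor(128);  // allocates
        int[] second = buffers.valuesFor(64);  // reuses the same array
        int[] third = buffers.valuesFor(256);  // grows
        System.out.println((first == second) + " " + (second == third)); // prints "true false"
    }
}
```

After a few runs the buffers settle at the largest run size seen, so steady-state decoding allocates nothing per run.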
Problem 2: per-call DataInputStream wrapping
Line 98 creates a new DataInputStream(in) on every PACKED-mode call:
new DataInputStream(in).readFully(bytes, 0, bytesToRead);

This allocates a DataInputStream wrapper object per call just to access readFully(). A private readFully() method on the decoder itself eliminates this allocation and the virtual dispatch through the wrapper.
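A sketch of such a decoder-local readFully(), written against a plain InputStream field. The class name ReadFullySketch and the readInto helper are hypothetical scaffolding; the loop follows DataInputStream.readFully semantics: keep calling read until len bytes have arrived, and throw EOFException if the stream ends first.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: a decoder-local readFully() replacing
// new DataInputStream(in).readFully(bytes, 0, bytesToRead).
public final class ReadFullySketch {
    private final InputStream in;

    public ReadFullySketch(InputStream in) {
        this.in = in;
    }

    /**
     * Reads exactly len bytes into b starting at off. InputStream.read may return
     * fewer bytes than requested, so loop until done; throw EOFException if the
     * stream ends first, as DataInputStream.readFully does.
     */
    private void readFully(byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                throw new EOFException();
            }
            total += n;
        }
    }

    /** Illustrative entry point: fill a caller-supplied (reusable) buffer. */
    public void readInto(byte[] dest, int len) throws IOException {
        readFully(dest, 0, len);
    }

    public static void main(String[] args) throws IOException {
        ReadFullySketch reader =
            new ReadFullySketch(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        byte[] scratch = new byte[4];
        reader.readInto(scratch, 3);
        System.out.println(scratch[0] + " " + scratch[1] + " " + scratch[2]); // prints "1 2 3"
    }
}
```

On Java 9 and later, InputStream.readNBytes(b, off, len) runs a similar loop, but it returns a short count at end of stream instead of throwing, so a caller that wants readFully() behavior would still need to check the returned count.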
Component(s)
Core