Description
When reading a Variant column in RowBinary format, ClickHouse uses a single discriminator byte before each value to indicate which type from the Variant's type list it belongs to. For a Variant(Int32, String), discriminator 0x00 means Int32 and 0x01 means String. When the value is NULL, ClickHouse sends 0xFF as the discriminator with no following value bytes.
Bug
The driver does not handle 0xFF as a NULL indicator, causing crashes when reading NULL Variant values.
Client V2 (BinaryStreamReader.java:859-862):
public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte();
    return readValue(column.getNestedColumns().get(ordNum));
}

readByte() returns a signed byte. When ClickHouse sends 0xFF (255 unsigned / -1 signed), the code passes it directly to getNestedColumns().get(ordNum) without checking for the NULL sentinel. Calling .get(-1) on a List throws IndexOutOfBoundsException.
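The sign-extension problem described above can be reproduced without the driver. The following is a minimal sketch, not driver code: the class DiscriminatorByteDemo and its helper methods are hypothetical names, and a plain two-element list stands in for getNestedColumns().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only illustrates how Java widens a byte to an int.
final class DiscriminatorByteDemo {

    // Plain widening keeps the sign, so 0xFF becomes -1.
    static int signedIndex(byte raw) {
        return raw;
    }

    // Masking with 0xFF yields the unsigned value, so 0xFF becomes 255.
    static int unsignedIndex(byte raw) {
        return raw & 0xFF;
    }

    // Returns true if looking up the index in the list fails with an
    // IndexOutOfBoundsException (or a subclass of it).
    static boolean lookupFails(List<String> nested, int index) {
        try {
            nested.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte nullMarker = (byte) 0xFF;
        // Stand-in for the nested columns of Variant(Int32, String).
        List<String> nested = Arrays.asList("Int32", "String");

        System.out.println("signed index:   " + signedIndex(nullMarker));
        System.out.println("unsigned index: " + unsignedIndex(nullMarker));
        System.out.println("lookup fails:   " + lookupFails(nested, signedIndex(nullMarker)));
    }
}
```

Either interpretation of 0xFF is out of range for the nested column list, so the sentinel has to be checked before the lookup regardless of how the byte is widened.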
Legacy implementation (ClickHouseRowBinaryProcessor.java:268-281):
int ordTypeNum = BinaryStreamUtils.readInt8(input);
for (int i = 0; i < len; i++) {
    if (ordTypeNum == i) {
        tupleValues[i] = deserializers[i].deserialize(values[i], input).asObject();
    } else {
        tupleValues[i] = null;
    }
}

Here readInt8() returns -1 for 0xFF. The loop never matches ordTypeNum == i (since i is never -1), so all tuple values become null, silently returning incorrect data instead of a proper NULL variant.
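The legacy behaviour can be simulated in isolation. This is a rough sketch under stated assumptions: LegacyVariantLoopSketch and both of its methods are hypothetical, the IntFunction parameter stands in for the per-type deserializers, and returning null from the guarded variant is only one possible way to represent a NULL variant, not something the legacy processor is confirmed to do.

```java
import java.util.function.IntFunction;

// Hypothetical simulation of the legacy loop; not the actual processor code.
final class LegacyVariantLoopSketch {

    // Mirrors the loop quoted above: only the slot whose index equals the
    // discriminator is filled, every other slot is set to null.
    static Object[] currentBehaviour(int ordTypeNum, int len, IntFunction<Object> readSlot) {
        Object[] tupleValues = new Object[len];
        for (int i = 0; i < len; i++) {
            tupleValues[i] = (ordTypeNum == i) ? readSlot.apply(i) : null;
        }
        return tupleValues;
    }

    // Assumed fix: treat -1 (0xFF read as a signed byte) as the NULL sentinel
    // and return before the loop runs, so no value bytes are consumed.
    static Object[] withNullGuard(int ordTypeNum, int len, IntFunction<Object> readSlot) {
        if (ordTypeNum == -1) {
            return null;
        }
        return currentBehaviour(ordTypeNum, len, readSlot);
    }

    public static void main(String[] args) {
        Object[] unguarded = currentBehaviour(-1, 2, i -> "value" + i);
        System.out.println("unguarded slots: " + unguarded.length);
        System.out.println("guarded result is null: " + (withNullGuard(-1, 2, i -> "value" + i) == null));
    }
}
```

Without the guard, a NULL variant comes back as a fully populated array of nulls; with it, the caller receives an explicit null.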
Expected Behavior
When the discriminator byte is 0xFF, the driver should return null without attempting to read any following value bytes.
Suggested Fix (Client V2)
public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte() & 0xFF; // unsigned
    if (ordNum == 0xFF) {
        return null;
    }
    return readValue(column.getNestedColumns().get(ordNum));
}

Reproduction
The simplest reproduction is a single query with a NULL cast to Variant:
SELECT NULL::Variant(Int32, String) FORMAT RowBinary
SETTINGS allow_experimental_variant_type = 1;

Inspecting the raw bytes returned by ClickHouse confirms the wire format:
# NULL Variant — single 0xFF byte, no value payload
$ curl -s 'http://localhost:8123/' \
--data-binary "SELECT NULL::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: ff
# Int32 Variant — discriminator 0x00, then 4-byte little-endian 42
$ curl -s 'http://localhost:8123/' \
--data-binary "SELECT 42::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: 002a 0000 00
# String Variant — discriminator 0x01, then length-prefixed 'hello'
$ curl -s 'http://localhost:8123/' \
--data-binary "SELECT 'hello'::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: 0105 6865 6c6c 6f
Reading the NULL row will crash with IndexOutOfBoundsException in Client V2, or silently return incorrect data in the legacy processor.
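The three byte dumps above can be decoded with a small standalone reader, which may be useful as a regression test fixture. This is a minimal sketch assuming a fixed Variant(Int32, String) column: VariantInt32StringDecoder, decode and readVarInt are hypothetical names, not part of the driver, and the String branch assumes the RowBinary convention of a LEB128 length prefix followed by UTF-8 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone decoder for one RowBinary value of Variant(Int32, String).
final class VariantInt32StringDecoder {

    static final int NULL_DISCRIMINATOR = 0xFF;

    static Object decode(byte[] row) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(row))) {
            // readUnsignedByte() returns 0..255, so 0xFF is not mistaken for -1.
            int discriminator = in.readUnsignedByte();
            switch (discriminator) {
                case NULL_DISCRIMINATOR:
                    return null; // NULL: no value bytes follow
                case 0:
                    // Int32 is little-endian on the wire; readInt() is big-endian.
                    return Integer.reverseBytes(in.readInt());
                case 1: {
                    byte[] utf8 = new byte[readVarInt(in)];
                    in.readFully(utf8);
                    return new String(utf8, StandardCharsets.UTF_8);
                }
                default:
                    throw new IOException("Unexpected Variant discriminator: " + discriminator);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads an unsigned LEB128 integer of at most five bytes.
    static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("VarInt is too long");
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {(byte) 0xFF}));
        System.out.println(decode(new byte[] {0x00, 0x2a, 0x00, 0x00, 0x00}));
        System.out.println(decode(new byte[] {0x01, 0x05, 'h', 'e', 'l', 'l', 'o'}));
    }
}
```

The byte arrays in main are copied from the xxd output above, so the same inputs can be fed to the driver's reader once the NULL check is in place.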