
chore(bench): kernel-vs-Thrift performance baseline harness + results #790

Open

vikrantpuppala wants to merge 1 commit into feat/kernel-bind-param from bench/kernel-vs-thrift

Conversation

@vikrantpuppala
Contributor

Stacks on #789 (kernel bind_param). Adds a benchmark script under `scripts/`; no production code change.

### Summary

One-shot benchmark script that runs each (backend × SQL-shape) combination N+1 times against a live warehouse, drops the first run (cache warm-up), and reports min/median/max for session-open, time-to-first-row (TTFR), drain, and RSS delta.

Not a CI gate — single-machine, single-warehouse, high-variance. Meant to be re-run by hand when we want a baseline. Output is a Markdown table you can paste into a PR.

```
set -a && source ~/.databricks/pecotesting-creds && set +a
export DATABRICKS_SERVER_HOSTNAME=${DATABRICKS_HOST#https://}
.venv/bin/python scripts/bench_kernel_vs_thrift.py
```
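
For orientation, the core measurement loop looks roughly like this (a minimal sketch, not the script verbatim; `connect` stands in for whichever backend factory is under test):

```python
import statistics
import time

N_SAMPLES = 5  # timed runs; one extra warm-up run is taken first and dropped

def bench(connect, sql):
    """Time session-open, time-to-first-row, and drain for one backend/shape pair."""
    samples = []
    for i in range(N_SAMPLES + 1):
        t0 = time.perf_counter()
        conn = connect()                   # session-open
        t_open = time.perf_counter()
        cursor = conn.cursor()
        cursor.execute(sql)
        cursor.fetchone()                  # time-to-first-row (TTFR)
        t_first = time.perf_counter()
        cursor.fetchall()                  # drain the remainder
        t_drain = time.perf_counter()
        conn.close()
        if i > 0:                          # drop the first run (cache warm-up)
            samples.append((t_open - t0, t_first - t_open, t_drain - t_first))
    for name, vals in zip(("open", "ttfr", "drain"), zip(*samples)):
        print(f"{name}: min={min(vals):.3f}s  "
              f"median={statistics.median(vals):.3f}s  max={max(vals):.3f}s")
```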

### Results (median of 5 samples, warm-up dropped, dogfood)

| Shape | Thrift drain | Kernel drain | Ratio | Notes |
| --- | --- | --- | --- | --- |
| `SELECT 1` | 387 ms | 1085 ms | 2.8× | Fixed ~700 ms kernel TTFR overhead |
| `range(10k)` | 909 ms | 1347 ms | 1.48× | Overhead amortized |
| `wide-uuid(100k)` | 6907 ms | 9925 ms | 1.44× | Overhead amortized |
| `metadata.catalogs` | 413 ms | 550 ms | 1.33× | Overhead amortized |
| `range(1M)` | 14.0 s | panic | n/a | Kernel-side bug: issue #19 |

Detailed tables with min/max ranges and RSS-delta numbers are in the script output.

### Three findings worth flagging

**1. Fixed kernel TTFR overhead (~700 ms)**

`SELECT 1` is the cleanest signal because drain time is essentially zero. Kernel pays ~700ms more than Thrift on every query. On large queries (drain dominates) the relative cost shrinks to 1.3–1.5×.

Plausible causes (not investigated):

  • Per-Session tokio runtime construction (flagged by the kernel PR review).
  • SEA wait-for-result poll handshake doing an extra round-trip.

A flamegraph would distinguish these. Worth a follow-up.

**2. CloudFetch panic on large results (kernel issue #19)**

`range(1M)` crosses the CloudFetch threshold; the kernel's reader calls `runtime_handle.block_on` from a sync trait method, which panics when invoked from inside our PyO3 `runtime.block_on`. Until this is fixed, `use_sea=True` is unusable in production for any large-result workload. The connector's e2e tests use `range(10000)`, which stays below the CloudFetch threshold; that's why this never surfaced earlier.
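
For concreteness, a minimal repro sketch (assuming the connector's standard `databricks.sql.connect` entry point and treating `use_sea` as a keyword argument there; credentials elided):

```python
from databricks import sql

conn = sql.connect(
    server_hostname="...",  # elided; see the env setup under Summary
    http_path="...",
    access_token="...",
    use_sea=True,           # assumed flag routing through the kernel-backed SEA path
)
cursor = conn.cursor()
# Crosses the CloudFetch threshold: the kernel reader calls
# runtime_handle.block_on from a sync trait method while we are already
# inside the PyO3 runtime.block_on, and panics (kernel issue #19).
cursor.execute("SELECT * FROM range(1000000)")
cursor.fetchall()
```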

**3. RSS overhead of ~+1 MB per kernel session**

Consistent across every shape. Probably tokio worker thread stacks (default ~2MB × N workers, partially committed). Not a problem at small connection counts; maps to the "process-global `OnceLock`" follow-up the kernel reviewer flagged.
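
The RSS numbers are deliberately coarse; for reference, the kind of measurement involved looks like this (a sketch assuming psutil; the script's actual sampling may differ):

```python
import psutil

proc = psutil.Process()

def rss_bytes() -> int:
    return proc.memory_info().rss  # current resident set size

before = rss_bytes()
# ... open a session on the backend under test and run one shape ...
after = rss_bytes()
print(f"RSS delta: {(after - before) / 1024:.0f} KB")
```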

### What this doesn't measure

  • Concurrent-connection scaling (single conn at a time here)
  • Anything past 100k rows on kernel (blocked by the nested `block_on` panic, issue #19)
  • Network-jitter sensitivity
  • Memory profile beyond coarse RSS

Useful next benchmarks once #19 lands: real CloudFetch shape (1M+ rows), concurrent sessions, and a memory-profile pass.
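
For the concurrent-sessions pass, something like this would be a starting point (a sketch; `connect` again stands in for the backend factory, and thread-per-session is an assumption about how we'd drive it):

```python
from concurrent.futures import ThreadPoolExecutor
import statistics
import time

def one_session(connect, sql="SELECT 1"):
    """Open a session, run one query to completion, return wall time."""
    t0 = time.perf_counter()
    conn = connect()
    cursor = conn.cursor()
    cursor.execute(sql)
    cursor.fetchall()
    conn.close()
    return time.perf_counter() - t0

def bench_concurrent(connect, n_sessions=8):
    """Per-session latency with n_sessions sessions running at once."""
    with ThreadPoolExecutor(max_workers=n_sessions) as pool:
        latencies = list(pool.map(lambda _: one_session(connect), range(n_sessions)))
    return min(latencies), statistics.median(latencies), max(latencies)
```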

This pull request and its description were written by Isaac.

One-shot benchmark script under scripts/ that runs each (backend ×
SQL-shape) combination N+1 times against a live warehouse, drops
the first run (cache warm-up), and reports min/median/max for
session-open, time-to-first-row, drain, and RSS delta.

Not a CI gate — single-machine, single-warehouse, high-variance
script meant to be re-run by hand when we want a baseline.

Shapes:
- SELECT 1           (pure round-trip latency, no data)
- range(10k)         (inline result, ~10K rows)
- range(1M)          (crosses CloudFetch threshold; currently
                      panics on the kernel backend — see kernel
                      issue #19, nested block_on bug)
- wide-uuid(100k)    (wider rows, Arrow serialization)
- metadata.catalogs  (metadata round-trip)

Output is a Markdown table you can paste into a PR. Run with:

    set -a && source ~/.databricks/pecotesting-creds && set +a
    export DATABRICKS_SERVER_HOSTNAME=${DATABRICKS_HOST#https://}
    .venv/bin/python scripts/bench_kernel_vs_thrift.py

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala
Contributor Author

Update: kernel issue #19 fixed in databricks-sql-kernel#20.

Re-ran the benchmark with the fix applied:

| Shape | Thrift drain | Kernel drain (was) | Kernel drain (after fix) | Ratio |
| --- | --- | --- | --- | --- |
| `SELECT 1` | 396 ms | 1085 ms | 1079 ms | 2.7× |
| `range(10k)` | 1106 ms | 1347 ms | 1395 ms | 1.26× |
| `range(1M)` | 15009 ms | panic | 7317 ms | 0.49× (2× faster!) |
| `wide-uuid(100k)` | 9776 ms | 9925 ms | 9781 ms | 1.00× |
| `metadata.catalogs` | 420 ms | 550 ms | 546 ms | 1.30× |

The big result: on range(1M) — the shape that previously panicked — the kernel-backed path is now 2× faster than Thrift at drain (7.3s vs 15.0s; 137K rows/s vs 67K rows/s). CloudFetch's parallel chunk download is paying off.

Other observations unchanged:

  • Fixed ~700ms kernel TTFR overhead on small queries (SELECT 1 is the cleanest signal). Probably per-Session tokio runtime construction; separate follow-up.
  • wide-uuid(100k) lines up at ~9.8s on both backends — server-side dominates here.
  • RSS delta is larger on kernel for the 1M shape (+21 MB vs +4 KB) because pyarrow holds the whole drained table in scope at the end of the drain. Both backends should converge if we drain batch-by-batch instead (see the sketch below).
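
A sketch of that batch-by-batch drain, assuming the connector's `fetchmany_arrow` (so only one Arrow batch is resident at a time):

```python
BATCH_ROWS = 100_000

def drain_in_batches(cursor) -> int:
    """Drain a result set without holding the whole pyarrow table in scope."""
    total = 0
    while True:
        table = cursor.fetchmany_arrow(BATCH_ROWS)  # one batch at a time
        if table.num_rows == 0:
            break
        total += table.num_rows  # the batch goes out of scope each iteration
    return total
```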
