feat(schema): ClickHouse index algorithms and engine SETTINGS#6
feat(schema): ClickHouse index algorithms and engine SETTINGS#6
Conversation
The ClickHouse schema previously hard-coded `TYPE minmax GRANULARITY 3` for every index and had no way to emit a `SETTINGS k=v` clause. Both are needed by real ClickHouse-backed services that pick algorithms per column and tune engine settings (e.g. `index_granularity`, `allow_nullable_key`). This change adds: - `Schema\ClickHouse\SkipIndexAlgorithm` enum — `MinMax`, `Set`, `BloomFilter`, `NgramBloomFilter`, `TokenBloomFilter`, `Inverted`. - `Schema\ClickHouse\SkipIndex` value object carrying the algorithm, granularity, and algorithm-specific args (e.g. `[100]` for `set(100)`, `[4, 1024, 3, 0]` for `ngrambf_v1(...)`). - `Table::dataSkippingIndex()` — attach a skip index to the blueprint. Default granularity 1, default no algorithm args. - `Table::settings()` — set table-level engine SETTINGS, with a conservative key/value allow-list to keep DDL safe. - `Schema\ClickHouse::create()` renders both. SETTINGS is emitted after TTL. The existing minmax compile path is unchanged for backward compatibility — callers using `index()` keep their current output. Other dialects ignore the new fields, matching how `engine()` and `ttl()` are already handled. Tests cover each skip algorithm shape, composite-column skip indexes, SETTINGS rendering with and without TTL, and validation of bad granularity, empty columns, bad setting names, and unsafe string values.
📊 Coverage
Full per-file breakdown in the job summary. |
Greptile SummaryThis PR adds two ClickHouse-only DDL features to the schema builder: per-index algorithm selection ( Confidence Score: 5/5Safe to merge — no P0 or P1 issues found; all previously flagged defects have been resolved. Every concern from prior review rounds is addressed in the current diff. The single remaining finding is a minor README/code discrepancy on the string-value regex quantifier ( No files require special attention. Important Files Changed
Reviews (14): Last reviewed commit: "fix(schema): restore missing brace in se..." | Re-trigger Greptile |
Four follow-ups based on the greptile review on PR #6: - alter() now renders ADD INDEX for any skipIndexes set on the blueprint, matching ClickHouse's `ALTER TABLE ... ADD INDEX name expr TYPE algo GRANULARITY n` syntax. Previously dataSkippingIndex() inside an alter() callback was silently dropped (P1). - alter() throws UnsupportedException if the blueprint has settings — ClickHouse does not accept SETTINGS in ALTER TABLE; callers must emit `ALTER TABLE ... MODIFY SETTING` directly. Surfaces what was previously a silent no-op. - SkipIndex constructor rejects algorithmArgs for MinMax and Inverted (P2) — both are emitted without parentheses in ClickHouse DDL, so any args would have produced invalid SQL like `minmax(1)`. - Table::dataSkippingIndex() now sanitises auto-generated index names when columns contain non-identifier characters (P2) — `event-type` no longer produces a confusing `Invalid skip index name: skip_event-type` exception. Non-identifier characters are collapsed to `_`. - compileSkipAlgorithm() formats float args with sprintf('%F', ...) instead of (string) cast (P2) — the cast can produce scientific notation like `1.0E-5`, which ClickHouse rejects in index type arguments. Trailing zeros are trimmed for readability so 0.01 stays "0.01" rather than "0.010000". Adds tests for each fix plus one for ALTER ADD INDEX with composite columns and algorithm args.
The ClickHouse parts of this library are unreleased and have no external
consumers, so there is no need for a parallel `dataSkippingIndex()` /
`SkipIndex` API alongside the existing `index()` / `Index` API. Folding
them in produces a single consistent index API that works the same in
every dialect, with ClickHouse-specific parameters available as optional
named arguments (mirroring how `method`, `operatorClass`, `lengths`, and
`collations` are already dialect-specific options on `index()`).
Changes:
- `Table::dataSkippingIndex()` — removed.
- `Schema\ClickHouse\SkipIndex` value object — removed (folded into
`Index`).
- `Schema\ClickHouse\SkipIndexAlgorithm` enum — kept; useful typed value.
- `Index` gains `algorithm`, `algorithmArgs`, and `granularity` optional
fields. Validates name (must be a SQL identifier), at-least-one-column,
granularity ≥ 1, and that no-arg algorithms (MinMax, Inverted) get no
`algorithmArgs`.
- `Table::index()` and `uniqueIndex()` accept the new fields. Auto-name
generation now sanitises non-identifier characters in column names so
`event-type` produces `idx_event_type` instead of failing the
identifier regex.
- `Schema\ClickHouse::create()` and `alter()` use a single
`compileSkipIndex(Index)` helper that renders
`INDEX <name> <expr> TYPE <algo>[(args)] GRANULARITY <n>`. Defaults
to `TYPE minmax GRANULARITY 3` when no algorithm is set, matching the
ClickHouse-canonical default for a generic `INDEX` without a chosen
type.
- Float `algorithmArgs` are formatted with `sprintf('%F', ...)` and
trimmed of trailing zeros, so `1.0e-5` becomes `0.00001` rather than
`1.0E-5` (which ClickHouse rejects in index type arguments).
5083/5083 tests pass; lint and PHPStan max clean.
After collapsing `dataSkippingIndex()` into `index()` and `SkipIndex` into `Index`, the `Skip` prefix on the algorithm enum is the last holdout from the parallel-API design and is now inconsistent with the new naming. The enum already lives in `Schema\ClickHouse\` so the namespace says "ClickHouse"; the type name only needs to say "this is the algorithm a ClickHouse index runs". `SkipIndexAlgorithm` -> `IndexAlgorithm`. No behavioural change.
Two P1 follow-ups from the latest greptile review:
- The strict name regex on `Index` (`[A-Za-z_][A-Za-z0-9_]*`) was a
cross-dialect breaking change: MySQL, MariaDB, PostgreSQL, and SQLite
all quote index names, so hyphenated and dot-qualified names were
valid before this PR. The strict identifier check now only fires when
an `IndexAlgorithm` is set — i.e. for ClickHouse data-skipping indexes
where the name appears unquoted in `INDEX <name> ... TYPE ...`.
- `Table::settings()` cast floats with `(string) $value`, which produces
scientific notation for small values (e.g. `(string) 1.0e-5 ===
"1.0E-5"`). ClickHouse rejects E-notation in SETTINGS values. Apply
the same `sprintf('%F', ...)` + trim treatment that `compileSkipIndex`
already uses for algorithm args.
Adds tests for both: hyphenated index names work without an algorithm
and throw with one; SETTINGS values formatted from `1.0e-5` come out as
`0.00001`.
|
@copilot Rebase |
Co-authored-by: abnegate <5857008+abnegate@users.noreply.github.com>
|
@copilot address the greptile comment |
…chaining Agent-Logs-Url: https://github.com/utopia-php/query/sessions/5b098851-2af2-49e9-8778-d7edb0172860 Co-authored-by: abnegate <5857008+abnegate@users.noreply.github.com>
|
@copilot address the greptile comment |
Agent-Logs-Url: https://github.com/utopia-php/query/sessions/498037f0-05ae-40ad-8dfa-b72fe523705a Co-authored-by: abnegate <5857008+abnegate@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
@copilot Fix the readme to use builder style for new additions instead of callback |
Agent-Logs-Url: https://github.com/utopia-php/query/sessions/b1fda15b-7a7c-4a63-a597-31ee644b3c39 Co-authored-by: abnegate <5857008+abnegate@users.noreply.github.com>
|
@copilot fix unit test and PHPstan checks |
Agent-Logs-Url: https://github.com/utopia-php/query/sessions/6148863a-df07-4bbf-a771-b4721be69f8f Co-authored-by: abnegate <5857008+abnegate@users.noreply.github.com>
Fixed in 16ddd87 — all skip-index and SETTINGS tests were using a non-existent callback API ( |
Tests in the fluent builder style call ->index(algorithm: ...) and ->settings([...]) on a Column returned by a column-creation method, but Column lacked forwarders for the new ClickHouse skip-index params and the table-level settings(). Add them so the chain compiles and runs. Also hardcode GRANULARITY 3 for the no-algorithm fallback in ClickHouse::compileSkipIndex (matches its docblock and existing test expectations), and rewrite testTableSettingsWithTtlOrdering to call ttl() on the Table directly — Column::ttl() is column-level and would emit an inline TTL on the column, not the table-level TTL the test asserts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The previous "hardcoded 3" fallback in compileSkipIndex discarded any explicit granularity passed for a no-algorithm index. Make the field nullable end-to-end (Index, Table::index, Column::index) and resolve the dialect-specific default at compile time: 3 when no algorithm, 1 when an algorithm is set. Explicit values are now always honoured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds two ClickHouse-only DDL features to the Schema builder so real ClickHouse-backed services can express the table layout they need:
Index algorithm selection. Every ClickHouse
INDEXis parameterised by aTYPE(minmax / set / bloom_filter / ngrambf_v1 / tokenbf_v1 / inverted) and aGRANULARITY. The schema previously hard-codedTYPE minmax GRANULARITY 3for everyindex()call, which is the wrong choice for any high-cardinality column.Table::index()now accepts an optionalalgorithm,algorithmArgs, andgranularityso callers can pick the correct algorithm per column.Engine SETTINGS clause.
CREATE TABLE … SETTINGS k=v, …was unreachable. Real ClickHouse usage needsindex_granularity,allow_nullable_key, etc. —Table::settings()now emits this afterTTLin the rendered DDL.The pattern follows
engine()/ttl()precisely: ClickHouse-only DDL features live onTableand are silently ignored by the other dialect compilers.Why
Both utopia-php/audit and utopia-php/usage currently hand-roll their ClickHouse
setup()routines, emittingINDEX … TYPE bloom_filter GRANULARITY 1andSETTINGS index_granularity = 8192, allow_nullable_key = 1directly. Without these features the schema builder cannot express what those services already produce, so they cannot delegatesetup()here.API
Indexes can also be added in
alter()callbacks — they render asALTER TABLE … ADD INDEX ….MinMaxandInvertedrejectalgorithmArgs(no parenthesised args in DDL); floatalgorithmArgsandsettingsvalues usesprintf('%F', ...)so1.0e-5becomes0.00001instead of the rejected1.0E-5.Files
src/Query/Schema/ClickHouse/IndexAlgorithm.php— new enum (6 algorithms).src/Query/Schema/Index.php— adds optionalalgorithm,algorithmArgs,granularityfields. Validates the no-args-on-MinMax/Inverted rule, granularity ≥ 1, and at-least-one-column. The strict identifier regex onnameonly fires whenalgorithmis set, since other dialects quote index names (so hyphens / dots are valid there).src/Query/Schema/Table.php—index()anduniqueIndex()accept the new fields. Auto-generated index names sanitise non-identifier characters in column names, soevent-typeproducesidx_event_type. Addssettings()with a conservative key/value allow-list and float-safe formatting.src/Query/Schema/ClickHouse.php— singlecompileSkipIndex(Index)helper used by bothcreate()andalter(). SETTINGS rendered after TTL.alter()rejects SETTINGS (ClickHouse doesn't accept it inALTER TABLE; callers must emitALTER TABLE … MODIFY SETTINGdirectly).Test plan
composer lintpassescomposer check(PHPStan max) passes🤖 Generated with Claude Code