
Regression: S3/GCS disks broken in 25.8.22 by aws-sdk-cpp 1.11.771 — "Response checksums mismatch" #1708

@CarlosFelipeOR

Description


I checked the Altinity Stable Builds lifecycle table, and the Altinity Stable Build version I'm using is still supported.

Type of problem

Bug report - something's broken

Describe the situation

A regression was introduced in Altinity Antalya 25.8.22 by PR #1667 (Antalya 25.8: Bump to 25.8.22), which backported upstream PR ClickHouse/ClickHouse#100582 (Use aws-sdk-cpp 1.11.771).

After the bump, every ClickHouse operation against real AWS S3 or GCS S3-compatible endpoints fails at startup with:

Code: 499. DB::Exception: Response checksums mismatch
This error happened for S3 disk. (S3_ERROR)

This causes the S3 disk's startup access check to fail, which in turn prevents ConfigReloader from loading the storage configuration, hanging the server's config reload until the regression test times out at 600s.

This issue:

  • Reproduces deterministically on real AWS S3 and real GCS — both architectures (x86 and arm64).
  • Does not reproduce on MinIO (MinIO doesn't enforce response checksums).
  • Was detected in our antalya-25.8 MasterCI runs, where every scenario in the S3_Aws_S3_2, S3_Gcs_2, Benchmark_Aws_S3, Benchmark_Gcs, Tiered_Storage_S3Amazon, and Tiered_Storage_S3Gcs jobs fails identically (~30 scenarios per arch).
  • Is already known upstream as ClickHouse/ClickHouse#103232, with an open backport PR ClickHouse/ClickHouse#103542 that has not yet been merged into 25.8.

How to reproduce the behavior

Environment

  • Version: 25.8.22.20001.altinityantalya (any 25.8.22+ build)
  • Storage: real AWS S3 or GCS S3-compatible endpoint (not MinIO)
  • Build type: Any (reproduces on release builds)

Steps

  1. Configure an S3 disk in storage.xml pointing at a real AWS S3 bucket:
<storage_configuration>
    <disks>
        <external>
            <type>s3</type>
            <endpoint>https://s3.<region>.amazonaws.com/<bucket>/data/benchmark/</endpoint>
            <access_key_id><key></access_key_id>
            <secret_access_key><secret></secret_access_key>
        </external>
    </disks>
    <policies>
        <external>
            <volumes><external><disk>external</disk></external></volumes>
        </external>
    </policies>
</storage_configuration>
  2. Start clickhouse-server. The disk's startup access check immediately triggers the failure — no user query is needed.

  3. Equivalent reproduction via our regression suite:

python3 -u s3/regression.py \
  --clickhouse docker://altinity/clickhouse-server:25.8.22.20001.altinityantalya \
  --storage aws_s3 \
  --only '/benchmark/aws_s3/*' \
  --log log.log
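For triage, the failure signature can also be pulled straight out of the server log. The helper below is a hypothetical script of ours (not part of the regression suite); it extracts the attempt counters from every `Response checksums mismatch` retry:

```python
import re

# Hypothetical triage helper (not part of the regression suite): scan a
# clickhouse-server log for the failure signature and return the
# (attempt, max_attempts) pair of every S3 retry that hit the error.
ATTEMPT_RE = re.compile(
    r"Attempt: (\d+)/(\d+).*?Response checksums mismatch",
    re.DOTALL,
)

def checksum_mismatch_attempts(log_text: str) -> list[tuple[int, int]]:
    """Return (attempt, max_attempts) for each 'Response checksums mismatch' hit."""
    return [(int(a), int(n)) for a, n in ATTEMPT_RE.findall(log_text)]
```

On an affected build this returns the full `(1, 4)` through `(4, 4)` sequence for each startup access check; on 25.8.21 it returns an empty list.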

Expected behavior

The S3 disk startup check should succeed and the server should load its storage configuration normally, as it did in 25.8.21 and earlier.


Actual behavior

The following behavior is taken from job Benchmark_Aws_S3 in MasterCI run 25091176162 on commit 5c9d523.

The startup ReadBufferFromS3 GET against the bucket fails with Response checksums mismatch on every retry (1/4 → 4/4):

[clickhouse1] 2026.04.29 07:55:32.591947 [ 678 ] {} <Debug> ReadBufferFromS3:
  Caught exception while reading S3 object.
  Bucket: [masked]:Secret(name='aws_s3_bucket'),
  Key: data/benchmark/kdc/ioxpdznoiwywnvsrcobnauawixgzw,
  Version: Latest, Offset: 0, Attempt: 1/4,
  Message: Code: 499. DB::Exception: Response checksums mismatch
  This error happened for S3 disk. (S3_ERROR)
  (version 25.8.22.20001.altinityantalya (altinity build))

After 4 retries the disk access check fails:

<Error> ConfigReloader: Error updating configuration from '/etc/clickhouse-server/config.xml':
  Code: 347. DB::Exception: Code: 499. DB::Exception: Response checksums mismatch
  This error happened for S3 disk: While checking access for disk external. (S3_ERROR)

Stack trace (relevant frames):

0. Poco::Exception::Exception(String const&, int)
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool)
2. DB::Exception::Exception(String const&, int, String, bool)
3. ./src/IO/S3Common.h:39: DB::S3Exception::S3Exception(String const&, Aws::S3::S3Errors)

The test then hangs 10 minutes waiting for ConfigReloader: Loaded config '/etc/clickhouse-server/config.xml', performed update on configuration and dies with ExpectTimeoutError: Timeout 600.000s — this is why every scenario in the suite collapses identically. The server is unusable, not the test logic.


Root cause analysis

The new aws-sdk-cpp 1.11.771 changed two checksum defaults:

  • requestChecksumCalculation = WHEN_SUPPORTED → SDK adds x-amz-checksum-algorithm: CRC32 on every request.
  • responseChecksumValidation = WHEN_SUPPORTED → SDK validates x-amz-checksum-* headers on every response.

GCS's S3-compatible API doesn't fully implement AWS Flexible Checksums, and on real AWS S3 the response-side CRC32 verification fails against ClickHouse's locally computed value. Both end up throwing Response checksums mismatch.
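To make the mechanism concrete, here is a minimal Python sketch of what response-side validation under the two modes amounts to. This is our own illustration, not SDK code: the function names are invented, and we assume (per the AWS Flexible Checksums format) that `x-amz-checksum-crc32` carries the base64-encoded big-endian CRC32 of the body.

```python
import base64
import zlib

def crc32_header(body: bytes) -> str:
    """x-amz-checksum-crc32 value: base64 of the 4-byte big-endian CRC32 of the body."""
    return base64.b64encode(zlib.crc32(body).to_bytes(4, "big")).decode()

def validate_response(body: bytes, headers: dict, mode: str) -> None:
    """Sketch of response checksum validation.

    mode="WHEN_SUPPORTED": validate whenever the server sent a checksum header
    (the new 1.11.771 default). mode="WHEN_REQUIRED": skip validation unless
    explicitly requested (the old behavior the fix restores).
    """
    claimed = headers.get("x-amz-checksum-crc32")
    if claimed is None or mode == "WHEN_REQUIRED":
        return
    if claimed != crc32_header(body):
        raise RuntimeError("Response checksums mismatch")
```

Under WHEN_SUPPORTED, any response whose header does not match the locally computed CRC32 of the received body raises; under WHEN_REQUIRED the same response passes untouched, which is why flipping the default back makes the startup GET succeed again.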

The follow-up patches inside PR #1667 (commits "Fix build", "One more change to adapt to a new SDK", "Fix Md5 checksums calculation") only override ShouldComputeContentMd5() and RequestChecksumRequired() on PutObjectRequest, UploadPartRequest, DeleteObjectRequest, and DeleteObjectsRequest. They do not disable the new SDK-level checksum defaults, which is why GETs (and the read-side access check) still trigger this error. The patch authors themselves left a /// TODO Understand what is it. Maybe we need it... comment on the related override, and a /// FIXME. Variadic arguments? comment on the new vaLog adapter — this slipped through.


Suggested fix

Backport upstream commit 659369ead95 ("Fix very weird issue") — equivalently, merge upstream backport PR ClickHouse/ClickHouse#103542 into the antalya-25.8 branch.

The fix sets, in PocoHTTPClientConfiguration:

checksumConfig.requestChecksumCalculation  = Aws::Client::RequestChecksumCalculation::WHEN_REQUIRED;
checksumConfig.responseChecksumValidation  = Aws::Client::ResponseChecksumValidation::WHEN_REQUIRED;

restoring the pre-1.11.771 behavior. This commit is already present in upstream 25.10.x and 26.3.x, but was never backported to 25.8.

| Version | aws-sdk-cpp 1.11.771 | WHEN_REQUIRED fix | Works on real AWS S3 / GCS |
|---|---|---|---|
| 25.8.21 | no | n/a | yes |
| 25.8.22 (PR #1667) | yes | ❌ missing | no |
| 25.10.x | yes | yes | yes |
| 26.3.x | yes | yes | yes |

Additional context

Related PR

  • Altinity/ClickHouse PR Antalya 25.8: Bump to 25.8.22 #1667 (Antalya 25.8: Bump to 25.8.22) — merge commit 5c9d52363de84ccdd439b7f2e20fae710921b26f, which contains the upstream aws-sdk-cpp 1.11.771 backport that introduces this regression.

Upstream references

CI failures

Affected jobs (every scenario fails identically, both x86 and arm64):

  • S3_Aws_S3_2
  • S3_Gcs_2
  • Benchmark_Aws_S3
  • Benchmark_Gcs
  • Tiered_Storage_S3Amazon
  • Tiered_Storage_S3Gcs

Failing MasterCI runs on antalya-25.8 (both on commit 5c9d523 from PR #1667):

| # | Run | Commit | Date |
|---|---|---|---|
| 1 | 25091176162 | 5c9d523 | 2026-04-29 |
| 2 | 24845940234 | 5c9d523 | 2026-04-23 |

The same scenarios pass on the previous commits (1eed78a, a36e131, 59dcdc0) before PR #1667 landed, and were re-run after #1667 landed and continued to fail — ruling out infrastructure flakiness.
