
chore(e2e): widen CloudWatch query window in MetricsE2ET to fix flaky test #2460

Merged
phipag merged 3 commits into main from
fix/flaky-metrics-e2e-test
Apr 9, 2026

Conversation


@phipag phipag commented Apr 9, 2026

Summary

Changes

Pads the CloudWatch query time window in MetricsE2ET by 1 minute before the start and 2 minutes after the end to account for CloudWatch eventual consistency. Metric timestamps can shift by up to a minute during batch processing, which causes MetricDataNotFoundException or DataNotReadyException in CI when the window is too tight.
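The padding described above can be sketched as a small helper. This is a minimal illustration and not the code from this PR: the class name `PaddedQueryWindow` and its `pad` method are hypothetical, and only the padding amounts (1 minute before, 2 minutes after) come from the description.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not taken from MetricsE2ET: widens a query window
// so that data points whose timestamps shifted slightly still fall inside it.
final class PaddedQueryWindow {
    // Padding amounts from the PR description.
    static final Duration PAD_BEFORE = Duration.ofMinutes(1);
    static final Duration PAD_AFTER = Duration.ofMinutes(2);

    final Instant start;
    final Instant end;

    private PaddedQueryWindow(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    // Takes the original (tight) window and returns the padded one.
    static PaddedQueryWindow pad(Instant originalStart, Instant originalEnd) {
        if (originalEnd.isBefore(originalStart)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        return new PaddedQueryWindow(
                originalStart.minus(PAD_BEFORE),
                originalEnd.plus(PAD_AFTER));
    }
}
```

Under these assumptions, an original 1-minute window becomes a 4-minute window, and the padded `start` and `end` would then be passed as the start and end time of the CloudWatch query.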

This padding is applied to all metric queries, including the high-resolution (period=1s) query. Initial testing showed that the high-res query also fails with the original 1-minute window — see failed job run. For the high-res query, the assertion is changed from get(0) == 8 to contains(8.0), since the wider window may return both the standard (4) and high-res (8) product metrics as separate 1-second buckets.
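To illustrate why the assertion changed, the sketch below compares three ways of checking the returned data points. It is illustrative only: the class `HighResAssertion` is hypothetical, and the order of the two buckets in the example is an assumption, since the description only says that both values may be returned.

```java
import java.util.List;

// Hypothetical illustration, not taken from MetricsE2ET.
final class HighResAssertion {
    // Order-independent check: passes if any returned bucket has the expected value.
    static boolean containsValue(List<Double> dataPoints, double expected) {
        return dataPoints.contains(expected);
    }

    // Summing all buckets, shown only to demonstrate why it breaks
    // once the standard-resolution metric also falls inside the window.
    static double sum(List<Double> dataPoints) {
        double total = 0.0;
        for (double value : dataPoints) {
            total += value;
        }
        return total;
    }

    // Original style of check: only looks at the first returned bucket.
    static boolean firstEquals(List<Double> dataPoints, double expected) {
        return !dataPoints.isEmpty() && dataPoints.get(0) == expected;
    }
}
```

With data points `[4.0, 8.0]` (assumed order), `firstEquals(points, 8.0)` is false and `sum(points)` is 12.0, while `containsValue(points, 8.0)` is true. The containment check is the only one of the three that does not depend on how many buckets the wider window returns or on their order.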

No changes to shared test infrastructure (MetricsFetcher, RetryUtils, LambdaInvoker).

Issue number: #2440


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Pad the CloudWatch query time window by -1min/+2min for standard
resolution metric queries to account for CloudWatch eventual consistency.
Metric timestamps can shift by up to a minute during batch processing,
causing MetricDataNotFoundException when using a tight 1-minute window.

Closes #2440

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
phipag changed the title from "fix: widen CloudWatch query window in MetricsE2ET to fix flaky test" to "chore(e2e): widen CloudWatch query window in MetricsE2ET to fix flaky test" on Apr 9, 2026
svozza previously approved these changes Apr 9, 2026
The high-res metric query (period=1s) also fails with the original
1-minute window due to CloudWatch eventual consistency. Apply the same
padded window and sum all returned data points, since only the
high-resolution metric appears at 1-second granularity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

phipag commented Apr 9, 2026

Verification runs (round 1)

The following E2E test runs were triggered to verify the fix eliminates the flakiness:

| Run | Created | Result |
| --- | --- | --- |
| 24187807134 | 11:29:29Z | ❌ failed |
| 24187801720 | 11:29:20Z | ❌ failed |
| 24187796682 | 11:29:13Z | ❌ failed |
| 24187791482 | 11:29:04Z | ❌ failed |
| 24187786445 | 11:28:57Z | ❌ failed |
| 24187780279 | 11:28:47Z | ❌ failed |

All 6 runs failed. The high-resolution metric query with the padded window returned both standard (4) and high-res (8) product metrics summing to 12 instead of the expected 8. Fixed in a9eab07 by asserting contains(8.0) instead of summing.

svozza previously approved these changes Apr 9, 2026
With the padded window, both standard (4) and high-res (8) product
metrics appear as separate 1-second buckets, so the sum is 12 not 8.
Instead, assert that the high-res value (8.0) is contained in the
returned data points.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


phipag commented Apr 9, 2026

Verification runs (round 2) ✅

2 E2E test runs triggered against the latest fix (a9eab07):

| Run | Link | Result |
| --- | --- | --- |
| 1 | https://github.com/aws-powertools/powertools-lambda-java/actions/runs/24190260426 | ✅ passed |
| 2 | https://github.com/aws-powertools/powertools-lambda-java/actions/runs/24190261336 | ✅ passed |

Both runs passed.


phipag commented Apr 9, 2026

Verification runs (round 3)

2 additional E2E test runs to double-check:

| Run | Link | Result |
| --- | --- | --- |
| 1 | https://github.com/aws-powertools/powertools-lambda-java/actions/runs/24194266225 | ✅ passed |
| 2 | https://github.com/aws-powertools/powertools-lambda-java/actions/runs/24194267655 | ✅ passed |

Both runs passed. The fix is confirmed to address the CloudWatch eventual consistency flakiness.


phipag commented Apr 9, 2026

@svozza The PR is now ready. I ran several rounds of E2E tests (see the PR comments above).


Development

Successfully merging this pull request may close these issues.

Maintenance: Fix flaky MetricsE2ET E2E test caused by CloudWatch metrics propagation delays
