[AMORO-4099] Add table-summary metric collection option for non-optimizing tables#4101
j1wonpark wants to merge 4 commits into apache:master from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4101      +/-   ##
============================================
+ Coverage     22.39%   28.86%   +6.46%
- Complexity     2552     3951    +1399
============================================
  Files           458      654     +196
  Lines         42116    52175   +10059
  Branches       5917     6637     +720
============================================
+ Hits           9433    15061    +5628
- Misses        31871    36013    +4142
- Partials        812     1101     +289
Branch force-pushed from 88f649c to 07d6a30.
Branch force-pushed from 07d6a30 to 0d41056.
…izing tables
- Allow collecting table_summary metrics when self-optimizing is disabled by setting table-summary.enabled=true
- Fix periodic collection bug: remove the optimizingNotNecessary() call in the summary-only branch so that the snapshot gate no longer blocks subsequent collections
- Separate the property key from the self-optimizing prefix: self-optimizing.table-summary.enabled -> table-summary.enabled
- Add debug logging for the table summary collection path
- Add comprehensive test coverage for summary-only mode
Signed-off-by: Jiwon Park <jpark92@outlook.kr>
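The periodic-collection fix above can be illustrated with a deliberately simplified, hypothetical model of a snapshot gate. SnapshotGateSketch, refresh() and the recorded snapshot id are illustrative names, not Amoro's API, and the real gate in TableRuntimeRefreshExecutor likely checks more than a single snapshot id. The sketch only shows the failure mode the commit describes: if the summary-only branch records the current snapshot as "optimizing not necessary", later refreshes on the same snapshot are skipped, so metrics are collected once and then never again until a new snapshot arrives.

```java
import java.util.OptionalLong;

/**
 * Hypothetical, simplified model of a refresh loop guarded by a snapshot gate.
 * Names are illustrative only; this is not Amoro's TableRuntimeRefreshExecutor.
 */
final class SnapshotGateSketch {
  // Snapshot id recorded when the table was marked "optimizing not necessary".
  private OptionalLong notNecessarySnapshotId = OptionalLong.empty();
  private int summaryCollections = 0;
  // true = old behavior (mark "not necessary" inside the summary-only branch).
  private final boolean markNotNecessaryInSummaryBranch;

  SnapshotGateSketch(boolean markNotNecessaryInSummaryBranch) {
    this.markNotNecessaryInSummaryBranch = markNotNecessaryInSummaryBranch;
  }

  /** One periodic refresh of a table whose current snapshot is currentSnapshotId. */
  void refresh(long currentSnapshotId) {
    // Gate: skip evaluation if this snapshot was already marked "not necessary".
    if (notNecessarySnapshotId.isPresent()
        && notNecessarySnapshotId.getAsLong() == currentSnapshotId) {
      return;
    }
    // Summary-only branch: collect table_summary metrics.
    summaryCollections++;
    if (markNotNecessaryInSummaryBranch) {
      // Old behavior: recording the snapshot here closes the gate for later refreshes.
      notNecessarySnapshotId = OptionalLong.of(currentSnapshotId);
    }
  }

  int summaryCollections() {
    return summaryCollections;
  }
}
```

In this model, three refreshes on an unchanged snapshot collect metrics once with the old behavior and three times without the optimizingNotNecessary()-style marking, which is roughly the scenario testSummaryCollectedRepeatedlyWithoutNewSnapshots targets.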
Branch force-pushed from 36f577a to c7ac103.
…yOnly gate
- Remove duplicate newAppend().commit() in test appendData(), since OptimizingTestHelpers.appendBase() already commits
- Remove unused AppendFiles import
- Remove redundant tableSummaryOnly bypass in execute(), since snapshotChanged already covers unoptimized data
- Fix misleading test comment
Signed-off-by: Jiwon Park <jpark92@outlook.kr>
Signed-off-by: Jiwon Park <jpark92@outlook.kr>
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions. |
Why are the changes needed?
Currently, table_summary metrics (total files, file sizes, health score, etc.) are only collected when self-optimizing.enabled=true. The metric update path (setTableSummary()) is gated behind the optimizingConfig.isEnabled() check in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput(), so tables with self-optimizing disabled always show 0 or N/A in Grafana dashboards.
This PR decouples metric collection from self-optimizing execution by introducing a new table property, table-summary.enabled.
Close #4099.
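The gating described above can be sketched as a small, hypothetical model. It is not Amoro's code: SummaryGateSketch, RefreshAction, parse() and decide() are illustrative names. Only the property keys self-optimizing.enabled and table-summary.enabled, and the false default of table-summary.enabled, come from this PR; the true default for self-optimizing.enabled is an assumption. The sketch reads table-summary.enabled independently of the self-optimizing flag, then picks one of three actions in the same if / else-if shape the change log describes for tryEvaluatingPendingInput().

```java
import java.util.Map;

/** What one refresh does for a table (hypothetical model of the PR's gating). */
enum RefreshAction {
  EVALUATE_AND_COLLECT_SUMMARY, // self-optimizing path; summary collected as before
  COLLECT_SUMMARY_ONLY,         // new branch: optimizing off, table-summary on
  SKIP                          // both off: table_summary metrics are not updated
}

/** Hypothetical stand-in for the two flags carried by OptimizingConfig. */
final class SummaryGateSketch {
  static final String SELF_OPTIMIZING_ENABLED = "self-optimizing.enabled";
  static final boolean SELF_OPTIMIZING_ENABLED_DEFAULT = true; // assumption
  static final String TABLE_SUMMARY_ENABLED = "table-summary.enabled";
  static final boolean TABLE_SUMMARY_ENABLED_DEFAULT = false; // default stated in the PR

  private final boolean optimizingEnabled;
  private final boolean tableSummaryEnabled;

  private SummaryGateSketch(boolean optimizingEnabled, boolean tableSummaryEnabled) {
    this.optimizingEnabled = optimizingEnabled;
    this.tableSummaryEnabled = tableSummaryEnabled;
  }

  /** Parses both keys independently; table-summary.enabled has no self-optimizing. prefix. */
  static SummaryGateSketch parse(Map<String, String> props) {
    return new SummaryGateSketch(
        readBoolean(props, SELF_OPTIMIZING_ENABLED, SELF_OPTIMIZING_ENABLED_DEFAULT),
        readBoolean(props, TABLE_SUMMARY_ENABLED, TABLE_SUMMARY_ENABLED_DEFAULT));
  }

  private static boolean readBoolean(Map<String, String> props, String key, boolean defaultValue) {
    String raw = props.get(key);
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }

  /** Mirrors the if / else-if structure described for tryEvaluatingPendingInput(). */
  RefreshAction decide() {
    if (optimizingEnabled) {
      return RefreshAction.EVALUATE_AND_COLLECT_SUMMARY;
    } else if (tableSummaryEnabled) {
      return RefreshAction.COLLECT_SUMMARY_ONLY;
    }
    return RefreshAction.SKIP;
  }
}
```

Keeping the summary flag out of the self-optimizing.* namespace means it is still read when optimizing is turned off, which is what testPropertyKeyNotFilteredWhenOptimizingDisabled checks; in this sketch the earlier key self-optimizing.table-summary.enabled is simply ignored.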
Brief change log
- table-summary.enabled table property constant in TableProperties (default: false)
- tableSummaryEnabled field in OptimizingConfig with getter/setter
- TableConfigurations.parseOptimizingConfig()
- else if branch in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput() to collect summary metrics when optimizing is disabled but table-summary is enabled
- TestTableSummaryWithoutOptimizing test class with 7 test cases
How was this patch tested?
- testSummaryCollectedWhenOptimizingDisabledAndSummaryEnabled: verifies metrics are updated when optimizing is off but summary is enabled
- testSummaryNotCollectedWhenOptimizingDisabledWithDefault: verifies metrics remain at their initial values when only optimizing is disabled (table-summary defaults to false)
- testSummaryCollectedRepeatedlyWithoutNewSnapshots: verifies metrics are collected on repeated refreshes when unoptimized data exists
- testSummaryUpdatedAfterNewSnapshotInSummaryOnlyMode: verifies metrics reflect new data appended in summary-only mode
- testPropertyKeyNotFilteredWhenOptimizingDisabled: verifies table-summary.enabled is parsed independently of self-optimizing.enabled
- testReEnableOptimizingAfterSummaryOnlyMode: verifies the normal optimizing path works after optimizing is re-enabled
- testSummaryCollectedWithOptimizingEnabled: verifies summary is always collected in the optimizing-enabled path
Documentation