Fix flaky prometheus_updater_spec by cleaning up PROCESS_TYPE env var #4868
Open
joyvuu-dave wants to merge 1 commit into cloudfoundry:main from
Three spec files (runner_spec, puma_runner_spec, connection_metrics_spec) set ENV['PROCESS_TYPE'] via set_process_type_env but never restored it. With randomized test ordering, a leaked value of 'cc-worker' (from connection_metrics_spec) caused ExecutionContext.from_process_type_env to return CC_WORKER instead of API_PUMA_MAIN, which only registers a subset of Prometheus metrics. Subsequent prometheus_updater_spec tests then got nil from registry.get() and failed with "undefined method 'set' for nil". Add around blocks to save/restore PROCESS_TYPE in all three specs.
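The save/restore idea in the commit message can be sketched in plain Ruby. This is a minimal illustration rather than the PR's actual diff: `with_restored_env` is a hypothetical helper name, and the `around` usage shown in the comment is an assumption about how such a helper would be wired into a spec.

```ruby
# Hypothetical helper (not taken from the PR): runs the block, then puts the
# environment variable back to its previous state, even if the block raises.
def with_restored_env(key)
  original = ENV[key] # nil when the variable was not set beforehand
  yield
ensure
  if original.nil?
    ENV.delete(key)     # it was unset before, so unset it again
  else
    ENV[key] = original # otherwise restore the previous value
  end
end

# In an RSpec file the same logic would sit in an around hook, roughly:
#
#   around do |example|
#     with_restored_env('PROCESS_TYPE') { example.run }
#   end

# Stand-alone demonstration: the block pollutes the variable, the helper cleans up.
ENV.delete('PROCESS_TYPE')
with_restored_env('PROCESS_TYPE') do
  ENV['PROCESS_TYPE'] = 'cc-worker' # simulates what a polluting spec does
end
puts ENV.key?('PROCESS_TYPE') # prints false
```

Using `ensure` means the variable is restored even when an example fails, which matters because a failing example is exactly when later specs are most likely to be misdiagnosed.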
## Fix flaky `prometheus_updater_spec` caused by `PROCESS_TYPE` env var leaking between tests

### Problem
The `PrometheusUpdater` spec has been flaky since PR #4749 (commit `9c9a9d51d`, merged Jan 16 2026), which introduced `ExecutionContext` and made metric registration conditional on `ENV['PROCESS_TYPE']`. CI runs frequently fail with errors like `undefined method 'set' for nil`.

### Root Cause
Three spec files set `ENV['PROCESS_TYPE']` (via `set_process_type_env`) but never restore it:

- `connection_metrics_spec.rb`: sets it to `cc-worker` and `puma_worker`. The `cc-worker` value is the direct cause of the flakiness: it maps to `CC_WORKER`, which only registers `DB_CONNECTION_POOL_METRICS`, `DELAYED_JOB_METRICS`, and `VITAL_METRICS`. The `puma_worker` value happens to be harmless today (`API_PUMA_WORKER` registers all metrics), but it is still pollution.
- `puma_runner_spec.rb`: sets it to `puma_worker` (via the `before_worker_boot` callback). Also harmless today for the same reason, but still pollution.
- `runner_spec.rb`: sets it to `main` (via `Runner#initialize`), which maps to `API_PUMA_MAIN`, the same context the test environment defaults to. Harmless today, but still pollution.

With randomized test ordering, when `connection_metrics_spec.rb`'s `cc-worker` context runs before `prometheus_updater_spec`, the leaked value changes the behavior of `ExecutionContext.from_process_type_env`:

- `PROCESS_TYPE` is unset → falls back to the `CC_TEST=true` check → returns `API_PUMA_MAIN` → registers all metrics
- `PROCESS_TYPE=cc-worker` → returns `CC_WORKER` → registers only a subset of metrics → metrics like `:cc_deployments_in_progress_total` are never registered → `@registry.get(...)` returns `nil`

Note: `connection_metrics_spec.rb` was already setting `ENV['PROCESS_TYPE']` without cleanup before PR #4749, but it didn't matter then because `PrometheusUpdater` registered all metrics unconditionally. PR #4749 made registration conditional, which turned the pre-existing env var pollution into a flaky test.

### Fix
Add `around` blocks to all three polluting specs that save and restore `ENV['PROCESS_TYPE']`.

### Verification
Reproducible with `--seed 8`:

    bundle exec rspec --seed 8 \
      spec/unit/lib/sequel/extensions/connection_metrics_spec.rb \
      spec/unit/lib/cloud_controller/runners/puma_runner_spec.rb \
      spec/unit/lib/cloud_controller/runner_spec.rb \
      spec/unit/lib/cloud_controller/metrics/prometheus_updater_spec.rb

On `main`: 74 examples, 11 failures, the exact same 11 `prometheus_updater_spec` failures seen in CI.

Additionally tested with seeds 1–7, 9, 10, 11111, 22222, 33333, 44444, 55555, 66666, 12345, and 67890; all pass.
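As a rough illustration of why a leaked `PROCESS_TYPE` produces the `nil` failure, here is a toy model of the mechanism. It only mirrors the names used in this description and is not the cloud_controller_ng implementation: `context_from_env`, `build_registry`, `Gauge`, the metric lists, and the behavior for unknown or missing values are all simplified assumptions.

```ruby
# Toy model only. Names echo the PR description; bodies are stand-ins.
ALL_METRICS    = %i[cc_deployments_in_progress_total placeholder_vital_metric].freeze
WORKER_METRICS = %i[placeholder_vital_metric].freeze # subset, as for CC_WORKER

# Minimal stand-in for a Prometheus gauge.
Gauge = Struct.new(:value) do
  def set(new_value)
    self.value = new_value
  end
end

# Maps the environment to an execution context, following the description.
def context_from_env(env)
  case env['PROCESS_TYPE']
  when 'main'        then :API_PUMA_MAIN
  when 'puma_worker' then :API_PUMA_WORKER
  when 'cc-worker'   then :CC_WORKER
  when nil
    # Unset: fall back to the CC_TEST=true check, as the description states.
    return :API_PUMA_MAIN if env['CC_TEST'] == 'true'

    raise ArgumentError, 'PROCESS_TYPE is unset' # assumption, not from the PR
  else
    raise ArgumentError, 'unknown PROCESS_TYPE'  # assumption, not from the PR
  end
end

# Registers only the metrics the context is allowed to have.
def build_registry(context)
  names = context == :CC_WORKER ? WORKER_METRICS : ALL_METRICS
  names.to_h { |name| [name, Gauge.new] }
end

clean  = build_registry(context_from_env({ 'CC_TEST' => 'true' }))
leaked = build_registry(context_from_env({ 'CC_TEST' => 'true', 'PROCESS_TYPE' => 'cc-worker' }))

clean[:cc_deployments_in_progress_total].set(3) # metric is registered, so this works
puts leaked[:cc_deployments_in_progress_total].inspect # prints nil: calling set on it would raise NoMethodError
```

The two registries differ only in the value of `PROCESS_TYPE` at construction time, which is why the failure depends on which spec ran earlier rather than on anything in `prometheus_updater_spec` itself.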
- I have reviewed the contributing guide
- I have viewed, signed, and submitted the Contributor License Agreement
- I have made this pull request to the `main` branch
- I have run all the unit tests using `bundle exec rake`
- I have run CF Acceptance Tests