fix(ai-proxy-multi): guard construct_upstream in healthcheck timers#13592
Open
shreemaan-abhishek wants to merge 2 commits into
Open
fix(ai-proxy-multi): guard construct_upstream in healthcheck timers#13592shreemaan-abhishek wants to merge 2 commits into
shreemaan-abhishek wants to merge 2 commits into
Conversation
construct_upstream returns nil, err on its normal failure path (unresolvable endpoint, missing host/port, nil instance) and can raise a Lua error. Both healthcheck timers called it without a pcall or nil check, so one malformed instance made upstream nil and the subsequent upstream.resource_key / upstream._nodes_ver access aborted checker creation/refresh for every resource in that worker, not just the bad one. Wrap both call sites in pcall; on failure log and skip the resource (creation) or keep the existing checker (refresh) instead of crashing the timer.
There was a problem hiding this comment.
Pull request overview
Note
Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.
Improves robustness of ai-proxy-multi healthcheck manager timers by preventing timer callbacks from crashing when construct_upstream returns nil, err or throws, and introduces regression tests that simulate panics.
Changes:
- Wrap
construct_upstreamcalls in both healthcheck-manager timers withpcalland add failure handling/logging. - Prevent nil-indexing in timer callbacks by skipping resource creation or handling refresh failures more safely.
- Add tests covering panic scenarios during checker creation and working-pool checks.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
apisix/healthcheck_manager.lua |
Adds pcall guards and conditional logic around dynamic upstream construction in healthcheck timers. |
t/plugin/ai-proxy-multi-construct-upstream-panic.t |
Adds regression tests that force construct_upstream to throw to ensure timers don’t die and proxying continues. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+257
to
+271
| ok, upstream, err = pcall(plugin.construct_upstream, upstream_constructor_config) | ||
| if not ok or not upstream then | ||
| err = err or upstream | ||
| upstream = nil | ||
| local err_msg = "[checking checker] unable to construct upstream for plugin: " | ||
| .. plugin_name .. ", resource path: " .. resource_path | ||
| .. ", json path: " .. json_path .. ", error: " .. err | ||
| if not ok then | ||
| core.log.error(err_msg) | ||
| else | ||
| core.log.warn(err_msg) | ||
| end | ||
| else | ||
| upstream.resource_key = resource_path | ||
| end |
Comment on lines
+275
to
283
| if upstream then | ||
| local current_ver = upstream_utils.version(res_conf.modifiedIndex, | ||
| upstream._nodes_ver) | ||
| core.log.info("checking working pool for resource: ", resource_path, | ||
| " current version: ", current_ver, " item version: ", item.version) | ||
| if item.version == current_ver then | ||
| need_destroy = false | ||
| end | ||
| end |
Comment on lines
+75
to
+78
| plugin.construct_upstream = function(instance) | ||
| local panic_check | ||
| panic_check.cnt = 1 | ||
| end |
|
|
||
| __DATA__ | ||
|
|
||
| === TEST 1: panic in construct_upstream during checker creation must not crash the timer |
…afe and non-destructive - guard nil error value before concatenation in timer_working_pool_check - retain existing checker on transient construct_upstream failure - cover nil-return failure paths in both timers
AlinsRan
approved these changes
Jun 23, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
ai-proxy-multi'sconstruct_upstreamreturnsnil, erron its normal failure path (unresolvable endpoint, missing host/port, nil instance) and can also raise a Lua error. The two healthcheck-manager timers called it without apcallor a nil check:When
construct_upstreamreturnednil(or threw), the followingupstream.resource_key = .../upstream._nodes_veraccess raisedattempt to index a nil valueinside the timer callback that iterates all healthcheck resources. A single malformedai-proxy-multiinstance could therefore abort checker creation/refresh for every resource in that worker, not just the offending one.This wraps both call sites in
pcall. On failure the timer logs and skips the resource (creation path) or keeps the existing checker (refresh path) instead of crashing.Tests
Adds
t/plugin/ai-proxy-multi-construct-upstream-panic.tcovering both timer paths: aconstruct_upstreamoverride that raises a Lua error during checker creation and during the working-pool check. Proxying stays unaffected and the guarded message is logged instead of the worker timer dying.Checklist