perf(importers): batch endpoint creation and status updates during import/reimport#14489
Open
valentijnscholten wants to merge 11 commits intoDefectDojo:devfrom
Open
perf(importers): batch endpoint creation and status updates during import/reimport#14489valentijnscholten wants to merge 11 commits intoDefectDojo:devfrom
valentijnscholten wants to merge 11 commits intoDefectDojo:devfrom
Conversation
…port/reimport Replace per-finding endpoint_get_or_create() calls with a stateful EndpointManager that accumulates endpoints and statuses across findings and flushes them in bulk at batch boundaries. Reduces ~6200 DB queries to ~3-5 for a 1000-finding scan with 5 endpoints per finding (200 unique). - EndpointManager now takes a `product` param and holds internal accumulators - `record_endpoint()` deduplicates by normalized key within a batch - `record_status_for_create()` / `record_statuses_to_mitigate()` / `record_statuses_to_reactivate()` accumulate operations - `persist()` flushes all pending creates and bulk_updates in one shot - `update_endpoint_status()` accumulates mitigate/reactivate lists instead of dispatching per-finding Celery tasks - Removed old `chunk_endpoints_and_*`, `add_endpoints_to_unsaved_finding`, `mitigate_endpoint_status`, `reactivate_endpoint_status` methods - Added unit tests for `_make_endpoint_unique_tuple` normalization - Updated performance test fixture to match new stateful manager interface
… direct callers - bulk_create bypasses Django post_save signals, so manually call inherit_instance_tags() for each newly created Endpoint to preserve product tag inheritance behavior - Initialize endpoint_manager in create_test() so callers that invoke create_test() + process_findings() directly (without going through process_scan()) don't hit a NoneType error
Contributor
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
… and improve get_or_create_endpoints - Add `test` parameter to `_create_endpoint_manager()` so callers pass the test explicitly instead of relying on `self.test` being set - Raise `ValueError` with a clear message when `test is None` - Initialize endpoint_manager in `DefaultImporter.process_scan()` after `parse_findings()` (which calls `create_test()` for fresh imports), covering both the sync path and the async Celery task path - Rename `_fetch_and_create_endpoints()` → `get_or_create_endpoints()` - Change return type to `tuple[dict, list]`: returns `(endpoints_by_key, created)` so callers know exactly which endpoints were newly inserted - Rename local `key_to_endpoint` → `endpoints_by_key` for clarity
Contributor
|
Conflicts have been resolved. A maintainer will review the pull request shortly. |
Instead of lazily initializing endpoint_manager in process_scan() or create_test(), initialize it immediately in __init__ using the product from the required engagement/test parameter. This eliminates the NoneType AttributeError when callers invoke create_test() + process_findings() directly without going through process_scan(). - DefaultImporter: uses self.engagement.product (engagement is required) - DefaultReImporter: uses self.test.engagement.product (test is required) - Remove _create_endpoint_manager() factory methods and lazy init sites - Remove self.endpoint_manager = None from BaseImporter.__init__
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The biggest performance hit during import/reimport is caused by endpoint handling. This is still done a per finding and per endpoint (+ endpoint status) basis resulting in many small DB queries.
This PR extends the endpoint manager to become stateful and keep track of new endpoints and new/changed endpoint statuses during an import. After every batch (of 1000 findings) it will persist the gathered data to the database, just in time for the deduplication batch to start in a celery task.
endpoint_get_or_create()calls (1-2 DB queries each) with a statefulEndpointManagerthat accumulates endpoints and statuses across all findings in a batch and flushes them in bulk at batch boundariesEndpointManagernow and holds internal accumulators:_endpoints_to_create(dict, deduplicates by normalized key),_statuses_to_create,_statuses_to_mitigate,_statuses_to_reactivaterecord_endpoint(),record_status_for_create(),record_statuses_to_mitigate(),record_statuses_to_reactivate(),persist()update_endpoint_status()now accumulates mitigate/reactivate lists instead of dispatching one Celery task per finding; all bulk updates fire inpersist()at the batch boundary_get_or_create_endpoints()fetches all existing product endpoints in one query (using theidx_ep_product_lower_hostindex), bulk-creates missing ones insidetransaction.atomic(), and returns a key→Endpoint map forEndpoint_Statuscreationadd_endpoints_to_unsaved_finding,chunk_endpoints_and_disperse,chunk_endpoints_and_reactivate,chunk_endpoints_and_mitigate,mitigate_endpoint_status,reactivate_endpoint_status_make_endpoint_unique_tuplenormalization (protocol/host case, default port collapsing, None/empty handling)