perf(importers): batch endpoint creation and status updates during import/reimport by valentijnscholten · Pull Request #14489 · DefectDojo/django-DefectDojo

valentijnscholten · 2026-03-11T11:47:13Z

Summary

The biggest performance hit during import/reimport is caused by endpoint handling. This is still done on a per-finding and per-endpoint (+ endpoint status) basis, resulting in many small DB queries.
This PR makes the endpoint manager stateful so that it keeps track of new endpoints and new/changed endpoint statuses during an import. After every batch (of 1000 findings) it persists the gathered data to the database, just in time for the deduplication batch to start in a Celery task.

- Replace per-finding endpoint_get_or_create() calls (1-2 DB queries each) with a stateful EndpointManager that accumulates endpoints and statuses across all findings in a batch and flushes them in bulk at batch boundaries
- A scan with 1000 findings × 5 endpoints (200 unique) goes from ~6200 DB queries to ~3-5 queries
- EndpointManager now takes a `product` param and holds internal accumulators: _endpoints_to_create (dict, deduplicates by normalized key), _statuses_to_create, _statuses_to_mitigate, _statuses_to_reactivate
- New public API: record_endpoint(), record_status_for_create(), record_statuses_to_mitigate(), record_statuses_to_reactivate(), persist()
- update_endpoint_status() now accumulates mitigate/reactivate lists instead of dispatching one Celery task per finding; all bulk updates fire in persist() at the batch boundary
- _get_or_create_endpoints() fetches all existing product endpoints in one query (using the idx_ep_product_lower_host index), bulk-creates missing ones inside transaction.atomic(), and returns a key→Endpoint map for Endpoint_Status creation
- Removed old methods that no longer have callers: add_endpoints_to_unsaved_finding, chunk_endpoints_and_disperse, chunk_endpoints_and_reactivate, chunk_endpoints_and_mitigate, mitigate_endpoint_status, reactivate_endpoint_status
- Added unit tests for _make_endpoint_unique_tuple normalization (protocol/host case, default port collapsing, None/empty handling)
- Updated performance test fixture to match new stateful manager interface
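The accumulate-then-flush idea in the summary can be sketched without Django. Everything below is a hypothetical stand-in: `FakeEndpointStore`, `BatchingEndpointManager` and `simulate_import` are illustration names, the key normalization is simplified, and only endpoint creation is modelled (no endpoint statuses), so the query count shown is not meant to reproduce the PR's ~3-5 figure.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEndpointStore:
    """Hypothetical in-memory stand-in for the database that counts round trips."""

    rows: dict = field(default_factory=dict)
    queries: int = 0

    def fetch_existing(self, keys):
        self.queries += 1  # one SELECT covering the whole batch
        return {key: self.rows[key] for key in keys if key in self.rows}

    def bulk_create(self, new_rows):
        self.queries += 1  # one INSERT for every missing endpoint
        self.rows.update(new_rows)


class BatchingEndpointManager:
    """Illustrative accumulator; not the real EndpointManager."""

    def __init__(self, store):
        self.store = store
        self._endpoints_to_create = {}  # normalized key -> endpoint data

    def record_endpoint(self, protocol, host, port):
        # Simplified normalization (assumption): lower-case protocol and host only.
        key = (protocol.lower(), host.lower(), port)
        # setdefault keeps the first endpoint recorded for a given key.
        self._endpoints_to_create.setdefault(
            key, {"protocol": key[0], "host": key[1], "port": port}
        )
        return key

    def persist(self):
        # Swap out the accumulator so the next batch starts empty.
        pending, self._endpoints_to_create = self._endpoints_to_create, {}
        if not pending:
            return 0
        existing = self.store.fetch_existing(pending)
        missing = {k: v for k, v in pending.items() if k not in existing}
        if missing:
            self.store.bulk_create(missing)
        return len(missing)


def simulate_import(store, findings=1000, endpoints_per_finding=5, unique_hosts=200):
    """Record findings x endpoints_per_finding endpoints, then flush once."""
    manager = BatchingEndpointManager(store)
    for finding in range(findings):
        for slot in range(endpoints_per_finding):
            n = (finding * endpoints_per_finding + slot) % unique_hosts
            manager.record_endpoint("HTTPS", f"Host{n}.example.com", 443)
    return manager.persist()
```

In this sketch, 5000 recorded endpoints collapse to 200 pending keys, and the flush costs two store round trips on a first import and one on a repeat import, because nothing is missing the second time.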

…port/reimport

Replace per-finding endpoint_get_or_create() calls with a stateful EndpointManager that accumulates endpoints and statuses across findings and flushes them in bulk at batch boundaries. Reduces ~6200 DB queries to ~3-5 for a 1000-finding scan with 5 endpoints per finding (200 unique).

- EndpointManager now takes a `product` param and holds internal accumulators
- `record_endpoint()` deduplicates by normalized key within a batch
- `record_status_for_create()` / `record_statuses_to_mitigate()` / `record_statuses_to_reactivate()` accumulate operations
- `persist()` flushes all pending creates and bulk_updates in one shot
- `update_endpoint_status()` accumulates mitigate/reactivate lists instead of dispatching per-finding Celery tasks
- Removed old `chunk_endpoints_and_*`, `add_endpoints_to_unsaved_finding`, `mitigate_endpoint_status`, `reactivate_endpoint_status` methods
- Added unit tests for `_make_endpoint_unique_tuple` normalization
- Updated performance test fixture to match new stateful manager interface
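Deduplicating by a normalized key could look roughly like the sketch below. `make_endpoint_key` and `DEFAULT_PORTS` are hypothetical names, and the exact rules (which fields are part of the key, and that an explicit default port collapses to `None`) are assumptions for illustration; the actual `_make_endpoint_unique_tuple` may differ.

```python
# Assumed default ports; the real implementation may cover more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def make_endpoint_key(protocol, host, port=None, path=None):
    """Return a hashable key so equivalent endpoints compare equal."""
    proto = (protocol or "").lower()   # None/empty protocol -> ""
    hostname = (host or "").lower()    # host names are case-insensitive
    # Assumption: an explicit default port is treated like "no port given".
    if port is not None and DEFAULT_PORTS.get(proto) == port:
        port = None
    return (proto, hostname, port, path or "")


# Endpoints that differ only in case or in an explicit default port
# end up under one dictionary key.
pending = {}
for args in [("HTTPS", "Example.COM", 443), ("https", "example.com", None)]:
    pending.setdefault(make_endpoint_key(*args), args)
```

Under these assumptions both tuples map to the same key, so `pending` holds a single entry, while a non-default port such as 8443 would produce a distinct key.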

… direct callers

- bulk_create bypasses Django post_save signals, so manually call inherit_instance_tags() for each newly created Endpoint to preserve product tag inheritance behavior
- Initialize endpoint_manager in create_test() so callers that invoke create_test() + process_findings() directly (without going through process_scan()) don't hit a NoneType error

github-actions · 2026-03-12T15:35:02Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

… and improve get_or_create_endpoints

- Add `test` parameter to `_create_endpoint_manager()` so callers pass the test explicitly instead of relying on `self.test` being set
- Raise `ValueError` with a clear message when `test is None`
- Initialize endpoint_manager in `DefaultImporter.process_scan()` after `parse_findings()` (which calls `create_test()` for fresh imports), covering both the sync path and the async Celery task path
- Rename `_fetch_and_create_endpoints()` → `get_or_create_endpoints()`
- Change return type to `tuple[dict, list]`: returns `(endpoints_by_key, created)` so callers know exactly which endpoints were newly inserted
- Rename local `key_to_endpoint` → `endpoints_by_key` for clarity
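Returning `(endpoints_by_key, created)` lets a caller run per-object follow-up work only on the newly inserted endpoints, which matters when the insert path does not fire `post_save` signals. The sketch below uses plain dictionaries in place of the database and model instances; `table`, `wanted`, `import_batch` and the `on_created` callback are hypothetical illustration names, not the PR's code.

```python
def get_or_create_endpoints(table: dict, wanted: dict) -> tuple[dict, list]:
    """Return (endpoints_by_key, created) for the requested keys.

    `table` stands in for the stored endpoints, `wanted` maps key -> endpoint.
    """
    # Reuse whatever is already stored.
    endpoints_by_key = {key: table[key] for key in wanted if key in table}
    created = []
    for key, endpoint in wanted.items():
        if key not in endpoints_by_key:
            table[key] = endpoint          # stands in for a bulk insert
            endpoints_by_key[key] = endpoint
            created.append(endpoint)
    return endpoints_by_key, created


def import_batch(table, wanted, on_created):
    """Run the follow-up hook only for endpoints that were newly inserted."""
    endpoints_by_key, created = get_or_create_endpoints(table, wanted)
    for endpoint in created:
        on_created(endpoint)               # e.g. tag inheritance in the real code
    return endpoints_by_key


existing = {("https", "a.example"): {"host": "a.example", "tags": []}}
wanted = {
    ("https", "a.example"): {"host": "a.example", "tags": []},
    ("https", "b.example"): {"host": "b.example", "tags": []},
}
result = import_batch(existing, wanted, lambda ep: ep["tags"].append("inherited"))
```

Here only `b.example` receives the hook, while the pre-existing `a.example` row is returned untouched.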

github-actions · 2026-03-15T00:05:12Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

Instead of lazily initializing endpoint_manager in process_scan() or create_test(), initialize it immediately in __init__ using the product from the required engagement/test parameter. This eliminates the NoneType AttributeError when callers invoke create_test() + process_findings() directly without going through process_scan().

- DefaultImporter: uses self.engagement.product (engagement is required)
- DefaultReImporter: uses self.test.engagement.product (test is required)
- Remove _create_endpoint_manager() factory methods and lazy init sites
- Remove self.endpoint_manager = None from BaseImporter.__init__
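The difference between lazy and eager initialization can be illustrated with simplified stand-ins. `LazyImporter`, `EagerImporter` and this bare `EndpointManager` are hypothetical classes written for the example; they only mimic the shape of the problem described in this commit.

```python
from types import SimpleNamespace


class EndpointManager:
    """Bare stand-in that only remembers the product it was built for."""

    def __init__(self, product):
        self.product = product


class LazyImporter:
    """Old shape: the manager only exists after process_scan() has run."""

    def __init__(self, engagement):
        self.engagement = engagement
        self.endpoint_manager = None

    def process_scan(self):
        self.endpoint_manager = EndpointManager(self.engagement.product)
        return self.process_findings()

    def process_findings(self):
        return self.endpoint_manager.product  # fails if process_scan() was skipped


class EagerImporter:
    """New shape: the manager is built in __init__ from the required engagement."""

    def __init__(self, engagement):
        self.engagement = engagement
        self.endpoint_manager = EndpointManager(engagement.product)

    def process_findings(self):
        return self.endpoint_manager.product


engagement = SimpleNamespace(product="demo-product")

try:
    LazyImporter(engagement).process_findings()
    lazy_error = None
except AttributeError as exc:
    lazy_error = exc  # 'NoneType' object has no attribute 'product'

eager_result = EagerImporter(engagement).process_findings()
```

Calling `process_findings()` directly fails on the lazy variant and works on the eager one, which is the situation the commit message describes for callers that skip `process_scan()`.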

github-actions bot added the unittests label Mar 11, 2026

valentijnscholten added this to the 2.57.0 milestone Mar 11, 2026

valentijnscholten added the affects_pro label (PRs that affect Pro and need a coordinated release/merge moment.) Mar 11, 2026

valentijnscholten added 2 commits March 11, 2026 14:58

fix endpoint manager initialization

bcdac76

github-actions bot added the conflicts-detected label Mar 12, 2026

valentijnscholten added 2 commits March 14, 2026 16:27

Merge remote-tracking branch 'upstream/dev' into batch-endpoint-creation

fba74ff

github-actions bot removed the conflicts-detected label Mar 15, 2026

valentijnscholten added 5 commits March 15, 2026 01:07

ruff

ea1963b

fix create endpoint manager

f5ab948

fix counts

d3a20df

Merge remote-tracking branch 'upstream/dev' into batch-endpoint-creation

d63cedb

valentijnscholten marked this pull request as ready for review March 16, 2026 21:26

valentijnscholten requested review from Maffooch and mtesauro as code owners March 16, 2026 21:26

Merge branch 'dev' into batch-endpoint-creation

a8aba54

valentijnscholten wants to merge 11 commits into DefectDojo:dev from valentijnscholten:batch-endpoint-creation


1 participant
