E-Document MLLM Extraction V2 — Agentic Plan-Act-Verify#8365

Draft
Groenbech96 wants to merge 89 commits into main from magnushar/edoc-import-v2-data-exchange

Groenbech96 (Contributor) commented May 28, 2026

Summary

Replaces the single-pass MLLM extraction (V1) with an agentic plan-act-verify loop. The model identifies invoice structure, extracts from identified regions, and self-corrects by calling AL-implemented verification tools — rather than sweeping left-to-right across the full text.

Registered as "MLLM V2" enum value on "Structure Data Impl." alongside V1 (no breaking change).

Architecture

One AOAI call loop (GPT-4.1 Mini, temperature 0) with verification tools:

  1. PLAN: model calls analyze_invoice(...) to record document structure, locale, column roles, and line IDs. This initialises the verification checklist.
  2. ACT: model extracts UBL JSON guided by the Phase 1 analysis, then calls submit_extraction(json) to save it.
  3. VERIFY: model calls the verification tools and mark_item to track progress. On failure it corrects, re-submits, and re-verifies. The loop exits when get_checklist() shows all items passed.

Handler reads the saved JSON directly — no re-generation step.
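
For readers unfamiliar with the pattern, the loop above can be sketched in a few lines of Python. This is an illustration only, not the AL implementation: `ExtractionPlan`, `run_loop`, the `extract` callback, and the rule that a re-submit resets earlier results are all assumptions made for the sketch. In the PR the model drives the tool calls itself; here a plain driver function stands in for it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtractionPlan:
    """Hypothetical stand-in for the checklist + JSON store (not the AL codeunit)."""
    checklist: Dict[str, str] = field(default_factory=dict)
    saved_json: Optional[dict] = None

    def analyze_invoice(self, items: List[str]) -> None:
        # PLAN: initialise one checklist entry per verification item.
        self.checklist = {name: "pending" for name in items}

    def submit_extraction(self, doc: dict) -> None:
        # ACT: save the extraction. Assumption: a re-submit resets earlier results.
        self.saved_json = doc
        self.checklist = {name: "pending" for name in self.checklist}

    def mark_item(self, name: str, passed: bool) -> None:
        self.checklist[name] = "passed" if passed else "failed"

    def all_passed(self) -> bool:
        return bool(self.checklist) and all(
            status == "passed" for status in self.checklist.values()
        )


def run_loop(plan, extract, verifiers, max_rounds=5):
    """Drive plan -> act -> verify until every checklist item passes."""
    plan.analyze_invoice(list(verifiers))
    failed: List[str] = []
    for _ in range(max_rounds):
        # The extractor sees which checks failed last round and can correct itself.
        plan.submit_extraction(extract(failed))
        failed = []
        for name, check in verifiers.items():
            ok = check(plan.saved_json)
            plan.mark_item(name, ok)
            if not ok:
                failed.append(name)
        if plan.all_passed():
            # The caller reads the saved JSON directly; nothing is regenerated.
            return plan.saved_json
    raise RuntimeError("verification did not converge")
```

The `max_rounds` cap is a design choice of the sketch: a self-correcting loop needs some bound so a document that never verifies cannot spin forever.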

Key fixes in this branch

  • Decimal parsing: Evaluate(..., 9) instead of AsDecimal() — handles XML/Swedish number formats correctly
  • UBL schema: numeric placeholders changed from string "0" to number 0 — stops MLLM from returning locale-formatted strings
  • Discount mapping: allowance_charge.percent preferred over amount.value for line discounts
  • Header totals verification: verify_invoice_totals accounts for header-level discount; new verify_payable checks tax_exclusive + tax_amount ≈ payable_amount
  • History window fix: SetHistoryLength(500) prevents the 10-message sliding window from splitting assistant tool-call messages from their results
  • System Application AI: removed [NonDebuggable] from all methods except tiny UnwrapSecret helpers that call .Unwrap() on SecretText
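
The two header-level checks mentioned above amount to simple arithmetic. A rough Python sketch follows, assuming a 0.01 rounding tolerance and assuming that verify_invoice_totals compares the line sum minus the header discount against the tax-exclusive amount. Neither detail is stated in this PR, and the function signatures are illustrative rather than the AL ones.

```python
from decimal import Decimal

# Assumed rounding tolerance; the PR does not state the value used in AL.
TOLERANCE = Decimal("0.01")


def approx_equal(a: Decimal, b: Decimal, tol: Decimal = TOLERANCE) -> bool:
    return abs(a - b) <= tol


def verify_invoice_totals(line_amounts, header_discount, tax_exclusive) -> bool:
    # Assumed rule: sum of line amounts minus header-level discount = tax_exclusive.
    net = sum(line_amounts, Decimal("0")) - header_discount
    return approx_equal(net, tax_exclusive)


def verify_payable(tax_exclusive, tax_amount, payable_amount) -> bool:
    # From the PR text: tax_exclusive + tax_amount should approximately equal payable_amount.
    return approx_equal(tax_exclusive + tax_amount, payable_amount)
```

For example, lines of 640 and 360 with a header discount of 100 give a tax-exclusive amount of 900; with 225 in tax, the payable amount should be 1125.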

New AL components (ID range 6311–6381)

| Codeunit | ID | Role |
|---|---|---|
| E-Doc. MLLM Verify Tools | 6311 | 6 verification methods (pure AL math) |
| E-Doc. MLLM VL Totals Tool | 6313 | verify_invoice_totals AOAI adapter |
| E-Doc. MLLM VL VAT Tool | 6314 | verify_vat |
| E-Doc. MLLM VL Dates Tool | 6315 | verify_dates |
| E-Doc. MLLM VL Required Tool | 6316 | verify_required_fields |
| E-Doc. MLLM VL Ranges Tool | 6317 | verify_ranges |
| E-Document MLLM Handler V2 | 6318 | Main handler (agentic loop) |
| E-Doc. MLLM Extraction Plan | 6340 | SingleInstance checklist + JSON store |
| E-Doc. MLLM Plan Status Tool | 6341 | get_checklist |
| E-Doc. MLLM Plan Analyze Tool | 6342 | analyze_invoice |
| E-Doc. MLLM Plan Mark Tool | 6344 | mark_item |
| E-Doc. MLLM VL Math Tool | 6339 | verify_line_math |
| E-Doc. MLLM VL Payable Tool | 6345 | verify_payable |
| E-Doc. MLLM Plan Submit Tool | 6346 | submit_extraction |

Status

🚧 Draft — architecture under active refinement. A simplified 2-tool design (analyze_invoice + submit_and_verify) is being evaluated to reduce model coordination overhead.

Test plan

  • Compile E-Document Core + System Application
  • Run EDoc MLLM Tests (existing V1 tests unaffected)
  • Run EDoc MLLM Verify Tools Tests (new unit tests)
  • Smoke test Swedish invoice (comma decimal, two chained 20% discounts)
  • Smoke test DK invoice with header discount and payable amount
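
As a reference point for the Swedish smoke test, the expected numbers can be worked out by hand. The Python sketch below shows the arithmetic only; the helper names are hypothetical and it does not mirror how AL's Evaluate parses values. It assumes the printed invoice uses a space (or non-breaking space) as thousands separator and a comma as decimal separator.

```python
from decimal import Decimal


def parse_swedish_decimal(text: str) -> Decimal:
    """Parse '1 234,50' style text: space or NBSP thousands separator, comma decimal."""
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return Decimal(cleaned)


def apply_chained_discounts(amount: Decimal, percents) -> Decimal:
    """Apply each percentage discount to the result of the previous one."""
    for percent in percents:
        amount = amount * (Decimal("100") - percent) / Decimal("100")
    return amount
```

Two chained 20% discounts are not a 40% discount: 100 becomes 80 after the first and 64 after the second, an effective 36%. A line-level check that simply sums the percentages would flag a correct extraction as wrong.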

🤖 Generated with Claude Code

Magnus Hartvig Grønbech and others added 30 commits March 31, 2026 08:51
…mport

Add full credit memo pipeline: PEPPOL CreditNote XML parsing, per-type
Process Draft enum dispatch, shared PrepareDraft helper, FinishDraft
credit memo creation with ISV extension interface.

- Parse CreditNote XML with BillingReference extraction (warning on empty)
- Add enum values "Purchase Invoice" (1) and "Purchase Credit Memo" (2),
  obsolete "Purchase Document" (0) with Pending tag 29.0
- Extract shared PrepareDraft logic into E-Doc. Prepare Draft Helper (6406)
- Create Prepare Purch. Cr. Memo Draft (6403) returning correct E-Document Type
- Create E-Doc. Create Purch. Cr. Memo (6404) with IEDocumentCreatePurchaseCreditMemo
  interface (6405) for ISV customization
- Wire E-Document Type value 10 to new FinishDraft implementation
- Add field 39 "Applies-to Doc. No." to staging header table
- Add PEPPOL CreditNote test XML and 5 new tests covering parsing,
  enum routing, E2E pipeline, FinishDraft undo, and invoice regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… naming and tags

- Extract shared ApplyDraftToBC logic (amounts, E-Document Link, attachments,
  totals validation) into FinalizeCreatedDocument/RevertCreatedDocument on
  E-Doc. Purch. Doc. Helper — both invoice and credit memo codeunits now
  delegate to it, keeping only type-specific dispatch
- Rename "E-Doc. Prepare Draft Helper" to "Prepare Purchase Draft"
- Add #if not CLEAN29 tags around obsoleted enum value 0 "Purchase Document"
- Fix telemetry tag to empty string per convention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…achments and charges

- Fix CreditNote DueDate XPath: use PaymentMeans/PaymentDueDate per
  PEPPOL BIS 3.0 spec (Invoice uses top-level cbc:DueDate, CreditNote
  does not)
- Add document attachment extraction from AdditionalDocumentReference
  with embedded base64 binary objects
- Add document-level AllowanceCharge line creation for charges
  (ChargeIndicator=true), matching V1 behavior
- Fix Customer EndpointID: only set GLN when schemeID=0088, store
  full schemeID:value in Customer Company Id
- Fix Description priority: use mandatory Item Name as primary,
  fallback to Description only if Name absent
- Rename XML Utility to PEPPOL Utility (codeunit 6401)
- Add PEPPOL BIS 3.0 spec reference comments throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
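
The EndpointID and Description rules in the commit above are small enough to state as code. A minimal Python sketch, assuming dictionary keys and function names chosen for illustration (they are not the AL field or procedure names):

```python
GLN_SCHEME_ID = "0088"  # schemeID that identifies a GLN, per the commit message


def map_customer_endpoint(scheme_id: str, value: str) -> dict:
    """Store schemeID:value as the company id; set the GLN only for scheme 0088."""
    return {
        "customer_company_id": f"{scheme_id}:{value}",
        "customer_gln": value if scheme_id == GLN_SCHEME_ID else "",
    }


def pick_description(item_name: str, description: str) -> str:
    """Item Name is primary; fall back to Description only when Name is absent."""
    return item_name if item_name else description
```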
Add 10 new test cases covering all completeness document items:
- Document-level charges/allowances (charge creates line, allowance does not)
- Embedded document attachments (base64 extraction, external URI skip)
- CreditNote without DueDate (PaymentMeans/PaymentDueDate absent)
- Description cascade (Name priority, Description fallback)
- PayeeParty override (vendor name and VAT ID)
- StandardItemIdentification priority over SellersItemIdentification
- Customer endpoint schemeID logic (GLN only for 0088)
- Multiple VAT rates and zero-rated VAT category Z

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…data, valid attachment content

- Add RegistrationName fallback for vendor and customer name when
  PartyName/Name is absent (optional in PEPPOL BIS 3.0)
- Fix test GLN to valid 13-digit value matching Text[13] field
- Use valid PDF and PNG content in attachment test XML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r PEPPOL BIS 3.0

CreditNote has no top-level cbc:DueDate. Per spec, the due date is
at cac:PaymentMeans/cbc:PaymentDueDate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
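
The path difference described in this commit can be illustrated with Python's standard ElementTree. This is a sketch of the lookup only; `extract_due_date` is a hypothetical helper, not the AL procedure, and it returns the raw text without parsing the date.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard UBL 2.x namespaces used by PEPPOL BIS 3.0 documents.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}


def extract_due_date(xml_text: str) -> Optional[str]:
    """Return the due date text, choosing the path by document root element."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip the '{namespace}' prefix
    if local_name == "CreditNote":
        path = "cac:PaymentMeans/cbc:PaymentDueDate"
    else:
        path = "cbc:DueDate"
    node = root.find(path, NS)
    return node.text if node is not None else None
```

Returning `None` when the element is absent matches the later test case for a CreditNote without a due date.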
…lingReference

Populate the Applies-to Doc. No. field on the BC Purchase Credit Memo
from the PEPPOL CreditNote BillingReference. Uses direct assignment
instead of Validate to avoid triggering Vendor Ledger Entry lookups,
since the BillingReference is the vendor's invoice ID, not a BC
document number. Consolidates Modify calls in CreatePurchaseCreditMemo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split responsibilities between handler and utility:
- Handler (224 lines): orchestration only — what to parse, in what order,
  document-type-specific dispatch (Invoice vs CreditNote doc info)
- Utility (339 lines): reusable PEPPOL extraction — party info, amounts,
  dates, currency, line fields, attachment decoding, MIME mapping

Moved to utility: PopulateSupplierInfo, PopulateCustomerInfo,
PopulateAmountsAndDates, PopulateCurrency, SetCurrencyIfForeign,
PopulatePurchaseLine, ExtractAttachment, MimeToFileExtension.

Methods taking internal table types use 'internal' access to satisfy
AL0749 (public codeunit exposing internal parameter types).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… mapping, attachments, XPath

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Data Handling codeunit, test assertions

- Replace trial-and-error auto-detection (incompatible with try-function context)
  with namespace-based definition matching against DataExchLineDef.Namespace
- Run Data Handling Codeunit (1214) after ImportToDataExch to populate
  Intermediate Data Import records (skip Pre-Mapping codeunit 6156 only)
- Use local variables instead of EDocument.Modify() (record passed by value)
- Fix test Sub Total assertions to match actual XML TaxExclusiveAmount
…verflow

- AA0228: Remove unused CreateDataExch local method
- AA0137: Remove unused EDocumentPurchaseHeader param from MapIntermediateLineFields
- AA0005: Remove unnecessary begin..end around single if-else in SupplementWithXPath
- AA0139: Change ExtractXPathField to return Text, callers use CopyStr to prevent overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CodeCop analyzer requires FindSet() to be used with a
repeat...until Next() = 0 loop. Refactored the line assertions
to use a loop with a case statement instead of standalone
FindSet() followed by Next().
Include the actual error message from the Error Message table
when the processing step fails, so build logs reveal the root
cause instead of a generic assertion failure.
The Error Message table is not available in Clean builds. Use
E-Document Log fields for diagnostics instead.
Clean build requires text constants/labels for StrSubstNo format
strings. Use concatenation instead for diagnostic output.
In CI environments, the shipped EDOCPEPPOLINVIMP and
EDOCPEPPOLCRMEMOIMP definitions may not exist if the install
codeunit hasn't run. Explicitly create them in Initialize()
using the E-Document Install codeunit.
…PPOL/DataExch

Renames staging field 40 from "Vendor Invoice No." to "Applies-to Ext. Invoice No."
to clearly distinguish the credit note's external invoice reference from
"Sales Invoice No." (the document's own number).

Both PEPPOL and Data Exchange handlers now store BillingReference in the
same field. Credit memo creation resolves the external reference to an
internal BC "Applies-to Doc. No." by looking up posted purchase invoices.

Moved the field from the draft page to the extracted data view page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renames codeunit 6407 from "E-Document Data Exch. Handler" to
"E-Doc. PEPPOL DX Handler" to clarify it handles PEPPOL via
Data Exchange definitions.

Passes BestDocType through RunPipelineAndBridge to SupplementWithXPath
instead of relying on EDocument."Document Type" which may not be set
at read-into-draft time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creates new PEPPOL import definitions (EDOCPEPPOLINVIMPV2,
EDOCPEPPOLCRMEMOIMPV2) stored as XML resource files and loaded via
NavApp.GetResource. These v2 definitions have no pre-mapping codeunit,
making them safe for the v2 import pipeline where Prepare Draft handles
vendor/GL resolution separately.

The DX handler now calls ProcessDataExchange conformantly instead of
manually running individual pipeline steps. FindBestDataExchDef matches
the document namespace against known PEPPOL BIS 3.0 namespaces directly.

Renames the enum caption from "Data Exchange" to
"PEPPOL BIS 3 - Data Exchange" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
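
Namespace-based matching of this kind can be sketched as a dictionary lookup on the root element's namespace. The Python below is illustrative only: the function name echoes FindBestDataExchDef but the body is an assumption about its behaviour, and the mapping simply pairs the two definition codes named in the commit with the standard UBL Invoice and CreditNote namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Definition codes come from the commit message; the pairing is an assumption.
DEFINITIONS_BY_NAMESPACE = {
    "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2": "EDOCPEPPOLINVIMPV2",
    "urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2": "EDOCPEPPOLCRMEMOIMPV2",
}


def find_best_data_exch_def(xml_text: str) -> Optional[str]:
    """Pick a definition code from the root namespace, or None if unknown."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as '{namespace}LocalName'.
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    return DEFINITIONS_BY_NAMESPACE.get(namespace)
```

A direct lookup avoids the earlier trial-and-error approach of running each definition until one succeeds, which an earlier commit notes is incompatible with a try-function context.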
…ridge

V2 definitions now target E-Document Purchase Header (6100) and
E-Document Purchase Line (6101) directly instead of BC standard tables.
This makes the field mapping fully configurable through the product UI.

Added 6 new XML columns per definition to match PEPPOL handler 1:1:
- SupplierRegistrationName, SupplierContactName, SupplierTaxSchemeCompanyID
- PayeeLegalEntityCompanyID, CustomerRegistrationName, CustLegalEntityCompanyID

Replaced the hardcoded MapPurchaseHeaderField/MapPurchaseLineField case
statements with a generic RecordRef-based bridge that reads intermediate
data by staging table field IDs. Post-processing handles Total VAT
calculation, Amount Due, and Currency LCY-blank convention.

Removed XPath supplement fallback — no longer needed since the Data
Exchange definitions now map directly to staging fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Magnus Hartvig Grønbech and others added 27 commits May 27, 2026 08:52
…oice, get_checklist)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…n handler

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ven Phase 3

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…all messages to match tool results"

This reverts commit 2df9aed.
…ts; remove price/qty > 0 from verify_ranges

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… result

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…able to checklist

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions (Contributor)

Could not find a linked ADO work item. Please link one by using the pattern 'AB#' followed by the relevant work item number. You may use the 'Fixes' keyword to automatically resolve the work item when the pull request is merged. E.g. 'Fixes AB#1234'
