E-Document MLLM Extraction V2 — Agentic Plan-Act-Verify#8365
Draft
Groenbech96 wants to merge 89 commits into
…mport
Add full credit memo pipeline: PEPPOL CreditNote XML parsing, per-type Process Draft enum dispatch, shared PrepareDraft helper, FinishDraft credit memo creation with ISV extension interface.
- Parse CreditNote XML with BillingReference extraction (warning on empty)
- Add enum values "Purchase Invoice" (1) and "Purchase Credit Memo" (2), obsolete "Purchase Document" (0) with Pending tag 29.0
- Extract shared PrepareDraft logic into E-Doc. Prepare Draft Helper (6406)
- Create Prepare Purch. Cr. Memo Draft (6403) returning correct E-Document Type
- Create E-Doc. Create Purch. Cr. Memo (6404) with IEDocumentCreatePurchaseCreditMemo interface (6405) for ISV customization
- Wire E-Document Type value 10 to new FinishDraft implementation
- Add field 39 "Applies-to Doc. No." to staging header table
- Add PEPPOL CreditNote test XML and 5 new tests covering parsing, enum routing, E2E pipeline, FinishDraft undo, and invoice regression
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… naming and tags
- Extract shared ApplyDraftToBC logic (amounts, E-Document Link, attachments, totals validation) into FinalizeCreatedDocument/RevertCreatedDocument on E-Doc. Purch. Doc. Helper; both invoice and credit memo codeunits now delegate to it, keeping only type-specific dispatch
- Rename "E-Doc. Prepare Draft Helper" to "Prepare Purchase Draft"
- Add #if not CLEAN29 tags around obsoleted enum value 0 "Purchase Document"
- Fix telemetry tag to empty string per convention
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…achments and charges
- Fix CreditNote DueDate XPath: use PaymentMeans/PaymentDueDate per PEPPOL BIS 3.0 spec (Invoice uses top-level cbc:DueDate, CreditNote does not)
- Add document attachment extraction from AdditionalDocumentReference with embedded base64 binary objects
- Add document-level AllowanceCharge line creation for charges (ChargeIndicator=true), matching V1 behavior
- Fix Customer EndpointID: only set GLN when schemeID=0088, store full schemeID:value in Customer Company Id
- Fix Description priority: use mandatory Item Name as primary, fallback to Description only if Name absent
- Rename XML Utility to PEPPOL Utility (codeunit 6401)
- Add PEPPOL BIS 3.0 spec reference comments throughout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
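The Customer EndpointID rule in this commit can be illustrated with a small Python sketch. This is not the AL code from the PR: the function name `map_customer_endpoint` and the `CustomerEndpoint` type are hypothetical, and only the rule itself (GLN set only for schemeID 0088, full `schemeID:value` stored as the company id) comes from the commit message.

```python
from dataclasses import dataclass

# Scheme identifier that the commit message associates with GLN.
GLN_SCHEME_ID = "0088"


@dataclass
class CustomerEndpoint:
    gln: str         # empty unless the endpoint scheme is 0088
    company_id: str  # always stored as "schemeID:value"


def map_customer_endpoint(scheme_id: str, value: str) -> CustomerEndpoint:
    """Hypothetical helper mirroring the rule described in the commit message."""
    gln = value if scheme_id == GLN_SCHEME_ID else ""
    return CustomerEndpoint(gln=gln, company_id=f"{scheme_id}:{value}")
```

With this rule, an endpoint using any other scheme still ends up in the company id, but never in the GLN field.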
Add 10 new test cases covering all completeness document items:
- Document-level charges/allowances (charge creates line, allowance does not)
- Embedded document attachments (base64 extraction, external URI skip)
- CreditNote without DueDate (PaymentMeans/PaymentDueDate absent)
- Description cascade (Name priority, Description fallback)
- PayeeParty override (vendor name and VAT ID)
- StandardItemIdentification priority over SellersItemIdentification
- Customer endpoint schemeID logic (GLN only for 0088)
- Multiple VAT rates and zero-rated VAT category Z
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…data, valid attachment content
- Add RegistrationName fallback for vendor and customer name when PartyName/Name is absent (optional in PEPPOL BIS 3.0)
- Fix test GLN to valid 13-digit value matching Text[13] field
- Use valid PDF and PNG content in attachment test XML
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r PEPPOL BIS 3.0 CreditNote has no top-level cbc:DueDate. Per spec, the due date is at cac:PaymentMeans/cbc:PaymentDueDate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
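To make the difference between the two due-date locations concrete, here is a minimal Python sketch using the standard library XML parser. It is an illustration only, not the PR's AL implementation; `extract_due_date` and the sample documents are hypothetical, while the element paths and the UBL namespace URIs are the ones the commit messages and PEPPOL BIS 3.0 refer to.

```python
import xml.etree.ElementTree as ET

# UBL 2.x common namespaces used by PEPPOL BIS 3.0 documents.
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}


def extract_due_date(root: ET.Element) -> str:
    """Return the due date text, or "" when the element is absent."""
    local_name = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    if local_name == "CreditNote":
        path = "cac:PaymentMeans/cbc:PaymentDueDate"
    else:
        path = "cbc:DueDate"
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else ""


# Hypothetical, heavily trimmed sample documents for illustration.
CREDIT_NOTE = f"""<CreditNote
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2"
    xmlns:cac="{NS['cac']}" xmlns:cbc="{NS['cbc']}">
  <cac:PaymentMeans>
    <cbc:PaymentDueDate>2025-02-28</cbc:PaymentDueDate>
  </cac:PaymentMeans>
</CreditNote>"""

INVOICE = f"""<Invoice
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="{NS['cbc']}">
  <cbc:DueDate>2025-01-31</cbc:DueDate>
</Invoice>"""
```

A CreditNote with no PaymentMeans block yields an empty string here, which corresponds to the "CreditNote without DueDate" test case listed above.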
…lingReference Populate the Applies-to Doc. No. field on the BC Purchase Credit Memo from the PEPPOL CreditNote BillingReference. Uses direct assignment instead of Validate to avoid triggering Vendor Ledger Entry lookups, since the BillingReference is the vendor's invoice ID, not a BC document number. Consolidates Modify calls in CreatePurchaseCreditMemo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split responsibilities between handler and utility:
- Handler (224 lines): orchestration only (what to parse, in what order, document-type-specific dispatch for Invoice vs CreditNote doc info)
- Utility (339 lines): reusable PEPPOL extraction (party info, amounts, dates, currency, line fields, attachment decoding, MIME mapping)
Moved to utility: PopulateSupplierInfo, PopulateCustomerInfo, PopulateAmountsAndDates, PopulateCurrency, SetCurrencyIfForeign, PopulatePurchaseLine, ExtractAttachment, MimeToFileExtension.
Methods taking internal table types use 'internal' access to satisfy AL0749 (public codeunit exposing internal parameter types).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… mapping, attachments, XPath Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…line compatibility
… Data Handling codeunit, test assertions
- Replace trial-and-error auto-detection (incompatible with try-function context) with namespace-based definition matching against DataExchLineDef.Namespace
- Run Data Handling Codeunit (1214) after ImportToDataExch to populate Intermediate Data Import records (skip Pre-Mapping codeunit 6156 only)
- Use local variables instead of EDocument.Modify() (record passed by value)
- Fix test Sub Total assertions to match actual XML TaxExclusiveAmount
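Namespace-based definition matching can be sketched in Python as a lookup on the document's root namespace. The function names and the dictionary are hypothetical stand-ins for the lookup against DataExchLineDef.Namespace; the definition codes are the ones named elsewhere in this commit list, and the namespace URIs are the standard UBL Invoice and CreditNote namespaces.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed mapping from root namespace to Data Exchange definition code.
DEFINITIONS_BY_NAMESPACE = {
    "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2": "EDOCPEPPOLINVIMP",
    "urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2": "EDOCPEPPOLCRMEMOIMP",
}


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element, or "" if it has none."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def find_definition(xml_text: str) -> Optional[str]:
    """Pick a definition by namespace instead of trying each one in turn."""
    return DEFINITIONS_BY_NAMESPACE.get(root_namespace(xml_text))
```

Because the match is a pure lookup, nothing has to fail and be caught to find the right definition, which is the property the commit message relies on when it calls trial-and-error detection incompatible with try-function context.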
…verflow
- AA0228: Remove unused CreateDataExch local method
- AA0137: Remove unused EDocumentPurchaseHeader param from MapIntermediateLineFields
- AA0005: Remove unnecessary begin..end around single if-else in SupplementWithXPath
- AA0139: Change ExtractXPathField to return Text, callers use CopyStr to prevent overflow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Next Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CodeCop analyzer requires FindSet() to be used with a repeat...until Next() = 0 loop. Refactored the line assertions to use a loop with a case statement instead of standalone FindSet() followed by Next().
Include the actual error message from the Error Message table when the processing step fails, so build logs reveal the root cause instead of a generic assertion failure.
The Error Message table is not available in Clean builds. Use E-Document Log fields for diagnostics instead.
Clean build requires text constants/labels for StrSubstNo format strings. Use concatenation instead for diagnostic output.
In CI environments, the shipped EDOCPEPPOLINVIMP and EDOCPEPPOLCRMEMOIMP definitions may not exist if the install codeunit hasn't run. Explicitly create them in Initialize() using the E-Document Install codeunit.
…PPOL/DataExch Renames staging field 40 from "Vendor Invoice No." to "Applies-to Ext. Invoice No." to clearly distinguish the credit note's external invoice reference from "Sales Invoice No." (the document's own number). Both PEPPOL and Data Exchange handlers now store BillingReference in the same field. Credit memo creation resolves the external reference to an internal BC "Applies-to Doc. No." by looking up posted purchase invoices. Moved the field from the draft page to the extracted data view page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renames codeunit 6407 from "E-Document Data Exch. Handler" to "E-Doc. PEPPOL DX Handler" to clarify it handles PEPPOL via Data Exchange definitions. Passes BestDocType through RunPipelineAndBridge to SupplementWithXPath instead of relying on EDocument."Document Type" which may not be set at read-into-draft time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creates new PEPPOL import definitions (EDOCPEPPOLINVIMPV2, EDOCPEPPOLCRMEMOIMPV2) stored as XML resource files and loaded via NavApp.GetResource. These v2 definitions have no pre-mapping codeunit, making them safe for the v2 import pipeline where Prepare Draft handles vendor/GL resolution separately. The DX handler now calls ProcessDataExchange conformantly instead of manually running individual pipeline steps. FindBestDataExchDef matches the document namespace against known PEPPOL BIS 3.0 namespaces directly. Renames the enum caption from "Data Exchange" to "PEPPOL BIS 3 - Data Exchange" for clarity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ridge
V2 definitions now target E-Document Purchase Header (6100) and E-Document Purchase Line (6101) directly instead of BC standard tables. This makes the field mapping fully configurable through the product UI.
Added 6 new XML columns per definition to match PEPPOL handler 1:1:
- SupplierRegistrationName, SupplierContactName, SupplierTaxSchemeCompanyID
- PayeeLegalEntityCompanyID, CustomerRegistrationName, CustLegalEntityCompanyID
Replaced the hardcoded MapPurchaseHeaderField/MapPurchaseLineField case statements with a generic RecordRef-based bridge that reads intermediate data by staging table field IDs. Post-processing handles Total VAT calculation, Amount Due, and Currency LCY-blank convention.
Removed XPath supplement fallback, which is no longer needed since the Data Exchange definitions now map directly to staging fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…her AI logic stays debuggable
…ges (already called by SDK internally)
…on reasoning in prompt
…to open with MLLM-overestimated discounts
…oice, get_checklist) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…n handler Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ven Phase 3 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…after each verify
…ages to match tool results
…all messages to match tool results" This reverts commit 2df9aed.
…t in Phase 1 and 2
…ts; remove price/qty > 0 from verify_ranges Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… result Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…able to checklist Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…lls submit_extraction
Could not find a linked ADO work item. Please link one by using the pattern 'AB#' followed by the relevant work item number. You may use the 'Fixes' keyword to automatically resolve the work item when the pull request is merged. E.g. 'Fixes AB#1234'
Summary
Replaces the single-pass MLLM extraction (V1) with an agentic plan-act-verify loop. The model identifies invoice structure, extracts from identified regions, and self-corrects by calling AL-implemented verification tools — rather than sweeping left-to-right across the full text.
Registered as "MLLM V2" enum value on "Structure Data Impl." alongside V1 (no breaking change).
Architecture
One AOAI call loop (GPT-4.1 Mini, temperature 0) with verification tools:
- analyze_invoice(...): model records document structure, locale, column roles, line IDs. Initialises the verification checklist.
- submit_extraction(json) to save it.
- mark_item to track progress. On failure: corrects, re-submits, re-verifies. Exits when get_checklist() shows all passed.
Handler reads the saved JSON directly, with no re-generation step.
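The control flow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PR's AL code: in the real design the model drives the loop through tool calls, whereas here a hypothetical `correct` callback stands in for the model, `run_plan_act_verify` and `max_rounds` are invented for the sketch, and only the names `mark_item` and `get_checklist` come from the description.

```python
from typing import Callable, Dict, List, Tuple

Extraction = Dict[str, float]
Check = Callable[[Extraction], bool]


class Checklist:
    """Tracks which verification items have passed."""

    def __init__(self, item_names: List[str]) -> None:
        self._status: Dict[str, bool] = {name: False for name in item_names}

    def mark_item(self, name: str, passed: bool) -> None:
        self._status[name] = passed

    def get_checklist(self) -> Dict[str, bool]:
        return dict(self._status)

    def all_passed(self) -> bool:
        return all(self._status.values())


def run_plan_act_verify(
    extraction: Extraction,
    checks: Dict[str, Check],
    correct: Callable[[Extraction, List[str]], Extraction],
    max_rounds: int = 5,  # assumed safety limit, not taken from the PR
) -> Tuple[Extraction, Dict[str, bool]]:
    checklist = Checklist(list(checks))
    saved = dict(extraction)  # stands in for submit_extraction(json)
    for _ in range(max_rounds):
        for name, check in checks.items():
            checklist.mark_item(name, check(saved))
        if checklist.all_passed():
            return saved, checklist.get_checklist()
        failed = [n for n, ok in checklist.get_checklist().items() if not ok]
        saved = correct(saved, failed)  # correct, then re-submit
    raise RuntimeError("verification did not converge")
```

The returned `saved` value is the last submitted extraction, which matches the note that the handler reads the saved JSON rather than asking the model to generate it again.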
Key fixes in this branch
- Evaluate(..., 9) instead of AsDecimal(): handles XML/Swedish number formats correctly
- "0" to number 0: stops MLLM from returning locale-formatted strings
- allowance_charge.percent preferred over amount.value for line discounts
- verify_invoice_totals accounts for header-level discount; new verify_payable checks tax_exclusive + tax_amount ≈ payable_amount
- SetHistoryLength(500) prevents the 10-message sliding window from splitting assistant tool-call messages from their results
- [NonDebuggable] from all methods except tiny UnwrapSecret helpers that call .Unwrap() on SecretText
New AL components (ID range 6311–6381)
- E-Doc. MLLM Verify Tools
- E-Doc. MLLM VL Totals Tool: verify_invoice_totals AOAI adapter
- E-Doc. MLLM VL VAT Tool: verify_vat
- E-Doc. MLLM VL Dates Tool: verify_dates
- E-Doc. MLLM VL Required Tool: verify_required_fields
- E-Doc. MLLM VL Ranges Tool: verify_ranges
- E-Document MLLM Handler V2
- E-Doc. MLLM Extraction Plan
- E-Doc. MLLM Plan Status Tool: get_checklist
- E-Doc. MLLM Plan Analyze Tool: analyze_invoice
- E-Doc. MLLM Plan Mark Tool: mark_item
- E-Doc. MLLM VL Math Tool: verify_line_math
- E-Doc. MLLM VL Payable Tool: verify_payable
- E-Doc. MLLM Plan Submit Tool: submit_extraction
Status
🚧 Draft: architecture under active refinement. A simplified 2-tool design (analyze_invoice + submit_and_verify) is being evaluated to reduce model coordination overhead.
Test plan
- EDoc MLLM Tests (existing V1 tests unaffected)
- EDoc MLLM Verify Tools Tests (new unit tests)
🤖 Generated with Claude Code
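As an illustration of the kind of rule the verification tool tests exercise, the verify_payable check (tax_exclusive + tax_amount ≈ payable_amount) might look like this in Python. It is a sketch, not the AL tool from the PR: the tolerance of 0.01 and the shape of the returned result are assumptions, and amounts are passed as strings so that `Decimal` parses them exactly.

```python
from decimal import Decimal
from typing import Dict, Union


def verify_payable(
    tax_exclusive: str,
    tax_amount: str,
    payable_amount: str,
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding tolerance
) -> Dict[str, Union[bool, str]]:
    """Check that tax_exclusive + tax_amount is approximately payable_amount."""
    difference = abs(
        Decimal(tax_exclusive) + Decimal(tax_amount) - Decimal(payable_amount)
    )
    return {"passed": difference <= tolerance, "difference": str(difference)}
```

Returning the difference alongside the pass/fail flag gives the model something concrete to correct against when the check fails.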