feat(ocr): Add image dimension limit and progress callback for PDF OCR #1713

Open

ErixWong wants to merge 1 commit into microsoft:main from ErixWong:feat/pdf-ocr-enhancements

ErixWong commented Apr 11, 2026

This PR enhances the PDF OCR converter with two new features:

1. Image Dimension Limiting (default 1500px)

Prevents oversized images from being sent to LLM Vision APIs
Configurable via MARKITDOWN_MAX_IMAGE_DIMENSION environment variable
Uses LANCZOS resampling for quality preservation
Backward compatible - defaults to 1500px if not specified
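The resizing behavior described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function names `max_dimension_from_env`, `target_size`, and `limit_image` are hypothetical, and the fallback to the default on an invalid or non-positive value is an assumption. Only the environment variable name, the 1500px default, and the use of LANCZOS resampling come from the description.

```python
import os

DEFAULT_MAX_DIMENSION = 1500  # default stated in this PR description


def max_dimension_from_env() -> int:
    """Read MARKITDOWN_MAX_IMAGE_DIMENSION, falling back to the default.

    Falling back on invalid or non-positive values is an assumption.
    """
    raw = os.environ.get("MARKITDOWN_MAX_IMAGE_DIMENSION", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_DIMENSION
    return value if value > 0 else DEFAULT_MAX_DIMENSION


def target_size(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough, leave untouched
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def limit_image(image, max_dim=None):
    """Downscale a Pillow image with LANCZOS if it exceeds the limit."""
    # Pillow is imported lazily so the helpers above work without it.
    from PIL import Image

    if max_dim is None:
        max_dim = max_dimension_from_env()
    new_size = target_size(image.width, image.height, max_dim)
    if new_size == image.size:
        return image
    return image.resize(new_size, Image.Resampling.LANCZOS)
```

Under this sketch, a 3000x2000 page image would be reduced to 1500x1000 before being sent to the vision API, while an 800x600 image would pass through unchanged.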

2. Progress Callback Support

Allows applications to track PDF processing progress
Reports progress percentage and current operation
Works for both regular PDF parsing and full-page OCR fallback
Optional parameter - no callback means no progress reporting
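To illustrate the callback contract described above, here is a hedged sketch of how an optional progress callback might be threaded through page processing. The `ProgressCallback` type, the `(percent, operation)` argument order, and the `process_pages` function are assumptions made for illustration; the actual parameter name and signature in this PR may differ.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical signature: (percent complete, description of current operation)
ProgressCallback = Callable[[int, str], None]


def process_pages(
    pages: Iterable[str],
    progress_callback: Optional[ProgressCallback] = None,
) -> List[str]:
    """Process pages one by one, reporting progress if a callback is given."""
    page_list = list(pages)
    total = len(page_list)
    results = []
    for index, page in enumerate(page_list, start=1):
        # Placeholder for the real per-page work (text extraction or OCR).
        results.append(page.upper())
        if progress_callback is not None:
            percent = index * 100 // total
            progress_callback(percent, f"Processing page {index}/{total}")
    return results


def print_progress(percent: int, operation: str) -> None:
    """Example callback that writes progress to stdout."""
    print(f"[{percent:3d}%] {operation}")
```

Calling `process_pages(pages, progress_callback=print_progress)` would emit one line per page, and omitting the argument keeps the call silent, which matches the "no callback means no progress reporting" behavior.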

Benefits

Better performance with large PDFs containing high-res images
Improved user experience for long-running operations
Reduced API costs by resizing images before sending
Backward compatible - all new parameters are optional

Testing

Tested with various PDF files including scanned documents
Verified progress callback works correctly
Confirmed image resizing maintains OCR quality

Fixes: N/A (feature enhancement)


Commit 2eb8517: feat(ocr): Add image dimension limit and progress callback for PDF OCR
