
feat(ocr): Add image dimension limit and progress callback for PDF OCR #1713

Open
ErixWong wants to merge 1 commit into microsoft:main from ErixWong:feat/pdf-ocr-enhancements
ErixWong commented Apr 11, 2026

This PR enhances the PDF OCR converter with two new features:

1. Image Dimension Limiting (default 1500px)

  • Prevents oversized images from being sent to LLM Vision APIs
  • Configurable via the MARKITDOWN_MAX_IMAGE_DIMENSION environment variable
  • Uses LANCZOS resampling to preserve quality when downscaling
  • Backward compatible: defaults to 1500px when the variable is not set
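
As an illustration of the dimension limit described above, here is a minimal sketch and not the code from this PR. The helper names (`get_max_dimension`, `target_size`, `limit_image_dimension`) are hypothetical, and the sketch assumes the limit applies to the longer side of the image and that an invalid or non-positive value falls back to the default. Only the environment variable name, the 1500px default, and LANCZOS resampling come from the description.

```python
import os
from typing import Optional, Tuple

DEFAULT_MAX_DIMENSION = 1500  # default stated in the PR description


def get_max_dimension() -> int:
    """Read the limit from the environment, falling back to the default."""
    raw = os.environ.get("MARKITDOWN_MAX_IMAGE_DIMENSION", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric value: use the default (assumed behavior).
        return DEFAULT_MAX_DIMENSION
    return value if value > 0 else DEFAULT_MAX_DIMENSION


def target_size(width: int, height: int, max_dimension: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dimension."""
    longest = max(width, height)
    if longest <= max_dimension:
        return (width, height)  # already small enough, leave untouched
    scale = max_dimension / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def limit_image_dimension(image, max_dimension: Optional[int] = None):
    """Return a downscaled copy of a PIL image if it exceeds the limit."""
    # Pillow is imported lazily so the pure helpers above work without it.
    from PIL import Image

    if max_dimension is None:
        max_dimension = get_max_dimension()
    new_size = target_size(image.width, image.height, max_dimension)
    if new_size == image.size:
        return image
    return image.resize(new_size, Image.Resampling.LANCZOS)
```

Under these assumptions a 3000x2000 page image is reduced to 1500x1000 before it is sent to the vision API, while an 800x600 image passes through unchanged.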

2. Progress Callback Support

  • Allows applications to track PDF processing progress
  • Reports progress percentage and current operation
  • Works for both regular PDF parsing and full-page OCR fallback
  • Optional parameter: if no callback is passed, no progress is reported
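
To show the general shape of an optional progress callback, here is a minimal sketch. It is not the API added by this PR: the `(percent, message)` callback signature, the `progress_callback` parameter name, and the `process_pages` function are all assumptions made for illustration, and the per-page work is a stand-in.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical callback type: (percent complete, current operation).
ProgressCallback = Callable[[float, str], None]


def process_pages(
    pages: Sequence[str],
    progress_callback: Optional[ProgressCallback] = None,
) -> List[str]:
    """Process pages one by one, reporting progress if a callback is given."""
    results: List[str] = []
    total = len(pages)
    for index, page in enumerate(pages, start=1):
        results.append(page.upper())  # stand-in for parsing or OCR of one page
        if progress_callback is not None:
            percent = 100.0 * index / total
            progress_callback(percent, f"Processed page {index}/{total}")
    return results


if __name__ == "__main__":
    # Usage: print each progress update as it arrives.
    process_pages(
        ["page one", "page two"],
        progress_callback=lambda pct, msg: print(f"{pct:.0f}% {msg}"),
    )
```

Callers that do not pass a callback get the same results with no reporting, which is what keeps the parameter optional.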

Benefits

  • Better performance with large PDFs containing high-res images
  • Improved user experience for long-running operations
  • Reduced API costs by resizing images before sending
  • Backward compatible: all new parameters are optional

Testing

  • Tested with various PDF files including scanned documents
  • Verified progress callback works correctly
  • Confirmed image resizing maintains OCR quality

Fixes: N/A (feature enhancement)
