OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.
ocrmypdf # it's a scriptable command line program -l eng+fra # it supports multiple languages --rotate-pages # it can fix pages that are misrotated --deskew # it can deskew crooked PDFs! --title "My PDF" # it can change output metadata --jobs 4 # it uses multiple cores by default --output-type pdfa # it produces PDF/A by default input_scanned.pdf # takes PDF input (or images) output_searchable.pdf # produces validated PDF output
See the release notes for details on the latest changes.
- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses Tesseract OCR engine to recognize more than 100 languages
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs
For details: please consult the documentation.