Scantools is a high-quality library and a matching set of command-line programs for handling and manipulating scanned documents. At present, the tools can convert image files to PDF/A. Files in JBIG2, JPEG, and JPEG2000 format are included in the PDF directly, without re-encoding; all other files are compressed losslessly. HOCR files, which are produced by optical character recognition programs such as ‘tesseract’, can be used to make the PDF file searchable. The resulting files comply with the ISO PDF/A standard for long-term archiving of digital documents and offer compression rates comparable to those of the DJVU file format.
There are currently three command-line utilities:
scantools.image2pdf converts images to a PDF/A-compliant PDF file.
scantools.hocr2any converts HOCR files to text or renders them as raster graphics or PDF files.
scantools.ocrPDF adds a text layer to a graphics-only PDF file, without re-encoding graphics data or otherwise modifying file content.
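To illustrate what an HOCR file contains, and why it is enough to build plain text or a searchable text layer, here is a minimal Python sketch that reads words and their bounding boxes from HOCR markup. It is not scantools code and does not reflect how scantools.hocr2any is implemented; the names `parse_bbox`, `HocrWordParser`, `hocr_words`, and `SAMPLE` are hypothetical, and the sample markup is hand-written in the style of typical OCR output. It assumes words are marked with the `ocrx_word` class and carry a `bbox x0 y0 x1 y1` property in their `title` attribute.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    # hOCR stores properties in the title attribute, separated by ';',
    # for example "bbox 36 92 96 116; x_wconf 95".
    for prop in title.split(";"):
        fields = prop.split()
        if len(fields) == 5 and fields[0] == "bbox":
            return tuple(int(v) for v in fields[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from elements of class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of (text, bbox) in document order
        self._depth = 0    # > 0 while inside an ocrx_word element
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._depth > 0:
            # Nested markup inside a word, such as <strong> or <em>.
            self._depth += 1
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attr.get("title") or "")
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            # The word element itself has just been closed.
            self.words.append(("".join(self._chunks).strip(), self._bbox))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def hocr_words(hocr_text):
    """Parse hOCR markup and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written sample; real files contain a full HTML document.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 640 480'>
<span class='ocr_line' title='bbox 36 92 210 116'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
<span class='ocrx_word' title='bbox 110 92 210 116; x_wconf 91'><strong>world</strong></span>
</span></div>"""

if __name__ == "__main__":
    words = hocr_words(SAMPLE)
    print(" ".join(text for text, _ in words))
    for text, bbox in words:
        print(text, bbox)
```

Joining the word texts gives a plain-text rendering, while the bounding boxes say where each word sits on the page image, which is the information a text layer needs. The sketch ignores line and paragraph structure and assumes well-formed markup with explicit closing tags.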