Tesseract supports Unicode (UTF-8) and can recognize more than 100 languages "out of the box". It can also be trained to recognize other languages. Tesseract supports several output formats: plain text, hOCR (HTML), and PDF.
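
As a rough sketch of how the output formats can be selected from the command line: the file names scan.png and scan are hypothetical placeholders, and the example assumes the English language data (eng) is installed. The guard skips the run when tesseract or the input image is missing.

```shell
# Hypothetical input image and output base name; substitute your own.
image="scan.png"
outbase="scan"

if command -v tesseract >/dev/null 2>&1 && [ -f "$image" ]; then
    # Show which language data files are installed.
    tesseract --list-langs

    # Plain text is the default output; this writes scan.txt.
    tesseract "$image" "$outbase" -l eng

    # The hocr and pdf config names select the other formats,
    # writing scan.hocr and scan.pdf respectively.
    tesseract "$image" "$outbase" -l eng hocr
    tesseract "$image" "$outbase" -l eng pdf
else
    echo "tesseract or $image not found; skipping" >&2
fi
```

The output base name is given without an extension; tesseract appends one according to the chosen format.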

To access files under /media/* or /run/media/*, connect the snap to the core snap's removable-media interface:

$ sudo snap connect tesseract:removable-media

