Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box". It can also be trained to recognize other languages. Tesseract supports several output formats: plain text, hOCR (HTML), and PDF.
If you want to access files under /media/* or /run/media/*, you'll have
to connect the snap to the removable-media interface:
$ sudo snap connect tesseract:removable-media