Surya
Document extraction
An open-source OCR and layout-analysis toolkit covering text, tables, and reading order in many languages.
Surya is a modern OCR toolkit that does text recognition, line and layout detection, reading-order analysis, and table recognition across a wide range of languages.
It comes from the same team behind Marker and is built to be both accurate and fast on a GPU.
Where it's ideally used
A fit when you want modern, multilingual OCR with strong layout and reading-order detection.
Where it doesn't fit
Like any OCR engine, it covers one stage only: it is not a complete ingestion or RAG pipeline on its own.
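To make the "one stage" point concrete, here is a minimal sketch of the kind of work that still sits downstream of an OCR step. It does not call Surya: `OcrLine`, `lines_to_text`, and `chunk_text` are hypothetical names invented for illustration, and the `reading_order` field is only an assumed stand-in for whatever ordering a reading-order step returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrLine:
    """Hypothetical stand-in for one recognized line (not Surya's real output type)."""
    text: str
    reading_order: int  # assumed: position assigned by a reading-order step


def lines_to_text(lines: List[OcrLine]) -> str:
    """Join recognized lines into one string, sorted by reading order."""
    ordered = sorted(lines, key=lambda line: line.reading_order)
    return "\n".join(line.text for line in ordered)


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Greedy line-based chunking, a downstream step the OCR stage does not provide.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for line in text.split("\n"):
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks
```

In a real ingestion or RAG pipeline, chunks like these would still need to be embedded, indexed, and retrieved by other components.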