OCR Explained: How to Turn Photos and Scans into Editable Text (17 Languages)

A scanned page, a photo of a notice board, a screenshot of an error message, a PDF "document" that's secretly just pictures of pages — to a computer, all of these are identical: grids of colored pixels containing zero text. You can't copy a sentence from them, search them, or paste their contents into an email. Optical Character Recognition (OCR) is the bridge: software that looks at pixels the way a reader does and reconstructs the characters.

OCR used to mean expensive desktop software or uploading documents to a cloud service. It now runs entirely inside a browser tab — the Image to Text tool uses the open-source Tesseract engine compiled to run on your own device, supporting seventeen languages including Hindi, Tamil, Telugu and Bengali, with nothing transmitted anywhere. Here's how to get genuinely good results from it, because OCR quality is mostly determined before the software ever runs.

How OCR actually works (the two-minute version)

The engine first cleans the image — straightens it, separates dark from light, finds the lines and words. Then for each word it asks two questions in tension: what do these shapes look like? (character recognition) and what word is statistically plausible here? (a language model). That's why selecting the right language matters enormously — recognizing Hindi with the English model isn't "slightly worse," it's gibberish, because the model is matching Devanagari shapes against Latin letters and English vocabulary.

It's also why the first run downloads a language model file (a few megabytes): that's the trained knowledge of the script and lexicon, cached locally afterward.
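
The "two questions in tension" can be sketched as a toy scoring function. This is an illustration only, not Tesseract's actual internals: the shape scores, the word-frequency table, `pick_word` and `lexicon_weight` are all hypothetical stand-ins. It shows two readings of one blurry word (the classic rn/m confusion), scored on shape alone and then on shape plus word plausibility.

```python
import math

# Hypothetical shape scores: how well each reading matches the pixels (0..1).
SHAPE_SCORES = {"modern": 0.46, "modem": 0.54}

# Hypothetical relative word frequencies, standing in for a language model.
WORD_FREQ = {"modern": 0.0009, "modem": 0.00002}


def pick_word(shape_scores, word_freq, lexicon_weight=0.3, unknown_freq=1e-9):
    """Return the candidate reading with the best combined log-score."""

    def combined(word):
        shape = math.log(shape_scores[word])
        prior = math.log(word_freq.get(word, unknown_freq))
        return shape + lexicon_weight * prior

    return max(shape_scores, key=combined)


print(pick_word(SHAPE_SCORES, WORD_FREQ))                      # shapes + lexicon
print(pick_word(SHAPE_SCORES, WORD_FREQ, lexicon_weight=0.0))  # shapes only
```

On shape alone the slightly better pixel match ("modem") wins. Once word frequency is weighed in, the more plausible "modern" wins. With the wrong language selected, the lexicon side of this sum pulls every word toward vocabulary that isn't on the page.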

The five factors that decide accuracy

OCR on a clean 300-DPI scan of printed text approaches 99% accuracy. The same engine on a blurry, tilted phone photo under tube light might manage 60%. The difference is input quality:

Resolution. Text should be at least ~25–30 pixels tall in the image. A full-page photo from across the desk fails; the same page filling the frame succeeds. When photographing, get close.
Lighting and contrast. Even daylight beats yellow indoor light; avoid shadows of your own phone falling across the page (the classic mistake).
Skew. Hold the camera square to the page. A 10° tilt measurably hurts; OCR straightening helps but can't fully rescue careless angles.
Clean backgrounds. Text on busy images, watermarks, or colored gradients confuses segmentation. Crop to just the text region first with the Image Cropper when an image mixes text with graphics.
Print vs handwriting. Tesseract is built for printed text. Neat handwriting yields partial results; cursive is largely beyond it — that's specialized-model territory, and it's honest to say no free in-browser tool does it well.
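
The resolution rule of thumb can be checked with a little arithmetic before you shoot. The sketch below is a rough estimate under stated assumptions: it treats the nominal font size as the text height (real letters are somewhat shorter, so it errs on the optimistic side), and the function names and the 25-pixel default are illustrative choices based on the guideline above.

```python
MM_PER_POINT = 25.4 / 72  # one typographic point is 1/72 inch


def text_height_px(font_size_pt, page_height_mm, page_height_px):
    """Estimate how tall text of a given point size appears in the image.

    page_height_px is how many pixels the page itself spans in the photo,
    not the full height of the photo.
    """
    px_per_mm = page_height_px / page_height_mm
    return font_size_pt * MM_PER_POINT * px_per_mm


def tall_enough(font_size_pt, page_height_mm, page_height_px, min_px=25):
    """True if the estimated text height meets the minimum pixel height."""
    return text_height_px(font_size_pt, page_height_mm, page_height_px) >= min_px


# 12 pt text on an A4 page (297 mm tall):
print(round(text_height_px(12, 297, 3500)))  # page fills the frame
print(round(text_height_px(12, 297, 1200)))  # page shot from across the desk
```

With the page spanning 3500 pixels the text comes out around 50 pixels tall. At 1200 pixels it drops to roughly 17, below the threshold, which is why getting closer matters more than any setting in the tool.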

Step-by-step: image or scanned PDF to text

Open Image to Text.
Drop in a JPG/PNG/WebP photo — or a scanned PDF: the tool renders every page and recognizes them in sequence with a progress bar, inserting page markers in the output.
Choose the language. For mixed documents (common in India — Hindi headings, English body), pick English + Hindi combined mode.
Click Extract text. First run per language downloads the model; afterward it's cached and works offline.
Proofread the output box — fix the occasional confusion (1/l/I, 0/O, rn/m are the classics) — then copy it or download as .txt.
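
If you save the output as .txt, part of the proofreading step can be pointed in the right direction with a few lines of Python. This is a minimal sketch, not a feature of the tool: `suspicious_tokens` is a hypothetical helper that lists tokens mixing digits with the letters most often mistaken for digits, so you know where to look first. It only flags; it does not correct anything.

```python
import re

# Letters the classic confusions swap in for digits: l and I for 1, O for 0.
DIGIT_LOOKALIKES = set("lIO")


def suspicious_tokens(text):
    """Return tokens that contain both a digit and a digit look-alike letter."""
    flagged = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(ch.isdigit() for ch in token)
        has_lookalike = any(ch in DIGIT_LOOKALIKES for ch in token)
        if has_digit and has_lookalike:
            flagged.append(token)
    return flagged


print(suspicious_tokens("Invoice 2O24 total 1l50 paid on 12 March"))
```

Legitimate codes that mix letters and digits will be flagged too, so treat the result as a review list rather than a list of errors.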

For numbers-heavy documents destined for spreadsheets, paste the proofread text into the Data Converter or rebuild the table directly in Excel. If your PDF already has selectable text (try selecting it!), skip OCR entirely and use PDF to Excel, which reads digital text positions far more accurately than recognition ever can.

Real workflows people use this for

Old records to searchable archives — scan, OCR, save text alongside the image; suddenly decades of paper are Ctrl+F-able.

Quoting from books and printed reports — photograph the page, extract, paste the quote (with attribution where it's due) instead of retyping.

Hindi and regional-language documents — government notices, society circulars, regional contracts: OCR them in their own script, then translate or archive. The combined English+Hindi mode handles the very common mixed format.

Screenshots that should have been text — error messages, terms in apps that block copying, WhatsApp-forwarded "documents."

Bank statements that arrive as scans — OCR first, then structure the data; for digital statements skip straight to Bank Statement to Excel.

Privacy: why local OCR matters

Documents people OCR are disproportionately sensitive — IDs, financial records, contracts, medical reports. Upload-based OCR services receive all of it. Running the engine in your browser inverts the model: the image, the recognition, and the resulting text all live and die on your device. After the language model is cached you can verify this directly — disconnect from the internet and the tool keeps recognizing.

When you have fifty pages, not one

Batch jobs change the calculus slightly. For a long scanned book or records file, OCR the PDF in one pass (the tool processes pages sequentially, so a hundred pages is a coffee break, since your own device is doing the work), but proofread strategically rather than line by line: skim for the systematic errors first. OCR mistakes tend to be consistent: if the engine misreads a font's "5" as "S" once, it will likely do so throughout, and one find-and-replace fixes hundreds of instances. Spot-check numbers more carefully than prose; language models rescue misread words but have no opinion about digits.
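
A blanket find-and-replace of "S" with "5" would also damage ordinary words, so a scoped version is safer. The sketch below assumes the downloaded .txt output and a hypothetical helper, `fix_in_numeric_tokens`, which swaps the character only inside tokens that already contain a digit and reports how many characters it changed.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+")


def fix_in_numeric_tokens(text, wrong="S", right="5"):
    """Swap `wrong` for `right`, but only in tokens that already contain a digit.

    Returns the corrected text and the number of characters replaced.
    """
    replaced = 0

    def repair(match):
        nonlocal replaced
        token = match.group(0)
        if not any(ch.isdigit() for ch in token):
            return token  # ordinary word such as "Sales": leave untouched
        replaced += token.count(wrong)
        return token.replace(wrong, right)

    fixed = TOKEN.sub(repair, text)
    return fixed, replaced


fixed, count = fix_in_numeric_tokens("Sales 2S0 units, total 1S5S")
print(fixed, count)
```

Here "Sales" is left alone while "2S0" and "1S5S" become "250" and "1555". A genuine code such as "S12" would be rewritten as well, so the advice to spot-check numbers still applies after the automated pass.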

Frequently asked questions

Which languages are supported? English, Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Urdu, Arabic, Spanish, French, German and Chinese — plus the English+Hindi combined mode for mixed documents.

Why is my output garbage? Almost always: wrong language selected, or the photo is low-resolution/tilted/badly lit. Re-shoot closer and straighter, pick the correct language, and accuracy usually jumps dramatically.
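
One quick way to confirm a wrong-language run is to look at which script the output is actually written in. The following is a minimal sketch using only Python's standard `unicodedata` module; `script_shares` is a hypothetical helper, and taking the first word of each character's Unicode name as its script is a simplification that works for the scripts discussed here.

```python
import unicodedata
from collections import Counter


def script_shares(text):
    """Return each script's share of the letters and combining marks in `text`."""
    counts = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1  # e.g. "DEVANAGARI", "LATIN"
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()} if total else {}


print(script_shares("नमस्ते"))
print(script_shares("Hello"))
```

If the page in front of you is in Hindi but the extracted text is almost entirely LATIN, the English model was probably selected. Re-run with the correct language before re-shooting the photo.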

Can it OCR handwriting? Realistically, print only. Neat block letters are recognized partially; cursive is not, and tools claiming otherwise in a browser deserve skepticism.

Does it preserve formatting like bold and tables? You get the text with line breaks, not the styling. Tables come out as text lines; for genuinely tabular digital PDFs, PDF to Excel preserves structure far better.

Related tools

Image to Text (OCR): 17 languages, images and scanned PDFs
Hindi OCR: direct route for Devanagari documents
PDF to Excel: for digital (selectable-text) PDFs

Related reading

Best Free PDF Tools in 2026 (Tested & Ranked)

How to Compress a PDF Without Losing Quality
