What is OCR (Optical Character Recognition) and how does it work?

Short answer
OCR (Optical Character Recognition) is the technology that converts images of text - scanned documents, photos of signs, screenshots - into machine-readable, editable text. Modern OCR uses convolutional neural networks to detect and classify characters, achieving accuracy rates above 99% on clean printed text. It is used for digitising paper documents, extracting data from invoices and receipts, making scanned PDFs searchable, and processing images in data pipelines.

OCR has evolved through several distinct generations. Early systems in the 1960s-1980s used template matching - comparing each character image against stored templates, one font at a time. These were brittle and required controlled conditions. The 1990s brought feature extraction approaches that identified structural features like strokes, curves, and intersections, enabling recognition across multiple fonts. Modern systems use deep learning, typically convolutional neural networks (often combined with recurrent or transformer sequence models) trained on millions of text images, and can handle handwriting, unusual fonts, low contrast, and skewed text.
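To make the template matching idea concrete, here is a minimal sketch in Python. The 5x3 glyph bitmaps, the TEMPLATES table, and the function names are all invented for illustration; real systems of that era worked on scanned bitmaps at much higher resolution.

```python
# Hypothetical 5x3 binary glyphs: "#" is ink, "." is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_agreement(a, b):
    """Count positions where two same-sized glyphs have the same pixel."""
    return sum(
        pa == pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def match_template(glyph, templates=TEMPLATES):
    """Return the label whose template agrees with the glyph on the most pixels."""
    return max(templates, key=lambda label: pixel_agreement(glyph, templates[label]))


# A "T" with one ink pixel missing still matches the "T" template best.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(match_template(noisy_t))  # prints "T"
```

The brittleness described above follows directly from this design: a glyph in a different font, size, or rotation no longer lines up pixel for pixel with any stored template.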

The OCR pipeline typically runs through several stages. Pre-processing cleans the image: deskewing (correcting tilted scans), binarisation (converting to pure black and white), noise removal, and contrast normalisation. Layout analysis separates text regions from images and tables, and identifies reading order. Character recognition then classifies each character or word region. Post-processing applies language models to correct recognition errors using context - 'tlie' is not a real word, so it is far more likely a misreading of 'the'.
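The post-processing step can be sketched as follows. This is a toy illustration, not how any particular OCR engine does it: the CONFUSIONS table and the WORD_FREQ vocabulary are hypothetical, and a production system would use a real language model and score whole sentences, not single words.

```python
# Hypothetical confusion table: (what OCR produced, what was probably printed).
CONFUSIONS = [("li", "h"), ("rn", "m"), ("0", "o"), ("1", "l"), ("vv", "w")]

# Hypothetical toy vocabulary with relative word frequencies.
WORD_FREQ = {"the": 1000, "will": 80, "modern": 50, "look": 40}


def candidates(token):
    """Yield the token itself plus every variant with one confusion undone."""
    yield token
    for wrong, right in CONFUSIONS:
        start = token.find(wrong)
        while start != -1:
            yield token[:start] + right + token[start + len(wrong):]
            start = token.find(wrong, start + 1)


def correct(token):
    """Pick the most frequent in-vocabulary candidate, else keep the token."""
    known = [c for c in candidates(token) if c in WORD_FREQ]
    return max(known, key=WORD_FREQ.get) if known else token


print([correct(t) for t in ["tlie", "rnodern", "1ook", "xyzzy"]])
# prints ['the', 'modern', 'look', 'xyzzy']
```

Tokens with no in-vocabulary candidate are left untouched, which is the safer default: over-eager correction can corrupt names, codes, and numbers that were recognised correctly.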

Practical accuracy varies dramatically by input quality. Clean, high-contrast printed text on white paper yields 99%+ character accuracy. Handwritten text typically achieves 85-95% with modern models. Images with background noise, stamps, watermarks, or low resolution drop accuracy significantly. For high-accuracy use cases like legal document digitisation, post-processing spell-check and human review are standard steps.
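Character accuracy figures like these are commonly computed by comparing OCR output against a hand-checked reference transcription using edit distance. The sketch below assumes the definition accuracy = 1 - (edit distance / reference length); the function names are illustrative, and some tools define the metric slightly differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a spurious character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ref, hyp):
    """Fraction of reference characters recovered, clamped at zero."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - edit_distance(ref, hyp) / len(ref))


reference = "the quick brown fox"
ocr_output = "tlie quick brown fox"
print(edit_distance(reference, ocr_output))                 # prints 2
print(round(character_accuracy(reference, ocr_output), 3))  # prints 0.895
```

Under this definition, 99% character accuracy still means roughly one error per 100 characters, or about 20 errors on a 2,000-character page, which is why human review remains standard for legal and other high-stakes documents.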

Common applications include: extracting product names and prices from supplier invoices, digitising historical archive documents, making scanned contracts searchable, extracting text from screenshots for accessibility, and capturing data from business cards or receipts automatically.

Reviewed by Searchlight · Last reviewed