Everything you need to know about Optical Character Recognition

What is optical character recognition? How does OCR work?


Imagine that your job consists of receiving hundreds of invoices every day, by post or by email, scanning them, deciphering the data and manually typing it into a computer so that it can be used in financial statements. This is the struggle that accountants and bookkeeping professionals face on a daily basis. The manual work is immensely time-consuming, prone to human error, and keeps accountants from tasks that matter even more to their customers. Fortunately, recent years have seen the evolution of an automated solution to these problems: Optical Character Recognition (OCR). You may have heard the term around the office, but how much do you actually know about how it works its magic? This article sheds light on OCR technology and its popular use cases, to give you a better understanding of how it can save time for your finance department.

OCR definition – What is OCR?

OCR can be defined as a scanning and comparison technique that identifies typed, printed or handwritten text and numerical data in digital images obtained from PDFs, scanned physical documents or photos, and converts it into editable data.[1] Technically, an OCR system combines optical scanner hardware with AI software to turn the image into accurate, machine-readable text.

Why is scanning the documents not enough?

You may wonder why scanning alone is not sufficient to extract the data. The reason is simple: a scan only produces a non-editable snapshot of the document. To extract actual data, software is needed to identify the letters and digits, group them into words and numbers, and let you edit the content of the original document in Word or another word processor.

When did OCR appear?

OCR has been around for a long time: it was used as early as the 1970s to read printed books aloud to blind people. However, OCR only became popular in the early 1990s, when it was commonly used to digitize newspapers. The technology has kept improving ever since, and it can now accurately recognize full sentences and serve a wide range of use cases.

OCR technology – How does OCR work?

The popularity of this solution stems from its simplicity: all it requires is a scanner and capable OCR software. The OCR process starts by scanning physical documents, which the OCR software then converts into a black-and-white format. This format is analyzed to identify dark areas, which are treated as the characters to be recognized, and white areas, which are treated as the background. The software then processes the dark areas, usually one character, word or block at a time, to identify letters or digits using one of the following algorithms:

Pattern recognition

The OCR software library stores examples of text in various fonts and formats, against which the shapes identified in the dark areas are compared.
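
As a rough illustration of this comparison step, the sketch below matches a small glyph against stored reference bitmaps and picks the closest one. The 3x3 templates, the `similarity` and `recognize` helpers and the pixel-agreement score are all assumptions made for this example; real OCR libraries store far larger template sets and use more robust matching.

```python
# Hypothetical 3x3 reference bitmaps ("#" = dark pixel, "." = background).
TEMPLATES = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}


def similarity(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def recognize(glyph, templates=TEMPLATES):
    """Return the best-matching character and its similarity score."""
    best = max(templates, key=lambda ch: similarity(glyph, templates[ch]))
    return best, similarity(glyph, templates[best])


# A scanned "L" with one pixel missing from its bottom stroke.
noisy_l = ["#..", "#..", "##."]
character, score = recognize(noisy_l)
print(character, round(score, 2))
```

Because the score is a simple agreement ratio, a slightly damaged character can still be matched to the right template, which is why this approach tolerates a small amount of scanning noise.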

Feature detection

The OCR software applies rules about the features of specific letters or numbers, such as shape proximity and the number of lines or curves. For example, the capital letter “L” is described as a vertical line and a horizontal line meeting at a 90° angle.
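
The rule for the letter “L” can be sketched in a few lines of code. This is a deliberately simplified example: the only features it looks at are rows and columns that are completely dark, and the `extract_features` and `classify` helpers, as well as the rules for “T” and “I”, are assumptions introduced here for illustration.

```python
def extract_features(glyph):
    """Find the rows and columns of the glyph that are entirely dark ("#")."""
    width = len(glyph[0])
    full_rows = [y for y, row in enumerate(glyph) if all(c == "#" for c in row)]
    full_cols = [x for x in range(width) if all(row[x] == "#" for row in glyph)]
    return {"full_rows": full_rows, "full_cols": full_cols}


def classify(glyph):
    """Apply hand-written stroke rules; return None if no rule applies."""
    features = extract_features(glyph)
    rows, cols = features["full_rows"], features["full_cols"]
    height, width = len(glyph), len(glyph[0])
    # "L": a vertical stroke on the left edge meeting a horizontal stroke at the bottom.
    if cols == [0] and rows == [height - 1]:
        return "L"
    # "T": a horizontal stroke at the top and a vertical stroke away from both edges.
    if rows == [0] and len(cols) == 1 and 0 < cols[0] < width - 1:
        return "T"
    # "I": a single vertical stroke and no horizontal stroke.
    if len(cols) == 1 and not rows:
        return "I"
    return None


print(classify(["#...", "#...", "#...", "####"]))
```

Unlike the pattern approach, these rules do not depend on the size or font of the character, only on how its strokes are arranged.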

Characters are identified and converted into ASCII codes that computer systems can use for further processing. All that is left for users to do is to proofread the result and, if necessary, correct minor errors that may arise with complex layouts. After that, the document is ready to be saved and used for further purposes.
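
The two ends of this process, the black-and-white conversion at the start and the ASCII conversion at the end, can be sketched as follows. The fixed threshold of 128, the tiny sample image and the function names are assumptions for this example; real OCR software usually chooses its threshold adaptively.

```python
def binarize(gray_rows, threshold=128):
    """Turn grayscale pixels (0 = black, 255 = white) into 1 (dark) or 0 (background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


def to_ascii_codes(text):
    """Return the ASCII code of each recognized character."""
    return list(text.encode("ascii"))


# A hypothetical 3x3 grayscale scan of a small "L"-like shape.
scan = [
    [250, 30, 245],
    [240, 25, 250],
    [235, 20, 15],
]
print(binarize(scan))
print(to_ascii_codes("OCR"))
```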

OCR use cases – What is OCR used for?

Among many uses in daily business activities, the primary purpose of OCR technology is converting data from printed documents into machine-readable text that can be easily edited in word processor programs such as Microsoft Word. Other use cases for OCR that users might be less familiar with include:

  • Archive historic data (e.g. newspapers or magazines) and signed legal documents into a searchable electronic database;
  • Automate the processes of data entry, data extraction and data processing;
  • Automate the recognition of license plates;
  • Convert documents into text that can be read aloud to blind and visually impaired persons;
  • Deposit checks electronically, without assistance from a bank employee;
  • Index printed documents for search engines;
  • Sort letters for mail delivery;
  • Translate words within an image into a different language.

The main advantages of OCR are straightforward: users save time and spend less effort on data entry, and the digitized documents are more accurate than if they had been entered manually. The entire OCR process takes less than a minute and produces a faithful digital replica of the original document that can be easily edited, shared, stored and searched for in an archive. OCR also opens up new opportunities for data manipulation, such as emailing, highlighting or adding comments, uploading the digital document online (for example on the company’s website), and compressing it into ZIP files.

But one of the biggest benefits, and one that often goes unmentioned, is the creation of development opportunities for employees who were previously busy with this manual task. In financial departments or fiduciaries, for example, accountants can spend the time saved by OCR on more complex tasks that involve thinking and problem solving, thus enhancing their skills and careers.

OCR limits – Is there anything OCR can’t do?

So far, OCR seems like the ideal solution for text extraction, but it also faces some shortcomings that might be more or less severe, depending on your business objectives. Beyond text extraction, OCR’s applicability is indeed quite narrow.

Limited accuracy of many OCR solutions

First of all, the accuracy of OCR systems is not always perfect; although it has improved greatly over the years, it still depends on the quality and clarity of the scans. In a business where accuracy is critical (such as bookkeeping), extra checking might be necessary. At Fyn, we have developed an OCR that performs very well on invoices.
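
One common way to put a number on OCR accuracy is the character error rate: the number of single-character edits needed to turn the OCR output into the correct text, divided by the length of the correct text. The sketch below is a generic illustration using the standard Levenshtein edit distance; it is not a description of how Fyn measures accuracy, and the sample strings are made up.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance between OCR output and reference, relative to reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# A typical confusion on a poor scan: "1" read as "l" and "0" read as "O".
print(character_error_rate("121.00", "12l.OO"))
```

On an invoice amount, even a low error rate can change the figure that ends up in the books, which is why extra checking matters in bookkeeping.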

A ‘dumb’ extract of characters

Secondly, OCR simply scans a physical document and provides users with its data; how that data is then used and analyzed is beyond the capabilities of a regular OCR system. The software itself does not make decisions or logical connections like a human brain. OCR needs an extra layer of software to make sense of the text, and without such a layer it is very limited. At Fyn, we have developed such a layer, based on Artificial Intelligence, that enables superhuman invoice parsing.

How to make OCR more intelligent?

OCR needs to be taught what to do. There are two ways of doing so:

  • Giving strict examples of templates to follow. For invoices, for example, this means defining a typical electricity invoice and where to find each field (the amount, the VAT). This traditional template-based approach is, however, very constrained. As the system only recognizes what it has already seen, it is lost whenever a new format or language arrives, and the user (accountant, administrative worker, etc.) has to correct the result by manually defining which information stands where. Pretty time-consuming and costly!
  • Having smart software that can properly read the content, as a person would. This is where Artificial Intelligence and Machine Learning come in. Kantify has developed this extra layer thanks to Artificial Intelligence and incorporated it in Fyn, our AI-powered solution that relieves accountants from the manual struggles of invoice data extraction and processing. Our mission is to free bookkeeping professionals from manual checking with a trustworthy and reliable solution that transforms raw OCR information into meaningful data.
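
To see why the template-based approach is so constrained, consider the minimal sketch below. The field names, the regular expressions and the two sample invoices are all hypothetical; the point is only that a template written for one layout returns nothing when the same information arrives in another format or language.

```python
import re

# Hypothetical template for one known electricity invoice layout.
ELECTRICITY_TEMPLATE = {
    "total": re.compile(r"^Total amount:\s*([\d.]+)\s*EUR$", re.MULTILINE),
    "vat": re.compile(r"^VAT:\s*([\d.]+)\s*EUR$", re.MULTILINE),
}


def extract_fields(ocr_text, template):
    """Look up each field where the template expects it; None if it is not found."""
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(ocr_text)
        fields[name] = float(match.group(1)) if match else None
    return fields


known_layout = "ACME Energy\nTotal amount: 121.00 EUR\nVAT: 21.00 EUR\n"
unseen_layout = "ACME Energy\nMontant total 121,00 €\nTVA 21,00 €\n"

print(extract_fields(known_layout, ELECTRICITY_TEMPLATE))
print(extract_fields(unseen_layout, ELECTRICITY_TEMPLATE))
```

Every new supplier, layout or language would need another template like this one, which is the manual effort the second approach aims to remove.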

How is Fyn using OCR?

The first step of Fyn consists of OCR processing, but the solution is much more than that. Using advanced machine learning algorithms, Fyn is able to accurately extract data from specific fields in your invoices and use that data for analytics purposes. In a nutshell, it brings cognitive capabilities to OCR.

Further reading

  1. https://academic-eb-com.kuleuven.ezproxy.kuleuven.be/levels/collegiate/article/OCR/472978
  2. https://docparser.com/blog/what-is-ocr/
  3. https://searchcontentmanagement.techtarget.com/definition/OCR-optical-character-recognition
  4. https://www.abbyy.com/en-eu/finereader/what-is-ocr/
  5. https://www.irislink.com/EN-RO/c1135/What-is-OCR--.aspx
  6. https://learn.g2crowd.com/what-is-ocr
  7. https://www.explainthatstuff.com/how-ocr-works.html
  8. https://www.sodapdf.com/blog/what-is-ocr-software-and-how-do-i-use-it/
  9. http://www.cvisiontech.com/library/ocr/ocr-pdf/what-is-ocr-used-for.html
