OCR — how companies benefit from optical character recognition

OCR (optical character recognition) is a technology that recognizes text in digital images. If letter mail is digitized, for example, the resulting files can be searched in full thanks to full-text search. We'll show you how OCR works and how this technology can help you.

What does OCR mean?

The abbreviation OCR stands for optical character recognition and describes a technology for recognizing text in digital images.

What do invoices have to do with it, for example? After all, an invoice is not an image, but a text document. That is true and not true at the same time. During the OCR process at Caya, we first scan documents as image files. Printed documents or handwritten text are then read out and converted into digital form by recognizing the individual characters. The original can be pure text (such as an invoice) or image material containing text (e.g. advertising graphics, flyers, brochures). In both cases, it is first captured as a digital image and then read out in the OCR process. OCR-based document digitization and data capture is also used for passports, account statements and business cards, among other things.

Optical character recognition is also an area of research in artificial intelligence (AI), particularly in feature matching and pattern matching. When capturing data, e.g. from paper letters, OCR software analyzes the document structure and divides it into elements such as sender, subject line and body text. This layout analysis relies on global structure recognition: it distinguishes text blocks from graphic elements and then recognizes line structures and individual characters. The program stores which content is located where.
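As an illustration of how line structures can be found, here is a minimal sketch of one classic layout-analysis idea: scanning a binarized page row by row and treating each run of rows that contain dark pixels as a text line. This is not Caya's implementation; the function name `find_text_lines` and the tiny sample bitmap are assumptions made up for this example.

```python
from typing import List, Tuple


def find_text_lines(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for every run of rows that contain ink.

    `image` is a binary bitmap: 1 = dark pixel, 0 = background.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the text line currently being tracked
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # an empty row ends the line
            start = None
    if start is not None:                  # page ends inside a text line
        lines.append((start, len(image) - 1))
    return lines


# Hypothetical 7x6 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 2), (5, 5)]
```

Real scans are skewed and noisy, so production software typically deskews the page and tolerates a few stray pixels before applying a step like this.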

During text conversion, the extracted lines of text are broken down into words and then into individual letters. An image generated by a scan consists of pixels, and so does every letter in it. For optical character recognition, algorithms compare these pixels with a series of stored patterns (pattern matching). This allows individual letters to be identified across a wide variety of typefaces. A text layer with machine-readable characters (for example ASCII characters) is then placed over the scanned image file.
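The pattern-matching idea can be sketched in a few lines: compare the pixels of an unknown glyph with stored templates and pick the template that agrees in the most positions. This is a deliberately simplified illustration, not the algorithm of any particular OCR product; the 3x5 templates and the names `TEMPLATES`, `similarity` and `recognize` are hypothetical.

```python
from typing import Dict, List

Glyph = List[str]  # rows of '#' (ink) and '.' (background)

# Hypothetical 3x5 pixel templates for three letters.
TEMPLATES: Dict[str, Glyph] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixel positions in which two same-sized glyphs agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph: Glyph) -> str:
    """Return the character whose template is most similar to the glyph."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))


# A "T" with one missing pixel, as it might come out of a poor scan.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognize(noisy_t))  # T
```

Modern OCR engines usually rely on learned features rather than fixed pixel templates, but the principle of scoring a glyph against known patterns is the same.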

OCR Scan - Correct reading of text information from images

Both the typeface of printed letters and the handwriting of written letters differ greatly from sender to sender. When an image file is processed with OCR, the software must still read out the available information correctly. An additional system for context recognition, ICR (commonly expanded as intelligent character recognition), is used for this purpose. ICR corrects incorrectly recognized characters based on their context. What does that mean? Example: reading the digit “8” instead of the capital letter “B” would be a readout error. Without ICR context recognition, “Bus” would quickly become “8us.” ICR provides the appropriate correction here. Conversely, it also ensures that legitimate alphanumeric terms such as “8ter” are left unchanged.
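A context-based correction of this kind could look roughly like the following sketch. It assumes a small word list and a table of typical digit/letter confusions; both are invented for illustration, and real ICR systems use far richer language models.

```python
from typing import Dict, Set

# Hypothetical table of digits that OCR tends to confuse with letters.
CONFUSIONS: Dict[str, str] = {"8": "B", "0": "O", "1": "l", "5": "S"}

# Hypothetical word list standing in for a real dictionary.
DICTIONARY: Set[str] = {"bus", "boat", "letter"}


def correct_token(token: str) -> str:
    """Replace confusable digits only if that turns the token into a known word."""
    if token.lower() in DICTIONARY:
        return token  # already a known word
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    if not (has_digit and has_alpha):
        return token  # pure numbers and pure words are left alone
    candidate = "".join(CONFUSIONS.get(c, c) for c in token)
    # Accept the correction only when the context (here: the dictionary) confirms it.
    return candidate if candidate.lower() in DICTIONARY else token


print(correct_token("8us"))   # Bus
print(correct_token("8ter"))  # 8ter (no known word results, so it stays)
```

The key design choice is that a substitution is never applied blindly: the token is changed only when the result is confirmed by context, which is why “8ter” survives.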

It is also worth noting that error correction can improve as the number of digitized documents grows. This is where machine learning comes into play: the OCR software remembers where standardized, recurring content (such as a letterhead or an invoice field) is placed, so text conversion is refined with each new scan. Another advantage of documents digitized by OCR software is that the newly created PDF file is searchable via full-text search. This is also the case in the Caya Document Cockpit. In this document center, documents can be searched for search terms or text passages.
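To show why an OCR text layer makes documents searchable, here is a minimal sketch of a full-text search over recognized text using an inverted index. It is not how the Caya Document Cockpit is implemented; the document IDs, sample texts and function names are assumptions for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every lower-cased word to the IDs of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the IDs of documents that contain all words of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output for three scanned letters.
documents = {
    "letter-001": "Invoice for office chairs, due in 14 days",
    "letter-002": "Reminder: your insurance policy renews in March",
    "letter-003": "Invoice for insurance premium",
}
index = build_index(documents)
print(search(index, "invoice"))            # ['letter-001', 'letter-003']
print(search(index, "invoice insurance"))  # ['letter-003']
```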

OCR scanning - no more manual typing

Digitizing documents through OCR scanning significantly increases companies' productivity. Automated data processing makes your workflow much easier compared to manually entering the information contained in documents, and digitized documents can usually be searched for all the terms they contain. When OCR is linked to smart accounting software, for example, it is often no longer necessary to type the data into other systems by hand: the data is transferred to the other software automatically.

In addition, using OCR solutions can raise a company's internal security standards. Expensive and insecure local data storage becomes superfluous if you choose a GoBD-certified provider such as Caya as your document center.

Automatically categorized document storage thanks to OCR

Important information such as sender or subject is automatically recorded during the OCR process. In invoice processing, this also applies to data such as invoice amount, payee, IBAN, BIC or payment reference.
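How such fields can be pulled out of recognized text is sketched below, using regular expressions plus the standard mod-97 checksum for IBANs to catch misread digits. The field labels ("Invoice amount:", "IBAN:", "BIC:"), the sample text and all function names are assumptions for illustration; real invoices vary widely and need more robust extraction.

```python
import re
from typing import Dict, Optional

# Hypothetical label-based patterns; real invoices use many different layouts.
PATTERNS: Dict[str, str] = {
    "amount": r"Invoice amount:\s*([\d.,]+)\s*EUR",
    "iban": r"IBAN:\s*([A-Z]{2}\d{2}(?: ?[A-Z0-9]{2,4})+)",
    "bic": r"BIC:\s*([A-Z]{6}[A-Z0-9]{2}(?:[A-Z0-9]{3})?)",
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


def iban_is_valid(iban: str) -> bool:
    """Check an IBAN with the mod-97 checksum (a misread digit fails the check)."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]        # move country code + check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1


# Hypothetical OCR output; the IBAN is the widely used German example IBAN.
ocr_text = """Invoice no. 2024-0815
Invoice amount: 249.90 EUR
IBAN: DE89 3704 0044 0532 0130 00
BIC: COBADEFFXXX
"""
fields = extract_fields(ocr_text)
print(fields["amount"], fields["bic"])
print(iban_is_valid(fields["iban"]))
```

Validating the checksum is a cheap way to flag documents for manual review when a single character was recognized incorrectly.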

Caya automatically classifies incoming documents, which makes it easy to filter for invoices, for example. An incoming invoice is recognized as such and categorized accordingly. Documents can also be forwarded automatically to the appropriate department. Automatic categorization makes scanning and archiving during invoice processing much easier. On the way to a paperless office, the tedious manual filing of (digital) documents therefore becomes superfluous.

In general, OCR offers the following advantages for the paperless office:
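The source does not describe how Caya classifies documents, so the following is only a sketch of one simple approach: counting category keywords in the recognized text and routing the document to a department based on the winning category. The keyword lists, category names and department names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical keywords per category.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice", "amount due", "iban"],
    "contract": ["contract", "agreement", "termination"],
    "reminder": ["reminder", "overdue"],
}

# Hypothetical mapping from category to the responsible department.
ROUTING: Dict[str, str] = {
    "invoice": "accounting",
    "contract": "legal",
    "reminder": "accounting",
}


def classify(text: str) -> str:
    """Pick the category whose keywords occur most often in the text."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=lambda category: scores[category])
    return best if scores[best] > 0 else "uncategorized"


def route(text: str) -> str:
    """Return the department for a document, with a fallback for unknown mail."""
    return ROUTING.get(classify(text), "front office")


print(classify("Invoice 4711: amount due 120 EUR, IBAN follows"))  # invoice
print(route("Notice of termination of your rental agreement"))     # legal
print(route("Happy holidays from our team"))                       # front office
```

A trained text classifier would typically replace the keyword counts in practice, but the flow of classify first, then route, stays the same.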

  • Access documents anytime & anywhere
  • Easier access to information for others
  • Scanned PDFs fully searchable thanks to full-text search
  • More editing options (including copy & paste)
  • Automatic mail categorization (tagging)
  • Digital, audit-proof document storage in the Caya Document Cockpit
  • Options for connecting automated processes to digitized documents

OCR process as a basis for automated workflows

Letter mail is usually the first step in the document management process chain. Caya digitizes your incoming mail. To do this, we redirect your mail to one of our scan centers in cooperation with Deutsche Post and PIN AG. We then digitize it using a highly automated process. Digitizing incoming mail makes it possible to integrate paper documents into the digital process and speeds up their processing.

Invoice data read from scanned documents can be transferred automatically to the appropriate accounting tools via integrations: invoices are scanned and their data is inserted into the accounting tool's input forms via interfaces. Administrative tasks such as typing are eliminated. As a rule, all that remains is to check and confirm. The responsible employees receive the data in decision-ready form and only need to make the final decision. This reduces the workload so that you can focus on value-adding activities. Invoices can also be paid online via the Caya account with just a few clicks. Payment is made as a SEPA transfer directly from the account.
