Document imaging is the conversion of paper files (of any size or description) or microfilm/microfiche into digital images. The technology is intended for organizations that struggle with managing large volumes of files or with back-office paperwork. Over the years, hardware prices have dropped and demand for imaging software has grown, so more and more people have turned to document imaging. Along the way, you may have heard a lot of unfamiliar technical terminology, much of it carried over by the market from the days when document imaging was a niche application.

Here are some commonly used document imaging terms you should be familiar with. They are arranged loosely in the order they occur in an imaging process, from beginning to end:

Scanning

This is the process of converting paper into electronic images. A scanner captures the entire page as an image using its camera, lighting, and other imaging electronics.

ADF (automatic document feeder)

The ADF is what separates document scanners from photo scanners. An ADF is a paper tray that lets a user place multiple documents in a stack, which are then automatically fed through the scanner one after another.

MFP (multi-function peripheral)

This is a printer that can also copy and scan documents.

Double-feed detection

This is a feature that detects when two or more sheets are pulled into the scanner at once, helping ensure that paper feeds smoothly, that no page goes unscanned, and that jams are prevented.

Document capture

This refers to the entire process of converting a paper document into electronic information, from scanning through data capture.

Image processing

This covers any technology applied to a document image as it is being captured. Common processes include auto-color detection, blank-page detection, deskewing (straightening a tilted image), and despeckling (removing stray dots), all designed to improve the quality of the output images.
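
To make two of these steps concrete, here is a minimal Python sketch of blank-page detection and despeckling. It assumes a page is held as a list of rows of 8-bit grayscale values (0 = black, 255 = white) or, for despeckling, as a bitonal grid where 1 is black. The function names and the 0.5% ink cutoff are illustrative choices, not values taken from any scanner or capture product.

```python
def is_blank_page(pixels, dark_threshold=128, max_dark_fraction=0.005):
    """Return True if the page looks blank.

    pixels: rows of 8-bit grayscale values (0 = black, 255 = white).
    A pixel counts as ink when it is darker than dark_threshold; the page is
    treated as blank when ink covers less than max_dark_fraction of it.
    (Both defaults are illustrative assumptions.)
    """
    total = 0
    dark = 0
    for row in pixels:
        for value in row:
            total += 1
            if value < dark_threshold:
                dark += 1
    if total == 0:
        return True
    return dark / total < max_dark_fraction


def despeckle(bitonal):
    """Remove isolated black pixels (value 1) that have no black 8-neighbours."""
    height = len(bitonal)
    width = len(bitonal[0]) if height else 0
    result = [row[:] for row in bitonal]
    for y in range(height):
        for x in range(width):
            if bitonal[y][x] != 1:
                continue
            has_black_neighbour = any(
                bitonal[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                result[y][x] = 0  # lone dot: treat as noise and whiten it
    return result
```

Real products use more robust rules (for example, ignoring edge shadows or removing specks larger than one pixel), but the principle of measuring ink and discarding noise is the same.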

Grayscale thresholding

This is an advanced image-processing technique that uses grayscale data to create higher-quality bitonal (black-and-white) images.
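
Otsu's method is one widely used way to choose a single global threshold from grayscale data. Commercial scanners may use their own, often adaptive, techniques, so treat this as an illustrative sketch rather than what any particular product does. It builds a histogram of gray levels, picks the level that maximizes the between-class variance of the dark and light groups, and then maps each pixel to black or white.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best splits pixels into dark and light groups."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_score = 0, -1.0
    weight_dark = 0  # number of pixels at or below the candidate level
    sum_dark = 0     # sum of their gray levels
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def binarize(pixels, threshold):
    """Map each pixel to 1 (black) if it is at or below the threshold, else 0 (white)."""
    return [[1 if value <= threshold else 0 for value in row] for row in pixels]


page = [[20, 20, 30, 200], [210, 220, 25, 230]]
threshold = otsu_threshold(page)
bitonal = binarize(page, threshold)
```

Because the threshold is derived from the page's own grayscale histogram, a faint or unevenly printed page can still be separated cleanly, which is the advantage over converting to bitonal with a fixed cutoff.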

TIFF (Tagged Image File Format)

A document image format often used in ECM systems. TIFF files used for document imaging are typically bitonal and use CCITT Group 4 compression. Because most web browsers do not display TIFF, a dedicated TIFF viewer is often required.
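
To see the compression setting in practice, the sketch below reads the Compression tag (tag 259) from the first image directory of a classic TIFF file, where the value 4 means CCITT Group 4. It parses only the header structure needed for that one tag, looks at the first page only, and the helper names are hypothetical.

```python
import struct

COMPRESSION_TAG = 259
# A subset of Compression values defined by the TIFF 6.0 specification.
COMPRESSION_NAMES = {1: "none", 3: "CCITT Group 3", 4: "CCITT Group 4", 5: "LZW", 32773: "PackBits"}


def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image in a classic TIFF file."""
    byte_order = data[:2]
    if byte_order == b"II":
        bo = "<"  # little-endian ("Intel")
    elif byte_order == b"MM":
        bo = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(bo + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack_from(bo + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, _field_type, _count = struct.unpack_from(bo + "HHI", data, entry)
        if tag == COMPRESSION_TAG:
            # A single SHORT value sits in the first two bytes of the 4-byte value field.
            (value,) = struct.unpack_from(bo + "H", data, entry + 8)
            return value
    return 1  # tag absent: the TIFF default is no compression


def describe_compression(data: bytes) -> str:
    return COMPRESSION_NAMES.get(tiff_compression(data), "other")
```

In practice an imaging library would handle this, but the sketch shows that Group 4 is simply a per-image setting recorded inside the TIFF file.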

PDF (Portable Document Format)

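A document format used in some document imaging applications. PDF is an ISO standard (ISO 32000), while TIFF is maintained as an Adobe specification, with some TIFF variants (such as TIFF/IT) also standardized by ISO. PDF is a richer, more complex format, which can be a double-edged sword for document imaging: it can hold more than a page image (for example, a hidden OCR text layer that makes a scanned page searchable), but that flexibility also adds complexity.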

OCR/ICR (optical/intelligent character recognition)

This technology electronically recognizes numbers, letters, and words in a document image. OCR/ICR can be applied to an entire page, for file conversion or full-text indexing, or to individual fields, for data extraction.
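
Once OCR has turned page images into text, full-text indexing typically means building an inverted index that maps each word to the pages containing it. The following is a minimal sketch under simplifying assumptions: the OCR text is already available as plain strings, words are lowercased runs of letters and digits, and the page identifiers are made up.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(pages):
    """pages: dict of page_id -> OCR text. Returns word -> set of page_ids."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "inv-001/p1": "Invoice 1042 from Acme Corp",
    "letter-7/p1": "Dear customer, your invoice is attached",
}
index = build_index(pages)
```

A production system would also handle OCR errors (for example, with fuzzy matching), but the lookup structure is the same idea.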

Forms processing

This is the process of extracting data from a document image. The captured data is typically fed into another software application, such as an ERP, accounting, or ECM system.
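
As a rough illustration of field-level extraction, the sketch below pulls three fields out of OCR text with regular expressions and returns a record ready to hand to another system. The field names, patterns, and sample layout are hypothetical; commercial forms-processing software usually locates fields with zone templates or trained models rather than hand-written patterns.

```python
import re

# Hypothetical patterns for one invoice layout (case-insensitive).
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "invoice_date": re.compile(r"Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.I),
    "total": re.compile(r"Total\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}


def extract_fields(ocr_text):
    """Return a dict of field name -> captured string, or None if not found."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[name] = match.group(1) if match else None
    return record


sample = "ACME CORP\nInvoice No: INV-1042\nDate: 2024-03-15\nTotal: $1,250.00"
record = extract_fields(sample)
```

Fields that come back as `None` are the ones a real system would flag for a person to key in.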

IDR (intelligent document recognition)

An advanced form of forms processing that has expanded its use beyond strictly structured forms (tax forms, health insurance claims, surveys, etc.) to semi-structured documents (invoices) and even unstructured documents (correspondence).

Document classification

This is an application of IDR that identifies the document type. It is typically used to automate the routing of images to the next step in a workflow.
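
A very simple rule-based classifier shows the idea: count keyword matches for each document type and route the image to that type's queue, falling back to manual review when nothing matches. The types, keywords, and queue names are hypothetical, and real IDR products typically rely on layout analysis or machine learning rather than a keyword list.

```python
# Hypothetical document types: (keywords, destination queue).
RULES = {
    "invoice": (["invoice", "amount due", "remit to"], "accounts_payable"),
    "claim": (["claim number", "policy", "date of service"], "claims_processing"),
    "correspondence": (["dear", "sincerely"], "mailroom"),
}


def classify(text):
    """Return the type with the most matching keywords, or None if none match."""
    lowered = text.lower()
    best_type, best_hits = None, 0
    for doc_type, (keywords, _queue) in RULES.items():
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


def route(text):
    """Send the image to the queue for its type, or to manual review."""
    doc_type = classify(text)
    return RULES[doc_type][1] if doc_type else "manual_review"
```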

Confidence levels

These indicate how confident an automated recognition application is in the accuracy of the data it captures. If the confidence level falls below a set percentage, manual intervention can be invoked.
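
In code, a confidence check often comes down to comparing each captured field's score against a threshold. This sketch assumes the recognition engine reports confidence on a 0 to 100 scale and uses an arbitrary 85% cutoff; both the scale and the `CapturedField` type are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class CapturedField:
    name: str
    value: str
    confidence: float  # assumed 0-100 scale, as reported by the recognition engine


def split_by_confidence(fields, threshold=85.0):
    """Accept fields at or above the threshold; queue the rest for manual review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


fields = [
    CapturedField("total", "1,250.00", 97.5),
    CapturedField("invoice_date", "2024-03-1S", 62.0),  # likely misread character
]
accepted, needs_review = split_by_confidence(fields)
```

Raising the threshold sends more fields to people and catches more errors; lowering it saves labor at the cost of accuracy, so the value is usually tuned per application.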

QA (quality assurance)

This allows human intervention in automated document capture. It can be invoked for every document, for a random sample, or both, and it can be triggered by confidence levels.
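
The selection rule described above, confidence-based review combined with random sampling, can be sketched as follows. Each document is represented by its lowest field confidence; the threshold, the sample rate, and the function name are illustrative assumptions. Setting `sample_rate=1.0` sends every document to QA.

```python
import random


def select_for_qa(documents, threshold=85.0, sample_rate=0.05, seed=None):
    """documents: list of (doc_id, lowest_field_confidence). Returns ids to review."""
    rng = random.Random(seed)
    selected = []
    for doc_id, lowest_confidence in documents:
        if lowest_confidence < threshold:
            selected.append(doc_id)  # confidence-based QA
        elif rng.random() < sample_rate:
            selected.append(doc_id)  # random spot check of confident documents
    return selected


batch = [("a", 99.0), ("b", 40.0), ("c", 95.0)]
```

Passing a fixed `seed` makes the random sample reproducible, which can help when auditing which documents were checked.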

Workflow

This is the ability to move documents among the people and systems that need the information in them to complete a business process. Document imaging is often used to support workflow automation, because moving electronic documents among multiple parties can be more efficient than handling paper.
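
Under the hood, many workflow engines model a business process as a set of states and allowed transitions. The sketch below assumes a hypothetical invoice-approval process; the state names, actions, and `WorkItem` class are illustrative, not the model of any particular ECM product.

```python
# Hypothetical invoice-approval workflow: (current state, action) -> next state.
TRANSITIONS = {
    ("captured", "verify"): "verified",
    ("verified", "approve"): "approved",
    ("verified", "reject"): "rejected",
    ("approved", "post"): "posted_to_erp",
}


class WorkItem:
    """A document image moving through the workflow, with an audit trail."""

    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.state = "captured"
        self.history = []  # (user, action, from_state, to_state)

    def apply(self, action, user):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} a document in state {self.state!r}")
        new_state = TRANSITIONS[key]
        self.history.append((user, action, self.state, new_state))
        self.state = new_state
        return new_state


item = WorkItem("inv-001")
item.apply("verify", "clerk")
item.apply("approve", "manager")
```

Rejecting invalid actions (for example, approving a document nobody has verified) is one way the workflow enforces the business process instead of relying on people to follow it.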

ECM (enterprise content management)

This is the broader technology category that document imaging is often included under, since document images are one form of content. ECM features usually include automated workflow, access control, markup capabilities, and lifecycle/records management.

Metadata

Metadata provides detailed information about a document. For an image, it might include a date, the file type, an account number or name, and other information relevant to the application. Metadata is often applied at the capture stage of an imaging process and used in workflow, retrieval, and archiving.
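
A metadata record can be as simple as a typed structure attached to each image, which then drives retrieval. The field names below are hypothetical examples based on the items listed above (date, file type, account number, name), and the `find` helper is a simplified stand-in for a repository search.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ImageMetadata:
    # Illustrative index fields; each application defines its own.
    document_id: str
    file_type: str
    capture_date: date
    account_number: str = ""
    customer_name: str = ""
    extra: dict = field(default_factory=dict)


def find(records, **criteria):
    """Return the records whose fields equal all of the given criteria."""
    return [r for r in records if all(getattr(r, k) == v for k, v in criteria.items())]


records = [
    ImageMetadata("d1", "tiff", date(2024, 3, 15), "ACC-9", "Jane Doe"),
    ImageMetadata("d2", "pdf", date(2024, 3, 16), "ACC-7", "John Roe"),
]
```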

Records management

This is software for controlling the lifecycle of a document. It comes into play after a document has reached its final form and is archived for compliance purposes (either internal policy or external regulation). It helps organizations manage their archives effectively by enforcing consistent processes for disposing of information.
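
Retention rules are often expressed as a period per record type, counted from the date the record was archived. This sketch assumes hypothetical retention periods of 7 years for invoices and 3 years for correspondence; actual periods come from an organization's policy or applicable regulation.

```python
from datetime import date

# Hypothetical retention periods, in years, by record type.
RETENTION_YEARS = {"invoice": 7, "correspondence": 3}


def add_years(d, years):
    """Add whole years to a date, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def disposal_date(record_type, archived_on):
    return add_years(archived_on, RETENTION_YEARS[record_type])


def eligible_for_disposal(record_type, archived_on, today):
    """A record may be disposed of once its retention period has expired."""
    return today >= disposal_date(record_type, archived_on)
```

Applying the same schedule to every record is what makes disposal consistent, rather than dependent on whoever happens to be cleaning up the archive.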

You will encounter many more terms in the field, but this glossary covers the most commonly used ones. It should give you enough grounding to communicate clearly with your vendors and your customers.