Hand-Drawn Annotation and Underline Detection and Removal in Scanned Documents Using Artificial Neural Network & Fuzzy C-Means Clustering

ABSTRACT

An optical character recognition (OCR) system is a computerized scanning system that lets a user convert a scanned text document into an editable electronic file. OCR performance is often badly degraded by hand-drawn underlines (straight, curved, touching or not touching the text, bent, broken, elliptical, etc.) and by annotation lines of various forms (straight, circular, elliptical, strokes, embossed lines, etc.). Readers draw such underlines and annotations freehand to help them remember the text, so these marks need to be removed from the scanned document to make the text legible and thereby improve OCR efficiency. In this paper, we discuss the merits and demerits of previously proposed techniques for detecting and removing underlines and annotations. We also propose an efficient technique, based on an Artificial Neural Network and Fuzzy C-Means clustering, to detect and remove different types of annotations and underlines.
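The abstract names Fuzzy C-Means clustering as one component but gives neither the authors' features nor their parameters. The following is a minimal sketch of the textbook fuzzy c-means algorithm (Bezdek's formulation), not the paper's implementation. It assumes that each connected component of the page is described by a hypothetical two-value feature vector (normalized width, normalized height), so that long, thin underline strokes separate from compact character blobs. The function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m = 2`, and the sample values are all illustrative assumptions. The neural-network stage is not shown.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fuzzy_c_means(points: List[Point], c: int, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-6, seed: int = 0):
    """Textbook fuzzy c-means (hypothetical helper); returns (centers, u[i][j])."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers: List[Point] = []
    for _ in range(max_iter):
        # Center j is the mean of all points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / wsum
                for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            d = [_dist(points[i], centers[j]) for j in range(c)]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                # Point coincides with a center: share membership among those centers.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        # Stop once no membership value changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def hard_labels(u: List[List[float]]) -> List[int]:
    """Assign each point to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Hypothetical (normalized width, normalized height) features per component:
# four compact character blobs followed by three long, thin underline strokes.
components = [(0.10, 0.10), (0.12, 0.11), (0.09, 0.12), (0.11, 0.09),
              (0.90, 0.02), (0.85, 0.03), (0.95, 0.02)]
centers, u = fuzzy_c_means(components, c=2)
labels = hard_labels(u)
print(labels)
```

Unlike k-means, fuzzy c-means keeps a graded membership for every cluster in `u`. Here `hard_labels` collapses it to a single label only for illustration. How the paper combines these memberships with its neural network is not stated in the abstract.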


Updated: June 26, 2023 — 3:14 am