OMNIFONT recognition in Indian language OCR seminar report/pdf/ppt download
Abstract: Much of the Indian language electronic data is Unicode encoded. Processing Unicode data is quite straightforward because each language follows a distinct code range and there is a one-to-one correspondence between characters and codes. The same does not hold for data stored in font-specific encodings (font-data), where the encoding differs from font to font.
Hence it becomes necessary to identify the font encoding and convert the font-data into a phonetic notation. This project proposes an approach for identifying the font-type (font encoding name) of a given piece of font-data.
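The point about Unicode code ranges can be illustrated with a small sketch. It assumes the standard Unicode block boundaries for a few Indian scripts; the function names `script_of` and `dominant_script` are made up for this example and are not from the report.

```python
from collections import Counter

# Unicode blocks for some Indian scripts (inclusive code point ranges).
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(ch):
    """Return the script whose block contains ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None


def dominant_script(text):
    """Return the most frequent Indian script in text, or None."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    # A Hindi word written with Unicode escapes.
    print(dominant_script("\u0928\u092e\u0938\u094d\u0924\u0947"))
```

With font-encoded data no such lookup is possible, because the same byte value can stand for different glyphs in different fonts.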
This thesis also proposes a generic framework to build font converters for conversion of font-data into a phonetic transliteration scheme in Indian languages.
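One way to picture such a converter is a per-font lookup table applied with greedy longest-match. The sketch below is only an illustration under that assumption; the report does not give its actual algorithm, and the table `TOY_FONT` and the name `build_converter` are hypothetical.

```python
def build_converter(glyph_map):
    """Build a converter from a glyph-code -> phonetic-string table.

    glyph_map is a hypothetical per-font table; a real one would be
    prepared separately for each font encoding.
    """
    max_len = max((len(key) for key in glyph_map), default=1)

    def convert(font_data):
        out, i = [], 0
        while i < len(font_data):
            # Try the longest glyph sequence first, then shorter ones.
            for n in range(min(max_len, len(font_data) - i), 0, -1):
                chunk = font_data[i:i + n]
                if chunk in glyph_map:
                    out.append(glyph_map[chunk])
                    i += n
                    break
            else:
                # Unknown code: pass it through unchanged.
                out.append(font_data[i])
                i += 1
        return "".join(out)

    return convert


# Toy table, invented for illustration; it matches no real font.
TOY_FONT = {"d": "ka", "[k": "kha", "k": "aa", "x": "ga"}

if __name__ == "__main__":
    convert = build_converter(TOY_FONT)
    print(convert("d[kx"))
```

Keeping the table separate from the matching loop is what would make the framework generic: supporting a new font means supplying a new table, not new code.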
Development of OCRs for Indian scripts is an active area of research today. For Indian languages, Devanagari scripts present great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result.
The problem is compounded by the unstructured manner in which popular fonts are designed. There is a lot of common structure in the different Indian scripts.
In this project, we argue that a number of automatic and semi-automatic tools can ease the development of recognizers for new font styles and new scripts.
We briefly discuss these tools and show how they have helped build new OCRs for omnifont recognition in the Hindi language. An integrated approach to the design of OCRs for all Devanagari scripts has great benefits.
We are building OCRs for the Hindi language following this approach, as part of a system that provides tools for creating content in it.
In this project we present a Multi-font OCR system to be employed for document processing, which performs recognition of the font-style belonging to a subset of the existing fonts.
The detection of the font-style of the document words can guide a rough automatic classification of documents, and can also be used to improve character recognition.
An alternative for the crucial task of Optical Font Recognition (OFR) is proposed in this work; it is based on the analysis of texture characteristics of document images formed of pure text.
A printed text block set in a single font is suitable to provide the specific texture properties necessary for recognizing the most commonly used fonts in the Hindi language.
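The report does not say which texture features it uses. As a rough sketch of the general idea, the example below assumes three simple features (ink density and mean horizontal and vertical intensity change) and a nearest-reference classifier; both choices, and the names `texture_features` and `nearest_font`, are assumptions made for illustration.

```python
import math


def texture_features(img):
    """Compute simple texture features of a text block.

    img is a list of rows of grayscale values in [0, 1] with at least
    two rows and two columns. The features are an assumption of this
    sketch, not the ones used in the report.
    """
    h, w = len(img), len(img[0])
    density = sum(map(sum, img)) / (h * w)
    horiz = sum(abs(row[x + 1] - row[x])
                for row in img for x in range(w - 1)) / (h * (w - 1))
    vert = sum(abs(img[y + 1][x] - img[y][x])
               for y in range(h - 1) for x in range(w)) / ((h - 1) * w)
    return (density, horiz, vert)


def nearest_font(features, references):
    """Return the reference font whose features are closest."""
    return min(references,
               key=lambda name: math.dist(features, references[name]))


if __name__ == "__main__":
    # Synthetic stand-ins for two fonts with different stroke direction.
    vertical = [[0, 1, 0, 1] for _ in range(4)]
    horizontal = [[y % 2] * 4 for y in range(4)]
    refs = {"font_a": texture_features(vertical),
            "font_b": texture_features(horizontal)}
    sample = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 1, 1], [0, 1, 0, 1]]
    print(nearest_font(texture_features(sample), refs))
```

A practical system would likely use richer descriptors computed on scanned blocks, but the structure stays the same: extract features from a single-font block, then compare against stored references for each known font.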
A typical OCR system contains three logical components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures text images to be recognized.
Text images are processed with OCR software and hardware. The process involves three operations: document analysis (extracting individual character images), recognition of these images (based on shape), and contextual processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices).
The output interface is responsible for communication of OCR system results to the outside world.
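The three operations described above can be sketched on a toy example. The sketch assumes the scanned image is a tuple of text rows in which "#" is ink and "." is background, that characters are separated by blank columns, and that a small lexicon is available; the templates, lexicon and function names are all invented for illustration.

```python
import difflib

# Hypothetical 3-row glyph templates.
TEMPLATES = {
    "L": ("#.", "#.", "##"),
    "I": ("#", "#", "#"),
    "T": ("###", ".#.", ".#."),
}


def segment(image):
    """Document analysis: cut the image at fully blank columns."""
    glyphs, current = [], []
    for col in zip(*image):
        if "#" in col:
            current.append(col)
        elif current:
            glyphs.append(tuple("".join(r) for r in zip(*current)))
            current = []
    if current:
        glyphs.append(tuple("".join(r) for r in zip(*current)))
    return glyphs


def recognize(glyph):
    """Recognition: pick the same-width template with most equal pixels."""
    def score(template):
        if len(template[0]) != len(glyph[0]):
            return -1
        return sum(a == b for t_row, g_row in zip(template, glyph)
                   for a, b in zip(t_row, g_row))
    best = max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))
    return best if score(TEMPLATES[best]) >= 0 else "?"


def contextual(word, lexicon):
    """Contextual processing: snap the word to a close lexicon entry."""
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else word


def run_ocr(image, lexicon):
    raw = "".join(recognize(g) for g in segment(image))
    return contextual(raw, lexicon)


if __name__ == "__main__":
    image = ("#..#.###",
             "#..#..#.",
             "##.#..#.")
    print(run_ocr(image, ["LIT", "TILT"]))
```

Real systems replace each stage with something far more robust, but the division of labour is the one described: segmentation produces character images, shape matching labels them, and context repairs or constrains the result.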
Re: OMNIFONT recognition in Indian language OCR seminar report/pdf/ppt download
The OCR feature in the omni font is not free. I searched on Google and found many free online OCR tools; most of them support multiple languages and can recognize text from images and PDFs.