ABSTRACT-- In recent years, government agencies and industrial enterprises have been using the web as a medium of publication. As a result, a large collection of documents, images, text files and other data in structured, semi-structured and unstructured forms is available on the web. Identifying relevant pieces of information has become increasingly difficult, since pages are often cluttered with irrelevant content, such as advertisements and copyright notices, surrounding the main content. We therefore propose a technique that mines the relevant data regions from a web page. The technique is based on three important observations about data regions on the web.
