Volume 7 - 2015

Abstract

This paper proposes a method for automatically identifying and extracting information that matches a predetermined criterion from one or more web pages at one or more web sites, and for automatically producing data-field names from the extracted information. The extracted information includes at least one data-field value associated with one of these names. The method works against a previously constructed database of data fields, each with a data-field name and a data-field value. If an extracted data-field name matches an existing data-field name in the database, the method updates the data-field value associated with that name. If an extracted data-field name matches none of the existing names, the method adds it to the database.
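The update-or-add rule described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the "name: value" line format stands in for the predetermined criterion, the database is modelled as a plain dictionary, names are compared case-insensitively, and a newly added name is stored together with its extracted value. The function names `extract_fields` and `merge_into_database` are hypothetical and all of these choices are assumptions made for illustration.

```python
def extract_fields(page_text):
    """Extract data-field names and values from page text.

    Assumption: a line of the form "name: value" satisfies the
    predetermined criterion; the abstract does not specify the criterion.
    """
    fields = {}
    for line in page_text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            fields[name] = value
    return fields


def merge_into_database(database, extracted):
    """Update values for matching field names and add unmatched names.

    Returns two lists: names whose values were updated, and names that
    were newly added to the database.
    """
    updated, added = [], []
    for name, value in extracted.items():
        if name in database:
            updated.append(name)   # existing data-field name: update its value
        else:
            added.append(name)     # unknown data-field name: add it
        database[name] = value
    return updated, added


if __name__ == "__main__":
    database = {"price": "100 EUR"}
    page = "Price: 120 EUR\nColor: red\n"
    updated, added = merge_into_database(database, extract_fields(page))
    print(updated, added, database)
```

In this sketch a dictionary keeps the match test to a single key lookup; a real system would likely need fuzzier name matching than exact lowercase equality.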

Keywords: data mining, information retrieval, pattern matching, reverse engineering, search engine

Published on website: 28.12.2015

Attached files: prepoznavanje-i-ekstrakcija-struktuiranih-podataka-prikaznih-na-veb-stranicama.pdf


Identifying and extracting structured data from web pages
