Extraction of Semantic Information from Web Resources

Publication at Faculty of Mathematics and Physics

2008

Abstract

The paper addresses the problem of extracting semantic information from Czech texts on the Web. The method described here exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0).

We are developing a system that captures the text of web pages, annotates it with linguistic tools, extracts data, and interprets the extracted data semantically in terms of web ontologies. The proposed extraction method is based on extraction rules (tree queries), which are adopted from the Netgraph application.

The semantic interpretation of these rules provides the semantics of the extracted data. We present some initial experiments in the domain of traffic accident reports.
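
As a rough illustration of the idea of rule-based extraction over dependency trees, the following Python sketch matches a small tree query against a toy tree and maps the captured value to an ontology property. It is only a sketch under stated assumptions: the `Node` and `Query` classes, the English lemmas, and the `ex:` property and subject names are hypothetical, and the query representation is not Netgraph's actual query syntax. Only the dependency labels (PRED, ACT, RSTR, LOC) follow PDT naming.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a simplified dependency tree (hypothetical structure)."""
    lemma: str
    functor: str  # dependency label, e.g. "PRED", "ACT", "RSTR"
    children: List["Node"] = field(default_factory=list)


@dataclass
class Query:
    """A tree query: constraints on a node plus sub-queries for its children."""
    lemma: Optional[str] = None     # None means "any lemma"
    functor: Optional[str] = None   # None means "any functor"
    capture: Optional[str] = None   # name under which the matched lemma is stored
    children: List["Query"] = field(default_factory=list)


def match(query: Query, node: Node) -> Optional[Dict[str, str]]:
    """Return the captured values if `query` matches at `node`, else None."""
    if query.lemma is not None and query.lemma != node.lemma:
        return None
    if query.functor is not None and query.functor != node.functor:
        return None
    captured: Dict[str, str] = {}
    if query.capture:
        captured[query.capture] = node.lemma
    # Simplification: each sub-query takes the first child it matches,
    # so two sub-queries may end up matching the same child.
    for sub in query.children:
        for child in node.children:
            result = match(sub, child)
            if result is not None:
                captured.update(result)
                break
        else:
            return None  # no child satisfied this sub-query
    return captured


def extract(query: Query, root: Node) -> List[Dict[str, str]]:
    """Try the query at every node of the tree and collect all matches."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        found = match(query, node)
        if found is not None:
            results.append(found)
        stack.extend(node.children)
    return results


# Hypothetical mapping from capture names to ontology properties.
ONTOLOGY_MAP = {"count": "ex:numberOfFatalities"}


def interpret(matches: List[Dict[str, str]],
              subject: str = "ex:accident1") -> List[Tuple[str, str, str]]:
    """Turn captured values into (subject, property, value) triples."""
    return [(subject, ONTOLOGY_MAP[name], value)
            for m in matches
            for name, value in m.items()
            if name in ONTOLOGY_MAP]


# Toy tree for a sentence like "Two people died in the accident."
tree = Node("die", "PRED", [
    Node("person", "ACT", [Node("two", "RSTR")]),
    Node("accident", "LOC"),
])

# Rule: find the modifier of the actor of "die" and capture it as "count".
rule = Query(lemma="die", children=[
    Query(functor="ACT", children=[Query(functor="RSTR", capture="count")]),
])

matches = extract(rule, tree)
triples = interpret(matches)
```

In this sketch the rule itself carries the semantics: whatever is captured under the name `count` is interpreted as the value of one fixed ontology property, which mirrors the idea that the semantic interpretation of a rule determines the meaning of the data it extracts.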

Keywords

Extraction Semantic Information Resources