Linguistic extraction for semantic annotation

Publication at Faculty of Mathematics and Physics

2008

Abstract

A bottleneck for semantic web services is the lack of semantically annotated information. We deal with linguistic information extraction from Czech texts from the Web for semantic annotation.

The method described in the paper exploits existing linguistic tools created originally for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We propose a system which captures the text of web pages, annotates it linguistically with PDT tools, extracts data, and stores the data in an ontology. We focus on the third phase, data extraction, and present methods for learning queries over linguistically annotated data. Our experiments in the domain of traffic accident reports enable, e.g., summarization of the number of injured people.

This serves as a proof of concept of our solution. More experiments, for different queries and different domains, are planned for the future.

This will improve third-party semantic annotation of web resources.
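
To make the data extraction phase more concrete, the following is a minimal illustrative sketch of a query over dependency-annotated text. It is not the method of the paper: the query here is written by hand rather than learned, the `Node` structure is a hypothetical stand-in for PDT annotation, and the toy trees use English lemmas instead of Czech. The names `Node`, `iter_nodes` and `injured_count` are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One node of a toy dependency tree (hypothetical, not the PDT format)."""
    lemma: str
    pos: str  # coarse part of speech, e.g. "VERB", "NOUN", "NUM"
    children: List["Node"] = field(default_factory=list)


def iter_nodes(root: Node) -> Iterator[Node]:
    """Yield every node of the tree in depth-first order."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def injured_count(root: Node) -> Optional[int]:
    """Hand-written query: verb 'injure' -> noun dependent -> numeral dependent."""
    for node in iter_nodes(root):
        if node.pos == "VERB" and node.lemma == "injure":
            for noun in node.children:
                if noun.pos != "NOUN":
                    continue
                for num in noun.children:
                    if num.pos == "NUM" and num.lemma.isdigit():
                        return int(num.lemma)
    return None  # the pattern does not occur in this tree


# Toy trees standing in for three annotated report sentences.
report_a = Node("injure", "VERB", [Node("person", "NOUN", [Node("3", "NUM")])])
report_b = Node("close", "VERB", [Node("road", "NOUN")])
report_c = Node("injure", "VERB", [Node("passenger", "NOUN", [Node("2", "NUM")])])

# Summarize the number of injured people over all reports that match.
counts = [injured_count(tree) for tree in (report_a, report_b, report_c)]
total_injured = sum(c for c in counts if c is not None)
```

In this sketch the extracted counts would then be stored in an ontology, which corresponds to the fourth phase of the proposed system and is not shown here.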

Keywords

Linguistic extraction, semantic annotation