Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings

Publication at the Faculty of Mathematics and Physics

2017

Abstract

The Czech Insolvency Register currently covers about 200,000 insolvency proceedings commenced since 2008. Several scanned documents can be attached to each proceeding, amounting to approximately 1,200,000 PDF files in total.

This study aims to find efficient pre-processing, clustering, and classification techniques capable of extracting valid information on the structure of indebtedness across Czech society from these PDF files.

Keywords

data pre-processing, text processing, classification, clustering, knowledge extraction, semantics assignment