Extraction and Interpretation of Textual Data from Czech Insolvency Proceedings

Publication at Faculty of Mathematics and Physics | 2017

Abstract

The Czech Insolvency Register currently covers about 200,000 insolvency proceedings commenced since 2008. Several scanned document copies can be attached to each proceeding (about 1,200,000 PDF files in total).

This study aims to find efficient pre-processing, clustering and classification techniques capable of extracting valid information on the structure of indebtedness across Czech society from the above-mentioned PDF files.