The Czech Insolvency Register currently covers about 200 000 insolvency proceedings commenced since 2008. To better assess the real impact of indebtedness across Czech society, data about the creditors and the reasons for the debt would be of great value.
Unfortunately, the vast majority of this information is contained only in scanned copies of documents attached to the insolvency proceedings. Our goal is thus to extract textual data from the scanned PDF files and to locate the required information in the resulting texts.