The Czech Insolvency Register contains almost 1,500,000 scanned document copies. This paper focuses on the efficient extraction of debt amounts from these PDF files.
Based on the extracted debt values and the identified creditors, individual debtors can be grouped into clusters with a similar debt structure. Finally, the total value of debt can be assessed both for individual creditors and for the country as a whole.
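
As a rough illustration of the pipeline outlined above, the following minimal sketch assumes that the scanned PDF files have already been converted to plain text by a separate OCR step, which is not shown. The regular expression, the function names (`parse_amounts`, `creditor_totals`, `debt_structure`) and the `(debtor, creditor, amount)` record layout are hypothetical choices made for this example and are not taken from the paper. The sketch parses Czech-formatted amounts, sums the debt per creditor, and computes each debtor's share of debt per creditor, which could serve as a feature vector for clustering.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Assumed amount format: thousands separated by a space, a non-breaking
# space, or a dot; optional decimal comma; followed by "Kč" or "CZK".
AMOUNT_RE = re.compile(
    r"(?<!\d)(\d{1,3}(?:[ \u00a0.]\d{3})*|\d+)(?:,(\d{1,2}))?\s*(?:Kč|CZK)"
)


def parse_amounts(text):
    """Return all currency amounts found in OCR text as Decimal values."""
    amounts = []
    for match in AMOUNT_RE.finditer(text):
        whole = re.sub(r"[ \u00a0.]", "", match.group(1))  # drop separators
        frac = match.group(2) or "0"
        amounts.append(Decimal(f"{whole}.{frac}"))
    return amounts


def creditor_totals(claims):
    """Sum the debt per creditor from (debtor, creditor, amount) records."""
    totals = defaultdict(Decimal)
    for _debtor, creditor, amount in claims:
        totals[creditor] += amount
    return dict(totals)


def debt_structure(claims):
    """Return, for each debtor, the share of total debt owed to each creditor."""
    per_debtor = defaultdict(lambda: defaultdict(Decimal))
    for debtor, creditor, amount in claims:
        per_debtor[debtor][creditor] += amount
    shares = {}
    for debtor, by_creditor in per_debtor.items():
        total = sum(by_creditor.values())
        shares[debtor] = {c: float(a / total) for c, a in by_creditor.items()}
    return shares


# Hypothetical records, as they might look after extraction.
claims = [
    ("Debtor A", "Bank X", Decimal("300")),
    ("Debtor A", "Telecom Y", Decimal("100")),
    ("Debtor B", "Bank X", Decimal("200")),
]
totals = creditor_totals(claims)
overall_debt = sum(totals.values())
structure = debt_structure(claims)
```

The share vectors returned by `debt_structure` are only one possible representation of a debtor's debt structure; any standard clustering method could then be applied to them. `Decimal` is used instead of floating point so that summed amounts stay exact.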