In this paper we extend previous efforts to infer schema information from existing XML documents. We focus on the inference of integrity constraints, specifically the ID, IDREF, and IDREFS attributes in DTDs.
Building on the research by Barbosa and Mendelzon (2003), we introduce a heuristic approach to the problem of finding an optimal ID set. The approach is evaluated and tuned in a wide range of experiments.