Although currently available full-text search engines can be used on mathematical documents, they are deficient in almost all cases: they cannot handle structured mathematical text or mathematical operations.
In this work, we address these issues and present a technique for indexing real-world scientific documents containing mathematical notation that exploits current state-of-the-art full-text search engines. Our approach has several advantages over existing solutions.
It is primarily intended for documents on the WWW, which are mostly semantically poor, and it offers an extensible level of mathematical awareness that also supports similarity searches. Furthermore, it is designed as an extension, so any full-text search engine can easily adopt it.
Experiments on two real-world document sets showed that performance depends strongly on several features of the mathematical search engine.