The method of automatic term recognition, which is based on machine learning, focuses primarily on the most important quantitative attributes of terms. It distinguishes terms from non-terms with a success rate of more than 95% and identifies the characteristic features of a term as a terminological unit.
A single-word term can be characterized as a low-frequency word that occurs considerably more often in specialized texts than in non-academic texts and appears in only a small number of disciplines; its distribution in the corpus is uneven, as are the distances between its occurrences. A multi-word term is a collocation that consists of low-frequency words and contains at least one single-word term.
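The quantitative attributes listed above can be illustrated with a small sketch. It is not the implementation used in the study: the function name `term_features`, the feature names, the add-one smoothing, and the toy corpora are all assumptions made for illustration, and "distance between instances" is interpreted here as the gap between consecutive occurrences.

```python
from statistics import pstdev


def term_features(word, specialized, general):
    """Compute illustrative quantitative features for one candidate word.

    specialized: dict mapping a discipline name to its list of tokens
    general:     list of tokens from non-academic texts
    Both corpora are assumed to be non-empty.
    """
    spec_tokens = [t for tokens in specialized.values() for t in tokens]
    spec_count = spec_tokens.count(word)
    gen_count = general.count(word)

    # Relative frequency in the specialized corpus (low for typical terms).
    relative_frequency = spec_count / len(spec_tokens)

    # How much more often the word occurs in specialized than in general
    # texts; add-one smoothing (an assumption) avoids division by zero.
    spec_rate = (spec_count + 1) / (len(spec_tokens) + 1)
    gen_rate = (gen_count + 1) / (len(general) + 1)
    specialization_ratio = spec_rate / gen_rate

    # Number of disciplines in which the word appears at all.
    discipline_count = sum(1 for tokens in specialized.values() if word in tokens)

    # Spread of the gaps between consecutive occurrences, as a rough
    # measure of how unevenly the word is distributed in the corpus.
    positions = [i for i, t in enumerate(spec_tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    gap_spread = pstdev(gaps) if len(gaps) > 1 else 0.0

    return {
        "relative_frequency": relative_frequency,
        "specialization_ratio": specialization_ratio,
        "discipline_count": discipline_count,
        "gap_spread": gap_spread,
    }


# Toy data, invented purely for demonstration.
specialized = {
    "physics": "the quark is a particle the quark has spin".split(),
    "history": "the king ruled the land".split(),
}
general = "the dog ran to the park".split()

quark = term_features("quark", specialized, general)
the = term_features("the", specialized, general)
```

In this toy example, `quark` is confined to one discipline and is relatively more frequent in the specialized texts, whereas `the` appears in every discipline and is not. Feature vectors of this kind are what a machine-learning classifier would take as input; the choice of classifier is left open here.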
Because the method relies on quantitative features, its algorithms can be applied across multiple disciplines and used to build cross-lingual applications (verified on Czech and English).