Resources and components for Gujarati NLP systems: a survey

Publication

Abstract

Natural Language Processing (NLP) is the automatic processing of natural human language by machines. NLP applications help automate a wide range of tasks. These include translating text between languages, retrieving and summarizing information from very large and complex repositories, filtering spam email, and identifying fake news in digital media. Other examples are mining people's political opinions, views, and sentiments about government policies, and providing effective medical assistance based on patients' historical records.

Gujarati is an Indian language with more than sixty million speakers worldwide. Considerable effort is currently being devoted to developing NLP applications and resources for Indian languages.

This survey presents a taxonomy and a comprehensive report on the development of components and resources for Gujarati NLP systems. In addition, a few prominent tools available in the open domain are tested, and a posterior analysis of their results is presented.

Possible measures for addressing issues in the development of resources and components for Gujarati NLP systems are also discussed. This report may help industry practitioners, researchers, and academics gain a clear picture of the research gaps, challenges, and opportunities in Gujarati NLP systems.

Keywords

Indian language; Gujarati; Natural language processing; Tools; Lexical resources; Corpus; Components