Semantic support in multilingual text retrieval
Current search engines (e.g. Google, Yahoo) retrieve relevant documents from the huge amount of data available and have become an essential tool for the majority of Web users. Standard search engines do not consider semantic information that could help determine whether a document is relevant to the meaning of a query. Furthermore, most of them support only one language, although users who speak several languages could, in principle, switch to another language when the results of their original query are unsatisfactory. This thesis presents a new semantic-based approach to Web search. Unlike the “standard” search process, users are redirected to semantic concepts that could describe their query. This redirection supports the search process by providing the possible meanings of the query words and assigning the retrieved documents to the respective meanings. Lexical resources provide the possible meanings of query words and their translations in the languages the users speak. The contexts of the query words are recognized using Word Sense Disambiguation approaches, and semantically similar documents are grouped together using Document Categorization techniques. Moreover, Semantic and Multilingual Text Retrieval methods are developed to support users in filtering and retrieving relevant documents. Additionally, to assist users in the semantic-based multilingual search process, pre-processing approaches such as named-entity recognition, spell checking and stemming are applied to queries and documents. All the proposed methods are compared and evaluated, and several tools that implement combinations of these methods are developed and discussed.
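The idea of redirecting a query to its possible meanings can be illustrated with a small sketch. It is not the method developed in the thesis: the sense inventory `SENSES`, the sense identifiers, the gloss words and the helper names `assign_to_senses` and `expand_query` are all hypothetical, and a simple word-overlap score (in the spirit of simplified Lesk disambiguation) stands in for the Word Sense Disambiguation and Document Categorization techniques the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy sense inventory standing in for a real lexical resource:
# each query word maps to candidate senses with gloss words and translations.
SENSES = {
    "bank": [
        {
            "id": "bank#finance",
            "gloss": {"money", "account", "loan", "deposit"},
            "translations": {"de": "Bank", "es": "banco"},
        },
        {
            "id": "bank#river",
            "gloss": {"river", "water", "shore", "slope"},
            "translations": {"de": "Ufer", "es": "orilla"},
        },
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of words without punctuation."""
    return {word.strip(".,;:!?").lower() for word in text.split()}


def assign_to_senses(query_word, documents):
    """Group documents under the sense whose gloss overlaps them most.

    Documents that share no word with any gloss go to "unassigned".
    """
    groups = defaultdict(list)
    for doc in documents:
        tokens = tokenize(doc)
        best_id, best_overlap = "unassigned", 0
        for sense in SENSES.get(query_word, []):
            overlap = len(tokens & sense["gloss"])
            if overlap > best_overlap:
                best_id, best_overlap = sense["id"], overlap
        groups[best_id].append(doc)
    return dict(groups)


def expand_query(query_word, languages):
    """Return, per sense, the translations in the languages the user speaks."""
    return {
        sense["id"]: {
            lang: sense["translations"][lang]
            for lang in languages
            if lang in sense["translations"]
        }
        for sense in SENSES.get(query_word, [])
    }


docs = [
    "She opened an account to deposit money.",
    "They walked along the river shore.",
    "Nothing relevant here.",
]
grouped = assign_to_senses("bank", docs)
translations = expand_query("bank", ["de"])
```

Here `grouped` maps each sense identifier to the documents assigned to it, and `translations` lists the per-sense query terms a multilingual user could search with instead. A real system would draw senses and translations from a lexical resource rather than a hand-written dictionary.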