Definition
Full-text retrieval is a technique in which a computer program scans every word of each document and builds an index entry for each word, recording which documents the word appears in and at which positions. When a user submits a query, the retrieval program searches this prebuilt index and returns the matching results to the user. The process is similar to looking up a word through the index of a dictionary.
Introduction and classification
Full-text retrieval methods fall into two classes: character-based retrieval and word-based retrieval. Character-based retrieval indexes every character in a document and, at query time, decomposes the query into a combination of characters. What counts as a "character" differs between languages: in English the indexed unit and the word are effectively the same thing, whereas in Chinese characters and words are very different. Word-based retrieval indexes the words of a document, that is, its semantic units; queries are matched word by word, and synonyms can be handled. English and other Western languages separate words with spaces, so word-based indexing is straightforward to implement, and adding synonym handling is also easy. Chinese and other East Asian languages must first be segmented into words before they can be indexed by word. This segmentation problem is currently one of the main difficulties in full-text retrieval technology, especially for Chinese.
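The difference between the two classes can be sketched in a few lines of Python. This is only an illustration: the function names and sample strings are made up for the example, and real Chinese word segmentation needs a dictionary or statistical segmenter, which is not shown here.

```python
import re

def tokenize_by_word(text):
    """Split text into lowercase word tokens using non-word characters as separators."""
    return re.findall(r"\w+", text.lower())

def tokenize_by_character(text):
    """Treat every non-whitespace character as its own index token."""
    return [ch for ch in text if not ch.isspace()]

english = "Full text retrieval"
chinese = "全文检索"  # "full-text retrieval", written without spaces

# Spaces already mark word boundaries in English.
word_tokens = tokenize_by_word(english)

# Character-based indexing needs no segmentation step.
char_tokens = tokenize_by_character(chinese)

# The same word tokenizer finds no boundaries inside the Chinese string,
# which is why a separate segmentation step is required for word-based indexing.
unsegmented = tokenize_by_word(chinese)
```

Here `word_tokens` holds three English words, `char_tokens` holds four single characters, and `unsegmented` holds the whole Chinese string as one token.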
A full-text retrieval system is a software system, built according to full-text retrieval theory, that provides full-text retrieval services. At minimum it must offer indexing and querying; a modern system is also expected to provide a convenient user interface, a web-oriented development interface, a secondary application development interface, and so on. Functionally, the core of a full-text retrieval system builds the index, processes queries and returns result sets, adds to the index incrementally, optimizes the index structure, and so on, while the periphery consists of functions supplied by various applications. Structurally, the core comprises an indexing engine, a query engine, a text analysis engine, external interfaces, and so on; together with the peripheral application systems, these make up the full-text retrieval system.
The most commonly used full-text search engines include Baidu and Google. Their counterpart is the directory-based search index.
Technology used
Search engines face information requests from a very large number of users (tens of thousands of hits per second), so retrieval must be designed for high performance. Most of the work is done ahead of time, during indexing, which keeps the load on the retrieval step low at query time. General database query techniques cannot meet the response-time requirements of full-text search, so full-text search engines typically use inverted index technology:
An inverted index, also often called a postings file or inverted file, is an indexing method used in full-text search to store a mapping from each word to its locations in a document or a set of documents. It is one of the most commonly used data structures in document retrieval systems.
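As a minimal sketch of this mapping, the Python below builds an index from words to document ids and answers a query by intersecting the id sets. The names `build_index` and `search`, the sample documents, and the whitespace tokenization are assumptions made for the example, not part of any particular search engine.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "full text retrieval uses an inverted index",
    2: "a database query scans every row",
    3: "an inverted index maps each word to documents",
}

index = build_index(docs)
hits = search(index, "inverted index")   # documents 1 and 3
no_hits = search(index, "database index")  # no document has both words
```

All scanning of document text happens once in `build_index`; `search` only touches the posting sets of the query words, which is the reason the expensive work can be moved to indexing time.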
There are two different forms of the inverted index:
A record-level inverted index (or inverted file index) contains, for each word, a list of the documents that reference it. A word-level inverted index (or full inverted index) additionally contains the position of each word within a document. [1] The latter form offers more functionality (such as phrase search) but needs more time and space to build.
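The two forms can be compared with a short Python sketch. It builds a word-level index, derives the record-level view from it by discarding positions, and uses the positions to answer a phrase query. All names and sample documents here are illustrative assumptions, and the phrase matching is a simple version rather than an optimized one.

```python
from collections import defaultdict

def build_positional_index(documents):
    """Word-level index: map each word to {doc_id: [positions in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index

def record_level_view(index):
    """Record-level index: keep only the document list for each word."""
    return {word: sorted(postings) for word, postings in index.items()}

def phrase_search(index, phrase):
    """Return sorted ids of documents where the phrase words occur consecutively."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return []
    # Only documents containing every word can contain the phrase.
    candidates = set(index[words[0]])
    for word in words[1:]:
        candidates &= set(index[word])
    matches = []
    for doc_id in sorted(candidates):
        for start in index[words[0]][doc_id]:
            # The i-th phrase word must appear at position start + i.
            if all(start + offset in index[word][doc_id]
                   for offset, word in enumerate(words)):
                matches.append(doc_id)
                break
    return matches

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "the quick brown fox",
    2: "the brown quick fox",
    3: "a quick brown dog",
}

positional = build_positional_index(docs)
by_record = record_level_view(positional)
phrase_hits = phrase_search(positional, "quick brown")
```

The record-level view lists documents 1, 2 and 3 for both "quick" and "brown", so it cannot tell them apart for this query. The word-level index returns only documents 1 and 3, at the cost of storing a position list for every word occurrence.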