Monday, January 10, 2011

Key phrases



1 Definition of key phrases

Simply put, key phrases are phrases with a strong text representation function. A strong text representation function means that, when a text is represented, the phrase clearly characterizes features of the text's content (e.g. domain category, topic, central meaning). For example, common function-word components (such as "overall") have a weak text representation function, while strongly domain-specific substantive components (such as "closed-end fund") have a stronger one. Specifically, we define a key phrase from three angles:

[1] Structure:

The structure is relatively stable and has a certain cohesion.

[2] Semantics:

The meaning is complete and single, the reference is clear, and the phrase has a certain semantic completeness and specificity.

[3] Statistics:

The phrase has a certain degree of circulation in large-scale real text; it is not a temporary combination, it is highly reusable, and it is statistically significant.
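The statistical criterion can be approximated in several ways. The following is a minimal sketch, not the method of this post: it assumes that "circulation" can be proxied by document frequency and that "not a temporary combination" can be proxied by positive pointwise mutual information (PMI) between adjacent tokens. The function name `bigram_stats`, the thresholds, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

def bigram_stats(docs):
    """Return {bigram: (count, doc_freq, pmi)} for a list of token lists.

    Assumption: document frequency stands in for "circulation" and
    PMI stands in for "not a temporary combination".
    """
    unigrams = Counter()
    bigrams = Counter()
    doc_freq = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        doc_freq.update(set(pairs))  # count each bigram once per document

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    stats = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        stats[(a, b)] = (count, doc_freq[(a, b)], pmi)
    return stats

# Toy corpus (hypothetical), already tokenized on whitespace.
docs = [
    "the closed-end fund raised its floating rate".split(),
    "a floating rate helps the closed-end fund".split(),
    "the floating rate rose before the meeting".split(),
]
stats = bigram_stats(docs)

# Hypothetical thresholds: seen in at least 2 documents, PMI above 0.
candidates = [bg for bg, (count, df, pmi) in stats.items() if df >= 2 and pmi > 0]
```

On this toy corpus a pair that begins with a function word ("the closed-end") passes the filter alongside "floating rate" and "closed-end fund", which illustrates why the statistical criterion alone is not enough and the structural and semantic criteria are still needed.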

Given this definition and the three criteria, and considering that the boundary between words and phrases is fuzzy, we define key phrases to include both phrases and words. However, because of the "strong text representation function" and "semantic completeness and specificity" constraints, words make up only a small proportion.

Of course, the three criteria above must also be operational. Below we will use feature extraction methods from text classification to extract and cluster key phrases, and then go further to give key phrases a formal, quantitative definition.
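The post does not say which feature extraction method is used. As one illustration only, the sketch below scores candidate phrases with the chi-square statistic, a common feature selection measure in text classification. The function names `chi_square` and `phrase_scores`, the category labels, and the toy data are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 phrase/category contingency table.

    a: documents in the category that contain the phrase
    b: documents outside the category that contain the phrase
    c: documents in the category that lack the phrase
    d: documents outside the category that lack the phrase
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # phrase or category is absent or universal
    return n * (a * d - b * c) ** 2 / denom

def phrase_scores(labeled_docs, category):
    """Score every phrase against one category.

    labeled_docs: list of (label, set_of_phrases) pairs.
    """
    vocabulary = set().union(*(phrases for _, phrases in labeled_docs))
    scores = {}
    for phrase in vocabulary:
        a = b = c = d = 0
        for label, phrases in labeled_docs:
            in_cat = label == category
            has = phrase in phrases
            if in_cat and has:
                a += 1
            elif not in_cat and has:
                b += 1
            elif in_cat:
                c += 1
            else:
                d += 1
        scores[phrase] = chi_square(a, b, c, d)
    return scores

# Toy labeled corpus (hypothetical); phrases are assumed to be pre-extracted.
labeled_docs = [
    ("finance", {"closed-end fund", "overall"}),
    ("finance", {"closed-end fund", "floating rate"}),
    ("sports", {"overall"}),
    ("sports", {"overall", "before the meeting"}),
]
scores = phrase_scores(labeled_docs, "finance")
```

In this toy data "closed-end fund" scores higher than "overall" for the finance category, matching the intuition that a domain-specific phrase represents the text better than a general one.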

2 Key phrases and phrases in general

Key phrases are, of course, phrases. But phrases in the general sense cover a very wide range and fall into three categories: free phrases, fixed phrases, and quasi-fixed phrases (or semi-fixed phrases). Free phrases are temporary combinations, such as "the wisdom of the masses, couldn't understand, discuss the problems and the proposed opinions, before the meeting, these few"; they are usually also called non-fixed phrases. Their components must satisfy semantic and syntactic selection requirements and can be freely replaced, but they have little statistical significance in real text and are clearly unsuitable for expressing the characteristics of a text. Free phrases are therefore the first to be excluded from key phrases.

Fixed phrases have a relatively stable internal composition whose parts cannot be freely replaced; they can also be regarded as phrases that have turned into words. They are mainly idioms, and also include more colloquial phrases such as "doo, LouMaJiao". They are generally entered in the word list. Fixed phrases, such as idioms and colloquial set expressions, mostly come from ancient works and fables (typically KeZhouQiuJian) or from contemporary fixed collocations (flowers bloom, strive, LouMaJiao). They typically have a double meaning: the overall meaning differs from the literal meaning, which evokes associations in the reader and gives a vivid, striking rhetorical effect. This clearly does not fit the key-phrase requirement of a complete, single meaning and a clear reference, so fixed phrases are for the most part also excluded.

Between these two categories, in a fuzzy zone, lie the semi-fixed phrases, also called word collocations or lexicalized phrases, such as "videophone, social benefit, rules and regulations, floating rate". These combinations have their own grammatical structure and can also be described by rules. When translated into another language, they often cannot be translated word by word. These phrases have a certain structural cohesion and a certain semantic completeness and specificity.

Semi-fixed phrases are what we need to focus on. Compared with words and fixed phrases, semi-fixed phrases have stronger semantic singleness: their semantic structure is usually stable and unambiguous, so they can better express a semantic concept. Words, by contrast, are more flexible, their semantic structure is less stable, and they are often ambiguous. Compared with free phrases and sentences, semi-fixed phrases have a stable structure and also the advantage of statistical significance. Semi-fixed phrases thus have the stability that free phrases, clauses, and sentences lack, and the semantic singleness that words and fixed phrases lack, which makes them very suitable for expressing text features.

The basic characteristics of named entities, ontology concepts, and terminology are domain relevance, a single and complete meaning, and a fixed structure; they are all part of the key phrases.

In information science, thesaurus terms (that is, formal keywords) are the words specified in a controlled keyword table for expressing the subject concepts of archives when files are indexed and retrieved. The keywords used in document indexing are words that appear in the title, abstract, or body of a paper and have practical significance for characterizing its subject content. Such keywords are non-standardized natural language, that is, informal keywords. By definition, their main feature is domain (topic) relevance. According to our definition of key phrases, they should also be part of the key phrases. Moreover, because document-indexing keywords are non-standardized natural language, they differ from named entities, ontology concepts, terminology, and formal keywords, whose sources are strict, scientific, and fixed. Their sources are very broad and they are numerous, so they will be the main source of key phrases.