Phrases

In the language technology context discussed here, "phrases" are often understood as chunks.

The concept of chunks was originally introduced in relation to so-called performance structures, which reflect the intuitive subdivision of sentences as uttered by a speaker. Such structures, which have been experimentally verified, can differ from a purely linguistically motivated constituent analysis of sentences, which reflects the competence of a speaker [1]. Based on this observation, [2] defines chunks as the non-recursive parts of core phrases, such as nominal, prepositional, adjectival and adverbial phrases and verb groups.

Chunk Parsing

Chunk parsing is an important step towards making natural language processing robust. Its goal is not to deliver a full analysis of sentences, but to extract only those linguistic fragments that can be reliably identified as reflecting linguistic performance. The method is called robust because it always delivers some linguistic information, whereas a full parser fails to deliver any (even partial) linguistic information if the whole utterance cannot be completely analysed in accordance with some competence model of the particular language.
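The robustness property described above can be illustrated with a minimal sketch in Python. It assumes the input is already part-of-speech tagged, and it uses a hypothetical tag set (DT, JJ, NN) and a single hypothetical chunk pattern (optional determiner, any adjectives, one or more nouns); none of these choices come from the cited literature. Tokens that do not fit the pattern are passed through unchanged, so the function returns some analysis for any input.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into non-recursive NP chunks.

    Assumed pattern (illustrative only): optional DT, any JJ, one or more NN.
    Tokens outside a chunk are kept as single-token items with their own tag.
    """
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NN":   # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found: emit a nominal chunk.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No secure local evidence: pass the token through unchunked.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


sentence = [("the", "DT"), ("old", "JJ"), ("man", "NN"),
            ("sleeps", "VB"), ("xyzzy", "??")]
print(chunk_noun_phrases(sentence))
```

Note that the unknown token at the end does not cause a failure; it simply remains outside any chunk.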

Partial Parsing

The concept of partial parsing is closely related to chunk parsing. Basic chunks are the result of a first analysis that detects linguistic fragments very accurately, since it relies on secure local information. On the basis of this analysis, rules for combining partial results are defined, and this process can go through further cycles, called cascades [3]. This ensures a higher accuracy of the analysis, since a problem is handled only when it can reasonably be assumed that enough linguistic information has been generated during previous cascades. Moreover, even if this strategy fails to produce an analysis for the whole sentence, the partial linguistic information gained so far is still useful for many applications, such as information extraction and text mining.
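The cascading idea can be sketched as a sequence of levels, each of which merges adjacent items produced by the previous level. The following Python sketch is only an illustration under stated assumptions: the rule format, the labels (NP, PP, S) and the three levels in LEVELS are hypothetical and are not taken from [3].

```python
def apply_level(items, rules):
    """One cascade level: merge adjacent items whose labels match a rule."""
    out, i = [], 0
    while i < len(items):
        for new_label, pattern in rules:
            window = items[i:i + len(pattern)]
            if [label for label, _ in window] == list(pattern):
                # Merge the matched items into one larger unit.
                words = [w for _, ws in window for w in ws]
                out.append((new_label, words))
                i += len(pattern)
                break
        else:
            # No rule applies here: keep the item for later levels.
            out.append(items[i])
            i += 1
    return out


def cascade(items, levels):
    """Run all levels in order; each level sees the previous level's output."""
    for rules in levels:
        items = apply_level(items, rules)
    return items


# Hypothetical rule set: each level is a list of (new_label, label_pattern).
LEVELS = [
    [("NP", ("DT", "NN"))],        # level 1: basic nominal chunks
    [("PP", ("IN", "NP"))],        # level 2: preposition + nominal chunk
    [("S", ("NP", "VB", "PP"))],   # level 3: a simple clause
]

full = [("DT", ["the"]), ("NN", ["cat"]), ("VB", ["sat"]),
        ("IN", ["on"]), ("DT", ["the"]), ("NN", ["mat"])]
partial = [("DT", ["the"]), ("NN", ["cat"]), ("??", ["blorp"]),
           ("IN", ["on"]), ("DT", ["the"]), ("NN", ["mat"])]

print(cascade(full, LEVELS))
print(cascade(partial, LEVELS))
```

In the second input the unknown token blocks the clause-level rule, but the nominal and prepositional units built at the earlier levels are still returned.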

Named Entities

Related to chunking is the recognition of so-called named entities (names of institutions and companies, date expressions, etc.). The extraction of named entities is mostly based on a strategy that combines lookup in gazetteers (lists of companies, cities, etc.) with the definition of regular expression patterns. Named entity recognition can be included as part of the linguistic chunking procedure. For example, the following sentence fragment:

... the secretary-general of the United Nations, Kofi Annan, ...

will be annotated as a nominal phrase that includes two named entities: United Nations, with the named entity class organization, and Kofi Annan, with the named entity class person.
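The combination of gazetteer lookup and regular expression patterns can be sketched in Python as follows. The gazetteer entries, the date pattern and the function name find_named_entities are hypothetical and chosen only to cover the example above; a real system would use much larger lists and more careful patterns.

```python
import re

# Hypothetical gazetteer: surface form -> named entity class.
GAZETTEER = {
    "United Nations": "organization",
    "Kofi Annan": "person",
}

# Hypothetical pattern for simple date expressions such as "12 March 1999".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def find_named_entities(text):
    """Return (start, end, surface form, class) tuples sorted by position."""
    found = []
    # Gazetteer lookup: exact matches of known names.
    for name, entity_class in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.end(), match.group(), entity_class))
    # Pattern-based recognition: date expressions.
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.end(), match.group(), "date"))
    return sorted(found)


fragment = "the secretary-general of the United Nations, Kofi Annan, spoke on 12 March 1999"
for start, end, surface, entity_class in find_named_entities(fragment):
    print(surface, "->", entity_class)
```

The character offsets returned with each entity would allow a chunker to attach the entity classes to the enclosing nominal phrase.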

[1] Chomsky, N.: Aspects of the Theory of Syntax. The MIT Press, Cambridge, MA, 1965.
 
[2] Abney, S.: Chunks and Dependencies: Bringing Processing Evidence to Bear on Syntax. In: Computational Linguistics and the Foundations of Linguistic Theory. CSLI, 1995.
 
[3] Abney, S.: Partial Parsing via Finite-State Cascades. Natural Language Engineering, 2(4): 337-344, 1996.