Part-of-Speech Tagging

Part-of-Speech (PoS) tagging is the process of determining the correct syntactic class (a part of speech, e.g. noun or verb) of a word in its current context. For instance, the word works is a verb in the first of the following sentences and a noun in the second:

    He works the whole day for nothing.
    His works have all been sold abroad.

As this example illustrates, PoS tagging involves disambiguating between multiple possible part-of-speech tags, as well as guessing the correct tag for unknown words on the basis of context information. Currently available PoS tagging tools use either rule-based or stochastic methods to disambiguate and to tag unknown words (this overview is based on [1]).


Rule-based approaches use hand-crafted or automatically extracted rules that exploit contextual information to assign tags to unknown or ambiguous words (see for instance: [2] [3] [4]). For example, a rule could state that a word preceded by a determiner and followed by a noun should be tagged as an adjective. In addition, many rule-based systems use morphological information to aid disambiguation. For instance, a morphological rule could state that a word that is preceded by a verb and ends in -ing should be tagged as a verb.
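
The two example rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the mechanism of any of the cited taggers: the toy lexicon, the coarse tag names (DET, PRON, NOUN, VERB, ADJ), the assumption that an unknown word may take any open-class tag, the fallback to NOUN when no rule fires, and the function names `candidates` and `tag` are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy lexicon: each known word maps to the tags it may take.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "a": {"DET"},
    "he": {"PRON"},
    "is": {"VERB"},
    "dog": {"NOUN"},
    "running": {"VERB", "NOUN", "ADJ"},
}

# Assumption: an unknown word may belong to any open word class.
OPEN_CLASS: Set[str] = {"NOUN", "VERB", "ADJ"}


def candidates(word: str) -> Set[str]:
    """Return the possible tags for a word (all open classes if unknown)."""
    return LEXICON.get(word.lower(), OPEN_CLASS)


def tag(words: List[str]) -> List[str]:
    """Tag words left to right, resolving ambiguity with two hand-written rules."""
    cands = [candidates(w) for w in words]
    tags: List[str] = []
    for i, word in enumerate(words):
        c = cands[i]
        prev_tag: Optional[str] = tags[i - 1] if i > 0 else None
        # The right neighbour is only usable if the lexicon makes it unambiguous.
        nxt = cands[i + 1] if i + 1 < len(words) else set()
        next_tag = next(iter(nxt)) if len(nxt) == 1 else None
        if len(c) == 1:
            # Unambiguous word: take its only lexicon tag.
            tags.append(next(iter(c)))
        elif prev_tag == "DET" and next_tag == "NOUN" and "ADJ" in c:
            # Contextual rule: DET _ NOUN -> ADJ
            tags.append("ADJ")
        elif prev_tag == "VERB" and word.lower().endswith("ing") and "VERB" in c:
            # Morphological rule: after a verb, a word ending in -ing -> VERB
            tags.append("VERB")
        else:
            # Hypothetical fallback when no rule fires.
            tags.append("NOUN" if "NOUN" in c else sorted(c)[0])
    return tags
```

With this lexicon, `tag(["the", "blorpy", "dog"])` resolves the unknown word through the contextual rule, and `tag(["he", "is", "running"])` resolves the ambiguous -ing form through the morphological rule. Practical rule-based taggers use many more rules, applied in a defined order.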


Stochastic PoS taggers are based on statistical models that incorporate frequencies or probabilities (see for instance: [5] [6] [7]). A simple stochastic tagger disambiguates a word solely on the basis of how often that word occurs with each tag in a training set: an ambiguous word is always assigned the tag it carried most frequently in the training data. A more advanced alternative is to calculate the probability of a given sequence of tags (a so-called n-gram), i.e. the probability that a tag occurs given the n-1 preceding tags. The most common algorithm for implementing an n-gram approach is the Viterbi algorithm, a dynamic-programming search that proceeds breadth-first, word by word, to find the most probable tag sequence for the whole sentence [8].
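
The two stochastic approaches described above can be contrasted in a small Python sketch: a most-frequent-tag baseline and a bigram tagger (n = 2, so each tag is conditioned on the single preceding tag) decoded with the Viterbi algorithm. The toy corpus, the add-one smoothing, the NOUN fallback for unknown words and all names are assumptions made for illustration; practical taggers are trained on large corpora and typically use longer tag histories (TnT [7], for example, uses tag trigrams) together with more careful smoothing.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy training corpus of (word, tag) pairs.
CORPUS: List[List[Tuple[str, str]]] = [
    [("he", "PRON"), ("works", "VERB"), ("hard", "ADV")],
    [("she", "PRON"), ("works", "VERB"), ("late", "ADV")],
    [("his", "DET"), ("works", "NOUN"), ("sell", "VERB"), ("well", "ADV")],
]
START = "<s>"  # pseudo-tag marking the beginning of a sentence

WORD_TAGS: Dict[str, Counter] = defaultdict(Counter)  # word -> tag counts
EMIT: Dict[str, Counter] = defaultdict(Counter)       # tag -> word counts
TRANS: Dict[str, Counter] = defaultdict(Counter)      # previous tag -> tag counts
for sentence in CORPUS:
    prev = START
    for w, t in sentence:
        WORD_TAGS[w][t] += 1
        EMIT[t][w] += 1
        TRANS[prev][t] += 1
        prev = t

TAGS = sorted(EMIT)
VOCAB_SIZE = len(WORD_TAGS)


def most_frequent_tag(words: List[str]) -> List[str]:
    """Baseline: each word gets the tag it carried most often in training.
    Unknown words fall back to NOUN (an assumption)."""
    return [WORD_TAGS[w].most_common(1)[0][0] if w in WORD_TAGS else "NOUN"
            for w in words]


def log_trans(prev: str, tag: str) -> float:
    """Add-one smoothed bigram probability log P(tag | prev)."""
    row = TRANS.get(prev, Counter())
    return math.log((row[tag] + 1) / (sum(row.values()) + len(TAGS)))


def log_emit(tag: str, word: str) -> float:
    """Add-one smoothed log P(word | tag), with one extra slot for unseen words."""
    row = EMIT[tag]
    return math.log((row[word] + 1) / (sum(row.values()) + VOCAB_SIZE + 1))


def viterbi(words: List[str]) -> List[str]:
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    # best[i][t]: log-probability of the best tag path for words[:i + 1] ending in t
    best = [{t: log_trans(START, t) + log_emit(t, words[0]) for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            # Keep only the best predecessor for each tag (dynamic programming).
            prev = max(TAGS, key=lambda p: best[i - 1][p] + log_trans(p, t))
            best[i][t] = best[i - 1][prev] + log_trans(prev, t) + log_emit(t, words[i])
            back[i][t] = prev
    # Trace the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]
```

With this corpus, `most_frequent_tag(["his", "works", "sell"])` tags works as VERB, its most frequent tag in training, whereas `viterbi(["his", "works", "sell"])` returns `["DET", "NOUN", "VERB"]`: the strong DET-to-NOUN transition outweighs the higher emission probability of works as a verb.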

[2] Brill, E.: A simple rule-based part of speech tagger. In: Proceedings of the Third Annual Conference on Applied Natural Language Processing, ACL. 1992.

[3] Brill, E.: Unsupervised learning of disambiguation rules for part of speech tagging. In: Proceedings of the Third ACL Workshop on Very Large Corpora. 1995.

[4] Tapanainen, P. / Voutilainen, A.: Tagging accurately: don't guess if you don't know. Technical Report, Xerox Corporation. 1994.

[5] Cutting, D. / Kupiec, J. / Pedersen, J. / Sibun, P.: A Practical Part-of-Speech Tagger. In: Proceedings of the 3rd conference on Applied Natural Language Processing (ANLP). 1992.

[6] Schmid, H.: Probabilistic Part-of-Speech Tagging Using Decision Trees. In: International Conference on New Methods in Language Processing, Manchester. 1994.

[7] Brants, T.: TnT - A Statistical Part-of-Speech Tagger. In: Proceedings of the 6th ANLP Conference, Seattle, WA. 2000.

[8] Brill, E. / Marcus, M.: Tagging an unfamiliar text with minimal human supervision. ARPA Technical Report. 1993.


For more information on this topic, see also the relevant chapter in the HLT-Survey.