Abstract: In natural language processing, the tools and techniques for retrieving language data from corpora have developed rapidly, and the identification of lexical chunks has moved from manual to automatic methods. Chunk retrieval began with the extraction of continuous, fixed word strings from corpora; after several years of development, it has gradually reached a more advanced stage, namely the extraction of discontinuous, variable chunks. From the perspective of corpus research, this paper summarizes and reviews the techniques and tools for identifying and retrieving lexical chunks in English, dealing with continuous chunks and discontinuous chunks in turn.