Corpus: The Corpus of Contemporary American English (COCA) - LingLab

Uploaded 2020-10-05 09:34:38

Corpus Linguistics

The Corpus of Contemporary American English (COCA) is the only large, genre-balanced corpus of American English. COCA is probably the most widely used corpus of English, and it is related to many other corpora of English that we have created, which offer unparalleled insight into variation in English.

The corpus contains more than one billion words of text (25+ million words each year from 1990 to 2019) from eight genres: spoken, fiction, popular magazines, newspapers, academic texts, and (with the March 2020 update) TV and movie subtitles, blogs, and other web pages.

On the COCA site, click on any of the links in the search form on the left for context-sensitive help and to see the range of queries that the corpus offers.

There are four main ways to search the corpus:

First, you can browse a frequency list of the top 60,000 words in the corpus, including searches by word form, part of speech, ranges in the 60,000-word list, and even pronunciation. This should be particularly useful for language learners and teachers.

Second, you can search by individual word, and see collocates, topics, clusters, websites, concordance lines, and related words for each of these words. Note that some of these searches are unique to COCA and iWeb.

Third, you can input entire texts and then use data from COCA to get detailed information on the words and phrases in the text.

Fourth, you can search for phrases and strings. Because the corpus is optimized for speed, searches for substrings (*ism, un*able) and phrases (e.g. got VERB-ed, BUY * ADJ NOUN, "gorgeous" NOUN) are very fast, even for high-frequency patterns such as from ADJ to ADJ, phrasal verbs, or NOUN NOUN.
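
To make the wildcard substring searches mentioned above (*ism, un*able) concrete, here is a minimal sketch in Python. It is not how COCA itself is implemented: it only illustrates glob-style matching over a word list, using the standard-library fnmatch module. The word list, the counts, and the function name wildcard_search are hypothetical, and the sketch assumes that * may match any run of characters, including none.

```python
from fnmatch import fnmatchcase

# Hypothetical toy word list with made-up counts; this is NOT real COCA data.
TOY_FREQUENCIES = {
    "realism": 120,
    "tourism": 95,
    "prism": 10,
    "unable": 300,
    "unbelievable": 80,
    "unstable": 40,
    "table": 500,
}


def wildcard_search(pattern, frequencies):
    """Return (word, count) pairs whose word matches a glob-style pattern.

    Results are sorted by descending count, then alphabetically.
    Assumption: '*' matches any run of characters, including an empty one.
    """
    hits = [
        (word, count)
        for word, count in frequencies.items()
        if fnmatchcase(word, pattern)
    ]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Words ending in "ism", and words that start with "un" and end with "able".
ism_hits = wildcard_search("*ism", TOY_FREQUENCIES)
un_able_hits = wildcard_search("un*able", TOY_FREQUENCIES)

for word, count in ism_hits + un_able_hits:
    print(word, count)
```

In this toy list, "table" is not returned for un*able because it does not begin with "un", while "unable" is returned because the wildcard is allowed to match nothing. Whether the corpus interface treats the empty match the same way is an assumption of this sketch.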

You might pay special attention to the comparisons between genres and years, and to virtual corpora, which allow you to create personalized collections of texts related to a particular area of interest.