ANC: American National Corpus

3390 views · Uploaded 2020-08-22 11:41:32 · 0KB

The Open American National Corpus

The Open American National Corpus (OANC) is a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. All data and annotations are fully open and unrestricted for any use.

Available Data and Annotations

OANC: 15 million words of contemporary American English with automatically produced annotations for a variety of linguistic phenomena.

MASC: 500,000 words of OANC data equally distributed over 19 genres of American English, with manually produced or validated annotations for several layers of linguistic phenomena.

» BROWSE OANC CONTENTS

» BROWSE MASC CONTENTS

Contribute Text, Annotations, and Derived Data

OANC and MASC are collaborative development resources that rely on contributions of data and annotations from the linguistics and natural language processing communities as well as the public at large.

We solicit contributions of written texts and spoken transcripts in American English that were produced in or after 1990 to be included in the OANC and/or MASC.

Native speakers of American English (Am I a Native Speaker?) who have produced documents of any kind (including college student essays, blogs, poetry, fiction, email, etc.) are invited to become a part of linguistic history by contributing these materials to the OANC/MASC. Authors can consult the frequently asked questions page to learn more about how the data will be used and why they should consider contributing their work to the OANC.

Those who have developed corpora of post-1989 American English for any purpose are also encouraged to contribute their unrestricted data. We also ask users to contribute annotations for linguistic features of any kind on all or part of the OANC and/or MASC, as well as data derived from OANC/MASC, such as word lists, for free distribution and use.

» CONTRIBUTE TEXTS

» CONTRIBUTE ANNOTATIONS AND DERIVED DATA
